This document describes the software layout and design of webcheck. It is
meant to help anyone who wants to contribute code to this package.
CONTRIBUTING TO WEBCHECK
========================
Contributions to webcheck are most welcome. Contributions are integrated on
a best-effort basis, and integration is easier if the following points are
taken into account:
* for large changes it is a good idea to send an email first
* send your patches in unified diff (diff -u) format, as Git patches or as
  Git pull requests
* try to use the svn version of the software to develop the patch
* clearly state which problem you're trying to solve and how this is
  accomplished
* please follow the existing coding conventions
* please test the patch and include information on testing with the patch
* add a copyright statement with the patch if you feel the contribution is
  significant enough (e.g. more than a few lines)
* when including third-party code, retain copyright information (copyright
  holder and license) and ensure that the license is GPL-compatible
Please email webcheck-users@lists.arthurdejong.org if you want to
contribute. All contributions will be acknowledged in the AUTHORS file.
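For readers unfamiliar with the unified diff format requested above, the
following is a minimal sketch that uses Python's standard difflib module to
produce the same line format as diff -u. The file name example.py and the
greet() function are hypothetical and are not part of webcheck; in practice
a patch would normally be generated with diff -u or with Git itself.

```python
import difflib

# Hypothetical before/after contents of a source file; the file name
# and the code are made up for illustration and are not part of webcheck.
old = [
    "def greet(name):\n",
    "    print 'hello ' + name\n",
]
new = [
    "def greet(name):\n",
    "    print('hello ' + name)\n",
]

# unified_diff() yields lines in the same format as `diff -u`:
# two file header lines, a hunk header, then context lines (leading
# space), removed lines (leading "-") and added lines (leading "+").
patch = list(difflib.unified_diff(old, new,
                                  fromfile="a/example.py",
                                  tofile="b/example.py"))

print("".join(patch), end="")
```

The printed patch starts with the "---" and "+++" file headers, followed by
an "@@" hunk header and the changed lines with their surrounding context.
That context is what allows a patch to be reviewed and applied even when
the target file has shifted slightly.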
WEBCHECK DESIGN OVERVIEW
========================
Webcheck has grown and has been refactored over time. Although several
different design concepts have been used along the way, there has recently
been a push towards a modular, plugin-based design.
The graphs below should give an overview of the modules and the order in
which the functions are called.
webcheck            - top-level namespace
 \- cmd             - command-line front-end for webcheck
 \- config          - configuration settings (imported from most other
 |                    modules, expected to be refactored out)
 \- crawler         - home of the Crawler class that controls the
 |                    initialisation, crawling, post-processing and
 |                    report generation
 \- db              - database definitions using SQLAlchemy,
 |                    used to persist the crawled data in a SQLite db
 \- monkeypatch     - hacks to fix third-party bugs
 \- myurllib        - URL normalisation functions
 \- output          - utility functions for report generation
 |
 \- parsers         - entry point for content parsing
 |   \- html        - parser modules for HTML content
 |   \- css         - parser module for CSS
 |
 \- plugins         - collection of report and post-processing plugins
 |
 \- templates       - HTML templates for report generation
 |