Arthur de Jong

This document tries to describe the software layout and design of
webcheck. It should provide some help for contributing code to this package.


CONTRIBUTING TO WEBCHECK
========================

Contributions to webcheck are most welcome. Integrating contributions will
be done on a best-effort basis and can be made easier if the following are
considered:

* for large changes it is a good idea to send an email first
* send your patches in unified diff (diff -u) format, Git patches or Git
  pull requests
* try to develop the patch against the latest development version of the
  software
* clearly state which problem you're trying to solve and how this is
  accomplished
* please follow the existing coding conventions
* please test the patch and include information on testing with the patch
* add a copyright statement with the patch if you feel the contribution is
  significant enough (e.g. more than a few lines)
* when including third-party code, retain copyright information (copyright
  holder and license) and ensure that the license is GPL compatible

Please email webcheck-users@lists.arthurdejong.org if you want to
contribute. All contributions will be acknowledged in the AUTHORS file.


WEBCHECK DESIGN OVERVIEW
========================

Webcheck has grown and been refactored over time. Several different design
concepts have been used along the way, but recently there has been a push
towards a modular plugin-based design.
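
As an illustration only, a modular plugin-based design could look roughly
like the sketch below. All names in it (PLUGINS, register, run_plugins and
the two example plugins) are hypothetical and are not taken from the
webcheck source; the shape of the "site" data is likewise an assumption.

```python
# Hypothetical sketch of a plugin registry; none of these names come from
# the webcheck source.

PLUGINS = []  # plugin functions, in the order they were registered


def register(func):
    """Decorator that adds a plugin function to the registry."""
    PLUGINS.append(func)
    return func


@register
def count_links(site):
    """Post-processing style plugin: record the number of links found."""
    site['link_count'] = len(site['links'])


@register
def find_bad_links(site):
    """Report style plugin: collect the links that were marked as broken."""
    site['bad_links'] = sorted(
        url for url, ok in site['links'].items() if not ok)


def run_plugins(site):
    """Call every registered plugin, in order, on the crawled site data."""
    for plugin in PLUGINS:
        plugin(site)
    return site
```

Under these assumptions a new check is added by writing one more decorated
function; the code that calls the plugins does not need to change.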

The graphs below should give an overview of the modules and the order in
which the functions are called.

webcheck                    - top-level namespace
 \- cmd                     - command-line front-end for webcheck
 \- config                  - configuration settings (imported by most other
 |                            modules, expected to be refactored out)
 \- crawler                 - home of the Crawler class that controls the
 |                            initialisation, crawling, post-processing and
 |                            report generation
 \- db                      - database definitions using SQLAlchemy, used to
 |                            persist the crawled data in a SQLite database
 \- monkeypatch             - hacks to fix third-party bugs
 \- myurllib                - URL normalisation functions
 \- output                  - utility functions for report generation
 |
 \- parsers                 - entry point for content parsing
 |  \- html                 - parser modules for HTML content
 |  \- css                  - parser module for CSS
 |
 \- plugins                 - collection of report and post-processing plugins
 |
 \- templates               - HTML templates for report generation
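
The overview above says that the Crawler class controls initialisation,
crawling, post-processing and report generation, and that myurllib holds
the URL normalisation functions. The sketch below shows how such phases
could be chained, as a rough illustration only. SketchCrawler,
normalise_url and the fetch_links callback are hypothetical names and do
not reflect the actual webcheck API; the normalisation rules are a
simplified assumption.

```python
# Hypothetical sketch; class, method and function names are illustrative
# and are not taken from the webcheck source.
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url):
    """Lower-case scheme and host, drop the fragment, default path to '/'.

    Simplification: the whole netloc is lower-cased, including any
    user information it may contain.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or '/', parts.query, ''))


class SketchCrawler:
    """Runs the phases named in the overview, in order."""

    def __init__(self, start_urls):
        # initialisation: queue the normalised start URLs
        self.queue = [normalise_url(url) for url in start_urls]
        self.seen = set()
        self.log = []

    def crawl(self, fetch_links):
        # crawling: visit each URL once and queue the links it returns
        while self.queue:
            url = self.queue.pop(0)
            if url in self.seen:
                continue
            self.seen.add(url)
            for link in fetch_links(url):
                self.queue.append(normalise_url(link))
        self.log.append('crawl')

    def postprocess(self):
        # post-processing: plugins would inspect the crawled data here
        self.log.append('postprocess')

    def generate(self):
        # report generation: here simply the sorted list of visited URLs
        self.log.append('generate')
        return sorted(self.seen)

    def run(self, fetch_links):
        self.crawl(fetch_links)
        self.postprocess()
        return self.generate()
```

In this sketch fetch_links stands in for fetching and parsing a page; it
takes a URL and returns the links found there, so the phases can be tried
out with a plain dictionary instead of network access.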