TODO

before next release
-------------------
* go over all FIXMEs in code
* rewrite ftp scheme module
* rewrite file scheme module

probably before 2.0 release
---------------------------
* parse css
* maybe choose a different license for webcheck.css
* make it possible to copy or reference webcheck.css
* make it possible to copy http:.../webcheck.css into place (use scheme system)
* add onmouseover information to links, showing useful details about the url
* make more things configurable
* make a Debian package
* maybe generate a list of page parents (this is useful to list proper parent links for problem pages)
* figure out if we need parents and pageparents
* make configurable time-out when retrieving a document
* support for multi-threading (maybe)
* divide problems into transfer problems and page problems (transfer problems result in a bad link problem on a page)
* clean up printing of messages, especially needed for multi-threading
* rewrite scheme modules to make proper use of new calling method
* only download complete documents if the mime type is supported
* go over command line options and see if we need long equivalents
* fix redirection of stdout and stderr so that it works properly
* set a maximum transfer size for downloading files and other content over http
* make error handling of html parser more robust

wishlist
--------
* write code for stripping the last part of a url (e.g. foo/index.html -> foo/)
* translate file paths to file:/// urls on the command line
* maybe set referer (configurable)
* support for authenticating proxies
* new config file format (if we want a configfile at all)
* cookies support (maybe)
* integration with weblint
* combine with a logfile checker to also show number of hits per link
* performance and other improvements (we can switch to sets with python 2.4)
* write a guide to writing plugins
* form checking
* spelling checking
* test w3c conformance of pages
* maybe make broken links not clickable
* maybe store crawled site's data in some format for later processing or continuing after interruption
* create output directory if it does not exist
* add support for fetching gzipped content
* write section on internal and external urls in the manual page
* add a favicon to reports
* add a test to see if python supports https and fail elegantly otherwise
* maybe follow redirects of external links