Arthur de Jong

Open Source / Free Software developer

Commit message | Author | Age | Files | Lines
...
* empty module as placeholder to parse CSS (referenced from __init__.py already) | Arthur de Jong | 2005-07-25 | 1 | -0/+20
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@91 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* don't replace an already set title | Arthur de Jong | 2005-07-25 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@90 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add ChangeLog for release 1.9.0 | Arthur de Jong | 2005-07-24 | 1 | -0/+500
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@88 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* get files ready for release | Arthur de Jong | 2005-07-24 | 2 | -5/+59
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@87 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up README removing sections that should be in the manual page | Arthur de Jong | 2005-07-24 | 1 | -80/+23
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@86 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rename whatsold and whatsnew plugins to old and new | Arthur de Jong | 2005-07-24 | 3 | -4/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@85 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle socket errors properly | Arthur de Jong | 2005-07-24 | 1 | -1/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@84 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix for incomplete change in r76, now version should not be referenced any more | Arthur de Jong | 2005-07-24 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@83 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* call make_link() with a link object instead of a URL, removing the need for a mySite in plugins | Arthur de Jong | 2005-07-24 | 12 | -23/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@82 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove HTTP status code handling from here as this should be done by the HTTP module | Arthur de Jong | 2005-07-24 | 1 | -40/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@81 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only report on internal links | Arthur de Jong | 2005-07-24 | 2 | -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@80 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only add links to crawl list if they are not in there already | Arthur de Jong | 2005-07-24 | 1 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@79 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* flush stdout after each message so that redirecting stdout and stderr together to a file works reliably | Arthur de Jong | 2005-07-24 | 1 | -0/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@78 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix regular expression matching | Arthur de Jong | 2005-07-23 | 1 | -2/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@77 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* integrate versio.py into config.py, clean up config.py removing unused settings and clean up boolean types | Arthur de Jong | 2005-07-23 | 6 | -181/+85
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@76 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove logo option since the current output does not use one | Arthur de Jong | 2005-07-23 | 3 | -14/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@75 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* most systems already know about .shtml files | Arthur de Jong | 2005-07-23 | 1 | -4/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@74 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* first step in cleaning up documentation, integrating INSTALL in README and BUGS in manual page and adding section on robots handling in manual | Arthur de Jong | 2005-07-23 | 4 | -220/+156
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@73 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* Mike Meyer -> Mike W. Meyer | Arthur de Jong | 2005-07-23 | 22 | -25/+25
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@72 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add support for sleep between requests | Arthur de Jong | 2005-07-22 | 1 | -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@71 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* don't add . to python path as it's not needed and put command line handling in same order as options | Arthur de Jong | 2005-07-22 | 1 | -12/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@70 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* change layout to have a simpler layout that also should work in MSIE | Arthur de Jong | 2005-07-22 | 2 | -27/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@69 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix docstrings | Arthur de Jong | 2005-07-22 | 1 | -11/+9
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@68 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* do not use start_time from webcheck saving an import | Arthur de Jong | 2005-07-22 | 2 | -4/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@67 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* almost complete rewrite of crawling and site state code making children and parents link objects instead of URLs and giving link member variables better names, change plugins accordingly, make scheme handling more pluggable and only use one function call and have a better pluggable structure for content parsing (currently only html) | Arthur de Jong | 2005-07-22 | 20 | -484/+589
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@66 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* use lower-case URL attribute in Link instead of upper-case URL | Arthur de Jong | 2005-07-17 | 16 | -53/+53
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@65 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move functionality of rptlib.py to __init__.py so that we can just use the plugins package | Arthur de Jong | 2005-07-16 | 13 | -35/+35
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@64 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove __init__.py to be replaced by contents of rptlib.py | Arthur de Jong | 2005-07-16 | 1 | -2/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@63 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add note about pattern matching | Arthur de Jong | 2005-07-16 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@62 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rework scheme code to use more logical function names, more clearly mark internal functions and do some major clean-up of the scheme modules code | Arthur de Jong | 2005-07-10 | 5 | -177/+125
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@61 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* store mtime in link object instead of age in days | Arthur de Jong | 2005-07-10 | 5 | -9/+19
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@60 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove unneeded import and print | Arthur de Jong | 2005-07-10 | 2 | -2/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@59 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move htmlparse to a more generic parsers package, cleaning up the code and simplifying dependencies | Arthur de Jong | 2005-07-09 | 3 | -75/+73
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@58 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up HTML output generating XHTML 1.1 without frames and using CSS for styling also getting rid of the images | Arthur de Jong | 2005-07-09 | 15 | -341/+397
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@57 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* put plugins in a more logical order | Arthur de Jong | 2005-07-04 | 1 | -5/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@56 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* implement consistent sorting of all lists removing sort functions from rptlib and using lambda functions where needed | Arthur de Jong | 2005-07-04 | 11 | -81/+43
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@55 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle and document proxy settings with environment variables | Arthur de Jong | 2005-07-03 | 4 | -9/+13
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@54 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* name webcheck with lower case | Arthur de Jong | 2005-07-03 | 8 | -57/+57
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@53 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up get_reply() function to use proper recursion and don't use self where it doesn't make sense | Arthur de Jong | 2005-06-28 | 1 | -23/+16
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@52 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* change to most recent version of the GPL (FSF address change) and update notices | Arthur de Jong | 2005-06-22 | 23 | -25/+25
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@51 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* sort external links by URL | Arthur de Jong | 2005-06-18 | 1 | -1/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@50 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split main() part into its own function | Arthur de Jong | 2005-06-18 | 1 | -4/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@49 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* restructure a couple of things to reduce the number of mutual imports and reduce the amount of stuff gathered in webcheck.py | Arthur de Jong | 2005-06-18 | 2 | -44/+35
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@48 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add simple urllist plugin to list all visited URLs | Arthur de Jong | 2005-06-18 | 2 | -0/+37
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@47 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only include internal links in sitemap | Arthur de Jong | 2005-06-18 | 1 | -0/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@46 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add problems plugin to config instead of hard-coding | Arthur de Jong | 2005-06-18 | 2 | -4/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@45 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove ugly redirection for overwrite file question since we now write all html through a file descriptor | Arthur de Jong | 2005-06-18 | 1 | -5/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@44 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* pass reference to Link class to plugins with parameter and make import config where it is used instead of accessing it through another module | Arthur de Jong | 2005-06-15 | 16 | -84/+60
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@43 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make use of base consistent, do not modify it to make a nicer URL (at least not now) and do not overwrite it with something silly from webcheck.py | Arthur de Jong | 2005-06-15 | 4 | -12/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@42 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also set URL attribute on yanked links | Arthur de Jong | 2005-06-14 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@41 86f53f14-5ff3-0310-afe5-9b438ce3f40c