Arthur de Jong

Open Source / Free Software developer

path: root/schemes
Commit log (message, author, age, files changed, lines removed/added):
* add some more debugging information (cache hit or miss)
  Author: Arthur de Jong; Age: 2006-01-29; Files: 1; Lines: -1/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@220 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* trim empty ports (http://host:/) from URLs and do not crash on improperly formatted URLs
  Author: Arthur de Jong; Age: 2005-12-29; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@206 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* catch all relevant exceptions when looking up content-type header
  Author: Arthur de Jong; Age: 2005-12-26; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@192 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add copyright clarification to specify that generated output files are not covered by our copyright
  Author: Arthur de Jong; Age: 2005-12-17; Files: 5; Lines: -0/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@186 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove trailing : from netloc if it is present
  Author: Arthur de Jong; Age: 2005-12-17; Files: 1; Lines: -0/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@185 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add configuration option to disable proxy caching
  Author: Arthur de Jong; Age: 2005-09-18; Files: 1; Lines: -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@182 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* try to extract character encoding from http response and store it in the link object
  Author: Arthur de Jong; Age: 2005-09-17; Files: 1; Lines: -0/+8
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@175 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* support basic authentication for http proxies and some initial fixes to get proxying HTTPS traffic working
  Author: Arthur de Jong; Age: 2005-09-13; Files: 1; Lines: -5/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@171 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix wrapping of documentation
  Author: Arthur de Jong; Age: 2005-09-10; Files: 1; Lines: -4/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@168 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* set status to result of fetching the document (not an error indicator)
  Author: Arthur de Jong; Age: 2005-08-20; Files: 1; Lines: -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@145 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move redirect handling code to crawler module, including redirect loop detection code
  Author: Arthur de Jong; Age: 2005-08-19; Files: 3; Lines: -19/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@141 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split problems into page problems (parsing errors, wrong links, etc) and link problems (errors retrieving the document)
  Author: Arthur de Jong; Age: 2005-08-19; Files: 3; Lines: -10/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@138 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* pick up configured filenames if present in directories
  Author: Arthur de Jong; Age: 2005-08-16; Files: 2; Lines: -46/+66
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@135 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add extra debugging info
  Author: Arthur de Jong; Age: 2005-08-16; Files: 1; Lines: -8/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@134 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* use a pool of ftp connections to keep ftp connection to a host open to do multiple requests (this greatly speeds up crawling of ftp sites)
  Author: Arthur de Jong; Age: 2005-08-13; Files: 1; Lines: -18/+25
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@133 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* almost complete reimplementation of the ftp scheme, handling errors more gracefully and also crawl normal ftp directories
  Author: Arthur de Jong; Age: 2005-08-13; Files: 1; Lines: -62/+64
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@132 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* complete reimplementation of file module, reading index.html from directory, otherwise read directory contents
  Author: Arthur de Jong; Age: 2005-08-12; Files: 1; Lines: -21/+49
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@130 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rename parameter to acceptedtypes to not conflict with mimetypes module
  Author: Arthur de Jong; Age: 2005-08-12; Files: 4; Lines: -6/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@129 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also pass mimetypes to scheme modules to only fetch content if we can parse the content type
  Author: Arthur de Jong; Age: 2005-08-12; Files: 4; Lines: -6/+8
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@128 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add https module as a wrapper to the http module
  Author: Arthur de Jong; Age: 2005-07-31; Files: 1; Lines: -0/+26
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@115 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* reimplement http module to be a little more generic and clean and handle errors more cleanly and consistently
  Author: Arthur de Jong; Age: 2005-07-30; Files: 1; Lines: -97/+91
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@108 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove references to email addresses where they are not useful, based on a partial patch by Evelyn Mitchell <efm@tummy.com>
  Author: Arthur de Jong; Age: 2005-07-29; Files: 4; Lines: -10/+10
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@99 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle socket errors properly
  Author: Arthur de Jong; Age: 2005-07-24; Files: 1; Lines: -1/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@84 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix for incomplete change in r76, now version should not be referenced any more
  Author: Arthur de Jong; Age: 2005-07-24; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@83 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* integrate versio.py into config.py, clean up config.py removing unused settings and clean up boolean types
  Author: Arthur de Jong; Age: 2005-07-23; Files: 1; Lines: -2/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@76 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* most systems already know about .shtml files
  Author: Arthur de Jong; Age: 2005-07-23; Files: 1; Lines: -4/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@74 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* Mike Meyer -> Mike W. Meyer
  Author: Arthur de Jong; Age: 2005-07-23; Files: 3; Lines: -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@72 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* almost complete rewrite of crawling and site state code making children and parents link objects instead of URLs and giving link member variables better names, change plugins accordingly, make scheme handling more pluggable and only use one function call and have a better pluggable structure for content parsing (currently only html)
  Author: Arthur de Jong; Age: 2005-07-22; Files: 4; Lines: -70/+74
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@66 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* use lower-case URL attribute in Link instead of upper-case URL
  Author: Arthur de Jong; Age: 2005-07-17; Files: 3; Lines: -13/+13
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@65 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rework scheme code to use more logical function names, more clearly mark internal functions and do some major clean-up of the scheme modules code
  Author: Arthur de Jong; Age: 2005-07-10; Files: 4; Lines: -162/+121
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@61 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* store mtime in link object instead of age in days
  Author: Arthur de Jong; Age: 2005-07-10; Files: 2; Lines: -2/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@60 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove unneeded import and print
  Author: Arthur de Jong; Age: 2005-07-10; Files: 1; Lines: -1/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@59 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle and document proxy settings with environment variables
  Author: Arthur de Jong; Age: 2005-07-03; Files: 1; Lines: -6/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@54 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* name webcheck with lower case
  Author: Arthur de Jong; Age: 2005-07-03; Files: 1; Lines: -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@53 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up get_reply() function to use proper recursion and not use self where it doesn't make sense
  Author: Arthur de Jong; Age: 2005-06-28; Files: 1; Lines: -23/+16
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@52 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* change to most recent version of the GPL (FSF address change) and update notices
  Author: Arthur de Jong; Age: 2005-06-22; Files: 3; Lines: -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@51 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* pass reference to Link class to plugins with parameter and make import config where it is used instead of accessing it through another module
  Author: Arthur de Jong; Age: 2005-06-15; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@43 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* claiming copyright on empty files is silly
  Author: Arthur de Jong; Age: 2005-06-08; Files: 1; Lines: -17/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@34 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* redo output writing using a cleaner debugio and change debug command line option
  Author: Arthur de Jong; Age: 2005-06-06; Files: 2; Lines: -16/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@33 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rename linkList to linkMap
  Author: Arthur de Jong; Age: 2005-04-13; Files: 1; Lines: -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@23 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* indent with spaces instead of tabs (tabs are evil)
  Author: Arthur de Jong; Age: 2005-04-09; Files: 1; Lines: -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@20 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rebump loglevel to debug
  Author: Arthur de Jong; Age: 2005-04-08; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@18 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove link part from scheme modules
  Author: Arthur de Jong; Age: 2005-04-08; Files: 3; Lines: -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@17 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up http request code a little and do not set host header (it is sent by HTTPConnection already)
  Author: Arthur de Jong; Age: 2005-04-08; Files: 1; Lines: -9/+10
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@16 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make nicer file (copyrights) headers
  Author: Arthur de Jong; Age: 2005-04-07; Files: 3; Lines: -12/+19
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@15 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix problem with incorrect indent
  Author: Arthur de Jong; Age: 2005-04-07; Files: 1; Lines: -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@14 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* tabs to spaces (tabs are evil)
  Author: Arthur de Jong; Age: 2005-04-07; Files: 3; Lines: -62/+62
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@12 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* include patch from Sebastien Delafond <sdelafond@gmx.net> (from http://bugs.debian.org/286017) to fix problems with recent versions of python
  Author: Arthur de Jong; Age: 2005-04-07; Files: 1; Lines: -10/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@11 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* import Debian package patches
  Author: Arthur de Jong; Age: 2005-04-06; Files: 2; Lines: -17/+41
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@10 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* import of release 1.01.0
  Author: Arthur de Jong; Age: 2005-03-29; Files: 4; Lines: -0/+367
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@2 86f53f14-5ff3-0310-afe5-9b438ce3f40c