Arthur de Jong

Open Source / Free Software developer

Commit message | Author | Age | Files | Lines
* get files ready for 1.10.0 release [1.10.0] | Arthur de Jong | 2007-05-12 | 8 | -18/+154
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@333 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also lower-case reqanchor | Arthur de Jong | 2007-05-12 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@332 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix some copyright dates | Arthur de Jong | 2007-05-12 | 5 | -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@331 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* switch robots.txt handling to default on again (broken in 1.9.8) and add new --ignore-robots option to be able to ignore robots retrieval | Arthur de Jong | 2007-05-12 | 3 | -2/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@330 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* present the default number of redirects | Arthur de Jong | 2007-05-09 | 1 | -2/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@329 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* update copyright information | Arthur de Jong | 2007-05-08 | 1 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@328 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fixes to make output XHTML 1.1 compliant | Arthur de Jong | 2007-04-24 | 3 | -8/+20
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@327 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle ID attribute as anchor on any tag | Arthur de Jong | 2007-04-24 | 1 | -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@326 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* lower-case anchor and errors to include id as option | Arthur de Jong | 2007-04-24 | 2 | -2/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@325 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* correctly parse author information | Arthur de Jong | 2007-04-20 | 1 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@324 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* introduce HTML parsing using BeautifulSoup with a fall-back mechanism to the old HTMLParser based solution | Arthur de Jong | 2007-04-20 | 4 | -64/+256
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@323 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* mark encoding problems and output more debugging | Arthur de Jong | 2007-04-20 | 1 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@322 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix formatting of previous changelog entry | Arthur de Jong | 2007-04-20 | 1 | -3/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@321 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix typo | Arthur de Jong | 2007-04-20 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@320 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add workaround for bug in idna module | Arthur de Jong | 2007-04-06 | 1 | -0/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@319 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add some comments to the follow_link() method | Arthur de Jong | 2007-04-06 | 1 | -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@318 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make parsing of URLs and conversion to Link objects a little more consistent | Arthur de Jong | 2007-04-06 | 1 | -9/+28
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@317 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* use consistent Unicode conversion | Arthur de Jong | 2007-04-06 | 1 | -8/+14
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@316 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* document the fact that --force should be used for non-interactive use | Arthur de Jong | 2007-04-06 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@315 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* bail out if reading user input failed | Arthur de Jong | 2007-04-06 | 1 | -1/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@314 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* evaluate archive attribute of <applet> tag instead of code attribute if that is present | Arthur de Jong | 2007-03-31 | 1 | -2/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@313 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* get rid of old base (singular) as bases is now used everywhere | Arthur de Jong | 2007-03-14 | 1 | -3/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@312 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clean up a little and simplify | Arthur de Jong | 2007-03-10 | 1 | -8/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@311 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* get files ready for 1.9.8 release [1.9.8] | Arthur de Jong | 2007-01-15 | 8 | -127/+213
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@309 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* catch any exception in HTTP module and report is as a link problem | Arthur de Jong | 2007-01-15 | 1 | -0/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@308 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move section on webcheck design into HACKING document | Arthur de Jong | 2007-01-15 | 3 | -30/+28
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@307 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix the bugreporting section to more clearly state the needed information | Arthur de Jong | 2007-01-15 | 1 | -6/+16
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@306 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* switch to using latest syntax of python-support | Arthur de Jong | 2007-01-13 | 3 | -3/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@305 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* present sitemap with all bases | Arthur de Jong | 2006-10-23 | 1 | -1/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@304 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add USE_ROBOTS option | Arthur de Jong | 2006-10-23 | 1 | -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@303 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* include list of bases in Site class | Arthur de Jong | 2006-10-23 | 1 | -10/+13
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@302 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* print reason why link is yanked if available | Arthur de Jong | 2006-10-23 | 1 | -1/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@301 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* include link to homepage in package description | Arthur de Jong | 2006-09-29 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@300 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* explicitly transform username and password to string in case either one isn't supplied | Arthur de Jong | 2006-09-04 | 1 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@299 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also handle SSL related socket errors (e.g. SSL time-out) | Arthur de Jong | 2006-07-13 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@298 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add set_encoding method to Link object to do some basic encoding sanity checks | Arthur de Jong | 2006-07-13 | 3 | -14/+23
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@297 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* get files ready for 1.9.7 release [1.9.7] | Arthur de Jong | 2006-07-02 | 6 | -26/+442
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@295 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* always keep navigation on top | Arthur de Jong | 2006-06-29 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@294 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* store internal, external and yanked regular expressions in a map allowing them to be serialized | Arthur de Jong | 2006-06-24 | 2 | -12/+12
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@293 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* switch to using python-support and follow recent python policy | Arthur de Jong | 2006-06-23 | 3 | -3/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@292 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split Build-Depends-Indep into Build-Depends and Build-Depends-Indep | Arthur de Jong | 2006-06-05 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@291 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also install favicon.ico in deb package (plus cosmetic fix) | Arthur de Jong | 2006-06-05 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@290 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix typos and fix example explanation | Arthur de Jong | 2006-06-04 | 1 | -3/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@289 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* do not split list of strings on comma's inside the quoted strings | Arthur de Jong | 2006-06-04 | 1 | -2/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@288 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make DeSerializeException a class instead of a function and add FIXME | Arthur de Jong | 2006-06-04 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@287 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add --continue option to resume the crawling from the point where the previous crawl stopped | Arthur de Jong | 2006-06-04 | 3 | -4/+40
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@286 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle break signals in all code | Arthur de Jong | 2006-06-02 | 1 | -6/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@285 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add code to serialize crawled data during crawl and again after crawl | Arthur de Jong | 2006-06-02 | 1 | -1/+12
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@284 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* raise a custom exception instead of IOError | Arthur de Jong | 2006-06-02 | 1 | -9/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@283 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add TODOs | Arthur de Jong | 2006-05-31 | 1 | -0/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@282 86f53f14-5ff3-0310-afe5-9b438ce3f40c