Arthur de Jong

Open Source / Free Software developer

path: root/webcheck/crawler.py
Commit message | Author | Date | Files | Lines (-removed/+added)
* Split functionality into Link.get_or_create() | Arthur de Jong | 2013-12-15 | 1 | -8/+1
* Rename some functions | Arthur de Jong | 2013-12-15 | 1 | -13/+13
* Small simplification | Arthur de Jong | 2013-12-15 | 1 | -1/+1
* Move SQLite initialisation to db module | Arthur de Jong | 2013-12-15 | 1 | -10/+2
* Move static files to webcheck/static | Arthur de Jong | 2013-12-02 | 1 | -3/+3
* Fix missing import | Arthur de Jong | 2013-12-02 | 1 | -0/+1
* Use crawler.base_urls instead of crawler.bases | Arthur de Jong | 2013-09-28 | 1 | -33/+27
* Introduce a site_name in the crawler | Arthur de Jong | 2013-09-28 | 1 | -0/+5
* Get response size and modified date from request | Arthur de Jong | 2013-09-28 | 1 | -3/+9
* Provide function for template-based report rendering | Arthur de Jong | 2013-09-22 | 1 | -1/+1
* Explicitly close database sessions | Arthur de Jong | 2013-09-22 | 1 | -0/+2
* Initialise crawler with a configuration | Arthur de Jong | 2013-09-20 | 1 | -31/+44
* Expose configured plugins via crawler.plugins | Arthur de Jong | 2013-09-20 | 1 | -13/+14
* Get default configuration from config module | Arthur de Jong | 2013-09-20 | 1 | -1/+11
* pass a string to RobotFileParser because of problems with... | Arthur de Jong | 2012-08-29 | 1 | -1/+1
* support MAX_DEPTH == 0 | Devin Bayer | 2011-11-16 | 1 | -1/+1
* implement a MAX_DEPTH configuration option to limit crawl... | Arthur de Jong | 2011-11-04 | 1 | -1/+4
* switch to using the logging framework | Arthur de Jong | 2011-10-14 | 1 | -30/+25
* simplify logging of depth | Arthur de Jong | 2011-10-14 | 1 | -2/+1
* fix missing import (broken in r452) | Arthur de Jong | 2011-10-08 | 1 | -1/+1
* also handle exceptions while parsing (e.g. issue when rea... | Arthur de Jong | 2011-10-08 | 1 | -6/+9
* ensure that the database is emptied completely and move t... | Arthur de Jong | 2011-10-08 | 1 | -12/+2
* switch to using MozillaCookieJar because LWPCookieJar has... | Arthur de Jong | 2011-10-08 | 1 | -2/+2
* rename Crawler.add_internal() to Crawler.add_base() and a... | Arthur de Jong | 2011-10-07 | 1 | -9/+25
* rename Site to Crawler | Arthur de Jong | 2011-10-07 | 1 | -4/+3
* move some more initialisation from cmd to crawler and mak... | Arthur de Jong | 2011-10-07 | 1 | -17/+28
* move some file-handling functions to webcheck.util | Arthur de Jong | 2011-10-07 | 1 | -0/+5
* move version and homepage definition from config to the w... | Arthur de Jong | 2011-10-07 | 1 | -1/+2
* pass the IO timeout to urllib2 | Arthur de Jong | 2011-09-16 | 1 | -3/+2
* use fully qualified plugin names | Arthur de Jong | 2011-09-16 | 1 | -10/+10
* move all the code except the command-line handling to the... | Arthur de Jong | 2011-09-16 | 1 | -0/+422