Arthur de Jong

Open Source / Free Software developer

path: root/crawler.py
Commit message                                                  Author          Age         Files  Lines
...
* make a _urlclean() function to always store a proper URL ...  Arthur de Jong  2005-07-30  1      -2/+12
* import time as we need it for sleep                           Arthur de Jong  2005-07-29  1      -0/+1
* do an extra breadth first traversal of the site to combin...  Arthur de Jong  2005-07-29  1      -5/+61
* remove references to email addresses where they are not u...  Arthur de Jong  2005-07-29  1      -3/+3
* turn tocheck list into fifo queue                             Arthur de Jong  2005-07-27  1      -1/+1
* only add links to crawl list if they are not in there all...  Arthur de Jong  2005-07-24  1      -2/+2
* fix regular expression matching                               Arthur de Jong  2005-07-23  1      -2/+3
* Mike Meyer -> Mike W. Meyer                                   Arthur de Jong  2005-07-23  1      -1/+1
* add support for sleep between requests                        Arthur de Jong  2005-07-22  1      -0/+4
* almost complete rewrite of crawling and site state code m...  Arthur de Jong  2005-07-22  1      -0/+330
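
Several of the commits above (the FIFO tocheck queue, adding a link only if it is not already in the crawl list, the sleep between requests) describe a familiar breadth-first crawl loop. The sketch below is only an illustration of that general technique and is not taken from crawler.py: the function name crawl_order, the delay parameter, and the in-memory site dictionary are all hypothetical, and no network access is performed.

```python
import time
from collections import deque


def crawl_order(links, start, delay=0.0):
    """Return pages in breadth-first order, visiting each page once.

    links is a hypothetical mapping of page -> list of pages it links to;
    a real crawler would fetch and parse each page instead.
    """
    tocheck = deque([start])  # FIFO queue of pages still to visit
    queued = {start}          # every page ever queued, to avoid duplicates
    order = []
    while tocheck:
        page = tocheck.popleft()  # oldest entry first gives breadth-first order
        order.append(page)
        for target in links.get(page, []):
            if target not in queued:  # only queue links not already listed
                queued.add(target)
                tocheck.append(target)
        if delay and tocheck:
            time.sleep(delay)  # optional pause between (simulated) requests
    return order


# Hypothetical site used only for illustration.
site = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": ["/"], "/c": []}
print(crawl_order(site, "/"))  # ['/', '/a', '/b', '/c']
```

Using a deque keeps removal from the front of the queue cheap, and the separate set makes the "already in the list" check independent of the queue length.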