Arthur de Jong
Open Source / Free Software developer
webcheck (branch master): A website link and structure checker
path: root/webcheck/crawler.py
Commit message                                                Author          Date        Files  Lines (-/+)
Split functionality into Link.get_or_create()                 Arthur de Jong  2013-12-15  1      -8 / +1
Rename some functions                                         Arthur de Jong  2013-12-15  1      -13 / +13
Small simplification                                          Arthur de Jong  2013-12-15  1      -1 / +1
Move SQLite initialisation to db module                       Arthur de Jong  2013-12-15  1      -10 / +2
Move static files to webcheck/static                          Arthur de Jong  2013-12-02  1      -3 / +3
Fix missing import                                            Arthur de Jong  2013-12-02  1      -0 / +1
Use crawler.base_urls instead of crawler.bases                Arthur de Jong  2013-09-28  1      -33 / +27
Introduce a site_name in the crawler                          Arthur de Jong  2013-09-28  1      -0 / +5
Get response size and modified date from request              Arthur de Jong  2013-09-28  1      -3 / +9
Provide function for template-based report rendering          Arthur de Jong  2013-09-22  1      -1 / +1
Explicityly close database sessions                           Arthur de Jong  2013-09-22  1      -0 / +2
Initialise crawler with a configuration                       Arthur de Jong  2013-09-20  1      -31 / +44
Expose configured plugins via crawler.plugins                 Arthur de Jong  2013-09-20  1      -13 / +14
Get default configuration from config module                  Arthur de Jong  2013-09-20  1      -1 / +11
pass a string to RobotFileParser because of problems with...  Arthur de Jong  2012-08-29  1      -1 / +1
support MAX_DEPTH == 0                                        Devin Bayer     2011-11-16  1      -1 / +1
implement a MAX_DEPTH configuration option to limit crawl...  Arthur de Jong  2011-11-04  1      -1 / +4
switch to using the logging framework                         Arthur de Jong  2011-10-14  1      -30 / +25
simplify logging of depth                                     Arthur de Jong  2011-10-14  1      -2 / +1
fix missing import (broken in r452)                           Arthur de Jong  2011-10-08  1      -1 / +1
also handle exceptions while parsing (e.g. issue when rea...  Arthur de Jong  2011-10-08  1      -6 / +9
ensure that the database is emptied completely and move t...  Arthur de Jong  2011-10-08  1      -12 / +2
switch to using MozillaCookieJar because LWPCookieJar has...  Arthur de Jong  2011-10-08  1      -2 / +2
rename Crawler.add_internal() to Crawler.add_base() and a...  Arthur de Jong  2011-10-07  1      -9 / +25
rename Site to Crawler                                        Arthur de Jong  2011-10-07  1      -4 / +3
move some more initialisation from cmd to crawler and mak...  Arthur de Jong  2011-10-07  1      -17 / +28
move some file-handling functions to webcheck.util            Arthur de Jong  2011-10-07  1      -0 / +5
move version and homepage definition from config to the w...  Arthur de Jong  2011-10-07  1      -1 / +2
pass the IO timeout to urllib2                                Arthur de Jong  2011-09-16  1      -3 / +2
use fully qualified plugin names                              Arthur de Jong  2011-09-16  1      -10 / +10
move all the code except the command-line handling to the...  Arthur de Jong  2011-09-16  1      -0 / +422