Arthur de Jong

Open Source / Free Software developer

path: root/serialize.py
Commit message (author, date, files changed, lines removed/added)
* use sets instead of sequences for children, embedded, etc. to improve deserialization performance by a factor of 25, but now require Python 2.4 or more recent (Arthur de Jong, 2007-07-13, 1 file, -1/+1)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@343 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* give the matched URL a name to make code more readable (Arthur de Jong, 2007-07-13, 1 file, -1/+2)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@342 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* be a little more verbose when raising parsing exceptions (Arthur de Jong, 2007-07-13, 1 file, -5/+5)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@341 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* improve deserialization and handling of Unicode strings (Arthur de Jong, 2007-07-06, 1 file, -16/+13)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@336 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* store internal, external and yanked regular expressions in a map allowing them to be serialized (Arthur de Jong, 2006-06-24, 1 file, -3/+3)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@293 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* do not split list of strings on commas inside the quoted strings (Arthur de Jong, 2006-06-04, 1 file, -2/+4)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@288 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make DeSerializeException a class instead of a function and add FIXME (Arthur de Jong, 2006-06-04, 1 file, -1/+2)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@287 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* raise a custom exception instead of IOError (Arthur de Jong, 2006-06-02, 1 file, -9/+11)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@283 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split crawler.crawl() function into crawler.crawl() and crawler.postprocess() functions (Arthur de Jong, 2006-05-16, 1 file, -2/+2)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@279 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* flag deserialized links as changed so they will be reserialized again (Arthur de Jong, 2006-05-16, 1 file, -0/+1)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@276 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* import crawler late so as to simplify dependencies (Arthur de Jong, 2006-05-15, 1 file, -1/+1)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@270 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix typo in FIXME (Arthur de Jong, 2006-05-15, 1 file, -3/+3)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@269 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only write serialized data if it is different from the constructor's default value (Arthur de Jong, 2006-05-15, 1 file, -10/+20)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@267 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clear anchors, linkproblems and pageproblems from to-be-deserialized links to avoid duplicates, as a link can be deserialized multiple times (Arthur de Jong, 2006-05-15, 1 file, -0/+4)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@266 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove the call to crawl() from deserialize as this could be a partial deserialize that needs more tweaking to the site before the call to crawl() (Arthur de Jong, 2006-05-15, 1 file, -3/+3)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@265 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add serialize module that allows serializing and deserializing all crawler state (site and links) to and from a file; this module is not called anywhere yet (Arthur de Jong, 2006-05-07, 1 file, -0/+313)
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@257 86f53f14-5ff3-0310-afe5-9b438ce3f40c
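
Illustration for the 2006-06-04 entry about not splitting a list of strings on commas inside quoted strings. The log does not show the actual change, so the following is only a minimal sketch of the general idea, not the code from serialize.py. The function name split_quoted_list is hypothetical, and the sketch assumes double-quoted items with no escaped quotes inside them.

```python
def split_quoted_list(text):
    """Split text on commas, ignoring commas inside double quotes.

    Hypothetical helper for illustration only. Assumes quotes are
    balanced and never escaped; the quote characters are dropped and
    surrounding whitespace is stripped from each item.
    """
    items = []
    current = []
    in_quotes = False
    for char in text:
        if char == '"':
            # Entering or leaving a quoted string; do not keep the quote.
            in_quotes = not in_quotes
        elif char == ',' and not in_quotes:
            # A comma outside quotes ends the current item.
            items.append(''.join(current).strip())
            current = []
        else:
            current.append(char)
    # The last item has no trailing comma.
    items.append(''.join(current).strip())
    return items


serialized = '"http://example.com/a,b", "http://example.com/c"'
naive = serialized.split(',')           # also splits inside the first string
careful = split_quoted_list(serialized)
```

With this input, the plain split(',') call yields three fragments, while the quote-aware scan keeps the two original strings intact.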