Arthur de Jong

Open Source / Free Software developer

Commit message | Author | Age | Files | Lines
...
* upgrade to standards-version 3.7.2 (no changes needed) | Arthur de Jong | 2006-05-31 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@281 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* update feature list from deb package description | Arthur de Jong | 2006-05-31 | 1 | -2/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@280 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split crawler.crawl() function into crawler.crawl() and crawler.postprocess() functions | Arthur de Jong | 2006-05-16 | 3 | -7/+12
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@279 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also serialize remaining links after crawl | Arthur de Jong | 2006-05-16 | 1 | -0/+8
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@278 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove anchor debugging statements | Arthur de Jong | 2006-05-16 | 1 | -2/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@277 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* flag deserialized links as changed so they will be reserialized again | Arthur de Jong | 2006-05-16 | 1 | -0/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@276 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix sorting | Arthur de Jong | 2006-05-16 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@275 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* update link to fancytooltips | Arthur de Jong | 2006-05-16 | 2 | -2/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@274 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add makebackup option to open_file() so we can implement updating files (e.g. serialization files) | Arthur de Jong | 2006-05-15 | 1 | -10/+15
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@273 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix some stupid typos | Arthur de Jong | 2006-05-15 | 1 | -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@272 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add code to serialize links to a file while crawling the site | Arthur de Jong | 2006-05-15 | 1 | -2/+16
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@271 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* import crawler late as to simplify dependencies | Arthur de Jong | 2006-05-15 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@270 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix typo in FIXME | Arthur de Jong | 2006-05-15 | 1 | -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@269 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add _ischanged attribute to link objects to indicate change since the constructor (or serialization) | Arthur de Jong | 2006-05-15 | 1 | -0/+10
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@268 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only write serialized data if it is different from the constructor's default value | Arthur de Jong | 2006-05-15 | 1 | -10/+20
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@267 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* clear anchors, linkproblems and pageproblems from to be deserialized links to avoid duplicates as a link can be deserialized multiple times | Arthur de Jong | 2006-05-15 | 1 | -0/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@266 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove the call to crawl() from deserialize as this could be a partial deserialize that needs more tweaking to the site before the call to crawl() | Arthur de Jong | 2006-05-15 | 1 | -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@265 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make decoding try/fall-back code a lot simpler and handle case where encoding is specified as empty string | Arthur de Jong | 2006-05-15 | 1 | -12/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@264 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* improve warning text and add comment concerning trying of encodings | Arthur de Jong | 2006-05-12 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@263 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* ignore unknown entities instead of throwing an error | Arthur de Jong | 2006-05-12 | 1 | -2/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@262 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* include favicon.ico file in generated report | Arthur de Jong | 2006-05-07 | 3 | -0/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@261 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* ensure that we are not importing anything weird by using invalid scheme names | Arthur de Jong | 2006-05-07 | 1 | -0/+9
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@260 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* support floats as parameter for --wait | Arthur de Jong | 2006-05-07 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@259 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix usage of dash | Arthur de Jong | 2006-05-07 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@258 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add serialize module that allows serializing and deserializing all crawler state (site and links) to and from a file, this module is not called anywhere yet | Arthur de Jong | 2006-05-07 | 1 | -0/+313
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@257 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix typo in docstring and add comment | Arthur de Jong | 2006-05-07 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@256 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move html escaping and unescaping functions to parsers.html | Arthur de Jong | 2006-05-07 | 2 | -36/+55
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@255 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* use unichr() to generate Unicode characters, not chr() | Arthur de Jong | 2006-05-07 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@254 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* return None explicitly | Arthur de Jong | 2006-05-07 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@253 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* some more small code improvements thanks to pychecker | Arthur de Jong | 2006-05-07 | 5 | -4/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@252 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* implement checking for id and name tags in anchors | Arthur de Jong | 2006-05-06 | 1 | -12/+39
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@251 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* bump copyright notices | Arthur de Jong | 2006-05-06 | 3 | -3/+3
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@250 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also add all unfetched links from a site to make this method recallable | Arthur de Jong | 2006-04-27 | 1 | -0/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@249 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make get_link() function a public class function | Arthur de Jong | 2006-04-27 | 1 | -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@248 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move URL checking bit to right function and improve anchor debugging messages even further | Arthur de Jong | 2006-04-27 | 1 | -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@247 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* fix remaining references to escape instead of htmlescape | Arthur de Jong | 2006-04-27 | 1 | -7/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@246 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* support passing a URL to add_reqanchor() plus some minor comments changes | Arthur de Jong | 2006-04-27 | 1 | -3/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@245 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* handle problems in regular expressions passed on the command line a little more gracefully | Arthur de Jong | 2006-04-27 | 1 | -39/+43
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@244 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rename escape() function to htmlescape() to make it a little clearer what we're escaping | Arthur de Jong | 2006-04-23 | 4 | -10/+10
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@243 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* code improvements thanks to pylint | Arthur de Jong | 2006-04-23 | 27 | -372/+460
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@242 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also sort parent list by URL if titles are the same | Arthur de Jong | 2006-04-23 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@241 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also properly handle time-out problems which only pass one parameter with the exception | Arthur de Jong | 2006-04-23 | 1 | -3/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@240 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* implement a time-out setting with a default of 10 seconds | Arthur de Jong | 2006-04-11 | 2 | -0/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@239 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* revert to borderless links as they look ugly in some (most) cases | Arthur de Jong | 2006-04-11 | 1 | -2/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@238 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* rename slow plugin to size | Arthur de Jong | 2006-04-11 | 2 | -10/+13
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@237 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* do not fail on unknown encodings (fall back to system encoding) and add some TODOs to do extra encoding checking | Arthur de Jong | 2006-04-07 | 1 | -3/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@236 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split urlescape() from _urlclean() and ensure that all anchors are consistently URL-encoded | Arthur de Jong | 2006-03-26 | 2 | -6/+14
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@235 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* only report missing anchors for pages that were fetched and some clean-ups | Arthur de Jong | 2006-03-26 | 1 | -6/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@234 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* put a border around links | Arthur de Jong | 2006-03-26 | 1 | -4/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@233 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* properly close html files on no output | Arthur de Jong | 2006-03-26 | 9 | -0/+9
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@232 86f53f14-5ff3-0310-afe5-9b438ce3f40c