Arthur de Jong

Open Source / Free Software developer

path: root/parsers
Commit message | Author | Age | Files | Lines
* also feed style tag content to the CSS parser to parse inline CSS | Arthur de Jong | 2005-08-20 | 1 | -0/+7
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@148 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove some debugging functions from CSS parser | Arthur de Jong | 2005-08-20 | 1 | -3/+0
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@147 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* first attempt at a very simple CSS parser that just summarises links to images and imported CSS files | Arthur de Jong | 2005-08-20 | 1 | -1/+28
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@146 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* add checking of unescaped spaces to the html parser, including line and column information | Arthur de Jong | 2005-08-20 | 1 | -25/+41
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@144 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* split problems into page problems (parsing errors, wrong links, etc) and link problems (errors retrieving the document) | Arthur de Jong | 2005-08-19 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@138 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also pass mimetypes to scheme modules to only fetch content if we can parse the content type | Arthur de Jong | 2005-08-12 | 1 | -6/+18
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@128 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* put compiled regular expression on module level so that it is compiled only once | Arthur de Jong | 2005-08-12 | 1 | -2/+4
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@125 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* make parsing handle errors a little more gracefully, thanks to Stefan Schröder <stefan@tokonoma.de> for all the testing | Arthur de Jong | 2005-08-01 | 1 | -3/+6
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@122 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* also catch AttributeError for problem in HTMLParser not fully supporting continuing after errors | Arthur de Jong | 2005-07-31 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@119 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* replace numeric entity refs with their proper values based on patch by Eric W.Brown <eric@saugus.net> | Arthur de Jong | 2005-07-31 | 1 | -2/+11
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@117 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* put new html parser in place | Arthur de Jong | 2005-07-31 | 1 | -88/+113
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@116 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* remove references to email addresses where they are not useful, based on a partial patch by Evelyn Mitchell <efm@tummy.com> | Arthur de Jong | 2005-07-29 | 3 | -5/+5
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@99 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* empty module as place holder to parse CSS (referenced from __init__.py already) | Arthur de Jong | 2005-07-25 | 1 | -0/+20
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@91 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* don't replace an already set title | Arthur de Jong | 2005-07-25 | 1 | -1/+2
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@90 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* Mike Meyer -> Mike W. Meyer | Arthur de Jong | 2005-07-23 | 1 | -1/+1
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@72 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* almost complete rewrite of crawling and site state code making children and parents link objects instead of URLs and giving link member variables better names, change plugins accordingly, make scheme handling more pluggable and only use one function call and have a better pluggable structure for content parsing (currently only html) | Arthur de Jong | 2005-07-22 | 2 | -20/+65
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@66 86f53f14-5ff3-0310-afe5-9b438ce3f40c
* move htmlparse to a more generic parsers package, cleaning up the code and simplifying dependencies | Arthur de Jong | 2005-07-09 | 2 | -0/+128
  git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@58 86f53f14-5ff3-0310-afe5-9b438ce3f40c
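
Two of the entries in this log (the very simple CSS parser that summarises links to images and imported CSS files, and the move of a compiled regular expression to module level) can be illustrated together. The following is a minimal sketch, not webcheck's actual code: the names `_URL_PATTERN`, `_IMPORT_PATTERN` and `find_css_links`, and the regular expressions themselves, are assumptions made for illustration only.

```python
import re

# Hypothetical patterns, not taken from webcheck's source.
# They live at module level, so they are compiled once at import time
# rather than on every call to find_css_links().
_URL_PATTERN = re.compile(r"url\(\s*['\"]?([^'\")\s]+)['\"]?\s*\)", re.IGNORECASE)
_IMPORT_PATTERN = re.compile(r"@import\s+['\"]([^'\"]+)['\"]", re.IGNORECASE)


def find_css_links(css_text):
    """Return the URLs referenced via url(...) or @import "..." in css_text."""
    links = []
    for pattern in (_URL_PATTERN, _IMPORT_PATTERN):
        for match in pattern.finditer(css_text):
            url = match.group(1)
            if url not in links:  # keep each URL once, in first-seen order
                links.append(url)
    return links


if __name__ == "__main__":
    sample = '@import "base.css"; body { background: url(img/bg.png); }'
    print(find_css_links(sample))
```

A regular-expression scan like this ignores comments and escaped characters in the stylesheet, so it is only suitable for summarising links, not for validating CSS.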