Arthur de Jong
Open Source / Free Software developer
webcheck (branch master): A website link and structure checker
path: root/parsers
Commit message | Author | Date | Files | Lines (-removed/+added)
* make sure all URLs are consistently URL-encoded where it ... | Arthur de Jong | 2006-01-29 | 1 | -5/+1
* fix typo (thanks Andrew Kim <Andrew.Kim@revolution.com>) | Arthur de Jong | 2006-01-26 | 1 | -2/+2
* quote links so that they do not contain any non-ASCII cha... | Arthur de Jong | 2006-01-19 | 1 | -7/+13
* bug fix to handle numeric character references better (Un... | Arthur de Jong | 2005-12-26 | 1 | -2/+2
* add copyright clarification to specify that generated out... | Arthur de Jong | 2005-12-17 | 3 | -0/+9
* store author and title in Unicode internally and ensure t... | Arthur de Jong | 2005-09-17 | 1 | -2/+23
* also try to get character encoding from XML declaration a... | Arthur de Jong | 2005-09-17 | 1 | -0/+22
* parse character entries as normal data, these entities wi... | Arthur de Jong | 2005-09-17 | 1 | -0/+10
* also feed style tag content to the CSS parser to parse in... | Arthur de Jong | 2005-08-20 | 1 | -0/+7
* remove some debugging functions from CSS parser | Arthur de Jong | 2005-08-20 | 1 | -3/+0
* first attempt at a very simple CSS parser that just summa... | Arthur de Jong | 2005-08-20 | 1 | -1/+28
* add checking of unescaped spaces to the html parser, incl... | Arthur de Jong | 2005-08-20 | 1 | -25/+41
* split problems into page problems (parsing errors, wrong ... | Arthur de Jong | 2005-08-19 | 1 | -1/+1
* also pass mimetypes to scheme modules to only fetch conte... | Arthur de Jong | 2005-08-12 | 1 | -6/+18
* put compiled regular expression on module level so that i... | Arthur de Jong | 2005-08-12 | 1 | -2/+4
* make parsing handle errors a little more gracefully, than... | Arthur de Jong | 2005-08-01 | 1 | -3/+6
* also catch AttributeError for problem in HTMLParser not f... | Arthur de Jong | 2005-07-31 | 1 | -1/+1
* replace numeric entity refs with their proper values base... | Arthur de Jong | 2005-07-31 | 1 | -2/+11
* put new html parser in place | Arthur de Jong | 2005-07-31 | 1 | -88/+113
* remove references to email addresses where they are not u... | Arthur de Jong | 2005-07-29 | 3 | -5/+5
* empty module as place holder to parse CSS (referenced fro... | Arthur de Jong | 2005-07-25 | 1 | -0/+20
* don't replace an already set title | Arthur de Jong | 2005-07-25 | 1 | -1/+2
* Mike Meyer -> Mike W. Meyer | Arthur de Jong | 2005-07-23 | 1 | -1/+1
* almost complete rewrite of crawling and site state code m... | Arthur de Jong | 2005-07-22 | 2 | -20/+65
* move htmlparse to a more generic parsers package, cleanin... | Arthur de Jong | 2005-07-09 | 2 | -0/+128