Arthur de Jong
Open Source / Free Software developer
webcheck (branch master): A website link and structure checker
path: root/parsers
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| copy-paste fix (thanks Robert M. Jansen <dutch12154@yahoo... | Arthur de Jong | 2008-07-13 | 1 | -1/+1 |
| call tidy (if available) on HTML content (based on a patc... | Arthur de Jong | 2008-07-04 | 2 | -7/+59 |
| fix name of file | Arthur de Jong | 2008-07-04 | 1 | -1/+1 |
| also pick up any style attributes and parse as css, based... | Arthur de Jong | 2008-06-21 | 2 | -0/+10 |
| add parsing of script tag and background attributes, base... | Arthur de Jong | 2008-06-15 | 2 | -0/+16 |
| do not require src attribute for parsing inline style tags | Arthur de Jong | 2008-06-15 | 1 | -1/+1 |
| update copyright year | Arthur de Jong | 2008-06-15 | 1 | -1/+1 |
| fix parsing of <param> tag | Arthur de Jong | 2008-05-25 | 1 | -1/+1 |
| support <iframe> and some common usages of <object> | Arthur de Jong | 2008-05-24 | 1 | -0/+15 |
| add a warning if the used version of BeautifulSoup contai... | Arthur de Jong | 2007-09-17 | 1 | -0/+5 |
| also handle http-equiv refresh meta header | Arthur de Jong | 2007-07-15 | 1 | -3/+13 |
| split out URL cleaning code into own module | Arthur de Jong | 2007-07-07 | 2 | -17/+19 |
| handle ID attribute as anchor on any tag | Arthur de Jong | 2007-04-24 | 1 | -5/+5 |
| correctly parse author information | Arthur de Jong | 2007-04-20 | 1 | -2/+2 |
| introduce HTML parsing using BeautifulSoup with a fall-ba... | Arthur de Jong | 2007-04-20 | 3 | -64/+255 |
| evaluate archive attribute of <applet> tag instead of cod... | Arthur de Jong | 2007-03-31 | 1 | -2/+5 |
| add set_encoding method to Link object to do some basic e... | Arthur de Jong | 2006-07-13 | 1 | -13/+11 |
| add TODOs | Arthur de Jong | 2006-05-31 | 1 | -0/+2 |
| make decoding try/fall-back code a lot simpler and handle... | Arthur de Jong | 2006-05-15 | 1 | -12/+7 |
| improve warning text and add comment concerning trying of... | Arthur de Jong | 2006-05-12 | 1 | -1/+2 |
| ignore unknown entities instead of throwing an error | Arthur de Jong | 2006-05-12 | 1 | -2/+5 |
| move html escaping and unescaping functions to parsers.html | Arthur de Jong | 2006-05-07 | 1 | -11/+52 |
| use unichr() to generate Unicode characters, not chr() | Arthur de Jong | 2006-05-07 | 1 | -1/+1 |
| some more small code improvements thanks to pychecker | Arthur de Jong | 2006-05-07 | 1 | -0/+1 |
| implement checking for id and name tags in anchors | Arthur de Jong | 2006-05-06 | 1 | -12/+39 |
| code improvements thanks to pylint | Arthur de Jong | 2006-04-23 | 3 | -65/+74 |
| do not fail on unknown encodings (fall back to system enc... | Arthur de Jong | 2006-04-07 | 1 | -3/+6 |
| split urlescape() from _urlclean() and ensure that all an... | Arthur de Jong | 2006-03-26 | 1 | -2/+2 |
| revert catching Exception instead of IOError that was the... | Arthur de Jong | 2006-03-11 | 1 | -1/+1 |
| implement checking of anchors (there should be no double ... | Arthur de Jong | 2006-03-10 | 1 | -4/+20 |
| trim spaces from title and author fields and check that t... | Arthur de Jong | 2006-03-10 | 1 | -2/+2 |
| make sure all URLs are consistently URL-encoded where it ... | Arthur de Jong | 2006-01-29 | 1 | -5/+1 |
| fix typo (thanks Andrew Kim <Andrew.Kim@revolution.com>) | Arthur de Jong | 2006-01-26 | 1 | -2/+2 |
| quote links so that they do not contain any non-ASCII cha... | Arthur de Jong | 2006-01-19 | 1 | -7/+13 |
| bug fix to handle numeric character references better (Un... | Arthur de Jong | 2005-12-26 | 1 | -2/+2 |
| add copyright clarification to specify that generated out... | Arthur de Jong | 2005-12-17 | 3 | -0/+9 |
| store author and title in Unicode internally and ensure t... | Arthur de Jong | 2005-09-17 | 1 | -2/+23 |
| also try to get character encoding from XML declaration a... | Arthur de Jong | 2005-09-17 | 1 | -0/+22 |
| parse character entries as normal data, these entities wi... | Arthur de Jong | 2005-09-17 | 1 | -0/+10 |
| also feed style tag content to the CSS parser to parse in... | Arthur de Jong | 2005-08-20 | 1 | -0/+7 |
| remove some debugging functions from CSS parser | Arthur de Jong | 2005-08-20 | 1 | -3/+0 |
| first attempt at a very simple CSS parser that just summa... | Arthur de Jong | 2005-08-20 | 1 | -1/+28 |
| add checking of unescaped spaces to the html parser, incl... | Arthur de Jong | 2005-08-20 | 1 | -25/+41 |
| split problems into page problems (parsing errors, wrong ... | Arthur de Jong | 2005-08-19 | 1 | -1/+1 |
| also pass mimetypes to scheme modules to only fetch conte... | Arthur de Jong | 2005-08-12 | 1 | -6/+18 |
| put compiled regular expression on module level so that i... | Arthur de Jong | 2005-08-12 | 1 | -2/+4 |
| make parsing handle errors a little more gracefully, than... | Arthur de Jong | 2005-08-01 | 1 | -3/+6 |
| also catch AttributeError for problem in HTMLParser not f... | Arthur de Jong | 2005-07-31 | 1 | -1/+1 |
| replace numeric entity refs with their proper values base... | Arthur de Jong | 2005-07-31 | 1 | -2/+11 |
| put new html parser in place | Arthur de Jong | 2005-07-31 | 1 | -88/+113 |