| Commit message | Author | Age | Files | Lines |
* | update copyright years | Arthur de Jong | 2010-09-11 | 3 | -3/+3 |
* | handle case where inline CSS is used on a page with <base... | Arthur de Jong | 2009-01-14 | 3 | -7/+9 |
* | copy-paste fix (thanks Robert M. Jansen <dutch12154@yahoo... | Arthur de Jong | 2008-07-13 | 1 | -1/+1 |
* | call tidy (if available) on HTML content (based on a patc... | Arthur de Jong | 2008-07-04 | 2 | -7/+59 |
* | fix name of file | Arthur de Jong | 2008-07-04 | 1 | -1/+1 |
* | also pick up any style attributes and parse as css, based... | Arthur de Jong | 2008-06-21 | 2 | -0/+10 |
* | add parsing of script tag and background attributes, base... | Arthur de Jong | 2008-06-15 | 2 | -0/+16 |
* | do not require src attribute for parsing inline style tags | Arthur de Jong | 2008-06-15 | 1 | -1/+1 |
* | update copyright year | Arthur de Jong | 2008-06-15 | 1 | -1/+1 |
* | fix parsing of <param> tag | Arthur de Jong | 2008-05-25 | 1 | -1/+1 |
* | support <iframe> and some common usages of <object> | Arthur de Jong | 2008-05-24 | 1 | -0/+15 |
* | add a warning if the used version of BeautifulSoup contai... | Arthur de Jong | 2007-09-17 | 1 | -0/+5 |
* | also handle http-equiv refresh meta header | Arthur de Jong | 2007-07-15 | 1 | -3/+13 |
* | split out URL cleaning code into own module | Arthur de Jong | 2007-07-07 | 2 | -17/+19 |
* | handle ID attribute as anchor on any tag | Arthur de Jong | 2007-04-24 | 1 | -5/+5 |
* | correctly parse author information | Arthur de Jong | 2007-04-20 | 1 | -2/+2 |
* | introduce HTML parsing using BeautifulSoup with a fall-ba... | Arthur de Jong | 2007-04-20 | 3 | -64/+255 |
* | evaluate archive attribute of <applet> tag instead of cod... | Arthur de Jong | 2007-03-31 | 1 | -2/+5 |
* | add set_encoding method to Link object to do some basic e... | Arthur de Jong | 2006-07-13 | 1 | -13/+11 |
* | add TODOs | Arthur de Jong | 2006-05-31 | 1 | -0/+2 |
* | make decoding try/fall-back code a lot simpler and handle... | Arthur de Jong | 2006-05-15 | 1 | -12/+7 |
* | improve warning text and add comment concerning trying of... | Arthur de Jong | 2006-05-12 | 1 | -1/+2 |
* | ignore unknown entities instead of throwing an error | Arthur de Jong | 2006-05-12 | 1 | -2/+5 |
* | move html escaping and unescaping functions to parsers.html | Arthur de Jong | 2006-05-07 | 1 | -11/+52 |
* | use unichr() to generate Unicode characters, not chr() | Arthur de Jong | 2006-05-07 | 1 | -1/+1 |
* | some more small code improvements thanks to pychecker | Arthur de Jong | 2006-05-07 | 1 | -0/+1 |
* | implement checking for id and name tags in anchors | Arthur de Jong | 2006-05-06 | 1 | -12/+39 |
* | code improvements thanks to pylint | Arthur de Jong | 2006-04-23 | 3 | -65/+74 |
* | do not fail on unknown encodings (fall back to system enc... | Arthur de Jong | 2006-04-07 | 1 | -3/+6 |
* | split urlescape() from _urlclean() and ensure that all an... | Arthur de Jong | 2006-03-26 | 1 | -2/+2 |
* | revert catching Exception instead of IOError that was the... | Arthur de Jong | 2006-03-11 | 1 | -1/+1 |
* | implement checking of anchors (there should be no double ... | Arthur de Jong | 2006-03-10 | 1 | -4/+20 |
* | trim spaces from title and author fields and check that t... | Arthur de Jong | 2006-03-10 | 1 | -2/+2 |
* | make sure all URLs are consistently URL-encoded where it ... | Arthur de Jong | 2006-01-29 | 1 | -5/+1 |
* | fix typo (thanks Andrew Kim <Andrew.Kim@revolution.com>) | Arthur de Jong | 2006-01-26 | 1 | -2/+2 |
* | quote links so that they do not contain any non-ASCII cha... | Arthur de Jong | 2006-01-19 | 1 | -7/+13 |
* | bug fix to handle numeric character references better (Un... | Arthur de Jong | 2005-12-26 | 1 | -2/+2 |
* | add copyright clarification to specify that generated out... | Arthur de Jong | 2005-12-17 | 3 | -0/+9 |
* | store author and title in Unicode internally and ensure t... | Arthur de Jong | 2005-09-17 | 1 | -2/+23 |
* | also try to get character encoding from XML declaration a... | Arthur de Jong | 2005-09-17 | 1 | -0/+22 |
* | parse character entries as normal data, these entities wi... | Arthur de Jong | 2005-09-17 | 1 | -0/+10 |
* | also feed style tag content to the CSS parser to parse in... | Arthur de Jong | 2005-08-20 | 1 | -0/+7 |
* | remove some debugging functions from CSS parser | Arthur de Jong | 2005-08-20 | 1 | -3/+0 |
* | first attempt at a very simple CSS parser that just summa... | Arthur de Jong | 2005-08-20 | 1 | -1/+28 |
* | add checking of unescaped spaces to the html parser, incl... | Arthur de Jong | 2005-08-20 | 1 | -25/+41 |
* | split problems into page problems (parsing errors, wrong ... | Arthur de Jong | 2005-08-19 | 1 | -1/+1 |
* | also pass mimetypes to scheme modules to only fetch conte... | Arthur de Jong | 2005-08-12 | 1 | -6/+18 |
* | put compiled regular expression on module level so that i... | Arthur de Jong | 2005-08-12 | 1 | -2/+4 |
* | make parsing handle errors a little more gracefully, than... | Arthur de Jong | 2005-08-01 | 1 | -3/+6 |
* | also catch AttributeError for problem in HTMLParser not f... | Arthur de Jong | 2005-07-31 | 1 | -1/+1 |