author     Arthur de Jong <arthur@arthurdejong.org>  2005-08-16 22:50:57 +0200
committer  Arthur de Jong <arthur@arthurdejong.org>  2005-08-16 22:50:57 +0200
commit     9da865e3796c0c48e13ed3574f42b946c103a986
tree       e584e57176c3d633a00953f1ff7b708d9c71a925
parent     73c26fc1bd1c02155fc49ff5c32c26d9cb58d98a

get files ready for 1.9.3 release (tag: 1.9.3)
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@136 86f53f14-5ff3-0310-afe5-9b438ce3f40c
-rw-r--r--  ChangeLog  | 82
-rw-r--r--  NEWS       | 16
-rw-r--r--  TODO       | 30
-rw-r--r--  config.py  |  2
-rw-r--r--  webcheck.1 |  2
5 files changed, 114 insertions, 18 deletions
diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,83 @@
+2005-08-16 20:36 arthur
+
+	* [r135] config.py, schemes/file.py, schemes/ftp.py: pick up
+	  configured filenames if present in directories
+
+2005-08-16 18:25 arthur
+
+	* [r134] schemes/ftp.py: add extra debugging info
+
+2005-08-13 19:19 arthur
+
+	* [r133] schemes/ftp.py: use a pool of ftp connections to keep ftp
+	  connection to a host open to do multiple requests (this greatly
+	  speeds up crawling of ftp sites)
+
+2005-08-13 19:08 arthur
+
+	* [r132] schemes/ftp.py: almost complete reimplementation of the
+	  ftp scheme, handling errors more gracefully and also crawl
+	  normal ftp directories
+
+2005-08-13 19:06 arthur
+
+	* [r131] plugins/__init__.py: add missing newline and trim
+	  trailing newline of extra link info
+
+2005-08-12 19:04 arthur
+
+	* [r130] schemes/file.py: complete reimplementation of file
+	  module, reading index.html from directory, otherwise read
+	  directory contents
+
+2005-08-12 18:20 arthur
+
+	* [r129] schemes/__init__.py, schemes/file.py, schemes/ftp.py,
+	  schemes/http.py: rename parameter to acceptedtypes to not
+	  conflict with mimetypes module
+
+2005-08-12 17:27 arthur
+
+	* [r128] crawler.py, parsers/__init__.py, schemes/__init__.py,
+	  schemes/file.py, schemes/ftp.py, schemes/http.py: also pass
+	  mimetypes to scheme modules to only fetch content if we can
+	  parse the content type
+
+2005-08-12 17:02 arthur
+
+	* [r127] plugins/__init__.py: don't print referenced from if there
+	  are no parents
+
+2005-08-12 16:57 arthur
+
+	* [r126] crawler.py: add checkurl method to clean up urls and
+	  report problems (currently only checks for spaces in urls)
+
+2005-08-12 16:55 arthur
+
+	* [r125] parsers/html.py: put compiled regular expression on
+	  module level so that it is compiled only once
+
+2005-08-12 16:52 arthur
+
+	* [r124] webcheck.css: small fix to render menu better under MSIE
+
+2005-08-11 21:41 arthur
+
+	* [r123] plugins/__init__.py: add some extra information to every
+	  link with a nicely formatted size
+
+2005-08-01 17:58 arthur
+
+	* [r122] parsers/html.py: make parsing handle errors a little more
+	  gracefully, thanks to Stefan Schröder <stefan@tokonoma.de> for
+	  all the testing
+
+2005-07-31 20:58 arthur
+
+	* [r120] ChangeLog, NEWS, TODO, config.py: get files ready for
+	  1.9.2 release
+
 2005-07-31 20:44 arthur
 
 	* [r119] parsers/html.py: also catch AttributeError for problem in
@@ -10,7 +90,7 @@
 2005-07-31 09:45 arthur
 
 	* [r117] parsers/html.py: replace numeric entity refs with their
-	  proper values based on patch by UNKNOWN
+	  proper values based on patch by Eric W.Brown <eric@saugus.net>
 
 2005-07-31 09:21 arthur
 
diff --git a/NEWS b/NEWS
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,19 @@
+changes from 1.9.2 to 1.9.3
+---------------------------
+
+* several improvements to the generated reports, including tooltips with some
+  useful information for the links (does not seem to work very well in
+  firefox)
+* stability improvements to the html parser (thanks to everyone who reported
+  problems) not all problems have been solved but it shouldn't stop webcheck
+  any more
+* reimplementation of the file and ftp modules to read directory contents or
+  read index.html file if present (there are known problems in the ftp module
+  regarding empty directories and recovering from errors)
+* improvements to the url parsing code to warn about spaces in urls
+* only fetch content if we can parse it
+
+
 changes from 1.9.1 to 1.9.2
 ---------------------------
 
diff --git a/TODO b/TODO
--- a/TODO
+++ b/TODO
@@ -1,16 +1,15 @@
 before next release
 -------------------
 * go over all FIXMEs in code
-* rewrite ftp scheme module
-* rewrite file scheme module
+* parse css (see debian bugs?)
+* make sure all characters in urls are properly url encoded (and make it an error in the checked page)
+* close streams properly when not downloading files
 
 probably before 2.0 release
 ---------------------------
-* parse css
 * maybe choose a different license for webcheck.css
 * make it possible to copy or reference webcheck.css
-* make it possible to copy http:.../webcheck.css into place (use scheme system)
-* create onmouseover information for links containing useful information for url
+* make it possible to copy http:.../webcheck.css into place (maybe use scheme system, probably just urllib)
 * make more things configurable
 * make a Debian package
 * maybe generate a list of page parents (this is useful to list proper parent links for problem pages)
@@ -19,12 +18,9 @@ probably before 2.0 release
 * support for mult-threading (maybe)
 * divide problems in transfer problems and page problems (transfer problems result in a bad link problem on a page)
 * clean up printing of messages, especially needed for multi-threading
-* rewrite scheme modules to make proper use of new calling method
-* only download complete documents if the mime type is supported
 * go over command line options and see if we need long equivalents
 * implement a fix for redirecting stdout and stderr to work properly
 * put a maximum transfer size for downloading files and things over http
-* make error handling of html parser more robust
 
 wishlist
 --------
@@ -33,19 +29,23 @@ wishlist
 * maybe set referer (configurable)
 * support for authenticating proxies
 * new config file format (if we want a configfile at all)
+* support ftp proxies
 * cookies support (maybe)
 * integration with weblint
 * combine with a logfile checker to also show number of hits per link
-* performance and other improvements (we can switch to sets with python 2.4)
 * write a guide to writing plugins
-* form checking
-* spelling checking
-* test w3c conformance of pages
-* maybe make broken links not clickable
+* do form checking
+* do spelling checking
+* test w3c conformance of pages (already done a little)
+* maybe make broken links not clickable in report
 * maybe store crawled site's data in some format for later processing or continuing after interruption
 * create output directory if it does not exist
-* add support for fetching gzipped content
-* write section on internal and external urls in the manual page
+* add support for fetching gzipped content to improve performance
+* maybe do http pipelining
 * add a favicon to reports
 * add a test to see if python supports https and fail elegantly otherwise
 * maybe follow redirects of external links
+* make error handling of html parser more robust (maybe send a patch for html parser upstream)
+* improve tooltips
+* maybe use this as a html parser: http://www.crummy.com/software/BeautifulSoup/examples.html
+* maybe have a way to output google sitemap files: http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -27,7 +27,7 @@ items should be changeble from the command line."""
 import urllib
 
 # Current version of webcheck.
-VERSION = "1.9.2"
+VERSION = "1.9.3"
 
 # The homepage of webcheck.
 HOMEPAGE = "http://ch.tudelft.nl/~arthur/webcheck/"
diff --git a/webcheck.1 b/webcheck.1
--- a/webcheck.1
+++ b/webcheck.1
@@ -15,7 +15,7 @@ .\" Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 .\"
 .nh
 .\"
-.TH "webcheck" "1" "Jul 2005" "Version 1.9,0" "User Commands"
+.TH "webcheck" "1" "Jul 2005" "Version 1.9,3" "User Commands"
 .nh
 .SH "NAME"
 webcheck \- website link checker
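The ChangeLog entries r117 and r125 name two small techniques in parsers/html.py. The first replaces numeric character references such as &#233; or &#xE9; with the characters they stand for. The second compiles a regular expression once at module level instead of on every call. The Python 3 sketch below illustrates both ideas. It is not the code from this commit, and the names _CHARREF, _charref_to_text and replace_numeric_refs are hypothetical.

```python
import re

# Compiled once, when the module is imported, rather than on every call.
# Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric references.
_CHARREF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def _charref_to_text(match):
    """Return the character for one numeric reference.

    References outside the Unicode range are left unchanged, so bad
    input does not abort parsing.
    """
    hexdigits, decdigits = match.groups()
    codepoint = int(hexdigits, 16) if hexdigits else int(decdigits)
    try:
        return chr(codepoint)
    except (ValueError, OverflowError):
        return match.group(0)


def replace_numeric_refs(text):
    """Replace every numeric character reference in text with its character."""
    return _CHARREF.sub(_charref_to_text, text)
```

For example, replace_numeric_refs('caf&#233;') returns 'café'. In current Python versions, the standard library's html.unescape() performs this conversion and also handles named entities such as &amp;.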
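The r126 entry adds a checkurl method that cleans up urls but currently only reports spaces. Meanwhile, the TODO list still asks to make sure all characters in urls are properly url encoded. The minimal Python 3 sketch below combines the two using the standard urllib.parse module. The check_url function, its (url, problems) return value and the problem message are hypothetical and do not reflect webcheck's actual interface.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def check_url(url):
    """Hypothetical sketch: return (cleaned_url, problems) for a link.

    Surrounding whitespace is stripped, embedded spaces are reported as a
    problem, and unsafe characters in the path and query are percent-encoded.
    """
    problems = []
    url = url.strip()
    if ' ' in url:
        problems.append('link contains spaces')
    scheme, netloc, path, query, fragment = urlsplit(url)
    # '%' is kept safe so existing escapes such as %20 are not encoded twice.
    path = quote(path, safe='/%:@')
    # Characters that structure a query string (=, &, +, ...) are kept as-is.
    query = quote(query, safe='=&%+/?:@;,')
    return urlunsplit((scheme, netloc, path, query, fragment)), problems
```

Calling check_url(' http://example.com/a b.html ') yields the encoded url http://example.com/a%20b.html together with one reported problem. This would let a report flag the bad link while the crawler still fetches the intended page.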
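The r133 entry keeps ftp connections to a host open in a pool, so several requests reuse one login. The ChangeLog credits this with greatly speeding up crawling of ftp sites. The Python 3 sketch below shows one way to structure such a pool with the standard ftplib module. The FTPConnectionPool class, its methods and the injectable factory are assumptions made for illustration, not the actual schemes/ftp.py code.

```python
import ftplib


class FTPConnectionPool:
    """Hypothetical per-server pool of logged-in FTP connections."""

    def __init__(self, factory=None):
        # factory(host, port, user, password) must return a logged-in
        # connection object with a quit() method; defaults to ftplib.FTP.
        self._factory = factory or self._connect
        # Maps (host, port, user) to a list of idle connections.
        self._idle = {}

    @staticmethod
    def _connect(host, port, user, password):
        ftp = ftplib.FTP()
        ftp.connect(host, port)
        ftp.login(user, password)
        return ftp

    def get(self, host, port=21, user='anonymous', password=''):
        """Reuse an idle connection to this server or open a new one."""
        idle = self._idle.get((host, port, user))
        if idle:
            return idle.pop()
        return self._factory(host, port, user, password)

    def release(self, conn, host, port=21, user='anonymous'):
        """Return a connection to the pool so later requests can reuse it."""
        self._idle.setdefault((host, port, user), []).append(conn)

    def close_all(self):
        """Log out of every idle connection and empty the pool."""
        for conns in self._idle.values():
            for conn in conns:
                try:
                    conn.quit()
                except ftplib.all_errors:
                    # The server may already have dropped the connection.
                    pass
        self._idle.clear()
```

Passing a factory makes the pool testable without a network connection. A real crawler would also need to discard connections that the server has closed in the meantime, which this sketch does not handle.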