Arthur de Jong

Open Source / Free Software developer

author     Arthur de Jong <arthur@arthurdejong.org> 2005-08-16 22:50:57 +0200
committer  Arthur de Jong <arthur@arthurdejong.org> 2005-08-16 22:50:57 +0200
commit     9da865e3796c0c48e13ed3574f42b946c103a986
tree       e584e57176c3d633a00953f1ff7b708d9c71a925
parent     73c26fc1bd1c02155fc49ff5c32c26d9cb58d98a
get files ready for 1.9.3 release (tag 1.9.3)
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@136 86f53f14-5ff3-0310-afe5-9b438ce3f40c
-rw-r--r--  ChangeLog    82
-rw-r--r--  NEWS         16
-rw-r--r--  TODO         30
-rw-r--r--  config.py     2
-rw-r--r--  webcheck.1    2
5 files changed, 114 insertions, 18 deletions
diff --git a/ChangeLog b/ChangeLog
index 29b02f0..1fa0e0f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,83 @@
+2005-08-16 20:36 arthur
+
+ * [r135] config.py, schemes/file.py, schemes/ftp.py: pick up
+ configured filenames if present in directories
+
+2005-08-16 18:25 arthur
+
+ * [r134] schemes/ftp.py: add extra debugging info
+
+2005-08-13 19:19 arthur
+
+ * [r133] schemes/ftp.py: use a pool of ftp connections to keep ftp
+ connection to a host open to do multiple requests (this greatly
+ speeds up crawling of ftp sites)
+
+2005-08-13 19:08 arthur
+
+ * [r132] schemes/ftp.py: almost complete reimplementation of the
+ ftp scheme, handling errors more gracefully and also crawl
+ normal ftp directories
+
+2005-08-13 19:06 arthur
+
+ * [r131] plugins/__init__.py: add missing newline and trim
+ trailing newline of extra link info
+
+2005-08-12 19:04 arthur
+
+ * [r130] schemes/file.py: complete reimplementation of file
+ module, reading index.html from directory, otherwise read
+ directory contents
+
+2005-08-12 18:20 arthur
+
+ * [r129] schemes/__init__.py, schemes/file.py, schemes/ftp.py,
+ schemes/http.py: rename parameter to acceptedtypes to not
+ conflict with mimetypes module
+
+2005-08-12 17:27 arthur
+
+ * [r128] crawler.py, parsers/__init__.py, schemes/__init__.py,
+ schemes/file.py, schemes/ftp.py, schemes/http.py: also pass
+ mimetypes to scheme modules to only fetch content if we can
+ parse the content type
+
+2005-08-12 17:02 arthur
+
+ * [r127] plugins/__init__.py: don't print referenced from if there
+ are no parents
+
+2005-08-12 16:57 arthur
+
+ * [r126] crawler.py: add checkurl method to clean up urls and
+ report problems (currently only checks for spaces in urls)
+
+2005-08-12 16:55 arthur
+
+ * [r125] parsers/html.py: put compiled regular expression on
+ module level so that it is compiled only once
+
+2005-08-12 16:52 arthur
+
+ * [r124] webcheck.css: small fix to render menu better under MSIE
+
+2005-08-11 21:41 arthur
+
+ * [r123] plugins/__init__.py: add some extra information to every
+ link with a nicely formatted size
+
+2005-08-01 17:58 arthur
+
+ * [r122] parsers/html.py: make parsing handle errors a little more
+ gracefully, thanks to Stefan Schröder <stefan@tokonoma.de> for
+ all the testing
+
+2005-07-31 20:58 arthur
+
+ * [r120] ChangeLog, NEWS, TODO, config.py: get files ready for
+ 1.9.2 release
+
2005-07-31 20:44 arthur
* [r119] parsers/html.py: also catch AttributeError for problem in
@@ -10,7 +90,7 @@
2005-07-31 09:45 arthur
* [r117] parsers/html.py: replace numeric entity refs with their
- proper values based on patch by UNKNOWN
+ proper values based on patch by Eric W.Brown <eric@saugus.net>
2005-07-31 09:21 arthur
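The r133 entry above describes keeping an FTP connection to a host open so that several requests can reuse it. The following is a minimal Python 3 sketch of that idea, not the actual schemes/ftp.py code (which this commit does not include). The class `FtpConnectionPool`, its `(host, port, user)` key and the `factory` parameter are hypothetical; `factory` exists only so the pool can be exercised without a network.

```python
import ftplib
from typing import Callable, Dict, Tuple

# Hypothetical pool key: one connection per (host, port, user).
Key = Tuple[str, int, str]


class FtpConnectionPool:
    """Hypothetical pool that keeps one open FTP session per (host, port, user)."""

    def __init__(self, factory: Callable[[], ftplib.FTP] = ftplib.FTP) -> None:
        # factory() must return an unconnected FTP object; ftplib.FTP()
        # called without arguments does not connect yet
        self._factory = factory
        self._connections: Dict[Key, ftplib.FTP] = {}

    def get(self, host: str, port: int = 21, user: str = "anonymous",
            passwd: str = "") -> ftplib.FTP:
        """Return the cached session, connecting and logging in only on first use."""
        key = (host, port, user)
        ftp = self._connections.get(key)
        if ftp is None:
            ftp = self._factory()
            ftp.connect(host, port)
            ftp.login(user, passwd)
            self._connections[key] = ftp
        return ftp

    def discard(self, host: str, port: int = 21, user: str = "anonymous") -> None:
        """Drop a session after an error so the next get() opens a fresh one."""
        ftp = self._connections.pop((host, port, user), None)
        if ftp is not None:
            ftp.close()

    def close_all(self) -> None:
        """End every pooled session, falling back to close() if QUIT fails."""
        for ftp in self._connections.values():
            try:
                ftp.quit()
            except ftplib.all_errors:
                ftp.close()
        self._connections.clear()
```

Reusing a session skips the TCP connect and login for every file after the first, which is presumably where most of the speed-up on FTP sites comes from. A fuller version would also need to cope with servers that drop idle sessions, for example by calling `discard()` and retrying once.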
diff --git a/NEWS b/NEWS
index 7432a18..a2acbcb 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,19 @@
+changes from 1.9.2 to 1.9.3
+---------------------------
+
+* several improvements to the generated reports, including tooltips with some
+ useful information for the links (does not seem to work very well in
+ firefox)
+* stability improvements to the html parser (thanks to everyone who reported
+ problems) not all problems have been solved but it shouldn't stop webcheck
+ any more
+* reimplementation of the file and ftp modules to read directory contents or
+ read index.html file if present (there are known problems in the ftp module
+ regarding empty directories and recovering from errors)
+* improvements to the url parsing code to warn about spaces in urls
+* only fetch content if we can parse it
+
+
changes from 1.9.1 to 1.9.2
---------------------------
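The NEWS entry and the r130/r135 ChangeLog entries describe the reworked file scheme: use a directory's index file if one is present, otherwise read the directory contents. A rough Python 3 sketch of that decision for local paths follows. `read_directory` and `DEFAULT_INDEX_FILES` are hypothetical names (r135 only says the filenames are configurable in config.py), and the real schemes/file.py, which is not part of this commit, may differ, for example in how the listing is turned into links.

```python
import os
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for the configurable index filenames mentioned in r135.
DEFAULT_INDEX_FILES: Sequence[str] = ("index.html", "index.htm")


def read_directory(path: str,
                   index_files: Sequence[str] = DEFAULT_INDEX_FILES
                   ) -> Tuple[Optional[str], List[str]]:
    """Sketch: pick an index file if present, else list the directory.

    Returns (index_path, []) when one of index_files exists in path,
    otherwise (None, sorted directory entries).
    """
    # the first existing index file wins, in the order given
    for name in index_files:
        candidate = os.path.join(path, name)
        if os.path.isfile(candidate):
            return candidate, []
    # no index file: fall back to the directory contents, marking
    # subdirectories with a trailing slash so they can be crawled further
    entries: List[str] = []
    for name in sorted(os.listdir(path)):
        if os.path.isdir(os.path.join(path, name)):
            name += "/"
        entries.append(name)
    return None, entries
```

Checking the index files first means a directory that has an index.html is reported the same way a web server would serve it, and the raw listing is only used as a fallback.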
diff --git a/TODO b/TODO
index 381f574..ffa5ef3 100644
--- a/TODO
+++ b/TODO
@@ -1,16 +1,15 @@
before next release
-------------------
* go over all FIXMEs in code
-* rewrite ftp scheme module
-* rewrite file scheme module
+* parse css (see debian bugs?)
+* make sure all characters in urls are properly url encoded (and make it an error in the checked page)
+* close streams properly when not downloading files
probably before 2.0 release
---------------------------
-* parse css
* maybe choose a different license for webcheck.css
* make it possible to copy or reference webcheck.css
-* make it possible to copy http:.../webcheck.css into place (use scheme system)
-* create onmouseover information for links containing useful information for url
+* make it possible to copy http:.../webcheck.css into place (maybe use scheme system, probably just urllib)
* make more things configurable
* make a Debian package
* maybe generate a list of page parents (this is useful to list proper parent links for problem pages)
@@ -19,12 +18,9 @@ probably before 2.0 release
* support for mult-threading (maybe)
* divide problems in transfer problems and page problems (transfer problems result in a bad link problem on a page)
* clean up printing of messages, especially needed for multi-threading
-* rewrite scheme modules to make proper use of new calling method
-* only download complete documents if the mime type is supported
* go over command line options and see if we need long equivalents
* implement a fix for redirecting stdout and stderr to work properly
* put a maximum transfer size for downloading files and things over http
-* make error handling of html parser more robust
wishlist
--------
@@ -33,19 +29,23 @@ wishlist
* maybe set referer (configurable)
* support for authenticating proxies
* new config file format (if we want a configfile at all)
+* support ftp proxies
* cookies support (maybe)
* integration with weblint
* combine with a logfile checker to also show number of hits per link
-* performance and other improvements (we can switch to sets with python 2.4)
* write a guide to writing plugins
-* form checking
-* spelling checking
-* test w3c conformance of pages
-* maybe make broken links not clickable
+* do form checking
+* do spelling checking
+* test w3c conformance of pages (already done a little)
+* maybe make broken links not clickable in report
* maybe store crawled site's data in some format for later processing or continuing after interruption
* create output directory if it does not exist
-* add support for fetching gzipped content
-* write section on internal and external urls in the manual page
+* add support for fetching gzipped content to improve performance
+* maybe do http pipelining
* add a favicon to reports
* add a test to see if python supports https and fail elegantly otherwise
* maybe follow redirects of external links
+* make error handling of html parser more robust (maybe send a patch for html parser upstream)
+* improve tooltips
+* maybe use this as a html parser: http://www.crummy.com/software/BeautifulSoup/examples.html
+* maybe have a way to output google sitemap files: http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
diff --git a/config.py b/config.py
index 8d1a0b5..40d4155 100644
--- a/config.py
+++ b/config.py
@@ -27,7 +27,7 @@ items should be changeble from the command line."""
import urllib
# Current version of webcheck.
-VERSION = "1.9.2"
+VERSION = "1.9.3"
# The homepage of webcheck.
HOMEPAGE = "http://ch.tudelft.nl/~arthur/webcheck/"
diff --git a/webcheck.1 b/webcheck.1
index fcf7842..2db6efe 100644
--- a/webcheck.1
+++ b/webcheck.1
@@ -15,7 +15,7 @@
.\" Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
.\" .nh
.\"
-.TH "webcheck" "1" "Jul 2005" "Version 1.9,0" "User Commands"
+.TH "webcheck" "1" "Jul 2005" "Version 1.9,3" "User Commands"
.nh
.SH "NAME"
webcheck \- website link checker