Extendible website checking tool for webmasters with
a simple report output.
News
2010-09-11 release 1.10.4 of webcheck
This is more or less a maintenance release that gathers some outstanding
fixes.
An overview of the changes since the last release:
remove some left-over debugging code
several small bugfixes which more or less drop support for Python 2.3
limit "referenced from" list to 10 items
pass char_encoding option to tidy to fix some tidy-related errors
add a Referer header if possible (thanks Devin Bayer)
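The Referer change above can be pictured with a small sketch. It is not webcheck's actual code: it uses the Python 3 standard library, and the function name build_request and the example URLs are assumptions made for illustration only.

```python
import urllib.request


def build_request(url, referer=None):
    """Create a request, adding a Referer header only when a referring page is known."""
    request = urllib.request.Request(url)
    if referer is not None:
        # Name the page that contained the link, so the server can see
        # where the request came from.
        request.add_header("Referer", referer)
    return request


# Hypothetical usage: the link was found on the site's front page.
req = build_request("http://example.com/page.html", referer="http://example.com/")
```

The start URL of a crawl has no referring page, which is why the header is only added "if possible".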
2009-06-14 webcheck homepage moved
Since I haven't been studying at the Delft University for quite some time,
the webcheck homepage has been moved to
https://arthurdejong.org/webcheck/.
The contact email address has also been changed to arthur@arthurdejong.org.
The subversion repository and viewvc URLs have also changed (see the
downloads section for details).
If you were using the svn repository before, you can run svn switch
--relocate with the old and new repository URLs to relocate your working copy.
In the meantime, webcheck has migrated to Git. The Subversion repository
will no longer be updated and will likely go away at some point; see
the downloads page for more information.
2008-07-19 release 1.10.3 of webcheck
This is a minor update release that fixes some smaller outstanding
issues and adds a couple of new features.
An overview of the changes since the last release:
support <iframe> and some common usages of <object>
fix bug in command-line parsing of short -r option
implement the --userpass option to pass username and password
information to specific sites based on a patch by Chris Shenton
handle errors while parsing more gracefully
add parsing of <script> tag and background attributes,
based on a patch by Robert M. Jansen
fix in parsing <style> tags and support style attributes
call tidy (if available) on HTML content, based on a patch by Henning
Sielaff
2007-11-04 release 1.10.2 of webcheck
This is a minor update release that fixes some smaller outstanding
issues.
An overview of the changes since the last release:
add checking for bug in BeautifulSoup and issue warning if bug is
found
added support for Python 2.3 (although more recent versions of Python
are recommended)
2007-07-15 release 1.10.1 of webcheck
This release includes some big performance improvements
(especially for very large sites) as well as some small bug
fixes. This release requires Python 2.4 or more recent to work.
An overview of the changes since the last release:
some extra Unicode handling precautions
fix problem in reading webcheck.dat for non-ASCII text
be more verbose about HTTP retrieval failures
split out URL normalization code into own module and do some basic
protocol-specific normalizations
a number of big performance improvements
fix a bug in handling some zero-size pages
parse the http-equiv meta HTML header to handle the refresh option
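As an illustration of what "basic protocol-specific normalizations" could look like, here is a minimal sketch. It is not the module shipped with webcheck: it uses the Python 3 standard library, and the name normalize_url, the DEFAULT_PORTS table and the chosen rules (lower-case host, drop the default port, use "/" for an empty HTTP path) are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed table of default ports per scheme (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def normalize_url(url):
    """Return the URL in a consistent form so equal URLs compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # hostname is already lower-cased by urlsplit; IPv6 literals are not
    # handled by this sketch.
    netloc = parts.hostname or ""
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "%s:%d" % (netloc, parts.port)
    # Preserve a user name and/or password if present.
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc
    path = parts.path
    # Protocol-specific rule: an empty HTTP path means the root document.
    if scheme in ("http", "https") and not path:
        path = "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With these rules, HTTP://Example.COM:80 and http://example.com/ are treated as the same page, which avoids fetching and reporting it twice.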
2007-05-12 release 1.10.0 of webcheck
This release changes the HTML parser under the hood (hence the version bump) and
includes some further general improvements.
An overview of the changes since the last release:
switched HTML parsing to using BeautifulSoup with a fall-back
mechanism to the old HTMLParser based solution
the new parser is much more error-tolerant but is reportedly somewhat
slower and does not include line numbers in errors
new features will likely only be added to the new parser
some small improvements to the output to make it XHTML 1.1
compliant
internal improvements for handling Unicode strings
better support for parsing <applet> tags and anchors
using id attributes
re-enable robots.txt parsing that was disabled in 1.9.8 and
add an --ignore-robots option
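The interplay between robots.txt parsing and an option to ignore it can be sketched as follows. This is only an illustration using the Python 3 standard library; the function is_allowed, its parameters and the "webcheck" user agent string are assumptions, not webcheck's real interface.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(url, robots_lines, user_agent="webcheck", ignore_robots=False):
    """Decide whether a URL may be fetched according to robots.txt rules."""
    if ignore_robots:
        # Corresponds to the idea behind --ignore-robots: skip the check.
        return True
    parser = RobotFileParser()
    # parse() takes the lines of an already retrieved robots.txt file.
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for the example.
rules = ["User-agent: *", "Disallow: /private/"]
```

Here is_allowed("http://example.com/private/x.html", rules) is false unless ignore_robots is set.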
2007-01-15 release 1.9.8 of webcheck
This is a long overdue development release that should mainly include some
stability improvements.
An overview of the changes since the last release:
some checks for properly handling unknown and wrong encodings have been added
added proper error handling for SSL related socket problems (exceptions are
not a subclass of regular socket exceptions)
a bugfix for urls that contain a user name without a password or the other
way around
2006-07-02 release 1.9.7 of webcheck
This is another development release that should improve stability but also adds some
new functionality.
Any feedback is still very much appreciated (thanks for all the feedback I
already got).
An overview of the changes since the last release:
site data is now stored to a file while crawling the site; this can be used to resume a crawl with the --continue option and for debugging purposes
implemented checking of link anchors
small improvements to generated reports (favicon included, css fix)
documentation improvements
properly handle float values for --wait
unreachable sites will time out faster
added support for plugins that don't output html
half a dozen other small bugfixes (stability fixes, code cleanups and improvements)
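Checking link anchors means verifying that the fragment of a link (the part after #) exists in the target page. A minimal sketch of the idea, assuming Python 3 and its standard HTMLParser rather than webcheck's own parser; the names AnchorCollector and anchor_exists are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect anchor names defined by id attributes and by <a name=...>."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def anchor_exists(link, html_text):
    """Return True if the link has no fragment or the fragment is defined in the page."""
    fragment = urldefrag(link).fragment
    if not fragment:
        return True
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment in collector.anchors
```

A link such as page.html#missing would then be flagged when the page defines no element with that id or name.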
2006-05-08 svn access available
Public read-only access to the webcheck subversion development
repository is now available. The repository is also browsable through
viewcvs. More details in the downloads section.
svn access: http://arthurenhella.demon.nl/svn/webcheck/
viewcvs: http://arthurenhella.demon.nl/viewvc/webcheck/
The development repository currently includes a number of improvements
over release 1.9.6 including support for outputting different file
types in the report, some minor stability fixes, a transfer timeout and
checking for anchors in pages.
2006-01-30 release 1.9.6 of webcheck (security update)
This release fixes a cross site scripting vulnerability.
Content from crawled pages was insufficiently escaped in the tooltips of
the generated report.
A carefully crafted url, title or author name could allow a website
operator to insert html code into the generated report.
Users of webcheck 1.9.5 are urged to upgrade to this release.
The CVE project has assigned id CVE-2006-1321 to this problem.
Further improvements to stability were also made.
Thanks for all the bugreports that help improve webcheck (more
feedback is always appreciated).
Changes since release 1.9.5:
a cross-site scripting vulnerability with content in the tooltips of
generated report was fixed by properly escaping all output
urls are now url encoded into a consistent form, solving some problems
with urls with non-ascii characters
no longer remove unreferenced redirects
more debugging info in debug mode
more fixes for escaping in generated reports and more support for
sites in different character sets
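To illustrate the kind of escaping involved, here is a minimal sketch using Python 3's html.escape. It is not the fix as applied in webcheck; the function name tooltip_attribute is an assumption for the example.

```python
import html


def tooltip_attribute(title):
    """Build a title attribute from untrusted page content."""
    # quote=True also escapes quote characters, which matters because the
    # value ends up inside a double-quoted attribute.
    return 'title="%s"' % html.escape(title, quote=True)


# A crafted page title that tries to break out of the attribute.
crafted = '"><script>alert(1)</script>'
safe = tooltip_attribute(crafted)
```

After escaping, the crafted title is rendered as literal text in the tooltip instead of being interpreted as markup in the report.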
2005-12-30 release 1.9.5 of webcheck
This is another development release that should improve stability somewhat but
also has some new functionality.
Any feedback is still very much appreciated (thanks for all the feedback I
already got).
An overview of the changes since the last release:
about page now has some more useful information
proxy authentication is implemented
fix for using relative paths as output directory
add support for parsing html documents in different encodings
ensure that all generated html output is properly escaped
implemented --internal option to flag internal URLs with
regular expressions
documentation improvements
several bugfixes to make webcheck more robust
included fancytooltips by Victor Kulinski to have nicer tooltips
generated reports now have friendlier messages for when there is
nothing to report
there is a Debian package
2005-09-03 release 1.9.4 of webcheck
This is another development update that introduces some new functionality.
There were some small stability improvements but no need to fix any major
bugs.
Any feedback is still very much appreciated.
An overview of the changes since the last release:
split problems into link problems (errors retrieving the document) and
page problems (parsing errors, wrong links, etc)
some fixes and improvements to the layout of the generated pages
redirect loops are now detected
transfer result status is now stored
addition of a limited css parser that handles imports and
url() entries
support reading file names for checking from the command line (turning
them into file:// urls internally)
better error handling of problems writing generated pages and check
that we are not overwriting input files
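A limited CSS parser of the kind mentioned above only needs to find the links in a style sheet. The following is a rough sketch of that idea with regular expressions, assuming Python 3; the patterns and the name css_links are illustrative and make no claim to match webcheck's css module or to cover the full CSS grammar.

```python
import re

# url(...) entries, with or without quotes around the address.
URL_PATTERN = re.compile(r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)""", re.IGNORECASE)
# @import "file.css" style imports (the url(...) form is caught above).
IMPORT_PATTERN = re.compile(r"""@import\s+['"]([^'"]+)['"]""", re.IGNORECASE)


def css_links(css_text):
    """Return the addresses referenced by url() entries and @import rules."""
    links = URL_PATTERN.findall(css_text)
    links += IMPORT_PATTERN.findall(css_text)
    return links


# Hypothetical style sheet for the example.
css = '@import "base.css"; @import url(print.css); body { background: url("img/bg.png"); }'
```

Each address found this way can then be resolved against the style sheet's own URL and checked like any other link.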
2005-08-16 release 1.9.3 of webcheck
This release introduces some more rewritten parts as well as some
bug and stability fixes. These releases are still more development
snapshots than real releases, although they should be usable.
Please report any problems, ideas and/or improvements.
An overview of the changes since the last release:
several improvements to the generated reports, including tooltips with
some useful information for the links (does not seem to work very well in
firefox)
stability improvements to the html parser (thanks to everyone who
reported problems); not all problems have been solved but they shouldn't
stop webcheck any more
reimplementation of the file and ftp modules to read directory
contents or read index.html file if present (there are known problems in
the ftp module regarding empty directories and recovering from
errors)
improvements to the url parsing code to warn about spaces in urls
only fetch content if we can parse it, based on the content type
2005-07-31 release 1.9.2 of webcheck
This is another development release of webcheck with some more
structural changes. Please report any problems.
An overview of the changes since the last release:
complete reimplementation of the html and http modules
added https support
some spelling and typo fixes contributed by several people
site map now does a proper breadth first traversal of the site structure
several minor bugfixes and tweaks
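A breadth-first traversal places every page in the site map at its shortest click distance from the start page. A minimal sketch, assuming Python 3 and a plain dictionary mapping each page to its links; the function name breadth_first and the data layout are assumptions, not webcheck's internal structures.

```python
from collections import deque


def breadth_first(start, links):
    """Return (page, depth) pairs in breadth-first order from the start page."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append((page, depth))
        for child in links.get(page, []):
            # Each page appears once, at the first (shallowest) level
            # where a link to it is found.
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return order


# Hypothetical site structure for the example.
site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"]}
```

A depth-first walk could instead list /c several levels down a long chain of links even when it is linked directly from a top-level page.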
2005-07-29 webcheck homepage moved
Since the old server is no longer available, the webcheck homepage has been moved
to http://ch.tudelft.nl/~arthur/webcheck/. As a consequence, the
contact email address has been changed to arthur@ch.tudelft.nl.
The old homepage and email address no longer work.
2005-07-25 release 1.9.1 of webcheck
This is a quick fix for a showstopper in release 1.9.0.
An overview of the changes since the last release:
ship an empty css.py to actually run
small bugfixes for pages with multiple titles and slow plugin
2005-07-24 release 1.9.0 of webcheck
This is the first release of webcheck from my hand.
Some major parts have been rewritten and some other parts have yet to be
rewritten so this is a development release. Some parts of the system
do not work 100% yet (e.g. there are known problems with ftp) but these are
being worked on. Please send feedback on problems and wishes.
The goal for now is to work towards a stable 2.0 release.
A rough overview of the changes since the 1.0 release on which this
release is based:
integrated several patches from Debian and Ubuntu packages
major rewrite of website crawling code allowing for easier change
of request model (e.g. multi-threaded crawling)
documentation has been rewritten, including a new manual page
clean up of config.py which isn't really a configuration file any
more
complete rewrite of output now generating valid XHTML 1.1 with CSS
for style information