| Commit message | Author | Age | Files | Lines |
|
|
|
| |
This plugin generates a simple CSV file with all the URLs in the system
and some basic information about them.
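As a rough illustration of what such a CSV export could look like, here is a minimal sketch using the standard csv module. The column names and the `write_links_csv` helper are assumptions made for this example; the log does not say which fields the actual plugin writes.

```python
import csv
import io

# Hypothetical column set; the real plugin's columns are not given in the log.
FIELDS = ['url', 'status', 'mimetype', 'size']


def write_links_csv(links, out):
    """Write one CSV row per link dict to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction='ignore',
                            lineterminator='\n')
    writer.writeheader()
    for link in links:
        # Missing (None) values are written as empty fields.
        writer.writerow(link)


links = [
    {'url': 'http://example.com/', 'status': '200',
     'mimetype': 'text/html', 'size': 1024},
    {'url': 'http://example.com/missing', 'status': '404',
     'mimetype': None, 'size': None},
]
buf = io.StringIO()
write_links_csv(links, buf)
csv_text = buf.getvalue()
```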
|
| |
|
|
|
|
|
| |
This splits some common functionality from Link._get_child() and
Crawler.get_link() to the new Link.get_or_create() function.
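The get-or-create pattern named here can be sketched as follows. This is not the webcheck implementation (which uses SQLAlchemy); it is a stand-alone illustration on top of sqlite3, and the `links` table and the module-level `get_or_create` function are assumptions for the example.

```python
import sqlite3


def get_or_create(conn, url):
    """Return the row id for `url`, inserting a new row if none exists."""
    row = conn.execute('SELECT id FROM links WHERE url = ?', (url,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute('INSERT INTO links (url) VALUES (?)', (url,))
    return cur.lastrowid


# Hypothetical schema, only for demonstration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT UNIQUE)')

first = get_or_create(conn, 'http://example.com/')
second = get_or_create(conn, 'http://example.com/')       # same row again
other = get_or_create(conn, 'http://example.com/about')   # new row
link_count = conn.execute('SELECT COUNT(*) FROM links').fetchone()[0]
```

Keeping the lookup and the insert in one function means callers such as a crawler or a parent link do not each need their own copy of that logic.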
|
|
|
|
|
| |
This should make some functions clearer and marks internal functions
with a leading underscore.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This converts problems to unicode so they can be stored correctly by
SQLAlchemy. Among other things, this fixes a problem when the web
server returns a status message with non-ASCII characters.
|
|
|
|
|
| |
This fixes an issue with calling tidy when the character encoding of the
page could not be determined.
|
|
|
|
|
| |
This makes the script executable, adds copyright headers and ensures
that all needed files are installed and shipped in the source package.
|
|
|
|
|
| |
This moves all static files to be installed into the webcheck Python
path and uses pkg_resources to load the files.
|
| |
|
|
|
|
| |
Reported by Emmanuel Blot, fixes 24e191f.
|
|
|
|
|
| |
Packaging will be moved to the Debian Python Applications Packaging Team
(PAPT) repository.
|
|
|
|
|
|
| |
This updates the README, HACKING and other documentation to be more in
line with the current software set-up. This also updates the TODO list
with current changes.
|
|
|
|
|
| |
This tries to gracefully support older versions of Jinja that don't
provide the trim_blocks, lstrip_blocks or keep_trailing_newline options.
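One possible way to degrade gracefully is to pass only the keyword options that the installed constructor accepts. The sketch below shows that idea with a stand-in class so it runs without Jinja installed; `OldEnvironment` and `supported_kwargs` are hypothetical names, and the log does not say whether the commit uses this approach.

```python
import inspect


def supported_kwargs(factory, **wanted):
    """Return only those keyword options that `factory` accepts."""
    params = inspect.signature(factory).parameters
    return {name: value for name, value in wanted.items() if name in params}


class OldEnvironment:
    """Stand-in for an older template environment that only knows trim_blocks."""

    def __init__(self, trim_blocks=False):
        self.trim_blocks = trim_blocks


options = supported_kwargs(
    OldEnvironment,
    trim_blocks=True,
    lstrip_blocks=True,
    keep_trailing_newline=True,
)
env = OldEnvironment(**options)
```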
|
|
|
|
|
|
|
| |
This combines two queries using a union, which already returns distinct
rows. This also removes the distinct from the parents() function because
it uses a union, which is supposed to return distinct rows already.
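The behaviour relied on here is standard SQL: UNION removes duplicate rows, while UNION ALL keeps them. A small sketch with sqlite3 shows the difference; the tables `a` and `b` are made up for the example and are unrelated to the webcheck schema.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE a (url TEXT);
    CREATE TABLE b (url TEXT);
    INSERT INTO a VALUES ('x'), ('y');
    INSERT INTO b VALUES ('y'), ('z');
""")

# UNION de-duplicates, so an extra DISTINCT is not needed.
union_rows = sorted(
    row[0] for row in conn.execute('SELECT url FROM a UNION SELECT url FROM b'))

# UNION ALL keeps the duplicate 'y'.
union_all_rows = sorted(
    row[0] for row in conn.execute('SELECT url FROM a UNION ALL SELECT url FROM b'))
```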
|
|
|
|
|
|
|
|
| |
Exposing crawler.bases leaks the SQLAlchemy session to the plugins, which
seems to cause problems in some cases.
As a consequence of this change, the sitemap plugin now uses its own
session.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This also fixes newlines in link meta information that were incorrectly
escaped.
|
|\
| |
| |
| |
| |
| |
| |
| | |
The switch to Jinja removes the need for custom escaping and Python code
to write HTML output and instead uses easy-to-read templates.
As a result of the switch, this drops more than 450 lines of Python code
while adding a little over 400 lines of HTML template code.
|
| |
| |
| |
| |
| | |
Most of this is removed because of the switch to the Jinja template
engine.
|
| |
| |
| |
| |
| |
| | |
The sitemap module has been somewhat rewritten to use generators to
provide the structure of the website. The problems module has also been
simplified a bit.
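To illustrate how a generator can provide a site structure lazily, here is a minimal sketch that walks a link tree and yields each page with its depth. The `walk` function and the dict-based `site` mapping are assumptions for the example, not the sitemap module's actual code.

```python
def walk(tree, url, depth=0, seen=None):
    """Yield (depth, url) pairs depth-first, visiting every URL only once."""
    seen = set() if seen is None else seen
    if url in seen:
        return
    seen.add(url)
    yield depth, url
    for child in tree.get(url, []):
        yield from walk(tree, child, depth + 1, seen)


# Hypothetical site: each URL maps to the URLs it links to.
site = {
    '/': ['/about', '/docs'],
    '/docs': ['/docs/install', '/'],  # the link back to '/' is skipped
}
structure = list(walk(site, '/'))
```

A template can iterate over such a generator directly, without the whole structure having to be built in memory first.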
|
| | |
|
| |
| |
| |
| |
| | |
This sets up the basic layout for the report. The plugins are expected
to supply a crawler instance.
|
|/
|
|
|
|
| |
This uses the Jinja template engine to produce the report HTML files.
This also renames the util module to output to better describe its
purpose.
|
|
|
|
|
| |
Write output using codecs.open() with the UTF-8 encoding. This also
introduces a consistency improvement in argument naming.
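A minimal sketch of writing a report file through codecs.open() with an explicit UTF-8 encoding is shown below. The `write_report` helper and the file name are assumptions for the example.

```python
import codecs
import os
import tempfile


def write_report(path, text):
    """Write `text` to `path`, encoded as UTF-8."""
    with codecs.open(path, 'w', encoding='utf-8') as fp:
        fp.write(text)


# Hypothetical output location, only for demonstration.
path = os.path.join(tempfile.mkdtemp(), 'report.html')
write_report(path, '<p>caf\u00e9</p>\n')

# Read the raw bytes back to check the encoding on disk.
with open(path, 'rb') as fp:
    raw = fp.read()
```

In current Python 3 code the built-in open(path, 'w', encoding='utf-8') would be the more usual choice; codecs.open() is used here because the commit names it.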
|
|
|
|
|
| |
This tries to close the session when the function is done with it to
avoid using too much memory.
|
|
|
|
|
|
| |
This changes the constructor to accept a dict configuration of the
crawler. This is currently combined with the configuration in the config
module but the goal is to replace it completely.
|
|
|
|
| |
This avoids having module loading code in different places.
|
| |
|
|
|
|
| |
This greatly simplifies the command line parsing.
|
| |
|
|
|
|
|
|
| |
with unicode
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@471 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
|
|
| |
webcheck/__init__.py
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@470 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@469 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@468 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@467 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
|
|
| |
webcheck
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@466 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@465 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@464 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
|
|
| |
pages
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@463 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@462 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@461 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@460 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
|
|
| |
crawling based on a patch by Devin Bayer
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@459 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|
|
|
|
| |
git-svn-id: http://arthurdejong.org/svn/webcheck/webcheck@458 86f53f14-5ff3-0310-afe5-9b438ce3f40c
|