before next release
-------------------

* go over all FIXMEs in code (ftp)
* follow redirects (to a limit) of external sites (see the sketch at
  the end of this file)
* -U, --user-agent=AGENT  identify as AGENT instead of webcheck VERSION

probably before 3.0 release
---------------------------

* support for multi-threading (use -t, --threads as option; see the
  sketch at the end of this file)
* implement a maximum transfer size for downloading (see the sketch at
  the end of this file)
* support ftp proxies
* support proxying https traffic
* option to force overwriting only generated files, leaving static
  files (css, js) alone
* implement a --html-only option to not copy css and other files
* check for a missing character encoding (report it as a problem)
* for FTP: don't fail if SIZE is not allowed
* record the parameters with which webcheck was started

wishlist
--------

* add code for stripping the last part of a url (e.g. foo/index.html ->
  foo/; see the sketch at the end of this file)
* integration with weblint
* do form checking of crawled pages
* do spelling checking of crawled pages
* add support for fetching gzipped content to improve performance (see
  the sketch at the end of this file)
* maybe output a google sitemap file:
  http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
* maybe trim titles that are too long
* maybe check that documents referenced in <img> tags are really images
* use gettext to present output to enable translations of messages and
  html
* maybe report on embedded content that is external
* present an overview of problem pages: "100 problems in 10 pages"
  (per author)
* check that email addresses are formatted properly and that the host
  part has an MX record (make it a problem if there is no record or
  only an A record; see the sketch at the end of this file)
* present ages for times long ago in a friendlier format (.. days ago,
  .. months ago, .. years ago; see the sketch at the end of this file)
* maybe unescaped spaces aren't always a real problem (e.g. in mailto:
  urls)
* maybe give a warning for urls that have non-ascii characters
* maybe fetch and store the description and other meta information
  about a page (keywords) (just like author)
* maybe also add robots.txt to the urllist if it was fetched
  successfully
* support CSS encoding:
  http://www.w3.org/International/questions/qa-css-charset
* investigate why webcheck does not give an error when accessing
  http://site:443/
* look into python-spf to see how DNS queries are done
* implement an option to ignore problems on certain pages (but still
  treat them as internal, etc.) (e.g. for generated or legacy html)
* add support for the robots meta tag:
  http://www.robotstxt.org/wc/meta-user.html
* only report multiple definitions of a single anchor once
* warn if a URL contains unencoded characters
* see section 6 of rfc3986.txt for URL comparison (esp. 6.2.2.; see
  the sketch at the end of this file)
* implement paging for huge reports
* output timing information on the scan (e.g. "scan took 30 minutes")
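
implementation sketches
-----------------------

The snippets below are rough, untested sketches for some of the items
above, not decisions about how to implement them; all helper names,
limits and the dnspython dependency are assumptions, not existing
webcheck code.

Following redirects to a limit: Python's urllib already follows
redirects, and its handler exposes a class attribute capping the chain
length, so a minimal sketch only needs a subclass (the limit of 5 is
an arbitrary choice):

    import urllib.request

    class LimitedRedirectHandler(urllib.request.HTTPRedirectHandler):
        # HTTPRedirectHandler raises an HTTPError once a single
        # request has been redirected this many times (default: 10)
        max_redirections = 5

    opener = urllib.request.build_opener(LimitedRedirectHandler)
    response = opener.open('http://www.example.com/')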
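
Multi-threading for -t/--threads: a worker pool draining a shared
queue is the classic shape; crawl() and fetch_url are made-up names
for this sketch, not existing webcheck internals:

    import queue
    import threading

    def crawl(todo, fetch_url, num_threads=4):
        """Process every URL in the todo queue with worker threads."""
        def worker():
            while True:
                url = todo.get()
                try:
                    fetch_url(url)  # may todo.put() newly found links
                finally:
                    todo.task_done()
        for _ in range(num_threads):
            threading.Thread(target=worker, daemon=True).start()
        todo.join()  # returns once every queued URL is processed

    todo = queue.Queue()
    todo.put('http://www.example.com/')
    crawl(todo, print, num_threads=4)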
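
A maximum transfer size: reading one byte past the limit is enough to
tell a truncated document from one that exactly fits (the 1 MB default
and the helper name are arbitrary assumptions):

    def read_limited(response, max_bytes=1024 * 1024):
        """Read at most max_bytes from a file-like response.

        Returns (data, truncated); truncated is True when the
        document was larger than max_bytes and has been cut off.
        """
        data = response.read(max_bytes + 1)
        return data[:max_bytes], len(data) > max_bytes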
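
Stripping the last part of a url (foo/index.html -> foo/):
strip_last_part is a made-up name, and the sketch deliberately drops
the query and fragment:

    import urllib.parse

    def strip_last_part(url):
        """Return the parent 'directory' of a URL.

        >>> strip_last_part('http://example.com/foo/index.html')
        'http://example.com/foo/'
        """
        parts = urllib.parse.urlsplit(url)
        path = parts.path
        # keep everything up to and including the final slash
        path = path[:path.rindex('/') + 1] if '/' in path else '/'
        return urllib.parse.urlunsplit(
            (parts.scheme, parts.netloc, path, '', ''))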
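
Fetching gzipped content: advertise gzip in Accept-Encoding and
decompress only when the server actually used it (fetch is a made-up
name):

    import gzip
    import urllib.request

    def fetch(url):
        """Fetch url, asking for and transparently undoing gzip."""
        request = urllib.request.Request(
            url, headers={'Accept-Encoding': 'gzip'})
        with urllib.request.urlopen(request) as response:
            data = response.read()
            if response.headers.get('Content-Encoding') == 'gzip':
                data = gzip.decompress(data)
        return data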
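
Email address checking: a loose syntax check plus an MX lookup.  This
assumes the third-party dnspython package (the python-spf item above
is about picking such a library), and check_email is a made-up name:

    import re
    import dns.resolver  # third-party dnspython, an assumption

    _EMAIL_RE = re.compile(r'^[^@\s]+@([^@\s]+)$')  # deliberately loose

    def check_email(address):
        """Return a problem description for address, or None if ok."""
        match = _EMAIL_RE.match(address)
        if not match:
            return 'malformed email address'
        host = match.group(1)
        try:
            dns.resolver.resolve(host, 'MX')
        except dns.resolver.NoAnswer:
            # the host exists but has no MX record (e.g. only an A
            # record), which the wishlist item says is a problem
            return 'no MX record for %s' % host
        except dns.resolver.NXDOMAIN:
            return 'no DNS record at all for %s' % host
        return None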
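
Friendlier ages: rounding to whole days, months or years is probably
enough for a report (friendly_age and the 30/365-day approximations
are assumptions):

    def _plural(n, unit):
        return '%d %s%s ago' % (n, unit, '' if n == 1 else 's')

    def friendly_age(days):
        """Render an age in days as '.. days/months/years ago'."""
        if days < 1:
            return 'today'
        if days < 31:
            return _plural(days, 'day')
        if days < 365:
            return _plural(max(1, days // 30), 'month')
        return _plural(days // 365, 'year')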
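
URL comparison per RFC 3986 section 6: comparing normalized forms
catches most duplicates.  This sketch does only part of the
syntax-based normalization of section 6.2.2 (case and dot-segments)
plus default-port removal, and normalize_url is a made-up name:

    import posixpath
    import urllib.parse

    _DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21}

    def normalize_url(url):
        """Normalize a URL for comparison (partial RFC 3986 6.2.2)."""
        parts = urllib.parse.urlsplit(url)
        scheme = parts.scheme.lower()
        host = (parts.hostname or '').lower()
        if parts.port and parts.port != _DEFAULT_PORTS.get(scheme):
            host = '%s:%d' % (host, parts.port)
        # collapse '.' and '..' segments; an empty path counts as '/'
        path = posixpath.normpath(parts.path) if parts.path else '/'
        if parts.path.endswith('/') and not path.endswith('/'):
            path += '/'  # normpath eats trailing slashes, which matter
        return urllib.parse.urlunsplit(
            (scheme, host, path, parts.query, parts.fragment))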