.\" Copyright (C) 2005, 2006 Arthur de Jong .\" .\" This program is free software; you can redistribute it and/or modify .\" it under the terms of the GNU General Public License as published by .\" the Free Software Foundation; either version 2 of the License, or .\" (at your option) any later version. .\" .\" This program is distributed in the hope that it will be useful, .\" but WITHOUT ANY WARRANTY; without even the implied warranty of .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the .\" GNU General Public License for more details. .\" .\" You should have received a copy of the GNU General Public License .\" along with this program; if not, write to the Free Software .\" Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA .\" .nh .\" .TH "webcheck" "1" "Jan 2006" "Version 1.9.6" "User Commands" .nh .SH "NAME" webcheck \- website link checker .SH "SYNOPSIS" .B webcheck .RI [ OPTION ]... .I URL .SH "DESCRIPTION" \fBwebcheck\fP will check the document at the specified URL for links to other documents, follow these links recursively and generate an HTML report. .TP .BI "\-i, \-\-internal=" "PATTERN" Mark URLs matching the .I PATTERN (perl\-type regular expression) as an internal link. Can be used multiple times. Note that the PATTERN is matched against the full URL. URLs matching this PATTERN will be considered internal, even if they match one of the \-\-external PATTERNs. .TP .BI "\-x, \-\-external=" "PATTERN" Mark URLs matching the .I PATTERN (perl\-type regular expression) as an external link. Can be used multiple times. Note that the PATTERN is matched against the full URL. .TP .BI "\-y, \-\-yank=" "PATTERN" Do not check URLs matching the .I PATTERN (perl\-type regular expression). Like the \-x flag, though this option will cause webcheck to not check the link matched by regex whereas \-x will check the link but not its children. Can be used multiple times. Note that the PATTERN is matched against the full URL. .TP .B \-b, \-\-base\-only Consider any URL not starting with the base URL to be external. For example, if you run .ft B webcheck \-b http://www.example.com/foo .ft R .br then http://www.example.com/foo/bar will be considered internal whereas http://www.example.com/ will be considered external. By default all the pages on the site will be considered internal. .TP .B \-a, \-\-avoid\-external Avoid external links. Normally if webcheck is examining an HTML page and it finds a link that points to an external document, it will check to see if that external document exists. This flag disables that action. .TP .B \-q, \-\-quiet, \-\-silent Do not print out progress as webcheck traverses a site. .TP .B \-d, \-\-debug Print debugging information while crawling the site. This option is mainly useful for developers. .TP .BI "\-o, \-\-output=" "DIRECTORY" Output directory. Use to specify the directory where webcheck will dump its reports. The default is the current directory or as specified by config.py. If this directory does not exist it will be created for you (if possible). .TP .B \-f, \-\-force Overwrite files without asking. .TP .BI "\-r, \-\-redirects=" "N" Redirect depth. the number of redirects webcheck should follow when following a link. 0 implies to follow all redirects. .TP .BI "\-w, \-\-wait=" "SECONDS" Wait .I SECONDS between document retrievals. Usually webcheck will process a url and immediately move on to the next. However on some loaded systems it may be desirable to have webcheck pause between requests. This option can be set to any non\-negative number. .TP .B \-v, \-\-version Show version of program. .TP .B \-h, \-\-help Show short summary of options. .SH "URL CLASSES" URLs are divided into two classes: .B Internal URLs are retrieved and the retrieved item is checked for syntax. Also, the retrieved item is searched for links to other items (of any class) and these links are followed. .B External URLs are only retrieved to test whether they are valid and to gather some basic information from them (title, size, content-type, etc). The retrieved items are not inspected for links to other items. Apart from their class URLs can also be considered .B yanked (as specified with the \-\-yank or \-\-avoid\-external options). The URLs can be either internal or external and will not be retrieved or checked at all. URLs of unsupported schemes are also considered yanked. .SH "EXAMPLES" Check the site www.example.com but exclude any path with "/webcheck" in it. .ft B webcheck http://www.example.com/ \-x /webcheck .ft R .SH "NOTES" When checking internal URLs webcheck honours the robots.txt file, identifying itself as user-agent webcheck. Disallowed links will not be checked at all as if the -y option was specified for that URL. To allow webcheck to crawl parts of a site that other robots are disallowed, use something like: .ft B User-agent: * Disallow: /foo User-agent: webcheck Allow: /foo .ft R .SH "ENVIRONMENT" .TP .BI _proxy Proxy url for . .SH "REPORTING BUGS" Bug reports shoult be sent to the current maintainer . More information on reporting bugs can be found on the webcheck homepage: .br http://ch.tudelft.nl/~arthur/webcheck/ .SH "COPYRIGHT" Copyright \(co 1998, 1999 Albert Hopkins (marduk) .br Copyright \(co 2002 Mike W. Meyer .br Copyright \(co 2005, 2006 Arthur de Jong .br webcheck is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. .br The files produced as output from the software do not automatically fall under the copyright of the software, unless explicitly stated otherwise.