Arthur de Jong

Open Source / Free Software developer

summaryrefslogtreecommitdiffstats
path: root/NEWS
blob: a45f6882515f91d526ac37b7a04eb0bbb80e95ce (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
changes from 1.9.5 to 1.9.6
---------------------------

* SECURITY FIX: a cross-site scripting vulnerability with content in the
                tooltips of generated report was fixed by properly escaping
                all output
* urls are now url encoded into a consistent form, solving some problems with
  urls with non-ascii characters
* no longer remove unreferenced redirects
* more debugging info in debug mode
* more fixes for escaping in generated reports and more support for sites in
  different character sets


changes from 1.9.4 to 1.9.5
---------------------------

* about page now has some more useful information
* proxy authentication is implemented
* fix for using relative paths as output directory
* add support for parsing html documents in different encodings
* ensure that all generated html output is properly escaped
* implemented --internal option to flag internal URLs with regular expressions
* documentation improvements
* several bugfixes to get webcheck more robust
* included fancytooltips by Victor Kulinski to have nicer tooltips
* generated reports now have friendlier messages for when there is nothing to
  report
* there is a Debian package


changes from 1.9.3 to 1.9.4
---------------------------

* split problems into link problems (errors retrieving the document) and page
  problems (parsing errors, wrong links, etc)
* some fixes and improvements to the layout of the generated pages
* redirect loops are now detected
* transfer result status is now stored
* addition of a limited css parser that handles imports and url() entries
* support reading file names for checking from the command line (turning them
  into file:// urls internally)
* better error handling of problems writing generated pages and check that we
  are not overwriting input files


changes from 1.9.2 to 1.9.3
---------------------------

* several improvements to the generated reports, including tooltips with some
  useful information for the links (does not seem to work very well in
  firefox)
* stability improvements to the html parser (thanks to everyone who reported
  problems) not all problems have been solved but it shouldn't stop webcheck
  any more
* reimplementation of the file and ftp modules to read directory contents or
  read index.html file if present (there are known problems in the ftp module
  regarding empty directories and recovering from errors)
* improvements to the url parsing code to warn about spaces in urls
* only fetch content if we can parse it


changes from 1.9.1 to 1.9.2
---------------------------

* complete reimplementation of the html and http modules
* added https support
* some spelling and typo fixes contributed by several people
* site map now does a proper breadth first traversal of the site structure
* webcheck homepage has been changed to http://ch.tudelft.nl/~arthur/webcheck/
* several minor bugfixes and tweaks


changes from 1.9.0 to 1.9.1
---------------------------

* ship an empty css.py to actually run
* small bugfixes for pages with multiple titles and slow plugin


changes from 1.0 to 1.9.0
-------------------------

* maintainership transferred to Arthur de Jong
* major structural rewrites of crawling code and plugin structure
* the documentation was combined and partially rewritten in the README for
  installation instructions and the manual page for usage information
* changed output to no longer use frames and produce valid XHTML 1.1 and use
  CSS for layout
* config.py is no longer really a configuration file


changes from 1.0b10 to 1.0
--------------------------

+ Don't send accept headers, as they weren't valid.
+ WARN_OLD_VERSION no longer works, until I decide what to do about it.
+ Named changed to webcheck.
+ Fixed typos in INSTALL.
+ Changes so it works with python 2.0.


changes from 1.0b9 to 1.0b10
----------------------------

b Fixed bug when server redirects to a document in robots.txt (does not show
  up as broken (hopefully))
+ Filename mangling in filelink.py to help OS/2 (and Win32) (Patch submitted
  by Steffen Siebert)
+ Added WARN_OLD_VERSION config.py option. If this option is set to true (the
  default) Linbot will check it's version number and the version numbers of
  it's plugins against a global registry on the Net. If it finds that a
  version is not the latest, it will print a warning on the reports along with
  a link you can follow to download the latest version. I think it's neat. You
  might find it annoying.
+ Added preliminary support for authenticating proxies, though it does not
  work correctly yet.
+ Added -r (redirect depth) and REDIRECT_DEPTH option in config.py to indicate
  the amount of redirects Linbot should follow when following a link. Thanks
  to Andrea Glorioso for the patch.
+ Added debugio module that handles debugging and I/O
+ Added -q (quiet option). Use it to suppress output
+ Added -d (debug) option and DEBUG_LEVEL variable in config.py for debugging
+ added version module and removed __version__ and __author__ from all the
  modules (except plugins).
b Fixed bug in Linbot using putrequest() instead of putheader() when
  requesting header information. Thanks to Andrea Glorioso for fixing this
  glitch (and Seth Chaiklin for noticing).


changes from 1.0b8 to 1.0b9
---------------------------

+ If you use the -o command-line option or the OUTPUT_DIR config file option
  and the directory does not exist, linbot will create it for you (provided
  that it has the correct permissions, etc.). Thanks to Andrea Glorioso for
  this feature.
+ Added a CREDITS file and probably left a lot of people out. If you think you
  should be in it let me know.
b Linbot will now report to the server that it can accept any MIME type (found
  in mimetypes.py. This should fix the "406: No acceptable objects found"
  error that some servers report.
b Linbot correctly identifies itself as "Linbot <version>" on HEAD requests as
  well as GET requests.


changes from 1.0b6 to 1.0b8
---------------------------

b Fixed bug when no images are reported for documents having 0 links If you
  don't know what this means it probably wasn't a problem for you.
b Fixed code that was messing with arguments passed via -x and -y and caused
  unexpected results and/or errors.
b -b flag should work this time (for real)
b Cosmetic changes (reports didn't look the way I thought they should in IE4.
  (and may not still as I haven't had a chance to check it yet)
b Linbot won't follow infinite redirects (currently hardcoded to max of 5
  redirects per document)


changes from 1.0b5 to 1.0b6
---------------------------

+ Minor change in ftplink.py should allow better ftp link checking
+ You can now press CTRL-C (or whatever your operating system supports) to
  break out of a linbot run. However, the work linbot does is not saved (yet).
b Fixed problem when server redirects a URL to itself. This fix seems to work
  for most servers I've tried but there are a few more out there that I need
  to take a look at.
b Fixed bug that caused linbot to not check for yanked URLs
+ Added -l command-line option. Usage: -l <url> where <url> is a url pointing
  to an image to be used as the report's logo.
b "patched" strings.py so that it can better parse html files created in 
   Windows/DOS (I think).
+ Made report LOGO a link to the base url
+ httplink does not HEAD a redirected URL if it is already in the link list
  (performance improvement)
- Removed LOGO_ALT from config.py
+ Changed my email address to marduk@python.net. The official home page of
  Linbot will probably also change with the next release so stay tuned.


changes from 1.0b4 to 1.0b5
---------------------------

+ Added a contrib directory. Right now it just contains the about plugin.
  Other plugins will be included if people contribute them. Also, the man page
  will return once I have updated it. Those ugly buttons are obsolete.
+ Linbot now "inlines" stylesheets. This has the benefits of 1) better support
  of Netscape browsers (so I hear) and 2) I don't have to document to put
  linbot.css in the output directory since it grabs it from starship 8*)
b Handling of error for when robots.txt cannot be retrieved.
+ Malformed urls are trapped (sorry, I had that commented out)
b FTP link handling is totally rewritten. Fortunately it shouldn't crash
  anymore Unfortunately it doesn't really work reliably and probably never
  will. See README.ftp for details.
b Two bugs in HTTP proxy handling made it almost completely unusable, though
  conveniently seemed to cancel each other out when I was testing.
b Too many files error on large sites should be fixed. Thanks to Andrew
  Kuchling et al for suggestions.
b Bug when some servers erroneously report (or don't report) Content-Length
  header fixed.