HTML Scan

The History of HTML Scan.

For those upgrading, here is a list of the recent changes:

v1.20 - (released 08-Jun-98):
- Added some support for HTML-4 entities.
- HTMLScan now works properly if no task is registered for throwback.
v1.19 - (released 23-Jun-97):
- Modification to hopefully allow a larger range of JPEG files to be processed by HTMLScan. It has not been tested with progressive JPEGs yet to see if processing of these is successful.
- Stopped stupidly adding unnecessary carriage returns when inserting image dimensions.
v1.18 - (released 03-Apr-97):
- Removed bug which caused unnecessary warnings if an anchor tag contained both "href=" and "name=" attributes.
v1.17 - (released 05-Feb-97):
- Fixed bug which involved pages whose local references started with the "/" character.
v1.16 - (released 23-Jan-97):
- Added knowledge about mailto: and gopher: directives, so these are no longer flagged as warnings (as long as they are in lower case).
- References to things in directories with cgi-bin in their paths are treated less severely.
- References to directories are now treated more sensibly. However, errors involving a directory that is not present are more likely to cause HTMLScan internal problems.
- Made entity checking case sensitive and removed an illegal entity or three.
v1.15 - (released 14-Dec-96):
- Throwback implemented.
v1.14 - (released 25-Nov-96):
- Fixed problems with the <a name=name> construct, which has no </a> ending tag. HTML Scan now knows this.
v1.13 - (released 21-Nov-96):
- Characters such as ", &, >, and < now have their entity equivalent indicated by HTML Scan when they are found.
v1.12 - (released 20-Nov-96):
- Added a huge list of entities and options for HTML Scan to check all the entities in the document for ones that are not known to it.
- Characters such as ", &, >, and < are now queried as they would be better expressed as entities.
- Problems with the <FORM> tag resolved.
v1.11 - (released 13-Nov-96):
- Incorrect command-line options in Desc file changed.
- Problems with the <QUOTE> tag resolved.
v1.10 - (released 10-Nov-96):
- HTML Scan now copes with files whose paths are not in quotation marks, provided the path name stays within the restricted character set available when quotes are not used, i.e. 0-9, A-Z, a-z, '.' and '/'.
- A dump at the end of the scan of any unmatched tags is now made. This should make the task of tracking down unclosed tags easier.
- More checking is now performed on <CENTER> and <QUOTE> tags.
v1.09 - (released 08-Nov-96):
- Added the extended command line functionality provided by Acorn's DDEUtils module to the program.
- Changed the internal storage format for tags to make it easier to add new tags. This should also make it easier to implement tracing backwards through the tag-stack to find a tag matching a missing one.
- Added dozens of new tags found during my research for ZapHoTMeaL.
- <META> and <TITLE> tags are now only allowed in the header.
- <TT> tag added to list queried if strict checking is enabled.
v1.08 - (released 20-Oct-96):
- Added ability to follow href=s in anchor tags.
- Added switch to turn the above feature off.
- Added checking to background= parameters of <BODY> tags.
- Added switches to control the reporting of non-local href=s, src=s and background=s.
- Strict checking now includes warnings about missing alt= parameters in image tags, and missing text=, bgcolor=, link=, vlink= and alink= parameters in <BODY> tags which use background images.
- Added switch to make "Be very strict" mode optional.
- Cured bug causing occasional failure to find 'src=' files if dozens of them had already failed to be located.
- Tidied up a number of the reported messages.
- More options have been changed to their opposites. Sorry if this causes angst amongst users who are using batch files. Once more, this makes the command-line syntax more sensible.
- <HTML>, <BODY> and <TITLE> now all need to be missing for a fatal error to be generated. This is now trying to be especially kind to errant files with poor headers.
v1.07 - (released 13-Oct-96):
- Now queries <B> and <I> tags, as some people have requested a strict mode where these tags are faulted as being too specific in their nature, with <STRONG> and <EM> tags being recommended as replacements.
- -v [verbose] option changed to its opposite -q [quiet] partly to benefit command-line users, and partly in order to reduce the length of the command line call which can cause problems if both your !Scrap directory and the files being scanned are buried deep in the directory structure.
- Added start up message to tell people that the program is alive and well.
- Added a "Processing file" line to the output so that, when processing multiple files, there is no need to refer back to the command line to find out which output window relates to which file.
- Both the <BODY> and <HTML> tags now need to be missing for a fatal error to be generated. This is now more in line with the specification for HTML.
- "Mismatched tags at end of file" warning replaced by a more specific message with the number of tags involved listed.
v1.06 - (released 06-Oct-96):
- Corrected problems with some GIFs giving "Unable to locate expected comma in GIF file" errors.
v1.02 - (released 14-Sep-96):
- Corrected messages to remain agnostic with respect to differing conventions.
- Corrected bug associated with images higher in the directory tree than the source HTML file (i.e. paths with ../ structures).
- Template file made more conventional by filling in its buttons.
v1.01 - (released 04-Sep-96):
- Added support for <CENTER>, and tags.
- Used Squeeze instead of proprietary compression because of the possibility of StrongARM related problems.
v1.00 - (released 01-Aug-96):
- The first version.

tim@tt1.org | http://mandala.co.uk/