Washington Apple Pi

A Community of Apple iPad, iPhone and Mac Users

WebChecker and the herding of URLs

by David L. Harris

Washington Apple Pi Journal, pp. 34-36, January/February 1999

I maintain the Washington Apple Pi's list of Apple User Group Web sites on the Pi's Web site. It has more than 550 links, and as anyone who uses the Web a lot knows, links can be ornery creatures and change or vanish overnight. I try to keep the list up to date, but cannot visit all of them each month. I do visit about half every month, but I have been relying on Big Brother and WebArranger (see my articles in the Nov/Dec 1996, Jan/Feb 1997, and May/Jun 1998 issues of the Journal) to monitor whether the other links are still valid. Unfortunately there are problems using both of these programs to find invalid links. Some of the error messages given by Big Brother are incorrect, and it will not notice if a link has been replaced by something else (such as a notice that the original page is gone). I use WebArranger to look for changed pages, which catches a page that has been replaced by such a notice. But WebArranger is slow, and requires interaction with the user when it finds a changed page. In addition, I suspect it of "losing" URLs that I feed it -- with more than 550 of them it's hard to remember if I forgot to give it one, but on many occasions I have found links missing that I'm "sure" I entered previously. So when the shareware WebChecker, whose job it is to check for changes in Web pages, arrived, I thought I would see if it does a better (and faster) job than WebArranger. And yes, I did pay the shareware fee.

WebChecker to the rescue?

In short, no. You can skip the rest of this article if you don't want the details, or if you don't want to check for yourself whether I investigated sufficiently.

WebChecker comes in PowerMac and 68K Mac flavors; I use the 68K version on my Performa 475 running System 7.5.5. The most recent version (which I downloaded) is supposed to be 1.2.1, but doing a Get Info on mine shows 1.1.1. I found a number of problems with it, some of which seemed to have successful work-arounds. For example, you can create "groups" of URLs for WebChecker to check, but each group can contain only somewhat more than 100 URLs. I need more than 500. Additional groups can be created, but I found that when this was done, URLs added to the second group didn't "stick" -- although they appeared to be saved, the next time WebChecker started, they were invisible. I worked around both of these problems by creating five different WebChecker Bookmarks files and hiding all but the one currently in use. (That one must be in the same folder as WebChecker.) Each such file has more than 100 URLs.
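The work-around amounts to cutting one long list into several shorter ones. As a minimal sketch of that idea only (it does not read or write WebChecker's Bookmarks format), the Python below splits a list into consecutive groups. The limit of 110 per group and the example.com addresses are assumptions made for illustration; the article says only that a group holds "somewhat more than 100" URLs.

```python
from typing import List


def split_into_groups(urls: List[str], max_per_group: int = 110) -> List[List[str]]:
    """Split urls into consecutive groups of at most max_per_group entries.

    The default of 110 is an assumed limit, not WebChecker's documented one.
    """
    if max_per_group < 1:
        raise ValueError("max_per_group must be at least 1")
    return [urls[i:i + max_per_group] for i in range(0, len(urls), max_per_group)]


# 550 placeholder addresses (hypothetical; not the real user group list).
urls = [f"http://example.com/group{n}.html" for n in range(550)]
groups = split_into_groups(urls)
print(len(groups), [len(g) for g in groups])  # prints: 5 [110, 110, 110, 110, 110]
```

Each resulting group would then be entered into its own bookmarks file.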

Although the File menu seems to have an option for importing URLs, it is not documented in the DOCMaker manual that comes with WebChecker, and I could not get it to work. URLs may be entered manually, by dragging, or by adding a site that your browser currently has open. Most of mine, as with WebArranger, were entered manually.

Figure 1. WebChecker in action, 9/13

My first trials of WebChecker were promising. It is much faster than WebArranger, and runs through an entire set of URLs without stopping. Once finished it arranges them in groups according to the results it has obtained. Figure 1 shows its screen while checking (notice the circling arrows), and Figure 2 shows how it arranges URLs with like results together after finishing. Status icons and Comments show the results. Netscape icons (you can tell WebChecker which browser to use) supposedly mark a page that has been updated. Green check marks indicate you have previously visited a page, while clocks indicate a timeout (you can set how long WebChecker will wait before reporting a timeout), and exclamation marks inside diamonds (they don't show well in grayscale) indicate other errors. Some pages return an "Invalid header date" in the Comments field. Both WebChecker and WebArranger query the Web site to see if a page has been modified, although perhaps not in the same way. Sometimes that information is not available. Both programs inform you if this is the case.
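One common way for a program to ask a Web site whether a page has been modified is to read the Last-Modified header that the server sends back. The Python sketch below shows how the outcomes described above (updated, unchanged, no date available, invalid header date) could be told apart from that header alone. It is an illustration under stated assumptions, not a description of how WebChecker or WebArranger actually works; the function name, the status strings, and the sample dates are all hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def classify_page(headers: Mapping[str, str], last_seen: Optional[datetime]) -> str:
    """Classify a page from its response headers.

    headers is a plain mapping with the exact key "Last-Modified" (assumed);
    last_seen is the timezone-aware time of the previous check, or None.
    """
    raw = headers.get("Last-Modified")
    if raw is None:
        return "no modification date available"
    try:
        modified = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return "invalid header date"
    if modified.tzinfo is None:
        # Treat a date without a usable zone as UTC (an assumption).
        modified = modified.replace(tzinfo=timezone.utc)
    if last_seen is None or modified > last_seen:
        return "updated"
    return "unchanged"


# Hypothetical previous check on September 13, 1998.
last_seen = datetime(1998, 9, 13, tzinfo=timezone.utc)
print(classify_page({"Last-Modified": "Sun, 04 Oct 1998 12:00:00 GMT"}, last_seen))  # updated
print(classify_page({"Last-Modified": "not a date"}, last_seen))  # invalid header date
```

The comparison is always against the date the server reports, never against the time the check itself was run.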

Figure 2. WebChecker arranges results, 9/13

I found that many of WebChecker's error messages were invalid. Often a page labelled as not existing, or as having some other error, was found instantly when WebChecker told Netscape to visit it. Returning to WebChecker after a visit, the offending page was usually then shown as having been visited, but at times, no matter how often Netscape succeeded in reaching the page, WebChecker insisted it still didn't exist. This was the case even when extended page checking (supposed to reduce errors) was enabled in WebChecker's Preferences. However, WebChecker is not alone among these programs in producing false error messages. The remedy in each case is to tell the program to get the browser to actually visit the site.

Figure 3. Ready to visit Verde Valley, 11/21

I experienced a much more serious problem when I used WebChecker at later times to see if pages had changed. I may be missing something basic, but it appears from my results that the date and time WebChecker shows a page as having been modified is instead the time WebChecker itself first visited it. These dates do not change even when you find a page that you know has changed. For instance, one page that WebChecker reported as unchanged turned out, when visited, to be a notice that the original page was gone, with a pointer to a new one. The document had not even been created when WebChecker first visited that URL. (See Figure 3, where I am about to visit the highlighted page, and Figure 4, which shows that page in Netscape.) This happened even though WebChecker's Preferences had been set to recheck every item whether or not it had been previously checked. I looked at the source HTML for the Verde Valley page in Figure 4 and found that it had been created on October 4. Other pages where WebChecker reported no change were found by WebArranger to have been modified.
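When reported dates cannot be trusted, a different technique is to compare the page content itself between visits. The Python sketch below stores a fingerprint (a SHA-256 hash) of the page and compares it on the next visit, so a page replaced by a "moved" notice registers as changed whatever date is reported. This is a swapped-in technique for illustration; nothing in the article indicates that WebChecker or WebArranger works this way, and the sample page contents are invented.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex SHA-256 digest of the page content."""
    return hashlib.sha256(body).hexdigest()


def has_changed(stored_fingerprint: str, body: bytes) -> bool:
    """Return True if the content no longer matches the stored fingerprint."""
    return fingerprint(body) != stored_fingerprint


# Hypothetical page contents from two visits to the same URL.
first_visit = b"<html><body>Welcome to our user group.</body></html>"
later_visit = b"<html><body>This page has moved to a new address.</body></html>"

stored = fingerprint(first_visit)
print(has_changed(stored, first_visit))  # False
print(has_changed(stored, later_visit))  # True
```

The trade-off is that every check must download the whole page, and any change at all, even a hit counter, counts as a modification.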

Figure 4. Verde Valley: how you've changed, 11/21

Notice in Figure 2 that of the four sites shown as having been updated, three seem to have been updated on the same day, within one minute of each other! Unusual cooperation for Webmasters as far apart as Scandinavia and Singapore. I think instead those are the times WebChecker first looked at those pages. To confirm that my WebChecker is not seeing changes in Web pages, compare the listings for MacinSand and for Stavanger MUG in Figure 2, taken when WebChecker was run on September 13, with those in Figure 5, taken November 28. Looking at the source HTML and the pages themselves with Netscape on November 28 confirms that these pages have indeed changed since September 13.
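The reasoning here is a simple sanity check: "modified" times for unrelated sites that fall within a minute of each other are more likely the moment the checker ran than real edit times. A minimal Python sketch of that check follows. The function name, the one-minute window, and the four sample times are assumptions for illustration, not values read from Figure 2.

```python
from datetime import datetime, timedelta
from typing import List


def largest_cluster(times: List[datetime], window: timedelta = timedelta(minutes=1)) -> int:
    """Return the largest number of timestamps that fit inside one window."""
    ordered = sorted(times)
    best = 0
    start = 0
    for end in range(len(ordered)):
        # Advance the start until the span from start to end fits the window.
        while ordered[end] - ordered[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Four hypothetical "modified" times reported for four unrelated sites.
reported = [
    datetime(1998, 9, 13, 14, 2, 10),
    datetime(1998, 9, 13, 14, 2, 40),
    datetime(1998, 9, 13, 14, 3, 5),
    datetime(1998, 8, 30, 9, 15, 0),
]
print(largest_cluster(reported))  # prints: 3
```

A large cluster is a hint to investigate, not proof; the confirmation still comes from comparing the pages themselves on two dates.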

Figure 5. Have these pages changed? 11/28

Summary

It is possible that I don't understand something vital to the operation of WebChecker. I have e-mailed its author asking for some help with this, but so far haven't gotten a reply. (I did get a reply to an earlier query.) But as of now it seems that at least the 68K version of WebChecker simply doesn't do the job it is designed to do. It looks as though I will have to stick with WebArranger for now.

Revised May 2, 1999 Lawrence I. Charters
Washington Apple Pi
URL: http://www.wap.org/journal/