Washington Apple Pi

A Community of Apple iPad, iPhone and Mac Users

Rage Sitemap Automator

© 2008 Lawrence I. Charters

Washington Apple Pi Journal, reprint information

Rage Sitemap Automator (formerly Google Sitemap Automator) is a neat little one-trick pony: it creates XML sitemaps of Web sites. This sounds horribly abstruse, but the important point is that it automates creating such sitemaps; you, the user, don’t have to know all that much. And sitemaps are very useful things to have, as they greatly increase your Web site’s visibility in Web search engines. But first, some background.

In the beginning

Once upon a time, Tim Berners-Lee, the guy who invented the World Wide Web, maintained a list of all the world’s Web servers. He maintained the list by hand. This was easy in 1992, but Web sites multiplied rapidly (experts predicted there would soon be thousands), and he soon abandoned the effort.

By 1994, there were tens of thousands of Web sites. Jerry Yang, an electrical engineering grad student at Stanford, published something called “Jerry’s Guide to the World Wide Web.” It was housed on a computer named after his favorite sumo wrestler, akebono.stanford.edu. After a few months, he renamed it “Yahoo!” (complete with exclamation mark), and eventually was offered boatloads of money to turn it into a commercial company. The original Stanford server, akebono, has returned to academic pursuits, but Yahoo! remains.

Yahoo! was a pioneer in automatically indexing sites. While other search engines crawled the Internet and simply reported back, “Hey, I found a Web server!”, Yahoo! attempted to classify Web sites in terms of their knowledge domain, lumping chemistry sites with chemistry, mathematics with mathematics, and so on. AltaVista, in 1995, tried to go beyond this by doing its best to index every Web site on the planet, and indexing the first thousand or so words on every page.

Google, in 1998, went even farther, introducing the idea of “PageRank.” Search results are returned in order of relevance, and relevance is determined by how many times a page is referenced by other pages, how specific the page is to the search being conducted, and a host of other criteria, many of them unpublished and, presumably, Google trade secrets. Google’s speed, accuracy, and coverage (it soon eclipsed AltaVista in number of Web pages covered) rapidly turned it into the search engine of choice, which it remains today. For more on Google technology, see:

http://www.google.com/technology/pigeonrank.html

The plot thickens

But even Google has limits, and in recent years has begun a campaign to help Webmasters produce Web sites that can be more rapidly and accurately searched. Google publishes extensive information on good site design, and offers a number of Web-based tools to assist in site management.

One of their most significant efforts is a campaign for Web sites to start using sitemaps. A sitemap is a map of a site, often a Web page, that outlines the structure of the site and has links to every major section or, in some cases, every page. Google endorsed this as an excellent practice, but advocated that Webmasters go farther and create XML sitemaps. These XML documents contain a complete, detailed index of a Web site, including information on how often the content is updated, and all in a form that can be easily sucked up by a Web search engine without putting a heavy load on the Web server. Google offered detailed instructions on how these sitemaps should be created and formatted, and even how they should be named and where they should be located on a site. Google published the details on their Web site, at:

https://www.google.com/webmasters/tools/docs/en/protocol.html
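For the curious, here is a minimal sketch, in Python, of what building such a file involves. It is not how Google's tool or Rage Sitemap Automator works internally; the function name build_sitemap and the example.org address are made up for illustration. Only the element names and the namespace come from the sitemap format itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap format (it also appears in Figure 5).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML text from (loc, lastmod, changefreq, priority) tuples.

    build_sitemap is a hypothetical helper name, not part of any real tool.
    """
    # Make the sitemap namespace the default one so tags carry no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # example.org is a placeholder address, not a real site entry.
    print(build_sitemap([("http://www.example.org/", "2008-08-01", "monthly", "0.5")]))
```

Writing the file is the easy part; the hard part, for a site of any size, is collecting the list of pages in the first place.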

Finally, Google provides a free Sitemap Generator tool. Unfortunately, the sitemap tool is written in Python — a programming language known by relatively few people — and it requires installation by a skilled system administrator. Very few Webmasters have the knowledge and level of access required to install and configure Python and install and configure the tool.

The Easy Part

Fortunately for Mac users, you really don’t need to know anything about Python, you don’t have to have system-level access to your Web server, and you don’t need to know a thing about XML data structures in order to create a Web sitemap. Rage Software has been making Web utilities for Mac users since before the dawn of Mac OS X, and the Rage Sitemap Automator is a splendidly Mac-like tool: it has a graphical user interface, requires no knowledge of Python or XML, and it works. It even works with Apple’s iDisk services.

To start, launch Rage Sitemap Automator and enter the address of your Web site (Figure 1). The only real technical requirement is that you must have write-level access to the Web site. If you can’t put files on the Web site, you can’t put a sitemap on it either.

Figure 1: To build a Web site map, launch Rage Sitemap Automator, press the New Sitemap button, type in the address of a Web site, and press Create Sitemap. Other options allow you to create filters (to mask portions of a site) or to open an existing sitemap profile, if you want to refresh a sitemap.

Once Sitemap Automator has an address to work with, it spiders the site, following every link on the site to find every file and document (Figure 2). For small sites, this could take seconds. For larger, more complex sites such as the Washington Apple Pi site, this could take fifteen minutes or more. In testing, I tried Sitemap Automator with sites ranging from 150 pages to over 10,000 pages, and in every case the program worked quickly and without any additional attention or oversight.

Figure 2: While you don’t have to watch Sitemap Automator in action, it is educational to see how it scans for pages and examines pages for new links.
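To give a rough idea of what "spidering" means, the sketch below walks a site breadth-first in Python: read a page, collect its links, and queue any same-site links not yet seen. This is an illustration of the general technique only, not Sitemap Automator's actual code. The names LinkCollector, extract_links, and crawl are invented here, and the fetch argument is an assumption: a function you supply that returns the HTML for an address, or None if the page can't be retrieved.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute URLs (fragments removed) linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(page_url, href)).url for href in collector.hrefs]


def crawl(start_url, fetch):
    """Breadth-first walk of every same-host page reachable from start_url.

    fetch is a caller-supplied function (an assumption of this sketch) that
    maps a URL to its HTML text, or None if the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link: nothing to record or follow
        found.append(url)
        for link in extract_links(url, html):
            # Stay on the starting host and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

A real spider also has to cope with redirects, non-HTML files, and being polite to the server, which goes some way toward explaining why a large site takes minutes rather than seconds.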

Once a site has been mapped, you have the option of editing the results, manually specifying how often a given link is updated. You can set a default value — ranging from “always” to “never” — but if you wish, you can manually change the default to cover the exceptions. This editing is done with a simple pop-up list of choices (Figure 3).

Figure 3: Once a site has been scanned, you can make adjustments to the sitemap to help visiting search engines build a schedule for how often your site should be indexed.
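The default-plus-exceptions idea is simple enough to express in a few lines of Python. This is only a sketch of the logic, under the assumption that each page ends up with exactly one update-frequency value; the function name assign_changefreq and the example.org addresses are hypothetical. The seven allowed values are the ones the sitemap format defines.

```python
# The update-frequency values defined by the sitemap format.
VALID_CHANGEFREQ = ("always", "hourly", "daily", "weekly", "monthly", "yearly", "never")


def assign_changefreq(urls, default="monthly", overrides=None):
    """Give every URL the default frequency unless an override names it.

    assign_changefreq is a hypothetical helper for illustration only.
    """
    overrides = overrides or {}
    for value in [default, *overrides.values()]:
        if value not in VALID_CHANGEFREQ:
            raise ValueError(f"unknown changefreq: {value!r}")
    return {url: overrides.get(url, default) for url in urls}


if __name__ == "__main__":
    schedule = assign_changefreq(
        ["http://example.org/", "http://example.org/news.html"],
        overrides={"http://example.org/news.html": "daily"},
    )
    for url, freq in schedule.items():
        print(url, freq)
```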

After you’ve made whatever adjustments are required, you can either manually upload the resulting sitemap file to your Web site, or have Sitemap Automator upload it for you. Once uploaded, you can test that the sitemap is properly named and in the proper place by pressing a “Test Sitemap” button (Figure 4). Another button will automatically notify search engines (Google, Yahoo, MSN and Ask.com) of the sitemap, acting as a prompt for them to come and harvest the site.

Figure 4: The “Test Sitemap” button offers reassurance that the entire exercise — admittedly very geeky and abstract — was performed correctly.
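I don't know exactly what the “Test Sitemap” button checks behind the scenes, but if you want a second opinion on a sitemap file you already have in hand, a few basic sanity checks are easy to sketch in Python. The function name check_sitemap is invented for this example, and the checks are my own guess at a reasonable minimum: the file parses as XML, the outermost element is the sitemap format's urlset, and every entry has an address.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_bytes):
    """Return a list of problems found; an empty list means the checks passed.

    check_sitemap is a hypothetical helper; these checks are a minimal
    guess, not a full validation against the sitemap format.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    for index, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {index} has no <loc>")
    return problems
```

A file that is cut off partway through, for example, fails the very first check, since it is no longer well-formed XML.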

The Hard Part

While the software is easy to use and inexpensive ($29.95), many of the benefits touted on Rage Software’s site are not benefits of Sitemap Automator itself but of Google’s use of sitemaps. Fortunately, getting an account on Google is free, and using their Webmaster tools is also free. Tracking down problems found on your site, on the other hand, involves work.

Take, for example, Web galleries. iPhoto, Photoshop, Photoshop Lightroom, Photoshop Elements, Microsoft Expression Media, and many other packages can take a bunch of your snapshots and publish them on the Web. Unfortunately, the Web pages tend to have repetitive and non-descriptive names. Once you have a sitemap, and Google indexes your site, you will discover that Google, like all other search engines, doesn’t look kindly on Web pages with identical or non-descriptive names.

Often, this requires that you painstakingly examine and re-title dozens, or possibly hundreds, of pages, and generate a new sitemap. The downside: it takes effort. The upside: generating the sitemap with Rage Sitemap Automator is easy, and the sitemap, along with Google’s Webmaster tools, makes finding the problems relatively painless. The end result: your site will be more completely indexed by search engines, more search engine users will find your site, and your site will become more popular.
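If you have local copies of your pages, you can get a head start on finding repeated titles yourself. The Python sketch below is not part of Rage Sitemap Automator or Google's Webmaster tools; the names TitleParser, page_title, and duplicate_titles are made up for illustration, and it assumes you have already gathered each page's HTML into a dictionary keyed by address.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Captures the text inside a page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())


def duplicate_titles(pages):
    """Given {url: html}, return {title: [urls]} for titles used more than once."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        by_title[page_title(html)].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```

Any title that comes back with more than one address attached is a candidate for re-titling before you generate the next sitemap.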

Figure 5: You don’t need Rage Sitemap Automator if you feel comfortable typing XML documents. Here is a portion of the Washington Apple Pi sitemap; the complete sitemap (as of August 2008) comes to 594,743 bytes.

<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.wap.org/</loc><lastmod>2007-05-01</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/pistylesv4.css</loc><lastmod>2008-05-16</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/default.html</loc><lastmod>2001-09-26</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/contacts.html</loc><lastmod>1997-12-13</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>

<url><loc>http://www.wap.org/search/default.html</loc><lastmod>2001-01-16</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/default.html</loc><lastmod>2001-09-26</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/events/chronology.html</loc><lastmod>1997-12-13</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/officemap.html</loc><lastmod>1997-12-13</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/donations.html</loc><lastmod>2001-09-26</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/aboutbrochure.html</loc><lastmod>1997-12-13</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/tutorials/default.html</loc><lastmod>1995-09-23</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/about/hotline.html</loc><lastmod>2003-01-01</lastmod>
<changefreq>monthly</changefreq><priority>0.5</priority></url>
<url><loc>http://www.wap.org/

Problems?

In using Rage Sitemap Automator, I found that most of the problems came right at the start. Rage allows you to download a free, working demo. Unfortunately, the demo only permits you to create a sitemap for 20 pages, which is too few to tell whether the program does anything useful at all.

After getting the full version, I was notified of an update, which I downloaded and installed. But every time I ran the program, it complained it was using the old version, and prompted me to download the new version — again. After going through this cycle two or three times, I noticed the program name had changed. Originally, it was called Google Sitemap Automator, but (probably in response to complaints from Google) it had been renamed Rage Sitemap Automator. The update was downloading the new version, but my alias for the program was launching the original version. It would have been nice to get some notification of the name change, but fixing the problem was easy: I threw away all traces of the original program with the now-abandoned name.

Rage customer support, despite the somewhat unfriendly sounding company name, was always fast, courteous, and unfailingly accurate in answering questions. More than once, I received a response within 15 minutes of E-mailing a question. The bulk of the documentation is online, and not only does it tell you how to use Sitemap Automator, it is also a very good tutorial on the art and science of search engines and sitemaps in general.

Resources

Rage Sitemap Automator, $29.95
http://www.ragesw.com/products/googlesitemap.html