
Wednesday, August 23, 2006

Google Sitemaps: AKA Webmaster Tools Tutorial For the XML Challenged

Copyright © August 23, 2006 by Mike Banks Valentine

Google recently announced a change to its "Sitemaps" program. It has gone from a protocol meant for Python programmers and XML wizards to a much kinder, gentler (and friendlier to webmasters) program to help get all of your pages crawled and indexed. It's called "Google Webmaster Central." The tools can now be used and understood by most small business site owners.

Google explains everything and lists Sitemaps resources at:

To use Google sitemaps, you must first sign up for a Google Account. If you already use Google AdWords, Analytics, Gmail, or other Google-provided tools, you can use your existing account to submit a Google sitemap for your site. Get an account at the following URL if you don't already use Google services:


Many webmasters struggle to understand even the simplest HTML and meta tags. After visiting the Sitemaps program page when it was first announced in the summer of 2005, those small business site owners went away sadly shaking their heads and mumbling. They complained, "I can't even add Perl scripts to my own CGI bin and properly set permissions on page files - how am I going to install and debug a Python script on my server, run cron jobs and generate XML files?"

Apparently Google heard all that grumbling and came back with the newly released "Webmaster Central" to answer the concerns of excess complexity. They no longer require you to be a geek to get all your pages into their index. They've created tools to make the job of submitting all of your pages for inclusion in their index very much easier to handle.

The first listed "Site Status" tool lets you check the indexing of your sites. If you enter an address into that search box and press the "Next" button, they'll return a page with a button labeled "Take me to Google Sitemaps" encouraging use of the sitemap tools, whether or not you've already submitted a sitemap. They'll list some minor details about the site entered, such as:

Pages from your site are included in Google's index. Some of these pages are indexed without a title or description. Googlebot last successfully accessed your home page on Aug 18, 2006.

They list "Potential indexing problems" and then state:

More details about your site may be available. By using Google Sitemaps, you can learn more details available only to site owners, such as:
  • errors Googlebot encountered while crawling your site
  • top search queries that return your site


Let's back up for a moment, though. Webmasters have been told for ten years now to build a sitemap into their web site that lists all of their pages (if it is a small web site with under a hundred pages) or at least lists the major sections of their site (if they have thousands or tens of thousands of pages). So what is the difference here?

Google sitemaps are actually XML documents (not public HTML pages) that hold much more information about your web pages to help Google determine several things. They list the "priority" (importance), "last modified" date, and "change frequency" of each page. But creating those documents required webmasters to install that Python script on their server. Available at:
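To make those three XML elements concrete, here is a minimal sketch of what such a sitemap document looks like. The domain, dates, and values below are placeholders, and the schema URL is the one Google published for its sitemap protocol at the time:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-08-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/contact.html</loc>
    <lastmod>2006-07-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Each `<url>` entry carries exactly the priority, last-modified, and change-frequency hints described above, which is why a plain HTML sitemap page can't substitute for one.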

Or webmasters had to use third-party software to generate the required XML file. Google recommends a brief list of sources for third-party software that helps webmasters programmatically create the XML sitemaps:

I've personally tried several of those third-party tools: I found two of the web-based sitemap generators lacking, and one of the downloaded software tools crashed my computer (and created havoc for me). So what is a small business owner without programming skills to do?

Business owners who are not programmers but want to use Google Sitemaps complained that Google was favoring geeks over business owners. They wanted a simple way to submit all of their pages to Google without running cron jobs on their server and debugging Python scripts.


Google heard our grumbling and now accepts simple lists of URLs in a plain text document. All you have to do is create that list of page URLs, save it as sitemap.txt, and upload it to your server. Then you log in to your Google Webmaster Central (AKA Sitemaps) account and tell them the URL of your sitemap text document.
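As a sketch of how simple this is: a plain-text sitemap is nothing more than one full URL per line. The domain and page names below are made up for illustration, and a few lines of Python (or any scripting language) can build the file from a list of your pages:

```python
# Build a plain-text Google sitemap: one absolute URL per line.
# The domain and page list are placeholders -- substitute your own.

base_url = "http://www.example.com/"
pages = ["index.html", "about.html", "services.html", "contact.html"]

# Write each full URL on its own line in sitemap.txt.
with open("sitemap.txt", "w") as f:
    for page in pages:
        f.write(base_url + page + "\n")
```

The resulting sitemap.txt is the file you upload to your server and then point Google Webmaster Central at. For a small site you can just as easily type the list by hand in any text editor.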

Before you submit your first sitemap URL on a domain, Google requires you to put a "site verification" meta tag on your site's home page and click a "Verify" button to prove you own the site. Anyone with a Google account and access to your server can do this, and you can remove any authorization tags placed by someone with access to your server who is no longer authorized to see this data.
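The verification tag itself goes in the head section of your home page. As an illustration (the tag name is the one Google was issuing at the time, and the content value shown is a placeholder; Google generates a unique code for your account when you click "Verify"):

```html
<head>
  <title>Your Page Title</title>
  <!-- Placeholder code: Google generates a unique value for your account -->
  <meta name="verify-v1" content="your-unique-verification-code=" />
</head>
```

Once Googlebot fetches the page and finds the tag, the site shows as verified in your account and the sitemap data becomes visible to you.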


In the "Diagnostic" tab, there is a tool that validates your robots.txt file, tells you which pages are restricted by that file, and lists problem URLs and the reasons for the problems. It also lets you make changes to a local copy of your robots.txt file, which shows immediately how those changes would affect the next crawl by all Google bots, including the AdSense and PPC landing page quality crawlers! They warn on that page that local changes don't affect your live robots.txt file and remind you to make the changes to the file on your server.
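For readers who haven't worked with robots.txt before, here is a small hypothetical example of the kind of file that tool validates. The directory names are placeholders; Mediapartners-Google is the AdSense crawler mentioned above:

```
# Keep all crawlers out of scripts and private areas
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# Allow the AdSense crawler everywhere (empty Disallow = no restriction)
User-agent: Mediapartners-Google
Disallow:
```

Pasting an edited version of a file like this into the tool shows which of your URLs each Google bot would be blocked from, before you touch the real file on your server.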

Another useful "Diagnostic" tool lets you set your preference for canonical URLs, choosing the www or non-www version of your site. (This item shows how seriously the Google team takes the canonical issue.)


Under the "Statistics" tab in Webmaster Central are "Query stats," "Crawl stats," and "Page analysis" links with more data on your pages. The query statistics show the top 20 search queries searchers have used to find your web site and your top 20 click-through queries. Those data tables provide some interesting and sometimes unexpected detail about how visitors find your site and allow you to further optimize for and funnel those visitors. The "Crawl stats" section promises to show PageRank and the distribution of PageRank throughout your site, in comparison to other sites.

The "Sitemaps" tab simply lists the submitted sitemaps for all your sites and shows the dates "Submitted, Last Downloaded, and Sitemap Status." The status tells you if there are errors and what they were (not allowed, external site links, 404 page not found, etc.). I've just submitted a sitemap this week for a newly registered, newly posted site, and I will report back on how long index inclusion took, to record the effect of early sitemap submission.


Finally, there is a "Tools" link in the upper right corner of the "Sitemaps" page which allows you to "Download Data for all sites" and "Report Spam in Our Index," plus a "Reinclusion Request" link to use if you've been banned for questionable techniques. Clearly, since you are doing all of this from within a Google account, you are openly providing Google with your information and making all spam reports and reinclusion requests under your name. This suggests that you trust Google with all the information they hold on your sites and on any complaint you make about search engine spam.

Currently there are rating tools within the Webmaster Central site that let you tell Google whether you like the tools, with a smiley face, a neutral face, or a frowny face. This may not last as the program comes out of beta, but it lets you tell them what is useful and what isn't.

Still need some help? Try joining, reading, searching, and posting to the Sitemaps and Webmaster Central Google Groups. Posts from webmasters get responses from knowledgeable members. Watch for the little green "G" logo that marks Sitemaps team members for particularly definitive and useful recommendations.

Want ongoing official Google blog posts about Webmaster Central?

If you've had trouble getting all your pages indexed and want to use those informative and useful webmaster tools and reports, give Webmaster Central a try.

Mike Banks Valentine operates SEOptimism, offering SEO training for in-house content managers as well as contract SEO for advertising agencies, web development companies, and marketing firms.

Content aggregation, article and press release optimization & distribution for linking campaigns.

