Search rivals Google, Yahoo and Microsoft have announced improvements to Sitemap, the specification that defines how sites are submitted for indexing by search engines, along with two new supporters of the effort.
IBM and Ask.com, from IAC/InterActiveCorp, will now support the Sitemap effort, which is designed to simplify how webmasters and online publishers submit their sites for indexing in search engines.
In November, Google, Yahoo and Microsoft agreed to support Sitemap, an open, XML-based protocol that aims to standardise the "site map" files that webmasters put on their sites to help the search engines' automated web crawlers properly index their pages. Sitemap should simplify the creation of site maps by web publishers, and their discovery and interpretation by search engines.
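Under the protocol's 0.9 schema, a site map is a simple XML file listing each URL along with optional metadata such as last-modified date, change frequency and relative priority. A minimal sketch, using example.com as a placeholder address:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only required child element per URL -->
    <loc>http://www.example.com/</loc>
    <!-- optional hints for crawlers -->
    <lastmod>2007-04-11</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Crawlers treat the optional fields as hints rather than commands, so a site map with nothing but <loc> entries is still valid.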
The Sitemap protocol, now in version 0.90, provides a uniform way of telling search index crawlers where site map files are located on a site.
All web crawlers recognise the robots.txt file, which tells crawlers which parts of a site not to index, so webmasters can now indicate the location of their site map file within the same robots.txt file. Meanwhile, the protocol's official website is now available in 18 languages.
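In practice, autodiscovery amounts to one extra line in robots.txt alongside the familiar crawl directives. A minimal sketch, with an illustrative file path:

```text
User-agent: *
Disallow: /private/

# Autodiscovery: tells any crawler where the site map lives
Sitemap: http://www.example.com/sitemap.xml
```

Because the Sitemap line names a full URL rather than a relative path, crawlers can fetch the latest site map on every visit without the webmaster resubmitting it to each search engine.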
Venus Swimwear, a $100 million swimsuit retailer, expects to benefit from the new feature, formally called "autodiscovery," which it is already implementing. The company manually re-submits its site map to search engines when the site changes, but pointing at it in the robots.txt file should ensure the crawlers automatically find the latest version every time, said Rhea Drysdale, a Venus Swimwear e-commerce analyst.
"It's very useful in that it automates the process. There are some weeks when we'll forget to resubmit the site map, and we tend to update our website weekly. By having [the site map address] on autodiscovery, it notifies them that [the site map] is here and whenever you come to our site, to please take a look at it. If there are changes, they should be able to pick those up quickly," Drysdale said. She also welcomed Ask.com's support for the protocol.
John Honeck, a mechanical engineer who runs several small sites and blogs in his spare time, including his personal blog, also predicts the new feature will be helpful to webmasters. "Anything that is standardised is helpful for the webmaster. We can spend more time on our sites and less time worrying about setting up different accounts, verification processes, and submissions for all of the multitude of search engines out there," he said in an email interview.
However, Honeck feels the vendors could clarify some points about the autodiscovery feature, such as how it will work in sites with multiple site maps. Privacy issues may also crop up, because pointing at the site map from the robots.txt file makes the information more easily accessible. "While not normally a problem, it could cause a security risk. As search engines can crawl your site more efficiently, so can scrapers and bad bots as well," wrote Honeck.
The Sitemap protocol was originally developed by Google and is offered under the terms of the Creative Commons Attribution-ShareAlike licence.