Schema.org aims to provide a common vocabulary for structuring web page data.
While Google competes with Microsoft and Yahoo in the search market, the three companies are cooperating to help web publishers make their content more comprehensible to search engines.
Google said on Thursday that the three companies had launched an initiative called schema.org to create and support common ways to represent web page metadata. The project will offer web publishers the tools to make their web content more easily understood by search engines and more effectively represented on search results pages.
Schema.org hosts definitions for the markup that webmasters can add to their HTML to describe data. For example, the Person schema provides a way to associate a person's name with data that relates to that person, like his or her street address and email address. Without the structure provided by metadata markup, it can be difficult for search engines to be certain that a name on a web page is associated with some other data attribute.
"With schema.org, site owners can improve how their sites appear in search results not only on Google, but on Bing, Yahoo, and potentially other search engines as well in the future," said Google Fellow Ramanathan Guha in a blog post.
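To make the Person example concrete, here is a minimal sketch in Python. It assumes the page uses the microdata attributes (itemscope, itemtype, itemprop) with schema.org's Person and PostalAddress types. The page fragment, the name "Jane Doe", and the helper names FlatMicrodataParser and extract_person are hypothetical and exist only for illustration. The parser is deliberately simplified and is not how any search engine actually reads markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The itemprop names (name, address,
# streetAddress, email) come from the schema.org Person and
# PostalAddress types; the person and values are made up.
PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">123 Example St.</span>
  </div>
  <span itemprop="email">jane@example.com</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects the text of each element that carries an itemprop.

    Simplifications (assumptions of this sketch): nested items are
    flattened into one dictionary, only the first itemtype is kept,
    and only text content is read (attributes such as href or
    content are ignored).
    """

    def __init__(self):
        super().__init__()
        self.item_type = None   # itemtype of the outermost item
        self.props = {}         # itemprop name -> text value
        self._current = None    # itemprop waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs and self.item_type is None:
            self.item_type = attrs["itemtype"]
        # An element with both itemprop and itemscope opens a nested
        # item, so it has no text value of its own.
        if "itemprop" in attrs and "itemscope" not in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.props[self._current] = text
            self._current = None


def extract_person(html):
    """Return (item type, flattened properties) for a page fragment."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    return parser.item_type, parser.props


if __name__ == "__main__":
    item_type, props = extract_person(PAGE)
    print(item_type)
    for key, value in props.items():
        print(f"{key}: {value}")
```

Because each value is labeled, the name, street address, and email can be tied to the same person without guessing from page layout, which is the association the markup is meant to make explicit.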
There are other ways of marking up web pages, such as RDFa and microformats. But Google, Microsoft, and Yahoo argue that other formats have disadvantages and that webmasters will benefit from having a single markup resource focused on search engines, which in turn will lead to more markup and a better search experience.
Google has been pursuing its own structured markup for several years. In 2009, the company enhanced its search results with rich snippets, which made additional data like online reviews visible in search listings. The company has since expanded its snippets to include events and recipes. As a result, companies like stubhub.com and allrecipes.com have chosen to structure their data to take advantage of the more effective presentation afforded to well-described data.
The schema.org initiative is similar in some respects to sitemaps.org, which defines an XML schema that helps search engine crawlers navigate websites. The Sitemaps protocol was created by Google in 2005 and supported by Microsoft and Yahoo in 2006, with other companies announcing support later.
The existence of schema.org can be seen as an acknowledgement of the limits of automated data analysis. One of the Frequently Asked Questions posted on the schema.org site attempts to deal with a possible objection to web page markup, specifically that it requires work from webmasters. "Automated data extraction is great when it works, but it can be error prone because different sites can represent the same information in so many different ways," the schema.org website says.
Understanding, in other words, is a harder problem than indexing. Humans may not be obsolete after all.