Digg makes official its adoption of a 'semantic Web' standard
It could be the very thing the Web has lacked all these years, even with its wealth of intermingled hyperlinks: a markup language for conclusively identifying context. Now, Digg is making the bold attempt to be its biggest "beta tester."
One of the principal deficiencies of HTML and XHTML as markup languages has been the absence of any genuine, built-in feature for explaining to indexing services, or even to browsers with intelligent features, exactly what a page contains at a granular level. Page-level metadata can help categorize content, assuming everything on a page belongs to the same category; but with more Web pages these days constituting whole blogs, whole-page metadata is rapidly becoming useless.
The W3C, the standards body in charge of maintaining and developing the language of the Web, has actually been addressing this problem for several years; but while it has been busy building "standards," by the objective definition, relatively few sites have actually tried implementing them. Last month, one major exception was Digg, the social news aggregator, which quietly began trials of a W3C standard for labeling contextual data at a very low level -- meaning, right next to the data itself.
The concept is called RDFa, and essentially it's a way to put the W3C's existing RDF model for describing context to real-world use by embedding it directly in the attributes of XHTML elements. It borrows RDF's rather inspired way of explaining what an element means, or what the space being reserved for an element (say, from a database) should mean.
It calls for a little stretch of the imagination, because it uses a loose metaphor from the realm of common grammar: All context -- everything that can symbolize relative relevance in text -- can be represented in terms of a something "x" that does a certain something "y" to a something "z." In this case, "x" is the subject of the relationship and "z" is the object, in the grammatical (not the programming) sense. The action of that relationship is the "y," which in RDF is called the predicate (note, not "verb").
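The subject-predicate-object pattern can be sketched as a plain three-field record. This is only an illustration of the shape of an RDF statement, not any library's API; the Statement class, the example.org address, and the headline are all hypothetical placeholders.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One subject-predicate-object statement (illustrative only)."""
    subject: str    # the "x": what the statement is about
    predicate: str  # the "y": the relationship being asserted
    obj: str        # the "z": the value or resource on the other end


def describe(s: Statement) -> str:
    """Render a statement as one readable line."""
    return f'<{s.subject}> {s.predicate} "{s.obj}"'


# Hypothetical example: the address and headline are placeholders.
example = Statement(
    subject="http://example.org/stories/42",
    predicate="dc:title",
    obj="An example headline",
)

print(describe(example))
```

Keeping the three parts as separate named fields, rather than one string, is what lets software ask questions such as "which statements share this subject?"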
These three items together form what are called triples in RDF; and in the XML-based RDFa notation, a triple can be embedded into an HTML element -- such as <P>, <H2>, <IMG>, or <SPAN> -- in such a way that it effectively describes the context of the element's contents. This works once the page declares the XML namespaces of the vocabularies it draws upon.
For example, a paragraph about a subject described by an online resource can carry, inside its <P> tag, an about attribute that points to the HTTP address of that resource. Then wherever the paragraph gives the name of the person that resource describes, that name can be placed in a <SPAN> element with a property attribute set to a value such as contact:name, where contact is a prefix standing for a defined vocabulary and name is a property within that vocabulary. This way, an index or smart browser can detect when and where a paragraph is about a person whose name is indexed and catalogued by an outside resource.
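A rough sketch of how software might read such markup follows, using only Python's standard html.parser module. It is a deliberate simplification and not a conforming RDFa processor: it handles only the about and property attributes, ignores namespace resolution, and assumes every element has a closing tag. The fragment, the example.org addresses, and the name "Jane Doe" are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the addresses, prefix binding and name are placeholders.
FRAGMENT = """
<p xmlns:contact="http://example.org/contact#"
   about="http://example.org/people/jane">
  The talk was given by <span property="contact:name">Jane Doe</span> last week.
</p>
"""


class RDFaSketch(HTMLParser):
    """Collects (about, property, text) for each element with a property attribute.

    Simplification: assumes every start tag has a matching end tag, so
    unclosed void elements such as <img> are not handled.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # one [about, property, text_parts] entry per open element
        self.found = []   # (about, property, text) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.stack.append([a.get("about"), a.get("property"), []])

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for entry in self.stack:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        about, prop, parts = self.stack.pop()
        if prop is None:
            return
        # Use the nearest about value, starting with the element itself.
        scope = [about] + [e[0] for e in reversed(self.stack)]
        resource = next((s for s in scope if s is not None), None)
        self.found.append((resource, prop, "".join(parts).strip()))


parser = RDFaSketch()
parser.feed(FRAGMENT)
for resource, prop, text in parser.found:
    print(resource, prop, text)
```

The point of the sketch is that the resource address, the property, and the marked-up text can all be recovered from the page itself, without any separate metadata file.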
The "triple" in this case is easy: The subject is the resource that describes the person, the predicate is the naming property (contact:name), and the object is the person's name as it appears in the text. Imagine if you ran a certain online resource -- say, a wiki/encyclopedia thingie of some sort -- and what an advantageous position you might be in.
Digg's involvement in all of this came by way of a very brief announcement on its company blog yesterday, where principal member Steve Williams wrote, "We've added RDFa, making Digg part of the 'semantic web' where Web pages become more sophisticated, beyond simply words and pictures."
But Williams is actually an active proponent of Digg's involvement in new and emerging standards, as demonstrated by his announcement last January of its entry into the DataPortability project, the gathering place for standards efforts in the field of data exchange, of which RSS and RDF are two prominent members.
Other brief mentions on Digg's blogs over the past month have been the company's only public indications of its direct -- and perhaps even principal -- involvement in RDF and RDFa. The only other evidence comes from a simple check of the site's own source code, where attributes such as rel="dc:source" and property="dc:title" within <DIV> elements are now common. A few weeks ago, developer Bob DuCharme discovered these attributes and began playing with them to discern their viability.
On his personal blog, DuCharme wrote, "The first few times I tried the RDFa Highlight bookmarklet, which puts red rectangles around all the parts of a Web page that have RDFa metadata assigned, I didn't think it was very useful; I thought, OK, red rectangles, what can I do with them? My experience with Digg changed my mind. A single button click gives a very quick and intuitive display of how much RDFa a page offers to work with."
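The kind of check described here, finding out how much RDFa a page carries, can be approximated in a few lines. The sketch below is not the bookmarklet DuCharme used; it simply tallies RDFa-style attributes in a page's source with Python's standard library. Only the attribute names and the dc: values come from Digg's markup as reported above; the sample page, its addresses, and the rule of counting rel only when its value has a prefix are assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page source; everything except the dc: values is a placeholder.
PAGE = """
<div about="http://example.org/story/1">
  <div rel="dc:source">
    <span property="dc:title">Example story</span>
  </div>
  <a rel="nofollow" href="http://example.org/other">unrelated link</a>
</div>
"""

RDFA_ATTRS = ("about", "property", "rel")


class RDFaTally(HTMLParser):
    """Counts how often each RDFa-style attribute appears in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in RDFA_ATTRS or not value:
                continue
            # Heuristic (an assumption): plain HTML also uses rel, as in
            # rel="nofollow", so rel is counted only for prefixed values
            # such as dc:source.
            if name == "rel" and ":" not in value:
                continue
            self.counts[name] += 1


tally = RDFaTally()
tally.feed(PAGE)
print(dict(tally.counts))
```

A real highlighter would then draw attention to the matching elements in the browser; the tally only shows how the presence of such attributes can be detected.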
The possibility exists for a kind of mega-meta-source to emerge from Digg, where interesting news topics are associated with cataloged resources. But for that to actually work, someone has to manage those resources -- and that effort will take a level of humanpower and resources of another kind (the kind symbolized with "$") that RDF on its own won't provide to even the most ambitious sites.