Schema.org and Pre-Existing Communities

I have been reading tweets and blog posts expressing various levels of disappointment and unhappiness about schema.org: that it doesn’t use RDFa, that it doesn’t use Microformats, and that it wasn’t developed in the open with the community. Since other people’s perspectives differ from mine, I feel compelled to write down my take.

Disclaimer: I’m only speaking for myself on my own time and initiative. Expect blatant bias. I’m not claiming to speak for a community.

Background

For a long time, people have been thinking about metadata on the Web and theorizing about how awesome things could be if only everyone published more metadata in the speaker’s favorite format and search engines supported that kind of metadata for search.

Of the languages relevant to this blog post, the oldest one is RDF/XML. RDF is a knowledge representation framework that encodes data as subject-predicate-object triples (which become more like quintuples when literal object values come with a datatype and a language tag). When you combine triples, they form graphs (the subject of one triple can be the object of another, with cycles allowed). RDF/XML is a serialization of the RDF data model as XML. It isn’t the only serialization, though. The RDF community never seems to be happy with its syntax and keeps inventing new serializations for the same data model (as if syntax were the main problem). Other syntaxes include N3, Turtle and N-Triples.

The awesomeness that is expected to emerge if only enough people used RDF is called the Semantic Web with a capital ‘S’. There are specs over specs in the Semantic Web stack beyond the base RDF itself.

RDF/XML, N3, Turtle and N-Triples separate metadata from the HTML content that people see in their (non-Semantic) Web browsers. This means that when you edit the HTML that people see, it is easy to forget to edit the metadata separately, so the metadata rots.

The Microformat community prefers integrating the HTML content that people see and the metadata, to avoid the rot of “invisible metadata”. They take the human-visible HTML content and annotate bits of the visible content with markup so that the human-readable projection of the data and the metadata projection are overlaid onto each other in one document. Microformats do this by tucking the annotations into existing attributes of HTML 4.01 and XHTML 1.0 in such a way that HTML 4.01 and XHTML 1.0 validators don’t whine about it. The Microformat community also seeks to be more pragmatic and less theoretical than the Semantic Web community and tries to base its work on use cases identified from what people are already trying to express on the Web. Microformats are meant to enable a semantic Web with an emphatically lower-case ‘s’.
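To make the overlay idea concrete, here is a minimal hCard sketch. The class names (vcard, fn, url, org, tel) are real hCard property names; the person, company and URL are invented for illustration:

```html
<!-- The visible content doubles as the metadata: the hCard
     annotations ride along in the class attribute, which
     HTML 4.01 validators already accept. -->
<div class="vcard">
  <a class="fn url" href="http://example.com/">Erika Example</a>
  works at <span class="org">Example Corp</span> and can be
  reached at <span class="tel">+1-555-0100</span>.
</div>
```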

RDFa is a serialization of RDF that applies the overlay idea of Microformats to serializing RDF. From the point of view of its creators, RDFa is “Microformats done Right”. (This characterization was inflammatory and has intentionally been retired from active advocacy use.) RDFa was developed by the XHTML2 WG and originally overlaid an RDF graph onto an XHTML2 document (though, interestingly, the examples didn’t show the namespace of the host document, and the mechanism could be applied to other XML vocabularies). Since XHTML2 wasn’t going anywhere, RDFa was backported (as a joint venture with the Semantic Web branch of the W3C) to XHTML 1.x, which was, wink, wink, nudge, nudge, served as text/html, so an RDFa legacy was introduced to HTML even though RDFa was reviewed only for XML considerations in the W3C Process. RDF has a special status at the W3C, so RDFa was given an escape pod when the XHTML2 WG was shut down.
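For comparison, here is roughly the same person as RDFa 1.0 – an illustrative sketch, not normative markup. I am using the W3C vCard-in-RDF namespace; the comment shows the kind of subject-predicate-object triples a conforming processor would extract, which also makes the RDF data model described above concrete:

```html
<!-- New attributes (typeof, property, rel) overlay RDF triples on
     the visible content. Vocabulary terms are shortened with a
     namespace-style prefix ('v:') that must be declared on an
     ancestor element. -->
<div xmlns:v="http://www.w3.org/2006/vcard/ns#" typeof="v:VCard">
  <a rel="v:url" property="v:fn" href="http://example.com/">Erika Example</a>
  works at <span property="v:organization-name">Example Corp</span>.
</div>
<!-- Extracted triples, roughly:
     _:person  rdf:type             v:VCard .
     _:person  v:fn                 "Erika Example" .
     _:person  v:url                <http://example.com/> .
     _:person  v:organization-name  "Example Corp" . -->
```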

Microdata is RDFa done Right for HTML(5). It is published both by the WHATWG and the W3C. The design of Microdata was informed by a Google-funded usability test. Microdata improves (in a Saint-Exupéry kind of way) upon RDFa thanks to four key insights. The first one is that the kind of prefix-based indirection used in Namespaces in XML and in RDFa’s URL shortening mechanism confuses people, as observed from the kinds of mistakes content authors and consumer software implementors actually make regardless of what they say. The second one is that the identified use cases (the ones claimed for RDFa!) are served well enough by a tree, so a graph data model is overkill. The third insight is that explicit data typing doesn’t make sense when the data type of a value is tightly coupled with the metadata property that the value belongs to. The fourth insight is that the properties associated with a thing are generally tightly coupled with the type of the thing, so the properties and the types of things don’t need to be orthogonal. Microdata puts the Power of RDF to the torch, but YAGNI.
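Here is the same person once more, as a Microdata sketch using the item type from the WHATWG spec’s vCard port (more on that port later). Note what is gone: no prefixes, no explicit datatypes, and the short property names have meaning only within this item type:

```html
<!-- itemscope starts an item, itemtype names its type with a full
     URL (no prefix indirection), and itemprop names properties
     with plain words scoped to the item type. The data model is
     strictly a tree. -->
<div itemscope itemtype="http://microformats.org/profile/hcard">
  <a itemprop="url" href="http://example.com/"><span itemprop="fn">Erika Example</span></a>
  works at <span itemprop="org">Example Corp</span>.
</div>
```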

Unlike Microformats, both RDFa and Microdata add new attributes instead of putting the annotations into pre-existing ones. RDFa and Microdata are frameworks for defining vocabularies. Microformats are vocabularies based on shared design patterns but without an explicit syntax framework under them. A vocabulary becomes a Microformat with a capital ‘M’ after passing through a community process. RDFa and Microdata, on the other hand, allow anyone to mint vocabularies on top of them without any kind of cooperation.

With that context, let’s look at schema.org.

What’s schema.org?

Schema.org is a three-company (Google, Microsoft, Yahoo!) venue for publishing Microdata vocabularies for the kind of overlaid metadata that the major search engines deem worthy to be supported in special and specific ways in Web search. Schema.org supersedes (though the old stuff remains supported) a previous Google-only initiative that at first couldn’t decide if it wanted to use RDFa or Microformats and later added Microdata support. In the old Google-only initiative, Google defined Google-specific vocabularies on top of RDFa and later Microdata but used Microformats from the Microformat community. Now schema.org defines multi-search engine vocabularies on top of Microdata without the Microformat community.

Why Not RDFa?

If you look at the kind of things that the search engines are seeking to do with the schema.org vocabularies, you don’t need a graph for any given page. You just need a tree. If you have a page for an event, you put the participating people as children of the event node. If you have a person page, you put the events the person is speaking at as child nodes of the person. Even if, in the aggregate, things like this obviously form a graph, you can manage with a tree as far as overlaying a piece of the graph over a human-readable page goes. While you can typeset a list of people on an event page as child nodes of the event and you can typeset a list of events a person will be speaking at, you can’t nicely take a larger graph and overlay it on a naturally typeset human-readable page. And this is about overlays – not about larger general RDF models. (The search engines probably wouldn’t store the index they put this data into in an off-the-shelf RDF triple store anyway.)
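As a sketch of what “a tree is enough” looks like in markup, here is an invented event page with a speaker overlaid as a child item of the event. The item type URLs are the ones I recall from the WHATWG Microdata ports of hCalendar and hCard, and the property names follow the underlying iCalendar/vCard terms; treat the details as illustrative rather than authoritative:

```html
<!-- The event is the root of the tree; the person is nested as the
     value of one of the event's properties. No graph machinery is
     needed to overlay this on the visible page. -->
<article itemscope itemtype="http://microformats.org/profile/hcalendar#vevent">
  <h1 itemprop="summary">Metadata Summit</h1>
  <p>Starts: <time itemprop="dtstart" datetime="2011-06-10">June 10, 2011</time></p>
  <p>Speaker:
    <span itemprop="attendee" itemscope
          itemtype="http://microformats.org/profile/hcard">
      <span itemprop="fn">Erika Example</span>
    </span>
  </p>
</article>
```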

Thus, the generality of RDFa is a complexity burden. The RDFa advocates who are demonstrating how RDFa can express everything that schema.org vocabularies need expressed are fundamentally missing the point: the problem is not whether everything is expressible but the latent complexity of being able to do more. The excess expressiveness is a burden for anyone implementing consuming software and for authors trying to understand what they are doing (as opposed to cargo-cult copying of example snippets).

Furthermore, the RDFa community, having managed to manufacture an RDFa legacy for text/html while officially speccing stuff for XML, has constantly refused to remove the Namespaces in XML-like URL shortening mechanism from their format, even though the first (and still de facto most advertised) flavor of the mechanism ran afoul of the HTML DOM and all flavors of the mechanism have the same usability problems as Namespaces in XML. Time and again, when the RDFa advocates cheer news about someone new consuming or publishing RDFa, it turns out that the new example of RDFa’s success has intentional deviations from the URL shortening mechanism as specced or accidental errors. Feedback about this problem has been deflected in ways that brought flashbacks of how problem reports about XHTML2 or XHTML 1.x were handled. If search engines want their vocabularies to be used, it’s a good idea to avoid a framework with this kind of problem.
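To illustrate the failure mode, consider what happens when an author copies an RDFa snippet without the ancestor element that declared its prefix – a sketch of the kind of mistake alluded to above:

```html
<!-- This fragment was copied from a page where an ancestor element
     carried xmlns:v="http://www.w3.org/2006/vcard/ns#". Without
     that declaration, 'v:fn' no longer expands to a full URL, so
     the data silently stops meaning what the author intended. (And
     in the HTML DOM, xmlns:* attributes aren't real namespace
     declarations to begin with, which is the DOM clash mentioned
     above.) -->
<div typeof="v:VCard">
  <span property="v:fn">Erika Example</span>
</div>
```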

I think the above considerations are pretty much all there is to not using RDFa. I find it amusing that the non-use of RDFa is being explained by Microsoft supposedly having a long-standing view of RDF as a threat.

It’s also worth mentioning that there’s nothing unusual or immoral in implementors deciding not to use a W3C-stamped spec if the spec doesn’t fit their needs. This sort of thing happens in the browser space all the time, so it’s not unreasonable for the same thing to happen in the search engine space. Browsers chose HTML5 over XHTML2, HTML5 Forms over XForms and CSS Animations over SMIL Timesheets. Right now, there are W3C working groups cranking out metadata APIs for browsers without the participation of browser developers, and the resulting API drafts make basic mistakes, like not dealing with the issues that arise when stuff is spread across multiple HTTP resources and a script can poke at the API before all the resources have been downloaded. I predict that browsers will route around those APIs (at least as drafted). Considering this background, it’s business as usual that search engine implementors end up routing around a W3C spec that was written for them without their participation.

Why Not Microformats?

Over the years, the main criticisms I’ve heard about Microformats have been that it’s too hard to get a new format through the community process (i.e. extensibility is socially hard), that there isn’t a generic parsing model for Microformats and that Microformats don’t compose generically.

I’ve never tried to launch a new Microformat, so I have no first-hand experience with it. I’ve heard the complaint that it’s too hard to launch new Microformats often enough, and from enough different people, that I tend to believe it is true – at least to the extent that the belief is held widely enough to matter to people contemplating whether to use Microformats or something else. Therefore, the perception of how hard it is to get community approval for a new format has relevance to how quickly the search engines could proceed if they wanted new formats minted and approved by the Microformats community.

Manu Sporny is an interesting data point about Microformat extensibility. He started out on the Microformats side, but when he didn’t get what he wanted for the purposes of his startup (on the schedule he wanted?) from the Microformat community, he turned to RDFa (Microdata didn’t exist at the time) and is now the Chair of the working group in charge of RDFa.

I should mention that I’ve only heard Manu’s side of the story, but if you consider what Microformats look like from the point of view of someone wanting to develop something new in the context of the Microformat Process, that’s the side of the story that matters.

As for the generic parsing model, not only do Microformats lack a generic parsing spec, they lack detailed processing specs in general! This is what I personally find the most baffling about Microformats. Back in 2006, HTML5 (then called Web Applications 1.0) actually included a couple of Microformats by reference. I concluded that there wasn’t enough of a spec for them to build a validator for the Microformat parts, so I made sure I explicitly didn’t promise to anyone that I’d implement Microformat validation in the software that later became Validator.nu. I wasn’t the only one of the #whatwg IRC channel regulars who considered the lack of detailed processing (and document conformance) specs a major bug in Microformats. Three years in a row (2006, 2007 and 2008), after the Microformats session at the XTech conference, someone from #whatwg (Anne or I) asked when Microformats were going to get a proper spec. The answer at the time was basically that if you want a proper spec, you should be the one doing the heavy lifting of writing one.

This attitude makes sense in a way. If you want to defend the community against detractors, you shouldn’t allow situations where a random commenter says something that’s cheap for the commenter to say but expensive for the community to address. Making commenters put their money where their mouths are is an effective way to weed out commenters who aren’t serious. Understanding all this, I decided that bringing the WHATWG speccing culture to the Microformat community would be an uphill battle and that I could instead route around Microformat validation (I had plenty of other validator features to write anyway), so I walked away.

The normative inclusion of two Microformats in HTML5 by reference later went away at least in part due to the lack of specs meeting the quality standards of the WHATWG community.

The complaint about the lack of generic composability pretty much follows from the lack of generic processing – and, as mentioned above, Microformats don’t even have well-specified non-generic processing.

Now, one might argue that the Process and the YAGNI attitude to proper specs are features and not bugs. However, if someone contemplating whether to adopt Microformats considers these to be bugs, then they are barriers to adoption. I don’t know how the schema.org folks feel about the above reasons, but I wouldn’t be surprised if they considered these aspects problems to some extent.

The reason the schema.org FAQ gives is that Microformats aren’t extensible without risking collisions with existing class names used for styling, which seems like a legitimate technical (as opposed to social) extensibility concern.

Conclusion About Frameworks

I think the major lesson here as far as the frameworks (ignoring vocabularies for a moment) go is how communities react to feedback and how that ends up attracting or repelling incumbent implementors who can bring their marketshare to a spec.

Communities don’t generally formulate visions about new stuff. Instead, there’s a very small set of people who get things done who set the direction and vision. The community then gets to accept, reject or refine what the cabal set in motion.

(Aside: Considering that Microformats started out with three people doing XFN and that RDFa started with about three people after being proposed by one person, I think it doesn’t make sense to argue against schema.org by saying that it’s “only a handful of people under the guise of three very large companies”.)

When the ball gets rolling, more and more people show up and make suggestions. Some of the suggestions are no-brainers to adopt. Other suggestions might go against the original vision of the core group that was initially getting things done. When there’s a clash of visions, the initial group can give up some of their vision to please the newcomer, they can make compromises (which degenerate their spec into an inconsistent turd if done too many times) or they can reject the feedback from the newcomer. Rejecting the newcomer’s feedback means, absent other information, taking the risk that the newcomer is a person who gets things done and launches a competing effort that eats your lunch. But you can’t work by assuming that every newcomer with an opinion is like that since very few are.

There are different kinds of people who show up in a community. There are clueful people who don’t have implementations but who give insightful feedback. You want to have those people around. They are one of the best parts of having an open community. “Not all smart people work for you” and all that. There are also armchair theorists who have way too much time on their hands and who, in good faith, repeatedly make terrible suggestions and exhaust the community’s attention. There are occasionally purposefully disruptive people, but much less often than there are people who suggest bad stuff while thinking they are suggesting good stuff. Then there are small-time implementors, who give implementation feedback but who can’t bring serious market share and network effects. Then there are people who can commit code to products that already have serious marketshare – marketshare that your spec could enjoy if they like your stuff enough to implement it. The trouble is that they, too, can suggest silly things, but you may have to placate them if you want their participation and the marketshare they can bring.

The hardest thing is that you don’t always know initially who is in which of these groups. Also, occasionally, it may happen that you initially categorize the provenance of feedback correctly but fail to realize that the feedback is also relevant to another group, so the initial commenter’s opinion was actually representative of a broader concern. The worst-case scenario is that you reject an idea from an armchair theorist with too much time on their hands and develop a negative outlook on the idea due to its initial provenance, but implementors wielding marketshare turn out to think the idea is really important. (Maybe in that case the person wasn’t one of those harmful theorists after all but was actually one of those clueful people who don’t have implementations.)

A good way to avoid having to guess the significance of the provenance of feedback is to have taste that just happens to yield results that attract implementors with existing marketshare, and then work according to your taste rather than the provenance of feedback. (This requires correcting the intuitive result upon explicit feedback from the people who are undeniably important implementors, though.) Ian Hickson, the editor of HTML5, has a particularly good track record at this. It’s remarkable to contrast XHTML2, XForms and RDFa with HTML5, HTML5 Forms (aka Web Forms 2.0) and Microdata. The first three all came from the same small group of people. The latter three are Hixie specs. In every case, the specs from the XHTML2/XForms/RDFa folks had the W3C logo first, seemed to have the community interest first and even had the interest of some corner of IBM. How could anyone go wrong by betting on a spec that has the W3C, a community and even (some corner of) IBM on its side? Yet, in all these cases, the implementors with existing marketshare were more attracted to the Hixie specs. Hixie is better at addressing the kind of concerns that implementors who have existing marketshare end up treating as relevant concerns.

At this point, some readers probably think that it’s just wrong that you have to optimize for implementors who bring marketshare and, instead, the implementors should do what The Community tells them to do. But that’s not how the world works. If you want marketshare, you need the cooperation of the people who can add your stuff to products or services that already have marketshare. You can ignore this only if you don’t care about marketshare or if you are doing something so groundbreaking that it creates entirely new product or service categories without the participation of the incumbents. RDFa and Microformats aren’t groundbreaking enough to go that route and, judging from the commentary expressing unhappiness about what the big search engines have now chosen, I think both communities do care about getting marketshare. Also, there’s no single “The Community” deciding things as if the implementors shouldn’t be part of the community.

Edited edit: It has come to my attention that the above has been read as me suggesting that optimizing for implementor acceptance is virtuous or that I worship Hixie. I don't worship Hixie and I'm not trying to make a moral argument above. I'm making an amoral observation about what is effective for the purpose of bootstrapping deployment. As for moral arguments, I think it is incorrect to imply that the first W3C working group (or group of people with a common interest) to address an area of use cases represents the will of the whole Web. Both RDFa and Microdata were published on the w3.org server not because of wide initial community support but because the people writing those specs already had access to the W3C publication pipeline as editors of other specs. Alternatives may end up enjoying more Web author and user acceptance. It is worth noting that implementors have an interest in aiming for author and user acceptance in order to maintain or grow their user base. It seems to me that the tacit admission of people who express some degree of unhappiness (me included; I express unhappiness later in this article) about schema.org is the understanding that the actions of the big search engines matter for metadata technologies precisely because the search engines are big – in other words, have market share. I think you can and should make moral and aesthetic choices when designing specs, but if a spec does not get deployment, it does not have much of an effect – good or bad – except maybe in the sense of what kind of alternative fills the void if there is one to fill. (End edited edit.)

I think the situation with Microformats, RDFa and Microdata is a story of failing to address the concerns of people who turn out to be productive and determined enough to create competing specs when left unsatisfied by what an existing community offers, and a story of concerns that end up being relevant to implementors with marketshare (for whatever reason). Consider the case of Manu Sporny. He didn’t get to do what he wanted done at the Microformats community, so a competing community got one more active person who ended up pushing RDFa more onto the HTML5 radar – so much so that there was a perceived need to do something about the complexity of RDFa. The RDFa community refused to remove the complexity (and instead added alternative features that increase the overall complexity). This led Hixie (a very productive person whose concerns were deflected by the RDFa community) to create Microdata. And now implementors with marketshare have indicated a preference for the simpler thing.

I think that Microdata would not exist without RDFa as the intermediate step. I think it is fair to say that the HTML5 cabal (the people who hang out on the #whatwg IRC channel) liked the concept of Microformats. HTML5 even tried to give Microformats pieces of HTML that were missing from HTML 4 (in addition to originally including two Microformats by reference). For example, the time element in HTML5 exists pretty much as a response to the abbr pattern in Microformats not being that great. It was a bit of a disappointment that the Microformat community didn’t jump at the opportunity to adopt time and to go all-in with HTML5 (as opposed to continuing to use only HTML 4 features). Maybe it was naïve to expect them to.

Also, since the #whatwg regulars consider detailed processing specs very important and the Microformat community didn’t deliver those despite the feedback over the years, the IRC echo chamber around Hixie was of the inclination that Microformats weren’t up to the task of filling the position of RDFa done Right for HTML(5). This led to an environment where the creation of Microdata seemed like a better alternative than leaving it to the Microformat community to provide an alternative to the complexity of RDFa.

So I think the Microformat community got entrenched instead of staying enough ahead of the game to pre-empt the conditions that allowed Microdata to be created. Once Microdata was there, the schema.org partners had the opportunity to prefer it over Microformats.

I think it’s also interesting that the Microformat community has an exceptional awareness of the problems disruptive individuals cause to a community and is less shy about dealing with “trolls” than other communities. (“Troll” is used more broadly in the Microformat community than generally on the Internet. To avoid the offensiveness of the word “troll”, an alternative term, “detractor performant”, has been coined.) I can’t help but wonder if the community defense mechanism has been too strong and has ended up making the community stay the course where, in retrospect, it would have made sense to change in order to stay ahead of the competition.

But What About the Vocabularies?

So the above is mainly speculation about how schema.org might have ended up choosing Microdata as the syntax framework. It doesn’t address the process for creating the vocabularies on top of it.

I have no sympathy for the arguments of RDFa advocates who are calling foul over the schema.org participants creating the vocabularies on top of Microdata behind closed doors. The double standard is just so obvious when considering their cheery reactions to Facebook “using RDFa” with the unilaterally-introduced Open Graph Protocol.

I think people affiliated with Microformats have a legitimate reason to criticize the lack of community participation in the development of the schema.org vocabularies, though. The Microformat community had already developed vocabularies that have remarkable overlap with the schema.org vocabularies when you look at them on the high level of topics (people, events, etc.) and the Microformat vocabularies don’t come with the RDF baggage of the pre-existing vocabularies from the capital-S Semantic Web community. Why didn’t schema.org reuse those vocabularies?

To start with, choosing to go with Microdata as the framework opened up an opportunity for reinventing wheels at the vocabulary level. However, it’s not like schema.org didn’t have anything but an empty framework to start with. Hixie, as part of the effort of defining the Microdata framework, also created Microdata ports of the Microformats for events and people (and supplied a super-simple vocabulary about licensing works although the Microformats community hadn’t completed a full Microformat on that topic).

Aside: It’s worth noting that the W3C HTML WG suppressed these Microdata vocabularies from W3C publication. They remain in the WHATWG spec. There’s evidence of at least one person who participated in announcing schema.org reading the W3C Microdata spec instead of the WHATWG spec. (Pro tip: Always read the WHATWG version of a spec if there is a WHATWG version. If there isn’t a WHATWG spec, try to read the Editor’s Draft at the W3C. Only read a spec under w3.org/TR/ as the last resort. This way, you have the best chance of avoiding stale specs and the best chance of seeing useful information that the W3C politics have suppressed.) It would be tragic if the engineers behind schema.org had overlooked Hixie’s Microdata ports of the main vocabularies because the vocabularies were suppressed from publication on the W3C side and the secrecy around developing schema.org prevented anyone from pointing this out in time.

Personally, I had hoped that the Microformat community would itself go all-in with HTML5 and proactively port its vocabularies to Microdata. Of course, that was an unrealistic wish. Before the big companies showed their preference (or just Google’s preference that Bing and Yahoo! had to follow?) for Microdata over both Microformats and RDFa (both of which were previously supported by Google’s Rich Snippets), why would the Microformat community have disrupted itself by promoting an unproven framework that would have reset the network effect of existing published content back to zero – when they don’t even seem to care about frameworks that much?

An easy explanation for not using existing vocabularies is that communities doing metadata domain modeling look at the problem from the wrong perspective: they consider what can be expressed about a topic, while the search engine implementors care about what matters for consumption in search use cases, and the two just happened to care about different things than what the pre-existing communities had developed specs for. In my experience, metadata design efforts tend to fall into the trap of focusing more on what could be said about a topic rather than on what needs to be said in order to support the use cases of the consuming software. I stopped believing in metadata when I spent a summer at the National Archives (of Finland) thinking about metadata and saw how many existing specs seemed to focus on recording inessential things and failing to record essential things as far as the use case I was tasked to think about was concerned.

Indeed, if you take a closer look at the vocabularies, Microformats and schema.org have rather different notions of what’s interesting about people and what’s interesting about events. It’s not that Microformats try to say too much or inessential things. It’s that they address very different use cases. The Microformat vocabularies for people and events, hCard and hCalendar, aren’t from-scratch vocabulary designs. Instead, they are ports of the IETF vCard and iCalendar vocabularies to the Microformat way of layering data on top of HTML.

The purpose of vCard is to enable the exchange of address book entries as “electronic business cards” between address book apps that people use to track their phone or email contacts. The purpose of iCalendar is to enable the interchange of event streams in calendaring applications that people use to remind themselves of meetings, dentist appointments and things of that nature. As you might expect, vCard is concerned about people’s addresses and iCalendar is concerned about the repetition rules for weekly meetings, etc.
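A sketch of what that heritage looks like in markup – the class names (vevent, summary, location, dtstart, rrule) and the abbr design pattern are real hCalendar, while the content is invented:

```html
<!-- hCalendar's vocabulary comes straight from iCalendar: start
     times and repetition rules for feeding calendar apps, encoded
     in machine-readable form via the abbr pattern's title
     attribute. -->
<div class="vevent">
  <span class="summary">Team meeting</span> in
  <span class="location">Room 101</span>, starting
  <abbr class="dtstart" title="2011-06-13T10:00">Monday at 10:00</abbr>
  and repeating <abbr class="rrule" title="FREQ=WEEKLY">weekly</abbr>.
</div>
```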

Now, if you look at the schema.org vocabularies about people and events, the search engines seem to have very different ideas of what’s interesting. For example, events come with a property called offers that is explained as “An offer to sell this item—for example, an offer to sell a product, the DVD of a movie, or tickets to an event.” Who in their right mind would care about that in a format that’s about recording meetings in a personal or even a work team calendar? The events also come with a whole collection of subtypes of events. For personal calendars, people generally write some kind of event title and don’t spend time telling their calendar app if the event is a ComedyEvent or a LiteraryEvent. Likewise, you wouldn’t record the death date of a person in the phone directory on your phone. You’d probably remove the person’s entry after getting over the grief and realizing that you aren’t going to call dead people.
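For contrast, here is a sketch of a search-oriented event in schema.org Microdata, using the subtype and the offers property mentioned above. The type and property names (ComedyEvent, name, startDate, offers, Offer, price, priceCurrency) are schema.org’s own; the content is invented:

```html
<!-- An event marked up for search: a fine-grained subtype and a
     nested Offer item for ticket sales – concerns that iCalendar,
     and hence hCalendar, never had. -->
<div itemscope itemtype="http://schema.org/ComedyEvent">
  <span itemprop="name">An Evening of Stand-Up</span>,
  <time itemprop="startDate" datetime="2011-06-18">June 18</time>.
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    Tickets: <span itemprop="price">25.00</span>
    <meta itemprop="priceCurrency" content="EUR">
  </div>
</div>
```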

This doesn’t explain, though, why schema.org reinvented the wheel even for those properties of people and events that their use cases and the use cases of vCard and iCalendar have in common.

And this brings us to the Distributed Extensibility meme. For some people, Distributed Extensibility is just a way to sell Namespaces in XML in disguise: a set of requirements that Namespaces in XML magically happens to match. One might say that Distributed Extensibility is to Namespaces in XML what Intelligent Design is to Creationism. To generalize beyond Namespaces in XML, though, Distributed Extensibility means that the base language provides a mechanism for uncoordinated parties to mint identifiers in such a way that identifiers from different parties don’t collide. This is in contrast to having to go to the WHATWG, the W3C HTML WG or the Microformat community to coordinate.

Microdata has Distributed Extensibility for item types, and the schema.org partners have exercised Distributed Extensibility to mint their stuff without coordinating with another group before saying how things are going to be. I’m not a fan of Distributed Extensibility, but the idea has quite a bit of support in W3C circles. Schema.org being prepared behind closed doors is Distributed Extensibility working as designed. If you don’t like it in action now, please remember your current feelings when the topic pops up again at the W3C.

As I mentioned in the “Background” section, one of the key insights of Microdata is that the properties of an item are tightly coupled with the type of the item, so the properties don’t need to be orthogonal to the item type and applicable to items of any type. As a result, Microdata does not have Distributed Extensibility on the property level within a given item type. Thus, when a pre-existing vocabulary has the item type http://microformats.org/profile/hcard for people and organizations, it looks like the Microformats community owns the whole space of property names on that type of item. If you are preparing schema.org in secret, it may look like the right thing to do to create your own item types instead of sticking some new properties onto someone else’s item type.
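Here is a sketch of the dilemma. In the first form, the jobTitle property bolted onto the Microformats-owned item type is a hypothetical extension squatting on someone else’s property namespace; the second form uses real schema.org names:

```html
<!-- Option 1: extend someone else's item type. Within an item
     type, itemprop names are flat, so the new property lands in
     the Microformats community's namespace. -->
<div itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="fn">Erika Example</span>,
  <span itemprop="jobTitle">CEO</span> <!-- hypothetical addition -->
</div>

<!-- Option 2: mint your own item type, as schema.org did, and
     reinvent even the overlapping properties under new names. -->
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Erika Example</span>,
  <span itemprop="jobTitle">CEO</span>
</div>
```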

Now, RDFa advocates might conclude that RDF(a) is better, because properties have Distributed Extensibility too, so anyone can add non-colliding properties in secret while working with a well-known object type (e.g. from the RDF port of vCard). My preference goes the opposite way: I would have liked the Microdata item types not to be URLs but short tokens centrally allocated by e.g. the Microformats community when implementors express interest in new features. I would prefer the social norm for extending Web languages to be that the would-be extender comes to a group that manages the Web-wide vocabulary and asks for new tokens to be minted. This would give interested parties the chance to catch questionable designs sooner. At least companies that value openness should be OK with working within that kind of public token allocation structure.
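To be clear about what I mean, here is a purely hypothetical sketch of that alternative. This is not valid Microdata as specced – itemtype must be a URL – but it shows the kind of centrally allocated short tokens I would have preferred:

```html
<!-- Hypothetical: 'hcard' as a short token allocated by a central
     vocabulary registry instead of a URL minted unilaterally. -->
<div itemscope itemtype="hcard">
  <span itemprop="fn">Erika Example</span>
</div>
```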

But with coordination, there’s the problem of the community disagreeing with what an implementor might want to do and the problem of delays caused by bikeshedding. Working behind closed doors might be seen as an advantage. At least you don’t need to explain what you are doing and why on a daily basis until you’ve come out in the open.

Still, the appearance that schema.org is an ugly case of NIH makes the vocabulary side look bad even if one likes the choice of framework. It would be interesting to hear from the designers of the schema.org vocabularies why they didn’t reuse Microformat vocabularies where applicable and extend them in the open. In their FAQ, they just reference the FOAF and GoodRelations RDF vocabularies.

Is Microdata Now a Success?

I think it would be premature to declare Microdata a success now even though the major search engines choosing Microdata over RDFa or Microformats (after even supporting the latter two first) is clearly a very visible blow against RDFa and Microformats.

First, one should not jump to the conclusion that the schema.org partners are implementing Microdata correctly. That Facebook, Google or Yahoo! used the property attribute didn’t mean they were implementing RDFa per spec. Therefore, one shouldn’t conclude, just because the schema.org partners use the itemprop attribute, that they are implementing Microdata per spec. Currently, I don’t have enough evidence to say how correctly they are implementing Microdata. I encourage others to test things and find out.

Second, the whole thing is just getting started and could very well flop. The main reason why someone would use the schema.org vocabularies is SEO. This means that sites that have useful content on topics covered by the schema.org taxonomy might not bother to participate if they aren’t desperate for SEO. On the other hand, parties who are desperate for SEO might publish less than truthful data if the metadata layer doesn’t have effects in browsers that would make pages with bogus metadata look tacky to humans viewing the pages in a browser. The general concept of having Web authors add metadata that has no effect for browsing but has an effect for search might be doomed regardless of the choice of vocabulary or framework.

Addendum: Other Stuff to Read

Microformats vs RDFa vs Microdata
Philip Jägenstedt compared Microformats, RDFa and Microdata almost two years ago – long before the schema.org announcement.
Lessons for Microdata from schema.org
Jeni Tennison looks at the errors (per the Microdata spec) made by schema.org.
Schema.org and the Responsibility of Monopoly
Jeni Tennison finds parallels between IE’s unspecified behaviors and unspecified behaviors of Google’s schema.org markup consumption code. Great post, but I disagree about the fairness and neutrality of standards orgs.