Office 2007 no longer measures up to OXML standard, says consultant

With the myriad changes that had to be made to DIS 29500 before it could be approved by three-fourths of the ISO subcommittee's voters, there was a very high chance that by the time Microsoft saw its offspring once again, it wouldn't recognize it.

As a consultant for conformance testing firm Griffin Brown confirmed last Thursday, Office 2007 may indeed require an upgrade before it can claim to adhere faithfully to the international standard.

Griffin Brown is a strategic consultancy and implementer of XML-based systems -- which means, among other things, that it helps clients plan for information efficiency through conversion to, or implementation of, XML. It also evaluates how closely an XML-based implementation adheres to the standard that describes it, and no such standard is currently more important than ISO/IEC 29500, which specifies the Open XML formats.

To become a standard, Open XML had to embrace more existing standards than Microsoft had originally intended. Among the more obvious examples are how it represents dates and times, and how it enumerates colors. To win support from skeptics, Microsoft had to promise that these features, among others, could be changed. In so doing, it opened the door to alterations of the standard that Ecma had recognized, resulting in a specification of a format to which, for now, Office 2007 doesn't adhere.

There are two models of the current ISO standard: one is a strict interpretation, and the other, called the transitional interpretation, more closely resembles the Ecma standard. The latter is just what its name implies: a way for implementers to adhere to the basics of the standard while moving toward the strict interpretation.

According to the firm's Alex Brown -- who incidentally was the convener of the ISO's Ballot Resolution Meeting on the matter -- a colleague supplied him with the technical specifications for both models, expressed in RELAX NG, an OASIS-standard XML schema language. He then used a Java-based validator called Jing to determine how many non-conformities there were between Microsoft's Open XML format, specified the same way, and the ISO models.
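For readers who want to picture such a run, here is a minimal Python sketch of driving Jing from a script and tallying its messages. It is not Brown's actual procedure: the file names, the `jing.jar` location, and the assumed shape of Jing's output lines (`<file>:<line>:<col>: error: <message>`) are illustrative assumptions, and the helper names are hypothetical.

```python
import subprocess
from collections import Counter


def jing_command(schema_path, document_path, jar_path="jing.jar"):
    """Build the command line for one Jing validation run."""
    cmd = ["java", "-jar", jar_path]
    # Jing expects -c when the schema is in RELAX NG compact syntax (.rnc).
    if schema_path.endswith(".rnc"):
        cmd.append("-c")
    return cmd + [schema_path, document_path]


def count_messages(output):
    """Tally validator output lines by severity ("error", "warning", ...)."""
    counts = Counter()
    for line in output.splitlines():
        # Assumed line shape: "<file>:<line>:<col>: error: <message>"
        parts = line.split(": ", 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


def validate(schema_path, document_path, jar_path="jing.jar"):
    """Run Jing (requires Java and jing.jar) and return message counts."""
    result = subprocess.run(
        jing_command(schema_path, document_path, jar_path),
        capture_output=True,
        text=True,
    )
    # Read both streams so the tally does not depend on which one Jing uses.
    return count_messages(result.stdout + result.stderr)
```

Calling `validate("strict.rng", "document.xml")` and then `validate("transitional.rng", "document.xml")` (hypothetical file names) would give two tallies to compare, which is the kind of contrast Brown describes.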

"The expectation is that existing Office 2007 documents might be some distance away from being valid according to the strict schemas," Brown wrote. "Sure enough, jing emitted 17 MB (around 122,000) of invalidity messages when validating in this scenario."

But to his surprise, Brown found that most of those messages concerned the same problem. And as he expected, conformance with the transitional model was much closer: 84 messages rather than more than 122,000, with most of those having to do with the fact that ISO prefers the terms "TRUE" and "FALSE" to denote binary states, rather than Microsoft's occasional "ON" and "OFF."
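To make the on/off discrepancy concrete, the following Python sketch rewrites "on" and "off" attribute values as "true" and "false" in a small WordprocessingML-style fragment. The fragment is invented for illustration and is not taken from Brown's test document, and the function name is hypothetical. A real converter would consult the schema to learn which attributes are boolean rather than rewriting every matching value.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used by Office 2007 documents.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Invented sample: bold is "on", italic is "off", underline is not a boolean.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body><w:p><w:r>
    <w:rPr><w:b w:val="on"/><w:i w:val="off"/><w:u w:val="single"/></w:rPr>
    <w:t>Hello</w:t>
  </w:r></w:p></w:body>
</w:document>"""

REPLACEMENTS = {"on": "true", "off": "false"}


def normalize_on_off(root):
    """Rewrite on/off attribute values as true/false; return how many changed.

    Simplification: every attribute whose value is exactly "on" or "off"
    is treated as a boolean, which a schema-aware tool would not assume.
    """
    changed = 0
    for element in root.iter():
        # Copy the items so the mapping is not mutated while iterating.
        for name, value in list(element.attrib.items()):
            if value in REPLACEMENTS:
                element.set(name, REPLACEMENTS[value])
                changed += 1
    return changed


root = ET.fromstring(SAMPLE)
print(normalize_on_off(root))  # 2
```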

In a blog post yesterday, Microsoft developer Doug Mahugh saw this latter set of results as good news. "To put that second number in perspective, there were 84 total errors in a document of 60,299,969 characters, which works out to about one error in every 700,000 characters or so," Mahugh wrote.
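Mahugh's ratio is easy to check. The short Python calculation below uses only the figures quoted in this article. The strict-schema message count is the approximate 122,000 that Brown reported, so the second result is a rough figure.

```python
TOTAL_CHARACTERS = 60_299_969   # document size quoted by Mahugh
TRANSITIONAL_ERRORS = 84        # messages against the transitional schema
STRICT_MESSAGES = 122_000       # approximate count against the strict schema

chars_per_error = TOTAL_CHARACTERS / TRANSITIONAL_ERRORS
chars_per_strict_message = TOTAL_CHARACTERS / STRICT_MESSAGES

print(round(chars_per_error))           # 717857
print(round(chars_per_strict_message))  # 494
```

The first figure matches Mahugh's "one error in every 700,000 characters or so." By the same measure, the strict schemas produce a message roughly every 500 characters.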

"Alex's research is an interesting first step in understanding conformance for IS29500," he continued. "Another interesting step may eventually appear in the form of a test suite, a suggestion from Italy and other countries. The existence of such a test would be useful as more implementations become available."

Brown promised to conduct a similar Jing test with ODF, the first ISO/IEC standard for interchangeable documents, and teased readers about whether it conforms as well to its own RELAX NG specification. In a recent conversation shared on his blog, Linux Foundation board member and attorney Andrew Updegrove challenged Brown to admit that ODF would undoubtedly be cleaner than OXML, at the very least.

"I'd go with that," Brown conceded. "I think ISO/IEC 26300 (ODF 1.0) can be compared to a neat house built on good foundations which is not finished; 29500 (OOXML) is a baroque cliff-side castle replete with toppling towers, secret passages and ghosts: it is all too finished."

