Microsoft confirms it's buying semantic search provider Powerset
A few weeks ago, Microsoft denied it would be making any big purchases in the wake of its failed "hybrid" bid for Yahoo's search business. That's assuming that Powerset isn't a big purchase...and it very well might be.
Confirming rumors traded among the major blogs last week, as well as information Microsoft refused to comment on for BetaNews on Thursday, the company said today it is indeed purchasing San Francisco-based semantic search tools provider Powerset for an undisclosed sum.
The purchase will give Microsoft direct access to a complete team of researchers currently devoted to the problem of implementing true natural language search capability and applying semantic logic to search criteria. Microsoft says that team will remain intact. This is the same problem that has dogged search engine designers for the Web since 1994: With a truly categorically accurate index of all Web textual content likely impossible, how can a search engine glean the true intent of a user's query based on lexical and semantic relationships, see similarities in the logic gleaned from those relationships, and select pages that draw conclusions such that the returned pages directly answer the user's question?
A typical, raw search engine query looks for pattern matches between the search criteria and the indexed content, although Google, Yahoo, Microsoft, and others have continually refined their algorithms so that more information can be gleaned from the criteria, improving the relevance of the returned content. However, we know for a fact -- in the same way we know that the Earth has a limited supply of oil, though we keep consuming it anyway -- that such lexical refinements can only go so far.
Microsoft Research has actually devoted considerable resources to this problem since the 1990s, and just last November issued grants to independent researchers willing to find new approaches to refining the still-unexplored art of semantic computing. As the Research division's request for proposals last year read, "To transform raw data into information that is relevant to the information seeker, we need to go beyond string manipulation, and towards [Wired magazine founder John] Battelle's 'database of intentions.' With the advent of large scale text corpora, the cost of developing and maintaining ontologies and a rule-based system was either too high or just inadequate for the type of accuracy, scalability, and adaptability needed for a pervasive task such as Internet search."
But despite that grant, whose objective may yet bear great fruit for Microsoft, the Research division's parent decided to buy a semantic search company outright, perhaps because Powerset's technology already exists.
Powerset's demo program -- the key feature on its own home page -- is a query line that leads to a semantic index of content compiled from Wikipedia. In a BetaNews test this afternoon, we gave this index some serious natural-language questions which, based on our prior research on this topic over several years, we know to have stumped other semantic search engines.
Our first example question was, Has the Higgs Boson been verified experimentally? Powerset's #1 response to this question, excerpted from Wikipedia, reads as follows: "Their predicted properties were experimentally confirmed with good precision. ... Does the Higgs boson predicted by the model really exist?"
A close semantic match, but a question responded to by a question nonetheless. With a lexical search engine, such a match would be deemed highly relevant; but with a semantic search engine, grammar is a key element, and questions responded to with questions are often undesired. That's why Powerset's #2 response was so chilling: "The only boson in the Standard Model that is yet to be discovered experimentally is the Higgs boson." Conceivably a perfect semantic response to our first query.
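To illustrate why a purely lexical engine would treat both responses as equally relevant, here is a minimal sketch in Python. It assumes a naive word-overlap score of our own devising; the function names `tokenize` and `lexical_score` are hypothetical, and this is not how Powerset, or any production search engine, actually ranks results. The query and the two excerpts are the ones quoted above.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_score(query, passage):
    """Count the distinct query words that also appear in the passage.

    This is a deliberately naive stand-in for lexical matching: it ignores
    grammar entirely, so it cannot tell a question from a statement.
    """
    return len(set(tokenize(query)) & set(tokenize(passage)))

query = "Has the Higgs Boson been verified experimentally?"

# Powerset's #1 response: a close match, but it ends in a question.
response_1 = (
    "Their predicted properties were experimentally confirmed with good "
    "precision. ... Does the Higgs boson predicted by the model really exist?"
)

# Powerset's #2 response: a declarative sentence that answers the query.
response_2 = (
    "The only boson in the Standard Model that is yet to be discovered "
    "experimentally is the Higgs boson."
)

score_1 = lexical_score(query, response_1)
score_2 = lexical_score(query, response_2)
print(score_1, score_2)  # both passages share the same four query words
```

Under this crude measure the two excerpts tie, since each shares "the," "Higgs," "boson," and "experimentally" with the query. Separating the question from the answer requires the grammatical information a semantic engine is meant to supply.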
Next, we tried a grammatically correct question that semantic engines have had trouble with because it appears to lack a verb: How many points is a field goal in Australian football?
Now, there are a number of possible trip-ups in this phrase. "Points...is...goal" is how the question would be parsed and diagrammed, which is not very informative since what we're looking for -- the "score" -- is a concept that is actually missing from the question. Next, the index needs to be capable of discerning "Australian football" as different from other forms of football -- which was one of the tests Yahoo applied in its recent improvements to its lexical search.
Again, #1 wasn't the best answer: "Thus, the forty-yard line was analogous to basketball's three point line and Australian rules football's Super Goal. In Arena Football, a field goal scored by drop kick is worth four points." Wikipedia's entry on "football" dealt with American football, which leads to another reason this test is so important: Many published texts refer to European football (soccer), American football, and Australian football simply as "football" for their native readers, even though "Canadian football" is usually identified by its full name.
It was #2 that did the trick: "The primary aim of the game is to score goals (worth six points) by kicking the ball between the middle two posts of the opposing goal. ... Australian rules football."
In a blog post today, Microsoft SVP for search, portal, and advertising Satya Nadella wrote, "Sometimes a result looks relevant from its short description on the results page but turns out to be not so relevant when you visit the actual page. As a result, searchers frequently click results and then rapidly click back when they realize they aren't what they're looking for. These problems exist because search engines today primarily match words in a search to words on a Web page. We can solve these problems by working to understand the intent behind each search and the concepts and meaning embedded in a Web page."
Nadella added that rather than lay off employees, the Powerset team will actually hire new ones in an effort to tackle this problem. Perhaps the recipients of the Research team's 2007 grant would be interested in applying.