Google's next search engine: What's the difference?
Yesterday, without much explanation or instruction, Google opened the floodgates on what it describes as the next generation of its search engine, most likely to test its efficiency and performance under real-world traffic. Testers are being invited to sample the new engine, which Google is calling "Caffeine," although, perhaps intentionally, the company isn't yet explaining just what the differences are.
In Betanews' initial tests Tuesday morning comparing Caffeine to Google's current stable release, we noticed that for nearly every simple and complex search query we tried, the top three non-paid search results were the same. But the order of results changed starting as high as #4, and sometimes from #6. Usually Caffeine retrieved the same pages as the stable version but shuffled them into a different order.
For instance, with a query that used to stump search engines that couldn't make sense of special punctuation, "virtual function" C++ C#, we would expect to find entries that compared the use of virtual functions in two programming languages: traditional C++ and Microsoft's C#. For this query, the first four results retrieved were the same for Caffeine as for the stable version (which I'll call "Stable" for short). But Caffeine swapped the order of entries #5 and #6: an independent article by Jordan Leverington from devarticles.com, and an MSDN article by Microsoft support engineer Rakki Muthukumar, respectively. Caffeine placed the independent article higher.
Caffeine also included a separate grouping of entries taken from Google Groups (posts on Usenet forums), which Stable omitted. Caffeine then rated the text of a discussion forum on the subject much higher -- #7 rather than #10.
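For readers who want to repeat this kind of side-by-side comparison, here is a minimal Python sketch of the bookkeeping involved. It is not Betanews' actual procedure, which was done by hand; the function names and the `result-N` labels are placeholders invented for illustration, standing in for the URLs of the organic results. The sample lists mirror the pattern described above: the first four entries match, and #5 and #6 are swapped.

```python
def shared_prefix_length(a, b):
    """Count how many results at the top of both lists are identical, in order."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def rank_shifts(a, b):
    """Map each result found in both lists at different ranks to (rank_in_a, rank_in_b)."""
    rank_in_b = {url: i for i, url in enumerate(b, start=1)}
    shifts = {}
    for i, url in enumerate(a, start=1):
        if url in rank_in_b and rank_in_b[url] != i:
            shifts[url] = (i, rank_in_b[url])
    return shifts


# Placeholder labels (hypothetical), not real URLs from the test.
stable = ["result-1", "result-2", "result-3", "result-4", "result-5", "result-6"]
caffeine = ["result-1", "result-2", "result-3", "result-4", "result-6", "result-5"]

print(shared_prefix_length(stable, caffeine))  # 4
print(rank_shifts(stable, caffeine))
```

Results that appear in only one list, such as the Google Groups grouping Caffeine added, are deliberately ignored by `rank_shifts` and would need to be tracked separately.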
Unless you believe in conspiracy theories (and I tend not to), the logic of this organizational shuffling isn't yet self-evident. So for our next trial, we tried an intentionally vague query related to something that's been in the news lately: folks who show up at political town hall meetings and try to out-shout the speaker. Without any punctuation and without much specificity, our test query was: shouting town hall.
For both engines, the retrieved Google News entries at the top were identical -- evidently Caffeine's new algorithms do not extend to Google News (or to any other Google department). And again, the top three entries retrieved by Caffeine and Stable were identical, with a Fox News story showing up as #1, and a CBS News story as #3.
The #4 items were different, although they came from the same source: the political blog TalkingPointsMemo.com. Both were about the strange trend of congressmen being shouted down by onlookers at public events. But surprisingly, of the two stories the engines pulled up, it was Caffeine that pulled up the older story (August 3 versus August 5 from Stable); and it was the Stable version that included an expansion box enabling more results from the same blog.
While the remainder of Caffeine's Page 1 entries included a YouTube video that didn't appear on Stable's Page 1, Stable pulled up a story from a St. Louis Fox affiliate about a shouted-down town hall meeting conducted last week by Rep. Russ Carnahan (D - Mo.), headlined "Carnahan Town Hall Turns Into Political Shouting Match" -- certainly fitting the criteria. Stable rated it #8, while Caffeine rated the same story #32.
This one is something of a puzzle, because all three of our search query words appear squarely in the story's headline (although "shouting" was not in the URL). And since it fit the subject matter, Caffeine should have had good reason to rate it highly. In an effort to discover why it didn't, we rearranged the terms to read town hall shouting, so that Google's interpreter would pair the first two terms rather than the second and third. As expected, we received different results. The second time around, with Caffeine, a different TalkingPointsMemo.com story (but still from August 3) appeared as #4, and the one that had been #4 a few minutes earlier was bumped down to #7. The order of stories appearing from #5 on had changed. Meanwhile, the same reorganized query in the Stable version bumped the Carnahan story down one spot to #9, whereas Caffeine bumped it down to #39.
In other words, promoting the pairing of "town hall" in our query demoted a story that should ring alarm bells for that very context, and did so more in the test version of the search engine than in the stable version.
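The arithmetic behind that comparison can be laid out in a few lines of Python. The rank figures are the ones reported above for the Carnahan story; the dictionary layout and the `demotion` helper are just an illustrative way to organize them.

```python
# Rank of the Carnahan story for each engine and query ordering, as reported above.
ranks = {
    "Stable": {"shouting town hall": 8, "town hall shouting": 9},
    "Caffeine": {"shouting town hall": 32, "town hall shouting": 39},
}


def demotion(engine):
    """Positions the story dropped when the query was reordered (illustrative helper)."""
    return ranks[engine]["town hall shouting"] - ranks[engine]["shouting town hall"]


for engine in ranks:
    print(engine, "dropped the story by", demotion(engine), "position(s)")
```

By this count the story slipped one position in Stable and seven in Caffeine, though two queries are far too small a sample to say anything general about either engine.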
Next, the tie-breaker: How both engines handle a misspelled query…