Microsoft Seeks to Stop Search Spam
Microsoft Research has embarked on a new project to automatically seek out search engine spam before it can be used to defraud advertisers on MSN, Yahoo and Google. Called Strider Search Defender, the tool combines two other projects from MSR: Strider Honey Monkey and URL Tracer.
The effort is being headed up by researcher Yi-Min Wang and focuses on a major problem now plaguing the Web: blog spam. The basic premise of Strider Search Defender is that spammers utilize what Yi-Min calls "doorway pages" -- sites at reputable hosts and blog services. The doorway pages pull ads from a "target page" operated by the spammer.
Instead of reading the actual content of a page to see if it could be classified as spam, Microsoft is taking a context-based approach that analyzes URL redirection. Because many Web sites will use redirection to serve up different pages to search engines and humans, this methodology could prove more effective.
In addition, Yi-Min notes that large-scale spammers create hundreds or thousands of doorway pages that either redirect to or retrieve ads from a single domain. By finding the target pages that are connected to a large number of doorways, an entire spam operation can be stopped in a single pass.
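The idea of grouping doorway pages by the domain they lead to can be illustrated with a short Python sketch. This is not Microsoft's code: the function names, the `min_doorways` cutoff and the input format (pairs of doorway URL and the URL it redirects to or pulls ads from) are assumptions made for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_doorways_by_target(traces):
    """Map each target hostname to the set of doorway URLs that lead to it.

    `traces` is a hypothetical input: an iterable of
    (doorway_url, target_url) pairs produced by some redirection tracer.
    """
    groups = defaultdict(set)
    for doorway_url, target_url in traces:
        host = urlparse(target_url).hostname
        if host:  # skip entries whose target URL has no parsable host
            groups[host.lower()].add(doorway_url)
    return dict(groups)


def rank_targets(groups, min_doorways=2):
    """Return (host, doorway_count) pairs, largest operations first.

    `min_doorways` is an illustrative cutoff, not a value from the article.
    """
    ranked = [
        (host, len(doorways))
        for host, doorways in groups.items()
        if len(doorways) >= min_doorways
    ]
    # Sort by descending doorway count, then by hostname for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

A target domain near the top of such a ranking would correspond to the kind of large operation Yi-Min describes, where one domain sits behind many doorways.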
To accomplish this, Strider Search Defender starts with its Spam Hunter component, which feeds a list of known spam URLs to search engines to find forums, blogs and other pages where more such spam links are located. It then compiles those links into a single list of potential spam URLs.
Next, that list is fed into Strider URL Tracer to find which domains are associated with a high volume of doorway pages. False positives are reduced by checking the URLs against a whitelist of legitimate ad and Web analytics providers that was compiled through the Strider Honey Monkey project.
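The whitelist step can be sketched in a few lines of Python. The following is only an illustration under stated assumptions: the per-domain doorway counts, the `threshold` parameter and the rule that a subdomain of a whitelisted provider is also treated as whitelisted are hypothetical choices, not details confirmed by Microsoft.

```python
def is_whitelisted(host, whitelist):
    """True if `host` equals a whitelisted domain or is a subdomain of one.

    Subdomain matching is an assumption made for this sketch.
    """
    host = host.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain) for domain in whitelist)


def filter_suspects(doorway_counts, whitelist, threshold):
    """Return non-whitelisted domains with at least `threshold` doorways.

    `doorway_counts` is a hypothetical dict mapping a domain to the number
    of doorway pages that were traced to it.
    """
    suspects = [
        host
        for host, count in doorway_counts.items()
        if count >= threshold and not is_whitelisted(host, whitelist)
    ]
    # Highest doorway count first, hostname as a tie-breaker.
    suspects.sort(key=lambda host: (-doorway_counts[host], host))
    return suspects
```

Without such a filter, a legitimate ad network that serves thousands of ordinary sites would look just like a spam target, because it too is connected to a very large number of pages.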
According to Yi-Min, the more a spammer spreads a URL, the easier it is for Spam Hunter to find. And once a forum for spam is identified, it essentially becomes a "Honey Forum" from which other spam URLs can be obtained. The more doorway pages a spammer operates, the higher priority that spammer becomes for manual investigation.
"Yi-Min has been working closely with the MSN Search team to share the results of his spam Web page research," a Microsoft representative told BetaNews. "The Search team has been actively pursuing his leads, and if they are indeed spam pages, they will be either removed from the search index or assigned a low relevance ranking."