What do we know about bad bots?


In 2016, approximately 185 million new Internet users went online, the vast majority of them from nations such as India -- a huge expansion of the market. But while the Internet population continues to grow, so does the number of bots. The word "bot" covers a wide variety of automated programs: some gather data for search engines and help match people's queries with the most appropriate websites, while others are not so helpful.

In the past year, bad bots accounted for 19.9 percent of all website traffic -- a 6.98 percent increase over the same period in 2015. Bad bots interact with applications in the same way a legitimate user would, which makes them harder to detect and block. The results, however, are harmful: some bad bots scrape data from sites without permission, while others carry out criminal activities such as ad fraud and account theft.

Bots enable high-speed abuse, misuse, and attacks on websites and APIs. They enable attackers, unsavory competitors and fraudsters to perform a wide array of malicious activities, including web scraping, competitive data mining, personal and financial data harvesting, brute force login and man-in-the-middle attacks, digital ad fraud, spam, transaction fraud, and more.

The bad bot problem has become so rampant that it has earned its first piece of US federal legislation. In an attempt to outlaw ticket-scraping bots, the US Congress passed the Better Online Ticket Sales Act. Similarly, governments in the UK and Canada are looking at introducing new laws to stop automated ticket purchasing by bots. While legislation is a welcome deterrent, it's difficult to legislate against those you can't identify. Bad bots continue to operate under the radar, and they are here to stay.

What does the data say?

Using our network, we analyzed hundreds of billions of bad bot requests, anonymized across thousands of domains, to look for trends in how bots are developing. As part of this, we focused on bad bot activity at the application layer, since these attacks differ from the simple volumetric Distributed Denial of Service (DDoS) attacks that typically grab the headlines. Here are some of our top findings:

1. Bigger site? Bigger target

Bad bots don’t sleep -- they’re everywhere, at all times. But even though bad bots are active on all sites, the larger sites were hit the hardest in 2016. Bad bots accounted for 21.83 percent of large websites' traffic, an increase of 36.43 percent over the previous year.

Larger sites generally rank higher in search engine results, and because humans rarely look past the first few results, those sites capture most of the traffic. Smaller sites don’t get the same SEO uplift, so large and medium sites are more enticing targets for bad bots.

2. Bad bots lie

Bad bots must lie about who they are to avoid detection. They do this by reporting their user agent as a web browser or mobile device. In 2016 the majority of bad bots claimed to be the most popular browsers -- Chrome, Safari, Internet Explorer, and Firefox -- with Chrome in the top spot.

Alongside this, there was a 42.78 percent year-over-year increase in bad bots claiming to be mobile browsers. For the first time, mobile Safari made the top five list of self-reported user agents, outranking desktop Safari by 17 percent.
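
To see why self-reported user agents can't be taken at face value, here is a minimal sketch (using Python's requests library, with a placeholder URL) of how a scripted client can claim to be a desktop Chrome browser:

```python
import requests

# Illustrative only: a scripted client can claim to be any browser simply by
# setting the User-Agent header. The URL is a placeholder, and the Chrome 55
# string is just one example of a 2016-era browser identity.
SPOOFED_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
)

response = requests.get(
    "https://example.com/products",       # placeholder target
    headers={"User-Agent": SPOOFED_UA},   # the server sees "Chrome", not a script
    timeout=10,
)
print(response.status_code)
```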

3. If you build it, bots will come

When it comes to the attractiveness of a website, bad bots have a type. There are four key website features bad bots look for:

  • Proprietary content and/or pricing information
  • A login section
  • Web forms
  • Payment processors

In 2016:

  • 97 percent of sites with proprietary content were hit by unwanted scraping
  • 96 percent of websites with login pages were hit by bad bots
  • 90 percent of websites were hit by bad bots that bypassed the login page
  • 31 percent of websites with forms were hit by spam bots

4. The weaponization of the data center

Data centers were the weapon of choice for bad bots in 2016, with 60.1 percent coming from the cloud. Amazon AWS was the top originating ISP for the third year in a row with 16.37 percent of all bad bot traffic -- four times more than the next ISP.

But why use central data centers rather than the traditional "zombie" PCs that make up a botnet, the approach more typically used for DDoS attacks? The answer is that it has never been easier to build bad bots with open source software, or cheaper to launch them from globally distributed networks using the cloud. Data centers can scale up faster and more efficiently for bot attacks on the application layer, while steps like masking IP addresses have become easy and essential within bot deployments. This centralized approach is also easier to manage when it comes to fraud and account theft campaigns.

5. Out of date? Out of luck

Humans aren’t the only ones falling behind on software updates; it turns out bad bots have the same problem. One in every ten bad bots reported using a browser version released before 2013 -- some reported browser versions released as far back as 1999.

But why are bad bots reporting as out-of-date browsers? Perhaps some were written many years ago and are still at work today. Some may have been targeting specific systems that only accept particular browser versions. Others may have been out-of-control programs, bouncing around the Internet in endless loops, still causing collateral damage.

6. The continuing rise of advanced persistent bots

In 2016, 75 percent of bad bots were Advanced Persistent Bots (APBs). Today’s advanced persistent bots are more sophisticated: they can load JavaScript, hold onto cookies, and load external resources, which makes their attacks more effective. Similarly, bots employ obfuscation techniques to randomize the IP addresses, headers, and user agents associated with their activity. This helps them hide in the noise of everyday traffic.

APBs can carry out sophisticated, multi-step attacks, such as account-based abuse and transaction fraud, which require deeper penetration into the web application. If you’re using a web application firewall (WAF) and filtering out known violator user agents and IP addresses, that’s a good start. However, bad bots rotate through IPs and cycle through user agents to evade these WAF filters. You’ll need a way to differentiate humans from bad bots that use headless browsers, browser automation tools, and man-in-the-browser malware campaigns.
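
As a rough illustration of why static filters fall short, the sketch below (an assumption about how such a rule might look in code, not any particular WAF's syntax) blocks requests by known-violator IP or user-agent substring -- exactly the kind of list an APB defeats by rotating identities:

```python
# A minimal sketch of a static WAF-style filter, assuming each request is a
# dict with "ip" and "user_agent" keys. Real WAF rule syntax varies by product;
# this only illustrates the logic.
BLOCKED_IPS = {"203.0.113.7", "198.51.100.24"}          # example "known violators"
BLOCKED_UA_SUBSTRINGS = ("python-requests", "curl", "scrapy")

def is_allowed(request: dict) -> bool:
    """Return False if the request matches a blocklisted IP or user agent."""
    if request["ip"] in BLOCKED_IPS:
        return False
    ua = request["user_agent"].lower()
    return not any(marker in ua for marker in BLOCKED_UA_SUBSTRINGS)

# An advanced persistent bot defeats this check simply by rotating to a fresh
# IP address and claiming a mainstream browser user agent.
print(is_allowed({"ip": "192.0.2.10",
                  "user_agent": "Mozilla/5.0 ... Chrome/55.0 Safari/537.36"}))  # True
```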

7. Is the USA the bot superpower?

The US has topped the list of bad bot originating countries for the third year in a row. In fact, the US accounted for a larger share of bad bot traffic (55.4 percent) than all other countries combined. The Netherlands was the next closest country, generating 11.4 percent of bad bot traffic, while China reached the top three for the first time. South Korea made the biggest jump, up 14 spots from 2015.

But does over half of all cybercrime really come from US citizens? A spammer bot might originate from a US data center, but the perpetrator responsible for it could be located anywhere in the world. Thanks to virtual private data centers such as Amazon AWS, cyber crooks can leverage US-based ISPs to make their attacks look as if they originated inside America, sidestepping location-based blocking techniques.

What can you do about bots?

As much as bad bots try to hide their activity, their attacks leave noticeable traces -- results that traditional monitoring tools often can’t explain. For example, significant volumes of bad bot traffic show up as unexpected spikes that cause slowdowns without a corresponding increase in sales traffic. Another example is your site’s search rankings plummeting because content has been stolen and data scraped. Similarly, you might see poor returns on ad spend as a result of skewed analytics.
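
As a rough sketch of the first signal -- a traffic spike without a matching uplift in sales -- the following compares hourly request counts against order counts; the data structures and thresholds are illustrative assumptions, not a product feature:

```python
# Flag hours where overall traffic jumps sharply but orders do not follow,
# a simple stand-in for the "spike without sales uplift" signal above.
def suspicious_hours(requests_per_hour, orders_per_hour,
                     traffic_spike=2.0, sales_spike=1.2):
    """Return the hours whose traffic exceeds the baseline while orders stay flat."""
    baseline_req = sum(requests_per_hour) / len(requests_per_hour)
    baseline_ord = sum(orders_per_hour) / len(orders_per_hour) or 1
    flagged = []
    for hour, (req, orders) in enumerate(zip(requests_per_hour, orders_per_hour)):
        if req > traffic_spike * baseline_req and orders < sales_spike * baseline_ord:
            flagged.append(hour)
    return flagged

# Hour 2 has five times the usual requests but no extra orders.
print(suspicious_hours([1000, 1100, 5200, 1050], [30, 33, 31, 29]))  # [2]
```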

Other pointers to bad bot activity include high numbers of failed login attempts and increased customer complaints about account lockouts. Bad bots will also leave fake posts, malicious backlinks, and competitor ads in your forums and customer review sections.

To filter out bad bots, it’s worth taking the time to learn which areas of your website are most attractive to them and to check that those areas are properly secured. One way to choke off bad bots is to geo-fence your website by blocking visitors from countries where your company doesn’t do business.
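
A minimal geo-fencing sketch might look like the following; it assumes MaxMind's geoip2 library and a GeoLite2 country database (one possible IP-to-country lookup, not the only option), and the allow list is a placeholder:

```python
# Geo-fencing sketch: permit a request only if its client IP resolves to a
# country where the business operates. Database path and country list are
# placeholders for illustration.
import geoip2.database
import geoip2.errors

ALLOWED_COUNTRIES = {"US", "CA", "GB"}   # example markets
reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def allow_request(ip: str) -> bool:
    """Return True only for IPs that resolve to an allowed country."""
    try:
        country = reader.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False   # unknown origin: block by default (a policy choice)
    return country in ALLOWED_COUNTRIES
```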

Similarly, it can be worth looking at the audience profile for your customers -- is there a good reason why users would be on browsers that are several years and multiple updates past their release date? If not, a whitelist policy that imposes browser version age limits stops up to 10 percent of bad bots. Also consider whether automated programs other than search engine crawlers and pre-approved tools belong on your site at all; setting up filters to block all other bots can stop up to 25 percent of bad bots.
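
A browser-age check can be as simple as the sketch below; the minimum-version cut-offs are rough stand-ins for "released around 2013" and would need a maintained version-to-year table in practice:

```python
# Reject clients whose reported browser major version is older than a cut-off.
# The thresholds are illustrative assumptions, not recommendations.
import re

MINIMUM_MAJOR_VERSION = {"Chrome": 30, "Firefox": 25}   # rough 2013-era versions

def passes_age_policy(user_agent: str) -> bool:
    """Return False for clients reporting a browser older than the cut-off."""
    for browser, minimum in MINIMUM_MAJOR_VERSION.items():
        match = re.search(rf"{browser}/(\d+)", user_agent)
        if match:
            return int(match.group(1)) >= minimum
    return True   # unrecognized agents are handled by other filters

print(passes_age_policy("Mozilla/5.0 ... Chrome/21.0.1180.0 Safari/537.1"))  # False
```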

The best way to deal with bots is to monitor and respond to all your web and mobile traffic in real time, so that you see the next bad bot attack coming and stop it in its tracks. This approach relies on intelligence and automation to spot bot activity: rather than depending on human oversight of analytics logs, security can be maintained through better use of data and machine learning over time.
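
As one hedged illustration of that data-driven approach (the article does not prescribe a specific algorithm), scikit-learn's IsolationForest can score simple per-session behavioral features and flag outliers; the feature values below are invented:

```python
# Anomaly-detection sketch: each row describes one client session by simple
# behavioral features, and the model marks unusual sessions for review.
from sklearn.ensemble import IsolationForest

# features: [requests per minute, avg seconds between clicks, pages per session]
sessions = [
    [12, 4.1, 9],
    [8, 6.3, 5],
    [10, 5.0, 7],
    [240, 0.2, 300],   # far outside normal human behavior
]

model = IsolationForest(contamination=0.25, random_state=0).fit(sessions)
print(model.predict(sessions))   # -1 marks sessions scored as anomalous
```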

Stephen Singam is MD of Security Research at Distil Networks.

Published under license from ITProPortal.com, a Future plc Publication. All rights reserved.


