Microsoft data leaks and the importance of open-source intelligence

By Vaidotas Šedys
Published 2 years ago

Interconnected digital technology advances at a rapid pace, and so do the tactics and strategies employed by malicious individuals, criminal groups, and even nation-states. The World Economic Forum predicts that the cost of global cybercrime will reach $10.5 trillion a year by 2025, forcing businesses and governments to look for next-generation solutions against emerging digital threats.

Unfortunately, deliberate criminal activity is only part of the challenge in this data-driven era. Costly leaks of sensitive data can also result from simple human error -- in September, Microsoft’s data was leaked twice, not only disclosing the company’s plans for the next-gen Xbox but also exposing private employee data. As we now know, at least one of these events was caused by an accidentally misconfigured link.

October is Cybersecurity Awareness Month, so it is a perfect time to ask how businesses could improve their cyber resilience. Raising public awareness, educating employees, and implementing standard security measures (such as data encryption, multi-factor authentication, or routing traffic through VPNs) are good recommendations for increased organizational security. However, they are hardly enough today if one does not employ open-source intelligence.

What is open-source intelligence?

Open-source intelligence, or OSINT, refers to collecting, analyzing, and utilizing information from publicly available web sources, including forums, libraries, open databases, and even the dark web. Though OSINT can be used to gather commercially important business information and perform market analysis, at Oxylabs, we usually use it in the context of cyber threat intelligence.

Cybersecurity companies that employ open-source intelligence crawl through thousands of sites, forum messages, and dark web marketplaces, looking for stolen personal credentials and other confidential information, such as source code or trade secrets. Monitoring these sources also helps identify insecure databases and domain squatting.
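As a rough illustration of the domain-squatting check mentioned above, the sketch below flags newly observed domains whose first label contains a brand name or sits within a small edit distance of it. The function names, the threshold of 2, and the sample domains are assumptions made for this example, not a description of any particular vendor's method. Real monitoring would also need to handle homoglyphs, subdomains, and internationalized names.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def looks_like_squat(candidate, legit_domain, max_distance=2):
    """Crude heuristic: compare only the first label of each domain.

    max_distance is a hypothetical tuning knob, not an established standard.
    """
    candidate = candidate.lower().strip(".")
    legit_domain = legit_domain.lower()
    if candidate == legit_domain:
        return False  # the real domain is not a squat of itself
    label = candidate.split(".")[0]
    brand = legit_domain.split(".")[0]
    return brand in label or edit_distance(label, brand) <= max_distance


if __name__ == "__main__":
    # Illustrative domains only.
    for domain in ["examp1e.com", "example-login.net", "example.com", "weather.org"]:
        print(domain, looks_like_squat(domain, "example.com"))
```

A heuristic this simple will produce false positives for short brand names, so in practice its output would be a queue for human review rather than an automatic verdict.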

It might sound counterintuitive, but organizations often do not suspect that some of their sensitive data is lurking somewhere in open cyberspace. OSINT therefore helps organizations find both unintentional data leaks and criminal data breaches. It can also aid in identifying insecure devices and outdated applications.

The breakthrough that OSINT brings to the cybersecurity landscape mostly comes from the fact that it uses publicly available information, relieving cybersecurity organizations of the legally fraught need to scour classified or restricted sources for criminal evidence. Moreover, modern data scraping solutions, combined with artificial intelligence (AI) and machine learning (ML), allow them to pull and analyze raw cyber intelligence in real time.

OSINT "starter" pack

To gather cyber threat intelligence, cybersecurity providers must scan thousands of URLs looking for specific client data -- corporate email addresses or phone numbers, company names, employee information, and technical details, such as access tokens or IP addresses. The company can then be alerted as soon as compromised data appears on the public web or the dark web.
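To make the idea of scanning for specific client data more concrete, here is a minimal sketch that searches one piece of already-scraped text for a watched email domain, IPv4 addresses, and token-like strings. Everything in it is an assumption for illustration: the watched domain `example.com`, the regular expressions, and the function names are hypothetical, and production systems rely on far more robust detection than a few patterns.

```python
import re

# Hypothetical watchlist for an imaginary client; not taken from the article.
WATCHED_DOMAIN = "example.com"

# Email addresses at the watched domain.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@" + re.escape(WATCHED_DOMAIN) + r"\b", re.IGNORECASE
)
# Anything shaped like an IPv4 address; octet ranges are checked separately.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Deliberately crude: a key-like label followed by a long opaque string.
TOKEN_RE = re.compile(
    r"\b(?:api[_-]?key|token|secret)\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{20,})",
    re.IGNORECASE,
)


def find_indicators(text):
    """Return the watched indicators found in one scraped document."""
    ips = {
        ip for ip in IPV4_RE.findall(text)
        if all(int(part) <= 255 for part in ip.split("."))
    }
    return {
        "emails": sorted({m.lower() for m in EMAIL_RE.findall(text)}),
        "ips": sorted(ips),
        "tokens": sorted(set(TOKEN_RE.findall(text))),
    }


def should_alert(findings):
    """Alert if any category contains at least one hit."""
    return any(findings.values())


if __name__ == "__main__":
    sample = "dump: jane.doe@example.com:hunter2 seen from 10.0.0.5"
    findings = find_indicators(sample)
    print(findings, should_alert(findings))
```

In a real pipeline the alerting step would feed a review queue or ticketing system, since pattern matches on their own cannot confirm that a leak is genuine.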

It is important to note that companies might monitor not only data directly related to their business and employees but also their client data, alerting them in case their passwords or other sensitive information has been breached.

The biggest challenges here are those of scale and anti-scraping measures. First of all, the global "surface" web hosts about 6 billion websites, which is only the tip of the iceberg. The deep web, which isn’t indexed by search engines, is estimated to be 400 to 550 times as large. Scraping at such a scale requires powerful automation and ML-driven solutions to structure what would otherwise be a massive mess of unstructured data in various formats and languages.

Furthermore, threat actors today are technically advanced professionals, employing anti-bot measures that can include anything from honeypots serving erroneous data to IP blocking that compromises real-time data flow. This means that cybersecurity companies have to employ resilient proxy networks together with adaptive scraping solutions to circumvent the blocks. With this in mind, it is well worth leaving OSINT efforts to cybersecurity professionals, especially if they involve monitoring the dark web.

Diving into the dark

The dark web is a part of the deep web that is inaccessible to ordinary browsers and hidden by multiple proxy layers. Although there are legitimate actors that use this part of the internet, e.g., investigative journalists, law enforcement actors, and intelligence agencies, the dark web is mostly employed by criminals. This is where stolen private data, intellectual property, confidential information, drugs, and illegal weapons are sold.

As in the case of the surface web, dark web monitoring is performed with the help of custom crawlers and scraper bots. Dark web surveillance provides valuable information about fresh data breaches and new cyber attack methods and vectors. It enables a faster incident response, closing the time gap between the data breach and the moment an organization becomes aware of it. For cybersecurity researchers, dark web monitoring also allows deep-diving into the newest cybercrime strategies.

However, even if your organization has suffered a breach, scouring the dark web for that data yourself is definitely not recommended. Firstly, the dark web is difficult to navigate without prior experience. Secondly, even if you’re armed with proxy servers and VPNs, the risk of exposing your organization to malware and cyber attacks is still high. For that reason, such tasks should always be done on "burner computers" rather than on devices connected to your corporate network.

Final recommendations

Powered by modern scraping solutions and ML technology, open-source intelligence today allows cybersecurity companies to take a proactive approach to incident management and prevention. OSINT speeds up the detection of data leaks, cyber threat hunting, and research on the newest criminal strategies.

However, it is important to stress that, although it is becoming imperative for cybersecurity, OSINT cannot and shouldn’t replace standard security measures. Businesses should first of all ensure their sensitive data is actually safe. Removing unused access, updating passwords, using multi-factor authentication, working with reliable proxy and VPN providers, and periodically educating employees are the best ways to make sure that your business data doesn’t end up as a Black Friday deal on some dark web marketplace.

The same applies to the recent hype around monitoring the dark web. Without denying the opportunities that dark web surveillance opens up for professional cybersecurity researchers and threat hunters, ordinary businesses might find it more rewarding to pull valuable information from the surface web and integrate digital security best practices and standards into their daily operations.

Image credit: liorpt / depositphotos

Vaidotas Šedys is the Head of the Risk Management department at Oxylabs, a market-leading web intelligence solutions provider. Having extensive experience in payment and digital risk management, Vaidotas established himself as an influential force in the online web data gathering industry, employing innovative methods to ensure the most ethical and secure SaaS business processes. Currently, Vaidotas is leading a team of 9 professionals that is successfully overseeing risk-vulnerable areas of business operations and countering emerging threats.
