Russinovich: A possible cure for exploitable heap corruption in Windows 7
The key to a huge plurality, if not a majority, of exploits that have plagued Microsoft Windows over the past two decades has been tricking the system into executing data as though it were code. A malicious process can place data into its own heap -- the pile of memory reserved for its use -- that bears the pattern of executable instructions. Then once that process intentionally crashes, it can leave behind a state where the data in that heap is pointed to and then executed, usually without privilege attached.
Yet it doesn't take a malicious user to craft a heap corruption. Multithreaded applications that make use of collective heaps become like multiple users of a single, distributed database. Without intensive methodologies to maintain vigilance, making sure one thread doesn't corrupt an application's heap for all the other threads, the app collapses into something more closely resembling the more colloquial meaning of the metaphor "heap." Microsoft would like to present its development environments and runtime frameworks as providing these vigilance services on behalf of the developer, so she can concentrate on her application. But in recent years, what developers don't know about what's going on under the hood has come back to bite them.
Bolt-on patches to the heap corruptibility problem have historically been, in a word, pathetic. The best that security software can generally do is wait for certain patterns of corruption to appear, and then act when the corruption happens -- which is a bit like waiting for the underpants bomber to catch fire before tightening airport security. No longer wasting time crafting ingenious hacks, malicious users have devolved into an industry of slackers, waiting for Microsoft to release the latest bulletin on heap corruption patterns before working to build exploits around them.
So the malicious user's worst nightmare is Mark Russinovich, the SysInternals engineer hired by Microsoft to improve system reliability. Last November, Russinovich triumphantly introduced developers at the company's annual PDC in Los Angeles to a multitude of measures implemented in Windows 7 and Windows Server 2008 R2 not only to improve reliability and harden security, but to overcome the deficiencies he openly admits characterized the brief era of Windows Vista. Collectively, just the introductions to these new features by Russinovich and his partners consumed 11 hours over the first two days, all of that time with a standing-room-only crowd.
Though they were just bullet points in Russinovich's long programs, history may record that the most critically important improvements made to Windows 7 -- more important than any round of Patch Tuesday fixes we've ever seen or may ever see -- address the heap corruption dilemma. To the degree that the heap is less corruptible, the operating system is less exploitable. And Patch Tuesday ceases to be a headline.
You start by improving the OS
On its surface, it might not appear that the addition of something called the Unified Background Process Manager could lead to Windows' most important security and reliability improvement to date. That's because it's not a security bolt-on. Rather, it begins with the simple goal of reducing the number of concurrently running processes in Windows.
As Russinovich explained to his audience, much of what made Vista and its predecessors slow was services hanging around in memory, waiting for an excuse to do something useful. Because they had to be ready to react when the time was right, they had to be running, even if they were doing nothing.
"One of the overheads of just running the operating system, are these background services and scheduled tasks, where some of these services are not really providing anything useful until certain things happen," he said. "In the previous design of the Windows service model, if you were writing a Windows service, you said, 'My service really needs to be running if a Bluetooth device gets added to the system.' What you had to do is have your service be an auto-start service, and then watch for the arrival of the Bluetooth device. There's lots of examples like that where a service is really doing nothing other than waiting around for something to do."
Ironically, Windows 2000 had introduced something called Event Tracing for Windows [ETW], ostensibly to drive the Task Scheduler. That enabled systems to trigger backups for certain times, for example. So a triggering mechanism had actually been in place in Windows since before the turn of the decade. Now there was a new excuse to actually use it.
"The Unified Background Process Manager is a component that's implemented in the Service Control Manager, SERVICES.EXE -- we didn't want to introduce another process for this -- [that] serves as the central registrar for what we call triggers. A trigger is an ETW event [that] can be registered to start something or to stop something. So statically, in the definition for your service, you would say [for example], 'My service only needs to be running if a Bluetooth device is ready,' [and] you would specify that ETW event as your trigger. UBPM then registers as an ETW consumer for that event, turning on the Bluetooth device arrival ETW provider, and watching for that ETW event to come through. When it sees that event come through, it's going to go start your service."
Handset owners: What this means is the potential for drivers that don't have to waste time and energy in Windows waiting for your phone to ping the Bluetooth receiver.
"You can also specify events that will stop your service, and it will watch for those as well," Russinovich added. "The nice thing about having UBPM, besides your service not having to run -- it being a central registrar -- is that if multiple services or components are registered for the same trigger, UBPM is the only one that is watching for that event, and multiple things don't need to do it."
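The central-registrar idea Russinovich describes can be sketched in a few lines. What follows is a toy model in Python, not the UBPM or ETW API: the class name TriggerRegistrar, its methods, and the service and event names are all hypothetical, chosen only to show start triggers, stop triggers, and the point that the registrar subscribes to each event once no matter how many services register for it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class TriggerRegistrar:
    """Toy central registrar for start/stop triggers (hypothetical, not UBPM)."""

    def __init__(self) -> None:
        # event name -> list of (service name, action) registrations
        self._triggers: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        # events the registrar itself watches; one entry per distinct event
        self.subscriptions: List[str] = []
        # services currently running in this simulation
        self.running: Set[str] = set()

    def register(self, service: str, event: str, action: str) -> None:
        """Record that `event` should start or stop `service`."""
        if action not in ("start", "stop"):
            raise ValueError("action must be 'start' or 'stop'")
        if event not in self._triggers:
            # Only the registrar watches the event, and only once.
            self.subscriptions.append(event)
        self._triggers[event].append((service, action))

    def deliver(self, event: str) -> None:
        """Simulate the event arriving and apply every registered action."""
        for service, action in self._triggers.get(event, []):
            if action == "start":
                self.running.add(service)
            else:
                self.running.discard(service)


if __name__ == "__main__":
    registrar = TriggerRegistrar()
    # Two hypothetical services share one start trigger.
    registrar.register("bt_sync", "bluetooth_arrival", "start")
    registrar.register("bt_audio", "bluetooth_arrival", "start")
    registrar.register("bt_sync", "bluetooth_removal", "stop")

    registrar.deliver("bluetooth_arrival")
    print(sorted(registrar.running))
    print(registrar.subscriptions)
```

In this sketch, neither service consumes anything until its event is delivered, and the two services that share the arrival trigger cost the registrar a single subscription. How the real component stores triggers or launches services is not something the talk spells out, so treat the data structures here as assumptions.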
If you're wondering when this article stopped being about preventing heap corruption, it didn't -- keep following. One of the processes that had typically been hanging around Vista waiting for something to happen, and consuming CPU cycles in the meantime, was the Windows Error Reporting (WER) service. In Win7 and WS2K8 R2, that service now uses UBPM. The crash itself is the ETW trigger that wakes up the new WER. The new WER doesn't have to spend its time watching the system and waiting for a crash to happen -- it knows a crash is happening, otherwise it wouldn't be running.
With the overhead of crash detection removed, the new WER now has more opportunity to use its valuable time analyzing the causes of crashes as they happen. And with that data, Russinovich and the reliability engineers discovered some extremely valuable information.
"When we look at the root causes of crashes that come into Windows from applications to the online Windows Error Reporting...we saw that roughly 15% of all user-mode crashes are caused by heap corruption," he reported. "And if you look at the shutdown, 30%, roughly one-third, of crashes during the shutdown path -- when your program is shutting down and cleaning up -- are caused by heap corruption."
Next: "The Big Fix"...