Software performance matters

The tension between performance and productivity in software development looms over all of us programmers who want to write software for Windows 8 and the coming generation of Windows mobile computers (tablets). As a longtime Windows API programmer I appear to be in the minority, but I just can't help but ask the question: "Do programmers just not get it?"

I have watched the video of a talk by Microsoft's Herb Sutter entitled "Why C++?" more than once and even though I don't use C++, I just can't help but appreciate his points about the importance of performance, especially when it comes to the next generation of mobile devices.

He discusses the importance of performance everywhere from tablet computers to large datacenters. He uses terms like performance per watt, performance per dollar and performance per cycle (how much work you can get from the hardware). He even comments that in a datacenter, the performance of software can affect up to 80 percent of the total cost of running the datacenter (better software uses less power and requires less hardware).


OK, folks, this guy is an expert in the industry, so don't listen to me. He definitely is on to something here, and I am not convinced that most programmers grasp it fully yet.

The Problem Is Not the Hardware, But the Software

For years, manufacturers of computer hardware have frantically tried to keep pace, pushing each new generation further and further. I can remember developing software for computers with a 20MB (not a typo) hard drive, or with just floppy drives and no hard drive at all. Memory was counted in kilobytes. A fast CPU ran at 25 MHz.

Let's be honest here. When it comes to computers we have come a long way. Today, CPUs are in the gigahertz range, and they have an onboard cache with more memory than the entire computer had back in those days. Hard drives now come with terabytes of space. RAM is in the gigabytes. So hardware manufacturers, give yourselves a pat on the back, because you have done an amazing job of pushing the maximum. It is time programmers stop picking on companies like Intel with their Atom CPU.

The so-called "Moore's Law" will most likely reach an end at some point. Even Herb Sutter admits this. At some point performance of software will be what really matters. Programmers just don't seem to get this though. I don't understand it personally. Let me give you an example of how programmers may not fully appreciate how performance is really achieved.


Read the comments to some of my other articles and more than once you will see someone post how the latest managed (.NET) programming languages solve the problem of slow software by providing asynchronous calls to object methods (subroutines or functions). The idea is that because Windows is a multitasking operating system and most CPUs today are multicore, if some part of the software is too slow or takes too long to run, a programmer can just pass this code on to a separate thread (running asynchronously, or at the same time), which is then simply passed on to another core of the CPU, and everything is solved. Now you have "fast and fluid" software.

The truth is that such programming techniques simply pass the buck back to the hardware, expecting it to solve the programmers' problems. The technique of using threads is nothing new. Multi-threading has been in Windows for many, many years and experienced programmers knew how and when to tap into multithreading.

Multithreading does not speed up software. It simply forces a computer to do two tasks at one time, simply by switching between them so fast you don't notice the switch. But no matter how you word it, the computer is trying to do two tasks, rather than just one, which means more work, not less and the switching process has overhead, which means even more work.

To illustrate, consider a juggler. It is not easy to juggle two balls. But as the juggler starts adding more balls, things get harder and harder. Each new ball added significantly increases the complexity of the task. You see, a juggler only has two hands and each hand can only handle one ball at a time. The same holds true with computers. No matter how many cores a CPU has, at some point you are handling more tasks than the number of CPU cores you have. Each core can only handle one task at a time. Now a CPU may be optimized to do some things in parallel within itself, but even there it has limits. The point is that just like the juggler, a computer can do only so much work.
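The juggler picture can be made concrete with a toy model. The sketch below is not taken from Sutter's talk or from any real scheduler; the function names and the numbers (100 ms of CPU work per task, 10 ms time slices, 1 ms per context switch) are assumptions chosen only for illustration. It simulates round-robin time slicing on each core and charges a fixed cost whenever a core switches from one task to another.

```python
from collections import deque

def core_finish_time(num_tasks, work_ms, slice_ms, switch_ms):
    """Time for ONE core to finish num_tasks CPU-bound tasks under round-robin slicing."""
    queue = deque((task_id, work_ms) for task_id in range(num_tasks))
    clock = 0
    last_task = None
    while queue:
        task_id, remaining = queue.popleft()
        if last_task is not None and task_id != last_task:
            clock += switch_ms              # hypothetical fixed cost of a context switch
        run = min(slice_ms, remaining)
        clock += run
        if remaining - run > 0:
            queue.append((task_id, remaining - run))
        last_task = task_id
    return clock

def machine_finish_time(num_tasks, cores, work_ms, slice_ms, switch_ms):
    """Spread tasks evenly over the cores; the machine is done when the slowest core is."""
    per_core = [num_tasks // cores + (1 if c < num_tasks % cores else 0)
                for c in range(cores)]
    return max(core_finish_time(k, work_ms, slice_ms, switch_ms) for k in per_core)

print(machine_finish_time(4, 4, 100, 10, 1))   # 100: one task per core, no switching
print(machine_finish_time(8, 4, 100, 10, 1))   # 219: two tasks per core, plus switch cost
print(machine_finish_time(2, 1, 100, 10, 1))   # 219: versus 200 if run back to back
```

In this model extra cores help until the tasks outnumber them; past that point each core pays for all of its tasks' work plus the switching. The model deliberately covers only CPU-bound work and ignores time that tasks spend waiting on I/O.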

Sutter in his talk tries to tell us that performance still matters and will matter even more in the future. Why? Mobile computers, because of their small form factor, have less power and resources, so they have to be used carefully. Computers where form factor is not an issue, like data centers, do more work today, not less, so hardware is being pushed to the limit.

But look how far computer hardware has come in just a few decades. The problem is not that computer hardware is failing to keep pace; it is how we design software and our lack of appreciation for using limited resources effectively. Programmers just don't get it. We can't keep blaming the hardware or just keep passing the buck to the hardware. Performance is our problem to solve, but sadly it is a low priority for many.

A Lesson From Old Timers

Programmers who date back to the days of DOS or earlier are often looked upon as relics in this modern age. Some of them still use command line compilers, rather than the latest studio development environment. Whether businesses or the enterprise realize it or not, there is a great asset there that could and should be tapped into. Such programmers likely know what it means to get every last bit of power out of a computer. They know what it means to work with limited memory. They know what it means to count CPU cycles. They may still use a stop watch to time how long some code takes to run.

Such talent is rare today, in my opinion. Programmers, no matter what development language they use, who know how to write software with performance in mind, are valuable to the industry. When I wrote my article about sending programmers to Walmart to buy their computers, few who commented could appreciate the purpose behind it. They just don't get the point. Some of what I was saying was in jest. I don't expect programmers to use low-power computers. The mindset matters -- that performance is important. Whatever it takes, even making a programmer use a less-powerful computer, to help him or her see the importance of performance can be beneficial.

Get Back to the Basics

The programming world may be so wrapped up in the latest technologies, the latest frameworks and the latest development studios, thinking it all makes us more productive, that it forgets what a computer really is and how it really works. Just mention the word "assembler" or "machine language" and some programmers may just shudder! They may think such terms belong in the Dark Ages. Well, guess what? That is how a computer actually works under the hood. Everything we develop using high-level tools eventually will be executed on a computer at the lowest level, which is machine language. What we design using any high-level tool or language must somehow become machine language.

This is why old timers moved away from interpreted languages in favor of compilers. While a scripting language can be useful in some contexts, remember it takes longer for scripting to run than pure machine language. While I don't expect us all to go back to assembler, nor do I feel that scripting languages have no value, it is still important to remember to always count the cost of how we develop software. Productivity is important too, but not at the cost of all performance. Actually productivity tools designed with performance in mind can be very powerful. Even scripting languages can be optimized for speed and performance.
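As a rough illustration of that cost, the sketch below times the same summation two ways in Python: once as a loop that the interpreter executes step by step, and once through the built-in `sum`, whose loop runs inside compiled code. This is an assumption-laden example rather than anything from the article: it presumes the standard CPython interpreter, the function names are made up for illustration, and the actual timings will vary from machine to machine.

```python
import timeit

def loop_sum(n):
    """Add 0..n-1 one interpreted step at a time."""
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    """Same result, but the loop runs inside the interpreter's compiled code (CPython)."""
    return sum(range(n))

def compare(n=100_000, repeats=20):
    """Return total seconds spent in each version over `repeats` runs."""
    return {
        "loop": timeit.timeit(lambda: loop_sum(n), number=repeats),
        "builtin": timeit.timeit(lambda: builtin_sum(n), number=repeats),
    }

if __name__ == "__main__":
    timings = compare()
    print(f"interpreted loop: {timings['loop']:.4f} s")
    print(f"built-in sum:     {timings['builtin']:.4f} s")
```

Both functions return the same value; only the amount of work done by the interpreter differs. Measuring, rather than guessing, is the point of the exercise.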

But programmers always have to keep performance in mind. Just because computers appear to have extra power (or resources) to spare does not mean that programmers should not care about developing fast software that uses minimal resources. Maybe we just need to get back to the basics. Performance matters.

Photo Credit: Jose Gil/Shutterstock

Chris Boss is an advanced Windows API programmer and developer of the 10-year-old EZGUI, which is now at version 5. He owns The Computer Workshop, which opened for business in the late 1980s. He originally developed custom software for local businesses. Now he develops programming tools for use with the PowerBasic compiler.

27 Responses to Software performance matters

  1. Matt Johnson says:

    Thanks. I've had these discussions/debates with my friends and I tend to agree with you.

  2. Adas Weber says:

    I think programmers do get it.

    When WP7 first launched, many programmers simply tried to port their existing desktop Silverlight applications to WP7, only to find that they ran crap on the phone! The solution was to rewrite the applications and optimise for performance.

    That said, many modern mobile devices are optimised for specific multi-threading scenarios, in particular splitting the workload between the GPU and the CPU. And Microsoft even explains how to utilise this technique efficiently in order to squeeze the maximum performance from an app.

    • chrisboss says:

      You are definitely right about data connectivity being a bottleneck. I do think though that Herb Sutter touches on this when he says that performance at data centers trickles down to all the devices in between, from the mobile device to the data center. The internet is overworked. I have decent DSL, but half the time my PC is waiting for the data. Windows XP has a nice little feature, where the tray icon blinks when data is being sent or received. Half the time when I browse the web, no data is moving. The servers on the other end take too long to respond. So performance on the server end in data centers is critical to the whole thing. So once again, Herb Sutter is correct when he says that performance matters.

      • olivierc says:

        What makes you think that the lag is because the server is busy? This makes no sense.

  3. Jeremy Collake says:

    The premise is true, though many programmers make the mistake of assuming C++ is compiled to native code. In fact, Microsoft has created a formulation of C++ that allows it to be compiled to MSIL (.NET), so you should say 'native C++', IMHO.

  4. Jeremy Blosser says:

    Wow -- imagine my surprise that you have no concept of how multithreading actually works.

    But hey, why stop at C++? Why aren't all those lazy programmers writing hand-coded assembly optimized for all the SIMD instructions!? Oh, and make sure that we are all offloading what we can to GPUs too. Because all that processing power is being wasted...


    Here is where you really show your naivete though: "Let me give you an example of how programmers may not fully appreciate how performance is really achieved -- Asynchronous"

    If you knew anything about .NET, you'd understand that the Async keyword is just syntactic sugar over techniques that are already heavily leveraged in well programmed .NET programs. It is a nice way to keep the flow of logic for asynchronous operations together in a method call. It isn't some weird magic mumbo-jumbo multithreaded "thingie" that didn't exist in .NET until now -- it is just an easier way to organize your multithreaded code. (Behind the scenes it rewrites the code into a state machine.)

    Here's the thing -- multithreading is incredibly useful, even on a single-core processor. The reason is that in a vast majority of programs, the bottleneck isn't the CPU, the bottleneck is IO. So, while the CPU is waiting on bits to come in via the network card, hard disk controller, memory controller, video card, etc... it can amazingly be used to do more useful tasks.

    Let's take your juggler metaphor. I'll base my numbers on a best-case network latency of 1ms, and a typical case ARM CPU frequency of 1GHz -- while that network call is happening, the CPU can perform 1,000,000 operations. Based on my rough estimate while juggling 3 balls here, I'm going to say a typical ball being juggled is in the air for approximately 1 second.

    Okay, let's do the math:

      1GHz = 1,000,000,000 CPU operations/second
      1ms = .001 seconds
      1,000,000,000 * .001 = 1,000,000 operations per ms

    1x10^6 operations are accomplished while we are waiting for the network call to finish (note: this is just a simple ping, if we were requesting a lot of data, it'd take even longer).

    So, converting this back into Juggler Terms: the CPU juggler throws a ball in the air, and 16,666 minutes later it comes back down. Maybe our CPU juggler can throw a few more balls in the air, eh?

    Next, let's take some guidance from Intel on where they see the future of computing going. Intel has run into a bit of a wall on scaling the GHz rating of CPUs (turns out heat causes all sorts of problems) -- so instead of scaling up the CPU frequency, they are adding more cores. And guess how programmers can best utilize more cores? Yep, multithreading.
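The arithmetic in the comment above is easy to check. The short sketch below uses the commenter's own figures (a 1 GHz clock and 1 ms of network latency) and keeps his simplification that one clock cycle equals one operation; the variable names are just for illustration. It also converts the result into "juggler time", treating each operation as one 1-second throw, which comes to about 16,666 minutes.

```python
CPU_HZ = 1_000_000_000        # 1 GHz, assuming one operation per cycle (a simplification)
LATENCY_MS = 1                # best-case network latency from the comment

# Operations the CPU could perform while one network call is in flight
ops_during_wait = CPU_HZ * LATENCY_MS // 1000

# Juggler terms: each operation is one throw that stays in the air for 1 second
juggler_seconds = ops_during_wait * 1
juggler_minutes = juggler_seconds // 60
juggler_days = juggler_seconds / 86_400

print(ops_during_wait)            # 1000000
print(juggler_minutes)            # 16666
print(round(juggler_days, 1))     # 11.6
```

In other words, if the juggler waited for each ball the way a blocked thread waits for a 1 ms network call, a single throw would take more than eleven days to come back down.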

    • Jeremy Collake says:

      I agree with much of this, but to add a *little* counter-weight -- and I said some of this in my prior post about how we hit the thermal limit on single CPU performance (or come increasingly close). However, we must also consider that just because you see 1% CPU utilization, that doesn't mean that the speed of your CPU and the code it executes isn't important. MICRO-BURSTS of activity make a big difference as far as responsiveness goes, assuming other bottlenecks aren't blocking (which is indeed common). Anyway, my point is that the metric we often use of x% CPU utilization is skewed because it is averaged over a relatively long period (usually 1 second). In reality, the CPU is either executing code, or not, at any given point.
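The averaging effect described in this reply can be shown with a few lines of arithmetic. The sketch below is a made-up example, not a measurement: it assumes one second of CPU activity sampled at 1 ms resolution, with a single 10 ms burst of work placed at an arbitrary position.

```python
def utilization(samples):
    """Fraction of samples in which the CPU was busy (True)."""
    return sum(samples) / len(samples)

# One second at 1 ms resolution: idle except for one 10 ms burst (position is arbitrary)
second = [False] * 1000
for ms in range(200, 210):
    second[ms] = True

average = utilization(second)            # what a 1-second utilization meter reports
burst = utilization(second[200:210])     # what the CPU is doing during the burst itself

print(average)   # 0.01
print(burst)     # 1.0
```

A meter that reports 1% says nothing about how long the user waited during those 10 ms; halving the time the burst takes halves that wait, even though the averaged figure barely moves.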

      Also, the SMP barrier is a tricky one. Parallelism is something we programmers have a hard time dealing with because most operations are inherently linear in nature.

    • Jeremy Collake says:

      I also agree, his statement about multi-threading being ONLY concurrent execution seemed to totally ignore SMP. WTH!?? It is as if he missed the last decade. And you ARE right, multi-threading yields benefits even for single CPU systems that execute it concurrently because most threads are in a wait state most of the time.

    • Jeremy Collake says:

      And you are also right that he totally seemed to misunderstand asynchronous calls, which have been in computing ever since programming was invented. It's not some new .NET buzzword, lol. Right on ;)

  5. olivierc says:

    It is funny that Microsoft is saying something about this, and you transpose it to Windows Phone. On Windows Phone, at least for now, you do not have access to any native SDK; you have to use .NET. That actually prevents me from porting my SDK to this platform (I really do not want to mess with C++/CLI).

    • Jeremy Collake says:

      I hear you on not wanting to mess with C++/CLI.. it's an abomination ;p. I am still traumatized by a 'senior' programmer (owner of a company) who simply wouldn't believe me when I told him that C++ could be compiled to *managed* code. Indeed, it is counter-intuitive, and you basically have to craft the code using .NET OO objects, so as to maintain the capability to be managed code.

      The reason I'm still traumatized is I sent this dude evidence about it, but he still wouldn't admit such a thing existed. It made me want to scream. If I'm wrong, I admit it. I therefore now have a policy where I will *never* work with anyone who won't admit when they are wrong.

      • Shirondale Kelley says:

        No, you're not crazy for wanting people to admit when they're wrong. It's not like you're going to dig into them forever, you just want progress. Their hiding/lying/willful ignorance impedes that progress.

  6. Jeremy Collake says:

    After reading your article more closely, I have a problem with the following statement, which seems to reflect a fundamental misunderstanding of *modern* SMP architectures. You speak of concurrency when they are doing true simultaneous execution. 10 years ago it might have been somewhat true, though another poster points out that even on single CPUs, multi-threading is useful. With SMP (multi-CPU) systems, this is no longer the case, as code is truly executed simultaneously, not just concurrently.

    HIS QUOTE (*not* mine):
    "Multithreading does not speed up software. It simply forces a computer to do two tasks at one time, simply by switching between them so fast you don't notice the switch. But no matter how you word it, the computer is trying to do two tasks, rather than just one, which means more work, not less and the switching process has overhead, which means even more work."
    <-- NOT TRUE for SMP systems.

    • chrisboss says:

      I write software using PowerBasic (great-grandchild of the famous Turbo Basic) and it can produce executables with performance and compactness on par with C or C++. I also work with things like the low-level Windows APIs, including threading. While it is true that multicore CPUs can do more work than a single core, no matter how you "slice it" some CPU or core must execute machine language and it can only do so much work. Windows API programmers who work with threading learn quickly that threads are not always the solution to performance problems, and that they can create their own sort of bottlenecks. The context switching between threads that the operating system must do also adds overhead. Just because C# may make the use of threading easier for the programmer does not mean that the problems associated with threading are no longer there. Now when you watch the video of Herb Sutter's talk, do you hear him saying anything about multicore CPUs solving the problem with performance, so that managed languages somehow become on par with C or C++? No! Because no matter the hardware, performance (aka how you write your code and the language you use) matters. Likely many C# programmers today view C++ programmers the same way they may view a PowerBasic programmer. That's why Herb's talk was entitled "Why C++?", as if to ask why you aren't using C# instead of C++. The answer: performance!

      • Jeremy Collake says:

        I do agree with your premise that performance matters, and do applaud your efforts to get software developers to care more about performance. Don't get me wrong. I do apologize for any harsh or abrasive statements. 

    • chrisboss says:

      Don't laugh at Basic. True, many Basics may be dated, but I use the latest PowerBasic compiler and it can rival any C or C++ compiler you may be using today when it comes to performance. The PowerBasic compiler itself was written in assembler and compiles at amazing speeds. The people at PowerBasic have the motto "smaller, faster" and are experts at getting the most out of an x86 CPU. Now when a C programmer starts writing apps using the more common C runtimes, they are still large in size compared to what I can do with PowerBasic.

  7. Jeremy Collake says:

    Ok, I've gone through this, and other posts of his ... my comments are below. I write native C++, tight, and optimized code. I believe he is right in that code speed and efficiency *is* important. HOWEVER, this article has so many fundamental misunderstandings.. ugh.. I gotta clear it up.

    1. I say this NOT to encourage bloated code, but to dispute his claim that multi-threading is 'bad' .. He seems to forget we now have more than one CPU even in our phones. Therefore, concurrency has been replaced with true simultaneous operations, even for the memory controller. As the other poster pointed out, and I seconded, multi-threading even on a single CPU architecture is beneficial because most threads are in a wait state, therefore allowing concurrent execution without penalty in many cases.

    2. Asynchronous calls are not anything new with .NET; they have been around since the advent of computing. The idea is not to have to *wait* for I/O before continuing to do other things. I just don't know what in the world you are trying to say here, but it sure isn't harmful. The alternative is to sit there and wait for that I/O to be returned to you, halting everything in that thread until it's done.

    3. You are simply not up to speed... I *only* code native C++, and before that I used mostly only x86 assembly. So, I try to write fast and tight code, but what you're talking about is like something written in 1980 that mis-predicted the direction of technology. You need to get up to date before you write about programming.

    • chrisboss says:

      I was not saying threading is bad. Yes, the new generation of multicore CPUs makes threading very useful and the latest generation of CPUs has all sorts of new tricks. What bothered me was that when I discussed the importance of performance in other articles, programmers came back with "just use asynchronous" coding and you once again get performance and the fast and fluid experience. As a Windows API programmer I grasp the value of threading, but also know when "not" to use it. Threading does not solve all problems. An easy mistake one can make is to think that if one thread can improve the flow of an app, then more must be better. That is definitely wrong. Making threading even easier via asynchronous code opens up even more problems. Threading is not bad. The abuse of threading is what is bad. Lastly, there is also the danger of the mindset that when performance suffers, it can always be solved with better hardware. It is the programmer's responsibility to treat performance as just as vital as productivity. If one looks at the "average" experience of most computer users today, with all the power we have today in the hardware, why is it that we tend to always be waiting on the computer and not see the "fast and fluid" experience we desire?

      • Adas Weber says:


        Indeed, threading does not solve all performance problems. But using C++ doesn't solve all performance problems either.

        The reason why we often have to wait on the computer is because quite often the bottlenecks are I/O related. No amount of C++ optimisation can overcome I/O bottlenecks that are beyond your control. That's where asynchronous methods come in handy!!!
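The I/O point made in this reply can be sketched as follows. This is an illustrative example under stated assumptions, not a benchmark: `fake_io` is a hypothetical stand-in that simply sleeps where a real program would block on a network or disk call, and the delays are invented. The same five waits are run one after another and then on a thread pool, where they overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay_s):
    """Hypothetical stand-in for a blocking network or disk call."""
    time.sleep(delay_s)          # the thread is idle here, not using the CPU
    return delay_s

def run_serial(delays):
    """Issue the calls one at a time; total time is roughly the sum of the waits."""
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return results, time.perf_counter() - start

def run_threaded(delays):
    """Issue the calls on separate threads; the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    delays = [0.1] * 5
    _, serial_s = run_serial(delays)
    _, threaded_s = run_threaded(delays)
    print(f"serial:   {serial_s:.2f} s")
    print(f"threaded: {threaded_s:.2f} s")
```

No code was made faster here. The threaded version finishes sooner only because the waiting happens at the same time, which is also why this technique does nothing for work that is limited by the CPU itself.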

  8. Adas Weber says:

    Async calls are not about solving performance issues. They are about simplifying how you write your code so that your application is able to do other work while waiting for slow tasks to complete.

    Non-async (synchronous) methods require that you either wait for the task to finish, or manually poll the status of the task so that you can do something else in the mean time.

    With ASYNC methods, you are simply saying to the method "go and do your task, and tell me when you are done by firing an event", so that I don't have to wait for it to finish and I don't have to poll for its status. It makes a programmer's life much easier without adding any extra performance overhead.
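The "tell me when you are done" pattern described above can be sketched with Python's `asyncio`, used here only as an analogue of the .NET feature under discussion. The names `slow_task` and `main` and the delays are invented for illustration; `asyncio.sleep` stands in for a slow I/O operation. The caller starts the task, registers a callback to be notified on completion, and carries on with other work in the meantime.

```python
import asyncio

async def slow_task(log):
    """Hypothetical slow operation; asyncio.sleep stands in for waiting on I/O."""
    log.append("task started")
    await asyncio.sleep(0.05)     # yields control to the event loop while waiting
    log.append("task finished")
    return 42

async def main():
    log = []
    task = asyncio.create_task(slow_task(log))
    # "Tell me when you are done": the callback fires once the task completes
    task.add_done_callback(lambda t: log.append(f"notified: result={t.result()}"))

    # The caller neither blocks nor polls; it simply gets on with other work
    for step in range(3):
        log.append(f"other work {step}")
        await asyncio.sleep(0.01)

    await task                    # by now the caller has nothing else to do
    await asyncio.sleep(0)        # give any pending callbacks a chance to run
    return log

if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Everything in this sketch runs on a single thread. The event loop switches between the caller and the task only at the `await` points, so the task's waiting time is spent on other work rather than on a blocked thread.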

    • Jeremy Collake says:

      I would say that polling is a bit different, as that implies an asynchronous call was made in which you have no event notification telling you when it's done (so you have to continually check). The author here suggests synchronous calls, which are blocking by their very nature.

      • Adas Weber says:

        True. Sync methods in the strict sense are blocking; however, there are workarounds that allow you to use sync methods in a non-blocking way. But they are still workarounds, and the best way to solve it is to use proper async methods and handle the events raised.

  9. chrisboss says:

    Using threads in Windows, no matter how you do it and no matter how many cores a CPU has (mass-market PCs usually have only 2 or 3), has serious ramifications, not to be taken lightly. True, multiple cores literally allow true simultaneous execution, but remember we are talking about an operating system with a lot of things going on (ie. dozens of background services) and power users do like to multitask, so multiple apps may be running. For example, I use IExplorer for a web browser and when I am doing research I may have dozens of tabs in IE, each loading a separate web page. Things can quickly go bad when you start pushing the computer with too many tasks. Now let programmers add threads indiscriminately to their apps and see what happens. API programmers need to be careful how and when they use threads, so the same rules would apply even to a managed language. A very good book (albeit a few years old, but still valuable) is "Multithreading Applications in WIN32, the complete guide to threads" by Jim Beveridge and Robert Wiener. Even with all the new multicore CPUs, the principles it discusses are still valid. Threads have overhead. It's called a context switch, and the CPU has to do work just to make the switch. Now remember that most likely there will always be more threads running in Windows than there are cores, so context switches must occur to keep pace. Also, the better the performance of the code in the primary thread, the less one will have to resort to threads to deal with tasks that take too long, and the app can stay "fast and fluid" as Microsoft would say.

    • Brent Newland says:

      >remember we are talking about an operating system with a lot of things going on (ie. dozens of background services)

      Very true. You may think your OS is just sitting there idle, but download and run Process Monitor from Microsoft and you'll see that, just sitting there, your computer is performing tens of thousands of tasks every second.

  10. chrisboss says:

    To add one more thing. Programmers must consider the full range of computers their software may be run on. Programmers tend to lean towards leading-edge computers, while end users may actually be using much less powerful mass-market computers with only a single or dual core CPU, less RAM and a limited onboard GPU. What may run well on a development PC, with a multiple-core CPU, lots of RAM, a fast SSD and a quality video card, may run terribly on a mass-market PC. Even businesses save huge amounts of money if the software they use can run on cheaper, more mass-market computers, rather than PCs with more powerful hardware. How many programmers today will test their apps on a PC with a single core Celeron CPU which runs at less than 1 GHz, to see where the real bottlenecks are in their apps? Now that may be far below even most computers made in the last 5 years, but it quickly shows you where the bottlenecks are. How many programmers today even care about businesses who are trying to extend the life of older PCs (ie. XP) so they can save money? I develop software which will run just as well on a 10-year-old XP computer as it does on the latest PC. Such software, which can have high performance too, can save companies a lot of money.

    • Jeremy Collake says:

      I do. I care ;). I actually take old, crap filled, PCs and use them as test beds, lol.

  11. Adas Weber says:


    I've been giving this some more thought. I think one of the reasons why programmers don't necessarily focus on outright performance is that users don't actually want this. WHAT???? I hear you scream? Let me explain...

    Users want applications which are RESPONSIVE and FAST, ie fast and fluid. In order for an application to be responsive, it might need to trade-off some outright speed in terms of the actual processing task.

    Let's take a graphics processing application as an example -

    Suppose I write an image processing algorithm in C# which takes 20 seconds to apply a complex transformation or filtering. OK, that's slow. Now, suppose I decide to performance-optimise this by rewriting the algorithm in C++, using some super-duper optimal algorithms, which reduces the processing time of the same operation down to 10 seconds. Great! I have solved the performance problem.

    Except that I haven't actually solved the performance problem. Why? Because the user doesn't want to wait 10 seconds for that image to be processed. The user wants a visual indicator to see how long he still has to wait, and he also wants to look at other images while he waits for this particular image processing operation to finish. So then I have to rewrite my algorithm to provide some user feedback in terms of how the operation is proceeding, which itself adds to the computation time. I also need to free-up the UI so the user can do other things while waiting for the operation to finish, so I use some async magic which will probably also add more to the computation time. So I have deliberately slowed down the overall processing speed of the task because the user wasn't happy with the performance, but is happy now that the application is responsive even though the image processing actually takes longer than it ought to.

    The point I'm trying to make is this - users don't like waiting, because waiting leads to frustration. It's all about perceived productivity when using an application, i.e. users often feel they can be more productive when they can do other things whilst having to wait for a computational task to finish. Therefore outright processing speed of an algorithm doesn't actually solve the "fast and fluid" performance problem if the user is still having to wait.
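The trade-off described in this comment can be sketched in a few lines. This is a toy under stated assumptions, not the commenter's code: the "image" is a plain list of 8-bit values, the "filter" just inverts them, and the names `process_image` and `run_responsive` are invented. The work runs on a worker thread and posts progress to a queue, while the main loop, standing in for a UI thread, stays free to react.

```python
import queue
import threading

def process_image(pixels, progress, report_every):
    """Toy 'filter': invert 8-bit pixel values, reporting progress as it goes."""
    out = []
    for i, p in enumerate(pixels, start=1):
        out.append(255 - p)
        if i % report_every == 0 or i == len(pixels):
            progress.put(i / len(pixels))    # extra work done purely for the user's benefit
    return out

def run_responsive(pixels, report_every=250):
    """Run the filter on a worker thread while the caller collects progress updates."""
    progress = queue.Queue()
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(out=process_image(pixels, progress, report_every)))
    worker.start()

    updates = []
    # Stand-in for a UI loop: it never blocks for long, so it could handle input too
    while worker.is_alive() or not progress.empty():
        try:
            updates.append(progress.get(timeout=0.01))
        except queue.Empty:
            pass                             # a real UI would process input events here
    worker.join()
    return result["out"], updates

if __name__ == "__main__":
    image = list(range(256)) * 4             # 1024 made-up pixel values
    filtered, updates = run_responsive(image)
    print(f"{len(filtered)} pixels processed, {len(updates)} progress updates")
```

The progress reports and the hand-off between threads are pure overhead as far as the filter is concerned, so the task takes slightly longer than a bare loop would. What the user gets in exchange is an application that keeps responding while the work is done.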

© 1998-2020 BetaNews, Inc. All Rights Reserved.