Performance Hot Spots



From a software performance standpoint, “hot spots” are areas of intense activity. They’re hot spots because they’re frequently executed code paths with some sort of friction or bottleneck. They represent potential optimization paths for improving the performance of your code or design. You find hot spots by measuring and analyzing the system.

Stepping back, we can use “Hot Spots” more loosely.  We can use them to gather, organize, and share principles, patterns, and practices for performance.  You can think of Performance Hot Spots as a map or frame.  Remember that even with a map, you still need to set goals and measure.  You need to know what good looks like and you need to know when you’re done.  The benefit of a map or frame is that you have a set of potential areas to help you focus your exploration as well as find and share knowledge.

Why Performance Hot Spots
There are several reasons for using Performance Hot Spots:

  • Performance Hot Spots are a way to chunk up the performance space.
  • Performance Hot Spots create more meaningful filters.
  • They provide a living map to help you think about performance issues and solutions.
  • You can use Performance Hot Spots to help guide your inspections (performance design, code, and deployment inspections).
  • You can use Performance Hot Spots to divide and conquer your performance issues.

Hot Spots for Performance (Application Level)
The Performance Hot Spots at the application level are:

  • Caching
  • Communication
  • Concurrency
  • Coupling / Cohesion
  • Data Access
  • Data Structures / Algorithms
  • Exception Management
  • Resource Management
  • State Management

Performance Issues Organized by Performance Hot Spots
Here are some performance issues organized by Performance Hot Spots.

Caching
  • Round trips to the data store for every single user request; increased load on the data store
  • Increased client response time, reduced throughput, and increased server resource utilization
  • Increased memory consumption, resulting in reduced performance, cache misses, and increased data store access
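
The caching issues above can be illustrated with a minimal sketch (the `load_profile` functions and the round-trip counter are hypothetical stand-ins for a real data layer, assumed for illustration): caching turns ten identical requests into a single data-store round trip.

```python
from functools import lru_cache

DB_CALLS = 0  # counts simulated round trips to the data store

def load_profile_from_store(user_id):
    """Hypothetical data-store round trip (a stand-in, not a real API)."""
    global DB_CALLS
    DB_CALLS += 1
    return {"id": user_id, "name": f"user-{user_id}"}

@lru_cache(maxsize=1024)
def load_profile(user_id):
    """Cached lookup: repeated requests hit the cache, not the store."""
    return load_profile_from_store(user_id)

# Ten requests for the same user cost one round trip instead of ten.
for _ in range(10):
    profile = load_profile(42)
```

The trade-off is the third bullet above: the cache itself consumes memory, so size it (`maxsize`) against your working set.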

Communication
  • Multiple round trips to perform a single operation
  • Increased serialization overhead and network latency
  • Cross-boundary overhead: security checks, thread switches, and serialization
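
The multiple-round-trips issue above is often addressed by batching. A minimal sketch, with the transport simulated by a log of calls (names are illustrative):

```python
def send_each(items, call_log):
    """Chatty: one network round trip per item (transport simulated
    by appending to call_log)."""
    for item in items:
        call_log.append([item])

def send_batched(items, call_log, batch_size=50):
    """Chunky: amortize latency and serialization overhead by sending
    items in batches."""
    for i in range(0, len(items), batch_size):
        call_log.append(items[i:i + batch_size])

items = list(range(200))
chatty_log, batched_log = [], []
send_each(items, chatty_log)      # 200 round trips
send_batched(items, batched_log)  # 4 round trips for the same data
```

Each round trip pays serialization and latency once, so fewer, larger calls usually win across a network boundary.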

Concurrency
  • Stalls the application, and reduces response time and throughput
  • Stalls the application, and leads to queued requests and timeouts
  • Additional processor and memory overhead due to context switching and thread management overhead
  • Increased contention and reduced concurrency
  • Poor choice of isolation levels results in contention, long wait time, timeouts, and deadlocks
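
One common mitigation for the thread-management and context-switching overhead noted above is a bounded worker pool rather than a thread per request. A minimal sketch using Python's standard library (`handle` is a hypothetical stand-in for per-request work):

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request_id):
    """Hypothetical per-request work."""
    return request_id * 2

# A bounded pool reuses a small, fixed set of threads instead of
# creating one thread per request, capping context-switch and
# thread-management overhead as load grows.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, range(100)))
```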

Coupling and Cohesion
  • Mixing functionally different logic (such as presentation and business) without clear, logical partitioning limits scalability options
  • Chatty interfaces lead to multiple round trips
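
A chatty interface can be contrasted with its chunky counterpart in a small sketch (`fetch_field`, `fetch_details`, and the round-trip counter are hypothetical):

```python
RECORD = {"name": "Ada", "email": "ada@example.com", "city": "London"}
ROUND_TRIPS = 0

def fetch_field(customer_id, field):
    """Chatty: every field is a separate (simulated) round trip."""
    global ROUND_TRIPS
    ROUND_TRIPS += 1
    return RECORD[field]

def fetch_details(customer_id):
    """Chunky: one coarse-grained call returns the whole record."""
    global ROUND_TRIPS
    ROUND_TRIPS += 1
    return dict(RECORD)

chatty = [fetch_field(1, f) for f in ("name", "email", "city")]
chatty_trips = ROUND_TRIPS      # 3 round trips for 3 fields

ROUND_TRIPS = 0
chunky = fetch_details(1)
chunky_trips = ROUND_TRIPS      # 1 round trip for the same data
```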

Data Access
  • Increased database server processing
  • Reduced throughput
  • Increased network bandwidth consumption
  • Delayed response times; increased client and server load
  • Increased garbage collection overhead
  • Increased processing effort required
  • Inefficient queries or fetching all the data to display a portion is an unnecessary cost, in terms of server resources and performance
  • Unnecessary load on the database server
  • Failure to meet performance objectives and exceeding budget allocations
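
Several of the data access issues above come down to fetching all the data to display a portion. A minimal paging sketch (in a real data layer the limit and offset would be pushed into the query itself, e.g. `SELECT ... LIMIT 20 OFFSET 60`, so the server never materializes the full result set):

```python
def fetch_page(all_rows, page, page_size=20):
    """Return only the requested page of a result set instead of the
    whole thing (a sketch; a real implementation pages in the query)."""
    start = page * page_size
    return all_rows[start:start + page_size]

all_rows = list(range(1000))                # stand-in for a large table
first_page = fetch_page(all_rows, page=0)   # 20 rows, not 1000
fourth_page = fetch_page(all_rows, page=3)
```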

Data Structures / Algorithms
  • Reduced efficiency; overly complex code
  • Passing a value type where a reference type is expected, causing boxing and unboxing overhead
  • Complete scan of all the content in the data structure, resulting in slow performance
  • Undetected bottlenecks due to inefficient code.
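
The "complete scan of all the content" issue above is a data structure choice. A small sketch comparing list membership (a linear scan) against set membership (a hash lookup):

```python
import timeit

ids_list = list(range(100_000))
ids_set = set(ids_list)

# A list membership test scans every element (O(n)); a set membership
# test is a hash lookup (O(1) on average).
list_time = timeit.timeit(lambda: 99_999 in ids_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in ids_set, number=100)
```

On a worst-case element like the last one, the scan is orders of magnitude slower; this is the kind of cost that stays undetected until you measure.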

Exception Management
  • Round trips to servers and expensive calls
  • Expensive compared to returning enumeration or Boolean values
  • Increased inefficiency
  • Adds to performance overhead and can conceal information unnecessarily
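
The "expensive compared to returning enumeration or Boolean values" issue suggests a tester/try-parse style for expected failures. A minimal sketch (function names are illustrative):

```python
def try_parse_price(text):
    """Tester/return-value style: expected bad input takes the cheap
    Boolean path instead of forcing every caller to catch exceptions."""
    try:
        return True, float(text)
    except (TypeError, ValueError):
        return False, 0.0

ok, value = try_parse_price("19.99")   # (True, 19.99)
bad, _ = try_parse_price("n/a")        # (False, 0.0)
```

Exceptions still belong on genuinely exceptional paths; the point is not to use them for routine control flow on hot paths.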

Resource Management
  • Can result in creating many instances of the resources along with its connection overhead
  • Increase in overhead cost affects the response time of the application; Not releasing (or delaying the release of) shared resources, such as connections, leads to resource drain on the server and limits scalability
  • Retrieving large amounts of data from the resource increases the time taken to service the request, as well as network latency
  • Increase in time spent on the server also affects response time as concurrent users increase
  • Leads to resource shortages and increased memory consumption; both of these affect scalability
  • Large numbers of clients can cause resource starvation and overload the server.
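
Pooling plus prompt release addresses the first two resource management issues above. A minimal sketch (the `Connection` and `ConnectionPool` classes are illustrative, not a real driver):

```python
OPENED = 0

class Connection:
    """Hypothetical expensive resource; opening one is the costly part."""
    def __init__(self):
        global OPENED
        OPENED += 1

class ConnectionPool:
    """Reuse connections across requests. Acquire late, release early:
    holding a connection longer than needed drains the pool."""
    def __init__(self):
        self._free = []

    def acquire(self):
        return self._free.pop() if self._free else Connection()

    def release(self, conn):
        self._free.append(conn)

pool = ConnectionPool()
for _ in range(100):        # 100 requests share one physical connection
    conn = pool.acquire()
    # ... do the minimum work that needs the connection ...
    pool.release(conn)      # release promptly so others can reuse it
```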

State Management
  • Holding server resources and can cause server affinity, which reduces scalability options
  • Limits scalability due to server affinity
  • Increased server resource utilization; limited server scalability
  • In-process and local state stored on the Web server limits the ability of the Web application to run in a Web farm
  • Large amounts of state maintained in memory also create memory pressure on the server
  • Increased server resource utilization, and increased time for state storage and retrieval
  • Inappropriate timeout values result in sessions consuming and holding server resources for longer than necessary.
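
A session store with an explicit timeout addresses the last issue above. A minimal sketch (names are illustrative; a production version would live out of process, e.g. in a state server or database, to avoid the server-affinity problems listed above):

```python
import time

class SessionStore:
    """Sketch of a session store with a timeout, so idle sessions stop
    holding server resources instead of lingering indefinitely."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._sessions = {}

    def put(self, session_id, state):
        self._sessions[session_id] = (state, time.monotonic())

    def get(self, session_id):
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        state, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._sessions[session_id]   # expired: reclaim it
            return None
        return state

store = SessionStore(ttl_seconds=0.05)
store.put("s1", {"cart": [1, 2]})
fresh = store.get("s1")      # within the timeout: state is returned
time.sleep(0.1)
expired = store.get("s1")    # past the timeout: reclaimed, returns None
```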
Case Studies / Examples
Using Performance Hot Spots produces results. Here are examples of Performance Hot Spots in action:

Questions for Reflection
Hot Spots are a powerful way to share information. Here are some questions to help you turn insight into action:

  • How can you leverage Performance Hot Spots to improve performance results in your organization?
  • How can you organize your bodies of knowledge using Performance Hot Spots?
  • How can you improve sharing patterns and anti-patterns using Performance Hot Spots?
  • How can you improve checklists using Performance Hot Spots?
  • How can you tune and prune your performance inspections using Performance Hot Spots?



    1. This blog seems to be very high-level, and I would suggest there is room for a low-level perspective as well, especially for the problem of finding and removing performance problems in specific software.

      First, I would suggest not using terms unless they have a clear definition, rather than hand-waving. For “bottleneck” I have never heard a clear definition. For “hotspot” I think there is a clear definition, though it is less important than commonly regarded.

      I have a lot of practical experience doing performance tuning of software, and depending on how bad the problems are, often achieve speedup factors between 10x and 200x.

      Rather than the terms “hotspot” or “bottleneck”, I prefer the term “slowness bug”, or “slug” for short, although it sounds more professional to say “performance problem”. In any case, they are bugs. What they do wrong is take too much time. They are like bugs in that they must be found, and fixed. The fix always consists of changing some lines of code, and the finding consists of finding those lines.

      Here’s my method, which is tried-and-true, even though it flies in the face of accepted wisdom. (Accepted wisdom has gone without being questioned for far too long.) Your first reaction to this method may be that it is naively simple, but in fact that is why it is so effective.

      1. While the program is being slow, I manually interrupt it at a random time, and make a record of the call stack. The call stack is a list of instruction addresses, all of which are function call instructions except the “bottom” one. (The distinction between a function, and a function CALL, cannot be stressed enough. The latter is packed with useful information.)

      2. Then I take another sample of the call stack in the same way. Then I compare the two, looking for any instruction address that appears on both of them.

      Now, here is the key point: any instruction appearing on both samples, _if it can be avoided_, will save a large fraction of execution time.

      3. In the typical case that I do _not_ find a repeated instruction that can be avoided, I repeat step 2, as many times as necessary, building up a collection of call stack samples, and each time looking for instruction addresses that appear on multiple samples, _and_ that can be avoided. As soon as I find one, I fix it, rebuild, and re-run. The program is now faster, guaranteed. (If I go past 20 samples without finding something to fix, I’ve hit the point of diminishing returns, and stop.)

      4. I repeat this overall cycle 1,2,3, each time finding and removing a slowness bug until it is difficult to get samples because the program runs too fast. Then I insert a temporary outer loop to make it take longer, and keep at it, until I can no longer find things that I can fix. Then I remove the outer loop, and the program is much, much faster than it was to begin with, depending on how many problems it had to begin with.
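
      [Editor's note] The manual-pause procedure in steps 1–4 can be roughly approximated in-process. The sketch below is an illustration, not the commenter's code: a background thread captures the main thread's stack at intervals, then reports call sites that appear on a large fraction of samples. It cannot see time spent outside the interpreter, but it shows the core idea of comparing whole call stacks for repeated call sites.

```python
import sys
import threading
import time
import traceback
from collections import Counter

def sample_stacks(thread_id, samples, interval, stop_event):
    """Capture the target thread's call stack at intervals, recording
    the set of (file, line) call sites seen on each sample."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            stack = traceback.extract_stack(frame)
            samples.append({(f.filename, f.lineno) for f in stack})
        time.sleep(interval)

def busy_work():
    # Stand-in for the code being tuned; this loop is the "slug".
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

samples = []
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), samples, 0.01, stop),
)
sampler.start()
busy_work()          # run the slow code on this (the sampled) thread
stop.set()
sampler.join()

# Call sites on a large fraction of samples are where the time goes.
counts = Counter(site for sample in samples for site in sample)
hot_sites = [site for site, n in counts.items()
             if n >= 0.5 * len(samples)]
```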

      There are problems for which the call stack alone is not enough information to identify the problem, and for those I augment this with other techniques, but that’s too complicated to explain here.

      When I try to teach this method I run into standard objections, none of which are shared by the small group of people who actually use it.

      Objection 1: It is just a sampling method, with a very small sample size, so it could give spurious results.

      Answer: Yes it is sampling, but the key difference is in _what_ it samples. Retaining the program counter alone does not tell you much, except in very small programs. Retaining only the identities of the functions in the call stack, without the line-number or call site address information, is discarding key information.

      As far as the results being spurious, consider the smallest useful sample size, two samples of the call stack. If a call instruction (or any instruction) appears on the first sample, that does not tell you much. However, if it appears on both samples, that is _highly unlikely_ to happen by chance. Some instructions will obviously be on both samples, like “call _main()”, but those cannot be avoided. You are looking for instructions that _can_ be avoided.

      If you change the code so that such an avoidable function call is performed less often or not at all, the amount of time saved is approximately the percentage of samples that contained that function call. If this is not understood, please think about it until it is.

      Objection 2: It will stop someplace random, and will miss the _real_ problem.

      Answer: This comes from having a preconception of what the _real_ problem is. You don’t have to think about this very hard to understand that a) if it finds a problem, that problem is _real_, and b) if a problem is _real_, it will find it.

      Objection 3: It will not work if there is recursion.

      Answer: Call-graph profilers are confused by recursion, but this method is not. Remember the method is to locate avoidable call instructions that appear on multiple samples, and the amount of time that could be saved is approximated by the percentage of samples containing such an instruction. This remains true even if any sample contains duplicate instructions. If this is not understood, please think about it until it is.

      Objection 4: What about multi-thread programs?

      Answer: Each thread by itself can be tuned. Then, there can be cross-thread dependencies, where one thread waits for another, or for a message from a remote process. In these cases, stack sample information is not sufficient, and more information needs to be captured. I will not go into this here, for lack of space, but it is quite do-able.

      Finally I offered to define the term “hot-spot”. It is a small set of addresses found at the bottom of the call stack on a significant percentage of samples. A hot-spot can be a performance bug, but usually it is not. The most common type of performance bug is an avoidable function call somewhere mid-stack. So if it were up to me I would downplay the emphasis on hot-spots in discussing performance tuning.

      The term “bottleneck” implies a constriction of some sort, like the narrow end of a funnel. Maybe in a communication network such a thing makes sense. In a single thread of execution it makes no sense, because the time taken is more like an automobile driver taking a cross-country route. If he is efficient, he goes straight from A to B by the most direct route. If he is not, he makes lots of side trips, gets lost, dallies about, and takes his sweet time. Does it make sense to call that a “bottleneck”?

      Thanks for the chance to air this point of view.

    2. @ Mike

      Good stuff.

      For lower level stuff, I keep that on MSDN. This is stepping back to look at a simpler map.

      Hot Spots in this case isn’t a performance term. It’s the concept of framing out common areas.

      I think it’s helpful in software to think in terms of app level and code level. The Hot Spots examples I gave are more at the app level, though they apply to the lower level as well.
