Top-Down performance analysis methodology.
This article was originally posted here.
This post aims to help people who want to better understand performance bottlenecks in their application. There are many existing methodologies for performance analysis,
but not many of them are robust and formal. When I was just starting with performance work, I usually just profiled the app and dug through the hotspots of the
benchmark, hoping to find something there. This often led to random experiments with unrolling, vectorization, inlining, you name it. I’m not saying it’s always a losing
strategy. Sometimes you can get lucky and win a big performance boost from random experiments. But usually you need very good intuition and luck :).
In this post I show a more formal way to do performance analysis. It’s called the Top-down Microarchitecture Analysis Method (TMAM)
(Intel® 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.1). In this methodology we try to detect what was stalling our execution, starting from the
high-level components (like Front End, Back End, Retiring, Bad Speculation) and narrowing down to the source of performance inefficiencies.
It’s an iterative process with 2 steps:
- Identify the type of the performance problem.
- Locate the exact place in the code using PEBS (Precise Event-Based Sampling).
After fixing the performance issue, you repeat the process.
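To make the first step more concrete, here is a minimal Python sketch of how raw counter values could be turned into a top-level breakdown. It is not taken from this post. It assumes a 4-wide pipeline and the event names used in Intel's published level-1 formulas for older cores (roughly Sandy Bridge through Skylake); other microarchitectures use different events and widths. The manual names the four buckets Front End Bound, Back End Bound, Bad Speculation and Retiring. The function names and the sample numbers are made up for illustration.

```python
# Level-1 Top-Down breakdown from raw counter values (illustrative sketch).
# ASSUMPTIONS (not from the post): 4 issue slots per cycle and the event
# names from Intel's published level-1 formulas for older cores.
PIPELINE_WIDTH = 4


def top_level_breakdown(counters):
    """Return the fraction of pipeline slots attributed to each bucket."""
    slots = PIPELINE_WIDTH * counters["CPU_CLK_UNHALTED.THREAD"]
    frontend = counters["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    bad_speculation = (
        counters["UOPS_ISSUED.ANY"]
        - counters["UOPS_RETIRED.RETIRE_SLOTS"]
        + PIPELINE_WIDTH * counters["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    retiring = counters["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    # Whatever is left over is attributed to the back end.
    backend = 1.0 - (frontend + bad_speculation + retiring)
    return {
        "Front End Bound": frontend,
        "Bad Speculation": bad_speculation,
        "Retiring": retiring,
        "Back End Bound": backend,
    }


def dominant_bottleneck(breakdown):
    """Pick the largest stall bucket; Retiring is useful work, so skip it."""
    stalls = {name: share for name, share in breakdown.items() if name != "Retiring"}
    return max(stalls, key=stalls.get)


# Hypothetical counter values, chosen only to exercise the formulas.
sample = {
    "CPU_CLK_UNHALTED.THREAD": 1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 400_000,
    "UOPS_ISSUED.ANY": 1_300_000,
    "UOPS_RETIRED.RETIRE_SLOTS": 1_200_000,
    "INT_MISC.RECOVERY_CYCLES": 25_000,
}

if __name__ == "__main__":
    breakdown = top_level_breakdown(sample)
    for name, share in breakdown.items():
        print(f"{name}: {share:.1%}")
    print("Look deeper into:", dominant_bottleneck(breakdown))
```

The bucket with the largest share tells you which part of the hierarchy to drill into next. The second step, finding the exact place in the code, still needs precise sampling on real hardware and is not covered by this sketch.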
If it doesn’t make sense to you yet, don’t worry, it’ll become clear with the example.
15 Jan 2024 @ 22:59 PM | Posted by Denis Bakhvalov