In high-performance computing and compiler design, the smallest bottlenecks often cause the biggest headaches. We spend hours optimizing algorithms, refining memory access patterns, and unrolling loops. But a silent killer of CPU cycles lurks in the heart of modern processors: the .