In the world of high-performance computing, and especially in high-frequency trading (HFT), nanoseconds matter. As a developer learning C++, understanding why some operations are faster than others isn't just an academic curiosity. It's the beginning of writing better, faster, and more predictable code.
In this chapter, we’ll answer a seemingly simple question:
Why is integer math faster than floating-point math?
The answer lies deeper than just “the compiler does a better job.” To really understand it, we have to go all the way down to the CPU’s architecture.
The Experiment
Let’s consider two versions of the same loop. Both run one billion times. One adds 1.0 to a floating-point variable (double), and the other adds 1 to an integer (int64_t).
```cpp
// Floating-point loop
volatile double sink = 0.0;
for (long long i = 0; i < 1'000'000'000; ++i) {
    sink += 1.0;
}
```

```cpp
// Integer loop
volatile long long sink = 0;
for (long long i = 0; i < 1'000'000'000; ++i) {
    sink += 1;
}
```
When compiled in Release mode and timed, I got:
- Floating-point: ~2.8 seconds (2,800,000,000 ns)
- Integer: ~0.45 seconds (450,000,000 ns)
That’s a 6x performance difference.
What’s going on?
The CPU is an Assembly Line
Modern CPUs are built around a concept called pipelining.
Imagine a factory assembly line: instead of waiting for one item to be built from start to finish, you build many items in stages, passing each piece down the line. Every stage works in parallel, increasing throughput.
Instructions in a CPU work the same way. When you write something like sink += 1, the CPU breaks it down into micro-operations and sends them through its instruction pipeline.
But - and this is key - some instructions are simpler and faster to move through that pipeline than others.
Integer Math Is the Fast Path
Integer addition is about as simple as it gets in CPU terms. It involves:
- One ALU (Arithmetic Logic Unit) instruction
- One register read
- One register write
Modern CPUs have multiple ALUs, and each can do integer math in a single clock cycle. The CPU can queue them up, execute them in parallel, and keep its pipelines full.
💡 Result: Integer math is fast, parallelizable, and compiler-friendly.
Floating-Point Is the Slow Path
Floating-point math, even simple additions, is much more complex:
- The numbers are represented in IEEE 754 format (sign, exponent, mantissa)
- The CPU must align the exponents, add the mantissas, then normalize and round the result
- It also must handle edge cases: NaN, ±infinity, denormals, overflows, underflows
This work is done in a dedicated unit called the FPU (Floating Point Unit). Most CPUs have fewer floating-point execution units than integer ALUs, and although modern FP adders are pipelined, each addition carries a higher latency (typically 3-5 cycles versus 1 for an integer add). In a loop like ours, where every sink += 1.0 depends on the previous result, that latency can't be hidden: each add must wait for the one before it to finish.
💡 Result: Floating-point math is powerful, but it’s slower and harder to optimize.
Why This Matters (Especially in HFT)
In Python, you don’t think about this. Whether you write x += 1 or x += 1.0, the interpreter figures it out. Performance differences are hidden under multiple abstraction layers.
But in C++, you’re much closer to the metal.
If you're writing tight loops that run on every market tick, or latency-sensitive order book code, knowing that floating-point math can cost you several times more CPU time (about 6x in our test) is not optional - it's essential.
Why Compilers Favor Integers
The CPU is part of the story - but the compiler plays a major role too.
Compilers optimize integer code more aggressively because:
- Integer math has predictable, exact behavior
- It's free from rounding or precision concerns
- Loop analysis is easier, enabling loop unrolling, vectorization, and instruction reordering
- There’s less risk of introducing undefined or edge-case behavior
Floating-point math introduces complexity that often prevents the compiler from applying aggressive optimizations.
So here’s the rule:
If you can use integers, use them.
Reserve floating-point only when you need:
- Fractional values (e.g., P&L calculations, percentages)
- Complex math (e.g., square roots, trigonometry)
For loop counters, indices, IDs, timestamps, or state - stick to integers.
What You Just Learned
By running two simple loops, you’ve learned:
- The CPU treats integer and floating-point instructions very differently
- Integer math is simpler, faster, and more parallel
- Floating-point math is complex, slow, and harder to optimize
- Compiler optimizations depend heavily on instruction type
- Even one line of code (`sink += 1.0` vs `sink += 1`) can make a 6x difference
And most importantly:
Performance in C++ is not magic. It’s mechanics.
The more you understand the hardware, the more power you have as a developer.
The Takeaway
If you're coming from Python or high-level programming, it's easy to overlook what your code turns into once compiled. But in C++, especially in latency-sensitive applications like HFT, you can't afford to ignore how your instructions translate into silicon.
So here’s the golden rule:
If you can use integers, use them.
Use floating-point types only when you truly need them - and know that they come at a cost.
Final Thoughts
Understanding performance in C++ isn’t just about writing faster code - it’s about thinking in layers:
language → compiler → CPU
By learning how a simple loop behaves differently with integers vs. floating points, you're not just optimizing - you're building intuition about how machines actually work.
“To understand the code, read the code.
To understand the performance, read the assembly.”
— Unknown systems engineer, probably sleep-deprived
Until next time, may your loops be tight and your pipelines full.
If you’re new to low-level benchmarking, you might also like this article on build modes.