...the problem here is that the code did any second-guessing at all, instead of simply trying to make the code itself simpler and cleaner.
That's all that has been 'standard' over all these years. The simpler and more straightforward the code, the faster the CPUs can make it run. If anything, they've made optimization easier, because you can simply write actual code instead of trying to target anything in particular.
Smaller code running with fewer branches runs faster. That's always been true. For a short while, though, most of the more esoteric math was very slow, while memory suddenly surged ahead in speed for a couple of years, resulting in a short-lived period where everything, even simple multiplication, was faster to do by looking up the answer in a lookup table. Long-term, simpler code with less branching has always been an improvement.
On the 8086, everything was slow, but memory especially so. On the 186 and 286, this was still mostly true. The 386 and 486, however, turned this on its ear. The 586 ended that trend, and the 686/P2/P3/P4/Athlon/K6 and beyond all went back to the way the 8086 and its kin were, in terms of the memory-to-CPU speed ratio.
Unfortunately, too many people are still stuck in the very short-lived 'speedy memory' days, when a cache miss cost only 1-2 clock cycles. A multiply alone could take 10 clock cycles, which only made things sillier. The 386 and 486 were the 'glory days' of optimization, when all sorts of insane, silly tricks were needed to make things run Really Fast.
Surprisingly, much of the older code from the 80286 era and before still runs blazingly fast on my Athlon, in many cases faster than the 386- and 486-era 'demo scene' code that was so impressive for how much it could accomplish using the tricks and shortcuts of the day.
Actually...
Date: 2004-03-02 12:34 am (UTC)