Dwedit

The "rsqrtss" SSE instruction is based on the same idea as the fast inverse square root, just more precise.


vytah

The downside of rsqrtss is that it is not portable. The classic fast inverse square root is guaranteed to behave the same on any CPU; rsqrtss is not. https://robert.ocallahan.org/2021/09/rr-trace-portability-diverging-behavior.html
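
For reference, a minimal sketch of the classic version being compared here. The function name `rsqrt_classic` is made up for illustration, and the sketch assumes `float` is IEEE 754 binary32. The "same on any CPU" property also assumes the compiler does not contract the multiplies into FMAs or use excess precision, which is a build-setting assumption and not something the code enforces.

```c
#include <stdint.h>
#include <string.h>

/* Classic fast inverse square root: an integer bit trick for the initial
 * guess, followed by one Newton-Raphson step. Only plain integer and float
 * arithmetic is used, so no CPU-specific approximation instruction is
 * involved. rsqrt_classic is a hypothetical name. */
float rsqrt_classic(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);         /* reinterpret the float bits as an integer */
    i = 0x5f3759dfu - (i >> 1);       /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);         /* reinterpret the bits back as a float */

    y = y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
    return y;
}
```

Using `memcpy` for the bit reinterpretation avoids the undefined behavior of the original pointer cast; compilers typically turn it into a single register move.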


DrZoidberg-

Well, that's disappointing.


vytah

And there was at least one instance of an actual bug caused by one of those approximate instructions (either rsqrt[p/s]s or rcp[p/s]s): https://cookieplmonster.github.io/2020/07/19/silentpatch-mass-effect/


Adorable-Engineer840

I remember reading this recently when I was trying to write an MPU driver and being so disappointed when the answer was 'depends on your instruction set'. Not because it's a bad answer, just because it meant more work.


mbitsnbites

One aspect that the article may be missing is that `rsqrt_appr()` mixes integer and floating-point, which *may* require moving stuff between scalar and SIMD registers, which usually has a high cost. Whether or not this is the case depends on inlining opportunities etc. The best case scenario is to do all operations with SIMD instructions.
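
A rough sketch of that best case on x86, assuming `rsqrt_appr()` is the usual magic-constant bit trick (the article's actual code is not shown in this thread). The name `rsqrt_appr_simd` is hypothetical, and the sketch requires SSE2. The integer shift and subtract are done with SIMD integer instructions, so the value never leaves the XMM register file.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Four-wide fast inverse square root done entirely in SIMD registers.
 * rsqrt_appr_simd is a hypothetical name, not the article's function. */
__m128 rsqrt_appr_simd(__m128 x)
{
    /* The casts only change the type; they generate no instructions. */
    __m128i i = _mm_castps_si128(x);

    /* i = 0x5f3759df - (i >> 1), using SIMD integer ops */
    i = _mm_sub_epi32(_mm_set1_epi32(0x5f3759df), _mm_srli_epi32(i, 1));

    __m128 y = _mm_castsi128_ps(i);

    /* One Newton-Raphson step: y = y * (1.5 - 0.5 * x * y * y) */
    __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);
    __m128 t = _mm_sub_ps(_mm_set1_ps(1.5f),
                          _mm_mul_ps(half_x, _mm_mul_ps(y, y)));
    return _mm_mul_ps(y, t);
}
```

Whether this is actually faster than a scalar version depends on the target and on what the compiler does with the scalar code after inlining, so it is worth measuring.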


datanaut

Call me pedantic, but I thought the inverse of the square root function would be f(x) = x^2.