▇ In this post I will talk about one of the most famous pieces of code optimisation that can be written in a single line of code. It can be found in the source code of the first-person shooter Quake 3, and is probably the reason why it got so much public attention. It involves the calculation of the inverse square root of a floating point number. See also the bibliography of the Wikipedia article Fast inverse square root for more information. Here is the code:
1: float InvSqrt (float x) { 2: float xhalf = 0.5f*x; 3: long i = *(long*)&x; 4: i = 0x5F83759DF - (i>>1); //initial guess of inv-sqr 5: x = *(float*)&i; 6: x = x*(1.5f-xhalf*x*x); //one step of newton iteration 7: return x; 8: } |
A single execution of this InvSqrt-Function yields a very good approximation to the inverse square root. It is said, that it is $4$-times faster than the native division method (float)((1.0)/sqrt(x)).
The basic idea of this approach is the Newton iteraton method. But stop, the strange appearing constant 0x5F3759DF is not just an educated guess that gets corrected after a some iterations. There is no iteration at all. Look at the code, there is no loop, it is just a single round of execution with a guess and a step of the Newton iteration. And that has to be enough to compute the inverse square enough to not screw up the whole Quake engine.
Before we go through this code, I will briefly review how floating point numbers are stored by computers.
If you create a floating variable $x$ then $32$-bits are reserved with the following meaning:
$$ b_{31} | b_{30} ... b_{23} | b_{22} ... b_0 $$ The bit $b_{31}$ is the sign-bit and indicates with $0$ a positive integer and with $1$ a negative one. The bits $b_{30}$ to $b_{23}$ are the exponent bits, interpreted as an eight-bit two's complement number. That is, the bit $b_{30}$ is again the sign-bit of the exponent number, to allow negative exponent and hence fractional numbers. The range is as usual $[-128,127]$. The last block are the bits that make up the actual integer, called significand or mantissa. It is always assumed, that these bits are stored that way, that the highest bit that is equal to $1$, is shifted so that it is the only bit before the decimal point. This leading bit is not part of the $32$-bits, but it is assumed to be there. The amount of necessary shift to the left or to the right to compute the original number is stored in the exponent.
Before we go through this code, I will briefly review how floating point numbers are stored by computers.
If you create a floating variable $x$ then $32$-bits are reserved with the following meaning:
$$ b_{31} | b_{30} ... b_{23} | b_{22} ... b_0 $$ The bit $b_{31}$ is the sign-bit and indicates with $0$ a positive integer and with $1$ a negative one. The bits $b_{30}$ to $b_{23}$ are the exponent bits, interpreted as an eight-bit two's complement number. That is, the bit $b_{30}$ is again the sign-bit of the exponent number, to allow negative exponent and hence fractional numbers. The range is as usual $[-128,127]$. The last block are the bits that make up the actual integer, called significand or mantissa. It is always assumed, that these bits are stored that way, that the highest bit that is equal to $1$, is shifted so that it is the only bit before the decimal point. This leading bit is not part of the $32$-bits, but it is assumed to be there. The amount of necessary shift to the left or to the right to compute the original number is stored in the exponent.
For example, we have the $32$-bit binary string (which actually is the magic constant) $$S\;|\;E\;|\;M = 0\;|\;10111110\;|\; 01101110101100111011111$$ The sign bit is zero, hence it is a positive integer. The exponent is $10111110_2 = 190_{10}$. To get the actual exponent, one must subtract $127$, so $e = 190-127 = 63$. The mantisse is $01101110101100111011111$ which must be interpreted as
\begin{align*} 1+\frac{01101110101100111011111_2}{2^{23}} = & \frac{2^{23} + 01101110101100111011111_2}{2^{23}} \\
= & 1.01101110101100111011111_2\\
= & 1.4324301481246948_{10}
\end{align*} So in total the stored integer is $$1.4324301481246948\cdot 2^{63}$$ Given a S|E|M representation, the integer it represents can than be written as $$x = (-1)^S\cdot 2^{E-127}(1+M/2^{23}) $$ The inverse square root of this (we assume $S = 0$) is
= & 1.01101110101100111011111_2\\
= & 1.4324301481246948_{10}
\end{align*} So in total the stored integer is $$1.4324301481246948\cdot 2^{63}$$ Given a S|E|M representation, the integer it represents can than be written as $$x = (-1)^S\cdot 2^{E-127}(1+M/2^{23}) $$ The inverse square root of this (we assume $S = 0$) is
$$(*)\;\;\;\frac{1}{\sqrt{x}} = \frac{1}{\sqrt{1+M/2^{23}}}2^{-(E-127)/2}= \frac{1}{\sqrt{1+M/2^{23}}}2^{-E/2+63+1/2} $$
We will see, that the magic constant yields only a approximation for the inverse square root but a surprisingly well one. Albeit it is from the theoretical point not the best constant.
Much has been written about this constant, and there are many authors who have shown more or less rigorously which constant is the best by distinguishing several cases, e.g. [1], or by logarithmic equations, e.g. in the blog post by [2].
I will keep this here as simple as possible. I just want to show, via a plot of the final equation, that there is an area where the error term for the approximation gets minimized and 0x5F3759DF is within that area.
# Step by Step #
So lets step through this code line by line (skipping lines 1,7 and 8):2: float xhalf = 0.5f*x; |
In line 2, the input value is multiplied by 0.5f. That means that the exponent in the floating point representation of $x$ is decreased by $1$.
$$ \text{S|E|M} \rightarrow \text{S|E-1|M}$$ that is $$\text{xhalf} = (-1)^{b_{31}}\cdot 2^{E-1-127}\cdot
\left(1 + M/2^{23} \right)$$ Since we want to compute the square root we can safely assume $b_{31} = 0$ (we do not consider complex integers here).
3: long i = *(long*)&x; |
Here comes the first tricky assignment. What happens here is that the floating point representation of $x$ is now interpreted as a long integer. Instead of interpreting it as sign, exponent and mantisse bits, these bits are simply set to be a $32$-bit integer. Using the symbols E and M, this $32$-bit integer can then be written as:
$$ 0 | b_{30} ... b_{23} | b_{22} ... b_0 \;\;\rightarrow\;\; 0\cdot 2^{31}+E\cdot 2^{23}+M = i$$
4: i = 0x5F3759DF - (i>>1); |
Here comes the magic. The previous calculated integer $i$ is shifted right by one, which is equal to the operation $\left\lfloor i/2 \right\rfloor$. Afterwards, it is subtracted from our magician. $$ i = \text{0x5F3759DF} - \left\lfloor \frac{E\cdot 2^{23}+M}{2} \right\rfloor $$ This is used as the initial guess for the start of the newton iteration.
5: x = *(float*)&i; |
After the manipulation of the $32$-bit in line 4, these bits are converted back to a float value, i.e., those bits are now again interpreted as S|E|M bits. The input integer $i$ is \begin{align*}i = & 190\cdot 2^{23} + 3627487 - E\cdot 2^{22}-\left\lfloor \frac{M}{2} \right\rfloor
\end{align*} where $190\cdot 2^{23} + 3627487 = \text{0x5F3759DF}$.
Since we want to compute this constant for ourself, we replaced the magic constant with some unkowns.
\begin{align*}
i = r_1\cdot 2^{23} + r_2 - E\cdot 2^{22}-M/2
\end{align*}
Further we dropped the floor sign since this only contributes a negligible error. If we simply compute the floating value of this expression, we get
\begin{align*}
x = & 2^{ \left\lfloor i/2^{23} \right\rfloor -127} \left(1 + \frac{i\pmod{2^{23}}}{2^{23}} \right)
\end{align*}
For the exponent it holds
\begin{align*}
\left\lfloor i/2^{23} \right\rfloor -127 = & r_1 + \left\lfloor \frac{r_2 - E\cdot 2^{22} - M/2}{2^{23}} \right\rfloor - 127 \\
= & r_1 + \left\lfloor - \frac{1}{2}E + \frac{r_2 - M/2}{2^{23}} \right\rfloor - 127
\end{align*}
\begin{align*}
\left\lfloor i/2^{23} \right\rfloor -127 = & r_1 + \left\lfloor \frac{r_2 - E\cdot 2^{22} - M/2}{2^{23}} \right\rfloor - 127 \\
= & r_1 + \left\lfloor - \frac{1}{2}E + \frac{r_2 - M/2}{2^{23}} \right\rfloor - 127
\end{align*}
Let us w.l.o.g. assume that $E$ is even, hence we can write
\begin{align*}
\left\lfloor i/2^{23} \right\rfloor -127 = & r_1 - \frac{1}{2}E + \left\lfloor \frac{r_2 - M/2}{2^{23}} \right\rfloor - 127 \\
= & r_1 - \frac{1}{2}E + \delta - 127, \;\text{with}\;\delta \in \{-1,0\}
\end{align*}
From (*) we know that the target exponent must be $63-E/2+1/2$. So it is easy to see that $r_1 = 190$. The get the bits 24-31 from the magic constant, we cut of the LSB from 190, which is equal to the operation $190 >> 1 = \text{0x5F}$ and yields the first part of the magic constant. But more important is, that the $\delta$ value do not depend on $E$. Hence we write
\begin{align*}
x = & 2^{63-E/2+1/2} \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right)
\end{align*}
and can only proceed with the term $$ \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right) $$ To measure the quality of the approximation, we compute the distance to the real inverse square root, you can call it the error-function:
\begin{align*}
& \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right| \\
= & \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{r_1\cdot 2^{23} + r_2 - E\cdot 2^{22}-M/2 \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right| \\
\stackrel{\text{E even}}{=} & \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{r_2 - M/2 \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right|= \epsilon(r_2,M)
\end{align*}
\left\lfloor i/2^{23} \right\rfloor -127 = & r_1 - \frac{1}{2}E + \left\lfloor \frac{r_2 - M/2}{2^{23}} \right\rfloor - 127 \\
= & r_1 - \frac{1}{2}E + \delta - 127, \;\text{with}\;\delta \in \{-1,0\}
\end{align*}
From (*) we know that the target exponent must be $63-E/2+1/2$. So it is easy to see that $r_1 = 190$. The get the bits 24-31 from the magic constant, we cut of the LSB from 190, which is equal to the operation $190 >> 1 = \text{0x5F}$ and yields the first part of the magic constant. But more important is, that the $\delta$ value do not depend on $E$. Hence we write
\begin{align*}
x = & 2^{63-E/2+1/2} \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right)
\end{align*}
and can only proceed with the term $$ \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right) $$ To measure the quality of the approximation, we compute the distance to the real inverse square root, you can call it the error-function:
\begin{align*}
& \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{i \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right| \\
= & \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{r_1\cdot 2^{23} + r_2 - E\cdot 2^{22}-M/2 \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right| \\
\stackrel{\text{E even}}{=} & \left| \frac{2^{\delta}}{2^{1/2}}\left(1 + \frac{r_2 - M/2 \pmod{2^{23}}}{2^{23}} \right) - \frac{1}{\sqrt{1+M/2^{23}}} \right|= \epsilon(r_2,M)
\end{align*}
Figure 1. Plot of the error function $\epsilon(r,M)$. |
Above in Figure 1 you can see the 3D-Plot of the error function above (Thanks to wolframalpha.com), whereof the X and the Y axis are the two parameters $r_2$ (here $r$) and $M$. The third dimension is the error term.
Figure2. Plot of the error function, with market line around $\approx 3.627*10^6$. |
6: x = x*(1.5f-xhalf*x*x); |
So, nothing really magic happens but a little clever computation :)
Bibliography:
[1] http://blog.quenta.org/2012/09/0x5f3759df.html
[2] http://web.archive.org/web/20090206040806/http://lomont.org/Math/Papers/2003/InvSqrt.pdf
No comments:
Post a Comment