Fast Approximate Logarithms, Part I: The Basics

Performance profiling of some of the eBay code base showed that the logarithm (log) function was consuming more CPU than expected. The implementation of log in modern math libraries is an ingenious wonder, efficiently computing a value that is correct down to the last bit. But for many applications, you’d be perfectly happy with an approximate logarithm, accurate to (say) 10 bits, especially if it were two or three times faster than the math library version. This post is the first of a series that examines approximate logarithms and the tradeoff between accuracy and speed. By the end of the series, you will see how to design fast log functions, and will discover that on a MacBook Pro, you can get a roughly 3x speedup if you are willing to settle for 7 bits. Alternatively, if you want more accuracy you can get 11 bits with a 2x speedup. The numbers might vary for your target machine, but I will present code so that you can determine the tradeoffs for your case.

You can find code for approximate logs on the web, but it rarely comes with an evaluation of how it compares to the alternatives, or of the sense in which it might be optimal. That is the gap I’m trying to fill here. The first post in this series covers the basics, but even if you are familiar with this subject I think you will find some interesting nuggets. The second post considers rounding error, and the final post gives the code for a family of fast log functions.

A very common way to compute log (meaning \ln = \log_e) is by using the formula \log x = (\log 2) \log_2 x \approx 0.6931472 \log_2x to reduce the problem to computing \log_2. The reason is that computing \log_2 x for arbitrary x is easily reduced to computing \log_2 x for x in the interval [1, 2); details below. So for the rest of this series I will exclusively focus on computing \log_2. The red curve in the plot below shows \log_2 on [1, 2]. For comparison, I also plot the straight line y=x-1.

[Figure: \log_2 x on [1, 2] (red), with the straight line y = x - 1 for comparison]

If you’ve taken a calculus course, you know that \log x has a Taylor series about x=1 which is \log x = (x-1) - \frac{1}{2} (x-1)^2 + \cdots. Combining with \log_2 x = (\log_2 e)\log x gives (t for Taylor)

    \[ t(x) \approx -0.7213 x^2 + 2.8854 x - 2.1640 \]

How well does t(x) approximate \log_2 x?

[Figure: the Taylor approximation t(x) compared with \log_2 x on [1, 2]]

The plot shows that the approximation is very good when x \approx 1, but is lousy for x near 2—so is a flop over the whole interval from 1 to 2. But there is a function that does very well over the whole interval: b(x) = -0.344845x^2 + 2.024658x -1.674873. I call it b for better. It is shown below in red (\log_2 in blue). The plot makes it look like a very good approximation.

[Figure: b(x) (red) and \log_2 x (blue) on [1, 2]]

A better way to see quality of the approximation is to plot the error \log_2 x - b(x). The largest errors are around x \approx 1.2 and x \approx 1.7.

[Figure: the error \log_2 x - b(x) on [1, 2]]

Now that I’ve shown you an example, let me get to the first main topic: how do you evaluate different approximations? The conventional answer is minimax. Minimax is very conservative—it only cares about the worst (max) error. It judges the approximation over the entire range 1 \leq x < 2 by its error on the worst point. As previously mentioned, in the example above the worst error occurs at x \approx 1.2, or perhaps at x \approx 1.7, since the two have very similar errors. The term minimax means you want to minimize the max error, in other words find the function with the minimum max error. The max error here is very close to 0.0050, and it is the smallest you can get with a quadratic polynomial. In other words, b solves the minimax problem.

Now, onto the first of the nuggets mentioned at the opening. One of the most basic facts about \log is that \log 1 = 0, whether it’s \log_2 or \ln or \log_{10}. This means there’s a big difference between ordinary and relative error when x \approx 1.

As an example, take x = 1.001. The error in b is quite small: \log_2 x - b(x) \approx -0.005. But most likely you care much more about relative error: (\log_2 x - b(x))/\log_2(x), which is huge, about -3.35. It’s relative error that tells you how many bits are correct. If \log_2x and b(x) agree to k bits, then (\log_2 x - b(x))/\log_2(x) is about 2^{-k}. Or putting it another way, if the relative error is \epsilon, then the approximation is good to about -\log_2 \epsilon bits.

The function b that solved the minimax problem solved it for ordinary error. But it is a lousy choice for relative error. The reason is that its ordinary error is about -0.005 near x=1. As x \rightarrow 1 it follows that \log x \rightarrow 0, and so the relative error will be roughly -0.005/\log x \rightarrow \infty. But no problem. I can compute a minimax polynomial with respect to relative error; I’ll call it r(x) for relative. The following table compares the coefficients of the Taylor method t, minimax for ordinary error b and minimax for relative error r:

[Table: coefficients of the quadratics t, b, and r]

The coefficients of b and r are similar, at least compared to t, but r is a function that is always good to at least 5 bits, as the plot of relative error (below) shows.

[Figure: relative error of r(x) on [1, 2]]

Here’s a justification for my claim that r is good to 5 bits. The max relative error for r occurs at x=1, x=1.5, and x=2. For example, at x=1.5

    \begin{eqnarray*} r(1.5) & = & 0.1001\textbf{1000}\ldots \\ \log_2(1.5) & = & 0.1001\textbf{0101}\ldots \\ \end{eqnarray*}

If you’re a nitpicker, you might question whether this is good to 5 bits as claimed. But if you round each expression to 5 bits, each is 0.10011_2.

 

Unfortunately, there’s a big problem we’ve overlooked. What happens outside the interval [1,2)? Floating-point numbers x are represented as f2^k with 1 \leq f < 2. This leads to the fact mentioned above: \log_2 x = \log_2 f2^k = \log_2 f + k. So you only need to compute \log_2 on the interval [1,2). When you use r(x) for 1 \leq x < 2 and reduce to this range for other x, you get

[Figure: relative error of r(x) over all x, after reduction to [1, 2)]

The results are awful for x just below 1. After seeing this plot, you can easily figure out the problem. The relative error of r for x \approx 2 is about 0.02, and is almost the same as ordinary error (since the denominator is close to \log_2 2 = 1). Now take an x just below 1. Such an x is multiplied by 2 to move it into [1,2), and the approximation to \log_2 is r(2x) - 1, where the -1 compensates for changing x to 2x. The ordinary error (r(2x) - 1) - \log_2(x) \approx r(2) - 1 is still about 0.02. But \log_2 x is very small for x \approx 1, so the ordinary error of 0.02 is transformed to 0.02/\log_2 x, which is enormous. At the very least, a candidate for small relative error must satisfy r(2) = \log_2(2) = 1. But r(2) \approx 0.98. This can be fixed by finding the polynomial that solves the minimax problem for all x. The result is a polynomial g for global.

    \[ g(x) = -\tfrac{1}{3}x^2 + 2x - \tfrac{5}{3} \]

One surprise about g is that its coefficients appear to be simple rational numbers, suggesting there might be a simple proof that this polynomial is optimal. And there is an easy argument that it is locally optimal. Since g(x) = Cx^2 + Bx + A must satisfy g(1) = 0 and g(2) = 1, it is of the form g_C(x) = C(x-1)^2 + (1-C)(x-1). When x > 1 the relative error is \epsilon(x) = (g_C(x) - \log_2 x)/\log_2 x and \lim_{x \rightarrow 1^+} \epsilon(x) = (1-C)\log 2 - 1. When x < 1, \epsilon(x) = (g_C(2x) - 1 - \log_2 x)/\log_2 x and \lim_{x \rightarrow 1^-} \epsilon(x) = 2(1+C)\log 2 - 1. The optimal g_C has these two limits equal, that is (1-C)\log 2 - 1 = 2(1+C)\log 2 - 1, which has the solution C = -1/3.

 

[Figure: relative error of g(x) over all x]

Globally (over all x), the blue curve g does dramatically better, but of course it comes at a cost: its relative error on the interval [1, 2) is not as good as r’s. That’s because g is required to satisfy g(2) = 1 in order to have a small relative error at x=2. The extra requirement reduces the degrees of freedom, and so g does less well on [1, 2].

[Figure: relative error of g(x) on [1, 2]]

Finally, I come to the second nugget. The discussion so far suggests rethinking the basic strategy. Why reduce to the interval [1,2)? Any interval [t, 2t) will do. What about using [0.75, 1.5)? It is easy to reduce to this interval (as I show below), and it imposes only a single requirement: that the approximation be 0 at x = 1. This leaves an extra degree of freedom that can be used to do a better job of approximating \log_2. I call the function based on reduction to [0.75, 1.5) s for shift, since the interval has been shifted.

    \[ s(x) = -0.6296735\,(x-1)^2 + 1.466967\,(x-1) \]

[Figure: relative error of s(x) compared with g(x)]

The result is a thing of beauty! The error of s is significantly less than the error in g. But you might wonder about the cost: isn’t it more expensive to reduce to [0.75, 1.5) instead of [1.0, 2.0)? The answer is that the cost is small. A floating-point number is represented as (1+f)2^e, with f stored in the right-most 23 bits. To reduce to [0.75, 1.5) requires knowing when f \geq \frac{1}{2}, and that is true exactly when the left-most of the 23 bits is one. In other words, it can be done with a simple bit check, not a floating-point operation.

Here is more detail. To reduce to [0.75, 1.5), I first need code to reduce to the interval [1,2). There are library routines for this of course. But since I’m doing this whole project for speed, I want to be sure I have an efficient reduction, so I write my own. That code combined with the further reduction to [0.75, 1.5) is below. Naturally, everything is written in single-precision floating-point. You can see that the extra cost of reducing to [0.75, 1.5) is a bit-wise operation to compute the value \mathsf{greater}, and a test to see if \mathsf{greater} is nonzero. Both are integer operations.

The code does not check that x > 0, much less check for infinities or NaNs. This may be appropriate for a fast version of log.

float fastlog2(float x)  // compute log2(x) by reducing x to [0.75, 1.5)
{
    // a*(x-1)^2 + b*(x-1) approximates log2(x) when 0.75 <= x < 1.5
    const float a =  -.6296735;
    const float b =   1.466967;
    float signif, fexp;
    int exp;
    float lg2;
    union { float f; unsigned int i; } ux1, ux2;
    int greater; // really a boolean 
    /* 
     * Assume IEEE representation, which is sgn(1):exp(8):frac(23)
     * representing (1+frac)*2^(exp-127)  Call 1+frac the significand
     */

    // get exponent
    ux1.f = x;
    exp = (ux1.i & 0x7F800000) >> 23; 
    // actual exponent is exp-127, will subtract 127 later

    greater = ux1.i & 0x00400000;  // true if signif >= 1.5
    if (greater) {
        // signif >= 1.5 so need to divide by 2.  Accomplish this by 
        // stuffing exp = 126 which corresponds to an exponent of -1 
        ux2.i = (ux1.i & 0x007FFFFF) | 0x3f000000;
        signif = ux2.f;
        fexp = exp - 126;    // 126 instead of 127 compensates for division by 2
        signif = signif - 1.0;                    // <<--
        lg2 = fexp + a*signif*signif + b*signif;  // <<--
    } else {
        // get signif by stuffing exp = 127 which corresponds to an exponent of 0
        ux2.i = (ux1.i & 0x007FFFFF) | 0x3f800000;
        signif = ux2.f;
        fexp = exp - 127;
        signif = signif - 1.0;                    // <<--
        lg2 = fexp + a*signif*signif + b*signif;  // <<--
    }
    // lines marked <<-- are common code, but optimize better 
    //  when duplicated, at least when using gcc
    return(lg2);
}

You might worry that the conditional test on \mathsf{greater} will slow things down. The test can be replaced with an array lookup: instead of doing a bitwise | with 0x3f000000 in one branch and 0x3f800000 in the other, you can index an array with \mathsf{arr[greater]} and avoid the branch entirely. Similarly for the other difference between the branches, exp−126 versus exp−127. This was not faster on my MacBook.
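For the curious, here is one way that array-lookup variant might be written (my sketch of the idea just described, not the code that was actually timed; the coefficients a and b are the same as above):

```c
#include <assert.h>
#include <math.h>

/* Branch-free variant of fastlog2: index small tables by the top fraction
 * bit instead of taking an if/else.  Same [0.75, 1.5) reduction as before. */
float fastlog2_lookup(float x)
{
    const float a = -0.6296735f;
    const float b =  1.466967f;
    /* exponent bits to stuff (index 1 when significand >= 1.5), and the
     * matching exponent bias for each case */
    static const unsigned int stuff[2] = { 0x3f800000u, 0x3f000000u };
    static const int bias[2] = { 127, 126 };
    union { float f; unsigned int i; } ux1, ux2;

    ux1.f = x;
    int e   = (int)((ux1.i & 0x7F800000u) >> 23);  /* raw exponent field */
    int idx = (int)((ux1.i & 0x00400000u) >> 22);  /* 0 or 1 */

    ux2.i = (ux1.i & 0x007FFFFFu) | stuff[idx];
    float signif = ux2.f - 1.0f;
    float fexp = (float)(e - bias[idx]);
    return fexp + a * signif * signif + b * signif;
}
```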

 

In summary:

  • To study an approximation p(x), don’t plot \log_2 and p(x) directly; instead, plot their difference.
  • The measure of goodness for p(x) is its maximum error.
  • The best p(x) is the one with the smallest max error (minimax).
  • For a function like \log_2 x, ordinary and relative error are quite different. The proper yardstick for the quality of an approximation to \log_2 x is the number of correct bits, which is relative error.
  • Computing \log requires reducing to an interval [t, 2t) but you don’t need t=1. There are advantages to picking t = 3/4 instead.

In the next post, I’ll examine rounding error, and how that affects good approximations.


13 thoughts on “Fast Approximate Logarithms, Part I: The Basics”

    1. David Goldberg Post author

      That book has a lot of good background material, but is rather different from my posts. It is broad, covering multiple math functions, whereas my three posts go in much greater depth on just the log function.

      Many of the differences will come out in the next two posts, where I discuss rounding error, and give code for computing minimax coefficients. Another difference is my emphasis on efficiency, and measuring actual execution time. This leads me to make different design choices, for example I do not transform z = (x-1)/(x+1) as Crenshaw does, but instead work directly with x.

  1. Ian Vehrmt

    A very enjoyable read. I love knowing the thought process behind efficient algorithms, but since calculus is not my forte, it helps a lot to have a well-explained accompanying mathematical approach. I look forward to your next post!

  2. Aur Saraf

    A friend of mine wrote an x86 assembly poetry book you might like: http://www.xorpd.net/pages/xchg_rax/snip_00.html

    but specifically it contains far better methods for eliminating conditions than array lookups, and I’m willing to bet those would indeed be much faster on your MacBook.

    Actually, if you tell the compiler exp - (greater ? 126 : 127), I’m pretty sure it would optimize it to conditionless machine code.

    1. David Goldberg Post author

      I didn’t investigate improvements to be had using assembly code, thanks for that reference. I agree with you that the C conditional operator compiles to code that doesn’t generate branch instructions. However, there are two conditionals in each branch of the if/else block, and so the obvious technique of getting rid of the if statement and replacing each conditional with the C ? operator turns out to be slower not faster. Is there a more clever way to do it?

      1. Amos Onn

        int new_exp = 0x1fc00000 ^ greater
        and then the constants you want are new_exp << 1 and new_exp >> 22.

        1. David Goldberg Post author

          Does the following code capture your suggestion?

          const int off = -127;
          const int mask = 0x3f800000;

          ux2.i = (ux1.i & 0x007FFFFF) | (mask ^ (greater << 1));
          fexp = exp + (off + (greater >> 22));

          2. Amos Onn

            Almost; you can shave just a little bit more with:

            const int mask = 0x3f800000;

            int exp_fix = mask ^ (greater << 1);
            ux2.i = (ux1.i & 0x007FFFFF) | exp_fix;
            fexp = exp - (exp_fix >> 23);

  3. Jason Sachs

    Why would relative error in a logarithm be considered important? Absolute error would make much more sense, unless for some odd reason you are dividing by log x near x=1.

  4. David Goldberg Post author

    Relative error corresponds to the number of correct digits, and seems a better way to measure error than absolute error. Let me elaborate on that in two different ways.

    First, for floating-point, the exponent is more important than the fractional part. The numbers 5e-3 and 5e-4 differ by a small amount, since they are both small numbers. However they’re way different floating-point numbers because they have different exponents. Relative error takes the exponent into account, and says these are very different. Here’s an example using logs, using log2(1.005) = 0.0072. If your approximate log returns 0.07 instead, the absolute error seems small: 0.063. But the exponent is wrong. The true log is 7e-3 but the approximate log is 7e-2. The absolute error of this approximate value is small, the relative error is large.

    A second advantage of relative error applies if log2(x) is a library routine called by others. Then log2(x) might be a component of the final answer. For example if you take the log of y = a*x^b you get log2(y) = log2(a) + b*log2(x). So you might end up computing 1 + 10*log2(x). The absolute error in log2 might be small, but the absolute error of 1 + 10*log2(x) is 10 times bigger.

    The reason that relative error is nice is that it corresponds to the number of correct digits, and (roughly speaking) the number of correct digits in an expression is the minimum of the correct digits in each component. If log2(x) always returns 3 correct digits, 1 + 10*log2(x) is also good to about 3 correct digits. But there’s no similar rule for absolute error. If the absolute error of log2(x) is less than 0.1, the absolute error in an expression involving log2(x) could be almost anything.

    1. Jason Sachs

      I understand your points, but focusing on relative error overly emphasizes the accuracy of log2(x) when x is close to 1.0. Why should log2(1.0) and log2(0.99) and log2(1.01) be perfect, but log2(1.4) have errors?

  5. David Goldberg Post author

    log2(1.0) is perfect, but none of the others are. All of log2(0.99), log2(1.01) and log2(1.4) are good to about 3 decimal digits. I illustrate that below, writing them in scientific notation and putting parens around the digits that match between the true value and the rational approximation I recommend in part III, log2(x) = 2.97171*(x-1)/((x-1) + 2.049810)


    log2(0.99) =
    (-1.4)5685e-02 approx
    (-1.4)4995e-02 true

    log2(1.01) =
    (1.44)270e-02 approx
    (1.43)552e-02 true

    log2(1.40) =
    (4.85)214e-01 approx
    (4.85)426e-01 true

