Fast Approximate Logarithms, Part II: Rounding Error

In Part I, I assumed all computations were done exactly. It’s time to bring in floating-point rounding error. There’s a key fact about rounding error that might seem surprising: if x \approx 1, then the computation of x-1 in floating-point has no rounding error. The surprise comes for those who’ve been taught to beware of rounding error when subtracting two similar quantities. For example, the computation in floating-point of x^2-1 might be far from the true value of x^2-1 when x \approx 1. That’s because x^2 can have rounding error, so the subtraction can cancel most of the good bits, leaving mainly the incorrect (due to rounding) bits. But in x-1, neither x nor 1 has rounding error, so the computation of x-1 is exactly equal to x-1 when x \approx 1.

Here’s the explanation. Rounding error in x \pm y can happen in two ways. First, if x and y have different exponents, the number with the smaller exponent has its fraction shifted right to make the exponents match, and the shift might drop some bits. Second, even if the exponents are the same, the result may need to be renormalized (for example, if an addition carries out of the high-order bit), and the renormalizing shift can drop a bit. In the case of x − 1, if 1 ≤ x < 2 the exponents are the same and subtracting 1 cannot carry out. And if 1/2 ≤ x < 1, the number with the larger exponent is 1, whose fraction bits are all zero, so aligning the exponents drops no bits and the difference is exactly representable. So there is no rounding error in x − 1 when 1/2 ≤ x < 2 (a special case of the Sterbenz lemma).
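This exactness is easy to check empirically. The sketch below (my own illustration, not code from this post) compares the float subtraction against the same subtraction done in double, which is exact here because a float converts to double without error:

```c
#include <assert.h>
#include <stdio.h>

/* If 1/2 <= x < 2, the single-precision subtraction x - 1.0f is exact.
 * The double subtraction below is exact (both operands convert to double
 * without error, and the true difference fits easily in a double), so
 * comparing against it detects any rounding in the float subtraction. */
int sub_is_exact(float x) {
    return (double)(x - 1.0f) == ((double)x - 1.0);
}
```

Calling `sub_is_exact` on values throughout [0.5, 2), including values within one ulp of 1, always returns 1.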

The rule of thumb is that when approximating a function f with f(a) = 0, severe rounding error can be reduced if the approximation is in terms of $x-a$. For us, \log 1 = 0 so a=1, and the rule suggests polynomials written in terms of x-1 rather than x. By the key fact above, there is no rounding error in x-1 when x \approx 1.

Let me apply that to two different forms of the quadratic polynomials used in Part I: the polynomial can be written in terms of x or x-1.

    \begin{eqnarray*} \log_2x & \approx & s_2 x^2 + s_1x + s_0 \\ \log_2x & \approx & \widehat{s}_2 (x-1)^2 + \widehat{s}_1 (x-1) + \widehat{s}_0 \end{eqnarray*}

If they are to be used on the interval [0.75, 1.5) and I want to minimize relative error, it is crucial that the polynomials be 0 when x=1 (since \log_2 1 = 0, any nonzero value there makes the relative error infinite), so they become

    \begin{eqnarray*} \log_2x & \approx & s_2 x^2 + s_1x + (-s_2 - s_1) \\ \log_2x & \approx & \widehat{s}_2 (x-1)^2 + \widehat{s}_1 (x-1) \end{eqnarray*}

The second equation has no constant term, so they both cost the same amount to evaluate, in that they involve the same number of additions and multiplications.

But one is much more accurate. You can see that empirically using an evaluation program (code below) that I will be using throughout to compare different approximations. I invoked the program as \mathit{eval\: 10\: 40960\: 1024} and \mathit{eval\: 10\: 10\: 4194304} and got the following:

SPACING IS 1/1024
using x    bits  5.5 at x=0.750000  2.06 nsecs nsec/bit=0.372  bits/nsec=2.69
using x-1  bits  5.5 at x=0.750000  2.10 nsecs nsec/bit=0.380  bits/nsec=2.63

SPACING IS 1/4194304 = 2^-22
using x    bits  1.7 at x=1-2.4e-07 2.08 nsecs nsec/bit=1.222  bits/nsec=0.82
using x-1  bits  5.5 at x=0.750000  2.12 nsecs nsec/bit=0.384  bits/nsec=2.61

When the approximating polynomials are evaluated at points spaced 1/1024 apart, they have similar performance. The accuracy of both is 5.5 bits, and the one using x-1 is slightly slower. But when they are evaluated at points spaced 2^{-22} apart, the polynomial using x has poor accuracy when x is slightly below 1. Specifically, the accuracy is only 1.7 bits when x \approx 1 - 2.4 \times 10^{-7}.

To see why, note that when x \approx 1, \widehat{s}_2 (x-1)^2 + \widehat{s}_1 (x-1) sums two terms of very different sizes, since (x-1)^2 \ll |x-1|, so the tiny rounding errors in the terms stay tiny relative to the sum. But s_2 x^2 + s_1x + s_0 first computes s_2 x^2 + s_1 x \approx s_2 + s_1, whose terms each have rounding error, and then adds s_0 = -s_1 - s_2, which cancels almost all the good bits. This is the bad case of subtracting two nearby numbers (cancellation): the leading bits cancel, leaving mainly the rounding-error bits.
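You can watch this happen in a few lines of C. The sketch below uses an illustrative quadratic with p(1) = 0 (the coefficients are those of the g polynomial that appears later in this post, not the fitted ones for [0.75, 1.5); any quadratic with p(1) = 0 shows the same effect) and evaluates it both ways near x = 1:

```c
#include <math.h>

/* p(x) = -(1/3)(x-1)^2 + (4/3)(x-1), which expanded in powers of x is
 * -(1/3)x^2 + 2x - 5/3.  Illustrative coefficients only. */
float p_via_xm1(float x) {
    float t = x - 1.0f;                       /* exact: no rounding error */
    return (-1.0f/3.0f)*t*t + (4.0f/3.0f)*t;
}
float p_via_x(float x) {
    return (-1.0f/3.0f)*x*x + 2.0f*x - 5.0f/3.0f;
}
/* relative error against the same polynomial evaluated in double */
double relerr(float (*p)(float), float x) {
    double t = (double)x - 1.0;
    double exact = (-1.0/3.0)*t*t + (4.0/3.0)*t;
    return fabs(((double)p(x) - exact)/exact);
}
```

At x = 0.75 the two forms are equally accurate; at x = 1 − 2^{-22} the form using x loses most of its bits while the form using x-1 stays accurate to within a few ulps.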

I suppose it is an arguable point whether full accuracy for all x is worth a time performance hit of about 2%. I will offer this argument: you can reason about your program if you know it has (in this case) 5.5 bits of accuracy on every input. You don’t want to spend a lot of time tracking down unexpectedly low accuracy in your code that came about because you used a log library function with poor precision on a small set of inputs.

Here’s some more information on the output of the evaluation program displayed above. The first number is accuracy in bits, measured in the usual way as -\log_2 E where E is the maximum relative error. Following \mathit{x=} is the value of x where the max error E occurred. The execution time (e.g. 2.06 nsecs for the first line) is an estimate of the time it takes to do a single approximation to \log_2, including reducing the argument x to the interval [0.75, 1.5). The last two numbers are self-explanatory.

Estimating execution time is tricky. For example on my MacBook, if the argument x must be brought into the cache, it will significantly affect the timings. That’s why the evaluation program brings \mathit{xarr} and \mathit{yarr} into the cache before beginning the timing runs.

For polynomials, using x-1 has almost the same cost and better accuracy, so there is a good argument that it is superior to using x. Things are not so clear when the approximation is a rational function rather than a polynomial, for example q(x) =\frac{a(x-1)+b}{c(x-1)+d}. Because q(1) must be 0, b = 0 and the numerator is just a(x-1). And because you can multiply numerator and denominator by anything (as long as it’s the same anything), dividing both by c further simplifies it to q(x) = \frac{a(x-1)}{(x-1)+e}, with a and e renamed. This will have no floating-point cancellation, and will have good accuracy even when x \approx 1. But there’s a rewrite of this expression that is faster:

    \[ \frac{a(x-1)}{(x-1)+e} = a - \frac{ae}{(x-1) + e} \]

Unfortunately, this brings back cancellation, because when x \approx 1 the fraction is approximately a, so the subtraction cancels. Since there’s cancellation anyway, you might as well make a further performance improvement that eliminates the need to compute x-1, namely

    \[ \frac{a(x-1)}{(x-1)+e} = a - \frac{ae}{x + (e-1)} \]

Both sides have a division. Beyond that, the left-hand side has a multiplication and 2 additions, while the right-hand side has no multiplications and 2 additions (ae is a constant and doesn’t require a run-time multiplication, and similarly for e-1). So there is one less multiplication, which should be faster, but at the cost of a rounding-error problem when x \approx 1.
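Here is a small sketch of the two forms, with placeholder coefficients a = 1.2 and e = 0.6 (assumptions for illustration, not the post's fitted values):

```c
/* Placeholder coefficients; a real log2 approximation would use
 * minimax-fitted values. */
static const float A = 1.2f, E = 0.6f;

float rat_slow(float x) {        /* a(x-1) / ((x-1)+e): no cancellation */
    float t = x - 1.0f;
    return A*t / (t + E);
}
float rat_fast(float x) {                 /* a - ae/(x+(e-1)): one less  */
    return A - (A*E) / (x + (E - 1.0f));  /* multiply, but cancels x~1   */
}
```

In real code A*E and E − 1.0f fold into compile-time constants, which is where the saved multiplication comes from. Away from x = 1 the two forms agree to within rounding; near x = 1 only `rat_slow` stays fully accurate, as the tables below show.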

SPACING IS 1/1024
using x-1  bits  7.5 at x=0.750000  2.17 nsecs nsec/bit=0.289  bits/nsec=3.46
using x    bits  7.5 at x=0.750000  2.07 nsecs nsec/bit=0.275  bits/nsec=3.64

SPACING IS 1/4194304 = 2^-22
using x-1  bits  7.5 at x=0.750000  2.24 nsecs nsec/bit=0.298  bits/nsec=3.36
using x    bits  1.4 at x=1+2.4e-07 2.09 nsecs nsec/bit=1.522  bits/nsec=0.66

As expected, the rational function that has one less multiplication (the line marked using x) is faster, but has poor accuracy when x is near 1. There’s a simple idea for a fix. When |x-1| is small, use the Taylor series, \log_2 x \approx \log_2 e \left( (x-1) - \frac{1}{2} (x-1)^2 + \cdots\right). Using (\log_2 e) (x-1) is a subtraction and a multiplication, which is most likely cheaper than a division and two additions. What is the size cutoff? The error in truncating the Taylor series is easy to compute: it is roughly the first omitted term, \log_2 e \cdot (x-1)^2/2, so the relative error is about |x-1|/2. And I want to maintain an accuracy of 7.5 bits, or 2^{-7.5} \approx 0.0055. So the cutoff is 1 \pm \delta, where \delta/2 = 0.0055, or \delta = 0.011.

On my MacBook, the most efficient way to implement |x-1| < \delta appears to be x < 1 + \delta \small{\mathsf{\;AND\;}} x > 1 - \delta. In the evaluation program, most of the x are greater than 1, so usually only the first of the inequalities is executed. Despite this, adding the check still has a significant cost, and it gives no more accuracy than simply using x-1.

SPACING IS 1/1024
using x-1  bits  7.5 at x=0.750000  2.17 nsecs nsec/bit=0.289  bits/nsec=3.46
using x    bits  7.5 at x=0.750000  2.07 nsecs nsec/bit=0.275  bits/nsec=3.64
cutoff     bits  7.5 at x=0.750000  2.58 nsecs nsec/bit=0.343  bits/nsec=2.91

SPACING IS 1/4194304 = 2^-22
using x-1  bits  7.5 at x=0.750000  2.24 nsecs nsec/bit=0.298  bits/nsec=3.36
using x    bits  1.4 at x=1+2.4e-07 2.09 nsecs nsec/bit=1.522  bits/nsec=0.66
cutoff     bits  7.5 at x=0.989000  2.60 nsecs nsec/bit=0.347  bits/nsec=2.88

In Part I of this series, I noted that testing whether x is in the range 0.75 \leq x < 1.5 can be done with bit operations rather than floating-point ones. The same idea can be used here. Instead of using the Taylor series when |x-1| < 0.011, that is when 0.989 < x < 1.011, use it on a slightly smaller interval whose endpoints are powers of two away from 1:

    \begin{alignat*}{3} & 1 - 2^{-7} && < x && < 1 + 2^{-7} \\ & 0.9921875 && < x && < 1.0078125 \end{alignat*}

The latter can be converted to bit operations on $f$, the fraction part of x, as follows:

    \begin{eqnarray*} 1 \leq x < 1 + 2^{-7} & \small{\mathsf{\;\; OR \;\;}} & 1 - 2^{-7} < x < 1 \\ 1 \leq x < 2 \;\small{\mathsf{\; AND \;}} f < 2^{-7} & \small{\mathsf{\;\; OR \;\;}} & \frac{1}{2} \leq x < 1 \;\small{\mathsf{\; AND \;}} 1 - 2^{-6} < 2x-1 = f\\ \mathit{\; exp \;} = 0 \;\small{\mathsf{\; AND \;}} f < 2^{-7} & \small{\mathsf{\;\; OR \;\;}} & \mathit{\; exp \;} = -1 \;\small{\mathsf{\; AND \;}} f > 1 - 2^{-6} \end{eqnarray*}

As bit operations, this is

(exp == 0 && (f & 111111100...) == 0) 
   OR (exp == -1 && (f & 111111000...) == 111111000...)
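As a sketch in C (my reading of the test above, not the post's actual code; the masks are assumptions based on the IEEE-754 single-precision layout, with biased exponent 127 for unbiased 0 and 126 for −1):

```c
#include <stdint.h>
#include <string.h>

/* Return nonzero when 1 - 2^-7 < x < 1 + 2^-7, using only integer
 * operations on the IEEE-754 single-precision bit pattern.  Assumes x
 * has already been reduced to [0.75, 1.5). */
int near_one(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);             /* well-defined type pun */
    uint32_t exp  = (i >> 23) & 0xFF;     /* biased exponent */
    uint32_t frac = i & 0x007FFFFF;       /* 23-bit fraction f */
    if (exp == 127)                       /* 1 <= x < 2: need f < 2^-7     */
        return (frac & 0x007F0000) == 0;  /*   -> top 7 fraction bits 0    */
    if (exp == 126)                       /* 1/2 <= x < 1: need f > 1-2^-6 */
        return (frac & 0x007E0000) == 0x007E0000; /* -> top 6 bits all 1   */
    return 0;
}
```

The comparisons against constants compile to a couple of integer mask-and-test instructions, with no floating-point compares.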

When I tested this improvement (\mathit{fcutoff} in the table below), it was faster, but still slower than using x-1, at least on my MacBook.

SPACING IS 1/1024
using x-1  bits  7.5 at x=0.750000  2.17 nsecs nsec/bit=0.289  bits/nsec=3.46
using x    bits  7.5 at x=0.750000  2.07 nsecs nsec/bit=0.275  bits/nsec=3.64
cutoff     bits  7.5 at x=0.750000  2.58 nsecs nsec/bit=0.343  bits/nsec=2.91
fcutoff    bits  7.5 at x=0.750000  2.46 nsecs nsec/bit=0.327  bits/nsec=3.06

SPACING IS 1/4194304 = 2^-22
using x-1  bits  7.5 at x=0.750000  2.24 nsecs nsec/bit=0.298  bits/nsec=3.36
using x    bits  1.4 at x=1+2.4e-07 2.09 nsecs nsec/bit=1.522  bits/nsec=0.66
cutoff     bits  7.5 at x=0.989000  2.60 nsecs nsec/bit=0.347  bits/nsec=2.88
fcutoff    bits  7.5 at x=0.750001  2.44 nsecs nsec/bit=0.325  bits/nsec=3.08

Bottom line: having special case code when x \approx 1 appears to significantly underperform computing in terms of x-1.

In the first post, I recommended reducing to [0.75, 1.5) instead of [1, 2) because you get one extra degree of freedom, which in turn gives greater accuracy. Rounding error gives another reason for preferring [0.75, 1.5). When x = 1 - \delta, reduction to [1, 2) has cancellation problems. Recall the g function that was optimal for the interval [1, 2), g(x) = -(1/3)(x-1)^2 + (4/3)(x-1). When x = 1 - \delta, x must be multiplied by 2 to move it into [1, 2), and then, to compensate, the result is computed as g(2x) - 1. When x \approx 1, g(2x) \approx 1, so that final subtraction cancels. Below are the results of running the evaluation program on g. If there were no rounding error, g would be accurate to 3.7 bits. As x gets closer to 1 (\delta \rightarrow 0), the accuracy drops.

SPACING IS 1/2^19
g    bits  3.5 at x=1-3.8e-06  2.05 nsecs nsec/bit=0.592  bits/nsec=1.69

SPACING IS 1/2^20
g    bits  2.9 at x=1-9.5e-07  2.05 nsecs nsec/bit=0.706  bits/nsec=1.42

SPACING IS 1/2^21
g    bits  2.9 at x=1-9.5e-07  2.05 nsecs nsec/bit=0.706  bits/nsec=1.42

SPACING IS 1/2^22
g    bits  1.7 at x=1-2.4e-07  2.05 nsecs nsec/bit=1.203  bits/nsec=0.83

The goal of this series of posts is to show that you can create logarithm routines that are much faster than the library versions and have a minimum guaranteed accuracy for all x. To do this requires paying attention to rounding error. Summarizing what I’ve said so far, my method for minimizing rounding error problems is to reduce x to the interval [0.75, 1.5) and write the approximating expression in terms of x-1, for example a(x-1)^2 + b(x-1). More generally, the approximating expression would be a polynomial

    \[ a_n(x-1)^n + a_{n-1}(x-1)^{n-1} + \cdots + a_1(x-1) \]

or rational function

    \[ \frac{ b_n(x-1)^n + b_{n-1}(x-1)^{n-1} + \cdots + b_1(x-1)}{ (x-1)^m + c_{m-1}(x-1)^{m-1} + \cdots + c_1(x-1) + c_0} \]
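The polynomial form can be evaluated with Horner’s rule in t = x-1; since there is no constant term, the result is exactly 0 at x = 1. A minimal sketch (the function and its coefficient layout are my own, not from the post):

```c
/* Evaluate a_n t^n + ... + a_1 t at t = x-1 by Horner's rule.
 * a[1..n] holds the coefficients; a[0] is unused (no constant term). */
float poly_xm1(float x, const float *a, int n) {
    float t = x - 1.0f;          /* exact for 0.75 <= x < 1.5 */
    float p = a[n];
    for (int k = n - 1; k >= 1; k--)
        p = p*t + a[k];
    return p*t;                  /* exactly zero when x == 1 */
}
```

With a[] = {0, 4/3, −1/3} and n = 2 this reproduces the g polynomial used above.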

I close by giving the code for the evaluation program that was used to compare the time and accuracy of the different approximations:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h> 

/*
 *   Usage: eval [hi reps spacing]
 *       Evaluates an approximation to log2 in the interval [0.125, hi]
 *       For timing purposes, repeats the evaluation reps times.
 *       The evaluation is done on points spaced 1/spacing apart.	       
 */

float approx_fn(float x);   /* the approximation being evaluated; supplied separately */
char *format(float x);
void finish(struct timeval *start, struct timeval *stop, char *str,
            int n, int repetitions, float *xarr, float *yarr, float *lg2arr);

int
main(int argc, char **argv)
{
    float x;
    struct timeval start, stop;
    float lo, hi, delta;
    int i, j, n, repetitions, one_over_delta;
    double xd;
    float *xarr, *lg2arr, *yarr;

    // parameters
    lo = 0.125;
    hi = 10.0;
    one_over_delta = 4194304; // 2^22
    repetitions = 1;
    if (argc > 1) {
        hi = atof(argv[1]);
        repetitions = atoi(argv[2]);
        one_over_delta = atoi(argv[3]);
    }
    delta = 1.0/one_over_delta;

    // setup
    n = ceil((hi - lo)/delta) + 1;
    xarr = (float *)malloc(n*sizeof(float));
    yarr = (float *)malloc(n*sizeof(float));
    lg2arr = (float *)malloc(n*sizeof(float));
    i = 0;
    for (xd = lo; xd <= hi; xd += delta) {
        x = xd;
        if (x == 1.0) // relative error would be infinity
            continue;
        xarr[i] = x;
        lg2arr[i++] = log2(x);
    }
    if (i >= n)  // assert (i < n)
        fprintf(stderr, "Help!!!\n");
    n = i;

    /* cache-in xarr[i], yarr[i] */
    yarr[0] = 0.0;
    for (i = 1; i < n; i++) {
        yarr[i] = xarr[i] + yarr[i-1];
    }
    fprintf(stderr, "cache-in: %f\n\n", yarr[n-1]); // to foil optimizer

    gettimeofday(&start, 0);
    for (j = 0; j < repetitions; j++) {
        for (i = 0; i < n; i++) {
            yarr[i] = approx_fn(xarr[i]);
        }
    }
    gettimeofday(&stop, 0);
    finish(&start, &stop, "name ", n, repetitions, xarr, yarr, lg2arr);

    exit(0);
}

// convert x to string, with special attention when x is near 1
char *format(float x) {
    static char buf[64];
    float y;

    if (fabs(x - 1) > 0.0001)
        sprintf(buf, "%f", x);
    else {
        y = x-1;
        if (y < 0)
            sprintf(buf, "1%.1e", y);
        else
            sprintf(buf, "1+%.1e", y);
    }
    return(buf);
}

void
finish(struct timeval *start, struct timeval *stop,
       char *str, int n, int repetitions, float *xarr, float *yarr, float *lg2arr)
{
    double elapsed; // nanosecs
    float max, rel;
    int maxi = 0, i;
    double bits;

    elapsed = 1e9*(stop->tv_sec - start->tv_sec) + 1000.0*(stop->tv_usec - start->tv_usec);
    max = 0.0;
    for (i = 0; i < n; i++ ) {
        rel = fabs( (yarr[i] - lg2arr[i])/lg2arr[i]);
        if (rel > max) {
            max = rel;
            maxi = i;
        }
    }
    bits = -log2(max);
    elapsed = elapsed/(n*repetitions);
    printf("%s bits %4.1f at x=%s  %.2f nsecs nsec/bit=%.3f  bits/nsec=%.2f\n",
           str, bits, format(xarr[maxi]), elapsed, elapsed/bits, bits/elapsed);
}



3 thoughts on “Fast Approximate Logarithms, Part II: Rounding Error”

  1. Kevin

    Your last code snippet seems to have been copied incorrectly. You have a couple for loops that don’t have two semicolons. If I copy, paste, and compile, I get a number of error messages.

    1. David Goldberg Post author

      Yikes! Yes that code block got garbled. It’s just been updated. Thanks for catching that.

  2. Amos Onn

    You can further optimize fcutoff (after limiting to [0.75,1.5), meaning the exponent is either 0 or -1 and the sign bit is 0) as:
    (0x007F0000 & (((ux.i >> 14) | ux.i) ^ (ux.i >> 1))) == 0

    I suppose even better is possible; I will explain why this one works.

    Essentially, 0x7FF… & (x ^ (x >> 1)) == 0
    checks that all bits of x are identical, i.e. that x is either 0x000… or 0xFFF… . This is almost what we want to do; we want to check that 0x00FF0000 & ux.i is either 0x00000000, meaning that the least bit of the exponent, hence all of it, is 0, and fraction is 0b0000000…; or it’s 0x00FF0000, meaning that the least bit of the exponent is 1, hence the exponent is -1, and the fraction is 0b1111111…; or it’s 0x00FE0000, meaning the exponent is -1, and the fraction is 0b1111110… .

    To compare for the first two cases, as explained before, we can run 0x007F0000 & (x ^ (x >> 1)) == 0. Now we just need to fix 0x00FE0000 to 0x00FF0000, without fixing 0x00000000 to 0x00010000 or vice versa. We note that in the case where we ultimately have 0x00FF0000 or 0x00FE0000, our number starts with 0b01…, whereas when we have 0x00000000 or 0x00010000, it starts with 0b00… . Therefore, if we shift the number to the right position, and then OR, we will fix the bit of 0x00010000 to be 1 in the former case, but not touch it in the latter. This is what the ((ux.i >> 14) | ux.i) performs, while damaging only less significant digits.
