ewx: (geek)
[personal profile] ewx

One of the things I needed for the user management window below was to convert from single bit in a bitmap to its bit number. (Rights are represented internally as a bitmap, but you want the bit number to index the array of GtkCheckButton widgets that represented them.)

Henry Warren suggests (among a number of other things) using the system's ability to perform this operation as part of converting integers to normalized floating point values, and inferring the answer from the exponent it chooses. Unfortunately the code he presents is very unportable, as it explicitly inspects the bytes representing the floating point value. His code is:

int nlz6(unsigned k) {
   union {
      unsigned asInt[2];
      double asDouble;
   };
   int n;

   asDouble = (double)k + 0.5;
   n = 1054 - (asInt[LE] >> 20);
   return n;
}

(nlz stands for “number of leading zeroes”. In a fixed-size word this is essentially the same question as leftmost bit or log2.)

As it happens C has long had a function called frexp which will portably extract the exponent, eliminating the need to make any assumptions about the representation of double. The code is very simple:

/** @brief Compute index of leftmost 1 bit
 * @param n Integer
 * @return Index of leftmost 1 bit or -1
 *
 * For positive @p n we return the index of the leftmost bit of @p n.  For
 * instance @c leftmost_bit(1) returns 0, @c leftmost_bit(15) returns 3, etc.
 *
 * If @p n is zero then -1 is returned.
 */
int leftmost_bit(uint32_t n) {
  /* See e.g. Hacker's Delight s5-3 (p81) for where the idea comes from.
   * Warren is computing the number of leading zeroes, but that's not quite
   * what I wanted.  Also this version should be more portable than his, which
   * inspects the bytes of the floating point number directly.
   */
  int x;
  frexp((double)n, &x);
  /* This gives: n = m * 2^x, where 0.5 <= m < 1 and x is an integer.
   *
   * If we take log2 of either side then we have:
   *    log2(n) = x + log2 m
   *
   * We know that 0.5 <= m < 1 => -1 <= log2 m < 0.  So we floor either side:
   *
   *    floor(log2(n)) = x - 1
   *
   * What is floor(log2(n))?  Well, consider that:
   *
   *    2^k <= z < 2^(k+1)  =>  floor(log2(z)) = k.
   *
   * But 2^k <= z < 2^(k+1) is the same as saying that the leftmost bit of z is
   * bit k.
   *
   *
   * Warren adds 0.5 first, to deal with the case when n=0.  However frexp()
   * guarantees to return x=0 when n=0, so we get the right answer without that
   * step.
   */
  return x - 1;
}

The code should certainly work up to the number of mantissa bits in a double, but I would worry about rounding causing 2n-1 to overflow to 2n for values larger than around 253. I suppose this could be checked manually, avoiding including knowledge of how the rounding is actually done, but I don't need anything bigger than 32 bits right now anyway.

(comments locked due to excess spam which for some reason targets this article in particular.)

(no subject)

Date: 2008-04-20 10:56 am (UTC)
pm215: (Default)
From: [personal profile] pm215
My personal favourite algorithm for this is one by David Seal which uses a table lookup. It does have the restriction that the input must be an exact power of 2, though. (Seal's original aim was finding the rightmost set bit, for which of course you can start off with x = x & (x - 1); to isolate the lowest bit before doing the multiply and lookup. I don't know of anything similar for isolating the leftmost bit.)
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

static char clztable[] =
{
  0x7f, 0x00, 0x01, 0x0c, 0x02, 0x06, 0x7f, 0x0d,
  0x03, 0x7f, 0x07, 0x7f, 0x7f, 0x7f, 0x7f, 0x0e,
  0x0a, 0x04, 0x7f, 0x7f, 0x08, 0x7f, 0x7f, 0x19,
  0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x15, 0x1b, 0x0f,
  0x1f, 0x0b, 0x05, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
  0x09, 0x7f, 0x7f, 0x18, 0x7f, 0x7f, 0x14, 0x1a,
  0x1e, 0x7f, 0x7f, 0x7f, 0x7f, 0x17, 0x7f, 0x13,
  0x1d, 0x7f, 0x16, 0x12, 0x1c, 0x11, 0x10, 0x7f,
};

int leftmost_bit(uint32_t x)
{
  /* From an algorithm by David Seal:
   * http://groups.google.co.uk/group/comp.sys.acorn.tech/msg/92ef8de21d4d9cdc
   * The input number is assumed to be a power of 2. (If it is not you will get
   * back a wrong answer, possibly 0x7f.)
   * The magic number is chosen so that on ARM it can be done in 3 insns:
   *   ORR     R1,R1,R1,LSL #4     ;If R1=X with 0 or 1 bits set, R0 = X * 0x11
   *   ORR     R1,R1,R1,LSL #6     ;R0 = X * 0x451
   *   RSB     R1,R1,R1,LSL #16    ;R1 = X * 0x0450FBAF
   */
  x *= 0x0450fbaf;
  return clztable[x >> 26];
}

int main(int argc, char **argv)
{
  uint32_t x;
  if (argc != 2)
  {
    fprintf(stderr, "Usage: %s number\n", argv[0]);
    return 1;
  }

  /* Not worrying about fine details of overflow/duff input */
  x = strtol(argv[1], 0, 0);

  printf("leftmost_bit(%x) == %d\n", x, leftmost_bit(x));
  return 0;
}

(no subject)

Date: 2008-04-20 09:12 pm (UTC)
gerald_duck: (frontal)
From: [personal profile] gerald_duck
Don't you mean x = x & (-x) for isolating the lowest set bit?

(no subject)

Date: 2008-04-20 09:53 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
Yes, I was confused; probably I was thinking of x = x & ~(x - 1), which for 2s complement is obviously the same as your suggestion. x = x & (x - 1) will clear the lowest set bit (which I guess might be useful in some situations; compare with zero to find out whether your input number was a power of 2, suggests google).

(no subject)

Date: 2008-04-20 10:37 pm (UTC)
From: [identity profile] gjm11.livejournal.com
Slight generalization of that: it's possibly the most efficient way to do a population count when you expect the number of 1-bits to be small. (If you don't, other approaches are better. Even if you do, the other approaches may be better depending on your architecture.)

(no subject)

Date: 2008-04-21 08:41 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
That's an ARM-centric solution in that it's only really worthwhile on systems without a fast hardware divide. Otherwise, simply pick a prime p which is greater than your word size and which has 2 as a primitive root (so that all the relevant powers of two are distinct mod p); then once you've got a word with at most one bit set, reduce it mod p and look the remainder up in a table.

(no subject)

Date: 2008-04-21 09:33 am (UTC)
From: [identity profile] gjm11.livejournal.com
It's hardly only ARM that lacks a fast integer divide. (Is there in fact *any* architecture on which divide-by-37, say, is faster than multiply-by-0x0450fbaf? There probably is, and maybe e.g. Core2 is an example since it has pretty good idiv, but I have my doubts.)

(no subject)

Date: 2008-04-21 10:05 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Hmm. I suppose the other criterion is how you get your hands on a constant that wide. A fixed-width RISCoid instruction set is likely to let you get the constant 37 pretty easily, but require several ALU instructions and/or a literal pool access for 0x0450fbaf.

On x86, you can trivially embed any literal you like in the instruction stream, so you're probably right that the multiply approach outperforms the divide. On ARM, getting your hands on the appropriate constant is a significant hassle, but there's no hardware divider at all so it's still a no-brainer in favour of the multiplicative strategy. But on a system which does have a hardware integer divide but would require a literal pool load to get its hands on the multiplicand, it comes down to a race between the memory system and the divider, and I'm not sure I'd bet on the memory winning.
Edited Date: 2008-04-21 10:05 am (UTC)

(no subject)

Date: 2008-04-21 08:02 pm (UTC)
From: [identity profile] gjm11.livejournal.com
As mentioned above, getting your hands on that particular constant isn't so very bad on an ARM (more precisely, you don't need to get the constant as such, you just need to multiply by it).

If the speed of this thing actually matters, it's probably being done a lot, so the relevant bit of your literal pool is likely to be in the dcache.

(For the avoidance of doubt, I'm not claiming that Seal's approach is best on all architectures. I just don't think it's so narrowly targetted as to justify calling it "ARM-centric".)

(no subject)

Date: 2008-04-21 09:38 am (UTC)
From: [identity profile] gjm11.livejournal.com
2 doesn't need to be a primitive root, btw; it's enough that its order be at least the word size. I don't know whether that would ever be useful in practice; maybe if by some coincidence you already happened to have such a prime in another register when you needed to do your which-bit-set calculation :-).

(no subject)

Date: 2008-04-21 12:50 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
Mostly I work with embedded systems so 'no fast hardware divide' is usually a good assumption. (Come to think of it, 'is an ARM' is also usually a good assumption :-))

(no subject)

Date: 2008-04-20 11:56 am (UTC)
fanf: (Default)
From: [personal profile] fanf
Why not use gcc's builtin ffs()? It compiles to a single bsf instruction on x86.

(no subject)

Date: 2008-04-20 12:02 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
For the specific case, because I didn't know about it (I don't think I've irrevocably committed this code to being in GNU C yet, though it currently only works on platforms where that's the default compiler anyway, so I'm not overly worried about that). In general, because GCC isn't the only compiler, so it's still useful to have a portable alternative.

(no subject)

Date: 2008-04-20 12:12 pm (UTC)
fanf: (Default)
From: [personal profile] fanf
Actually it's in POSIX - it's just built-in to gcc so that it can be compiled efficiently.

http://www.opengroup.org/onlinepubs/009695399/functions/ffs.html

(no subject)

Date: 2008-04-20 12:35 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
Oh, excellent.

(no subject)

Date: 2008-04-25 11:26 am (UTC)
From: [identity profile] david jones (from livejournal.com)
"Bits are numbered starting at one"

Whaaat!!?

(no subject)

Date: 2008-04-20 12:19 pm (UTC)
emperor: (Default)
From: [personal profile] emperor
Also, ffs() is an excellent name for a function :)

(no subject)

Date: 2008-04-20 12:37 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
The ffs() function shall find the first bit set (beginning with the least significant bit)

Isn't that therefore doing precisely what Richard didn't want, i.e. returning the lowest set bit rather than the highest?

(no subject)

Date: 2008-04-20 12:42 pm (UTC)
ext_8103: (Default)
From: [identity profile] ewx.livejournal.com
Doesn't matter for my specific application, and there's an fls too.

(no subject)

Date: 2008-04-20 12:54 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
No fls in POSIX or Linux. Google suggests it's a recentish BSDism.

(no subject)

Date: 2008-04-20 12:56 pm (UTC)

(no subject)

Date: 2008-04-20 01:01 pm (UTC)
fanf: (Default)
From: [personal profile] fanf
Only one bit set :-)

(no subject)

Date: 2008-04-20 12:38 pm (UTC)
pm215: (Default)
From: [personal profile] pm215
That starts at the least significant end, not the most significant end. (Not that it matters for this application, I think.)

(no subject)

Date: 2008-04-20 08:45 pm (UTC)
toothycat: (Default)
From: [personal profile] toothycat
This has potential to generate truly horrendous code on architectures that have separate floating-point and integer register sets (hello, PowerPC) ;)

November 2025

S M T W T F S
      1
2345678
91011121314 15
1617 181920 2122
23242526272829
30      

Most Popular Tags

Expand Cut Tags

No cut tags