Interleave bits efficiently

孤者浪人 提交于 2019-12-22 03:46:36

问题


I need to make uint64_t out of 2 uint32_t interleaving the bits: if A=a0a1a2...a31 and B=b0b1...b31, I need C=a0b0a1b1...a31b31. Is there a way to do this efficiently? So far I've got only the naive approach with a for loop of 32 iterations, where each iteration does C|=((A&(1<<i))<<i)|((B&(1<<i))<<(i+1)).

I guess there should be some mathematical trick like multiplying A and B by some special number which results in interleaving their bits with zeros in the resulting 64-bit number, so that what only remains is to or these products. But I can't find such multiplier.

Another potential way to go is a compiler intrinsic or assembly instruction, but I don't know of such.


回答1:


NathanOliver's link offers the 16-bit -> 32-bit implementation:

static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};

unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.  
            // x and y must initially be less than 65536.

x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];

y = [the same thing on y]

z = x | (y << 1);

Which works by:

  1. leave the low 8 bits of x where they are. Move the high 8 bits up by 8;
  2. divide in half and do the same thing, this time leaving the low pairs of 4 bits where they are and moving the others up by 4;
  3. and again, and again.

I.e. it proceeds as:

abcdefghijklmnop
-> 00000000abcdefgh 00000000ijklmnop
-> 0000abcd0000efgh 0000ijkl0000mnop
-> 00ab00cd00ef00gh 00ij00kl00mn00op
-> 0a0b0c0d0e0f0g0h 0i0j0k0l0m0n0o0p

And then combines the two inputs together.

As per my earlier comment, to extend that to 64 bits, just add an initial shift by 16 and mask by 0x0000ffff0000ffff, either because you can intuitively follow the pattern or as a divide-and-conquer step, turning the 32-bit problem into two non-overlapping 16-bit problems and then using the 16-bit solution.




回答2:


Would a short, precalculated array lookup count as a "mathematical trick"?

Precalculate an array of 256 uint16_ts:

static const uint16_t lookup[256]={0x0000, 0x0001, 0x0005 ..., 0x5555};

We can interleave two eight-bit values, and come up with a 16 bit value easily:

uint16_t interleave(uint8_t a, uint8_t b)
{
    return (lookup[a] << 1) | lookup[b];
}

How to extend this to interleave two 32-bit values into a 64-bit value should be obvious: call this four times, for each one of the four bytes that make up a uint32_t, then << an | the results together. Bribe the compiler to inline the whole thing, and the end result should be fairly quick and cheap.

Since RAM is cheap these days, you might want to consider a precalculated table of 65536 uint32_ts, also.



来源:https://stackoverflow.com/questions/39490345/interleave-bits-efficiently

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!