Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

When you and some data with a mask you get some result which is of the same size as the data/mask. What I want to do, is to take the masked bits in the result (where there was 1 in the mask) and shift them to the right so they are next to each other and I can perform a CTZ (Count Trailing Zeroes) on them.

I didn't know how to name such a procedure so Google has failed me. The operation should preferably not be a loop solution, this has to be as fast operation as possible.

And here is an incredible image made in MS Paint. enter image description here

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
335 views
Welcome To Ask or Share your Answers For Others

1 Answer

This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.

Unfortunately, without hardware support is it a quite annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop, here is the one given by Hackers Delight:

unsigned compress(unsigned x, unsigned m) {
   unsigned r, s, b;    // Result, shift, mask bit. 

   r = 0; 
   s = 0; 
   do {
      b = m & 1; 
      r = r | ((x & b) << s); 
      s = s + b; 
      x = x >> 1; 
      m = m >> 1; 
   } while (m != 0); 
   return r; 
} 

But there is an other way, also given by Hackers Delight, which does less looping (number of iteration logarithmic in the number of bits) but more per iteration:

unsigned compress(unsigned x, unsigned m) {
   unsigned mk, mp, mv, t; 
   int i; 

   x = x & m;           // Clear irrelevant bits. 
   mk = ~m << 1;        // We will count 0's to right. 

   for (i = 0; i < 5; i++) {
      mp = mk ^ (mk << 1);             // Parallel prefix. 
      mp = mp ^ (mp << 2); 
      mp = mp ^ (mp << 4); 
      mp = mp ^ (mp << 8); 
      mp = mp ^ (mp << 16); 
      mv = mp & m;                     // Bits to move. 
      m = m ^ mv | (mv >> (1 << i));   // Compress m. 
      t = x & mv; 
      x = x ^ t | (t >> (1 << i));     // Compress x. 
      mk = mk & ~mp; 
   } 
   return x; 
}

Notice that a lot of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested)

unsigned compress(unsigned x, int maskindex) {
   unsigned t; 
   int i; 

   x = x & masks[maskindex][0];

   for (i = 0; i < 5; i++) {
      t = x & masks[maskindex][i + 1]; 
      x = x ^ t | (t >> (1 << i));
   } 
   return x; 
}

Of course all of these can be turned into "not a loop" by unrolling, the second and third ways are probably more suitable for that. That's a bit of cheat however.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...