Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am trying to find out the number of 1's in binary form of a large decimal number (decimal number can be as large as 1000000).

I tried this piece of code:

while(sum>0)  
{  
    if(sum%2 != 0)  
    {  
        c++;   // counting number of ones  
    }  
    sum=sum/2;  
}  

I want a faster algorithm as it takes long time for large decimal input. Please suggest me an efficient algorithm.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
475 views
Welcome To Ask or Share your Answers For Others

1 Answer

What you are looking for is "popcount", which is implemented as a single CPU instruction on later x64 CPU's, which won't be beaten for speed:

#ifdef __APPLE__
#define NAME(name) _##name
#else
#define NAME(name) name
#endif

/*
 * Count the number of bits set in the bitboard.
 *
 * %rdi: bb
 */
.globl NAME(cpuPopcount);
NAME(cpuPopcount):
    popcnt %rdi, %rax
    ret

But of course, you'll need to test the CPU supports it first:

/*
 * Test if the CPU has the popcnt instruction.
 */
.globl NAME(cpuHasPopcount);
NAME(cpuHasPopcount):
    pushq %rbx

    movl $1, %eax
    cpuid                   // ecx=feature info 1, edx=feature info 2

    xorl %eax, %eax

    testl $1 << 23, %ecx
    jz 1f
    movl $1, %eax

1:
    popq %rbx
    ret

Here's an implementation in C:

unsigned cppPopcount(unsigned bb)
{
#define C55 0x5555555555555555ULL
#define C33 0x3333333333333333ULL
#define C0F 0x0f0f0f0f0f0f0f0fULL
#define C01 0x0101010101010101ULL

    bb -= (bb >> 1) & C55;              // put count of each 2 bits into those 2 bits
    bb = (bb & C33) + ((bb >> 2) & C33);// put count of each 4 bits into those 4 bits
    bb = (bb + (bb >> 4)) & C0F;        // put count of each 8 bits into those 8 bits
    return (bb * C01) >> 56;            // returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
}

The GNU C Compiler runtime contains a "built-in" which might be faster than the implementation above (it might use the CPU popcnt instruction, but that's implementation-specific):

unsigned builtinPopcount(unsigned bb)
{
    return __builtin_popcountll(bb);
}

All of the above implementations are used in my C++ chess library as popcount plays a vital role in chess move generation when bitboards are used to represent piece positions. I use a function pointer, set-up during library initialisation, to point to the implementation requested by the user and then use the popcount function via that pointer.

Google will provide many other implementations as it's an interesting problem, for example: http://wiki.cs.pdx.edu/forge/popcount.html.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...