There is a library for this—libdivide:
libdivide is an open source library for optimizing integer division
libdivide allows you to replace expensive integer divides with
comparatively cheap multiplication and bitshifts. Compilers usually do
this, but only when the divisor is known at compile time. libdivide
allows you to take advantage of it at runtime. The result is that
integer division can become faster - a lot faster. Furthermore,
libdivide allows you to divide an SSE2 vector by a runtime constant,
which is especially nice because SSE2 has no integer division
instructions!
libdivide is free and open source with a permissive license. The name
"libdivide" is a bit of a joke, as there is no library per se: the
code is packaged entirely as a single header file, with both a C and a
C++ API.
You can read about the algorithm behind it at this blog; for example, this entry.
Basically, the algorithm behind it is the same one that compilers use to optimize division by a constant, except that it allows these strength-reduction optimizations to be done at run-time.
Note: you can create an even faster version of libdivide. The idea is that for every divisor, you can always create a triplet (mul
/add
/shift
), so this expression gives the result: (num
*mul
+add
)>>shift
(multiply is a wide multiply here). Interestingly, this method could beat the compiler version for constant division for several microbenchmarks!
Here's my implementation (this is not compilable out of the box, but the general algorithm can be seen):
struct Divider_u32 {
u32 mul;
u32 add;
s32 shift;
void set(u32 divider);
};
void Divider_u32::set(u32 divider) {
s32 l = indexOfMostSignificantBit(divider);
if (divider&(divider-1)) {
u64 m = static_cast<u64>(1)<<(l+32);
mul = static_cast<u32>(m/divider);
u32 rem = static_cast<u32>(m)-mul*divider;
u32 e = divider-rem;
if (e<static_cast<u32>(1)<<l) {
mul++;
add = 0;
} else {
add = mul;
}
shift = l;
} else {
if (divider==1) {
mul = 0xffffffff;
add = 0xffffffff;
shift = 0;
} else {
mul = 0x80000000;
add = 0;
shift = l-1;
}
}
}
u32 operator/(u32 v, const Divider_u32 &div) {
u32 t = static_cast<u32>((static_cast<u64>(v)*div.mul+div.add)>>32)>>div.shift;
return t;
}
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…