Up for consideration is the following function which can be used to (relatively quickly) factor a 64-bit unsigned integer into its prime factors. Note that the factoring is not probabalistic (i.e., it is exact). The algorithm is already fast enough to find that a number is prime or has few very large factors in a matter of several seconds, on modern hardware.
The question: Can any improvements be made to the algorithm presented, while keeping it single-threaded, so that it can factor (arbitrary) very large unsigned 64-bit integers faster, preferably without using a probabalistic approach (e.g., Miller-Rabin) for determining primality?
// system specific typedef for ulong should go here (or use boost::uint64_t)
typedef unsigned __int64 ulong;
typedef std::vector<ulong> ULongVector;
// Caller needs to pass in an empty factors vector
void GetFactors(ULongVector &factors, ulong num)
{
// Num has to be at least 2 to contain "prime" factors
if (num<2)
return;
ulong workingNum=num;
ulong nextOffset=2; // Will be used to skip multiples of 3, later
// Factor out factors of 2
while (workingNum%2==0)
{
factors.push_back(2);
workingNum/=2;
}
// Factor out factors of 3
while (workingNum%3==0)
{
factors.push_back(3);
workingNum/=3;
}
// If all of the factors were 2s and 3s, done...
if (workingNum==1)
return;
// sqrtNum is the (inclusive) upper bound of our search for factors
ulong sqrtNum=(ulong) sqrt(double(workingNum+0.5));
// Factor out potential factors that are greate than or equal to 5
// The variable n represents the next potential factor to be tested
for (ulong n=5;n<=sqrtNum;)
{
// Is n a factor of the current working number?
if (workingNum%n==0)
{
// n is a factor, so add it to the list of factors
factors.push_back(n);
// Divide current working number by n, to get remaining number to factor
workingNum/=n;
// Check if we've found all factors
if (workingNum==1)
return;
// Recalculate the new upper bound for remaining factors
sqrtNum=(ulong) sqrt(double(workingNum+0.5));
// Recheck if n is a factor of the new working number,
// in case workingNum contains multiple factors of n
continue;
}
// n is not or is no longer a factor, try the next odd number
// that is not a multiple of 3
n+=nextOffset;
// Adjust nextOffset to be an offset from n to the next non-multiple of 3
nextOffset=(nextOffset==2UL ? 4UL : 2UL);
}
// Current workingNum is prime, add it as a factor
factors.push_back(workingNum);
}
Thanks
Edit: I've added even more comments. The reason that a vector is passed in by reference, is to allow for the vector to be reused in between calls and avoid dynamic allocations. The reason the vector is not emptied in the function, is to allow for the odd requirement of appending the current "num's" factors to factors already in the vector.
The function itself is not pretty and can be refactored, but the question is about how to making the algorithm faster. So, please, no suggestions about how to make the function more pretty, readable, or C++ish. That's child's play. Improving this algorithm, so that it can find (proven) factors faster is the difficult part.
Update: Potatoswatter has some excellent solutions so far, be sure to check out his MMX solution near the bottom, as well.
See Question&Answers more detail:os