What accounts for the added execution time of the first data set? The assembly instructions are the same.
With the DN_FLUSH flag off, the first data set takes 63 milliseconds and the second takes 15 milliseconds.
With the DN_FLUSH flag on, the first data set takes 15 milliseconds and the second takes ~0 milliseconds.
So in both cases the first data set takes much longer to execute.
Is there any way to bring the first data set's execution time closer to the second's?
I am using Visual C++ 2005 with /arch:SSE2 and /fp:fast, running on an Intel Core 2 Duo T7700 @ 2.4 GHz under Windows XP Pro.
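For reference, FLT_MIN is about 1.175e-38, so both 1.0e-38 and 1.0e-39 in the first loop are subnormal (denormal) floats. A quick way to verify that, sketched with a hypothetical isDenormal helper that is not part of the original code:

#include <float.h>  // FLT_MIN

// Hypothetical helper: a float is subnormal (denormal) when it is
// nonzero but its magnitude is below FLT_MIN (~1.175e-38f).
static bool isDenormal(float x)
{
    return x != 0.0f && x > -FLT_MIN && x < FLT_MIN;
}
// isDenormal(1.0e-38f) and isDenormal(1.0e-39f) are true;
// isDenormal(1.0e-10f) is false.

The full timing code: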
#include <cstdio>      // printf
#include <float.h>     // _controlfp, _DN_FLUSH, _MCW_DN
#include <windows.h>   // GetTickCount, DWORD

#define NUMLOOPS 1000000

int main()
{
    // Denormal values are flushed to zero by hardware on ALPHA and x86
    // processors with SSE2 support; the setting is ignored on other x86
    // platforms. Enabling this decreases the first loop's execution time
    // from 63 milliseconds to 16 milliseconds.
    // _controlfp(_DN_FLUSH, _MCW_DN);

    // Both operands (and the result) are below FLT_MIN, i.e. denormal.
    float denormal = 1.0e-38f;
    float denormalTwo = 1.0e-39f;
    float denormalThree = 1;

    DWORD tickStart = GetTickCount();

    // Run first calculation loop (denormal operands)
    for (int loops = 0; loops < NUMLOOPS; loops++)
    {
        denormalThree = denormal - denormalTwo;
    }

    // Get execution time
    DWORD duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    // Normal-range operands for comparison
    float normal = 1.0e-10f;
    float normalTwo = 1.0e-2f;
    float normalThree = 1;

    tickStart = GetTickCount();

    // Run second calculation loop (normal operands)
    for (int loops = 0; loops < NUMLOOPS; loops++)
    {
        normalThree = normal - normalTwo;
    }

    // Get execution time
    duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    // Keep the results live so the optimizer cannot drop the loops.
    printf("%g %g\n", denormalThree, normalThree);
    return 0;
}