Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

What accounts for the added execution time of the first data set? The assembly instructions are the same.

With DN_FLUSH flag not on, the first data set takes 63 milliseconds, the second set takes 15 milliseconds.
With DN_FLUSH flag on, the first data set takes 15 milliseconds, the second set takes ~0 milliseconds.

Therefore, in both cases the execution time of the first data set is much greater.

Is there any way to decrease the execution time to be closer in line with the second data set?

I am using C++ Visual Studio 2005, /arch:SSE2 /fp:fast running on Intel Core 2 Duo T7700 @ 2.4Ghz Windows XP Pro.

#define NUMLOOPS 1000000

// Denormal values flushed to zero by hardware on ALPHA and x86
// processors with SSE2 support. Ignored on other x86 platforms
// Setting this decreases execution time from 63 milliseconds to 16 millisecond
// _controlfp(_DN_FLUSH, _MCW_DN);

float denormal = 1.0e-38;
float denormalTwo = 1.0e-39;
float denormalThree = 1;

tickStart = GetTickCount();

// Run First Calculation Loop 
for (loops=0; loops < NUMLOOPS; loops++)
{
    denormalThree = denormal - denormalTwo;
}

// Get execution time
duration = GetTickCount()-tickStart;
printf("Duration = %dms
", duration);

float normal = 1.0e-10;
float normalTwo = 1.0e-2;
float normalThree = 1;

tickStart = GetTickCount();

// Run Second Calculation Loop 
for (loops=0; loops < NUMLOOPS; loops++)
{
    normalThree = normal - normalTwo;
}

// Get execution time
duration = GetTickCount()-tickStart;
printf("Duration = %dms
", duration);
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
315 views
Welcome To Ask or Share your Answers For Others

1 Answer

Quoting from Intel's optimization manual:

When an input operand for a SIMD floating-point instruction [here this includes scalar arithmetic done using SSE] contains values that are less than the representable range of the data type, a denormal exception occurs. This causes a significant performance penalty. An SIMD floating-point operation has a flush-to-zero mode in which the results will not underflow. Therefore subsequent computation will not face the performance penalty of handling denormal input operands.

As for how to avoid this, if you can't flush denormals: do what you can to make sure your data is scaled appropriately and you don't encounter denormals in the first place. Usually this means delaying applying some scale factor until you've finished all of your other computation.

Alternatively, do your computations in double which has a much larger exponent range, and therefore makes it much less likely that you will encounter denormals in the first place.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...