Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Today I found sample code that slowed down by 50% after some unrelated code was added. After debugging, I figured out that the problem was loop alignment. Depending on where the loop code is placed, the execution time differs, e.g.:

Address           Time [us]
00007FF780A01270  980
00007FF7750B1280  1500
00007FF7750B1290  986
00007FF7750B12A0  1500

He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss will gaze back…

1 Answer

In the slow cases (i.e., 00007FF7750B1280 and 00007FF7750B12A0), the jne instruction crosses a 32-byte boundary. The microcode mitigation for the "Jump Conditional Code" (JCC) erratum (https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf) prevents jump instructions that cross, or end on, a 32-byte boundary from being cached in the DSB (the decoded-uop cache), so the loop has to be fed by the legacy decoders instead. The JCC erratum only applies to Skylake-based CPUs, which is why the effect does not occur on your i5-3570k CPU (an Ivy Bridge part).
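
The boundary condition can be checked with a little address arithmetic. The following is only a sketch: the jne offset within the loop (0x1C) and its encoded length (6 bytes) are hypothetical values chosen for illustration, since the question does not give the disassembly; only the four loop addresses come from the table above.

```python
BOUNDARY = 32  # the mitigation works on 32-byte aligned blocks


def jcc_erratum_affected(start: int, length: int) -> bool:
    """Return True if an instruction occupying [start, start + length)
    crosses a 32-byte boundary or ends exactly on one."""
    last_byte = start + length - 1
    crosses = (start // BOUNDARY) != (last_byte // BOUNDARY)
    ends_on = (start + length) % BOUNDARY == 0
    return crosses or ends_on


# Loop start addresses taken from the question's table.
loop_starts = [
    0x00007FF780A01270,
    0x00007FF7750B1280,
    0x00007FF7750B1290,
    0x00007FF7750B12A0,
]

# Hypothetical values: where the jne sits inside the loop, and how long it is.
JNE_OFFSET = 0x1C
JNE_LENGTH = 6

results = [jcc_erratum_affected(s + JNE_OFFSET, JNE_LENGTH) for s in loop_starts]

for s, hit in zip(loop_starts, results):
    print(f"{s:016X}  jne affected: {hit}")
```

Under these assumed values, the second and fourth addresses are flagged and the other two are not, which matches the slow and fast rows of the table. With the real offset and length from a disassembly, the same check tells you whether a given placement is affected.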

As Peter Cordes pointed out in a comment, recent compilers have options that try to mitigate this effect. The question "Intel JCC Erratum - should JCC really be treated separately?" mentions MSVC's /QIntel-jcc-erratum option; another related question is "How can I mitigate the impact of the Intel jcc erratum on gcc?"

