Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I have the following __m128 vectors:

v_weight

v_entropy

I need to add v_entropy to v_weight only where the elements of v_weight are not 0.0f.

Obviously _mm_add_ps() adds all elements regardless.

I can compile up to AVX, but not AVX2.

EDIT

I know beforehand how many elements in v_weight will be 0: either none, or the last 1, 2, or 3 elements. If it's easier, how do I zero out the corresponding elements in v_entropy?



1 Answer

The cmpeq/cmpgt instructions produce a per-element mask that is all ones where the comparison holds and all zeros elsewhere. The overall process goes as follows:

// w = v_weight, e = v_entropy
auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
mask = _mm_andnot_ps(mask, e);                 // e where w != 0, else 0
w = _mm_add_ps(w, mask);
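
The three steps above can be wrapped into a small helper. This is only a sketch: the function name add_where_nonzero is made up for illustration, and it assumes an x86 target where SSE is available.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_cmpeq_ps, _mm_andnot_ps, _mm_add_ps

// Adds e to w only in lanes where w != 0.0f; lanes where w == 0.0f stay 0.0f.
// The name add_where_nonzero is illustrative, not part of any library.
inline __m128 add_where_nonzero(__m128 w, __m128 e) {
    const __m128 is_zero  = _mm_cmpeq_ps(w, _mm_setzero_ps()); // all ones where w == 0
    const __m128 masked_e = _mm_andnot_ps(is_zero, e);         // (~is_zero) & e
    return _mm_add_ps(w, masked_e);
}
```

With the question's variables this would be called as v_weight = add_where_nonzero(v_weight, v_entropy).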

Another option is to add unconditionally, then use blendv (SSE4.1) to select between the summed and the original values.

auto w2 = _mm_add_ps(e, w);
auto mask = _mm_cmpeq_ps(_mm_setzero_ps(), w);
w = _mm_blendv_ps(w2, w, mask); // keep w where w == 0, otherwise take w2
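
A self-contained sketch of the blendv variant follows. _mm_blendv_ps is an SSE4.1 instruction, so it is covered by an AVX build. The TARGET_SSE41 macro and the function name are assumptions added here so the snippet also compiles with GCC or Clang when no -msse4.1/-mavx flag is given; with such a flag the macro is unnecessary.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_blendv_ps

// Hypothetical helper macro: enables SSE4.1 for one function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

// Adds unconditionally, then picks the original w in lanes where w == 0.0f.
TARGET_SSE41
__m128 add_where_nonzero_blend(__m128 w, __m128 e) {
    const __m128 sum     = _mm_add_ps(e, w);
    const __m128 is_zero = _mm_cmpeq_ps(_mm_setzero_ps(), w); // all ones where w == 0
    // blendv takes the second operand where the mask's sign bit is set.
    return _mm_blendv_ps(sum, w, is_zero);
}
```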

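A third option relies on the fact that the result must be 0 wherever w == 0, so the sum can simply be cleared in those lanes:

auto m = _mm_cmpeq_ps(_mm_setzero_ps(), w); // make the mask as above
w = _mm_add_ps(w, e);                       // add
w = _mm_andnot_ps(m, w);                    // revert the add where w was 0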

(I'm using cmpeq instead of cmpneq to make it usable for integers as well.)
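
For the case in the question's EDIT, where the number of trailing zero elements is known in advance, the comparison can be skipped and a precomputed mask applied to the entropy vector instead. The following is one possible sketch, not something the answer above prescribes: the table kLaneMaskTable and the function zero_trailing_lanes are hypothetical names, and the count is assumed to be in the range 0 to 3 (4 also works and clears every lane).

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_castsi128_ps

// Sliding window over this table yields a mask with (4 - n) leading
// all-ones lanes followed by n zero lanes. Name is illustrative.
static const std::int32_t kLaneMaskTable[8] = {-1, -1, -1, -1, 0, 0, 0, 0};

// Clears the last trailing_zeros lanes of e (valid range: 0..4).
inline __m128 zero_trailing_lanes(__m128 e, int trailing_zeros) {
    const __m128i bits = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(kLaneMaskTable + trailing_zeros));
    return _mm_and_ps(e, _mm_castsi128_ps(bits));
}
```

The masked vector can then be added as usual, for example v_weight = _mm_add_ps(v_weight, zero_trailing_lanes(v_entropy, n)), where n is the known count of trailing zeros.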

