I'm having problems getting g++ 5.4 to vectorize a comparison. Basically, I want to compare 4 unsigned ints using vectorization. My first approach was straightforward:
bool compare(unsigned int const pX[4]) {
    // Compare each element against its own constant limit.
    bool c1 = (pX[0] < 1);
    bool c2 = (pX[1] < 2);
    bool c3 = (pX[2] < 3);
    bool c4 = (pX[3] < 4);
    return c1 && c2 && c3 && c4;
}
Compiling with g++ -std=c++11 -Wall -O3 -funroll-loops -march=native -mtune=native -ftree-vectorize -msse -msse2 -ffast-math -fopt-info-vec-missed
told me that it could not vectorize the comparison, apparently due to misaligned data:
main.cpp:5:17: note: not vectorized: failed to find SLP opportunities in basic block.
main.cpp:5:17: note: misalign = 0 bytes of ref MEM[(const unsigned int *)&x]
main.cpp:5:17: note: misalign = 4 bytes of ref MEM[(const unsigned int *)&x + 4B]
main.cpp:5:17: note: misalign = 8 bytes of ref MEM[(const unsigned int *)&x + 8B]
main.cpp:5:17: note: misalign = 12 bytes of ref MEM[(const unsigned int *)&x + 12B]
Thus my second attempt was to tell g++ to align the data and use a temporary array:
bool compare(unsigned int const pX[4]) {
    unsigned int temp[4] __attribute__ ((aligned(16)));
    temp[0] = pX[0];
    temp[1] = pX[1];
    temp[2] = pX[2];
    temp[3] = pX[3];
    bool c1 = (temp[0] < 1);
    bool c2 = (temp[1] < 2);
    bool c3 = (temp[2] < 3);
    bool c4 = (temp[3] < 4);
    return c1 && c2 && c3 && c4;
}
However, the output is the same. My CPU supports AVX2, and the Intel Intrinsics Guide lists e.g. _mm256_cmpgt_epi8/16/32/64
for comparison. Any idea how to tell g++ to use these?
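
For reference, a hand-written version of roughly what I would like g++ to generate might look like the sketch below. It is only a sketch under a few assumptions: the name compare_sse2 is hypothetical, the target is x86 with SSE2, and it uses 128-bit SSE2 intrinsics rather than the AVX2 ones mentioned above, since four 32-bit values fit in a single 128-bit register. SSE2 only provides signed 32-bit comparisons, so the sign bit of both operands is flipped first, which maps unsigned ordering onto signed ordering.

```cpp
#include <climits>      // INT_MIN, i.e. the bit pattern 0x80000000
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true when pX[i] < i + 1 holds for all four elements.
bool compare_sse2(unsigned int const pX[4]) {
    // Unaligned load, so pX does not need 16-byte alignment.
    __m128i const x = _mm_loadu_si128(reinterpret_cast<__m128i const*>(pX));

    // Per-lane limits in memory order: lane 0 gets 1, lane 3 gets 4.
    __m128i const limits = _mm_setr_epi32(1, 2, 3, 4);

    // Flip the sign bit of both operands so that the signed compare
    // below behaves like an unsigned compare.
    __m128i const sign = _mm_set1_epi32(INT_MIN);
    __m128i const lt = _mm_cmplt_epi32(_mm_xor_si128(x, sign),
                                       _mm_xor_si128(limits, sign));

    // Each true lane is all ones; movemask collects the top bit of each
    // of the 16 bytes, so 0xFFFF means all four lanes compared true.
    return _mm_movemask_epi8(lt) == 0xFFFF;
}
```

This avoids the short-circuit evaluation of && by computing all four comparisons at once and reducing them with a single mask test.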