Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have the following function:

template <typename T>
void SSE_vectormult(T * A, T * B, int size)
{

    __m128d a;
    __m128d b;
    __m128d c;
    double A2[2], B2[2], C[2];
    const double * A2ptr, * B2ptr;
    A2ptr = &A2[0];
    B2ptr = &B2[0];
    a = _mm_load_pd(A);
    for(int i = 0; i < size; i+=2)
    {
        std::cout << "In SSE_vectormult: i is: " << i << '
';
        A2[0] = A[i];
        B2[0] = B[i];
        A2[1] = A[i+1];
        B2[1] = B[i+1];
        std::cout << "Values from A and B written to A2 and B2
";
        a = _mm_load_pd(A2ptr);
        b = _mm_load_pd(B2ptr);
        std::cout << "Values converted to a and b
";
        c = _mm_mul_pd(a,b);
        _mm_store_pd(C, c);
        A[i] = C[0];
        A[i+1] = C[1];
    };
//    const int mask = 0xf1;
//    __m128d res = _mm_dp_pd(a,b,mask);
//    r1 = _mm_mul_pd(a, b);
//    r2 = _mm_hadd_pd(r1, r1);
//    c = _mm_hadd_pd(r2, r2);
//    c = _mm_scale_pd(a, b);
//    _mm_store_pd(A, c);
}

When I am calling it on Linux, everything is fine, but when I am calling it on a windows OS, my program crashes with "program is not working anymore". What am I doing wrong, and how can I determine my error?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
197 views
Welcome To Ask or Share your Answers For Others

1 Answer

Your data is not guaranteed to be 16 byte aligned as required by SSE loads. Either use _mm_loadu_pd:

    a = _mm_loadu_pd(A);
    ...
    a = _mm_loadu_pd(A2ptr);
    b = _mm_loadu_pd(B2ptr);

or make sure that your data is correctly aligned where possible, e.g. for static or locals:

alignas(16) double A2[2], B2[2], C[2];    // C++11, or C11 with <stdalign.h>

or without C++11, using compiler-specific language extensions:

 __attribute__ ((aligned(16))) double A2[2], B2[2], C[2];   // gcc/clang/ICC/et al

__declspec (align(16))         double A2[2], B2[2], C[2];   // MSVC

You could use #ifdef to #define an ALIGN(x) macro that works on the target compiler.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...