Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I was testing algorithms and run into this weird behavior, when std::accumulate is faster than a simple for cycle.

Looking at the generated assembler I'm not much wiser :-) It seems that the for cycle is optimized into MMX instructions, while accumulate expands into a loop.

This is the code. The behavior manifests with -O3 optimization level, gcc 4.7.1

#include <vector>                                                                                                                                                                                                                                                              
#include <chrono>                                                                                                                                                                                                                                                              
#include <iostream>                                                                                                                                                                                                                                                            
#include <random>                                                                                                                                                                                                                                                              
#include <algorithm>                                                                                                                                                                                                                                                           
using namespace std;                                                                                                                                                                                                                                                           

int main()                                                                                                                                                                                                                                                                     
{                                                                                                                                                                                                                                                                              
    const size_t vsize = 100*1000*1000;                                                                                                                                                                                                                                        

    vector<int> x;
    x.reserve(vsize);

    mt19937 rng;
    rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now()));

    uniform_int_distribution<uint32_t> dist(0,10);

    for (size_t i = 0; i < vsize; i++)
    {
        x.push_back(dist(rng));
    }

    long long tmp = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        tmp += x[i];
    }

    cout << "dry run " << tmp << endl;

    auto start = chrono::high_resolution_clock::now();
    long long suma = accumulate(x.begin(),x.end(),0);
    auto end = chrono::high_resolution_clock::now();

    cout << "Accumulate runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl;

    start = chrono::high_resolution_clock::now();
    suma = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        suma += x[i];
    }
    end = chrono::high_resolution_clock::now();

    cout << "Manual sum runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma <<  endl;

    return 0;
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
434 views
Welcome To Ask or Share your Answers For Others

1 Answer

When you pass the 0 to accumulate, you are making it accumulate using an int instead of a long long.

If you code your manual loop like this, it will be equivalent:

int sumb = 0;
for (size_t i = 0; i < vsize; i++)
{
    sumb += x[i];
}
suma = sumb;

or you can call accumulate like this:

long long suma = accumulate(x.begin(),x.end(),0LL);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...