After reading the question and its answer from the following
LINK
I still have a question. From my background in C/C++, I understand that using volatile
has its drawbacks. The answers also point out that in CUDA, the compiler may optimize a shared array into registers to hold the data if the volatile
keyword is not used.
I want to know what performance issues can arise when calculating a (sum) reduction, e.g.
    __device__ void sum(volatile int *s_data, int tid)
    {
        if (tid < 16)
        {
            s_data[tid] += s_data[tid + 16];
            s_data[tid] += s_data[tid + 8];
            s_data[tid] += s_data[tid + 4];
            s_data[tid] += s_data[tid + 2];
            s_data[tid] += s_data[tid + 1];
        }
    }
I am using this for an in-warp reduction. Since all the threads within a warp are in sync, I believe there is no need to use the __syncthreads()
construct.
Will removing the volatile keyword
mess up my sum (due to CUDA optimizations)? Can I use a reduction such as this without the volatile
keyword?
Since I use this reduction function multiple times, will the volatile
keyword cause any performance degradation?
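For comparison, one way to sidestep the volatile question is to keep the partial sums in registers and exchange them with warp shuffles instead of a shared array. The following is only a minimal sketch, not the code from the linked answer. It assumes CUDA 9 or later (where __shfl_down_sync is available), a full 32-thread warp, and it omits error checking. The names warp_sum, sum_kernel and run_warp_sum are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: sums one int per lane across a full 32-thread warp.
// Values stay in registers, so no shared array and no volatile are involved.
__device__ int warp_sum(int val)
{
    // Full-warp mask: assumes all 32 lanes of the warp call this function.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;  // only lane 0 is guaranteed to hold the complete sum
}

// Hypothetical kernel: one warp sums 32 inputs and lane 0 stores the result.
__global__ void sum_kernel(const int *in, int *out)
{
    int total = warp_sum(in[threadIdx.x]);
    if (threadIdx.x == 0)
        *out = total;
}

// Hypothetical host wrapper: copies 32 ints to the device and returns their sum.
// Error checking of the CUDA calls is omitted to keep the sketch short.
int run_warp_sum(const int *host_in)
{
    int *d_in = nullptr, *d_out = nullptr;
    int result = 0;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in, 32 * sizeof(int), cudaMemcpyHostToDevice);
    sum_kernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    int data[32];
    int expected = 0;
    for (int i = 0; i < 32; ++i) {
        data[i] = i + 1;
        expected += data[i];  // reference sum computed on the host
    }
    int total = run_warp_sum(data);
    assert(total == expected);
    printf("sum = %d\n", total);
    return 0;
}
```

Because the operands never pass through shared memory here, there is nothing for the compiler to cache incorrectly. Whether this is faster than the volatile shared-memory version would still need to be measured on the target GPU.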