For the purpose of testing printf() call on device, I wrote a simple program which copies an array of moderate size to device and print the value of device array to screen. Although the array is correctly copied to device, the printf() function does not work correctly, which lost the first several hundred numbers. The array size in the code is 4096. Is this a bug or I'm not using this function properly? Thanks in adavnce.
EDIT: My gpu is GeForce GTX 550i, with compute capability 2.1
My code:
#include<stdio.h>
#include<stdlib.h>
#define N 4096
__global__ void Printcell(float *d_Array , int n){
int k = 0;
printf("
=========== data of d_Array on device==============
");
for( k = 0; k < n; k++ ){
printf("%f ", d_Array[k]);
if((k+1)%6 == 0) printf("
");
}
printf("
Totally %d elements has been printed", k);
}
int main(){
int i =0;
float Array[N] = {0}, rArray[N] = {0};
float *d_Array;
for(i=0;i<N;i++)
Array[i] = i;
cudaMalloc((void**)&d_Array, N*sizeof(float));
cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
Printcell<<<1,1>>>(d_Array, N); //Print the device array by a kernel
cudaDeviceSynchronize();
/* Copy the device array back to host to see if it was correctly copied */
cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);
printf("
");
for(i=0;i<N;i++){
printf("%f ", rArray[i]);
if((i+1)%6 == 0) printf("
");
}
}
See Question&Answers more detail:os