c++ - Compiling code containing dynamic parallelism fails

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I am using dynamic parallelism with CUDA 5.5 on an NVIDIA GeForce GTX 780, which has compute capability 3.5. When I launch a kernel from inside another kernel, compilation fails with this error:

error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:13:36+0000

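The error means nvcc is not building for compute capability 3.5; unless told otherwise, it targets an older default architecture. Pass -arch=sm_35, enable relocatable device code with -rdc=true, and link against the device runtime library (cudadevrt):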

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt
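
For reference, here is a minimal sketch of what a file like simple1.cu might contain. The kernel names parent_kernel and child_kernel and the launch sizes are hypothetical, chosen only for illustration; they are not taken from the question's code. It should build with the command above, assuming a device of compute capability 3.5 or higher.

```cuda
// simple1.cu: hypothetical minimal example of dynamic parallelism.
// Kernel names and launch sizes are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: launched from device code, not from the host.
__global__ void child_kernel(int parent_block)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parent_block);
}

// Parent kernel: a __global__ function that launches another
// __global__ function. This is the construct that needs
// -arch=sm_35 (or higher), -rdc=true, and -lcudadevrt.
__global__ void parent_kernel()
{
    // Launch from one thread per block so that each block
    // creates a single child grid rather than one per thread.
    if (threadIdx.x == 0) {
        child_kernel<<<1, 4>>>(blockIdx.x);
    }
}

int main()
{
    parent_kernel<<<2, 32>>>();

    // Check for launch errors, then wait for the parent grid
    // (and therefore its child grids) to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compiling this file without -arch=sm_35 on a toolkit whose default target is below compute capability 3.5 would be expected to produce the same kind of error as in the question, since the child launch inside parent_kernel is not allowed on older architectures.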

Alternatively, if you have two files, simple1.cu and test.c, you can compile and link them in separate steps as shown below. This is called separate compilation.

nvcc -arch=sm_35 -dc simple1.cu 
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
g++ -c test.c 
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

This process is also described in the CUDA Programming Guide.
