cudaMemcpyAsync and memcpy fail to copy inside a kernel while direct copying works
Question

I am trying to copy from a source float array (containing 1.0f) to a destination float array (containing 2.0f) inside a CUDA kernel. I tried three different ways: cudaMemcpyAsync, memcpy, and direct copy (dst[i] = src[i]).

When I read the results after the kernel had executed, I found that both cudaMemcpyAsync and memcpy had failed to copy, while the direct copy had worked. Why did the cudaMemcpyAsync and memcpy methods fail?

I am using a GTX Titan X (SM_52), compiled using: nvcc -arch