Why is my rather trivial CUDA program erring with certain arguments?
Question

I made a simple CUDA program for practice. It simply copies data from one array to another:

import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule

# Global constants
N = 2**20  # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512

# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)

# Create the CUDA kernel, and run it.
mod = SourceModule("""
__global__