Learning PyCUDA, Part 3: Shared Memory and Thread Synchronization
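Shared Memory and Thread Synchronization

Given an array of size 3072*3072 in which every element is an integer, the task is to cube each element, add the cubes together, and obtain the final result.

First, let's use only basic PyCUDA to write a program that runs.

import time
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
// Note: despite its name, this kernel sums cubes, as the task requires.
__global__ void sumOfSquares(int* num, int *result, size_t N)
{
    // Global index of this thread, and the total number of threads in the grid.
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;
    int sum = 0;
    // Grid-stride loop: each thread accumulates the cubes of every stride-th element.
    for (int i = index; i < N; i += stride) {
        sum += num[i]*num[i]*num[i];
    }
    // Each thread writes its own partial sum.
    result[index] = sum;
}
""")

def test(N, np_seed):
    np.random.seed(np_seed)
    a = np.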