I have a very large array that I would like to process on the GPU using a CUDA kernel implemented with Numba, so I split the processing into smaller steps in this way: