Question
Suppose we do an mmap() system call and map some PCIe device memory (like GPU memory) into user space; the application can then access that memory region on the device without any OS overhead. Data can be copied from the file-system buffer directly to device memory without any extra copy.
The statement above must be wrong... Can anyone tell me where the flaw is? Thanks!
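To make the mechanism in the question concrete, here is a minimal sketch of mapping a file with mmap() and reading it through the mapping, with no read() copy in between. The same call pattern is what user-space tools use on a PCI BAR exposed via sysfs (e.g. a path like /sys/bus/pci/devices/0000:01:00.0/resource0 — that address is made up for illustration, and a real BAR mapping needs root and behaves as uncached MMIO):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and return its first byte, or -1 on error.
 * For a device BAR the path would be a sysfs resource file such as
 * /sys/bus/pci/devices/0000:01:00.0/resource0 (hypothetical address). */
long mmap_first_byte(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

    /* One mapping, no read() copy: CPU loads go straight through the
     * page tables to the backing pages (for a BAR: to device MMIO). */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) { perror("mmap"); return -1; }

    long b = p[0];
    munmap(p, st.st_size);
    return b;
}
```

This shows the "no extra copy" part of the claim; what it does not show is any ordering or caching guarantee between those raw CPU accesses and work in flight on the device, which is where the answer below locates the flaw.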
Answer 1:
For a normal device, what you have said is correct. If the GPU memory behaves differently for reads/writes, the driver might do this. We should look at some documentation for cudaMemcpy().
From page 22 of Nvidia's "Basics of CUDA":
"direction specifies locations (host or device) of src and dst. Blocks CPU thread: returns after the copy is complete. Doesn't start copying until previous CUDA calls complete."
It seems pretty clear that cudaMemcpy() is synchronized with respect to prior GPU register writes, which may have caused the mmap()ed memory to be updated. Since the GPU pipeline is a pipeline, commands issued earlier may not have completed yet when cudaMemcpy() is issued from the CPU.
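A hedged sketch of the ordering issue described above (the kernel name and sizes are illustrative, not from the original post): the kernel launch is asynchronous, and it is cudaMemcpy()'s documented blocking behavior, not anything about the memory itself, that guarantees the copy sees the kernel's writes.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: fills a device buffer (name and sizes made up).
__global__ void fill(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

int main(void) {
    const int n = 1 << 20;
    int *d_buf;
    int *h_buf = new int[n];
    cudaMalloc(&d_buf, n * sizeof(int));

    // Kernel launch is asynchronous: it is queued and the CPU moves on.
    fill<<<(n + 255) / 256, 256>>>(d_buf, n);

    // Per the quoted doc, cudaMemcpy blocks the CPU thread and does not
    // start copying until previous CUDA calls complete, so it observes
    // the kernel's writes. A raw CPU load through an mmap()ed BAR
    // pointer at this point would have no such ordering guarantee.
    cudaMemcpy(h_buf, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```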
Source: https://stackoverflow.com/questions/20298147/mmap-device-memory-into-user-space