How do I explain performance variability over PCIe bus?


Question


In my CUDA program I see large variability between runs (up to 50%) in communication time, which includes host-to-device and device-to-host transfer times over PCI Express for pinned memory. How can I explain this variability? Does it happen when the PCI controller and the memory controller are busy performing other PCIe transfers? Any insight/reference is greatly appreciated. The GPU is a Tesla K20c, and the host is a 12-core AMD Opteron 6168 running Linux. The PCI Express version is 2.0.
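For context, a minimal sketch of the kind of measurement being described (a pinned host buffer and one transfer per direction timed with CUDA events); the buffer size is an arbitrary choice and error checking is omitted, so this is not the original program, just the general shape of such a benchmark:

```cpp
// Sketch of timing a pinned-memory host-to-device transfer with CUDA events.
// The 64 MiB buffer size is arbitrary, not taken from the question.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64UL << 20;   // 64 MiB, arbitrary
    float *h_buf, *d_buf;
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);  // pinned host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time host-to-device; device-to-host is timed the same way with the
    // arguments and the direction flag swapped.
    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D: %.3f ms (%.2f GB/s)\n", ms, bytes / ms / 1e6);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```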


Answer 1:


The system you are doing this on is a NUMA system: each of the two discrete CPU dies in your host (the Opteron 6168 packages two 6-core CPUs in a single socket) has its own memory controller, and there may be a different number of HyperTransport hops between each CPU's memory and the PCIe controller hosting your CUDA device.

This means that, depending on CPU affinity, the thread which runs your bandwidth tests may see different latency to both host memory and the GPU. This would explain the differences in timing you are seeing.
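One way to check whether CPU affinity is the cause is to pin the benchmarking thread to the CPUs of a single NUMA node and see whether the variability disappears. A minimal sketch using `sched_setaffinity`, assuming (hypothetically) that CPUs 0-5 belong to the node nearest the GPU; the actual topology is system-specific and can be inspected with `numactl --hardware` and the device's `numa_node` entry in sysfs:

```cpp
// Sketch: pin the calling thread to one NUMA node's CPUs before allocating
// pinned memory and running the transfer timings, so the HyperTransport path
// to the GPU's PCIe controller stays the same across runs.
// Assumption: CPUs 0-5 are on the node closest to the K20c (system-specific).
#define _GNU_SOURCE
#include <sched.h>
#include <cstdio>

static int pin_to_cpus(const int *cpus, int n) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = 0; i < n; ++i)
        CPU_SET(cpus[i], &set);
    return sched_setaffinity(0, sizeof(set), &set);  // 0 = calling thread
}

int main() {
    const int node0_cpus[] = {0, 1, 2, 3, 4, 5};  // hypothetical node-0 CPUs
    if (pin_to_cpus(node0_cpus, 6) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    // ... allocate pinned memory and run the cudaMemcpy timing here ...
    return 0;
}
```

Alternatively, launching the existing benchmark under `numactl --cpunodebind=0 --membind=0` (node number as appropriate) achieves the same effect without code changes; comparing runs pinned to each node should show whether the NUMA path accounts for the spread.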



Source: https://stackoverflow.com/questions/32774189/how-do-i-explain-performance-variability-over-pcie-bus
