Latency in ioread

Submitted by 青春壹個敷衍的年華 on 2020-01-25 04:42:07

Question


Suppose you have a PCIe device presenting a single BAR and one DMA area declared with pci_alloc_consistent(..). The BAR's flags indicate a non-prefetchable, non-cacheable memory region.
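
For concreteness, the setup being described might look roughly like the sketch below in a driver. All names (my_dev, my_setup) are invented for illustration, and BAR 0 and a 4 KiB buffer are assumptions, not something stated in the question:

```c
#include <linux/pci.h>
#include <linux/io.h>

/* Hypothetical probe-time setup: one BAR (assumed BAR 0) plus one
 * coherent ("consistent") DMA buffer. Names are invented. */
struct my_dev {
	void __iomem *bar;        /* CPU-side mapping of the device BAR  */
	void         *dma_buf;    /* kernel virtual address of DMA area  */
	dma_addr_t    dma_handle; /* bus address the device sees for DMA */
};

static int my_setup(struct pci_dev *pdev, struct my_dev *d)
{
	/* Map the BAR: loads through this pointer become PCIe read requests. */
	d->bar = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
	if (!d->bar)
		return -ENOMEM;

	/* The DMA area lives in host RAM; the CPU reaches it over its own
	 * memory bus, while the device reaches it across the PCIe link. */
	d->dma_buf = pci_alloc_consistent(pdev, 4096, &d->dma_handle);
	if (!d->dma_buf) {
		pci_iounmap(pdev, d->bar);
		return -ENOMEM;
	}
	return 0;
}
```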

What are the principal causes of latency in reading the DMA area, and similarly, what are the causes of latency in reading the BAR?

Thank you for answering this simple question :D!


Answer 1:


This smells a bit like homework but I suspect the concepts are not well understood by many so I'll add an answer.

The best way to think through this is to consider what needs to happen in order for a read to complete. The CPU and the device are on separate sides of the PCIe link. It's helpful to view PCI-Express as a mini network. Each link is point-to-point (like your PC connected to another PC). There may also be intermediate switches (aka bridges in PCI). In that case, it's like your PC is connected to a switch that is in turn connected to the other PC.

So, if the CPU wants to read its own memory (the "DMA" region you allocated), it's relatively fast. It has a high speed bus that is designed to make that happen fast. Also, there are multiple layers of caching built in to keep frequently (or recently) used data "close" to the CPU.

But if the CPU wants to read from the BAR in the device, the CPU (actually the PCIe root complex integrated with the CPU) must compose a PCIe read request, send the request, and wait while the device decodes the request, accesses the BAR location and sends back the requested data. Tick tock. Your CPU is doing nothing else while it waits for this to complete.
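
As a sketch of the two access paths, assuming a BAR pointer from pci_iomap() and a coherent buffer as in the hypothetical setup above (offset 0 is chosen arbitrarily):

```c
#include <linux/types.h>
#include <linux/io.h>

/* The same 4-byte read, against host memory vs. the device BAR. */

static u32 read_host(u32 *dma_buf)
{
	/* Ordinary load from host RAM: served by the CPU's memory hierarchy. */
	return dma_buf[0];
}

static u32 read_device(void __iomem *bar)
{
	/* Non-posted PCIe Memory Read: the core waits for the device's
	 * completion TLP to come back across the link before continuing. */
	return ioread32(bar);
}
```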

This is pretty much analogous to asking for a web page from another computer. You formulate an HTTP request, send it and wait while the web server accesses the content, formulates a return packet and sends it to you.

If the device wishes to access memory residing "in" the CPU, it's pretty much the exact same thing in reverse. ("Direct memory access" just means that it doesn't need to interrupt the CPU to handle it, but something [the root complex here] is still responsible for decoding the request, fulfilling the read and sending back the resulting data.)

Also, if there are intermediate PCIe switches between CPU and device, those may add additional buffering/queuing delays (exactly as a switch or router might in a network). And any such delays are doubled since they're incurred in both directions.

Of course PCIe is very fast, so all of that happens in mere nanoseconds, but that's still orders of magnitude slower than a "local" read.
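
A minimal way to see that gap from inside a driver is sketched below, again using the hypothetical bar/dma_buf mappings from above. A single sample is noisy, so treat the printed numbers as order-of-magnitude only; a real measurement would average many iterations:

```c
#include <linux/types.h>
#include <linux/compiler.h>
#include <linux/io.h>
#include <linux/ktime.h>
#include <linux/printk.h>

/* Very rough single-shot timing of a BAR read vs. a DMA-area read. */
static void time_reads(void __iomem *bar, u32 *dma_buf)
{
	ktime_t t0, t1;
	u32 v;

	t0 = ktime_get();
	v = ioread32(bar);               /* full PCIe round trip */
	t1 = ktime_get();
	pr_info("BAR read: %lld ns (val %u)\n",
		ktime_to_ns(ktime_sub(t1, t0)), v);

	t0 = ktime_get();
	v = READ_ONCE(dma_buf[0]);       /* local memory access */
	t1 = ktime_get();
	pr_info("DMA read: %lld ns (val %u)\n",
		ktime_to_ns(ktime_sub(t1, t0)), v);
}
```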



Source: https://stackoverflow.com/questions/36431038/latency-in-ioread
