Coherently understand the software-hardware interaction with regard to DMA and buses


You've slightly conflated two things here - there are some devices (e.g. UARTs, MMC controllers, audio controllers, typically lower-bandwidth devices) which rely on an external DMA controller ("DMA engine" in Linux terminology), but many devices are simply bus masters in their own right and perform their own DMA directly (e.g. GPUs, USB host controllers, and of course the DMA controllers themselves). The former involves a bunch of extra complexity with the CPU programming the DMA controller, so I'm going to ignore it and just consider straightforward bus-master DMA.
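For reference, the extra step being set aside there looks roughly like the following in a Linux slave driver using the dmaengine API - a minimal sketch only, with the channel name, buffer and error handling all hypothetical:

```c
/* Hypothetical sketch of the "DMA engine" case: a low-bandwidth
 * peripheral driver asking an external DMA controller to move data
 * for it, via Linux's dmaengine slave API. */
#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>

static int example_start_rx(struct device *dev, dma_addr_t buf, size_t len)
{
	struct dma_chan *chan;
	struct dma_async_tx_descriptor *desc;

	/* Ask the dmaengine framework for a channel wired to this device
	 * (channel names come from the DT "dmas"/"dma-names" properties). */
	chan = dma_request_chan(dev, "rx");
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	/* Describe the transfer: device FIFO -> memory buffer. */
	desc = dmaengine_prep_slave_single(chan, buf, len, DMA_DEV_TO_MEM,
					   DMA_PREP_INTERRUPT);
	if (!desc) {
		dma_release_channel(chan);
		return -EINVAL;
	}

	dmaengine_submit(desc);        /* queue the descriptor...      */
	dma_async_issue_pending(chan); /* ...and kick the controller.  */
	return 0;
}
```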

In a typical ARM SoC, the CPU clusters and other master peripherals, and the memory controller and other slave peripherals, are all connected together with various AMBA interconnects, forming a single "bus" (generally all mapped to the "platform bus" in Linux), over which masters address slaves according to the address maps of the interconnect. You can safely assume that the device drivers know (whether by device tree or hardcoded) where devices appear in the CPU's physical address map, because otherwise they'd be useless.
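To make that concrete, here is a minimal sketch (device and register offset hypothetical) of how a Linux platform driver typically discovers its memory-mapped registers from the device tree and accesses them from the CPU side:

```c
/* Sketch of a platform driver probe for a memory-mapped AMBA/platform
 * device: the "reg" property from the device tree has already been
 * turned into a struct resource by the core. */
#include <linux/platform_device.h>
#include <linux/io.h>

static int example_probe(struct platform_device *pdev)
{
	struct resource *res;
	void __iomem *regs;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	regs = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(regs))
		return PTR_ERR(regs);

	/* "Write X to register Y" is just a store to the mapped address. */
	writel(0x1, regs + 0x10);	/* hypothetical enable bit */
	return 0;
}
```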

On simpler systems, there is a single address map, so the physical addresses used by the CPU to address RAM and peripherals can be freely shared with other masters as DMA addresses. Other systems are more complex - one of the better-known examples is the Raspberry Pi's BCM2835, in which the CPU and GPU have different address maps; e.g. the interconnect is hard-wired such that where the GPU sees peripherals at "bus address" 0x7e000000, the CPU sees them at "physical address" 0x20000000. Furthermore, in LPAE systems with 40-bit physical addresses, the interconnect might need to provide different views to different masters - e.g. in the TI Keystone 2 SoCs, all the DRAM is above the 32-bit boundary from the CPUs' point of view, so the 32-bit DMA masters would be useless if the interconnect didn't show them a different address map. For Linux, check out the dma-ranges device tree property for how such CPU→bus translations are described. The CPU must take these translations into account when telling a master to access a particular RAM or peripheral address; Linux drivers should be using the DMA mapping API, which provides appropriately-translated DMA addresses.
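As a rough illustration of that last point, a driver using the streaming DMA mapping API gets back a dma_addr_t which already has any CPU→bus translation applied - a minimal sketch, with the buffer and transfer details hypothetical:

```c
/* Sketch of the streaming DMA mapping API: the driver hands the API a
 * kernel buffer and gets back the address as seen by the *device*,
 * with any dma-ranges translation already applied. */
#include <linux/dma-mapping.h>

static int example_tx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma_handle;

	dma_handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma_handle))
		return -ENOMEM;

	/* Program dma_handle (not virt_to_phys(buf)!) into the device's
	 * address register and start the transfer; once done: */
	dma_unmap_single(dev, dma_handle, len, DMA_TO_DEVICE);
	return 0;
}
```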

IOMMUs provide more flexibility than fixed interconnect offsets - typically, addresses can be remapped dynamically, and for system integrity masters can be prevented from accessing any addresses other than those mapped for DMA at any given time. Furthermore, in an LPAE or AArch64 system with more than 4GB of RAM, an IOMMU becomes necessary if a 32-bit peripheral needs to be able to access buffers anywhere in RAM. You'll see IOMMUs on a lot of current 64-bit systems for the purpose of integrating legacy 32-bit devices, but they are also increasingly popular for device virtualisation.
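The way a Linux driver declares such a 32-bit limitation is via the DMA mask - a minimal sketch (the device is hypothetical); the DMA layer then knows it must hand out addresses the device can actually reach, whether via an IOMMU, bounce buffers or restricted allocation:

```c
/* Sketch: a driver for a 32-bit-only DMA master declares its
 * addressing capability up front. */
#include <linux/dma-mapping.h>

static int example_init_dma(struct device *dev)
{
	/* "This device can only drive 32 address bits on the bus." */
	return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
}
```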

IOMMU topology depends on the system and the IOMMUs in use - the system I'm currently working with has 7 separate ARM MMU-401/400 devices in front of individual bus-master peripherals; the ARM MMU-500 on the other hand can be implemented as a single system-wide device with a separate TLB for each master; other vendors have their own designs. Either way, from a Linux perspective, most device drivers should be using the aforementioned DMA mapping API to allocate and prepare physical buffers for DMA, which will also automatically set up the appropriate IOMMU mappings if the device is attached to one. That way, individual device drivers need not care whether an IOMMU is present or not. Other drivers (typically GPU drivers), however, depend on an IOMMU and want complete control, so they manage the mappings directly via the IOMMU API. Essentially, the IOMMU's page tables are set up to map certain ranges of physical addresses* to ranges of I/O virtual addresses, those IOVAs are given to the device as DMA (i.e. bus) addresses, and the IOMMU translates the IOVAs back to physical addresses as the device accesses them. Once the DMA operation is finished, the driver typically removes the IOMMU mapping, both to free up IOVA space and so that the device no longer has access to RAM.
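For the drivers that do want complete control, IOMMU API usage follows roughly this shape - a sketch using the classic kernel interfaces (exact signatures have shifted in later kernels), with all addresses and sizes hypothetical:

```c
/* Sketch of direct IOMMU API use: create an I/O address space, attach
 * the master, map physical pages at a chosen IOVA, and tear it down
 * again when the DMA is done. */
#include <linux/iommu.h>
#include <linux/platform_device.h>

static int example_gpu_map(struct device *dev, phys_addr_t phys,
			   unsigned long iova, size_t size)
{
	struct iommu_domain *domain;
	int ret;

	domain = iommu_domain_alloc(&platform_bus_type);
	if (!domain)
		return -ENOMEM;

	ret = iommu_attach_device(domain, dev);
	if (ret)
		goto out_free;

	/* The device is then given 'iova' as its DMA (bus) address. */
	ret = iommu_map(domain, iova, phys, size, IOMMU_READ | IOMMU_WRITE);
	if (ret)
		goto out_detach;

	/* ... DMA happens ... then remove the mapping again: */
	iommu_unmap(domain, iova, size);
out_detach:
	iommu_detach_device(domain, dev);
out_free:
	iommu_domain_free(domain);
	return ret;
}
```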

Note that in some cases the DMA transfer is cyclic and never "finishes". With something like a display controller, the CPU might just map a buffer for DMA, pass that address to the controller and trigger it to start, and it will then continuously perform DMA reads to scan out whatever the CPU writes to that buffer until it is told to stop.
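In Linux, such a long-lived buffer would typically be a coherent allocation set up once - a minimal sketch, with the register layout hypothetical:

```c
/* Sketch of the "never finishes" case: a coherent scanout buffer the
 * device keeps reading from until it is told to stop. */
#include <linux/dma-mapping.h>
#include <linux/kernel.h>
#include <linux/io.h>

static void *example_setup_scanout(struct device *dev, void __iomem *regs,
				   size_t size, dma_addr_t *dma_handle)
{
	void *vaddr;

	/* Coherent allocation: CPU writes become visible to the device
	 * without per-frame map/unmap or cache maintenance. */
	vaddr = dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
	if (!vaddr)
		return NULL;

	/* Hand the bus address to the controller and let it free-run. */
	writel(lower_32_bits(*dma_handle), regs + 0x0);	/* hypothetical FB base register */
	return vaddr;
}
```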

Other peripheral buses beyond the SoC interconnect, like I2C/SPI/USB/etc., work as you suspect - there is a bus controller (which is itself a device on the AMBA bus, so any of the above might apply to it) with its own device driver. As a crude generalisation, the CPU doesn't communicate directly with devices on the external bus - where a driver for an AMBA device says "write X to register Y", that just happens by the CPU performing a store to a memory-mapped address; where an I2C device driver says "write X to register Y", the OS usually has some bus abstraction layer which the bus controller driver implements, whereby the CPU programs the controller with a command saying "write X to register Y on device Z"; the bus controller hardware will then go off and do that, and notify the OS of the peripheral device's response via an interrupt or some other means.
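For example, a client driver on the I2C side never touches the wire itself - a minimal sketch, with the register and value hypothetical:

```c
/* Sketch of the "write X to register Y on device Z" indirection: the
 * client driver calls into the I2C core, which hands the transaction to
 * whichever bus-controller driver sits underneath. */
#include <linux/i2c.h>

static int example_write_reg(struct i2c_client *client)
{
	/* The controller driver programs the I2C host, which clocks the
	 * transfer out on the wire and reports completion (usually via
	 * an interrupt). */
	return i2c_smbus_write_byte_data(client, 0x10, 0xab);
}
```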

* technically, the IOMMU itself, being more or less "just another device", could have a different address map in the interconnect as previously described, but I would doubt the sanity of anyone actually building a system like that.
