pci-e

Linux driver DMA transfer to a PCIe card with PC as master

Submitted by 馋奶兔 on 2019-12-03 07:07:36
I am working on a DMA routine to transfer data from the PC to an FPGA on a PCIe card. I read DMA-API.txt and LDD3 ch. 15 for details. However, I could not figure out how to do a DMA transfer from the PC to a consistent block of iomem on the PCIe card. The dad sample for PCI in LDD3 maps a buffer and then tells the card to do the DMA transfer, but I need the PC to do this. What I have already found out: request bus master: pci_set_master(pdev); set the DMA mask: if (dma_set_mask(&(pdev->dev), DMA_BIT_MASK(32))) { dev_err(&pdev->dev, "No suitable DMA available.\n"); goto cleanup; } request a DMA channel: if …
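
A minimal sketch of the setup steps the excerpt lists, for a hypothetical FPGA card (the BAR number, buffer size, and register offsets below are made-up placeholders, not taken from the question). On PCIe the host CPU typically does not push the data itself: the driver allocates a coherent buffer and programs the DMA engine on the card, which then masters the bus and moves the data.

```c
#include <linux/pci.h>
#include <linux/dma-mapping.h>

#define MY_BAR       0              /* placeholder: BAR with the FPGA's registers */
#define MY_BUF_SIZE  (64 * 1024)    /* placeholder buffer size */

static int my_dma_setup(struct pci_dev *pdev)
{
    void __iomem *regs;
    void *cpu_buf;
    dma_addr_t bus_addr;
    int err;

    err = pci_enable_device(pdev);
    if (err)
        return err;

    /* Let the card master the bus; the card's DMA engine moves the data. */
    pci_set_master(pdev);

    /* Restrict DMA addresses to 32 bits, as in the question. */
    err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
    if (err) {
        dev_err(&pdev->dev, "No suitable DMA available\n");
        goto err_disable;
    }

    /* Coherent host buffer the FPGA can read from or write to. */
    cpu_buf = dma_alloc_coherent(&pdev->dev, MY_BUF_SIZE, &bus_addr, GFP_KERNEL);
    if (!cpu_buf) {
        err = -ENOMEM;
        goto err_disable;
    }

    /* Map the BAR so the driver can program the FPGA's DMA engine. */
    regs = pci_iomap(pdev, MY_BAR, 0);
    if (!regs) {
        err = -ENOMEM;
        goto err_free;
    }

    /*
     * Hypothetical register layout: give the FPGA the bus address and
     * length of the host buffer and start the transfer. The real offsets
     * depend entirely on the FPGA design.
     */
    iowrite32((u32)bus_addr, regs + 0x00);
    iowrite32(MY_BUF_SIZE, regs + 0x04);
    iowrite32(1, regs + 0x08);          /* hypothetical "start" bit */

    return 0;

err_free:
    dma_free_coherent(&pdev->dev, MY_BUF_SIZE, cpu_buf, bus_addr);
err_disable:
    pci_disable_device(pdev);
    return err;
}
```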

If I have only the physical address of a device buffer (PCIe), how can I map this buffer to user space?

Submitted by 不问归期 on 2019-12-02 08:09:05
If I have only the physical address of the memory buffer to which the device buffer is mapped via the PCI Express BAR (Base Address Register), how can I map this buffer to user space? For example, what should the code usually look like in the Linux kernel? unsigned long long phys_addr = ...; // get device phys addr unsigned long long size_buff = ...; // get device buff size // ... mmap(), remap_pfn_range()? Or what should I do now? On: Linux x86_64. From https://stackoverflow.com/a/17278263/1558037: ioremap() maps a physical address into a kernel virtual address. remap_pfn_range() maps …
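
A minimal sketch of the remap_pfn_range() route mentioned above, assuming a character device whose mmap handler exposes the BAR to user space (my_mmap and my_fops are hypothetical names; device registration and how phys_addr/size_buff are obtained from the PCI BAR are omitted):

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>

/* Placeholders, as in the question: filled in elsewhere from the PCI BAR. */
static unsigned long long phys_addr;   /* device physical address */
static unsigned long long size_buff;   /* size of the device buffer */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long vsize = vma->vm_end - vma->vm_start;

    if (vsize > size_buff)
        return -EINVAL;

    /* Device memory should not be cached by the CPU. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    /* Map the physical pages behind the BAR straight into the user's VMA. */
    return remap_pfn_range(vma, vma->vm_start,
                           phys_addr >> PAGE_SHIFT,
                           vsize, vma->vm_page_prot);
}

static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .mmap  = my_mmap,
};
```

User space then simply mmap()s the character device; ioremap() is only needed when the kernel itself also has to access the buffer.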

How do I explain performance variability over the PCIe bus?

Submitted by 筅森魡賤 on 2019-12-02 06:39:18
On my CUDA program I see large variability between different runs (up to 50%) in communication time, which includes host-to-device and device-to-host data transfer times over PCI Express for pinned memory. How can I explain this variability? Does it happen when the PCI controller and memory controller are busy performing other PCIe transfers? Any insight/reference is greatly appreciated. The GPU is a Tesla K20c, the host is an AMD Opteron 6168 with 12 cores running the Linux operating system. The PCI Express version is 2.0. talonmies: The system you are doing this on is a NUMA system, which means that …
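
The answer's point is about NUMA placement: on a multi-socket Opteron host, which node the pinned buffer and the benchmarking thread end up on changes the path to the GPU's PCIe root port. A rough way to check how much of the variability is placement-related (a sketch, not part of the original answer; error checking omitted) is to time the same pinned-memory copy while binding the process to different nodes with numactl --cpunodebind=N --membind=N:

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = (size_t)64 << 20;   /* 64 MiB test transfer */
    void *h_buf, *d_buf;
    cudaEvent_t start, stop;
    float ms;

    cudaMallocHost(&h_buf, bytes);           /* pinned host memory */
    cudaMalloc(&d_buf, bytes);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);

    printf("H2D pinned: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));
    return 0;
}
```

Comparing the reported bandwidth across nodes, and with and without the binding, shows how much of the run-to-run spread comes from NUMA placement.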

GPUDirect peer-to-peer using the PCIe bus: if I need to access a lot of data on the other GPU, won't it result in deadlocks?

Submitted by 心不动则不痛 on 2019-12-02 02:53:56
Question: I have a simulation program which requires a lot of data. I load the data onto the GPUs for calculation, and there are a lot of dependencies in the data. Since one GPU was not enough for the data, I upgraded to two GPUs, but the limitation was that if I required data on the other GPU, there had to be a copy to the host first. So, if I use GPUDirect P2P, will the PCI bus handle that much to-and-fro communication between the GPUs? Won't it result in deadlocks? I am new to this, so I need some help and insight.
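
For reference, a sketch of how peer access is turned on between the two boards (device ordinals 0 and 1 are assumptions; error handling omitted). P2P transfers are ordinary PCIe read/write transactions, so the usual concern is PCIe bandwidth and how the application orders its copies, rather than the bus itself deadlocking:

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int can01 = 0, can10 = 0;

    /* Ask the driver whether the two boards can reach each other over PCIe. */
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("P2P not available between devices 0 and 1\n");
        return 1;
    }

    /* Enable direct access in both directions. */
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    /*
     * From here, cudaMemcpyPeer() or direct dereferences of the other
     * GPU's pointers inside kernels move data GPU-to-GPU over PCIe
     * without staging through host memory.
     */
    return 0;
}
```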

What is the Base Address Register (BAR) in PCIe?

Submitted by 自古美人都是妖i on 2019-12-01 03:06:44
After going through some basic documents, what I understood is that the Base Address Register is an address space which can be accessed by the PCIe IP; the PCIe IP can either transmit data from the Base Address Register or write received data to it. Am I right, or am I missing something? Paebbels: I think this is a very basic question and I would suggest reading the PCI Express Base 3.1 Specification (pcisig.com) or the book PCI Express Technology 3.0 (MindShare Press). A Base Address Register (BAR) is used to: - specify how much memory a device wants to have mapped into main memory, and - after device enumeration, it holds …
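
As a concrete illustration of the answer's second bullet (a sketch, not from the answer; my_show_bar0 is a hypothetical helper): after enumeration the BAR holds the address that was assigned to the device, and a driver reads the resulting window with the pci_resource_* helpers before mapping it.

```c
#include <linux/pci.h>

/* Print where BAR0 of a device ended up after enumeration. */
static void my_show_bar0(struct pci_dev *pdev)
{
    resource_size_t start = pci_resource_start(pdev, 0);
    resource_size_t len   = pci_resource_len(pdev, 0);
    unsigned long flags   = pci_resource_flags(pdev, 0);

    dev_info(&pdev->dev, "BAR0: %pa, length 0x%llx (%s)\n",
             &start, (unsigned long long)len,
             (flags & IORESOURCE_MEM) ? "memory" : "I/O");
}
```

From user space the same information is visible in lspci -v or in /sys/bus/pci/devices/*/resource.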

CUDA - how much slower is transferring over PCI-E?

Submitted by 寵の児 on 2019-11-29 05:02:39
If I transfer a single byte from a CUDA kernel over PCI-E to the host (zero-copy memory), how slow is it compared to transferring something like 200 megabytes? What I would like to know, since I know that transferring over PCI-E is slow for a CUDA kernel, is: does it change anything whether I transfer just a single byte or a huge amount of data? Or, since memory transfers are performed in "bulk", is transferring a single byte extremely expensive and useless compared to transferring 200 MB? Hope this picture explains everything. The data is generated by bandwidthTest in the CUDA samples. The …
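
The bandwidthTest plot referred to above shows a fixed per-transfer overhead: below some size the time is dominated by latency and setup cost rather than by the bytes moved. A quick way to reproduce the shape of that curve (a sketch in the spirit of bandwidthTest, not its actual code; error checking omitted) is to time cudaMemcpy over a sweep of sizes:

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    void *h, *d;
    cudaEvent_t t0, t1;
    float ms;

    cudaMallocHost(&h, (size_t)256 << 20);   /* pinned host buffer */
    cudaMalloc(&d, (size_t)256 << 20);
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    /* Sweep transfer sizes from 1 byte up to 256 MiB. */
    for (size_t n = 1; n <= (size_t)256 << 20; n <<= 4) {
        cudaEventRecord(t0, 0);
        cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);
        cudaEventRecord(t1, 0);
        cudaEventSynchronize(t1);
        cudaEventElapsedTime(&ms, t0, t1);
        printf("%10zu bytes: %9.3f us, %7.3f GB/s\n",
               n, ms * 1000.0f, (n / 1e9) / (ms / 1e3));
    }
    return 0;
}
```

A single-byte copy still pays roughly the full per-transfer latency, which is why effective bandwidth only approaches the link limit for transfers in the megabyte range.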

How can the Linux kernel be forced to enumerate the PCI-e bus?

Submitted by 风流意气都作罢 on 2019-11-28 17:31:45
Question: Linux kernel 2.6. I've got an FPGA that is loaded over GPIO, connected to a development board running Linux. The FPGA will transmit and receive data over the PCI Express bus. However, the bus is enumerated at boot, and as such no link is discovered (because the FPGA is not loaded at boot). How can I force re-enumeration of the PCIe bus in Linux? Is there a simple command, or will I have to make kernel changes? I need the capability to hot-plug PCIe devices. Answer 1: I wonder what platform you are on: a …
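
On kernels that expose it, a rescan can be triggered from user space through sysfs with no kernel changes; whether the attribute exists on the 2.6 kernel in the question depends on its exact version. A minimal sketch, equivalent to echo 1 > /sys/bus/pci/rescan as root:

```c
#include <stdio.h>

/*
 * Ask the PCI core to rescan its buses for new devices.
 * Equivalent to: echo 1 > /sys/bus/pci/rescan (run as root).
 */
int main(void)
{
    FILE *f = fopen("/sys/bus/pci/rescan", "w");

    if (!f) {
        perror("/sys/bus/pci/rescan");
        return 1;
    }
    fputs("1\n", f);
    fclose(f);
    return 0;
}
```

If the device was present at boot and only needs to be re-probed after the FPGA is (re)loaded, it can also be detached first through its /sys/bus/pci/devices/<BDF>/remove attribute and then picked up again by the rescan.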
