Mastering The DMA and IOMMU APIs

Embedded Linux Conference 2014, San Jose

Laurent Pinchart
laurent.pinchart@ideasonboard.com
DMA (mapping) != DMA (engine)
Memory Access

(Diagram: Simple Case: the CPU core and the device both access memory through the memory controller)

(Diagram: Write Buffer: a write buffer sits between the CPU core and the memory controller)

(Diagram: L1 Cache: an L1 cache sits between the CPU core and the memory controller)

(Diagram: L2 Cache: two CPU cores, each with an L1 cache, share an L2 cache in front of the memory controller)

(Diagram: Cache Coherent Interconnect: the CPU cores and the device reach the memory controller through a cache coherent interconnect)
(1) CPU writes to memory
(2) Device reads from memory

(Diagram: IOMMU: the device accesses memory through an IOMMU behind the cache coherent interconnect)
(1) CPU writes to memory
(2) CPU programs the IOMMU
(3) Device reads from memory
Memory Mappings
Fully Coherent
Coherent (or consistent) memory is memory for which a write by either the
device or the processor can immediately be read by the processor or device
without having to worry about caching effects.
Consistent memory can be expensive on some platforms, and the minimum
allocation length may be as big as a page.
Write Combining
Weakly Ordered
Reads and writes to the mapping may be weakly ordered, that is, reads and
writes may pass each other. Not all architectures support non-cached weakly
ordered mappings.
Non-Coherent
Cache Management
#include <asm/cacheflush.h>
#include <asm/outercache.h>
Conclusion
DMA Mapping API
#include <linux/dma-mapping.h>
linux/dma-mapping.h
    linux/dma-attrs.h
    linux/dma-direction.h
    linux/scatterlist.h
    #ifdef CONFIG_HAS_DMA
    asm/dma-mapping.h
        /* e.g. arch/arm/include/asm/dma-mapping.h */
        asm-generic/dma-mapping-common.h
        asm-generic/dma-coherent.h
    #else
    asm-generic/dma-mapping-broken.h
    #endif
DMA Coherent Mapping
/* asm-generic/dma-mapping.h */
void *
dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle,
gfp_t flag);
This routine allocates a region of @size bytes of coherent memory. It also
returns a @dma_handle which may be cast to an unsigned integer the same
width as the bus and used as the device address base of the region.
Returns: a pointer to the allocated region (in the processor's virtual address
space) or NULL if the allocation failed.
Note: coherent memory can be expensive on some platforms, and the
minimum allocation length may be as big as a page, so you should
consolidate your requests for consistent memory as much as possible. The
simplest way to do that is to use the dma_pool calls.
Coherent Allocation
/* asm-generic/dma-mapping.h */
void
dma_free_coherent(struct device *dev, size_t size,
void *cpu_addr,
dma_addr_t dma_handle);
Free memory previously allocated by dma_alloc_coherent(). Unlike with CPU
memory allocators, calling this function with a NULL cpu_addr is not safe.
Coherent Allocation
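A minimal sketch of how this pair of functions is typically used in a driver. The `my_dev` structure, helper names and ring size are made up for illustration:

```c
#include <linux/dma-mapping.h>

#define MY_RING_SIZE	4096	/* hypothetical descriptor ring size */

struct my_dev {
	void *ring_cpu;		/* CPU virtual address of the ring */
	dma_addr_t ring_dma;	/* device (bus) address of the ring */
};

static int my_dev_alloc_ring(struct device *dev, struct my_dev *priv)
{
	priv->ring_cpu = dma_alloc_coherent(dev, MY_RING_SIZE,
					    &priv->ring_dma, GFP_KERNEL);
	if (!priv->ring_cpu)
		return -ENOMEM;
	return 0;
}

static void my_dev_free_ring(struct device *dev, struct my_dev *priv)
{
	dma_free_coherent(dev, MY_RING_SIZE, priv->ring_cpu,
			  priv->ring_dma);
}
```

Both the CPU address and the DMA handle must be kept around, as dma_free_coherent() needs them both.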
/* asm-generic/dma-mapping.h */
void *
dma_alloc_attrs(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag,
struct dma_attrs *attrs);
void
dma_free_attrs(struct device *dev, size_t size,
void *cpu_addr, dma_addr_t dma_handle,
struct dma_attrs *attrs);
Those two functions extend the coherent memory allocation API by allowing
the caller to specify attributes for the allocated memory. When @attrs is NULL
the behaviour is identical to the dma_*_coherent() functions.
Attribute-Based Allocation
Allocation Attributes
DMA_ATTR_WRITE_COMBINE
DMA_ATTR_WEAK_ORDERING
DMA_ATTR_NON_CONSISTENT
DMA_ATTR_WRITE_BARRIER
DMA_ATTR_FORCE_CONTIGUOUS
DMA_ATTR_NO_KERNEL_MAPPING
DMA_ATTR_SKIP_CPU_SYNC
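A short sketch of attribute-based allocation, assuming the dma_attrs API of that era (DEFINE_DMA_ATTRS() and dma_set_attr() from linux/dma-attrs.h); the function name is made up:

```c
#include <linux/dma-attrs.h>
#include <linux/dma-mapping.h>

/* Hypothetical helper: allocate a write-combined buffer of @size bytes. */
static void *alloc_wc_buffer(struct device *dev, size_t size,
			     dma_addr_t *dma_handle)
{
	DEFINE_DMA_ATTRS(attrs);

	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
	return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL, &attrs);
}
```

The same attrs structure must be passed to dma_free_attrs() when freeing the buffer.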
DMA Mask
/* asm/dma-mapping.h */
int dma_set_mask(struct device *dev, u64 mask);
/* linux/dma-mapping.h */
int dma_set_coherent_mask(struct device *dev, u64 mask);
int dma_set_mask_and_coherent(struct device *dev,
u64 mask);
DMA Mask
/* linux/device.h */
struct device {
	...
	u64 *dma_mask;
	u64 coherent_dma_mask;
	...
};
/* linux/dma-mapping.h */
int dma_coerce_mask_and_coherent(struct device *dev,
u64 mask);
DMA Mask
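A common probe-time pattern, sketched here with a made-up helper name: try the widest mask the device supports and fall back to 32-bit addressing:

```c
#include <linux/dma-mapping.h>

/* In probe(): prefer 64-bit addressing, fall back to 32-bit. */
static int my_dev_set_masks(struct device *dev)
{
	int ret;

	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
	if (ret)
		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	return ret;
}
```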
Userspace Mapping
/* asm-generic/dma-mapping.h */
/* Implemented on arm, arm64 and powerpc */
int dma_mmap_attrs(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr,
dma_addr_t dma_addr, size_t size,
struct dma_attrs *attrs);
/* Wrappers */
int dma_mmap_coherent(...);
int dma_mmap_writecombine(...);
Userspace Mapping
/*
* Implemented on arc, avr32, blackfin, cris, m68k and
* metag
*/
int dma_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr,
dma_addr_t dma_addr, size_t size);
/* Implemented on metag */
int dma_mmap_writecombine(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr,
dma_addr_t dma_addr,
size_t size);
Userspace Mapping
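A sketch of how a driver might export a coherent buffer to userspace from its mmap file operation. The `my_dev` structure and its fields are hypothetical:

```c
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

struct my_dev {
	struct device *dev;
	void *buf_cpu;		/* from dma_alloc_coherent() */
	dma_addr_t buf_dma;
	size_t buf_size;
};

/* Hypothetical fops->mmap handler exporting the coherent buffer. */
static int my_dev_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct my_dev *priv = file->private_data;

	return dma_mmap_coherent(priv->dev, vma, priv->buf_cpu,
				 priv->buf_dma, priv->buf_size);
}
```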
DMA Streaming Mapping
/* linux/dma-direction.h */
enum dma_data_direction {
DMA_BIDIRECTIONAL = 0,
DMA_TO_DEVICE = 1,
DMA_FROM_DEVICE = 2,
DMA_NONE = 3,
};
DMA Direction
/* asm-generic/dma-mapping.h */
dma_addr_t
dma_map_single_attrs(struct device *dev, void *ptr,
size_t size,
enum dma_data_direction dir,
struct dma_attrs *attrs);
void
dma_unmap_single_attrs(struct device *dev,
dma_addr_t addr, size_t size,
enum dma_data_direction dir,
struct dma_attrs *attrs);
dma_addr_t dma_map_single(...);
void dma_unmap_single(...);
Device Mapping
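A minimal streaming-mapping sketch for a single kmalloc'ed buffer; `dev`, `buf` and `len` are assumed to exist in the surrounding driver code:

```c
/* Map a kernel buffer for a CPU-to-device transfer, then unmap it. */
dma_addr_t addr;

addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, addr))
	return -ENOMEM;

/* ... start the transfer and wait for its completion ... */

dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);
```

Between map and unmap the buffer belongs to the device; the CPU must not touch it without the synchronization calls described later.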
/* asm-generic/dma-mapping.h */
dma_addr_t
dma_map_page(struct device *dev, struct page *page,
size_t offset, size_t size,
enum dma_data_direction dir);
void
dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir);
Device Mapping
/* asm-generic/dma-mapping.h */
int
dma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents,
enum dma_data_direction dir,
struct dma_attrs *attrs);
void
dma_unmap_sg_attrs(struct device *dev,
struct scatterlist *sg,
int nents,
enum dma_data_direction dir,
struct dma_attrs *attrs);
int dma_map_sg(...);
void dma_unmap_sg(...);
Device Mapping
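A scatter-gather sketch; `my_hw_set_entry()` is a made-up hardware programming helper, and `sgl`/`nents` are assumed to be a prepared scatterlist:

```c
#include <linux/scatterlist.h>

struct scatterlist *sg;
int i, count;

count = dma_map_sg(dev, sgl, nents, DMA_FROM_DEVICE);
if (count == 0)
	return -ENOMEM;

/* The mapping may coalesce entries: iterate over @count, not @nents. */
for_each_sg(sgl, sg, count, i)
	my_hw_set_entry(i, sg_dma_address(sg), sg_dma_len(sg));

/* ... run the transfer ... then unmap with the original @nents. */
dma_unmap_sg(dev, sgl, nents, DMA_FROM_DEVICE);
```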
/* asm/dma-mapping.h */
int
dma_mapping_error(struct device *dev,
dma_addr_t dma_addr);
In some circumstances dma_map_*() will fail to create
a mapping. A driver can check for these errors by testing the returned DMA
address with dma_mapping_error(). A non-zero return value means the
mapping could not be created and the driver should take appropriate action
(e.g. reduce current DMA mapping usage or delay and try again later).
Error Checking
/* asm-generic/dma-mapping.h */
void
dma_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size,
enum dma_data_direction dir);
void
dma_sync_single_for_device(struct device *dev,
dma_addr_t addr, size_t size,
enum dma_data_direction dir);
Synchronization
/* asm-generic/dma-mapping.h */
void
dma_sync_single_for_*(struct device *dev,
dma_addr_t addr, size_t size,
enum dma_data_direction dir);
void
dma_sync_single_range_for_*(struct device *dev,
dma_addr_t addr,
unsigned long offset,
size_t size,
enum dma_data_direction dir);
void
dma_sync_sg_for_*(struct device *dev,
struct scatterlist *sg, int nelems,
enum dma_data_direction dir);
(* = cpu or device)
Synchronization
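A sketch of transferring buffer ownership back and forth while a streaming mapping stays in place; `process_received_data()` is hypothetical:

```c
/* Give the buffer back to the CPU, inspect it, then hand it back
 * to the device for the next transfer without remapping.
 */
dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);
process_received_data(buf, len);
dma_sync_single_for_device(dev, addr, len, DMA_FROM_DEVICE);
```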
Contiguous Memory Allocation
#include <linux/dma-contiguous.h>
drivers/base/dma-contiguous.c
CMA
/* linux/dma-contiguous.h */
void dma_contiguous_reserve(phys_addr_t addr_limit);
This function reserves memory from the early allocator. It should be called by
arch-specific code once the early allocator (memblock or bootmem) has been
activated and all other subsystems have already allocated/reserved memory.
The size of the reserved memory area is specified through the kernel
configuration and can be overridden on the kernel command line. An area of
the given size is reserved from the early allocator for contiguous allocation.
int dma_declare_contiguous(struct device *dev,
phys_addr_t size,
phys_addr_t base,
phys_addr_t limit);
This function reserves memory for the specified device. It should be called by
board-specific code once the early allocator (memblock or bootmem) has been
activated.
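A board-code sketch, with a made-up platform device and an arbitrary size; base and limit of zero let CMA place the area anywhere:

```c
/* Reserve 32 MiB of contiguous memory for a (hypothetical) camera
 * device, anywhere in physical memory.
 */
dma_declare_contiguous(&my_camera_pdev.dev, SZ_32M, 0, 0);
```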
IOMMU Integration
#include <linux/iommu.h>
IOMMU API
/* linux/iommu.h */
struct iommu_domain *
iommu_domain_alloc(struct bus_type *bus);
void iommu_domain_free(struct iommu_domain *domain);
int iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
int iommu_map(struct iommu_domain *domain,
unsigned long iova, phys_addr_t paddr,
size_t size, int prot);
size_t iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size);
IOMMU API
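A sketch of the full domain lifecycle: allocate a domain on a bus, attach the device, map one page at an arbitrary (made-up) I/O virtual address, and tear everything down:

```c
#include <linux/iommu.h>

static int my_iommu_example(struct device *dev, phys_addr_t paddr)
{
	struct iommu_domain *domain;
	int ret;

	domain = iommu_domain_alloc(&platform_bus_type);
	if (!domain)
		return -ENOMEM;

	ret = iommu_attach_device(domain, dev);
	if (ret)
		goto err_free;

	ret = iommu_map(domain, 0x10000000, paddr, SZ_4K,
			IOMMU_READ | IOMMU_WRITE);
	if (ret)
		goto err_detach;

	/* ... the device can now DMA to I/O virtual address 0x10000000 ... */

	iommu_unmap(domain, 0x10000000, SZ_4K);
err_detach:
	iommu_detach_device(domain, dev);
err_free:
	iommu_domain_free(domain);
	return ret;
}
```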
/* asm/dma-mapping.h */
struct dma_iommu_mapping *
arm_iommu_create_mapping(struct bus_type *bus,
dma_addr_t base, size_t size);
void arm_iommu_release_mapping(
struct dma_iommu_mapping *mapping);
int arm_iommu_attach_device(struct device *dev,
struct dma_iommu_mapping *mapping);
void arm_iommu_detach_device(struct device *dev);
Device Tree Bindings
Documentation/devicetree/bindings/iommu
Coherent Mappings
Problems &
Issues
Generic Problems
ARM-Specific Problems
Resources
Documentation/DMA-API-HOWTO.txt
Documentation/DMA-API.txt
Documentation/DMA-attributes.txt
http://community.arm.com/groups/processors/blog/2011/03/22/memory-access-ordering-an-introduction
http://elinux.org/images/7/73/Deacon-weak-to-weedy.pdf
https://lwn.net/Articles/486301/
Documentation
linux-kernel@vger.kernel.org
linux-arm-kernel@lists.infradead.org
laurent.pinchart@ideasonboard.com
Contact
?!
Thx.
Advanced Topics
DMA Coherent Memory Pool
/* linux/dmapool.h */
The DMA mapping API allocates buffers in at least page size chunks. If your
driver needs lots of smaller memory regions you can use the DMA pool API to
subdivide pages returned by dma_alloc_coherent().
struct dma_pool *
dma_pool_create(const char *name, struct device *dev,
size_t size, size_t align,
size_t boundary);
This function creates a DMA allocation pool to allocate buffers of the given
@size and alignment characteristics (@align must be a power of two and can
be set to zero). If @boundary is nonzero, objects returned from
dma_pool_alloc() won't cross that size boundary. This is useful for devices
which have addressing restrictions on individual DMA transfers.
Given one of these pools, dma_pool_alloc() may be used to allocate memory.
Such memory will all have consistent DMA mappings, accessible by the
device and its driver without using cache flushing primitives.
DMA Pool
/* linux/dmapool.h */
void dma_pool_destroy(struct dma_pool *pool);
Destroy a DMA pool. The caller guarantees that no more memory from the
pool is in use, and that nothing will try to use the pool after this call. A DMA
pool can't be destroyed in interrupt context.
void *dma_pool_alloc(struct dma_pool *pool,
gfp_t mem_flags,
dma_addr_t *handle);
This returns the kernel virtual address of a currently unused block, and reports
its DMA address through the handle. Returns NULL when allocation fails.
void dma_pool_free(struct dma_pool *pool, void *vaddr,
dma_addr_t addr);
Puts memory back into the pool. The CPU (vaddr) and DMA addresses are
what were returned when dma_pool_alloc() allocated the memory being freed.
DMA Pool
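A pool lifecycle sketch; the pool name, descriptor size and alignment are arbitrary:

```c
#include <linux/dmapool.h>

/* Pool of 64-byte, 64-byte-aligned descriptors (made-up sizes). */
struct dma_pool *pool;
dma_addr_t handle;
void *desc;

pool = dma_pool_create("my-desc", dev, 64, 64, 0);
if (!pool)
	return -ENOMEM;

desc = dma_pool_alloc(pool, GFP_KERNEL, &handle);
if (!desc) {
	dma_pool_destroy(pool);
	return -ENOMEM;
}

/* ... use desc (CPU address) and handle (device address) ... */

dma_pool_free(pool, desc, handle);
dma_pool_destroy(pool);
```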
Non-Coherent Mapping
/* asm-generic/dma-mapping.h */
void *
dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle,
gfp_t flag);
void
dma_free_noncoherent(struct device *dev, size_t size,
void *cpu_addr,
dma_addr_t dma_handle);
Non-Coherent Allocation
Returns NULL on arm and arm64.
Non-Coherent Allocation
Generic DMA Coherent Memory Allocator
/* asm-generic/dma-coherent.h */
/*
* Standard interface
*/
#define ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
extern int
dma_declare_coherent_memory(struct device *dev,
dma_addr_t bus_addr,
dma_addr_t device_addr,
size_t size, int flags);
extern void
dma_release_declared_memory(struct device *dev);
extern void *
dma_mark_declared_memory_occupied(struct device *dev,
dma_addr_t device_addr,
size_t size);
Device API
/* asm-generic/dma-coherent.h */
extern int
dma_declare_coherent_memory(struct device *dev,
dma_addr_t bus_addr,
dma_addr_t device_addr,
size_t size, int flags);
Declare a coherent memory area for a device. The area is specified by its
(CPU) bus address, device bus address and size. The following flags can be
specified:
Device API
/* asm-generic/dma-coherent.h */
extern void
dma_release_declared_memory(struct device *dev);
Release the coherent memory previously declared for the device. All DMA
coherent memory allocated for the device must be freed before calling this
function.
Device API
/* asm-generic/dma-coherent.h */
extern void *
dma_mark_declared_memory_occupied(struct device *dev,
dma_addr_t device_addr,
size_t size);
Mark part of the coherent memory area as unusable for DMA coherent
memory allocation. Multiple ranges can be marked as occupied.
This function is used by the NCR_Q720 SCSI driver only to reserve the first
kB. In this specific case this could be handled by declaring a coherent region
that skips the first page.
Device API
/* asm-generic/dma-coherent.h */
/*
* These three functions are only for dma allocator.
* Don't use them in device drivers.
*/
int dma_alloc_from_coherent(struct device *dev,
ssize_t size,
dma_addr_t *dma_handle,
void **ret);
int dma_release_from_coherent(struct device *dev,
int order, void *vaddr);
int dma_mmap_from_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, size_t size,
int *ret);
/* asm-generic/dma-coherent.h */
int dma_alloc_from_coherent(struct device *dev,
ssize_t size,
dma_addr_t *dma_handle,
void **ret);
Try to allocate memory from the per-device coherent area.
Returns 0 if dma_alloc_coherent should continue with allocating from
generic memory areas, or !0 if dma_alloc_coherent should return @ret.
This function can only be called from per-arch dma_alloc_coherent (and
dma_alloc_attrs) to support allocation from per-device coherent memory
pools.
/* asm-generic/dma-coherent.h */
int dma_release_from_coherent(struct device *dev,
int order, void *vaddr);
Try to free the memory allocated from per-device coherent memory pool.
This checks whether the memory was allocated from the per-device
coherent memory pool and if so, releases that memory and returns 1.
Otherwise it returns 0 to signal that the caller should proceed with releasing
memory from generic pools.
This function can only be called from within the architecture's
dma_free_coherent (and dma_free_attrs) implementation.
/* asm-generic/dma-coherent.h */
int dma_mmap_from_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, size_t size,
int *ret);
Try to mmap the memory allocated from per-device coherent memory pool
to userspace.
This checks whether the memory was allocated from the per-device
coherent memory pool and if so, maps that memory to the provided vma and
returns 1. Otherwise it returns 0 to signal that the caller should proceed with
mapping memory from generic pools.
This function can only be called from within the architecture's
dma_alloc_coherent (and dma_alloc_attrs) implementation.