DMA With Bridge (XDMA)
Product Guide
Vivado Design Suite
Table of Contents
Chapter 2: Overview
  Feature Summary
  Applications
  Unsupported Features
  Limitations
  Licensing and Ordering
Appendix A: GT Locations
Appendix C: Upgrading
  New Parameters
  New Ports
Chapter 1
Introduction
The AMD DMA/Bridge Subsystem for PCI Express® (PCIe®) implements a high performance,
configurable Scatter Gather DMA for use with the PCI Express® 2.1 and 3.x Integrated Block.
The IP provides a choice between an AXI4 Memory Mapped or AXI4-Stream user interface.
This IP also optionally supports a PCIe AXI Bridge mode, which is enabled only for AMD
UltraScale+™ devices. For details about PCIe AXI Bridge mode operation, see AXI Bridge for PCI
Express Gen3 Subsystem Product Guide (PG194).
Note: For details about the AMD Versal™ Adaptive SoC subsystem, refer to the Versal Adaptive SoC DMA
and Bridge Subsystem for PCI Express Product Guide (PG344).
Features
• Supports AMD UltraScale+™, AMD UltraScale™, AMD Virtex™ 7 XT Gen3 (Endpoint), and 7
series 2.1 (Endpoint) Integrated Blocks for PCIe. The 7A15T and 7A25T devices are not supported.
• Support for 64-, 128-, 256-, and 512-bit datapaths (64- and 128-bit datapaths only for 7 series
Gen2 IP)
• 64-bit source, destination, and descriptor addresses
• Up to four host-to-card (H2C/Read) data channels (up to two for 7 series Gen2 IP)
• Up to four card-to-host (C2H/Write) data channels (up to two for 7 series Gen2 IP)
• Selectable user interface
○ Single AXI4 memory mapped (MM) user interface
○ AXI4-Stream user interface (each channel has its own AXI4-Stream interface)
• AXI4 Master and AXI4-Lite Master optional interfaces allow for PCIe traffic to bypass the
DMA engine
• AXI4-Lite Slave to access DMA status registers
• Scatter Gather descriptor list supporting unlimited list size
• 256 MB max transfer size per descriptor
IP Facts
AMD LogiCORE™ IP Facts Table

Subsystem Specifics
Supported Device Family (1): AMD UltraScale+™, AMD UltraScale™, 7 series Gen2 devices
Supported User Interfaces: AXI4 MM, AXI4-Lite, AXI4-Stream
Resources: See the Resource Utilization web page.

Provided with Subsystem
Design Files: Encrypted SystemVerilog
Example Design: Verilog
Test Bench: Verilog
Constraints File: XDC
Simulation Model: Verilog
Supported S/W Driver: Linux and Windows drivers (2)

Tested Design Flows (3)
Design Entry: AMD Vivado™ Design Suite
Simulation: For supported simulators, see the Vivado Design Suite User Guide: Release Notes,
Installation, and Licensing (UG973).
Synthesis: Vivado synthesis

Support
Release Notes and Known Issues: Master Answer Record 65443
All Vivado IP Change Logs: Master Vivado IP Change Logs 72775
Support web page
Notes:
1. For a complete list of supported devices, see the AMD Vivado™ IP catalog.
2. For details, see Appendix B: Application Software Development and AR: 65444.
3. For the supported versions of the tools, see the Vivado Design Suite User Guide: Release Notes, Installation, and Licensing
(UG973).
4. For AMD Versal™ Adaptive SoC, refer to Versal Adaptive SoC DMA and Bridge Subsystem for PCI Express Product Guide
(PG344).
Chapter 2
Overview
The DMA/Bridge Subsystem for PCI Express® (PCIe®) can be configured to be either a high-
performance direct memory access (DMA) data mover or a bridge between the PCI Express and
AXI memory spaces.
• DMA Data Mover: As a DMA, the core can be configured with either an AXI (memory
mapped) interface or with an AXI streaming interface to allow for direct connection to RTL
logic. Either interface can be used for high performance block data movement between the
PCIe address space and the AXI address space using the provided character driver. In addition
to the basic DMA functionality, the DMA supports up to four upstream and downstream
channels, the ability for PCIe traffic to bypass the DMA engine (Host DMA Bypass), and an
optional descriptor bypass to manage descriptors from the FPGA fabric for applications that
demand the highest performance and lowest latency.
• Bridge Between PCIe and AXI Memory: When configured as a PCIe Bridge, received PCIe
packets are converted to AXI traffic and received AXI traffic is converted to PCIe traffic. The
bridge functionality is ideal for AXI peripherals needing a quick and easy way to access a PCI
Express subsystem. The bridge functionality can be used as either an Endpoint or as a Root
Port. PCIe Bridge functionality is only supported for AMD UltraScale+™ devices. For a Bridge-only
option on 7 series non-XT devices, use the AXI Memory Mapped to PCI Express (PCIe) Gen2 core.
For details, see AXI Memory Mapped to PCI Express (PCIe) Gen2 LogiCORE IP Product Guide
(PG055). For 7 series XT and UltraScale devices, use the AXI Bridge for PCI Express Gen3 core.
For details, see AXI Bridge for PCI Express Gen3 Subsystem Product Guide (PG194).
Figure: DMA/Bridge Subsystem for PCIe block diagram (X14718-042121), showing the H2C channels
and C2H channels (AXI4 MM or AXI4-Stream read/write interfaces), the RQ/RC interface to the PCIe
integrated block, the Cfg Master (AXI4-Lite Master) interface, and the Host DMA Bypass (AXI MM
Master) interface.
This diagram refers to the Requester Request (RQ)/Requester Completion (RC) interfaces, and
the Completer Request (CQ)/Completer Completion (CC) interfaces. For more information about
these, see the UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide
(PG213).
Feature Summary
The DMA/Bridge Subsystem for PCI Express® allows for the movement of data between host
memory and the DMA subsystem. It does this by operating on descriptors that contain
information about the source, destination, and amount of data to transfer. These direct memory
transfers can be in either the Host to Card (H2C) or Card to Host (C2H) direction. The DMA can
be configured to have a single AXI4 Master interface shared by all channels or one AXI4-Stream
interface for each enabled channel. Memory transfers are specified on a per-channel basis in
descriptor linked lists, which the DMA fetches from host memory and processes. Events such as
descriptor completion and errors are signaled using interrupts. The core also provides up to 16
user interrupt wires that generate interrupts to the host.
The host is able to directly access the user logic through two interfaces:
• The AXI4-Lite Master Configuration port: This port is a fixed 32-bit port and is intended for
non-performance-critical access to user configuration and status registers.
• The AXI Memory Mapped Master CQ Bypass port: The width of this port is the same as the
DMA channel datapaths and is intended for high-bandwidth access to user memory that
might be required in applications such as peer-to-peer transfers.
The user logic is able to access the DMA/Bridge Subsystem for PCIe internal configuration and
status registers through an AXI4-Lite Slave Configuration interface. Requests that are mastered
on this interface are not forwarded to PCI Express.
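As an illustration of host access through the AXI4-Lite Master Configuration port, the following
minimal C sketch maps the PCIe BAR assigned to that port and performs 32-bit reads and writes of
user registers from a Linux host. The sysfs path, BAR index, aperture size, and register offset are
hypothetical placeholders, and the sketch is not part of the provided drivers.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define AXIL_BAR_SIZE   0x10000u  /* assumed aperture of the AXI4-Lite BAR */
    #define USER_REG_OFFSET 0x0000u   /* hypothetical user register offset     */

    int main(void)
    {
        /* The sysfs path and BAR index are placeholders; use the BAR that the
         * PCIe BARs tab maps to the AXI4-Lite Master interface. */
        int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource2", O_RDWR | O_SYNC);
        if (fd < 0)
            return 1;

        void *map = mmap(NULL, AXIL_BAR_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
            close(fd);
            return 1;
        }
        volatile uint32_t *regs = (volatile uint32_t *)map;

        /* All accesses on this interface are 32-bit reads or writes. */
        uint32_t status = regs[USER_REG_OFFSET / 4];
        printf("user register = 0x%08x\n", (unsigned)status);
        regs[USER_REG_OFFSET / 4] = 0x1;  /* example 32-bit write */

        munmap(map, AXIL_BAR_SIZE);
        close(fd);
        return 0;
    }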
Applications
The core architecture enables a broad range of computing and communications target
applications, emphasizing performance, cost, scalability, feature extensibility, and mission-critical
reliability. Typical applications include:
Unsupported Features
The following features of the standard are not supported by this core:
• Tandem Configuration solutions (Tandem PROM, Tandem PCIe, Tandem with Field Updates,
PR over PCIe) are not supported for AMD Virtex™ 7 XT and 7 series Gen2 devices
• Tandem Configuration is not yet supported for Bridge mode in UltraScale+ devices.
• SR-IOV
• ECRC
• Example design not supported for all configurations
• Narrow burst (not supported on the master interface)
• BAR translation for DMA addresses to the AXI4 Memory Mapped interface
Limitations
PCIe Transaction Type
The PCIe® transactions that can be generated are those that are compatible with the AXI4
specification. The following table lists the supported PCIe transaction types.
TX              | RX
MRd32           | MRd32
MRd64           | MRd64
MWr32           | MWr32
MWr64           | MWr64
Msg(INT/Error)  | Msg(SSPL, INT, Error)
Cpl             | Cpl
CplD            | CplD
PCIe Capability
For the DMA/Bridge Subsystem for PCI Express®, only the following PCIe capabilities are
supported due to the AXI4 specification:
• 1 PF
• MSI
• MSI-X
• PM
• AER (only PCIe 3.x core)
DMA/Bridge Subsystem for PCI Express® does not support expansion ROM.
Others
• Only supports the INCR burst type. Other types result in a Slave Illegal Burst (SIB) interrupt.
• No memory type support (AxCACHE)
• No protection type support (AxPROT)
• No lock type support (AxLOCK)
• No non-contiguous byte enable support (WSTRB)
• For 7 series Gen2 IP, PCIe access from the host system must be limited to 1DW (4 byte)
transactions only.
Note: Both AXI Bypass and register access are limited by this restriction.
For more information, visit the DMA Subsystem for PCI Express product page.
Chapter 3
Product Specification
The DMA/Bridge Subsystem for PCI Express® in conjunction with the Integrated Block for PCI
Express IP, provides a highly configurable DMA Subsystem for PCIe, and a high performance
DMA solution.
Standards
The DMA/Bridge Subsystem for PCIe is compliant with the AMBA AXI4 and PCI Express base
specifications.

The DMA provides two types of channels:
• A Host-to-Card (H2C) channel generates read requests to PCIe and provides the data or
generates a write request to the user application.
• A Card-to-Host (C2H) channel either waits for data on the user side or generates a read
request on the user side, and then generates a write request containing the received data to
PCIe.
The DMA/Bridge Subsystem for PCIe also enables the host to access the user logic. Write
requests that hit the PCIe to DMA bypass Base Address Register (BAR) bypass the DMA engine;
the data from the write request is forwarded to the user application through the AXI Memory
Mapped CQ bypass interface.
The host access to the configuration and status registers in the user logic is provided through an
AXI master port. These requests are 32-bit reads or writes. The user application also has access
to internal DMA configuration and status registers through an AXI slave port.
When multiple channels for H2C and C2H are enabled, transactions on the AXI4 Master
interface are interleaved between all selected channels. A simple round robin protocol is used to
service all channels. Transaction granularity depends on the host Max Payload Size (MPS), page
size, and other host settings.
Target Bridge
The target bridge receives requests from the host. Based on BARs, the requests are directed to
the internal target user through the AXI4-Lite master, or the CQ bypass port. After the
downstream user logic has returned data for a non-posted request, the target bridge generates a
read completion TLP and sends it to the PCIe IP over the CC bus.
In the following tables, the PCIe BARs selection corresponds to the options set in the PCIe BARs
tab in the IP Configuration GUI.
Different combinations of BARs can be selected. The tables above list only 32-bit selections and
64-bit selections for all BARs as an example. You can select different combinations of BARs
based on your requirements.
The previous tables represent PCIe to AXI4-Lite Master, DMA, and PCIe to DMA Bypass for 32-bit
and 64-bit BAR selections. Each space can be individually selected as a 32-bit or 64-bit BAR.

H2C Channel
The number of H2C channels is configured in the AMD Vivado™ Integrated Design Environment
(IDE). The H2C channel handles DMA transfers from the host to the card. It is responsible for
splitting read requests based on the maximum read request size and available internal resources.
The DMA channel maintains a maximum number of outstanding requests based on RNUM_RIDS,
the number of outstanding H2C channel request IDs parameter. Each split, if any, of a read
request consumes an additional read request entry. A request is outstanding from when the DMA
channel issues the read to the PCIe RQ block until it receives confirmation that the write has
completed on the user interface in order. After a transfer is complete, the DMA channel issues a
writeback or interrupt to inform the host.
The H2C channel also splits transactions on both its read and write interfaces. On the read
interface to the host, transactions are split to meet the configured maximum read request size
and based on available data FIFO space. Data FIFO space is allocated at the time of the read
request to ensure space for the read completion. The PCIe RC block returns completion data to
the allocated data buffer locations. To minimize latency, upon receipt of any completion data, the
H2C channel begins issuing write requests to the user interface. It also breaks the write requests
at the maximum payload size. On an AXI4-Stream user interface, this splitting is transparent.
When multiple channels are enabled, transactions on the AXI4 Master interface are interleaved
between all selected channels. A simple round robin protocol is used to service all channels.
Transaction granularity depends on the host Max Payload Size (MPS), page size, and other host
settings.
C2H Channel
The C2H channel handles DMA transfers from the card to the host. The instantiated number of
C2H channels is controlled in the AMD Vivado™ IDE. Similarly the number of outstanding
transfers is configured through the WNUM_RIDS, which is the number of C2H channel request
IDs. In an AXI4-Stream configuration, the details of the DMA transfer are set up in advance of
receiving data on the AXI4-Stream interface. This is normally accomplished through receiving a
DMA descriptor. After the request ID has been prepared and the channel is enabled, the AXI4-
Stream interface of the channel can receive data and perform the DMA to the host. In an AXI4
MM interface configuration, the request IDs are allocated as the read requests to the AXI4 MM
interface are issued. Similar to the H2C channel, a given request ID is outstanding until the write
request has been completed. In the case of the C2H channel, write request completion is when
the write request has been issued as indicated by the PCIe IP.
When multiple channels are enabled, transactions on the AXI4 Master interface are interleaved
between all selected channels. A simple round robin protocol is used to service all channels.
Transaction granularity depends on the host Max Payload Size (MPS), page size, and other host
settings.
AXI4-Lite Master
This module implements the AXI4-Lite master bus protocol. The host can use this interface to
generate 32-bit read and 32-bit write requests to the user logic. The read or write request is
received over the PCIe to AXI4-Lite master BAR. Read completion data is returned back to the
host through the target bridge over the PCIe IP CC bus.
AXI4-Lite Slave
This module implements the AXI4-Lite slave bus protocol. The user logic can master 32-bit reads
or writes on this interface to DMA internal registers only. You cannot access the PCIe integrated
block register through this interface. This interface does not generate requests to the host.
IRQ Module
The IRQ module receives a configurable number of interrupt wires from the user logic and one
interrupt wire from each DMA channel. This module is responsible for generating an interrupt
over PCIe. Support for MSI-X, MSI, and legacy interrupts can be specified during IP configuration.
Note: The Host can enable one or more interrupt types from the specified list of supported interrupts
during IP configuration. The IP only generates one interrupt type at a given time, even when more than
one is enabled. MSI-X interrupts take precedence over MSI interrupts, and MSI interrupts take
precedence over Legacy interrupts. The Host software must not switch (either enable or disable) an
interrupt type while there is an interrupt asserted or pending.
Legacy Interrupts
Asserting one or more bits of usr_irq_req when legacy interrupts are enabled causes the
DMA to issue a legacy interrupt over PCIe. Multiple bits may be asserted simultaneously but
each bit must remain asserted until the corresponding usr_irq_ack bit has been asserted.
After a usr_irq_req bit is asserted, it must remain asserted until both the corresponding
usr_irq_ack bit is asserted and the interrupt is serviced and cleared by the Host. This ensures
the interrupt pending register within the IP remains asserted when queried by the Host's Interrupt
Service Routine (ISR) to determine the source of interrupts. The usr_irq_ack assertion
indicates the requested interrupt has been sent to the PCIe block. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an interrupt is serviced.
After the usr_irq_req bit is deasserted, it cannot be reasserted until the corresponding
usr_irq_ack bit has been asserted for a second time. This indicates the deassertion message
for the legacy interrupt has been sent over PCIe. After a second usr_irq_ack has occurred, the
usr_irq_req wire can be reasserted to generate another legacy interrupt.
The usr_irq_req bits and DMA interrupts can be mapped to legacy interrupts INTA,
INTB, INTC, and INTD through the configuration registers. The following figure shows the
legacy interrupts.
Note: This figure shows only the handshake between usr_irq_req and usr_irq_ack. Your
application might not clear or service the interrupt immediately, in which case you must keep
usr_irq_req asserted past usr_irq_ack. The figure shows one possible scenario where
usr_irq_ack is deasserted in the same cycle for both requests [1:0], which might not be the case in
other situations.
After a usr_irq_req bit is asserted, it must remain asserted until the corresponding
usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The
usr_irq_ack assertion indicates the requested interrupt has been sent to the PCIe block. This
will ensure the interrupt pending register within the IP remains asserted when queried by the
Host's Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement
a mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an Interrupt is serviced.
Configuration registers are available to map usr_irq_req and DMA interrupts to MSI or MSI-X
vectors. For MSI-X support, there is also a vector table and PBA table. The following figure shows
the MSI interrupt.
Note: This figure shows only the handshake between usr_irq_req and usr_irq_ack. Your application
might not clear or service the interrupt immediately, in which case, you must keep usr_irq_req asserted
past usr_irq_ack.
Config Block
The config module is the DMA register space that contains PCIe® solution IP configuration
information and DMA control registers. It stores the PCIe IP configuration information that is
relevant to the DMA/Bridge Subsystem for PCIe. This configuration information can be read
through register reads to the appropriate register offset within the config module.
XDMA Operations
Quick Start
At the most basic level, the PCIe® DMA engine typically moves data between host memory and
memory that resides in the FPGA which is often (but not always) on an add-in card. When data is
moved from host memory to the FPGA memory, it is called a Host to Card (H2C) transfer or
System to Card (S2C) transfer. Conversely, when data is moved from the FPGA memory to the
host memory, it is called a Card to Host (C2H) or Card to System (C2S) transfer.
These terms help delineate which way data is flowing (as opposed to using read and write which
can get confusing very quickly). The PCIe DMA engine is simply moving data to or from PCIe
address locations.
In a typical operation, an application in the host moves data between the FPGA and host
memory. To accomplish this transfer, the host sets up buffer space in system memory and creates
descriptors that the DMA engine uses to move the data.
The contents of the descriptors depend on a number of factors, including which user
interface is chosen for the DMA engine. If an AXI4-Stream interface is selected, C2H transfers do
not use the source address field and H2C transfers do not use the destination address field. This
is because the AXI4-Stream interface is a FIFO-type interface that does not use addresses.
If an AXI Memory Mapped interface is selected, then a C2H transfer has the source address as an
AXI address and the destination address is the PCIe address. For a H2C transfer, the source
address is a PCIe address and the destination address is an AXI address.
The following flow charts show typical transfers for both H2C and C2H transfers when the data
interface is selected during IP configuration for an AXI Memory Mapped interface.
Figure 5: Setup (X19438-061319)
1. Load the driver (setup).
2. Set the 'H2C Channel interrupt enable mask' register 0x0090 to generate interrupts for the
corresponding bits.
3. Set the 'C2H Channel interrupt enable mask' register 0x1090 to generate interrupts for the
corresponding bits.
4. Set the 'IRQ Block Channel Interrupt Enable Mask' register 0x2010 and enable all channels
(both H2C and C2H) to generate interrupts.
H2C transfer flow (X19389-061319):
1. The application program initiates an H2C transfer, with the transfer length and the buffer
location where the data is stored.
2. The driver writes the first descriptor base address to addresses 0x4080 and 0x4084, and writes
the next adjacent descriptor count to 0x4088, if any.
3. The DMA sends read requests to the (Host) source address based on the first available
descriptor and transmits the data on the (Card) AXI-MM Master interface. This repeats while
descriptors remain and there is more data to transfer.
4. After the last descriptor is processed and there is no more data to transfer, the DMA stops
fetching data from the Host and sends an interrupt to the Host.
5. Interrupt processing:
   • Read the 'IRQ Block Channel Interrupt Request' register 0x2044 to see which channels sent
   the interrupt.
   • Mask the corresponding channel interrupt by writing to 0x2018.
   • The driver reads the corresponding 'Status register' 0x0044, which also clears the status
   register.
   • Read the channel 'Completed descriptor count' 0x0048 and compare it with the number of
   descriptors generated.
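The flow above amounts to a short sequence of register writes and reads from the host. The
following C sketch illustrates that sequence over a memory-mapped view of the DMA register BAR.
The regs pointer and the dma_reg_rd/dma_reg_wr helpers are assumptions, the mask values written
are examples, and the H2C Control register offset (0x0004) is assumed by symmetry with the C2H
Control register at 0x1004; the remaining offsets are taken from the flow above.

    #include <stdint.h>

    /* 'regs' must point to a memory-mapped view of the DMA register BAR;
     * dma_reg_rd/dma_reg_wr are hypothetical helpers over that mapping. */
    static volatile uint32_t *regs;
    static inline uint32_t dma_reg_rd(uint32_t off)             { return regs[off / 4]; }
    static inline void     dma_reg_wr(uint32_t off, uint32_t v) { regs[off / 4] = v; }

    /* Start an H2C transfer whose descriptor list starts at desc_bus_addr. */
    static void h2c_start(uint64_t desc_bus_addr, uint32_t next_adjacent)
    {
        dma_reg_wr(0x0090, 0xFFFFFFFFu);  /* H2C Channel interrupt enable mask (example value) */
        dma_reg_wr(0x2010, 0xFFFFFFFFu);  /* IRQ Block Channel Interrupt Enable Mask           */
        dma_reg_wr(0x4080, (uint32_t)desc_bus_addr);          /* first descriptor address, low  */
        dma_reg_wr(0x4084, (uint32_t)(desc_bus_addr >> 32));  /* first descriptor address, high */
        dma_reg_wr(0x4088, next_adjacent);                    /* next adjacent descriptor count */
        dma_reg_wr(0x0004, 0x1u);         /* assumed H2C Control register: set the Run bit      */
    }

    /* Interrupt processing after the DMA signals completion. */
    static void h2c_service_irq(uint32_t expected_descriptors)
    {
        uint32_t pending = dma_reg_rd(0x2044);  /* which channels raised the interrupt  */
        dma_reg_wr(0x2018, pending);            /* mask the interrupting channels       */
        (void)dma_reg_rd(0x0044);               /* read H2C status; this also clears it */
        uint32_t done = dma_reg_rd(0x0048);     /* H2C completed descriptor count       */
        if (done != expected_descriptors) {
            /* handle an incomplete or failed transfer */
        }
    }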
C2H transfer flow (X19388-061319):
1. The driver writes the first descriptor base address to addresses 0x5080 and 0x5084, and writes
the next adjacent descriptor count to 0x5088, if any.
2. The DMA reads data from the (Card) source address for each descriptor. When the last
descriptor has been processed and no more descriptors are left, the DMA stops fetching
descriptors from the host.
3. Interrupt processing:
   • Read the 'IRQ Block Channel Interrupt Request' register 0x2044 to see which channels sent
   the interrupt.
   • Mask the corresponding channel interrupt by writing to 0x2018.
   • The driver reads the corresponding 'Status register' 0x1044, which also clears the status
   register.
   • Read the channel 'Completed descriptor count' 0x1048 and compare it with the number of
   descriptors generated.
4. Write to the channel 'Control register' 0x1004 to stop the DMA run. Write to the 'Block Channel
Interrupt Enable Mask' 0x2014 to enable interrupts for the next transfer.
5. Return control to the application program with the transfer size. The application program reads
the transferred data from the assigned buffer, writes it to a file, and exits.
Descriptors
The DMA/Bridge Subsystem for PCI Express® uses a linked list of descriptors that specify the
source, destination, and length of the DMA transfers. Descriptor lists are created by the driver
and stored in host memory. The DMA channel is initialized by the driver with a few control
registers to begin fetching the descriptor lists and executing the DMA operations.
Descriptors describe the memory transfers that the DMA/Bridge Subsystem for PCIe should
perform. Each channel has its own descriptor list. The start address of each channel's descriptor
list is initialized in hardware registers by the driver. After the channel is enabled, the descriptor
channel begins to fetch descriptors from the initial address. Thereafter, it fetches from the
Nxt_adr[63:0] field of the last descriptor that was fetched. Descriptors must be aligned to a
32 byte boundary.
The size of the initial block of adjacent descriptors is specified with the Dsc_Adj register. After
the initial fetch, the descriptor channel uses the Nxt_adj field of the last fetched descriptor to
determine the number of descriptors at the next descriptor address. A block of adjacent
descriptors must not cross a 4K address boundary. The descriptor channel fetches as many
descriptors in a single request as it can, limited by the MRRS, the number of adjacent descriptors,
and the available space in the channel's descriptor buffer.
Note: Because the MRRS in most host systems is 512 bytes or 1024 bytes, more than 32 adjacent
descriptors are not allowed in a single request. However, the design allows a maximum of 64 descriptors
in a single block of adjacent descriptors if needed.
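A minimal sketch of this fetch arithmetic, assuming 32-byte descriptors as implied by the field
layout shown in the table below, is given here; the helper name is illustrative.

    #include <stdint.h>

    #define DESC_BYTES 32u   /* descriptor size implied by the 0x00-0x1C field layout */

    /* Upper bound on adjacent descriptors fetched in one read request, given the
     * host MRRS; the real fetch is further limited by Nxt_adj and by free space
     * in the channel's descriptor buffer. */
    static inline uint32_t max_desc_per_fetch(uint32_t mrrs_bytes)
    {
        uint32_t n = mrrs_bytes / DESC_BYTES;  /* e.g. 512/32 = 16, 1024/32 = 32       */
        return (n > 64u) ? 64u : n;            /* a block never exceeds 64 descriptors */
    }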
Every descriptor in the descriptor list must accurately describe the descriptor or block of
descriptors that follows. In a block of adjacent descriptors, the Nxt_adj value decrements from
the first descriptor to the second to last descriptor which has a value of zero. Likewise, each
descriptor in the block points to the next descriptor in the block, except for the last descriptor
which might point to a new block or might terminate the list.
Termination of the descriptor list is indicated by the Stop control bit. After a descriptor with the
Stop control bit is observed, no further descriptor fetches are issued for that list. The Stop
control bit can only be set on the last descriptor of a block.
When using an AXI4 memory mapped interface, DMA addresses to the card are not translated. If
the Host does not know the card address map, the descriptor must be assembled in the user
logic and submitted to the DMA using the descriptor bypass interface.
Offset | Fields
0x00 | Magic[15:0], Rsv[1:0], Nxt_adj[5:0], Control[7:0]
0x04 | 4'h0, Len[27:0]
0x08 | Src_adr[31:0]
0x0C | Src_adr[63:32]
0x10 | Dst_adr[31:0]
0x14 | Dst_adr[63:32]
0x18 | Nxt_adr[31:0]
0x1C | Nxt_adr[63:32]
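For reference, the descriptor layout above can be expressed as a packed C structure as seen from
the host. This is an illustrative sketch based only on the field table, not a definition taken
from the driver sources.

    #include <stdint.h>

    /* One 32-byte XDMA descriptor as laid out in host memory (little-endian).
     * Bit positions in the first two words follow the table above; descriptors
     * must start on a 32-byte boundary. */
    struct xdma_desc {
        uint32_t header;  /* [31:16] Magic, [15:14] Rsv, [13:8] Nxt_adj, [7:0] Control */
        uint32_t len;     /* [31:28] 4'h0, [27:0] Len                                  */
        uint32_t src_lo;  /* Src_adr[31:0]  */
        uint32_t src_hi;  /* Src_adr[63:32] */
        uint32_t dst_lo;  /* Dst_adr[31:0]  */
        uint32_t dst_hi;  /* Dst_adr[63:32] */
        uint32_t nxt_lo;  /* Nxt_adr[31:0]  */
        uint32_t nxt_hi;  /* Nxt_adr[63:32] */
    } __attribute__((aligned(32)));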
The DMA has a (bit width × 512)-deep FIFO to hold all descriptors in the descriptor engine. This
descriptor FIFO is shared by all selected channels and is used only in internal mode (not in
descriptor bypass mode).
• For a Gen3 x8 design with 2 H2C and 2 C2H channels, the AXI bit width is 256 bits. The FIFO
depth is 256 bits × 512 = 32 B × 512 = 16 KB (512 descriptors). This FIFO is shared by the
4 DMA engines.
Descriptor Bypass
The descriptor fetch engine can be bypassed on a per channel basis through AMD Vivado™ IDE
parameters. A channel with descriptor bypass enabled accepts descriptors from its respective
c2h_dsc_byp or h2c_dsc_byp bus. Before the channel accepts descriptors, the Control
register Run bit must be set. The NextDescriptorAddress, NextAdjacentCount, and Magic
descriptor fields are not used when descriptors are bypassed. The ie_descriptor_stopped
bit in the Control register does not prevent the user logic from writing additional descriptors. All
descriptors written to the channel are processed, barring writing of new descriptors when the
channel buffer is full.
When the XDMA is configured in descriptor bypass mode, there is an 8-deep descriptor FIFO
that is common to all descriptor channels from the user logic.
Poll Mode
Each engine is capable of writing back completed descriptor counts to host memory. This allows
the driver to poll host memory to determine when the DMA is complete instead of waiting for an
interrupt.
For a given DMA engine, the completed descriptor count writeback occurs when the DMA
completes a transfer for a descriptor, and ie_descriptor_completed and
Pollmode_wb_enable are set. The completed descriptor count reported is the total number of
completed descriptors since the DMA was initiated (not just those descriptors with the
Completed flag set). The writeback address is defined by the Pollmode_hi_wb_addr and
Pollmode_lo_wb_addr registers.
The poll mode writeback has the following format:
Offset | Fields
0x0 | Sts_err, 7'h0, Compl_descriptor_count[23:0]

Field | Description
Sts_err | The bitwise OR of any error status bits in the channel Status register.
Compl_descriptor_count[23:0] | The lower 24 bits of the Completed Descriptor Count register.
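A host-side poll loop over this writeback word might look like the following C sketch; the helper
name and the assumption of a coherent view of the writeback location are illustrative, while the
field positions follow the layout above.

    #include <stdint.h>

    /* Poll-mode writeback word written by the DMA to the address programmed in
     * Pollmode_hi_wb_addr/Pollmode_lo_wb_addr. Per the layout above, the word is
     * Sts_err (1 bit), 7'h0, then Compl_descriptor_count[23:0]. */
    #define WB_STS_ERR   0x80000000u
    #define WB_COUNT_MSK 0x00FFFFFFu

    /* 'wb' points to the writeback location in host memory (a coherent, uncached
     * view is assumed for illustration). Returns 0 once 'expected' descriptors
     * have completed, or -1 if an error status bit was reported. */
    static int poll_for_completion(volatile const uint32_t *wb, uint32_t expected)
    {
        for (;;) {
            uint32_t w = *wb;
            if (w & WB_STS_ERR)
                return -1;
            if ((w & WB_COUNT_MSK) >= expected)
                return 0;
        }
    }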
Data delivered to the AXI4-Stream interface will be packed for each descriptor. tkeep is all 1s
except for the last cycle of a data transfer of the descriptor if it is not a multiple of the datapath
width. The DMA does not pack data across multiple descriptors.
Note: C2H Channel Writeback information is different from poll mode updates. C2H Channel Writeback
information provides the driver with the current length status of a particular descriptor. This is different
from Pollmode_*, as described in Poll Mode.
The tkeep bits must be all 1s except for the last data transfer of a packet. On the last transfer of
a packet, when tlast is asserted, you can specify a tkeep that is not all 1s to indicate a data
cycle that is not the full datapath width. The asserted tkeep bits must be packed to the LSB,
indicating contiguous data. Asserting tlast with tkeep all zeros is not a valid combination for
the DMA to function properly.
The length of a C2H Stream descriptor (the size of the destination buffer) must always be a
multiple of 64 bytes.
The C2H Stream writeback has the following format:
Offset | Fields
0x0 | WB Magic[15:0], Reserved[14:0], Status[0]
0x04 | Length[31:0]
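As an illustration, the C2H Stream writeback above can be read on the host as the following
structure; the exact bit positions in the first word are an interpretation of the table rather
than a normative definition, and the helper names are illustrative.

    #include <stdint.h>

    /* C2H Stream writeback entry as written to host memory. */
    struct c2h_stream_wb {
        uint32_t status_word;  /* [31:16] WB Magic, [15:1] Reserved, [0] Status */
        uint32_t length;       /* Length[31:0] reported by the writeback        */
    };

    static inline uint32_t wb_magic(const struct c2h_stream_wb *w)  { return w->status_word >> 16; }
    static inline uint32_t wb_status(const struct c2h_stream_wb *w) { return w->status_word & 0x1u; }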
Address Alignment
Table 11: Address Alignment
Interface Type | Datapath Width | Address Restriction
AXI4 MM | 64, 128, 256, 512 | None
AXI4-Stream | 64, 128, 256, 512 | None
AXI4 MM fixed address (1) | 64 | Source_addr[2:0] == Destination_addr[2:0] == 3'h0
AXI4 MM fixed address (1) | 128 | Source_addr[3:0] == Destination_addr[3:0] == 4'h0
AXI4 MM fixed address (1) | 256 | Source_addr[4:0] == Destination_addr[4:0] == 5'h0
AXI4 MM fixed address (1) | 512 | Source_addr[5:0] == Destination_addr[5:0] == 6'h0
Notes:
1. For fixed address mode, you must set bit [25] in the control registers.
Length Granularity
Table 12: Length Granularity
Interface Type | Datapath Width | Length Granularity Restriction
AXI4 MM | 64, 128, 256, 512 | None
AXI4-Stream | 64, 128, 256, 512 | None (1)
AXI4 MM fixed address | 64 | Length[2:0] == 3'h0
AXI4 MM fixed address | 128 | Length[3:0] == 4'h0
AXI4 MM fixed address | 256 | Length[4:0] == 5'h0
AXI4 MM fixed address | 512 | Length[5:0] == 6'h0
Notes:
1. Each C2H descriptor must be sized as a multiple of 64 bytes. However, there are no restrictions
on the total number of bytes in the actual C2H transfer.
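The restrictions in Tables 11 and 12 reduce to a simple low-order-bits check in fixed address mode.
The following C sketch, with a hypothetical helper name, validates a transfer's addresses and
length for a given datapath width.

    #include <stdbool.h>
    #include <stdint.h>

    /* Checks a fixed-address-mode transfer against Tables 11 and 12 for the
     * given datapath width (64, 128, 256, or 512 bits). */
    static bool fixed_mode_transfer_ok(uint64_t src, uint64_t dst, uint32_t len,
                                       unsigned datapath_bits)
    {
        uint64_t mask = (datapath_bits / 8u) - 1u;  /* 64-bit -> 0x07 ... 512-bit -> 0x3F */
        return ((src & mask) == 0) &&  /* Source_addr low bits must be zero      */
               ((dst & mask) == 0) &&  /* Destination_addr low bits must be zero */
               ((len & mask) == 0);    /* Length low bits must be zero           */
    }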
Parity
Parity checking occurs in one of two ways. Set the Parity Checking option in the PCIe DMA tab in
the AMD Vivado™ IDE during core customization:
When Check Parity is enabled, the DMA/Bridge Subsystem for PCIe checks for parity on read
data from PCIe, and generates parity for write data to PCIe.
When Propagate Parity is enabled, the DMA/Bridge Subsystem for PCIe propagates parity to the
user AXI interface. You are responsible for checking and generating parity on the AXI interface.
Parity is valid every clock cycle when a data valid signal is asserted, and parity bits are valid only
for valid data bytes. Parity is calculated for every byte; the total number of parity bits is
DATA_WIDTH/8.
• Parity information is sent and received on *_tuser ports in AXI4-Stream (AXI_ST) mode.
• Parity information is sent and received on *_ruser and *_wuser ports in AXI4 Memory
Mapped (AXI-MM) mode.
Odd parity is used for parity checking. By default, parity checking is not enabled.
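As a sketch of the per-byte odd parity scheme described above, the following C function computes
the DATA_WIDTH/8 parity bits for one data beat; the function name and output packing are
assumptions for illustration.

    #include <stddef.h>
    #include <stdint.h>

    /* Compute the DATA_WIDTH/8 odd-parity bits for one data beat: parity bit i
     * covers data byte i, and is chosen so that the byte plus its parity bit
     * contains an odd number of 1s. */
    static void odd_parity_per_byte(const uint8_t *data, size_t nbytes,
                                    uint8_t *parity_bits)
    {
        for (size_t i = 0; i < nbytes; i++) {
            uint8_t b = data[i];
            b ^= (uint8_t)(b >> 4);
            b ^= (uint8_t)(b >> 2);
            b ^= (uint8_t)(b >> 1);
            /* (b & 1) is now the XOR of the byte's bits; invert it for odd parity. */
            parity_bits[i] = (uint8_t)(~b & 1u);
        }
    }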
Port Descriptions
IMPORTANT! This document covers only DMA mode port descriptions. For AXI Bridge mode, see the AXI
Bridge for PCI Express Gen3 Subsystem Product Guide (PG194).
The AMD DMA/Bridge Subsystem for PCI Express® connects directly to the integrated block for
PCIe. The datapath interfaces to the PCIe integrated block IP are 64, 128, 256, or 512 bits wide,
and run at up to 250 MHz depending on the configuration of the IP. The datapath width applies
to all data interfaces except for the AXI4-Lite interfaces. AXI4-Lite interfaces are fixed at 32 bits
wide.
Ports associated with this subsystem are described in the following tables.
m_axis_h2c_tkeep_x [DATA_WIDTH/8-1:0] | O | The tkeep signal specifies how many bytes are valid
when tlast is asserted.
Notes:
1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for channel 0 use the
m_axis_h2c_tready_0 port, and for channel 1 use the m_axis_h2c_tready_1 port.

m_axis_c2h_tkeep_x [DATA_WIDTH/8-1:0] | I | The tkeep signal tells how many bytes are valid for
each beat. All bits must be asserted for every beat except when tlast is asserted. You must
specify how many bytes are valid when tlast is asserted.
Notes:
1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for channel 0 use the
m_axis_c2h_tready_0 port, and for channel 1 use the m_axis_c2h_tready_1 port.
m_axi_awid [ID_WIDTH-1:0] | O | Master write address ID.
m_axi_awlen[7:0] | O | Master write address length.
m_axi_awsize[2:0] | O | Master write address size.
m_axi_awburst[1:0] | O | Master write address burst type.
m_axi_awprot[2:0] | O | 3'h0
m_axi_awvalid | O | The assertion of this signal means there is a valid write request to the
address on m_axi_awaddr.
m_axi_awready | I | Master write address ready.
m_axi_awlock | O | 1'b0
m_axi_awcache[3:0] | O | 4'h0
Interrupt Interface
Table 31: Interrupt Interface
Each bit in the usr_irq_req bus corresponds to the same bit in usr_irq_ack. For example,
usr_irq_ack[0] represents an ack for usr_irq_req[0].
The following timing diagram shows how to input the descriptor in descriptor bypass mode.
When dsc_byp_ready is asserted, a new descriptor can be pushed in with the
dsc_byp_load signal.
IMPORTANT! Immediately after dsc_byp_ready is deasserted, one more descriptor can be pushed in.
In the above timing diagram, a descriptor is pushed in when dsc_byp_ready is deasserted.
Register Space
Note: This document covers only DMA mode register space. For AXI Bridge mode, see the AXI Bridge for
PCI Express Gen3 Subsystem Product Guide (PG194).
Configuration and status registers internal to the DMA/Bridge Subsystem for PCI Express® and
those in the user logic can be accessed from the host through mapping the read or write request
to a Base Address Register (BAR). Based on the BAR hit, the request is routed to the appropriate
location. For PCIe BAR assignments, see Target Bridge.
DMA/Bridge Subsystem for PCIe registers can be accessed from the host or from the AXI Slave
interface. These registers should be used for programming the DMA and checking status.
Attribute | Description
RV | Reserved
RW | Read/Write
RC | Clear on Read
W1C | Write 1 to Clear
W1S | Write 1 to Set
RO | Read Only
WO | Write Only
Some registers can be accessed with different attributes. In such cases, different register offsets
are provided for each attribute. Undefined bits and address space are reserved. In some registers,
individual bits in a vector might represent a specific DMA engine. In such cases, the LSBs of the
vectors correspond to the H2C channels (if any), with Channel ID 0 in the LSB position. Bits
representing the C2H channels are packed just above them.
Table 49: H2C Poll Mode Low Write Back Address (0x88)
Table 50: H2C Poll Mode High Write Back Address (0x8C)
Table 68: C2H Poll Mode Low Write Back Address (0x88)
Table 69: C2H Poll Mode High Write Back Address (0x8C)
Interrupt processing registers are shared between AXI Bridge and AXI DMA. In AXI Bridge mode
when MSI-X Capabilities is selected, 64 KB address space from the BAR0 is reserved for the
MSI-X table. By default, register space is allocated in BAR0. You can select register space in a
different BAR, from BAR1 to BAR5, by using the CONFIG.bar_indicator {BAR0} Tcl
command. This option is valid only when MSI-X Capabilities option is selected. There is no
allocated space for other interrupt options.
The following table shows the packing of H2C and C2H bits (X15954-010115).
Configuration | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0
4 H2C and 4 C2H enabled | C2H_3 | C2H_2 | C2H_1 | C2H_0 | H2C_3 | H2C_2 | H2C_1 | H2C_0
3 H2C and 3 C2H enabled | X | X | C2H_2 | C2H_1 | C2H_0 | H2C_2 | H2C_1 | H2C_0
Similar to the other C2H/H2C bit packing clarification, see the previous table. The first C2H
vector follows the last H2C vector. For example, if NUM_H2C_Channel = 1, then the H2C0 vector is
at 0xA0, bits [4:0], and the C2H Channel 0 vector is at 0xA0, bits [12:8]. If NUM_H2C_Channel = 4,
then the H2C3 vector is at 0xA0, bits [28:24], and the C2H Channel 0 vector is at 0xA4, bits [4:0].
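The vector mapping examples above follow a regular pattern: each channel's 5-bit vector field sits
at a byte-aligned position, H2C channels first and C2H channels after, packed four fields per
32-bit register starting at 0xA0. The following C sketch computes the register offset and bit
position for a given channel; the helper names, and the assumption that the pattern extends beyond
the quoted cases, are illustrative.

    #include <stdint.h>

    /* Location of a channel's 5-bit interrupt vector field, following the
     * packing above: H2C channels first, then C2H channels, four fields per
     * 32-bit register, starting at offset 0xA0. */
    struct vec_loc {
        uint32_t reg_offset;  /* register containing the field       */
        unsigned lsb;         /* field occupies bits [lsb + 4 : lsb] */
    };

    static struct vec_loc channel_vector_loc(unsigned num_h2c, unsigned channel,
                                             int is_c2h)
    {
        unsigned index = is_c2h ? (num_h2c + channel) : channel;  /* combined index */
        struct vec_loc loc;
        loc.reg_offset = 0xA0u + 4u * (index / 4u);  /* 0xA0, 0xA4, ...             */
        loc.lsb        = 8u * (index % 4u);          /* e.g. index 1 -> bits [12:8] */
        return loc;
    }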
Table 100: Config Block PCIE Max Read Request Size (0x0C)
Table 106: Config AXI User Max Read Request Size (0x44)
Bit Index | Default | Access Type | Description
15 | 1'b0 | RO | Stream. 1: AXI4-Stream interface; 0: AXI4 Memory Mapped interface.
14:12 | 3'h0 | RO | Reserved
11:8 | Varies | RO | Channel ID Target [3:0]
7:0 | 8'h04 | RO | Version. 8'h01: 2015.3 and 2015.4; 8'h02: 2016.1; 8'h03: 2016.2; 8'h04: 2016.3;
8'h05: 2016.4; 8'h06: 2017.1 to current release.
Note: The MSI-X enable bit in the configuration control register should be asserted before writing to the
MSI-X table. If not, the MSI-X table will not work as expected.
Chapter 4
The axi_aclk output is the clock used for all AXI interfaces and should drive all corresponding
AXI Interconnect aclk signals. axi_aclk is not a free-running clock. It is a derived clock and is
valid only after the axi_aresetn signal is deasserted.
Note: The axi_aclk output should not be used as the system clock for your design. Because axi_aclk
is not a free-running clock output, it may not be present at all times.
Resets
For the DMA/Bridge Subsystem for PCIe in AXI Bridge mode, there is an optional
dma_bridge_resetn input pin which allows you to reset all internal Bridge engines and
registers as well as all AXI peripherals driven by the axi_aresetn pin. When the following
parameter is set, dma_bridge_resetn does not need to be asserted during initial link-up
operation because this is done automatically by the IP. You must terminate all transactions
before asserting this pin. After being asserted, the pin must be kept asserted for a minimum
duration at least equal to the Completion Timeout value (typically 50 ms) to clear any pending
transfers that might still be queued in the datapath. To set this parameter, type the following
command at the Tcl command line:
For information about clocking and resets, see the applicable PCIe® integrated block product
guide:
• 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP Product Guide (PG054)
• Virtex-7 FPGA Integrated Block for PCI Express LogiCORE IP Product Guide (PG023)
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
Tandem Configuration
Tandem Configuration features are available for the AMD DMA Subsystem for PCI Express® for
all AMD UltraScale™ and AMD UltraScale+™ devices with PCIe hard blocks. Tandem
Configuration uses a two-stage methodology that enables the IP to meet the configuration time
requirements indicated in the PCI Express Specification. Multiple use cases are supported with
this technology:
• Tandem PROM: Load the single two-stage bitstream from the flash.
• Tandem PCIe: Load the first stage bitstream from flash, and deliver the second stage bitstream
over the PCIe link to the MCAP.
• Tandem with Field Updates: After a Tandem PROM (UltraScale only) or Tandem PCIe initial
configuration, update the entire user design while the PCIe link remains active. The update
region (floorplan) and design structure are predefined, and Tcl scripts are provided.
• Tandem PCIe + Dynamic Function eXchange: This is a more general case of Tandem
Configuration followed by Dynamic Function eXchange (DFX) of any size or number of
dynamic regions.
• Dynamic Function eXchange over PCIe: This is a standard configuration followed by DFX,
using the PCIe/MCAP as the delivery path of partial bitstreams.
For information on Dynamic Function eXchange, see the Vivado Design Suite User Guide: Dynamic
Function eXchange (UG909).
• DFX over PCIe: To enable the MCAP link for Dynamic Function eXchange, without
enabling Tandem Configuration.
For complete information about Tandem Configuration, including required PCIe block locations,
design flow examples, requirements, restrictions and other considerations, see Tandem
Configuration in the UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product
Guide (PG156).
UltraScale+ Devices
To enable any of the Tandem Configuration capabilities for AMD UltraScale+™ devices, select the
appropriate IP catalog option when customizing the subsystem. In the Basic tab:
For complete information about Tandem Configuration, including required PCIe block locations,
design flow examples, requirements, restrictions and other considerations, see Tandem
Configuration in the UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide
(PG213).
Supported Devices
The DMA/Bridge Subsystem for PCIe® and AMD Vivado™ tool flow support implementations
targeting AMD reference boards and specific part/package combinations. Tandem configuration
supports the configurations found in the following tables.
UltraScale Devices
The following table lists the Tandem PROM/PCIe supported configurations for AMD UltraScale™
devices.
AMD Reference Board Support: KCU105 Evaluation Board for AMD Kintex™ UltraScale™ FPGA;
VCU108 Evaluation Board for AMD Virtex™ UltraScale™ FPGA

Device | Part (1) | PCIe Block Location | PCIe Reset Location | Tandem Configuration | Tandem with Field Updates
Kintex UltraScale | XCKU025 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Kintex UltraScale | XCKU035 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Kintex UltraScale | XCKU040 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Kintex UltraScale | XCKU060 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production | Production
Kintex UltraScale | XCKU085 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production | Production
Kintex UltraScale | XCKU095 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Kintex UltraScale | XCKU115 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production | Production
Virtex UltraScale | XCVU065 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Virtex UltraScale | XCVU080 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Virtex UltraScale | XCVU095 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Virtex UltraScale | XCVU125 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production | Production
Virtex UltraScale | XCVU160 | PCIE_3_1_X0Y1 | IOB_X1Y363 | Production | Production
Virtex UltraScale | XCVU190 | PCIE_3_1_X0Y2 | IOB_X1Y363 | Production | Production
Virtex UltraScale | XCVU440 | PCIE_3_1_X0Y2 | IOB_X1Y363 | Production | Production
Notes:
1. Only production silicon is officially supported. Bitstream generation is disabled for all engineering sample silicon
(ES2) devices.
UltraScale+ Devices
The following table lists the Tandem PROM/PCIe supported configurations for UltraScale+
devices.
Chapter 5
• Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
• Vivado Design Suite User Guide: Designing with IP (UG896)
• Vivado Design Suite User Guide: Getting Started (UG910)
• Vivado Design Suite User Guide: Logic Simulation (UG900)
If you are customizing and generating the subsystem in the Vivado IP integrator, see the Vivado
Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) for detailed
information. The IP integrator might auto-compute certain configuration values when validating or
generating the design. To check whether a value changes, see the description of the
parameter in this chapter. To view the parameter value, run the validate_bd_design
command in the Tcl console.
You can customize the IP for use in your design by specifying values for the various parameters
associated with the IP subsystem using the following steps:
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) and the Vivado
Design Suite User Guide: Getting Started (UG910).
Figures in this chapter are illustrations of the Vivado IDE. The layout depicted here might vary
from the current version.
Basic Tab
The Basic tab for the DMA mode (Functional Mode option) is shown in the following figure.
• Mode: Allows you to select the Basic or Advanced mode of the subsystem configuration.
• Device /Port Type: Only PCI Express® Endpoint device mode is supported.
• PCIe Block Location: Selects from the available integrated blocks to enable generation of
location-specific constraint files and pinouts. This selection is used in the default example
design scripts. This option is not available if an AMD Development Board is selected.
• Lane Width: The subsystem requires the selection of the initial lane width. For supported lane
widths and link speeds, see the 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP
Product Guide (PG054), Virtex-7 FPGA Integrated Block for PCI Express LogiCORE IP Product
Guide (PG023), UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product
Guide (PG156), or the UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product
Guide (PG213). Higher link speed cores are capable of training to a lower link speed if
connected to a lower link speed capable device.
• Maximum Link Speed: The subsystem requires the selection of the PCIe Gen speed.
• Reference Clock Frequency: The default is 100 MHz, but 125 MHz and 250 MHz are also
supported.
• Reset Source: You can choose between User Reset and Phy Ready.
• User Reset comes from the PCIe core once the link is established. When the PCIe link goes
down, User Reset is asserted and the XDMA goes into reset; when the link comes back up,
User Reset is deasserted.
• When the Phy Ready option is selected, the XDMA is not affected by the PCIe link status.
• GT DRP Clock Selection: Select either internal clock (default) or external clock.
• GT Selection, Enable GT Quad Selection: Select the Quad in which lane 0 is located.
• AXI Data Width: Select 64-, 128-, 256-, or 512-bit (512-bit only for UltraScale+). The subsystem
allows you to select the interface width, as defined in the 7 Series FPGAs Integrated Block for
PCI Express LogiCORE IP Product Guide (PG054), Virtex-7 FPGA Integrated Block for PCI Express
LogiCORE IP Product Guide (PG023), UltraScale Devices Gen3 Integrated Block for PCI Express
LogiCORE IP Product Guide (PG156), or the UltraScale+ Devices Integrated Block for PCI Express
LogiCORE IP Product Guide (PG213).
• AXI Clock Frequency: Select 62.5 MHz, 125 MHz or 250 MHz depending on the lane width/
speed.
• When Check Parity is enabled, XDMA checks for parity on read data from the PCIe and
generates parity for write data to the PCIe.
• When Propagate Parity is enabled, XDMA propagates parity to the user AXI interface. The
user is responsible for checking and generating parity on the user AXI interface.
RECOMMENDED: AMD recommends that you select the correct GT starting quad before setting the link
rate and width. Selecting line rate and width prior to selecting GT quad can have adverse effects.
Related Information
Tandem Configuration
PCIe ID Tab
The PCIe ID tab is shown in the following figure.
• Enable PCIe-ID Interface: When this option is enabled, the PCIe_ID port is exposed as an input
port, and you are expected to drive it with the proper values as desired.
For a description of these options, see the “Design Flow Steps” chapter in the respective product
guide listed below:
• 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP Product Guide (PG054)
• Virtex-7 FPGA Integrated Block for PCI Express LogiCORE IP Product Guide (PG023)
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
• Base Address Register Overview: In Endpoint configuration, the core supports up to six 32-bit
BARs or three 64-bit BARs, and the Expansion read-only memory (ROM) BAR. BARs can be
one of two sizes:
• 32-bit BARs: The address space can be as small as 128 bytes or as large as 2 GB. Used for
DMA, AXI Lite Master or AXI Bridge Master.
• 64-bit BARs: The address space can be as small as 128 bytes or as large as 8 Exabytes.
Used for DMA, AXI Lite Master or AXI Bridge Master.
IMPORTANT! The DMA requires a large amount of space to support functions and queues. By default,
64-bit BAR space is selected for the DMA BAR. This applies for PF and VF bars. You must calculate your
design needs first before selecting between 64-bit and 32-bit BAR space.
BAR selections are configurable. By default DMA is at BAR 0 (64 bit), AXI4-Lite Master is at BAR
2 (64 bit). These selections can be changed according to user needs.
• BAR: Click the checkbox to enable the BAR. Deselect the checkbox to disable the BAR.
• Type: Select from DMA (by default in BAR0), AXI Lite Master (by default in BAR1, if enabled),
or AXI Bridge Master (by default in BAR2, if enabled). For all other BARs, you can select between
AXI Lite Master and AXI Bridge Master. Expansion ROM can be enabled by selecting BAR6.
For 64-bit BARs (the default selection), DMA is by default in BAR0, AXI Lite Master is by default
in BAR2 (if enabled), and AXI Bridge Master is by default in BAR4 (if enabled). Expansion ROM
can be enabled by selecting BAR6.
• DMA: DMA is by default assigned to the BAR0 space for all PFs. The DMA option can be
selected in any available BAR (only one BAR can have the DMA option). You can select DMA
Mailbox Management rather than DMA; however, DMA Mailbox Management does not allow
you to perform any DMA operations. After selecting the DMA Mailbox Management option,
the host has access to the extended Mailbox space. For details about this space, see the
QDMA_PF_MAILBOX (0x22400) register space.
• AXI Lite Master: Select the AXI Lite Master interface option for any BAR space. The size,
scale, and address translation are configurable.
• AXI Bridge Master: Select the AXI Bridge Master interface option for any BAR space. The
size, scale, and address translation are configurable.
• Expansion ROM: When enabled, this space is accessible on the AXI4-Lite Master. This is a
read-only space. The size, scale, and address translation are configurable.
• Size: The available Size range depends on the 32-bit or 64-bit bar selected. The DMA requires
256 KB of space, which is the fixed default selection. Other BAR size selections are available,
but must be specified.
• Value: The value assigned to the BAR based on the current selections.
Note: For best results, disable unused base address registers to conserve system resources. A base address
register is disabled by deselecting unused BARs in the Customize IP dialog box.
• Legacy Interrupt Settings: Select one of the Legacy Interrupts: INTA, INTB, INTC, or INTD.
• MSI Capabilities: By default, MSI Capabilities is enabled, and 1 vector is enabled. You can
choose up to 32 vectors. In general, Linux uses only 1 vector for MSI. This option can be
disabled.
• MSI RX PIN EN: This option is valid only in AXI Bridge Root Port mode.
• MSI-X Capabilities: Select a MSI-X event. For more information, see MSI-X Vector Table and
PBA (0x8).
• Finite Completion Credits: On systems which support finite completion credits, this option
can be enabled for better performance.
• Extended Tag Field: By default, 6-bit completion tags are used. For AMD UltraScale™ and
AMD Virtex™ 7 devices, the Extended Tag option gives 64 tags. For AMD UltraScale+™
devices, the Extended Tag option gives 256 tags. If the Extended Tag option is not selected,
the DMA uses 32 tags for all devices.
• Configuration Extend Interface: PCIe extended interface can be selected for more
configuration space. When Configuration Extend Interface is selected, you are responsible for
adding logic to extend the interface to make it work properly.
• Link Status Register: By default, Enable Slot Clock Configuration is selected. This means that
the slot configuration bit is enabled in the link status register.
• Number of Request IDs for Read channel: Select the maximum number of outstanding requests
per channel. The available selection is from 2 to 64.
• Number of Request IDs for Write channel: Select the maximum number of outstanding requests
per channel. The available selection is from 2 to 32.
• Descriptor Bypass for Read (H2C): Available for all selected read channels. Each binary digit
corresponds to a channel; the LSB corresponds to Channel 0. A value of 1 in a bit position
means the corresponding channel has descriptor bypass enabled.
• Descriptor Bypass for Write (C2H): Available for all selected write channels. Each binary digit
corresponds to a channel; the LSB corresponds to Channel 0. A value of 1 in a bit position
means the corresponding channel has descriptor bypass enabled.
• AXI ID Width: The default is 4-bit wide. You can also select 2 bits.
• DMA Status port: DMA status ports are available for all channels.
• LTSSM State Debug Logic: This option shows all the LTSSM state transitions that have been
made starting from link up.
• In System IBERT: This option is used to view the eye diagram of the serial link at the
desired link speed. For more information on In System IBERT, refer to the In-System IBERT
LogiCORE IP Product Guide (PG246).
IMPORTANT! This option is used mainly for hardware debug purposes. Simulations are not supported
when this option is used.
• Add Mark Debug Utility: This option adds the mark_debug attribute to predefined PCIe
signals so that these signals can be added to an ILA for debug purposes. The following is the
list of signals:
• m_axis_cq_tdata
• s_axis_cc_tdata
• s_axis_rq_tdata
• m_axis_rc_tdata
• m_axis_cq_tuser
• s_axis_cc_tuser
• m_axis_cq_tlast
• s_axis_rq_tlast
• m_axis_rc_tlast
• s_axis_cc_tlast
• pcie_cq_np_req
• pcie_cq_np_req_count
• s_axis_rq_tuser
• m_axis_rc_tuser
• m_axis_cq_tkeep
• s_axis_cc_tkeep
• s_axis_rq_tkeep
• m_axis_rc_tkeep
• m_axis_cq_tvalid
• s_axis_cc_tvalid
• s_axis_rq_tvalid
• m_axis_rc_tvalid
• m_axis_cq_tready
• s_axis_cc_tready
• s_axis_rq_tready
• m_axis_rc_tready
• Enable Descrambler: This option integrates an encrypted version of the descrambler module
inside the PCIe core, which is used to descramble the PIPE data to/from the PCIe integrated
block in Gen3/Gen4 link speed mode only.
• PCIe Debug Ports: With this option enabled, the following ports are available:
• cfg_negotiated_width: cfg_negotiated_width_o
• cfg_current_speed: cfg_current_speed_o
• cfg_ltssm_state: cfg_ltssm_state_o
• cfg_err_cor: cfg_err_cor_o
• cfg_err_fatal: cfg_err_fatal_o
• cfg_err_nonfatal: cfg_err_nonfatal_o
• cfg_local_error: cfg_local_error_o
• cfg_local_error_valid: cfg_local_error_valid_o
The Shared Logic tab for IP in an AMD UltraScale+™ device is shown in the following figure.
For a description of these options, see Chapter 4, “Design Flow Steps” in the respective product
guide listed below:
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
GT Settings Tab
The GT Settings tab is shown in the following figure.
For a description of these options, see Chapter 4, “Design Flow Steps” in the respective product
guide listed below:
• 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP Product Guide (PG054)
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
Output Generation
For details, see Vivado Design Suite User Guide: Designing with IP (UG896).
Required Constraints
The DMA/Bridge Subsystem for PCI Express® requires the specification of timing and other
physical implementation constraints to meet specified performance requirements for PCI
Express. These constraints are provided in a Xilinx Design Constraints (XDC) file. Pinouts and
hierarchy names in the generated XDC correspond to the provided example design.
IMPORTANT! If the example design top file is not used, copy the IBUFDS_GTE3 (IBUFDS_GTE4 for UltraScale+) instance for the reference clock, the IBUF instance for sys_rst, and the location and timing constraints associated with them into your local design top.
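If you build your own top level, the following Verilog sketch shows one way to instantiate the reference clock buffer and reset input buffer described above. The module, instance, and net names are illustrative; copy the actual instances and their constraints from the generated example design top file.

// Illustrative sketch only: copy the real instances from the example design top.
module example_top (
  input  wire sys_clk_p,   // differential PCIe reference clock
  input  wire sys_clk_n,
  input  wire sys_rst_n    // PCIe reset pin
);
  wire sys_clk, sys_clk_gt, sys_rst_n_c;

  // Reference clock buffer (use IBUFDS_GTE4 for UltraScale+ devices).
  IBUFDS_GTE3 refclk_ibuf (
    .O     (sys_clk_gt),   // reference clock to the GT/PCIe core
    .ODIV2 (sys_clk),      // divided copy used by the core, per its configuration
    .CEB   (1'b0),         // clock enable, active-Low
    .I     (sys_clk_p),
    .IB    (sys_clk_n)
  );

  // Input buffer for the sys_rst_n pin.
  IBUF sys_reset_n_ibuf (
    .O (sys_rst_n_c),
    .I (sys_rst_n)
  );

  // Connect sys_clk, sys_clk_gt, and sys_rst_n_c to the XDMA core instance.
endmodule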
Constraints provided with the integrated block solution have been tested in hardware and
provide consistent results. Constraints can be modified, but modifications should only be made
with a thorough understanding of the effect of each constraint. Additionally, support is not
provided for designs that deviate from the provided constraints.
The device selection portion of the XDC informs the implementation tools which part, package,
and speed grade to target for the design.
IMPORTANT! Because Gen2 and Gen3 Integrated Block for PCIe cores are designed for specific part and
package combinations, this section should not be modified.
The device selection section always contains a part selection line, but can also contain part or package-specific options.
For detailed information about clock requirements, see the respective product guide listed below:
• 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP Product Guide (PG054)
• Virtex-7 FPGA Integrated Block for PCI Express LogiCORE IP Product Guide (PG023)
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
Banking
Transceiver Placement
1. Copy the constraints for the block that needs to be overwritten from the core-level XDC
constraint file.
2. Place the constraints in the user XDC constraint file.
3. Update the constraints with the new location.
The user XDC constraints are usually scoped to the top level of the design; therefore, ensure that the cells referenced by the constraints are still valid after copying and pasting them. Typically, you need to update the module path with the full hierarchy name.
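For example, a relocated GT channel placement constraint copied from the core-level XDC might look like the following after the hierarchy and site are updated. The site name and cell path here are illustrative and must be taken from your own design and from the core-level XDC you copied the constraint from.

# Illustrative only: update the GT site and the full hierarchical cell path
# to match your design before applying the constraint.
set_property LOC GTHE3_CHANNEL_X0Y7 [get_cells {design_top_i/xdma_0/inst/gt_channel_inst_0}]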
Note: If there are locations that need to be swapped (that is, the new location is currently being occupied
by another module), there are two ways to do this.
• If a temporary location is available, move the first module out of the way to the temporary location first. Then, move the second module to the location that was occupied by the first module. Next, move the first module to the former location of the second module. These steps can be done in the XDC constraints file.
• If no other location is available to be used as a temporary location, use the reset_property command from the Tcl Console on the first module before relocating the second module to its location. The reset_property command cannot be used in the XDC constraints file and must be called from a Tcl command file or typed directly into the Tcl Console, as sketched below.
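As a sketch of the second approach, the swap can be typed into the Tcl Console as follows; the cell paths and site names are illustrative.

# Module A currently occupies GTHE3_CHANNEL_X0Y7 and module B occupies X0Y4.
# Free A's location first, then place B there, then place A at B's old location.
reset_property LOC [get_cells {design_top_i/module_a}]
set_property LOC GTHE3_CHANNEL_X0Y7 [get_cells {design_top_i/module_b}]
set_property LOC GTHE3_CHANNEL_X0Y4 [get_cells {design_top_i/module_a}]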
Simulation
This section contains information about simulating IP in the AMD Vivado™ Design Suite.
Basic Simulation
Simulation models for the AXI-MM and AXI-ST options can be generated and simulated. These are basic simulation models that you can build on to develop more complicated designs.
AXI-MM Mode
The example design for the AXI4 Memory Mapped (AXI-MM) mode has 4 KB block RAM on the
user side, so data can be written to the block RAM and read from block RAM to the Host. The
first H2C transfer is started and the DMA reads data from the Host memory and writes to the
block RAM. Then, the C2H transfer is started and the DMA reads data from the block RAM and
writes to the Host memory. The original data is compared with the C2H write data.
H2C and C2H are set up with one descriptor each, and the total transfer size is 64 bytes.
AXI-ST Mode
The example design for the AXI4-Stream (AXI_ST) mode is a loopback design. On the user side
the H2C ports are looped back to the C2H ports. First, the C2H transfer is started and the C2H
DMA engine waits for data on the user side. Then, the H2C transfer is started and the DMA
engine reads data from the Host memory and writes it to the user side. Because it is a loopback design, data from H2C is directed to C2H and ends up at the host destination address.
H2C and C2H are set up with one descriptor each, and the total transfer size is 64 bytes.
Interrupts are not used in AMD Vivado™ Design Suite simulations. Instead, the descriptor completed count register is polled to determine when the transfer is complete.
Descriptor Bypass
The simulation model for descriptor bypass mode is available only for Channel 0. This design can be expanded to support other channels.
Use the Enable PIPE Simulation option on the Basic page of the Customize IP dialog box to
enable PIPE mode simulation in the current AMD Vivado™ Design Suite solution example design,
in either Endpoint mode or Root Port mode. The External PIPE Interface signals are generated at
the core boundary for access to the external device. Enabling this feature also provides the
necessary hooks to use third-party PCI Express VIPs/BFMs instead of the Root Port model
provided with the example design. See also PIPE Mode Simulation Using Integrated Endpoint PCI
Express Block in Gen3 x8 and Gen2 x8 Configurations Application Note (XAPP1184).
The following tables describe the PIPE bus signals available at the top level of the core and their
corresponding mapping inside the EP core (pcie_top) PIPE signals.
IMPORTANT! The xil_sig2pipe.v file is delivered in the simulation directory, and the file replaces phy_sig_gen.v. BFM/VIPs should interface with the xil_sig2pipe instance in board.v.
PIPE mode simulations are not supported for this core when VHDL is the selected target
language.
Table 131: Common In/Out Commands and Endpoint PIPE Signals Mappings
Chapter 6
Example Design
This chapter contains information about the example designs provided in the AMD Vivado™
Design Suite.
Figure 22: AXI-MM Example with PCIe to DMA Bypass Interface and PCIe to AXI-Lite
Master Enabled
Figure 24: AXI4-Stream Example with PCIe to DMA Bypass Interface and PCIe to AXI-
Lite Master Enabled
The following figure shows the AXI-MM example with Descriptor Bypass Mode enabled.
3. In order to add the DMA/Bridge IP to the canvas, search for DMA/Bridge (xdma) IP in the IP
catalog.
After adding the IP to the canvas, the green Designer Assistance information bar appears at
the top of the canvas.
4. Click Run Block Automation from the Designer Assistance information bar.
This opens a Run Block Automation dialog box (shown in the following figure) which lists all
the IP currently in the design eligible for block automation (left pane), and any options
associated with a particular automation (right pane). In this case, there is only the XDMA IP in
the hierarchy in the left pane. The right pane has a description and options available. The
Options can be used to configure the IP as well as decide the level of automation for block
automation.
The Run Block Automation dialog box has an Automation Level option, which can be set to IP
Level or Subsystem Level.
• IP Level: When you select IP level automation, the Block Automation inserts the utility buffer
for the sys_clk input, connects the sys_rst_n input and pcie_mgt output interface for
the XDMA IP, as shown in the following figure.
• Subsystem Level: When you select subsystem level automation, the Block Automation inserts
the necessary sub IPs on the canvas and makes the necessary connections. In addition to
connecting the sys_clk and sys_rst_n inputs it also connects the pcie_mgt output
interface and user_lnk_up, user_clk_heartbeat and user_resetn outputs. It inserts
the AXI interconnect to connect the Block Memory with the XDMA IP through the AXI BRAM
controller. The AXI interconnect has one master interface and multiple slave interfaces when
the AXI4-Lite master and AXI-MM Bypass interfaces are enabled in the Run Block Automation
dialog box. The block automation also inserts Block Memories and AXI BRAM Controllers
when the AXI4-Lite master and AXI-MM Bypass interfaces are enabled.
The example design can be generated using the following Tcl command.
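A typical invocation is shown below; the IP instance name and output directory are placeholders for your own project, and the command assumes the XDMA IP has already been added to the project.

# Generate and open the IP example design for an XDMA instance named xdma_0
# (the instance name and directory are illustrative).
open_example_project -force -dir ./xdma_example [get_ips xdma_0]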
An additional example design figure (X21787-052019) shows the PCIe host connected through the DMA Subsystem for PCIe to a user interrupt generator with registers at 0x0000–0x000F and, through an AXI4-Lite de-mux module, to a 1K x 32 block RAM at 0x0800–0x0BFF.
The example design implements the following user registers:
Register Offset Register Name Access Type Description
0x00 Scratch Pad RW Scratch Pad.
0x04 DMA BRAM Size RO User memory size connected to the XDMA. Memory size = 2^[7:4] in the unit given by [3:0]. Bits [7:4] denote the size as a power of two: 0 – 1, 1 – 2, 2 – 4, …, 8 – 256, 9 – 512. Bits [3:0] denote the unit: 0 – Byte, 1 – KB, 2 – MB, 3 – GB. For example, if the register value is 21, the size is 4 KB; if the register value is 91, the size is 512 KB.
0x08 Interrupt Control Register RW Interrupt control register (write 1 to generate an interrupt). The corresponding Interrupt Status register bit must be 1 (ready) to generate an interrupt. Also, reset the corresponding bit after the ISR is served.
0x0C Interrupt Status Register RO Interrupt Status. 1: ready. 0: Interrupt generation in progress.
Note: In case of Legacy interrupt, the Interrupt Control Register (0x08) value for the corresponding
interrupt bit should only be cleared after the ISR is served as this can be used by the host to determine the
interrupt source.
Chapter 7
Test Bench
This chapter contains information about the test bench provided in the AMD Vivado™ Design
Suite.
Source code for the Root Port Model is included to provide a starting point for your test bench. All the significant work for initializing the core configuration space, creating TLP transactions, generating TLP logs, and providing an interface for creating and verifying tests is complete, allowing you to dedicate your efforts to verifying the correct functionality of the design rather than developing an Endpoint core test bench infrastructure.
The Root Port Model provides the following:
• Test Programming Interface (TPI), which allows you to stimulate the Endpoint device for PCI Express.
• Example tests that illustrate how to use the test program TPI.
• Verilog source code for all Root Port Model components, which allow you to customize the
test bench.
The following figure shows the Root Port Module with DMA Subsystem for PCIe.
Figure 27: Root Port Module with DMA Subsystem for PCIe
Architecture
The Root Port Model, illustrated in the previous figure, consists of the dsport (Root Port), usrapp_tx, usrapp_rx, and usrapp_com blocks.
The usrapp_tx and usrapp_rx blocks interface with the dsport block for transmission and
reception of TLPs to/from the EndPoint DUT. The Endpoint DUT consists of the DMA Subsystem
for PCIe.
The usrapp_tx block sends TLPs to the dsport block for transmission across the PCI Express
Link to the Endpoint DUT. In turn, the Endpoint DUT device transmits TLPs across the PCI
Express Link to the dsport block, which are subsequently passed to the usrapp_rx block. The
dsport and core are responsible for the data link layer and physical link layer processing when
communicating across the PCI Express logic. Both usrapp_tx and usrapp_rx utilize the
usrapp_com block for shared functions, for example, TLP processing and log file outputting.
The DMA Subsystem for PCIe uses the 7 series Gen2 Integrated Block for PCIe, the 7 series Gen3 Integrated Block for PCIe, the AMD UltraScale™ Devices Gen3 Integrated Block for PCIe, and the AMD UltraScale+™ Devices Integrated Block for PCIe. See the “Test Bench” chapter in the appropriate guide:
• 7 Series FPGAs Integrated Block for PCI Express LogiCORE IP Product Guide (PG054)
• Virtex-7 FPGA Integrated Block for PCI Express LogiCORE IP Product Guide (PG023)
• UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
• UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
Test Case
The DMA Subsystem for PCIe can be configured with an AXI4 Memory Mapped (AXI-MM) or AXI4-Stream (AXI-ST) interface. The simulation test case reads a configuration register to determine whether the core uses the AXI4 Memory Mapped or the AXI4-Stream configuration, and performs the simulation for that configuration.
Simulation
Simulation is set up to transfer one descriptor in H2C and one descriptor in C2H direction.
Transfer size is set to 128 bytes in each descriptor. For both AXI-MM and AXI-Stream, data is
read from Host and sent to Card (H2C). Then data is read from Card and sent to Host (C2H). Data
read from Card is compared with original data for data validity.
Limitations:
• Simulation does not support interrupts. The test case reads the status and completed descriptor count registers to determine whether the transfer is complete.
• Simulations are done only for Channel 0. Multi-channel simulations will be enabled in a future release.
• Transfer size is limited to 128 bytes and only one descriptor.
• The Root Port simulation model is not a complete BFM. The simulation supports a one-descriptor transfer, which demonstrates the basic DMA procedure.
• By default, post-synthesis simulation is not supported for the example design. To enable post-
synthesis simulation, generate the IP using the following Tcl command:
set_property -dict [list CONFIG.post_synth_sim_en {true}] [get_ips <ip_name>]
Note: With this feature, functional simulation time increases to approximately 2.5 ms.
1. The test case sets up one descriptor for the H2C engine.
2. The H2C descriptor is created in the Host memory. The H2C descriptor gives data length 128
bytes, source address (host), and destination address (Card).
3. The test case writes data (incremental 128 bytes of data) in the source address space.
4. The test case also sets up one descriptor for the C2H engine.
5. The C2H descriptor gives data length 128 bytes, source address (Card), and destination
address (host).
6. Write the H2C descriptor starting address to registers 0x4080 and 0x4084.
7. Write to the H2C control register (0x0004) to start the H2C transfer; bit 0 is set to 1 to start the transfer. For details of the control register, refer to H2C Channel Control (0x04). (The full register sequence is sketched after this list.)
8. The DMA transfer takes the data host source address and sends to the block RAM
destination address.
9. The test case then starts the C2H transfer.
10. Write the C2H descriptor starting address to registers 0x5080 and 0x5084.
11. Write to the C2H control register (0x1004) to start the C2H transfer; bit 0 is set to 1 to start the transfer. For details of the control register, see C2H Channel Control (0x04).
12. The DMA transfer takes data from the block RAM source address and sends data to the host
destination address.
13. The test case then compares the data for correctness.
14. The test case checks for the H2C and C2H descriptor completed count (value of 1).
15. The test case then disables the transfer by deactivating the Run bit (bit 0) in the Control registers (0x0004 and 0x1004) for the H2C and C2H engines.
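As a rough sketch (not the shipped test case), the register sequence in steps 6 through 11 can be written with the register write task described later in Test Tasks. The (address, data) argument order and the descriptor addresses used here are assumptions for illustration only.

// Sketch only: TSK_XDMA_REG_WRITE argument order and descriptor addresses are assumed.
TSK_XDMA_REG_WRITE(16'h4080, 32'h0000_0100); // H2C 0 descriptor start address, low 32 bits
TSK_XDMA_REG_WRITE(16'h4084, 32'h0000_0000); // H2C 0 descriptor start address, high 32 bits
TSK_XDMA_REG_WRITE(16'h0004, 32'h0000_0001); // H2C 0 control: set Run (bit 0) to start H2C
// ... poll the H2C completed descriptor count, then start C2H ...
TSK_XDMA_REG_WRITE(16'h5080, 32'h0000_0200); // C2H 0 descriptor start address, low 32 bits
TSK_XDMA_REG_WRITE(16'h5084, 32'h0000_0000); // C2H 0 descriptor start address, high 32 bits
TSK_XDMA_REG_WRITE(16'h1004, 32'h0000_0001); // C2H 0 control: set Run (bit 0) to start C2H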
AXI4-Stream Interface
For AXI4-Stream, the example design is a loopback design. Channel H2C_0 is looped back to
C2H_0 (and so on) for all other channels. First, the test case starts the C2H engine. The C2H
engine waits for data that is transmitted by the H2C engine. Then, the test case starts the H2C
engine. The H2C engine reads data from host and sends to the Card, which is looped back to the
C2H engine. The C2H engine then takes the data, and writes back to host memory. The following
are the simulation steps:
1. The test case sets up one descriptor for the H2C engine.
2. The H2C descriptor is created in the Host memory. The H2C descriptor gives the data length
128 bytes, Source address (host), and Destination address (Card).
3. The test case writes data (incremental 128 bytes of data) in the Host source address space.
4. The test case also sets up one descriptor for the C2H engine in Host memory.
5. The C2H descriptor gives data length 128 bytes, source address (Card), and destination
address (host).
6. Write the C2H descriptor starting address to registers 0x5080 and 0x5084.
7. Write to the C2H control register to start the C2H transfer first.
8. The C2H engine starts and waits for data to come from the H2C ports.
9. Write the H2C descriptor starting address to registers 0x4080 and 0x4084.
10. Write to the H2C control register to start the H2C transfer.
11. The H2C engine takes data from the host source address to the Card destination address.
12. The data is looped back to the C2H engine.
13. The C2H engine reads data from the Card and writes it back to the Host memory destination address.
14. The test case checks for the H2C and C2H descriptor completed count (value of 1).
15. The test case then compares the data for correctness.
16. The test case then disables transfer by deactivating the Run bit (bit 0) in the Control registers
0x0004 and 0x1004 for the H2C and C2H engines.
For the descriptor bypass mode simulation, when the transfer is started, one H2C and one C2H descriptor are transferred over the descriptor bypass interface, and the DMA transfers are then performed as explained in the sections above. The descriptor is set up for a 64-byte transfer only. The test case then disables the transfer by deasserting the Run bit (bit 0) in the Control registers for the H2C and C2H engines (0x0004 and 0x1004).
Simulation Updates
The following is an overview of how the existing Root Port tasks can be modified to exercise multi-channel and multi-descriptor cases.
The same procedure applies to the AXI4-Stream configuration. Refer to the section above for a detailed explanation of the AXI4-Stream transfer.
3. The DSC_H2C_1 descriptor has 128 bytes for DMA transfer, Host address S1 (source) and
destination address D1 (card).
4. Create a new descriptor (named DSC_H2C_2) in the Host memory at address DSC2 that is
different from DSC_H2C_1 Descriptor.
5. The DSC_H2C_2 descriptor has 128 bytes for DMA transfer, Host address S2 (source) and
destination address D2 (card).
6. Link these two descriptors by adding next descriptor address in DSC_H2C_1. Write DSC2 in
next descriptor field.
7. Write the descriptor starting address to H2C Channel 0.
8. Enable DMA transfer for H2C Channel 0 by writing the Run bit in Control register 0x0004.
Test Tasks
Table 135: Test Tasks
Name Description
TSK_INIT_DATA_H2C This task generates one descriptor for H2C engine and initializes source data in
host memory.
TSK_INIT_DATA_C2H This task generates one descriptor for C2H engine.
TSK_XDMA_REG_READ This task reads the DMA Subsystem for PCIe register.
TSK_XDMA_REG_WRITE This task writes the DMA Subsystem for PCIe register.
COMPARE_DATA_H2C This task compares source data in the host memory to destination data written
to block RAM. This task is used in AXI4 Memory Mapped simulation.
COMPARE_DATA_C2H This task compares the original data in the host memory to the data the C2H engine writes to the host. This task is used in AXI4 Memory Mapped simulation.
TSK_XDMA_FIND_BAR This task finds XDMA configuration space between different enabled BARs (BAR0
to BAR6).
For other PCIe-related tasks, see the “Test Bench” chapter in the 7 Series FPGAs Integrated Block
for PCI Express LogiCORE IP Product Guide (PG054), Virtex-7 FPGA Integrated Block for PCI Express
LogiCORE IP Product Guide (PG023), UltraScale Devices Gen3 Integrated Block for PCI Express
LogiCORE IP Product Guide (PG156), or UltraScale+ Devices Integrated Block for PCI Express
LogiCORE IP Product Guide (PG213).
Appendix A
GT Locations
For more information on GT Locations, see GT Locations appendix in UltraScale+ Devices
Integrated Block for PCI Express LogiCORE IP Product Guide (PG213).
Appendix B
Device Drivers
Figure 28: Device Drivers
The above figure shows the usage model of the Linux XDMA software drivers. The DMA/Bridge Subsystem for PCIe example design is implemented on an AMD FPGA, which is connected to an X86 host through PCI Express. In this model, the XDMA driver in kernel space runs on Linux, whereas the test application runs in user space.
The driver provides the following character devices:
• Control character device for controlling DMA/Bridge Subsystem for PCI Express® components.
• Events character device for waiting for interrupt events.
• SGDMA character devices for high performance transfers.
Interrupt Processing
Legacy Interrupts
There are four types of legacy interrupts: A, B, C, and D. You can select any of these interrupts in the PCIe Misc tab under Legacy Interrupt Settings. You must program the corresponding values for both the IRQ Block Channel Vector (see IRQ Block Channel Vector Number (0xA0)) and the IRQ Block User Vector (see IRQ Block User Vector Number (0x80)). The values for the legacy interrupts are A = 0, B = 1, C = 2, and D = 3. The host recognizes interrupts only based on these values.
MSI Interrupts
For MSI interrupts, you can select from 1 to 32 vectors in the PCIe Misc tab under MSI Capabilities, which consists of a maximum of 16 usable DMA interrupt vectors and a maximum of 16 usable user interrupt vectors. The Linux operating system (OS) supports only 1 vector. Other operating systems might support more vectors, and you can program different vector values in the IRQ Block Channel Vector (see IRQ Block Channel Vector Number (0xA0)) and in the IRQ Block User Vector (see IRQ Block User Vector Number (0x80)) to represent different interrupt sources. The AMD Linux driver supports only 1 MSI vector.
MSI-X Interrupts
The DMA supports up to 32 different interrupt sources for MSI-X, which consist of a maximum of 16 usable DMA interrupt vectors and a maximum of 16 usable user interrupt vectors. The DMA has 32 MSI-X tables, one for each source (see MSI-X Vector Table and PBA (0x00–0xFE0)). For MSI-X channel interrupt processing, the driver should use the engine’s Interrupt Enable Mask for H2C and C2H (see H2C Channel Interrupt Enable Mask (0x90) or C2H Channel Interrupt Enable Mask (0x90)) to disable and enable interrupts.
User Interrupts
The user logic must hold usr_irq_req active-High even after receiving usr_irq_ack to keep the interrupt pending register asserted. This enables the Interrupt Service Routine (ISR) within the driver to determine the source of the interrupt. After the driver receives a user interrupt, the driver or software can clear the user interrupt source, to which the user hardware should respond by deasserting usr_irq_req.
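A minimal Verilog sketch of user logic that follows this protocol for a single interrupt source is shown below. The width of usr_irq_req/usr_irq_ack and the mechanism by which software clears the source depend on your configuration and user design, so treat the port list as an assumption.

// Minimal single-source sketch: the request is held after the ack and is only
// deasserted once software clears the user interrupt source.
module usr_irq_example (
  input  wire clk,
  input  wire rst_n,
  input  wire event_pulse,  // user event that raises the interrupt (illustrative)
  input  wire sw_clear,     // software write that clears the source (illustrative)
  input  wire usr_irq_ack,  // ack from the DMA; does not clear the request by itself
  output reg  usr_irq_req
);
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      usr_irq_req <= 1'b0;
    else if (event_pulse)
      usr_irq_req <= 1'b1;  // raise and hold the request
    else if (sw_clear)
      usr_irq_req <= 1'b0;  // deassert only after software clears the source
  end
endmodule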
Appendix C
Upgrading
This appendix contains information about upgrading to a more recent version of the IP.
New Parameters
The following new parameters are added in the IP in the current release.
New Ports
The ports in the following table appear at the boundary when the Internal Shared GT_COMMON
and Clocking option is selected in the Shared Logic tab for 7 series Gen2 devices.
Table 137: Ports For Shared Logic (Internal Shared GT_COMMON and Clocking Option)
The ports in the following table appear at the boundary when the Shared GT_COMMON option is selected in the Shared Logic tab for 7 series Gen2 devices.
The ports in the following table appear at the boundary when the Shared Clocking option is selected in the Shared Logic tab for 7 series Gen2 devices.
Table 139: Ports For Shared Logic (Shared Clocking Option)
The following table shows the new port added in this version of the IP. This port is available at
the boundary when MSI-X feature is enabled and the device type is PCIe Endpoint.
Appendix D
Debugging
This appendix includes details about resources available on the AMD Support website and
debugging tools.
Documentation
This product guide is the main document associated with the subsystem. This guide, along with
documentation related to all products that aid in the design process, can be found on the Support
web page or by using the AMD Adaptive Computing Documentation Navigator. Download the
Documentation Navigator from the Downloads page. For more information about this tool and
the features available, open the online help after installation.
Debug Guide
For more information on PCIe debug, see PCIe Debug K-Map.
Answer Records
Answer Records include information about commonly encountered problems, helpful information
on how to resolve these problems, and any known issues with an AMD Adaptive Computing
product. Answer Records are created and maintained daily to ensure that users have access to
the most accurate information available.
Answer Records for this subsystem can be located by using the Search Support box on the main
Support web page. To maximize your search results, use keywords such as:
• Product name
• Tool message(s)
• Summary of the issue encountered
A filter search is available after results are returned to further target the results.
Technical Support
AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:
• Implement the solution in devices that are not defined in the documentation.
• Customize the solution beyond that allowed in the product documentation.
• Change any section of the design labeled DO NOT MODIFY.
Debug Tools
There are many tools available to address DMA/Bridge Subsystem for PCIe design issues. It is
important to know which tools are useful for debugging various situations.
The Vivado logic analyzer is used to interact with the logic debug LogiCORE IP cores included in the design. See the Vivado Design Suite User Guide: Programming and Debugging (UG908).
Reference Boards
Various AMD development boards support the DMA/Bridge Subsystem for PCIe core. These
boards can be used to prototype designs and establish that the core can communicate with the
system.
• AMD 7 series
○ KC705
• AMD UltraScale™
○ VCU108
• AMD UltraScale+™
○ KCU116
○ VCU118
○ ZCU106
Hardware Debug
Hardware issues can range from link bring-up to problems seen after hours of testing. This
section provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable
resource to use in hardware debug. The signal names mentioned in the following individual
sections can be probed using the debug feature for debugging the specific problems.
General Checks
Ensure that all the timing constraints for the core were properly incorporated from the example
design and that all constraints were met during implementation.
• Does it work in post-place and route timing simulation? If problems are seen in hardware but
not in timing simulation, this could indicate a PCB issue. Ensure that all clock sources are
active and clean.
• If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.
• If your outputs go to 0, check your licensing.
Read the H2C/C2H Channel Status register (0x40) to see whether any errors are registered. Also check whether the busy bit (bit [0]) is set to 1; if it is, the DMA is waiting for some user event to happen. Read the H2C/C2H Channel Completed Descriptor Count (0x48) to see how many descriptors were fetched by the DMA, and compare it with the expected count based on the DMA transfer. If transfers in MSI-X interrupt mode have issues, try poll mode to see if the transfers go through.
Software Debug
Using the AMD driver can help you debug some issues. Compile the driver in DEBUG mode, which enables verbose output and prints more information for each transfer. Check the dmesg output for a run to see that all the steps for a DMA transfer are listed.
The descriptor used for a transfer is printed, so check the descriptor to make sure the source and destination addresses are listed correctly, the transfer length is correct, and the descriptor magic word is correct. Follow the dmesg log to see if any errors are recorded.
ILA Debug
You can add an ILA on the inputs and outputs of the IP on the AXI side to see whether any abnormal transfers are occurring, and check that the packets sent into the IP are well formed and match what is expected. You can also add an ILA on the PCIe core CQ/CC and RQ/RC interfaces; this shows how the DMA IP fetches descriptors and data. Use all of the above suggestions together to isolate the issue.
Appendix E
XVC-over-PCIe should be used to perform FPGA debug remotely using the Vivado Design Suite
debug feature when JTAG debug is not available. This is commonly used for data center
applications where the FPGA is connected to a PCIe Host system without any other connections
to the hardware device.
Using debug over XVC requires software, driver, and FPGA hardware design components. Since
there is an FPGA hardware design component to XVC-over-PCIe debug, you cannot perform
debug until the FPGA is already loaded with an FPGA hardware design that implements XVC-
over-PCIe and the PCIe link to the Host PC is established. This is normally accomplished by
loading an XVC-over-PCIe enabled design into the configuration flash on the board prior to
inserting the card into the data center location. Because debug using XVC-over-PCIe depends on the PCIe communication channel, it should not be used to debug PCIe link-related issues.
IMPORTANT! XVC only provides connectivity to the debug cores within the FPGA. It does not provide the
ability to program the device or access device JTAG and configuration registers. These operations can be
performed through other standard AMD interfaces or peripherals such as the PCIe MCAP VSEC and
HWICAP IP.
Overview
The main components that enable XVC-over-PCIe debug are provided as a reference on how to create XVC connectivity for AMD FPGA designs. These three components are shown in the following figure and connect to the Vivado Design Suite debug feature through a TCP/IP socket.
The figure (X18837-032119) shows the Vivado Design Suite debug feature running on a local or remote host, the XVC-Server application and XVC driver running on the Host PC connected to the AMD FPGA card, and the XVC-over-PCIe enabled design running on the AMD FPGA.
The Debug Bridge IP, when configured for From PCIe to BSCAN or From AXI to BSCAN,
provides a connection point for the AMD debug network from either the PCIe Extended
Capability or AXI4-Lite interfaces respectively. Vivado tool automation connects this instance of
the Debug Bridge to the AMD debug cores found in the design rather than connecting them to
the JTAG BSCAN interface. There are design trade-offs to connecting the debug bridge to the
PCIe Extended Configuration Space or AXI4-Lite. The following sections describe the
implementation considerations and register map for both implementations.
The PCIe Extended Configuration Interface uses PCIe configuration transactions rather than PCIe
memory BAR transactions. While PCIe configuration transactions are much slower, they do not
interfere with PCIe memory BAR transactions at the PCIe IP boundary. This allows for separate
data and debug communication paths within the FPGA. This is ideal if you expect to debug the
datapath. Even if the datapath becomes corrupt or halted, the PCIe Extended Configuration
Interface can remain operational to perform debug. The following figure describes the
connectivity between the PCIe IP and the Debug Bridge IP to implement the PCIe-XVC-VSEC.
Note: Although the previous figure shows the AMD UltraScale+™ Devices Integrated Block for PCIe IP,
other PCIe IP (that is, the AMD UltraScale™ Devices Integrated Block for PCIe, AXI Bridge for PCIe, or PCIe
DMA IP) can be used interchangeably in this diagram.
Note: Although the previous figure shows the PCIe DMA IP, any AXI-enabled PCIe IP can be used
interchangeably in this diagram.
The AXI-XVC implementation allows for higher speed transactions. However, XVC debug traffic
passes through the same PCIe ports and interconnect as other PCIe control path traffic, making it
more difficult to debug transactions along this path. As a result, AXI-XVC debug should be used
to debug a specific peripheral or a different AXI network rather than attempting to debug
datapaths that overlap with the AXI-XVC debug communication path.
The PCIe-XVC-VSEC and AXI-XVC have a slightly different register map that must be taken into
account when designing XVC drivers and software. The register maps in the following tables
show the byte-offset from the base address.
• The PCIe-XVC-VSEC base address must fall within the valid range of the PCIe Extended
Configuration space. This is specified in the Debug Bridge IP configuration.
• The base address of an AXI-XVC Debug Bridge is the offset for the Debug Bridge IP peripheral
that was specified in the Vivado Address Editor.
The following tables describe the register map for the Debug Bridge IP as an offset from the base
address when configured for the From PCIe-Ext to BSCAN or From AXI to BSCAN modes.
Register Offset Register Name Description Register Type
0x00 PCIe Ext Capability Header PCIe defined fields for VSEC use. Read Only
0x04 PCIe VSEC Header PCIe defined fields for VSEC use. Read Only
0x08 XVC Version Register IP version and capabilities information. Read Only
0x0C XVC Shift Length Register Shift length. Read Write
0x10 XVC TMS Register TMS data. Read Write
0x14 XVC TDIO Register TDO/TDI data. Read Write
0x18 XVC Control Register General control register. Read Write
0x1C XVC Status Register General status register. Read Only
Register Offset Register Name Description Register Type
0x00 XVC Shift Length Register Shift length. Read Write
0x04 XVC TMS Register TMS data. Read Write
0x08 XVC TDI Register TDI data. Read Write
0x0C XVC TDO Register TDO data. Read Only
0x10 XVC Control Register General control register. Read Write
0x14 XVC Status Register General status register. Read Only
0x18 XVC Version Register IP version and capabilities information. Read Only
Bit Location Field Description Initial Value Type
15:0 PCIe Extended Capability ID This field is a PCI-SIG defined ID number that indicates the nature and format of the Extended Capability. The Extended Capability ID for a VSEC is 0x000B. 0x000B Read Only
19:16 Capability Version This field is a PCI-SIG defined version number that indicates the version of the capability structure present. Must be 0x1 for this version of the specification. 0x1 Read Only
31:20 Next Capability Offset This field is passed in from the user and contains the offset to the next PCI Express Capability structure or 0x000 if no other items exist in the linked list of capabilities. For Extended Capabilities implemented in the PCIe extended configuration space, this value must always be within the valid range of the PCIe Extended Configuration space. 0x000 Read Only
Bit Location Field Description Initial Value Type
15:0 VSEC ID This field is the ID value that can be used to identify the PCIe-XVC-VSEC and is specific to the Vendor ID (0x10EE for AMD). 0x0008 Read Only
19:16 VSEC Rev This field is the Revision ID value that can be used to identify the PCIe-XVC-VSEC revision. 0x0 Read Only
31:20 VSEC Length This field indicates the number of bytes in the entire PCIe-XVC-VSEC structure, including the PCIe Ext Capability Header and PCIe VSEC Header registers. 0x020 Read Only
This register is used to set the scan chain shift length within the debug scan chain.
This register is used to set the TMS data within the debug scan chain.
This register is used for TDO/TDI data access. When using the PCIe-XVC-VSEC, these two registers are combined into a single field. When using AXI-XVC, they are implemented as two separate registers.
When operating in PCIe-XVC-VSEC mode, the driver will initiate PCIe configuration transactions
to interface with the FPGA debug network. When operating in AXI-XVC mode, the driver will
initiate 32-bit PCIe Memory BAR transactions to interface with the FPGA debug network. By
default, the driver will attempt to discover the PCIe-XVC-VSEC and use AXI-XVC if the PCIe-
XVC-VSEC is not found in the PCIe configuration extended capability linked list.
The driver is provided in the data directory of the Vivado installation as a .zip file. This .zip
file should be copied to the Host PC connected through PCIe to the AMD FPGA and extracted
for use. README.txt files have been included; review these files for instructions on installing
and running the XVC drivers and software.
To add the BSCAN interface to the Reconfigurable Partition, add the appropriate ports and port attributes to the Reconfigurable Partition definition. The sample Verilog provided below can be used as a template for adding the BSCAN interface to the port declaration.
...
// BSCAN interface definition and attributes.
// This interface should be added to the DFX module definition
// and left unconnected in the DFX module instantiation.
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN drck" *)
(* DEBUG="true" *)
input S_BSCAN_drck,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN shift" *)
(* DEBUG="true" *)
input S_BSCAN_shift,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN tdi" *)
(* DEBUG="true" *)
input S_BSCAN_tdi,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN update" *)
(* DEBUG="true" *)
input S_BSCAN_update,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN sel" *)
(* DEBUG="true" *)
input S_BSCAN_sel,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN tdo" *)
(* DEBUG="true" *)
output S_BSCAN_tdo,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN tms" *)
(* DEBUG="true" *)
input S_BSCAN_tms,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN tck" *)
(* DEBUG="true" *)
input S_BSCAN_tck,
(* X_INTERFACE_INFO = "xilinx.com:interface:bscan:1.0 S_BSCAN runtest" *)
(* DEBUG="true" *)
input S_BSCAN_runtest,
When link_design is run, the exposed ports are connected to the static portion of the debug
network through tool automation. The ILAs are also connected to the debug network as required
by the design. There might also be an additional dbg_hub cell that is added at the top level of
the design. For Tandem PCIe with Field Updates designs, the dbg_hub and tool inserted clock
buffer(s) must be added to the appropriate design partition. The following is an example of the
Tcl commands that can be run after opt_design to associate the dbg_hub primitives with the
appropriate design partitions.
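A minimal sketch is shown below, using generic Vivado Tcl commands and assuming the relevant partition is constrained by a pblock named pblock_stage2. The exact assignment mechanism depends on your Tandem/DFX configuration, so adapt it to your design.

# Sketch only: the pblock name is illustrative. Run after opt_design, once the
# debug hub has been inserted into the netlist.
set dbg_hub_cells [get_cells -hierarchical -filter {REF_NAME =~ "dbg_hub*" || ORIG_REF_NAME =~ "dbg_hub*"}]
add_cells_to_pblock [get_pblocks pblock_stage2] $dbg_hub_cells
# Repeat for any tool-inserted clock buffer(s) driving the debug hub.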
The PCIe-XVC-VSEC can be added to the AMD UltraScale+™ PCIe example design by selecting
the following options.
Note: Although the previous figure shows the UltraScale+ Devices Integrated Block for PCIe IP, the example design hierarchy is the same for other PCIe IPs.
9. Double-click the Debug Bridge IP identified as xvc_vsec to view the configuration option
for this IP. Make note of the following configuration parameters because they will be used to
configure the driver.
• PCIe XVC VSEC ID (default 0x0008)
• PCIe XVC VSEC Rev ID (default 0x0)
IMPORTANT! Do not modify these parameter values when using an AMD Vendor ID or provided XVC
drivers and software. These values are used to detect the XVC extended capability. (See the PCIe
specification for additional details.)
10. In the Flow Navigator, click Generate Bitstream to generate a bitstream for the example design project. This bitstream will then be loaded onto the FPGA board to enable XVC debug over PCIe.
After the XVC-over-PCIe hardware design has been completed, an appropriate XVC enabled
PCIe driver and associated XVC-Server software application can be used to connect the Vivado
Design Suite to the PCIe connected FPGA. Vivado can connect to an XVC-Server application that
is running locally on the same machine or remotely on another machine using a TCP/IP socket.
System Bring-Up
The first step is to program the FPGA and power on the system such that the PCIe link is
detected by the host system. This can be accomplished by either:
• Programming the design file into the flash present on the FPGA board, or
• Programming the device directly via JTAG.
If the card is powered by the Host PC, it will need to be powered on to perform this
programming using JTAG and then re-started to allow the PCIe link to enumerate. After the
system is up and running, you can use the Linux lspci utility to list out the details for the
FPGA-based PCIe device.
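For example, with the default AMD vendor ID (0x10EE) you can filter the lspci listing as shown below; the exact device ID and output depend on your design configuration.

# lspci -vd 10ee: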
The XVC driver and software are provided as a ZIP file included with the Vivado Design Suite installation.
1. Copy the ZIP file from the Vivado install directory to the FPGA connected Host PC and
extract (unzip) its contents. This file is located at the following path within the Vivado
installation directory.
XVC Driver and SW Path: …/data/xicom/driver/pcie/xvc_pcie.zip
The README.txt files within the driver_* and xvcserver directories describe how to compile, install, and run the XVC drivers and software, and are summarized in the following steps. Perform these steps after the driver and software files have been copied to the Host PC and you are logged in as a user with root permissions.
2. Modify the variables within the driver_*/xvc_pcie_user_config.h file to match your
hardware design and IP settings. Consider modifying the following variables:
• PCIE_VENDOR_ID: The PCIe Vendor ID defined in the PCIe® IP customization.
• PCIE_DEVICE_ID: The PCIe Device ID defined in the PCIe® IP customization.
• Config_space: Allows for the selection between using a PCIe-XVC-VSEC or an AXI-XVC
peripheral. The default value of AUTO first attempts to discover the PCIe-XVC-VSEC, then
attempts to connect to an AXI-XVC peripheral if the PCIe-XVC-VSEC is not found. A value
of CONFIG or BAR can be used to explicitly select between PCIe®-XVC-VSEC and AXI-
XVC implementations, as desired.
• config_vsec_id: The PCIe XVC VSEC ID (default 0x0008) defined in the Debug Bridge IP
when the Bridge Type is configured for From PCIE to BSCAN. This value is only used for
detection of the PCIe®-XVC-VSEC.
• config_vsec_rev: The PCIe XVC VSEC Rev ID (default 0x0) defined in the Debug Bridge IP
when the Bridge Type is configured for From PCIe to BSCAN. This value is only used for
detection of the PCIe-XVC-VSEC.
• bar_index: The PCIe BAR index that should be used to access the Debug Bridge IP when
the Bridge Type is configured for From AXI to BSCAN. This BAR index is specified as a
combination of the PCIe IP customization and the addressable AXI peripherals in your
system design. This value is only used for detection of an AXI-XVC peripheral.
• bar_offset: PCIe BAR Offset that should be used to access the Debug Bridge IP when the
Bridge Type is configured for From AXI to BSCAN. This BAR offset is specified as a
combination of the PCIe IP customization and the addressable AXI peripherals in your
system design. This value is only used for detection of an AXI-XVC peripheral.
3. Move the source files to the directory of your choice. For example, use:
/home/username/xil_xvc or /usr/local/src/xil_xvc
4. Make sure you have root permissions and change to the directory containing the driver files.
# cd /driver_*/
/lib/modules/[KERNEL_VERSION]/kernel/drivers/pci/pcie/Xilinx/
xil_xvc_driver.ko
6. Run the depmod command to pick up newly installed kernel modules:
# depmod -a
If you run the dmesg command, you will see the following message:
kernel: xil_xvc_driver: Starting…
Note: You can also use insmod on the kernel object file to load the module:
# insmod xil_xvc_driver.ko
However, this is not recommended unless necessary for compatibility with older kernels.
9. The resulting character file, /dev/xil_xvc/cfg_ioc0, is owned by user root and group
root, and it will need to have permissions of 660. Change permissions on this file if it does
not allow the application to interact with the driver.
# chmod 660 /dev/xil_xvc/cfg_ioc0
You should see various successful tests of differing lengths, followed by the following
message:
"XVC PCIE Driver Verified Successfully!"
1. Make sure the firewall settings on the system expose the port that will be used to connect to
the Vivado Design Suite. For this example, port 10200 is used.
2. Make note of the host name or IP address. The host name and port number will be required
to connect Vivado to the xvcserver application. See the OS help pages for information
regarding the firewall port settings for your OS.
3. Move the source files to the directory of your choice. For example, use:
/home/username/xil_xvc or /usr/local/src/xil_xvc
4. Change to the directory containing the application source files:
# cd ./xvcserver/
After the Vivado Design Suite has connected to the XVC-server application you should see
the following message from the XVC-server.
Enable verbose output by setting the VERBOSE env var.
Opening /dev/xil_xvc/cfg_ioc0
8. Select the newly added XVC target from the Hardware Targets table, and click Next.
9. Click Finish.
10. In the Hardware Device Properties panel, select the debug bridge target, and assign the
appropriate probes .ltx file.
Vivado now recognizes your debug cores and debug signals, and you can debug your design
through the Vivado hardware tools interface using the standard debug approach.
This allows you to debug AMD FPGA designs through the PCIe connection rather than JTAG
using the Xilinx Virtual Cable technology. You can terminate the connection by closing the
hardware server from Vivado using the right-click menu. If the PCIe connection is lost or the
XVC-Server application stops running, the connection to the FPGA and associated debug cores
will also be lost.
For DFX designs, it is important to terminate the connection during DFX operations. During a
DFX operation where debug cores are present inside the dynamic region, a portion of the debug
tree is expected to be reprogrammed. Vivado debug tools should not be actively communicating
with the FPGA through XVC during a DFX operation.
Appendix F
The AMD Adaptive Computing Documentation Portal is an online tool that provides robust
search and navigation for documentation using your web browser. To access the Documentation
Portal, go to https://docs.xilinx.com.
Documentation Navigator
Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:
• From the AMD Vivado™ IDE, select Help → Documentation and Tutorials.
• On Windows, click the Start button and select Xilinx Design Tools → DocNav.
• At the Linux command prompt, enter docnav.
Note: For more information on DocNav, refer to the Documentation Navigator User Guide (UG968).
Design Hubs
AMD Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:
Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.
References
These documents provide supplemental material useful with this guide:
Revision History
The following table shows the revision history for this document.
• Updated PCIe: BARs tab, PCIe: Misc tab, and PCIe: DMA
tab.
Copyright
© Copyright 2016-2023 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Artix, Kintex,
UltraScale+, Versal, Virtex, Vivado, Zynq, and combinations thereof are trademarks of Advanced
Micro Devices, Inc. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license.
AMBA, AMBA Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore
are trademarks of Arm Limited in the US and/or elsewhere. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective
companies.