PCI Express Overview
OUTLINE
• PCI Express overview
• PCI architecture
➢ PCI Express link
➢ bus topology
➢ architecture layers
➢ transactions
➢ interrupts
• Introduced as "Third Generation I/O" (3GIO), PCI Express (PCIe) superseded both PCI and PCI-X. New motherboards may come with a mix of PCI and PCIe slots or with PCIe slots only.
PCI Express overview
PCIe is a switched architecture with multiple lanes* rather
than the shared parallel-bus structure of PCI
• PCIe provides a switched architecture of channels that
can be combined in x2, x4, x8, x16 and x32
configurations, creating a parallel interface of
independently controlled "lanes."
• The switch backplane determines the total bandwidth, and
cards and motherboards are compatible between
versions.
• For comparisons of all PCI technologies, see PCI-SIG,
PCI, ExpressCard, PCI-X, SATAExpress and
Thunderbolt.
https://www.yourdictionary.com/pci-express#computer
Year created: 2004
Created by: Intel · Dell · HP · IBM
Supersedes: AGP · PCI · PCI-X
Width in bits: 1–32
Number of devices: One device each on each endpoint of each connection. PCI Express switches can create multiple endpoints out of one endpoint to allow sharing one endpoint with multiple devices.
Capacity, per lane (each direction):
•v1.x: 250 MB/s (2.5 GT/s)
•v2.x: 500 MB/s (5 GT/s)
•v3.0: 985 MB/s (8 GT/s)
•v4.0: 1969 MB/s (16 GT/s)
Capacity, 16-lane slot (each direction):
•v1.x: 4 GB/s (40 GT/s)
•v2.x: 8 GB/s (80 GT/s)
•v3.0: 15.75 GB/s (128 GT/s)
•v4.0: 31.51 GB/s (256 GT/s)
Style: Serial
Hotplugging interface: Yes, if ExpressCard, Mobile PCI Express Module or XQD card
External interface: Yes, with PCI Express External Cabling, such as Thunderbolt
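The capacity figures above can be reproduced from the transfer rate and the line encoding. The following is a minimal sketch, assuming 8b/10b encoding for v1.x and v2.x and 128b/130b encoding for v3.0 and v4.0; the function and table names are illustrative only.

```python
# Transfer rate in GT/s and encoding efficiency (payload bits / line bits).
# Assumption: 8b/10b for v1.x and v2.x, 128b/130b for v3.0 and v4.0.
GENERATIONS = {
    "v1.x": (2.5, 8 / 10),
    "v2.x": (5.0, 8 / 10),
    "v3.0": (8.0, 128 / 130),
    "v4.0": (16.0, 128 / 130),
}


def lane_bandwidth_mb_s(version: str) -> float:
    """Usable bandwidth of one lane in one direction, in MB/s."""
    gt_per_s, efficiency = GENERATIONS[version]
    # One transfer carries one bit per lane; divide by 8 to get bytes.
    return gt_per_s * 1000 * efficiency / 8


def link_bandwidth_gb_s(version: str, lanes: int) -> float:
    """Usable bandwidth of a link with the given lane count, in GB/s."""
    return lane_bandwidth_mb_s(version) * lanes / 1000


if __name__ == "__main__":
    for version in GENERATIONS:
        per_lane = round(lane_bandwidth_mb_s(version))
        x16 = round(link_bandwidth_gb_s(version, 16), 2)
        print(f"{version}: {per_lane} MB/s per lane, {x16} GB/s for x16")
```

Bandwidth scales linearly with the lane count, so the x16 figure is simply sixteen times the per-lane figure.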
PCI express features
• PCI Express architecture is a high-performance I/O interconnect for peripherals in computing and communication platforms.
• Evolved from PCI and PCI-X architectures and uses the same communication model as these buses.
• The same address spaces are retained: memory, I/O and configuration.
• Whereas the PCI and PCI-X generations shared parallel buses, the PCIe bus uses a serial point-to-point interconnect for communication between two peripheral devices.
• PCIe implements a packet-based protocol for information transfer
• Scalable performance based on the number of signal lanes* implemented on the PCIe interconnect (dual simplex)
• The PCIe bus allows the same types of transactions as the previous buses: memory
read/write, I/O read/write, and configuration read/write…
• The compatibility is maintained with existing OS and software drivers, which do not
require changes.
PCI Express features
• The interface is serial, which reduces the pin count and simplifies the interconnections
• It unifies the I/O architecture for different types of systems, including embedded systems
• It interconnects ICs on the motherboard and expansion cards via connectors or cables
• Communication is packet-based, with a high transfer rate and efficiency
• The bus is scalable: a particular interconnection can be implemented with several communication lanes
• The software model is compatible with the classical PCI architecture, so PCIe devices can be configured and existing software drivers used without changes
• It provides differentiated quality of service (QoS) through the ability to allocate dedicated resources for certain data flows, to configure the QoS arbitration policies for each component, and to use isochronous transfers for real-time applications
• It provides advanced power management through the ability to identify the power management capabilities of each peripheral device
• It ensures link-level data integrity for all types of transactions
• It supports advanced error reporting and handling to improve fault isolation and error recovery
• It supports hot-plugging and hot-swapping of peripheral devices
PCI Express Topology
• A PCIe system consists of PCIe links that interconnect a set of components
• An example topology, referred to as a hierarchy, is composed of:
- a Root Complex,
- multiple Endpoints (I/O devices),
- a Switch,
- a PCI Express to PCI/PCI-X Bridge,
all interconnected via PCI Express Links
Root Complex (RC)
● Root Complex (RC): the device that connects one or more processors and the memory subsystem to the I/O devices.
● The RC represents the root of an I/O hierarchy.
● It is similar to a host bridge in a PCI system:
- The RC generates transaction requests on behalf of the processor, to which it is connected through a local bus.
- The RC may support one or more PCI Express ports, called Root Ports.
PCIe Device
• A PCIe device may have up to 8 logical functions, and each endpoint is assigned a device identifier (ID), which consists of:
a bus number + device number + function number.
• The link and the PCIe functionality shared by all functions are managed through Function 0.
• All functions of a device use a single bus number, captured during the PCI enumeration process.
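The bus + device + function identifier is conventionally packed into 16 bits: 8 bits of bus number, 5 bits of device number and 3 bits of function number (which is where the limit of 8 functions comes from). A minimal sketch of that packing follows; the helper names are illustrative and not part of any library.

```python
from typing import Tuple


def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus (8 bits), device (5 bits) and function (3 bits) into a 16-bit ID."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function number out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int) -> Tuple[int, int, int]:
    """Split a 16-bit ID back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def format_bdf(bdf: int) -> str:
    """Render the ID in the common bus:device.function notation."""
    bus, device, function = unpack_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    requester_id = pack_bdf(bus=3, device=0, function=1)
    print(hex(requester_id), format_bdf(requester_id))
```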
Configuration Space
• Resources such as memory ranges are allocated to each device, and the assigned addresses are recorded in its configuration space
Enumeration
● The process by which configuration software discovers the system topology and
assigns bus numbers and system resources.
● The RC/host sends configuration packets to assign unique bus, device and function numbers to the Endpoints connected to the bus
● On x86, enumeration of the PCIe hierarchy is done by the BIOS during hardware initialization; all registers are configured before the bootloader starts
● System software can later reassign bus numbers, following the enumeration rules.
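Bus numbers are typically assigned depth-first: each bridge (root port, switch port or PCIe-to-PCI bridge) receives the next free number for the bus behind it, and records the highest bus number found below it. The sketch below models only that numbering on a made-up topology; the `Device` class and the device names are hypothetical, and real enumeration works by reading and writing configuration registers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    is_bridge: bool = False
    children: List["Device"] = field(default_factory=list)
    bus: Optional[int] = None          # bus the device sits on
    secondary: Optional[int] = None    # bus directly behind a bridge
    subordinate: Optional[int] = None  # highest bus number behind a bridge


def enumerate_bus(devices: List[Device], bus: int, last_bus: int) -> int:
    """Number everything on `bus` depth-first; return the highest bus used."""
    for dev in devices:
        dev.bus = bus
        if dev.is_bridge:
            last_bus += 1
            dev.secondary = last_bus
            last_bus = enumerate_bus(dev.children, dev.secondary, last_bus)
            dev.subordinate = last_bus
    return last_bus


# Hypothetical hierarchy: two root ports, one of them leading to a switch.
nic, ssd, gpu = Device("nic"), Device("ssd"), Device("gpu")
dp0 = Device("switch-dp0", True, [ssd])
dp1 = Device("switch-dp1", True, [gpu])
upstream = Device("switch-up", True, [dp0, dp1])
root_port_a = Device("root-port-a", True, [nic])
root_port_b = Device("root-port-b", True, [upstream])
highest_bus = enumerate_bus([root_port_a, root_port_b], bus=0, last_bus=0)

if __name__ == "__main__":
    for dev in (root_port_a, root_port_b, upstream, dp0, dp1, nic, ssd, gpu):
        print(dev.name, dev.bus, dev.secondary, dev.subordinate)
```

The secondary and subordinate numbers recorded on each bridge define the bus range that the bridge forwards configuration requests to.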
IO Hub
The Intel QuickPath Interconnect (QPI) is a point-to-point processor interconnect developed by Intel that replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms. It increased the available scalability and bandwidth. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI).
The core contains the components of the processor involved in executing instructions, including the ALU, FPU, L1 and L2 cache.
Uncore functions include QPI controllers, L3 cache, snoop agent pipeline, on-die memory controller, and Thunderbolt controller.
"Uncore" is a term used by Intel to describe the functions of a microprocessor that are not in the core, but which must
be closely connected to the core to achieve high performance. It has been called "system agent" since the release of
the Sandy Bridge microarchitecture.
PCIe Architecture Layers
A PCIe system may be structured into five logical layers:
• The configuration/OS layer manages the configuration of PCIe devices by the OS based on the Plug-and-Play specifications for initializing, enumerating, and configuring I/O devices.
• The software layer interacts with the OS through the same drivers as the conventional PCI bus.
• The transaction layer manages the transmission and reception of information using a packet-based protocol.
• The data link layer ensures the integrity of data transfers via error detection using a Cyclic Redundancy Check (CRC).
• The physical layer performs packet transmission over the PCIe serial links.
• The PCIe specification defines the architecture of PCIe devices in terms of three logical layers: the transaction layer, the data link layer, and the physical layer.
• The PCIe bus uses packets for transferring information between pairs of devices
connected via a PCIe connection
• Packets are formed in the transaction layer based on information obtained from
the device core and application and stored in a buffer
• The data link layer extends the packet with additional information required for
error detection at a receiver device
• The packet is then encoded in the physical layer and transmitted through
differential signals over the PCIe link
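The way each layer wraps the packet can be sketched as follows. This is a simplified illustration, not the wire format: `zlib.crc32` stands in for the link CRC, the framing bytes are placeholders chosen for illustration, and line encoding and differential signaling are not modeled.

```python
import struct
import zlib

# Placeholder framing bytes standing in for the start/end framing symbols.
STP = b"\xfb"
END = b"\xfd"


def transaction_layer(header: bytes, payload: bytes) -> bytes:
    """Form the TLP from header and payload supplied by the device core."""
    return header + payload


def data_link_layer(tlp: bytes, sequence_number: int) -> bytes:
    """Prepend a 12-bit sequence number and append a 32-bit CRC."""
    seq_field = struct.pack(">H", sequence_number & 0x0FFF)
    lcrc = struct.pack(">I", zlib.crc32(seq_field + tlp))  # stand-in CRC
    return seq_field + tlp + lcrc


def physical_layer(frame: bytes) -> bytes:
    """Add framing around the data link layer packet."""
    return STP + frame + END


def receiver_check(wire: bytes) -> bool:
    """Strip the framing and verify the CRC as a receiver would."""
    frame = wire[len(STP):-len(END)]
    body, lcrc = frame[:-4], frame[-4:]
    return struct.pack(">I", zlib.crc32(body)) == lcrc


if __name__ == "__main__":
    tlp = transaction_layer(header=bytes(12), payload=b"\x01\x02\x03\x04")
    wire = physical_layer(data_link_layer(tlp, sequence_number=7))
    print(len(wire), receiver_check(wire))
```

A receiver that detects a CRC mismatch would reject the packet, which is what triggers the ACK/NAK retry protocol at the data link layer.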
PCI Express transaction layer packet (TLP) types
Methods for Data Routing
• Each request or completion header is tagged with its type, and each packet type is routed using one of three schemes: address routing, ID routing, or implicit routing.
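Two of the routing schemes, address routing and ID routing, can be illustrated with a switch deciding which downstream port should receive a packet. This is a minimal sketch under simplifying assumptions: each port is described only by one memory window and one bus-number range, and the port names and address values are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownstreamPort:
    name: str
    secondary_bus: int     # first bus behind the port
    subordinate_bus: int   # last bus behind the port
    mem_base: int          # start of the memory window behind the port
    mem_limit: int         # end of the memory window (inclusive)


def route_by_address(ports: List[DownstreamPort], address: int) -> Optional[str]:
    """Address routing: used by memory and I/O requests."""
    for port in ports:
        if port.mem_base <= address <= port.mem_limit:
            return port.name
    return None  # not claimed downstream, so the packet goes upstream


def route_by_id(ports: List[DownstreamPort], bus: int) -> Optional[str]:
    """ID routing: used by completions and configuration requests."""
    for port in ports:
        if port.secondary_bus <= bus <= port.subordinate_bus:
            return port.name
    return None  # target bus is not behind this switch


PORTS = [
    DownstreamPort("dp0", 4, 4, 0xE000_0000, 0xE00F_FFFF),
    DownstreamPort("dp1", 5, 5, 0xE010_0000, 0xE01F_FFFF),
]

if __name__ == "__main__":
    print(route_by_address(PORTS, 0xE010_0040))
    print(route_by_id(PORTS, 4))
```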
CPU MRd targeting an Endpoint
CPU MWr targeting Endpoint
Endpoint MRd targeting system memory
Programmed I/O Transaction
Bus Mastering (DMA)
• Before PCIe, DMA was intrusive: the CPU had to be told to withdraw from the bus while the transfer took place
• On PCIe, any device can send read/write TLPs on the bus, just like the Root Complex. This allows the device to access the processor's memory directly (DMA) or to exchange packets with other peripherals on a peer-to-peer basis (as long as the switching entities allow it).
Two things need to happen first, as with any PCIe device:
1. The device must be granted bus mastering by setting the "Bus Master Enable" bit in one of the standard configuration registers.
2. The software driver must inform the device of the physical address of the relevant buffer, most likely by writing it to a register in a region mapped through a Base Address Register (a configuration-space register).
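The two steps can be sketched against in-memory byte buffers that stand in for the device's configuration space and for a BAR-mapped register region. The Command register offset (0x04) and the Bus Master Enable bit (bit 2) follow the standard PCI header; the DMA address register and its offset are hypothetical, since they are device-specific.

```python
import struct

COMMAND_OFFSET = 0x04         # Command register in the standard PCI header
BUS_MASTER_ENABLE = 1 << 2    # bit 2 of the Command register
DMA_ADDR_REG = 0x08           # hypothetical device register in a BAR region


def enable_bus_master(config_space: bytearray) -> None:
    """Step 1: set Bus Master Enable, leaving the other Command bits intact."""
    (command,) = struct.unpack_from("<H", config_space, COMMAND_OFFSET)
    struct.pack_into("<H", config_space, COMMAND_OFFSET,
                     command | BUS_MASTER_ENABLE)


def program_dma_address(bar_region: bytearray, phys_addr: int) -> None:
    """Step 2: tell the device where the buffer lives (64-bit address)."""
    struct.pack_into("<Q", bar_region, DMA_ADDR_REG, phys_addr)


if __name__ == "__main__":
    config_space = bytearray(64)
    config_space[COMMAND_OFFSET] = 0x02   # Memory Space Enable already set
    bar_region = bytearray(16)
    enable_bus_master(config_space)
    program_dma_address(bar_region, 0x1_2345_6000)
    print(hex(config_space[COMMAND_OFFSET]), bar_region.hex())
```

In a real driver, both steps go through the operating system's PCI and DMA-mapping services rather than raw buffers.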
DMA Transaction
Peer-to-Peer Transaction
PCI Express Device Layers
Interrupt Model: Three Methods
PCI Express x1 Connector Pin-Out
PCI Express Error Handling
• All PCI Express devices are required to support some combination of:
# Existing software written for generic PCI error handling, and which takes
advantage of the fact that PCI Express has mapped many of its error conditions
to existing PCI error handling mechanisms.
# Additional PCI Express-specific reporting mechanisms
• Errors are classified as correctable and uncorrectable.
• Uncorrectable errors are further divided into:
# Fatal uncorrectable errors
# Non-fatal uncorrectable errors.
Correctable Errors
Uncorrectable Errors
• Errors classified as uncorrectable impair the functionality of the interface,
and the specification defines no mechanism to correct them
• The two subgroups are fatal and non-fatal
1. Fatal Uncorrectable Errors: Errors which render the link unreliable
– First-level strategy for recovery may involve a link reset by the system
– Handling of fatal errors is platform-specific
2. Non-Fatal Uncorrectable Errors: Uncorrectable errors associated with a
particular transaction, while the link itself is reliable
– Software may limit recovery strategy to the device(s) involved
– Transactions between other devices are not affected
https://www.intel.com/content/www/us/en/docs/programmable/683824/quartus-prime-pro-v17-1-stratix-10-es-editions/understanding-pci-express-throughput.html
PCI Express Throughput
The throughput in a PCI Express system depends on the following factors:
•Protocol overhead
•Payload size
•Completion latency
•Flow control update latency
•Devices forming the link
Protocol Overhead
Protocol overhead includes the following three components:
•128b/130b Encoding and Decoding: Gen3 links use 128b/130b encoding. This encoding adds two synchronization (sync) bits to each 128-bit data transfer. Consequently, the encoding and decoding overhead is very small at 1.56%. The effective data rate of a Gen3 x8 link is about 8 GBps.
•Data Link Layer Packets (DLLPs) and Physical Layer Packets (PLPs): The PLPs consist of SKP ordered sets, which are 16-24 bytes. The DLLPs are 2 dwords. The DLLPs implement flow control and the ACK/NAK protocol.
•TLP Packet Overhead: The overhead associated with a single TLP ranges from 5-7 dwords if the optional ECRC is not included. The overhead includes the following fields:
• The Start and End Framing Symbols
• The Sequence ID
• A 3- or 4-dword TLP header
• The Link Cyclic Redundancy Check (LCRC)
• 0-1024 Dw of data payload
Throughput for Posted Writes
The theoretical maximum throughput calculation uses the following formula:
• The graph shows the maximum throughput with different TLP header and payload sizes.
The DLLPs and PLPs are excluded from this calculation. For a 256-byte maximum payload size and a 3-dword header, the overhead is 5 dwords. Because the interface is 256 bits wide, the 5-dword header requires a single bus cycle. The 256-byte payload requires 8 bus cycles.
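The numbers in this example can be checked with a short calculation. The sketch below assumes the commonly used efficiency ratio payload / (payload + overhead) for the packet-level figure, and separately counts bus cycles for the 256-bit interface described above; both function names are illustrative.

```python
DWORD_BYTES = 4


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def tlp_efficiency(payload_bytes: int, overhead_dwords: int) -> float:
    """Fraction of transmitted bytes that are payload (assumed ratio)."""
    overhead_bytes = overhead_dwords * DWORD_BYTES
    return payload_bytes / (payload_bytes + overhead_bytes)


def bus_cycle_efficiency(payload_bytes: int, overhead_dwords: int,
                         bus_bits: int = 256) -> float:
    """Fraction of bus cycles that carry payload on a bus_bits-wide interface."""
    bus_bytes = bus_bits // 8
    overhead_cycles = ceil_div(overhead_dwords * DWORD_BYTES, bus_bytes)
    payload_cycles = ceil_div(payload_bytes, bus_bytes)
    return payload_cycles / (payload_cycles + overhead_cycles)


if __name__ == "__main__":
    # 256-byte payload, 5 dwords of overhead, 256-bit interface.
    print(f"{tlp_efficiency(256, 5):.4f}")
    print(f"{bus_cycle_efficiency(256, 5):.4f}")
```

Multiplying either fraction by the link data rate gives an upper bound on posted-write throughput; larger payloads amortize the fixed overhead better.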
Maximum Throughput for Memory Writes
* High Performance – relates specifically to bandwidth, which is more than double that of PCI in an x1
link and grows linearly as more lanes are added. An additional benefit that is not immediately evident is
that this bandwidth is simultaneously available in both directions on each link. In addition, the initial
signaling speed of 2.5 Gb/s is expected to increase, yielding further speed improvements.
* I/O Simplification – relates to the streamlining of the plethora of both chip-to-chip and internal user
accessible buses, such as AGP, PCI-X, and HubLink. This feature reduces the complexity of design and
cost of implementation.
* Layered Architecture – PCI Express establishes an architecture that can adapt to new technologies,
while preserving software investment. Two key areas that benefit from the layered architectures are
the physical layer, with increased signaling rates, and software compatibility.
* Next-Generation I/O – PCI Express provides new capabilities for data acquisition and multimedia through isochronous data transfers. Isochronous transfers provide a type of quality of service (QoS) guarantee that ensures on-time data delivery through deterministic, time-dependent methods.
* Ease of Use – PCI Express greatly simplifies how users add to and upgrade systems. PCI Express offers both hot-swap and hot-plug. Because the hot-plug feature relies on specific OS features, it may lag the hardware launch. In addition, the variety of formats for PCI Express devices, especially SIOM and ExpressCard, greatly increases the ability to add high-performance peripherals in servers and notebooks.
Evolution of PCIe
Summary
References
https://indico.cern.ch/event/121654/attachments/68430/98164/Practical_introduction_to_PCI_Express_with_FPGAs_-_Extended.pdf
Budruk, R., Anderson, D., Shanley, T., PCI Express System Architecture, MindShare Inc., Addison-Wesley Developer’s Press, 2008, https://www.mindshare.com/files/ebooks/PCI%20Express%20System%20Architecture.pdf
Ajanovic, J., “PCI Express (PCIe) 3.0 Accelerator Features”, Intel Corporation, 2008, http://www.intel.com/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf
PCI-SIG, “PCI Express Base Specification Revision 3.0”, November 10, 2010.
https://webcourse.cs.technion.ac.il/236376/Spring2017/ho/WCFiles/chipset_microarch.pdf
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2
http://hardwareverification.weebly.com/pci---express-introduction.html