Design and Verification For PCI Express Controller
braham@yumail.ac.kr and kssung@yu.ac.kr
The verification environment is excellent at finding errors that are not detected by general test vectors.

Figure 1. The PCI Express topology (CPU, Host Bridge, Graphics, USB 2.0, and Switch devices connected by PCI Express links)
2. PCI Express Architecture [3][4]

PCI Express specifies the architecture in terms of three discrete logical layers: the Transaction Layer, the Data Link Layer, and the Physical Layer. Each of these layers is divided into a transmitting component and a receiving component, as shown in figure 2.

Figure 2. The layered structure of a PCI Express device (the device core sits above the Transaction, Data Link, and Physical layer logical blocks, each split into a transmit side and a receive side)

PCI Express uses packets to communicate information between a transmitting component and a receiving component. Packets are formed in the Transaction Layer and the Data Link Layer to carry the information from the transmitting component to the receiving component.

As the transmitted packets flow through the lower layers, they are extended with the additional information necessary to handle packets at those layers, as shown in figure 3. At the transmitting side, packet contents are formed in the Transaction Layer with information obtained from the device core and application. This packet is referred to as a Transaction Layer Packet (TLP). A TLP consists of a header and data, although some TLPs do not contain a data section. An optional End-to-End CRC (ECRC), which supports end-to-end data integrity, is calculated and appended to the TLP. The TLP is forwarded to the Data Link Layer, which then appends a Link CRC (LCRC) and a TLP sequence number. The LCRC is used by the neighboring receiver device at the other end of the link to check for CRC errors. The sequence number is used to detect cases where one or more entire TLPs have been lost. The Physical Layer converts the packet received from the Data Link Layer into an appropriate serialized format and transmits it across the PCI Express link to the receiver on the other side. At the receiving side, the reverse process occurs.

Figure 3. Packet flow through the layers (Framing | Sequence Number | Header | Data | ECRC | LCRC | Framing)
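To make the packet format concrete, the following C sketch models the fields each layer contributes, as described above. The field names and sizes are illustrative assumptions, not the exact on-wire layout defined by the specification.

    #include <stdint.h>
    #include <stddef.h>

    /* Transaction Layer view: header, optional data payload, optional ECRC. */
    typedef struct {
        uint8_t  header[16];   /* 3 or 4 DW header; 16 bytes covers the larger case */
        uint8_t  data[4096];   /* payload; some TLPs carry no data section          */
        size_t   data_len;     /* 0 when the TLP has no data section                */
        uint32_t ecrc;         /* optional End-to-End CRC over header and data      */
        int      has_ecrc;
    } tlp_t;

    /* Data Link Layer additions: sequence number in front, LCRC behind. */
    typedef struct {
        uint16_t seq_num;      /* 12-bit sequence number, incremented per TLP */
        tlp_t    tlp;
        uint32_t lcrc;         /* link CRC over seq_num plus the entire TLP   */
    } dll_packet_t;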
The port at each end of every PCI Express link must implement Flow Control. Before a TLP can be sent across a link to the receiving component, the transmitter must verify that the receiver has sufficient buffer space to accept it. In other architectures, including PCI and PCI-X, transactions are delivered to a target device without knowing whether it can accept them; if a transaction is rejected due to insufficient buffer space, it is resent until it completes. This procedure can severely reduce the efficiency of a bus by wasting bus bandwidth when other transactions are ready to be sent. In other words, Flow Control guarantees that transmitters will never send TLPs that the receiver cannot accept. This prevents receive buffer overruns and eliminates the need for inefficient disconnects, retries, and wait states on the link.

The Flow Control mechanism is credit-based, which allows the transmitter to track the buffers of the port at the opposite end of the link; that is, the receiver must contain the Flow Control buffers. During initialization, each receiver reports the size of its receive buffers to the port at the opposite end of the link. The receiving port then continues to report its available buffer space regularly by transmitting the number of credits that have been freed up. This is accomplished via Flow Control DLLPs; a DLLP is a packet generated in the Data Link Layer to support link management.
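A minimal C sketch of the credit bookkeeping described above, assuming cumulative credit counters; the names and the credit granularity are illustrative, not taken from the specification.

    #include <stdint.h>

    /* Transmit-side credit state for one buffer class (e.g., Posted). */
    typedef struct {
        uint32_t credits_advertised; /* cumulative credits granted by receiver */
        uint32_t credits_consumed;   /* cumulative credits used by transmitter */
    } fc_state_t;

    /* Receiver grants more credits via a Flow Control DLLP. */
    void fc_on_update_dllp(fc_state_t *fc, uint32_t new_advertised) {
        fc->credits_advertised = new_advertised;
    }

    /* Gate transmission: send the TLP only if enough credits remain.
     * Unsigned subtraction keeps the test correct when counters wrap. */
    int fc_try_send(fc_state_t *fc, uint32_t tlp_credits) {
        if (fc->credits_advertised - fc->credits_consumed >= tlp_credits) {
            fc->credits_consumed += tlp_credits;
            return 1;  /* TLP may be sent */
        }
        return 0;      /* stall until more credits are advertised */
    }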
Each Flow Control buffer at the receiver is managed separately for each type of transaction: Posted Request, Non-Posted Request, and Completion. A Posted Request is either a Memory Write transaction, which is used to transfer data to a resource mapped in the memory address space, or a Message transaction, which is used to support in-band communication of events between PCI Express devices. A Non-Posted Request is any Read transaction; an I/O transaction, which supports I/O space for compatibility with legacy devices; or a Configuration transaction, which is used to access the configuration registers of a PCI Express device.

Before a transmitter sends the TLPs delivered by the Transaction Layer, the Data Link Layer appends a sequence number to each TLP, incrementing it after each new TLP is transmitted. The transmitter also appends an LCRC calculated over the TLP contents, including the header, data, ECRC, and sequence number.

When a receiver accepts TLPs from the transmitter, it must check for CRC errors. The receiver must also compare the sequence number of each received TLP with the next expected sequence number, which the receiver calculates itself. If these two numbers match and the received TLP has no LCRC error, the link is in its normal operational condition; in this case, the receiver should return an acknowledgement to the transmitter to indicate successful receipt of the TLPs. The acknowledgement is referred to as an ACK DLLP and contains the sequence number of the successfully received TLP. On the other hand, if the receiver detects an LCRC error or a sequence-number mismatch, it should return a NACK DLLP. When the transmitter receives an ACK DLLP, it purges the associated TLPs from the replay buffer. But if the transmitter receives a NACK DLLP, or if its Replay timer expires, it must replay the entire contents of the buffer.

Figure 4. Conceptual transmit buffer architecture (the Device Core & Application feed a Transmitting Buffer and a Replay Buffer, both of which feed the Physical Layer)
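The following C sketch illustrates the transmit-side protocol just described: assign a sequence number, keep a copy of each TLP in the replay buffer, purge entries on an ACK DLLP, and resend everything on a NACK DLLP or Replay-timer expiry. The buffer depth, names, and the elided LCRC computation are assumptions for illustration.

    #include <stdint.h>

    #define SEQ_MOD    4096u   /* 12-bit sequence space */
    #define REPLAY_CAP 64

    typedef struct { uint16_t seq; /* ... TLP bytes elided ... */ } replay_entry_t;

    typedef struct {
        replay_entry_t buf[REPLAY_CAP];
        uint16_t head, count;      /* oldest unacknowledged TLP and depth  */
        uint16_t next_seq;         /* sequence number for the next new TLP */
    } dll_tx_t;

    /* Append a sequence number (LCRC omitted for brevity) and keep a copy. */
    void dll_send_tlp(dll_tx_t *tx) {
        replay_entry_t *e = &tx->buf[(tx->head + tx->count) % REPLAY_CAP];
        e->seq = tx->next_seq;
        tx->next_seq = (tx->next_seq + 1) % SEQ_MOD;
        tx->count++;
        /* ... transmit *e with its appended LCRC ... */
    }

    /* ACK DLLP: purge every TLP up to and including acked_seq.
     * The uint16_t cast keeps the modular distance non-negative. */
    void dll_on_ack(dll_tx_t *tx, uint16_t acked_seq) {
        while (tx->count &&
               ((uint16_t)(acked_seq - tx->buf[tx->head].seq)) % SEQ_MOD < REPLAY_CAP) {
            tx->head = (tx->head + 1) % REPLAY_CAP;
            tx->count--;
        }
    }

    /* NACK DLLP or Replay-timer expiry: retransmit the whole buffer. */
    void dll_on_replay(dll_tx_t *tx) {
        for (uint16_t i = 0; i < tx->count; i++) {
            /* ... retransmit tx->buf[(tx->head + i) % REPLAY_CAP] ... */
        }
    }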
3. Design of APCE

We design a PCI Express controller, called APCE, which satisfies the PCI Express specification. Figure 5 shows the top block of APCE, which consists of eight major functional blocks: Transmitter Transaction Layer (TxTL), Transmitter Buffer (TxBf), Transmitter Data Link Layer (TxDL), Receiver Transaction Layer (RxTL), Receiver Buffer (RxBf), Receiver Data Link Layer (RxDL), Micro Processor (uP), and Configuration Register. It has three external interfaces: a master local interface, a slave local interface, and the PCI Express interface.

Figure 5. Top-level block diagram of APCE (master and slave local interfaces, the Configuration Register, and the TxTL and RxTL paths with their data, control, and handshaking signals)
TxTL generates TLPs from requests received on the master local interface or from a Message Request issued by uP. The generated TLPs are stored into the buffer spaces of TxBf indicated by the corresponding address. If the transferred TLP is a Memory Read Request, it requires a corresponding Completion. If APCE receives a Completion issued by the transmitter at the opposite end of the link, but it does not correspond to any of the outstanding requests, it must be handled as an error. TxTL must therefore maintain the information of each Memory Read Request in an outstanding queue until the request is completed.
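A sketch of the outstanding-request bookkeeping TxTL needs: record each Memory Read Request, match each incoming Completion against the queue, and flag any unmatched Completion as an error. Matching by a transaction tag is an illustrative assumption.

    #include <stdint.h>

    #define MAX_OUTSTANDING 32

    typedef struct {
        uint8_t tag;       /* transaction tag carried in the request header */
        uint8_t in_use;
    } outstanding_t;

    static outstanding_t oq[MAX_OUTSTANDING];

    /* Remember a Memory Read Request until its Completion arrives. */
    int oq_add(uint8_t tag) {
        for (int i = 0; i < MAX_OUTSTANDING; i++)
            if (!oq[i].in_use) { oq[i].tag = tag; oq[i].in_use = 1; return 0; }
        return -1;  /* queue full: the request must be throttled */
    }

    /* A Completion that matches no outstanding request is an error. */
    int oq_complete(uint8_t tag) {
        for (int i = 0; i < MAX_OUTSTANDING; i++)
            if (oq[i].in_use && oq[i].tag == tag) { oq[i].in_use = 0; return 0; }
        return -1;  /* unexpected Completion: handle as error */
    }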
TxDL prepares each TLP in TxBf for transmission by applying a sequence number and appending the calculated LCRC. TxDL then sends these TLPs to the Physical Layer after checking the buffers of the port at the opposite end of the link for Flow Control, and it re-transmits unacknowledged TLPs in TxBf when a Replay is issued by uP. As mentioned above, the Data Link Layer of a transmitter must have a replay buffer to support the Replay mechanism, so a conceptual buffer architecture like that of figure 4 is required. Although this scheme is very simple, the efficiency of the whole buffer space may be reduced because the two buffers are separated. We therefore proposed an efficient buffer management scheme in previous work [6], as shown in figure 6.

Figure 6. Diagram of the proposed transmitter buffer (a single FIFO of TLPs, TLP0 through TLP4, between the Transaction Layer and the Physical Layer; a control unit maintains the WSP, TSP, and RSP pointers)

The proposed scheme merges the two buffers into a single buffer and can dynamically adjust the size of the replay buffer space. The Transaction Layer transfers TLPs into the FIFO space pointed to by the Write Start Pointer (WSP), while the Data Link Layer transmits the TLPs starting at the Transmit Start Pointer (TSP) and must re-transmit the TLPs from the Replay Start Pointer (RSP) up to TSP-1.

For example, assume that five TLPs, TLP0 through TLP4, have been transferred into the FIFO. The Transaction Layer will then write the next TLP into the area pointed to by WSP, which is 5. Assume also that TLP0, TLP1, and TLP2 have already been transmitted but not yet acknowledged. The Data Link Layer will start to send new TLPs from the space indicated by TSP (3) to WSP-1 (4), that is, TLP3 and TLP4, and it must not discard TLP0, TLP1, and TLP2 until it accepts an associated ACK DLLP. If a Replay event occurs, the Data Link Layer must re-transfer the TLPs from RSP (0) up to, but not including, TSP (3), that is, TLP0 through TLP2. TxBf employs this buffer architecture: TxTL generates WSP, TxDL calculates TSP and RSP, and the Replay event is issued by the uP unit.
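A minimal C model of the proposed single-FIFO scheme, assuming a power-of-two depth and free-running pointer counters; the replay region [RSP, TSP) grows and shrinks dynamically instead of occupying a separate fixed buffer.

    #include <stdint.h>

    #define DEPTH 8  /* illustrative FIFO depth (power of two) */

    typedef struct {
        int      tlp[DEPTH];  /* TLP slots (payload elided)               */
        uint32_t wsp;         /* Write Start Pointer: next slot for TxTL  */
        uint32_t tsp;         /* Transmit Start Pointer: next TLP to send */
        uint32_t rsp;         /* Replay Start Pointer: oldest unacked TLP */
    } txbf_t;

    /* Transaction Layer side: enqueue if the FIFO is not full. */
    int txbf_write(txbf_t *b, int tlp) {
        if (b->wsp - b->rsp == DEPTH) return -1;  /* full */
        b->tlp[b->wsp % DEPTH] = tlp;
        b->wsp++;
        return 0;
    }

    /* Data Link Layer side: transmit the next pending TLP, if any. */
    int txbf_transmit(txbf_t *b) {
        if (b->tsp == b->wsp) return -1;          /* nothing new to send */
        /* ... send b->tlp[b->tsp % DEPTH] ... */
        b->tsp++;
        return 0;
    }

    /* ACK for n TLPs frees replay space; NACK/timeout rewinds TSP to RSP. */
    void txbf_ack(txbf_t *b, uint32_t n) { b->rsp += n; }
    void txbf_replay(txbf_t *b)          { b->tsp = b->rsp; }

Because acknowledged slots are reclaimed simply by advancing RSP, the same storage serves as both the transmit FIFO and the replay buffer, which is the efficiency argument of [6].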
On the receiving side, RxDL processes the TLPs received from the Physical Layer by checking the LCRC and the sequence number, and then reports the result to uP. If a received TLP has an error, uP requires TxDL to send a NACK DLLP; if the received TLP is good, uP requires an ACK DLLP to be sent after the TLP is passed to RxBf. After a while, TxDL sends the corresponding ACK or NACK DLLP across the link to the original requester. RxDL also accepts all incoming DLLPs, checks them for errors, and reports each received DLLP to uP.

RxTL checks each TLP in RxBf to determine whether the TLP is supported by APCE, and then converts received TLPs into requests for the local slave device. Handshaking protocols are used in the communication between RxTL and the local slave. If the TLP is of a type that requires a Completion to be returned, for example a Memory Read Request, RxTL must fetch the data from the local slave and store the information and data for the corresponding Completion into the completion buffer; the Completion is then sent to the original requester through the transmitter of APCE. If the received TLP is a Message Request, RxTL reports it to uP to manage error reporting.

We employ an 80C51 to effectively manage the functions mentioned above. We implement MicroC/OS-II, a real-time OS [7], on the 80C51, and code the software under this real-time environment. The coded software covers the following services (a task sketch follows the list):
• checking a Message Request received by RxTL and reporting it to the configuration register,
• generating a Message Request and issuing it to TxTL,
• processing ACK and NACK DLLPs received by RxDL,
• issuing ACK DLLP or NACK DLLP transfers to TxDL,
• supporting the Replay mechanism using the replay timer, and
• managing the exchange of Flow Control.
4. Verification Environment

First, we design and code each functional logic block of APCE in Verilog HDL. For the verification of the designed APCE, we also organize a test bench as a top-level logic simulation environment, written in the C language, as shown in figure 7(a). It contains a PCI Express Monitor and three functional models: a Host Bridge, a Local Master, and a Local Slave model.
The PCI Express Monitor checks the PCI Express protocol and produces a log file containing an error message whenever the basic PCI Express protocol is violated. The Host Bridge supports all basic functionality of the Transaction Layer and the Data Link Layer: for example, generating TLPs, receiving TLPs, detecting errors, exchanging Flow Control, and acknowledging TLPs. As shown in figure 7(b), the three models, the PCI Express Monitor, and APCE are connected through the PLI of the Verilog simulator.

Figure 7. Top-level logic simulation environment ((a) a testbench command file drives the Host Bridge, Local Master, and Local Slave functional models around APCE; (b) the models and APCE are attached to the Verilog simulator through the PLI)

First, commands are used for the Host Bridge transmitter to create data transfers on the PCI Express link, and for the Local Master to do the same on the local master interface. Some examples follow:

• HB_MWrite 0x100, 0x500, 4;
• HB_MRead 0x300, 0x700, 4;
• LM_MWrite 0x200, 0x600, 8;
• LM_MRead 0x400, 0x800, 8;

In the first example, 'HB_MWrite' causes the Host Bridge to perform a data transfer using the Memory Write command. That is, the Host Bridge will read 4 bytes of data from the main memory location at address 500h and transfer a packet carrying that data, with destination address 100h, to the Local Slave through APCE. The Local Slave shall then write the data into the local memory location at address 100h. The second line is a Memory Read command performed by the Host Bridge, and the third and fourth commands are applied to the Local Master.

Next, we define parameters to fully cover all scenarios that can occur in actual operation on both interfaces. Some examples of parameters are the following:

• all information needed for the Host Bridge or Local Master to form packets to be transferred to APCE,
• the maximum or minimum number of waiting phases on the local interface,
• the probability that the Host Bridge will send a NACK DLLP to APCE,
• the probability that the Host Bridge will convey unsuccessful Completions for a Memory Read command requested by APCE, and
• the probability that the Host Bridge will transfer a TLP with a CRC error to APCE.

Last, we propose an efficient verification environment to verify APCE under random testing. This verification environment consists of a Reference Model of APCE, a Random Generator, and a Compare Engine, as shown in figure 8.

First, the Random Generator determines the following (a generator sketch follows the list):

• the Request type: Memory Write, Memory Read, and so on,
• the originator of the Request: Host Bridge or Local Master,
• the starting addresses of the source memory containing the data to be transferred and of the destination memory,
• the data size to be transferred, and
• the number of whole requests.

Second, the Random Generator sets the parameters mentioned above to random values. Last, it builds them into a test bench command file using the instructions introduced above. That is, the Random Generator automatically and randomly generates all scenarios for data transfer from main memory to local memory and vice versa.
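A sketch of how such a Random Generator could emit the command file, assuming the command syntax introduced above; the value ranges, request count, and file name are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Emit one random HB_/LM_ command line in the testbench syntax above. */
    static void emit_cmd(FILE *f) {
        const char *origin = (rand() & 1) ? "HB" : "LM";  /* Host Bridge or Local Master */
        const char *op     = (rand() & 1) ? "MWrite" : "MRead";
        unsigned dst  = (rand() % 0x100) * 4;   /* destination address */
        unsigned src  = (rand() % 0x100) * 4;   /* source address      */
        unsigned size = 4 << (rand() % 4);      /* 4..32 bytes         */
        fprintf(f, "%s_%s 0x%x, 0x%x, %u;\n", origin, op, dst, src, size);
    }

    int main(void) {
        FILE *f = fopen("testbench.cmd", "w");  /* assumed file name */
        if (!f) return 1;
        srand((unsigned)time(NULL));
        int n = 1 + rand() % 100;               /* number of whole requests */
        for (int i = 0; i < n; i++)
            emit_cmd(f);
        fclose(f);
        return 0;
    }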
This command file is used both for the test bench of figure 7(a) and for the Reference Model of APCE. The Reference Model supports the full functionality of the designed APCE and executes all data transfers at once, without a common clock. The Compare Engine then compares the data in the main memory of APCE with that of the Reference Model, and the data in the local memory of APCE with that of the Reference Model, respectively. If it finds a mismatch, it produces a log file to indicate the error to the user.

This verification environment is excellent at finding errors that are not detected by general test vectors, and it is effective for compliance and corner-case testing [9].
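A minimal sketch of the Compare Engine check described above, assuming both memory images are visible to the test bench as byte arrays of equal size.

    #include <stdio.h>
    #include <stddef.h>

    /* Compare one memory image of APCE against the Reference Model;
     * log every mismatching address so the user can localize the error. */
    static int compare_mem(FILE *log, const char *name,
                           const unsigned char *apce,
                           const unsigned char *ref, size_t len) {
        int errors = 0;
        for (size_t i = 0; i < len; i++) {
            if (apce[i] != ref[i]) {
                fprintf(log, "%s mismatch at 0x%zx: APCE=%02x REF=%02x\n",
                        name, i, apce[i], ref[i]);
                errors++;
            }
        }
        return errors;
    }

    /* Usage: run over both memory spaces after the simulation finishes. */
    int compare_engine(FILE *log,
                       const unsigned char *apce_main, const unsigned char *ref_main,
                       const unsigned char *apce_local, const unsigned char *ref_local,
                       size_t main_len, size_t local_len) {
        return compare_mem(log, "main memory", apce_main, ref_main, main_len)
             + compare_mem(log, "local memory", apce_local, ref_local, local_len);
    }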
Figure 8. Verification environment for random testing (the Random Generator builds the testbench command file, which drives both the test bench of figure 7(a) and the Reference Model, each with its own main memory and local memory images)
5. Simulation Result

Figure 9 shows the simulation result of a Memory Write transaction:
(a) Exchanging Flow Control credits between APCE and the Host Bridge.
(b) The Host Bridge issues a Configuration Write to APCE.
(c) APCE returns the corresponding Completion.
(d) The Local Master generates a Memory Write transaction on the local interface.
(e) APCE generates and sends the TLP to the Host Bridge.
(f) The Host Bridge sends an ACK DLLP to APCE after a while.

6. Conclusions

In this paper, we design a PCI Express controller for an Endpoint. The controller supports the full functionality of the Transaction Layer and the Data Link Layer of PCI Express. We propose an efficient buffer management scheme to support the Replay mechanism. We employ an 80C51 to effectively manage the designed functional blocks, implement the real-time OS MicroC/OS-II on the 80C51, and code the software under this real-time environment. The coded software fully covers the PCI Express protocols: supporting the replay mechanism, checking and generating error messages, processing TLP acknowledgement, and managing the exchange of Flow Control. For verification, we build a test bench including functional models of a Host Bridge, a Local Master, and a Local Slave. We also define instructions to easily generate the situations that will occur in actual operation. We propose an effective verification environment for compliance and corner-case testing using a Reference Model, a Random Generator, and a Compare Engine. This verification environment is excellent at finding errors that are not detected by general test vectors.

REFERENCES

[1] Intel whitepaper, "Advanced Switching for the PCI Express Architecture", www.intel.com, 2002.
[2] Intel whitepaper, "Creating a PCI Express Interconnect", www.intel.com, 2002.
[3] http://www.pcisig.com
[4] PCI SIG, PCI Express Base Specification Revision 1.0a, PCI SIG, 2003.
[5] Ravi Budruk, Don Anderson, and Tom Shanley, PCI Express System Architecture, MindShare, 2003.
[6] Eugin Hyun and Kwang-Su Seong, "The effective buffer architecture for data link layer of PCI Express", ITCC 2004, Vol. 1, pp. 809-813, April 2004.
[7] Jean J. Labrosse, MicroC/OS-II: The Real-Time Kernel, CMP Books, 2001.
[8] Cadence, Verilog-XL Reference, version 3.4, Cadence, 2002.
[9] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for SoC Designs, Kluwer Academic Publishers, 1999.