
Design and Verification for PCI Express Controller

Eugin Hyun and Kwang-Su Seong


VLSI Lab., Dept. of Electronic Engineering, University of Yeungnam, KOREA.
braham@yumail.ac.kr and kssung@yu.ac.kr

Abstract

In this paper, we design a PCI Express controller for an Endpoint. The controller supports the full functionality of the Transaction Layer and Data Link Layer of PCI Express. We also propose an efficient buffer management scheme to support the replay mechanism. We employ an 80C51 to manage the designed functional blocks and run a real-time OS, MicroC/OS-II, on the 80C51. The software is written for this real-time environment and covers the PCI Express protocols: supporting the replay mechanism, checking and generating error messages, processing TLP acknowledgments, and managing the exchange of Flow Control information. For verification, we build a test bench that includes functional models of a Host Bridge, Local Master, and Local Slave, together with a Protocol Monitor. We also define instructions that make it easy to generate situations that occur in actual operation. We propose an effective verification environment for compliance and corner-case testing that uses a Reference Model, Random Generator, and Compare Engine. This environment is well suited to finding errors that general test vectors do not detect.

1. Introduction

The PCI bus has been widely used for the last 10 years and will remain in use for the next few years. However, today's and tomorrow's processors and I/O devices demand much higher I/O bandwidth than PCI 2.2 or PCI-X can deliver. The processor system bus, which connects the microprocessor and memory, continues to scale in both frequency and voltage at a rate that will continue for the foreseeable future. The PCI 2.2 or PCI-X bus, however, is a parallel bus that cannot easily be scaled up in frequency or down in voltage, because its synchronously clocked data transfer is limited by signal skew and its signal routing rules are at the limit of cost-effective technology[1]. In addition, today's software applications are more demanding of the platform hardware, particularly the I/O subsystem. Streaming data from various video and audio sources is now commonplace on desktop and mobile machines, and there is no baseline support for this time-dependent data in the PCI 2.2 or PCI-X specifications. All approaches that push these limits to create higher bandwidth result in a large cost increase for little performance gain[1][2][3].

The PCI SIG announced PCI Express as the third-generation I/O system, which adopts recent high-speed, low-pin-count, point-to-point technologies for major bandwidth improvements. Currently, the PCI Express transmission and reception data rates are 2.5 Gbit/s per direction. The architects have carried forward the most beneficial features of the previous PCI bus and have taken advantage of new developments in computer architecture. Thus, existing operating systems and device drivers will be able to boot and execute without modification[1][2][3][4].

A PCI Express topology contains a Root Complex, a Switch, and several Endpoints, as shown in figure 1. The Root Complex is the root of an I/O hierarchy that connects the CPU/memory subsystem to the I/O. A Switch provides fan-out capability and enables a series of connectors for add-in high-performance I/O. An Endpoint is an I/O device connected to PCI Express, for example, a PCI Express attached graphics controller[1][2][3][4].

In this paper, we design a PCI Express controller for an Endpoint. The controller supports the full functionality of the Transaction Layer and Data Link Layer of PCI Express. We employ an 80C51 to manage the designed functional blocks and run a real-time OS, MicroC/OS-II, on the 80C51. The software is written for this real-time environment and covers the PCI Express protocols. For verification, we build a test bench that includes functional models of a Host Bridge, Local Master, and Local Slave, together with a Protocol Monitor. We also define instructions that make it easy to generate situations that occur in actual operation. We propose an effective verification environment for compliance and corner-case testing that uses a Reference Model, Random Generator, and Compare Engine. This environment is well suited to finding errors that general test vectors do not detect.

Proceedings of the Third International Conference on Information Technology and Applications (ICITA’05)
0-7695-2316-1/05 $20.00 © 2005 IEEE

Figure 1. The PCI Express topology

2. PCI Express Architecture[3][4]

PCI Express specifies its architecture in terms of three discrete logical layers: the Transaction Layer, the Data Link Layer, and the Physical Layer. Each of these layers is divided into a transmitting component and a receiving component, as shown in figure 2.

Figure 2. High-level layering diagram

PCI Express uses packets to communicate information between a transmitting component and a receiving component. Packets are formed in the Transaction Layer and Data Link Layer to carry the information from the transmitting component to the receiving component.

As the transmitted packets flow through the other layers, they are extended with the additional information needed to handle packets at those layers, as shown in figure 3. On the transmitting side, the packet contents are formed in the Transaction Layer with information obtained from the device core and application. This packet is referred to as a Transaction Layer Packet (TLP). A TLP consists of a header and data, although some TLPs do not contain a data section. An optional End-to-End CRC (ECRC), which supports end-to-end data integrity, is calculated and appended to the TLP. The TLP is forwarded to the Data Link Layer, which then adds a TLP sequence number and a Link CRC (LCRC). The LCRC is used by the neighboring receiver device at the other end of the link to check for CRC errors. The sequence number is used to detect cases where one or more entire TLPs have been lost. The Physical Layer converts the packet received from the Data Link Layer into an appropriate serialized format and transmits it across the PCI Express link to the receiver on the other side. On the receiving side the reverse process occurs.

Figure 3. Packet flow through the layers (packet fields, in order: Framing, Sequence Number, Header, Data, ECRC, LCRC, Framing)

The port at each end of every PCI Express link must implement Flow Control. Before a TLP can be sent across a link to the receiving component, the transmitter must verify that the receiver has sufficient buffer space to accept it. In other architectures, including PCI and PCI-X, transactions are delivered to a target device without knowing whether it can accept them. If a transaction is rejected due to insufficient buffer space, it is resent until it completes. This procedure can severely reduce the efficiency of a bus by wasting bus bandwidth when other transactions are ready to be sent. In other words, Flow Control guarantees that transmitters never send TLPs that the receiver cannot accept. This prevents receive buffer overruns and eliminates the need for inefficient disconnects, retries, and wait states on the link.

Flow Control uses a credit-based mechanism that lets the transmitter track the buffer space available at the port on the opposite end of the link. That is, the receiver must contain the Flow Control buffer. During initialization, each receiver reports the size of its receive buffers to the port at the opposite end of the link. The receiving port then continues to report the available buffer space regularly by transmitting the number of credits that have been freed up. This is accomplished via Flow Control DLLPs. A DLLP is a packet generated in the Data Link Layer to support link management.

The Flow Control buffer at the receiver is managed separately for each type of transaction: Posted Request, Non-Posted Request, and Completion. Posted Requests are Memory Write transactions, which transfer data to a resource mapped in the memory address space, and Message transactions, which support in-band communication of events between PCI Express devices. Non-Posted Requests are all Read transactions, I/O transactions, which support I/O space for compatibility with legacy devices, and Configuration transactions, which access the configuration registers of PCI Express devices. Completions are used only where required: a Posted Request needs no Completion, whereas a Non-Posted Request needs a Completion to return read data or to acknowledge completion of I/O and Configuration Write transactions.

Before a transmitter sends a TLP delivered by the Transaction Layer, the Data Link Layer assigns a sequence number to it. The sequence number increments after each new TLP is transmitted. The transmitter also appends an LCRC calculated over the TLP contents, including the header, data, ECRC, and sequence number.

When a receiver accepts a TLP from the transmitter, it must check for CRC errors. The receiver must also compare the sequence number of the received TLP with the next expected sequence number, which the receiver calculates itself. If these two numbers match and the received TLP has no LCRC error, this is the normal operating condition. In this case, the receiver returns an acknowledgement to the transmitter to indicate successful receipt. The acknowledgement is referred to as an ACK DLLP and contains the sequence number of the successfully received TLP. On the other hand, if the receiver detects an LCRC error or a sequence number error, it sends a NACK (negative ACK) DLLP.

If the transmitter receives a NACK DLLP, it knows that a TLP transmitted earlier had an error when it reached the receiver. The transmitter must then re-transmit all TLPs that have not been acknowledged. This is referred to as Replay.

The transmitter implements a Replay timer, which measures the time from when a TLP is transmitted until the transmitter receives the associated ACK or NACK DLLP from the receiver. If the timer expires, a Replay event occurs.

To support the Replay mechanism, the Data Link Layer of a transmitter must have an extra buffer. Copies of the TLPs transmitted from the transmitter buffer must be stored in the replay buffer, as shown in figure 4. When the transmitter receives an ACK DLLP indicating that TLPs have reached the receiver successfully, it clears the associated TLPs from the replay buffer. If the transmitter receives a NACK DLLP or its Replay timer expires, it must replay the entire contents of the buffer.

Figure 4. The conceptual transmitter buffer to support Replay mechanism

3. Designed Controller

We design a PCI Express controller, called APCE, which satisfies the PCI Express specification. Figure 5 shows the top block of APCE, which consists of eight major functional blocks: Transmitter Transaction Layer (TxTL), Transmitter Buffer (TxBf), Transmitter Data Link Layer (TxDL), Receiver Transaction Layer (RxTL), Receiver Buffer (RxBf), Receiver Data Link Layer (RxDL), Micro Processor (uP), and Configuration Register. It has three external interfaces: the master and slave local interfaces and the PCI Express interface.

Figure 5. Top block of APCE

On the transmitting side, TxTL generates TLPs to convey Memory Write or Memory Read Requests issued by the local master and Completions requested by RxTL. Handshaking protocols are used for communication between TxTL and the local master or RxTL. TxTL also generates TLPs from Message Requests issued by the uP. The generated TLPs are stored in the buffer space of TxBf indicated by the corresponding address. If the transferred TLP is a Memory Read Request, it requires a corresponding Completion. If APCE receives a Completion from the transmitter at the opposite end of the link that does not correspond to any outstanding request, it must be handled as an error. TxTL must therefore keep the information of each Memory Read Request in an outstanding queue until the request is completed.

TxDL prepares each TLP in TxBf for transmission by applying a sequence number and appending the calculated LCRC. TxDL then sends these TLPs to the Physical Layer after checking, for Flow Control, the buffer space of the port at the opposite end of the link. TxDL also re-transmits unacknowledged TLPs in TxBf when a Replay is issued by the uP. As mentioned above, the Data Link Layer of a transmitter must have a replay buffer to support the Replay mechanism, which leads to the conceptual buffer architecture of figure 4. Although this scheme is very simple, the efficiency of the buffers as a whole may be reduced because the two buffers are separate. We therefore proposed an efficient buffer management scheme in previous work[6], shown in figure 6.

Figure 6. Diagram of the proposed transmitter buffer

The proposed scheme merges the two buffers into a single buffer and can dynamically adjust the size of the replay buffer space. The Transaction Layer writes TLPs into the FIFO space pointed to by the Write Start Pointer (WSP). The Data Link Layer transmits the TLPs starting at the Transmit Start Pointer (TSP) and, on a replay, must re-transmit the TLPs from the Replay Start Pointer (RSP) to TSP-1.

For example, assume that five TLPs have been written into the FIFO. In this case, the Transaction Layer will write the next TLP into the area pointed to by WSP, which is 5. Also assume that TLP0, TLP1, and TLP2 have already been transmitted but not yet acknowledged. In this case, the Data Link Layer will send new TLPs from the space indicated by TSP, which is 3, to WSP-1, which is 4. The Data Link Layer must not discard TLP0, TLP1, and TLP2 until it accepts the associated ACK DLLP. If a replay event occurs, the Data Link Layer must re-transmit the TLPs from RSP, which is 0, to TSP-1, which is 2. TxBf employs this buffer architecture. TxTL generates WSP, while TxDL calculates TSP and RSP. The Replay event is issued by the uP unit.

On the receiving side, RxDL processes the TLPs received from the Physical Layer by checking the LCRC and the sequence number, and then reports the result to the uP. If a received TLP has an error, the uP requests TxDL to send a NACK DLLP. If the received TLP is correct, the uP requests an ACK DLLP after the TLP has been passed to RxBf. TxDL subsequently sends the corresponding ACK or NACK DLLP across the link to the original requester. RxDL also accepts all DLLPs, checks them for errors, processes them, and reports them to the uP.

RxTL checks whether each TLP in RxBf is supported by APCE and then converts the received TLPs into requests for the local slave device. Handshaking protocols are used for communication between RxTL and the local slave. If the TLP is of a type that requires a Completion to be returned, for example a Memory Read Request, RxTL must fetch the data from the local slave and store the information and data for the corresponding Completion in the completion buffer. The Completion is sent to the original requester through the transmitter of APCE. If the received TLP is a Message Request, RxTL reports it to the uP so that error reports can be managed.

We employ an 80C51 to manage the functions mentioned above. We run MicroC/OS-II, a real-time OS[7], on the 80C51 and write the software for this real-time environment. The software covers the following services:
• checking Message Requests received by RxTL and reporting them to the configuration register,
• generating Message Requests and issuing them to TxTL,
• processing ACK and NACK DLLPs received by RxDL,
• issuing ACK or NACK DLLP transfers to TxDL,
• supporting the Replay mechanism using the replay timer, and
• managing the exchange of Flow Control information.

4. Verification Environment

First, we design and code each functional logic block of APCE in Verilog HDL. For the verification of the designed APCE, we also organize a test bench for a top-level logic simulation environment using the C language, as shown in figure 7(a). It contains a PCI Express Monitor and three functional models: a Host Bridge, a Local Master, and a Local Slave model.
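Before turning to the individual test bench models, the merged transmitter buffer of figure 6 can be made concrete with a small software model of the kind a test bench might use. The following is a minimal sketch in Python and not the APCE implementation: the class name `MergedTxBuffer`, its method names, and the use of free-running pointer counters that are reduced modulo the FIFO depth on access are all assumptions made for illustration. Only the roles of WSP, TSP, and RSP are taken from the description of the proposed scheme.

```python
class MergedTxBuffer:
    """Hypothetical model of a single FIFO shared by transmit and replay roles."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.wsp = 0  # Write Start Pointer: next slot the Transaction Layer fills
        self.tsp = 0  # Transmit Start Pointer: next TLP the Data Link Layer sends
        self.rsp = 0  # Replay Start Pointer: oldest unacknowledged TLP

    def free_slots(self):
        # Slots RSP..WSP-1 are occupied; everything else is free.
        return self.depth - (self.wsp - self.rsp)

    def write(self, tlp):
        """Transaction Layer side: store a new TLP at WSP."""
        if self.free_slots() == 0:
            raise BufferError("buffer full: wait for an ACK to free space")
        self.slots[self.wsp % self.depth] = tlp
        self.wsp += 1

    def pending(self):
        """TLPs written but not yet transmitted: TSP .. WSP-1."""
        return [self.slots[i % self.depth] for i in range(self.tsp, self.wsp)]

    def send_next(self):
        """Data Link Layer side: transmit the TLP at TSP, keeping its copy."""
        if self.tsp == self.wsp:
            return None  # nothing new to send
        tlp = self.slots[self.tsp % self.depth]
        self.tsp += 1
        return tlp

    def ack(self, count):
        """An ACK releases the `count` oldest transmitted TLPs."""
        if count > self.tsp - self.rsp:
            raise ValueError("cannot acknowledge TLPs that were never sent")
        self.rsp += count

    def replay(self):
        """A NACK or timer expiry re-sends everything in RSP .. TSP-1."""
        return [self.slots[i % self.depth] for i in range(self.rsp, self.tsp)]


# Worked example from Section 3: five TLPs written, three sent, none acknowledged.
buf = MergedTxBuffer(depth=8)
for n in range(5):
    buf.write(f"TLP{n}")
for _ in range(3):
    buf.send_next()

print(buf.wsp, buf.tsp, buf.rsp)  # 5 3 0
print(buf.replay())               # ['TLP0', 'TLP1', 'TLP2']
print(buf.pending())              # ['TLP3', 'TLP4']
```

Because the replay region is simply the span between RSP and TSP, it grows and shrinks with the number of unacknowledged TLPs instead of reserving a fixed second buffer, which is the property the merged scheme relies on.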

The PCI Express Monitor checks the PCI Express protocol and produces a log file, which contains an error message whenever the basic PCI Express protocol is violated. The Host Bridge model supports all basic functionality of the Transaction Layer and the Data Link Layer, for example, generating TLPs, receiving TLPs, detecting errors, exchanging Flow Control information, and TLP acknowledgement.

As shown in figure 7(b), the three functional models, the PCI Express Monitor, and APCE are connected through the PLI (Programming Language Interface), a facility of the Verilog simulator for interfacing Verilog programs with programs written in the C language[9]. They operate under a common clock through the PLI.

Figure 7. Top-level logic simulation environment: (a) test bench organization, (b) connection of the models and APCE to the Verilog simulator through PLI

We also define instructions that make it easy to generate, on both interfaces, situations that occur in actual operation. The instructions are written in the test bench command file, which is produced either by hand or automatically by the random generator. They are divided into two types. The first type generates the sequence of data transfers, and the second controls the parameters of the data transfers.

Commands of the first type are used by the Host Bridge transmitter to create data transfers on PCI Express and by the Local Master to do the same on the local master interface. Some examples follow:
• HB_MWrite 0x100, 0x500, 4;
• HB_MRead 0x300, 0x700, 4;
• LM_MWrite 0x200, 0x600, 8;
• LM_MRead 0x400, 0x800, 8;

In the first example, 'HB_MWrite' makes the Host Bridge perform a data transfer using the Memory Write command. That is, the Host Bridge reads 4 bytes of data from main memory at address 500h and transfers them in a packet with destination address 100h to the Local Slave through APCE. The Local Slave then writes the data into local memory at address 100h. The second line is a Memory Read command performed by the Host Bridge. The third and last commands apply to the Local Master.

Next, we define parameters to cover the scenarios that occur in actual operation on both interfaces. Some examples of parameters follow:
• all information needed for the Host Bridge or Local Master to form the packets to be transferred to APCE,
• the maximum or minimum number of wait phases on the local interface,
• the probability that the Host Bridge sends a NACK DLLP to APCE,
• the probability that the Host Bridge conveys unsuccessful Completions for Memory Read commands requested by APCE, and
• the probability that the Host Bridge transfers a TLP with a CRC error to APCE.

Last, we propose an efficient verification environment to verify APCE under random testing. This verification environment consists of a Reference Model of APCE, a Random Generator, and a Compare Engine, as shown in figure 8.

First, the Random Generator determines the following:
• the Request type (Memory Write, Memory Read, and so on),
• the originator of the Request (Host Bridge or Local Master),
• the starting addresses of the source memory, which contains the data to be transferred, and of the destination memory,
• the data size to be transferred, and
• the total number of requests.

Second, the Random Generator sets the parameters mentioned above to random values. Last, it writes them into a test bench command file using the instructions introduced above. That is, the Random Generator automatically and randomly generates scenarios for data transfer between main memory and local memory in both directions.

This command file is used both by the test bench of figure 7(a) and by the Reference Model of APCE. The Reference Model supports the full functionality of the designed APCE and executes all data transfers at once without a common clock. The Compare Engine then compares the main memory contents produced with APCE against those of the Reference Model, and likewise the local memory contents produced with APCE against those of the Reference Model. If it finds a mismatch, it produces a log file to report the error to the user.

This verification environment is well suited to finding errors that general test vectors do not detect, and it is effective for compliance and corner-case testing[9].

Figure 8. Verification environment for random testing

5. Simulation Result

Figure 9 shows the simulation result of a Memory Write transaction:
(a) APCE and the Host Bridge exchange Flow Control (FC) information.
(b) The Host Bridge issues a Configuration Write to APCE.
(c) APCE returns the corresponding Completion.
(d) The Local Master generates a Memory Write transaction on the local interface.
(e) APCE generates a TLP and sends it to the Host Bridge.
(f) The Host Bridge sends an ACK DLLP to APCE after a while.

Figure 9. The simulation results

6. Conclusions

In this paper, we design a PCI Express controller for an Endpoint. The controller supports the full functionality of the Transaction Layer and Data Link Layer of PCI Express. We propose an efficient buffer management scheme to support the Replay mechanism. We employ an 80C51 to manage the designed functional blocks and run a real-time OS, MicroC/OS-II, on the 80C51. The software is written for this real-time environment and covers the PCI Express protocols: supporting the replay mechanism, checking and generating error messages, processing TLP acknowledgments, and managing the exchange of Flow Control information. For verification, we build a test bench that includes functional models of a Host Bridge, Local Master, and Local Slave. We also define instructions that make it easy to generate situations that occur in actual operation. We propose an effective verification environment for compliance and corner-case testing that uses a Reference Model, Random Generator, and Compare Engine. This environment is well suited to finding errors that general test vectors do not detect.

REFERENCES
[1] Intel whitepaper, "Advanced Switching for the PCI Express Architecture", www.intel.com, 2002.
[2] Intel whitepaper, "Creating a PCI Express Interconnect", www.intel.com, 2002.
[3] http://www.pcisig.com
[4] PCI SIG, PCI Express Base Specifications Revision 1.0a, PCI SIG, 2003.
[5] Ravi Budruk, Don Anderson, and Tom Shanley, PCI Express System Architecture, MindShare, 200
[6] Eugin Hyun and Kwang-Su Seong, "The effective buffer architecture for data link layer of PCI express", ITCC2004, Vol. 1, pp. 809-813, April 2004.
[7] Jean J. Labrosse, MicroC/OS-II: The Real Time Kernel, CMP Books, 2001.
[8] Cadence, Verilog-XL Reference version 3.4, Cadence, 2002.
[9] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for SoC Designs, Kluwer Academic Publishers, 1999.
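As a supplement to the random testing flow of Section 4, the interplay of Random Generator, Reference Model, and Compare Engine can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions and not the authors' C test bench: all function names are hypothetical, only the two write instructions are generated, the meaning of `LM_MWrite` (local memory to main memory, mirroring the documented `HB_MWrite`) is an assumption, and the "design" memory image is faked by corrupting one byte of the reference result to show a detected mismatch.

```python
import random


def generate_commands(count, seed=0, mem_size=0x1000):
    """Random Generator: emit `count` command-file lines (write commands only)."""
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        op = rng.choice(["HB_MWrite", "LM_MWrite"])    # originator and request type
        size = rng.choice([4, 8])                      # data size in bytes
        dst = rng.randrange(0, mem_size - size, 4)     # destination start address
        src = rng.randrange(0, mem_size - size, 4)     # source start address
        lines.append(f"{op} 0x{dst:x}, 0x{src:x}, {size};")
    return lines


def parse_command(line):
    """Split 'NAME dst, src, size;' into its four fields."""
    name, args = line.rstrip(";").split(None, 1)
    dst, src, size = (int(a.strip(), 0) for a in args.split(","))
    return name, dst, src, size


def reference_model(commands, main_mem, local_mem):
    """Execute all transfers at once, with no clock, on copies of the memories."""
    main_mem, local_mem = bytearray(main_mem), bytearray(local_mem)
    for line in commands:
        name, dst, src, size = parse_command(line)
        if name == "HB_MWrite":      # main memory -> local memory, as in the text
            local_mem[dst:dst + size] = main_mem[src:src + size]
        elif name == "LM_MWrite":    # ASSUMPTION: local memory -> main memory
            main_mem[dst:dst + size] = local_mem[src:src + size]
        else:
            raise ValueError(f"unsupported command: {name}")
    return main_mem, local_mem


def compare(label, dut_mem, ref_mem):
    """Compare Engine: return one log line per mismatching byte."""
    return [
        f"{label}[0x{i:x}]: dut=0x{d:02x} ref=0x{r:02x}"
        for i, (d, r) in enumerate(zip(dut_mem, ref_mem))
        if d != r
    ]


# The documented example: copy 4 bytes from main memory 0x500 to local memory 0x100.
main = bytes(range(256)) * 16          # 4 KiB of patterned main memory
local = bytes(0x1000)                  # 4 KiB of zeroed local memory
ref_main, ref_local = reference_model(["HB_MWrite 0x100, 0x500, 4;"], main, local)

# Stand-in for the simulated design's memory image, with one injected fault.
dut_local = bytearray(ref_local)
dut_local[0x101] ^= 0xFF
log = compare("local", dut_local, ref_local)
```

In a real flow the design-side memory images would come from the clocked simulation of figure 7(a); the point of the sketch is only that both sides consume the same command file and that the comparison is a plain memory diff.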
