HIGH-SPEED IO DESIGN
Warren R. Anderson
Intel Corporation
Abstract: This chapter explores common methods and circuit architectures used to transmit
and receive data through off-chip links.
Key words: High-speed IO; off-chip links; serial data transfer; parallel data bus; FR-4; skin
effect; dielectric loss; clock phase alignment; derived clocking; source synchronous;
forwarded clocking; plesiochronous; mesochronous.
1. Introduction
In order for system performance to keep pace with the ever-increasing speed
of the microprocessor, the bandwidth of the signaling into and out of the micro-
processor must follow the trend in on-chip processing performance. However,
the physical limitations imposed by the interconnect channel, which exhibits
increasing amounts of signal loss and jitter amplification at higher frequencies,
impede the increase in off-chip signaling speed. Numerous strategies to
cope with the interconnect properties have been developed, enabled by more
sophisticated techniques to control and process the off-chip electrical signals.
This chapter discusses the most prevalent of these techniques, focusing
on the chip-to-chip communication topologies common for microprocessors,
namely access to memory, processor-to-processor communication for parallel
computing, and processor-to-chipset communication. These links generally run
over short distances of up to one or two meters and consist of buses of parallel
data lanes carrying wide data words. Although serial communication shares
many of the same properties and techniques as the parallel bus designs, serial
communication, which generally takes place over much longer distances, will
not be discussed explicitly here.
Our discussion begins with a comparison of the on-chip and off-chip data
transmission environment, with an emphasis on the desired properties of the
off-chip communication system. Several common signaling methods will be
shown. We then explore the properties of the off-chip signaling medium, par-
ticularly those that dominate at high frequencies and therefore limit off-chip
signaling speed. Techniques and example circuit topologies for adapting to
these effects in both the time and voltage domains, as well as their limitations,
are shown.
2. IO Signaling
Figure 2. Typical timing of transmitter output and receiver input signals in a dual data-rate
signaling IO system.
Figure 3. Bi-directional link using single-ended voltage mode signaling. The resistor in series
with the output sets the VOH and VOL levels and also provides board impedance matching. The
pre-driver outputs predrive-h and predrive-l are separated to tri-state the output driver when
receiving input data. Separation of the transmitter and receiver supplies, as shown here, is often
used to avoid simultaneous switching noise from the transmitter appearing on the supply of the
more sensitive input receiver.
voltage as it reacts to providing the dI/dt needed when switching the output
[1, 9]. For example, consider the case of Figure 4 when the output drives low.
In this condition no current flows in the transmission line. When the driver
pulls the line high by sourcing a constant current from the positive supply
rail into the output, this current creates a dI/dt in the circuit loop formed by the
positive supply, the output, the interconnect signal's transmission line, its return
path in the underlying ground plane, and the negative supply. Any inductance
in this loop generates a transient voltage excursion when it experiences the
dI/dt. This generally occurs where the driver's output pad and the supply rails
interact with the package, particularly in bond wire designs. In our example,
the inductance in the package between the positive supply on the board and
on the die develops a voltage that temporarily decreases the on-die supply
rail. Likewise, the inductance in the ground supply develops a voltage that
temporarily raises the on-die ground voltage. Although the effect is also present
on the signal itself, it is generally worse for the supplies since the ratio of
signals to supply pairs is generally at least 2 to 1. Therefore, the effect is worst
when all output drivers on a bus switch in the same direction simultaneously,
creating the largest possible dI/dt. Although simultaneous switching can be
tolerated through proper construction of the supply network [1, 9], it cannot be
completely eliminated for single-ended signaling.
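As a rough illustration of the scale of this effect, the sketch below evaluates a lumped L·dI/dt model of the supply bounce; the driver count, current step, edge rate, and package inductance are assumed example values, not figures from the text.

```python
# Rough estimate of simultaneous-switching supply bounce using a lumped
# L*dI/dt model. All numbers below are illustrative assumptions.

def sso_bounce(n_drivers, di_per_driver, rise_time, l_supply):
    """Peak supply excursion when n_drivers switch together.

    n_drivers     -- outputs switching in the same direction at once
    di_per_driver -- current step per driver (A)
    rise_time     -- time over which the current ramps (s)
    l_supply      -- effective package inductance of the shared supply rail (H)
    """
    di_dt = n_drivers * di_per_driver / rise_time
    return l_supply * di_dt

# Example: 16 drivers, 20 mA each, 100 ps edge, 0.5 nH effective supply inductance.
v_bounce = sso_bounce(16, 20e-3, 100e-12, 0.5e-9)
print(f"estimated supply bounce: {v_bounce*1e3:.0f} mV")
```

Even with this crude model the excursion is a large fraction of a typical signaling swing, which is why the supply network construction and the separation of transmitter and receiver supplies matter.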
Differential signaling avoids this problem by transmitting the data and its
complement on parallel interconnect wires to the receiver, where a positive or
negative difference between the signal pair indicates a "1" or a "0", respec-
tively. Provided the true and complement transmitters are balanced to draw
The green epoxy glass resin printed wiring board material FR-4 is the
workhorse of the electronics industry. A metal wiring trace over a power or
ground plane with FR-4 dielectric in between forms a transmission line. This
transmission line is far from ideal, however [2, 6, 7]. Several mechanisms create
loss at high frequencies. Impedance discontinuities produce reflections. Adja-
cent signals and noisy supplies cause interference. Understanding these mech-
anisms through studying the properties of the interconnect is key to achieving
higher speeds.
signal, and the dielectric constant. Signals injected into the line propagate with
a velocity

$$v_0 = \frac{1}{\sqrt{l_0 c_0}} = \frac{c}{\sqrt{\varepsilon_r}}, \qquad (1)$$

where $c$ is the speed of light, $\varepsilon_r$ is the relative permittivity of the dielectric, and
$l_0$ and $c_0$ are the inductance and capacitance per unit length of the line.
An ideal transmission line acts as a fixed impedance element, with impedance

$$Z_0 = \sqrt{\frac{l_0}{c_0}}. \qquad (2)$$
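As a quick numerical check of Eqs. (1) and (2), the sketch below evaluates the propagation velocity for an assumed FR-4 permittivity and the impedance for illustrative per-unit-length inductance and capacitance values; none of these numbers come from the text.

```python
import math

C_LIGHT = 3.0e8          # speed of light in vacuum, m/s
EPS_R_FR4 = 4.3          # assumed relative permittivity of FR-4 (typical value)

# Eq. (1): propagation velocity in the dielectric, and the resulting delay per inch.
v0 = C_LIGHT / math.sqrt(EPS_R_FR4)
print(f"velocity: {v0/1e8:.2f}e8 m/s, delay: {1e12/v0*0.0254:.0f} ps/inch")

# Eq. (2): characteristic impedance from per-unit-length L and C.
# The values below are illustrative for a ~50-ohm trace, not from the text.
l0 = 3.5e-7              # H/m
c0 = 1.4e-10             # F/m
z0 = math.sqrt(l0 / c0)
print(f"Z0 = {z0:.0f} ohms")
```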
Because the ideal line has no loss, an injected signal travels unattenuated
until it encounters a discontinuity, which may consist of a change in impedance
or a load, such as the termination at the end of the line. When an incident wave
of magnitude $V_i$ propagating through a transmission line with impedance $Z_0$
encounters a change to a new impedance $Z_1$, a reflection occurs. The ratio of the
voltage of the reflected wave $V_r$ to that of the incident wave $V_i$ is given by

$$\frac{V_r}{V_i} = \frac{Z_1 - Z_0}{Z_1 + Z_0}. \qquad (3)$$
Loads, stubs, and vias on the line create discontinuities as well. Capacitive
loads or capacitive-like vias create a complex impedance. An incident wave
into a capacitance initially sees a short, which decreases the impedance Z1 to
zero and causes a negative reflection by Eq. (3). The impedance rises to its
steady-state value as the capacitor charges. Uniformly distributed loads add
distributed capacitance per unit length and also alter the impedance through
Eq. (2).
Since any reflection causes a loss of incident wave energy to the backwards-
propagating pulse, only loss-less lines with uniform impedance allow all of the
injected signal's energy to coherently propagate to the end of the line. Further-
more, termination that is not impedance-matched to the line causes reflections
to occur which, if not completely absorbed, may combine with and distort
other symbols. Therefore, operation at higher speeds requires minimizing all
impedance discontinuities and terminating with a matched impedance.
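A minimal sketch of Eq. (3), showing the sign and magnitude of the reflection for a few illustrative impedance steps (the 50-ohm and 65-ohm values are assumptions, not from the text):

```python
def reflection_coefficient(z0, z1):
    """Eq. (3): ratio of reflected to incident voltage at an impedance step."""
    return (z1 - z0) / (z1 + z0)

# A 50-ohm line entering a 65-ohm section: small positive reflection.
print(reflection_coefficient(50.0, 65.0))   # ~ +0.13
# The same line into a momentary short (a capacitive load first looks like Z1 -> 0).
print(reflection_coefficient(50.0, 0.0))    # -1.0, full negative reflection
# Matched termination: no reflection, all incident energy is absorbed.
print(reflection_coefficient(50.0, 50.0))   # 0.0
```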
indicating that the conductance of the dielectric, and therefore the dielectric
loss attenuation term from Eq. (6), is proportional to frequency. In typical FR-4
channels, dielectric loss dominates over skin-effect loss at frequencies greater
than about 1 GHz.
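The sketch below illustrates this crossover, assuming the usual frequency dependence of the two mechanisms (skin-effect attenuation growing as the square root of frequency, dielectric attenuation growing linearly with it); the loss coefficients are invented solely so that the curves cross near 1 GHz, consistent with the statement above.

```python
import math

# Illustrative comparison of the two loss mechanisms. The coefficients are
# assumed values, chosen only to place the crossover near 1 GHz.
K_SKIN = 0.10   # dB/inch at 1 GHz from skin effect (assumed)
K_DIEL = 0.10   # dB/inch at 1 GHz from dielectric loss (assumed)

def skin_loss_db(f_ghz):
    return K_SKIN * math.sqrt(f_ghz)      # ~ sqrt(f) dependence

def dielectric_loss_db(f_ghz):
    return K_DIEL * f_ghz                 # ~ linear-in-f dependence

for f in (0.5, 1.0, 2.0, 5.0):
    print(f"{f:4.1f} GHz: skin {skin_loss_db(f):.3f} dB/in, "
          f"dielectric {dielectric_loss_db(f):.3f} dB/in")
```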
receiver threshold at all. Furthermore, the dispersion of the rising and falling
edges compresses the symbol such that the pulse width is much narrower at the
receiver. Without correction, noise in either voltage or time could corrupt the
symbol.
Through superposition techniques, the pulse response can be extrapolated
to provide the output characteristics for any input data pattern and to find the
pattern yielding the worst-case minimum voltage and timing margin [10].
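The sketch below illustrates the superposition idea with an invented set of pulse-response samples: the received waveform for any ±1 data pattern is the sum of shifted copies of the single-symbol response, and a worst-case eye opening follows by taking every non-cursor sample with its most damaging sign. It is a simplified stand-in for the analysis of [10], not a reproduction of it.

```python
# pulse[] holds the sampled single-symbol response at the receiver, one sample
# per unit interval, with the main (cursor) sample at index `cursor`.
# The values are invented for illustration.
pulse = [0.00, 0.05, 0.45, 0.20, 0.10, 0.05]   # volts at successive UIs
cursor = 2

def waveform(bits):
    """Receiver voltage at each UI for a +/-1 data pattern, by superposition."""
    out = [0.0] * (len(bits) + len(pulse))
    for i, b in enumerate(bits):
        for j, p in enumerate(pulse):
            out[i + j] += b * p
    return out

print(waveform([+1, -1, +1, +1, -1]))

# Worst-case eye opening: cursor amplitude minus the sum of all ISI magnitudes,
# i.e. every other symbol chosen with the most damaging sign.
isi = sum(abs(p) for k, p in enumerate(pulse) if k != cursor)
print("worst-case eye height:", pulse[cursor] - isi)   # 0.45 - 0.40 = 0.05 V
```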
3.3. Equalization
Equalization techniques compensate for the frequency-dependent charac-
teristics of the channel so that the combined frequency response of the system
is nearly uniform over the frequencies of interest [4]. Imagine, for example,
that the channel response is given by the transfer function H (s). If we
can process the input or output signal through another transfer function
G(s) = 1/H (s), the total transfer function of the system will be H (s)G(s) = 1.
In practice, it is difficult to cancel the channel response so accurately. How-
ever, even schemes that cancel the channel response at or near the highest
operational frequency, where channel losses are greatest, can provide a signif-
icant benefit to IO performance.
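To make the G(s) = 1/H(s) idea concrete, the sketch below models the channel as a single-pole low-pass filter (an assumption for illustration only) and shows that the product of the channel and its ideal inverse is flat across frequency; in practice only the region near the Nyquist frequency is matched.

```python
import math

# Toy channel: single-pole low-pass, |H(f)| = 1 / sqrt(1 + (f/f_pole)^2).
# The pole frequency and data rate below are illustrative assumptions.
F_POLE = 1.0e9      # Hz
F_NYQUIST = 2.5e9   # Hz, for a 5 Gbit/s stream

def h_mag(f):
    return 1.0 / math.sqrt(1.0 + (f / F_POLE) ** 2)

def g_mag(f):
    return 1.0 / h_mag(f)     # ideal (zero-forcing) equalizer gain

# The product |H||G| is unity at every frequency, which is the goal.
for f in (0.5e9, 1.0e9, F_NYQUIST):
    print(f"{f/1e9:.1f} GHz: |H| = {h_mag(f):.2f}, "
          f"|G| = {g_mag(f):.2f}, |HG| = {h_mag(f)*g_mag(f):.2f}")
```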
Figure 9. The pulse response of the differential transmission line of Figure 8. The input and
output pulses are plotted on separate time and voltage scales (bottom-left and top-right),
shifted for easier comparison, with arrows indicating the axes that apply to each waveform.
Equalization can occur at the transmitter, the receiver, or both. At the trans-
mitter, equalization is usually performed through pre-distortion of the input
signal processed through on-chip logic [3, 4, 11]. Symbols with high-frequency
components are injected into the line with higher amplitude while those of lower
frequency are injected with lower amplitude. The example shown in Figure 10
uses a scheme where any symbol that is different from the previously trans-
mitted symbol is sent with full amplitude. Symbols with the same value are
attenuated. This is known as two-tap de-emphasis since only two bits of history
are examined to decide the amplitude for the symbol entering the line, which
is generally referred to as the cursor.
This scheme can be extended to any arbitrary number of symbols of history,
either before or after the cursor, at the cost of power, die area, and potentially
data latency. If we represent the data stream $x_i$ with values of $+1$ and $-1$,
equalization can be performed for the cursor symbol $x_0$ through a finite impulse
response (FIR) filter as given by

$$y_0 = \alpha_{-m} x_{-m} + \cdots + \alpha_{-1} x_{-1} + \alpha_0 x_0 + \alpha_1 x_1 + \cdots + \alpha_m x_m, \qquad (10)$$
Figure 10. An example two-tap de-emphasis equalized waveform at the output of the
transmitter.
arbitrary number of taps [11], but a typical 20-inch channel at 5 Gbit/s will
require zero or one tap prior to the cursor and one to four taps following the
cursor.
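A minimal sketch of Eq. (10) specialized to the two-tap de-emphasis case of Figure 10; the tap weights and data pattern are illustrative choices, not values from the text.

```python
# Transmitter FIR equalization per Eq. (10), restricted to the cursor and one
# post-cursor (history) tap. Data symbols are +1/-1; the weights are assumed.
TAPS = {0: 0.75, 1: -0.25}   # {tap offset relative to cursor: weight}

def equalize(bits):
    """Return the transmitted amplitude for each symbol in `bits`."""
    out = []
    for n, _ in enumerate(bits):
        y = 0.0
        for offset, weight in TAPS.items():
            k = n - offset
            if 0 <= k < len(bits):
                y += weight * bits[k]
        out.append(y)
    return out

data = [+1, +1, +1, -1, +1, -1, -1, -1]
print(equalize(data))
# After the first symbol (which has no prior history), transitions are sent at
# full swing (+/-1.0) while repeated symbols are de-emphasized to +/-0.5,
# boosting the relative high-frequency content of the stream.
```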
Equalization can also be performed in the receiver through similar logic
processing means. Receiver equalization requires an accurate capture of the
initial portion of a data stream, usually through a training sequence, to provide
knowledge of the history of the data stream. It also suffers from the amplification
of the noise at the receiver along with the signal. However, the main advantage
of receiver equalization is that it may apply a larger gain than the transmitter,
which is limited in range between the maximum signal amplitude allowed out
of the transmitter and the minimum signal needed to maintain an acceptable
signal-to-noise ratio.
4. IO Clocking
Even if the link architecture can convey coherent symbols from transmitter
to receiver through a high-loss channel, the symbols must be captured at the
appropriate time and delivered synchronously to the receiver's processing unit.
Figure 11(a) shows an ideal timing diagram for edge-triggered clocking of dual
data-rate input data at the receiver's data sampling unit. Both rising and falling
edges of the clock sample the input data. The clock is aligned to the center of
the data symbol, which provides the greatest voltage in the sample and also the
greatest timing margin. Any jitter of the data or the sampling clock with respect
to one another results in timing margin loss, as shown in Figure 11(c). Jitter
amounting to more than half of the symbol width causes the sampling clock to
miss the data symbol entirely.
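A back-of-envelope version of this timing budget is sketched below; the data rate, jitter, and sampler aperture are assumed example numbers, and splitting each term evenly across the two sides of the eye is a simplification.

```python
# Timing margin at the sampler, assuming the ideal case described above:
# the clock edge lands in the center of the symbol, so the budget on each
# side is half a unit interval. All numbers are illustrative.
DATA_RATE = 5.0e9                 # symbols/s
UI = 1.0 / DATA_RATE              # one symbol width, 200 ps

jitter_pk_pk = 60e-12             # total clock-vs-data jitter, peak to peak (assumed)
aperture = 30e-12                 # sampler setup-plus-hold window (assumed)

budget_per_side = UI / 2
loss_per_side = jitter_pk_pk / 2 + aperture / 2   # assumed even split of each term
margin = budget_per_side - loss_per_side
print(f"UI = {UI*1e12:.0f} ps, remaining margin per side = {margin*1e12:.0f} ps")
# If the jitter alone exceeds half of the symbol width, the sampling clock can
# miss the symbol entirely, as noted above.
```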
Two standard clocking topologies, derived clocking and source-
synchronous or forwarded clocking, deliver the timing reference to the trans-
mitter and receiver.
Figure 11. Synchronous capture of input data: (a) synchronous capture circuit, (b) ideal data
and clock timing, (c) example timing with voltage noise and jitter.
adjusts the phase of the sampling clock at the receiver to coincide with the
data transitions at the symbol boundary. The second clock is shifted 90 degrees
from the first clock and is used to sample the data at its midpoint. Following
its capture, the data typically passes to an on-chip logic clock domain derived
from the same input source but not adjusted in phase.
Figure 13 shows how over-sampling the input data in this manner provides
phase alignment information through the clock alignment decision logic. If, as
in the left two cases, the edge sampled data matches the earlier data sample,
the clocks are early and should move later. In the right two cases, the edge
samples match the later data sample, indicating that the clocks are late and
should be moved earlier. By collecting this alignment information from the
final data sampling point and feeding it back to the phase adjustment units, this
architecture can also compensate for clock distribution delays in the system.
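The decision rule just described can be written down directly; the sketch below is a generic early/late (bang-bang) formulation of the logic in Figure 13, not the specific circuit implementation.

```python
# Clock-alignment decision from two consecutive data samples and the edge
# sample taken between them.

def phase_decision(prev_data, edge, next_data):
    """Return 'later', 'earlier', or 'hold' for the sampling clocks."""
    if prev_data == next_data:
        return "hold"       # no data transition, so no phase information
    if edge == prev_data:
        return "later"      # edge sample matches the earlier data: clock is early
    return "earlier"        # edge sample matches the later data: clock is late

# Example: a 0 -> 1 transition where the edge sample still reads the old value.
print(phase_decision(0, 0, 1))   # 'later'
print(phase_decision(0, 1, 1))   # 'earlier'
```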
In fact, dynamic feedback also allows the alignment units to continu-
ously track any drift between the clock and the data. Continuous alignment
is often desirable for several reasons. In mesochronous systems, described in
Section 4.3, the average frequency in the transmitter and in the receiver must
be identical. However, voltage and temperature changes can cause a change
Figure 13. Over-sampling the input to lock the clock to the input data.
Although dynamic tracking schemes follow slow drift, they are generally
unable to correct for high-frequency jitter between clock and data. Such jitter
causes misalignment of the sampling clock with respect to the data symbol and
creates loss of timing margin which, when it exceeds half of the symbol width,
causes data loss.
Timing misalignment occurs when differing jitter arises along the clock and
data paths or when clock and data paths are of unmatched lengths so that they
no longer share the common characteristics of the source clock. Specically,
the sources of jitter include
1. Jitter injected along one path that is not injected along the other, such as
supply noise-induced delay variations in the transmitter or receiver.
2. Source clock jitter filtered by different PLLs with differing characteristics.
3. Path length differences from the common point, which separates the
original edge that creates the data from the original edge that creates the
sampling clock.
With enough knowledge of the system characteristics, the timing loss from
these effects can be calculated [12] or measured [14].
input, as shown in Figure 13. The quadrature clock is shifted 90 degrees from
this position so that it samples the data in the center of its valid region, providing
the greatest margin for timing degradation through jitter. Alignment with the
forwarded clock input eliminates the need for in-phase over-sampling at the
data lane receivers.
Furthermore, because the forwarded clock shares the same source timing as
the input data, both data and clock experience the same timing drift and jitter
from the transmitter. This creates an inherent tracking between clock and data.
In fact, dynamic tracking is often unnecessary in forwarded clock systems.
Only periodic re-alignment is needed to compensate for temperature drift in
the receiver clock path.
As with the derived clocking case, if skew among data lanes becomes too
large, the clock alignment and phase adjustment can be pushed into every data
lane to perform the clock alignment on a per-lane basis. The over-sampling
receiver must be added back to the data lanes and the phase alignment overhead
must be duplicated for every lane.
million. For both cases, the data path topology must comprehend the difference
in data transfer rate arising from the clock system topology.
Although the rate-matching of a mesochronous system implies a straight-
forward one-for-one transfer along the data path, it is usually not so simple in
actual systems. Although the average clock frequency must match between
transmitter and receiver, short-term deviations may occur. These arise, for
example, from differences in the response of the transmitter and the receiver
PLL to phase noise from the reference oscillator or from voltage and tempera-
ture drift in the transmitter or receiver. These drifts cause the transmitter clock
to run temporarily faster or slower than the receiver clock by a slight amount.
To overcome the data-rate difference, we can buffer the data in a first-in first-
out (FIFO) structure. This makes data available to the receiver in case its clock
temporarily speeds up and also provides a buffer for received data in case the
transmitter clock temporarily speeds up.
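A minimal sketch of such an elastic buffer is shown below; the depth is an arbitrary choice, and a real design would also monitor occupancy and flag overflow or underflow rather than silently dropping data.

```python
from collections import deque

class ElasticFifo:
    """Rate-matching buffer between the recovered (transmitter-timed) write
    clock and the receiver's local read clock. Depth is illustrative."""

    def __init__(self, depth=8):
        self.buf = deque(maxlen=depth)   # maxlen drops oldest on overflow;
                                         # a real design would flag this instead

    def write(self, symbol):
        """Called on the recovered clock domain."""
        self.buf.append(symbol)

    def read(self):
        """Called on the local receiver clock; returns None on underflow."""
        return self.buf.popleft() if self.buf else None

fifo = ElasticFifo()
for s in ("d0", "d1", "d2"):
    fifo.write(s)                      # transmitter momentarily runs a little fast
print(fifo.read(), fifo.read())        # receiver later drains the backlog: d0 d1
```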
In a plesiochronous system, the clock frequency difference implies a con-
tinuous difference in the data rate between transmitter and receiver. Buffering
the data does not provide a solution since any finite buffer will eventually run
out. Possible solutions include either handshake mechanisms or skip charac-
ters. Handshake mechanisms work by transferring data only when it becomes
available [15]. This can be done either per serial bit or by constructing the data
into a parallel packet and transferring it when the packet is complete. Skip
characters work by allowing the receiver to ignore or add in occasional null
data sequences so that the same effective data transfer rate can be maintained.
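The sketch below illustrates the skip-character idea; the symbol name, buffer thresholds, and occupancy handling are all assumptions for illustration rather than details of any particular standard.

```python
SKIP = "SKP"   # illustrative name for a null (skip) symbol

def compensate(symbols, occupancy, low=2, high=6):
    """Skip-character handling at a plesiochronous receiver (sketch).

    When the local clock runs fast (buffer draining, occupancy low), an extra
    skip symbol is inserted to fill time; when it runs slow (buffer filling,
    occupancy high), incoming skip symbols are simply dropped.
    """
    out = []
    for s in symbols:
        if s == SKIP:
            if occupancy > high:
                continue            # discard the null symbol to catch up
            if occupancy < low:
                out.append(SKIP)    # add an extra null symbol to slow down
        out.append(s)
    return out

print(compensate(["d0", SKIP, "d1"], occupancy=7))  # ['d0', 'd1']
print(compensate(["d0", SKIP, "d1"], occupancy=1))  # ['d0', 'SKP', 'SKP', 'd1']
```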
5. Conclusions
Significant physical effects impede the operation of off-chip signaling at
speeds higher than 1 to 2 Gbit/s. These include transmission-line effects such
as dielectric loss and skin effect loss, inter-symbol interference, and clock-
and data-skew loss. These effects can be overcome through the dedication
of more on-chip computational resources to process the signal in a way that
compensates for these effects. Such schemes include precision calibration of
signaling levels, equalization, and clock phase adjustment. Further methods
will be required to achieve speeds in excess of 5 Gbit/s.
Acknowledgements
References
[1] Bakoglu, H.B. Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley,
1990.
[2] Dabral, S.; Maloney, T.J. Basic ESD and I/O Design, John Wiley & Sons, 1998.
[3] Dally, W.J.; Poulton, J.W. Digital Systems Engineering, Cambridge University Press,
1998.
[4] Dally, W.J.; Poulton, J.W. Transmitter equalization for 4-Gbps signaling, IEEE
Micro, 1997, 48–56.
[5] Horowitz, M.; Yang, C.-K.K.; Sidiropoulos, S. High-speed electrical signaling:
overview and limitations, IEEE Micro, 1998, 12–24.
[6] Johnson, H.W.; Graham, M. High-Speed Digital Design: A Handbook of Black Magic,
Prentice Hall, 1993.
[7] Hall, S.H.; Hall, G.W.; McCall, J.A. High-Speed Digital System Design: A Handbook
of Interconnect Theory and Design Practices, John Wiley & Sons, 2000.
[8] Sidiropoulos, S.; Yang, C.-K.K.; Horowitz, M. High-speed inter-chip signaling.
In Design of High-Performance Microprocessor Circuits, Anantha Chandrakasan,
William J. Bowhill, and Frank Fox, eds., IEEE Press, 2000.
[9] Thierauf, S.C.; Anderson, W.R. I/O and ESD circuit design. In Design of High-
Performance Microprocessor Circuits, Anantha Chandrakasan, William J. Bowhill,
and Frank Fox, eds., IEEE Press, 2000.
[10] Casper, B.K.; Haycock, M.; Mooney, R. An accurate and efficient analysis method
for multi-Gb/s chip-to-chip signaling schemes, Symposium on VLSI Circuits, 2002,
54–57.
[11] Erdogan, A.T.; Arslan, T.; Horrocks, D.H. Low-power multiplication schemes for
single multiplier CMOS based FIR digital filter implementations, IEEE Int. Symp.
Circuits Systems, 1997, 1940–1943.
[12] PCI Express Jitter Modeling (July 14, 2004), http://www.pcisig.com.
[13] Stojanovic, V.; Horowitz, M. Modeling and analysis of high-speed links, Custom
Integrated Circuits Conference, September 2003.
[14] Kossel, M.A.; Schmatz, M.L. Jitter measurements of high-speed serial links, IEEE
Design Test Comput., 2004, 536–543.
[15] PCI Express Base Specification, Revision 1.1 (March 28, 2005),
http://www.pcisig.com.