
Chapter 11

HIGH-SPEED IO DESIGN

Warren R. Anderson
Intel Corporation

Abstract: This chapter explores common methods and circuit architectures used to transmit
and receive data through off-chip links.

Key words: High-speed IO; off-chip links; serial data transfer; parallel data bus; FR-4; skin
effect; dielectric loss; clock phase alignment; derived clocking; source syn-
chronous; forwarded clocking; plesiochronous; mesochronous.

1. Introduction

In order for system performance to keep pace with the ever-increasing speed
of the microprocessor, the bandwidth of the signaling into and out of the micro-
processor must follow the trend in on-chip processing performance. However,
the physical limitations imposed by the interconnect channel, which exhibits increasing amounts of signal loss and jitter amplification at higher frequencies, impede the increase in off-chip signaling speed. Numerous strategies to
cope with the interconnect properties have been developed, enabled by more
sophisticated techniques to control and process the off-chip electrical signals.
This chapter discusses the most prevalent of these techniques, focusing
on the chip-to-chip communication topologies common for microprocessors,
namely access to memory, processor-to-processor communication for parallel
computing, and processor-to-chipset communication. These links generally run
over short distances of up to one or two meters and consist of buses of parallel
data lanes carrying wide data words. Although serial communication shares
many of the same properties and techniques as the parallel bus designs, serial


communication, which generally takes place over much longer distances, will
not be discussed explicitly here.
Our discussion begins with a comparison of the on-chip and off-chip data
transmission environment, with an emphasis on the desired properties of the
off-chip communication system. Several common signaling methods will be
shown. We then explore the properties of the off-chip signaling medium, par-
ticularly those that dominate at high frequencies and therefore limit off-chip
signaling speed. Techniques and example circuit topologies for adapting to
these effects in both the time and voltage domains, as well as their limitations,
are shown.

2. IO Signaling

The overall function of IO is to faithfully convey data from a transmitter chip to a receiver chip [5, 8]. Similar to on-chip communication, where data is passed from one section of the chip to another, off-chip communication must define the data representation for a '1' and a '0'. It must also transmit the
output data and capture the input data synchronously so that the input data
stream can enter the synchronously clocked logic on the receiver side.
A typical configuration for microprocessor IO is shown in Figure 1. It depicts
the processor in communication with a variety of external components, such as
memory, a memory chipset, another processor, and a chipset communicating
with external storage and networking devices. Off-chip communication takes
place through a variety of parallel data buses. Each data bus consists of sev-
eral parallel data lanes conveying information through an interconnection wire or channel routing through the off-chip interconnection medium, which may consist of package wiring, board wiring, or cables.

Figure 1. Typical components connected to a microprocessor.
Because the properties of the interconnect medium differ significantly between on-chip and off-chip connections, the optimal methods for communicating in these two environments also differ significantly. Off-chip communication
must often take place between integrated circuits with different supply volt-
ages, must compensate for losses in the interconnect medium, must traverse
larger distances, and must take place in a noisy off-chip environment. Higher
performance requires a robust representation in the voltage domain.
In the time domain, synchronization must be maintained across the parallel
data bus. As shown in Figure 2, the larger off-chip communication distance
creates skew across a parallel data bus, which, unless extremely well controlled,
increases between transmitter and receiver. In addition, the receiver clock must
fan out to all of the data lane receivers, which increases its jitter, and must
be phase aligned to capture all of the data symbols in the valid data region. A
data symbol is the representation of one bit, a '0' or a '1', on a data lane.
Whereas on-chip clocks are designed with low skew between transmitting
and receiving synchronization points, the IO transmitter and receiver reside
on different pieces of silicon and often cannot be so well controlled. The IO
architecture must compensate by aligning the receiver clock to sample the input
data at the optimal time.

Figure 2. Typical timing of transmitter output and receiver input signals in a dual data-rate
signaling IO system.

2.1. Single-ended Voltage-mode Signaling


Returning to the voltage domain, we first consider how the value of the data can be represented for off-chip signaling. As opposed to on-chip data propagation, where a voltage representing a '1' must be near the positive supply rail Vdd and a voltage representing a '0' must be near the negative supply rail Vss, numerous schemes may be used for IO.
Some single-ended schemes work analogously to on-chip signaling, but
must be standardized in order to enable interoperability among integrated cir-
cuits regardless of IC process technology. Off-chip signaling standards define explicit signaling voltage levels and tolerances independent from the on-chip supply, which can vary with process technology generation. At the input receiver, for example, a voltage below a maximum input voltage level VIL represents a '0' and a voltage above a minimum input voltage level VIH represents a '1'. Minimum and maximum output levels are defined as VOL and VOH,
usually with a slightly wider separation to allow for noise to degrade the signal
between transmitter and receiver. Examples of voltage-mode signaling include
the TTL and LVTTL standards. Such a scheme for a bi-directional interface is
shown in Figure 3.
For high-speed communication, single-ended voltage-mode signaling
exhibits numerous disadvantages. Since high-speed operation requires deliver-
ing as much energy as possible to the electrical pulse representing a symbol, any
effect that removes energy from a symbol and combines it with another degrades
operation speed. Transmission-line reections represent one such effect.

Figure 3. Bi-directional link using single-ended voltage mode signaling. The resistor in series
with the output sets the VOH and VOL levels and also provides board impedance matching. The
pre-driver outputs predrive-h and predrive-l are separated to tri-state the output driver when
receiving input data. Separation of the transmitter and receiver supplies, as shown here, is often
used to avoid simultaneous switching noise from the transmitter appearing on the supply of the
more sensitive input receiver.

As will be discussed in Section 3.1.1, impedance discontinuities cause reflections that, if reflected back toward the receiver, create noise on un-related symbols. In order to absorb these reflections, high-speed transmitters must be
impedance matched to the line. As shown in Figure 3, the driver itself becomes
part of the termination network; therefore the driver must not only be controlled
to launch the correct VOL and VOH levels, but it must also form the proper ter-
mination network, preferably at all voltage levels. This becomes extremely
difficult in practice.
To overcome these difficulties and provide more noise immunity, the VOL, VOH, VIL, and VIH levels are often made wider than necessary, consuming more
power. The wider swings are usually less well controlled, making higher-speed
operation difcult.

2.2. Current Mode Signaling


Faster data rates can generally be obtained with current mode drivers.
Signaling speeds can be further enhanced using current-mode signaling in
pseudo-differential or differential mode, as will be described in Sections 2.3
and 2.4.
In current mode signaling, the output driver forces a current into the trans-
mission line, using the natural impedance of the line to create a voltage. An
example is shown in Figure 4, where the driver forces Idrive = 20 mA into the parallel combination of the 50 Ω interconnect impedance and the Rterm = 50 Ω near-end termination, generating a 500 mV signal. To avoid reflections, the line is terminated at the receiver end (far end). To absorb reflections from impedance
discontinuities, the line is often terminated at the transmitter (near end) as well.
The termination usually ties to ground to avoid depending on the value of the
positive supply rail and to avoid coupling the signal to noise on the positive
supply rail.

Figure 4. Current mode signaling.
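As a quick check of the example above, the launched amplitude is simply the drive current flowing into the near-end termination in parallel with the line's characteristic impedance. A minimal sketch (the function is ours; the values are those of Figure 4):

```python
# Launched amplitude for a current-mode driver: the drive current develops a
# voltage across the near-end termination in parallel with the line impedance.
def launched_voltage(i_drive_a, z_line_ohm, r_term_ohm):
    z_parallel = (z_line_ohm * r_term_ohm) / (z_line_ohm + r_term_ohm)
    return i_drive_a * z_parallel

# Figure 4 example: 20 mA into a 50-ohm line with a 50-ohm near-end termination.
print(launched_voltage(0.020, 50.0, 50.0))  # -> 0.5, i.e. the 500 mV signal
```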



Although a calibrated current-mode driver can launch the same signal on the line as a well-tuned voltage-mode driver, the current-mode driver decouples the driver function from the impedance matching section, here provided
by the independent termination resistor. Since the current source portion of
the current-mode driver has a high impedance, the actual impedance of the
current source is not critical, as long as its impedance is much greater than
that of the termination resistor. As a result, the driver is simpler to design and
control than a voltage-mode driver, where the driver and impedance-matching
functions are combined. Furthermore, current-mode signaling provides greater
noise immunity since it decouples the driver from the positive supply rail.
Through the calibration of the current-mode output source with a known
current or voltage, the output level can generally be driven to an accuracy of 10% or better. Periodic calibration can compensate for voltage and temperature
drifts on the transmitter die. The receiver, however, must still detect if the input
voltage is above or below the input thresholds. As described in the next two
sections, additional means are required to improve the accuracy of the input
levels.

2.3. Pseudo-differential Signaling


The reduced swings usually found in current-mode signaling require a more
sensitive means to discriminate between high and low input levels. One method
for achieving this uses pseudo-differential signaling, which compares the signal
at the receiver to a reference voltage. Since this reference voltage can be gen-
erated through matched resistor devices or other precision means, it will not
depend strongly on process condition or temperature. Depending on the off-
set between the transmitter and receiver supplies and the receiver tolerance
requirements, the reference voltage may be generated in the receiver, at the
transmitter and routed to the receiver, or externally with precision, matched
resistors. An example of a pseudo-differential bus is shown in Figure 5.
Pseudo-differential signaling has been used in both voltage mode and cur-
rent mode standards, such as HSTL, GTL, and SSTL.

2.4. Differential Signaling

Although the explicit reference in pseudo-differential signaling provides for more robust signal detection, pseudo-differential signaling is still subject
to several problems that must be eliminated for operation at higher speeds.
One such problem is simultaneous switching noise. Any signaling scheme
using a single wire for each data lane creates simultaneous switching noise
(SSO). SSO creates a transient decrease or increase in the transmitter supply
voltage as it reacts to providing the dI/dt needed when switching the output [1, 9].

Figure 5. A parallel bus with pseudo-differential signaling.

For example, consider the case of Figure 4 when the output drives low.
In this condition no current flows in the transmission line. When the driver pulls the line high by sourcing a constant current from the positive supply rail into the output, this current creates a dI/dt in the circuit loop formed by the positive supply, the output, the interconnect signal's transmission line, its return path in the underlying ground plane, and the negative supply. Any inductance in this loop generates a transient voltage excursion when it experiences the dI/dt. This generally occurs where the driver's output pad and the supply rails
interact with the package, particularly in bond wire designs. In our example,
the inductance in the package between the positive supply on the board and
on the die develops a voltage that temporarily decreases the on-die supply
rail. Likewise, the inductance in the ground supply develops a voltage that
temporarily raises the on-die ground voltage. Although the effect is also present
on the signal itself, it is generally worse for the supplies since the ratio of
signals to supply pairs is generally at least 2 to 1. Therefore, the effect is worst
when all output drivers on a bus switch in the same direction simultaneously,
creating the largest possible dI/dt. Although simultaneous switching can be
tolerated through proper construction of the supply network [1, 9], it cannot be
completely eliminated for single-ended signaling.
Differential signaling avoids this problem by transmitting the data and its
complement on parallel interconnect wires to the receiver, where a positive or
negative difference between the signal pair indicates a '1' or a '0', respectively. Provided the true and complement transmitters are balanced to draw equal currents even when switching, no simultaneous switching noise occurs.

Figure 6. Differential signaling on one lane of an IO link.


Furthermore, the voltage swing is effectively doubled. Rather than swinging ±Vs/2 around the pseudo-differential reference, for example, the signals can swing ±Vs between each other with the same single-ended output voltage.
A final benefit is that noise that couples to both signals in the differential pair
only alters the common mode and does not change the differential voltage.
Therefore, differential signaling is more tolerant to certain types of noise than
single-ended or pseudo-differential signaling.
The penalty for differential signaling is that each data lane now uses two
interconnect wires, which halves the number of lanes for a bus with a fixed number of interconnect wires. Furthermore, without optimization, differential signaling doubles the driver power. The factor-of-two decrease in bus width is justified if each lane in the link can run at double the single-ended data
rate, which is often the case. Differential signaling is used in such standards as
LVDS and PCI-Express. An example link topology is shown in Figure 6.

3. Coping with the Interconnect

The green epoxy glass resin printed wiring board material FR-4 is the
workhorse of the electronics industry. A metal wiring trace over a power or
ground plane with FR-4 dielectric in between forms a transmission line. This
transmission line is far from ideal, however [2, 6, 7]. Several mechanisms create
loss at high frequencies. Impedance discontinuities produce reflections. Adja-
cent signals and noisy supplies cause interference. Understanding these mech-
anisms through studying the properties of the interconnect is key to achieving
higher speeds.

3.1. Interconnect Properties

An ideal transmission line consists of a distributed inductance per unit length l0 and capacitance per unit length c0, where both l0 and c0 depend on the geometry of the signal trace, the geometry of the dielectric surrounding the signal, and the dielectric constant. Signals injected into the line propagate with a velocity

$$v_0 = \frac{1}{\sqrt{l_0 c_0}} = \frac{c}{\sqrt{\varepsilon_r}}, \qquad (1)$$

where c is the speed of light and ε_r is the relative permittivity of the dielectric. An ideal transmission line acts as a fixed impedance element, with impedance

$$Z_0 = \sqrt{\frac{l_0}{c_0}}. \qquad (2)$$
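As a numerical illustration of Eqs. (1) and (2), the short sketch below evaluates the propagation velocity and characteristic impedance for assumed per-unit-length values roughly representative of a 50 Ω trace in FR-4; the specific numbers are illustrative, not taken from the text.

```python
import math

C_LIGHT = 2.998e8  # speed of light, m/s

def propagation_velocity(l0, c0):
    """Eq. (1): v0 = 1/sqrt(l0*c0), with l0 in H/m and c0 in F/m."""
    return 1.0 / math.sqrt(l0 * c0)

def characteristic_impedance(l0, c0):
    """Eq. (2): Z0 = sqrt(l0/c0)."""
    return math.sqrt(l0 / c0)

# Illustrative values for a roughly 50-ohm trace in FR-4 (eps_r ~ 4).
l0 = 3.3e-7   # H/m
c0 = 1.3e-10  # F/m
print(characteristic_impedance(l0, c0))        # ~50 ohms
print(propagation_velocity(l0, c0) / C_LIGHT)  # ~0.5, i.e. c/sqrt(eps_r)
```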

3.1.1. Reflections and impedance discontinuities

Because the ideal line has no loss, an injected signal travels un-attenuated
until it encounters a discontinuity, which may consist of a change in impedance
or a load, such as the termination at the end of the line. When an incident wave
of magnitude Vi propagating through a transmission line with impedance Z0 encounters a change to a new impedance Z1, a reflection occurs. The ratio of the voltage of the reflected wave Vr to that of the incident wave Vi is given by

$$\frac{V_r}{V_i} = \frac{Z_1 - Z_0}{Z_1 + Z_0}. \qquad (3)$$
Loads, stubs, and vias on the line create discontinuities as well. Capacitive
loads or capacitive-like vias create a complex impedance. An incident wave
into a capacitance initially sees a short, which decreases the impedance Z1 to zero and causes a negative reflection by Eq. (3). The impedance rises to its
steady-state value as the capacitor charges. Uniformly distributed loads add
distributed capacitance per unit length and also alter the impedance through
Eq. (2).
Since any reflection causes a loss of incident wave energy to the backwards-propagating pulse, only loss-less lines with uniform impedance allow all of the injected signal's energy to coherently propagate to the end of the line. Furthermore, termination that is not impedance-matched to the line causes reflections
to occur which, if not completely absorbed, may combine with and distort
other symbols. Therefore, operation at higher speeds requires minimizing all
impedance discontinuities and terminating with a matched impedance.
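Eq. (3) is simple to evaluate directly. The sketch below applies it to a matched termination, a mild impedance step, and a near-short (the initial response of a capacitive discontinuity); the impedance values are illustrative.

```python
def reflection_coefficient(z1, z0):
    """Eq. (3): Vr/Vi = (Z1 - Z0)/(Z1 + Z0) at a step from impedance Z0 to Z1."""
    return (z1 - z0) / (z1 + z0)

z0 = 50.0
print(reflection_coefficient(50.0, z0))   # 0.0   -> matched termination, no reflection
print(reflection_coefficient(65.0, z0))   # +0.13 -> step up in impedance, positive reflection
print(reflection_coefficient(1e-9, z0))   # ~ -1  -> near-short, e.g. the initial response of a capacitive via
```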

3.1.2. Transmission line losses

Up to this point we have only considered loss-less lines. Unfortunately, transmission-line losses cause high-speed signals, even in a perfectly impedance-matched and terminated line, to lose a portion of their energy through other means. A real transmission line exhibits a distributed resistance per unit length R(ω) in the signal trace and also contains, at high frequencies, a finite amount of conductance per unit length G(ω) through the dielectric between the signal and its return path, as shown in Figure 7.

Figure 7. Transmission line with loss components.

For an injected signal Vi(0, ω) with angular frequency ω, the resulting signal V(z, ω) at any point z along the line is given by

$$V(z, \omega) = V_i(0, \omega)\, e^{-\gamma(\omega) z}, \qquad (4)$$

where

$$\gamma(\omega) = \sqrt{(R(\omega) + j \omega L_0)(G(\omega) + j \omega C_0)}. \qquad (5)$$

If both R(ω) and G(ω) are small, we can approximate γ(ω) by

$$\gamma(\omega) \approx \frac{R(\omega)}{2 Z_0} + \frac{Z_0\, G(\omega)}{2}. \qquad (6)$$
As the notation indicates, both the series resistance R(ω) and dielectric loss G(ω) are frequency dependent. The frequency dependence of R(ω) arises from the skin effect, which confines the current ever closer to the surface of the conductor at higher frequencies. For a strip conductor of trace width w and resistivity ρ, the frequency dependence is given by [3]:

$$R(\omega) = \frac{1}{2w} \sqrt{\frac{\omega \mu \rho}{2}}. \qquad (7)$$
The frequency dependence in the dielectric loss arises from the response of the medium to high-frequency electro-magnetic waves. From Figure 7, the admittance of the dielectric contains a real term G(ω) and an imaginary term jωC. Their ratio defines the loss tangent δ, given by

$$\tan \delta = \frac{G(\omega)}{\omega C}. \qquad (8)$$

The loss tangent is nearly constant in frequency and is a fundamental property of the dielectric material. Rearranging Eq. (8) yields

$$G(\omega) = \omega C \tan \delta, \qquad (9)$$

indicating that the conductance of the dielectric, and therefore the dielectric
loss attenuation term from Eq. (6) is proportional to frequency. In typical FR-4
channels, dielectric loss dominates over skin-effect loss at frequencies greater
than about 1 GHz.
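Combining Eqs. (6), (7), and (9) gives the attenuation per unit length as a function of frequency. The sketch below does this for an assumed copper trace and an assumed loss tangent; the geometry and material values are illustrative, but they show the characteristic behavior that the skin-effect term grows as the square root of frequency while the dielectric term grows linearly with it, so dielectric loss eventually dominates.

```python
import math

def skin_effect_r(freq_hz, width_m, rho=1.7e-8, mu=4e-7 * math.pi):
    """Eq. (7): series resistance per meter of a strip of width w (copper assumed)."""
    omega = 2 * math.pi * freq_hz
    return (1.0 / (2 * width_m)) * math.sqrt(omega * mu * rho / 2)

def dielectric_g(freq_hz, c0, tan_delta):
    """Eq. (9): shunt conductance per meter, G(omega) = omega * C * tan(delta)."""
    return 2 * math.pi * freq_hz * c0 * tan_delta

def attenuation_np_per_m(freq_hz, z0=50.0, width_m=150e-6, c0=1.3e-10, tan_delta=0.02):
    """Eq. (6): alpha(omega) ~ R(omega)/(2*Z0) + Z0*G(omega)/2, in nepers per meter."""
    return (skin_effect_r(freq_hz, width_m) / (2 * z0)
            + z0 * dielectric_g(freq_hz, c0, tan_delta) / 2)

for f in (0.5e9, 1e9, 2e9, 5e9):
    print(f"{f / 1e9:4.1f} GHz: {attenuation_np_per_m(f) * 8.686:5.2f} dB/m")  # 1 Np = 8.686 dB
```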

3.2. Inter-symbol Interference

Now consider a system sending symbols representing data from transmitter to receiver. For long lines at high data rates, the interconnect will carry many symbols in flight between the transmitter and the receiver. Because data carries
information, an arbitrary pattern of '0' and '1' symbols will be present on the
line, representing a variety of frequency components. Under these conditions,
both reections and frequency-dependent losses cause the output of the line at
the receiver to depend strongly on the input data pattern.
Reflections not only take energy away from any given symbol, they also send energy from that symbol in the opposite direction. If the initial reflection reflects again, the reflected pulse joins other non-related symbols propagating in the
direction of the receiver, interfering with and potentially corrupting the victim
symbol. Furthermore, frequency-dependent loss causes data patterns with a
high-frequency content to be attenuated while patterns with low frequency
content are not. Both of these effects illustrate inter-symbol interference (ISI),
where symbols can interfere with each other, resulting in strong data pattern-
dependent characteristics for the signaling medium.
The frequency domain characteristics, represented by the ratio of the out-
put signal to the input signal as shown in Figure 8, demonstrate where both of
these effects occur. In the frequency domain, reflections cause dips and spikes at frequencies where the reflections result in destructive or constructive inter-
ference at the output. Loss is evident in the increasing attenuation of the output
at high frequency. In the time domain, reflections diminish the amplitude of
the initial step and also introduce delayed glitches at the output, potentially
interfering with later symbols. Loss causes dispersion of the input step as well
as a slowly-rising tail after the initial step.
These two effects, dispersion of the initial step and the slow, asymptotic
approach to the steady-state condition, combine with the data pattern to create
loss of margin on both the time and the voltage axes. Consider a lone symbol representing a '1' in a field of '0' symbols, as shown by the pulse response characteristics of Figure 9. Prior to the '1', the line will have sat inactive for a period of time, allowing it to decay close to its steady-state condition. The interconnect losses disperse the rising edge of the '1' pulse and attenuate its peak. Likewise, losses do the same to the falling edge at the tail of the '1'
pulse. As a result, the peak pulse amplitude, which must rise from the low
steady-state level, is severely attenuated and, in some cases, may not cross the
receiver threshold at all.

Figure 8. Frequency-domain characteristics of a differential transmission line consisting of two daughter cards and one baseboard, total length 15 inches (38 cm).

Furthermore, the dispersion of the rising and falling
edges compresses the symbol such that the pulse width is much narrower at the
receiver. Without correction, noise in either voltage or time could corrupt the
symbol.
Through superposition techniques, the pulse response can be extrapolated
to provide the output characteristics for any input data pattern and to find the
pattern yielding the worst-case minimum voltage and timing margin [10].
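The superposition analysis of [10] can be sketched in a few lines: sample the single-symbol pulse response once per unit interval and sum shifted copies weighted by the data pattern. The pulse-response values below are invented for illustration, not taken from the channel of Figure 9.

```python
def received_samples(pulse, bits):
    """Superpose shifted copies of the symbol-rate-sampled pulse response.
    pulse[j] is the response j unit intervals after a symbol is launched;
    bits is the data pattern as +1/-1.  Returns one received sample per UI."""
    y = [0.0] * (len(bits) + len(pulse) - 1)
    for i, b in enumerate(bits):
        for j, p in enumerate(pulse):
            y[i + j] += b * p
    return y

# Hypothetical pulse response: small precursor, attenuated cursor, long post-cursor tail.
pulse = [0.05, 0.60, 0.25, 0.12, 0.06]
cursor = 1  # index of the main cursor within `pulse`

bits = [-1, -1, -1, +1, -1, -1, -1]               # a lone '1' in a field of '0' symbols
print(received_samples(pulse, bits)[3 + cursor])  # ~0.12: severely attenuated vs. the ideal +1

# Worst-case voltage margin for a '1': every other symbol chosen to subtract from the cursor.
print(pulse[cursor] - sum(abs(p) for j, p in enumerate(pulse) if j != cursor))
```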

3.3. Equalization
Equalization techniques compensate for the frequency-dependent charac-
teristics of the channel so that the combined frequency response of the system
is nearly uniform over the frequencies of interest [4]. Imagine, for example,
that the channel response is given by the transfer function H(s). If we can process the input or output signal through another transfer function G(s) = 1/H(s), the total transfer function of the system will be H(s)G(s) = 1.
In practice, it is difficult to cancel the channel response so accurately. However, even schemes that cancel the channel response at or near the highest operational frequency, where channel losses are greatest, can provide a significant benefit to IO performance.

Figure 9. The pulse response of the differential transmission line of Figure 8. The input and output pulses are shown on different time and voltage scales (bottom-left and top-right) that have been shifted for easier comparison, with the arrows indicating the respective axes for each trace.

Equalization can occur at the transmitter, the receiver, or both. At the trans-
mitter, equalization is usually performed through pre-distortion of the input
signal processed through on-chip logic [3, 4, 11]. Symbols with high-frequency
components are injected into the line with higher amplitude while those of lower
frequency are injected with lower amplitude. The example shown in Figure 10
uses a scheme where any symbol that is different from the previously trans-
mitted symbol is sent with full amplitude. Symbols with the same value are
attenuated. This is known as two-tap de-emphasis since only two bits of history
are examined to decide the amplitude for the symbol entering the line, which
is generally referred to as the cursor.
This scheme can be extended to any arbitrary number of symbols of history,
either before or after the cursor, at the cost of power, die area, and potentially
data latency. If we represent the data stream xi with values of +1 and −1, equalization can be performed for the cursor symbol x0 through a finite impulse response (FIR) filter as given by

$$y_0 = \alpha_{-m} x_{-m} + \cdots + \alpha_{-1} x_{-1} + \alpha_0 x_0 + \alpha_1 x_1 + \cdots + \alpha_m x_m, \qquad (10)$$

where the αi represent the coefficients of equalization required to cancel the
channel response. This summation can be performed in digital logic for any arbitrary number of taps [11], but a typical 20-inch channel at 5 Gbit/s will require no or one tap prior to the cursor and one to four taps following the cursor.

Figure 10. An example two-tap de-emphasis equalized waveform at the output of the transmitter.
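Eq. (10) is an ordinary FIR filter over the symbol stream, and the two-tap de-emphasis of Figure 10 is the special case with a cursor tap and one post-cursor tap. A minimal sketch with illustrative coefficients (the 0.8/−0.2 values are assumptions, not from the text):

```python
def tx_fir(bits, taps):
    """Transmit-side FIR per Eq. (10), restricted to the cursor and earlier symbols.
    taps[0] weights the current (cursor) symbol x_i; taps[k] weights x_{i-k}."""
    out = []
    for i in range(len(bits)):
        y = 0.0
        for k, alpha in enumerate(taps):
            if i - k >= 0:
                y += alpha * bits[i - k]
        out.append(y)
    return out

# Two-tap de-emphasis (Figure 10): a symbol that repeats the previous one is attenuated,
# a symbol that differs from it is launched at full amplitude.
bits = [+1, +1, +1, -1, +1, -1, -1, -1]
print(tx_fir(bits, [0.8, -0.2]))
# -> [0.8, 0.6, 0.6, -1.0, 1.0, -1.0, -0.6, -0.6]
```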
Equalization can also be performed in the receiver through similar logic
processing means. Receiver equalization requires an accurate capture of the
initial portion of a data stream, usually through a training sequence, to provide
knowledge of the history of the data stream. It also suffers from the amplification
of the noise at the receiver along with the signal. However, the main advantage
of receiver equalization is that it may apply a larger gain than the transmitter,
which is limited in range between the maximum signal amplitude allowed out
of the transmitter and the minimum signal needed to maintain an acceptable
signal-to-noise ratio.

4. IO Clocking

Even if the link architecture can convey coherent symbols from transmitter
to receiver through a high-loss channel, the symbols must be captured at the
appropriate time and delivered synchronously to the receiver's processing unit.
Figure 11(b) shows an ideal timing diagram for edge-triggered clocking of dual data-rate input data at the receiver's data sampling unit. Both rising and falling
edges of the clock sample the input data. The clock is aligned to the center of
the data symbol, which provides the greatest voltage in the sample and also the
greatest timing margin. Any jitter of the data or the sampling clock with respect
to one another results in timing margin loss, as shown in Figure 11(c). Jitter
amounting to more than half of the symbol width causes the sampling clock to
miss the data symbol entirely.
Two standard clocking topologies, derived clocking and source-
synchronous or forwarded clocking, deliver the timing reference to the trans-
mitter and receiver.

Figure 11. Synchronous capture of input data: (a) synchronous capture circuit, (b) ideal data
and clock timing, (c) example timing with voltage noise and jitter.

4.1. Derived Clock Design


In derived clocking, a synchronization source is provided to both the trans-
mitter and the receiver. This source may be common to both, as shown in
Figure 12, or it may come from independent sources that are frequency matched
to within a certain tolerance. Phase-locked loops (PLL) in both the transmitter
and receiver multiply the input clock to the link frequency. In the transmitter,
the link frequency clock is generally used without further phase adjustment to
capture data from the internal processing unit and feed it onto the link. In the
receiver, however, the PLL output clock must be phase aligned to coincide with
the input data as shown in Figure 11(b).
The overall architecture to perform the phase alignment is shown in
Figure 12. The receiver clock from the PLL enters two phase adjustment units,
which deliver two clock phases to the input samplers. The first alignment unit adjusts the phase of the sampling clock at the receiver to coincide with the data transitions at the symbol boundary.

Figure 12. Clocking in a derived-clock architecture.

The second clock is shifted 90 degrees from the first clock and is used to sample the data at its midpoint. Following
its capture, the data typically passes to an on-chip logic clock domain derived
from the same input source but not adjusted in phase.
Figure 13 shows how over-sampling the input data in this manner provides
phase alignment information through the clock alignment decision logic. If, as
in the left two cases, the edge sampled data matches the earlier data sample,
the clocks are early and should move later. In the right two cases, the edge
samples match the later data sample, indicating that the clocks are late and
should be moved earlier. By collecting this alignment information from the
final data sampling point and feeding it back to the phase adjustment units, this
architecture can also compensate for clock distribution delays in the system.
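The clock alignment decision of Figure 13 reduces to an early/late (bang-bang) comparison between the edge sample and the data samples on either side of it. Below is a minimal sketch of that decision logic together with a simple majority-vote filter; the function names and the voting policy are illustrative assumptions, not the book's circuit.

```python
def alignment_decision(prev_data, edge_sample, next_data):
    """Early/late decision from over-sampled input (Figure 13).
    prev_data / next_data are the data samples before and after the edge sample.
    Returns the direction to move the sampling clocks, or None if there was no
    data transition and therefore no timing information."""
    if prev_data == next_data:
        return None
    if edge_sample == prev_data:
        return "move_later"    # edge sample matches the earlier data: clocks are early
    return "move_earlier"      # edge sample matches the later data: clocks are late

def vote(decisions):
    """Filter many per-symbol decisions before nudging the phase adjustment unit."""
    later = sum(d == "move_later" for d in decisions)
    earlier = sum(d == "move_earlier" for d in decisions)
    if later == earlier:
        return None
    return "move_later" if later > earlier else "move_earlier"
```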
In fact, dynamic feedback also allows the alignment units to continu-
ously track any drift between the clock and the data. Continuous alignment
is often desirable for several reasons. In mesochronous systems, described in
Section 4.3, the average frequency in the transmitter and in the receiver must
be identical. However, voltage and temperature changes can cause a change in circuit delay, shifting the data or clock in time.

Figure 13. Over-sampling the input to lock the clock to the input data.

In plesiochronous systems, also described in Section 4.3, clocks may differ slightly in frequency, causing
a continuous shift in phase between data and clock.
The scheme where the clock alignment is common across all data lanes, as
illustrated in Figure 14, works only when the skew among data lanes is low
enough to allow an adequate timing margin. When skew among data lanes
becomes large, the clock alignment must be adjusted on a per-lane basis. This
can be done by making the clock alignment decision and clock phase adjustment
for every lane, at a cost of more circuits, area, and power to duplicate these
circuit blocks in every lane.

4.1.1. Jitter in derived clock systems

Although dynamic tracking schemes follow slow drift, they are generally
unable to correct for high-frequency jitter between clock and data. Such jitter
causes misalignment of the sampling clock with respect to the data symbol and
creates loss of timing margin which, when it exceeds half of the symbol width,
causes data loss.
Timing misalignment occurs when differing jitter arises along the clock and
data paths or when clock and data paths are of unmatched lengths so that they
no longer share the common characteristics of the source clock. Specifically,
the sources of jitter include
1. Jitter injected along one path that is not injected along the other, such as
supply noise-induced delay variations in the transmitter or receiver.
2. Source clock jitter filtered by different PLLs with differing characteristics.
3. Path length differences from the common point, which separates the
original edge that creates the data from the original edge that creates the
sampling clock.

Figure 14. Clocking in a forwarded or source-synchronous clock architecture.

With enough knowledge of the system characteristics, the timing loss from
these effects can be calculated [12] or measured [14].

4.2. Source Synchronous Design

An alternative clocking structure is source synchronous or forwarded clocking, shown in Figure 14. In this architecture, a clock lane is added in parallel
with the data lanes and a clock is sent from transmitter to receiver. The for-
warded clock at the receiver is amplified, phase aligned, and distributed to the
input data samplers.
As long as the skew among data and clock lanes is within tolerable limits,
clock alignment can be performed with the in-phase clock at the forwarded
clock lane sampling input. Alignment is performed such that the in-phase clock
is placed in the edge sampling position with respect to the forwarded clock

input, as shown in Figure 13. The quadrature clock is shifted 90 degrees from
this position so that it samples the data in the center of its valid region, providing
the greatest margin for timing degradation through jitter. Alignment with the
forwarded clock input eliminates the need for in-phase over-sampling at the
data lane receivers.
Furthermore, because the forwarded clock shares the same source timing as
the input data, both data and clock experience the same timing drift and jitter
from the transmitter. This creates an inherent tracking between clock and data.
In fact, dynamic tracking is often unnecessary in forwarded clock systems.
Only periodic re-alignment is needed to compensate for temperature drift in
the receiver clock path.
As with the derived clocking case, if skew among data lanes becomes too
large, the clock alignment and phase adjustment can be pushed into every data
lane to perform the clock alignment on a per-lane basis. The over-sampling
receiver must be added back to the data lanes and the phase alignment overhead
must be duplicated for every lane.

4.2.1. Jitter in source synchronous systems

Although in source synchronous systems the clock timing is common with the data timing at the point of transmission, clock and data paths are not identically matched. The clock traverses the amplifier, phase alignment, and distri-
bution circuits in the receiver, each of which may add jitter to the clock that
will not be seen in the data. Furthermore, these circuits plus any channel skew
cause a delay mismatch between clock and data. Since the receiver clock path
delay is fixed, jitter will accumulate up to the clock and data delay difference.
Therefore, it is still critical to minimize jitter in the transmitter clock to achieve
higher speeds.

4.3. Clock Drift Considerations

The construction of the source clocks into either a forwarded or a derived clock system also affects the gross synchronization between transmitter and receiver. Two situations are possible. In the first, the source clock to the trans-
mitter and to the receiver may come from the same oscillator or from separate
oscillators that are frequency matched such that the average frequency of the
transmitter and the receiver clock is the same. This is known as a mesochronous
system. In the second situation, known as a plesiochronous system, the aver-
age clock frequency of the source clock at the transmitter and at the receiver
may differ by a small amount. The difference is usually constrained to parts per

million. For both cases, the data path topology must comprehend the difference
in data transfer rate arising from the clock system topology.
Although the rate-matching of a mesochronous system implies a straight-
forward one-for-one transfer along the data path, it is usually not so simple in
actual systems. Although the average clock frequency must match between
transmitter and receiver, short-term deviations may occur. These arise, for
example, from differences in the response of the transmitter and the receiver
PLL to phase noise from the reference oscillator or from voltage and tempera-
ture drift in the transmitter or receiver. These drifts cause the transmitter clock
to run temporarily faster or slower than the receiver clock by a slight amount.
To overcome the data-rate difference, we can buffer the data in a first-in first-out (FIFO) structure. This makes data available to the receiver in case its clock
temporarily speeds up and also provides a buffer for received data in case the
transmitter clock temporarily speeds up.
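A minimal sketch of the rate-matching FIFO described above, with a plain Python deque standing in for the hardware structure; the depth and the overflow/underflow policy are illustrative assumptions.

```python
from collections import deque

class ElasticFifo:
    """Rate-matching buffer between the received-data clock domain and the
    receiver's local clock domain in a mesochronous link.  Short-term frequency
    wander is absorbed as long as occupancy stays between empty and full."""

    def __init__(self, depth=8):
        self.buf = deque()
        self.depth = depth

    def push(self, symbol):
        """Called once per received symbol (paced by the transmitter's clock)."""
        if len(self.buf) >= self.depth:
            raise OverflowError("transmitter ran ahead of the receiver for too long")
        self.buf.append(symbol)

    def pop(self):
        """Called once per local receiver clock; returns None if the buffer is empty
        (the receiver clock has temporarily run ahead of the transmitter)."""
        return self.buf.popleft() if self.buf else None
```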
In a plesiochronous system, the clock frequency difference implies a con-
tinuous difference in the data rate between transmitter and receiver. Buffering
the data does not provide a solution since any finite buffer will eventually run
out. Possible solutions include either handshake mechanisms or skip charac-
ters. Handshake mechanisms work by transferring data only when it becomes
available [15]. This can be done either per serial bit or by constructing the data
into a parallel packet and transferring it when the packet is complete. Skip
characters work by allowing the receiver to ignore or add in occasional null
data sequences so that the same effective data transfer rate can be maintained.

5. Conclusions
Significant physical effects impede the operation of off-chip signaling at
speeds higher than 1 to 2 Gbit/s. These include transmission-line effects such
as dielectric loss and skin effect loss, inter-symbol interference, and clock-
and data-skew loss. These effects can be overcome through the dedication
of more on-chip computational resources to process the signal in a way that
compensates for these effects. Such schemes include precision calibration of
signaling levels, equalization, and clock phase adjustment. Further methods
will be required to achieve speeds in excess of 5 Gbit/s.

Acknowledgements

The author is grateful to Xiaoxiong (Kevin) Gu and Mohiuddin Mazumder


for providing the channel characteristics and simulation results. Thanks to Ken
Drottar and Pascal Meier for valuable feedback.

References
[1] Bakoglu, H.B. Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley,
1990.
[2] Dabral, S.; Maloney, T.J. Basic ESD and I/O Design, John Wiley & Sons, 1998.
[3] Dally, W.J.; Poulton, J.W. Digital Systems Engineering, Cambridge University Press,
1998.
[4] Dally, W.J.; Poulton, J.W. Transmitter equalization for 4-Gbps signaling, IEEE
Micro, 1997, 48–56.
[5] Horowitz, M.; Yang, C.-K.K.; Sidiropoulos, S. High-speed electrical signaling:
overview and limitations, IEEE Micro, 1998, 12–24.
[6] Johnson, H.W.; Graham, M. High-Speed Digital Design: A Handbook of Black Magic,
Prentice Hall, 1993.
[7] Hall, S.H.; Hall, G.W.; McCall, J.A. High-Speed Digital System Design: A Handbook
of Interconnect Theory and Design Practices, John Wiley & Sons, 2000.
[8] Sidiropoulos, S.; Yang, C.-K.K.; Horowitz, M. High-speed inter-chip signaling.
In Design of High-Performance Microprocessor Circuits, Anantha Chandrakasan,
William J. Bowhill, and Frank Fox, eds., IEEE Press, 2000.
[9] Thierauf, S.C.; Anderson, W.R. I/O and ESD circuit design. In Design of High-
Performance Microprocessor Circuits, Anantha Chandrakasan, William J. Bowhill,
and Frank Fox, eds., IEEE Press, 2000.
[10] Casper, B.K.; Haycock, M.; Mooney, R. An accurate and efficient analysis method
for multi-Gb/s chip-to-chip signaling schemes, Symposium on VLSI Circuits, 2002,
54–57.
[11] Erdogan, A.T.; Arslan, T.; Horrocks, D.H. Low-power multiplication schemes for
single multiplier CMOS based FIR digital filter implementations, IEEE Int. Symp.
Circuits Systems, 1997, 1940–1943.
[12] PCI Express Jitter Modeling (July 14, 2004), http://www.pcisig.com.
[13] Stojanovic, V.; Horowitz, M. Modeling and analysis of high-speed links, Custom
Integrated Circuits Conference, September 2003.
[14] Kossel, M.A.; Schmatz, M.L. Jitter measurements of high-speed serial links, IEEE
Design Test Comput., 2004, 536–543.
[15] PCI Express Base Specication, Revision 1.1 (March 28, 2005),
http://www.pcisig.com.
