Master Thesis: Design and Implementation of a DDR SDRAM Controller for System on Chip
Magnus Själander
Abstract
The aim of this study was to investigate the problems associated with the design and implementation of a DDR SDRAM Controller for CMOS technology. The study has led to a working implementation of a DDR SDRAM Controller that is meant to be used as a reference for future implementations. The report highlights design issues and proposes solutions to problems such as data resynchronization and phase shifting of the data strobe. The design issues identified are common to any Double Data Rate interface, so the results are not limited to the design and implementation of a DDR SDRAM Controller but apply to any application that needs the higher performance of a Double Data Rate interface.
Acknowledgement
I would like to thank the people at the unit of ASIC Technology & System on Chip at Ericsson AB in Mölndal for all the support I received, especially my supervisor Fredrik Johansson. Further, I want to thank Martin Johansson and Tomas Sand for their help with the backend work and for discussions about various timing problems. I would also like to thank Sten Gunnarsson for all the interesting conversations about everything from analog filters and mixers to rock climbing and hiking, which, together with our weekly climbing, made my time in Göteborg that much more interesting. Finally, I would like to thank Per Lindgren for acting as my examiner at Luleå University of Technology.

1 Introduction
The demand for ever higher performance in highly integrated Systems on Chip, which can consist of several processing units such as DSP and CPU cores, has led to an increased demand for bandwidth to off-chip memory. This demand has been recognized by the unit of ASIC Technology & System on Chip within Ericsson AB, which, in cooperation with EISLAB at Luleå University of Technology, has conducted an investigation in the form of a master thesis in computer engineering. The investigation explores whether the DDR SDRAM standard [12] can meet Ericsson AB's memory bandwidth needs for future system platforms in the telecom industry. The goal was to investigate whether it is possible to implement a DDR SDRAM Controller as an Intellectual Property core (IP core). For the DDR SDRAM Controller to be general and usable as an IP core for System on Chip production, it was decided that it would be based on an AMBA Advanced High-performance Bus (AHB) interface [12] for on-chip communication. DDR SDRAM was chosen because it has become a commodity used in everyday desktop computers, which has led to an attractive price/performance ratio. Furthermore, DDR SDRAM offers significantly higher bandwidth than the standard SDRAM solutions that Ericsson AB's system platforms have been based on to date. The AMBA AHB interface was chosen because it is an acclaimed industry standard that is widely used, including within Ericsson AB.
2 Double Data Rate Source-Synchronous Interfaces
3 DDR SDRAM
  3.1 Architecture
    3.1.1 A closer look on a read and write operation
    3.1.2 Why is the memory dynamic and needs to be refreshed
    3.1.3 Frequency considerations of SDR and DDR SDRAM
  3.2 Commands
    3.2.1 Activation and Precharge
    3.2.2 Read
    3.2.3 Write
    3.2.4 Refresh
    3.2.5 Mode Register Set
    3.2.6 Extended Mode Register Set
  3.3 Initialization
  3.4 Addressing
4 High Speed Low Voltage Swing Bus
  4.1 Reflections
  4.2 Input Buffer
  4.3 Differential Signals
  4.4 Tristate Signals
3 DDR SDRAM
As with standard SDRAM, the architecture is pipelined and consists of multiple banks, allowing concurrent operation and thereby providing high effective bandwidth [12], [12] and [12]. All reads and writes are burst oriented, with programmable burst lengths of 2, 4 or 8 beats (other vendor-specific burst lengths might exist). A DDR SDRAM is divided into four banks, each consisting of a number of rows. A row is in turn divided into columns, each of which contains 32 bits of data. The number of rows and columns depends on the size of the DDR SDRAM and its internal organization. The four-bank organization makes it possible to issue commands to each bank independently. It is not possible, though, to perform more than one read or write operation at a time, since the data bus can only carry the data of one operation. However, while a read or write burst is in progress in one bank, it is possible to precharge and activate a row in one of the other banks.
3.1 Architecture
The memory array of the DDR SDRAM is divided into four equally large memory banks. Each memory bank consists of a number of row and column select lines [12]. At each crossing of a row and a column select line there is a transistor and a capacitor (M1 and Cs in Figure 2), called a 1T memory cell since there is only one transistor for every bit of information stored. Each capacitor stores one bit of information, and the transistor is used to access the data stored in the capacitor. Before any read or write operation can be performed, a row has to be activated within one of the banks. Activation is done by first decoding the row address into a word line, which turns on all the transistors in that row so that the data of those memory cells is read out to the sense amplifiers. Once the data has been read out to the sense amplifiers, the first read or write operation can be performed by decoding the column address, which connects the selected sense amplifiers to the data lines. Before another row can be accessed, the bank first has to be precharged.
3.1.1 A closer look on a read and write operation
To read data from a row, the bit line (the same as the column line) has to be precharged to VDD/2. When the bit line has been precharged, the word line is activated, which turns transistor M1 on, and the charge in capacitor Cs is redistributed between Cs and CBL, where CBL is the parasitic capacitance of the bit line. The redistribution of charge changes the voltage on the bit line, which can be detected by the sense amplifier [12]. The sense amplifier is a differential voltage amplifier with positive feedback. It uses an inactive bit line BL*, also precharged to VDD/2, as the reference voltage against which the voltage on the bit line with the memory cell is compared. If the memory cell contains a zero, meaning that capacitor Cs holds no charge, then when M1 is turned on, charge from the bit line (represented by CBL) is redistributed between Cs and CBL. This lowers the voltage on the bit line; the sense amplifier detects that the bit line voltage is lower than the reference voltage and, because of its positive feedback, drives the bit line towards ground. If, on the other hand, the memory cell contains a one, meaning that Cs is fully charged, then when M1 is turned on the charge is redistributed, resulting in a net increase of charge on the bit line and a decrease of charge in Cs. This raises the voltage on the bit line, and the sense amplifier drives the bit line towards VDD.
During a write operation, the sense amplifier is forced to change its state by driving the bit line either to ground or to VDD, depending on whether the data to be written is a zero or a one. The data held by the sense amplifiers propagates along the bit lines to the capacitors of the memory cells, which are charged or discharged according to the data to be stored. When the row is precharged, transistor M1 is turned off, which stores the charge in the capacitors.
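The charge-sharing step can be made concrete with conservation of charge. After M1 turns on, the bit line settles at (Cs·Vcell + CBL·VDD/2)/(Cs + CBL), so the signal the sense amplifier sees is ΔV = (Vcell − VDD/2)·Cs/(Cs + CBL). The sketch below evaluates this; the capacitances and the voltage are hypothetical illustrative values, not figures for a particular DDR SDRAM, and the sense amplifier is idealized as a comparator against BL*.

```python
def bitline_voltage(v_cell: float, c_s: float, c_bl: float, vdd: float) -> float:
    """Bit-line voltage after charge sharing between the cell and the bit line.

    The bit line starts precharged to vdd / 2 and the cell holds v_cell (0 or vdd).
    Charge is conserved: c_s * v_cell + c_bl * vdd / 2 = (c_s + c_bl) * v_final.
    """
    return (c_s * v_cell + c_bl * vdd / 2) / (c_s + c_bl)


def sense(v_bl: float, v_ref: float) -> int:
    """Idealized sense amplifier: compare the bit line with the reference BL*."""
    return 1 if v_bl > v_ref else 0


if __name__ == "__main__":
    VDD = 2.5         # hypothetical array voltage (V)
    C_S = 30e-15      # hypothetical cell capacitance (F)
    C_BL = 300e-15    # hypothetical bit-line capacitance (F)
    for stored in (0, 1):
        v = bitline_voltage(stored * VDD, C_S, C_BL, VDD)
        print(f"stored {stored}: dV = {(v - VDD / 2) * 1e3:+.1f} mV, sensed {sense(v, VDD / 2)}")
```

Because CBL is typically much larger than Cs, the swing is only a small fraction of VDD/2, which is why a sensitive differential amplifier with positive feedback is needed.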
Figure 2 Internal architecture of an SDRAM with a sense amplifier and memory cell
The architecture and type of sense amplifier shown in Figure 2 is only one example of how a DDR SDRAM can be constructed. In an actual DDR SDRAM, sense amplifiers are interleaved with the memory array between the row select lines, which is not shown.
Figure 4 Memory cell in top view
Figure 3 and Figure 4 show an example of how a memory cell can be constructed in silicon. The trench works as the capacitor Cs and stores the charge. The word line crosses over the gate of the transistor, and the bit line is connected to the source through a via, a connection between the different layers of the die. The drain of the transistor is in direct contact with the trench.
3.1.2 Why is the memory dynamic and needs to be refreshed
Transistor M1 can never be made ideal and will always leak some current. This leakage will eventually corrupt the data stored in capacitor Cs unless something is done to prevent it. To prevent loss of data, the DDR SDRAM has to be refreshed within a certain period of time, which is determined by how fast the voltage of the memory cell capacitor changes because of the leakage current through the access transistor. Refresh commands usually have to be issued every few microseconds. When the DDR SDRAM is refreshed, the information in the memory cells is read out and at the same time written back, restoring the voltage of each memory cell capacitor to its previous value of VDD or ground.
3.1.3 Frequency considerations of SDR and DDR SDRAM
In a Single Data Rate (SDR) SDRAM, the data is read out and then clocked onto the data bus on the next rising edge of the clock. Since reading out the data takes about 7 ns in a standard SDRAM, this limits the clock period to a practical minimum of about 7.5 ns.
Figure 5 Single Data Rate SDRAM clock period limit
The time to read out the data is limited by the time it takes to decode the column address and propagate the values from the sense amplifiers to the output buffers. This time is not easily reduced and is therefore a limiting factor on the clock frequency of standard SDR SDRAM. In the DDR SDRAM, this limitation is solved by using a Delay Lock Loop (DLL) to delay the clock signal that clocks the data at the output buffers [12] (Figure 6). With the DLL, the DDR SDRAM can be clocked at a higher frequency than is possible with a regular SDR SDRAM, as shown in Figure 7.
Figure 7 Comparison of an SDR and a DDR SDRAM with DLL
For the DDR SDRAM to deliver data at the rapid pace required, a method called 2n-prefetch is used [12]. The method is quite simple: on every rising edge of the clock, data is read out from both the current address and the following address. In Figure 8, this can be seen as the 64-bit bus from the sense amplifiers to the 2n-prefetch block. The data on the 64-bit bus is then multiplexed onto the output bus, with the first 32 bits on the rising edge and the next 32 bits on the falling edge. As can be seen in Figure 9, two consecutive operations overlap, so data from more than one operation propagates along the internal bus lines at the same time. This makes the timing requirements strict and creates the need for the DLL, which can delay the clock with only 100 picoseconds of jitter. A write is done in the same way: data from two beats is sampled by the data input registers and then written to the array simultaneously. This technique limits the burst length to a minimum of two beats, and the burst length can only be a multiple of two.
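A minimal behavioural sketch of 2n-prefetch for a 32-bit-wide device: each internal access fetches the current and the following 32-bit word as one 64-bit value, which is then sent out as two beats, one per clock edge. For simplicity it assumes an even start address and an even burst length and ignores the wrapping burst order of Chapter 3.4; the function name is hypothetical.

```python
def prefetch_serialize(array: list[int], start: int, burst_length: int) -> list[tuple[int, str, int]]:
    """Return (clock_cycle, edge, data) for every beat of a read burst.

    One 64-bit internal read per clock: the word at `addr` goes out on the
    rising edge, the word at `addr + 1` on the falling edge.
    """
    if start % 2 or burst_length % 2:
        raise ValueError("2n-prefetch needs an even start address and an even burst length")
    beats = []
    for cycle in range(burst_length // 2):
        addr = start + 2 * cycle
        wide_word = (array[addr] << 32) | array[addr + 1]    # 64-bit prefetch
        beats.append((cycle, "rising", wide_word >> 32))      # first 32 bits
        beats.append((cycle, "falling", wide_word & 0xFFFF_FFFF))  # next 32 bits
    return beats
```

The internal array therefore only has to deliver one access per clock cycle, even though the external bus transfers two beats per cycle.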
3.2 Commands
The command set of the DDR SDRAM is almost identical to that of its predecessor, SDRAM. The Extended Mode Register Set command is the only new command; it is used to control the Delay Lock Loop of the DDR SDRAM. Not all commands are described in this report; for the full set of available commands, see JESD79.
3.2.1 Activation and Precharge
Before any read or write operation can be issued, the row in the bank to be accessed has to be activated. Activation is done by placing the command on the command bus along with the bank and row to be activated. After a specific row in a bank has been activated, read and write operations to that bank can only be issued to that row. To perform a read or write operation to another row within the same bank, the currently activated row has to be closed by issuing a precharge to that bank, after which the row to be accessed can be activated. Since there are four banks, it is possible to have four activated rows at the same time, one in each bank.
3.2.2 Read
A read command to an activated row is issued by placing the command on the command bus along with the starting column address of the first location to be read. After a specified delay, the CAS latency, the first location of data is presented on the data bus along with the data strobe signal generated by the DDR SDRAM. Unless the read burst is terminated or interrupted by another read command, the number of locations given by the burst length programmed with the MRS command is presented on the data bus. The data strobe is edge-aligned with the data on the data bus. The DDR SDRAM puts the data strobe and the data bus into High-Z when no read burst is in progress.
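Counting the CAS latency in clock cycles from the READ command, the beats of a read burst then follow at half-cycle intervals, two per clock. The sketch below computes when each beat appears; it ignores the read preamble and postamble and the output skew of the device, and uses `Fraction` so that CAS latencies such as 2.5 stay exact. The function name is hypothetical.

```python
from fractions import Fraction


def read_beat_times(cas_latency: Fraction, burst_length: int, command_cycle: int = 0) -> list[Fraction]:
    """Clock cycle (in units of tCK) at which each beat of a read burst appears."""
    # First beat after the CAS latency, then one beat per clock edge (every half cycle).
    return [command_cycle + cas_latency + Fraction(beat, 2) for beat in range(burst_length)]


if __name__ == "__main__":
    for beat, t in enumerate(read_beat_times(Fraction(5, 2), 8)):
        print(f"beat {beat}: cycle {float(t):.1f}")
```

A controller can use such a table to decide when the data bus is free again, since the DDR SDRAM releases DQ and DQS to High-Z only after the last beat.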
Figure 10 Wave chart of a read burst with eight beats
3.2.3 Write
A write command to an activated row is issued by placing the command on the command bus along with the starting column address of the first location to be written. Within a specified time after the command has been issued, the data strobe generated by the memory controller has to make a LOW-to-HIGH transition, marking the first data presented on the data bus together with the data mask signal. The data strobe is center-aligned with the data on the data bus, which makes it possible for the DDR SDRAM to sample the data on the edges of the data strobe. The data mask selects which byte(s) of the presented data are masked out and therefore not stored in the memory. The memory controller puts the data strobe and the data bus into High-Z when no write burst is in progress.
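The effect of the data mask on a single write beat can be sketched as a byte-lane merge. The sketch assumes a 32-bit data bus with one DM bit per byte lane, where a DM bit of 1 means that byte is not written (the JESD79 convention); the function name is hypothetical.

```python
def apply_data_mask(stored: int, written: int, dm: int, width_bytes: int = 4) -> int:
    """Merge one write beat into the stored word, honouring the data mask.

    Bit i of `dm` controls byte lane i (DQ[8i+7:8i]); a 1 keeps the stored byte.
    """
    result = 0
    for lane in range(width_bytes):
        shift = 8 * lane
        source = stored if (dm >> lane) & 1 else written   # masked lanes keep old data
        result |= ((source >> shift) & 0xFF) << shift
    return result
```

The controller relies on exactly this behaviour when it has to write only part of a burst, for example when the first beat of a burst must be skipped.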
Figure 11 Wave chart of a write burst with eight beats
3.2.4 Refresh
The DDR SDRAM is a dynamic memory and has to be refreshed at regular intervals to recharge the capacitors that represent the stored data. The longest allowed interval between two refreshes depends on the size of the memory but is never shorter than 7.8 µs. To improve efficiency in scheduling and switching between tasks, some flexibility in the absolute refresh interval is provided: a maximum of eight REFRESH commands can be posted to any given DDR SDRAM, which means that the maximum absolute interval between two REFRESH commands is nine times the defined refresh interval of the DDR SDRAM used. Before a REFRESH command can be applied, all banks have to be idle, which can be achieved by applying a PRECHARGE ALL command before the REFRESH command is issued.
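A refresh scheduler can be sketched as a counter of refreshes that have fallen due. The numbers are assumptions for illustration: a 64 ms retention period covered by 8192 REFRESH commands gives an average refresh interval (tREFI) of 7.8125 µs, and the sketch allows at most eight refreshes to be postponed before one must be issued. The class and constant names are hypothetical.

```python
REFRESH_PERIOD_US = 64_000   # assumed: every row refreshed within 64 ms
REFRESH_COMMANDS = 8192      # assumed: REFRESH commands needed per period
T_REFI_US = REFRESH_PERIOD_US / REFRESH_COMMANDS  # average interval: 7.8125 us
MAX_POSTED = 8               # at most eight REFRESH commands may be postponed


class RefreshScheduler:
    """Counts refreshes that have fallen due but have not been issued yet."""

    def __init__(self) -> None:
        self.owed = 0
        self.next_due_us = T_REFI_US

    def advance(self, now_us: float) -> None:
        """Add one owed refresh for every tREFI boundary reached by now_us."""
        while now_us >= self.next_due_us:
            self.owed += 1
            self.next_due_us += T_REFI_US

    def may_refresh(self) -> bool:
        """A refresh is due and can be slotted in when the bus is idle."""
        return self.owed > 0

    def must_refresh(self) -> bool:
        """No further postponement allowed: REFRESH gets top priority."""
        return self.owed >= MAX_POSTED

    def issued_refresh(self) -> None:
        """Record a REFRESH (sent after PRECHARGE ALL has closed every bank)."""
        if self.owed > 0:
            self.owed -= 1
```

Postponing refreshes this way lets a long burst finish first; `must_refresh` is the point where the controller has to interrupt normal traffic.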
3.2.5 Mode Register Set
The Mode Register Set command is for writing data to the Mode Register in the DDR SDRAM which defines the specific mode of operation of the DDR SDRAM. For a full description of the Mode Register see Figure 13.
Figure 13 Mode Register
Address bits:  BA1 BA0 | An-A7          | A6 A5 A4    | A3 | A2 A1 A0
Field:         0   0   | Operating Mode | CAS Latency | BT | Burst Length

Burst Length
A2 A1 A0   A3 = 0 (Sequential)   A3 = 1 (Interleaved)
0  0  0    Reserved              Reserved
0  0  1    2                     2
0  1  0    4                     4
0  1  1    8                     8
1  0  0    Reserved              Reserved
1  0  1    Reserved              Reserved
1  1  0    Reserved              Reserved
1  1  1    Reserved              Reserved

Burst Type (BT)
A3   Burst Type
0    Sequential
1    Interleaved

CAS Latency
A6 A5 A4   CAS Latency
0  0  0    Reserved
0  0  1    Reserved
0  1  0    2
0  1  1    3 (optional)
1  0  0    Reserved
1  0  1    1.5 (optional)
1  1  0    2.5
1  1  1    Reserved

Operating Mode
An-A9  A8  A7  A6-A0  Operating Mode
0      0   0   Valid  Normal operation
0      1   0   Valid  Normal operation / Reset DLL
0      0   1   VS     Vendor-specific test mode
All other states reserved.
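The field layout of the Mode Register translates directly into the value placed on the address bus during a MODE REGISTER SET command (with BA1 = BA0 = 0). The sketch below assumes the JESD79 bit positions: burst length in A2-A0, burst type in A3, CAS latency in A6-A4 and DLL reset in A8, with A7 = 0 for normal operation. The function and dictionary names are hypothetical.

```python
BURST_LENGTH_CODE = {2: 0b001, 4: 0b010, 8: 0b011}
CAS_LATENCY_CODE = {2: 0b010, 3: 0b011, 1.5: 0b101, 2.5: 0b110}


def mode_register_value(burst_length: int, cas_latency: float,
                        interleaved: bool = False, reset_dll: bool = False) -> int:
    """Address-bus value for a MODE REGISTER SET command (BA1 = 0, BA0 = 0)."""
    value = BURST_LENGTH_CODE[burst_length]        # A2..A0: burst length
    value |= int(interleaved) << 3                 # A3: 0 sequential, 1 interleaved
    value |= CAS_LATENCY_CODE[cas_latency] << 4    # A6..A4: CAS latency
    value |= int(reset_dll) << 8                   # A8: reset DLL (A7 = 0, normal mode)
    return value


if __name__ == "__main__":
    print(hex(mode_register_value(8, 2.5, reset_dll=True)))  # MRS during initialization
    print(hex(mode_register_value(8, 2.5)))                  # final MRS, DLL reset cleared
```

A KeyError for an unsupported burst length or CAS latency is deliberate in this sketch: reserved encodings should never reach the device.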
3.2.6 Extended Mode Register Set
The Extended Mode Register Set command writes data to the Extended Mode Register of the DDR SDRAM, which controls functions such as the Delay Lock Loop. For a full description of the Extended Mode Register, see Figure 14.
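Figure 14 Extended Mode Register
Address bits:  BA1 BA0 | An-A3          | A2 | A1 | A0
Field:         0   1   | Operating Mode | 0  | DS | DLL
A0 (DLL): 0 = Enable, 1 = Disable
A1 (DS, drive strength): 0 = Normal, 1 = Reduced
A2: 0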
3.3 Initialization
The DDR SDRAM has to be powered up and initialized in a predefined manner to ensure correct operation of the memory. Power-up requires the different supply voltages to be applied in a predefined order (the exact order can be found on page 7 of the JESD79 specification). During power-up, CKE must be held LOW to guarantee that the DQ and DQS outputs are in the High-Z state. After all supply and reference voltages as well as the clock are stable, the DDR SDRAM requires a 200 µs delay before any executable command is applied. Once the 200 µs delay has been satisfied, initialization of the DDR SDRAM can start. Initialization starts with applying a NOP or DESELECT command and raising CKE. The NOP is followed by a PRECHARGE ALL command and then an EXTENDED MODE REGISTER SET (EMRS) command to enable the internal DLL. Next, a MODE REGISTER SET (MRS) command is applied to reset the DLL and to program the operating parameters of the DDR SDRAM. After the DLL has been reset, 200 clock cycles must pass before any READ command is applied (some vendors additionally require that no bank be activated within these 200 clock cycles). After the DLL reset, a PRECHARGE ALL command should be applied to place the DDR SDRAM in the "all banks idle" state. Two AUTO REFRESH cycles then have to be performed, followed by an MRS command with the DLL reset bit deactivated. After this, the DDR SDRAM is ready for normal operation.
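The initialization order above can be sketched as a simple command schedule. This is illustrative only: `gap` is a hypothetical fixed spacing between commands, whereas a real controller must respect tRP, tMRD and tRFC from the data sheet. The 200 µs wait and the 200-cycle DLL lock time are taken from the text; the function names are hypothetical.

```python
import math

# Command order from the initialization description above.
INIT_COMMANDS = (
    "NOP (raise CKE)",
    "PRECHARGE ALL",
    "EMRS (enable DLL)",
    "MRS (reset DLL, A8=1)",
    "PRECHARGE ALL",
    "AUTO REFRESH",
    "AUTO REFRESH",
    "MRS (A8=0)",
)


def init_schedule(t_ck_ns: float, gap: int = 4) -> list[tuple[int, str]]:
    """Return (clock cycle, command) pairs for DDR SDRAM initialization.

    The first command waits 200 us after power and clock are stable.
    """
    cycle = math.ceil(200_000 / t_ck_ns)   # 200 us expressed in clock cycles
    schedule = []
    for command in INIT_COMMANDS:
        schedule.append((cycle, command))
        cycle += gap                       # placeholder spacing, see lead-in
    return schedule


def earliest_read(schedule: list[tuple[int, str]]) -> int:
    """First cycle a READ may be issued: 200 cycles after the DLL reset and after the final MRS."""
    reset_cycle = next(c for c, cmd in schedule if cmd.startswith("MRS (reset DLL"))
    return max(reset_cycle + 200, schedule[-1][0] + 1)


if __name__ == "__main__":
    sched = init_schedule(7.5)             # 7.5 ns clock period
    for cycle, command in sched:
        print(cycle, command)
    print("earliest READ:", earliest_read(sched))
```
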
Initialize and Mode Register Set: VDD, VDDQ, VTT and VREF power-up, T = 200 µs, Extended Mode Register Set (BA0 = H, BA1 = L), Load Mode Register and Reset DLL (with A8 = H, BA0 = L, BA1 = L)
3.4 Addressing
The burst addressing of the DDR SDRAM can be either sequential or interleaved. Sequential addressing wraps within the default burst length. For example, if the default burst length is four, the two least significant bits of the address wrap: when the two least significant bits are 11, the address wraps and they become 00 without the rest of the address changing.
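The two burst orders can be expressed in a few lines. In the sketch, the column-address bits below the burst length either count up and wrap (sequential) or are XORed with the beat index (interleaved, as defined in JESD79), while the upper bits stay fixed. The function name is hypothetical.

```python
def burst_order(start: int, burst_length: int, interleaved: bool = False) -> list[int]:
    """Column addresses accessed by one burst of `burst_length` beats starting at `start`."""
    base = start & ~(burst_length - 1)      # upper bits: fixed during the burst
    offset = start & (burst_length - 1)     # lower bits: wrap inside the burst
    if interleaved:
        return [base | (offset ^ i) for i in range(burst_length)]
    return [base | ((offset + i) % burst_length) for i in range(burst_length)]
```

For example, `burst_order(0b1011, 4)` gives 11, 8, 9, 10: the two least significant bits wrap from 11 to 00 while the rest of the address is unchanged.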
4.1 Reflections
Whenever a signal is transmitted on a wire whose length is close to or longer than the wavelength of the signal, reflections will occur to some extent. Every reflection contributes to the distortion of the signal, and if reflections are not minimized they can corrupt the whole signal. To minimize reflections, the designer strives to make the signal path matched. A matched signal path means that at every cross section of the signal path the impedance is equal in both directions (Figure 16).
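How well a junction is matched is commonly quantified with the reflection coefficient Γ = (ZL − Z0)/(ZL + Z0), the fraction of an incident voltage wave that is reflected where a line of impedance Z0 meets an impedance ZL. A matched junction gives Γ = 0. The sketch below only evaluates this standard formula; the 50 Ω line and the load values are example numbers.

```python
def reflection_coefficient(z_load: float, z_line: float) -> float:
    """Fraction of an incident voltage wave reflected where z_line meets z_load."""
    return (z_load - z_line) / (z_load + z_line)


if __name__ == "__main__":
    Z0 = 50.0  # example line impedance (ohm)
    for z_load in (50.0, 25.0, 100.0, 1e9):  # matched, lower, higher, nearly open
        print(f"Z_L = {z_load:g} ohm -> reflection coefficient {reflection_coefficient(z_load, Z0):+.3f}")
```

A negative coefficient means the reflected wave is inverted, a positive one that it adds to the incident wave; both distort the received edge.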
A common method to match a wire to the impedance of the output and input buffers is to use stubs, which are wires with a length equal to a quarter of the wavelength of the signal to be transmitted. A stub has to be placed a certain distance from the input or output buffer. The exact workings of a stub and how to design one are beyond the scope of this report [12]. In the ideal case, a stub cancels out all reflections at the particular frequency it is designed for; in any practical case, some reflections always remain. To make the situation even worse, the square wave used for digital transmission is built up of many sine waves, and only one frequency can be matched with a stub. The fundamental sine wave is the most dominant and is therefore the one matched with the stubs. To reduce the signal distortion caused by reflections, the signal path with the stubs can be isolated by a resistor in series with the wire, RS in Figure 17. The resistor damps the signal and thus makes the reflections dissipate faster.
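The remark that a stub matches only one frequency can be illustrated numerically. An ideal square wave contains the odd harmonics n·f with relative amplitudes 4/(nπ), and a quarter-wave stub for frequency f has length v/(4f), where v = c/√εr. The sketch below assumes εr = 4.0 and a 133 MHz clock purely as example values; it shows that each harmonic would need a different stub length.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def quarter_wave_length(freq_hz: float, eps_r: float) -> float:
    """Length in metres of a quarter-wave stub at freq_hz on a line with permittivity eps_r."""
    velocity = C0 / math.sqrt(eps_r)
    return velocity / freq_hz / 4


def square_wave_harmonics(freq_hz: float, count: int) -> list[tuple[float, float]]:
    """First `count` odd harmonics of an ideal square wave as (frequency, amplitude / peak)."""
    return [(n * freq_hz, 4 / (math.pi * n)) for n in range(1, 2 * count, 2)]


if __name__ == "__main__":
    EPS_R = 4.0        # assumed effective permittivity of the board
    CLOCK_HZ = 133e6   # example clock frequency
    for f, amplitude in square_wave_harmonics(CLOCK_HZ, 3):
        print(f"{f / 1e6:6.0f} MHz  amplitude {amplitude:.3f}  "
              f"quarter-wave stub {quarter_wave_length(f, EPS_R) * 100:.1f} cm")
```

The fundamental carries the largest amplitude, which is why it is the frequency the stub is designed for, while the higher harmonics remain unmatched.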
Figure 19 Illustration of the effect of different slew rates for rising and falling edges
Figure 20 Example of SSTL_2, Class II, differential signals
To overcome this problem, differential signals can be used, where an inverted version of the signal is transmitted together with the original signal (Figure 20). The inverted signal is then used as the reference voltage for the differential amplifier of the receiver, which gives equally long high and low periods (Figure 21).
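Why the differential receiver restores the duty cycle can be shown with linear edges. The sketch assumes a rise rate that differs from the fall rate, both wires of the differential pair behaving the same way, and a single-ended receiver comparing against a fixed VREF at half the swing. The single-ended high time then depends on the slew-rate mismatch, while the crossing point of the signal and its complement occurs equally late on both edges. Function names and units (ns, V, V/ns) are illustrative assumptions.

```python
def single_ended_high_time(period: float, swing: float, rise_rate: float, fall_rate: float) -> float:
    """High time seen by a receiver comparing one wire with a fixed VREF = swing / 2.

    Linear edges: the rising edge starts at t = 0, the falling edge at period / 2.
    """
    t_rise_cross = (swing / 2) / rise_rate
    t_fall_cross = period / 2 + (swing / 2) / fall_rate
    return t_fall_cross - t_rise_cross


def differential_high_time(period: float, swing: float, rise_rate: float, fall_rate: float) -> float:
    """High time seen by a receiver comparing the signal with its complement.

    On every transition one wire rises while the other falls, so the two wires
    cross swing / (rise_rate + fall_rate) after the edge starts, on both edges.
    """
    t_rise_cross = swing / (rise_rate + fall_rate)
    t_fall_cross = period / 2 + swing / (fall_rate + rise_rate)
    return t_fall_cross - t_rise_cross
```

With a 7.5 ns period, a 1 V swing, a 1 V/ns rise rate and a 2 V/ns fall rate, the single-ended high time is 3.5 ns instead of 3.75 ns, while the differential high time stays at half the period.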
Figure 21 Differential signal wave chart
The resistor values in Figure 17 and Figure 20 are only examples; actual values are a system design decision.
5 DDR SDRAM Controller
  5.1 Core Memory Controller
    5.1.1 Current and Next Address
    5.1.2 Open Banks
    5.1.3 Read Write Command
    5.1.4 Command Timing
    5.1.5 Address Handling
    5.1.6 Crossing a Row Boundary
  5.2 AHB Interface
    5.2.1 AHB Core
    5.2.2 Data Buffer
    5.2.3 x2
  5.3 APB
  5.4 Arbiter
6 Source Resynchronization
  6.1 Phase Shift the Data Strobe
    6.1.1 Delay Lock Loop
    6.1.2 Inverter Delay
    6.1.3 PCB Line Delay
    6.1.4 Programmable Delay Line with Temperature Sensing
  6.2 Synchronize the Data
    6.2.1 Simplified Phase Detector
    6.2.2 Alternative Method
7 Functional Simulation
  7.1 Performance
  7.2 DDR SDRAM Simulation Models
8
  8.1 Synthesis
  8.2 Floorplan
  8.3 Placement
  8.4 Timing Problems
Figure 23 Schematic view of the memory controller with one AHB interface
When an AHB interface has been granted access to the data bus and the core memory controller, it can tell the core memory controller to perform either a read or a write operation, and how long the burst is going to be. The core memory controller then handles the activation of rows, splits the burst into more than one command if necessary, and issues the commands to the DDR SDRAM. It then signals to the AHB interface when data has to be sampled from or presented on the data bus, depending on whether a read or a write operation has been requested. The core memory controller handles all the timing constraints that determine when a command can or has to be issued to the DDR SDRAM.
Figure 24 Schematic view of the Core Memory Controller
5.1.1 Current and Next Address
The Address decoder divides the 32-bit address into a Column Address, a Row Address, a Bank Address and the Chip the burst is meant for. Since the widths of the column and row addresses change with the size and internal organization of the DDR SDRAM, the Address decoder has to know which DDR SDRAM is in use. This information is set up through the APB interface before initialization of the DDR SDRAM is started. The Address decoder for the current address also signals whether the next burst will end at the boundary to the next row. When the Address decoder is asked to increment the address, it does so in a predefined manner that makes incremental bursts possible even though the addressing of the DDR SDRAM wraps. How this is achieved is described in Chapter 5.1.5.
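A sketch of such an address decoder follows. The field order here is not given in the text, so the layout chip | row | bank | column | byte offset, the two byte-offset bits for a 32-bit data bus, and all names are assumptions. The column and row widths are parameters because, as described above, they depend on the DDR SDRAM in use and would be configured through the APB interface.

```python
from dataclasses import dataclass


@dataclass
class DecodedAddress:
    chip: int
    row: int
    bank: int
    column: int


def decode(addr: int, col_bits: int, row_bits: int, bank_bits: int = 2, byte_bits: int = 2) -> DecodedAddress:
    """Split a 32-bit AHB byte address; assumed field order: chip | row | bank | column | byte."""
    word = addr >> byte_bits                  # drop the byte offset inside a 32-bit beat
    column = word & ((1 << col_bits) - 1)
    word >>= col_bits
    bank = word & ((1 << bank_bits) - 1)      # four banks -> two bank bits
    word >>= bank_bits
    row = word & ((1 << row_bits) - 1)
    chip = word >> row_bits                   # remaining high bits select the chip
    return DecodedAddress(chip=chip, row=row, bank=bank, column=column)


def ends_at_row_boundary(column: int, beats: int, col_bits: int) -> bool:
    """True if a burst of `beats` starting at `column` reaches the end of the row."""
    return column + beats >= (1 << col_bits)
```

Placing the bank bits just above the column bits, as assumed here, means that consecutive rows of a long transfer fall into different banks, which gives the controller a chance to activate the next row in advance.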
5.1.2 Open Banks
A bank with an activated row is called an open bank, hence the name of this module. Since activating a row is time consuming, it is vital to keep track of whether a bank is open and, if so, which row is activated, so that a consecutive command to the same bank and row does not require a new activation. If the memory controller has more than one AHB interface and an arbiter that decides which AHB interface will be granted access to the core memory controller after the current one, the Open Banks module can also look at that AHB interface's address (Next Address) and activate its row in advance. The Open Banks module is also responsible for making sure that the timing limits between PRECHARGE and ACTIVATE, between ACTIVATE and ACTIVATE, and between ACTIVATE and READ or WRITE commands are fulfilled.
To accomplish this, the Open Banks module actually consists of one Open Bank module per DDR SDRAM chip. The Open Bank modules are contained in a top module that, by looking at the addresses, decides which chip is currently being accessed and which will be accessed next. The commands from the Open Bank modules handling the chips being accessed are propagated to the Command Timing module, and commands from the current Open Bank have higher priority than commands from the next Open Bank. The next address does not have to be to a different chip than the current one for its row to be activated in advance; it is enough that the two addresses are not to the same bank in the same chip. To keep track of all the timings involved, each module has a precharge timer and an activate timer for each bank, which are reset to zero whenever a PRECHARGE or ACTIVATE command is issued to that bank. A command cannot be issued to that bank unless the timers are above certain values. There is also a general timer for PRECHARGE commands that handles the minimum time required between precharges to different banks.
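The bookkeeping described above can be sketched as follows: per bank, the open row plus counters that are reset when a PRECHARGE or ACTIVATE is issued, and a shared counter for ACTIVATE-to-ACTIVATE spacing between banks. The delay names and values (T_RP, T_RCD, T_RRD in clock cycles) are hypothetical placeholders for data-sheet values, other constraints such as the minimum ACTIVATE-to-PRECHARGE time are ignored, and the class names are illustrative rather than the thesis's modules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

T_RP = 3          # hypothetical: PRECHARGE -> ACTIVATE, same bank (cycles)
T_RCD = 3         # hypothetical: ACTIVATE -> READ/WRITE, same bank (cycles)
T_RRD = 2         # hypothetical: ACTIVATE -> ACTIVATE, different banks (cycles)
LONG_AGO = 1_000  # initial timer value: no recent command


@dataclass
class BankState:
    open_row: Optional[int] = None
    since_precharge: int = LONG_AGO
    since_activate: int = LONG_AGO


@dataclass
class OpenBanksSketch:
    banks: List[BankState] = field(default_factory=lambda: [BankState() for _ in range(4)])
    since_any_activate: int = LONG_AGO

    def tick(self) -> None:
        """Advance all timers by one clock cycle."""
        for bank in self.banks:
            bank.since_precharge += 1
            bank.since_activate += 1
        self.since_any_activate += 1

    def next_command(self, bank: int, row: int) -> Optional[str]:
        """Command needed next to reach (bank, row), or None while a timer blocks it."""
        state = self.banks[bank]
        if state.open_row == row:                     # row hit
            return "READ/WRITE" if state.since_activate >= T_RCD else None
        if state.open_row is not None:                # row conflict: close the row first
            return "PRECHARGE"
        if state.since_precharge >= T_RP and self.since_any_activate >= T_RRD:
            return "ACTIVATE"                         # bank idle and timers satisfied
        return None

    def issue(self, command: str, bank: int, row: int) -> None:
        """Record an issued command and reset the matching timers."""
        state = self.banks[bank]
        if command == "PRECHARGE":
            state.open_row = None
            state.since_precharge = 0
        elif command == "ACTIVATE":
            state.open_row = row
            state.since_activate = 0
            self.since_any_activate = 0
```

Calling `next_command` with the next master's bank and row while the current burst is running models activating a row in advance, as long as it targets a different bank.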
The Open Bank module handling the current command is also responsible for signaling when the row is activated and ready for READ and WRITE commands.
5.1.3 Read Write Command
The Read Write Command module (RW module) is responsible for taking the command from the AHB interface and dividing the burst into as many commands as needed to complete it. The RW module waits until the Open Banks module signals that the row is activated before it issues the first command to the Command Timing module. How a burst is divided into more than one command is described in Chapter 5.1.5. Since a burst is to consecutive locations of the DDR SDRAM, the RW module only checks that the row is active when it issues the first command of a new burst or when the Address decoder signals that a row boundary has been reached. Reaching a row boundary means that the bank will be precharged and a new row activated before the next READ or WRITE command can be issued.
5.1.4 Command Timing
The actual issuing of commands to the DDR SDRAM chips is done by the Command Timing module, which keeps track of when a command can be issued to the DDR SDRAM. For example, if the default burst length of the DDR SDRAM is eight beats, it is the timing module that ensures that a new READ or WRITE command is issued every fourth clock cycle as long as the RW module issues new commands (two beats of data are transferred per clock cycle, one on the rising edge and one on the falling edge). The commands are prioritized so that a REFRESH command is issued before a READ or WRITE command, which in turn is issued before a PRECHARGE or ACTIVATE command. If there is no pending command with higher priority, or the command with higher priority is not yet due, a command with lower priority can be issued. For example, if there is a pending READ command, the default burst length is eight beats and the previous command was a READ, there are two clock cycles before it is time to issue the READ command, so a pending PRECHARGE or ACTIVATE command can be issued in the meantime. The timing module also indicates to the AHB interface when it is time to present data on the DDR SDRAM data bus during a WRITE burst. This indication is needed because the AHB interface has no other way of knowing when a WRITE command has been issued to the DDR SDRAM, and therefore no way of knowing when to present the data on the data bus. During a READ burst, the timing module indicates how many beats the AHB interface is to sample by raising the SampleData signal for one clock cycle for every beat to be read. The AHB interface knows when to sample the data by observing the activity on the data strobe signal and only has to know how many beats to sample. The Command Timing module also enables the data strobe during writes.
5.1.5 Address Handling
The address of the DDR SDRAM wraps within the default burst length, as described in Chapter 3.4, while the burst issued by the AHB interface is sequential and incremental. The first observation is that a wrapping address starting at location zero never wraps, so if the address always started at zero there would be no problem with wrapping addresses. Always starting at location zero would, however, introduce long latencies: if the burst actually starts at location six, the first six locations of the burst would have to be masked out. This is not acceptable. To start at a location other than zero and still get an incremental burst, the design has to use the fact that a burst can be interrupted by another burst. In the example where the burst starts at location six, a two-beat burst can be issued from location six, followed by a new burst from location zero of the following address block. By addressing the memory with interrupting bursts in this way, it is possible to achieve incremental bursts.

A burst issued to the DDR SDRAM cannot be shorter than two beats or have an odd number of beats. The first step is therefore to align the address to two-beat bursts by checking the least significant bit. If the least significant bit is a one, a burst of length one is issued to the Command Timing module. That module then issues a burst to the DDR SDRAM with the least significant bit of the column address set to zero. The first beat of this burst is masked out. During a READ the AHB interface does this by not sampling the first beat, and during a WRITE the Command Timing module does it by masking out the data with the data mask. Once the least significant bit is zero, it is possible to issue bursts of even length.

The second step is to check the second least significant bit. If this bit is a one, a burst of length two is issued without any change to the column address. After that burst the third least significant bit is checked, and if it is a one a burst of length four is issued. When all these steps have been taken, the three least significant bits are all zero and a burst of length eight can be issued without the address wrapping. If the default burst length is shorter than eight, not all steps are necessary before bursts of the default length can be issued without the address wrapping. The Current Address module is aware of these steps and increments the address accordingly. The RW module, which issues the burst lengths to the Command Timing module, is aware of them as well. The only thing the Command Timing module does is to look at the least significant bit: it always sets this bit to zero, and if the bit was a one it masks out the first beat during a WRITE. For a READ the masking is handled by the AHB interface. The algorithm for calculating the size of the next burst can be found in Appendix E.
Figure 25 Eight beat burst starting at address 0001 with default burst length of four. The first beat is masked out, and an address ending with two zeros will not wrap.
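The Appendix E algorithm is not reproduced in this section. The following Python sketch is one reading of the alignment steps described above, not the controller's RTL. The function names and the (start address, useful beats) representation are hypothetical, and addresses are counted in beats (column addresses). The sketch also assumes that each piece is capped by the number of beats remaining in the AHB burst.

```python
from typing import List, Tuple


def next_burst_length(addr: int, remaining: int, default_bl: int) -> int:
    """Useful beats in the next DDR burst so that the wrapping address never wraps.

    addr       -- current column address, counted in beats
    remaining  -- beats still to transfer in the incremental AHB burst
    default_bl -- programmed DDR burst length (2, 4 or 8)
    """
    size = 1
    # Check address bits from the least significant one upwards:
    # bit 0 set -> one beat, bit 1 set -> two beats, bit 2 set -> four beats.
    while size < default_bl:
        if addr & size:
            return min(size, remaining)
        size <<= 1
    # All bits below default_bl are zero, so a full burst cannot wrap.
    return min(default_bl, remaining)


def split_burst(addr: int, length: int, default_bl: int) -> List[Tuple[int, int]]:
    """Split an incremental burst into (start address, useful beats) pieces.

    A one-beat piece at an odd address is sent to the DDR SDRAM with the
    column LSB cleared, and the first beat of that burst is masked out.
    """
    pieces = []
    while length > 0:
        beats = next_burst_length(addr, length, default_bl)
        pieces.append((addr, beats))
        addr += beats
        length -= beats
    return pieces


# Figure 25: eight beats starting at address 0001 with a default burst length of four.
print(split_burst(0b0001, 8, 4))   # [(1, 1), (2, 2), (4, 4), (8, 1)]
```

Each piece starts and ends inside one aligned block of `default_bl` beats. That is exactly the condition under which the DDR SDRAM's wrapping address does not wrap. The example with a burst starting at location six and a default burst length of eight gives the pieces (6, 2) and then a full burst from the next block.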
5.1.6
When the current burst will end at the boundary between two rows, that is, when the column address consists of only ones, the Current Address module indicates this by raising the Boundary signal. The RW module uses this information to know that the row will be closed and the consecutive row will be opened. While waiting for the row to be opened, the RW module places itself in the closed state and waits for the Row Open signal to go HIGH. When a row boundary is reached, it can be necessary to mask out unwanted data. For example, suppose the row boundary had been reached when address 0010 was issued in Figure 25. The following WRITE command would then not have been issued, because the current row has to be closed and the consecutive row activated before the next WRITE command can be issued. The burst from address 0010 would therefore have wrapped, and the AHB interface would have presented two beats of unwanted data, corrupting the whole burst. During a WRITE, the two unwanted beats are masked by lowering the PresentData signal generated by the timing module. During a READ, the SampleData signal has only been raised for as many beats as should be sampled. By the time the address has wrapped, the AHB interface has already read all the data, and it ignores the following two beats. Instead of masking out the unwanted data, the controller could issue a precharge command to terminate the read burst, and the AHB interface would then no longer need to keep track of how many beats to read. However, not all memory vendors allow a read burst to be terminated by a precharge, even though the JEDEC specification states that this is allowed.
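The Boundary condition can be expressed as a one-line check. In this Python sketch the function name and the column-address width are hypothetical. It only tests whether the last useful beat of a burst lands on the all-ones column address described above.

```python
def ends_at_row_boundary(col: int, beats: int, col_bits: int) -> bool:
    """True when the last beat of a burst uses the last column of the row.

    col      -- start column address of the burst (in beats)
    beats    -- useful beats in the burst
    col_bits -- width of the column address (hypothetical parameter)
    """
    last_column = (1 << col_bits) - 1   # column address of all ones
    return col + beats - 1 == last_column


# With a hypothetical 9-bit column address (512 columns):
print(ends_at_row_boundary(508, 4, 9))   # True: beats 508..511 reach column 511
print(ends_at_row_boundary(504, 4, 9))   # False
```

When the check is true, the next burst has to wait until the consecutive row has been activated, which is what the Boundary and Row Open handshake above takes care of.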
The AHB Core is the only module with direct access to the AHB bus. It moves data between the AHB bus and the Data Buffer, handles read and write requests from the AHB bus, and tells the Core Memory Controller the location and length of each burst. The AHB Core can handle every type of burst supported by the AHB bus. These include incrementing bursts, where the data is written to consecutively higher addresses, and bursts where the address wraps. Wrapping bursts are handled such that the data is stored incrementally in the Data Buffer. For example, assume a request for an eight beat wrapping write burst where the three least significant bits of the address are set to three. The first data is stored at location three of the Data Buffer, and the next four data are stored at locations four to seven. The address then wraps, so the remaining data is stored at locations zero to two. Because the data is now stored incrementally in the Data Buffer starting from location zero, the wrapping burst becomes an incremental burst to the DDR SDRAM. A wrapping read burst is handled similarly. The lowest address of the wrapping read is calculated by setting the appropriate number of least significant bits to zero. The data is read from that location into the Data Buffer, and then read out of the Data Buffer in the wrapping order. One consequence of handling wrapping writes this way is that the AHB interface cannot tolerate losing the bus during the write, because not all data would then be available for an incremental write to the DDR SDRAM.

An incrementing read, which does not have a predefined length, is handled by filling the buffer with data from the DDR SDRAM and presenting it on the AHB bus for as long as the burst continues. If the burst is longer than the data stored in the Data Buffer, HReady is lowered to suspend the burst, and the Data Buffer is filled once more. As soon as the buffer starts to fill up, HReady is raised, the burst resumes and data is presented on the AHB bus. This continues until the AHB master terminates the burst. An incrementing write with no predefined length is handled in a similar way. The Data Buffer is filled until the AHB master terminates the burst or the buffer becomes full. When the buffer is full, HReady is lowered and the data stored in the buffer is written to the DDR SDRAM. Once the buffer is empty, HReady is raised and the buffer starts to fill up again. This continues until the AHB master terminates the burst and the last data stored in the buffer has been written to the DDR SDRAM.
5.2.2 Data Buffer
The Data Buffer is a generated quad port on chip memory with an independent clock for each port, and it can store 16 words of 32 bits. It buffers data and moves it between the AHB clock domain and the clock domain used for communicating with the DDR SDRAM(s). Its depth was chosen to match the longest predefined burst of the AHB bus, which is 16 words.
5.2.3 x2
The x2 module is the most timing critical module of all, since it is responsible for presenting or sampling data at the rate used by the DDR SDRAM. To achieve this, the x2 module runs at the same clock frequency as the rate at which data switches on the data bus. During a write, data is read out of the Data Buffer and presented on the data bus as long as the Present Data signal from the Command Timing module is high. The corresponding Data Mask signal is presented along with the data and tells the DDR SDRAM which bytes of the word are to be written. A byte is only 8 bits while the data bus is 32 bits wide. To write a single byte to the DDR SDRAM, the byte is therefore presented on the data bus together with arbitrary data in the other byte lanes, and the data mask indicates which byte is to be written. Data can only be masked out in groups of one byte, which restricts accesses to be byte aligned. To simplify data handling in the memory controller, accesses have also been restricted to be halfword or word aligned, depending on the size requested on the AHB bus. Most modern CPUs follow this restriction in practice, so it has no effect on efficiency or flexibility.

The smallest addressable unit of the AHB address is the byte, while that of the DDR SDRAM address is the word. The Data Mask is created from this difference. The two least significant bits of the AHB address tell at which byte within the first word the write starts. If the AHB address ends with 01, the first byte of the word has to be masked, or else the data at that location will be corrupted. All bursts are converted into incremental bursts to consecutive addresses of the DDR SDRAM, so no data is masked out after the first word until the end of the burst. At the end of the burst, the two least significant bits of the sum of the burst length and the starting address are used. They tell which byte is the last byte to be written to the DDR SDRAM, and the remaining bytes are masked out by the Data Mask signal. Reads are much simpler, since data does not get corrupted when it is read from the DDR SDRAM. The first word containing the first byte to be read is read into the Data Buffer. The AHB Core then presents the data on the AHB bus, using the two least significant bits of the AHB address in much the same way as described above. During a write, the data is aligned with the data strobe, which is also generated by the memory controller, so the timing is relatively easy to achieve. During a read, on the other hand, the data is aligned with the data strobe generated by the DDR SDRAM. The data strobe therefore needs to be delayed so that it becomes center aligned with the data. Different methods for this are described in Chapter 6.1.
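The first-word and last-word mask rules above can be sketched as follows. The Python function is hypothetical and makes three assumptions: a 32-bit data bus, little-endian byte lanes (lane i holds the byte at address offset i), and an active-high data mask, where a set bit means that byte lane is not written. As described in the text, the words between the first and the last are written unmasked.

```python
from typing import List


def write_masks(addr: int, nbytes: int) -> List[int]:
    """Per-word 4-bit data masks for a write of nbytes starting at byte address addr.

    Bit i set means byte lane i is NOT written (active-high mask, assumed).
    """
    if nbytes <= 0:
        raise ValueError("a write must contain at least one byte")
    end = addr + nbytes                   # one past the last byte written
    first_word = addr >> 2
    last_word = (end - 1) >> 2
    masks = [0b0000] * (last_word - first_word + 1)
    # Bytes below the start offset in the first word are masked.
    masks[0] |= (1 << (addr & 3)) - 1
    # Bytes at or above the end offset in the last word are masked;
    # an end offset of 0 means the last word is written completely.
    if end & 3:
        masks[-1] |= 0b1111 & ~((1 << (end & 3)) - 1)
    return masks


# A single byte written to an address ending in 01: only lane 1 is written.
print([format(m, "04b") for m in write_masks(0b01, 1)])   # ['1101']
```

For a word-aligned burst every mask is zero, which matches the observation that only the first and last words of a burst ever need masking.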
The data strobe is then used to sample the data with two Flip-Flops. One Flip-Flop samples the data presented on the positive edge of the data strobe, and the other samples the data presented on the negative edge. Each value is therefore available for twice as long as on the original data bus, which leaves more time to synchronize the data to the internal clock, Figure 27.
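A small behavioural model makes the benefit of the two Flip-Flops concrete. This Python sketch is hypothetical and ignores all timing except edge order. Even-numbered beats are captured on rising edges of the phase-shifted data strobe and odd-numbered beats on falling edges, so each value stays in its register for two beat times.

```python
from typing import List, Optional, Tuple


def capture_even_odd(beats: List[int]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Contents of the Data Even / Data Odd registers after each strobe edge.

    Beats 0, 2, 4, ... are captured on rising edges of the phase-shifted data
    strobe and beats 1, 3, 5, ... on falling edges.
    """
    even: Optional[int] = None
    odd: Optional[int] = None
    trace = []
    for i, beat in enumerate(beats):
        if i % 2 == 0:
            even = beat   # rising edge
        else:
            odd = beat    # falling edge
        trace.append((even, odd))
    return trace


for even, odd in capture_even_odd(list(range(8))):
    print(even, odd)
```

Each captured value appears in two consecutive rows of the trace. In other words, it is held for a full strobe period instead of half of one.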
Figure 27 Wave chart of a read using odd and even data
The data strobe is not synchronized with the clock of the receiving IC. A method is therefore needed for telling when the data on the Data Even and Data Odd buses is stable and can be sampled by the internal Clk x2 clock. Such a method is discussed in Chapter 6.2.
5.3 APB
The APB interface is used to initialize the memory controller so that it knows the size and organization of the DDR SDRAM it is to work with. Initialization is done by writing data to two registers in the APB interface. The first register, called the Variable Register, sets the size and organization of the memory as well as the refresh period to be used. The second register, called the MRS register, holds the data to be used during Extended Mode Register Set and Mode Register Set. For the memory controller to work properly, the Variable Register has to be set before any data is written to the MRS register. As soon as any data has been written to the MRS register, the APB interface tells the Core Memory Controller to start the initialization of the DDR SDRAM chip(s). Once the initialization of the DDR SDRAM chip(s) has started, the values of the two registers cannot be changed without first resetting the Memory Controller.
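The ordering rules for the two APB registers can be captured in a small Python model. This is a hypothetical sketch, not the controller's register logic. The exceptions stand in for behaviour that the hardware simply leaves undefined, and the register widths (6 bits and 28 bits) are taken from the bit numbering of the register figures.

```python
class ApbConfig:
    """Hypothetical model of the Variable Register / MRS register write rules."""

    VARIABLE_MASK = 0x3F        # bits D5..D0
    MRS_MASK = 0x0FFFFFFF       # bits D27..D0

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.variable = 0
        self.variable_written = False
        self.mrs = 0
        self.init_started = False

    def write_variable(self, value: int) -> None:
        if self.init_started:
            raise RuntimeError("registers are locked until the controller is reset")
        self.variable = value & self.VARIABLE_MASK
        self.variable_written = True

    def write_mrs(self, value: int) -> None:
        if self.init_started:
            raise RuntimeError("registers are locked until the controller is reset")
        if not self.variable_written:
            raise RuntimeError("the Variable Register must be written first")
        self.mrs = value & self.MRS_MASK
        self.init_started = True   # any MRS write starts DDR SDRAM initialization


cfg = ApbConfig()
cfg.write_variable(0b000101)   # hypothetical size/organization code
cfg.write_mrs(0x0000123)       # hypothetical MRS data; starts initialization
print(cfg.init_started)        # True
```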
Figure 28 MRS Register (bits D27 to D0, holding the Mode Register data), see Chapter 3.2.5 and 3.2.6 for closer details
Variable Register (bits D5 to D0, with the fields BW and Memory Size; D2 to D0, D4 to D3 and D5 are binary-coded fields)
5.4 Arbiter
With more than one AHB Interface, the utilization of the memory increases, making it possible to transfer more data to and from the DDR SDRAM(s) than with a single AHB Interface. The arbiter can be implemented with any kind of arbitration protocol, for example round robin or priority based. The arbiter is also the module that limits how many AHB Interfaces the memory controller can service. The implemented arbiter uses round robin and supports two AHB Interfaces. This keeps the Arbiter simple. There is no need for logic to decide which AHB Interface gets access when the current one has finished its burst, because there is only one other AHB Interface, and it is granted access if it has a pending burst.
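Round robin arbitration can be sketched in a few lines. The Python function below is a hypothetical illustration written for any number of interfaces. With two interfaces it reduces to the behaviour described above: the other interface is granted access whenever it has a pending burst.

```python
from typing import List, Optional


def round_robin_grant(requests: List[bool], last_granted: int) -> Optional[int]:
    """Index of the next AHB Interface to grant, or None if none is requesting.

    The search starts with the interface after the one granted last, so a
    continuously requesting interface cannot starve the others.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


# Two AHB Interfaces that both request all the time are granted alternately.
last = 1
grants = []
for _ in range(4):
    last = round_robin_grant([True, True], last)
    grants.append(last)
print(grants)   # [0, 1, 0, 1]
```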
Figure 30 Schematic view of a DDR SDRAM Controller with two AHB interfaces
6 Source Resynchronization
The source-synchronous interface has several advantages over a standard synchronous interface, as described in Chapter 2. One new problem with the source-synchronous interface, though, is that the data on the data bus is synchronized with the data strobe and not with the internal clock of the receiving IC. The incoming data therefore has to be resynchronized to the internal clock. To make it even more difficult, the data from a DDR SDRAM is edge aligned with the data strobe. The data strobe therefore cannot be used directly to sample the data without first being altered.
The most common method to phase shift the data strobe is to use a DLL, which consists of a phase detector and a digital delay line.
Figure 31 Schematic view of a Delay Lock Loop
The advantage of using a DLL is that, regardless of variations of the data strobe, the phase detector always makes sure that the delay is 90 degrees. The design difficulty when constructing a DLL is to create a Digital Delay Line that does not cause jitter problems and that locks quickly at start-up. The lock time is the time from the first transition of the data strobe until the Digital Delay Line is set up so that the output signal is delayed by 90 degrees. A designer building a DDR SDRAM controller around a DLL has to keep in mind that initial read bursts are needed to give the DLL time to lock to the data strobe. No attempt should be made to sample data during these read bursts. The DLL has not yet had time to delay the data strobe, and sampling could cause meta-stability in the sampling Flip-Flops. The data strobe used during write bursts also has to be masked from the DLL.
6.1.2 Inverter Delay
An inverter delay chain is a number of inverters connected in series. Each inverter the signal passes through delays it by that inverter's propagation delay. To create a signal phase shifted by 90 degrees, it is enough to pass the signal through the right number of inverters. The problem with this method is that the delay through an inverter depends on process, voltage and temperature, so the total delay can vary significantly. To achieve a more stable phase shift, inverters whose delay is independent of voltage and temperature should be used.
6.1.3 PCB Line Delay
The PCB Line Delay is a simple and robust way to phase shift the data strobe. The phase shift is created by inserting a delay line on the PCB in the path of the data strobe. The delay line has a fixed value and does not track variations of the data strobe, so the PCB has to be redesigned if a data strobe with another frequency is used. Such a redesign would most likely be needed even without a PCB Line Delay, since the stubs of the data bus are designed to reduce reflections at a specific frequency. The PCB therefore has to be redesigned whether or not a PCB Line Delay is used.
6.1.4 Programmable Delay Line with Temperature Sensing
The major problem with the inverter delay chain is that the inverter delay changes with voltage and temperature, and process variations in the manufacturing of the IC also cause variations in the inverter delay. This problem does not exist with a DLL, but a DLL is large and expensive. A possible alternative is a programmable delay line controlled by temperature sensing logic, Figure 32.
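A rough calculation shows why the inverter chain is sensitive to variations. The Python sketch below uses a 200 MHz data strobe (DDR 400, the target in Chapter 8), a hypothetical nominal inverter delay of 40 ps and a hypothetical ±30 percent spread between corners. The even inverter count is a design choice that keeps the chain from inverting the strobe.

```python
def inverters_for_quarter_period(strobe_hz: float, t_inv_s: float) -> int:
    """Even number of inverters whose total delay is closest to 90 degrees."""
    quarter_period = 1.0 / strobe_hz / 4.0
    pairs = round(quarter_period / (2.0 * t_inv_s))
    return max(2, 2 * pairs)   # an even count keeps the strobe polarity


def phase_shift_deg(n: int, t_inv_s: float, strobe_hz: float) -> float:
    """Phase shift in degrees produced by n inverters of delay t_inv_s."""
    return 360.0 * n * t_inv_s * strobe_hz


STROBE_HZ = 200e6    # DDR 400 data strobe
T_INV_NOM = 40e-12   # hypothetical nominal inverter delay

n = inverters_for_quarter_period(STROBE_HZ, T_INV_NOM)
for corner, scale in (("fast", 0.7), ("nominal", 1.0), ("slow", 1.3)):
    print(corner, n, round(phase_shift_deg(n, T_INV_NOM * scale, STROBE_HZ), 1))
```

With these assumed numbers the chain uses 32 inverters. It gives about 92 degrees at the nominal corner, but roughly 65 to 120 degrees across the corners. This is the kind of spread the DLL and the temperature-compensated delay line try to avoid.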
Figure 32 Programmable Delay Line with Temperature Sensing
The advantage of this method is that most of its components are regular digital components that the regular ASIC flow used for the entire chip can handle. The only sophisticated component is the temperature sensor, which has to give a digital representation of the temperature. In Figure 32 the temperature sensor has a four bit output, giving 16 temperature levels, which should be enough to create a stable delay of the data strobe. The temperature level is looked up in the Look Up Table (LUT), and the stored information is used to set up the muxes in the delay line. Depending on how much the data strobe has to be delayed, the signal propagates through a different number of muxes. Muxes are only one example of how the delay line could be constructed. The LUT is programmable. By running the IC at different temperature levels, it is possible to measure how much the data strobe has to be delayed to work properly, and to program the result into the LUT. One problem with this method is creating reliable tests for the different temperature levels. Testing one IC and programming the LUTs of all other ICs from that result might not be enough, and the temperature test might have to be performed on every IC.
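The LUT programming step described above can be sketched as a calibration that picks, for each temperature level, the number of delay stages closest to the wanted delay. Everything in this Python sketch is hypothetical: the measured stage delays, the 1.25 ns target (90 degrees of a 200 MHz strobe), and the function names. It is not the test procedure itself.

```python
from typing import List

TEMP_LEVELS = 16   # four-bit temperature sensor


def calibrate_lut(stage_delay_s: List[float], target_delay_s: float) -> List[int]:
    """Number of mux stages per temperature level that best matches the target delay."""
    if len(stage_delay_s) != TEMP_LEVELS:
        raise ValueError("one measured stage delay per temperature level is needed")
    return [max(1, round(target_delay_s / d)) for d in stage_delay_s]


def strobe_delay_s(lut: List[int], temp_code: int, stage_delay_s: float) -> float:
    """Delay applied to the data strobe for the current 4-bit temperature code."""
    return lut[temp_code & 0xF] * stage_delay_s


# Hypothetical measurements: stage delay rising from 60 ps to 90 ps over the range.
measured = [60e-12 + 2e-12 * level for level in range(TEMP_LEVELS)]
lut = calibrate_lut(measured, 1.25e-9)
print(lut)
```

Because the stage count is rounded, the delay at every temperature level is within half a stage of the target, provided the measured stage delays match the IC in question. This is why the text notes that each IC might need its own test.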
Figure 34 Wave chart of the sampling Flip-Flops
Suppose a clock is used whose frequency equals the rate at which the data changes, that is, twice the frequency of the data strobe. That clock always has at least one rising edge at which the data on Data Even and Data Odd is stable, regardless of the phase of the data strobe. To decide when the data is stable, a reference clock with the same frequency as the data strobe is used. The rising edge of the faster clock occurs 90 degrees after the rising edge of the slower reference clock. The phase of the data strobe is compared against the reference clock. If the rising edge of the data strobe occurs while the reference clock is low, the data will be stable while the reference clock is high. If the rising edge of the data strobe occurs while the reference clock is high, the data will be stable while the reference clock is low, Figure 35. So if the relative phase between the reference clock and the data strobe is known, it is also known when the data will be stable and can be sampled.
Figure 35 Wave chart illustrating when data is stable
6.2.1 Simplified Phase Detector
Figure 36 Phase detector
Figure 36 shows a simplified phase detector derived from a regular phase detector, such as those used in Phase Lock Loops (PLLs). The D inputs of the Flip-Flops are held at a constant high level, so each Flip-Flop drives its Out-port high as soon as a rising edge is detected at its clock input. Once both Flip-Flops have seen a rising edge, both Out-ports are high. This drives the AND-gate high, which resets the Flip-Flops and drives the Out-ports low. A regular phase detector uses the difference in how long each Out-port stays high to determine the phase difference: the larger the difference between the outputs of the Flip-Flops, the larger the phase difference. In this simplified version, the Out-ports are connected to an SR Flip-Flop. The Out-port of an SR Flip-Flop is set high when the Set-port is high and reset low when the Reset-port is high. When both the Set-port and the Reset-port are low, the Out-port is unchanged. The Out-port is undefined if both the Set-port and the Reset-port are high. When Clk I has its rising edge before Clk II, its Out-port has periods of high level. The Clk II Out-port only has a short spike, just long enough to drive the AND-gate high and reset the Flip-Flops. If the wires are long enough, or decoupled by capacitors inserted before the SR Flip-Flop, the spike is seen as a low level and is not registered by the SR Flip-Flop. This sets the SR Flip-Flop Out-port high, Figure 37. In the opposite case, when the Clk II rising edge appears before Clk I, the Out-port of Clk II has periods of high level while the Clk I Out-port gets the spikes, and the SR Flip-Flop is reset to a low level.
Figure 37 Wave chart describing the Phase Detector
If the rising edges of both clocks are simultaneous or almost simultaneous, both Out-ports only have short spikes. The SR Flip-Flop sees both as low levels, so its Out-port is unchanged. This is not a problem, however, because when the data strobe is in phase with the reference clock, the data is stable at both sampling instants, Figure 38.
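The behaviour of the SR Flip-Flop output can be summarised in a short Python model. It is an abstraction, not a circuit simulation. Each cycle is reduced to the time by which the Clk I rising edge precedes the Clk II rising edge. A hypothetical minimum pulse width stands in for the filtering that makes the reset spikes appear as low levels.

```python
from typing import List


def phase_detector(lead_times_s: List[float], min_pulse_s: float, initial: int = 0) -> List[int]:
    """Behavioural model of the simplified phase detector's SR Flip-Flop output.

    lead_times_s -- for each cycle, the time by which Clk I's rising edge
                    precedes Clk II's (negative when Clk II comes first)
    min_pulse_s  -- pulses shorter than this are filtered out like spikes
    """
    out = initial
    trace = []
    for lead in lead_times_s:
        if lead >= min_pulse_s:        # Clk I first: long pulse on the Set input
            out = 1
        elif lead <= -min_pulse_s:     # Clk II first: long pulse on the Reset input
            out = 0
        # otherwise both inputs only see spikes and the output holds its value
        trace.append(out)
    return trace


print(phase_detector([1e-9, 20e-12, -1e-9], 100e-12))   # [1, 1, 0]
```

The middle cycle corresponds to the nearly simultaneous case in Figure 38, where the output simply keeps its previous value.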
Figure 38 Wave chart of Data Strobe and Reference Clk in phase
Assume that the data strobe is connected to Clk I of the phase detector. Sampling then starts once the first rising edge of the data strobe has been detected, at the first point where the Phase signal has the opposite level to the reference clock, Figure 39. In the DDR SDRAM Controller, the clock of the Core Memory Controller is used as the reference clock. The resynchronized data can easily be moved to another clock domain with an asynchronous FIFO or a memory with independent clocks for the data ports.
Figure 39 Start of sampling: after the first rising edge of the Data Strobe, sampling starts when Phase and Reference Clk have opposite levels (signals: Data Strobe, Reference Clk, Phase, Data, Data Even, Data Odd, Clk x2, Resynchronized Data)
It can seem strange to use the Clk x2 clock to resynchronize the data, only to then use an asynchronous FIFO or a memory to move the data to another clock domain. The reason is that the data strobe does not oscillate continuously, so it cannot be used as the clock input of a FIFO or memory. However, if the data strobe is phase shifted with a DLL, an internal version of the data strobe can be kept oscillating, and this version can drive the clock input of a FIFO or memory directly. Since the data changes on both the rising and the falling edge, the data strobe alone does not switch fast enough. Two FIFOs or memories would then be needed, one for the data of the rising edge and one for the data of the falling edge. Alternatively, a PLL, possibly together with a DLL, could create a clock in phase with the data strobe at twice its frequency, so that only one FIFO or memory is needed.
7 Functional Simulation
A test environment has been implemented to verify the functionality of the DDR SDRAM Controller. It contains a simulation test bench for the Core Memory Controller, in which the test bench acts as an AHB Interface. The Core Memory Controller is completely separated from the data bus, so this test bench writes no actual data to the simulation model of the DDR SDRAM and reads none from it. Its purpose is to verify that the control signals generated by the Core Memory Controller are correct, and it was used mainly in the early stages of the implementation. For a more complete verification of the Memory Controller's functionality, a second test bench generates random traffic on the AHB interface. The traffic is based on a 32 bit word produced by a pseudo random generator. The word is used as the starting address of the burst, and some of its bits also encode the control signals of the AHB interface. From these fields, the test bench sets up a burst on the AHB interface, acting as a fully compliant AHB master. During a write, the data that the test bench provides is the same as the address. This keeps the test bench simple, because it does not have to track what data has been written to the memory. During a read, it only has to check that the data the Memory Controller presents on the data bus equals the address. This, however, limits the transfer size to 32 bits, since otherwise the data written to the memory and the address would not be equal. To check that the Memory Controller also works for other sizes, the test bench was used to create a set of bursts of different sizes, and the results were checked manually. Because this had to be done manually, sizes other than 32 bits have not been tested extensively.
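The self-checking idea (data equals address) can be illustrated with a small Python model. It is only a sketch. The thesis does not state which pseudo random generator or bit fields were used, so the LFSR taps, the field positions and the burst-length table below are assumptions. A Python dictionary stands in for the Memory Controller and the DDR SDRAM, and reads of locations that have not yet been written are skipped. Only 32-bit transfers are generated, matching the restriction described above.

```python
from typing import Callable, Dict

LFSR_TAPS = 0x80200003        # x^32 + x^22 + x^2 + x + 1 (assumed generator)
BURST_BEATS = (1, 4, 8, 16)   # SINGLE, INCR4, INCR8, INCR16 (hypothetical choice)


def lfsr32(state: int) -> int:
    """One step of a 32-bit Galois LFSR; state must be non-zero."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= LFSR_TAPS
    return state


def run_random_traffic(write: Callable[[int, int], None],
                       read: Callable[[int], int],
                       n_bursts: int, seed: int = 1) -> int:
    """Drive random 32-bit bursts and return the number of read mismatches."""
    state = seed
    written = set()
    mismatches = 0
    for _ in range(n_bursts):
        state = lfsr32(state)
        beats = BURST_BEATS[(state >> 28) & 3]
        is_write = bool(state >> 31)
        # Word-aligned start address, kept inside one 1 KB block because an
        # AHB burst must not cross a 1 KB boundary.
        base = min(state & 0x3FC, 0x400 - 4 * beats)
        for i in range(beats):
            addr = base + 4 * i
            if is_write:
                write(addr, addr)           # the data written is the address
                written.add(addr)
            elif addr in written and read(addr) != addr:
                mismatches += 1
    return mismatches


memory: Dict[int, int] = {}                 # ideal stand-in for controller + SDRAM
print(run_random_traffic(memory.__setitem__, memory.__getitem__, 500))   # 0
```

Replacing the ideal dictionary with a model that corrupts data makes the mismatch count non-zero, which is the property that makes a data-equals-address test bench self-checking.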
The functionality has not been verified more thoroughly for sizes other than 32 bits because verifying one size is enough to show that the design is sound and functional. This implementation is meant to be a prototype and a reference for future implementations. It is meant to highlight issues and to give solutions and proposals on how to design and implement an effective DDR SDRAM controller.
7.1 Performance
Simulation showed a latency of two clock periods between two consecutive write bursts that access already activated rows. The delay is caused by the simplicity of the arbiter for the two AHB interfaces. Given the information it has for making its decisions, the arbiter cannot keep track of which AHB interface has access to the Core Memory Controller and the data bus when bursts from the two AHB interfaces start to overlap. The latency from a write burst to a read burst, and between two read bursts, is two clock cycles plus the CAS latency of the read burst. To reduce the impact of the CAS latency, the Core Memory Controller would have to start the read burst earlier, for example by looking at the following command and starting to execute it in advance. The other two clock cycles of delay are caused by the simulation models of the DDR SDRAM, which did not support write bursts being interrupted by read bursts. The JESD79 specification states that this is supported, so this latency might be vendor specific.

Another observation concerned how much the bandwidth utilization of the data bus increases when more than one interface is used. Adding a second AHB interface with the same load as the first increased the time to complete all bursts by only 25 percent. The bandwidth utilization of the data bus therefore increased by as much as 60 percent. Even though this was just one random example, it shows a significant performance gain from using more than one AHB interface. The same gain cannot be expected from a third AHB interface, but it could still make a difference. A Memory Controller with more than four AHB interfaces would probably perform no better than one with fewer interfaces, because the AHB interfaces would spend most of their time waiting for the Core Memory Controller or the data bus to become free. These observations were made with an AHB bandwidth of half the maximum bandwidth of the data bus to the DDR SDRAM, and with full load on the AHB bus. A system whose AHB bus has the same bandwidth as the data bus to the DDR SDRAM would gain nothing from two AHB interfaces, since one interface is enough to keep the Memory Controller supplied with data.
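The 60 percent figure follows from the 25 percent increase in completion time. Let W be the data moved by the load of one AHB interface and T the time to complete it with one interface. With two interfaces, twice the data is moved in 1.25T. The peak bandwidth of the data bus is unchanged, so the utilization scales with throughput:

```latex
\frac{U_2}{U_1} = \frac{2W / (1.25\,T)}{W / T} = \frac{2}{1.25} = 1.6
```

This is a 60 percent increase in data bus utilization.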
8 Backend
The technology used for the backend is LSI's Gflx technology, a cell based CMOS ASIC process technology for the 0.13-micron generation that offers copper and low-K interconnects. To better meet the timing requirements, the performance version (Gflx-p) has been used. Cadence's Ambit BuildGates has been used to synthesize the RTL into a netlist. Avant!'s ApolloII has been used for floorplanning and placement, together with LSI's tools for place and route.
8.1 Synthesis
The minimum target for the implementation was to meet the timing requirements for DDR 200. Early in the development it became clear that the design could meet the timing requirements of higher speed grades, and a decision was taken to aim for DDR 400. The higher target frequency meant that the design had to be improved, and long combinatorial paths had to be split into shorter stages while trying not to introduce any latency. The biggest challenge was in the memory core. Its most critical path was the handling of a new address: the address has to be compared to see whether the current row is already open, and if so a command is issued to the Command Timing module and the address is incremented for the next burst. To improve the timing, the Read Write module now only checks that the row is activated when the address is introduced. The Current Address module notifies it in advance that the row boundary will be crossed, which tells the Read Write module when the row will be closed and that it needs to back off. With this improved implementation, the timing requirements for DDR 400 were met. The quad port on chip memories with independent clocks used as Data Buffers were created with an LSI memory compiler for the Gflx technology.
8.2 Floorplan
The floorplan assumes that the memory controller will be placed at the edge of a larger chip, giving fast access to the pads of the chip and short buses to the DDR SDRAM chips. With this assumption, it is convenient to gather all pins used for accessing the DDR SDRAM(s) on one side. It is further assumed that the pins of the Memory Controller are not restricted to a particular side of the core. The synthesized Memory Controller contains two AHB interfaces, and the pins of each interface have been placed on either side of the core. Pins for the clocks and the APB interface have been placed on the side opposite the pins used for accessing the DDR SDRAM(s), Figure 40.
Figure 40 Pin placement
The memories take up a large part of the core area and need to be placed manually, since they are hard macros. The memories have one read port and one write port on each side. This made it ideal to place each memory along a side of the core. The AHB read, write and address ports face out from the chip, and the read, write and address ports of the internal data path to the DDR SDRAM face inwards. To guide the placement of the modules, a region has been constructed around each memory, and each AHB interface is contained within its region. Figure 41 shows the size of the core and the placement of the memories and regions. The figure does not show the two power rings for VDD and ground around the core, which together use 12 µm of space on each side.
Figure 41 Floorplan
8.3 Placement
Place and route was done with LSI's own tools for the Gflx technology, and with the described floorplan there were no problems with either congestion or timing. The only problem during place and route was that Apollo dumped core when inserting clock trees, which made it impossible to create proper clock trees with low skew. Instead, a tree of the kind used for reset signals had to be created to avoid the crash. This problem is not related to the design of the DDR SDRAM Controller; it is a problem within the Apollo program. Another problem with Apollo was that the hierarchical netlist generated after final place and route did not correspond to the SPEF file containing the timing information, which is also generated by Apollo. The netlist and the SPEF file are used to create an SDF file, which is used for back-annotated timing simulations. This problem could be solved by using one of LSI's tools together with the ECO information from the place and route tools to generate a functional netlist.
Figure 42 Placement
retrieved can create meta-stability in the AHB Core's counter. The clock for AHB x2 is phase shifted relative to the clock for AHB Core. Depending on when the sampling of data is started, this can leave the Increment signal only 1.25 nanoseconds to travel from the AHB x2 module to the AHB Core module that updates the AHB Core counter, Figure 43. Along the path the Increment signal passes through three logic gates and is delayed by 800 picoseconds. Together with the clock-to-output delay of the Flip-Flop that drives the Increment signal, this violates the setup time of the Flip-Flops in the counter. A solution to the problem would be to check whether the AHB Core clock is low and, if so, wait until the next rising edge of Clk x2. This gives the Increment signal 3.75 nanoseconds, Figure 44.
Figure 44 Increment signal not creating a setup problem
This solution has not been implemented because the increment counter depends on the relationship between the two clocks, which varies with the clock frequency chosen for the AHB bus. The implemented increment counter works when the frequency of Clk x2 is a multiple of the frequency of Clk. This is not always the case, so the increment counter would have to be redesigned anyway, which would make any change made now irrelevant.
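The timing problem and the proposed fix come down to simple setup-slack arithmetic. The sketch below uses three values from the text: the 1.25 ns window, the 3.75 ns window and the 800 ps gate delay. The clock-to-output and setup times (`CLK_TO_Q_NS`, `SETUP_NS`) are hypothetical placeholders, not figures from the Gflx library. They are chosen only to show how a small margin turns negative, and the function and dictionary names are likewise illustrative.

```python
# Windows for the Increment signal in the problematic case, from the text.
WINDOWS_NS = {
    "launch on current Clk x2 edge": 1.25,
    "wait for next Clk x2 rising edge": 3.75,  # proposed, not implemented
}
LOGIC_DELAY_NS = 0.8  # three logic gates, from the text
CLK_TO_Q_NS = 0.3     # hypothetical flip-flop clock-to-output delay
SETUP_NS = 0.2        # hypothetical counter flip-flop setup time


def setup_slack_ns(window_ns: float, clk_to_q_ns: float,
                   logic_delay_ns: float, setup_ns: float) -> float:
    """Setup slack at the counter; a negative value is a violation."""
    return window_ns - (clk_to_q_ns + logic_delay_ns + setup_ns)


for case, window in WINDOWS_NS.items():
    slack = setup_slack_ns(window, CLK_TO_Q_NS, LOGIC_DELAY_NS, SETUP_NS)
    status = "violated" if slack < 0 else "met"
    print(f"{case}: slack {slack:+.2f} ns, setup {status}")
```

With these placeholder values the 1.25 ns window ends up slightly negative. Waiting one more Clk x2 edge leaves well over 2 ns of margin.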
9 New DRAM interfaces
9.1 DDR II SDRAM
9.2 ESDRAM
9.3 QDR SDRAM
9.4 Direct Rambus
9.5 FCDRAM
9.2 ESDRAM
ESDRAM (Enhanced SDRAM) improves on the standard DRAM by adding a caching structure: a direct-mapped SRAM cache line, the same size as a DRAM row, associated with each bank. This makes it possible to access the most recently activated row regardless of whether the bank has been refreshed or precharged. These service operations can therefore be performed in the background while read and write operations are made to the cache line. The enhancement can be applied to any SDRAM. Whether it is worth manufacturing on a larger scale comes down to the performance gain compared to the cost.
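The effect of the per-bank cache line can be sketched with a toy Python model. It is a simplification under stated assumptions: the class `EsdramBank` and its methods are hypothetical, only reads are modeled, and the DRAM array is a plain list. The model shows one thing. Once a row has been activated and copied into the SRAM cache line, the bank can be precharged or refreshed while reads to that row are still served from the cache.

```python
from typing import List, Optional


class EsdramBank:
    """Toy model of one ESDRAM bank with a row-sized SRAM cache line."""

    def __init__(self, rows: int, cols: int) -> None:
        self.array = [[0] * cols for _ in range(rows)]  # DRAM cells
        self.cached_row: Optional[int] = None  # row held in the cache line
        self.cache_line: List[int] = []        # SRAM copy of that row

    def activate(self, row: int) -> None:
        # Opening a row also loads it into the bank's cache line.
        self.cached_row = row
        self.cache_line = list(self.array[row])

    def precharge(self) -> None:
        # Closes the DRAM row; the SRAM cache line is left untouched.
        pass

    def refresh(self) -> None:
        # Background refresh of the array; the cache line is left untouched.
        pass

    def read(self, row: int, col: int) -> int:
        if row != self.cached_row:
            raise LookupError("cache miss: an ACTIVATE is needed first")
        return self.cache_line[col]  # served from SRAM


bank = EsdramBank(rows=4, cols=8)
bank.array[2][3] = 0xAB
bank.activate(2)
bank.precharge()
bank.refresh()
print(hex(bank.read(2, 3)))  # bank is precharged, yet the read hits the cache
```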
possible to make both a read and a write burst at the same time to the same memory chip. The buses and commands function in the same way as for DDR. The success of QDR is questionable outside specific, highly demanding applications, because the additional data bus adds pins to the IC that contains the memory controller. Pin count is a limiting factor when designing ICs today, so an increase in pin count is not desirable.
9.5 FCDRAM
Fast Cycle DRAM, developed by Fujitsu, is an enhancement to SDRAM that allows faster repeated access to a single bank. This is done by dividing each bank into smaller blocks. The smaller blocks reduce the access time, because the word and bit lines have lower capacitance, and they enable pipelining of consecutive accesses to the same bank. Multistage pipelining of the core array hides the precharge and allows it to occur simultaneously with input signal latching and data transfer to the output latch. A disadvantage of this technique is the extra die area needed for dividing the banks, which leads to higher manufacturing costs. FCDRAM is available with both SDR and DDR SDRAM interfaces.
10 Future work
Continued work on the DDR SDRAM Controller would be to simulate it together with a real system and analyze the traffic. The aim would be to find the characteristics of the AHB transactions and use them to optimize the latency of read and write bursts. One part of this latency reduction is to give the arbiter better knowledge of the state of the Memory Controller. The arbiter could then reduce the impact of the CAS latency at the beginning of a read burst.
Another way to improve the efficiency of the AHB transactions would be to improve the buffer handling during writes. In the implemented buffer handling, a write to the DDR SDRAM is not started until the AHB transaction is finished or the Data Buffer is full. This introduces latency, since the stored data has to be written to the DDR SDRAM before the next AHB transaction can be started. There are two possible ways to improve this.
The first way helps write transactions. As soon as the first location of the Data Buffer has been written to the DDR SDRAM, the next AHB transaction can be started if it is a write transaction. The new transaction then writes its data to the locations of the Data Buffer that have already been written to the DDR SDRAM. This does not improve the efficiency of read transactions, since all data has to be written to the DDR SDRAM before the data bus is available for reading data into the Data Buffer.
The second way helps read transactions: start writing to the DDR SDRAM before the current transaction is finished or the Data Buffer is full. This could be done by splitting the burst that the AHB Interface module tells the Core Memory Controller to make. Writing to the DDR SDRAM would then start earlier, and the time to finish the write would be reduced. The scheduling of these split bursts would have to be studied so as not to degrade the overall performance of the Memory Controller.
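The first of the two proposals amounts to treating the Data Buffer as a circular FIFO shared by the AHB side and the DDR side. The Python sketch below is a hypothetical model, not the implemented buffer handling, which waits for the AHB transaction to finish or for the buffer to fill. The class name, the method names and the depth are assumptions. The model shows that a new AHB write can start storing beats as soon as the DDR side has drained the first location.

```python
from typing import List, Optional


class DataBuffer:
    """Hypothetical circular Data Buffer shared by the AHB and DDR sides."""

    def __init__(self, depth: int) -> None:
        self.slots: List[Optional[int]] = [None] * depth
        self.depth = depth
        self.head = 0   # next location the AHB side writes
        self.tail = 0   # next location the DDR side drains
        self.count = 0  # locations holding data not yet in the DDR SDRAM

    def ahb_write(self, beat: int) -> bool:
        """Store one AHB beat; return False if no location is free."""
        if self.count == self.depth:
            return False
        self.slots[self.head] = beat
        self.head = (self.head + 1) % self.depth
        self.count += 1
        return True

    def ddr_drain(self) -> Optional[int]:
        """Write the oldest beat to the DDR SDRAM and free its location."""
        if self.count == 0:
            return None
        beat = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.depth
        self.count -= 1
        return beat


buf = DataBuffer(depth=4)
for beat in range(4):
    buf.ahb_write(beat)       # first AHB transaction fills the buffer
print(buf.ahb_write(100))     # False: no free location yet
print(buf.ddr_drain())        # 0: first location written to the DDR SDRAM
print(buf.ahb_write(100))     # True: next transaction reuses the freed location
```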
Overall utilization would be reduced if the scheduling granted other AHB Interface modules access to the DDR SDRAM in between the split bursts, since this can cause unnecessary precharge and activate commands. The DDR SDRAM also offers some freedom in refresh timing that can improve efficiency. A maximum of eight refresh commands can be issued in advance, and the memory can go without a refresh command for up to eight times the standard refresh period. This makes it possible to schedule the refresh commands so that they have less impact on the efficiency of other commands. Such a scheduling scheme has not been implemented. The behavior of a complete system would need to be studied to find an effective scheduling routine.
For systems where the memory accesses are short bursts to consecutive locations, a caching functionality should be implemented. The implemented Memory Controller reads exactly the number of beats stated by the AHB transaction, even if more data is presented on the data bus. The AHB x2 module does not know whether the extra data on the data bus is consecutive. The data may instead be present because a row boundary has been reached and the address has wrapped, in which case it has to be discarded. A caching system would give the largest gain when the bursts are short and go to consecutive locations, combined with a default burst length of eight. In a system with longer bursts, the Data Buffer would have a higher utilization and leave little space for caching, which would make a caching system inefficient. To improve the efficiency of such a system the Data Buffers would need to be larger, but that would increase the core size of the Memory Controller. A caching system for such a system therefore has to be evaluated in terms of its performance gain compared to the extra cost of the larger memories.
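A refresh scheduling scheme of the kind suggested above could postpone refreshes while the memory is busy and catch up when it is idle. It would force a refresh once the postponement limit is reached. The following Python sketch is one such policy under stated assumptions. `RefreshScheduler`, `tick_refresh_interval` and `should_refresh` are hypothetical names, and the limit of eight comes from the text. Issuing refreshes early, in advance, is not modeled.

```python
class RefreshScheduler:
    """Hypothetical refresh policy: defer while busy, never exceed the limit."""

    def __init__(self, max_postponed: int = 8) -> None:
        self.max_postponed = max_postponed  # limit of eight from the text
        self.pending = 0                    # refreshes owed to the memory

    def tick_refresh_interval(self) -> None:
        """Call once per standard refresh interval."""
        self.pending += 1

    def should_refresh(self, memory_busy: bool) -> bool:
        """Return True if an AUTO REFRESH should be issued now."""
        if self.pending == 0:
            return False
        if memory_busy and self.pending < self.max_postponed:
            return False   # defer: other commands keep the bus
        self.pending -= 1  # idle, or the (conservative) limit forces it
        return True


sched = RefreshScheduler()
for _ in range(3):
    sched.tick_refresh_interval()
print(sched.should_refresh(memory_busy=True))   # False: deferred while busy
print(sched.should_refresh(memory_busy=False))  # True: idle, catch up
for _ in range(6):
    sched.tick_refresh_interval()               # eight refreshes now pending
print(sched.should_refresh(memory_busy=True))   # True: limit forces a refresh
```

Forcing the refresh when the pending count reaches the limit is a deliberately cautious choice. Any real policy would have to be checked against the refresh rules in the DDR SDRAM specification.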
11 Conclusion
It has been shown that it is possible for Ericsson AB to design and implement a DDR SDRAM Controller in house to meet the increased demand for bandwidth to off-chip memory in their future system platforms. The crucial design issues lie in the phase shift of the data strobe and in the resynchronization of retrieved data during read bursts. This report presents several solutions to these issues and ways to deal with them. The presented solutions apply when designing any Double Data Rate interface, not only DDR SDRAM interfaces. The preferable way to shift the data strobe would be the PCB line delay. It is a simple and robust solution that does not demand any advanced and costly circuitry, unlike the use of a DLL. For sampling the data it is necessary to know when to start sampling, which is done by observing the preamble and the first rising edge of the data strobe. For this, the simplified phase detector could be used to detect the phase of the data strobe. This shows when the data will be stable and can be sampled without creating any meta-stability. The study has led to a working implementation of a DDR SDRAM Controller that can be used as a reference design for future implementations.
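The following Python sketch gives a feel for what the PCB line delay solution involves. It estimates the extra trace length needed to shift the data strobe by a quarter of the clock period (90 degrees), a common target for centering the strobe in the data eye. It assumes two transfers per clock cycle and an effective dielectric constant of 4.0 for the board. The function names and the dielectric value are illustrative. A real design would also have to account for skew, loading and variation, and only the delay difference relative to the data lines matters.

```python
import math

SPEED_OF_LIGHT_MM_PER_NS = 299.792458


def quarter_period_ns(data_rate_mtps: float) -> float:
    """90-degree shift for a DDR interface at the given transfer rate."""
    clock_mhz = data_rate_mtps / 2          # two transfers per clock cycle
    period_ns = 1000.0 / clock_mhz
    return period_ns / 4


def trace_length_mm(delay_ns: float, eps_eff: float) -> float:
    """PCB trace length that gives the requested propagation delay."""
    velocity_mm_per_ns = SPEED_OF_LIGHT_MM_PER_NS / math.sqrt(eps_eff)
    return delay_ns * velocity_mm_per_ns


for rate in (200, 400):  # DDR 200 and DDR 400
    delay = quarter_period_ns(rate)
    length = trace_length_mm(delay, eps_eff=4.0)  # assumed dielectric
    print(f"DDR {rate}: shift {delay:.2f} ns, about {length:.0f} mm of trace")
```

The required length scales with the clock period, so the delay line gets shorter as the speed grade increases.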
12 References
[1] JEDEC, "Double Data Rate (DDR) SDRAM Specification", JESD79 Release 2, May 2002, JEDEC Solid State Technology Association.
[2] ARM, "AMBA Specification" (Rev 2.0), 1999, ARM IHI 0011A.
[3] Hansel A. Collins, Ronald E. Nikel, "DDR-SDRAM, High-speed, Source-synchronous Interfaces Create Design Challenges", September 2, 1999, EDN, www.ednmag.com.
[4] Tegze P. Haraszti, "CMOS Memory Circuits", 2000, Kluwer Academic Publishers, ISBN 0-7923-7950-0.
[5] Jan M. Rabaey, "Digital Integrated Circuits: A Design Perspective", 1996, Prentice Hall Inc., ISBN 0-13-394271-6.
[6] Elpida, "How to Use DDR SDRAM", April 2002, Elpida Memory Inc., www.elpida.com/pdfs/E0234E30.pdf.
[7] Lostcircuits, "Inside the EDDR Chip, Combining DRAM Storage and SRAM Speed", November 27, 2000, Lostcircuits, www.lostcircuits.com/memory/eddr/.
[8] Joerg Vollrath, "Tutorial: Characterizing SDRAMs", 1999, IEEE International Workshop on Memory Technology, Design and Testing.
[9] Valerie Lines, Mammoun Abou-Seido, Cynthia Mar, Arun Achyuthan, Sampei Miyamoto, Yoshihiro Murashima, Shinzo Sakuma, "High Speed Circuit Techniques in a 150MHz 64M SDRAM", 1997, IEEE International Workshop on Memory Technology, Design and Testing.
[10] Yasuhiro Konishi, Hisashi Iwamoto, Seiji Sawada, Yasumistu Muria, Takashi Araki, Masaki Kumanoya, "Dual Clock Scheme for over 200 MHz Synchronous DRAM System", September 17-19, 1996, 22nd European Solid State Circuits Conference.
[11] JEDEC, "Stub Series Terminated Logic for 2.5 V (SSTL_2)", JESD8-9B, May 2002, JEDEC Solid State Technology Association.
[12] Reinhold Ludwig, Pavel Bretchko, "RF Circuit Design", 2000, Prentice Hall, ISBN 0-13-095323-7.
[13] Samsung Electronics, "Key Points for Controller Design", DDR SDRAM/SGRAM Application Note MPPJLEE-Q4-98, 1998, Samsung Electronics, www.samsungelectronics.com/semiconductors/dram/technical_data/application_notes.
[14] Brian Davis, Bruce Jacob, Trevor Mudge, "The New DRAM Interfaces: SDRAM, RDRAM and Variants", 2000, Lecture Notes in Computer Science, Vol. 1940, pp. 26-31.
13 Appendix
Figures
Figure 1 Comparison of an eight beat burst with SDR and DDR
Figure 2 Internal architecture of a SDRAM with a sense amplifier and memory cell
Figure 3 Memory cell in cross section
Figure 4 Memory cell in top view
Figure 5 Single Data Rate SDRAM clock period limit
Figure 6 Schematic view of the clock delay in a DDR SDRAM
Figure 7 Comparisons of a SDR and a DDR with DLL
Figure 8 Schematic view of DDR SDRAM architecture
Figure 9 Wave chart illustrating 2n-prefetch
Figure 10 Wave chart of a read burst with eight beats
Figure 11 Wave chart of a write burst with eight beats
Figure 12 Wave chart of a refresh
Figure 13 Mode register definition
Figure 14 Extended mode register definition
Figure 15 Wave chart of the initialization of the DDR SDRAM
Figure 16 Matched signal path
Figure 17 Example of a SSTL_2, class II, output environment
Figure 18 Voltage swing
Figure 19 Illustration of the effect of different slew rate for positive and negative flank
Figure 20 Example of a SSTL_2, class II, differential signals
Figure 21 Differential signal wave chart
Figure 22 Pre and post amble
Figure 23 Schematic view of the Memory Controller with one AHB interface
Figure 24 Schematic view of the Core Memory Controller
Figure 25 Eight beat burst starting at address 0001 with default burst length of four
Figure 26 Schematic view of the AHB Interface
Figure 27 Wave chart of a read using odd and even data
Figure 28 MRS register, see chapter 3.2.5 and 3.2.6 for closer details
Figure 29 Variable register
Figure 30 Schematic view of a DDR SDRAM Controller with two AHB interfaces
Figure 31 Schematic view of a Delay Lock Loop
Figure 32 Programmable delay line with temperature sensing
Figure 33 Schematic view of the sampling Flip-Flops
Figure 34 Wave chart of the sampling Flip-Flops
Figure 35 Wave chart illustrating when data is stable
Figure 36 Phase detector
Figure 37 Wave chart describing the phase detector
Figure 38 Wave chart of data strobe and reference Clk in phase
Figure 39 Wave chart of the resynchronization of data
Figure 40 Pin placement
Figure 41 Floorplan
Figure 42 Placement
Figure 43 Increment signal creating setup problem
Figure 44 Increment signal not creating a setup problem
Terminologies
AHB: Advanced High-performance Bus
AMBA: Advanced Microcontroller Bus Architecture
APB: Advanced Peripheral Bus
ASIC: Application Specific Integrated Circuit
Beat: The transfer of one set of data
Burst: A set of consecutive beats
Core: Usually the name of a larger functional block in an IC, such as a CPU core, DSP or memory controller core. In this document Core is also used in the names of modules with vital functionality.
CMOS: Complementary Metal-Oxide Semiconductor
DDR: Double Data Rate
Default Burst: The burst length set by the Mode Register Set command
Die: The piece of silicon on which the IC is made
DLL: Delay Locked Loop
ECO: Engineering Change Order
FIFO: First In First Out
Flip-Flop: Flank-triggered digital circuit that, on either the positive or the negative flank of its clock signal, samples its input port and presents the value on its output port until the next flank of the clock
Gflx: LSI's name for their 0.13 µm CMOS technology
IC: Integrated Circuit
IP: Intellectual Property
Jitter: Variations in the oscillation of the clock
LSI: ASIC vendor
Mb: Megabit
Mbps: Megabit per second
Module: Hierarchy level. Closely related logic is gathered in one module, and a module can consist of several modules. The Command Timing is an example of a module, and the Core Memory Controller is an example of a module consisting of other modules.
Netlist: The representation of the implemented logic as a file describing the interconnects between standard cells
PCB: Printed Circuit Board
RTL: Register Transfer Level
SDR: Single Data Rate
SDRAM: Synchronous Dynamic Random Access Memory
Slew rate: The maximum rate of change of a quantity per unit of time (volts per second or amperes per second)
Time of flight: The time it takes for a signal to travel between two integrated circuits
VDD: Voltage source
VHDL: VHSIC Hardware Description Language
VHSIC: Very High Speed Integrated Circuit
Via: Connection from one metal layer to another
Libraries Information
Library Name : gflxp.techlib, rr_ddr_rwbuffer.tech

Section One: Design Summary
===========================
Top Module              : ddr_top
Technology              : gflxp
Design Quoted Dimension : (X = 0.76, Y = 0.76)

Section Two: Design Statistics Summary
======================================
(2.1). I/O Statistics
Input Pins Used    : 227
Output Pins Used   : 165
Bidirect Pins Used : 0
Total Pins Used    : 392
Total Pads Used    : 0
Total Slots Used   : 0

(2.2). Design Statistics
Logic Units Used (lu)           : 98648.00
Logic Gates Used (lg)           : 28998.00
Mix-n-match Logic Gates (hmg)   : 0.00
ClockOverhead Logic Gates (cog) : 225.50
Megacell Units Used (mu)        : 121401.20
Cell Units Area                 : 2.0172 um^2
Chip Raw Units                  : 82004.00
Chip Usage                      : 2.68340
Total Cells                     : 9737.00
Total Cell Types                : 200.00
Total Units (lu + mu)           : 220049.20

Power
Leakage Power Type       : vdd2
Leakage Power            : 177.47107 uW
Total Chip Leakage Power : 177.47107 uW

Total Signal Nets : 10218
Total NC Nets     : 2
Average Pins/Nets : 3.77720
VHDL Code
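library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
-- Hypothetical entity wrapper (name and Address width assumed) around the original burst size logic.
entity burst_size is
  port (Address            : in  unsigned(2 downto 0);   -- address location of next burst (low bits)
        DefaultBurstLength : in  unsigned(3 downto 0);   -- the burst length set with MRS
        Size               : out unsigned(3 downto 0));  -- the size of the next burst
end entity burst_size;
architecture rtl of burst_size is
begin
  process (Address, DefaultBurstLength)
  begin
    if Address(0) = '1' then
      Size <= x"1";
    elsif Address(1) = '1' then
      Size <= x"2";
    elsif Address(2) = '1' and DefaultBurstLength >= 4 then
      Size <= x"4";
    else
      Size <= DefaultBurstLength;
    end if;
  end process;
end architecture rtl;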
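The burst-size rule can be checked with a small Python model that mirrors the VHDL above. The model also uses the rule to split a transfer into aligned bursts that do not wrap. It is a sketch: `next_burst_size` and `split_transfer` are hypothetical names. Clipping the last burst to the beats actually requested is an assumption, consistent with the controller reading only the number of beats stated by the AHB transaction. The title of Figure 25 names an eight-beat transfer starting at address 0001 with a default burst length of four. For that case the model yields bursts of one, two, four and one beats.

```python
from typing import List, Tuple


def next_burst_size(address: int, default_bl: int) -> int:
    """Mirror of the appendix VHDL: size of the burst starting at address."""
    if address & 0b001:
        return 1
    if address & 0b010:
        return 2
    if address & 0b100 and default_bl >= 4:
        return 4
    return default_bl


def split_transfer(start: int, beats: int,
                   default_bl: int) -> List[Tuple[int, int]]:
    """Split a transfer into (address, beats) bursts that do not wrap.

    Assumption: the last burst is clipped to the beats still requested.
    """
    result = []
    address, remaining = start, beats
    while remaining > 0:
        size = min(next_burst_size(address, default_bl), remaining)
        result.append((address, size))
        address += size
        remaining -= size
    return result


print(split_transfer(start=0b0001, beats=8, default_bl=4))
# [(1, 1), (2, 2), (4, 4), (8, 1)]
```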