
An Introduction to VLSI Processor Architecture for GaAs

This document discusses the advantages and disadvantages of using gallium arsenide (GaAs) instead of silicon for VLSI processor architecture. Key advantages of GaAs include faster operation than silicon at the same power level, and higher electron mobility and saturation velocity. Its disadvantages include lower transistor yields, poorer noise margins, and a much higher cost than silicon. The document outlines how the technical limitations of GaAs, such as smaller chip sizes and limited fan-in/fan-out, affect microprocessor architecture design.

Uploaded by

Bibin Johnson
Copyright
© Attribution Non-Commercial (BY-NC)

An Introduction to VLSI Processor Architecture for GaAs
Advantages:

• For the same power consumption, at least half an order of magnitude faster than silicon.
(Up to 70% reduction in power dissipation compared with the fastest Si technologies, such as ECL.)
• Electron mobility 6 to 7 times that of Si.
• A semi-insulating substrate with lower parasitics.
• Improvement factor of 1.4 in carrier saturation velocity.
• Tolerant of temperature variations. Operating range: [−200 °C, +200 °C] (large band gap, 1.4 eV).
• Radiation hard: several orders of magnitude more than silicon, because there is no gate oxide.
Disadvantages:
• High density of wafer dislocations
→ Low yield → Small chip size → Low transistor count.

• Noise margin not as good as in silicon
→ Area has to be traded in for higher reliability.

• At least two orders of magnitude more expensive than silicon.

• Currently having problems with high-speed test equipment.

• During implantation, when a high-energy ion enters a single-crystal lattice at a critical angle to the major axis of the GaAs crystal, the ion is steered down the open directions of the lattice. This steering is called axial channeling. The channeling effect is not as dramatic in the <100> direction as in the <110> direction.
Basic Differences of Relevance for Microprocessor Architecture

• Small area and low transistor count

• High ratio of off-chip and on-chip delays

• Limited fan-in and fan-out

• High demand on efficient fault-tolerance


A Brief Look Into GaAs IC Design

• Bipolar (TI + CDC)

• JFET (McDAC)

• GaAs MESFET logic families (TriQuint + RCA)
  D-MESFET (depletion mode)
  E-MESFET (enhancement mode)
                              Speed         Dissipation   Complexity
                              (ns)          (W)           (K transistors)

Arithmetic
32-bit adder                  2.9 total     1.2           2.5
(BFL D-MESFET)
16×16-bit multiplier          10.5 total    1.0           10.0
(DCFL E/D MESFET)

Control
1K gate array                 0.4/gate      1.0           6.0
(STL HBT)
2K gate array                 0.08/gate     0.4           8.2
(DCFL E/D MESFET)

Memory
4Kbit SRAM                    2.0 total     1.6           26.9
(DCFL E/D MODFET)
16K SRAM                      4.1 total     2.5           102.3
(DCFL E/D MESFET)

Figure 7.1. Typical (conservative) data for speed, dissipation, and complexity of digital GaAs chips.
                              GaAs                Silicon       Silicon       Silicon          Silicon
                              (1 µm E/D-MESFET)   (2 µm NMOS)   (2 µm CMOS)   (1.25 µm NMOS)   (2 µm ECL)

Complexity
On-chip transistor count      40K                 200K          200K          400K             40K (T or R)

Speed
Gate delay                    50-150 ps           1-3 ns        800-1000 ps   500-700 ps       150-200 ps
(minimal fan-out)
On-chip memory access         0.5-2.0 ns          20-40 ns      10-20 ns      5-10 ns          2-3 ns
(32×32-bit capacity)
Off-chip, on-package          4-8 ns              40-80 ns      30-40 ns      20-30 ns         6-10 ns
memory access (256×32 bits)
Off-package memory            10-50 ns            100-200 ns    60-100 ns     40-80 ns         20-80 ns
access (1K×32 bits)

Figure 7.2. Comparison (conservative) of GaAs and silicon, in terms of complexity and speed of the chips (assuming equal
dissipation). Symbols T and R refer to the transistors and the resistors, respectively. Data on silicon ECL technology
complexity includes the transistor count increased for the resistor count.
                               GaAs E/D-DCFL     Silicon SOS-CMOS

Minimal geometry               1 µm              1.25 µm
Levels of metal                2                 2
Gate delay                     250 ps            1.25 ns
Maximum fan-in                 5 NOR, 2 AND      4 NOR, 4 NAND
Maximum fan-out                4                 20
Noise immunity level           220 mV            1.5 V
Average gate transistor count  4.5               7
On-chip transistor count       25 000            100 000-150 000

Figure 7.3. Comparison of GaAs and silicon, in the case of actual 32-bit microprocessor implementations (courtesy of
RCA). The impossibility of implementing “phantom” logic (wired-OR) is a consequence of the low noise immunity of GaAs
circuits (200 mV).
The Information Bandwidth Problem of GaAs
Assume a 10:1 advantage in on-chip switching speed, but only a 3:1 advantage in off-chip/off-package memory access.

Will the microprocessor be 10 times faster?

The Reduced Philosophy:

Large register file
→ Most or all on-chip memory is used for the register file
→ An on-chip instruction cache is out of the question
→ Instruction fetch must be from an off-chip environment
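
The question above can be explored with a rough back-of-the-envelope model. The sketch below assumes that execution time splits into an on-chip part (sped up 10:1) and an off-chip memory part (sped up 3:1); the split fractions are hypothetical inputs, not data from the slides.

```python
def overall_speedup(on_chip_fraction, on_chip_gain=10.0, off_chip_gain=3.0):
    """Speed-up of the whole processor for a given share of on-chip time.

    on_chip_fraction: share of the original run time spent on chip
                      (hypothetical input, between 0 and 1).
    The 10:1 and 3:1 gains are the ones assumed in the slide.
    """
    off_chip_fraction = 1.0 - on_chip_fraction
    new_time = on_chip_fraction / on_chip_gain + off_chip_fraction / off_chip_gain
    return 1.0 / new_time


if __name__ == "__main__":
    for fraction in (0.9, 0.5, 0.1):
        print(f"on-chip share {fraction:.0%}: speed-up {overall_speedup(fraction):.2f}x")
```

Under this simple model the processor reaches the full 10:1 only if no time at all is spent off chip, which is why the architecture tries to keep as much activity as possible inside the register file.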


Applications for GaAs Microprocessors

• General-purpose processing in defense and aerospace, and execution of compiled HLL code.
• General-purpose processing and substitution of current CISC microprocessors.
• Dedicated special-purpose applications in digital control and signal processing.
• Multiprocessing of the SIMD/MIMD type, for numeric and symbolic applications.
Which Design Issues Are Affected?
On-chip issues:
• Register file
• ALU
• Pipeline organization
• Instruction set

Off-chip issues:
• Cache
• Virtual memory management
• Coprocessing
• Multiprocessing

System software issues:
• Compilation
• Code optimization
Major Bottlenecks
• GaAs technology itself (practical speed and radiation)
• Packaging and interconnection technology
• Compiler technology (deep pipeline, special restrictions)
• Architecture
Adder Design

Figure 7.6. Comparison of GaAs and silicon. Symbols CL and RC refer to the basic adder types (carry-lookahead and ripple-carry).
Symbol B refers to the word size.
a) Complexity comparison. Symbol C[tc] refers to complexity, expressed in transistor count.
b) Speed comparison. Symbol D[ns] refers to the propagation delay through the adder, expressed in nanoseconds. In the case
of silicon technology, the CL adder is faster when the word size exceeds four bits (or a somewhat lower number, depending on the
diagram in question). In the case of GaAs technology, the RC adder is faster for word sizes up to n bits (the actual value of n
depends on the actual GaAs technology used).
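
To make the two adder types concrete, here is a minimal bit-level sketch of a ripple-carry (RC) and a flat carry-lookahead (CL) adder. It is a functional model only: it does not reproduce the transistor counts or delays of Figure 7.6, and the function names are illustrative.

```python
def ripple_carry_add(a, b, bits):
    """RC adder: the carry walks through the bit positions one by one."""
    carry = 0
    total = 0
    for i in range(bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_lookahead_add(a, b, bits):
    """Flat CL adder: every carry is an OR of wide AND terms of g and p."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(bits)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(bits)]  # propagate
    carries = [0]  # carry into bit 0
    for i in range(bits):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ...
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    total = 0
    for i in range(bits):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[bits]


def widest_and_fan_in(bits):
    """Inputs of the widest AND term in the flat CL carry-out equation."""
    return bits  # g[0] ANDed with p[1] .. p[bits-1]


if __name__ == "__main__":
    print(ripple_carry_add(200, 100, 8), carry_lookahead_add(200, 100, 8))
```

In the flat lookahead form the widest AND term grows with the word size, so limited fan-in forces the lookahead logic to be split into more gate levels; this is one plausible reading of why the RC adder stays competitive in GaAs for small word sizes.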
32-bit GaAs MICROPROCESSORS

Goals and project requirements:
• 200 MHz clock rate
• 0.6 µm GaAs DCFL technology
• 32-bit parallel data path
• 32-bit shifter
• Support for 2 levels of off-chip instruction and data caches
• 16 general-purpose registers
• Reduced Instruction Set Computer (RISC) architecture
• 24-bit word addressing
• Virtual memory addressing
• Up to four coprocessors connected to the CPU
(Coprocessors can be of any type and all different)
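
Two of the goals above translate directly into numbers. The short sketch below derives the clock period and the directly addressable memory size, assuming that one word equals the 32-bit data path width (the slides do not state the word size explicitly).

```python
CLOCK_HZ = 200_000_000  # 200 MHz clock rate, from the goals list
ADDRESS_BITS = 24       # 24-bit word addressing, from the goals list
WORD_BYTES = 4          # assumption: one word = 32 bits


def cycle_time_ns(clock_hz):
    """Clock period in nanoseconds."""
    return 1e9 / clock_hz


def addressable_bytes(address_bits, word_bytes):
    """Bytes reachable with word addressing of the given width."""
    return (1 << address_bits) * word_bytes


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ns(CLOCK_HZ)} ns")
    size_mib = addressable_bytes(ADDRESS_BITS, WORD_BYTES) // (1024 * 1024)
    print(f"address space: {size_mib} MiB")
```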
System software
1. Core-MIPS translators: MC680x0 + 1750A
2. Compilers: C + Pascal + Ada

Technology Limitations
2. Power levels: high, reference, and low:
a. Circuits are always drawing current, and the number of such circuits on a die is severely limited by power consumption.
b. If smaller devices are used, the circuit's internal impedance becomes higher and it needs less power to operate, so the number of circuits on a die increases, but the fan-out becomes severely limited.
3. The outputs of two circuits cannot be tied together:
a. One cannot use phantom logic on the chip to implement functions like WIRED-OR (all outputs active). Circuits have a low “operating noise margin”.
b. One cannot use three-state logic on the chip to implement functions like MULTIPLE-SOURCE-BUS (only one output active). Circuits have no “off-state”.
c. Actually, if one insists on having a MULTIPLE-SOURCE-BUS on the chip, one can have it at the cost of only one active load and the need to precharge (both mean constraints and slowdown at the architecture level).
d. Fortunately, the AND-OR logic function is exactly what is needed to create a multiplexer, a perfect replacement for a bus.
e. Consequently, in hand-crafted areas (register file and barrel shifter), busses were used (no need for multiple active loads, and time was not critical). In standard-cell areas (all the rest), multiplexers were used.
f. Using multiplexers frequently resulted in extra functionality at the architecture level, simply because it was cheaper to keep them than to exclude them.
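
The remark that AND-OR logic yields a multiplexer can be illustrated with a small behavioural sketch. It models each source being ANDed with its own select line and all results being ORed together; the function name and the 32-bit width are illustrative assumptions, not details of the actual chip.

```python
def and_or_mux(sources, select, width=32):
    """Select one of several sources using only AND and OR operations.

    sources: list of non-negative integers (the would-be bus drivers).
    select:  index of the source to pass through (one-hot internally).
    """
    all_ones = (1 << width) - 1
    out = 0
    for index, value in enumerate(sources):
        enable = all_ones if index == select else 0  # one-hot select line
        out |= value & enable                        # AND stage, then OR stage
    return out


if __name__ == "__main__":
    print(hex(and_or_mux([0x11, 0x22, 0x33], 1)))
```

Because every source keeps its own gate output and only the OR stage merges them, no two outputs are ever tied together, which matches the restriction stated in item 3.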
The CPU Architecture
1. Deep memory pipelining:
Optimal memory pipelining depends on the ratio of off-chip to on-chip delays, plus many other factors. Therefore, precise input from DP and CD people was crucial. Unfortunately, these data were not fully known at design time, and some solutions (e.g. the PC stack) had to work for various pipeline depths. All control signals are pipelined and decoded during the cycle before they are used.

2. Latency stages:
One group of latency stages (WAIT) was associated with instruction fetch; the other group was associated with operand load.

3. Cache miss detection is brought on chip to minimize detection delay.

4. Three levels of aluminum metallization are used for signal and power interconnection.

5. Two-phase clocking is used to minimize clock skew problems.
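
As a rough illustration of why the pipeline depth follows the ratio of off-chip to on-chip delays, the sketch below counts how many clock cycles a memory access spans. It assumes the 5 ns cycle implied by a 200 MHz clock and uses the 10-50 ns off-package range from Figure 7.2 as sample inputs; the real design's stage counts are not given in the slides.

```python
def latency_stages(access_time_ps, cycle_time_ps):
    """Clock cycles (rounded up) that one memory access occupies."""
    return -(-access_time_ps // cycle_time_ps)  # ceiling division on integers


if __name__ == "__main__":
    cycle_ps = 5_000  # 5 ns, i.e. a 200 MHz clock
    for access_ns in (10, 20, 50):
        stages = latency_stages(access_ns * 1_000, cycle_ps)
        print(f"{access_ns} ns access -> {stages} cycles")
```

The spread between the shortest and longest case shows why a solution such as the PC stack had to work across several pipeline depths.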
[Figure: processor organization in silicon (labels: IR, M, GRF, CPU) versus GaAs (labels: CPU, M3, M6, M9, ALU CLASS).]
 
CATALYTIC MIGRATION from the RISC ENVIRONMENT POINT-OF-VIEW

 
DEFINITION: DIRECT MIGRATION
Migration of an entire hardware resource into the system software.
 
EXAMPLES:
 
Pipeline interlock.
Branch delay control.
 
ESSENCE:
 
Examples that result in code speed-up are very difficult to invent.
DELAYED CONTROL TRANSFER

[Figure: Delayed branch scheme. Along the time axis: I1 fetch, then I1 execution (branch address and branch target calculation); I2 fetch takes place during I1 execution and is followed by I2 execution; I3 fetch comes next.]
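
The delayed branch scheme can be sketched as a toy interpreter in which a jump takes effect only after the instruction in its delay slot has run. The instruction format and the single delay slot are assumptions chosen for illustration; the slides do not specify the number of slots.

```python
def run(program, delay_slots=1, max_steps=100):
    """Execute (op, arg) pairs; a jump redirects fetch after its delay slots."""
    trace = []
    pc = 0
    pending = None  # (target, remaining delay slots) of an unresolved jump
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        trace.append(op if arg is None else f"{op} {arg}")
        steps += 1
        next_pc = pc + 1
        if pending is not None:
            target, remaining = pending
            remaining -= 1
            if remaining == 0:
                next_pc = target  # the jump finally takes effect
                pending = None
            else:
                pending = (target, remaining)
        if op == "jump":
            pending = (arg, delay_slots)
        pc = next_pc
    return trace


if __name__ == "__main__":
    code = [("jump", 3), ("add", None), ("sub", None), ("mul", None)]
    print(run(code))
```

In this example the `add` in the delay slot still executes, while `sub` is skipped; a compiler that cannot find a useful instruction for the slot has to place a nop there.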


DEFINITION: CATALYTIC MIGRATION

Migration based on the utilization of a catalyst.

MIGRANT vs CATALYST

Figure 7.13. The catalytic migration concept. Symbols M, C, and P refer to the migrant, the catalyst, and the processor, respectively.
The acceleration obtained by extracting a migrant of a relatively large VLSI area is achieved after adding a catalyst of a
significantly smaller VLSI area.
 
 
ESSENCE:
 
Examples that result in code speed-up are much easier to invent.
METHODOLOGY:
Area estimation: Migrant
Area estimation: Catalyst
Real estate to invest: Difference
Investment strategy: R

Compile-time algorithms

Analytical analysis
Simulation analysis
Implementational analysis

NOTE: Before the reinvestment, the migration may result in slow-down.
i: load r1, MA{MEM – 6}
i + 1: load r2, MA{MEM – 3}

[Figure panels (a) and (b) not reproduced.]
Figure 7.14. An example of catalytic migration: Type HW (hand walking): (a) before the migration; (b) after the migration. Symbols P
and GRF refer to the processor and the general-purpose register file, respectively. Symbols RA and MA refer to the register address and
the memory address in the load instruction. Symbol MEM – n refers to the main store which is n clocks away from the processor.
Addition of another bus for the register address eliminates a relatively large number of nop instructions (which have to separate the
interfering load instructions).
Figure 7.15. An example of catalytic migration: type II (ignore instruction): (a) before the migration; (b) after the migration. Symbol t
refers to time, and symbol UI refers to the useful instruction. This figure shows the case in which the code optimizer has successfully
eliminated only two nop instructions, and has inserted the ignore instruction, immediately after the last useful instruction. The addition
of the ignore instruction and the accompanying decoder logic eliminates a relatively large number of nop instructions, and speeds up
the code, through a better utilization of the instruction cache.
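
A compile-time pass in the spirit of Figure 7.15 might look like the sketch below. It assumes a hypothetical encoding in which one `ignore n` instruction stands in for a run of n nops; the slides do not define the actual instruction format or the optimizer's rules.

```python
def compress_nops(code):
    """Replace each run of two or more nops with one hypothetical `ignore n`."""
    out = []
    run_length = 0
    for instr in code + [None]:  # the None sentinel flushes a trailing run
        if instr == "nop":
            run_length += 1
            continue
        if run_length >= 2:
            out.append(f"ignore {run_length}")
        elif run_length == 1:
            out.append("nop")  # a single nop is left as it is
        run_length = 0
        if instr is not None:
            out.append(instr)
    return out


if __name__ == "__main__":
    before = ["load", "nop", "nop", "nop", "add", "nop", "store"]
    after = compress_nops(before)
    print(after, len(before), "->", len(after))
```

The shorter instruction stream is what improves instruction cache utilization; the extra decoder logic plays the role of the catalyst.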
CODE INTERLEAVING

[Figure panels (a) and (b) not reproduced.]
Figure 7.17. An example of the CI (code interleaving) catalytic migration: (a) before the migration; (b) after the migration. Symbols A
and B refer to the parts of the code in two different routines that share no data dependencies. Symbols GRF and SGRF refer to the
general purpose register file (GRF), and the subset of the GRF (SGRF). The sequential code of routine A is used to fill in the slots in
routine B, and vice versa. This is enabled by adding new registers (SGRF) and some additional control logic which is quite. The speed-
up is achieved through the elimination of nop instructions, and the increased efficiency of the instruction cache (a consequence of the
reduced code size).
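
The slot-filling idea behind code interleaving can be sketched as follows. The example fills the nop slots of routine B with consecutive instructions of routine A, assuming (as the caption states) that the two routines share no data dependencies; the instruction names are placeholders.

```python
def interleave(routine_b, routine_a):
    """Fill the nop slots of routine_b with instructions taken from routine_a."""
    filler = iter(routine_a)
    merged = []
    for instr in routine_b:
        if instr == "nop":
            merged.append(next(filler, "nop"))  # keep the nop if A has run out
        else:
            merged.append(instr)
    merged.extend(filler)  # whatever is left of routine A follows afterwards
    return merged


if __name__ == "__main__":
    b_code = ["B1", "nop", "nop", "B2"]
    a_code = ["A1", "A2", "A3"]
    print(interleave(b_code, a_code))
```

The seven instruction slots of the two separate routines shrink to five here, which is the code-size reduction the caption credits for the better instruction cache efficiency.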
APPLICATION:

1. Technologies with small on-chip transistor count.
The larger the ratio of off-chip to on-chip delays, the better it works.

2. Technologies with dissipation-related limitations.
The larger the dissipation costs, the better it works.

 
EXAMPLES:
CLASSIFICATION:
[Figure: classification tree. CM branches into ICM and ACM; leaf labels: C-+, C++, -+, ++.]

 
EXAMPLES:
(N2)*W vs DMA
RDEST BUS vs CFF
IGNORE
CODE INTERLEAVING
for i := 1 to N do:
  1. MAE
  2. CAE
  3. DFR
  4. RSD
  5. CTA
  6. AAP
  7. AAC
  8. SAP
  9. SAC
  10. SLL
end do
Figure 7.18. A methodological review of catalytic migration (intended for a detailed study of a new catalytic migration example).
Symbols S and R refer to the speed-up and the initial register count. Symbol N refers to the number of generated ideas. The meaning of
other symbols is as follows: MAE—migrant area estimate, CAE—catalyst area estimate, DFR—difference for reinvestment, RSD—
reinvestment strategy developed, CTA—compile-time algorithm, AAC—analytical analysis of the complexity, AAP—analytical
analysis of the performance, SAC—simulation analysis of the complexity, SAP—simulation analysis of the performance, SLL—
summary of lessons learned.
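
The first three steps of the loop (MAE, CAE, DFR) amount to simple bookkeeping, sketched below. The idea names and area figures are made-up placeholders, and ranking by freed area is only one possible way to order candidates; the slides do not prescribe it.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    migrant_area: int   # MAE: area freed by moving the migrant off chip
    catalyst_area: int  # CAE: area spent on the added catalyst

    def difference_for_reinvestment(self):
        """DFR: area left over for reinvestment."""
        return self.migrant_area - self.catalyst_area


def rank_ideas(ideas):
    """Keep ideas that free some area, largest DFR first."""
    viable = [idea for idea in ideas if idea.difference_for_reinvestment() > 0]
    return sorted(viable, key=Idea.difference_for_reinvestment, reverse=True)


if __name__ == "__main__":
    candidates = [  # hypothetical area units
        Idea("idea A", migrant_area=900, catalyst_area=150),
        Idea("idea B", migrant_area=300, catalyst_area=400),
        Idea("idea C", migrant_area=1200, catalyst_area=200),
    ]
    for idea in rank_ideas(candidates):
        print(idea.name, idea.difference_for_reinvestment())
```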
RISCs FOR NN: Core + Accelerators

 
Figure 8.1. RISC architecture with on-chip accelerators. Accelerators are labeled ACC#1, ACC#2, …, and they are placed in parallel
with the ALU. The rest of the diagram is the common RISC core. All symbols have standard meanings.
 
Figure 8.6. VLSI layout for the complete architecture of Figure 8.5. Symbol T refers to the delay unit, while symbols IN and OUT refer
to the inputs and the outputs, respectively.
Figure 8.7. Timing for the complete architecture of Figure 8.5. Symbol t refers to time, symbol F refers to the moments of triggering,
and symbol P refers to the ordinal number of the processing element.
