William Stallings Computer Organization and Architecture 9 Edition
William Stallings
Computer Organization
and Architecture
9th Edition
+
Chapter 2
Computer Evolution and Performance
+
History of Computers
First Generation: Vacuum Tubes
ENIAC
Electronic Numerical Integrator And Computer
Designed and constructed at the University of Pennsylvania
Started in 1943 – completed in 1946
By John Mauchly and John Eckert
Its first task was to perform a series of calculations that were used to help determine the
feasibility of the hydrogen bomb
Continued to operate under BRL management until 1955 when it was disassembled
ENIAC
Weighed 30 tons
Occupied 1500 square feet of floor space
Contained more than 18,000 vacuum tubes
140 kW power consumption
Capable of 5000 additions per second
Decimal rather than binary machine
Memory consisted of 20 accumulators, each capable of holding a 10 digit number
Major drawback was the need for manual programming by setting switches and plugging/unplugging cables
+
John von Neumann
EDVAC (Electronic Discrete Variable Computer)
First publication of the idea was in 1945
IAS computer
Princeton Institute for Advanced Studies
Prototype of all subsequent general-purpose computers
Completed in 1952
Structure of von Neumann Machine
+
IAS Memory Formats
The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
Both data and instructions are stored there
Numbers are represented in binary form and each instruction is a binary code
+
Structure of IAS Computer
Registers
Memory buffer register (MBR) • Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
Memory address register (MAR) • Specifies the address in memory of the word to be written from or read into the MBR
Instruction register (IR) • Contains the 8-bit opcode of the instruction being executed
Instruction buffer register (IBR) • Employed to temporarily hold the right-hand instruction from a word in memory
Accumulator (AC) and multiplier quotient (MQ) • Employed to temporarily hold operands and results of ALU operations
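The word and register descriptions above can be illustrated with a minimal sketch that splits a 40-bit word into a left-hand and a right-hand instruction and separates each 8-bit opcode from its address. The 12-bit address width is an assumption taken from the usual description of the IAS format (20-bit instructions); it is not stated on these slides. The function names and the sample word are hypothetical.

```python
WORD_BITS = 40      # IAS word size, from the slides
INSTR_BITS = 20     # two instructions per word (left-hand and right-hand)
OPCODE_BITS = 8     # opcode width, from the IR description
ADDR_BITS = 12      # assumption: remaining bits of a 20-bit instruction


def split_word(word):
    """Split a 40-bit word into its left and right 20-bit instructions."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    left = word >> INSTR_BITS
    right = word & ((1 << INSTR_BITS) - 1)
    return left, right


def decode(instr):
    """Return (opcode, address) for one 20-bit instruction."""
    opcode = instr >> ADDR_BITS
    address = instr & ((1 << ADDR_BITS) - 1)
    return opcode, address


# Hypothetical word: left = opcode 0x01, address 0x0FA; right = opcode 0x05, address 0x0FB
word = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
left, right = split_word(word)
print(decode(left), decode(right))
```

In this sketch the right-hand instruction is what the IBR would hold while the left-hand instruction is being executed.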
+
IAS Operations
+
Table 2.1 The IAS Instruction Set
Backward compatible
+
IBM was the major manufacturer of punched-card processing equipment
Cheaper
It was not until the late 1950s that fully transistorized computers were commercially available
Table 2.2 Computer Generations
+
Second Generation Computers
Discrete component
Single, self-contained transistor
Manufactured separately, packaged in their own containers, and soldered
or wired together onto masonite-like circuit boards
Manufacturing process was expensive and cumbersome
The two most important members of the third generation were the
IBM System/360 and the DEC PDP-8
+
Microelectronics
Integrated Circuits
A computer consists of gates, memory cells, and interconnections among these elements
The number of transistors that could be put on a single chip initially doubled every year; the pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
The cost of computer logic and memory circuitry has fallen at a
dramatic rate
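The 18-month doubling rate mentioned above compounds quickly. A minimal sketch of the arithmetic, assuming a constant doubling period; the function name and the sample intervals are illustrative only:

```python
def growth_factor(years, doubling_period_years=1.5):
    """Factor by which a quantity grows if it doubles every doubling_period_years."""
    return 2 ** (years / doubling_period_years)


# Doubling every 18 months: two doublings in 3 years, ten doublings in 15 years
print(growth_factor(3))    # 4.0
print(growth_factor(15))   # 1024.0
```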
+
Table 2.4 Characteristics of the System/360 Family
Generations
VLSI: Very Large Scale Integration
ULSI: Ultra Large Scale Integration
Semiconductor Memory
Microprocessors
Semiconductor Memory
+
Microprocessors
The density of elements on processor chips continued to rise
More and more elements were placed on each chip so that fewer and fewer
chips were needed to construct a single computer processor
Evolution of Intel Microprocessors
a. 1970s Processors
b. 1980s Processors
c. 1990s Processors
d. Recent Processors
+
Microprocessor Speed
Techniques built into contemporary processors include:
Pipelining
Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
Branch prediction
Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
Data flow analysis
Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
Speculative execution
Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
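The benefit of pipelining can be illustrated with a simple cycle count. This is a sketch under idealized assumptions that are not from the slides: every stage takes one cycle and there are no stalls, hazards, or mispredicted branches. The function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipe; each later one completes one cycle after the previous."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts (idealized)."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)


# 100 instructions through a 5-stage pipe: 500 cycles versus 104 cycles
print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))
print(round(speedup(100, 5), 2))
```

For long instruction streams the idealized speedup approaches the number of stages, which is why keeping all stages busy matters.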
+
Performance Balance
Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
RC delay
Speed at which electrons flow limited by resistance and capacitance of
metal wires connecting them
Delay increases as RC product increases
Wire interconnects thinner, increasing resistance
Wires closer together, increasing capacitance
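The RC relationship above can be sketched numerically. The model is a simplifying assumption, not taken from the slides: wire resistance is computed as resistivity times length divided by cross-sectional area, and the delay is represented by the time constant R times C. The dimensions and the capacitance value are hypothetical.

```python
COPPER_RESISTIVITY = 1.68e-8  # ohm-metres, approximate value for copper


def wire_resistance(length_m, width_m, thickness_m, resistivity=COPPER_RESISTIVITY):
    """Resistance of a rectangular wire: R = rho * L / A."""
    return resistivity * length_m / (width_m * thickness_m)


def rc_time_constant(resistance_ohm, capacitance_f):
    """Time constant tau = R * C; delay grows as this product grows."""
    return resistance_ohm * capacitance_f


# Hypothetical interconnect: 1 mm long, 100 nm thick, with 0.2 pF of capacitance
capacitance = 0.2e-12
tau_wide = rc_time_constant(wire_resistance(1e-3, 200e-9, 100e-9), capacitance)
tau_thin = rc_time_constant(wire_resistance(1e-3, 100e-9, 100e-9), capacitance)

# Halving the wire width doubles the resistance and therefore the time constant
print(tau_thin / tau_wide)
```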
Memory latency
Memory speeds lag processor speeds
Processor Trends
Multicore
The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
Strategy is to use two simpler processors on the chip rather than one more complex processor
With two processors, larger caches are justified
As caches became larger it made performance sense to create two and then three levels of cache on a chip
+
Many Integrated Core (MIC)
Leap in performance as well as the challenges in developing software to exploit such a large number of cores
The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip
Graphics Processing Unit (GPU)
Core designed to perform parallel operations on graphics data
Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
Used as vector processors for a variety of applications that require repetitive computations
Overview: Intel and ARM
Intel
Results of decades of design effort on complex instruction set computers (CISCs)
Excellent example of CISC design
8086
16-bit machine
Used an instruction cache, or queue
First appearance of the x86 architecture
80386
Intel’s first 32-bit machine
First Intel processor to support multitasking
80486
More sophisticated cache technology and instruction
pipelining
Built-in math coprocessor
x86 Evolution - Pentium
Pentium
Superscalar
Multiple instructions executed in parallel
Pentium Pro
Increased superscalar organization
Aggressive register renaming
Branch prediction
Data flow analysis
Speculative execution
Pentium II
MMX technology
Designed specifically to process video, audio, and graphics data
Pentium III
Additional floating-point instructions to support 3D graphics software
Pentium 4
Includes additional floating-point and other enhancements for multimedia
x86 Evolution (continued)
Core
First Intel x86 microprocessor
with a dual core, referring to the
implementation of two processors
on a single chip
Core 2
Extends the architecture to 64 bits
Recent Core offerings have up to
10 processors per chip
Embedded Systems
General definition:
“A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.”
Table 2.7
Examples of Embedded Systems and Their Markets
+
Embedded Systems
Requirements and Constraints
Small to large systems, implying different cost constraints and different needs for optimization and reuse
Relaxed to very strict requirements and combinations of different quality requirements with respect to safety, reliability, real-time and flexibility
Short to long life times
Different environmental conditions in terms of radiation, vibrations, and humidity
Different application characteristics resulting in static versus dynamic loads, slow to fast speed, compute versus interface intensive tasks, and/or combinations thereof
Different models of computation ranging from discrete event systems to hybrid systems
Figure 2.12 Possible Organization of an Embedded System
+
Acorn RISC Machine (ARM)
Embedded real-time systems
Systems for storage, automotive body and power-train, industrial, and networking applications
Application platforms
Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment and digital imaging applications
Secure applications
Smart cards, SIM cards, and payment terminals
+
System Clock
Table 2.9 Performance Factors and System Attributes
Benchmarks
For example, consider this high-level language statement:
Written in a high-level language, making it portable across different machines
Representative of a particular kind of programming style, such as system programming, numerical programming, or commercial programming
Can be measured easily
Has wide distribution
+
System Performance Evaluation
Corporation (SPEC)
Benchmark suite
A collection of programs, defined in a high-level language
Attempts to provide a representative test of a computer in a particular
application or system programming area
SPEC
An industry consortium
Defines and maintains the best known collection of benchmark suites
Performance measurements are widely used for comparison and research
purposes
Best known SPEC benchmark suite
Queuing system
If server is idle an item is served immediately, otherwise an arriving item
joins a queue
There can be a single queue for a single server or for multiple servers, or
multiple queues, one for each of multiple servers
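The single-queue, single-server case described above can be sketched as a short simulation. The first-come, first-served discipline, the function name, and the sample arrival and service times are assumptions made for illustration.

```python
def waiting_times(arrival_times, service_times):
    """Waiting time of each item at a single first-come, first-served server.

    An item that arrives while the server is idle starts immediately (wait 0);
    otherwise it queues until the server finishes all earlier items.
    """
    waits = []
    server_free_at = 0
    for arrival, service in zip(arrival_times, service_times):
        start = max(arrival, server_free_at)  # idle server: start == arrival
        waits.append(start - arrival)
        server_free_at = start + service
    return waits


# Hypothetical workload: three items arrive close together, a fourth arrives later
arrivals = [0, 1, 2, 10]
services = [3, 3, 3, 1]
print(waiting_times(arrivals, services))  # [0, 2, 4, 0]
```

The fourth item finds the server idle (it became free at time 9) and is served immediately, while the second and third items wait in the queue.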