COS2621-notes_stallings
Chapter 1 - Introduction
Computer architecture = the aspects of a computer visible to the programmer,
which have a direct impact on program execution. E.g. having a multiply
instruction.
Many computer models have the same architecture, but different organisations,
resulting in different prices & performance.
A particular architecture can span many years, with its organisation changing
every now and then.
Function
There are four basic functions a computer can perform:
• Data processing
• Data storage
• Data movement
• Control
Structure
There are four main structural components:
• Central Processing Unit (CPU)
• Main memory
• I/O
• System interconnection
John von Neumann began the design of a new stored-program computer, called the
IAS computer, which is the prototype of all subsequent general-purpose
computers.
All of today’s computers have the same general structure and function, and are
referred to as von Neumann machines.
http://wikistudent.ws/Unisa
• Main memory (containing data and instructions)
• ALU (which performs operations on binary data)
• Control unit (which interprets the instructions and causes them to be
executed)
• I/O equipment
Moore’s law:
The number of transistors that can be put on a single chip doubles every 18
months.
Computers are becoming faster and cheaper, but the basic building blocks are
still the same as those of the IAS computer from over 50 years ago.
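As a rough illustration of the 18-month doubling (the starting count and time span below are made-up example values):

```python
# Rough illustration of Moore's law: transistor count doubles every 18 months.
# The starting count and year figures are hypothetical example values.

def transistors_after(start: int, years: float, doubling_months: float = 18) -> int:
    """Project a transistor count forward under a fixed doubling period."""
    doublings = (years * 12) / doubling_months
    return int(start * 2 ** doublings)

# A chip with 1 million transistors would, after 9 years (6 doublings),
# be projected to hold 64 million.
print(transistors_after(1_000_000, 9))  # -> 64000000
```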
Microprocessor speed
The raw speed of the microprocessor won’t achieve its potential unless it is
fed a constant stream of work to do in the form of computer instructions.
Some ways of exploiting the speed of the processor:
• Branch prediction - The processor looks ahead and predicts which
instructions are likely to be processed next, prefetching and buffering
them so that there’s more work available.
• Data flow analysis - The processor analyses which instructions are
dependent on each other’s data to create an optimised schedule of
instructions. Instructions are scheduled to be executed when ready,
independent of the original order, preventing delays.
• Speculative execution - The processor uses branch prediction and data
flow analysis to speculatively execute instructions ahead of their
appearance in the program execution, holding the results in temporary
locations. The processor is kept as busy as possible by executing
instructions that are likely to be needed.
Processor speed and memory capacity have grown rapidly, but the speed with
which data can be transferred between main memory and the processor has lagged.
The interface between processor and main memory is the most crucial pathway in
the computer, because it is responsible for carrying a constant flow of program
instructions and data between memory chips and the processor. If memory or the
pathway can’t keep up with the processor, valuable processing time is lost.
DRAM density is going up faster than the amount of main memory needed, which
means that the number of DRAMs per system is actually going down, so there is
less opportunity for parallel transfer of data.
Designers must strive to balance the throughput and processing demands of the
processor, main memory, I/O devices, and the interconnection structures.
The design must cope with two evolving factors:
• The rate at which performance is changing differs from one type of
element to another
• New applications / peripherals keep on changing the nature of the demand
on the system
Hardware and software are generally logically equivalent, which means that they
can often perform the same function. Designers have to decide which functions
to implement in hardware and which in software. Cost usually plays a role.
Hardware offers speed, but not flexibility.
Software offers flexibility, but less speed.
Intel’s Pentium
This is an example of CISC design.
PowerPC
This is an example of RISC design.
IBM's 801 research project produced a prototype RISC machine, and its concepts
were later used in a commercial RISC workstation, the RT PC. IBM then produced
a third system, which built on the 801 and RT PC, called the IBM RISC
System/6000 and referred to as the POWER
architecture. IBM then entered into alliance with Motorola and Apple, which
resulted in a series of machines that implement the PowerPC architecture
(derived from the POWER architecture).
The PowerPC architecture is a superscalar RISC system, and one of the most
powerful and best-designed ones on the market.
Hardwired program = a program that is not stored in memory, but forms part of
the hardware. Logic components are physically and permanently connected, and
contain the sequence of steps that the program should execute.
Computer function
Instruction cycle = the processing required for a single instruction.
Two steps of instruction processing:
1. Fetch cycle: The processor reads instructions from memory one at a time
2. Execute cycle: The processor executes each instruction
Program execution consists of repeating the process of instruction fetch and
instruction execution.
Program execution halts only if
• the machine is turned off,
• some sort of unrecoverable error occurs, or
• a program instruction halts the computer
Steps in the instruction cycle:
1. The processor fetches the instruction from memory (The PC holds the
   address of the instruction to be fetched next and is incremented after
   each fetch)
2. The fetched instruction is loaded into the IR (Instruction Register)
3. The control unit interprets (decodes) the instruction to see what is to
   be done
4. (If required, address(es) of the operand(s) are determined)
5. (If required, the operand(s) are fetched from memory)
6. The processor performs the required action
7. The result is stored in memory, if required
8. Interrupt processing takes place if an interrupt has occurred (see **
   below)
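The repeating fetch-decode-execute sequence can be sketched as a toy simulator (the two-field instruction format and the opcode names are invented for illustration, not a real ISA):

```python
# Toy fetch-decode-execute loop. The opcodes (LOAD, ADD, STORE, HALT) and the
# single-accumulator design are illustrative inventions, not a real ISA.

def run(memory: dict) -> dict:
    data = {}
    acc = 0          # accumulator
    pc = 0           # program counter: address of next instruction
    while True:
        opcode, operand = memory[pc]   # fetch (into the "IR")
        pc += 1                        # PC incremented after each fetch
        if opcode == "LOAD":           # decode + execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "STORE":
            data[operand] = acc        # result stored in memory
        elif opcode == "HALT":         # a program instruction halts the machine
            return data

program = {0: ("LOAD", 5), 1: ("ADD", 7), 2: ("STORE", 100), 3: ("HALT", 0)}
print(run(program))  # -> {100: 12}
```

Execution halts only when the HALT instruction is fetched, matching the halting conditions listed above.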
Interrupt cycle **
At the end of the instruction cycle, the processor checks to see if any
interrupts have occurred. If not, the processor proceeds to the fetch cycle,
otherwise:
• The processor suspends execution of the current program and saves its
context (including the address of the next instruction to be executed)
• The program counter is set to the starting address of an interrupt
handler routine
There is extra overhead with the interrupt handling process (Extra instructions
must be executed to determine the nature of the interrupt and to decide on the
appropriate action), but it still wastes less time than waiting for an I/O
operation to finish.
Multiple interrupts
Two approaches can be taken to dealing with multiple interrupts:
1. Disable interrupts while an interrupt is being processed
When an interrupt occurs, interrupts are disabled and the processor ignores
interrupt request signals. After the interrupt handler routine completes,
interrupts are enabled and the processor checks to see if additional interrupts
have occurred.
Advantage: Simple approach because interrupts are handled in sequential order.
Disadvantage: It doesn’t take into account relative priority or time-critical
needs.
2. Define priorities for interrupts
Interrupts of higher priority cause lower-priority interrupt handlers to be
interrupted.
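The priority approach can be sketched as follows (the device names and priority levels are invented; in this sketch a smaller number means a higher priority):

```python
# Pick the highest-priority pending interrupt; a running handler is only
# preempted by a strictly higher-priority request. Names/levels are made up.

def next_interrupt(pending, current_level):
    """Return the interrupt to service next, or None to keep running.

    pending: list of (priority, device) pairs; smaller priority number wins.
    current_level: priority of the handler now running, or None if none is.
    """
    if not pending:
        return None
    best = min(pending, key=lambda req: req[0])
    if current_level is None or best[0] < current_level:
        return best
    return None   # current handler has equal or higher priority

pending = [(3, "printer"), (1, "disk"), (2, "comms")]
print(next_interrupt(pending, None))   # -> (1, 'disk')
print(next_interrupt(pending, 1))      # -> None: disk handler is not preempted
```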
I/O function
An I/O module (like a disk controller) can exchange data directly with the
processor. (The processor identifies a specific device that is controlled by a
particular I/O module).
If necessary, the processor can grant an I/O module the authority to read
from / write to memory, so that the I/O-memory transfer can occur without tying
up the processor. (Direct memory access).
Interconnection structures
Interconnection structure = the collection of paths connecting the various
modules (processor, memory, I/O).
The interconnection structure must support the exchange of data involving the
following modules:
Memory
A memory module consists of N words of equal length, with unique addresses. A
word of data can be read from or written into the memory.
I/O module
I/O is functionally similar to memory (read & write operations). An I/O module
can control more than one external device, each of which has a unique address.
There are external data paths for the input & output of data with an external
device. An I/O module may also be able to send interrupt signals to the
processor.
Processor
The processor reads in instructions and data, writes out data after processing,
and uses control signals to control the overall operation of the system. It
also receives interrupt signals.
Bus interconnection
Bus = a communication pathway connecting two or more devices.
Key characteristic: it is a shared transmission medium, and only one device can
transmit signals at a time.
A bus consists of multiple communication pathways (lines), which can each
transmit signals representing binary 1 / 0.
A sequence of digits can be transmitted across a single line.
Several lines can be used to transmit digits in parallel.
System bus = a bus that connects major computer components (processor, memory,
I/O).
Bus structure
A system bus typically consists of from about fifty to a few hundred separate
lines, each of which has a particular function.
Bus operation:
A module that wants to send data must
1. Obtain the use of the bus
2. Transfer data via the bus
A module that wants to request data from another module must
1. Obtain the use of the bus
2. Transfer a request to the module over the appropriate control and address
lines
3. Wait for the other module to send the data
Multiple-bus hierarchies
Why connecting too many devices to the bus causes performance to suffer:
• The more devices attached, the greater the bus length, the greater the
transmission delay.
• The bus may become a bottleneck as the capacity of the bus is reached.
(Diagram: a multiple-bus hierarchy, with a processor local bus, a system bus
holding main memory, and an expansion bus interface connecting an expansion
bus to devices such as a modem and a network interface)
Main memory is moved off the local bus onto a system bus, so I/O transfers to
and from memory don’t interfere with the processor’s activity.
You may connect I/O controllers directly onto the system bus, but it is more
efficient to use an expansion bus (see diagram), because this allows support
for a wide variety of I/O devices and insulates memory-to-processor traffic
from I/O traffic.
Elements of bus design
1. Bus types
* Dedicated
• Functional dedication:
E.g. Using separate dedicated address and data lines.
• Physical dedication:
E.g. Using an I/O bus to interconnect all I/O modules.
Advantage: High throughput because there is less bus contention.
Disadvantage: Increased size and cost of the system.
* Multiplexed
Using the same lines for multiple purposes.
Advantage: The use of fewer lines saves space and cost.
Disadvantage: More complex circuitry is needed and there is a potential
reduction in performance.
2. Method of arbitration
Arbitration is needed because only one unit at a time can transmit over the
bus.
* Centralised
A single hardware device (a bus controller / arbiter) is responsible for
allocating time on the bus.
The device may be a separate module or part of the processor.
* Distributed
Each module contains access control logic and the modules act together to share
the bus.
3. Timing
Timing refers to the way in which events are coordinated on the bus.
* Synchronous
The occurrence of events on the bus is determined by a clock. The bus has a
clock line upon which a clock transmits a regular sequence of alternating 1s
and 0s. A single 1-0 transmission is referred to as a clock cycle / bus cycle,
and defines a time slot. All other devices on the bus can read the clock line,
and all events start at the beginning of a clock cycle. Most events occupy a
single clock cycle.
Advantage: Simpler to implement and test than asynchronous timing.
Disadvantage: Less flexible than asynchronous timing.
Disadvantage: Because all devices are tied to a fixed clock rate, the system
can’t take advantage of advances in device performance.
* Asynchronous
The occurrence of one event on a bus follows and depends on the occurrence of a
previous event.
Advantage: A mixture of slow and fast devices, using older and newer
technology, can share a bus.
4. Bus width
* Address
The wider the address bus, the greater the range of locations referenced.
* Data
The wider the data bus, the greater the number of bits transferred at one time.
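These two widths translate directly into numbers; a quick sketch (the 32-bit and 64-bit figures are example values):

```python
# Address bus of n lines -> 2**n distinct addressable locations.
# Data bus of w lines    -> w bits moved per transfer.

def addressable_locations(address_lines: int) -> int:
    return 2 ** address_lines

def transfers_needed(total_bits: int, data_lines: int) -> int:
    return -(-total_bits // data_lines)   # ceiling division

print(addressable_locations(32))   # -> 4294967296 (4 GiB of byte addresses)
print(transfers_needed(64, 32))    # -> 2 transfers for a 64-bit value
```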
PCI (Peripheral Component Interconnect)
Advantages:
• Delivers better system performance for high-speed I/O subsystems
• Requires very few chips to implement
• Supports other buses attached to it
Characteristics:
• Supports a variety of microprocessor-based configurations (including
single- and multiple-processor systems)
• Provides a general-purpose set of functions
• Makes use of synchronous timing and a centralised arbitration scheme
Bus structure
PCI may be configured as a 32- or 64-bit bus.
The signal lines can be divided into the following functional groups:
Mandatory PCI signal lines
• System pins
• Address and data pins
• Interface control pins
• Arbitration pins
• Error reporting pins
Optional PCI signal lines
• Interrupt pins
• Cache support pins
• 64-bit bus extension pins
• JTAG/boundary scan pins
PCI commands
• Interrupt Acknowledge
• Special cycle
• I/O Read / Write
• Memory Read / Read Line / Read Multiple / Write / Write and Invalidate
• Configuration Read / Write
• Dual Address Cycle
Characteristics of memory systems
Location
Internal (main. E.g. cache, main memory)
External (secondary. E.g. peripheral storage devices)
Capacity
Word size (common word lengths are 8, 16, and 32 bits)
Number of words
Unit of transfer (= The no of bits read out of / written into memory at a time)
Word
Block (= a larger unit than a word; for external memory transfers)
Access Method
Sequential (Access is made in a linear sequence. E.g. Tapes)
Direct (Direct access to reach a general vicinity, followed by sequential
searching to reach the final location. E.g. Disks)
Random (Any location can be selected at random and directly accessed)
Associative (A word is retrieved based on a portion of its contents)
Performance
Access time
(For RAM: The time it takes to perform a read / write operation)
(For non-RAM: the time it takes to position the read-write mechanism)
Cycle time (Access time + additional time needed before a 2nd access can begin)
Transfer rate (Rate at which data can be transferred in / out of a memory unit)
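One common estimate relates these quantities as total time = access time + N / transfer rate for an N-bit transfer; a small sketch with invented figures:

```python
# Estimated time to transfer N bits: access time plus N / transfer rate.
# The figures below are invented example values, not from the notes.

def transfer_time(access_time_s: float, n_bits: int, rate_bits_per_s: float) -> float:
    """Total time to read/write n_bits after an initial access delay."""
    return access_time_s + n_bits / rate_bits_per_s

# 100 ns access time, 1 Gbit/s transfer rate, 1024-bit block:
t = transfer_time(100e-9, 1024, 1e9)
print(round(t * 1e9, 1))  # -> 1124.0 (nanoseconds total)
```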
Physical type
Semiconductor
Magnetic (used for disk and tape)
Optical
Magneto-optical
Physical Characteristics
Volatile / non-volatile
Erasable / non-erasable
Organisation
(Organisation = the physical arrangement of bits to form words)
The designer can’t rely on a single memory component, but must employ a memory
hierarchy for the combination of high capacity, low cost, and fast access.
The hierarchy:
Inboard memory - Registers → cache → main memory
Outboard storage - Magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM
Off-line storage - Magnetic tape, MO, WORM
As you go down the hierarchy, the frequency of access of the memory by the
processor decreases.
Locality of reference
During program execution, memory references by the processor tend to cluster.
Once a loop / subroutine is entered, there are repeated references to a small
set of instructions. Over a long period of time, the clusters in use change,
but over a short period of time, the processor is mainly working with fixed
clusters of memory references.
You can organise data across the hierarchy so that the percentage of accesses
to the lower levels is less than the higher ones. E.g. you can temporarily
place low-level clusters in higher levels, and later swap them back to make
room for new clusters coming in.
Cache memory
When the processor attempts to read a word of memory, a check is made to see
whether the word is in the cache. If not, a block of memory (containing a
fixed number of words) is read into the cache and then the word goes to the
processor. Locality of reference means that it’s likely there will be future
references to other words in the same block.
Each line of cache contains a block of memory (with a fixed number of words).
The number of cache lines is considerably less than the number of main memory
blocks! At any time, some subset of the blocks of memory resides in lines in
the cache. Because there are more blocks than lines, an individual line can’t
be uniquely and permanently dedicated to a particular block. So, each line
includes a tag that identifies which particular memory block is currently being
stored. (The tag is usually a portion of the main memory address).
(Diagram: typical cache organisation. The processor connects to the cache via
address, data, and control lines; address and data buffers connect the cache
to the system bus, through which main memory is reached)
Diagram explanation:
• The cache connects to the processor via data, control, and address lines
• The data and address lines also attach to buffers, which attach to the
system bus from which main memory is reached
• Cache hit: The data and address buffers are disabled and communication is
only between processor and cache, with no system bus traffic
• Cache miss: The address is loaded onto the system bus and the data are
returned through the data buffer to both the cache and the processor (In
some organisations, the word is first read into the cache and then
transferred from cache to processor)
Mapping function
Direct mapping
Each block of memory is mapped onto only one possible cache line. Only the
address is used to determine the corresponding cache line. A tag is necessary
to distinguish the data block from other data blocks that can fit into the
relevant line.
Advantage: Simple and inexpensive to implement.
Disadvantage: There is a fixed cache location for any given block.
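Direct mapping amounts to slicing the memory address into tag, line, and word fields; a sketch with example field widths (the 2-bit word offset and 128-line cache are assumptions for illustration, not values from the notes):

```python
# Direct mapping: main memory address -> (tag, line, word).
# line = block number mod number of lines; tag = the remaining high bits.
# The field sizes below are illustrative choices.

WORD_BITS = 2        # 4 words per block
LINE_BITS = 7        # 128 cache lines

def split_address(addr: int) -> tuple:
    word = addr & ((1 << WORD_BITS) - 1)                  # offset within block
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)   # which cache line
    tag = addr >> (WORD_BITS + LINE_BITS)                 # identifies the block
    return tag, line, word

tag, line, word = split_address(0x1234)
print(tag, line, word)   # -> 9 13 0
```

The tag stored alongside line 13 would then be compared with 9 to decide hit or miss.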
Associative mapping
Each main memory block can be loaded into any line of cache. The cache control
logic interprets each main memory address as a tag and a word field, where the
tag field uniquely identifies a block of main memory. All lines have to be
examined to determine whether a particular memory block is in the cache.
Advantage: Flexible
Disadvantage: Requires complex circuitry in order to examine all the tags of
the cache in parallel.
Set-associative mapping
A compromise between direct and associative mapping. The cache is divided into
sets, each of which consists of a number of lines. A block can be mapped into
any of the lines of a particular set. The tag is only compared to the tags
within a single set.
Advantages of both direct and associative methods, while their disadvantages
are reduced.
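Set-associative lookup only changes which address field indexes the cache; a sketch with invented sizes (64 sets, 2-bit word offset; the cache contents are arbitrary examples):

```python
# Set-associative lookup: the set field selects a set; the tag is compared
# against every line in that set only. Sizes here are illustrative.

WORD_BITS = 2    # 4 words per block
SET_BITS = 6     # 64 sets (e.g. 2 lines each for 2-way set-associative)

def find_in_cache(addr: int, cache: dict) -> bool:
    """cache maps set number -> list of tags currently stored in that set."""
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag in cache.get(set_no, [])   # hit if the tag is in the set

cache = {13: [18, 4]}                     # set 13 currently holds tags 18 and 4
print(find_in_cache(0x1234, cache))       # -> True (tag 18 found in set 13)
```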
Number of caches
Single and two level caches
The on-chip cache reduces the processor’s external bus activity and therefore
speeds up execution times and increases overall system performance. (The bus is
then also free to support other transfers).
Two-level cache organisation: Internal on-chip cache (L1) and external cache
(L2), which is the organisation found in most contemporary designs.
Without L2 cache, if the processor makes an access request for a memory
location not in L1 cache, it would have to access DRAM or ROM memory across the
bus, which results in poor performance because of slow bus speed.
Design features for multilevel caches:
• For an off-chip L2 cache, many designs don’t use the system bus as the
transfer path between L2 and processor, but use a separate data path to
reduce the burden on the system bus.
• With the continued shrinkage of processor components, many processors now
incorporate the L2 cache on the processor chip, improving performance.
Advantage of multilevel caches: Improved performance
Disadvantage of multilevel caches: Complicated design
Instruction cache
Unlike the organisation used in all previous Pentium models (and in most other
processors), the Pentium 4 instruction cache sits between the instruction fetch
/ decode unit and the (out-of-order) execution core.
The reason:
Machine instructions are decoded into simple RISC-like instructions called
micro-operations. (Using these simple micro-ops enhances performance). However,
the Pentium machine instructions are cumbersome to decode, so performance is
enhanced if this decoding is done independently of the scheduling and
pipelining logic.
Data cache
The data cache employs a write-back policy: Data are written to main memory
only when they are removed from the cache and there has been an update. (The
write-back technique minimises memory writes because updates are only made in
the cache).
The Pentium 4 processor can be dynamically configured to support write-through
caching (A simple technique where all write operations are made to main memory
as well as to cache, ensuring that main memory is always valid)
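A minimal sketch of the write-back policy described above (the class and its dirty-bit bookkeeping are invented for illustration):

```python
# Write-back: updates go to the cache only; main memory is written when a
# dirty line is evicted. Write-through would also write memory on every store.

class WriteBackCache:
    def __init__(self):
        self.lines = {}        # addr -> (value, dirty flag)
        self.memory = {}
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = (value, True)     # update cache only, mark dirty

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                            # write memory only if updated
            self.memory[addr] = value
            self.memory_writes += 1

c = WriteBackCache()
for v in range(5):
    c.write(100, v)        # five updates, zero memory writes so far
c.evict(100)
print(c.memory[100], c.memory_writes)   # -> 4 1
```

Five cache updates cost a single memory write, which is the saving the write-back technique aims for.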
Two bits in control registers that control the data cache:
• CD (cache disable)
• NWT (not write-through)
Two instructions that control the data cache:
• INVD - invalidates / flushes the internal cache memory and signals the
external cache (if any) to invalidate
• WBINVD - writes back and invalidates internal, then external cache
RAM
Characteristics:
• You can read & write to memory easily & rapidly
• Reading and writing are accomplished by using electrical signals
• Volatile (There must be a constant power supply)
Two traditional forms of RAM used in computers:
1. DRAM (dynamic)
The cells store data as charges, on capacitors. The presence or absence of a
charge on a capacitor is interpreted as binary 1 or 0. DRAMs require periodic
charge refreshing to maintain data storage, because capacitors have a natural
tendency to discharge. ‘Dynamic’ refers to this tendency of the stored charge
to leak away, even with power continuously applied.
Operation of DRAM:
Write operation
1. A voltage signal is applied to the bit line (high = 1, low = 0)
2. A signal is then applied to the address line, allowing a charge to be
transferred to the capacitor
Read operation
1. The address line is selected
2. The transistor turns on and the charge stored on the capacitor is fed out
onto a bit line and to a sense amplifier
3. The sense amplifier compares the capacitor voltage to a reference value
and determines if the cell contains 1 or 0
4. The read out from the cell discharges the capacitor, which must be then
restored to complete the operation
Although the DRAM cell is used to store a single bit (0 or 1), it is
essentially an analogue device. The capacitor can store any charge value within
a range, but it is interpreted as 1 or 0.
2. SRAM (static)
Binary values are stored using traditional flip-flop logic-gate configurations.
A static RAM will hold its data as long as power is supplied to it. No refresh
is needed to retain data.
Operation of SRAM:
The address line (which controls transistors) is used to open or close a
switch. When a signal is applied to the address line, the transistors are
switched on, allowing a read / write operation.
DRAM vs SRAM:
Both are volatile: power must be continuously supplied to the memory to
preserve the bit values.

DRAM                                     SRAM
Analogue                                 Digital
Smaller and simpler memory cell          Larger and more complex memory cell
More dense and less expensive            Less dense and more expensive
Requires supporting refresh circuitry;   No refresh circuitry needed
favoured for large memory requirements
Slower                                   Faster
Used for main memory                     Used for cache memory (on & off chip)
Types of ROM
ROM contains a permanent pattern of data that can’t be changed.
A ROM is non-volatile: no power is needed to maintain the bit values in memory.
You can’t write to a ROM.
PROM
The writing process (‘programming’) is performed electrically, with special
equipment.
Advantages:
• A less expensive alternative when only a small number of ROMs with a
  particular memory content is needed
• Provides flexibility and convenience
(ROM, by contrast, remains attractive for high-volume production runs)
Read-mostly memory:
(For when read operations are far more frequent than write operations)
                               EPROM            Flash memory     EEPROM
Erasure required before write  Yes              Yes              No
Method of erasure              Exposure to UV   Electrical       Electrical
                               radiation        erasure          erasure
Erasure level                  Chip level       Block level      Byte level
Density                        High             High             Lower
Advantages                     Can be updated   An entire flash  Flexible: can be
                               multiple times   memory can be    updated in place
                                                erased in a few
                                                seconds
Error correction
Hard failure = a permanent physical defect so that the memory cell(s) affected
can’t reliably store data, but become stuck at 0 or 1 or switch erratically
between 0 and 1.
Possible causes: harsh environmental abuse, manufacturing defects, and wear.
Soft error = a random, non-destructive event that alters the contents of one or
more memory cells, without damaging the memory.
Possible causes: power supply problems or alpha particles resulting from
radioactive decay.
DRAM vs SDRAM:
• DRAM is asynchronous: the processor must wait during the access-time delay,
  slowing down system performance.
• SDRAM is synchronous (driven by an external clock signal): the processor can
  do other tasks while the SDRAM is processing requests, improving system
  performance.
Other SDRAM features:
- Mode register and associated control logic, for customisation
- Multiple-bank internal architecture that improves opportunities for on-chip
  parallelism
Bits near the centre of a rotating disk travel slower than bits on the outside,
so it is necessary to compensate for the variation in speed so that the head
can read all the bits at the same rate.
Disks have control data recorded on them to indicate the starting point on the
track and start & end of each sector.
Physical characteristics
Head motion
• Fixed head disk
One read-write head per track.
All of the heads are mounted on a rigid arm that extends across all
tracks (rare).
• Movable head disk
Only one read-write head.
The head is mounted on an arm, which can extend and retract for the head
to be positioned above any track.
Disk portability
• Non-removable disk
Permanently mounted in the disk drive (E.g. the hard disk)
• Removable disk
Can be removed and replaced with another disk, so unlimited amounts of
data are available (E.g. floppy disks and ZIP cartridge disks)
Sides
• Single sided
• Double sided
Platters
• Single platter
• Multiple platters
Multiple arms are provided. All of the heads are mechanically fixed so
that they all are at the same distance from the centre of the disk (i.e.
same track number) and move together. Cylinder = the set of all the
tracks in the same relative position on the platter.
Head mechanism
• Contact (floppy)
The head mechanism comes into physical contact with the medium
• Fixed gap
The read-write head is positioned a fixed distance above the platter,
allowing an air gap
• Aerodynamic gap (Winchester)
A head must generate / sense an electromagnetic field of sufficient
magnitude to write / read properly. The narrower the head, the closer it
must be to the platter. (Narrower heads → narrower tracks → greater data
density). However, the closer the head is to the disk, the greater the
risk of error from impurities / imperfections.
Winchester heads are used in sealed drive assemblies that are almost free
of contaminants. They are designed to operate closer to the disk’s
surface, allowing greater data density. The head is an aerodynamic foil
that rests lightly on the platter’s surface when the disk is motionless.
When the disk spins, the air pressure generated is enough to make the
foil rise above the surface.
(Disk I/O timing: wait for device → wait for channel → seek → rotational delay
→ data transfer)
RAID
RAID is a standardised scheme for multiple-disk database design.
The RAID strategy replaces large-capacity disk drives with multiple smaller-
capacity drives and distributes data in such a way as to enable simultaneous
access to data from multiple drives, thereby improving I/O performance and
allowing easier incremental increases in capacity.
Allowing multiple heads to operate simultaneously achieves higher I/O and
transfer rates, but increases the probability of failure. To compensate for
this decreased reliability, RAID makes use of stored parity information that
enables the recovery of data lost due to a disk failure.
RAID level 0
Not true RAID because it doesn’t include redundancy to improve performance.
Category: Striping
Striping = distributing data over multiple drives in round robin fashion.
Advantage of striping: If a single I/O request consists of multiple logically
contiguous strips, then up to n (n = no of disks) strips for that request can
be handled in parallel, greatly reducing the I/O transfer time.
Performance is excellent and the implementation is straightforward.
Disadvantage: Works poorly with operating systems that mainly request data one
sector at a time
Disadvantage: Reliability is potentially worse than having a single large disk.
Applications: Supercomputers, where performance and capacity are the main
concern and low cost is more important than improved reliability.
RAID 0 for High Data Transfer Capacity:
Two requirements must be met for RAID 0 to achieve a high data transfer rate:
* A high transfer capacity must exist along the entire path between host memory
and the individual disk drives. (Includes buses, I/O adapters, etc.)
* The application must make I/O requests that drive the disk array efficiently.
(E.g. if large amounts of logically contiguous data are requested).
RAID 0 for High I/O Request Rate:
For an individual I/O request for a small amount of data, the I/O time is
dominated by the motion of the disk heads (seek time) and the movement of the
disk (rotational latency). A disk array can provide high I/O execution rates by
balancing the I/O load across multiple disks. The larger the strip size, the
more requests that can be handled in parallel.
(RAID 0 works best with large requests).
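The round-robin strip mapping can be sketched as follows (the 4-disk array is an example configuration):

```python
# RAID 0 striping: logical strip i lives on disk i mod n, at row i // n.
# The 4-disk array below is an example configuration.

def locate_strip(strip_index: int, n_disks: int) -> tuple:
    """Return (disk number, strip position on that disk)."""
    return strip_index % n_disks, strip_index // n_disks

# Eight consecutive strips over 4 disks: disks 0..3, then 0..3 again.
print([locate_strip(i, 4) for i in range(8)])
# -> [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

A request spanning strips 0..3 touches all four disks at once, which is where the parallel transfer comes from.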
RAID level 1
Category: Mirroring
Differs from RAID levels 2–6 in the way in which redundancy is achieved. In
these other RAID schemes, some form of parity calculation is used to introduce
redundancy, but in RAID 1, redundancy is achieved by duplicating all the data.
Data striping (as in RAID 0) is used, but each logical strip is mapped to two
separate physical disks so that every disk in the array has a mirror disk with
the same data.
Advantages:
• A read request can be serviced by either of the two disks that contains
the requested data
• A write request requires that both corresponding strips be updated, but
this can be done in parallel. (There are no parity bits to update as
well)
• Recovery from a failure is simple. When a drive fails, the data may still
be accessed from the second drive (Excellent fault tolerance)
Disadvantage:
• Cost. Requires twice the disk space of the logical disk that it supports
RAID 1 can achieve high I/O request rates if the bulk of the requests are reads
(doubling the performance of RAID 0).
Applications: System drives, critical files
RAID level 2
Category: Parallel access
RAID levels 2 & 3 make use of a parallel access technique. In a parallel access
array, all member disks participate in the execution of every I/O request. The
spindles of the individual drives are synchronised so that each disk head is in
the same position on each disk at any given time.
Data striping is used (In RAID 2 & 3 the strips are as small as a byte or a
word). An error-correcting code is calculated across corresponding bits in each
data disk, and the bits of the code are stored in the corresponding bit
positions on multiple parity disks. Typically, a Hamming code is used, which is
able to correct single-bit errors and detect double-bit errors.
E.g. You could split bytes into two pairs of 4 bits, each one with three parity
bits, forming 7-bit words.
Although RAID 2 requires fewer disks than RAID 1, it is still costly.
On a single read, all disks are simultaneously accessed. The requested data and
the associated error-correcting code are delivered to the array controller. If
there is a single-bit error, the controller can recognise and correct it
instantly, so that the read access time is not slowed.
On a single write, all data disks and parity disks must be accessed for the
write operation.
Disadvantage: all drives must be rotationally synchronised, and this only makes
sense with a substantial number of drives.
Applications: none in practice (it would only suit environments in which many
disk errors occur)
RAID level 3
Category: Parallel access
The difference between RAID3 and RAID2 is that RAID3 requires only a single
redundant disk, no matter how large the disk array.
RAID3 employs parallel access, with data distributed in small strips. Instead
of an error-correcting code, a simple parity bit is computed for the set of
individual bits in the same position on all of the data disks.
Redundancy:
In the event of a drive failure, the parity drive is accessed and data is
reconstructed from the remaining devices. Once the failed drive is replaced,
the missing data can be restored on the new drive and operation resumed.
In the event of a disk failure, all of the data are still available in what is
referred to as reduced mode.
Reduced mode reads: The missing data are regenerated on the fly using an
exclusive OR calculation.
Reduced mode writes: Consistency of the parity must be maintained for later
regeneration.
Return to full operation requires that the failed disk be replaced and the
entire contents of the failed disk be regenerated on the new disk.
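The reconstruction described above is just an exclusive-OR across corresponding bytes. A minimal sketch (function names invented, not any real controller API): with N data disks and one parity disk, the parity strip is the XOR of the data strips, and any single failed strip is the XOR of the survivors plus the parity.

```python
def parity_strip(data_strips):
    """Compute the parity strip as the byte-wise XOR of all data strips."""
    parity = bytearray(len(data_strips[0]))
    for strip in data_strips:
        for i, b in enumerate(strip):
            parity[i] ^= b
    return bytes(parity)

def regenerate(surviving_strips, parity):
    """Rebuild the strip of the failed disk on the fly (reduced mode):
    XOR the parity with every surviving data strip."""
    missing = bytearray(parity)
    for strip in surviving_strips:
        for i, b in enumerate(strip):
            missing[i] ^= b
    return bytes(missing)
```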
Performance:
19
http://wikistudent.ws/Unisa
Because data are striped in very small strips, RAID3 can achieve very high data
transfer rates. Any I/O request will involve the parallel transfer of data from
all of the disks. For large transfers, the performance improvement is very
noticeable. However, only one I/O request can be executed at a time, so in a
transaction-oriented environment performance suffers.
Applications: Large I/O request size applications, like imaging and CAD
RAID level 4
Category: Independent access
In an independent access array, each member disk operates independently, so
that separate I/O requests can be satisfied in parallel.
Independent access arrays are better suited for high I/O request rates than
high data transfer rates.
RAID 4-6: Striping is used (large strips)
RAID 4 has strip-for-strip parity written onto an extra device.
Disadvantage: If one sector is changed, it is necessary to read all the drives
to recalculate the parity.
Applications: None
RAID level 5
Category: Independent access
Eliminates the bottleneck of RAID 4 by distributing the parity bits uniformly
over all the drives.
Disadvantage: in the event of a drive crash, reconstructing the contents of the
failed drive is a complex process.
Applications: High request rate, read intensive, data lookup
RAID level 6
Category: Independent access
Two different parity calculations are carried out and stored in separate blocks
on different disks, making it possible to regenerate data even if two disks
fail.
Advantage: Extremely high data availability.
Disadvantage: Substantial write penalty, because each write affects two parity
blocks.
Applications: Those requiring extremely high availability
Optical memory
CD-ROM
The audio CD and CD-ROM share a similar technology (both are made the same
way). The main difference is that CD-ROM players are more rugged and have error
correction devices to ensure that data are properly transferred from disk to
computer. CDs can hold 60 minutes of music, and CD-ROMS can hold 650 Mbytes.
Appearance:
The disk is formed from a resin (like polycarbonate). Digital information is
imprinted as a series of microscopic pits on the surface of the polycarbonate.
(This is done firstly with a laser to create a master disk, which is used to
make a die to stamp out copies onto the polycarbonate). The pitted surface is
then coated with a highly reflective surface (aluminium / gold), which is
covered with clear acrylic to protect it against dust and scratches. Finally, a
label can be silk-screened onto the acrylic.
Layers, from bottom to top: polycarbonate → aluminium → acrylic → label.
Data retrieval:
Information is retrieved from a CD or CD-ROM by a laser in the player / drive
unit. The laser shines through the clear polycarbonate while the disk spins.
The intensity of the reflected light of the laser changes as it encounters pits
and lands: pits reflect a low intensity back to the source; lands reflect a
higher intensity. The change between pits and lands is detected by a photosensor
and converted into a digital signal. (Beginning / end of a pit = 1, no change in
elevation = 0).
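The transition rule can be sketched as a toy model of the photosensor's output (the 'pit' / 'land' sampling representation is invented for illustration):

```python
def decode_surface(elevations):
    """Convert a sequence of surface readings ('pit' / 'land') into bits:
    a transition between consecutive samples reads as 1, no change as 0."""
    bits = []
    for prev, cur in zip(elevations, elevations[1:]):
        bits.append(1 if prev != cur else 0)
    return bits
```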
Data organisation:
Information is organised on a single spiral track, spiralling outwards (Greater
capacity than concentric tracks!) Sectors near the outside are the same length
as those near the inside, so information is packed evenly. The information is
scanned at the same rate by rotating the disk at variable speed (Slower for
accesses near the outer edge and faster for the centre). The pits are read by
the laser at a constant linear velocity (CLV).
Data are organised as a sequence of blocks, with the following fields:
• Sync
Identifies the beginning of a block
Consists of a byte of all 0s, 10 bytes of all 1s, and a byte of all 0s
• Header
Contains the block address and the mode byte
Mode 0 = blank data field
Mode 1 = use of an error-correcting code and 2048 bytes of data
Mode 2 = no error-correcting code and 2336 bytes of data
• Data
User data
• Auxiliary
Additional user data in mode 2
Error-correcting code in mode 1
With the use of CLV, random access becomes more difficult. The head moves to
the general area, and then tries to find and access the specific sector.
Applications:
CD-ROM is appropriate for the distribution of large amounts of data to a large
number of users.
Not appropriate for individualised applications, because of the expense of the
initial writing process.
Advantages:
• The optical disk and info on it can be mass replicated inexpensively
(unlike a magnetic disk).
• The optical disk is removable, allowing the disk itself to be used for
archival storage. (Most magnetic disks are not removable).
Disadvantages:
• Read-only, so can’t be updated
• Access time is much longer than that of a magnetic disk drive
CD Recordable
The disk can be written to once with a moderately intensive laser beam.
Instead of pitting the surface to change reflectivity, there is a dye layer
which does the same thing, after being activated by a laser.
The disk can be read on a CD-R / CD-ROM drive.
The CD-R optical disk is good for archival storage of documents and files,
providing a permanent record of large volumes of data.
CD Rewritable
Phase change: A disk uses a material that has two different reflectivities in
two different phase states.
Amorphous state: The molecules exhibit a random orientation; reflects light
poorly
Crystalline state: Smooth surface that reflects light well.
A beam of laser light can change the material from one phase to another.
Disadvantage of phase change optical disks: The material eventually loses its
properties, so you can have only up to 1 000 000 erase cycles.
Advantage over CD-ROM & CD-R: Can be rewritten and used as true secondary
storage.
DVD
Can store 7 times the amount of data as CD-ROMs.
Come in writeable (DVD-R, DVD-RW: one-sided) as well as read-only (DVD-ROM:
one- or two-sided) versions.
Three differences from CDs:
• Bits are packed more closely on a DVD, and a laser with shorter
wavelength is used, increasing the capacity to 4.7 GB
• The DVD has a second layer of pits and lands on top of the first layer,
doubling the capacity to about 8.5 GB. (By adjusting focus, the lasers in
DVD drives can read each layer separately)
• The DVD-ROM can be two sided, bringing total capacity up to 17 GB
Magnetic tape
Tape systems use the same reading and recording techniques as disk systems. A
flexible polyester tape is coated with magnetisable material and is housed in a
cartridge.
Parallel recording:
Data are structured as a number of parallel tracks running lengthwise. Tapes
used to have 9 tracks (to store one byte at a time + a parity bit), but now use
18 (word) or 36 (double word) tracks.
Serial recording:
Data are laid out as a sequence of bits along each track, as is done with
magnetic disks. Most modern systems use this method.
As with the disk, data are read and written in contiguous blocks, called
physical records, on a tape. Blocks on the tape are separated by gaps called
inter-record gaps. As with the disk, the tape is formatted to assist in
locating physical records.
Serpentine recording (for serial tapes):
When data are being recorded, the first set of bits is recorded along the whole
length of the tape. When the end of the tape is reached, the heads are
repositioned to record a new track, and the tape is again recorded on its whole
length, this time in the other direction. The process continues until the tape
is full. To increase speed, the read-write head is capable of reading and
writing a number of adjacent tracks simultaneously. Although data are recorded
serially along individual tracks, blocks in sequence are stored on adjacent
tracks.
• The data transfer rate of some peripherals is much slower than that of
memory or processor. It is impractical to use the high-speed system bus
to communicate directly with the peripheral
• The data transfer rate of some peripherals is faster than that of the
memory or processor
• Peripherals often use different data formats and word lengths than the
computer to which they are attached
External devices
An external device attaches to the computer by a link to an I/O module. This
link is used to exchange control, status, and data between the I/O module and
the external device.
Three categories of external devices:
• Human readable (for communicating with the user. E.g. printers)
• Machine readable (for communicating with equipment. E.g. tape systems)
• Communication (for communicating with remote devices E.g. another PC)
An external device has an interface to the I/O module with:
• Control signals (that determine the function the device will perform)
• Data (in the form of a set of bits to be sent/received from the module)
• Status signals (to indicate the state of the device)
Control logic controls the device’s operation in response to the I/O module.
The transducer converts data from electrical to other forms of energy.
A buffer, associated with the transducer, temporarily holds data for transfer
Keyboard / monitor
Each character is associated with a code (7 or 8 bits). IRA (International
Reference Alphabet) is the most commonly used text code, with 128 different
characters.
Printable characters: alphabetic, numeric, and special characters
Control characters: control printing / displaying (e.g. carriage return) or
relate to communications procedures
Keyboard input:
1. When you press a key, an electronic signal is generated
2. The signal is interpreted by the transducer in the keyboard and
translated into the bit pattern of the corresponding IRA code
3. This bit pattern is then transmitted to the I/O module
Output:
1. IRA code characters are transmitted to an external device from the I/O
module
2. The transducer at the device interprets this code and sends the required
electronic signals to the output device (for e.g. display)
Disk drive
A disk drive contains electronics for exchanging data, control, and status
signals with an I/O module plus the electronics for controlling the disk
read/write mechanism.
Fixed-head disk:
The transducer can convert between the magnetic patterns on the moving disk
surface and bits in the device’s buffer.
Moving-head disk:
Must additionally be able to cause the disk arm to move radially in and out
across the disk’s surface.
I/O modules
Module function
The major functions for an I/O module fall into these categories:
1. Control and timing
Co-ordinates the flow of traffic between internal resources and external
devices.
Steps for transferring data from an external device to the processor:
1. The processor interrogates the I/O module to check the status of the
attached device
2. The I/O module returns the device status
3. If the device is ready, the processor requests the transfer of data
4. The I/O module obtains a unit of data from the external device
5. The data are transferred from the I/O module to the processor
2. Processor communication
Involves the following:
• Command decoding
The I/O module accepts commands from the processor, sent as signals on the
control bus.
• Data
Data are exchanged between processor and I/O module over the data bus.
• Status reporting
Because peripherals are so slow, it is important to know the status of the
I/O module, which can be reported with a status signal.
• Address recognition
An I/O module must recognise one unique address for each peripheral it
controls.
3. Device communication
This communication involves commands, status information, and data.
4. Data buffering
Data travelling from main memory to the I/O module:
Data coming from main memory are sent to an I/O module in a rapid burst,
because of the high transfer rate. The data are buffered in the I/O module and
then sent to the peripheral device at its data rate.
Data travelling from the device to the I/O module:
Data must be buffered so as not to tie up the memory in a slow transfer
operation.
The I/O module must be able to operate at both device and memory speeds.
5. Error detection
Classes of errors:
• Mechanical and electrical malfunctions reported by the device (e.g. paper
jam)
• Unintentional changes to the bit pattern as it is transmitted from device
to I/O module
Some form of error-detecting code is often used to detect transmission errors,
e.g. the use of a parity bit on each character of data.
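The parity-bit scheme mentioned above can be sketched as follows (even parity assumed; real links may use odd parity instead):

```python
def add_parity(bits, even=True):
    """Append a parity bit so the total number of 1s is even (or odd)."""
    ones = sum(bits)
    parity = ones % 2 if even else (ones + 1) % 2
    return bits + [parity]

def check_parity(word, even=True):
    """A single flipped bit changes the count of 1s, so the check fails."""
    expected = 0 if even else 1
    return sum(word) % 2 == expected
```

Note that parity detects any odd number of bit errors but cannot say which bit is wrong, and an even number of errors goes undetected.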
Module structure
Data registers buffer data transferred to and from the module.
A status register provides current status info, and may also function as a
control register, to accept detailed control information from the processor.
Control lines are used by the processor to issue commands to the module.
Address lines are for generating the module’s unique address.
The module contains I/O logic specific to the interface with each device that
it controls.
[Block diagram of an I/O module: on the system-bus side, data lines connect to
data registers and status/control registers, while address and control lines
connect to the I/O logic; on the device side, external device interface logic
exchanges data, status, and control signals with each attached device.]
I/O channel / I/O processor = an I/O module that takes on most of the
processing burden, presenting a high-level interface to the processor.
(Commonly seen on mainframes).
I/O controller / device controller = a primitive I/O module that requires
detailed control. (Commonly seen on microcomputers).
1. Programmed I/O
1. When the processor encounters a program instruction relating to I/O, it
issues a command to the appropriate I/O module
2. The I/O module performs the requested action and sets the appropriate
bits in the I/O status register
3. (The I/O module takes no further action to alert the processor)
4. The processor periodically checks the status of the I/O module until it
finds that the operation is complete
Disadvantage: Programmed I/O is a time-consuming process that keeps the
processor busy needlessly. As a result, the level of the performance of the
entire system is severely degraded.
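A toy model of the polling loop above (all class and attribute names are invented for illustration; `ticks` stand in for elapsed device time):

```python
class IOModule:
    """Simulated I/O module with a busy/ready status register."""
    def __init__(self, ticks_needed):
        self.busy_ticks = 0
        self.ticks_needed = ticks_needed

    def command(self, action):
        self.busy_ticks = self.ticks_needed   # start the I/O operation

    def status_ready(self):
        if self.busy_ticks > 0:
            self.busy_ticks -= 1              # device makes progress
        return self.busy_ticks == 0

def programmed_read(module):
    """Issue the command, then busy-wait: the processor does nothing
    useful while it repeatedly checks the status register."""
    module.command('read')
    polls = 0
    while not module.status_ready():
        polls += 1
    return polls
```

Every tick spent in the `while` loop is processor time wasted, which is exactly the disadvantage described above.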
2. Interrupt-driven I/O
1. The processor issues an I/O command to a module and then goes on to do
some other useful work
2. The I/O module will interrupt the processor to request service when it is
ready to exchange data with the processor
3. The processor then executes the data transfer, and resumes its former
processing
Advantage: More efficient than programmed I/O because it eliminates needless
waiting.
Disadvantage: Still consumes a lot of processor time, because every word of
data that moves between memory and I/O module must pass through the processor.
Interrupt processing
When an I/O device completes an I/O operation, the following sequence of events
occurs:
1. The device issues an interrupt signal to the processor
2. The processor finishes execution of the current instruction
3. The processor tests for an interrupt, and sends an acknowledgement signal
to the device that issued the interrupt
4. The processor prepares transferring control to the interrupt routine by
saving information needed to resume the current program at the point of
the interrupt. (The PSW and PC are pushed onto the control stack)
5. The processor now loads the new PC value based on the interrupt
6. The rest of the ‘state’ information of the interrupted program must be
saved (like the contents of the processor registers) on the stack
7. The interrupt handler then processes the interrupt
8. When interrupt processing is complete, the saved register values are
retrieved from the stack and restored to the registers
9. Finally the PSW and PC values from the stack are restored, so the next
instruction to be executed will be from the previously interrupted
program
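Steps 4-9 above can be walked through with a toy CPU model (the dictionary representation and handler are invented for illustration):

```python
def handle_interrupt(cpu, handler_pc, handler):
    """Simulate steps 4-9 of interrupt processing on a toy CPU state."""
    stack = cpu['stack']
    stack.append((cpu['psw'], cpu['pc']))   # step 4: push PSW and PC
    cpu['pc'] = handler_pc                  # step 5: load the handler's PC
    stack.append(dict(cpu['regs']))         # step 6: save register contents
    handler(cpu)                            # step 7: process the interrupt
    cpu['regs'] = stack.pop()               # step 8: restore registers
    cpu['psw'], cpu['pc'] = stack.pop()     # step 9: restore PSW and PC
```

After the handler returns, the CPU state is exactly as it was at the point of the interrupt, so the interrupted program resumes transparently.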
Design issues
Device identification
• Multiple interrupt lines
It is impractical to dedicate more than a few bus lines to interrupt lines,
so even if multiple lines are used, it is likely that each line will have
multiple I/O modules attached to it, so one of the other techniques must be
used on each line.
• Software poll
1. When the processor detects an interrupt, it branches to an interrupt-
service routine which polls each I/O module to determine which module
caused the interrupt
2. Once the correct module is identified, the processor branches to a
device-service routine specific to that device.
Disadvantage: time-consuming.
• Daisy chain (hardware poll, vectored interrupt)
All I/O modules share a common interrupt request line. The interrupt
acknowledge line is daisy chained through the modules.
1. When the processor senses an interrupt, it sends out an interrupt
acknowledge
2. This signal passes through a series of I/O modules until it gets to
a requesting module
3. The requesting module responds by placing a word on the data lines
4. (The word (vector) is a unique id, like the I/O module’s address)
5. The processor uses the vector as a pointer to the appropriate
device-service routine
Advantages: more efficient than software polls, and avoids the need to execute
a general interrupt-service routine first.
• Bus arbitration (vectored)
An I/O module must first gain control of the bus before it can raise the
interrupt request line. Thus, only one module can raise the line at a time.
1. When the processor detects the interrupt, it responds on the interrupt
acknowledge line
2. The requesting module then places its vector on the data lines
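The software-poll technique above can be sketched as follows (the module and routine representations are invented): the processor asks each module in turn whether it raised the interrupt, and the polling order fixes the priority.

```python
def software_poll(modules, service_routines):
    """Poll each I/O module in order (highest priority first); branch to
    the device-service routine of the first interrupting module found."""
    for name, module in modules:
        if module.get('interrupting'):
            return service_routines[name](module)
    return None                 # spurious interrupt: no module raised it
```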
Order of processing
The above techniques also provide a way of assigning priorities when more than
one device is requesting interrupt service.
With multiple lines, the processor just picks the interrupt line with the
highest priority.
With software polling, the order in which the modules are polled determines
their priority.
Bus arbitration can also employ a priority scheme.
DMA function
A DMA module is added to the system bus, which is capable of mimicking the
processor and taking over control of the system from the processor. It needs to
do this to transfer data to and from memory over the system bus. The DMA module
must either use the bus only when the processor doesn’t need it, or force the
processor to suspend operation temporarily (= cycle stealing).
1. Selector channel
Controls multiple high-speed devices and, at any one time, is dedicated to the
transfer of data with one of those devices. Thus, the I/O channel selects one
device and effects the data transfer. Each device is handled by a controller
(I/O module). Thus, the I/O channel serves in place of the CPU in controlling
these I/O modules (controllers).
2. Multiplexor channel
Can handle I/O with multiple devices at the same time. For low-speed devices, a
byte multiplexor accepts / transmits characters as fast as possible to multiple
devices. For high-speed devices, a block multiplexor interleaves blocks of data
from several devices.
FireWire is finding favour not only for computer systems, but also in digital
cameras, VCRs etc. where it is used to transport images.
One of FireWire’s strengths is that it uses serial transmission rather than
parallel. (Parallel interfaces, like SCSI, require more wires so it’s more
expensive and synchronisation between wires can be a problem).
Computers are getting physically smaller. Handheld computers have little room
for connectors, yet need high data rates to handle images and video.
FireWire configurations
FireWire uses a daisy-chain configuration, with up to 63 devices connected off
a single port. Up to 1022 FireWire buses can be interconnected using bridges,
enabling a system to support as many peripherals as required.
FireWire provides for hot plugging (You can connect peripherals without
switching the computer off).
FireWire provides for automatic configuration (You don’t need to manually set
device IDs).
InfiniBand
InfiniBand is a recent I/O specification aimed at the high-end server market.
The main purpose of InfiniBand is to improve data flow between processors and
intelligent I/O devices.
InfiniBand is intended to:
• Replace the PCI bus in servers
• Provide greater capacity
• Provide increased expandability
• Provide enhanced flexibility in server design
In essence, InfiniBand enables servers, remote storage, and other network
devices to be attached in a central fabric of switches and links. The switch-
based architecture can connect up to 64 000 servers, storage systems, and
networking devices.
Infiniband architecture
PCI is a limited architecture compared to InfiniBand. With InfiniBand, you
don’t have to have the basic I/O interface hardware inside the server chassis -
remote storage, networking, and connections between servers are accomplished by
attaching all devices to a central fabric of switches and links. Removing I/O
from the server chassis allows greater server density and allows for a more
flexible and scalable data centre, as independent nodes may be added as needed.
Infiniband operation
Each physical link between a switch and an attached interface can support up to
16 logical channels, called virtual lanes. One lane is reserved for fabric
management and the other lanes for data transport. Data are sent in the form of
a stream of packets, with each packet containing a portion of the total data to
be transferred, plus addressing and control information. A set of
communications protocols is used to manage the transfer of data. A virtual lane
is temporarily dedicated to the transfer of data from one end node to another
over the InfiniBand fabric. The InfiniBand switch maps traffic from an incoming
lane to an outgoing lane to route the data between the desired end points.
Integer representation
Sign-Magnitude representation
The left-most bit represents the sign (1 = -, 0 = +).
Disadvantages:
• Addition and subtraction require a consideration of both the signs of the
numbers and their relative magnitudes
• There are two representations of 0
When changing a number to a greater bit length, just move the sign-bit to the
new leftmost position and fill in with zeros.
E.g. 1010 → 10000010
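This widening rule can be sketched with the number held as a bit string:

```python
def extend_sign_magnitude(bits, new_len):
    """Widen a sign-magnitude number: move the sign bit to the new
    leftmost position and pad the magnitude with zeros."""
    sign, magnitude = bits[0], bits[1:]
    return sign + '0' * (new_len - len(bits)) + magnitude
```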
Fixed-point representation
The radix point (period) is fixed and assumed to be to the right of the
rightmost digit. You can use the same number representation for binary
fractions by scaling the numbers so that the binary point is implicitly
positioned at another location.
Integer arithmetic
Negation
For sign-magnitude representation, just invert the sign bit.
The reason why you have to add 1 when you negate a number is because the
numbers aren't centred around zero.
Note that the numbers range from -2^(n-1) to 2^(n-1) - 1, where n is the number
of bits in use (3 in the above example). In total, there are 2^n different
numbers.
Addition
In twos complement representation, just add the two numbers and ignore the
carry bit (if any).
Overflow = when the result is larger than can be held in the word size. You
know there is an overflow if you add two numbers with the same sign and the
result has the opposite sign.
Subtraction
In two’s complement representation, just take the twos complement of the
subtrahend, and add both numbers together.
Floating-point representation
Floating-point numbers are expressed as a significand (fractional number)
multiplied by the base B (i.e. 2) to the power E. This B^E is what moves the
radix point right (if E is positive) or left (if E is negative).
General format:
Floating-point numbers must be normalised, which just means putting the number
in a different format. This is done by shifting the radix point and adjusting
the exponent accordingly. (This varies with the different standards).
Note: The more bits allocated to the exponent, the larger the range of
expressible numbers (but the number of different values isn’t increased!). The
more bits allocated to the significand, the greater the precision.
Examples:
110.1011 = 1.101011 * 2^2 (The binary point was shifted left 2 places)
Remember to exclude the implicit 1 to the left of the binary point when working
out the fraction f!
IEEE single-precision format:
Step 5. Use the sign, exponent (E plus a bias of 127), and fraction to
represent the number.
E.g. -110.1011 = -1.101011 * 2^2 gives sign = 1, exponent = 2 + 127 = 129 =
10000001, and fraction = 1010110 followed by zeros:
1 10000001 10101100000000000000000
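The bit pattern above can be checked with Python's standard struct module (assuming the encoded value is -6.6875, i.e. -110.1011 in binary):

```python
import struct

def ieee754_bits(x):
    """Return the 32 bits of x in IEEE 754 single precision."""
    (packed,) = struct.unpack('>I', struct.pack('>f', x))
    return format(packed, '032b')

bits = ieee754_bits(-6.6875)      # -110.1011b = -1.101011b * 2^2
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]
```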
Instruction representation
Each instruction is represented by a sequence of bits, and is divided into
fields, e.g:
Opcode (4 bits, e.g. MOV) | Operand reference (6 bits, e.g. BX) | Operand
reference (6 bits, e.g. AX)
The instruction is read into an instruction register (IR) in the CPU. The CPU
extracts the data from the various instruction fields to perform the required
operation.
Instruction types
• Data processing (Arithmetic and logic instructions)
• Data storage (Memory instructions)
• Data movement (I/O instructions)
• Control (Test and branch instructions)
Number of addresses
Most instructions have one or two operand addresses, with the address of the
next instruction being implicit (obtained from the program counter).
With one-address instructions, the second address is implicit (the
accumulator). Zero-address instructions reference the stack.
With one-address instructions, you only have one general-purpose register (the
accumulator), but with multiple-address instructions, you have multiple
general-purpose registers. This allows some operations to be performed solely
on registers. Because register references are faster than memory references,
this speeds up execution.
Instruction set design
Fundamental design issues:
• Operation repertoire (How many and which operations to provide)
• Data types (Different types of data upon which operations are performed)
• Instruction format (Length, number of addresses, size of fields, etc.)
• Registers (Number of CPU registers that can be referenced)
• Addressing (The mode(s) by which the address of an operand is specified)
Types of operands
Machine instructions operate on data. The most important general categories of
data are:
1. Addresses
Addresses can be considered as unsigned integers, and can be used in
calculations, e.g. to determine a main memory address.
2. Numbers
Three types of numerical data are common:
• Integer / fixed point
• Floating point
• Decimal
Human users deal with decimal numbers, so there is a need to convert from
decimal to binary & vice versa. If there is a lot of I/O, and not much
computation, it is preferable to store and operate on the numbers in decimal
form. Packed decimal representation stores each digit as a 4-bit binary number,
and the 4-bit codes are strung together (1111 on the left = negative).
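A sketch of packed decimal as described (the leading-1111 sign nibble follows these notes' convention; real implementations place and encode the sign differently):

```python
def pack_decimal(n):
    """Store each decimal digit as a 4-bit binary code, strung together;
    a leading 1111 nibble marks a negative number."""
    nibbles = ['1111'] if n < 0 else []
    nibbles += [format(int(d), '04b') for d in str(abs(n))]
    return ''.join(nibbles)
```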
3. Characters
The most commonly-used character code is ASCII, which represents 128 different
characters, in 7 bits each.
EBCDIC is used in IBM S/390 machines, and is an 8-bit code.
4. Logical data
It can be useful to see data as a string of bits, with the values 0 and 1.
Advantages of the bit-oriented view:
• You can store an array of Boolean data items (true & false)
• You can manipulate the bits of a data item (like shift the bits)
Note: The same data can be treated as sometimes logical and other times as
numerical or text. The ‘type’ of a unit of data is determined by the operation
being performed on it.
Data types
The Pentium can deal with data types of 8 (byte), 16 (word), 32 (doubleword),
and 64 (quadword) bits in length. To allow maximum flexibility in data
structures and efficient memory utilisation, words need not be aligned at
even-numbered addresses.
The Pentium uses the little-endian style: the least significant byte is stored
in the lowest address.
Types of operations
The same general types of operations are found on all machines:
1. Data transfer
• The location of the source and destination operands must be specified.
Each location could be memory, a register, or the top of the stack.
• The length of data to be transferred must also be indicated.
• The mode of addressing for each operand must also be specified
If both source and destination are registers, then the CPU simply causes data
to be transferred from one register to another; this is an operation internal
to the CPU.
If one or both operands are in memory, then the CPU must perform some or all of
the following actions:
1. Calculate the memory address, based on the address mode
2. If the address refers to virtual memory, translate from virtual to actual
memory address
3. Determine whether the addressed item is in cache
4. If not, issue a command to the memory module
2. Arithmetic
Most machines provide the basic arithmetic operations of add, subtract,
multiply, and divide. Other possible operations include: Absolute, negate,
increment, and decrement. The ALU portion of the CPU performs the desired
operation.
3. Logical
NOT, AND, OR, and XOR are the most common logical functions.
Bit shifting:
• With a logical shift, the bits of a word are shifted left or right. On
one end, the bit shifted out is lost.
• With an arithmetic shift, the data is treated as a signed integer so the
sign bit isn’t shifted. On a right shift, the sign bit is replicated, and
on a left shift, the shift is performed on all bits but the sign bit.
• With a rotate, all the bits being operated on are preserved because the
shift is cyclic.
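The three shift types above can be sketched on 8-bit values (single-position right shifts only, for illustration):

```python
def logical_shift_right(x, n=8):
    """Shift right; the bit shifted out is lost, a 0 enters on the left."""
    return (x >> 1) & ((1 << n) - 1)

def arithmetic_shift_right(x, n=8):
    """Shift right with the sign bit replicated (signed division by 2)."""
    sign = x & (1 << (n - 1))
    return sign | (x >> 1)

def rotate_right(x, n=8):
    """Cyclic shift: the bit leaving on the right re-enters on the left."""
    low = x & 1
    return (x >> 1) | (low << (n - 1))
```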
4. Conversion
Conversion instructions are those that change the format or operate on the
format of data. (E.g. converting from decimal to binary or converting one 8-bit
code to another).
5. I/O
Variety of approaches: isolated programmed I/O, memory-mapped programmed I/O,
DMA, and the use of an I/O processor.
6. System control
These instructions can be executed only while the processor is in a certain
privileged state or is executing a program in a special privileged area of
memory. Typically, these instructions are reserved for the use of the OS. (E.g.
a system control instruction may read / alter a control register).
7. Transfer of control
These instructions change the sequence of instruction execution. The CPU must
update the program counter to contain the address of some instruction in
memory.
Some reasons why transfer-of-control instructions are required:
• Some instructions need to be executed multiple times
• Virtually all programs involve some decision making
• Programming is made simpler if you can break up a task into procedures
The most common transfer-of-control operations:
Branch instructions
Conditional or unconditional branches (jne / jmp) can be used to create a
repeating loop of instructions.
Skip instructions
The skip instruction includes an implied address (Typically the next address is
skipped).
Procedure call instructions
The two main reasons for using procedures are economy and modularity.
Two basic instructions are involved: a call instruction that branches from the
present location to the procedure, and a return instruction that returns from
the procedure to the place from which it was called.
There are three common places for storing the return address:
• Register
• Start of called procedure
• Top of stack
Parameters can be passed in registers, or be stored in memory just after the
CALL instruction. The best way to pass parameters is by using the stack.
Operation types
Call/return instructions
The Pentium provides four instructions to support procedure call/return:
CALL, ENTER, LEAVE, RETURN.
Memory management
A set of specialised instructions deals with memory segmentation. These are
privileged instructions that can only be executed from the OS.
Condition codes
Condition codes are bits in special registers that may be set by certain
operations and used in conditional branch instructions. These conditions are
set by arithmetic and compare operations.
Assembly language
Programs written in assembly language are translated into machine language by
an assembler.
‘db’ and ‘dw’ are assembler directives that don’t form part of the program code
that will be executed, but just tell the assembler where and how to reserve
memory and how it should be initialised.
Stacks
Stack pointer: Contains the address of the top of the stack
Stack base: Contains the address of the bottom of the stack
Stack limit: Contains the address of the end of the reserved block
Expression evaluation
Postfix / reverse Polish notation:
a + b = ab+
(a + b) * c = ab+c*
a + (b * c) = abc*+
How it works: When reading a postfix expression from left to right, as soon as
you have two variables, followed by an operator, do the calculation and replace
those three items with the result, then proceed.
(You can do the same with the stack by pushing variables and popping the top
two when you get an operator and then pushing the result back on).
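The push/pop procedure just described can be sketched as a minimal postfix
evaluator in Python (an illustrative sketch, not part of the original notes):

```python
def eval_postfix(tokens):
    """Evaluate a postfix (reverse Polish) expression using a stack."""
    ops = {'+': lambda x, y: x + y,
           '-': lambda x, y: x - y,
           '*': lambda x, y: x * y}
    stack = []
    for tok in tokens:
        if tok in ops:
            # Operator: pop the top two operands, apply it,
            # and push the result back onto the stack.
            y = stack.pop()
            x = stack.pop()
            stack.append(ops[tok](x, y))
        else:
            stack.append(int(tok))   # operand: push it
    return stack.pop()

# a + (b * c) = abc*+  with a=2, b=3, c=4  ->  2 + (3 * 4) = 14
print(eval_postfix(['2', '3', '4', '*', '+']))   # 14
```

Note that the two pops are ordered so that non-commutative operators like `-`
receive their operands the right way round.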
Byte ordering
E.g. the hex value 12345678 stored at addresses 100-103:
Address              100  101  102  103
Big-endian value     12   34   56   78
Little-endian value  78   56   34   12
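The two byte orders can be checked with Python's `struct` module (a quick
sketch; `>` requests big-endian layout and `<` little-endian):

```python
import struct

value = 0x12345678

# Lay the 32-bit value out in memory in both byte orders.
big    = struct.pack('>I', value)   # big-endian: most significant byte first
little = struct.pack('<I', value)   # little-endian: least significant byte first

print(big.hex())      # 12345678
print(little.hex())   # 78563412
```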
Direct addressing
The address field contains the effective address of the operand.
E.g. mov ax,[102h], or mov [110h],bx
Not common on contemporary architectures.
Advantage: Requires only one memory reference and no special calculation.
Disadvantage: Provides only a limited address space.
Indirect addressing
The address field refers to the address of a word in memory, which contains a
full-length address of the operand.
Advantage: For a word length of N, an address space of 2^N is now available.
Disadvantage: Instruction execution requires two memory references to fetch the
operand: one to get its address and a second to get its value.
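The one-reference versus two-reference difference can be sketched with a list
standing in for memory (the addresses and values below are made up):

```python
# Model memory as a list of words; the "address field" of an
# instruction is just an index into this memory.
memory = [0] * 16
memory[5] = 12      # location 5 holds a full-length address (12)
memory[12] = 99     # location 12 holds the operand itself

def direct(addr):
    # One memory reference: the address field holds the operand's address.
    return memory[addr]

def indirect(addr):
    # Two memory references: fetch the operand's address, then the operand.
    return memory[memory[addr]]

print(direct(12))    # 99
print(indirect(5))   # 99
```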
Register addressing
Similar to direct addressing, but the address field refers to a register rather
than a main memory address.
E.g. mov ax,bx
Advantage: Only a small address field is needed in the instruction.
Advantage: No memory references are required.
Disadvantage: The address space is very limited.
Indexed addressing
You can use SI to index array elements.
E.g. mov si,array
add si,3 ; now si indexes the fourth element
mov al,[si] ; (extract the contents pointed to by si)
Base-indexed addressing
This can also be used to index array elements, but here BP or BX are used as
base registers. The base register is normally set to the start of the array and
the index is used as an offset into the array.
E.g. add al,[bx+si] or mov dx,[m_addr+si]
Displacement addressing
Combines the capabilities of direct addressing and register indirect
addressing. The instruction must have two address fields, at least one of which
is explicit. The value contained in one address field is used directly. The
other address field refers to a register whose contents are added to the value
to produce the effective address.
Three common uses of displacement addressing:
Relative addressing
The implicitly referenced register is the program counter (PC). I.e. The
current instruction address is added to the address field to produce the
effective address.
Base-register addressing
The referenced register contains a memory address, and the address field
contains a displacement from that address. (The register reference may be
explicit or implicit).
Indexing
The address field references a main memory address, and the referenced register
contains a positive displacement from that address. (This is the opposite of
base-register addressing).
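All three uses share one calculation: the effective address is the sum of a
register's contents and the instruction's address field. A minimal sketch
(register values chosen for illustration):

```python
def effective_address(register_value, displacement):
    # Displacement addressing: the contents of a register are added
    # to the value in the instruction's address field.
    return register_value + displacement

# Relative addressing: the implicit register is the program counter.
pc = 0x0100
print(hex(effective_address(pc, 0x20)))    # 0x120

# Base-register addressing: the register holds a base memory address
# and the address field holds the displacement from it.
bx = 0x0200
print(hex(effective_address(bx, 0x04)))    # 0x204
```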
Stack addressing
A pointer is associated with the stack whose value is the address of the top of
the stack. If the top two elements of the stack are in CPU registers, the stack
pointer references the third element. The stack pointer is maintained in a
register, so references to stack locations in memory are in fact register
indirect addresses. The stack mode of addressing is a form of implied
addressing. The machine instructions need not include a memory reference but
implicitly operate on the top of the stack.
E.g. pop ax
Note: The stack ‘grows backwards’ in memory, so after a push, SP is
decremented, and after a pop, SP is incremented. (E.g. if SP = FFFC, after PUSH
AX, SP = FFFA)
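The backwards growth can be modelled in Python (an illustrative toy, with a
dict standing in for stack-segment memory):

```python
class Stack8086:
    """Toy model of the 8086 stack: it grows downwards in memory."""
    def __init__(self, sp=0xFFFC):
        self.memory = {}
        self.sp = sp

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF   # SP is decremented, then the word stored
        self.memory[self.sp] = word

    def pop(self):
        word = self.memory[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF   # SP is incremented after the load
        return word

s = Stack8086()
s.push(0x1234)          # PUSH AX with AX = 1234h
print(hex(s.sp))        # 0xfffa
print(hex(s.pop()))     # 0x1234
print(hex(s.sp))        # 0xfffc
```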
Immediate mode
The operand (byte, word, or doubleword of data) is included in the instruction.
Displacement mode
The operand’s offset is contained as part of the instruction. With
segmentation, all addresses in instructions refer merely to an offset in a
segment.
Relative addressing
A displacement is added to the value of the program counter, which points to
the next instruction. The displacement is treated as a signed byte, word, or
doubleword value, and that value either increases or decreases the address in
the program counter.
Instruction formats
An instruction format must include an opcode and, implicitly or explicitly,
zero or more operands. The format must, implicitly or explicitly, indicate the
addressing mode for each operand.
Instruction length
1. There is a trade-off between the desire for a powerful instruction
repertoire and a need to save space:
The more opcodes, operands, addressing modes, and greater address range, the
easier it is for the programmer because shorter programs can be written with
more flexibility. However, all these things lead to longer instruction lengths,
which can be wasteful.
2. The instruction length should be equal to the memory-transfer length (bus
length) or one should be a multiple of the other.
3. The instruction length should be a multiple of the character length, which
is usually 8 bits, and the length of fixed-point numbers.
Allocation of bits
There is a trade-off between the number of opcodes and the power of the
addressing capability:
The more opcodes, the more bits in the opcode field.
For an instruction format of a given length, this reduces the number of bits
available for addressing. Using variable-length opcodes means there is a
minimum opcode length, but for some opcodes, additional operations may be
specified by using additional bits in the instruction. (For a fixed-length
instruction, this leaves fewer bits for addressing).
• Register versus memory
With a single user-visible register (the accumulator), one operand
address is implicit and consumes no instruction bits.
Even with multiple registers, only a few bits are needed to specify the
register. (The more that registers can be used for operand references,
the fewer bits that are needed).
• Number of register sets
Most machines have one set of general-purpose registers, with 32 or more
registers in the set. The Pentium has several specialised sets.
Advantage: For certain registers, a functional split requires fewer bits
to be used in the instruction.
• Address range
For addresses that reference memory, the range of addresses that can be
referenced is related to the number of address bits. Because this imposes
a severe limitation, direct addressing is rarely used.
• Address granularity
For addresses that reference memory rather than registers, another factor
is the granularity of addressing. An address can reference a word or a
byte (with byte-addressing requiring more address bits).
Variable-length instructions
If the designer provides a variety of instruction formats of different lengths,
it makes it easier to provide a large repertoire of opcodes, with different
opcode lengths. Addressing can also be made more flexible, with various
combinations of register and memory references plus addressing modes.
Disadvantage: Increased CPU complexity.
Because the CPU doesn’t know the length of the next instruction to be fetched,
it fetches a number of bytes / words equal to at least the longest possible
instruction. This means that sometimes multiple instructions are fetched.
Register organisation
User-visible registers
Some design issues:
• Should you use completely general-purpose registers, or specialise their
use?
Advantage: Specialised registers can have implicit opcodes, saving bits.
Disadvantage: Specialisation limits the programmer’s flexibility.
• How many registers (general purpose or data + address) should be
provided?
Disadvantage: Fewer registers results in more memory references.
Disadvantage: More registers require more operand specifier bits.
• How long should the registers be?
Registers that hold addresses must be long enough to hold the largest
address.
Registers that hold data must be able to hold values of most data types.
General purpose registers
These can be assigned to a variety of functions by the programmer.
AX Primary accumulator
Mainly used for operations involving data movement, I/O and arithmetic
MUL assumes that AX contains the multiplicand
DIV assumes that AX contains the dividend
BX Base register
This is the only general-purpose register that can be used as a pointer
Also used for arithmetic
CX Count register
Used to control the number of times loops are to be executed or the number
of shifts to perform
Also used for arithmetic
DX Data register
Some I/O operations like IN and OUT use the DX register
Multiply & divide operations involving 16-bit registers use DX:AX
Data registers
These may be used only to hold data and can’t be employed in the calculation of
an operand address.
Address registers
These may be general-purpose, or may be devoted to a particular addressing
mode, e.g:
* Segment pointers
A segment register holds the address of the base of the segment.
Each segment in memory is up to 64K bytes long. A segment begins on a paragraph
boundary that is a multiple of 16 (i.e. 10h). Since the start address of a
segment always ends with 0 in hex, it is unnecessary to store the last digit.
(E.g. The address of a segment starting at 18A30h is stored as 18A3h and can be
written as 18A3[0]h).
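The dropped-digit trick, and the segment:offset calculation it enables, can be
sketched in Python (an illustrative model of the 8086's 20-bit addressing):

```python
def stored_segment(full_address):
    # The trailing 0 of a paragraph-aligned address need not be stored.
    return full_address >> 4

def segment_base(stored):
    # Reconstruct the 20-bit base address by appending the 0 again.
    return stored << 4

def physical_address(segment, offset):
    # 20-bit effective address = 16 * segment value + offset.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(stored_segment(0x18A30)))            # 0x18a3
print(hex(segment_base(0x18A3)))               # 0x18a30
print(hex(physical_address(0x18A3, 0x0100)))   # 0x18b30
```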
A program running under DOS is divided into three primary segments:
• Code segment (CS)
Contains the machine instructions of the program.
All references to memory locations that contain instructions are relative
to the start of a segment specified by the CS register.
segment:offset references a byte of memory (the offset ranges from 0 to
FFFFh, i.e. 64K - 1).
The IP register contains the offset (relative to the start of the
segment) of the next instruction to be executed, so CS:IP forms the
actual 20-bit address of the next instruction (= effective address).
E.g. BCEF:0123 (CS contains BCEFh and IP contains 123h)
Add the segment address BCEF0 and the offset 0123 to get the actual
(effective) address BD013.
The IP register can’t be referenced directly by a programmer, but it can
be changed indirectly with JMP instructions.
• Data segment (DS)
Contains the variables, constants and work areas of a program.
With ‘MOV AL,[120h]’, the instruction at location 120h relative to the
contents of the DS register is fetched.
To work out the actual address from where the byte of data will be moved,
do the same calculation as above (i.e. append 0 to the data segment
address and add it to the given address).
• Stack segment (SS)
Contains the program stack, which is used to save data and addresses that
need to be temporarily stored.
* Index registers
These are used for indexed addressing and may be autoindexed.
Index registers contain the offset, relative to the start of the segment, for
variables.
SI (source index) usually contains an offset value from the DS register, but it
can address any variable.
DI (destination index) usually contains an offset from the ES register but can
address any variable.
SI and DI registers are available for extended addressing and for use in
addition and subtraction. They are required for some string operations.
* Stack pointer
If there is user-visible stack addressing, then the stack is in memory and
there is a dedicated register that points to the top of the stack. This allows
implicit addressing (i.e. push and pop don’t need to contain an explicit stack
operand).
The stack is located in the stack segment (SS) and the stack pointer (SP)
register holds the address of the last element that was pushed on. (SP contains
the offset from the beginning of the stack to the top of the stack).
SS:SP contains the address of the top-of-the-stack.
The BP (base pointer) register contains an offset from the SS register, and
facilitates the referencing of parameters (data & addresses passed via the
stack). Normally the only word in the stack that is accessed is the one on top.
However, the BP register can also keep an offset in the SS and be used in
procedure calls, especially when parameters are passed to subroutines. SS:BP
contains the address of the current word being processed in the stack.
Condition codes (flags)
Condition codes are bits set by the CPU as a result of operations.
The code may be tested as part of a conditional branch operation, but cannot be
altered by the programmer.
CF: Carry Flag (Debug: CY = CarrY / NC = No Carry)
Contains 'carries' from the high-order bit following arithmetic operations and
some shift & rotate operations.
PF: Parity Flag (Debug: PE = Parity Even / PO = Parity Odd)
Checks the low-order eight bits of data operations. Odd = 0, Even = 1.
AF: Auxiliary Flag (Debug: AC = Auxiliary Carry / NA = No Auxiliary carry)
Set to 1 if arithmetic causes a carry out of the low-order four bits.
ZF: Zero Flag (Debug: NZ = Not Zero / ZR = ZeRo)
Set as a result of arithmetic / compare operations. 0 = non-zero result,
1 = zero result. JE and JZ test this flag.
SF: Sign Flag (Debug: PL = PLus / NG = NeGative)
Set according to the sign after an arithmetic operation (0 = positive,
1 = negative). JG and JL test this flag.
TF: Trap Flag (not shown in Debug)
Debug sets the trap flag to 1 so that you can step through execution one
instruction at a time. (Use the 't' command).
IF: Interrupt Flag (Debug: EI = Enable Interrupts / DI = Disable Interrupts)
Indicates whether interrupts are disabled. 0 = disabled, 1 = enabled.
DF: Direction Flag (Debug: UP = UP (right) / DN = DowN (left))
Used by string operations to determine the direction of data transfer.
0: left-to-right data transfer
1: right-to-left data transfer
OF: Overflow Flag (Debug: NV = No oVerflow / OV = OVerflow)
Indicates a carry into and out of the high-order sign bit following a signed
arithmetic operation.
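How an 8-bit add would set CF, ZF and SF can be sketched in Python (a partial,
illustrative model; it deliberately ignores AF, PF and OF):

```python
def add8_flags(a, b):
    """Perform an 8-bit add and return (result, CF, ZF, SF)."""
    total = a + b
    result = total & 0xFF
    cf = 1 if total > 0xFF else 0   # carry out of the high-order bit
    zf = 1 if result == 0 else 0    # result was zero
    sf = (result >> 7) & 1          # sign bit of the result
    return result, cf, zf, sf

print(add8_flags(0xFF, 0x01))   # (0, 1, 1, 0): carry out, zero result
print(add8_flags(0x7F, 0x01))   # (128, 0, 0, 1): result looks negative
```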
Program counter (PC) - contains the address of an instruction to be fetched
Instruction register (IR) - contains the instruction most recently fetched
Memory address register (MAR) - contains the address of a location in memory
Memory buffer register (MBR) - contains a word of data to be written to memory
or the word most recently read
The CPU updates the PC after each instruction fetch so the PC always points to
the next instruction to be executed. The fetched instruction is loaded into an
IR, where the opcode and operand specifiers are analysed. Data are exchanged
with memory using the MAR and MBR. The MAR connects directly to the address
bus, and the MBR connects directly to the data bus. User-visible registers
exchange data with the MBR.
Within the CPU, data must be presented to the ALU for processing. (The ALU may
have direct access to the MBR and user-visible registers, or there may be
additional buffering registers at the boundary to the ALU).
Program status word (PSW) = a set of registers that contain status information
Common flags include the ones mentioned in the table above.
Instruction cycle
The main subcycles of the instruction cycle are: Fetch, Execute, Interrupt.
Data flow
Fetch cycle:
1. The PC contains the address of the next instruction to be fetched
2. This address is moved to the MAR and placed on the address bus
3. The control unit requests a memory read and the result is placed on the
data bus and copied into the MBR and then moved to the IR
4. The PC is incremented by 1, preparing for the next fetch
Indirect cycle:
1. The control unit examines the contents of the IR to determine if it
contains an operand specifier using indirect addressing
2. If it does, the right-most n bits of the MBR (which contain the address
reference) are transferred to the MAR
3. The control unit requests a memory read, to get the desired address of
the operand into the MBR
Execute cycle:
• This cycle takes many forms, depending on which instruction is in the IR
Interrupt cycle:
1. The current contents of the PC must be saved (transferred to the MBR) so
that the CPU can resume normal activity after the interrupt
2. The PC is loaded with the address of the interrupt routine so the next
instruction cycle will begin by fetching the appropriate instruction
Instruction pipelining
Instruction prefetch / fetch overlap
A two-stage pipeline:
1. The first stage fetches an instruction and buffers it
2. When the second stage is free, the first stage passes it the buffered
instruction
3. While the second stage is executing the instruction, the first stage
fetches and buffers the next instruction
Some problems:
• The execution time will generally be longer than the fetch time, so the
fetch stage may have to wait before it can empty its buffer
• A conditional branch instruction makes the address of the next
instruction to be fetched unknown (but the next instruction can be
fetched anyway, just in case)
A multi-stage pipeline:
Instruction processing can be decomposed into these stages:
• Fetch Instruction (FI)
• Decode Instruction (DI)
• Calculate Operands (CO)
• Fetch Operands (FO)
• Execute Instruction (EI)
• Write Operand (WO)
With this decomposition, the various stages will be of more or less equal
duration. However, not all instructions will go through all stages.
It may seem that the greater the number of stages in the pipeline, the faster
the execution rate, but there are factors that slow things down:
• At each stage of the pipeline, there is some overhead involved in moving
data from buffer to buffer
• The amount of control logic required to handle memory and register
dependencies and to optimise the use of the pipeline increases enormously
with the number of pipeline stages
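Ignoring those overheads, the standard timing result is that n instructions in
a k-stage pipeline need k + (n - 1) cycles, versus n * k cycles without
pipelining. A quick sketch of the arithmetic:

```python
def pipelined_cycles(n, k):
    # The first instruction takes k cycles to fill the pipeline;
    # each of the remaining n-1 instructions completes one cycle later.
    return k + (n - 1)

def unpipelined_cycles(n, k):
    return n * k

n, k = 100, 6   # 100 instructions, six stages (FI DI CO FO EI WO)
print(pipelined_cycles(n, k))     # 105
print(unpipelined_cycles(n, k))   # 600
```

For large n the speedup approaches k, which is why more stages look attractive
until the overheads above start to dominate.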
The Pentium register organisation:
Register type         Number  Purpose
Integer unit:
 General              8       General-purpose user registers, like AX, BX
 Segment              6       Contain segment selectors, like CS, SS
 Flags                1       Status and control bits, like CF (Carry Flag)
 Instruction pointer  1       Instruction pointer, IP
Floating-point unit:
 Numeric              8       Hold floating-point numbers
 Control              1       Control bits
 Status               1       Status bits
 Tag word             1       Specifies contents of numeric registers
 Instruction pointer  1       Points to instruction interrupted by exception
 Data pointer         1       Points to operand interrupted by exception
Interrupt processing
Interrupts
Generated by a signal from hardware. May occur at random times during the
execution of a program, e.g. if you press a key on the keyboard.
1. Maskable interrupts: The processor doesn't recognise a maskable interrupt
unless the interrupt enable flag is set.
2. Non-maskable interrupts: Recognition of such interrupts can't be prevented.
Exceptions
Generated from software. Provoked by the execution of an instruction,
e.g. INT 21h.
1. Processor-detected exceptions: Result when the processor encounters an
error while attempting to execute an instruction.
2. Programmed exceptions: Instructions that generate an exception.
For interrupt type n, the instruction offset is stored in the word at address
4*n and the code segment address in the word at address (4*n)+2.
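The vector-table layout is simple enough to sketch directly (illustrative
Python; addresses printed in hex):

```python
def vector_addresses(n):
    # Interrupt type n: offset word at 4*n, segment word at 4*n + 2.
    return 4 * n, 4 * n + 2

off, seg = vector_addresses(0x21)   # the DOS INT 21h vector
print(hex(off), hex(seg))           # 0x84 0x86
```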
Each code segment and offset points to its own interrupt handler (interrupt
service routine), which is a block of code that executes if that particular
interrupt occurs.
If more than one exception or interrupt is pending, the processor services them
in a predictable order.
(Priority is not determined by the location of vector numbers within the table)
Interrupt handling
A transfer to an interrupt-handling routine uses the system stack to store the
processor state. When an interrupt occurs, the following sequence of events
takes place:
1. The stack segment register and extended stack pointer register are pushed
onto the stack (if the transfer involves a change of privilege level)
2. The current value of the EFLAGS register is pushed onto the stack
3. The interrupt and trap flags are cleared
4. The current code segment pointer (CS) and IP are pushed onto the stack
5. If the interrupt is accompanied by an error code, the error code is
pushed onto the stack
6. The interrupt vector contents are fetched and loaded into the CS and IP
registers. Execution continues from the interrupt service routine
To return from an interrupt, all the saved values are restored and execution
resumes from the point of the interrupt.
Register windows
The problem with procedure calls is that local variables change with each call
and return, and parameters must be passed too.
Studies show that procedures employ only a few passed parameters and local
variables. To exploit this, multiple small sets of registers are used, each
assigned to a different procedure. A procedure call automatically switches the
processor to use a different fixed-size window of registers, rather than saving
registers in memory. Windows for adjacent procedures are overlapped to allow
parameter passing.
At any one time, only one window of registers is visible and is addressable as
if it were the only set of registers. The window is divided into three fixed-
sized areas:
• Parameter registers - hold parameters for procedure calls
• Local registers - used for local variables, as assigned by the compiler
• Temporary registers - used to exchange parameters and results with the
next lower level (procedure called by current procedure). They are
physically the same as the parameter registers at the next lower level.
This overlap permits parameters to be passed without the actual movement
of data.
The actual organisation of the register file is as a circular buffer of
overlapping windows. (The register windows hold the most recent procedure
activations, while older activations are saved in memory, to be restored
later).
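The overlap can be sketched numerically (a toy model; the register counts are
made up for illustration):

```python
# Each window has parameter, local, and temporary areas. The temporaries
# of one window ARE the parameters of the next, so adjacent windows
# overlap by the size of that shared area.
PARAMS, LOCALS, TEMPS = 4, 4, 4   # hypothetical sizes

def window_base(level):
    # Each deeper call starts PARAMS + LOCALS registers further along,
    # i.e. TEMPS registers before the previous window ends.
    return level * (PARAMS + LOCALS)

# The caller's temporary registers and the callee's parameter
# registers are the same physical registers:
caller_temps_start = window_base(0) + PARAMS + LOCALS
callee_params_start = window_base(1)
print(caller_temps_start == callee_params_start)   # True
```

Because the two areas are physically the same registers, a call passes its
parameters without moving any data.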
Global variables
The window scheme provides an efficient organisation for storing local scalar
variables in registers, but doesn’t address the need to store global variables
(i.e. those accessed by more than one procedure).
There are two alternatives:
1. Variables declared as global in an HLL can be assigned memory locations
by the compiler, and all machine instructions that reference these
variables will use memory-reference operands. (This is straightforward,
but inefficient for frequently accessed global variables).
2. Incorporate a set of global registers in the processor. These registers
would be fixed in number and available to all procedures. (Disadvantages:
hardware burden to accommodate the split in register addressing, and the
compiler must decide which global variables should be assigned to
registers).
In general, there is a trade-off between the use of a large set of registers
and compiler-based optimisation. (The larger the number of registers, the
smaller the benefit of register optimisation).
• Most instructions generated by a compiler are relatively simple anyway
(so a control unit built specifically for those instructions could
execute them faster than a comparable CISC)
• Instruction pipelining can be applied much more effectively with a
reduced instruction set
• RISC processors are more responsive to interrupts because interrupts are
checked between rather elementary operations
VLSI implementation benefits
With the advent of VLSI, it is possible to put an entire processor on a single
chip. For a single-chip processor, there are two motivations for following a
RISC strategy:
• Performance issue (On-chip delays are of much shorter duration than
inter-chip delays)
• Design-and-implementation time (A RISC processor is far easier to develop
than a comparable CISC processor)