Csa Module I Notes
16
Functional units of a computer
Input unit accepts information from:
•Human operators,
•Electromechanical devices (keyboard),
•Other computers.
Arithmetic and logic unit (ALU): performs the desired operations on the input information, as determined by instructions in the memory.
Memory stores information:
•Instructions,
•Data.
Control unit coordinates various actions:
•Input,
•Output,
•Processing.
Output unit sends results of processing (data):
•To a monitor display,
•To a printer.
[Figure: functional units of a computer. The Input and Output units (I/O) exchange information with the Memory (holding Instr1, Instr2, Instr3, Data1, Data2) and the Processor, which comprises the Arithmetic & Logic unit and the Control unit.]
17
Information in a computer -- Instructions
18
Information in a computer -- Data
19
Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: real-world input devices (keyboard, audio input, ……) feed the Input Unit, which transfers information to the computer's memory and processor.]
20
Memory unit
Memory unit stores instructions and data.
Recall, data is represented as a series of bits.
To store data, memory unit thus stores bits.
Processor reads instructions and reads/writes data from/to
the memory during the execution of a program.
In theory, instructions and data could be fetched one bit at a
time.
In practice, a group of bits is fetched at a time.
The group of bits stored or retrieved at a time is termed a “word”.
The number of bits in a word is termed the “word length” of a
computer.
In order to read/write to and from memory, a processor
should know where to look:
An “address” is associated with each word location.
21
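The word/address idea above can be sketched with a toy word-addressable memory. This is a minimal illustration only; `WORD_LENGTH`, `write_word`, and `read_word` are invented names, not any real machine interface:

```python
# Toy model of a word-addressable memory (illustrative names, not a real API).
WORD_LENGTH = 32          # bits per word; 32 and 64 are common word lengths

memory = {}               # maps a word address to the word stored there

def write_word(address, word):
    # Store one word at the given address; the value must fit in one word.
    assert 0 <= word < 2 ** WORD_LENGTH, "value must fit in one word"
    memory[address] = word

def read_word(address):
    # Retrieve the word stored at the given address (0 if never written).
    return memory.get(address, 0)

write_word(100, 0b1010)   # store the bit pattern 1010 at address 100
print(read_word(100))     # -> 10 (the decimal value of binary 1010)
```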
Memory unit (contd..)
Processor reads/writes to/from memory based on the
memory address:
Any word location can be accessed in a short, fixed amount of time
based on its address.
Random Access Memory (RAM) provides fixed access time
independent of the location of the word.
Access time is known as “Memory Access Time”.
22
Memory unit (contd..)
23
Arithmetic and logic unit (ALU)
24
Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.
[Figure: the Output Unit receives results from the memory and processor and drives output devices such as a printer, graphics display, speakers, ……]
25
Control unit
26
A Typical Instruction
Add LOCA, R0
Add the operand at memory location LOCA to the operand in
a register R0 in the processor.
Place the sum into register R0.
The original contents of LOCA are preserved.
The original contents of R0 are overwritten.
The instruction is fetched from the memory into the processor
– the operand at LOCA is fetched and added to the
contents of R0 – the resulting sum is stored in register R0.
Separate Memory Access and ALU Operation
Load LOCA, R1
Add R1, R0
Whose contents will be overwritten?
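A minimal sketch of the two styles, simulating the register and memory transfers in Python. LOCA, R0, and R1 are modeled here as dictionary entries; this illustrates the semantics only, not a real instruction set:

```python
# Hedged sketch: the operand at memory location LOCA and two processor
# registers, modeled as dictionaries (all names are symbolic).
memory = {"LOCA": 5}              # operand stored at memory location LOCA
registers = {"R0": 7, "R1": 0}

# Style 1: a single instruction, "Add LOCA, R0".
# The operand at LOCA is fetched and added to R0; R0 is overwritten,
# while the contents of LOCA are preserved.
registers["R0"] = memory["LOCA"] + registers["R0"]
print(registers["R0"])            # -> 12; memory["LOCA"] is still 5

# Style 2: separate memory access and ALU operation.
registers = {"R0": 7, "R1": 0}    # reset for comparison
registers["R1"] = memory["LOCA"]                      # Load LOCA, R1 (overwrites R1)
registers["R0"] = registers["R1"] + registers["R0"]   # Add R1, R0   (overwrites R0)
print(registers["R0"])            # -> 12
```

In both styles the memory operand is preserved; only register contents are overwritten.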
How are the functional units connected?
•For a computer to carry out its operations, the functional units need to
communicate with each other.
•In order to communicate, they need to be connected.
Bus
29
Organization of cache and main memory
[Figure: the processor, cache memory, and main memory connected by the bus; the cache sits between the processor and main memory.]
Why is the access time of the cache memory less than the
access time of the main memory?
30
Registers
In addition to the ALU and the control circuitry, the processor contains
a number of registers used for temporary storage of data.
Operating system
An application program is translated from its high-level
language form into a machine language form and stored on the disk.
Assume that part of the program's task involves reading a data file from
the disk into the memory, performing some computation on the data, and
printing the results.
When execution of the program reaches the point where the data file is
needed, the program requests the OS to transfer the data file from the
disk to the memory.
The OS performs this task and passes execution control back to the
application program, which then proceeds to perform the required
computation.
46
Cont..
When the computation is completed and the results are ready,
the program again requests the OS to print the results.
i.e., execution control passes back and forth between the
application program and the OS routines.
47
Execution of more than one application program at a time
The OS can load the next program to be executed into the memory from
the disk, and the processor can print the previous
program's results while the current program is being loaded from the disk.
Sharing the processor among several application programs in this way is called
multitasking.
48
User Program and OS Routine Sharing
[Figure: time-line diagram showing the processor shared between the user program and OS routines.]
Cont..
During the time period t0 to t1, an OS routine initiates loading of the application
program.
50
Multiprogramming or Multitasking
Performance
The speed with which a computer executes programs is
affected by the design of its hardware and its machine
language instructions.
Multiprocessor computer
Execute a number of different application tasks in parallel
Execute subtasks of a single large task in parallel
All processors have access to all of the memory – shared-memory multiprocessor
Cost – processors, memory units, complex interconnection
networks
Multicomputers
Each computer has access only to its own memory
Exchange messages via a communication network – message-passing multicomputers
The Performance Equation
T = (N × S) / R = N × S × P
where ‘T’ is the execution time or the processor time, ‘N’ is the instruction count
(IC), ‘S’ is the number of clock cycles per instruction or CPI, ‘R’ is the clock rate
or clock frequency, and ‘P’ is the Clock Time (CT), which is the reciprocal of the
clock frequency, i.e., P = 1/R.
For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor
has a cycle time of 0.25 ns.
This equation remains valid if the time units are changed on both sides of the
equation. The left-hand side and the factors on the right-hand side are discussed in
the following sections.
The three factors are, in order, known as the instruction count (IC), clocks per
instruction (CPI), and clock time (CT). CPI is computed as an effective value.
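The performance equation can be sketched directly. The instruction count and CPI values below are invented illustrative numbers, not from any benchmark:

```python
# Sketch of the basic performance equation T = (N × S) / R = N × S × P.
def execution_time(N, S, R):
    """N: instruction count, S: clocks per instruction (CPI),
    R: clock rate in Hz. Returns processor time T in seconds."""
    return (N * S) / R

# A 1 GHz processor has a cycle time P = 1/R = 1.0 ns:
print(1 / 1e9)                          # -> 1e-09 seconds

# Invented example: 10**9 instructions at an effective CPI of 2.0
# on a 2 GHz clock:
print(execution_time(10**9, 2.0, 2e9))  # -> 1.0 second
```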
Instruction Count
Computer architects can reduce the instruction count by adding more powerful
instructions to the instruction set. However, this can increase either CPI or clock
time, or both.
Clock Time
Clock time depends on transistor speed and the complexity of the work done in a
single clock. Clock time can be reduced when transistor sizes decrease. However,
power consumption increases when clock time is reduced. This increases the
amount of heat generated.
Instruction Count
For predicting the effects of incremental changes, architects use execution traces of
benchmark programs to get instruction counts. If the incremental change does not
change the instruction set then the instruction count normally does not change. If
there are small changes in the instruction set then trace information can be used to
estimate the change in the instruction count.
For comparison purposes, two machines with different instruction sets can be
compared based on compilations of the same high-level language code on the two
machines.
Clocks per instruction (CPI) is an effective average. It is averaged over all of the
instruction executions in a program.
For computing clocks per instruction as an effective average, the cases are
categories of instructions, such as branches, loads, and stores. Frequencies for the
categories can be extracted from execution traces. Knowledge of how the
architecture handles each category yields the clocks per instruction for that
category.
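The effective-average computation can be sketched as follows. The category frequencies and per-category clock counts are invented for illustration; real values would come from execution traces:

```python
# CPI as an effective (weighted) average over instruction categories.
# Frequencies and clock counts below are invented illustrative values.
categories = {
    # name: (fraction of executed instructions, clocks for that category)
    "ALU":    (0.50, 1),
    "load":   (0.20, 5),
    "store":  (0.10, 3),
    "branch": (0.20, 2),
}

# Weighted sum: each category contributes frequency × clocks.
effective_cpi = sum(freq * cpi for freq, cpi in categories.values())
print(round(effective_cpi, 2))   # -> 2.2 (0.5*1 + 0.2*5 + 0.1*3 + 0.2*2)
```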
Clock Time
Clock time (CT) is the period of the clock that synchronizes the circuits in a
processor. It is the reciprocal of the clock frequency.
Clock time is affected by circuit technology and the complexity of the work done
in a single clock. Logic gates do not operate instantly. A gate has a propagation
delay that depends on the number of inputs to the gate (fan in) and the number of
other inputs connected to the gate's output (fan out). Increasing either the fan in or
the fan out slows down the propagation time. Cycle time is set to be the worst-case
total propagation time through gates that produce a signal required in the next
cycle. The worst-case total propagation time occurs along one or more signal paths
through the circuitry. These paths are called critical paths.
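A toy sketch of how the critical path sets the cycle time. The paths and gate delays below are invented illustrative values:

```python
# Cycle time is set by the worst-case total propagation delay along any
# signal path. The paths and per-gate delays (in ns) are invented values.
paths = {
    "path_A": [0.2, 0.3, 0.2],        # gate delays along one signal path
    "path_B": [0.4, 0.35],
    "path_C": [0.15, 0.15, 0.1, 0.2],
}

# The critical path is the path with the largest total delay.
cycle_time_ns = max(sum(delays) for delays in paths.values())
clock_rate_ghz = 1 / cycle_time_ns    # since 1 / ns = GHz

print(round(cycle_time_ns, 2))        # -> 0.75 (path_B is critical)
print(round(clock_rate_ghz, 2))       # -> 1.33 (GHz)
```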
For the past 35 years, integrated circuit technology has been greatly affected by a
scaling equation that tells how individual transistor dimensions should be altered as
the overall dimensions are decreased. The scaling equations predict an increase in
speed and a decrease in power consumption per transistor with decreasing size.
Technology has improved so that about every 3 years, linear dimensions have
decreased by a factor of 2. Transistor power consumption has decreased by a
similar factor. Speed increased by a similar factor until about 2005. At that time,
power consumption reached the point where air cooling was not sufficient to keep
processors cool if the ran at the highest possible clock speed.
Problem Statement 1
Solution
We have the instruction count: 10⁹ instructions. The clock time can be computed
quickly from the 2.0 GHz clock rate to be 0.5×10⁻⁹ seconds. So we only need to compute
clocks per instruction as an effective value; this works out to 3.7 (the value
carried forward into the next problem).
Then we have T = N × S × P = 10⁹ × 3.7 × 0.5×10⁻⁹ s = 1.85 s.
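As a quick check of the arithmetic, using the instruction count, the effective CPI of 3.7 (the value carried into the next problem), and the 0.5 ns clock time:

```python
# Checking Problem 1 with the performance equation T = N × S × P.
N = 10**9          # instruction count
S = 3.7            # effective CPI (the value used again in Problem 2)
P = 0.5e-9         # clock time in seconds for the 2.0 GHz clock

T = N * S * P
print(round(T, 2))   # -> 1.85 seconds
```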
Problem Statement 2
Suppose the processor in the previous example is redesigned so that all instructions
that initially executed in 5 cycles now execute in 4 cycles. Due to changes in the
circuitry, the clock rate has to be decreased from 2.0 GHz to 1.9 GHz. No changes
are made to the instruction set. What is the overall percentage improvement?
Solution Form
We can determine the percentage improvement quickly by first finding the ratio
between before and after performance. The performance equation implies that this
ratio will be a product of three factors: a performance ratio for instruction count, a
performance ratio for CPI or its reciprocal, instruction throughput, and a
performance ratio for clock time or its reciprocal, clock frequency. We can ignore
the first factor in this problem since it is 1.0: the instruction count has not changed.
We are left with determining the performance ratio for CPI and, since we are given
clock frequencies, the performance ratio for clock frequencies.
Solution
The performance ratio for frequencies must be less than 1.0: if other factors are the
same then a slower clock rate implies worse performance. So this factor of the
improvement ratio must be 1.9/2.0.
For the clocks per instruction, we had a value of 3.7 before the change. We
compute clocks per instruction after the change as an effective value:
Now, lower clocks per instruction means higher instruction throughput and thus
better performance, so we expect this part of the performance ratio to be greater
than 1.0; that is, 3.7/3.5.
Then we have an overall improvement ratio of
(3.7/3.5) × (1.9/2.0) ≈ 1.0043, i.e., an improvement of about 0.43%.
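A quick check of the improvement ratio using the values from the solution:

```python
# Problem 2's improvement ratio: CPI falls from 3.7 to 3.5 while the
# clock rate drops from 2.0 GHz to 1.9 GHz; instruction count is unchanged.
cpi_ratio = 3.7 / 3.5        # > 1.0: fewer clocks per instruction helps
freq_ratio = 1.9 / 2.0       # < 1.0: a slower clock hurts

improvement = cpi_ratio * freq_ratio
print(round(improvement, 4))              # -> 1.0043
print(f"{(improvement - 1) * 100:.2f}%")  # -> 0.43% overall improvement
```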