Pentium - Salient Features
Typical questions
• Draw and discuss the architecture of the Pentium.
• List the new Pentium instructions and their functions.
• Explain the memory management of the Pentium.
• Explain the floating-point instructions newly available in the
Pentium.
• Describe the cache memory organization of the Pentium.
• Distinguish between pipelining and superpipelining.
• Explain the salient features of the Pentium architecture.
• Draw the schematic blocks of the Floating Point Unit (FPU) of the Pentium and
explain its different segments.
• Explain the features of the Level 1 instruction and data caches of the Pentium.
• Discuss the functions of branch prediction and the Branch Target Buffer of
the Pentium.
Salient features of Pentium
• Superscalar execution with a superpipelined architecture
• On-chip floating-point unit
• Two separate caches: a data cache and an instruction cache
• Branch prediction using a Branch Target Buffer (BTB)
• 64-bit external data bus, so two 32-bit data items can be transferred at once
• Enhanced instruction set for trigonometric and exponential functions
• 32-bit general-purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI
• Instruction optimization: instructions take less time than on the 486
Four modes
• Protected mode: best performance and capability
• Real mode: behaves like the 8086 but can switch to
protected mode easily
• System Management Mode (SMM): for power
management and OEM-specific functions
• Virtual 8086 mode (V86 mode): runs 8086 programs within protected mode
Superscalar architecture
• Hardware decides at run time which instructions are
issued concurrently
• The processor is complex because multiple instructions must
be issued to the execution units in each cycle
• Two instructions are issued in parallel to two
independent integer pipelines, U and V, each
with 5 stages
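The run-time issue decision described above can be illustrated with a minimal sketch. It assumes a simplified rule: two neighbouring instructions go to the U and V pipelines together only if the second has no register dependency on the first. The real Pentium pairing rules are more involved; the `Instr` class, the `can_issue_together` and `schedule` helpers, and the sample program are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str           # assembly text, for display only
    reads: frozenset    # registers the instruction reads
    writes: frozenset   # registers the instruction writes


def can_issue_together(first: Instr, second: Instr) -> bool:
    """Simplified check: no read-after-write or write-after-write hazard."""
    raw = first.writes & second.reads    # second needs a result of first
    waw = first.writes & second.writes   # both write the same register
    return not raw and not waw


def schedule(program):
    """Greedy in-order issue: pair neighbours when they are independent."""
    slots = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_issue_together(program[i], program[i + 1]):
            slots.append((program[i].text, program[i + 1].text))  # U and V
            i += 2
        else:
            slots.append((program[i].text, None))                 # U only
            i += 1
    return slots


program = [
    Instr("mov eax, 1", frozenset(), frozenset({"eax"})),
    Instr("mov ebx, 2", frozenset(), frozenset({"ebx"})),
    Instr("add eax, ebx", frozenset({"eax", "ebx"}), frozenset({"eax"})),
    Instr("mov ecx, eax", frozenset({"eax"}), frozenset({"ecx"})),
]
issue_slots = schedule(program)
for u_pipe, v_pipe in issue_slots:
    print(u_pipe, "|", v_pipe)
```

In this sketch the two independent `mov` instructions share one issue slot, while `mov ecx, eax` must wait for the `add` that produces `eax`, so the four instructions occupy three slots.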
Pentium Pipeline stages
• Prefetch (PF) stage: fetches instructions from the cache and aligns them,
since instructions are of variable length
• Decode stage D1: decodes the instruction and generates a control word or a
microcoded control sequence
• Decode stage D2: decodes the control word further for execution and generates
addresses for data memory references
• Execute (E) stage: accesses data operands from the cache or executes the
operation in the ALU or FPU
• Write-back (WB) stage: updates registers and flags
• Superpipelining simply refers to pipelining that uses a longer pipeline
(with more stages) than "regular" pipelining. In theory, a design with
more stages, each doing less work, can be scaled to higher clock
frequency
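The stage overlap and the superpipelining argument above can be sketched numerically. The sketch assumes an idealized pipeline with no stalls, where one instruction enters per cycle; the cycle times (10 ns and 5 ns) and the instruction count are made-up illustration values, not Pentium figures, and the function names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # the five integer stages listed above


def stage_at(instr_index, cycle, stages=STAGES):
    """Stage occupied by instruction `instr_index` in `cycle`, or None."""
    position = cycle - instr_index  # one new instruction enters per cycle
    return stages[position] if 0 <= position < len(stages) else None


def total_cycles(num_instructions, num_stages):
    """Cycles for a stall-free run of instructions through one pipeline."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def run_time_ns(num_instructions, num_stages, cycle_ns):
    """Total run time, assuming every stage takes exactly one clock cycle."""
    return total_cycles(num_instructions, num_stages) * cycle_ns


# Hypothetical comparison: same work split into twice as many, shorter stages.
regular_ns = run_time_ns(100, 5, 10.0)
deeper_ns = run_time_ns(100, 10, 5.0)
print(regular_ns, deeper_ns)
```

Under these assumptions the deeper pipeline finishes sooner because the shorter cycle outweighs the few extra fill cycles. Branch mispredictions and other stalls, which cost more in a deeper pipeline, are deliberately left out of this sketch.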
Separate code and data cache
• Separate 8 KB caches for code and data
support the superscalar organization, which
demands more bandwidth than a unified
cache can provide. The split also helps
branch prediction execute efficiently.
Floating point Unit
• The FPU is heavily pipelined, with an 8-stage
pipeline that includes two additional execution stages and an error
stage.
• Eight general-purpose floating-point registers
• FADD (adder), FMUL (multiplier), FDD (divider), FEXP (exponent),
and FRD (rounder) segments handle
single, double, and extended precision.
Floating point exceptions
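• Six exceptions: divide by zero, overflow, underflow, denormal operand, invalid
operation, and inexact result (precision)
• SIR (safe instruction recognition): determines early whether a floating-point
instruction can execute without raising an exception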
Branch Prediction – 25% improvement
• Branch instructions are moderately frequent: 15% to 25% of instructions.
• Branches change the normal sequential flow and may stall the pipeline.
For a conditional branch, the next instruction is not known until the branch executes.
• A 256-entry Branch Target Buffer (BTB) holds branch target addresses
for previously executed branches. It is a four-way set associative
memory. Whenever a branch executes, the branch address and its destination
address are entered in the BTB.
• During decoding, the BTB is searched for an entry matching the branch instruction.
• Hit: the CPU uses the stored history to decide whether to take the branch. If the
branch is predicted taken, it fetches the next instructions from the target address
and decodes them.
• The actual branch outcome is known only at the write-back stage.
• If the prediction was wrong, the pipeline is flushed and the instruction at the
correct target address is fetched.
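The lookup and update steps above can be sketched as a small simulation. The 256 entries and four ways come from the text. Everything else is an assumption made for illustration: a 2-bit saturating counter as the "history", allocation only when a branch is taken, FIFO replacement within a set, and the class and method names themselves.

```python
class BranchTargetBuffer:
    """Toy BTB: 256 entries, 4-way set associative, 2-bit history (assumed)."""

    def __init__(self, entries=256, ways=4):
        self.ways = ways
        self.num_sets = entries // ways            # 64 sets of 4 entries
        # Each set is a list of [tag, target, counter] entries.
        self.sets = [[] for _ in range(self.num_sets)]

    def _locate(self, branch_addr):
        index = branch_addr % self.num_sets        # selects the set
        tag = branch_addr // self.num_sets         # identifies the branch
        return self.sets[index], tag

    def predict(self, branch_addr):
        """Return the predicted target, or None for 'not taken' or a miss."""
        entries, tag = self._locate(branch_addr)
        for entry_tag, target, counter in entries:
            if entry_tag == tag and counter >= 2:  # counter 2 or 3: taken
                return target
        return None

    def update(self, branch_addr, taken, target):
        """Record the actual outcome once the branch has resolved."""
        entries, tag = self._locate(branch_addr)
        for entry in entries:
            if entry[0] == tag:
                if taken:
                    entry[1] = target
                    entry[2] = min(3, entry[2] + 1)
                else:
                    entry[2] = max(0, entry[2] - 1)
                return
        if taken:                                  # allocate on a taken branch
            if len(entries) == self.ways:
                entries.pop(0)                     # FIFO eviction (assumed)
            entries.append([tag, target, 3])


# A loop branch at a hypothetical address: taken nine times, then falls through.
btb = BranchTargetBuffer()
mispredicts = 0
for taken in [True] * 9 + [False]:
    predicted_taken = btb.predict(0x1000) is not None
    if predicted_taken != taken:
        mispredicts += 1                           # pipeline would be flushed
    btb.update(0x1000, taken, 0x0F00)
print(mispredicts)
```

In this sketch the branch is mispredicted on its first execution, when the BTB has no entry yet, and again on loop exit. Each misprediction corresponds to the pipeline flush described above.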
Enhanced instruction Set
• FSIN, FCOS, FSINCOS, FPTAN, FPATAN, F2XM1 (2^X - 1),
FYL2X (Y*log2(X)), FYL2XP1 (Y*log2(X+1))
N-way set associative cache
• "N" is a number, typically 2, 4, or 8. This design is a compromise between
the direct mapped and fully associative designs. The cache is divided into sets,
each containing "N" cache lines, say 4. Each memory address is assigned to one
set and can be cached in any of the 4 lines within that set. In other words,
the cache is associative within each set, hence the name.
• A given memory location can therefore be in any of "N" places in the cache.
The tradeoff is that "N" times as many memory locations compete for the "N"
lines in each set. Suppose, for example, a cache of 16,384 lines serving 64 M
memory addresses. Organized as 4-way set associative, it has 4,096 sets with
4 lines each instead of a single block of 16,384 lines. Each set is shared by
16,384 memory addresses (64 M divided by 4 K) instead of the 4,096 addresses
(64 M divided by 16 K) that share each line in the direct mapped cache. So
there is more to share (4 lines instead of 1) but more addresses sharing it
(16,384 instead of 4,096).
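The arithmetic in this example can be checked with a short sketch. It uses the figures from the text (16,384 lines and 64 M addresses) and assumes the simplest mapping, where the set is the address modulo the number of sets. The function names are hypothetical.

```python
def cache_geometry(total_lines, ways, memory_addresses):
    """Return (number of sets, memory addresses competing for each set)."""
    num_sets = total_lines // ways
    addresses_per_set = memory_addresses // num_sets
    return num_sets, addresses_per_set


def set_index(address, num_sets):
    """Assumed mapping: the low-order part of the address selects the set."""
    return address % num_sets


TOTAL_LINES = 16_384
MEMORY_ADDRESSES = 64 * 2**20  # 64 M

direct_mapped = cache_geometry(TOTAL_LINES, 1, MEMORY_ADDRESSES)
four_way = cache_geometry(TOTAL_LINES, 4, MEMORY_ADDRESSES)
print(direct_mapped)  # sets and addresses per set with 1 line per set
print(four_way)       # sets and addresses per set with 4 lines per set

# Two addresses that are a multiple of 4,096 apart land in the same set.
same_set = set_index(0x12345, 4096) == set_index(0x12345 + 4096, 4096)
```

With 4 ways, each set holds four times as many lines but also serves four times as many addresses, which is the tradeoff described above.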