Advanced Topics in Computer Architecture ECE 7373
Pauline Markenscoff
N320 Engineering Building 1
E-mail: markenscoff@uh.edu
Multiple issue processors
Single issue processors:
Eliminate data and control stalls to achieve an ideal CPI of 1.
Multiple issue processors:
Reduce CPI below 1.
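CPI and its reciprocal, instructions per clock (IPC), make the goal concrete. The sketch below assumes an idealized k-wide machine that issues a full packet every clock and never stalls. The function names `ideal_cpi` and `ipc` are hypothetical.

```python
def ideal_cpi(issue_width: int) -> float:
    """Best-case CPI when `issue_width` instructions issue every clock
    and no stalls ever occur (an idealized assumption)."""
    if issue_width < 1:
        raise ValueError("issue width must be at least 1")
    return 1.0 / issue_width


def ipc(cpi: float) -> float:
    """Instructions per clock (IPC) is the reciprocal of CPI."""
    return 1.0 / cpi


for k in (1, 2, 4):
    cpi = ideal_cpi(k)
    print(f"issue width {k}: ideal CPI = {cpi:.2f}, IPC = {ipc(cpi):.2f}")
```

A single-issue machine can at best reach CPI = 1. Only by issuing more than one instruction per clock can CPI drop below 1.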
Fig. 3.15
Statically Scheduled Superscalar Processors
The pipeline receives from one to k instructions per clock from the
instruction fetch unit, where k is the width of the issue packet.
Issue packet
- The set of instructions received from the fetch unit that could potentially issue in one clock cycle.
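If an instruction would cause a structural or data hazard
- either due to an earlier instruction already in execution
- or due to an earlier instruction in the issue packet,
then the instruction is not issued. Because issue is in order, the instructions after it in the packet are not issued either.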
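The in-order issue check can be sketched as follows. This is a simplified model under several assumptions. Each instruction names at most one destination register and a list of source registers. Instructions already in execution are represented only by the registers they have not yet written. Structural hazards are reduced to a per-clock count of free units per class, and loads are assumed to use the integer slot. The names `Instr` and `issue_from_packet` are hypothetical. Issue stops at the first instruction that cannot go, so later instructions in the packet wait even when they are independent.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instr:
    text: str
    unit: str                      # functional-unit class, e.g. "int" or "fp"
    dest: Optional[str] = None     # destination register, if any
    srcs: list = field(default_factory=list)  # source registers


def issue_from_packet(packet, pending_writes, free_units):
    """Return the in-order prefix of `packet` that can issue this clock.

    pending_writes: registers not yet written by instructions already in
                    execution (RAW hazards with earlier instructions).
    free_units:     unit class -> number of units free this clock
                    (a simplified model of structural hazards).
    """
    issued = []
    written_in_packet = set()   # destinations of instructions issued so far
    used = Counter()            # units claimed so far in this packet
    for ins in packet:
        raw_in_flight = any(r in pending_writes for r in ins.srcs)
        raw_in_packet = any(r in written_in_packet for r in ins.srcs)
        structural = used[ins.unit] >= free_units.get(ins.unit, 0)
        if raw_in_flight or raw_in_packet or structural:
            break               # in-order issue: nothing after this issues
        issued.append(ins)
        used[ins.unit] += 1
        if ins.dest is not None:
            written_in_packet.add(ins.dest)
    return issued


packet = [
    Instr("L.D F0,0(R1)", "int", "F0", ["R1"]),
    Instr("ADD.D F4,F0,F2", "fp", "F4", ["F0", "F2"]),  # needs F0 from L.D
]
print([i.text for i in issue_from_packet(packet, set(), {"int": 1, "fp": 1})])
# ['L.D F0,0(R1)']
```

Here the ADD.D is held back by a RAW hazard with an earlier instruction in the same packet, so only the load issues this clock.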
RAW hazard
- When the second instruction of the pair depends on the first
Structural hazard
- Contention for the FP register ports
Assume:
All FP ops are adds (3 execution clock cycles).
The integer instruction is always shown first, although it may be the second
instruction in the issue packet.
The rate at which instructions can be issued has been substantially boosted.
To improve the rate at which instructions are executed:
Pipelined FP units
Multiple independent FP units.
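The effect of both options on execution rate can be estimated with simple cycle counts. This sketch assumes the slide's 3-cycle FP add, a stream of fully independent adds, and that a non-pipelined unit is busy for the whole latency of each add. The function names are hypothetical.

```python
import math

FP_ADD_LATENCY = 3  # execution clock cycles per FP add (the slide's assumption)


def cycles_unpipelined(n_adds: int, n_units: int = 1) -> int:
    """Each add occupies its unit for the full latency; units run in parallel."""
    return math.ceil(n_adds / n_units) * FP_ADD_LATENCY


def cycles_pipelined(n_adds: int) -> int:
    """One pipelined unit starts a new add every clock; the last add
    finishes LATENCY - 1 clocks after it starts."""
    return n_adds + FP_ADD_LATENCY - 1 if n_adds else 0


n = 6
print("1 unpipelined unit: ", cycles_unpipelined(n))      # 18
print("3 unpipelined units:", cycles_unpipelined(n, 3))   # 6
print("1 pipelined unit:   ", cycles_pipelined(n))        # 8
```

Pipelining approaches one add per clock with a single unit. Replicating units gives the same throughput at the cost of more hardware.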
Complication:
Maintaining a precise exception model
Fig. 3.19
Any instructions following a branch cannot start execution until after the
branch condition has been evaluated.
For 3 iterations:
Issue rate: 14 instructions in 8 clock cycles = 14/8 = 1.75
Execution rate: 15 instructions in 19 clock cycles = 15/19 = 0.79
Time of issue, execution, and writing result for a
dual-issue version of our pipeline without speculation
Fig. 3.20
For 3 iterations:
Issue rate = 1.75 (14 instructions in 8 clock cycles = 14/8 = 1.75)
Execution rate = 0.79 (15 instructions in 19 clock cycles = 15/19 = 0.79)
Because the completion rate falls further behind the issue rate with every iteration, the
nonspeculative processor will have to stall once a few more iterations are issued!
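The gap between the two rates can be turned into a rough estimate of how quickly the processor runs out of room. This is a linear simplification. It assumes both measured rates stay constant and that issue must stop once the number of issued but uncompleted instructions reaches a fixed number of buffer entries. The 16-entry figure and the function names are hypothetical and do not come from the slide.

```python
import math
from fractions import Fraction

# Rates measured over 3 iterations (from the slide).
ISSUE_RATE = Fraction(14, 8)      # instructions issued per clock
COMPLETE_RATE = Fraction(15, 19)  # instructions completed per clock


def backlog_after(cycles: int) -> Fraction:
    """Instructions issued but not yet completed, assuming both rates
    stay constant (a linear simplification)."""
    return (ISSUE_RATE - COMPLETE_RATE) * cycles


def cycles_until_full(buffer_entries: int) -> int:
    """First clock at which the backlog reaches `buffer_entries`,
    i.e. where issue would have to stall in this model."""
    return math.ceil(buffer_entries / (ISSUE_RATE - COMPLETE_RATE))


print(f"issue rate    = {float(ISSUE_RATE):.2f}")     # 1.75
print(f"complete rate = {float(COMPLETE_RATE):.2f}")  # 0.79
print(f"backlog grows by {float(ISSUE_RATE - COMPLETE_RATE):.2f} instructions per clock")
print("clocks to fill 16 hypothetical entries:", cycles_until_full(16))
```

Because the backlog grows by almost one instruction every clock, any finite buffer fills within a handful of iterations.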
The performance of the nonspeculative processor can be
improved by allowing memory access instructions to
complete their effective address calculation before a
branch is decided.
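The following sketch models the difference for a single load that follows an unresolved branch. It compares the strict rule, where nothing after a branch starts execution until the branch condition has been evaluated, with the relaxed rule, where only the memory access waits. It assumes one clock each for the effective-address (EA) calculation and the memory access. The cycle numbers and the name `load_timing` are hypothetical.

```python
def load_timing(addr_reg_ready: int, branch_resolved: int,
                allow_early_ea: bool) -> tuple:
    """Return (ea_cycle, mem_cycle) for a load following an unresolved branch.

    addr_reg_ready:  first clock in which the base register is available.
    branch_resolved: clock in which the branch condition is evaluated.
    allow_early_ea:  if True, the EA calculation may run before the branch
                     is decided; the memory access still waits for it.
    """
    after_branch = branch_resolved + 1
    if allow_early_ea:
        ea = addr_reg_ready
    else:
        ea = max(addr_reg_ready, after_branch)
    mem = max(ea + 1, after_branch)   # access needs the address and the branch
    return ea, mem


print(load_timing(3, 7, allow_early_ea=False))  # (8, 9)
print(load_timing(3, 7, allow_early_ea=True))   # (3, 8)
```

Under these assumptions, moving the address calculation ahead of the branch removes one clock from each load's path after the branch resolves.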
Fig. 3.20
Fig. 3.16
Drawbacks:
Increase in code size
- Ambitious unrolling of loops
- Instructions might not be full, and the slots reserved for unused functional units
translate into wasted bits.
Limitations of lockstep operation
- A stall in any functional unit must cause the entire processor to stall.
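Both drawbacks can be quantified with a small model. It assumes each wide instruction word has a fixed number of operation slots of equal width. It also assumes that in lockstep each word takes one clock plus the longest stall seen by any of its units. The 5-slot, 32-bit layout, the example code, and the names `Word`, `slot_usage`, and `lockstep_cycles` are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Word:
    ops: int             # operations actually placed in this instruction word
    stalls: tuple = ()   # stall cycles seen by individual functional units


def slot_usage(words, slots_per_word, bits_per_slot):
    """Fraction of slots holding an operation, and bits spent on empty slots."""
    total = len(words) * slots_per_word
    used = sum(w.ops for w in words)
    return used / total, (total - used) * bits_per_slot


def lockstep_cycles(words):
    """Clocks to run `words` when all units advance together: each word
    takes one clock plus the longest stall seen by any of its units."""
    return sum(1 + max(w.stalls, default=0) for w in words)


# Hypothetical 5-slot words with 32-bit slots (illustrative numbers only).
code = [Word(5), Word(3, stalls=(0, 2)), Word(2), Word(1)]
usage, wasted = slot_usage(code, slots_per_word=5, bits_per_slot=32)
print(f"slot usage {usage:.0%}, wasted bits {wasted}")  # slot usage 55%, wasted bits 288
print("clocks in lockstep:", lockstep_cycles(code))     # clocks in lockstep: 6
```

A 2-cycle stall in a single unit adds 2 clocks to the whole program, because every other unit must wait with it.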