A Novel Asynchronous Pipeline Architecture For CISC Type Embedded Controller, A8051

This document proposes a novel asynchronous pipeline architecture called A8051 that is compatible with the Intel 8051 microcontroller. A8051 aims to improve performance over the Intel 8051 by using an asynchronous design with 5 pipeline stages instead of a synchronous design. It addresses challenges in implementing an asynchronous pipeline for CISC instructions by using techniques like branch prediction without a clock, balancing pipeline stage delays, grouping similar instructions to remove bubbles, and handling variable length instructions. Simulation results showed A8051 has around 18 times higher speed than Intel 8051 and 5 times higher speed than another asynchronous 8051 design.

Uploaded by

shankarnarendra
Copyright
© Attribution Non-Commercial (BY-NC)

A Novel Asynchronous Pipeline Architecture for CISC type Embedded Controller, A8051

Je-Hoon Lee, Won-Chul Lee and Kyoung-Rok Cho
CCNS Lab., Chungbuk Nat'l University
San-48 Gaeshin-dong, Cheongju-city, 361-763, Korea
Email: leejh@hbt.chungbuk.ac.kr

Abstract

Asynchronous design methods are known to offer better power consumption and execution speed than synchronous ones, because only the required modules are activated, without feeding clock and power to the entire system. In this paper, we propose an asynchronous processor, A8051, that is compatible with the Intel 8051; it takes on the challenge of a pipelined asynchronous design for a CISC type microcontroller. A8051 has special features such as an optimal instruction execution scheme that eliminates bubble states, variable instruction length handling, and a multi-looping pipeline architecture for a CISC machine. A8051 is composed of a 5-stage pipeline based on the CISC architecture. It is implemented in an RTL-level language, and the verified behavioral model is synthesized with a 0.35 um CMOS standard cell library. As a result, A8051 shows about 18 times higher speed than the Intel 8051 and about 5 times higher speed than the other asynchronous 8051 design in [1].

1. Introduction

While recent VLSI technology development has decreased the gate delay, it has also increased the wire delay in relative terms. The more complicated a system gets, the more difficult clock synchronization becomes, so clock skew tends to happen more frequently. In addition, power is wasted by supplying the clock to modules that do not need activation. To overcome these problems, research on asynchronous circuit design is being actively conducted.

The pipeline architecture is one of the most commonly used for high speed computing machinery. A pipeline can be clocked, when its control signals are activated by a widely distributed clock signal, or event-driven, when its stages are activated independently by the handshake protocol between stages instead of the system clock. The advantage of the asynchronous pipeline architecture over its synchronous counterpart is now well recognized [2]. The difference between the behavioral mechanisms of the synchronous pipeline and those of the asynchronous one is depicted in Fig. 1. In a conventional synchronous pipeline, all stages finish within a time slot given by the global clock, whose length is set by the slowest stage's time to complete its workload. By contrast, the asynchronous pipeline architecture does not use a global clock, so each stage takes a variable execution time to finish. For this reason, the asynchronous pipeline exhibits average-case performance. Whenever the execution of a stage completes, its output is passed to the next stage as an input without delay or waiting. A space time occurs between stages, depending on the availability of the following stage. As a result, the asynchronous pipeline is free from the worst-case delay restriction, and the average delay applies to it.

Figure 1. A comparison of synchronous and asynchronous pipeline

The A8051 proposed in this paper offers several solutions for these problems. To begin with, it proposes a simple way to predict branches through data-dependent control without a clock, in order to resolve the control hazard that occurs in branch instructions. Secondly, it constructs 5 stages with equal data path delay to improve pipeline performance. Thirdly, it groups instructions into simple execution schemes to remove the bubble states that occur in the execution of multi-cycle instructions. Finally, it verifies through simulation that the architecture handles variable instruction lengths.

The paper is organized as follows: Section 2 presents the instruction set and the implementation architecture of the Intel 80C51 and A8051. Section 3 evaluates the control mechanism and the asynchronous pipeline architecture. Section 4 describes the overall results and performance. Conclusions and some final remarks are offered in Section 5.

2. Architecture of Intel 80C51 and A8051

The Intel 80C51 is a popular 8-bit processor with a complex instruction set that is classified into five classes: arithmetic, logical, data transfer, boolean and jump instructions. It is a memory-memory architecture with six addressing modes. The Intel 80C51 instruction set has 255 instructions, encoded with variable lengths of one, two or three bytes. Each instruction is executed in one, two or four machine cycles [3]. In addition, each machine cycle is divided into 6 states, each of which carries out some communication or computation. As depicted in Fig. 2, the bus plays a central role in the synchronous architecture.
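As a small worked example of the synchronous timing just described, the sketch below counts the states occupied by one-, two- and four-machine-cycle instructions. The 6 states per machine cycle come from the text. The 2 oscillator periods per state (12 per machine cycle) are the classic 8051 timing and are an assumption here, as are the 12 MHz example clock and the function names.

```python
STATES_PER_MACHINE_CYCLE = 6  # stated in the text
OSC_PERIODS_PER_STATE = 2     # assumption: classic 8051 timing, not stated in the paper


def states_for(machine_cycles: int) -> int:
    """Number of states an instruction occupies on the synchronous core."""
    if machine_cycles not in (1, 2, 4):
        raise ValueError("the text lists only 1, 2 or 4 machine cycles")
    return machine_cycles * STATES_PER_MACHINE_CYCLE


def execution_time_s(machine_cycles: int, osc_hz: float) -> float:
    """Execution time in seconds for a given oscillator frequency."""
    osc_periods = states_for(machine_cycles) * OSC_PERIODS_PER_STATE
    return osc_periods / osc_hz


if __name__ == "__main__":
    for cycles in (1, 2, 4):
        t_us = execution_time_s(cycles, 12e6) * 1e6
        print(f"{cycles} machine cycle(s): {states_for(cycles)} states, "
              f"{t_us:.1f} us at 12 MHz")
```

Every instruction pays for a whole number of 6-state machine cycles under this scheme, whether or not it needs all of the states.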

0-7803-7523-8/02/$17.00 ©2002 IEEE


II-675
During each state of the instruction execution, approximately half of the wires of the bus switch [3]. However, since A8051 supports point-to-point connections between registers, it can reduce the bus load and the control block. A comparison of the synchronous execution scheme of each stage with the asynchronous one is shown in Fig. 3.

Figure 2. The architecture of Intel 80C51

Figure 3. Comparison of instruction execution schemes for the "INC A" instruction: (a) example execution scheme of Intel 80C51, (b) example execution scheme of A8051 (legend: redundancy state, handshake overhead)

In the Intel 80C51, a machine cycle for an instruction goes through 6 state transitions, from S1 to S6. This scheme results in many redundant states, because not all states are required for the instruction execution. The execution scheme of the proposed A8051 uses an optimized architecture that skips the redundant states and connects directly to the execution of the next stage. In this case, it needs only the space time spent waiting for the next stage.

A8051 consists of a 5-stage asynchronous pipeline architecture, as shown in Fig. 4. (a) The 1st stage is the IF Unit, which is responsible for instruction fetch, pre-decoding, and checking for the existence of branch instructions. (b) The 2nd stage is the ID Unit, which is responsible for mapping the opcode to an entry point of the microinstruction and checking data dependency on the previous instructions. (c) The 3rd stage is the OF Unit, which fetches the operands of the instructions. (d) The 4th stage is the EX Unit, which performs arithmetic, moves and shifts according to the microinstructions. (e) The 5th stage is the WB Unit, which saves the results in a register or data memory. The performance of A8051 is estimated from the average processing speed of each stage.

3. Control Mechanisms and Pipeline Architecture of A8051

When the asynchronous pipeline architecture is applied to the Intel 80C51, there are many problems to be solved. This section describes the solutions used in A8051 to solve these problems.

A8051 outline

A8051 consists of a 5-stage pipeline, IF, ID, OF, EX and WB, as shown in Fig. 4. In this study, we introduce the DI (delay insensitive) delay model for asynchronous operation. To do so, it is necessary to know when the combinational circuits have completed. The proposed A8051 uses a 4-phase handshake protocol for data transmission and dual-rail encoding to obtain the completion signal of the combinational circuits, as shown in Fig. 5 [5].

Redefine suitable execution scheme

As described in Section 2, the Intel 80C51 uses redundant states and machine cycles for CISC operation. A8051 does not use redundant states for synchronization between the stages; instead, a stage has to wait until the next stage is available, and this wait is called a space time. For this purpose, we newly defined the instruction set of A8051 with new execution sequences. Table 1 shows the newly defined instruction groups for A8051. Each instruction group does not require all stages and connects only the necessary stages. Groups 5, 6 and 7 iterate the OF and EX stages once or twice, so A8051 requires an arbitration block and additional control signals for the iteration of these stages. Although A8051 has a multi-cycle execution scheme, it is based on simple linear pipeline execution, to which some special cases of skipping stages and iterating stages are added. Because it uses only the asynchronous data path, it has less idle time than the synchronous version, which has dummy states.
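The idea that an instruction group connects only the stages it needs, and that some groups loop through OF and EX, can be sketched as follows. The three stage sequences are those legible in Table 1. The descriptive keys attached to them, the `GROUP_STAGES` name, and the helper functions are illustrative assumptions, not the authors' implementation.

```python
FULL_PIPELINE = ("IF", "ID", "OF", "EX", "WB")

# Stage sequences as legible in Table 1; the keys are descriptive labels only.
GROUP_STAGES = {
    "single pass with write-back": ("IF", "ID", "OF", "EX", "WB"),
    "one extra OF-EX pass": ("IF", "ID", "OF", "EX", "OF", "EX"),
    "two extra OF-EX passes": ("IF", "ID", "OF", "EX", "OF", "EX", "OF", "EX"),
}


def of_ex_passes(stages):
    """Count how many times OF is immediately followed by EX."""
    return sum(1 for a, b in zip(stages, stages[1:]) if (a, b) == ("OF", "EX"))


def extra_iterations(stages):
    """Iterations of the OF/EX pair beyond the first pass."""
    return max(of_ex_passes(stages) - 1, 0)


def skipped_stages(stages):
    """Stages of the full 5-stage pipeline that this sequence never visits."""
    return [s for s in FULL_PIPELINE if s not in stages]


if __name__ == "__main__":
    for label, stages in GROUP_STAGES.items():
        print(label, "-".join(stages),
              "extra iterations:", extra_iterations(stages),
              "skipped:", skipped_stages(stages))
```

In this model the arbitration block mentioned in the text would correspond to whatever decides, after EX, whether to return to OF or to move on; the sketch only records the resulting sequences.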

Figure 4. The architecture of A8051

Handling the variable instruction length

The instructions of A8051 vary from 1 to 3 bytes in length according to the number of operands. The IF stage checks the fetched instruction, determines its length, and saves it to the IR register. To do so, as represented in Fig. 7, a mapping table is required that holds in advance the number of operands for each opcode. The IR register can therefore hold a maximum of 3 bytes of data.

The instructions fetched from the program memory are sent to both the IR register and the mapping table. The mapping table selects the position where the input data is saved according to the input opcode and signals that information to the Mux. The Mux transmits Rin to the Latch-CTL block of the IR register. Finally, it handshakes with Aout for the next instruction fetch.
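A minimal software sketch of this length lookup is given below. It models only the role of the mapping table (opcode to instruction length) and of the 3-byte IR register; the Rin/Aout handshake and the Mux are not modeled. The opcodes and lengths are standard MCS-51 encodings, while the names `LENGTH_TABLE`, `fetch_instruction` and `split_instructions` are hypothetical.

```python
# Opcode -> instruction length in bytes (a small subset of the MCS-51 opcode map).
LENGTH_TABLE = {
    0x00: 1,  # NOP
    0x04: 1,  # INC A
    0x74: 2,  # MOV A, #data
    0x80: 2,  # SJMP rel
    0x75: 3,  # MOV direct, #data
    0x02: 3,  # LJMP addr16
    0x90: 3,  # MOV DPTR, #data16
}

IR_CAPACITY = 3  # the IR register holds at most 3 bytes


def fetch_instruction(program: bytes, pc: int):
    """Return (ir_contents, next_pc) for the instruction starting at pc."""
    opcode = program[pc]
    length = LENGTH_TABLE[opcode]          # mapping-table lookup by opcode
    assert length <= IR_CAPACITY
    ir = bytes(program[pc:pc + length])    # opcode plus its operand bytes
    if len(ir) != length:
        raise ValueError("program ends in the middle of an instruction")
    return ir, pc + length


def split_instructions(program: bytes):
    """Walk the program and return the byte string of every instruction."""
    pc, instructions = 0, []
    while pc < len(program):
        ir, pc = fetch_instruction(program, pc)
        instructions.append(ir)
    return instructions


if __name__ == "__main__":
    # MOV A,#5 ; INC A ; MOV 30h,#FFh ; LJMP 0000h
    demo = bytes([0x74, 0x05, 0x04, 0x75, 0x30, 0xFF, 0x02, 0x00, 0x00])
    for ir in split_instructions(demo):
        print(ir.hex(" "))
```

Because the length is known as soon as the opcode byte is seen, the fetch unit can advance the program counter by the right amount without waiting for the decode stage.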

Figure 5. (a) Interconnection block, (b) 4-phase handshake model and (c) completion signal generation
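The combination of 4-phase handshaking and dual-rail completion detection named in Fig. 5 can be illustrated with a generic behavioral model. This is a sketch of the textbook return-to-zero protocol, not of the A8051 circuits: the rail convention ((1, 0) for logic 1, (0, 1) for logic 0, (0, 0) for the spacer) and all function names are assumptions.

```python
NULL = (0, 0)  # spacer: neither rail asserted


def encode_dual_rail(value: int, width: int):
    """Encode an integer as (true_rail, false_rail) pairs, LSB first."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]


def completion(word) -> bool:
    """True when every bit pair holds valid data (exactly one rail high)."""
    return all(t ^ f for t, f in word)


def decode_dual_rail(word) -> int:
    if not completion(word):
        raise ValueError("word is not yet valid")
    return sum(t << i for i, (t, _f) in enumerate(word))


def four_phase_transfer(value: int, width: int = 8):
    """Run one transfer and return (received_value, trace of (event, req, ack))."""
    trace = []
    word = encode_dual_rail(value, width)   # sender drives valid data
    req = int(completion(word))             # phase 1: completion raises request
    trace.append(("req+", req, 0))
    received = decode_dual_rail(word)       # receiver latches the data
    ack = 1                                 # phase 2: receiver acknowledges
    trace.append(("ack+", req, ack))
    word = [NULL] * width                   # phase 3: sender returns to spacer
    req = int(completion(word))
    trace.append(("req-", req, ack))
    ack = 0                                 # phase 4: receiver releases ack
    trace.append(("ack-", req, ack))
    return received, trace


if __name__ == "__main__":
    value, events = four_phase_transfer(0xA5)
    print(hex(value), [e[0] for e in events])
```

The request is derived from the data itself, so no timing assumption about the combinational logic is needed, which is the property the DI delay model relies on.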

Table 1. The re-defined instruction execution scheme

...  ...                        ...  MUL, DIV, POP
4    IF-ID-OF-EX-WB             24   Function Rn (or @Ri, dir), A; Function bit addr; MOV Rn (or Ri), dir (or #data)
5    IF-ID-OF-EX-OF-EX          4    Function A, @A+DPTR (or PC); INC DPTR; MOV DPTR, #data16
...  ...OF-EX-WB                ...  Function dir, #data; MOV dir, dir
...  IF-ID-OF-EX-OF-EX-OF-EX    ...  Conditional branch

Moreover, A8051 has a simple execution scheme close to that of RISC, as shown in Fig. 6. Shorter intervals between subsequent executions yield higher performance in a pipelined system.

Figure 7. The detect and fetch unit with variable instruction length in IF stage

Resolving the control hazard problem

We distinguish two types of branches: conditional and unconditional. Conditional branches involve a condition for jumping to another address. They are therefore either taken or not taken, depending on whether the specified condition is true or false.

When A8051 executes a branch instruction, the IF stage is responsible for determining the BTA (Branch Target Address). When a branch instruction is detected in the ID stage, a stall happens, but the stall can be avoided if pre-decoding is used in the IF stage. Figure 8 shows a basic scheme of the IF stage. The opcode fetched from program memory is compared with the data saved in the mapping table, and the result of the comparison is sent to each block. BR represents the length of the address needed for the operation, NUM is the length of the instruction, and STATE is the position of the BTA register.
Figure 6. A logical layout for A8051 execution scheme (Group 1 to Group 7)

Figure 8. The BTA calculation unit in IF stage
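To make the pre-decoding step concrete, the sketch below computes a branch target address directly from the fetched bytes, as an IF-stage unit could do before the ID stage sees the instruction. It does not model the BR, NUM, STATE or SEL signals of Fig. 8. The opcodes and addressing rules are standard MCS-51 ones; the table layout and the names `BRANCH_TABLE` and `predecode` are hypothetical.

```python
def signed8(byte: int) -> int:
    """Interpret a byte as a two's-complement 8-bit displacement."""
    return byte - 256 if byte & 0x80 else byte


# opcode -> (mnemonic, length in bytes, addressing, is_conditional)
BRANCH_TABLE = {
    0x02: ("LJMP", 3, "absolute16", False),
    0x80: ("SJMP", 2, "relative8", False),
    0x60: ("JZ", 2, "relative8", True),
    0x70: ("JNZ", 2, "relative8", True),
}


def predecode(program: bytes, pc: int):
    """Return (mnemonic, bta, is_conditional), or None if not a branch."""
    entry = BRANCH_TABLE.get(program[pc])
    if entry is None:
        return None
    mnemonic, length, addressing, conditional = entry
    if addressing == "absolute16":
        # The target is the 16-bit operand, high byte first.
        bta = (program[pc + 1] << 8) | program[pc + 2]
    else:
        # The displacement is relative to the address of the next instruction.
        displacement = signed8(program[pc + length - 1])
        bta = (pc + length + displacement) & 0xFFFF
    return mnemonic, bta, conditional


if __name__ == "__main__":
    print(predecode(bytes([0x02, 0x12, 0x34]), 0))
    print(predecode(bytes([0x80, 0xFE]), 0))
```

For an unconditional branch the returned target can be sent to program memory immediately; for a conditional one it is only a candidate until the condition is resolved.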

The SEL signal chooses one address as the input to the Mux. The chosen address is stored in the Addr Latch and sent to program memory by the handshake protocol. This operation is thus overlapped with the IF stage of the next instruction, and a branch penalty does not occur even though the execution time is longer. In addition, when a conditional branch is executed, it is resolved with static prediction. When the branch condition is detected, the processor checks the prediction. If the prediction is correct, IF continues the execution of the sequential path; if the prediction is incorrect, IF resets all instructions that were speculatively executed in the sequential path.

Number of pipeline stages

The number of pipeline stages is one of the fundamental design decisions. As a pipeline is made deeper, data and control dependencies occur more frequently, which directly decreases performance. Furthermore, the partitioning of the entire task becomes less balanced, and the number of latches used in each stage increases. For the proposed A8051, the execution of instructions is broken down into 5 stages. The average execution time of each stage is shown in Table 2; the stages are well balanced except for the WB stage.

Table 2. The average execution time of each stage

Stage                    IF      ID      OF      EX      WB
Average execution time   5.9 ns  4.8 ns  5.0 ns  5.1 ns  6.6 ns

The most influential factor in the performance of A8051 is branch instructions. The difference between the worst and the best operation speed is about 12 MIPS, as shown in Fig. 9. Consequently, A8051 shows an average operation speed of 76 MIPS in 0.35 um CMOS technology.

Figure 9. Comparison results using test-bench program
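As a rough check on the stage balance reported in Table 2, the sketch below computes the mean stage time, the slowest stage, and an idealized peak rate. The stage times are as read from Table 2 (the OF entry is taken as 5.0 ns, which is an assumption about a poorly legible value). The peak rate assumes that every instruction visits each stage once and ignores handshake overhead, space time, OF/EX iteration and branches, so it is an upper bound for illustration only.

```python
# Average stage times in nanoseconds, as read from Table 2 (OF assumed to be 5.0).
STAGE_NS = {"IF": 5.9, "ID": 4.8, "OF": 5.0, "EX": 5.1, "WB": 6.6}


def slowest_stage(times: dict) -> str:
    """Name of the stage with the longest average execution time."""
    return max(times, key=times.get)


def mean_stage_time_ns(times: dict) -> float:
    return sum(times.values()) / len(times)


def imbalance(times: dict) -> float:
    """Ratio of the slowest to the fastest stage; 1.0 means perfectly balanced."""
    return max(times.values()) / min(times.values())


def peak_rate_mips(times: dict) -> float:
    """Idealized rate in millions of instructions per second, set by the slowest stage."""
    return 1000.0 / max(times.values())


if __name__ == "__main__":
    print("slowest stage:", slowest_stage(STAGE_NS))
    print(f"mean stage time: {mean_stage_time_ns(STAGE_NS):.2f} ns")
    print(f"imbalance: {imbalance(STAGE_NS):.2f}")
    print(f"idealized peak rate: {peak_rate_mips(STAGE_NS):.1f} MIPS")
```

The average speed of about 76 MIPS reported in the text lies below this idealized figure, which is what one would expect from a bound that leaves out multi-cycle groups, branches and handshake overhead.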

5. Conclusions
In this paper, A8051 with a new pipeline architecture is proposed and implemented using an asynchronous data path and a control scheme based on the 4-phase handshake. The proposed architecture supports a multi-cycle pipeline to solve several problems that occur in a CISC machine. A8051 has several special features, including asynchronous processing techniques that handle the complex features of a CISC type microcontroller, such as variable instruction length handling, modification of the execution scheme of the Intel 8051 instruction set for A8051, and application of the multi-looping pipeline architecture. Based on these circuit techniques, the proposed A8051 shows an operation speed about 18 times higher than that of the synchronous version and 5 times higher than the Async 80C51 of [1].

              Intel 8051        Async 80C51 by [1]   Proposed A8051 (non-pipeline)   Proposed A8051 (pipeline)
              ...               0.6 um CMOS          0.35 um CMOS                    0.35 um CMOS
MIPS (ave.)   4 MIPS (36 MHz)   4 MIPS               35.8 MIPS                       75.5 MIPS

REFERENCES
[1] H. van Gageldonk et al., "An asynchronous low-power 80C51 microcontroller," in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 96-107, 1998.
[2] I. E. Sutherland, "Micropipelines," Communications of the ACM, Volume 32, No. 6, pp. 720-739, June 1989.
[3] Intel, "Microprocessor and Peripheral Handbook," 1987.
[4] Jamin M. C. Tse and Daniel P. K. Lun, "ASYNMPU: A Fully Asynchronous CISC Microprocessor," ISCAS'97, pp. 1816-1819, 1997.
[5] K. R. Cho, K. Okura, K. Asada, "Design a 32-bit Fully Asynchronous Microprocessor (FAM)," Proceedings of the 35th MWSCAS, volume 2, pp. 1500-1503, 1992.
