Gate Level Power Estimation by RTL Activity File Completion

Elio Guidetti, Bjorn Kraabol, Francesco Pappalardo, Giuseppe Visalli

Advanced System Technology


ST Microelectronics

Elio.Guidetti@st.com
Bjorn.Kraabol@st.com
Francesco.Pappalardo@st.com
Giuseppe-ast.Visalli@st.com

ABSTRACT

This paper presents a novel enhancement of the tools used for power estimation of synthesized circuits
at gate level. On top of an existing commercial tool flow, it adds a custom tool that increases
confidence in the power consumption numbers. Accurate power estimation at gate level requires
tools such as Synopsys Power Compiler, working together with an activity file (SAIF) obtained by
gate level simulation. The quality of net list evaluation and annotation puts an upper bound on the
accuracy of the estimated switching power. Power consumption can also be estimated with an activity
file from RTL simulation; in this case, however, the annotation at gate level is incomplete and Power
Compiler falls back on a rather conservative estimate. This article shows a methodology of gate level
power estimation by automatic completion of the RTL SAIF activity file: PERL scripts annotate the
missing objects with best, worst and mean values obtained by locally analyzing the SAIF. This
produces a range of power consumption within which the device is expected to operate.
1.0 Introduction.

Power estimation of a complex SoC makes sense when power is a constraint, as in portable devices
or, in general, in systems in which power has to be reduced. It is possible only if a power
characterized technology library is available. Power estimation needs the activity information of each
object in the design, collected into a single file. Switching activity contributes linearly to the dynamic
power; the other parameters are the operating frequency, the capacitive loads and the supply voltage
(the last with a quadratic contribution). Annotating the gate level activity file onto a front-end (or
back-end) net list is not simple; problems can occur for the following reasons:

• Misalignment between the simulation net list used to produce the activity file and the synthesis net
list used in power estimation. A typical example is the use of custom blocks (such as embedded
memories); their behavioral models do not show glitches, so a lot of activity is lost during
simulation. When switching activity produced by a behavioral model is annotated onto the logic
library implementation of the same design, several objects remain un-annotated. Synopsys Power
Compiler makes a conservative power estimate for a partially annotated design compared with a
fully annotated one.
• Simulations of the front-end net list often use behavioral models of the technology library in which
every cell has the same delay. Timing errors then often occur and the SoC enters an abnormal
condition; if the SoC is a micro-core, it could load an invalid op-code and generate exceptions. The
micro-core goes to the halt state: the annotation is complete, but it does not represent the real test
bench. Although this is not an annotation problem, the toggle counts take values that are not
coherent with the intended simulation. An example is the toggle counts of the SoC registers; if the
SoC halts, they are too low and lead to an underestimated power.

For the reasons above, the annotation is often incomplete, which limits the quality of the power
estimates. The methodology presented in this paper uses a procedure of activity file self-completion,
based on analyzing the objects still missing after the first partial RTL annotation. A PERL script
extracts toggle count and static probability in the best, worst and mean case for each module in the
design. This produces an interval of power numbers within which the device is expected to operate.
As a reference, the methodology is evaluated on a design (35K cells) that does not have the problems
described above. The design was synthesized to a front-end net list and simulated at gate level; power
was estimated for each sub-module. To the same front-end net list we then annotated an activity file
obtained from an RTL simulation of the same design. The power values obtained are conservative
because of the partial annotation. Finally, the self-completion methodology is applied to the front-end
net list, using a completed RTL activity file. The percentage of annotation then comes very close to
100%. The power numbers in these three cases are compared. We expect the power values obtained
with mean-activity completion to be very close to the gate level power estimate, and the best and
worst power to be good bounds for the gate level power values. If good bounds are obtained, net lists
that do have the problems described above can be annotated with the completed RTL activity file.
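
The dependence of dynamic power on switching activity, capacitive load and supply voltage mentioned at the start of this section can be sketched in a few lines of Python. This is only an illustration of the usual first-order switching-power relation, not the model implemented in Power Compiler; the function names and the numeric values (10 fF load, 1.2 V supply, 1000 toggles in 10 us) are hypothetical.

```python
def toggle_rate(toggle_count, duration_s):
    """Average number of transitions per second over the simulated interval."""
    return toggle_count / duration_s


def dynamic_power(toggle_rate_hz, load_capacitance_f, supply_voltage_v):
    """First-order switching power in watts: P = 0.5 * C * V^2 * toggle rate.

    Linear in the toggle rate and in the load, quadratic in the supply voltage.
    """
    return 0.5 * load_capacitance_f * supply_voltage_v ** 2 * toggle_rate_hz


# Hypothetical net: 1000 toggles in 10 us, 10 fF load, 1.2 V supply.
rate = toggle_rate(1000, 10e-6)
power = dynamic_power(rate, 10e-15, 1.2)
print(f"{power * 1e6:.3f} uW")
```

Because the relation is linear in the toggle rate, bounding the toggle count of an un-annotated object from below and above also bounds its switching power, which is the idea the completion methodology relies on.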

SNUG Europe 2003 2 Gate Level Power Estimation


by RTL activity file completion
2.0 Power Estimations using Synopsys Flow.

When designers wish to perform gate level power estimation on a front-end (or back-end) net list,
Synopsys Power Compiler can annotate an activity file in SAIF format onto the net list. SAIF exists in
two flavors: gate SAIF from gate level simulation and RTL SAIF from RTL simulation. In both cases
the file is used to determine two main quantities for power estimation:

• Static probability, as percentage of time that the signal is in the high level.
• Toggle count, as number of 0->1 and 1->0 transitions for each object (port, net and pin).
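
As a minimal sketch of the two definitions above, the following Python function derives both quantities for one signal from a list of time-stamped value changes (the kind of information a VCD file holds). The event representation and the function name are assumptions made for illustration; X and Z states are ignored.

```python
def activity_from_events(events, initial_value, duration):
    """Return (static_probability, toggle_count) for one signal.

    events: list of (time, new_value) pairs sorted by time, values 0 or 1.
    initial_value: value of the signal at time 0.
    duration: total observed time, in the same unit as the event times.
    """
    toggle_count = 0
    time_high = 0
    value = initial_value
    last_time = 0
    for time, new_value in events:
        if value == 1:
            time_high += time - last_time  # time spent high before this event
        if new_value != value:
            toggle_count += 1              # counts both 0->1 and 1->0
        value = new_value
        last_time = time
    if value == 1:
        time_high += duration - last_time  # tail of the simulation
    return time_high / duration, toggle_count


# Signal starts low, rises at t=10, falls at t=30, rises at t=60; 100 time units.
sp, tc = activity_from_events([(10, 1), (30, 0), (60, 1)], 0, 100)
print(sp, tc)  # 0.6 3
```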

The Synopsys tool used for power estimation executes the following steps, collected in a script:

• Read the net list (read_db).
• Define the clocks (create_clock).
• Reset the switching activity (reset_switching_activity).
• Annotate the SAIF (read_saif).
• Report the percentage of annotation (report_saif -flat).
• Estimate the power (report_power).

Compared to a VCD dump file, a SAIF file is more compact, because VCD stores more information
than SAIF: for every signal, VCD records each time at which it changes, while SAIF reports only the
total toggle count. Synopsys provides a utility (vcd2saif) for conversion from VCD to SAIF format.
Furthermore, Synopsys provides a Verilog/VHDL extension (PLI) to write a SAIF file. The procedure
differs between RTL and gate level. In both cases the generation of a SAIF file starts by creating an
intermediate forward SAIF, which lists the objects to trace. The command to generate the forward
SAIF is:

• lib2saif at gate level.
• rtl2saif at RTL level.

Both are executed inside Design Compiler (see the reference guide for more information). The PLI
routines add new commands to the simulator that dump the signals and toggle information into the
activity file during simulation (also called backward SAIF, see Fig. 1). The structure of a typical
Verilog top module is shown in Fig. 2.

Figure 1: Backward Saif.

Figure 2: Example of stimulus Verilog file.

The source in Fig. 2 is part of the gate level test bench and generates the gate SAIF during simulation.
It reads the technology forward SAIF (read_lib_saif), defines the region of interest to be dumped
(set_toggle_region; it is not necessary to trace toggle information from the test bench or from parts
we are not interested in), starts and stops the dumping (toggle_start and toggle_stop, for example to
exclude reset behavior) and finally writes the activity file (toggle_report). The same source, with few
modifications, is part of the RTL test bench and generates the RTL SAIF during simulation. See the
Power Compiler Reference Guide for more information. The script in Table 1 performs the standard
gate level power estimation.

# Import database
read_db netlist.db
link
# Timing constraints
create_clock -period 10 -name clock [get_ports clock]
set_input_delay
set_output_delay
# Annotate SAIF
read_saif -input backward.saif -instance top
report_saif -flat -missing
# Completion (if present)
source completion.tcl
report_saif -flat -missing
# Report power
report_power
quit

Table 1: Power estimation flow script (for Design Compiler).

3.0 Activity File Completion Methodology.

Using a list of un-annotated objects, the annotation can be completed with values found locally in each
sub-module of the design. Design Compiler provides the following command for reporting the
un-annotated objects: report_saif -flat -missing. The SAIF file collects the toggle count of each object
(nets, ports and pins) dumped during simulation as activity information. These data are organized
with the same hierarchy as the simulated design (see Fig. 1). Starting from the ASCII dump of the
report_saif command, a PERL script analyzes each module in the design using the backward RTL
SAIF, following the hierarchy of the SAIF. The completion methodology assigns to every
un-annotated object in a sub-module a local value of toggle count in the best, worst and mean
condition:

• Best as minimum toggle count of annotated objects in the sub-module.
• Worst as maximum toggle count of annotated objects in the sub-module.
• Mean as mean toggle count of annotated objects in the sub-module.

Consider a small sub-module with four nets, for which the SAIF reports the activity of only three:

Net     Toggle Count
Net1    3
Net2    7
Net4    2

Net3 is not dumped in the SAIF file. The PERL script analyzes the SAIF and for this module assumes:

Best Toggle Count     2
Worst Toggle Count    7
Mean Toggle Count     4 = (3 + 7 + 2) / 3
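
The four-net example can be reproduced with a short Python sketch. It is not the authors' PERL script; the function names are hypothetical. The mean is taken over the three annotated nets only, following the definition in the list above, which gives 12 / 3 = 4 (dividing by all four nets instead would give 3).

```python
from statistics import mean


def completion_values(annotated):
    """Best, worst and mean toggle count over the annotated objects of a sub-module."""
    counts = list(annotated.values())
    return {"best": min(counts), "worst": max(counts), "mean": mean(counts)}


def complete(annotated, all_objects, case):
    """Give every un-annotated object the local best, worst or mean value."""
    fill = completion_values(annotated)[case]
    return {name: annotated.get(name, fill) for name in all_objects}


annotated = {"Net1": 3, "Net2": 7, "Net4": 2}      # toggle counts found in the SAIF
all_nets = ["Net1", "Net2", "Net3", "Net4"]        # Net3 is missing from the SAIF

print(completion_values(annotated))                # {'best': 2, 'worst': 7, 'mean': 4}
print(complete(annotated, all_nets, "worst"))      # Net3 receives 7
```

Annotated objects keep their simulated values; only the missing ones receive the fill value, so the three completed files differ only in the objects the RTL simulation could not provide.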

In this way we assume that the real activity (the toggle count we do not know) of the missing objects
lies between the minimum (best) and maximum (worst) values. Since switching activity enters power
estimation linearly, the minimum toggle count produces the best-case power dissipation and the
maximum toggle count produces the worst-case power dissipation. An intermediate ASCII file records
these values in terms of toggle count, static probability and T0, T1 and TX values. Fig. 3 shows the
structure of the intermediate file. The first line holds the hierarchy of the current module; the second
line holds the minimum (or maximum or mean) toggle count; the third line reports the static
probability value. The fourth line is a copy of the SAIF line of the best (worst or mean) net, with its T0,
T1 and TX values.
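
To make the role of the T0, T1 and TX values concrete, the sketch below extracts them from a SAIF-like fragment and derives a static probability. The fragment is a simplified assumption about the backward SAIF layout (a real file carries more fields and hierarchy), the net names and durations are invented, and computing the static probability as T1 / (T0 + T1 + TX) is also an assumption made for illustration.

```python
import re

# One net entry of the assumed form: (name (T0 n) (T1 n) (TX n) (TC n) ...
SAIF_NET = re.compile(
    r"\(\s*(?P<name>[^\s()]+)\s*"
    r"\(T0\s+(?P<t0>\d+)\)\s*\(T1\s+(?P<t1>\d+)\)\s*\(TX\s+(?P<tx>\d+)\)\s*"
    r"\(TC\s+(?P<tc>\d+)\)"
)


def parse_nets(saif_text):
    """Map net name -> toggle count and static probability."""
    nets = {}
    for match in SAIF_NET.finditer(saif_text):
        t0, t1, tx, tc = (int(match.group(k)) for k in ("t0", "t1", "tx", "tc"))
        total = t0 + t1 + tx
        nets[match.group("name")] = {
            "toggle_count": tc,
            "static_probability": t1 / total if total else 0.0,
        }
    return nets


SAMPLE = """
(NET
  (Net1 (T0 700) (T1 300) (TX 0) (TC 3) (IG 0))
  (Net2 (T0 250) (T1 750) (TX 0) (TC 7) (IG 0))
)
"""

for name, data in parse_nets(SAMPLE).items():
    print(name, data["toggle_count"], data["static_probability"])
```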

Figure 3: Example of intermediate file.

A last PERL script receives as input:

• The intermediate file.
• The ASCII dump of report_saif -missing.
• The duration of the simulation.

It generates a TCL script for Design Compiler that completes the RTL SAIF with instructions of the
form:

set_switching_activity -period pe -static_probability sp -toggle_rate tr
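
A generator for such a completion script might look like the following Python sketch. It is not the authors' PERL script. The trailing object list ([get_nets {...}]) and the convention of passing the simulation duration as the period, so that the raw toggle count can serve as the toggle rate, are assumptions; the exact semantics of these options should be checked against the Power Compiler documentation. The net name is hypothetical.

```python
def completion_commands(missing_nets, toggle_count, static_probability, duration):
    """Return one set_switching_activity line per un-annotated net.

    Assumption: -toggle_rate is the number of toggles per -period, so using the
    simulation duration as the period lets the toggle count be passed unchanged.
    """
    lines = []
    for net in missing_nets:
        lines.append(
            f"set_switching_activity -period {duration} "
            f"-static_probability {static_probability} "
            f"-toggle_rate {toggle_count} [get_nets {{{net}}}]"
        )
    return lines


# Hypothetical missing net, completed with a toggle count of 4 over 10000 time units.
for line in completion_commands(["u_alu/Net3"], 4, 0.5, 10000):
    print(line)
```

Writing the returned lines to a file gives a script that can be sourced after read_saif, in the position where the flow of Table 1 sources completion.tcl.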

The TCL script is used as illustrated in Table 1. This gives a new procedure for power estimation
based on self-completed activity information. The methodology produces an improved power range
in which a device is expected to operate, as shown in Section 5.

4.0 EMC: Evaluation Micro Core.

The quality of the methodology was tested on a real design, which we synthesized and simulated and
whose power dissipation we estimated. EMC is a small micro-programmed core developed for the
purpose of this article. Its architecture is composed of the following units (see Fig. 4).

• Address Bus of 32 lines, Data Bus of 32 bits.
• Register File of 32 registers of 32 bits.
• ALU with Booth2 integer multiplier. All arithmetic (2’s complement data) and logic operations
are supported.
• Load Store Unit for memory communication. It is composed of three ports:
o Port Zero connected to the pipeline Fetch Unit.
o Port One connected to the pipeline Read Unit.
o Port Two connected to the pipeline Write Unit.
• Five-stage pipeline (FDREW).
• 20 op-codes supported.
• Stack mapped into main memory.
• Flag Register.

The following functionality is not supported yet.

• Pre-Fetch.
• Branch circuitry.
• Caches.

Figure 4: EMC simplified architecture.

5.0 Simulation: Results.

EMC was simulated with a small program written in assembly code (see the Appendix). The choice
of the best test bench for power estimation is an open question in the research community. From a
practical point of view, we chose to stimulate every sub-module of the design. The source therefore
contains arithmetic and logic operations, as well as load-store instructions that stimulate the memory
controller. The activity files produced by gate level and RTL simulations are used to compare three
cases:

• Case 1: power values with gate level SAIF annotation (gate-level power estimation).
• Case 2: power values with RTL SAIF annotation without completion (RT-level power estimation).
• Case 3: power values with completed RTL SAIF annotation (enhanced RT-level power estimation,
which is the new methodology proposed in this paper).

We performed the power estimations at gate level using a 0.13 micron technology library and a
frequency of 100 MHz. Table 2 shows the power values for each module of EMC in cases 1 and 2.

Module      Power with SAIF Gate   Power with SAIF RTL   Error % to gate
ALU         1.1uW                  1.4uW                 +20%
FU          2.4uW                  2.5uW                 +4%
DU          2.9uW                  2.9uW                 ~0%
RU          0.4uW                  0.5uW                 +28%
EU          1.8uW                  2.1uW                 +16%
LSU         1210.8uW               1410uW                +16%
LSU-CORE    4.2uW                  4.3uW                 +3%
Table 2: Comparison between power values estimated.

Table 3 compares the power values of case 3 with those of case 1.

Module      Power SAIF Gate   Best        Worst       Mean        Mean % to gate
ALU         1.1uW             1.1uW       1.4uW       1.0uW       11%
FU          2.4uW             0.5uW       3.8uW       1.2uW       50%
DU          2.9uW             2.8uW       4.8uW       2.8uW       3%
RU          0.4uW             0.1uW       0.7uW       0.3uW       22%
EU          1.8uW             1.8uW       1.9uW       1.7uW       1%
LSU         1210.8uW          1209.2uW    1221.4uW    1209.8uW    ~0%
LSU-CORE    4.2uW             4.0uW       4.2uW       4.1uW       1%
Table 3: Best, Worst and Mean Power.
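
The claim that best and worst power bracket the gate level value can be checked directly on the numbers of Table 3. The sketch below transcribes the Gate, Best and Worst columns in microwatts; reading every RU entry in uW (consistent with Table 2) is an assumption, and the function names are hypothetical.

```python
# (gate, best, worst) in uW, transcribed from Table 3.
TABLE3 = {
    "ALU": (1.1, 1.1, 1.4),
    "FU": (2.4, 0.5, 3.8),
    "DU": (2.9, 2.8, 4.8),
    "RU": (0.4, 0.1, 0.7),   # assumption: all RU values are in uW
    "EU": (1.8, 1.8, 1.9),
    "LSU": (1210.8, 1209.2, 1221.4),
    "LSU-CORE": (4.2, 4.0, 4.2),
}


def within_bounds(gate, best, worst):
    """True when the gate level power lies inside the [best, worst] interval."""
    return best <= gate <= worst


def relative_width(gate, best, worst):
    """Width of the [best, worst] interval relative to the gate level power."""
    return (worst - best) / gate


for module, (gate, best, worst) in TABLE3.items():
    status = "inside" if within_bounds(gate, best, worst) else "outside"
    print(f"{module:9s} {status:7s} width = {relative_width(gate, best, worst):.1%}")
```

The relative width gives a quick per-module indication of how tight the interval is: a narrow interval means the RTL annotation already captured most of the activity that matters.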

The diagram in Fig. 5 shows the upper and lower limits of the dynamic power estimate for each
module of the EMC. The first contribution is the best-case power value; the second is the increment
up to the full power value (using the gate level SAIF as our reference). The last contribution is the
further increment due to the worst-case power estimate. Figure 5 shows good results for the
execution unit (EU) and the load-store unit (LSU); there the best-case power is very close to the gate
level power value. The fetch unit (FU) gives the worst results; this can be explained by the percentage
of annotation of the RTL SAIF, reported in Table 4.

Module      Nets      Ports    Pins
ALU         25.79%    100%     33.02%
DU          57.14%    100%     56.40%
EU          38.75%    100%     27.05%
FU          17.12%    100%     32.30%
RU          41.73%    100%     24.47%
LSU         41.46%    100%     34.78%
LSU-CORE    40.39%    100%     37.93%
Table 4: Percentage of RTL annotation before completion.

The worst results for best-case power are those of the modules with the lowest percentage of
annotation of nets and pins (which are more relevant than ports):

• Fetch Unit FU 17.12% of Nets.
• Read Unit RU 24.47% of Pins.

Best-case annotation, in these cases, misses too much relevant activity information for accurate power
estimation. The worst-case power values are too large in the decode unit (DU), read unit (RU) and
fetch unit (FU). In these modules the activity file shows that the clock signal (which is included in the
toggle analysis) has much greater activity than the other objects; the completion, which searches for
the maximum toggle count, assigns the clock activity to the un-annotated objects.
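
The effect of the clock on the worst-case value can be illustrated with a small Python sketch, in which the maximum is searched with and without the clock net. The net names and toggle counts are hypothetical, and the explicit list of clock nets is an assumption about how clocks would be identified.

```python
def worst_case_fill(annotated, clock_nets=()):
    """Maximum toggle count of the annotated objects, optionally ignoring clocks."""
    candidates = [tc for net, tc in annotated.items() if net not in clock_nets]
    if not candidates:
        raise ValueError("no annotated non-clock object in this sub-module")
    return max(candidates)


# Hypothetical sub-module: the clock toggles far more often than any data net.
annotated = {"clk": 2000, "d_valid": 40, "pc_0": 120}

print(worst_case_fill(annotated))                      # 2000: clock activity is assigned
print(worst_case_fill(annotated, clock_nets={"clk"}))  # 120: clock excluded
```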

Two different programs stimulate the ALU as follows:

• Using ALU instructions only.
• ALU instructions not included.

Table 5 reports the ALU power numbers obtained with the two programs of Appendix 10.2.1 and
10.2.2: one uses arithmetic and logic instructions, the other does not. Figure 6 shows the stacked bars
of the ALU power numbers for the three different test benches. When the test changes, the gate level
power changes, but the accuracy of the best-case estimate stays the same.

Test                    Gate Level Annotation   RTL Annotation   Best     Worst    Mean
Test ALU (10.2.1)       1.5uW                   1.7uW            1.2uW    2.6uW    1.2uW
Test NO ALU (10.2.2)    0.34uW                  0.8uW            0.3uW    0.4uW    0.3uW
Table 5: Power bounds for ALU, changing test bench.

[Stacked bar chart "EMC: Power Bounds by completed RTL annotation"; series Best, Gate and Worst; y-axis Power % (0 to 200); x-axis EMC modules (ALU, FU, DU, RU, EU, LSU, LSU-CORE).]
Figure 5: Power values of EMC.

[Stacked bar chart "ALU Power Bounds varying Test-benches"; series Best, Gate Power and Worst; y-axis Power % (0 to 200); x-axis ALU test benches (Test 10.1, Test 10.2.1, Test 10.2.2).]

Figure 6: Power bounds for ALU with three different test benches.

6.0 Lessons Learned.

Completing the RTL SAIF of a large SoC raises several problems related to the execution of the
scripts. In particular, the completion TCL script may contain a very large number of commands; only
powerful workstations can execute it without slowing down. The proposed completion flow uses an
intermediate file to collect the best (or worst or mean) toggle information. If the completion
methodology were built into a commercial power estimation tool, this file would not be necessary,
and working in a single environment would be faster. The worst-case power numbers could be
improved by excluding clocks from the analysis.

7.0 Improvements.

The proposed activity completion is simple to apply for end users of commercial power estimation
tool flows. It is a starting point for evaluating, in terms of bounds, the power values of a complex
SoC. A possible enhancement of the technique is to find the nearest annotated object by moving
forward from the un-annotated object. Synopsys Design Compiler allows paths inside a net list to be
followed (report_cell -connection). Another approach is to model the toggle count as a random
variable, which can be considered as an “incremental Poisson random process”. By modeling the
inter-arrival time in the best and worst case (using a more complete VCD file rather than SAIF), we
can complete the annotation with random values.
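
The first enhancement, searching forward for the nearest annotated object, can be sketched as a breadth-first search over the connectivity of the net list. The connectivity dictionary, the object names and the function name below are hypothetical; in practice the fan-out information would have to be extracted from the tool.

```python
from collections import deque


def nearest_annotated(start, fanout, annotated):
    """Breadth-first search forward from `start` for the closest annotated object.

    fanout: dict mapping an object to the objects it drives.
    annotated: dict mapping annotated objects to their toggle count.
    Returns (object, toggle_count), or None if nothing annotated is reachable.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        obj = queue.popleft()
        if obj in annotated:
            return obj, annotated[obj]
        for nxt in fanout.get(obj, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Hypothetical connectivity: n3 drives n5 and n6, n5 drives n7; only n7 is annotated.
fanout = {"n3": ["n5", "n6"], "n5": ["n7"], "n6": [], "n7": []}
print(nearest_annotated("n3", fanout, {"n7": 12}))  # ('n7', 12)
```

Compared with a module-wide minimum, maximum or mean, a value borrowed from the nearest annotated object is more local, which may tighten the bounds at the price of a traversal of the net list.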

8.0 Conclusion.

When gate level power estimation is too expensive, a completed RTL annotation, which is fast, safe
and cheaper, is worth considering. In this article we propose to improve the accuracy of RTL
annotation at the same cost as RTL simulation. The problem of computing power bounds is well
posed given the behavior of commercial power estimation tools. The power values validate the
algorithms for best, worst and mean toggle counts, which are consistent with the results found. The
methodology still has to be applied to a real large SoC (millions of gates). Internal data owned by ST
Microelectronics confirm that the methodology remains useful. The results are the starting point of a
study to define additional features for commercial power estimation tools that satisfy the practical
needs of developers.

9.0 Glossary.

“Front-end net list”: gate level net list, which is not placed and routed.
“Back-end net list”: gate level net list with clock tree and place & route.

10.0 Appendix

10.1 Main Test.

This is the code used for the test bench.

LD R4, 0000F000
LD R3, 000001F4
LD R7, R3
MUL R8, R7, R3
ADD R2, R3, R4
ADD R2, R2, R3
LD R5, R2
LD R3, 00000000
ADD R7, R5, 00001000
LD R3, 00000004
MUL R7, R3, 000001F4
ST 00001000, R7
SEC
CLC
NEG R3, R7
SHL R4, R3
XOR R2, R4, R3
PUSH R2
CLC
POP R7
NOP
NOP
EXIT

10.2 ALU Test

10.2.1 Arithmetic and logic instructions used to stimulate ALU.

MUL R8, R7, R3
ADD R2, R3, R4
ADD R2, R2, R3
ADD R7, R5, 00001000
MUL R7, R3, 000001F4
SEC
CLC
NEG R3, R7
SHL R4, R3
XOR R2, R4, R3
CLC
EXIT

10.2.2 Load and store instructions used to avoid stimulating the ALU.

LD R4, 0000F000
LD R3, 000001F4
LD R7, R3
LD R5, R2
LD R3, 00000000
LD R3, 00000004
ST 00001000, R7
PUSH R2
POP R7
EXIT
