Gate Level Power Estimation by RTL Activity File Completion
Elio.Guidetti@st.com
Bjorn.Kraabol@st.com
Francesco.Pappalardo@st.com
Giuseppe-ast.Visalli@st.com
ABSTRACT
This paper presents a novel enhancement of the tools used for power estimation of synthesized circuits
at gate level. Building on an existing commercial tool flow, it presents an additional custom tool that
provides increased certainty in the power consumption numbers. Accurate power estimation at gate
level requires tools such as Synopsys Power Compiler working together with an activity file (SAIF)
obtained by gate level simulation. The quality of net list evaluation and annotation puts an upper
bound on the accuracy of the estimated switching power. Power consumption can also be estimated
from RTL activity; in this case, however, the low level of annotation forces Power Compiler into an
estimate that is quite conservative, partly because of the incomplete annotation at gate level. This
article presents a methodology for gate level power estimation by automatic completion of the RTL
SAIF activity file, using PERL scripts that annotate the missing objects with best, worst and mean
values obtained by locally analyzing the SAIF. This produces a range of power consumption within
which the device is expected to operate.
1.0 Introduction.
Power estimation of complex SoCs is meaningful when power is a constraint, as in portable devices
or, more generally, in systems whose power consumption has to be reduced. It requires a power
characterized technology library, and it needs the activity information of every object in the design,
collected into a single file. Dynamic power depends linearly on switching activity; the other parameters
are the operating frequency, the capacitive loads and the supply voltage (the last with a quadratic
contribution). Annotating a gate level activity file onto a front-end (or back-end) net list is not
straightforward; problems can arise for the following reasons:
• Misalignment between the simulation net list used to produce the activity file and the synthesis net
list used for power estimation. A typical example is the use of custom blocks (such as embedded
memories): their behavioral models do not reproduce glitches, so a lot of activity is lost during
simulation. When switching activity produced by a behavioral model is annotated onto the logic
(gate level) implementation of the same design, several objects remain un-annotated. Synopsys Power
Compiler produces a more conservative power estimate for a partially annotated design than for fully
annotated modules.
• Simulations of a front-end net list often use behavioral models of the technology library in which
every cell has the same delay. In this case timing errors often occur and the SoC enters an abnormal
condition; if the SoC is a micro-core, it may load an invalid op-code and raise exceptions. The
micro-core then goes into a halt state: the annotation is complete, but it does not reflect a realistic
test bench. Although this is not an annotation problem, the toggle counts take values that are not
coherent with the intended simulation. Register toggles in the SoC are an example: if the SoC halts,
they take values that are too low, leading to an underestimated power.
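The dependence of dynamic power on switching activity, capacitive load, frequency and supply voltage stated at the beginning of this introduction can be written in the usual textbook form below. It is a simplified sketch that assumes each toggle of net i dissipates on average one half of C_i V_DD^2 and ignores internal cell power; TC_i is the toggle count of net i over a simulated duration T, f is the clock frequency and alpha_i = TC_i / (f T) is the number of toggles per clock cycle:

```latex
P_{dyn} = \frac{1}{2} V_{DD}^{2} \sum_{i} C_i \, \frac{TC_i}{T}
        = \frac{1}{2} V_{DD}^{2} f \sum_{i} C_i \, \alpha_i ,
\qquad \alpha_i = \frac{TC_i}{f\,T}
```

The expression is linear in every TC_i and quadratic in V_DD, so a lower (higher) toggle count assigned to an un-annotated object directly lowers (raises) the estimated power.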
For the reasons above, annotation is often incomplete, which limits the quality of the resulting power
estimates. The methodology presented in this paper uses a procedure of activity file self-completion,
based on an analysis of the objects that remain un-annotated after a first, partial RTL annotation. A
PERL script extracts toggle count and static probability information for the best, worst and mean
cases in each module of the design. This produces an interval of power numbers within which the
device is expected to operate. As a reference, the methodology uses a design (35K cells) that does not
suffer from the problems described above. The design was synthesized to a front-end net list and
simulated at gate level, and a power estimation was performed for each sub-module. We then
annotated the same front-end net list with an activity file obtained from an RTL simulation of the same
design; the resulting power values are conservative because of the partial annotation. Finally, the
self-completion methodology is applied to the front-end net list using the completed RTL activity file,
and the percentage of annotation comes very close to 100%. The power numbers of these three cases
are compared. We expect the power values obtained with mean-activity completion to be very close to
the gate level power estimate, and the best and worst power values to be good bounds for the gate
level power. If good bounds are achieved, net lists affected by the problems described above can be
annotated with the completed RTL activity file.
When designers wish to perform gate level power estimation on a front-end (or back-end) net list,
Synopsys Power Compiler can annotate the net list with an activity file in SAIF format. SAIF exists in
two forms: gate SAIF from gate level simulation and RTL SAIF from RTL simulation. In both forms, the
file is used to determine two main quantities for power estimation:
• Static probability: the fraction of time during which the signal is at the high level.
• Toggle count: the number of 0->1 and 1->0 transitions of each object (port, net and pin).
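The two quantities above can be illustrated with a small Python sketch that reduces a VCD-like list of value changes to SAIF-like totals: the T0, T1 and TX times (time spent at 0, 1 and X) and the toggle count TC. The class and function names are hypothetical, and this simplified version does not count transitions through X, which real tools treat in more detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SaifAggregates:
    """Per-object quantities in the spirit of a SAIF net entry."""
    t0: int  # time spent at logic 0
    t1: int  # time spent at logic 1
    tx: int  # time spent at the unknown value X
    tc: int  # toggle count: number of 0->1 plus 1->0 transitions

    @property
    def static_probability(self) -> float:
        """Fraction of the total simulated time spent at logic 1."""
        duration = self.t0 + self.t1 + self.tx
        return self.t1 / duration if duration else 0.0


def aggregate(changes: list[tuple[int, str]], duration: int) -> SaifAggregates:
    """Reduce a time-ordered value-change list (VCD-like) to SAIF-like totals.

    changes[0] must give the value at time 0; values are "0", "1" or "x".
    Transitions through X are not counted as toggles in this sketch.
    """
    if not changes or changes[0][0] != 0:
        raise ValueError("the first change must define the value at time 0")
    time_at = {"0": 0, "1": 0, "x": 0}
    toggles = 0
    # Pair each change with the next one; the last value holds until `duration`.
    following = changes[1:] + [(duration, None)]
    for (t, value), (t_next, next_value) in zip(changes, following):
        time_at[value] += t_next - t
        if {value, next_value} == {"0", "1"}:
            toggles += 1
    return SaifAggregates(time_at["0"], time_at["1"], time_at["x"], toggles)


# Example: a net that is low, rises at t=10, falls at t=30 and rises again at t=35.
net = aggregate([(0, "0"), (10, "1"), (30, "0"), (35, "1")], duration=50)
print(net, net.static_probability)
```

Keeping only these totals, instead of every change time, is what makes SAIF more compact than VCD.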
The Synopsys tool used for power estimation executes the steps collected in the script of Table 1.
Compared to a VCD dump file, a SAIF file is more compact, because VCD stores more information than
SAIF: for every signal, VCD records each time at which the signal changes, while SAIF reports only
cumulative values such as the total toggle count. Synopsys provides a utility (vcd2saif) to convert VCD
files to SAIF format. Synopsys also provides Verilog/VHDL simulator extensions (PLI) that write a SAIF
file directly. The procedure differs depending on whether the estimation is performed at RTL or at gate
level. In both cases, generating a SAIF file starts with the creation of an intermediate forward SAIF that
lists the objects to be traced. The forward SAIF is generated with rtl2saif for RTL simulation and with
lib2saif for gate level simulation.
Both commands are executed inside Design Compiler (see the reference guide for more information).
During simulation, the PLI routines add new commands to the simulator that dump the signals and their
toggle information into the activity file (also called the backward SAIF; see Fig. 1). The structure of a
typical Verilog top module is shown in Fig. 2.
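# Import database
read_db netlist.db
link
# Timing constraints
create_clock -period 10 -name clock [get_ports clock]
# Illustrative zero I/O delays; replace with the real constraints of the design
set_input_delay 0 -clock clock [remove_from_collection [all_inputs] [get_ports clock]]
set_output_delay 0 -clock clock [all_outputs]
# Annotate SAIF
read_saif -input backward.saif -instance_name top
report_saif -flat -missing
# Completion (if present)
source completion.tcl
report_saif -flat -missing
# Report power
report_power
quit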
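The backward SAIF written by the PLI routines is a parenthesized text file whose INSTANCE blocks follow the design hierarchy. As a sketch of the first step of the completion script, the Python code below (standing in for the PERL scripts, which are not shown in this paper) reads a heavily simplified SAIF subset into a per-module table of net fields. It assumes the form "(INSTANCE name ...)" and "/" as hierarchy divider; the sample text and all names are hypothetical, and real SAIF files contain more constructs (ports, escaped names, instance types) than this parser handles.

```python
import re
from typing import Dict, List, Union

SExpr = Union[str, List["SExpr"]]

# Hypothetical, heavily simplified backward SAIF fragment used for illustration.
SAMPLE_SAIF = """
(SAIFILE
  (SAIFVERSION "2.0")
  (DIRECTION "backward")
  (DURATION 1000)
  (INSTANCE top
    (NET
      (clk (T0 500) (T1 500) (TX 0) (TC 200))
      (n1 (T0 700) (T1 300) (TX 0) (TC 3))
    )
    (INSTANCE u_alu
      (NET
        (a (T0 600) (T1 400) (TX 0) (TC 7))
      )
    )
  )
)
"""


def tokenize(text: str) -> List[str]:
    """Split into parentheses, double-quoted strings and bare words."""
    return re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)


def parse(tokens: List[str]) -> List[SExpr]:
    """Build nested lists from the token stream."""
    stack: List[List[SExpr]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def collect_nets(instance: list, path: str, out: Dict[str, Dict[str, Dict[str, int]]]) -> None:
    """Walk an (INSTANCE name ...) node, recording net fields per hierarchical path."""
    for child in instance[2:]:
        if not isinstance(child, list) or not child:
            continue
        if child[0] == "NET":
            module = out.setdefault(path, {})
            for net in child[1:]:
                module[net[0]] = {
                    f[0]: int(f[1]) for f in net[1:] if isinstance(f, list) and len(f) == 2
                }
        elif child[0] == "INSTANCE":
            collect_nets(child, f"{path}/{child[1]}", out)


def read_saif_nets(text: str) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return {hierarchical_path: {net_name: {"T0": .., "T1": .., "TX": .., "TC": ..}}}."""
    (saifile,) = parse(tokenize(text))
    nets: Dict[str, Dict[str, Dict[str, int]]] = {}
    for item in saifile[1:]:
        if isinstance(item, list) and item and item[0] == "INSTANCE":
            collect_nets(item, item[1], nets)
    return nets


nets = read_saif_nets(SAMPLE_SAIF)
print(nets["top/u_alu"]["a"]["TC"])
```

Grouping nets by hierarchical path mirrors the SAIF hierarchy, so the later best, worst and mean values can be computed module by module.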
Using the list of un-annotated objects, the annotation can be completed with values observed locally in
each sub-module of the design. Design Compiler provides the following command for reporting the
un-annotated objects: report_saif -flat -missing. The SAIF file collects, as activity information, the
toggle count of each object (nets, ports and pins) dumped during simulation. These data are organized
with the same hierarchy as the simulated design (see Fig. 1). Starting from the ASCII output of the
report_saif command, a PERL script uses the backward RTL SAIF to analyze each module of the design,
following the SAIF hierarchy. The completion methodology assigns to every un-annotated object in a
sub-module a local toggle count value for the best, worst and mean cases:
Consider a small sub-module with four nets, for which the SAIF reports the toggle counts of only three:
Net1 3
Net2 7
Net4 2
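Net3 is not dumped in the SAIF file. Analyzing the SAIF, the PERL script assigns the following values
to Net3 for this module: best = 2 (minimum toggle count), worst = 7 (maximum toggle count),
mean = (3 + 7 + 2) / 3 = 4.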
In this way we assume that the real activity of each missing object (its unknown toggle count) lies
between the minimum (best) and maximum (worst) values. Since switching activity contributes linearly
to power, the minimum toggle count produces the best-case power dissipation and the maximum toggle
count produces the worst-case power dissipation. An intermediate ASCII file records these values in
terms of toggle count, static probability and T0, T1 and TX values (the time spent at logic 0, at logic 1
and at the unknown value X). Fig. 3 shows the structure of the intermediate file. The first line holds the
hierarchy of the current module; the second line holds the minimum (or maximum, or mean) toggle
count; the third line reports the static probability value; the fourth line is a copy of the SAIF line of the
best (worst or mean) net, carrying its T0, T1 and TX values.
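The per-module computation and the four-line record described above can be sketched in Python as follows. The function names are hypothetical, and so is one detail the text leaves open: for the mean case this sketch assumes that the representative net, whose static probability and T0, T1 and TX values are copied, is the net whose toggle count is closest to the mean. The toggle counts are those of the Net1/Net2/Net4 example; the T0 and T1 values are illustrative only.

```python
from statistics import mean
from typing import Dict, Tuple

Fields = Dict[str, int]  # SAIF-like fields of one net: T0, T1, TX, TC


def representative(nets: Dict[str, Fields], case: str) -> Tuple[float, str]:
    """Return (toggle count, representative net) for 'best', 'worst' or 'mean'.

    The toggle count is the minimum, maximum or mean over the annotated nets
    of ONE module. For 'mean' the representative net is assumed to be the one
    whose toggle count is closest to the mean.
    """
    counts = {name: f["TC"] for name, f in nets.items()}
    if case == "best":
        tc = min(counts.values())
    elif case == "worst":
        tc = max(counts.values())
    elif case == "mean":
        tc = mean(list(counts.values()))
    else:
        raise ValueError(f"unknown case {case!r}")
    # Net closest to tc; ties are broken by net name for reproducibility.
    name = min(counts, key=lambda n: (abs(counts[n] - tc), n))
    return tc, name


def intermediate_record(path: str, nets: Dict[str, Fields], case: str) -> str:
    """Four-line record: hierarchy, toggle count, static probability, SAIF line."""
    tc, name = representative(nets, case)
    f = nets[name]
    duration = f["T0"] + f["T1"] + f["TX"]
    sp = f["T1"] / duration if duration else 0.0
    saif_line = f"({name} (T0 {f['T0']}) (T1 {f['T1']}) (TX {f['TX']}))"
    return "\n".join([path, f"{tc:g}", f"{sp:.4f}", saif_line])


# Toggle counts from the example above; T0/T1/TX values are illustrative only.
module = {
    "Net1": {"TC": 3, "T0": 60, "T1": 40, "TX": 0},
    "Net2": {"TC": 7, "T0": 50, "T1": 50, "TX": 0},
    "Net4": {"TC": 2, "T0": 90, "T1": 10, "TX": 0},
}
for case in ("best", "worst", "mean"):
    print(intermediate_record("top/sub", module, case), end="\n\n")
```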
The PERL script then generates a TCL script for Design Compiler that completes the RTL SAIF by
assigning these values to the un-annotated objects.
The TCL script is then sourced in the power estimation flow as illustrated in Table 1. This creates a new
procedure that enables power estimation based on self-completed activity information. As will be
shown in Section 5, this methodology produces an improved power range within which the device is
expected to operate.
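A sketch of the generation step follows, written in Python rather than PERL. It emits one Tcl line per un-annotated net, converting the module's toggle count into a toggle rate over the simulated duration. The exact command form produced by the original scripts is not given here, so the use of set_switching_activity with -static_probability and -toggle_rate, the time unit of the rate and the function name are all assumptions; check them against the manual page of the tool version in use.

```python
from typing import Dict, List, Tuple


def completion_commands(
    missing: Dict[str, List[str]],
    values: Dict[str, Tuple[float, float]],
    duration: float,
) -> str:
    """Emit one Tcl line per un-annotated net (hypothetical helper).

    missing:  {module_path: [un-annotated net names]}, e.g. parsed from the
              output of report_saif -flat -missing (parsing not shown).
    values:   {module_path: (toggle_count, static_probability)} for the chosen
              best, worst or mean case.
    duration: simulated time, in the time unit assumed for toggle rates.
    The set_switching_activity option names are an assumption.
    """
    lines = []
    for path in sorted(missing):
        tc, sp = values[path]
        rate = tc / duration  # toggle count -> toggles per time unit
        for net in missing[path]:
            full = f"{path}/{net}" if path else net
            lines.append(
                f"set_switching_activity -static_probability {sp:.4f} "
                f"-toggle_rate {rate:.6g} [get_nets {{{full}}}]"
            )
    return "\n".join(lines)


script = completion_commands({"u_sub": ["Net3"]}, {"u_sub": (2, 0.1)}, duration=1000)
print(script)
# Writing `script` to completion.tcl lets the flow of Table 1 source it.
```

Running the generator once per case (best, worst, mean) yields three completion scripts, and therefore three power numbers for the same net list.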
The quality of the methodology was tested on a real design, which we synthesized and simulated, and
whose power dissipation we estimated. EMC is a small micro-programmed core developed for the
purpose of this article. Its architecture is composed of the following units (see Fig. 4):
• Pre-Fetch.
EMC was simulated with a small program written in assembly code (see the Appendix). The choice of
the best test bench for optimal power estimation is an open point in the research community; from a
practical point of view, we stimulated every sub-module of the design. The program therefore contains
arithmetic and logic operations, plus load-store instructions that stimulate the memory controller. The
activity files produced by the gate level and RTL simulations are used to compare:
• Power values with gate level SAIF annotation (gate level power estimation).
• Power values with RTL SAIF annotation, without completion (RT-level power estimation).
• Power values with completed RTL SAIF annotation (enhanced RT-level power estimation, the new
methodology proposed in this paper).
We performed the gate level power estimations using a 0.13 micron technology library and a clock
frequency of 100 MHz. Table 2 shows the power values for each EMC module in cases 1 and 2.
The diagram in Fig. 5 shows the lower and upper limits of the dynamic power estimate for each EMC
module. The first stacked contribution is the best-case power value; the second is the increment up to
the full power value (using the gate level SAIF as our reference); the last is the further increment due to
the worst-case power estimate. Figure 5 shows good results for the execution unit (EU) and the
load-store unit (LSU), where the best-case power is very close to the gate level power value. The fetch
unit (FU) gives the worst results, which can be explained by the percentage of annotation of the RTL
SAIF reported in Table 4. The worst best-case results correspond to the lowest percentages of
annotation in nets and pins, which are more relevant than ports.
In this case, the best-case annotation lacks too much relevant activity information for an accurate power
estimation. Regarding the worst-case power values, these numbers are too large in the decode unit
(DU), the read unit (RU) and the fetch unit (FU). In these cases, the activity file shows that the clock
signal, which is included in the toggle analysis, has much greater activity than any other object;
because completion searches for the maximum toggle count, it assigns the clock activity to the
un-annotated objects.
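Both quantities used in this discussion, the annotation coverage per object type and the position of the gate level value between the best and worst bounds, can be computed with a short Python sketch. The helper names and the sample inputs below are hypothetical; they are not data from Table 4 or Fig. 5.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def annotation_percentage(objects: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percentage of annotated objects per type ("port", "net", "pin")."""
    total: Counter = Counter()
    annotated: Counter = Counter()
    for kind, is_annotated in objects:
        total[kind] += 1
        if is_annotated:
            annotated[kind] += 1
    return {kind: 100.0 * annotated[kind] / total[kind] for kind in total}


def bounds_vs_gate(best: float, gate: float, worst: float) -> Dict[str, object]:
    """Express best/worst power as a percentage of the gate level reference."""
    return {
        "best_pct": 100.0 * best / gate,
        "worst_pct": 100.0 * worst / gate,
        "bounded": best <= gate <= worst,
    }


# Illustrative inputs only; these are not values from Table 4 or Fig. 5.
objs = [("net", True), ("net", False), ("net", True), ("net", True),
        ("pin", True), ("pin", False), ("port", True)]
print(annotation_percentage(objs))
print(bounds_vs_gate(best=80.0, gate=100.0, worst=150.0))
```

Normalizing to the gate level value matches the "Power %" axis of the stacked bars, where 100% is the gate level reference.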
Table 5 reports the ALU power numbers obtained by running the two programs of Sections 10.2.1 and
10.2.2: one uses arithmetic and logic instructions, the other does not. Figure 6 shows the stacked bars
of the ALU power numbers for the three test benches. When the test changes, the gate level power
changes, but the accuracy of the best-case estimate remains the same.
Figure 5: Power values of EMC (stacked bars Best, Gate and Worst; y axis: Power %; x axis: EMC modules ALU, FU, DU, RU, EU, LSU, LSU-CORE).
Figure 6: Power bounds for ALU with three different test benches (stacked bars Best, Gate Power and Worst; y axis: Power %; x axis: Test 10.1, Test 10.2.1, Test 10.2.2).
Applying RTL SAIF completion to a large SoC raises several problems related to script execution. In
particular, the completion TCL script can contain a very large number of commands, and only powerful
workstations can execute it without slowing down. The proposed completion flow uses an intermediate
file to collect the best (or worst, or mean) toggle information; if the completion methodology were built
into a commercial power estimation tool, this file would not be necessary, and working within a single
environment would increase speed. Finally, the worst-case power numbers could be improved by
excluding clocks from the analysis.
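The suggested improvement can be sketched by filtering clock nets out before searching for the maximum toggle count. In this Python sketch the clock net names are supplied by the user (for example, the clocks defined with create_clock in the flow script), which is an assumption; the function name and the sample counts are hypothetical.

```python
from typing import Dict, Iterable


def worst_case_toggle(module_counts: Dict[str, int], clock_nets: Iterable[str] = ()) -> int:
    """Maximum toggle count in one module, ignoring the given clock nets."""
    clocks = set(clock_nets)
    candidates = [tc for net, tc in module_counts.items() if net not in clocks]
    if not candidates:
        raise ValueError("no non-clock annotated nets in this module")
    return max(candidates)


# Illustrative module in which the clock toggles far more than any data net.
counts = {"clk": 200, "n1": 3, "n2": 7}
print(worst_case_toggle(counts))           # clock included in the search
print(worst_case_toggle(counts, {"clk"}))  # clock excluded from the search
```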
7.0 Improvements.
The proposed activity completion is simple to apply for end users of commercial power estimation tool
flows. It is a starting point for evaluating, in terms of bounds, the power values of a complex SoC. A
possible enhancement of the technique is to find the nearest annotated object by moving forward from
each un-annotated object. Synopsys Design Compiler allows the user to follow …
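The enhancement described above, moving forward from an un-annotated object to the nearest annotated one, amounts to a breadth-first search over the fanout of the net list. Because breadth-first search visits nets in order of increasing distance, the first annotated net it reaches is a nearest one, and its activity could replace the module-wide minimum or maximum. The fanout graph representation and the function name are hypothetical; extracting the graph from the net list is not shown.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def nearest_annotated(
    start: str, fanout: Dict[str, Iterable[str]], annotated: Set[str]
) -> Optional[Tuple[str, int]]:
    """Breadth-first search forward from an un-annotated net.

    fanout maps each net to the nets it drives (through the cells it feeds).
    Returns (nearest annotated net, distance in hops) or None if none is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        net, dist = queue.popleft()
        if net != start and net in annotated:
            return net, dist
        for nxt in fanout.get(net, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical fanout graph around the un-annotated net n3.
graph = {"n3": ["n5", "n6"], "n5": ["n7"], "n6": ["n8"]}
print(nearest_annotated("n3", graph, annotated={"n7", "n8"}))
```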
8.0 Conclusion.
Since gate level power estimation can be expensive, a completed RTL annotation, which is fast, safe and
cheaper, is worth considering. In this article we propose to improve the accuracy of RTL annotation at
the same cost as RTL simulation. Computing power bounds fits well with the behavior of commercial
power estimation tools. The power values obtained validate the best, worst and mean toggle count
algorithms, and these algorithms are consistent with the results found. The methodology is intended
for real large SoCs (millions of gates); internal data owned by ST Microelectronics confirm that it
remains useful in that setting. These results are the starting point of a study aimed at defining new
features in commercial power estimation tools that meet the practical needs of developers.
9.0 Glossary.
“Front-end net list”: gate level net list, which is not placed and routed.
“Back-end net list”: gate level net list with clock tree and place & route.
10.0 Appendix
LD R4, 0000F000
LD R3, 000001F4
LD R7, R3
MUL R8, R7, R3
ADD R2, R3, R4
ADD R2, R2, R3
LD R5, R2
LD R3, 00000000
ADD R7, R5, 00001000
LD R3, 00000004
MUL R7, R3, 000001F4
ST 00001000, R7
SEC
LD R4, 0000F000
LD R3, 000001F4
LD R7, R3
LD R5, R2
LD R3, 00000000
LD R3, 00000004
ST 00001000, R7
PUSH R2
POP R7
EXIT