Presented by
Guillaume PLASSAN
M. Nicolas HALBWACHS
Research Director, CNRS Alpes, President of the jury
M. Wolfgang KUNZ
Professor, University of Kaiserslautern, Reviewer
M. Jean-Pierre TALPIN
Research Director, INRIA Rennes, Reviewer
Mme Dominique BORRIONE
Professor Emeritus, Université Grenoble Alpes, Thesis Director
Mme Katell MORIN-ALLORY
Associate Professor, Grenoble INP, Thesis Co-director
M. Hans-Jörg PETER
Engineer, PhD, Harman International, Thesis Co-supervisor
M. Florian LETOMBE
Engineer, PhD, Synopsys, Member
Acknowledgments
This thesis has been a long journey, which could not have been possible without the help
and support of many great people.
First and foremost, I owe a great deal to my industrial advisor Hans-Joerg Peter, who consistently proved to be a great colleague, teacher and researcher. I warmly thank my directors Dominique Borrione and Katell Morin-Allory, whose time and precious advice taught me a lot. The University of Grenoble Alpes and Grenoble INP are also acknowledged for giving me the opportunity of this thesis, and for providing the quality education that led to it.
Thank you very much to Jean-Pierre Talpin and Wolfgang Kunz, who accepted to review my manuscript. Nicolas Halbwachs is warmly thanked for accepting to be part of the jury. I also thank Florian Letombe for doing more than just representing Synopsys in the jury.
For providing such a pleasant and proficient working environment, I thank my col-
leagues Julien, Laurent, Paul, Emmanuel, Paras, Shaker, Fahim, and particularly Nikos
Andrikos and Dmitry Burlyaev. Mejid Kebaili and Jean-Christophe Brignone also deserve credit for introducing me to the CDC topic and for a fruitful collaboration.
I thank my dear family for their support throughout my studies and for always cheering
me up with good meals.
Last but not least, a thought for my friends, who persistently provided a hearty comfort
zone, whether climbing, improvising or sharing good moments. Thank you to Adé (for
enduring our engineering talks), Kévin (for resigning from his job to open a bar), Léa (for
her cakes), Marc (for his impressive climbing skills) and Pierre (for his persistent DIY
spirit). Of course, special thanks go to my Quand Mêmes: Baptiste (because we are
INProx), Bé (for her joy), Candice (for being herself), Catherine (for becoming a QM),
Cathy (for her love of order), Diane (for her music taste), Guiouane (because we will lead
the world), Ioul (for his quantic spirit), Johan (for being a climbing circus), Julien (for
his dog), Ludo (for his inspiration), Micham (for making decisions), Nico (for managing
us), Pablo (a.k.a. Mr. Pleasure), Pat (for forcing me to take breaks), Polo (for keeping
his coolness), Robin (even though I’m purple now), Steph (for his absurdity).
Contents

1 Introduction
  1.1 Hardware Design Flow
  1.2 Clock-Domain Crossing
    1.2.1 Metastability
    1.2.2 Glitch
    1.2.3 Bus Incoherency
    1.2.4 Data Loss
  1.3 Verification of CDCs
    1.3.1 Structural Analysis
    1.3.2 Functional Analysis
  1.4 Thesis Structure
A Table of Notations
F Résumé en Français
CHAPTER 1
Introduction
Our society increasingly relies on digital systems to assist us in our daily life. Some systems improve our quality of life: smartphones provide easy access to worldwide information, thermostats adjust the room temperature without any human interaction, etc. Other systems perform complex tasks better than the human mind: planes navigate steadily, modern cars brake when facing an immediate danger, factory robotic arms perform repetitive tasks at a high rate, etc.
Any engineer would agree that the creation of a system necessarily comes with functional issues, also called bugs. Having a bug in the final product may have tremendous
consequences. If multiple smartphones explode in the pockets of random customers, this
will lead to a vast product recall, losses in brand recognition, and an unsellable product.
In addition to the financial impact, bugs in security-critical systems may lead to leaking
private data or even endangering lives. Even if a bug is identified before reaching the customer, it may take weeks to fix, hence delaying the whole production and leading to potential market losses. Thus, systems require proper techniques to be validated on
multiple functionality, safety and security criteria, so that bugs are identified and fixed
as early as possible in the design flow.
In particular, most critical systems involve embedding software on hardware. Throughout its development, software can be tested and patched to fix bugs, even remotely. Conversely, integrated hardware cannot be modified after its fabrication. For these microelectronic designs, the validation must then be performed on a virtual model,
before producing the real circuit.
As society demands that more and more complex tasks be automated, the complexity of these designs significantly increases. The more complex the system, the more bugs are prone to be introduced, and the harder it is to identify them. To manage the increasing complexity of hardware systems, a specific design and verification flow is followed.
[Figure: Hardware design and verification flow — Specification → HDL (RTL) → Netlist → Layout → Production → Chip, checked along the way by simulation, formal methods, equivalence checking, static timing analysis, layout-versus-schematics, design rule check, and chip test.]
Hardware modules are commonly reused from one project to the next, which accelerates the overall development and verification. For instance, a common scheme to design central processing units (CPUs) is to assemble generic modules from previous projects along with state-of-the-art cores from other companies.
Whereas some individual modules have already been verified on their own, their interconnections remain at risk. Indeed, this modular approach requires many interconnections, and since the modules have different origins, their specifications need to be very precise, which is often not the case.
To formally represent the functionality described by the specification, digital hardware is usually designed using a Hardware Description Language (HDL) such as Verilog or VHDL. In HDL, logic gates (AND, OR, NOT) are instantiated along with memories (D-flip-flop, latch) and arithmetic operators (adder, comparator, etc.). This Register Transfer Level (RTL) makes it possible to design large hardware with modularity and flexibility in the architecture. To verify its functionality, simulation stands as the most accepted technique (see Figure 1.13). Simulation virtually exercises the system by feeding patterns of values from a test bench to the inputs of the design, while internal values are checked against a reference. However, this method cannot exhaustively exercise all potential runs of a design, which makes maximizing the coverage a hard effort. Thus, formal methods are used to model the design as mathematical equations which fully describe its behavior. Whereas this approach guarantees the absence of bugs, it has been observed not to scale to recent complex systems. It is then mostly used on small critical modules whose correctness must be guaranteed.
While digital hardware models abstract the physical values as zeros and ones, the
final product will be analog. For instance, when an RTL signal toggles, the corresponding
physical wire will actually experience a voltage ramp which stabilizes at the intended value
(0 being the ground, and 1 being a fixed voltage). Thus, the RTL is synthesized into a
netlist which only instantiates standard analog components: logic gates and memories.
Each component is associated with a technology library with timing constraints related to technology, voltage, and environment factors. The netlist is then formally checked for equivalence against the RTL, which guarantees that the functionality has not been modified by synthesis. Also, the timing between components is checked using static timing analysis (STA). Indeed, a digital design runs at a specific frequency, managed by an oscillating clock signal. On the rising edge of this clock, D-flip-flops (also called registers) memorize the value of their input signals. As they can only capture a 0 or a 1, this input needs to be stable, and not in the middle of a voltage ramp. The propagation delay of data is evaluated across all combinational gates to guarantee that their signals will be stable at each rising edge of the clock.
Each component of the technology library is mapped to an analog model, described using layers of materials (mostly silicon-based semiconductors). All these components are then placed on a given area, and routed (interconnected) with metal layers. As the fabrication of each layer in a clean room roughly costs a few million dollars, placement and routing aim at minimizing the number of layers. Since this optimization step may introduce bugs, the overall layout is checked against the netlist for functionality issues, with a so-called layout-versus-schematics (LVS) check. Also, a design rule check (DRC) ensures that these layers of materials are manufacturable with the chosen technology.
The layout is then sent to a foundry which produces wafers of dies in clean rooms. Each die is tested using input patterns, and rejected if the output values do not match the expected values. Finally, the hardware is packaged into chips for the consumer to use.
1.2 Clock-Domain Crossing

[Figure: A clock-domain crossing — DATA_IN, registered in the source clock domain, is captured as DATA_OUT by a register in an asynchronous destination clock domain.]
Multiple problems arise from CDCs [25, 75] (metastability, bus incoherency, glitches, data loss, etc.), and designers need to implement specific structures at the RTL level to avoid any issue [25, 31] (multi-flop, handshake, Gray encoding, etc.). Consequently, verification engineers must check that all the potential problems have been addressed and corrected. In the following sections, we provide an overview of the major issues around CDCs, and solutions to overcome them.
1.2.1 Metastability
The definition of clock domains directly implies that when data changes in the source domain of a CDC, the destination register can capture it at any moment: compliance with the setup and hold time requirements is not guaranteed. Hence, because of a small delay between the rising edges of the two clocks, the data may be captured just when it changes, and a metastable value may be propagated (as shown in Figure 1.3).
The metastability phenomenon was identified decades ago [17]: if a metastable value is propagated through combinational logic, it can lead to a so-called dead system. Moreover, it would be very difficult to find the source of this issue after fabrication, as post-production testers do not understand non-binary values.
A first solution is to introduce latency in the destination domain, in order to wait for the stabilization of the value. This timing can be estimated by considering the clock frequencies and the production technology, as is commonly done in the Mean Time Between Failure computation [29]. A single dedicated register could then, if properly sized, output stable data. However, such synchronizing registers would result in a significant overhead on the circuit size. Another technique [42, 51] involves embedding a monitor in the design which detects and corrects metastable values. However, the overhead would also be significant.
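For reference, a commonly used model of the Mean Time Between Failure mentioned above (this general form is standard in the metastability literature and is given here as an illustration, not quoted from [29]) is

    MTBF = e^(t_r/τ) / (T_W · f_clk · f_data)

where t_r is the resolution time granted to the synchronizer, τ the settling time constant of the flip-flop, T_W its metastability window, and f_clk and f_data the clock and data toggle rates.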
The most common solution is to add latency by implementing cascaded registers [32] (see Figure 1.4). While this multi-flop structure guarantees, with a certain probability, that the propagated value is stable, there is no way of telling whether it is a ‘0’ or a ‘1’. Indeed, since the data is captured while it changes, the multi-flop may output either the old or the new value during one cycle; then, at the next destination cycle, the new value is propagated. The drawback of this structure is thus a delay in the data propagation.
[Figure 1.4: Multi-flop synchronizer — DATA_IN is registered on CLK1 as DATA_CDC, then synchronized through two cascaded flip-flops clocked by CLK2 to produce DATA_OUT.]
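To make the one-cycle uncertainty concrete, here is a minimal Python simulation sketch (our own illustration, not from the thesis; the encoding of the capture window as None is an assumption): when the input changes near the destination clock edge, the first flop may latch either the old or the new value, but the output is always a clean, if delayed, binary value.

    import random

    def two_flop_sync(trace_in, seed=0):
        """trace_in: input values sampled at each destination clock edge;
        None marks an edge where the input was changing (setup/hold
        violation), so the first flop resolves non-deterministically."""
        rng = random.Random(seed)
        ff1 = ff2 = 0
        out = []
        for v in trace_in:
            # both flops update on the same edge: ff2 takes the old ff1
            ff2, ff1 = ff1, (rng.choice([0, 1]) if v is None else v)
            out.append(ff2)
        return out

    # The change at cycle 2 reaches the output at cycle 4 or 5, never earlier:
    print(two_flop_sync([0, 0, None, 1, 1, 1, 1]))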
1.2.2 Glitch
Whenever two signals converge on a combinational gate, a transient inconsistent value (glitch) may appear. On a synchronous path, static timing analysis ensures that this glitch resolves within a clock period; in the context of a CDC, however, the glitch may be captured and its value propagated.
[Figure 1.5: Glitch generation — DATA_CDC[0] and DATA_CDC[1], registered on CLK1, converge on an AND-gate with propagation delays Δ1 and Δ2; the gate output DATA_AND is captured by a two-flop synchronizer on CLK2 to produce DATA_OUT.]
For instance, in Figure 1.5, two signals from the source domain (DATA_CDC[0] and DATA_CDC[1]) converge on an AND-gate. Because of physical constraints, the value of DATA_CDC takes a delay Δ to propagate from the register to the AND-gate. Since this delay cannot be guaranteed to be exactly the same for all converging paths, a glitch may appear on DATA_AND (see Figure 1.6). It is then possible that the destination register captures this glitch, hence propagating an incorrect value.
To guarantee that a glitch cannot appear on this gate, the two input signals should
never toggle in the same cycle. In other words, the control logic which generates these
signals must ensure that only one signal can toggle at a time.
[Figure 1.6: Waveforms of the glitch scenario — DATA_CDC[0] and DATA_CDC[1] toggle in the same CLK1 cycle with different delays Δ1 and Δ2, producing a transient pulse on DATA_AND which a CLK2 edge may capture into DATA_OUT.]

1.2.3 Bus Incoherency

[Figure 1.8: Bus incoherency — each bit of DATA_CDC is synchronized by its own multi-flop (here DATA_IN[1] through DATA_CDC[1] to DATA_OUT[1]), so the bits of the destination bus may update in different cycles.]
For instance, in Figure 1.8, two bits of DATA_CDC toggle in the same cycle. After being synchronized by separate multi-flops, the DATA_OUT bus value is not coherent anymore. If both bits converge on an exclusive-OR gate, a glitch can be observed. However, we can add some encoding so that only one bit can change at a time [25] (Gray encoding in the case of a counter, or mutual exclusion in other cases), as illustrated by the code sketch below. Even if the multi-flops stabilize this toggling signal with different latencies, the bus output value will be correct with regard to either the previous or the next cycle. Thus, no false value is propagated. This can create some data loss, but avoids incoherency.
[Figure 1.7: Waveforms of the incoherency scenario — DATA_CDC[0] and DATA_CDC[1] toggle together, but DATA_OUT[0] and DATA_OUT[1] update one CLK2 cycle apart, producing a pulse on DATA_XOR.]
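As a minimal sketch of why Gray encoding makes bit-per-bit synchronization safe (standard reflected binary code; this snippet is illustrative and not taken from the thesis), consecutive code words differ in exactly one bit, so at most one synchronized bit can be in flight at any cycle:

    def to_gray(n: int) -> int:
        """Reflected binary Gray code of n."""
        return n ^ (n >> 1)

    # Consecutive counter values (including the wrap-around) differ in one bit:
    for i in range(8):
        a, b = to_gray(i), to_gray((i + 1) % 8)
        assert bin(a ^ b).count("1") == 1
        print(f"{i} -> {a:03b}")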
An alternative solution is to use a control signal which stops the data from propagating to the destination registers (Figure 1.9). This CTRL signal is set to ‘1’ only after DATA_IN stabilizes. Hence, no metastability is propagated into the destination domain, and there is no need for further resynchronization [25]. Also, since the resulting bus is always coherent, a glitch cannot be generated. Of course, this ‘stable’ information comes from the source clock domain, so the control signal must itself be resynchronized in the destination domain (here with a multi-flop). (In Figures 1.9, 1.10 and 1.11, clock domains are shown in blue or red, data in green, and control logic in yellow or purple.)
Note that different synchronization schemes are derived from this structure. The control signal here drives a recirculation mux of the destination flop, but it could also be connected to a clock-gate enable, an AND-gate or an OR-gate on the data path.
[Figure 1.9: Mux-based synchronizer — a resynchronized control signal selects, through a recirculation mux, whether the destination flop reloads its own output or captures DATA_CDC into DATA_OUT.]

1.2.4 Data Loss

[Figure 1.10: Handshake synchronizer — a REQ signal is synchronized into the destination domain and acknowledged back by an ACK signal, each through its own multi-flop.]
The delay introduced by handshake protocols may not be acceptable for a high-rate interface. Putting a FIFO in the CDC allows the source to write and the destination to read at their own frequencies, and increases the data propagation efficiency. A FIFO implements all the previous schemes (see Figure 1.11). The main controls of the CDC are the write and read pointers, which need to be Gray-encoded before being synchronized by a multi-flop. In order to activate the source or destination access, global write and read control signals can be implemented in a handshake protocol.
Using a FIFO implies some data latency (caused by the handshake and the resynchronization of pointers), but allows a higher transfer rate. All the previously mentioned issues are avoided (metastability, incoherency, data loss), but its complexity makes the FIFO the most difficult synchronizer to design and verify.
[Figure 1.11: FIFO synchronizer — data is written into a dual-clock memory; the Gray-encoded write and read pointers (WR_PTR, RD_PTR) are synchronized with multi-flops, and WRITE/READ FSMs drive the global handshake controls WR and RD.]

1.3 Verification of CDCs
Formal properties are generated from all synchronizer patterns, and can then be verified at the RTL level using simulation or formal methods (see Figure 1.13). However, CDC issues are by nature very rare since they depend on physical timing constraints. While simulation scales to large designs, it can only show the absence of functional errors in a subset of the full design behavior. A simulation environment, even with good coverage, is thus very likely to miss a glitch or metastability. Consequently, model checking prevails in the functional verification of CDCs.
[Figure 1.13: CDC functional verification — the design, its specification, environment assumptions, and the generated properties feed simulation or model checking.]

1.4 Thesis Structure

The remainder of this thesis is organized as follows:
• Chapter 2 provides the formalism for a structural and a functional hardware model,
along with its model checking;
• Chapter 3 focuses on the identification of clock configuration modes, with the intent
of reaching a realistic clock setup;
• Chapter 4 tackles the state-space explosion problem, while providing guidance for
counter-example analysis;
• Chapter 5 aims at creating missing protocol assumptions for under-specified environments;
• Chapter 6 concludes on the CDC formal verification flow, and provides perspectives for future work.
CHAPTER 2
Formal Verification of a Hardware Model
Among all following notations, vector indices are superscripted; time and context are subscripted. Notation B = {true, false} denotes the Boolean set, and N denotes the set of non-negative integers.
The Kleene closure of the Cartesian product of a set A is referred to as A*. The cardinality of a finite set A is expressed as |A| (e.g., |B| = 2). Note that vectors are considered as ordered sets. For any vector ω = ⟨ω^1, ω^2, . . . , ω^n⟩, we respectively define a prefix, suffix and sub-sequence with the following notations:

    ∀0 ≤ i ≤ n:        ω^{..i}    = ⟨ω^1, . . . , ω^{i−1}, ω^i⟩
    ∀0 ≤ i ≤ n:        ω^{i..}    = ⟨ω^i, ω^{i+1}, . . . , ω^n⟩
    ∀0 ≤ i1 ≤ i2 ≤ n:  ω^{i1..i2} = ⟨ω^{i1}, ω^{i1+1}, . . . , ω^{i2−1}, ω^{i2}⟩
As they are foundations of this thesis, the reader is expected to be familiar with the following notions:
• Hardware description languages such as VHDL [37] or Verilog [36];
• Formal specification languages such as LTL [69], PSL [38] or SVA [39];
• The main principles of model checking [6, 22, 24].
Using an additional input signal α to model the variable duration of the phases 2 and 4,
the netlist description in Figure 2.2 shows a potential circuit generating the acknowledge
control signal.
[Figure 2.1: Handshake waveforms — clk, data value, req and ack over the four protocol phases 1–4.]
[Figure 2.2: Handshake netlist — inputs req and α feed registers reqi and acki through a multiplexer (signals mux0, mux1, muxq), producing output ack.]
The overall structural design model is defined using a directed graph with labeled vertices D = ⟨A, E, Type⟩, where:
• A is the set of vertices (also called signals);
• E ⊆ A × A is the set of directed edges;
• Type associates each signal with an operator type.
[Figure 2.3: Handshake FSM — states S0–S3 with transitions labeled by Boolean combinations of req and α, and output ack.]
[Figure 2.4: Handshake design model — graph over signals α, αn, req, reqi, mux0, mux1, muxq, acki, ack.]
At = {α ∈ A | Type(α) = t}
We can then define Ain, Aout, Azero, Aone, Anot, Aand, Aor and Aseq. We consider that all the signals in a set At are ordered. For instance, the notation Ain^i represents the i-th primary input, where 1 ≤ i ≤ |Ain|.
As an example, the handshake design model of Figure 2.4 is defined by:
• A = {α, αn, req, reqi, mux0, mux1, muxq, acki, ack}
• E = {⟨α, αn⟩, ⟨α, mux1⟩, ⟨αn, mux0⟩, ⟨mux1, muxq⟩, ⟨mux0, muxq⟩, ⟨muxq, acki⟩, ⟨acki, ack⟩, ⟨acki, mux0⟩, ⟨req, reqi⟩, ⟨reqi, mux1⟩}
• Ain = {req, α}
• Aout = {ack}
• Anot = {αn}
• Aor = {muxq}
• Aand = {mux1, mux0}
• Aseq = {reqi, acki}
The successors of a signal in model D are the signals directly succeeding it:

    Succ(α) = {α′ ∈ A | ⟨α, α′⟩ ∈ E}

The predecessors of a signal in D are the signals directly preceding it (also called inputs of the operator):

    Pred(α) = {α′ ∈ A | ⟨α′, α⟩ ∈ E}
Note that the Tseq , Tout and Tnot operators only have one predecessor; the operators
Tin , Tzero and Tone have no predecessor; the operator Tout has no successor:
• ∀α ∈ (Aseq ∪ Aout ∪ Anot ), |Pred(α)| = 1
• ∀α ∈ (Ain ∪ Azero ∪ Aone ), |Pred(α)| = 0
• ∀α ∈ Aout , |Succ(α)| = 0
A structural path from α to α′ is a vector of successive signals π_{α..α′} = ⟨π^1, π^2, . . . , π^n⟩ ∈ A*, where:
• π^1 = α
• π^n = α′
• ∀1 ≤ i < n, ⟨π^i, π^{i+1}⟩ ∈ E
We call cone of influence (or fan-in) of a signal α′ the set of signals from which there exists a structural path to α′:

    COI_{α′} = {α ∈ A | ∃π_{α..α′}}
A structural path π is called combinational iff it does not include any sequential operator, i.e.:

    ∀1 ≤ i ≤ |π|, π^i ∉ Aseq
Throughout this thesis, we assume that the graph has no combinational loop: there is
no repeating signal in a combinational path. More precisely, it means that D has a finite
number of combinational paths.
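As an illustration of these structural definitions, here is a small Python sketch (our own, not part of the thesis) encoding the handshake model of Figure 2.4 and computing Pred, Succ, and the strict cone of influence (the root itself excluded) by backward traversal:

    # Edge set of the handshake design model (Figure 2.4).
    E = {("alpha", "alphan"), ("alpha", "mux1"), ("alphan", "mux0"),
         ("mux1", "muxq"), ("mux0", "muxq"), ("muxq", "acki"),
         ("acki", "ack"), ("acki", "mux0"), ("req", "reqi"), ("reqi", "mux1")}

    def pred(a): return {x for (x, y) in E if y == a}
    def succ(a): return {y for (x, y) in E if x == a}

    def coi(a):
        """Signals from which a structural path reaches a (backward DFS)."""
        seen, todo = set(), [a]
        while todo:
            for p in pred(todo.pop()):
                if p not in seen:
                    seen.add(p)
                    todo.append(p)
        return seen

    print(sorted(coi("ack")))   # every other signal reaches ack in this model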
The design graph D is composed of only eight types of simple components. More complex
operators like the multiplexer, the latch or the flip-flop are then modeled by combining
these simple components. For instance, a multiplexer (mux) can be modeled by two
AND-gates, an OR-gate and an inverter, as shown in Figure 2.5.
[Figure 2.5: Multiplexer model — inputs I0 and I1, select S, output Q, built from two AND-gates, an OR-gate and an inverter.]
Also, if all the sequential elements in the design are flip-flops and are sampling on one
common clock, the Tseq component (called the simple flop model hereafter) can be used to
model all of them. Otherwise, in case the design contains latches or multiple clocks, more
complex models are necessary. A latch is modeled with a multiplexer and a sequential
operator (see Figure 2.6).
Our flip-flop model (also called register) is composed of latches enabled on different
polarities of the clock (see Figure 2.7). The model is also enhanced with an asynchronous
reset to force the output to a constant zero or one.
To facilitate the manipulation of complex operators, we introduce notations for the signals and edges related to the multiplexer, flop and latch models. The set Amux contains all signals which are the outputs of a multiplexer. Similarly, we define Aflop and Alatch. With the following notations, the input signals of these operators are identified:
[Figure 2.6: Latch model — a multiplexer controlled by enable EN recirculates the sequential operator's output or captures input D.]

[Figure 2.7: Flip-flop model — two latches enabled on opposite polarities of the clock CLK, with multiplexers forcing the output to a constant zero or one under asynchronous reset RST.]
In the case of the handshake from Figure 2.4, the complex-operator modeling allows viewing the netlist as in Figure 2.2. Because there is only one deterministic clock, the simple flop model Tseq is used, and the clock is abstracted. The complex operator sets are:
To build the functional model M upon D, each signal is associated with a binary value. The semantics of M is described as a deterministic finite state machine where each state belongs to the set of configurations

    Σ = Σl × Σg × Σs

with labeling function λ(s) = g.
The relation between the structure and the functionality of an RTL design is given by the valuation function Valω:

    Valω : A × N −→ B

Within the context of an execution ω ∈ Ω, Valω gives the value of a signal at a time point.
The type of an operator defines a relation between its value and the values of its inputs (if any). The properties of the valuation function Valω follow, for all t ∈ N:

    ∀α ∈ Azero:  Valω(α, t) = false
    ∀α ∈ Aone:   Valω(α, t) = true
    ∀α ∈ Aout:   Valω(α, t) = Valω(α′, t)        with {α′} = Pred(α)
    ∀α ∈ Anot:   Valω(α, t) = ¬Valω(α′, t)       with {α′} = Pred(α)
    ∀α ∈ Aand:   Valω(α, t) = ⋀_{α′ ∈ Pred(α)} Valω(α′, t)
    ∀α ∈ Aor:    Valω(α, t) = ⋁_{α′ ∈ Pred(α)} Valω(α′, t)
    ∀α ∈ Aseq:   Valω(α, t + 1) = Valω(α′, t)    with {α′} = Pred(α)
The only non-determinism of Valω lies in the values of Tin signals, and in the values of Tseq signals at time point 0.
For instance, a correct handshake transaction – as in Figure 2.1 – reduced to signals req and ack is:

    ω = ⟨⟨Valω(req, 0), Valω(ack, 0)⟩, . . . , ⟨Valω(req, 6), Valω(ack, 6)⟩⟩
      = ⟨⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩, ⟨0, 1⟩, ⟨0, 1⟩, ⟨0, 0⟩⟩
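The valuation semantics above maps directly to a small recursive evaluator. The following Python sketch is our own illustration (the dictionary encoding of the netlist is an assumption, not the thesis' data structure); the caller fixes the two sources of non-determinism identified above, namely the input traces and the initial register values:

    def evaluate(types, pred, inputs, init_seq, n_cycles):
        """types: signal -> one of 'in','out','zero','one','not','and','or','seq'
        pred: signal -> list of predecessor signals
        inputs: signal -> list of Booleans, one per cycle (Tin values)
        init_seq: signal -> Boolean (Tseq values at time 0)"""
        val = {}                              # memo: (signal, t) -> Boolean
        def v(a, t):
            if (a, t) in val:
                return val[(a, t)]
            k = types[a]
            if   k == "in":   r = inputs[a][t]
            elif k == "zero": r = False
            elif k == "one":  r = True
            elif k in ("out", "not"):         # single predecessor
                r = v(pred[a][0], t) ^ (k == "not")
            elif k == "and":  r = all(v(p, t) for p in pred[a])
            elif k == "or":   r = any(v(p, t) for p in pred[a])
            else:                             # 'seq': registered input value
                r = init_seq[a] if t == 0 else v(pred[a][0], t - 1)
            val[(a, t)] = r
            return r
        return {a: [v(a, t) for t in range(n_cycles)] for a in types}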
2.2.1 Satisfiability
The formal verification of a model is based on formal properties which define a correct
design behavior. The property formalism we use is the linear temporal logic (LTL). We
build an LTL formula over the model M by considering all signals in A as atomic propositions. In particular, we are interested in the subset P of safety LTL formulas [47, 57].
As proved in [76], a safety LTL property can be constructed in positive normal form, using the usual Boolean logic and the temporal operators X (next), U (weak until) and G (always). The syntax of an LTL property in P is recursively defined as follows:
• true ∈ P
• false ∈ P
• ∀α ∈ A, α ∈ P
• ∀α ∈ A, ¬α ∈ P
• ∀φ ∈ P, Xφ ∈ P
• ∀φ ∈ P, Gφ ∈ P
• ∀φ, φ′ ∈ P, φ ∧ φ′ ∈ P
• ∀φ, φ′ ∈ P, φ ∨ φ′ ∈ P
• ∀φ, φ′ ∈ P, φ U φ′ ∈ P
The semantics of the LTL language is represented by the satisfaction – or failure – of a property. We use the notation ω ⊨ φ to indicate that a property φ ∈ P is satisfied in the sequence of configurations ω ∈ Σ*. The satisfaction relation ⊨ and its negation ⊭ are inductively defined as:
• ω ⊨ true
• ω ⊭ false
• ∀α ∈ A, ∀i ≥ 1: ω^{i..} ⊨ α ⟺ Valω(α, i−1)
• ∀α ∈ A, ∀i ≥ 1: ω^{i..} ⊨ ¬α ⟺ ¬Valω(α, i−1)
• ∀φ ∈ P: ω ⊨ Xφ ⟺ ω^{2..} ⊨ φ
• ∀φ ∈ P: ω ⊨ Gφ ⟺ ∀i ≥ 1, ω^{i..} ⊨ φ
• ∀φ, φ′ ∈ P: ω ⊨ φ ∧ φ′ ⟺ (ω ⊨ φ) ∧ (ω ⊨ φ′)
• ∀φ, φ′ ∈ P: ω ⊨ φ ∨ φ′ ⟺ (ω ⊨ φ) ∨ (ω ⊨ φ′)
• ∀φ, φ′ ∈ P: ω ⊨ φ U φ′ ⟺ (∃j ∈ N, (ω^{j..} ⊨ φ′) ∧ (ω^{..j−1} ⊨ Gφ)) ∨ (ω ⊨ Gφ)
When all possible executions of a given design model satisfy a property φ, we say that the model satisfies the property:

    M ⊨ φ ⟺ ∀ω ∈ Ω, ω ⊨ φ
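As an illustration of these definitions, here is a small Python sketch (ours, not from the thesis) that evaluates a safety formula of P on a finite execution. The nested-tuple encoding of formulas is our own, and a finite trace only approximates the semantics above: G and weak U are interpreted up to the end of the trace, and X fails at the last position. Positions are 0-based, matching Valω(α, i−1).

    def sat(omega, phi, i=0):
        """omega: list of dicts signal -> Boolean; i: 0-based position.
        Formulas: True, False, "sig", ("not","sig"), ("X",f), ("G",f),
        ("and",f,g), ("or",f,g), ("U",f,g) with U the weak until."""
        if phi is True:  return True
        if phi is False: return False
        if isinstance(phi, str):                 # atomic proposition
            return omega[i][phi]
        op = phi[0]
        if op == "not": return not omega[i][phi[1]]
        if op == "X":   return i + 1 < len(omega) and sat(omega, phi[1], i + 1)
        if op == "G":   return all(sat(omega, phi[1], j)
                                   for j in range(i, len(omega)))
        if op == "and": return sat(omega, phi[1], i) and sat(omega, phi[2], i)
        if op == "or":  return sat(omega, phi[1], i) or sat(omega, phi[2], i)
        if op == "U":   # weak until: phi1 holds until phi2, or forever
            for j in range(i, len(omega)):
                if sat(omega, phi[2], j):       return True
                if not sat(omega, phi[1], j):   return False
            return True
        raise ValueError(f"unknown operator {op!r}")

    # The handshake trace above satisfies G(req -> (req U ack)):
    trace = [{"req": r, "ack": a} for r, a in
             [(0,0), (1,0), (1,0), (1,1), (0,1), (0,1), (0,0)]]
    print(sat(trace, ("G", ("or", ("not", "req"), ("U", "req", "ack")))))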
A property that is used to verify the behavior of the model is called an assertion.
Conversely, an assumption χ (also called constraint) is a property that specifies the design
environment, and which restricts the set of admissible input value sequences. In effect, an
assumption restricts the behavior of the model, which is strongly related to the assume-
guarantee paradigm from [23, 70]. When M is constrained by χ, then it is denoted M|χ
and is defined by:
    M|χ ⊨ φ ⟺ ∀ω ∈ Ω, (ω ⊨ χ) → (ω ⊨ φ)
When model M constrained by χ does not satisfy property φ, we say that the property fails. Note that an execution exhibiting a failure of φ (also called a counter-example, or CEx) still satisfies χ:

    M|χ ⊭ φ ⟺ ∃ω ∈ Ω, (ω ⊨ χ) ∧ (ω ⊭ φ)
The semantics of a property φ can be described as a Moore machine Mφ [16, 80, 81] whose input alphabet is B^|A| and whose output alphabet is B. The output letter of this machine indicates whether the property is satisfied or failed. Hence, we call this machine the property monitor.
As demonstrated in [2, 11, 13, 30, 62], such a machine is synthesizable into hardware. The corresponding structural representation Dφ takes all signals of D as inputs, and outputs a single new signal denoted αφ. Within the context of an execution, the positive (respectively negative) value of αφ represents the failure (respectively satisfaction) of property φ. Thus, execution ω satisfies property φ up to a time point if the value of αφ stays false:

    ω ⊨ φ ⟺ ∀t ∈ N, ¬Valω(αφ, t)
When the model is constrained, the satisfaction of the assertion is then defined as:

    M|χ ⊨ φ ⟺ ∀ω ∈ Ω, (∀t ∈ N, ¬Valω(αχ, t)) → (∀t ∈ N, ¬Valω(αφ, t))
Taking the PSL property written above, the corresponding circuit monitor and state graph are represented in Figure 2.8. As we consider a Moore machine (and not a Mealy one), the final value of the monitor is latched. Here, Σs is the set of valuations of ⟨seq1, seq2⟩.

[Figure 2.8: (a) Circuit monitor and (b) functional representation of φ — a four-state Moore machine S0–S3 over ⟨seq1, seq2⟩, with transitions labeled by Boolean combinations of ack and req, and output αφ.]
Constant Modeling. When a primary input is specified as a constant, then the model
is modified: the type of the input is changed from Tin to Tzero or Tone .
Clock Modeling. Current model checkers only consider one overall clock for the state machine. When the design has multiple clock inputs with different periods, state machines are created to model each clock as if it derived from one master clock (Figure 2.9). The outputs of the corresponding netlists are connected to the clock signals of the design. The overall model can then be clocked by a unique master clock, whose period is the greatest common divisor of the modeled clock periods. For instance, if the designer specifies two signals as clocks with periods 2 ns and 5 ns, then the master clock is considered to have a period of 1 ns.
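A minimal sketch of this clock-modeling scheme (our own illustration; duty cycles are rounded at master-clock granularity, which is an approximation of what a real tool would generate):

    from math import gcd
    from functools import reduce

    def clock_waves(periods_ns, horizon_ns):
        """Model each declared clock as a waveform over a common master clock
        whose period is the GCD of the declared periods (gcd(2, 5) = 1 ns)."""
        master = reduce(gcd, periods_ns)
        steps = horizon_ns // master
        return {p: [1 if (t * master) % p < p / 2 else 0 for t in range(steps)]
                for p in periods_ns}

    for p, wave in clock_waves([2, 5], 20).items():
        print(f"{p} ns clock:", "".join(map(str, wave)))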
[Figure 2.9: Modeling of clocks clk2 and clk5 as state machines driven by a common 1 ns master clock.]
Reset Modeling. In practice, the initial state s0 has to be reached after powering up the circuit through an initialization phase, which may be complex. If no specific initialization execution is predefined, it is inferred by symbolically evaluating the system: all resets are kept enabled (replaced by constants) until all register values are stable. This allows resets to propagate through registers, until a global stable state is reached. We take the sequential values of this state as the initial state (which restricts the original Σs,0). In our context, properties should not be verified during the initialization phase. During all subsequent formal checks, reset signals are replaced by constants corresponding to their disabled value.
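The inferred-initialization idea can be sketched as a fixpoint computation (illustrative code, not the thesis' implementation; step stands for one evaluation of the netlist's next-state function with all resets tied to their active value):

    def infer_initial_state(step, reset_state, max_iters=1000):
        """step: dict of register values -> dict of next register values,
        computed with resets held enabled. Returns the global stable state,
        taken as the initial state s0."""
        state = dict(reset_state)
        for _ in range(max_iters):
            nxt = step(state)
            if nxt == state:
                return state          # all register values are stable
            state = nxt
        raise RuntimeError("registers did not stabilize under reset")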
CHAPTER 3
Reaching a Complete Design Setup

3.1 Context
Moreover, since a control protocol usually does not change between modes, verifying it in one mode may be enough. Finally, having a good understanding of the clock network and its modes helps provide a realistic setup to the tool very efficiently.
[Figure 3.1: Example clock network — CLK2 feeds a divider register (CLKDIV); a multiplexer driven by SEL selects between CLK1 and the divided clock (CLKSEL); a clock-gate enabled by EN produces CLKGATE; CFG and RST configure the shaping logic.]
A derived clock can derive from a primary clock (e.g., CLKDIV derives from CLK2) or from another derived clock (e.g., CLKGATE derives from CLKSEL).
The configuration signals are mostly the non-clock inputs of the clock network. However, as clock control becomes more and more complex, modern designs include configuration FSMs which gather information from both the environment and the on-going tasks of the system in order to compute configuration values. Thus, configuration signals are identified as either primary inputs or internal registers.
We define an operation mode as a set of configuration signal values. Within such an operation mode, configuration signals are considered constant. Figure 3.2 exhibits an operation mode where CLK2 is selected after being divided by 4. Note that in a realistic execution, the system may dynamically switch between modes, depending on the required performance.
[Figure 3.2: Waveforms of an operation mode (CFG = 1) where CLK2 is divided by 4 (CLKDIV), selected (SEL, CLKSEL) and gated (EN, CLKGATED).]
Our goal is to provide a functional analysis of the clock network and of its modes. As a result of this analysis, protocols will be verified in a realistic and complete design setup.
To verify the setup of a clock network, we need to identify its global architecture (Figure 3.3). First, the clock signals are determined. Then, the (external and internal) configuration signals are identified.
However, a clock network includes sequential transformations and control loops, which makes its boundaries non-trivial to find. For instance, there is no structural way to differentiate a configuration register from a register which is part of the shaping logic (the cloud in Figure 3.1). Thus, the derived clocks cannot be restricted to the clock pins of the registers, and the configuration signals cannot be restricted to the primary inputs. The identification of the clock paths and the control paths needs to rely on heuristics.
[Figure 3.3: Global clock network — primary clocks and primary controls feed the clock network and a configuration FSM, which together drive the clock pins of the design's registers.]
*SYNOPSYS CONFIDENTIAL*
Configuration signals are identified as the inputs of the clock network which are not on the clock path. A first approximation would be to consider them as:

    ⋃_{αc ∈ Aclk} {α ∈ Ain ∩ COI(αc) | α ∉ Aclk}
However, as seen in 3.1.2, internal signals may also configure the clock network. We
could then consider that the first sequential operators on the side of the clock path are
configuration registers. This is incorrect, as there can be sequential logic (or even an
FSM) between a configuration signal and the clock path.
*SYNOPSYS CONFIDENTIAL*
3.3.1 Justification
*SYNOPSYS CONFIDENTIAL* To compute this relevance, we use the justification
algorithm of Mishchenko [61]. Within the context of a CEx ω exhibiting a failure of
property φ, the justification function Jω,φ is recursively defined on the design netlist:
Jω,φ : A × N −→ B
If signal α is relevant to the failure at time point t, then Jω,φ(α, t) is true. Otherwise, the signal value is a don't-care. The computation of function Jω,φ starts by considering that the property monitor signal αφ is relevant whenever its value is true. Then, it traverses the netlist backwards and, for each gate, decides which of its inputs are relevant. For instance, if the output of an AND-gate is 0 and one of its inputs is 0, then this input is relevant. When the traversal reaches a relevant register, it continues on its inputs, but now justifying the previous time point. The justification process terminates when the traversal reaches the primary inputs at the first time point.
    V̂alω : A × N −→ B ∪ {−}
    (α, t) ↦ Valω(α, t) if Jω,φ(α, t); − otherwise
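A sketch of this backward traversal, reusing the netlist encoding of the Chapter 2 evaluator sketch (illustrative only; the actual algorithm of [61] operates on a different representation and handles more cases):

    def justify(types, pred, val, root, t_fail):
        """val: (signal, t) -> Boolean, as computed by the evaluator.
        Returns the set of (signal, t) pairs relevant to the failure;
        pairs never reached keep the don't-care value '−'."""
        relevant, todo = set(), [(root, t_fail)]   # monitor output is relevant
        while todo:
            a, t = todo.pop()
            if (a, t) in relevant:
                continue
            relevant.add((a, t))
            k = types[a]
            if k in ("in", "zero", "one"):
                continue                           # traversal stops here
            if k == "seq":                         # justify the previous cycle
                if t > 0:
                    todo.append((pred[a][0], t - 1))
            elif k == "and" and not val[(a, t)]:   # one 0-input suffices
                todo.append((next(p for p in pred[a] if not val[(p, t)]), t))
            elif k == "or" and val[(a, t)]:        # one 1-input suffices
                todo.append((next(p for p in pred[a] if val[(p, t)]), t))
            else:                                  # all inputs are relevant
                todo.extend((p, t) for p in pred[a])
        return relevant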
*SYNOPSYS CONFIDENTIAL*
χω ⊆ Ω − {ω}
Checking the property under the blocking condition will result in a new CEx. In order to generate many CEx, a new blocking condition is iteratively created from the latest obtained CEx, and added to the global set of assumptions χall. Let χall,i be the set of assumptions at step i, and assume that ωi is the CEx of M|χall,i ⊭ φ. Then,
² For ease of notation, we shall not distinguish a property and the set of executions on which the property is satisfied.
we can define the successive blocking conditions, starting with the original user-defined
assumptions, and terminating at step k when M|χall,k φ:
    χall,0 = χuser
    ∀i ∈ N, i < k: χall,i+1 ⊆ χall,i − {ωi}
    Litω : A × N −→ P
    (α, t) ↦ α if Valω(α, t); ¬α otherwise
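To illustrate how Litω is used, the following sketch (ours; the encoding of executions as dictionaries is an assumption for illustration) builds a blocking condition from the justified literals of a counter-example, so that the next model checking run must exhibit a different configuration:

    def blocking_condition(justified, val):
        """justified: set of (signal, t) pairs relevant to the failure;
        val: (signal, t) -> Boolean. The new assumption is the negation of
        the conjunction of the justified literals."""
        lits = {(a, t, val[(a, t)]) for (a, t) in justified}
        def blocks(execution):     # execution: (signal, t) -> Boolean
            return not all(execution[(a, t)] == b for (a, t, b) in lits)
        return blocks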
*SYNOPSYS CONFIDENTIAL*
3.3.3 Algorithm
*SYNOPSYS CONFIDENTIAL*
Signals CLK1 and CLK2 are defined as primary clocks, with respective periods of 10 ns and 15 ns. No other assumption is given on the design (in particular, no constants on the other inputs). The asynchronous reset signal rst, which is used in the counter, stays unconstrained.
*SYNOPSYS CONFIDENTIAL*
Interestingly, while the reset was left unconstrained, our clock analysis found that it must be disabled for modes 3, 4 and 5, and that its value does not matter for modes 1 and 2. Table 3.1 represents these modes with their respective source clock, shaping factor and configuration values. Indeed, within this clock network, the reset only affects the counter, which is not selected in mode 1 and always enables the division in mode 2. This observable non-determinism in the reset values leads the verification engineer to provide a correct reset setup. Hence, this clock analysis yields a complete setup not only of the clock tree, but also of the reset.
Table 3.1: Operation modes of the derived clock clkgate

    Derived: clkgate       #1     #2     #3     #4     #5
    Source                 CLK1   CLK2   CLK2   CLK2   CLK2
    Frequency              -      1/2    1/4    1/6    1/8
    Configurations: CFG    -      00     01     10     11
                    SEL    0      1      1      1      1
                    EN     1      1      1      1      1
                    RST    -      -      0      0      0
Using these results, an engineer can review the potential modes of the design, and select one (or more) for further verification steps. For instance, if one crucial mission mode to check is mode 2, then CFG, SEL and EN will be assumed to be the constants "00", 1 and 1, respectively. In parallel, an overview of the clock network is given in Figure 3.4. Note that the table of operation modes and the functional schematics can be directly used to create or review the specification of the clock network and of the design modes.
[Figure 3.4: Functional overview of the clock network — CLK2 feeds a divider (DIV/x, x ∈ {2, 4, 6, 8}, selected by CFG) producing CLKDIV; a multiplexer driven by SEL chooses between CLK1 and CLKDIV (CLKSEL), which is then gated by EN into CLKGATE.]
    Derived: clkout           #1     #2     #3     #4     . . .  #6     #7
    Source                    clkin  clkin  clkin  clkin  . . .  clkin  clkin
    Frequency                 1/2    1/3    1/4    1/5    . . .  1/32   1/64
    Configuration    div      1      2      3      4      . . .  31     0
    signals          rst_n    1      1      1      1      . . .  1      1
For another experiment at the hierarchical level of the CCU module, the primary clock
is considered to be the output of the PLL/VCO module (signal l2clk). The propagation
of this clock leads to gathering 9 configuration signals and 457 derived clocks, among
which the output of the ccu_divider. In the verification of this signal (ccu_io2x_out), 136
modes are generated, all exhibiting a division by 2 of the primary clock. After running
the Quine-McCluskey optimization, it appears that any combination of the configuration
signals leads to a correct clock behavior. Hence, the specification has been formally proved
to correctly define the behavior of clock ccu_io2x_out: a deterministic division by 2 of
clock l2clk (see Figure 3.5).
[Figure 3.5: The ccu_divider inside the CCU module — clock l2clk is divided by 2 into ccu_io2x_out, under configuration signals div and rst_n.]
for the whole system among its 38 primary clocks and 17 primary resets. Because this
control module was reused from previous projects, and because an exhaustive specification
was not available, the clock configurations and clock tree components were not perfectly
understood by the verification engineers.
[Figure 3.6: Industrial CPU subsystem — the CPU exchanges data with the rest of the SoC through FIFOs (DATA/REQ/ACK), receives clocks and resets from a CLK/RST controller fed by a PLL, and handshakes with a low-power management block.]
Primary clocks and resets are defined on the primary inputs of the design. The clock propagation leads to a global clock path of 450 sequential elements (most of them covered by 2 mission clocks). 134 configuration signals are identified, including both primary inputs and internal signals. Interestingly, 109 of these signals are the outputs of only 10 different modules. After review with designers, these modules were confirmed to be configuration FSMs. The remaining signals are either primary inputs or internal registers inside the clock control module. These observations give strong confidence in the identified clock architecture.
[Figure: Clock network around CLKCPU — CLK1 (through CLKINT), CLK3, and CLK2 divided by 2 are multiplexed via CLKSW to produce CLKCPU.]
    Derived: CLKCPU          #1     #2     #3     #4
    Source                   CLK1   CLKSW  CLKSW  CLKSW
    Frequency                -      -      -      -
    Configuration    GE      -      1      1      1
    signals          O2      1      0      0      0
                     O39     -      0      0      0
                     O40     -      0      -      -
                     O30     -      -      0      0
                     S0      -      -      1      -
                     EM      -      -      -      1
Two kinds of configuration signals are observed. Selecting signals (like O2) lead to propagating one clock or another, depending on their values. Enabling signals (like GE, O39, O40, O30, S0, EM) must be set to a constant value in order to obtain a correct clock behavior. Among these, some signals are identified by the designer as resets (like O39 and O40), and must be constrained as such.
Using the full table of modes and the clock network schematics, the user is able to select one or multiple modes, hence generating the constraints for a correct and complete design setup.
3.6 Conclusion
The main contribution of this chapter is a fully automatic flow to formally analyze clock
networks. We provide algorithms to detect the major components of a clock network and
infer its functional modes. This work is derived from a broader work in the context of the
verification of so-called generated clocks in the Synopsys Design Constraints (SDC) format,
which is currently undergoing a provisional patent application by Synopsys Inc. [74].
*SYNOPSYS CONFIDENTIAL*
Using these modes and a structural analysis of the netlist, a functional overview of
the whole clock network is reported. Thanks to the schematic overview and the table
of modes, designers may notice false or redundant logic, and optimize the clock network.
Verification engineers can then use these results to provide the complete and realistic setup
to the design, and proceed with further formal verification checks. Also, these reports can
be made available as a functional specification of the clock network.
CHAPTER 4
Avoiding State-Space Explosion

4.1 Context
A common workaround is to run the model checker on just a small but relevant part of the design: the synchronizer boundary. However, this approach comes with the following issues:
• First, synchronization protocols are often split between modules, so the synchronizer
boundaries are not contained in a small hierarchical module;
• Then, the verification engineer needs to have a very good understanding of the
underlying design, which is not realistic for large RTL models;
• Also, strong time-to-market constraints do not allow a manual, labor-intensive selection of appropriate abstractions for each property;
• Finally, even with such a high manual effort, a conclusive result cannot be guaranteed.
This abstraction of the design space uses a technique called localization reduction, pioneered by Kurshan in the 1980s (and eventually published in 1994 [48]). It was also considered by Clarke et al. [20, 21] in the context of over-approximating abstractions defined through state-space partitioning. Using counter-example guided abstraction refinement (CEGAR), successive localization abstractions lead to an identification of the boundaries that are relevant to the property logic. The works by Andraus et al. [4, 5] propose a similar approach for data-paths in hardware designs. Their abstractions are obtained by replacing data-path components with uninterpreted functions, which, in turn, requires a more powerful model checker based on SMT. To verify CDC protocols, Li and Kwok [52] described a flow using abstraction refinement along with synthesis to prove some properties, but the underlying techniques were not explained in detail. After running their flow, properties that are still inconclusive are promoted to the top level, which requires the user to apply another methodology to proceed with the formal verification.
In the context of CDC verification, other approaches have been researched recently. Burns et al. [15] proposed a novel verification flow using xMAS models. While the verification of the synchronizer protocol is correct, the user needs to define its boundaries, which is not scalable. Similarly, Kebaili [43] proposed to push some workload from the functional to the structural checks: the main control signals of the synchronizer are structurally detected, and this information is used to create more precise assertions. The properties to be verified would only rely on these control signals (with a handshake-based protocol), hence avoiding the state-space explosion caused by the design complexity in the data path. However, it relies on the correct and exhaustive identification of a synchronizer "meta-model". To the best of our knowledge, the practical feasibility of this identification has neither been proved nor implemented in industrial CDC tools.
• Static signals:
Usually set up during an initialization phase, they remain constant during all executions (e.g., scan or test enables).
• Quasi-static signals:
Their values change from time to time (e.g., for dynamically switching between
power modes), but they remain mostly constant. In our context, these signals can
be considered constant as we verify the design in a static mode.
• Protocol signals:
Their behaviors are dependent on the values of other signals.
For instance, in the industrial CPU subsystem presented in Section 3.5.3, quasi-static
configuration values are internally set by the software program which is expected to run on
the CPU. We observe that verification engineers often do not constrain such configuration signals because it is not required outside of functional checks. Since the model checker has no information about the validity of a software program, it exhaustively explores all possible states and transitions. This often leads to a timeout or, at best, to spurious results. By pruning these spurious behaviors from the state machine, realistic results might actually be reached.
However, providing configuration information involves complex and tedious work from the verification engineer. Even after providing it, a model checker may still not reach a conclusive result, due to the design complexity. Tackling the state-space explosion then requires considering both the complexity issue and the missing assumptions on configuration signals. Our goal is thus to automate both the boundary identification and the configuration debug, in order to reach realistic conclusive results.
Cutting signals out of the property's cone of influence reduces the size of the reachable state space. The number of variables (as mentioned in 2.2.3) is greatly decreased, hence reducing the complexity of the state-space exploration for the model checker. The new cone of influence of the property is called the focus, and represents an abstraction of the complete state space.
The rationale for this notion of abstraction is that, in practice, all the relevant control
logic for a given CDC is implemented locally. Thus, properties requiring the correctness
of synchronizing protocols should have a small focus that suffices to either prove the
property or to reveal bugs.
Figure 4.1 illustrates the abstraction process for a correct but hard-to-prove property. By removing parts of the circuit from the cone of influence of the property (keeping only A1 from the COI), and leaving the boundary signals free, the set of reachable executions is enlarged (i.e., it represents an over-approximation). As a result, executions in which the property can fail, initially unreachable, may become reachable (A1). The goal is to find the ideal abstraction which prunes away a sufficient part of the design (Asuff) without spuriously making any such execution reachable.
[Figure 4.1: Abstraction of the state space — successive abstractions A1, A2, A3 around the property's cone of influence (COI); too small an abstraction may make a spurious ERROR execution reachable.]
When the abstract model check fails, structural heuristics identify relevant signals for the user to review. Either new assumptions are made, or the abstraction is refined, or the counter-example is concretizable, i.e. also valid for the full design, and the flow terminates with result "fail". In contrast to fully automatic CEGAR approaches, the user here influences the refinement process. We therefore call our approach User-aided CEGAR (UCEGAR). Figure 4.2 gives an overview of the semi-automatic algorithm in the context of the overall methodology.
[Figure 4.2: UCEGAR flow — the abstraction D# is model checked under assumptions χ; a proof is reported directly, while an abstract counter-example ω is justified (Jω) and analyzed; the user review either adds assumptions, triggers an abstraction refinement, or confirms a concretizable failure (FAIL).]
The set Cut contains cut-points; Ref represents the signals to be refined; χuser is the list of assumptions that the user originally defined. We assume that constant propagation has already been processed, and that signal classification returns the set Adata,φ.
Starting from the property output signal, function refine runs the traversal defined in Algorithm ?? to identify the first cut-points (line 4). Function abstract creates a duplicate design from D where the type of these cut-points is replaced by Tin (line 5). The resulting design model D# is our focus, on which we run the formal check (line 6). In the first round, only χuser is considered in the list of assumptions χall that is given to the model checker. As the abstraction is an over-approximation, a proof which is valid for D# is also valid for D, and is immediately reported (line 8).
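The loop can be summarized by the following Python sketch (a simplification of the algorithm described above; all function parameters are illustrative placeholders for the components named in the text, not the tool's actual interfaces):

    def ucegar(design, prop, chi_user, refine, abstract, model_check, user_review):
        cut = refine(design, prop, set())            # initial cut-points
        chi_all = set(chi_user)
        while True:
            d_abs = abstract(design, cut)            # cut-points become free inputs
            verdict, cex = model_check(d_abs, prop, chi_all)
            if verdict == "proof":
                return "PROOF"                       # over-approximation: carries over to D
            chi_add, ref = user_review(cex)          # justified signals shown to the user
            if not chi_add and not ref:
                return ("FAIL", cex)                 # counter-example is concretizable
            chi_all |= chi_add                       # assumptions kept for next rounds
            cut = refine(design, prop, cut | ref)    # move the abstraction boundary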
*SYNOPSYS CONFIDENTIAL*
The failure is concretizable on the whole design when no signal is to be refined and the user does not add any new assumption (line 24); a failure is then immediately reported. Otherwise, the new assumptions χadd are added to the set of user assumptions χuser, and will be considered in the next rounds (for this property and the next ones).
*SYNOPSYS CONFIDENTIAL*
The assumptions proposed by our heuristics are combinational and structurally close to the CDC control logic, which makes them easy for the user to review. Hence, they do not over-constrain the design, but ensure that it does not exhibit spurious behavior. On the other hand, cut-points are conservative in the sense that they lead to an over-approximation that preserves all safety properties.
This hardware design includes a FIFO similar to the one presented in Figure 1.11 on page 9. To mimic a state-space explosion on the DATA_IN and WRITE paths of the source domain, an FSM was implemented with a self-looping 128-bit counter, along with some non-deterministic control logic. Also, the source and destination clocks are enabled by sequential clock-gates, controlled by two independent non-deterministic inputs.
This design is parameterized by the width of the data being propagated, and by
the depth of the FIFO. By varying these two size parameters, we increase the design
complexity and analyze the corresponding performance of the methodology.
4.5.2.2 Results
Using the industrial CDC tool, three formal properties were extracted:
• A data stability property is created on signal DATA_CDC.
• Two coherency properties are extracted on the address buses after synchronization, one on RD_PTR and one on WR_PTR. Indeed, the write and read pointers are synchronized with multi-flops, and should thus follow Gray encoding (see Section 1.2.3 on page 6).
A first observation is that the coherency properties are proved in less than a second in all four schemes and all variations of the design. This is not surprising, considering that the Gray encoding implemented in this design does not depend on any non-deterministic control logic. In the following, we focus on the data stability property (see the results in Table 4.1).
The standard scheme is unable to prove the property in any of the 35 variations of the design (Column "Standard"). Using the simple CEGAR approach, the property is proved in all variations within 4 to 15 minutes (Column "CEGAR"). Interestingly, the proof runtime is stable when the FIFO depth is fixed and the data width increases. By looking at the last abstraction exercised, we notice that DATA_IN is always abstracted out: its value does not depend on the source logic. Hence, heuristics from the proof engine inferred that the proof does not depend on the data value, which makes the analysis as simple for 8 bits as it is for 128 bits. Actually, even if the source logic of the data were far more complex, the CEGAR result would be the same.
Along the UCEGAR run, the enables of the clock-gates are justified and proposed for review to the user. Because a non-deterministically enabled clock is not a realistic design behavior, we decide to accept the assumption that these signals should be constant '1'. As a result, the stability property is solved in all 35 variations of the design within 10 seconds each (Column "UCEGAR"). As with simple CEGAR, and contrary to the standard scheme, the complexity of the data source logic is irrelevant for the proof.
Interestingly, even when applying the inferred enabling assumptions to the standard scheme, not all properties can be solved (Column "Standard w/ assumptions"). Also, in this case, by comparing with column "UCEGAR", we notice that the runtime is always higher than when using both the inferred assumptions and CEGAR. This observation, along with Figure 4.3, points out the importance of both using abstraction and involving the user in order to reach a conclusive result. Note that the workload on the user was small: only two clock controls had to be assumed to be constant '1'.
    FIFO depth  Data width  Standard  CEGAR  UCEGAR  Standard w/ assumptions
    3           8           T/O       389    7       22
                16          T/O       390    7       35
                32          T/O       392    7       66
                64          T/O       390    7       870
                128         T/O       391    7       T/O
    4           8           T/O       592    6       15
                16          T/O       591    6       28
                32          T/O       594    6       57
                64          T/O       593    6       145
                128         T/O       594    6       243
    5           8           T/O       641    7       14
                16          T/O       651    7       53
                32          T/O       641    7       69
                64          T/O       640    7       180
                128         T/O       693    7       374
    6           8           T/O       558    7       13
                16          T/O       558    7       55
                32          T/O       563    7       62
                64          T/O       563    7       203
                128         T/O       562    7       414
    7           8           T/O       574    7       10
                16          T/O       574    7       49
                32          T/O       575    7       68
                64          T/O       574    7       150
                128         T/O       575    6       841
    8           8           T/O       589    7       11
                16          T/O       590    7       36
                32          T/O       579    7       60
                64          T/O       580    7       150
                128         T/O       580    7       463
    9           8           T/O       868    9       14
                16          T/O       863    9       43
                32          T/O       868    9       74
                64          T/O       864    9       210
                128         T/O       865    9       475
    TOTAL PROVED            0         35     35      34

Table 4.1: Proof CPU runtime (in sec) on the asynchronous FIFO
[Figure 4.3: Proof CPU running time in seconds (up to the 1,000 s timeout) versus data width (2^3 to 2^7), for the Standard, CEGAR, Standard w/ assumptions, and UCEGAR schemes.]
The second case study is the industrial hardware design already introduced in Section 3.5.3 (page 35). The setup of the clock and reset control module was made by the verification engineers. It is configured for a structural CDC analysis: all clock multiplexers are constrained so that one single clock propagates to each register. Also, static primary inputs were constrained to the values given in the design specification (subsystem configuration). However, many configuration signals (such as clock-enable signals, or internal resets which are controlled by the CPU software) are not constrained. As explained in Section 2.2.4 (page 24), all external resets are exercised to define a global initial state.
Since the design has a Globally Asynchronous Locally Synchronous (GALS) intent,
CDC signals are always synchronized in the destination module. Figure 3.6 on page 36
gives an overview of some synchronizations around the CPU. Data communication with
the CPU environment (the rest of the SoC) is synchronized by a customized FIFO with a
4-phase protocol based on the one described in Section 1.2.4 (page 8), with additional low
power and performance optimizations. Only one communication is shown in Figure 3.6,
among the ten in each direction. The figure also shows the communications with the clock
and reset controller, and the handshake with the low power management block. Note that
the CPU is one central module which, due to its complexity, is likely to cause a timeout
in the model checking algorithm when considered in its entirety.
Note that many synchronizers (mostly FIFO protocols) are split between modules of this subsystem. Consequently, the boundaries of each synchronizer are not trivial to define, and we cannot proceed with a module-by-module CDC analysis. Working at this hierarchy level is thus particularly relevant for us.
An industrial CDC tool is used to structurally analyze the industrial design. The structural analysis identified several thousand synchronizers, most of them multi-flops which do not need a functional check. It also extracted 78 stability and 47 coherency properties. Each extracted property is verified in the same four schemes presented previously.
Table 4.2: Model checking results for the stability and Gray-encoding properties

                                Standard  CEGAR  UCEGAR   UCEGAR   Standard
                                                 (run 1)  (run 2)  w/ assumptions
    Stability  # Proof          29        31     52       78       43
               # Fail           0         0      26       0        0
               # Inconclusive   49        47     0        0        35
               CPU time [min]   771       734    436      31       557
    Gray-enc.  # Proof          11        42     42       47       11
               # Fail           0         0      5        0        0
               # Inconclusive   36        5      0        0        36
               CPU time [min]   540       86     27       15       540
Table 4.2 shows the results for model checking the stability and Gray-encoding properties. Without any automatic refinement, the standard scheme can only prove 40 out of 125 properties (Column "Standard"). After increasing the timeout limit to several hours, the same results are obtained. Using automatic refinement (the CEGAR scheme), 33 more properties can be proved (Column "CEGAR"). Also, it should be noted that all proofs from the standard scheme are confirmed by the CEGAR scheme. CEGAR proves to be particularly efficient for proving Gray-encoding properties, as the encoding logic is generally local to the synchronizer.
The most striking observation, however, is that during the first UCEGAR run, 40
setup assumptions are automatically inferred and are all easily accepted. These include
global interface enables (scan or test enables, internal configuration signals, ...), and also
internal software resets and clock-gate enables which were missing in the original setup.
It leads to 94 proved properties and provides counterexamples for the remaining 31. Note
that in those abstract counterexamples, many irrelevant signals are automatically hidden
using the justifiable subset, which makes debugging easier.
By reviewing them, we observe spurious behaviors in the handshakes: on each inter-
face, a global enable signal is toggling. Indeed, in some cases the WRITE or READ of the
FIFO represents an information coming from the CPU, and would depend on a software
execution. When these signals are abstracted out, they take random values which do not
follow the handshake protocol, hence creating a failure. After consulting the designers,
we decide to constrain these signals to a realistic behavior. Here, the worst case would be
4.6. CONCLUSION 51
to set them to ’1’, which would mean the CPU always transfers data. Twenty-two Boolean
assumptions are then added to enable the handshakes. We stress that no deep
design knowledge is needed during this process, and that the assumptions represent a
realistic design behavior.
With these new assumptions, the second UCEGAR run is able to conclude all 125
properties correctly. Compared to the fully automatic approaches, the final UCEGAR proof
runtime is accelerated by more than 20x. In fact, the most difficult property concludes in
only 7 minutes.
Finally, the last column shows that having the proper assumptions is not sufficient
to get proofs; the efficiency of UCEGAR indeed relies on the combination of automatic
CEGAR and manual assumption classification.
Regarding the size of the abstractions: on the full design, some properties have a
cone-of-influence of more than 250,000 registers. Interestingly, our variant of CEGAR is
able to find sufficient abstractions containing only up to 200 registers. This ratio confirms
our assumption that only the local control logic has a real influence on the correctness of
a CDC property.
Overall, a relevant metric to score the different flows would be the total time spent by
the verification engineer, starting with the design setup and ending with conclusive
results for all properties. It would allow us to conclude on the complexity and
usability of the different methodologies, for instance the manual extraction and
constraining explained in Section 4.1.2. However, this time depends very much on the design
complexity, reuse, and user insight. Such an experiment would assume the availability of
two concurrent verification teams on the same design, an investment that could not be
made by our industrial partners.
*SYNOPSYS CONFIDENTIAL*
4.6 Conclusion
UCEGAR is a complete formal verification flow for conclusively proving or disproving
CDC assertions on industrial-scale SoC hardware designs. This work has been presented
in the industrial conference SNUG [44], in the international conference VLSI-SoC [67] and
was extended as a chapter for the book edition of the conference [68]. The key idea is
to use counter-example-guided abstraction refinement (CEGAR) as the algorithmic back-
end, while the user influences the course of the algorithm based on information extracted
from intermediate abstract counterexamples.
The efficiency of our approach has been demonstrated on an industrial SoC design
which was persistently difficult to verify: prior approaches required manually extracting
the cone-of-influence of the synchronizers, which resulted in tedious (and costly) work
for verification engineers. In contrast, our new methodology allowed the full verification
without requiring any deep design knowledge. This very encouraging practical experience
suggests that we identified an interesting sweet-spot between automatic and deductive
verification of hardware designs. On the one hand, it is a rather easy manual task to accept
or reject assumptions that refer to single signals. On the other hand, this information can
be crucial to guide an otherwise automatic abstraction refinement process.
Another positive side-effect of our methodology is that it gradually results in a functional
design setup. Note that most of our predefined assumptions do not depend on a
particular property but provide a general design setup. Therefore, they remain globally
valid. This not only speeds up the overall CDC verification time, when assumptions
are reused while verifying multiple properties; it also helps further functional verification
steps in the VLSI flow.
CHAPTER 5
Fixing Under-Constrained Designs

5.1 Context
All these properties use values at the current cycle in order to deduce a behavior at the
next cycle. When this deduced behavior is an input, the property is an assumption that
models the environment; in the example, properties (2) and (4) are assumptions. When
this deduced behavior is an output, the property is an assertion that has to be verified; in
the example, properties (1) and (3) are assertions. Using these four properties, a model
checker will be able to verify the functionality of the handshake.
Figure 2.3 on page 15 displays the state diagram of Block receiver. In the absence of the
two assume properties of the protocol, all transitions are possible. If the two assumptions
are added, the brown dotted transitions are removed. For instance, it becomes impossible
to directly reach S3 from S1.
If assumptions 2 and 4 are given to a formal engine, then assertions 1 and 3 are proven.
If only assumption 4 is given, then assertion 1 fails after reaching state S2 from S3.
Among all counter-examples that the formal engine can generate, three are shown
in Figure 5.2. Note that this is theoretical, as current formal engines only report one
counter-example (usually the one with the smallest sequential depth).
After debugging these counter-examples, the designer will infer that the request signal
is not following the protocol specification. The verification engineer will then have to
figure out how to write the missing assumption.
Figure 5.2: Three counter-examples (a), (b) and (c), shown as waveforms of req and ack
against clk over cycles 0 to 4.
and apply the two-player game procedure by which the automaton built for the
environment elaborates a strategy (a sequence of inputs) to prevent the system from satisfying
the specifications, a procedure that is known to time out on large benchmarks (its time
complexity is cubic). As a matter of fact, no execution time is provided on the practical
results of the above references. We differ in the fact that we work on an already designed
system and a failing assertion, while the above consider no pre-existing design and a set
of properties.
Property mining techniques have been successful in generating formal properties from
a model, for verification purposes. They may be used to help identify non-trivial corner cases
and improve the coverage of regression tests. In the GoldMine tool [35, 82], a large set
of patterns is simulated on a design, which generates thousands of traces of Boolean
signals. Property mining tools extract from the traces thousands of assertion candidates
which are model checked on the design to filter out spurious and failing assertions. Static
analysis helps improve the percentage of mined assertions that pass the model checking
step by computing the cones of influence.
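The general idea can be sketched for single-cycle implications as follows (a minimal
illustration of the trace-mining principle, not GoldMine's implementation):

    def mine_implications(traces, signals):
        # traces: list of {signal: [Boolean value per cycle]} from random simulation.
        candidates = {(a, b) for a in signals for b in signals if a != b}
        for tr in traces:
            for a, b in list(candidates):
                # Drop "always a -> b" as soon as one cycle violates it.
                if any(tr[a][t] and not tr[b][t] for t in range(len(tr[a]))):
                    candidates.discard((a, b))
        return candidates   # the survivors are then model checked on the design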
Mining of properties built over arithmetic variables is reported in [8, 26]. The method
repeatedly invokes the Daikon tool [79] that returns relations over arithmetic variables at
a time step. Then the authors build temporal properties according to predefined patterns
based on the implication, next, until and alternating operators. The complexity is cubic
in the number of variables, and proportional to the length of the trace and to the number
of considered implication antecedents, which restricts the tool to a limited set of
traced variables.
The work of Keng et al. [45, 46], like ours, addresses the hypothesis that an assertion
fails in a correct design due to missing assumptions, and aims at automatically suggesting
some assumptions to the designer as an aid to elaborate the missing one. Their flow
is based on two successive iterations: the first one generates multiple counter-examples
and multiple corrective assumptions on the design inputs. The second one prunes the
generated assumptions to reduce the number of those effectively returned to the user.
The technology underlying Keng’s method is entirely built over a SAT engine for the
generation and filtering of candidate assumptions, at the cost of processing time. The
generated assumptions are single-cycle, and thus cannot express temporal causality between
distinct signals, a must for assumptions on protocols.
Our main contribution is an efficient algorithm which automatically infers missing
assumptions in the context of a property failure. First, we generate multiple distinct
counter-examples which are guaranteed not to be mere cycle-delayed copies of one
another (in contrast to the results discussed in [46]). Then, we extract common root causes
of the failure from the set of counter-examples, using mining techniques combined with a
structural analysis of the netlist. Finally, we generate realistic temporal assumptions for
the user to review.
With this approach, the three blocking conditions corresponding to the three CEx on
Figure 5.2 are:
(a) assume ! {! req; req; ! req; req}
For instance, in CEx (a) and (b) of Figure 5.2, the value of req at cycle 3 is irrelevant
to the property, so both CEx exhibit the exact same failure. The corresponding blocking
condition will not block this value, hence avoiding the generation of CEx (b):
one. Since the counter-examples are of finite length, an equivalent execution is eventually
generated without any stuttering. Thus, two executions ω and ω′ are said to be equivalent
if they can be reduced to the same non-stuttering equivalent execution.
During the generation of counter-examples, we want to avoid generating many
equivalent CEx because they do not reveal a new cause of failure. However, gathering a few
equivalent CEx gives significant information, as it implies that the stutter is irrelevant to
the failure. Let maxTD be the maximum temporal depth of the assumptions we plan to
generate. For instance, in the case of the handshake, all assumptions are 2 cycles deep.
Whenever maxTD CEx have been generated with the same stutter, we create a blocking
condition to preclude further executions with the same stutter. Assuming that the stutter
is detected at time point k, the blocking condition becomes¹:
χω = ! { L0 ; . . . ; Lk[+] ; . . . ; L|ω|−1 }    with    Lt = ∧_{i=1}^{|Ain|} Lit_ω^{dω(Ai, t)}
For instance, in Figure 5.2, cycles 1 and 2 of CEx (c) repeat cycle 1 of CEx (a).
The corresponding blocking condition then becomes:
To improve the performance of the blocking condition in the formal check, a custom
property monitor is created. Each cycle to block is represented with a monitor state.
When the final state is reached, it means that the sequence of inputs has been identified.
Then, the monitor outputs false and the assumption is violated.
Figure 5.3 shows the example of a monitor blocking CEx (a) of Figure 5.2. Here,
the states of the monitor are one-hot encoded. At time t, when the inputs have the
expected value, ANDt is true (here, there is only one input: req). If the input sequence
has been identified up to time t, then SEQt is true. Whenever both SEQt−1 and ANDt
are true, we continue to the next state. To handle the stuttering at time t, if both
SEQt and ANDt are true, we stay in this same stuttering state.
Figure 5.3: Monitor blocking CEx (a) – one-hot states SEQinit, SEQt0, STUTt1 and
SEQt1, chained by the conditions ANDt0, ANDt1 and ANDt2; the final transition raises
the blocking output αχω.
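As an illustration, the following minimal Python sketch emulates such a monitor; it
generalizes the one-hot states to a set of active positions, and the function blocked as
well as the example traces are ours, not the tool's implementation. The blocked sequence
is the one of CEx (a), with a stutter allowed on its second cycle:

    def blocked(trace, expected, stutter=frozenset()):
        # Returns True once `trace` (req values per cycle) has realized the
        # blocked sequence `expected`; indices in `stutter` may repeat ([+]).
        active = {0}                        # generalization of the one-hot states
        for v in trace:
            nxt = set()
            for p in active:
                if p < len(expected) and v == expected[p]:
                    nxt.add(p + 1)          # SEQ(t-1) and ANDt: advance
                    if p in stutter:
                        nxt.add(p)          # SEQt and ANDt: stay (stutter)
            if len(expected) in nxt:
                return True                 # final state: assumption violated
            if not nxt:
                return False                # the sequence can no longer match
            active = nxt
        return False

    seq = [0, 1, 0, 1]                      # !req; req; !req; req (CEx (a))
    assert blocked([0, 1, 0, 1], seq, stutter={1})      # CEx (a) itself
    assert blocked([0, 1, 1, 0, 1], seq, stutter={1})   # stuttered variant, as in CEx (c)
    assert not blocked([1, 0, 1, 0], seq, stutter={1})  # a different execution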
¹ In PSL, α[+] denotes one or more repetitions of α.
5.2.3 Algorithm
The flow which generates multiple CEx is presented in Algorithm 1. During each round,
the model checking engine checks the assertion with a set of assumptions χall . If a CEx ω
is found, the blocking condition χω is created on the justified CEx. Then, the next
round starts with an updated set of assumptions. The algorithm terminates either when
assertion φ is proved on M, or when the number of CEx reaches a predefined limit
maxLimit. Then, we obtain a set of CEx W.
Function check(M, φ, χall ) calls a formal engine on the design model M to verify an
assertion φ with given assumptions χall . It returns a CEx if the assertion fails.
Function justify(ω) extends Jω,φ on all signals and time points of an execution ω. It
returns a justified execution, as defined in Section 3.3.1 (page 31).
Function disable(ω, W) returns the blocking condition corresponding to a CEx ω, after
considering stuttering executions in W.
Note that when a proof is obtained (lines 5–6 of Algorithm 1), it does not mean that
we have reached a corrected model; it only means that no more CEx can be generated.
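The loop can be summarized by the following minimal Python sketch (illustrative only;
check, justify and disable stand for the functions defined above and are supplied by the
formal engine and the justification procedure):

    def generate_cex(M, phi, chi_user, maxLimit, check, justify, disable):
        # Returns the set W of justified counter-examples for assertion phi.
        W = []
        chi_all = set(chi_user)             # user assumptions + blocking conditions
        while len(W) < maxLimit:
            omega = check(M, phi, chi_all)  # model check phi under chi_all
            if omega is None:               # proof: no further CEx can be generated
                break
            omega = justify(omega)          # keep only the justified signals/times
            W.append(omega)
            chi_all.add(disable(omega, W))  # block stuttering-equivalent reruns
        return W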
Note that, in this section, we work on signals which are in the same clock domain.
Indeed, all the protocol assumptions that we are aware of in modern designs only refer to
one single clock. Thus, the missing assumptions we are interested in contain signals from
a single domain.
TC ⊆ { T : A∗ → P }
Here, we limit our scope to the set of templates TC described in Column “Assumption
Template” of Table 5.1. This scope is motivated by the handshake specification, and
corresponds to the common templates of property mining.
For brevity, Table 5.1 only shows the positive occurrences of the parameters. The
table we use in practice holds all combinations. For instance, the second template of the
table can be instantiated as:
• λα1 α2 . always α1 → α2
• λα1 α2 . always α1 → ¬α2
• λα1 α2 . always ¬α1 → α2
• λα1 α2 . always ¬α1 → ¬α2
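The expansion into all polarity combinations can be sketched as follows (a minimal
Python illustration; the textual rendering of the properties is ours):

    from itertools import product

    def instantiate(a1, a2):
        # All four polarity variants of the template: always α1 → α2.
        lit = lambda sig, pos: sig if pos else "!" + sig
        return ["always %s -> %s" % (lit(a1, p1), lit(a2, p2))
                for p1, p2 in product([True, False], repeat=2)]

    print(instantiate("req", "ack"))
    # ['always req -> ack', 'always req -> !ack',
    #  'always !req -> ack', 'always !req -> !ack']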
In contrast to the usual property mining which exhibits circuit properties that the
user is looking for, the kind of mining we are performing on CEx exhibits behaviors that
should be forbidden. As a consequence, we wish to find in the CEx some behaviors to be
negated by the missing assumption. In other words, the properties we mine are the
negations of the templates of Table 5.1.
We therefore give a 1-to-1 mapping between each assumption template TC of Table 5.1
and a mining template TM under the format given in Table 5.2, such that:
Mining Template TM
λα1 . eventually ¬α1
λα1 α2 . eventually α1 && ¬α2
λα1 α2 . eventually α1 && next ¬α2
λα1 α2 α3 . eventually (α1 && α2 ) && ¬α3
λα1 α2 α3 . eventually (α1 && α2 ) && next ¬α3
λα1 α2 α3 . eventually (α1 && next α2 ) && next ¬α3
For instance, all counter-examples of Figure 5.2 from Section 5.1.2 exhibit:
Since we are checking for behaviors relevant to the assertion, our property mining
runs on the list of justified counter-examples W. After mining for behaviors following a
template TM ∈ TM on a CEx ω ∈ W, we obtain a list PωTM of mined properties.
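For illustration, the following sketch checks one mining template on a single justified
CEx, represented as a mapping from signal names to their values per cycle (the trace
data is hypothetical):

    def mine_eventually_next(cex, a1, a2):
        # Template of Table 5.2: eventually (a1 && next !a2).
        v1, v2 = cex[a1], cex[a2]
        return any(v1[t] and not v2[t + 1] for t in range(len(v1) - 1))

    cex = {"req": [0, 1, 0, 1], "ack": [0, 0, 0, 0]}   # hypothetical 4-cycle CEx
    print(mine_eventually_next(cex, "req", "ack"))     # True: candidate behavior
    # Its negation yields the candidate assumption: always req -> next ack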
From this set of mined properties, we infer the set of potentially missing assumptions
PωTC, using the 1-to-1 correspondence from TM to TC.
In our context, we aim at modeling an external module connected to the inputs
and outputs of our design. We are thus searching for missing assumptions on inputs, and
not on outputs. If a generated assumption restricts the behavior of an output, then it
is discarded. For instance, “α1 → next α2 ” is only valid if α2 is an input. Using the
reasoning explained in Section 5.1.2, we specify, for each template, which parameter needs
to be an input (see Column 2 of Table 5.1). In effect, for the restricted list of Table 5.1,
the last parameter of a template function should always be an input.
5.3.2.2 Uniqueness
Since we assume that all the generated CEx show the same erroneous behavior, the root
cause behavior can be found in every CEx. Thus, we only keep the assumptions that are
common to all the CEx:
X = ∩ω∈W ∪TC∈TC PωTC
If we assume multiple missing assumptions, then this criterion can be easily weakened by
replacing the intersection by a threshold of occurrences, e.g., properties occurring in at
least 50% of the generated CEx.
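Both the strict intersection and its thresholded variant fit in a few lines (a sketch, with
assumptions represented as plain labels):

    def common(per_cex_sets, threshold=1.0):
        # Keep assumptions occurring in at least `threshold` of the CEx;
        # threshold = 1.0 is the strict intersection defined above.
        n = len(per_cex_sets)
        counts = {}
        for s in per_cex_sets:
            for a in s:
                counts[a] = counts.get(a, 0) + 1
        return {a for a, c in counts.items() if c >= threshold * n}

    sets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}]
    print(common(sets))          # {'A'}: strict intersection
    print(common(sets, 0.5))     # {'A', 'B', 'C'}: in at least half of the CEx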
5.3.2.4 Implication
5.3.2.5 Triviality
It may happen that the user assumptions imply a generated assumption: χ ⊆ χuser.
Then, the generated assumption is considered trivially true and is discarded from X. The
same deduction applies to all assumptions χ′ ⊆ χ which are identified with the set Impl.
Conversely, if an assumption χ is not considered trivial, then all assumptions χ′ which
imply χ are also not trivial. Note that we cannot use a SAT solver, as the assumptions
are temporal. We instead use model checking, bounded by the highest sequential depth
maxTD of the assumptions.
5.3.2.6 Consistency
Each generated assumption is checked for consistency with the user-defined assumptions,
and the inconsistent ones are discarded from the set X . An inconsistent assumption is
defined by: (χuser ∧ χ) ≡ false.
5.3.2.7 Vacuity
A generated assumption may over-constrain the model, which leads to a vacuous proof of
the assertion φ that the user wants to check. To detect this, we verify that all signals
directly involved in the assertion can toggle, using the assertion:
assert eventually (α = next ¬α)
Also, if the assertion is formulated with an implication, we check if the left-hand side α1
is not always false, and if the right-hand side α2 is not always true:
φ = assert always α1 → α2
assert eventually α1
assert eventually ! α2
Assumptions complying with the vacuity criterion are discarded from the set X . The same
action holds for all generated assumptions which imply a vacuous assumption. Conversely,
if an assumption is not vacuous, then all implied assumptions are also not vacuous.
After applying the above filters on X , the assertion φ is checked on the model with each
remaining assumption χ.
M|χ ∧ χuser ⊨ φ
This step occurs at the end of all filters because it is expected to be the most expensive. If
the assertion is proved, then assumption χ stands out as a very good candidate to be the
missing one, and should be reported to the user with a high priority. The same conclusion
holds for all assumptions which imply χ.
To order the assumptions by priority, they are labeled with a quality metric µ. If the
assertion is non-vacuously proved, then the assumption gets the highest value. If the
assertion still fails, the assumption’s µ is the relative increase of the sequential depth
of the new CEx ω1 (obtained when χ ∧ χuser is considered) with respect to the original
CEx ω0 (when only χuser is considered).
µ : X → [0, 1]
µ(χ) = 1                          if M|χ ∧ χuser ⊨ φ
µ(χ) = (|ω1| − |ω0|) / |ω1|       otherwise
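As a worked example with hypothetical depths: if the original CEx has depth |ω0| = 4
and the shortest CEx obtained under χ ∧ χuser has depth |ω1| = 10, then
µ(χ) = (10 − 4)/10 = 0.6; an assumption that leaves the failure depth unchanged gets µ = 0.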
It is possible that none of the generated assumptions leads to a proof. Several
explanations exist: either our list of templates is not exhaustive, or a second assumption
is missing, or the design exhibits a bug. In the last case, the assertion should always
fail. However, if the depth of the new failure is greater than the original one, then it was
more difficult for the model checker to find it. Thus, we infer that this assumption is
somehow relevant – at least more than if the two CEx were identical – and can be helpful
in debugging the design.
After ranking all assumptions according to their metric µ, they are reported to the
user for review. The implication map is also used in the reporting, as an aid to the
user. For usability purposes, all assumptions with a quality metric of 0 (or lower than a
user-defined value) are discarded from the set X. If the user discards all proposed
assumptions because they do not fit the specification, it can be deduced that there is a real
design bug. The list of justified counter-examples and the mined properties – revealing a
potential root cause of the failure – will then contribute greatly to debugging.
Figure 5.4: Overall flow. Box 1: counter-example generation, where model checking under
blocking conditions yields either PROOF or FAIL with a CEx. Box 2: assumption
extraction, where property mining with the templates TM produces templated assumptions
TC, filtered first by syntactical checks (input condition, uniqueness, common cause) and
then by model-checking filters (implication, triviality, consistency, vacuity, property
check), before the final assumption proposal for user review.
5.4.1 Handshake
As a proof of concept, we run our flow on the handshake described in Section 5.1.2. We
use the model checker ABC [14] with the engines PDR [27] and BMC3 [10] in parallel.
The CPU is an Intel Xeon E5-2640 v4 at 2.40GHz with 250GB of memory.
Algorithm 1 generates 10 counter-examples in 2 seconds. The first two are CEx (a)
and (c) from Figure 5.2. 77 assumptions are then extracted from these 10 CEx, and
filtered with the syntactical criteria: input condition, uniqueness, common cause.
After automatically removing inconsistent assumptions and the ones leading to a spu-
rious proof, only four assumptions are left for the user to review. Note that the full flow
(CEx generation, property mining and filtering) completes within 8 seconds.
Actually, all four assumptions lead to a correct handshake protocol, but some are
more restrictive than others. Indeed, the implication set shows that assumption 5 implies
assumptions 9 and 13. We notice that assumptions 5 and 13 force the protocol to never be
idle: as soon as the acknowledge falls, a new request rises. Assumption 11 describes a
behavior similar to Assumption 9, but the control logic on the sender side is combinational
rather than sequential. As they are all realistic, we cannot automatically infer which of
the four assumptions is the missing one. Only a verification engineer, using a design
specification, can identify Assumption 9 as the correct one.
equivalent, as are the receivers. The interface between each sender and the buffer uses
a handshake protocol (these signals are prefixed with “src_”). The interface between
the buffer and each receiver also uses a handshake protocol (these signals are prefixed with
“dst_”).
Figure 5.5: GenBuf block diagram – Senders 0 and 1 feed the internal FIFO through
src_data, src_req and src_ack; the FIFO serves Receivers 0 and 1 through dst_data,
dst_req and dst_ack; the flags fifo_full and fifo_empty expose the FIFO state.
Consequently, the formal specification of the GenBuf control interface consists of four
sets of handshake properties. For instance, the handshake between the GenBuf and
Receiver 0 is specified as:
(R0.1) assume always (! dst_req[0]) → next ! dst_ack[0]
(R0.2) assert always (dst_req[0] && ! dst_ack[0]) → next dst_req[0]
(R0.3) assume always (dst_req[0] && dst_ack[0]) → next dst_ack[0]
(R0.4) assert always dst_ack[0] → next ! dst_req[0]
The FIFO can only read one data item and write one data item at a time. The GenBuf
thus receives data from only one sender at a time, and sends data to only one receiver
at a time. Properties 5 and 6 assert the mutual exclusion between the acknowledgments
to the senders, and between the requests to the receivers. To complete the formal
specification, properties 7
and 8 assert that the internal FIFO cannot overflow nor underflow:
(P5) assert always ! (src_ack[0] && src_ack[1])
(P6) assert always ! (dst_req[0] && dst_req[1])
(P7) assert always fifo_full → next src_ack = 00
(P8) assert always fifo_empty → next dst_req = 00
When all 8 assumptions are given (2 assumptions for each sender and each receiver),
the 12 asserted properties are proved (2 assertions per sender and receiver, plus assertions
(P5) to (P8)). When removing assumption (R0.1), then assertion (R0.4) fails. The
following experiment will aim at inferring this removed assumption.
In this experiment, we limit the number of CEx to be generated to 10 (setting the
maxLimit parameter in Algorithm 1). The CEx generation step completes in 4 seconds. After
mining assumptions from the CEx, Table 5.4 shows how many assumptions are discarded
by each step of the flow, and how many assumptions remain.
We observe that, among the 1413 mined assumptions, 84% are discarded within a
second by the first 3 syntactical criteria. Among the remaining 232 assumptions, 56 are
neither inconsistent nor vacuous, and only 6 finally lead to a proof of the assertion. Note
that 26 of these checks were not run, as their result was inferred from the implication
criteria. Since we get some non-vacuous proofs, we discard the assumptions with a quality
metric less than 1 (when the assertion fails). So, from the 1413 mined assumptions, only
6 are proposed to the user for review (see Table 5.5).
Assumption Implied by
1. ! dst_req[0] → next ! dst_ack[0] -
2. ! dst_req[0] → ! dst_ack[0] -
3. (! dst_req[0] && next ! dst_req[0]) → next ! dst_ack[0] 1
4. (! dst_req[0] && next ! dst_req[1]) → next ! dst_ack[0] 1
5. (! dst_req[0] && ! dst_req[1]) → ! dst_ack[0] 2
6. (! dst_req[1] && next ! dst_req[0]) → next ! dst_ack[0] 2
Assumption (1) is the one we removed from the initial problem. Assumption (2)
is similar, but for a combinational control logic on the receiver side. Assumption (3)
corresponds to a more specific variation of Assumption (1). The last three assumptions
involve the request signals of both receivers. These assumptions may look spurious, but
are actually more specific variations of the first three. Indeed, Assumption (1) implies
Assumption (4), and Assumption (2) implies Assumptions (5) and (6). Consequently, a
verification engineer can easily review these assumptions and accept the one corresponding
to the design specification.
On the performance side, the last two columns of Table 5.4 show that the formal
checks (consistency and vacuity) take most of the runtime. In a previous version of the
tool, the assertion check was performed before vacuity: model checking needed 418 seconds
to run these filters on 190 assumptions, hence 3.3x more than with the current filter
order. This shows the importance of applying lightweight filtering criteria first, in order
to prune spurious assumptions from the set.
Under the hypothesis that only one assumption is missing, all the generated
counter-examples exhibit the faulty behavior. Thus, using only one CEx (|W| = 1), our
assumption extraction will propose the correct missing assumption to the user, but it will
be reported among many others. The advantage of using several CEx is to consider the
common behavior of all of them, which also contains the faulty behavior. The set of
assumptions proposed to the user, by application of the formula of Section 5.3.2.3, is
therefore the intersection of the assumption sets generated for each CEx (and shrinks as
|W| increases). How many CEx should be generated in order to optimize performance,
and to avoid proposing too many assumptions to the user?
Figure 5.6: GenBuf runtime evolution with the CEx generation maxLimit
Figure 5.6 shows the evolution of the runtime and the number of returned assumptions,
depending on the number of CEx being generated (maxLimit in Algorithm 1). The upper
blue-filled part represents the runtime of the CEx generation. The lower green-filled
part represents the runtime of the assumption extraction. The orange line represents the
number of generated assumptions after the “Common cause” filter. Note that, whatever
the number of CEx, the same final 6 assumptions were eventually returned.
The total runtime is the sum of the CEx generation runtime (box 1 in Figure 5.4) and
the extraction runtime (box 2 in Figure 5.4). It is optimal for 10 CEx in Figure 5.6. As
expected, the orange line shows that increasing the number of CEx decreases the number
of filtered common assumptions. However, this lower number does not compensate for
the CEx generation time: between 10 and 100 CEx, the total runtime increases by
21%, while only 1 assumption is removed. The same experiments have been repeated by
removing different assumptions one by one from the GenBuf specification, and they show
identical results. In all cases, a very small number of CEx (between 5 and 10) suffices to
produce the optimal result and running time.
Note that the filtering of one assumption is independent of the others. While
the extraction runtime is high, it is highly parallelizable: ideally, all assumptions could be
filtered in parallel, which would reduce the total runtime to just a few seconds.
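A minimal sketch of this parallelization (illustrative; filter_one stands for the
consistency, vacuity and assertion checks run on one assumption):

    from concurrent.futures import ProcessPoolExecutor

    def filter_all(assumptions, filter_one, workers=8):
        # The per-assumption checks are independent, so they can run in parallel.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            verdicts = list(pool.map(filter_one, assumptions))
        return [a for a, keep in zip(assumptions, verdicts) if keep]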
To exercise the scalability of our approach, we run the same experiment as before with
various numbers of senders and receivers. The effect will be to increase the complexity
of the design and of the formal specification. For the results to be comparable, the same
assumption (R0.1) on receiver 0 is removed in each variation of the design.
As we can see in Table 5.6, increasing the complexity of the design generally increases
both the number of generated assumptions and the runtime. Observe that increasing
the number of senders from 2 to 5 only increases the runtime from 190 to 768 seconds.
However, increasing the number of receivers multiplies the runtime by 92. As the removed
assumption is on the receiver side, and the request logic of all receivers is strongly
connected (via the round-robin), this complexity increase is expected.
In all obtained results, the correct assumption was generated in less than 40 minutes.
This exhibits the scalability of our approach with an increasing complexity of the design
and of its formal specification.
In all the experiments above, we assumed that only one assumption was missing. In
this new experiment, we remove not one but two assumptions from the specification, and
see how the filtering criteria can be tuned to still generate them. Among the possible
benchmarks, we choose the GenBuf because it has the most complex specification due
to the interaction between the protocols of the various modules. We consider the simple
GenBuf with 2 senders and 2 receivers, and we remove the following assumptions:
• assume always (src_req[1] && ! src_ack[1]) → next src_req[1]
• assume always (! dst_req[0]) → next ! dst_ack[0]
10 counter-examples are generated from the failures of multiple different properties.
Some CEx exhibit a failure which would be solved by one of the missing assumptions.
Other failures occurred because both assumptions were removed.
As a proof of concept, the filtering criteria are tuned to better solve this problem. As
mentioned in Section 5.3.2.3, the Common cause criterion is weakened by filtering out
assumptions which occur in less than 50% of the CEx (instead of 100% in all previous
experiments). Since multiple properties were originally failing, we also consider that an
assumption passes the Assertion check if it leads to prove all these properties.
As shown in Table 5.7, from the original 1898 assumptions, only 56 reach the final
filter. A first observation is that, as expected, no generated assumption succeeds in
proving all properties. They are then ordered depending on the number of properties that
were proved, and on the quality metric µ. Actually, most of the generated
assumptions had no effect on the failures, and led to a metric of 0. In the end, only 16
assumptions led to a proof of some properties or to a positive quality metric. Among
these, the two removed assumptions are identified.
Note that the overall runtime is lower than in Table 5.4. This is because most of the
checks resulted in failures, which are easier for the model checker to find than proofs.
This experiment exhibits the adaptability of the filtering criteria to solve variations of
the problem with multiple missing assumptions.
The practical question that remains to be answered is: how to help the user find how
many and which assumptions need to be included in the specification? We do not have
a definitive answer to propose, but we can provide some hints. It is easy to know that
more than one assumption is missing since adding only one does not prove all assertions.
We identified two main situations:
• The missing assumptions involve signals that are independent. For instance, the same
assumption is forgotten on all the instances of a module. This case is relatively
simple, because a divide-and-conquer strategy can be applied to come back to the
single-missing-assumption case.
• The missing assumptions are related and are needed together for the proof of one
or more properties. This is a more complex case, in which grouping strategies such
as implication can be used.
AMBA AXI. We focus on one of the bus channels, the “read data” channel. This channel
uses a variation of the handshake protocol, defined as follows:
1. assume always (ready && ! valid) → next ready
2. assert always (! ready && valid) → next valid
3. assume always (ready && valid) → next ! ready
4. assert always (ready && valid) → next ! valid
Two buses are studied:
• BUS1: the system control interface, by which the system reads data from the envi-
ronment.
• BUS2: the system interface by which the interruptions are transmitted to the CPU.
Due to the size of the design, we need some abstraction to prune the state-space,
for the model checker to give a conclusive result. We use our UCEGAR algorithm, as
described in Chapter 4. Using the two assumptions (Properties 1 and 3 above), the model
checker proves both assertions (Properties 2 and 4).
On each bus, we perform two experiments:
• We remove Property (3). Property (4) fails.
• We remove a reset assumption. Property (2) fails.
We then try to infer the missing assumption on the four experiments, with a limit of
10 CEx to be generated. The obtained results are given in Table 5.8.
Observe that the number of generated assumptions, and thus the execution run time,
are significantly higher for BUS1 than for BUS2. This is due to the cone of influence of
the signals named in the properties after abstraction. The numbers are given by columns
# Latches and # Signals in Table 5.8. Initially, for experiments 1 and 3, these cones of
influence were: 15,658 nets and 2,054 latches for Bus1; 343,769 nets and 42,134 latches
for Bus2. The position of the same block within the design greatly influences the results,
as expected.
It is interesting to note that, for a given assertion, the final number of returned as-
sumptions bears some dependency on the initial number of generated assumptions. For
experiments 1 and 3, we go from 813 down to 15 for Bus1, from 68 down to 5 for Bus2.
After analysis of the returned assumptions, we arrive at the same conclusions as for the
GenBuf: some assumptions assume a sequential control logic, others a combinational
one; some imply the missing assumption, and this missing assumption is among the
final set. For experiments 2 and 4, the filtering iterations are especially effective, since a
single missing assumption is returned to the user, and it is the exact assumption on the
reset. Finally, these results confirm that the CPU run time is proportional to the initial
number of generated assumptions.
If we compare the initial numbers of generated assumptions in Table 5.6 (GenBuf)
and Table 5.8, the much larger numbers of Table 5.6 are an indication that the protocols
in the small GenBuf are much more complex. This is why we did not perform the removal
of more than one assumption on this industrial design.
This experiment exhibits the scalability of our approach: even for a large design,
generating multiple CEx and filtering the set of assumptions down to a modest returned
set takes less than half an hour.
5.5 Conclusion
Our main contribution is a novel and efficient flow to automatically produce missing
assumptions on a design environment. This work has been presented at the international
conference MEMOCODE [65] and was extended for a journal publication in ACM
TECS [66].
As a first step, our flow generates multiple distinct counter-examples for the failing
assertion (counter-examples that are not equivalent after stuttering removal). We showed
that a small number of counter-examples is enough to contain relevant and sufficient
information. The efficiency of this generation relies on the application of the structural
justification method, which brings significant improvements compared to the cone-of-influence
analysis alone.
As a second step, the counter-examples are mined for properties which could reveal the
cause of the failure. Corresponding assumptions avoiding these behaviors are generated
in the form of multi-cycle temporal properties. To be useful to the verification engineer,
the returned assumptions must be few, relevant and realistic. We implemented a series of
filtering criteria, and showed their practical efficiency.
In case the failing assertion cannot be corrected by a single assumption, the generated
assumptions are ranked according to their influence on the formal check. Using both the
justified counter-examples and the proposed assumptions, a verification engineer is able
to debug the failure and provide correct assumptions on the design environment.
CHAPTER 6
Conclusion & Outlook

Figure: The improved verification flow – assertions and assumptions feed UCEGAR
(guided by lightweight user insight) and the assumption extraction (with user review);
the outcomes are PROOF, FAIL revealing a design bug, or an unrealistic CEx.
First, using the design specification, engineers describe the behavior of some
primary inputs with formal assumptions. Then, our clock network analysis identifies all
potential behaviors of each internal clock (Chapter 3). The definition of these operation
modes, using formal assumptions, results in a realistic and complete setup of the clocks
in the design, which is a mandatory step for any formal verification flow.
The formal assertions to be verified are generated after performing a structural
analysis of the design which detects CDC synchronization patterns. During the formal check
of these assertions, our user-guided counter-example abstraction refinement (UCEGAR)
tackles the state-space explosion problem using both a structural analysis and lightweight
feedback from the user (Chapter 4). The model checking of the CDC synchronizer
properties then always provides a conclusive result: either guaranteeing the absence of bugs,
or exhibiting a counter-example.
6.2 Outlook
As future work, all our techniques could be brought together into one unified flow. First,
the result of the clock network analysis could be considered as an input to the assertion
check. Indeed, while selecting one operation mode makes the clocks behave
deterministically, the whole logic stays in the model. As seen with the constant propagation in
Chapter 4, even fully deterministic logic influences the complexity of formal verification.
By replacing the internal clocks with deterministic clock monitors, the clock network
would be pruned from the design, hence decreasing the formal verification runtime while
keeping its soundness.
Also, the UCEGAR flow could be coupled with the assumption extraction. More
precisely, after the design abstraction of the UCEGAR flow, multiple counter-examples
could be created and mined for information. On the one hand, this would provide more
insight on the relevant signals of the intermediary counter-examples. On the other hand,
it would lead to more precise assumptions than a cut-point or a stuck-at – which are the
ones currently proposed by UCEGAR.
In addition to this unification of all our methods, there are several directions in which
to improve them individually. *SYNOPSYS CONFIDENTIAL*
The long-term goal is to extend our methodology to many critical functional verifica-
tion steps in the VLSI flow. In particular, our UCEGAR flow appears to be very effective
when the logic that is relevant to the assertion is small, but its boundaries cannot be easily
identified. For instance, the same UCEGAR could be reused to functionally verify false
or multi-cycle paths. On the way, this would lead us to improve the UCEGAR heuristics
based on the type of assertion.
Multiple parts of our assumption extraction can also be improved. First, in generating
the counter-examples, we could consider not only stuttering states, but also a stuttering
sequence of states (even though the added computing cost would need to be evaluated).
Another extension concerns the set of assumption templates. Industrial libraries of
properties are much richer (for instance the ones described in OVL [3]). A systematic analysis
of such libraries, and a prioritization of the assumptions to be applied after mining, are yet
to be performed. Also, in view of obtaining fast results on large designs, a systematic
parallelization of our algorithms should provide large performance improvements. Finally,
the property mining from counter-examples could be investigated in the scope of RTL
debugging.
APPENDIX A
Table of Notations
General
B Boolean set
N Naturals set
Netlist model
D Design graph
A Set of all signals
E Set of all edges
Type Type function
T Set of signal types
α Signal
Tzero Constant zero signal type
Tone Constant one signal type
Tin Primary input signal type
Tout Primary output signal type
Tnot Inverter signal type
Tand AND-gate signal type
Tor OR-gate signal type
Tseq Sequential signal type
A∗ Set of signals of type T∗
Aflop Set of flop output signals
Aflop,∗ Set of flop *-input signals
Alatch Set of latch output signals
Alatch,∗ Set of latch *-input signals
Amux Set of mux output signals
Amux,∗ Set of mux *-input signals
getClockOf Function from sequential output to CLK input
Pred Predecessor function in D
Succ Successor function in D
πα..α0 Structural path from α to α0
COIα Cone-of-influence of α
Functional model
M Moore machine
Σs State alphabet
*SYNOPSYS CONFIDENTIAL*
APPENDIX C
// Clock multiplexer: select between the divided clock and clk1
assign clksel = SEL ? clkdiv : clk1;

// Transparent-low latch holding the gating enable while the clock is low
reg enLatch;
always @(clksel or EN)
begin
  if (~clksel)
    enLatch <= EN;
end

// Gated clock output
assign clkgate = clksel & enLatch;
APPENDIX D
Derived: CLKCPU #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
Source CLK1 CLK1 CLK1 CLK1 CLK1 CLK1 CLK1 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2
Frequency - - - - - - - - - - - - - - - - -
Configuration GE - - - 1 1 1 1 1 1 1 1 1 1 1 1 1 1
signals O2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SY - - - 0 0 0 0 0 0 0 1 - - - - - -
CG - - - 0 0 0 0 - - - - 1 1 1 1 - -
EG - - - 0 0 0 0 1 1 1 1 - - - - 1 1
O6 - - - 0 0 0 0 - 1 - - 1 1 - - 1 -
O7 - - - 0 0 0 0 - - 1 1 - - 1 - - 1
O39 0 - - 0 0 0 0 0 0 0 0 - 0 0 0 0 -
O40 0 - - 0 0 0 0 - - - - - - - - - -
FT - - - - - 1 - 0 - - - - - - 0 - -
S2 - - - 0 0 - - 1 - - - - - - 1 - -
O32 - 1 - 0 - - 1 0 0 0 0 0 0 0 0 0 0
O38 0 - 1 - 1 - - 0 0 0 0 0 0 0 0 0 0
O3 0 0 0 - - - - 0 0 0 0 1 0 - - - 1
O117 0 0 0 - - - - - - - - - - - - - -
O1 0 0 0 - - - - - - - - - - - - - -
O30 - - - - - - - 0 0 0 0 0 0 0 0 0 0
S16 - - - - - - - 1 1 1 1 1 1 1 1 1 1
S0 - - - - - - - 1 - - 1 1 - - 1 - -
EM - - - - - - - - 1 1 - - 1 1 - 1 1
Derived: CLKCPU #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32
Source CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2 CLK2
Frequency 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2 1/2
Configuration GE 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
signals O2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SY 0 0 - - - - - - - - - - - 0 0
CG 1 1 1 1 1 1 1 1 - - - - - - -
EG - - - - - - - - 1 1 1 1 1 1 1
O6 1 - 1 1 1 - - - 1 1 1 - - - -
O7 - 1 - - - - - - - - - - 1 - -
O39 0 0 0 - - - 0 0 0 - - - - 0 0
O40 - - - - - - - - - - - - - - -
FT - - - - - 0 0 0 - - - 0 - 0 0
S2 - - - - - 1 1 1 - - - 1 - 1 1
O32 - - - - - 1 - - - - - - - 1 1
O38 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
O3 0 0 - 1 1 1 - - - 1 1 1 1 0 0
O117 - - - - - - - - - - - - - - -
O1 - - - - - - - - - - - - - - -
O30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
S16 - - - - - - - - - - - - - - -
S0 1 1 - 1 - - 1 - - - 1 - - - 1
EM - - 1 - 1 1 - 1 1 1 - 1 1 1 -
APPENDIX E
Given the templates of Table 5.1, the following table shows all potential implications. For
brevity, we consider α1 , α2 , α3 to be literals instead of signals. With α1 being the literal
for signal α, it can be read as either α or !α.
χ    χ′
always α1 always ¬α1 → α2
always α2 always α1 → α2
always α1 always ¬α1 → next α2
always α2 always α1 → next α2
always α1 always (¬α1 && α2 ) → α3
always α2 always (α1 && ¬α2 ) → α3
always α3 always (α1 && α2 ) → α3
always α1 always (¬α1 && α2 ) → next α3
always α2 always (α1 && ¬α2 ) → next α3
always α3 always (α1 && α2 ) → next α3
always α1 always (¬α1 && next α2 ) → next α3
always α2 always (α1 && next ¬α2 ) → next α3
always α3 always (α1 && next α2 ) → next α3
always α1 → α2 always (α1 && ¬α2 ) → α3
always α1 → α3 always (α1 && α2 ) → α3
always α2 → α3 always (α1 && α2 ) → α3
always α1 → α2 always (α1 && ¬α2 ) → next α3
always α2 → α3 always (α1 && next α2 ) → next α3
always α1 → next α3 always (α1 && α2 ) → next α3
always α2 → next α3 always (α1 && α2 ) → next α3
always α1 → next α3 always (α1 && next α2 ) → next α3
always α1 → next α2 always (α1 && next ¬α2 ) → next α3
Bibliography
[3] Accellera. Open Verification Library (OVL), Accessed April 2014. http://accellera.org/activities/working-groups/ovl.
[6] C. Baier and J.-P. Katoen. Principles of Model Checking. The MIT Press, 2008.
[7] L. Benini and G. De Micheli. Dynamic Power Management: Design Techniques and
CAD Tools. Kluwer Academic Publishers, Norwell, MA, USA, 1998.
[9] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. Electronic
Notes in Theoretical Computer Science, 66(2):160 – 177, 2002. FMICS.
[10] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu. Symbolic model checking without
BDDs. In Tools and Algorithms for Construction and Analysis of Systems (TACAS),
pages 193–207, 1999.
[13] M. Boule and Z. Zilic. Incorporating efficient assertion checkers into hardware
emulation. In International Conference on Computer Design, pages 221–228, Oct 2005.
[15] F. Burns, D. Sokolov, and A. Yakovlev. GALS synthesis and verification for xMAS
models. In DATE, 2015.
[16] D. Bustan, D. Fisman, and J. Havlicek. Automata construction for PSL. Technical
report, Freescale Semiconductor, Inc, 2005.
[17] T. Chaney and C. Molnar. Anomalous behavior of synchronizer and arbiter circuits.
IEEE Transactions on Computers, C-22(4):421–422, April 1973.
[21] E. Clarke, O. Grumberg, and D. E. Long. Model checking and abstraction. In ACM,
1991.
[24] E. M. Clarke, Jr., O. Grumberg, and D. A. Peled. Model Checking. MIT Press,
Cambridge, MA, USA, 1999.
[25] C. E. Cummings. Clock domain crossing design & verification techniques using
SystemVerilog. In SNUG Boston, MA, 2008.
[30] S. V. Gheorghita and R. Grigore. Constructing checkers from PSL properties. Control
Systems and Computer Science, 2:757–762, 2005.
[31] R. Ginosar. Fourteen ways to fool your synchronizer. In Asynchronous Circuits and
Systems, pages 89–96, May 2003.
[35] S. Hertz, D. Sheridan, and S. Vasudevan. Mining hardware assertions with guidance
from static analysis. IEEE Trans. on CAD, 32(6):952–965, 2013.
[36] IEEE. IEEE standard for Verilog hardware description language. IEEE Std 1364-2005,
pages 1–560, 2006.
[37] IEEE. IEEE standard VHDL language reference manual. IEEE Std 1076-2008, pages
1–620, Jan 2009.
[38] IEEE. IEEE standard for property specification language (PSL). IEEE Std 1850-2010,
pages 1–182, April 2010.
[39] IEEE. IEEE standard for SystemVerilog – unified hardware design, specification, and
verification language. IEEE Std 1800-2012, pages 1–1315, Feb 2013.
[40] B. Jobstmann, S. Galler, M. Weiglhofer, and R. Bloem. Anzu: A Tool for Property
Synthesis, pages 258–262. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[42] N. Karimi and K. Chakrabarty. Detection, diagnosis, and recovery from clock-domain
crossing failures in multiclock SOCs. Computer-Aided Design of Integrated Circuits
and Systems, 32(9):1395–1408, Sept 2013.
[43] M. Kebaili, J.-C. Brignone, and K. Morin-Allory. Clock domain crossing formal
verification: a meta-model. In IEEE International High Level Design Validation and
Test Workshop (HLDVT), pages 136–141, Oct 2016.
[44] M. Kebaili, G. Plassan, J.-C. Brignone, and J.-P. Binois. Conclusive formal verification
of clock domain crossings using SpyGlass-CDC. In SNUG France, June 2016. Best
paper award from the technical committee.
[46] B. Keng, E. Qin, A. Veneris, and B. Le. Automated debugging of missing assumptions.
In Asia-Pacific DAC, pages 732–737. IEEE Computer Society, 2014.
[47] O. Kupferman and M. Y. Vardi. Model checking of safety properties. Formal Methods
in System Design, 19(3):291–314, Nov 2001.
[49] C. Kwok, V. Gupta, and T. Ly. Using assertion-based verification to verify clock
domain crossing signals. In Design and Verification Conference, pages 654–659, 2003.
[50] Leda CDC Documentation. Clock domain crossing. Online, Sept 2017.
https://filebox.ece.vt.edu/~athanas/4514/ledadoc/html/pol_cdc.html.
[51] C. Leong, P. Machado, et al. Built-in clock domain crossing (CDC) test and
diagnosis in GALS systems. In Proc. DDECS 2010, pages 72–77, April 2010.
[52] B. Li and C.-K. Kwok. Automatic formal verification of clock domain crossing signals.
In ASP-DAC, pages 654–659, Jan 2009.
[53] W. Li, L. Dworkin, and S. A. Seshia. Mining assumptions for synthesis. In
MEMOCODE, pages 43–50, 2011.
[55] M. Litterick. Full flow clock domain crossing – from source to Si. In DVCON, 2016.
[56] A. Lozano and J. L. Balcázar. The complexity of graph problems for succinctly
represented graphs. In Graph-Theoretic Concepts in Computer Science, pages 277–
286, 1989.
[57] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems.
Springer-Verlag New York, Inc., New York, NY, USA, 1992.
[58] P. C. McGeer and R. K. Brayton. Efficient algorithms for computing the longest
viable path in a combinational network. In ACM/IEEE Design Automation Conference,
pages 561–567, June 1989.
[61] A. Mishchenko, N. Een, and R. Brayton. A toolbox for counter-example analysis and
optimization. In IWLS, 2013.
[69] A. Pnueli. The temporal logic of programs. In 18th Annual Symposium on
Foundations of Computer Science, pages 46–57, Oct 1977.
[71] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proceedings of the
16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’89, pages 179–190, New York, NY, USA, 1989. ACM.
[74] S. Sarwary, H.-J. Peter, G. Plassan, B. Chakrabarti, and M. Movahed. Formal clock
network analysis visualization, verification and generation, 2017. Provisional Patent
Application.
[76] A. P. Sistla. Safety, liveness and fairness in temporal logic. Formal Aspects of
Computing, 6(5):495–511, Sep 1994.
[79] University of Washington. The Daikon Invariant Detector, Accessed May, 2017.
http://plse.cs.washington.edu/daikon/.
[80] M. Y. Vardi. Alternating automata: Unifying truth and validity checking for temporal
logics, pages 191–206. Springer Berlin Heidelberg, Berlin, Heidelberg, 1997.
Summary in French

Introduction

Digital systems are now massively present in our daily lives. For any product, the
slightest functional problem can have disastrous financial or human consequences. A
system therefore needs to be validated against multiple criteria, in order to detect its
problems as early as possible. In particular, and unlike software, electronic circuits must
be validated on virtual models, before their fabrication.

The design flow of a microelectronic circuit spans several levels of abstraction. At each
step, a verification technique is used to detect and correct potential errors as early as
possible. Starting from a hardware description language such as Verilog or VHDL, the
simulation of execution traces is a widely used technique, even though it does not
guarantee the absence of bugs in the circuit. When the circuit is used in a safety-critical
context, where no bug may remain, formal methods are used. These model the circuit
as mathematical equations which entirely describe its behavior, and thus make it
possible to guarantee the absence of bugs. However, the complexity of this formal
verification forces verification engineers to apply it only to circuits of modest size.

Yet, some very complex circuits are also very widely used, notably in connected devices.
This complexity comes in particular from performance and power optimizations, which
consist in adapting the clock frequency to the ongoing tasks. Several clock domains are
then used: fast clocks for performance-demanding tasks, and slower clocks for non-critical
ones. This generates thousands of interconnections between the different clocks, also
called clock-domain crossings (CDCs).

Typically, a CDC manifests itself as a digital circuit between two sequential elements
receiving phase-shifted clocks. Numerous problems are raised by CDCs [25, 75]
(metastability, bus incoherency, data corruption, transient values, . . . ), and circuit designers
must implement specific structures to avoid any bug [25, 31] (cascaded flops, handshake
protocols, Gray encoding, . . . ). Verification engineers must then check that all possible
problems have been addressed.

The main EDA vendors provide CDC analysis tools: Synopsys SpyGlass CDC [77],
Mentor Questa CDC [60], Real Intent Meridian CDC [73], . . . A structural analysis of
the RTL allows them to detect the synchronizers; formal properties corresponding to the
protocols are then generated [41, 43, 49, 54], e.g. in PSL [38] or SVA [39]. These
properties can be verified by simulation or by formal methods. However, given the rare
nature of the phenomena specific to CDCs, formal verification is often preferred.

The main problem of model checking with formal methods is that it requires prohibitive
run times to prove the properties of complex multi-clock systems. The input of an expert
is then necessary to refine the models and adjust the configuration of the formal engine,
in order to finally reach a conclusive result: a proof, or a failure of the properties.
Instead of improving the formal engines, this thesis takes the approach of simplifying the
model, while addressing the following issues:

• Setting up the model of the system in a realistic mode;
• Generating assumptions on the protocols modeling the environment;
• Countering state-space explosion;
• Analyzing counter-examples.
… configuration of the clock operation modes. A simple example of a clock tree is given
in Figure F.2. A classical clock tree contains four types of operators:
Figure F.2: Example clock tree – from the sources CLK1 and CLK2, a divider (DIV/x,
x ∈ {2, 4, 6, 8}) produces CLKDIV; a multiplexer controlled by SEL selects CLKSEL
between CLK1 and CLKDIV; an enable EN, held by a transparent-low latch, gates it into
CLKGATE; CFG and RST are configuration and reset inputs.
Figure: UCEGAR flow – the abstract design D# is model checked; an abstract CEx ω
is analyzed through its justification Jω and, when concretizable, submitted to user review
of the assumptions χ on signals α, which drive the abstraction refinement; the outcomes
are PROOF or FAIL.
… to a realistic behavior. The overall flow of the counter-example generation and of the
assumption extraction is shown in Figure F.5. On an industrial circuit, this technique
succeeded in generating, in a fully automatic way and within a reasonable time, a missing
assumption on an AMBA-type protocol.
Figure F.5: Overall flow (identical to Figure 5.4) – counter-example generation under
blocking conditions (box 1), then property mining with the templates TM, syntactical
and model-checking filters over the templated assumptions TC, and assumption proposal
for user review (box 2).
Conclusion

Formal verification of the properties of multi-clock systems is far too complex a problem
to be solved by traditional model checking alone. This thesis therefore proposes several
practical solutions, directly usable by verification engineers. Each problem raised in the
verification flow has been addressed by a distinct technique: clock configuration, property
verification, and debugging of incomplete specifications.

As each of them integrates into the usual verification flow, the result is an improved flow
(see Figure F.6). The key to this flow is the combination of the speed of structural analysis
with the exhaustiveness of formal methods. This duality is visible in all the developed
techniques, through the use of the structural-graph and state-machine formalisms. The
resulting flow thus achieves conclusive formal results on industrial multi-clock circuits,
while formally configuring the circuit in a realistic mode.
Figure F.6: The improved verification flow (identical to the figure of Chapter 6) –
assertions and assumptions feed UCEGAR (lightweight user insight) and the assumption
extraction (user review); the outcomes are PROOF, FAIL revealing a design bug, or an
unrealistic CEx.
Abstract — Modern hardware designs typically comprise tens of clocks in order to optimize
consumption and performance for the ongoing tasks. With the increasing number of
clock-domain crossings as well as the huge complexity of modern SoCs, formally proving the
functional integrity of data propagation has become a major challenge. Several issues arise:
setting up the design in a realistic mode, writing protocol assumptions modeling the
environment, facing state-space explosion, analyzing counter-examples, . . .
The first contribution of this thesis aims at reaching a complete and realistic design
setup. We use parametric liveness verification and a structural analysis of the design in
order to identify behaviors of the clock and reset trees. The second contribution aims
at avoiding state-space explosion by combining localization abstractions of the design
with counter-example analysis. The key idea is to use counterexample-guided abstraction
refinement as the algorithmic back-end, where the user influences the course of the algorithm
based on relevant information extracted from intermediate abstract counterexamples. The
third contribution aims at creating protocol assumptions for under-specified environments.
First, multiple counter-examples are generated for an assertion, with different causes of
failure. Then, information is mined from them and transformed into realistic protocol
assumptions.
Overall, this thesis shows that a conclusive formal verification can be obtained by
combining inexpensive structural analysis along with exhaustive model checking.
Keywords — formal methods, model checking, verification, static analysis, SoC, CDC.
ISBN: 978-2-11-129240-6