Smart Contract Repair: Xiao Liang Yu, Omar Al-Bataineh, David Lo, Abhik Roychoudhury
1 INTRODUCTION
Smart contracts are automated or self-enforcing programs which currently underpin many online
commercial transactions. A smart contract is a series of instructions or operations written in special
programming languages which get executed when certain conditions are met. Typically, smart
contracts run on top of blockchain systems, which are distributed systems whose
storage is represented as a sequence of blocks. The key attraction of smart contracts is
their ability to eliminate the need for trusted third parties in multiparty interactions,
enabling parties to engage in secure peer-to-peer transactions without having to place trust in
external parties (i.e., outside parties which help to fulfill the contractual obligations).
While smart contracts are commonly used for commercial transactions, many malicious attacks in
the past were made possible due to poorly written or vulnerable smart contracts. The code executed
by smart contracts can be complex. There is therefore a need for testing (e.g. [16, 24]), analysis
(e.g. [18]) and verification (e.g. [36]) of smart contracts. In this paper, we take the technology
for enhancing reliability of contracts one step further: once vulnerabilities in smart contracts are
detected, we seek to automatically repair the vulnerabilities.
∗ Corresponding Author
Authors’ addresses: Xiao Liang Yu, National University of Singapore, Singapore, xiaoly@comp.nus.edu.sg; Omar Al-Bataineh,
National University of Singapore, Singapore, omerdep@yahoo.com; David Lo, Singapore Management University, Singapore,
davidlo@smu.edu.sg; Abhik Roychoudhury, National University of Singapore, Singapore, abhik@comp.nus.edu.sg.
© 2020 Association for Computing Machinery.
ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: May 2020.
Automated program repair [11] is an emerging technology for automatically fixing errors and
vulnerabilities in programs via search, symbolic analysis, program synthesis and learning. The
successful application of automated repair techniques to traditional programs [20, 21, 23, 26–29, 40]
raises the question of whether these techniques can be also applied to fix bugs in smart contracts.
Several different approaches have been developed to automatically repair bugs in traditional
programs which can be classified mainly into two categories: heuristic repair approaches [20, 21, 26]
and constraint-based repair approaches [23, 27–29, 40]. The inputs to these approaches are a buggy
program and a correctness criterion (often given as a test suite). The automated repair approaches
return an (often minimal) transformation of the buggy program, so that the transformed program
passes all the tests in the given test-suite.
In practice, the implications of unfixed bugs in smart contracts can be more serious than those of
typical non-security-sensitive programs for several reasons. First, smart contracts are open for
inspection and run on a decentralized network, so the whole program state of a smart contract is
transparent to everyone. Second, the generated patch for a vulnerable smart contract should not
only fix the detected vulnerabilities but also needs to be mindful of the gas consumption of the
resultant patched program. The blockchain system on which the contract will be running typically
has a gas usage limit. Third, the quality of the generated patch for a vulnerable smart contract
is a major design issue to be considered as smart contracts are typically used for commercial
transactions. In fact, malicious agents may take advantage of unfixed bugs in smart contracts to
steal some valuable assets of the parties involved.
In this work, we develop an automated smart contract repair algorithm using genetic program-
ming search. Given a vulnerable smart contract and test suite, we conduct a parallel, biased random
search for a set of edits to the contract that fixes a given vulnerability without breaking any test
that previously passed. The bias in the search comes from the objective function driving the search.
The parallelization strategy consists of splitting the search space into mutually-exclusive (disjoint)
sub-spaces, where patches in each sub-space are concurrently and independently generated and
validated. We also introduce the notion of gas dominance level for smart contracts, which enables us
to compare the quality of patches based on their runtime gas. This also emphasizes our position that
automated repair of smart contracts needs to be gas-aware.
To evaluate the effectiveness of our genetic repair algorithm, we constructed a dataset of vulnera-
ble smart contracts taken from the Ethereum mainnet network, which is the main network wherein
actual transactions of smart contracts take place on a distributed ledger. Hence, our constructed
dataset consists of real-world smart contracts. During our evaluation, we considered 20 vulnerable
contracts which have been selected randomly from the constructed dataset while taking into
consideration the class of detected vulnerabilities and the complexity of the vulnerable contracts.
The vulnerable contracts have been selected in a way such that most of the common classes of
vulnerabilities that are typically made by smart contract developers are covered when evaluating
the genetic algorithm. To understand and draw valid conclusions about the factors
affecting the correctness and quality of patches generated by the algorithm, we have evaluated the
algorithm under many different settings and configurations. Examples of such settings include: (i)
enabling/disabling the gas calculation of generated patches, (ii) varying the size of time budget
allocated to the algorithm. Our genetic algorithm was able to fully repair 10 vulnerable smart con-
tracts from the selected set of 20 vulnerable contracts, achieving a 50% success rate. It is interesting
to mention that most of the selected vulnerable contracts have multiple bugs and we therefore
assert a vulnerable contract as repaired if all detected bugs are repaired.
• We present the first automated smart contract repair approach that is gas-optimized and
vulnerability-agnostic. The approach is inspired by genetic programming and can be used to
generate a patch for a given vulnerable smart contract.
• We describe a parallel genetic repair algorithm that can be used to split the large search space
of candidate patches into smaller mutually-exclusive search spaces which can be processed
independently. The presented parallel algorithm helps to process a large number of candidate
patches in a short computational time and therefore, in contrast to previous repair approaches,
repairs can be generated faster. It also improves the scalability of genetic repair algorithms
so that large real-world contracts can be repaired.
• We show how to integrate gas-awareness into the repair of smart contracts. This is crucial for
smart contracts as excessive unnecessary gas consumption of contracts can lead to financial
loss or out-of-gas exceptions when running the contract on a public blockchain network.
It is therefore necessary to reduce the cost of running the contract and also the possibility
of introducing new out-of-gas exceptions when repairing a vulnerable smart contract. We
introduce a simple yet effective gas ranking approach with the novel notion of Gas Dominance
Level that can be used to rank generated patches of a given vulnerable smart contract during
the patch generation. In general, the gas consumption of a given smart contract can be a
non-constant bound which can be described as a parametric gas formula that takes into
consideration both static and dynamic parameters that affect the cost of the contract including
the instruction gas, memory gas, stack gas, and storage gas. We provide an acceleration
technique to quickly compare candidate patches in terms of gas consumption, by introducing
the concept of gas dominance levels.
• Based on the above described techniques, we develop a fully automated repair tool for
smart contracts (which we call SCRepair) which is integrated with a gas ranking approach
to generate a gas-optimized secure contract. Our tool can both detect and repair security
vulnerabilities in smart contracts. It does so by integrating SCRepair with the
powerful smart contract security analyzers Oyente [24] and Slither [8]. We demonstrate that
our approach is effective in fixing bugs for real-world smart contracts. Our approach can deal
with bugs whose fixes involve multi-line changes. Our smart contract repair tool and dataset
are publicly available on GitHub at https://SCRepair-APR.github.io
Smart contracts allow for decentralized automation by facilitating, verifying, and enforcing the
conditions of an underlying agreement.
Ethereum is the most popular blockchain platform supporting smart contracts. Its
Turing-complete programming model allows the creation of practically useful smart contracts.
Smart contracts are typically written using the programming language “Solidity”. Note that ev-
erything executed on Ethereum costs some gas for giving the miners incentive to perform the
computations [38]. For example, executing an ADD instruction costs 3 units of gas. Storing a byte
costs 4 units or 68 units of gas, depending on the value of the byte (zero or non-zero). Hence, any
slight mutation to the source code of a smart contract can change the gas usage of the contract
tremendously (and hence the amount of money that the parties of a transaction need to pay when
running the smart contract on a real blockchain network).
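As a rough illustration of how per-operation prices add up, the following Python sketch uses only the figures quoted above (3 units for ADD, 4 or 68 units per zero or non-zero byte). The function names and the sample payload are hypothetical, and the actual prices depend on the Ethereum protocol version in use.

```python
# Gas figures quoted in the text; the real fee schedule varies by protocol version.
GAS_ADD = 3
GAS_ZERO_BYTE = 4
GAS_NONZERO_BYTE = 68


def byte_gas(data: bytes) -> int:
    """Gas charged for a byte string: 4 per zero byte, 68 per non-zero byte."""
    return sum(GAS_ZERO_BYTE if b == 0 else GAS_NONZERO_BYTE for b in data)


def add_gas(count: int) -> int:
    """Gas charged for executing `count` ADD instructions."""
    return GAS_ADD * count


# Hypothetical workload: four bytes (two of them zero) and ten additions.
payload = bytes([0, 0, 1, 255])
total = byte_gas(payload) + add_gas(10)
print(total)
```

Changing a single zero byte of the payload to a non-zero value raises the total by 64 units, which is the kind of sensitivity to small edits that the paragraph above describes.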
To develop a better understanding of blockchain and smart contracts, let us consider an example.
Suppose that Bob would like to sell a property (house) to Alice and Alice is willing to pay 100 Ether
(a cryptocurrency) as a price for that property and that Bob is happy with Alice’s offer. After some
discussion, they agreed to proceed with their business transaction and wish to perform it in an
automated way by taking advantage of blockchain and smart contracts. From the given description
of the problem, one can see that there are three main conditions that any possible solution to the
problem needs to satisfy: (1) Bob has legal ownership of the property that he is selling (2) Alice can
get the ownership of Bob’s property only if she transferred 100 Ether to Bob, and (3) Bob can get
100 Ether from Alice only if he transferred the ownership of his property to Alice. The transaction
can be said to be successful if upon completion, the ownership of Bob’s property is transferred to
Alice while Bob receives 100 Ether.
Suppose that Alice and Bob perform their transaction using the smart contract given in Fig. 1
written in the most popular smart contract programming language Solidity. The code consists of
a number of functions needed in order to perform the commercial transaction in an atomic way.
The function transferA is used by Alice to transfer 100 Ether to the smart contract and hence
when this function is executed, the money comes under the control of the smart contract. The
function transferB is used by Bob to inform the smart contract that the ownership of his property
has been transferred to the smart contract. Thus, after executing the functions transferA and
transferB, the smart contract is supposed to hold both the money of Alice and the property of Bob.
The function finalize is used to finalize the transaction by transferring the money from Alice
to Bob and the property from Bob to Alice. The smart contract also provides two more functions,
abortA and abortB, which are available to Alice and Bob, respectively. The goal of
these functions is to protect the parties from the situation where the purchase is canceled halfway
while the smart contract has already held the assets from the parties, so that the parties can get
back their assets from this contract.
Recently, there has been a growing interest in verification and validation of smart contracts [1, 14,
16, 18, 37], as vulnerabilities in smart contracts can have serious adverse consequences. Therefore,
a number of vulnerability detection tools have been developed for smart contracts including
Oyente [24], Slither [8], and ContractFuzzer [16]. In general, smart contract vulnerabilities can be
categorized into three categories [3]: (i) vulnerabilities at the blockchain level, (ii) vulnerabilities at
the Ethereum virtual machine level, and (iii) vulnerabilities at the source code level. In this work,
we are interested in the vulnerabilities that can be repaired at the level of source code.
Based on our literature review of recent research work on smart contracts [3, 4, 6,
7, 12, 16, 24, 35, 36], we summarize in Table 1 some selected popular and widely studied
vulnerabilities that can be detected using the tools Oyente [24] and Slither [8]. We give a detailed
description of these classes of vulnerabilities in Section 6.
contract CommercialTransaction {
    bool transferredA = false;
    bool transferredB = false;

    function transferA() public payable {
        if (msg.sender == Alice && msg.value == 100 ether) {
            transferredA = true;
        }
    }

    function transferB() public {
        // Bob should transfer the house ownership to this
        // contract before calling this function
        if (msg.sender == Bob && hasHouseOwnership(address(this))) {
            transferredB = true;
        }
    }

    function finalize() public {
        if (transferredA && transferredB) {
            transferredA = false;
            transferredB = false;
            transferHouseOwnership(Alice);
            Bob.transfer(100 ether);
        }
    }

    function abortA() public {
        if (msg.sender == Alice && transferredA) {
            transferredA = false;
            Alice.transfer(100 ether);
        }
    }

    function abortB() public {
        if (msg.sender == Bob && transferredB) {
            transferredB = false;
            transferHouseOwnership(Bob);
        }
    }
}
Fig. 1. A smart contract written in Solidity language that allows two parties to be involved in a commercial
transaction to sell some property. msg.sender represents the party calling the function. Certain definitions
are omitted for brevity.
Table 1. Selected smart contract vulnerabilities that can be fixed by modifying Solidity source code
contracts. We also discuss the key differences between the smart contract repair problem and the
traditional program repair problem.
Problem 1. (Automated smart contract repair problem). Consider a vulnerable smart contract
C with a set of detected vulnerabilities U, a test suite T and a maximum gas usage bound L. The
automated smart contract repair problem is the problem of developing an algorithm that takes as input
(C, U, T, L) and produces as an output a new contract C′ that is similar to C but has all vulnerabilities
in U fixed, passes all tests in T, and has a maximum gas usage over feasible execution paths that is
less than or equal to L.
The smart contract repair problem is similar to the traditional program repair problem. However,
the smart contract repair problem introduces some extra computational complexity as the patch
generation needs to be gas-aware. It is also highly desirable for the patches to signify readable and
small changes, so that the patched contract is easily comprehensible. Overall, we would want the
(syntactic) structure of the vulnerable contract to be maximally preserved.
Since detailed formal specifications of intended program behavior are typically unavailable,
program repair uses weak correctness criteria, such as an assertion of existence of vulnerabilities
by vulnerability detector and a test suite. Therefore, the validity of patches is relative to the chosen
vulnerability detector and the available test cases.
As mentioned earlier, the generated patches for smart contracts need to meet more criteria than
those generated for traditional programs. This is mainly due to the fact that smart contracts
typically run on top of blockchain systems, which impose certain constraints on the
total computational resources used by the contract. The execution of the smart contract needs to
comply with the gas usage constraints imposed by the blockchain system. Note that if the running
smart contract exceeds the allowed upper bound of the gas usage, the execution of the contract
will be interrupted and an “out-of-gas” exception will be thrown.
Definition 1. (Validity criteria of generated patches). Consider a vulnerable smart contract C
with a set of detected vulnerabilities U and a test suite T that consists of two sets: the failing tests TF
and the passing tests TP. Suppose that the contract C is running on top of a blockchain system B and
that the maximum allowed gas usage available to the contract is bounded by L. We say that the new
patched smart contract C′ is a valid plausibly fixed contract if it satisfies the following requirements.
(1) The contract C′ is not vulnerable to the vulnerabilities in U.
(2) The contract C′ passes all tests in TF.
(3) The contract C′ does not break any test in TP.
(4) There is no feasible execution path in C′ whose total gas consumption exceeds the bound L.
Typically, the bound L imposed on the gas usage of the contract is determined by the parties
involved in the transaction, the structure and semantics of the smart contract, and the block gas limit
of the blockchain. Such a bound (if known) can be incorporated in the patch generation process for
vulnerable contracts in order to avoid introducing new out-of-gas exceptions. Note that requirement
4 of Definition 1 can be checked by enumerating all feasible paths in the patched contract C′ and
then verifying that there is no feasible path that exceeds the bound L.
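The four requirements of Definition 1 can be read as a single predicate over the outputs of the vulnerability detector, the test runner and the gas analysis. The following Python sketch is only an illustration under that reading: the PatchReport structure and all names in it are hypothetical, and it assumes the per-path gas figures have already been computed elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PatchReport:
    """Hypothetical summary of the analysis results for a patched contract C'."""
    remaining_vulnerabilities: Set[str]      # what the detector still reports on C'
    passed_tests: Set[str]                   # names of the tests that C' passes
    path_gas: List[int] = field(default_factory=list)  # gas of each feasible path


def is_plausible_fix(report: PatchReport, targeted: Set[str],
                     failing: Set[str], passing: Set[str], gas_bound: int) -> bool:
    no_vulns = not (report.remaining_vulnerabilities & targeted)   # requirement (1)
    fixes_failing = failing <= report.passed_tests                 # requirement (2)
    keeps_passing = passing <= report.passed_tests                 # requirement (3)
    within_gas = all(g <= gas_bound for g in report.path_gas)      # requirement (4)
    return no_vulns and fixes_failing and keeps_passing and within_gas


report = PatchReport(set(), {"t1", "t2", "t3"}, [21000, 48000])
print(is_plausible_fix(report, {"reentrancy"}, {"t1"}, {"t2", "t3"}, 50000))
```

Keeping the four checks as separate named conditions makes it easy to report which requirement a rejected candidate violates.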
In addition to the above correctness requirements, we are also interested in certain desirable
properties indicating patch quality, as described in the following.
(1) The simplicity of the patch. The simplicity of the edited contract can be measured in terms
of the number of edits that have been made to the original contract.
(2) The cost of the patch. The cost of the contract can be measured in different ways. We
choose here the average gas usage as a metric to measure the cost of the contract.
To evaluate the quality requirements of a generated patch we introduce two functions, namely
diff(C, C′) and cost(C′). The function diff(C, C′) returns a numerical value that specifies how
much the edited contract C′ differs from the original vulnerable contract C. Replacing expressions,
inserting new statements, and moving or deleting statements are counted when computing
diff(C, C′). Overall, diff(C, C′) captures the edit distance between two smart contracts C and C′.
The function cost(C) computes the average cost of gas usage of a given smart contract. Recall
that every single operation that takes part in the blockchain network consumes some amount of
gas. Gas is what is used to calculate the fee that needs to be paid to the miner in order to execute
operations. Of course, the cost of transactions can vary from one to the other depending on the
details of the transaction and the structure and complexity of the smart contract. However, for a
given smart contract C and a specific transaction t, one can perform certain calculations to compute
the average cost or the maximum expected cost of the transaction in gas units, provided that the
cost of each operation of the contract on the running blockchain system is known in advance. We
defer the discussion of the computational details of gas usage of a given smart contract to Section 5.
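To make the two quality functions concrete, the following Python sketch approximates diff as a statement-level edit count and cost as the mean gas over a set of observed transactions. This is an assumption-laden illustration, not the tool's implementation: contracts are modelled as plain lists of statement strings, and the edit count relies on Python's difflib rather than on any AST-based differencing.

```python
import difflib
from statistics import mean
from typing import List, Sequence


def diff(original: List[str], patched: List[str]) -> int:
    """Approximate edit distance: number of statements touched by non-equal opcodes."""
    matcher = difflib.SequenceMatcher(a=original, b=patched, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits += max(i2 - i1, j2 - j1)
    return edits


def cost(gas_per_transaction: Sequence[int]) -> float:
    """Average gas usage over the transactions exercised (e.g. by the test suite)."""
    return mean(gas_per_transaction)


# Hypothetical contract body and a patch that replaces one statement.
original = ["balance = balances[msg.sender];", "msg.sender.call.value(balance)();",
            "balances[msg.sender] = 0;"]
patched = ["balance = balances[msg.sender];", "msg.sender.transfer(balance);",
           "balances[msg.sender] = 0;"]
print(diff(original, patched), cost([21000, 23000]))
```

With this counting scheme a moved statement is seen as one deletion plus one insertion, so it contributes two edits.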
On Plausible and Correct Patches. In this paper, we use the terminology of plausible patch and
correct patch. Here we rely on the terminology in program repair literature (e.g. see [11]), where a
correct patch is deemed to be correct via manual analysis, but a plausible patch is one produced
by a repair technique since it passes all given tests. Since a formal complete specification of the
intended program behavior is not available, the description of intended behavior given to a program
repair technique is incomplete: it is given in the form of tests, assertions or vulnerabilities found.
A plausible patch generated by the repair algorithm thus meets the intended behavior as per this
incomplete description that was provided to the repair method. Thus, if the repair method was
given a test-suite T, a plausible patch can still potentially fail a test t outside T. For this reason,
a plausible patch cannot be guaranteed to be correct, and we need a manual validation step to
ascertain how many of the plausible patches generated are correct. We have conducted such an
evaluation in our work, in Section 7.
Advantages of our search-based approach. The main motivation behind developing a genetic
repair approach relies on the hypothesis that most software bugs introduced by programmers are
due to small syntactic errors. Furthermore, the genetic search technique also has the following
advantages with respect to other common repair techniques.
• Semantic repair techniques employ symbolic execution and program synthesis for repairing
programs. Employing such techniques for smart contract repair will deprive our approach
of the natural ability to insert/delete statements which seems to be important for repairing
common smart contract vulnerabilities like the reentrancy vulnerability.
• Template based repair techniques can be used as a purely static approach to smart contract
repair. In this approach, for every detected vulnerability type, a specific program transformation
template can be employed for repair. Such an approach deprives us of the possibility of
exploring a variety of patch candidates and enforcing patch quality indicators in terms of gas
consumption and patch simplicity.
1 bool a = true;
2 while (a) {
3     // Some computation
4 }
Fig. 2.
1 int x = 0;
2 while (x <= 100) {
3     x = x + 2;
4     // Some computation
5 }
Fig. 3.
mutation operator which is supposed not to increase the cost of the contract can also lead to a mutant
whose gas usage is higher than that of the original contract. The cost of the generated mutant does
not depend only on the cost of the applied mutation operations but also on the way the operators
change the behavior of the contract. We therefore cannot favor one operator over another when
searching for low cost repairs without performing some analysis on the overall structure of the
vulnerable contract.
Let us consider some trivial examples to demonstrate how the insert operator can turn a high-cost
contract into a low-cost contract while the move operator may turn a low-cost contract into a
high-cost contract. The program in Fig. 2 represents a buggy program. Suppose that we generate a
mutant for this program by inserting a new statement after the initialization statement (line 1) of
the form: a = false; . In this case, the loop in the generated mutant will be skipped and the average
gas usage of the new mutated version will be much smaller than that of the original version. The
program in Fig. 3 represents another potentially buggy program. Let us generate a random mutant
of the program by applying the move operator so that the statement at line 3 (the loop counter
update statement) is moved outside the loop. Obviously, this will turn the loop into an infinite
loop and hence the contract will run out of gas after a certain number of iterations. Note that since
mutation makes random changes to the buggy smart contract, it may impact the performance and
cost of the contract in many different arbitrary ways. This is critical especially when the buggy
smart contract contains loops.
Observation 1. There is insufficient information to predict the gas of a mutated contract by
inspecting the mutation operations applied. For example, successive applications of the mutation
operators that do not introduce new statements (move, replace) do not necessarily lead to a low-cost
mutant w.r.t. the original smart contract. Similarly, successive applications of the mutation operator
that inserts new statements (insert) do not necessarily lead to a high-cost mutant w.r.t. the original smart
contract. The cost of the generated mutants depends mainly on how the applied mutation operators
change the behavior of the smart contract.
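The two loop examples can be replayed with a toy gas meter. The Python sketch below is a simplified simulation, not EVM semantics: it charges a hypothetical flat price per loop iteration (STEP_GAS) and raises a hypothetical OutOfGas exception once a given limit is exceeded.

```python
class OutOfGas(Exception):
    """Raised when the simulated execution exceeds its gas limit."""


STEP_GAS = 10  # hypothetical flat price charged per loop iteration


def run_flag_loop(gas_limit: int, insert_a_false: bool = False) -> int:
    """Loop guarded by a boolean flag; the mutant inserts `a = false` after line 1."""
    a = True
    if insert_a_false:
        a = False
    gas_used = 0
    while a:
        gas_used += STEP_GAS
        if gas_used > gas_limit:
            raise OutOfGas(gas_used)
    return gas_used


def run_counter_loop(gas_limit: int, update_inside_loop: bool = True) -> int:
    """Counter loop; the mutant moves the `x = x + 2` update out of the loop body."""
    gas_used, x = 0, 0
    while x <= 100:
        gas_used += STEP_GAS
        if gas_used > gas_limit:
            raise OutOfGas(gas_used)
        if update_inside_loop:
            x = x + 2
    return gas_used


print(run_flag_loop(1000, insert_a_false=True))  # inserted statement skips the loop
print(run_counter_loop(1000))                    # original counter loop terminates
```

In this model the insert mutant consumes less gas than its original, while the move mutant (update_inside_loop=False) exhausts any finite limit, matching the observation that the operator kind alone does not predict gas.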
As mentioned earlier, one of the biggest challenges that need to be addressed when using a genetic
search approach for repairing smart contracts is how to speed up the generation and validation
processes of mutated versions. We describe here a parallel search-based algorithm for efficiently
generating patches. We assume here we have three versions of the mutate function: mutateM(C)
which mutates the contract C using only the move operator, mutateR(C) which mutates the contract
C using only the replace operator, and mutateI (C) which mutates the contract C using only the
insert operator. Since genetic repair approaches use mainly an exhaustive search algorithm to
generate a patch, it is highly desirable to split the search space into sub-spaces. To do so, we use
the mutate functions described above to split the search space into 7 smaller spaces as described in
the following.
• [Space S1]: this search space consists of the set of candidate patches that result from mutating
the contract C using only the function mutateM(C).
• [Space S2]: this search space consists of the set of candidate patches that result from mutating
the contract C using only the function mutateR(C).
• [Space S3]: this search space consists of the set of candidate patches that result from mutating
the contract C using only the function mutateI(C).
• [Space S4]: this search space consists of the set of candidate patches that result from mutating
the contract C using the two functions mutateM(C) and mutateR(C).
• [Space S5]: this search space consists of the set of candidate patches that result from mutating
the contract C using the two functions mutateM(C) and mutateI(C).
• [Space S6]: this search space consists of the set of candidate patches that result from mutating
the contract C using the two functions mutateR(C) and mutateI(C).
• [Space S7]: this search space consists of the set of candidate patches that result from mutating
the contract C using the functions mutateM(C), mutateR(C), mutateI(C).
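The seven spaces correspond to the non-empty subsets of the three mutation operators. A minimal Python sketch of that enumeration follows; the operator names are taken from the text, while the function name and the representation of a space as a frozenset are assumptions made for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

OPERATORS = ("move", "replace", "insert")


def operator_subspaces(operators: Sequence[str] = OPERATORS) -> List[FrozenSet[str]]:
    """Return one operator set per sub-space: all non-empty subsets, smallest first."""
    spaces = []
    for size in range(1, len(operators) + 1):
        for combo in combinations(operators, size):
            spaces.append(frozenset(combo))
    return spaces


for index, ops in enumerate(operator_subspaces(), start=1):
    print(f"S{index}: {sorted(ops)}")
```

With three operators this yields 2^3 - 1 = 7 sets, listed in the same order as S1 to S7 above.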
Note that for the effectiveness of the parallel algorithm we need to ensure that the search spaces
are mutually-exclusive spaces so that no redundant mutants are generated and validated across
various spaces. Recall that each mutant will be checked using the vulnerability detectors and
against a set of test cases in addition to the gas usage requirement. Such validation process can be
computationally expensive, especially when the search space of candidate patches is extremely large.
Mutants in S7 are generated using the nesting operation mutateX(mutateY(mutateZ(C))), where
X, Y, and Z are distinct operators taken from the mutation domain {Move, Replace, Insert}. Assume
C1 = mutateZ(C), C2 = mutateY(mutateZ(C)), C3 = mutateX(mutateY(mutateZ(C))). Then the
validity function VS7(C3) for this search space S7 can be formalized as follows.
\[
V_{S_7}(C_3) =
\begin{cases}
\text{Accept} & \text{iff } \mathit{diff}(C_1, C) > 0 \land \mathit{diff}(C_2, C_1) > 0 \\
              & \quad \land\ \mathit{diff}(C_2, C) > 0 \land \mathit{diff}(C_3, C_2) > 0 \\
              & \quad \land\ \mathit{diff}(C_3, C) > 0 \land \mathit{diff}(C_3, C_1) > 0 \\
\text{Reject} & \text{otherwise}
\end{cases}
\]
Note that for a mutant to be added to the space S 7 it has to satisfy a somewhat complex condition.
This is necessary in order to avoid overlaps with the other search spaces. Similar validity functions
are defined for the other sub-spaces to ensure the mutually-exclusive property of the sub-spaces
(please see Theorem 1).
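The validity function for S7 amounts to checking that the original contract and the three successive mutants are pairwise different. The Python sketch below illustrates this check under simplifying assumptions: contracts are lists of statement strings, and simple_diff is a hypothetical stand-in for the diff function (it counts differing positions plus the length difference).

```python
from typing import Callable, List

Contract = List[str]


def simple_diff(a: Contract, b: Contract) -> int:
    """Hypothetical stand-in for diff: differing positions plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def valid_in_s7(c: Contract, c1: Contract, c2: Contract, c3: Contract,
                diff: Callable[[Contract, Contract], int] = simple_diff) -> bool:
    """Accept C3 only if C, C1, C2 and C3 are pairwise different."""
    pairs = [(c1, c), (c2, c1), (c2, c), (c3, c2), (c3, c), (c3, c1)]
    return all(diff(x, y) > 0 for x, y in pairs)


c = ["s1;", "s2;"]
c1 = ["s2;", "s1;"]            # move
c2 = ["s2;", "s1_new;"]        # replace
c3 = ["s2;", "s1_new;", "s3;"]  # insert
print(valid_in_s7(c, c1, c2, c3))
```

A candidate whose second step undoes the first (so that C2 equals C) is rejected, since it is effectively the result of fewer operators and would belong to another sub-space.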
Definition 2. (Properties of splitting strategy). Let S be the search space of possible mutants
of a vulnerable smart contract C generated using the operators move, replace, and insert. The splitting
strategy of S into spaces S1, ..., S7 satisfies the following properties:
• disjointness: for any two distinct sets Si and Sj (i, j = 1, ..., 7 ∧ i ≠ j) we have Si ∩ Sj = ∅.
• completeness: (S1 ∪ S2 ∪ ... ∪ S7) = S.
Theorem 1. Spaces (S1, ..., S7) are mutually exclusive spaces.
Proof. (sketched). To prove the theorem we need to consider many different cases as we have 7
spaces. However, since the proof argument of all cases will be very similar and for brevity reason,
we consider here only space S 7 . For this case, we need to show that S 7 ∩ S j = ∅ | j = 1...6. Hence,
there are six possible sub-cases to consider. Recall that the mutants in S 7 are generated using the
nesting operation mutateX (mutateY (mutateZ(C))), where X , Y , and Z are distinct operators taken
from the mutation domain {Move, Replace, Insert }. The theorem can be proven by contradiction.
• Let Si ∩ S 7 , ∅ | i ∈ {1, 2, 3}. This implies that there exists a mutant m that belongs to
both Si and S 7 . Note that since m belongs to Si then it is generated using a single mutate
function of the form mutateX , where X ∈ {Move, Replace, Insert }. It is easy to see then that
ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article . Publication date: May 2020.
Smart Contract Repair 11
the mutant m cannot exist in the space $S_7$, as adding such a mutant to $S_7$ contradicts
the definition of the validity function of the space $S_7$.
• Let $S_j \cap S_7 \neq \emptyset$ for some $j \in \{4, 5, 6\}$. This implies that there exists a common mutant m that
belongs to both $S_j$ and $S_7$. Since m belongs to $S_j$, m is generated from the
nesting operation mutateX(mutateY(C)), where X and Y are distinct operators taken from
the domain {Move, Replace, Insert}. Hence, the mutant m is generated using only two
operators, leaving one of the three operators unused. Therefore, the mutant m
cannot exist in the space $S_7$, as this contradicts the definition of the validity function
of the space $S_7$ and the fact that mutants in $S_7$ are generated using the nesting operation
mutateX(mutateY(mutateZ(C))).
□
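The case analysis above rests on each mutant being characterised by the exact set of operators used to
generate it. The following is a minimal sketch of that bookkeeping, not the authors' implementation. The
names operator_spaces and space_index are hypothetical, and the assignment of indices (singletons first,
then pairs, then the triple, in the order Move, Replace, Insert) is an assumption; the text only fixes that
$S_1$ to $S_3$ use one operator, $S_4$ to $S_6$ use two, and $S_7$ uses all three.

```python
from itertools import combinations

OPERATORS = ("Move", "Replace", "Insert")


def operator_spaces():
    """List the 7 non-empty operator subsets: 3 singletons, 3 pairs, 1 triple."""
    return [frozenset(c) for r in (1, 2, 3) for c in combinations(OPERATORS, r)]


def space_index(operators_used):
    """Return the 1-based index of the space whose operator set matches exactly.

    Assumption: the index order is illustrative, not taken from the paper.
    """
    return operator_spaces().index(frozenset(operators_used)) + 1
```

Because space_index is a function of the exact operator set, a mutant can be assigned to only one space,
which mirrors the disjointness property.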
all possible combinations of random mutations of the function mutateM(C) on the contract C until
the corresponding patch space S 1 is exhausted.
We now discuss the implementation of the main process p8 (Algorithm 2). This process takes
as inputs the original vulnerable smart contract C, the set of targeted vulnerabilities U, and the
set of test cases T, and returns the patches that meet the quality requirements (plausible patches
that pass the given tests T and do not exhibit the given vulnerabilities U). At the beginning, we
bootstrap the population by generating an initial set of mutants.
The size of this set is controlled by the parameter IP (Initial Population Size). Whenever new
mutants are needed, p8 sends requests to the processes p1, ..., p7 (the Requests operation
in Algorithm 2). As soon as one of the processes has generated a new compilable mutant, all other
mutant generation processes stop attempting to generate new mutants and the request is
fulfilled. The Eval function calculates the fitness values of the patches. The objective functions are
defined in Table 2. Since all the objective functions are independent of one another,
the Eval function also spawns concurrent processes to speed up the patch fitness
evaluation. The control flow then enters the main loop. In each iteration, the algorithm
first checks whether the maintained set of patches already contains a plausible patch; this is
accomplished by invoking the function Filter_Plausible_Patches. If one exists, the algorithm
immediately returns the plausible patch. Otherwise, the maintained set of patches is trimmed to the
size $P_{size}$ by the NSGA2 population selection algorithm [5] and another set of patches is
generated in a similar fashion. The base version used to generate the new set of patches is chosen
to be the best patch among all the patches in the maintained set Patches. The evaluation of relative
quality between patches is based on their fitness values. In each iteration of the main loop, the
number of new patches to be generated is determined by the parameter GR (Generation Rate).
We employ a timer in p8 (not shown in the pseudo-code for simplicity) that enforces
termination of the process when the time spent in the search exceeds the bound MaxBound.
The bound MaxBound should be chosen by taking into consideration the number of test cases,
the size of the buggy program, and the estimated number of mutants in the search space assigned
to the process. Note that processes work independently and terminate whenever a plausible patch
is found or the timer fires.
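The control flow of p8 described above can be sketched as follows. This is a simplified, sequential
illustration under stated assumptions, not Algorithm 2 itself. The function repair_loop and its parameters
request_mutant and evaluate are hypothetical names; a lexicographic sort stands in for NSGA2 selection;
fitness is reduced to the two primary objectives; and ip is assumed to be at least 1.

```python
import time


def repair_loop(contract, request_mutant, evaluate, ip, gr, p_size, max_bound_s):
    """Search for a plausible patch; return it, or None when the timer fires.

    request_mutant(base) -> new compilable mutant of `base` (stands in for p1..p7).
    evaluate(patch) -> (remaining_vulnerabilities, failing_tests), lower is better.
    """
    deadline = time.monotonic() + max_bound_s
    # Population bootstrapping: IP initial mutants of the vulnerable contract.
    patches = [request_mutant(contract) for _ in range(ip)]
    while time.monotonic() < deadline:
        ranked = sorted(patches, key=evaluate)
        # A plausible patch exhibits no targeted vulnerability and fails no test.
        if evaluate(ranked[0]) == (0, 0):
            return ranked[0]
        # Assumption: lexicographic truncation stands in for NSGA2 selection.
        patches = ranked[:p_size]
        base = patches[0]  # best patch in the maintained set
        # Generate GR new patches from the best patch.
        patches = patches + [request_mutant(base) for _ in range(gr)]
    return None
```

In this toy form a "contract" can be any value; for instance, with integers where request_mutant
decrements the value and evaluate reports the value as the number of remaining vulnerabilities, the loop
converges to 0.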
Objectives or Fitness Functions. As mentioned earlier, the search space can be extremely
large even for programs whose source code is small. Recall that the search space grows
exponentially with the number of considered lines of code; hence the efficiency of the
genetic repair algorithm needs to be improved when examining candidate patches in the generated
search space. While the parallel repair algorithm splits the large search space into smaller
sub-spaces, which considerably improves the patch generation process, the search sub-spaces can
still be too large to explore exhaustively within a reasonable time budget. The goal of the employed
fitness functions is to guide the search towards plausible repairs. We therefore integrate four fitness
functions (objectives) into the patch generation process. The objectives are classified into primary
objectives and secondary objectives. Primary objectives relate to the functional or correctness
properties of the patch, while secondary objectives relate to its non-functional properties.
The two primary objectives are the number of targeted vulnerabilities
and the number of failing test cases. The number of targeted vulnerabilities can be retrieved from
any smart contract vulnerability detector (e.g. Oyente [24]), while test cases can be provided by the
developers of the vulnerable contract. The secondary, non-functional objectives are the
number of mutation operators applied to the generated patch and the gas usage (cost) of the
patch. The designated fitness functions measure how many of the desired functional and non-functional
requirements a generated mutant meets. The mutation distance of a generated mutant from the
original vulnerable contract is measured by counting the number of times mutation operators
were applied to produce it. This can be used to measure the simplicity of the generated
mutant. The average gas usage is compared using the methodology described in Section 5. The two
secondary objectives are considered only when the generated patch is valid (it fixes all targeted
vulnerabilities and passes all test cases). Note that we give higher preference to a patch that fixes all
detected vulnerabilities and passes all test cases with lower average gas usage and a smaller number
of syntactic changes w.r.t. the original vulnerable contract. We summarize these objectives (fitness
functions) in Table 2.
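One possible reading of how the four objectives interact is sketched below. The class Objectives and the
function pareto_dominates are hypothetical names, and the rule used here (secondary objectives are compared
only when both patches are valid) is an assumption about how "considered only when the generated patch is
valid" might be realised; the exact definitions are those of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    vulnerabilities: int    # targeted vulnerabilities still reported (primary)
    failing_tests: int      # failing test cases (primary)
    mutation_distance: int  # mutation operators applied (secondary)
    gas_level: int          # gas dominance level, lower is cheaper (secondary)

    def is_valid(self):
        return self.vulnerabilities == 0 and self.failing_tests == 0


def pareto_dominates(a, b):
    """True if a is no worse than b on every compared objective and better on one."""
    va = (a.vulnerabilities, a.failing_tests)
    vb = (b.vulnerabilities, b.failing_tests)
    # Assumption: secondary objectives only count once both patches are valid.
    if a.is_valid() and b.is_valid():
        va += (a.mutation_distance, a.gas_level)
        vb += (b.mutation_distance, b.gas_level)
    return all(x <= y for x, y in zip(va, vb)) and any(x < y for x, y in zip(va, vb))
```

Under this reading, a valid patch always dominates an invalid one, and two valid patches are separated by
their mutation distance and gas dominance level.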
usage of different generated patches of a given vulnerable contract. In general, the gas cost of a
smart contract depends on a number of parameters, including memory cost, stack cost, and storage
cost, in addition to the instructions' costs. Hence, the gas consumption of a given path π in a smart
contract SC can be non-constant. It should therefore be described as a parametric formula that
takes into consideration the parameters that affect the gas consumption of the path. We call
this parametric formula the gas formula.
To compare the average gas usage of two smart contracts, we propose the notion of gas dominance.
The goal of the gas dominance notion is to rank edited contracts (generated repairs of
vulnerable contracts) based on their corresponding gas formulas, as an estimate of their relative
average gas usage. This estimate is required because we cannot predict in advance the true average
gas usage over their lifespan. Such a ranking approach can be used to select a low-cost repair
for a vulnerable smart contract from the set of proposed repairs generated by the parallel repair
algorithm.
$$C(\sigma_{inst}, \mu_{inst}, I) = GU_{OPCODE_{inst}}(\sigma_{inst}, \mu_{inst}, I) + GU_{NewMemory}(\sigma_{inst}, \mu_{inst}, I) \qquad (1)$$
where $\sigma_{inst}$ is the blockchain world state before the instruction inst is executed, $\mu_{inst}$ is the
machine state before inst is executed, the operation code $OPCODE_{inst} = I.code[\mu_{pc}]$ is a property of
the execution environment I indexed by the program counter $\mu_{pc}$, $GU_{OPCODE_{inst}}$ is the gas formula
associated with the operation code of inst, and $GU_{NewMemory}$ is the gas usage formula associated with
the expansion of machine memory when executing the instruction inst. For more technical details
about the definition of the gas cost function, we refer the reader to [38].
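To make the two terms of Equation (1) concrete, here is a minimal sketch in which the machine state is
reduced to the number of active memory words and the world state is ignored. The opcode costs in
OPCODE_GAS and the memory cost form in memory_cost are assumed, illustrative values, and the function names
are hypothetical; the authoritative definitions are those referenced in [38].

```python
# Illustrative opcode base costs; assumed values for this sketch only.
OPCODE_GAS = {"ADD": 3, "MUL": 5, "MSTORE": 3}


def memory_cost(words):
    """Cost of holding `words` 32-byte words of memory (assumed quadratic form)."""
    return 3 * words + words * words // 512


def instruction_cost(opcode, words_before, words_after):
    """C(inst) = opcode gas + gas for any memory expansion caused by inst."""
    expansion = max(0, memory_cost(words_after) - memory_cost(words_before))
    return OPCODE_GAS[opcode] + expansion
```

An instruction that does not grow memory pays only its opcode term, while one that extends memory also pays
the difference between the new and the old memory cost.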
The total gas usage of an invocation (in the form of a single transaction) with the execution
information specified in I can be defined as a gas function corresponding to the visited contract
path triggered by the inputs:
$$GU_{path}(\sigma_p, \mu_p, I) = \sum_{inst \in Insts} C(\sigma_{inst}, \mu_{inst}, I) \qquad (2)$$
where $Insts = (inst_0, inst_1, inst_2, \ldots)$ is the sequence of instructions in the execution path
determined by $\sigma_p$, $\mu_p$ and I, with $\sigma_p = \sigma_{inst_0}$ and $\mu_p = \mu_{inst_0}$. For a smart contract with k execution paths,
we construct k gas usage functions, e.g. $GU_{path_1}, \ldots, GU_{path_k}$. We can then express the total gas
usage of a smart contract SC over its lifespan as follows:
$$GU_{lifespan,SC} = \sum_{t \in trans} GU_{trans}(t)(\sigma_t, \mu_t, I_t) \qquad (3)$$
where trans is the set of transactions to the smart contract SC over its lifespan (the
history of transactions of SC), and $\sigma_t$, $\mu_t$ and $I_t$ are the world state, machine state and execution
environment, respectively, when the first instruction of the invocation corresponding to transaction
t was executed. We introduce a new higher-order function $GU_{trans}$ here that maps a transaction to
its corresponding gas usage function. Suppose the execution path is π for the transaction t; then
$GU_{trans}(t) = GU_{path_\pi}$.
Given two repaired versions $SC_a$ and $SC_b$ of a vulnerable smart contract SC addressing the same
vulnerabilities, we favor the version with the lower lifespan gas usage. However, since the future
blockchain world state and the user inputs to SC can take any combination and are
generally unknown in advance, the concrete lifespan gas usage of patched versions cannot be used to
compare effectively the average gas usage of patches. We therefore propose what we call gas
dominance as a method to compare the relative gas efficiency of two patches by comparing
their expected gas usage functions. For a given smart contract $SC_a$ with k execution
paths, we can express the expected gas usage of $SC_a$ as follows:
$$GU_E(SC_a)(\sigma, \mu, I) = \sum_{i=0}^{k} P_i \cdot GU_{path_i}(\sigma, \mu, I) \qquad (4)$$
where $P_i$ is the probability of $path_i$ being visited by an arbitrary execution of $SC_a$, and $GU_{path_i}$ is
the gas usage function corresponding to program path $path_i$. For the cases where the contract paths
invoke external functions, we need to include the gas usage introduced by the external function
invocations in the equation of $GU_E(SC_a)$ of the contract.
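Equation (4) can be illustrated with a small sketch in which each path contributes a probability and a gas
function of the input. The function name expected_gas is hypothetical, the state is collapsed to a single
numeric input for readability, and the path probabilities are assumed to be known, which the text does not
claim is the case in practice.

```python
def expected_gas(paths):
    """Return GU_E as a function of the input, following Equation (4).

    paths: iterable of (probability, gas_fn) pairs, one per execution path.
    """
    paths = list(paths)

    def gu_e(state):
        # Weighted sum of the per-path gas functions.
        return sum(p * gas_fn(state) for p, gas_fn in paths)

    return gu_e
```

For example, with a constant-cost path taken with probability 0.25 and a path whose cost grows linearly with
a loop bound n taken with probability 0.75, the result is itself a function of n rather than a single
number, which is why contracts are compared through their formulas.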
Definition 3. (Gas Dominance Relation). Given two smart contracts $SC_a$ and $SC_b$, we say $SC_a$
gas dominates $SC_b$ (denoted by $SC_a >_g SC_b$) if and only if $GU_E(SC_a) \le GU_E(SC_b)$ for all inputs and
$GU_E(SC_a) < GU_E(SC_b)$ for at least one input to the smart contracts.
Formally,
$$SC_a >_g SC_b \iff \forall \sigma, \mu, I \, (GU_{E_a}(\sigma, \mu, I) \le GU_{E_b}(\sigma, \mu, I)) \land \exists \sigma, \mu, I \, (GU_{E_a}(\sigma, \mu, I) < GU_{E_b}(\sigma, \mu, I)) \qquad (5)$$
where $GU_{E_a} = GU_E(SC_a)$ and $GU_{E_b} = GU_E(SC_b)$.
The gas dominance relation has the following properties:
Property 1 (Irreflexive). No smart contract gas dominates itself. That
is, for every smart contract SC, SC does not gas dominate SC.
Property 2 (Asymmetric). For two arbitrary smart contracts $SC_a$ and $SC_b$, if $SC_a$ gas dominates
$SC_b$, then $SC_b$ does not gas dominate $SC_a$.
Property 3 (Transitive). For three arbitrary smart contracts $SC_a$, $SC_b$ and $SC_c$, if $SC_a$ gas dominates
$SC_b$ and $SC_b$ gas dominates $SC_c$, then $SC_a$ gas dominates $SC_c$.
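The relation and its three properties can be exercised with a short sketch. Note the simplification: the
definition quantifies over all inputs, whereas the hypothetical function gas_dominates below only checks a
finite sample of inputs, so it approximates the relation and is not a decision procedure for it.

```python
def gas_dominates(gu_a, gu_b, inputs):
    """Check the gas dominance relation on a finite sample of inputs.

    gu_a, gu_b: expected gas usage functions of the two contracts.
    Assumption: `inputs` is a representative finite sample, not all inputs.
    """
    pairs = [(gu_a(x), gu_b(x)) for x in inputs]
    no_worse_everywhere = all(a <= b for a, b in pairs)
    better_somewhere = any(a < b for a, b in pairs)
    return no_worse_everywhere and better_somewhere
```

On such a sample, a formula never dominates itself (no strict improvement exists), dominance in one
direction rules out the other, and chains of dominance compose, matching Properties 1 to 3.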
5.3 Integrating Gas Dominance Relationship into Genetic Patch Search Process
The gas dominance relationship defined above compares the relative average gas consumption
of two versions of the vulnerable contract. To enable the comparison among multiple patched
versions of the original vulnerable contract, we define the notion of gas dominance level as
follows.
Definition 4. (Gas Dominance Level). Given a set of smart contracts, non-dominated sorting
[5] is performed based on the gas dominance relationship. The gas dominance level of an arbitrary
smart contract in the set is defined as its ranking in the non-dominated sorting result.
The multi-objective genetic algorithm can now use the gas dominance level as one of the
objectives, which serves to implicitly capture the effect of patches on the gas consumption (without
having to compute the gas consumption directly).
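A gas dominance level can be computed by peeling off successive non-dominated fronts. The sketch below is a
straightforward quadratic version written for clarity, not the fast non-dominated sorting procedure of [5];
the function name dominance_levels is hypothetical, and it assumes that the supplied relation is
irreflexive and transitive so that every non-empty set has a non-dominated element.

```python
def dominance_levels(items, dominates):
    """Assign level 1 to non-dominated items, level 2 to the next front, and so on."""
    levels = {}
    remaining = list(range(len(items)))
    level = 1
    while remaining:
        # An item is in the current front if nothing still remaining dominates it.
        front = [i for i in remaining
                 if not any(dominates(items[j], items[i]) for j in remaining if j != i)]
        if not front:
            raise ValueError("relation is cyclic; expected a strict partial order")
        for i in front:
            levels[i] = level
        remaining = [i for i in remaining if i not in front]
        level += 1
    return [levels[i] for i in range(len(items))]
```

Passing the gas dominance relation as dominates yields, for each patched contract, the level that the
genetic algorithm can use as an objective.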
Remark 1. Syntactically identical paths among contracts share the same gas formula and therefore
can be safely skipped during comparison.
Definition 5. (Classifying paths in contracts). Let C be a vulnerable smart contract and C′
be a repaired version of C obtained by the parallel repair algorithm. A feasible path π in C′ can be
classified into one of the following categories:
• π is a repaired path of some paths in C, or
• π is a new path w.r.t. the set of feasible paths in C, or
• π is a joint or common path between C and C′.
Note that a patch introduced into a given vulnerable smart contract may trigger a new set of paths
that were infeasible in the original vulnerable smart contract. Thus, a repaired version of a contract
may have a new set of behaviors w.r.t. the original contract. This may happen, for example, when the
patch updates an expression in a conditional statement in the original vulnerable contract. The
advantages of distinguishing between the above three classes of paths are two-fold. First, it helps to
reduce the number of paths that need to be considered when comparing the contracts and hence
the number of gas formulas that need to be synthesized. Second, it helps to reduce the complexity
of the final gas formulas of the contracts being compared. Note that since we use a genetic algorithm
based on three mutation operators (move, insert, and replace), we can easily classify paths
in the contracts being compared into three categories: repaired paths, joint paths, or new paths.
Typically, we can identify the locations of buggy statements in the contract, and we can augment the
repair algorithm to label the locations of statements that have been influenced by the deployed
patch. This facilitates the classification of paths in the generated repaired contract w.r.t. the original
contract.
We now turn to describe an acceleration technique that can be applied before conducting the
actual comparison between two similar contracts C and C′. Let us denote the sets of feasible paths
in the two contracts by $\Pi_C$ and $\Pi_{C'}$. The goal of the acceleration technique is to generate reduced
versions of the contracts C and C′ as follows:
(1) Compute the sets of paths that are unique to each contract as follows:
$$Diff(C, C') = \Pi_C \setminus \Pi_{C'}$$
$$Diff(C', C) = \Pi_{C'} \setminus \Pi_C$$
(2) Synthesize a gas formula for each path in the sets Diff(C, C′) and Diff(C′, C) using Equation
(2), and then compute the final gas formula by summing the resulting gas formulas using
Equation (4).
(3) Compare the resulting gas formulas using the comparative approach described in Section 5.
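Step (1) of the reduction above amounts to two set differences. In the sketch below, a path is represented
as a tuple of statement labels so that syntactically identical paths compare equal; this representation
and the function name reduce_paths are assumptions made for illustration.

```python
def reduce_paths(paths_c, paths_c_prime):
    """Return (Diff(C, C'), Diff(C', C)): the paths unique to each contract.

    Each path is a hashable value, e.g. a tuple of statement labels (assumed).
    """
    pc, pcp = set(paths_c), set(paths_c_prime)
    return pc - pcp, pcp - pc
```

Only the paths returned here need gas formulas in step (2); joint paths are dropped from both sides before
any formula is synthesized.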
Comparing the gas usage of two contracts using their reduced versions (i.e., versions obtained
by skipping joint paths or repaired paths whose gas formulas are equivalent) preserves soundness,
as described in the following theorem.
Theorem 2. (Soundness of reduction). Let C be a vulnerable smart contract and C′ be a repaired
version of C. Let also G(C) and G(C′) be gas formulas for C and C′ respectively, and $G(C_R)$ and $G(C'_R)$
be gas formulas for reduced versions of C and C′ obtained as described in Section 5.4. $G(C_R)$ dominates
$G(C'_R)$ if and only if G(C) dominates G(C′).
Remark 2. (Effectiveness of reduction). The accelerated comparative approach of smart contracts
has lower computational complexity than the non-accelerated comparative approach. The amount of
reduction on the computational complexity that can be obtained depends on the number of joint and
repaired paths in the contracts being compared that can be skipped safely during the comparison (i.e.,
without adversely affecting the outcome of comparison).
The number of generated gas sub-formulas (for paths) and the complexity of the final gas formula
(for the contract) can be significantly reduced if the acceleration approach is employed. This is
crucial, as synthesizing gas formulas for paths can be an expensive step, especially for paths with
cyclic behavior. Note that comparing reduced versions of contracts using simplified or reduced
gas formulas that consider only the paths that differ between the two contracts does not affect the
soundness of the analysis. This is mainly due to the observation that only the differing paths
can make the gas consumption of one contract dominate that of the other.
6 IMPLEMENTATION
In this section, we describe the implementation of the SCRepair tool, as well as the setup of the
experimental evaluation (the results from the experiments will appear in the next section).
[Figure: SCRepair diagram. Recoverable labels: Vulnerable Smart Contract, Main Process, Controller, Requests, Worker, Patch Generator, Vulnerability Detector, Test Case Executor.]
the blockchain. For every transaction t to the subject smart contract, we capture the
inputs and the changes to the blockchain state during the execution of t, which are then considered as
the inputs and expected behaviors of the generated regression test case. A generated regression
test case for a transaction t contains the following elements:
(1) Blockchain state before executing the transaction t.
(2) The function being invoked and the corresponding argument values.
(3) Blockchain state after executing the transaction t.
(4) The return values of invoked functions.
However, as the whole blockchain state can be huge (on the order of terabytes), it is
impractical to simply store the relevant versions of the blockchain state. To address this issue, we only
capture the relevant states of the Ethereum accounts in the blockchain before and after the execution
of the transaction t. Each generated regression test case is then run against the original
vulnerable smart contract to check its validity. During the test
case generation process, we set a timeout of 5 minutes for the execution of each
regression test case. Regression tests requiring longer are terminated and discarded. Table 3
shows the number of regression test cases generated for each subject contract. The generated
regression test cases are then used in the automated repair experiments.
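The four elements of a regression test case listed above map naturally onto a small record. The sketch
below is illustrative only: the class RegressionTest, the function passes, the execute callback and the
account-state layout (a mapping from account address to a field dictionary) are all hypothetical and do
not describe the actual SCRepair data format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Assumed layout: account address -> captured fields of that account.
AccountStates = Dict[str, Dict[str, Any]]


@dataclass(frozen=True)
class RegressionTest:
    pre_state: AccountStates        # relevant account states before transaction t
    function: str                   # function being invoked
    arguments: Tuple[Any, ...]      # argument values of the invocation
    post_state: AccountStates       # relevant account states after transaction t
    return_values: Tuple[Any, ...]  # return values of the invoked function


def passes(test, execute):
    """Replay the call via `execute` and compare against the recorded behaviour.

    execute(pre_state, function, arguments) -> (post_state, return_values).
    """
    post_state, return_values = execute(test.pre_state, test.function, test.arguments)
    return post_state == test.post_state and tuple(return_values) == test.return_values
```

Running passes against the original vulnerable contract corresponds to the validity check of a newly
generated test case; running it against a candidate patch corresponds to a regression check.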
suite with a script to convert past blockchain transactions into positive test cases, as described
in subsection 6.2. The vulnerabilities detected by a smart contract checker such as Oyente or
Slither constitute the negative behavior that the generated patches should avoid.
(2) Timeout allocated to the algorithm. A feasible exploration of the search space (candidate
patches) depends heavily on the amount of resources allocated to the genetic algorithm. In
general, the size of the generated search space of a given vulnerable contract depends on
multiple factors including: (i) the size and complexity of the contract being repaired, (ii) the
number of buggy statements in the contract, and (iii) the mutation operators used by the
algorithm. However, the number of mutants that can be examined during the search is limited
by the time budget allocated to the algorithm. The bigger the time budget, the higher the
probability of producing a plausible patch.
(3) The consideration of gas consumption of patches. Considering the gas when searching
for plausible patches of a vulnerable smart contract can be of great benefit. First, it can help
to generate a low-cost repair for a given vulnerable smart contract by comparing the gas
consumption of generated patches and selecting the one with low average cost. Second, it
can be used to optimize the efficiency of the genetic search algorithm in various ways. For
example, it can be used to detect and discard infeasible patches early. Note that a patch can be
a plausible patch (passes the test-cases) but infeasible to be deployed on a real blockchain. This
happens when the generated patch consumes a significantly large amount of gas and thus
leads to expensive transactions. To reduce the computational complexity of the algorithm,
one might maintain during the genetic search the lowest known average gas
usage (let us call it $g_{max}$) of a plausible patch. Then, when a new plausible patch with a
lower average cost is found, the bound $g_{max}$ is updated accordingly. The bound $g_{max}$ can be
updated on-the-fly during the search and used to discard infeasible patches early without
necessarily examining the entire test suite.
(4) The number of genetic mutation operators used by the algorithm. Note that the size
of the search space that needs to be examined when searching for a plausible patch for a
vulnerable contract can be extremely large. Recall that the search space of a given vulnerable
smart contract is generated by mutating (buggy) statements in the contract. Hence, the size
of the generated search space grows exponentially w.r.t. the number of considered lines in
the contract and the number of mutation operators. The smaller the number of the mutation
operators, the smaller the size of the search space and the faster the algorithm. However,
reducing the number of the mutation operators may reduce significantly the capability of
the algorithm to produce plausible patches.
(5) The state space search order. As the search space grows, the organization of mutants or
candidate patches into sub-spaces becomes more critical to the efficiency of the algorithm. In
general, there is no specific search strategy that one can follow when examining the candidate
patches of a given vulnerable smart contract. The search can be purely sequential and random
or it can be parallelized based on the semantics of the mutation operators. However, as
expected, the search can be optimized by taking into consideration some interesting factors
including the semantics of the bug, the semantics of the mutation operators, and the gas
consumption of generated patches.
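The early-discard idea behind the bound $g_{max}$ in factor (3) can be sketched as follows. This is one
possible realisation under stated assumptions, not the tool's implementation: the function
average_gas_or_discard and the callback run_test are hypothetical, per-test gas is assumed to be
non-negative, and the bound is applied to the average gas over the given tests.

```python
def average_gas_or_discard(run_test, tests, g_max):
    """Return the patch's average gas, or None once it provably exceeds g_max.

    run_test(test) -> gas used by the patch on `test` (assumed non-negative).
    """
    if not tests:
        raise ValueError("tests must be non-empty")
    budget = g_max * len(tests)  # total gas allowed for an average of g_max
    total = 0
    for test in tests:
        total += run_test(test)
        if total > budget:
            # Remaining tests cannot lower the total, so stop early.
            return None
    return total / len(tests)
```

A caller would lower g_max whenever a plausible patch returns an average below the current bound, so that
later candidates are cut off sooner.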
As one can see from the aforementioned factors, the correctness and efficiency of the genetic
algorithm can be evaluated under many different settings. For example, one might wonder how
the algorithm performs when enabling/disabling the gas calculation of generated patches,
or when increasing/decreasing the size of the test suite or the time budget allocated to
the genetic algorithm. In this work, we choose to evaluate the correctness and efficiency of the
genetic algorithm by considering five key research questions. The goal of the research questions is
to evaluate the presented parallel genetic repair algorithm and to understand and draw conclusions
about the factors affecting the correctness and quality of generated patches.
Setup. To demonstrate the effectiveness of the presented genetic repair algorithm in fixing vul-
nerable smart contracts, we run the genetic algorithm on the selected set of smart contracts. We
evaluate the effectiveness of the algorithm by measuring the number of vulnerabilities that can be
detected and repaired correctly and the time it takes to generate correct patches for these vulnerable
contracts. Recall that a repair is generated by the algorithm when all test cases pass and no targeted
vulnerability is found. We call such a fix a plausible fix. A plausible fix might still
not be a correct patch. We therefore check the correctness of the generated patches by inspecting
their semantics manually. We deem a plausible fix for a vulnerability correct if it
repairs the detected vulnerability, leaves the original business logic unmodified, and
does not introduce new features or vulnerabilities into the code. We use an unguided random
search implementation as the baseline to evaluate the effectiveness of the designated guidance in
the search process. This is essential since the expensive complete patch quality assessment by the
objective functions could lower the efficiency and effectiveness [2, 33].
Results. For each of the considered vulnerable contracts, we ran our algorithm five times,
each time with a timeout of one hour. We report the average run time and the sum of
plausibly repaired vulnerabilities over the five runs as the final results. Table 4 shows
the summary of the results and the average run time of the algorithm. The algorithm was able to
plausibly repair 26 occurrences of vulnerabilities among the 48 detected vulnerabilities. The average
run time of the algorithm over the considered 17 subjects was 25 minutes. We noticed that the main
bottleneck of the implementation is the test case execution time, which often consumes the
most computational resources and blocks the synchronization barrier of each iteration of the main
loop of the algorithm. When inspecting the generated patches, we found that our algorithm was
able to correctly fix 21 of the 48 detected vulnerabilities. With the same timeout,
our genetic algorithm was able to plausibly fix 15 more vulnerabilities than the unguided random
search version, a 136% improvement. This shows that the guidance in the search process
from the genetic algorithm increased the repair efficiency significantly. Moreover, a careful
inspection of the results reported in Table 4 leads to the following interesting observations.
Observation RQ1.1. As shown in Table 4 there are four different classes of vulnerabilities that
have been considered when evaluating the algorithm, namely, ED, RE, IO, and TOD. We observed that
most of the vulnerabilities of the classes ED and RE have been fixed correctly by the algorithm, where
21 out of the 28 detected EDs have been plausibly repaired and 4 out of the 6 detected REs have been
plausibly repaired. On the other hand, the algorithm was unable to generate correct patches for any of
the vulnerabilities of the classes IO and TOD; one plausible patch for IO was generated.
Observation RQ1.2. The occurrence rates of the vulnerabilities ED, RE, IO, and TOD in the
considered vulnerable contracts are as follows: ED occurs 58%, RE occurs 13%, TOD occurs 4%, and IO
occurs 25%. We observed that the ED vulnerability is the most frequently occurring class of bugs in the
selected vulnerable contracts, where 28 out of the 48 detected bugs are ED bugs.
Observation RQ1.3. We observed that 7 out of the considered 17 vulnerable contracts have been
repaired in less than 10 minutes, where most of these contracts contain multiple bugs. This demonstrates
clearly the efficiency of the presented parallel genetic repair algorithm in fixing vulnerabilities in a
considerably short amount of time.
Answer to RQ1: Among the 48 detected vulnerabilities in the 20 vulnerable smart contracts,
the algorithm was able to fix plausibly 26 vulnerabilities, where 21 of these plausible fixes
have been verified to be correctly fixing the vulnerabilities. Notably, our implementation
fully repaired 10 of the 20 contracts.
Table 5. Results for RQ2, showing gas variation between buggy contract and patched versions.
Setup. When fixing the detected vulnerabilities, expressions in the vulnerable smart contracts will
be modified. However, it is unclear whether plausibly fixing the vulnerabilities would change the
average gas consumption of the smart contract. We therefore perform a comparison on the average
gas consumption between the original vulnerable smart contract and the plausibly patched versions
generated from five repeated runs conducted in RQ1. Gas dominance levels between the original
contract and the patched versions are computed. We assert the patched version has different average
gas consumption from the original version when they are of different gas dominance levels. To
calculate the gas dominance level, the gas formula of the original version and the patched versions
will be generated, as described in earlier sections.
Results. Table 5 shows the difference in average gas consumption between the plausible patches
and the original version. Subjects for which plausible patches could not be generated within the time
limit (1 hour) are omitted from consideration in this RQ. To sum up, 6 out of 8 (75%) of our
selected subjects have plausible patches with gas formulas that differ from that of the original
vulnerable version, while half (50%) of our selected subjects have plausible patches with gas
dominance levels different from that of the original vulnerable version. This suggests
that fixing vulnerabilities in smart contracts can change the average gas consumption of the
original contract. For the subjects with plausible patches that change the average gas consumption,
each independent patch generation process has a high probability (93.65% in our experiments) of
generating plausible patches with gas dominance levels different from the original version.
Answer to RQ2: In general, when fixing vulnerabilities in a vulnerable smart contract, the
gas should be one of the factors considered in the repair process.
Setup. Further, we would like to investigate whether it is possible to plausibly fix the
vulnerabilities with more than one patch, yielding different average gas consumption across
patches. In other words, we intend to understand whether the same bugs can be fixed with patches
of different average gas consumption. If the answer is positive, this justifies attempting to
pursue a more gas-efficient plausible patch during the search process. We conduct our analysis
on the patches generated in RQ1 across five repeated runs. We leverage the gas dominance levels of
patches as a proxy to compare the difference in average gas consumption between patches. We
deem a patched version to have different average gas consumption from another when they have
different gas dominance levels. Note that two patched versions have different gas dominance levels
Table 6. Results for RQ3, gas variation among patch candidates is shown.
when the gas formula of one of the two versions dominates the other. However, to calculate the gas
dominance level of generated patches, the gas formulas of the original version and the patched
versions need to be generated first.
Results. Table 6 shows the difference in average gas consumption between the generated plausible
patches of the selected vulnerable contracts. Subjects for which plausible patches could not be generated
within the time limit (1 hour) are omitted from consideration in this RQ. For 5 out of 8 subjects (62.5%), we
obtained a set of plausible patches with more than one unique gas formula,
indicating the diversity of gas consumption between plausible patches addressing the same set of
vulnerabilities. We noticed that, on average, the plausible patches for a given contract span around
two gas dominance levels.
Observation RQ3.1. For 62.5% of the considered subjects, there exist plausible patches having
different average gas consumption.
Answer to RQ3: Different plausible patches can differ in average gas consumption when
fixing the same vulnerabilities. We should therefore attempt to guide the search towards
more gas-efficient plausible patches, in addition to considering their correctness.
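As an illustration of the counting behind Observation RQ3.1, the following minimal Python sketch groups the plausible patches of each subject by gas dominance level and reports the share of subjects whose patches span more than one level. The subject names and level values are hypothetical placeholders, not the data of Table 6; they are chosen only so that 5 of 8 subjects show variation, mirroring the reported proportion.

```python
from typing import Dict, List


def share_with_gas_variation(levels_by_subject: Dict[str, List[int]]) -> float:
    """Return the percentage of subjects whose plausible patches span
    more than one distinct gas dominance level."""
    if not levels_by_subject:
        raise ValueError("no subjects given")
    varied = sum(
        1 for levels in levels_by_subject.values() if len(set(levels)) > 1
    )
    return 100.0 * varied / len(levels_by_subject)


# Hypothetical placeholder data: one list of gas dominance levels per subject,
# one entry per plausible patch. These are NOT the values from Table 6.
example = {
    "subject_1": [0, 1],
    "subject_2": [0, 0],
    "subject_3": [0, 1, 2],
    "subject_4": [1, 1],
    "subject_5": [0, 2],
    "subject_6": [0],
    "subject_7": [0, 1],
    "subject_8": [1, 2],
}

share = share_with_gas_variation(example)
print(f"{share:.1f}% of subjects have patches at different gas levels")
```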
RQ4: How effective is the gas ranking approach at producing low-cost patches?
Setup. We have integrated our proposed gas comparison approach into the patch generation process
to compare the relative gas usage of generated patches. The resulting gas dominance relationship
guides the genetic patch generation process towards a potentially gas-optimized patch. To
systematically evaluate the effectiveness of the gas usage objective in producing low-cost patches,
we run our repair algorithm on the selected vulnerable smart contracts under two settings: the
first with the gas ranking objective active, and the second with the gas ranking objective
deactivated. The first setting reuses the patches generated in RQ1, while the second requires
additional runs, repeated five times with a timeout of one hour. We then run all patches generated
in both settings on our generated test cases and collect the average runtime gas usage for each
setting. For a consistent and fair comparison, we only consider patches fixing all vulnerabilities.
Unlike RQ2 and RQ3, this RQ examines the change in average gas consumption for previous usages of
the contracts, in order to infer practical changes in gas cost.
Table 7. Results for RQ4, showing the average gas usage of the patched versions (on the given tests) in two
settings. The percentage shows the improvement on average gas usage when the gas objective is enabled.
Results. Table 7 summarizes the average gas usage of patches generated with and without the gas
objective activated. Subjects for which plausible patches could not be generated within the time
limit (1 hour) are omitted from consideration in this RQ. Overall, for 6 out of the 8 subjects for
which both settings can generate plausible patches (75%), the gas objective is effective at
reducing the average cost of the patches, by up to 9.31%. Two subjects (Autonio ICO and Classy Coin
Airdrop) show no difference in average gas usage between the patches generated in the two settings.
For one subject (MXToken Crowdsale), no plausible patch was generated in the five repeated runs
with the gas objective deactivated. In addition, careful profiling of the algorithm revealed that
gas ranking was frequently the determining factor in patch ranking during the repair of the
selected subjects, even though the gas objective is employed as a secondary objective.
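The effect noted above, where a secondary objective often ends up deciding the ranking, can be illustrated with a simplified lexicographic ordering: candidates are compared first on a primary score, and gas is consulted only to break ties. The sketch below is an illustration under stated assumptions, not the ranking procedure used by the repair tool; the `Candidate` fields, and the choice of the number of unfixed vulnerabilities as the primary score, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    unfixed_vulnerabilities: int  # hypothetical primary objective (lower is better)
    gas_level: int                # hypothetical secondary objective (lower is better)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates lexicographically: primary objective first, gas second."""
    return sorted(
        candidates, key=lambda c: (c.unfixed_vulnerabilities, c.gas_level)
    )


def decided_by_gas(a: Candidate, b: Candidate) -> bool:
    """True when the two candidates tie on the primary objective, so that
    only the gas objective separates them."""
    return (
        a.unfixed_vulnerabilities == b.unfixed_vulnerabilities
        and a.gas_level != b.gas_level
    )


# Hypothetical candidate pool for illustration only.
pool = [
    Candidate("p1", 0, 2),
    Candidate("p2", 0, 1),
    Candidate("p3", 1, 0),
]
ranked = rank(pool)
print([c.name for c in ranked])  # ['p2', 'p1', 'p3']
```

In this toy pool, p1 and p2 tie on the primary objective, so their relative order is decided by gas alone, while p3 ranks last despite having the lowest gas level.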
Answer to RQ4: When the gas objective is enabled during repair, we observed that the
average gas consumption of the generated patches of four vulnerable contracts is
reduced compared to the setting in which the gas objective is disabled. We also observed
that the average gas of two subjects is considerably reduced when the gas objective is
enabled: the average gas of the patched version of the XGold Coin contract is reduced by
6.37%, and that of the patched version of the Privatix Presale contract by 9.31%. This is
a considerable reduction, as gas costs real money.
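The percentages above can be read as relative reductions in average gas usage. A minimal sketch of that arithmetic follows, assuming the baseline is the setting with the gas objective disabled (the text does not state the baseline explicitly). The function name and the gas values are hypothetical and do not come from Table 7.

```python
from statistics import mean
from typing import Sequence


def relative_gas_reduction(
    gas_without: Sequence[float], gas_with: Sequence[float]
) -> float:
    """Percentage reduction in average gas when the gas objective is enabled,
    relative to the average with the objective disabled (assumed baseline)."""
    baseline = mean(gas_without)
    if baseline <= 0:
        raise ValueError("baseline average gas must be positive")
    return 100.0 * (baseline - mean(gas_with)) / baseline


# Hypothetical per-test gas measurements for one subject.
without_objective = [52000, 48000, 50000]  # average 50000
with_objective = [47000, 46000, 48000]     # average 47000

reduction = relative_gas_reduction(without_objective, with_objective)
print(f"{reduction:.2f}%")  # 6.00%
```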
RQ5: How does the time budget impact our effectiveness at fixing bugs?
Setup. Allocating or estimating a feasible time budget for a genetic repair algorithm is an
interesting open problem. It is crucial because it affects the capability of the algorithm to
generate plausible patches for a given vulnerable contract. Key factors to consider when allocating
a feasible time budget to our repair algorithm include: (i) the size of the test suite, (ii) the
complexity of the contract (i.e., larger contracts may take longer to analyze than smaller
contracts), and (iii) the estimated size of the search space, which in turn depends on the number
of mutation operators used by the algorithm and the size of the original vulnerable contract. To
address this research question, we evaluate the algorithm under two time budgets: a timeout of
30 minutes and a timeout of one hour. The goal is to measure the number of vulnerable contracts
that have been repaired under the two settings.
Table 8. Results for RQ5, obtained by varying the timeout from 30 minutes to 1 hour
Results. Table 8 shows the results of running the algorithm on the selected vulnerable smart
contracts with two values of the timeout parameter (30 minutes and 1 hour). As shown in the table,
with a 30-minute timeout the algorithm generated plausible patches for 17 of the 48 detected
vulnerabilities, a success rate of 35.4%. With a 1-hour timeout, the algorithm generated plausible
patches for 26 vulnerabilities, a success rate of 54.2%. While the improvement in repair rate may
look modest, it is important because it shows that some vulnerabilities can only be repaired when
the timeout is increased to 1 hour. This demonstrates the impact of the timeout parameter on the
effectiveness of the algorithm. Moreover, since every detected vulnerability in a given vulnerable
smart contract needs to be repaired and the search space can be extremely large, the time budget
allocated to the algorithm can play a key role in its successful termination. Increasing the time
budget increases the portion of the search space explored, which in turn increases the probability
of generating plausible patches.
Answer to RQ5: When we increase the timeout parameter of the algorithm from 30 minutes
to 1 hour, the vulnerability repair rate increases from 35.4% to 54.2%, with the genetic
algorithm repairing 9 additional vulnerabilities. This demonstrates the importance of
allocating a substantial time budget (at least one hour) to the algorithm when repairing
vulnerable smart contracts.
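The success rates reported for RQ5 follow directly from the counts given in the text (17 and 26 repaired vulnerabilities out of 48 detected). The short check below reproduces that arithmetic; only the constant and function names are introduced here for illustration.

```python
DETECTED = 48         # vulnerabilities detected across the selected contracts
REPAIRED_30_MIN = 17  # plausibly patched with a 30-minute timeout
REPAIRED_60_MIN = 26  # plausibly patched with a 1-hour timeout


def success_rate(repaired: int, detected: int) -> float:
    """Repair success rate as a percentage, rounded to one decimal place."""
    return round(100.0 * repaired / detected, 1)


rate_30 = success_rate(REPAIRED_30_MIN, DETECTED)
rate_60 = success_rate(REPAIRED_60_MIN, DETECTED)
extra = REPAIRED_60_MIN - REPAIRED_30_MIN

print(rate_30, rate_60, extra)  # 35.4 54.2 9
```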
the dataset is however limited since this is the first automated smart contract repair work, and
therefore, there is no consolidated dataset for use like Defects4J [17] for Java. We are aware that
our approach employs biased random search techniques, and for this reason each experiment was
repeated five times. We acknowledge that the presented results are potentially skewed even with
this replication factor of five for each setup.
External validity. Threats to external validity relate to the ability to generalize our findings.
We have only evaluated our work on four known vulnerability types. While our approach is
vulnerability-agnostic, its efficacy in fixing other vulnerabilities remains unknown. On the
other hand, we have conducted our experiments on real-world subjects in an attempt to investigate
the performance of our approach. This does not guarantee that similar efficacy will be exhibited
for arbitrary vulnerable smart contracts.
8 RELATED WORK
We discuss the related literature on automated program repair, smart contract analysis, and gas
usage calculation of smart contracts.
9 DISCUSSION
In this paper, we have presented the first work on automatically repairing smart contracts. Our
repair method is gas-aware. The repair algorithm is search-based, and it breaks the huge search
space of candidate patches down into smaller mutually-exclusive spaces that can be processed
independently. The repair technique considers gas usage of vulnerable contracts when generating
patches for detected vulnerabilities. Our experiments demonstrated that our method can handle
real-world contracts and generate repairs in a short time (less than 1 hour) while taking into
consideration the gas consumption of the generated repairs.
1 https://github.com/federicobond/eth-mutants
Since the owners of smart contracts are unknown, we could not reach out to them in advance,
prior to publication. Nevertheless, we hope that our work will spur greater interest in automatically
fixing smart contracts via a variety of testing, analysis, validation and synthesis methods. We have
made our smart contract repair tool and dataset available on GitHub at the following site:
https://SCRepair-APR.github.io
ACKNOWLEDGMENTS
This work was partially supported by the National Satellite of Excellence in Trustworthy Software
Systems, funded by National Research Foundation (NRF) Singapore under National Cybersecurity
R&D (NCR) programme, and by a Singapore Ministry of Education (MOE) Academic Research Fund
(AcRF) Tier 1 grant (17-C220-SMU-008).
REFERENCES
[1] Sidney Amani, Myriam Bégel, Maksym Bortin, and Mark Staples. 2018. Towards Verifying Ethereum Smart Contract
Bytecode in Isabelle/HOL. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and
Proofs (CPP 2018). 66–77.
[2] Andrea Arcuri and Lionel Briand. 2011. A Practical Guide for Using Statistical Tests to Assess Randomized Algorithms
in Software Engineering. In Proceedings of the 33rd International Conference on Software Engineering (ICSE ’11). ACM,
New York, NY, USA, 1–10. https://doi.org/10.1145/1985793.1985795
[3] Nicola Atzei, Massimo Bartoletti, and Tiziana Cimoli. 2017. A Survey of Attacks on Ethereum Smart Contracts (SoK). In
Proceedings of the 6th International Conference on Principles of Security and Trust - Volume 10204. 164–186.
[4] Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Anitha Gollamudi, Georges Gonthier, Nadim
Kobeissi, Natalia Kulatova, Aseem Rastogi, Thomas Sibut-Pinote, Nikhil Swamy, and Santiago Zanella-Béguelin.
2016. Formal Verification of Smart Contracts: Short Paper. In Proceedings of the 2016 ACM Workshop on Programming
Languages and Analysis for Security (PLAS ’16). 91–96.
[5] Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and Tanaka Meyarivan. 2000. A fast elitist non-dominated sorting
genetic algorithm for multi-objective optimization: NSGA-II. In International conference on parallel problem solving
from nature. Springer, 849–858.
[6] Kevin Delmolino, Mitchell Arnett, Ahmed Kosba, Andrew Miller, and Elaine Shi. 2016. Step by step towards creating a
safe smart contract: Lessons and insights from a cryptocurrency lab. In Financial Cryptography and Data Security -
International Workshops, FC 2016, BITCOIN, VOTING, and WAHC, Revised Selected Papers. 79–94.
[7] Ardit Dika. 2017. Ethereum Smart Contracts: Security Vulnerabilities and Security Tools. Master’s thesis. Norwegian
University of Science and Technology, Department of Computer Science.
[8] Josselin Feist, Gustavo Grieco, and Alex Groce. 2019. Slither: a static analysis framework for smart contracts. In 2019
IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). IEEE,
8–15.
[9] Ying Fu, Meng Ren, Fuchen Ma, Heyuan Shi, Xin Yang, Yu Jiang, Huizhong Li, and Xiang Shi. 2019. EVMFuzzer:
Detect EVM Vulnerabilities via Fuzz Testing. In Proceedings of the 2019 27th ACM Joint Meeting on European Software
Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019). ACM, New York,
NY, USA, 1110–1114. https://doi.org/10.1145/3338906.3341175
[10] Peter Gammie and Ron van der Meyden. 2004. MCK: Model Checking the Logic of Knowledge. In Computer Aided
Verification, 16th International Conference, CAV.
[11] Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Communications of The
ACM 62, 12 (2019).
[12] Ilya Grishchenko, Matteo Maffei, and Clara Schneidewind. 2018. A Semantic Framework for the Security Analysis of
Ethereum Smart Contracts. In Principles of Security and Trust - 7th International Conference, POST. 243–269.
[13] Alex Groce, Josie Holmes, Darko Marinov, August Shi, and Lingming Zhang. 2018. An extensible, regular-expression-
based tool for multi-language mutant generation. In Proceedings of the 40th International Conference on Software
Engineering (ICSE). 25–28.
[14] Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, and Yoni Zohar.
2017. Online Detection of Effectively Callback Free Objects with Applications to Smart Contracts. Proc. ACM Program.
Lang. 2, POPL (2017), 48:1–48:28.
[15] Joran J. Honig, Maarten H. Everts, and Marieke Huisman. 2019. Practical Mutation Testing for Smart Contracts. In
Data Privacy Management, Cryptocurrencies and Blockchain Technology - ESORICS International Workshop. 289–303.
[16] Bo Jiang, Ye Liu, and W. K. Chan. 2018. ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection. In
Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018). 259–269.
[17] René Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled
testing studies for Java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis.
ACM, 437–440.
[18] Sukrit Kalra, Seep Goel, Mohan Dhawan, and Subodh Sharma. 2018. ZEUS: Analyzing Safety of Smart Contracts. In
25th Annual Network and Distributed System Security Symposium, NDSS.
[19] Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. JFIX: Semantics-based Repair of
Java Programs via Symbolic PathFinder. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software
Testing and Analysis. 376–379.
[20] Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A Systematic Study of Automated
Program Repair: Fixing 55 out of 105 Bugs for $8 Each. In Proceedings of the 34th International Conference on Software
Engineering (ICSE).
[21] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A Generic Method for
Automatic Software Repair. IEEE Transactions on Software Engineering 38, 1 (2012), 54–72.
[22] Bin Liu, Xiao Liang Yu, Shiping Chen, Xiwei Xu, and Liming Zhu. 2017. Blockchain based data integrity service
framework for IoT data. In 2017 IEEE International Conference on Web Services (ICWS). IEEE, 468–475.
[23] Fan Long and Martin Rinard. 2015. Staged Program Repair with Condition Synthesis. In Proceedings of the 2015 10th
Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). 166–178.
[24] Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, and Aquinas Hobor. 2016. Making Smart Contracts Smarter.
In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 254–269.
[25] Matteo Marescotti, Martin Blicha, Antti E. J. Hyvärinen, Sepideh Asadi, and Natasha Sharygina. 2018. Computing
Exact Worst-Case Gas Consumption for Smart Contracts. In Leveraging Applications of Formal Methods, Verification
and Validation. 450–465.
[26] Matias Martinez and Martin Monperrus. 2015. Mining Software Repair Models for Reasoning on the Search Space of
Automated Program Fixing. Empirical Softw. Engg. 20, 1 (2015), 176–205.
[27] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In
Proceedings of the 37th International Conference on Software Engineering (ICSE ’15). 448–458.
[28] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis
via Symbolic Analysis. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). 691–701.
[29] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: Program Repair via
Semantic Analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). 772–781.
[30] A. Nistor, P. Chang, C. Radoi, and S. Lu. 2015. CARAMEL: Detecting and Fixing Performance Problems That Have
Non-Intrusive Fixes. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. 902–912.
https://doi.org/10.1109/ICSE.2015.100
[31] A. Jefferson Offutt and Ronald H. Untch. 2001. Mutation Testing for the New Century. Chapter Mutation 2000: Uniting
the Orthogonal, 34–44.
[32] Mike Papadakis, Marinos Kintis, Jie Zhang, Yue Jia, Yves Le Traon, and Mark Harman. 2019. Mutation Testing Advances:
An Analysis and Survey. Advances in Computers 112 (2019), 275–378.
[33] Y. Qi, X. Mao, Y. Lei, Z. Dai, and C. Wang. 2014. The strength of random search on automated program repair. In
ACM/IEEE International Conference on Software Engineering.
[34] Christopher Signer. 2018. Gas Cost Analysis for Ethereum Smart Contracts. Master’s thesis. ETH Zurich, Department of
Computer Science.
[35] Sergei Tikhomirov, Ekaterina Voskresenskaya, Ivan Ivanitskiy, Ramil Takhaviev, Evgeny Marchenko, and Yaroslav
Alexandrov. 2018. SmartCheck: Static Analysis of Ethereum Smart Contracts. In Proceedings of the 1st International
Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB ’18). 9–16.
[36] Petar Tsankov, Andrei Dan, Dana Drachsler-Cohen, Arthur Gervais, Florian Bünzli, and Martin Vechev. 2018. Securify:
Practical Security Analysis of Smart Contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and
Communications Security.
[37] Ron van der Meyden. 2019. On the specification and verification of atomic swap smart contracts. In IEEE International
Conference on Blockchain and Cryptocurrency. 176–179.
[38] Gavin Wood. 2019. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151
(2019), 1–32.
[39] Haoran Wu, Xingya Wang, Jiehui Xu, Weiqin Zou, Lingming Zhang, and Zhenyu Chen. 2019. Mutation Testing for
Ethereum Smart Contract. arXiv:cs.SE/1908.03707
[40] Jifeng Xuan, Matias Martinez, Favio DeMarco, Maxime Clement, Sebastian Lamelas Marcote, Thomas Durieux, Daniel
Le Berre, and Martin Monperrus. 2017. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs.
IEEE Trans. Softw. Eng. (Jan. 2017), 34–55.
[41] Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roychoudhury. 2017. A Feasibility Study of
Using Automated Program Repair for Introductory Programming Assignments. In Proceedings of the 2017 11th Joint
Meeting on Foundations of Software Engineering (ESEC/FSE 2017). 740–751.
[42] Xiao Liang Yu, Xiwei Xu, and Bin Liu. 2017. EthDrive: A Peer-to-Peer Data Storage with Provenance. In Proceedings
of the Forum and Doctoral Consortium Papers Presented at the 29th International Conference on Advanced Information
Systems Engineering, CAiSE 2017, Essen, Germany, June 12-16, 2017. 25–32.