Elliptic Curve Hierarchical Deterministic Private Key Sequences: Bitcoin Standards and Best Practices


POLITECNICO DI MILANO

MASTER THESIS

Elliptic Curve Hierarchical Deterministic
Private Key Sequences: Bitcoin Standards
and Best Practices

Author: Daniele FORNARO
Supervisors: Prof. Daniele MARAZZINA, Prof. Ferdinando M. AMETRANO

A thesis submitted in fulfillment of the requirements
for the degree of Mathematical Engineering
in the
Industrial and Information Engineering


Department of Mathematics

19 April 2018

“The Times 03/Jan/2009 Chancellor on brink of second bailout for banks”

Bitcoin Blockchain

POLITECNICO DI MILANO

Abstract
Industrial and Information Engineering
Department of Mathematics

Mathematical Engineering

Elliptic Curve Hierarchical Deterministic Private Key Sequences: Bitcoin
Standards and Best Practices
by Daniele FORNARO

The cryptography used by most cryptocurrencies is mainly based on private-public key pairs. The method used to generate private keys is therefore fundamental: it must be efficient, secure and suitable for the situation. Among alternative methods, the Hierarchical Deterministic Wallet, described in Bitcoin Improvement Proposal #32 (BIP32), has emerged as the standard. Starting from a random number, called SEED, picked from a sufficiently large range, it is possible to generate numerous private keys in a hierarchical and deterministic way, through particular HASH functions and thanks to the properties of elliptic curves. Several wallets also use a special algorithm to store the seed and back it up in a readable form: a mnemonic phrase, made of words selected from a specific dictionary. Consensus on a single standard for the mnemonic phrase has not yet been reached among all major players in the industry. This work aims to clarify the various techniques used to derive keys, with particular attention to the HD wallet. It also analyzes the two principal ways of encoding the seed, the one described in BIP39 and the one proposed by Electrum, one of the main Bitcoin wallets, highlighting their respective advantages and disadvantages.

Acknowledgements
First of all, I would like to express my sincere gratitude to professor Ferdinando Ametrano of the Politecnico di Milano, who passed on to me his passion for the subject and who dedicated a large part of his time to setting me on the right path. I would like to thank professor Daniele Marazzina of the Politecnico di Milano for his supervision of this work and for his many tips. I would also like to thank all the friends and colleagues of Deloitte Blockchain Lab Italy for the stimulating and innovative environment in which I was able to write this thesis; in particular Paolo Mazzocchi, Stefano Leone, Raffaele Nicodemo and Calogero Mandracchia, for their support and suggestions. Furthermore, I would like to thank Leonardo Comandini for the mutual support and for his help on this work.

Finally, I must express my very profound gratitude to my family and to my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them.

Thank you.

Contents

Abstract

Acknowledgements

Contents

Introduction

1 Cryptography
1.1 HASH function
1.1.1 Other functions
1.2 Elliptic Curve over $\mathbb{F}_p$
1.2.1 Symmetry
1.2.2 Point addition
1.2.3 Scalar multiplication
1.2.4 Discrete Logarithm Problem
1.2.5 Group order
1.2.6 Bitcoin private-public key cryptography

2 Wallet
2.1 Nondeterministic (random) Wallet
Pros and Cons
2.2 Deterministic Wallets
2.2.1 Deterministic Wallet type-1
Pros and Cons
2.2.2 Deterministic Wallet type-2
Pros and Cons
2.2.3 Deterministic Wallet type-3

3 Hierarchical Deterministic Wallet
3.1 Elements
3.1.1 Seed
3.1.2 Extended Key
3.2 From SEED to Master Private Key
3.3 Child Key derivation
3.3.1 Normal derivation
3.3.2 Hardened derivation
3.4 Special derivation
3.4.1 Public derivation
3.4.2 Weakness of Normal Derivation
3.5 Advantages and disadvantages
3.5.1 When to use Normal Derivation?
3.5.2 When to use Hardened Derivation?

4 Mnemonic phrase
4.1 BIP 39
4.1.1 Mnemonic Generation
4.1.2 From Mnemonic to Seed
4.2 Electrum Mnemonic
4.2.1 Mnemonic Generation
4.2.2 From Mnemonic to Seed
4.3 Comparison

5 How to use a HD Wallet
5.1 Derivation path
5.2 BIP 43
5.2.1 Multi-coin wallet BIP 44
5.2.2 SegWit addresses BIP 49

Conclusion

A Bitcoin keys representation and addresses
A.1 Public Key
A.1.1 Uncompressed
A.1.2 Compressed
A.2 Private Key
A.3 Address

B Python code
B.1 Deterministic Wallet
B.1.1 Type-1
B.1.2 Type-2
B.2 Hierarchical Deterministic Wallet - BIP 32
B.3 Mnemonic phrase
B.3.1 BIP 39
B.3.2 Electrum

Introduction

The cryptography used by most cryptocurrencies is mainly based on private-public key pairs. The method used to generate private keys is therefore fundamental: it must be efficient, secure and suitable for the situation.

This thesis aims to analyze in detail the principal techniques used to derive private-public key pairs in the Bitcoin framework.

The first chapter explains the basic concepts needed for this work, which rest on two fundamental elements: hash functions and the Elliptic Curve. The former are irreversible algorithms: although the image of such a function is a finite set, no computationally feasible procedure is known to invert it. The only way to invert a hash function is by trial and error, which takes far too much time because of its high computational cost, making the operation infeasible. The latter, the Elliptic Curve, is a plane algebraic curve defined by an equation over a specific field. In this type of cryptography a point on the curve is called a public key, and the integer used to obtain that point is called a private key. This chapter explains all the most important properties of this curve.

The second chapter analyzes in detail the principal techniques used to generate private and public key pairs. In particular, we will see four types of derivation. The first, naive method consists of randomly extracting a number and using it as a private key, from which the corresponding public key is derived, each time a new pair is requested. The other three methods are called deterministic, because a single datum, called the seed, is enough to generate a whole set of keys. These three methods are presented in increasing order of difficulty and complexity, together with their principal advantages and disadvantages. The last type of derivation is the most used because it derives the keys in a hierarchical way; it is examined in the next chapter.

The third chapter focuses on the Hierarchical Deterministic Wallet, the most sophisticated type of derivation used to date. It is defined by BIP32 [1] and is used by most Bitcoin wallets. This derivation is deterministic, since a seed is needed, and hierarchical: from the seed it is possible to derive an arbitrarily large number of keys, each of these keys can derive new keys in the same way, and so on. This procedure can be iterated as long as desired, leaving the user a wide choice in the derivation of these numbers.

The fourth chapter analyzes two possible ways to store the seed: one was proposed by BIP39 [2] and is the most used in the Bitcoin framework; the other is the one used by Electrum [3], one of the principal Bitcoin wallets. Both of them use a mnemonic phrase, a sentence composed of a certain number of words from which it is possible to derive the seed. Nevertheless, they have some differences, which we will analyze. The principal difference lies in the way the correctness of the mnemonic phrase is verified. With BIP39 it is only possible to check whether the phrase is plausible, whereas with Electrum it is possible to assign a version to the seed generated from the mnemonic phrase, giving a purpose to the keys derived from it.

The fifth chapter focuses on some possible applications of the Hierarchical Deterministic Wallet proposed by BIP32. In particular, we will see the standard way to write a derivation path, in order to easily understand how to generate particular keys from the seed. We will also analyze one of the standards used by most Bitcoin wallets: BIP43 [4]. The purpose of this BIP is to give a particular meaning to some branches of the tree. We will therefore describe two important applications: the multi-coin wallet of BIP44 [5] and the SegWit addresses of BIP49 [6].

Appendix A summarizes the methods used to represent private and public keys in the Bitcoin framework, together with the respective addresses.

Along with this thesis, we provide the link to the GitHub repository of Python code for the course of professor F. Ametrano. In this repository I have replicated in Python all the procedures and methods presented in this thesis, omitting the parts that are not relevant to it and writing the important ones in a concise and essential way. The most relevant parts of those scripts are reported in Appendix B.

https://github.com/fametrano/BitcoinBlockchainTechnology

Chapter 1

Cryptography

In order to have a clear understanding of this thesis, it is necessary to know the basic concepts of:

• HASH function.
• Elliptic Curve.

Together, these two elements describe most of the cryptography behind Bitcoin.

1.1 HASH function


In general, a hash function is a mathematical process that takes input data of any
size, performs an operation on it, and returns output data of a fixed size.

The input data is called message and the output data is called hash value.

A good and secure hash function must have at least these six properties:

(i) It is deterministic: if the message remains unchanged, the hash value is the same.

(ii) It is quick: computing the hash value of a message should not take much time.

(iii) It is a one-way function: knowing the hash value, it is infeasible to find a message that produces it. The only way to find such a message should be to try all possible inputs (brute force).

(iv) It is collision resistant: it is infeasible to find two different messages with the same hash value, even though such pairs must exist, since there are more possible messages than hash values.

(v) It has the avalanche effect: a very small change in the input message, even flipping a single bit, produces a completely different hash value.

(vi) It has a fixed-size output, while input messages can be of any size.
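As a quick illustration of properties (i), (v) and (vi), the following Python sketch uses the standard `hashlib` module (not part of the original thesis code) to hash a message and a copy of it with a single flipped bit. The message text and the helper names are arbitrary choices for the example.

```python
import hashlib

def sha256_hex(message: bytes) -> str:
    """Return the SHA256 hash value of a message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()

def differing_bits(hex1: str, hex2: str) -> int:
    """Count how many output bits differ between two hexadecimal hash values."""
    return bin(int(hex1, 16) ^ int(hex2, 16)).count("1")

msg = b"Chancellor on brink of second bailout for banks"
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]   # same message with one bit flipped

h1 = sha256_hex(msg)
h2 = sha256_hex(msg)
h3 = sha256_hex(flipped)

print(h1 == h2)                                        # (i) deterministic
print(len(h1) * 4, len(sha256_hex(b"x" * 10**6)) * 4)  # (vi) always 256 bits
print(differing_bits(h1, h3))                          # (v) roughly half of the bits
```

The last line should print a value in the vicinity of 128, i.e. about half of the 256 output bits change after a one-bit change in the input.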

In this thesis, we will treat hash functions as black boxes with all the properties described above.

There are various hash functions, but for our purpose the main difference among them lies in the number of bits of the hash value. Among all the possible hash functions, Bitcoin cryptography uses three:

• SHA256: Secure Hash Algorithm 256

– designed by the U.S. National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS).
– output size: 256 bits.

• SHA512: Secure Hash Algorithm 512

– designed by the NSA and published by NIST as a U.S. Federal Information Processing Standard (FIPS).
– output size: 512 bits.

• RIPEMD160: RACE Integrity Primitives Evaluation Message Digest 160

– developed by Hans Dobbertin, Antoon Bosselaers and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven.
– output size: 160 bits.

One important hash function used in Bitcoin cryptography is the so-called HASH160. It is simply the composition of SHA256 and RIPEMD160:

$$\mathrm{HASH160}(msg) = \mathrm{RIPEMD160}(\mathrm{SHA256}(msg)).$$

Since the last operation performed to compute HASH160 is RIPEMD160, the output size is 160 bits.

1.1.1 Other functions


There are two other functions that will be heavily used in this thesis:

• HMAC: Hash-based Message Authentication Code,

• PBKDF2: Password-Based Key Derivation Function 2.

HMAC is a function that combines a secret key with a message through two nested applications of a hash function. This construction provides immunity against length extension attacks, in which an attacker who knows the hash H(k || m) of a secret-prefixed message and its length can compute the hash of an extended message k || m || pad || m′ without knowing k.

It receives three inputs:

Hash function: a hash function with the properties described above.
Key: a sequence of bytes.
Message: a sequence of bytes.

This function computes the following operations:

$$\mathrm{HMAC}(H, k, m) = H\big(\mathrm{opad}(k) \,\|\, H(\mathrm{ipad}(k) \,\|\, m)\big),$$

where H is the hash function, k is the key, m is the message, opad(•) and ipad(•) are two padding functions applied to the key k (the key, zero-padded to the block size of the hash, is XORed with the repeated bytes 0x5c and 0x36 respectively), and || denotes concatenation.
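The formula above can be checked with a short Python sketch. It follows the standard HMAC construction of RFC 2104: keys longer than the hash block size are hashed first, then zero-padded, and ipad/opad are XOR masks with 0x36 and 0x5c. The function name `hmac_manual` is illustrative; its result is compared with Python's built-in `hmac` module.

```python
import hashlib
import hmac

def hmac_manual(hash_name: str, key: bytes, message: bytes) -> bytes:
    """HMAC(H, k, m) = H(opad(k) || H(ipad(k) || m)), following RFC 2104."""
    block_size = hashlib.new(hash_name).block_size   # 64 bytes for SHA256, 128 for SHA512
    if len(key) > block_size:                         # long keys are hashed first
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\x00")              # then zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)               # ipad(k)
    opad = bytes(b ^ 0x5C for b in key)               # opad(k)
    inner = hashlib.new(hash_name, ipad + message).digest()
    return hashlib.new(hash_name, opad + inner).digest()

key, message = b"example key", b"example message"
print(hmac_manual("sha512", key, message) == hmac.new(key, message, hashlib.sha512).digest())
```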

PBKDF2 is an algorithm that applies a pseudorandom function, typically HMAC built on a hash function, to an input (message) many times. A particular string of bytes, called salt, enters the first iteration, and each subsequent iteration processes the output of the previous one. This algorithm requires much more computational work than a single hash function, and so it reduces the risk of a brute force attack.

It receives five inputs:

Message: a sequence of bytes.
Salt: a sequence of bytes.
Number of iterations: the number of iterations to be computed.
Digest-module: a hash function with the properties described above.
Mac-module: a message authentication code module (e.g. HMAC).

Having a salt reduces the ability to use rainbow tables, namely tables of precomputed hash values, for attacks. It is recommended to use at least 64 bits for the salt.
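A minimal sketch of how PBKDF2 uses HMAC as its pseudorandom function, following RFC 8018: the salt (followed by a 4-byte block index) enters only the first HMAC call, each later call hashes the previous output, and all intermediate results are XORed together. Only the first output block is computed, which is the whole output when the derived key has the digest length. The password, the salt, and the choice of 2048 iterations with HMAC-SHA512 are illustrative (they are the PBKDF2 settings BIP39 uses, discussed in Chapter 4); the result is checked against `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac

def pbkdf2_first_block(hash_name: str, password: bytes, salt: bytes, iterations: int) -> bytes:
    """First output block T_1 = U_1 xor U_2 xor ... xor U_c of PBKDF2 with HMAC."""
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hash_name).digest()  # U_1 uses the salt
    block = u
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hash_name).digest()       # U_j = HMAC(password, U_{j-1})
        block = bytes(x ^ y for x, y in zip(block, u))      # running XOR of all U_j
    return block

password, salt = b"my password", b"salt-1234"
derived = pbkdf2_first_block("sha512", password, salt, 2048)
print(derived == hashlib.pbkdf2_hmac("sha512", password, salt, 2048))
```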

1.2 Elliptic Curve over $\mathbb{F}_p$

An elliptic curve [7; 8; 9; 10] is a plane algebraic curve defined by an equation over a specific field. In cryptography the field is often finite.

A point Q, whose coordinates x and y are integers in {0, ..., p − 1}, belongs to an Elliptic Curve if and only if it satisfies the following equation:

$$y^2 = x^3 + ax + b \quad \text{over } \mathbb{F}_p, \tag{1.1}$$

where $\mathbb{F}_p$ is the finite field of the integers modulo a prime p, and a and b are the coefficients of the curve.

We can rewrite Equation (1.1) in the following way:

$$y^2 \equiv x^3 + ax + b \pmod p. \tag{1.2}$$

Figure 1.1 shows some examples of Elliptic Curves over $\mathbb{F}_p$ with a = −7 and b = 10.

1.2.1 Symmetry
The elliptic curve has an important property: the line y = p/2 is an axis of symmetry
for the curve.

This can be shown by proving that the point P(x, y) belongs to the Elliptic Curve (EC) if and only if the point Q(x, p − y) belongs to the curve too:

$$P(x, y) \in EC \iff Q(x, p - y) \in EC.$$

Figure 1.1: Points on the Elliptic Curve $y^2 = x^3 - 7x + 10 \bmod p$, with p = 19, 97, 127, 487.

Proof:

First, consider the implication from left to right (⇒).

From Equation (1.2) and from the hypothesis we have:

$$P(x, y) \in EC \implies y^2 \equiv x^3 + ax + b \pmod p,$$

but we also know that:

$$Q(x, p - y) \in EC \iff (p - y)^2 \equiv x^3 + ax + b \pmod p.$$

Since the right-hand sides of both equations are equal, we only need to prove that:

$$(p - y)^2 \equiv y^2 \pmod p.$$

This is true; indeed:

$$\begin{aligned}
(p - y)^2 &\equiv p^2 - 2py + y^2 \pmod p \\
&\equiv p \cdot (p - 2y) + y^2 \pmod p \\
&\equiv 0 + y^2 \pmod p \\
&\equiv y^2 \pmod p.
\end{aligned}$$

This is due to the fact that

$$p \cdot k \equiv 0 \pmod p \quad \forall k \in \mathbb{Z}.$$

The converse implication (⇐) follows the same logic: applying the argument above to Q and using p − (p − y) = y shows that P belongs to the curve.

Q.E.D.
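The symmetry can also be checked numerically. The Python sketch below enumerates by brute force the points of the curve of Figure 1.1 for the same primes and verifies that every point (x, y) has its mirror image (x, p − y) on the curve. Brute force is only practical for tiny p, and the helper names are illustrative.

```python
def curve_points(a, b, p):
    """All affine points (x, y) with y^2 = x^3 + ax + b mod p, found by brute force."""
    return {(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a * x + b)) % p == 0}

def is_symmetric(points, p):
    """True if (x, p - y) is on the curve whenever (x, y) is; y = 0 is its own mirror."""
    return all((x, (p - y) % p) in points for x, y in points)

for p in (19, 97, 127, 487):
    points = curve_points(-7, 10, p)
    print(p, len(points), is_symmetric(points, p))
```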

Having shown the symmetry property, it is useful to denote the point P(x, y) as the opposite of Q(x, p − y):

$$P = -Q \iff P + Q = 0,$$

where + is a binary operation between two points of the EC, explained below. The 0 in this context is the point at infinity.

1.2.2 Point addition

Having defined points on the elliptic curve, let us introduce the addition of two points over this finite field.

We need to adapt the usual definition of addition so that it works in $\mathbb{F}_p$. In this framework, we require that points aligned over the finite field $\mathbb{F}_p$ sum to zero.

So P + Q = R if and only if P, Q and −R are aligned, in the sense shown in Figure 1.2.

Figure 1.2: Elliptic Curve $y^2 = x^3 - 7x + 10 \bmod 97$

Having defined when points of the EC sum to zero, it is possible to derive the equations for point addition.

Suppose that A and B belong to the Elliptic Curve:

$$A = (x_1, y_1), \qquad B = (x_2, y_2).$$

Let us define $A + B := (x_3, y_3)$.

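When $x_1 = x_2$ but $y_1 \neq y_2$, A and B are symmetric points, so their sum is a particular point, called the point at infinity:

$$A + B = 0 = (\mathit{inf}, \mathit{inf}).$$

The same happens when A = B with $y_1 = 0$, because the tangent at that point is vertical.

In all the other cases we first compute

$$s = \begin{cases}
\dfrac{y_2 - y_1}{x_2 - x_1} \bmod p, & \text{if } x_1 \neq x_2 \quad \text{(point addition)}, \\[2ex]
\dfrac{3x_1^2 + a}{2y_1} \bmod p, & \text{if } x_1 = x_2,\ y_1 = y_2 \quad \text{(point doubling)},
\end{cases}$$

where s is an auxiliary variable (the slope of the line through A and B, or of the tangent at A) used to compute $x_3$ and $y_3$, and division means multiplication by the modular inverse modulo p. It is computed in two different ways, depending on whether we are performing a "real" point addition, when A ≠ B, or looking for the double of a point, when A = B.

Once we have s, the values $x_3$ and $y_3$ are obtained with these simple formulas:

$$x_3 = s^2 - x_1 - x_2 \bmod p,$$
$$y_3 = s(x_1 - x_3) - y_1 \bmod p.$$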
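The addition formulas translate directly into code. The following Python sketch works on the toy curve $y^2 = x^3 - 7x + 10 \bmod 97$ of Figure 1.2, represents the point at infinity as `None`, and computes modular inverses with `pow(x, -1, p)` (Python 3.8 or later). The function and constant names are illustrative.

```python
P_MOD, A = 97, -7      # toy curve y^2 = x^3 - 7x + 10 mod 97 (Figure 1.2)
INF = None             # the point at infinity, i.e. the neutral element "0"

def point_add(P, Q, a=A, p=P_MOD):
    """Add two points of y^2 = x^3 + ax + b over F_p with the chord-and-tangent rules."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:      # symmetric points, or vertical tangent (y = 0)
        return INF
    if x1 != x2:                              # point addition
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p
    else:                                     # point doubling
        s = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    y3 = (s * (x1 - x3) - y1) % p
    return (x3, y3)

print(point_add((1, 2), (3, 4)))    # (94, 2)
print(point_add((1, 2), (1, 2)))    # (96, 93): doubling
print(point_add((1, 2), (1, 95)))   # None: symmetric points
```

As a check of the alignment rule, (1, 2), (3, 4) and −R = (94, 95) all lie on the line y = x + 1 modulo 97.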

1.2.3 Scalar multiplication

Once addition is defined, the multiplication between a scalar n and a point P on the elliptic curve can be defined as:

$$n \cdot P = \underbrace{P + P + \cdots + P}_{n \text{ times}}.$$

When n is very large, it can be difficult or even infeasible to compute nP in this way, but the double-and-add algorithm performs the multiplication in O(log n) steps. Let us see an example.

Suppose that we need to compute 53 · P, where P is a point on the EC:

$$53 \cdot P = 110101_2 \cdot P = 2^5 \cdot P + 2^4 \cdot P + 2^2 \cdot P + P.$$

Computing the common sub-terms only once, we obtain a total of 5 doublings and 3 additions, far fewer than the 52 additions of the naive approach. The gain is even larger when the scalar is a very large number.
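A sketch of the double-and-add algorithm in Python, on the same illustrative toy curve modulo 97 (the point-addition helper is repeated so that the block runs on its own). It scans the binary digits of n starting from the least significant one, doubling a running multiple of P at each step and adding it to the result when the digit is 1; the result is compared with naive repeated addition.

```python
P_MOD, A = 97, -7      # illustrative toy curve y^2 = x^3 - 7x + 10 mod 97
INF = None             # point at infinity

def point_add(P, Q):
    """Chord-and-tangent addition on the toy curve (Section 1.2.2)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if x1 != x2:
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    else:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def scalar_mult(n, P):
    """Right-to-left double-and-add: O(log n) group operations."""
    result, addend = INF, P
    while n > 0:
        if n & 1:                               # binary digit 1: add 2^i * P
            result = point_add(result, addend)
        addend = point_add(addend, addend)      # 2^i * P  ->  2^(i+1) * P
        n >>= 1
    return result

def naive_mult(n, P):
    """n - 1 additions P + P + ... + P, for comparison only."""
    result = INF
    for _ in range(n):
        result = point_add(result, P)
    return result

print(scalar_mult(53, (1, 2)) == naive_mult(53, (1, 2)))
```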

1.2.4 Discrete Logarithm Problem

Having described the multiplication between a scalar and a point, let us see whether it is possible to perform the inverse operation. Suppose that:

$$Q = n \cdot P,$$

where Q and P are points on the EC and n is a scalar.

Suppose we know Q and P. With this information there exists only one n ∈ ℕ with n < order (where order is defined below) such that the equation above holds. Even so, this number n is infeasible to find for large values of order.

This is because no efficient algorithm is known that computes n given P and Q. Essentially the only way to find n is by trying: the best generic methods, such as Pollard's rho, still need on the order of √order steps. As already mentioned, this becomes infeasible when the number of values that n can assume (order) is too large.
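To make the idea concrete, the following Python sketch solves a discrete logarithm on the illustrative toy curve modulo 97 by exhaustive search, walking through P, 2P, 3P, ... until Q appears. This works only because the group is tiny; on Bitcoin's curve the same loop would never terminate in practice. The helper names are illustrative.

```python
P_MOD, A = 97, -7      # illustrative toy curve y^2 = x^3 - 7x + 10 mod 97
INF = None             # point at infinity

def point_add(P, Q):
    """Chord-and-tangent addition on the toy curve (Section 1.2.2)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if x1 != x2:
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    else:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def multiple(n, P):
    """n * P by repeated addition (fine for small n)."""
    R = INF
    for _ in range(n):
        R = point_add(R, P)
    return R

def discrete_log(Q, P):
    """Smallest n >= 0 with n * P == Q by exhaustive search, or None if there is none."""
    R, n = INF, 0
    while True:
        if R == Q:
            return n
        R = point_add(R, P)
        n += 1
        if R is INF:        # back to 0: the whole subgroup generated by P was visited
            return None

Q = multiple(13, (1, 2))
print(discrete_log(Q, (1, 2)))
```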

1.2.5 Group order

The points of an elliptic curve defined over a finite field, together with the point at infinity, form a group with a finite number of elements. This number is called the order of the group.

If p is a very large number, it is not trivial to count all the points on the curve, but there are algorithms that compute the order of the group quickly and efficiently, such as Schoof's algorithm.

Let us consider a generic point G on the curve. We have:

$$n \cdot G + m \cdot G = \underbrace{G + \cdots + G}_{n \text{ times}} + \underbrace{G + \cdots + G}_{m \text{ times}} = \underbrace{G + \cdots + G}_{n+m \text{ times}} = (n + m) \cdot G.$$

So the multiples of G are closed under addition, and since the group is finite this is enough to prove that the set of the multiples of G is a cyclic subgroup of the group formed by the elliptic curve.

The point G is called the generator of the cyclic subgroup.

Remark The order of the subgroup generated by G is linked to the order of the elliptic curve by Lagrange's theorem, which states that the order of a subgroup is a divisor of the order of the parent group.

Remark If the order of the group is a prime number, every point of the EC other than the point at infinity generates a subgroup with the same order as the group; the point at infinity generates the trivial subgroup of order 1.
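For a toy curve the order can simply be counted. The following Python sketch counts the points of $y^2 = x^3 - 7x + 10 \bmod 97$ by brute force (adding the point at infinity), computes the order of a few points by repeated addition, and checks Lagrange's theorem. This brute-force approach is only illustrative and is hopeless for a cryptographic p, where algorithms such as Schoof's are needed.

```python
P_MOD, A, B = 97, -7, 10   # illustrative toy curve y^2 = x^3 - 7x + 10 mod 97
INF = None                 # point at infinity

def point_add(P, Q):
    """Chord-and-tangent addition on the toy curve (Section 1.2.2)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if x1 != x2:
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    else:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def count_points():
    """Order of the group: affine solutions plus the point at infinity."""
    return 1 + sum(1 for x in range(P_MOD) for y in range(P_MOD)
                   if (y * y - (x ** 3 + A * x + B)) % P_MOD == 0)

def point_order(P):
    """Smallest k >= 1 with k * P = 0 (order of the subgroup generated by P)."""
    R, k = P, 1
    while R is not INF:
        R = point_add(R, P)
        k += 1
    return k

N = count_points()
for G in [(1, 2), (3, 4), (2, 2)]:
    print(G, point_order(G), N % point_order(G) == 0)   # Lagrange: order divides N
```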

All this preliminary information is needed to introduce the private-public key cryptography used by Bitcoin.

1.2.6 Bitcoin private-public key cryptography


Bitcoin [11] uses a specific Elliptic Curve, known as secp256k1, defined over the finite field of the integers modulo a prime p, with a = 0 and b = 7.

Equation (1.1) becomes:

$$y^2 = x^3 + 7 \bmod p, \tag{1.3}$$

where mod p (modulo a prime number) indicates that this curve is over a finite field of prime order $p = 2^{256} - 2^{32} - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1$.

The order of this Elliptic Curve is a very large prime number, close to $2^{256}$ but smaller than p.

Let us consider a particular point G, called the generator, whose coordinates in hexadecimal digits are:

x = 79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
y = 483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

Since the order of the group is a prime number, the order of any subgroup other than the trivial one is equal to the order of the entire group. In particular, the order of the subgroup generated by G is equal to order.

We have now all the elements necessary to define the private and the public key.

Definition A private key is a number chosen in the range between 1 and (order − 1).

Definition A public key W is a point on the Bitcoin EC, derived from a private key k in the following way:

$$W = k \cdot G, \tag{1.4}$$

where the multiplication between k and G is the scalar multiplication defined in Section 1.2.3.

This is a one-way function: knowing the private key, it is simple to compute the scalar multiplication, but it is infeasible to do the opposite.

Remark It is infeasible to calculate a private key knowing the public key.
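The definitions above can be exercised on the real Bitcoin curve with a short, unoptimized Python sketch. It hard-codes the secp256k1 parameters quoted in this section (the group order value is taken from the SEC 2 standard, since the text does not list it), checks that G lies on the curve and that order · G is the point at infinity, and derives a public key W = k · G. It is not constant-time and is meant for illustration only, never for real keys; the function names are illustrative.

```python
P_FIELD = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1          # field prime p
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order (SEC 2)
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
INF = None   # point at infinity

def on_curve(P):
    """Check y^2 = x^3 + 7 mod p (Equation 1.3)."""
    if P is INF:
        return True
    x, y = P
    return (y * y - x ** 3 - 7) % P_FIELD == 0

def point_add(P, Q):
    """Chord-and-tangent addition on secp256k1 (a = 0)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INF
    if x1 != x2:
        s = (y2 - y1) * pow(x2 - x1, -1, P_FIELD) % P_FIELD
    else:
        s = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    x3 = (s * s - x1 - x2) % P_FIELD
    return (x3, (s * (x1 - x3) - y1) % P_FIELD)

def scalar_mult(k, P):
    """Double-and-add computation of k * P."""
    result, addend = INF, P
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def public_key(k):
    """W = k * G for a private key k in [1, order - 1] (Equation 1.4)."""
    if not 1 <= k < ORDER:
        raise ValueError("private key out of range")
    return scalar_mult(k, G)

print(on_curve(G), scalar_mult(ORDER, G) is INF)
W = public_key(0x1234567890ABCDEF)   # illustrative private key only
print(on_curve(W))
```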

The purpose of defining private and public keys is to use them to cryptographically sign a message. It is beyond the scope of this thesis to explain how a message is signed, but it is at least necessary to know the principal properties of a signed message.

Suppose we have a message that needs to be signed; in Bitcoin this message is usually a transaction.

• A message is signed using a private key.

• Knowing the public key associated with the private key that signs the mes-
sage, it is possible to verify that the message is signed using the corresponding
private key (without knowing it).

For this reason the keys are called private and public: the former is supposed to be kept secret because it can sign messages, while the latter is supposed to be shared, in order to let everyone else know that whoever signs the message is in possession of the corresponding private key.

Chapter 2

Wallet

The way keys are stored is essential in private-public key cryptography. For this reason, different types of wallets have been designed.

Definition A wallet is a piece of software used to store keys. It can also sign messages with the private keys, but in this framework we will only consider wallets as key containers.

There are different types of wallet:

• Nondeterministic (random) Wallet.

• Deterministic Wallet.

Remark Bitcoin wallets contain keys, not coins. Coins are in the Blockchain.

2.1 Nondeterministic (random) Wallet


A nondeterministic wallet is the simplest type of wallet. Each key is generated randomly and independently.

(i) Consider a Discrete Uniform Random Variable

$$X \sim U(S),$$

where S is the finite set of natural numbers in the range from 1 to (order − 1).

(ii) Take some realizations $k_1, k_2, \ldots, k_n$ of X, using enough entropy to make these numbers (private keys) impossible to guess:

$$k_1 = X(\omega_1), \quad k_2 = X(\omega_2), \quad \ldots, \quad k_n = X(\omega_n).$$

(iii) Go back to point (i) every time new private keys are needed.

With this procedure it is impossible to compute a public key without having already computed the corresponding private key.
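A minimal Python sketch of steps (i) to (iii), assuming Python's `secrets` module (the operating system's cryptographically secure random generator) as the entropy source; the secp256k1 group order is hard-coded from the SEC 2 standard. Only private keys are generated here; the public keys would follow from W = k · G.

```python
import secrets

# secp256k1 group order (SEC 2), the "order" of Chapter 1
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def new_private_key() -> int:
    """Draw k uniformly from S = {1, ..., order - 1} using the OS CSPRNG."""
    return 1 + secrets.randbelow(ORDER - 1)

# every key is an independent draw, so every new key also needs a new backup
wallet = [new_private_key() for _ in range(5)]
for k in wallet:
    print(hex(k))
```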

Pros and Cons


Let’s focus on the good and bad aspects of this wallet.
Random Wallet

Pros:
• Easy to implement.

Cons:
• Difficult to find real new entropy for every new private key.
• Every time new private keys are needed, a new backup is needed.
• Difficult to store or back up in a non-digital way: it is awkward to write all your keys down on paper.

The use of random wallets is strongly discouraged for anything other than simple tests.

2.2 Deterministic Wallets


A deterministic wallet is a more sophisticated piece of software, in which every key is generated from a common "seed", a natural number. This means that anyone who knows the seed can also derive all the keys in the wallet.

There are different types of deterministic wallets; in this text we will analyze three main types:

• Deterministic Wallet type-1.

• Deterministic Wallet type-2.

• Deterministic Wallet type-3.

These wallets are in increasing order of complexity.

2.2.1 Deterministic Wallet type-1


The Deterministic Wallet type-1 is one of the simplest deterministic wallets. Each key is generated by concatenating a sequential number to the seed and then computing a hash function, SHA256.

Let us see how to generate the nth private key:

(i) Generate a seed (only once), a random number from a Discrete Uniform Random Variable

$$seed = X(\omega), \quad X \sim U(S),$$

where S is the finite set of natural numbers in the range from 1 to (order − 1).

(ii) Consider the numbers seed and n as strings and concatenate n to seed, obtaining
a value:
value = seed|n.

(iii) Apply the SHA256 function to value and obtain the nth private key.

(iv) Go back to point (ii) every time new private keys are needed with n = n + 1.

With this procedure it is impossible to compute the public key without having already
computed the private key.
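Steps (i) to (iv) can be sketched in Python as follows. The text does not specify how the concatenation seed|n is encoded, nor what happens if the SHA256 output falls outside [1, order − 1]; this sketch assumes decimal strings encoded as ASCII and simply rejects an out-of-range result, which happens with negligible probability. The function names are hypothetical.

```python
import hashlib
import secrets

# secp256k1 group order (SEC 2)
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def generate_seed() -> int:
    """Step (i): draw the seed once, uniformly from {1, ..., order - 1}."""
    return 1 + secrets.randbelow(ORDER - 1)

def type1_private_key(seed: int, n: int) -> int:
    """Steps (ii)-(iii): nth private key = SHA256(seed|n), seed and n as decimal strings."""
    value = (str(seed) + str(n)).encode("ascii")     # assumption: decimal string encoding
    key = int.from_bytes(hashlib.sha256(value).digest(), "big")
    if not 0 < key < ORDER:                          # assumption: reject out-of-range values
        raise ValueError("hash value is not a valid private key")
    return key

seed = generate_seed()
keys = [type1_private_key(seed, n) for n in range(3)]   # step (iv): n = 0, 1, 2, ...
for k in keys:
    print(hex(k))
```

The whole sequence can be regenerated from `seed` alone, which is why a single backup suffices.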

Pros and Cons


Let’s focus on the good and bad aspects of this wallet.

Deterministic Wallet type-1

Pros:
• To back up the entire wallet, only the seed needs to be stored: all private keys can be derived from it.
• A single backup is needed.
• The seed can easily be stored in a non-digital way, on paper for example.

Cons:
• Every time new public keys are needed, you need to use the seed, compute new private keys and then derive the public ones. This could compromise the entire wallet if the seed is used in an unsafe environment.
• There is only one key sequence: there is no way to distinguish the "purpose" of each private key.

This type of wallet is not recommended for everyday use, but it could be used to store bitcoin in a safe place: a cold wallet.

2.2.2 Deterministic Wallet type-2


The Deterministic Wallet type-2 is more sophisticated. Each private key is generated in such a way that it is possible to compute the respective public key without knowing the private key.

First, let us introduce the necessary ingredients:

• Master private key (mp): a random number, generated from a Discrete Uniform Random Variable

$$mp = X(\omega), \quad X \sim U(S),$$

where S is the finite set of natural numbers in the range from 1 to (order − 1). The master private key must be kept secret.

• Master public key (MP): a point on the EC, obtained from mp:

$$MP = mp \cdot G,$$

where G is the generator. This point can be considered non-secret.

• Public random number (r): a random number, generated from a Discrete Uniform Random Variable

$$r = X(\omega), \quad X \sim U(S).$$

This number can be considered non-secret.

Let us see how to generate the nth private key $p_n$.

(i) Apply the SHA256 function to the concatenation of n with r, both considered as strings:

$$h_{n|r} = \mathrm{SHA256}(n|r),$$

where $h_{n|r}$ can be considered non-secret, since it is derived from non-secret information.

(ii) Compute the nth private key by adding mp to $h_{n|r}$:

$$p_n = mp + h_{n|r} \bmod order.$$

To obtain the corresponding public key $P_n$, it is possible to compute the standard multiplication:

$$P_n = p_n \cdot G.$$

It is also possible to compute $P_n$ without knowing $p_n$, using only non-secret information: $h_{n|r}$ and MP.

(i) Compute V:

$$V = h_{n|r} \cdot G,$$

where V can be seen as the public key of $h_{n|r}$ and can be considered non-secret.

(ii) Add MP to V:

$$P_n = MP + V,$$

where the sum in this context is the one defined between two points of the EC.

It is easy to prove that both ways give the same $P_n$:

$$\begin{aligned}
P_n &= p_n \cdot G \\
&= (mp + h_{n|r}) \cdot G \\
&= (mp \cdot G) + (h_{n|r} \cdot G) \\
&= MP + V.
\end{aligned}$$
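The equality between the two ways of computing $P_n$ can be verified with a Python sketch on secp256k1. As in the type-1 example, encoding n|r as decimal strings is an assumption, and r is drawn from the same range as mp; the minimal curve arithmetic (not constant-time, illustration only) is repeated so that the block runs on its own. The names `child_private` and `child_public` are hypothetical.

```python
import hashlib
import secrets

P_FIELD = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
INF = None

def point_add(P, Q):
    """Chord-and-tangent addition on secp256k1 (a = 0)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INF
    if x1 != x2:
        s = (y2 - y1) * pow(x2 - x1, -1, P_FIELD) % P_FIELD
    else:
        s = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    x3 = (s * s - x1 - x2) % P_FIELD
    return (x3, (s * (x1 - x3) - y1) % P_FIELD)

def scalar_mult(k, P):
    """Double-and-add computation of k * P."""
    result, addend = INF, P
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def h(n, r):
    """h_{n|r} = SHA256(n|r), with n and r concatenated as decimal strings (assumption)."""
    return int.from_bytes(hashlib.sha256((str(n) + str(r)).encode("ascii")).digest(), "big")

def child_private(mp, r, n):
    """p_n = mp + h_{n|r} mod order: requires the secret mp."""
    return (mp + h(n, r)) % ORDER

def child_public(MP, r, n):
    """P_n = MP + h_{n|r} * G: uses only the non-secret MP and r."""
    return point_add(MP, scalar_mult(h(n, r), G))

mp = 1 + secrets.randbelow(ORDER - 1)   # master private key (secret)
MP = scalar_mult(mp, G)                 # master public key (non-secret)
r = 1 + secrets.randbelow(ORDER - 1)    # public random number (non-secret)

for n in range(3):
    print(n, scalar_mult(child_private(mp, r, n), G) == child_public(MP, r, n))
```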

Pros and Cons


Let’s focus on the good and bad aspects of this wallet.
Deterministic Wallet type-2

Pros:
• To back up the entire wallet, only the master private key and the random number r need to be stored: all private keys can be derived from them.
• A single backup is needed.
• The master private key can easily be stored in a non-digital way, on paper for example.
• It is possible to derive a new public key using only non-secret information, with the procedure above.

Cons:
• There is only one key sequence: there is no way to distinguish the "purpose" of each private key.

The type-2 deterministic wallet is an improvement over type-1 because it has the same benefits (except for the need to back up two numbers instead of one), with a great advantage: it is possible to generate new addresses, obtained from the public keys, even in an unsafe environment, having only r and MP.

A thief who steals only MP and r can violate only your privacy: he can see which messages (transactions) you have signed, but he cannot sign new ones and spend your coins.

2.2.3 Deterministic Wallet type-3


Deterministic Wallet type-3 is the most elaborate among the ones considered. Start-
ing from a seed it is possible to obtain different keys in a hierarchical way, with a
structure similar to a tree.

Let’s see roughly how this wallet works:

(i) Generate a seed, a random number from a Discrete Uniform Random Variable,
unique for each wallet.

seed = X (ω ), X ∼ U ( S ),

where S is a finite set of natural numbers.

(ii) Generate a master private key from the seed, using the HMAC-SHA512 function.

(iii) From this master private key it is possible to generate $2^{32}$ private keys using an irreversible hash function: HMAC-SHA512.

(iv) Each of these "children" private keys can in turn derive $2^{32}$ private keys, and each of these "grandchildren" can derive as many.

This procedure can produce a huge number of keys. To an outside observer they look independent: it is infeasible to tell that two keys are derived from the same seed.

This particular type of wallet is commonly known as Hierarchical Deterministic Wallet [1], and it is one of the most widely used.

In the next chapter, we will see in detail how it works.



Chapter 3

Hierarchical Deterministic Wallet

The Hierarchical Deterministic Wallet is defined by BIP32, Bitcoin Improvement Proposal number 32 [1], and in this chapter we will see in detail how it works.

3.1 Elements
First, let us focus on the main elements of the Wallet:

• Seed.
• Extended keys.

3.1.1 Seed
The entire Wallet is based on a seed.

It is a number taken from a Discrete Uniform Random Variable

seed = X (ω ), X ∼ U ( S ),

where S is the finite set of natural numbers in the range from 1 to an arbitrary value.
Obviously the greater the set from which the number can be extracted, the better it
is for the security of the seed itself.

This is an example of a seed expressed in hexadecimal format:

seed = fffcf9f6f3f0edeae7e4e1dedbd8d5d2cfccc9c6c3c0bdbab7b4b1aeaba8a5a29f9c999693908d8a8784817e7b7875726f6c696663605d5a5754514e4b484542

3.1.2 Extended Key


An Extended Key is a sequence of bytes, encoded in Base58Check (base 58 with a 4-byte checksum appended). It contains all the information necessary to derive the keys. When it is derived directly from the seed, the extended key is called master key.

Once it is decoded we will obtain exactly 78 bytes, with a specific meaning and
order:

• 4 bytes specify the version.

• 1 byte specifies the depth in the hierarchical tree: the (master) extended key derived directly from the seed has depth = 0, its children have depth = 1, its grandchildren have depth = 2 and so on.

• 4 bytes are used for the fingerprint, a value that identifies the parent. It is computed by applying the HASH160 function to the parent public key in compressed form and taking the first 4 bytes:

fingerprint = HASH160(parent public key)[0:4],

where [0:4] is Python slice notation.

For the master key the fingerprint is formed by 4 zero bytes: fingerprint = 0x00000000.

• 4 bytes specify the index of the child.

For the master key the index is formed by 4 zero bytes: index = 0x00000000.

• 32 bytes are used for the chain code. The chain code is used to introduce entropy in the generation of the children; we will see below how it works.

• 33 bytes are used for the key, which can be private or public.

The public key is expressed in compressed form, so its first byte is always 02 or 03. The private key is preceded by a 00 byte, in order to distinguish it from the public one.

An extended key is called Extended Private Key if the last 33 bytes specify the private key; it is called Extended Public Key if they specify the public key.

For the Bitcoin mainnet the following version bytes are used:

• 0x0488ADE4 for an extended private key,

• 0x0488B21E for an extended public key.

When these bytes are encoded in base 58, the resulting strings begin with xprv and xpub respectively.
Remark Obviously it is possible to calculate the extended public key starting from the extended private key, but it is infeasible to do the opposite. The only differences between the two extended keys are the key bytes and the version bytes; all the other elements remain the same.
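The layout above can be made concrete with a small parser. The block below is a self-contained sketch, assuming the standard Base58Check encoding (4-byte double-SHA256 checksum); the function names are hypothetical, and the usage example builds a synthetic 78-byte payload rather than a real wallet key.

```python
import hashlib

ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def b58check_encode(payload: bytes) -> str:
    data = payload + double_sha256(payload)[:4]      # append 4-byte checksum
    num = int.from_bytes(data, 'big')
    chars = ''
    while num:
        num, rem = divmod(num, 58)
        chars = ALPHABET[rem] + chars
    leading_zeros = len(data) - len(data.lstrip(b'\x00'))
    return '1' * leading_zeros + chars               # each zero byte becomes '1'


def b58check_decode(text: str) -> bytes:
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)
    leading_ones = len(text) - len(text.lstrip('1'))
    data = b'\x00' * leading_ones + num.to_bytes((num.bit_length() + 7) // 8, 'big')
    payload, checksum = data[:-4], data[-4:]
    if double_sha256(payload)[:4] != checksum:
        raise ValueError('invalid Base58Check checksum')
    return payload


def parse_extended_key(xkey: str) -> dict:
    """Split a decoded extended key into its six fields."""
    raw = b58check_decode(xkey)
    if len(raw) != 78:
        raise ValueError('a decoded extended key must be 78 bytes long')
    return {
        'version': raw[0:4].hex(),
        'depth': raw[4],
        'fingerprint': raw[5:9].hex(),
        'index': int.from_bytes(raw[9:13], 'big'),
        'chain_code': raw[13:45].hex(),
        'key': raw[45:78].hex(),
    }


# Synthetic master-level payload: version, depth, fingerprint, index, chain code, 00 | key
payload = (bytes.fromhex('0488ADE4') + b'\x00' + bytes(4) + bytes(4)
           + bytes(range(32)) + b'\x00' + bytes(range(1, 33)))
xkey = b58check_encode(payload)
print(xkey[:4])                              # xprv
print(parse_extended_key(xkey)['depth'])     # 0
```

Since the version bytes 0488ADE4 were chosen precisely so that a 78-byte payload encodes to a string starting with xprv, the first printed value does not depend on the synthetic chain code and key.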

3.2 From SEED to Master Private Key


In this section we will see in detail how it is possible to obtain a master private key
starting from a seed.

First of all we need to convert the seed into a string of bytes, where the most sig-
nificant bytes come first (big endian). In order to do so, we need to know the length
of the string of bytes.

Let’s see a practical example:

byte_string1 = 00 00 00 07,
byte_string2 = 00 00 07,
byte_string3 = 00 07,
byte_string4 = 07.

These four byte strings are obtained from the same seed, seed = 7; the only difference is the length of the string.

Remark A different length of the string produces a different master private key, even if the seed is the same number.

In Python:

    byte_string = seed.to_bytes(seed_bytes, 'big')

where seed is an integer number and seed_bytes is the number of bytes that byte_string should have.

It is essential to specify the length of the byte string; otherwise, different wallets may be obtained from the same seed.
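The remark can be verified directly. The sketch below (the helper name master_key_bytes is hypothetical) serializes the same seed, 7, on 1, 2, 3 and 4 bytes and computes the first 32 bytes of the HMAC-SHA512 output with key b"Bitcoin seed", as described in the rest of this section; each length leads to a different master private key.

```python
import hmac
from hashlib import sha512


def master_key_bytes(seed: int, seed_bytes: int) -> bytes:
    """First 32 bytes of HMAC-SHA512(key=b"Bitcoin seed", msg=big-endian seed)."""
    byte_string = seed.to_bytes(seed_bytes, 'big')
    return hmac.new(b"Bitcoin seed", byte_string, sha512).digest()[:32]


for length in (1, 2, 3, 4):
    # same integer seed, different serialization length
    print(length, master_key_bytes(7, length).hex()[:16])
```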

Once we obtain a string of bytes, we compute the HMAC algorithm. The hash function used for HMAC is SHA512 and the key is a particular string of bytes: b"Bitcoin seed". In Python the implementation is the following:

    from hashlib import sha512
    from hmac import HMAC

    hashValue = HMAC(b"Bitcoin seed", byte_string, sha512).digest()

where .digest() is used in order to return a string of bytes.

Now we have obtained a hashValue of 512 bits, i.e. 64 bytes. The first 32 bytes are the master private key and the next 32 bytes are the master chain code. A Python implementation is the following:

    private_key_bytes = hashValue[0:32]
    chain_code_bytes = hashValue[32:64]

Now we have two byte strings, one for the master private key and the other for the master chain code.

It is important to remember that a private key must be in the range between 1 and order − 1, so the byte string is converted into an integer. BIP32 specifies that, if this integer is 0 or not smaller than order, the master key is invalid (an event of negligible probability). In Python we have:

    private_key = int(private_key_bytes.hex(), 16)  # BIP32: valid only if 0 < private_key < order

Finally, we concatenate all the information obtained in order to form a Master Extended Private Key (in bytes format):

• vbytes = b'\x04\x88\xAD\xE4',

• depth = b'\x00',

• fingerprint = b'\x00\x00\x00\x00',

• index = b'\x00\x00\x00\x00',

• chain_code is the one previously computed,

• key = b'\x00' + the private key in bytes format, previously computed.



Then the Master Extended Private Key is formed by concatenation:

    xkey = vbytes + depth + fingerprint + index + chain_code + key

In order to make it readable, a Base58Check encoding (base 58 with a 4-byte checksum) is performed.

This is an example of Master Extended Private Key:

xprv9s21ZrQH143K3wEaiSJZ8jYCuZF1oJoXHiwFcx2WwXqQHD4ZLdyEAFZ22M4BmQT82HRbWssLArj53YDQTj6vSN4iH6nTiSQ61C5CckxUtDq

Remark HMAC-SHA512 is a one-way function, so it is infeasible to obtain the seed knowing the master key. (It would also be useless, because with the master key you can derive all the keys in the wallet.)
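Putting the steps of this section together, a self-contained sketch of the seed-to-master-key computation could look as follows. It assumes the standard Base58Check encoding and follows BIP32 in rejecting, rather than reducing, an out-of-range key; the function names are hypothetical, and the seed is the example of Section 3.1.1 passed as 64 raw bytes.

```python
import hashlib
import hmac

ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def b58check_encode(payload: bytes) -> str:
    """Base58Check: append the first 4 bytes of double SHA256, then encode in base 58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    num = int.from_bytes(data, 'big')
    chars = ''
    while num:
        num, rem = divmod(num, 58)
        chars = ALPHABET[rem] + chars
    return '1' * (len(data) - len(data.lstrip(b'\x00'))) + chars


def seed_to_master_xprv(seed: bytes) -> str:
    hashValue = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    private_key_bytes, chain_code = hashValue[:32], hashValue[32:]
    # BIP32: the master key is invalid (not reduced) if it is 0 or >= order
    if not 0 < int.from_bytes(private_key_bytes, 'big') < ORDER:
        raise ValueError('invalid master key')
    xkey = (bytes.fromhex('0488ADE4')        # version: mainnet, private
            + b'\x00'                        # depth
            + bytes(4)                       # parent fingerprint
            + bytes(4)                       # child index
            + chain_code                     # 32 bytes
            + b'\x00' + private_key_bytes)   # 00 | private key (33 bytes)
    return b58check_encode(xkey)


seed = bytes.fromhex(
    'fffcf9f6f3f0edeae7e4e1dedbd8d5d2cfccc9c6c3c0bdbab7b4b1aeaba8a5a29f9c999'
    '693908d8a8784817e7b7875726f6c696663605d5a5754514e4b484542')
print(seed_to_master_xprv(seed))
```

Every 78-byte payload with this version prefix encodes to a 111-character string starting with xprv, which matches the length of the example key shown above.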

Graphically these operations can be shown in Figure 3.1.

Figure 3.1: From seed to master private key

3.3 Child Key derivation


In this section, we will see how it is possible to derive different child keys from a single extended private key. There are two methods:

• Normal.

• Hardened.

Both methods have advantages and disadvantages, which we will discuss later. In every situation, it is essential to use the method that fits best.

For both methods the derivation starts from an extended private key, from which the following information is needed:

• Chain code.
• Private key.

A number specifying the index of the child is also required. This number must be in the range between 0 and 4294967295, because in any extended key 4 bytes are used to specify the index of the child:

$\text{max index} = (\mathrm{FF\ FF\ FF\ FF})_{16} = 2^{32} - 1 = 4294967295.$

In fact, it would be possible to generate an even greater number of children from the same parent, but it would not be possible to write the corresponding extended key in the format described above.

3.3.1 Normal derivation


First, we need to compute the Parent Public Key P. This is obtained from the usual scalar multiplication between a point on the EC (the Generator G) and the Parent Private Key p:
P = p · G.
Then we take the compressed form of P and convert it into a byte string of 33 bytes.

After that we concatenate this 33-byte string with the 4-byte big-endian string representing the index number:
msg = compressed public key | index,
where msg is now a string of 37 bytes.

Finally, we apply the HMAC algorithm with the following inputs:

Hash function: SHA512.
Key: chain code.
Message: msg.

The Python code is the following:

    from hmac import HMAC
    from hashlib import sha512

    # parent_public_key: 33-byte compressed key; index: 4-byte big-endian string
    msg = parent_public_key + index
    hashValue = HMAC(parent_chain_code, msg, sha512).digest()

The result is a string of 64 bytes: hashValue.

Now we split this string of bytes in two: the last 32 bytes are the child chain code. Then we take the first 32 bytes, convert them into an integer and add it to the parent private key (mod order), obtaining the child private key.

This is the Python code:

    child_chain_code = hashValue[32:]
    q = int(hashValue[:32].hex(), 16)
    child_private_key = (q + parent_private_key) % order
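A self-contained sketch of the whole normal derivation, including the computation of the compressed parent public key, is given below. It assumes the secp256k1 curve with textbook (not constant-time) point arithmetic, handles the private key as an integer, and ignores the negligible-probability case in which BIP32 declares the child key invalid; ckd_priv_normal and the other helpers are hypothetical names.

```python
import hashlib
import hmac

P_FIELD = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two affine points of secp256k1; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


def scalar_mult(k, point=G):
    """Double-and-add computation of k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def compress(point) -> bytes:
    """33-byte compressed encoding: 02/03 prefix (parity of y) followed by x."""
    x, y = point
    return (b'\x03' if y & 1 else b'\x02') + x.to_bytes(32, 'big')


def ckd_priv_normal(parent_private_key: int, parent_chain_code: bytes, index: int):
    """Normal child derivation; returns (child private key, child chain code)."""
    if not 0 <= index < 2**31:
        raise ValueError('normal derivation uses indices from 0 to 2^31 - 1')
    parent_public_key = compress(scalar_mult(parent_private_key))   # P = p * G
    msg = parent_public_key + index.to_bytes(4, 'big')             # 37 bytes
    hashValue = hmac.new(parent_chain_code, msg, hashlib.sha512).digest()
    q = int.from_bytes(hashValue[:32], 'big')
    return (q + parent_private_key) % ORDER, hashValue[32:]


parent_private_key, parent_chain_code = 0x1234567890ABCDEF, bytes(32)   # toy values
child_private_key, child_chain_code = ckd_priv_normal(parent_private_key, parent_chain_code, 0)
print(hex(child_private_key), child_chain_code.hex())
```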

Graphically these operations can be shown in Figure 3.2.



Figure 3.2: Normal Derivation

3.3.2 Hardened derivation


This method is similar to the previous one; the only difference is that the private key, instead of the public one, is used as input of the hash function.

First, we concatenate the 33 bytes of the parent private key, including the leading 00 byte, with the 4-byte string representing the index number.

Remark In order to distinguish the hardened derivation from the normal one, the numbering of hardened indices starts from $2^{31}$.

msg = 00 | private key | index,

where msg is now a string of 37 bytes.

Then we apply the HMAC algorithm with the following inputs:

Hash function: SHA512.
Key: chain code.
Message: msg.

The Python code is the following:

    from hmac import HMAC
    from hashlib import sha512

    # parent_private_key: 33-byte string 00 | key; index: 4-byte big-endian string
    msg = parent_private_key + index
    hashValue = HMAC(parent_chain_code, msg, sha512).digest()

The result is a string of 64 bytes: hashValue. In this code parent_private_key already has the leading 00 byte, because it is taken directly from the parent extended private key.

Now we split this string of bytes in two (in the same way as in the normal method): the last 32 bytes are the child chain code. Finally, we take the first 32 bytes, convert them into an integer and add it to the parent private key (mod order), obtaining the child private key.

This is the Python code:

    child_chain_code = hashValue[32:]
    q = int(hashValue[:32].hex(), 16)
    # parent_private_key is the 33-byte string 00 | key, so convert it to an integer
    child_private_key = (q + int.from_bytes(parent_private_key, 'big')) % order
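A corresponding sketch of the hardened derivation only needs HMAC-SHA512, because no public key is computed. As before, the function name is hypothetical, the private key is handled as an integer, and the negligible-probability invalid-key case of BIP32 is ignored.

```python
import hashlib
import hmac

ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED_OFFSET = 2**31   # first hardened index


def ckd_priv_hardened(parent_private_key: int, parent_chain_code: bytes, index: int):
    """Hardened child derivation; returns (child private key, child chain code)."""
    if not HARDENED_OFFSET <= index < 2**32:
        raise ValueError('hardened indices range from 2^31 to 2^32 - 1')
    # msg = 00 | private key | index: 1 + 32 + 4 = 37 bytes
    msg = b'\x00' + parent_private_key.to_bytes(32, 'big') + index.to_bytes(4, 'big')
    hashValue = hmac.new(parent_chain_code, msg, hashlib.sha512).digest()
    q = int.from_bytes(hashValue[:32], 'big')
    return (q + parent_private_key) % ORDER, hashValue[32:]


# First hardened child (index 2^31) of a toy parent key
child_private_key, child_chain_code = ckd_priv_hardened(0x1234567890ABCDEF, bytes(32),
                                                        HARDENED_OFFSET)
```

Unlike the normal case, no part of the HMAC message can be replaced by public information, since the parent private key enters it directly.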

Graphically these operations can be shown in Figure 3.3.

Figure 3.3: Hardened Derivation

3.4 Special derivation


Using normal derivation, it is possible to derive the child extended public key starting only from the extended public key of the parent.

3.4.1 Public derivation


In order to compute this particular derivation, the only elements needed are the ones contained in the extended public key, plus the index of the child:

• Public key $P_{parent}$.

• Chain code.

• Index of the child.

First, we apply the HMAC algorithm to the same inputs used for the normal derivation:

Hash function: SHA512;
Key: chain code;
Message: msg;

where msg is obtained as before:

msg = compressed public key | index.

The output of this function is the same as in the normal derivation with the extended private key. The last 32 bytes form the child chain code, while the first 32 bytes are read as an integer, q.

Now we multiply the generator G by the integer q and obtain Q, a point on the EC:
$Q = q \cdot G$.
Finally, we compute the sum of two points on the elliptic curve, Q and $P_{parent}$, where $P_{parent}$ is the parent public key:

$Q + P_{parent} = P_{child}$,

where $P_{child}$ is the child public key.

We now prove that the child public key obtained in this way, $P_{child,2}$, is the same as the one obtained starting from the private key, $P_{child,1}$.

Both procedures start from q, the number obtained from the first 32 bytes of the HMAC output. Let $p_{parent}$ be the parent private key and $p_{child}$ the child private key. Since order · G is the point at infinity, the reduction mod order can be ignored when multiplying by G:

$$
\begin{aligned}
P_{child,1} &= p_{child} \cdot G \\
&= (q + p_{parent}) \cdot G \\
&= (q \cdot G) + (p_{parent} \cdot G) \\
&= Q + P_{parent} = P_{child,2}.
\end{aligned}
$$

Q.E.D.
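The equality just proved can also be checked numerically. The sketch below assumes secp256k1 with textbook (not constant-time) point arithmetic; ckd_pub derives the child public key from the parent public key and chain code only, ckd_priv_normal repeats the private derivation of Section 3.3.1, and all names are hypothetical.

```python
import hashlib
import hmac

P_FIELD = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two affine points of secp256k1; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


def scalar_mult(k, point=G):
    """Double-and-add computation of k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def compress(point) -> bytes:
    x, y = point
    return (b'\x03' if y & 1 else b'\x02') + x.to_bytes(32, 'big')


def hmac_split(parent_public_point, parent_chain_code, index):
    """HMAC-SHA512(chain code, compressed public key | index) -> (q, child chain code)."""
    msg = compress(parent_public_point) + index.to_bytes(4, 'big')
    hashValue = hmac.new(parent_chain_code, msg, hashlib.sha512).digest()
    return int.from_bytes(hashValue[:32], 'big'), hashValue[32:]


def ckd_pub(parent_public_point, parent_chain_code, index):
    """Public derivation: P_child = q * G + P_parent (normal indices only)."""
    if index >= 2**31:
        raise ValueError('hardened children cannot be derived from a public key')
    q, child_chain_code = hmac_split(parent_public_point, parent_chain_code, index)
    return point_add(scalar_mult(q), parent_public_point), child_chain_code


def ckd_priv_normal(parent_private_key, parent_chain_code, index):
    """Private normal derivation: p_child = (q + p_parent) mod order."""
    q, child_chain_code = hmac_split(scalar_mult(parent_private_key), parent_chain_code, index)
    return (q + parent_private_key) % ORDER, child_chain_code


p_parent, chain = 0xDEADBEEF, bytes(range(32))     # toy parent values
P_parent = scalar_mult(p_parent)
P_child_2, chain_2 = ckd_pub(P_parent, chain, 7)
p_child, chain_1 = ckd_priv_normal(p_parent, chain, 7)
assert scalar_mult(p_child) == P_child_2 and chain_1 == chain_2
```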

Graphically this derivation can be shown in Figure 3.4.

Figure 3.4: Public Derivation



3.4.2 Weakness of Normal Derivation


As shown above, the normal derivation presents a great advantage, but also a weakness: it is possible to derive the parent extended private key knowing the parent extended public key and just one of its child extended private keys obtained with normal derivation.

The HMAC-SHA512 function has three inputs: the parent chain code, the parent public key and the child index. The first two can be taken from the parent extended public key, while the child index can be taken from the child extended private key.

msg = compressed parent public key | child index.

Now we apply the HMAC algorithm with the usual inputs:

Hash function: SHA512.
Key: parent chain code.
Message: msg.

After that, we take the first 32 bytes of the output of this function and read them as an integer, q.

Recalling that the child private key is obtained through a sum with the parent private key, it is possible to reverse the process.

Let $p_{child}$ and $p_{parent}$ be the private keys of the child and the parent respectively:

$$
p_{child} = (q + p_{parent}) \bmod \text{order}
\quad\Longrightarrow\quad
p_{parent} = (p_{child} - q) \bmod \text{order}. \qquad (3.1)
$$

The implication (3.1) holds in modular arithmetic too, since subtracting q from both sides of a congruence preserves it.

So we have derived the private key of the parent. Graphically this derivation can be shown in Figure 3.5.
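The attack can be reproduced in a few lines. The sketch assumes secp256k1 with textbook (not constant-time) point arithmetic; it derives a normal child key from a toy parent, then recovers the parent private key from the parent public key, the parent chain code, the child index and the child private key only. All function names are hypothetical.

```python
import hashlib
import hmac

P_FIELD = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two affine points of secp256k1; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


def scalar_mult(k, point=G):
    """Double-and-add computation of k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def compress(point) -> bytes:
    x, y = point
    return (b'\x03' if y & 1 else b'\x02') + x.to_bytes(32, 'big')


def tweak(parent_public_point, parent_chain_code, index):
    """q = first 32 bytes of HMAC-SHA512 in the normal derivation, as an integer."""
    msg = compress(parent_public_point) + index.to_bytes(4, 'big')
    hashValue = hmac.new(parent_chain_code, msg, hashlib.sha512).digest()
    return int.from_bytes(hashValue[:32], 'big')


def recover_parent_private_key(parent_public_point, parent_chain_code,
                               child_private_key, index):
    """Equation (3.1): p_parent = (p_child - q) mod order."""
    q = tweak(parent_public_point, parent_chain_code, index)
    return (child_private_key - q) % ORDER


# Toy parent: in a real attack only its extended public key would be known
p_parent, chain = 0xC0FFEE1234, bytes(range(32))
P_parent = scalar_mult(p_parent)
index = 5
p_child = (tweak(P_parent, chain, index) + p_parent) % ORDER   # the leaked child key
assert recover_parent_private_key(P_parent, chain, p_child, index) == p_parent
```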

3.5 Advantages and disadvantages


We have seen that public-to-public derivation is possible with the normal method, while it is impossible with the hardened one.

In fact, the inputs of HMAC-SHA512 are different for the two derivations. While for the normal derivation the information in the extended public key is sufficient, for the hardened derivation the parent private key is needed. This makes it impossible to obtain a child public key from a parent public key, but it also makes it impossible to derive the private key of the parent knowing the private key of the child and the public key of the parent.

It is advisable to use each of the two methods in the appropriate situations.



Figure 3.5: From child to parent

3.5.1 When to use Normal Derivation?


Normal derivation should be used whenever all the child keys are kept in the same digital place and no single child key is ever given to someone else. As already mentioned, the leak of a single child private key, together with the parent extended public key, exposes the parent private key and, with it, all the keys derived from it.

However, if all the child keys are used by the same person, this derivation makes it possible to generate new public keys even in a "hot" (online) environment. Suppose that only the extended public key is stored in a device: you can receive payments to your public keys, but nobody can spend those coins as long as the private keys stay hidden. If someone steals your device, the only problem is a loss of privacy: by examining the blockchain, it is possible to discover all the transactions signed with the private keys associated with the public keys of the wallet, but it is impossible to sign new transactions without the private keys.

3.5.2 When to use Hardened Derivation?


Hardened derivation should be used whenever the generated keys are used for different purposes or are stored in different places. With this procedure, it is possible to hand over a branch of the tree to someone else, in order to let them manage part of your money, without the risk of losing all the other keys in the wallet.

As a best practice, it is always advisable to use the hardened method for the first derivation from the extended master private key. A hardened key may have either hardened or normal children, but it is not reasonable to derive a hardened child from a normal one, because it makes no sense to increase the security of the wallet at the last level.

Chapter 4

Mnemonic phrase

We have seen how it is possible to generate keys starting from a seed. But a seed is a long number, difficult to remember and not easy to write down on paper. Typos made while transcribing it can compromise the entire wallet.

Remark Mistyping a single digit in the seed produces completely different keys.

In order to work around this problem, several solutions have been implemented. Among them, the most widespread is the one described by BIP39, Bitcoin Improvement Proposal number 39. It is not the only one: in this chapter we will also see another solution, proposed by Electrum¹, one of the most popular Bitcoin wallets.

Both these solutions use a Mnemonic phrase, from which the seed is obtained. This
phrase is designed to avoid typing errors while maintaining the same level of secu-
rity and entropy.

What is a Mnemonic phrase?

A Mnemonic phrase is a set of words taken from a specific dictionary. Although the choice of the dictionary is not binding, the one most commonly used among practitioners is the English one, defined by BIP39. It contains 2048 common English words, each of them with 3 to 9 letters. The words of the dictionary must be chosen so that they are easy to remember and hard to confuse with one another: it is better to avoid inserting into the dictionary two words with similar meaning or spelling.

4.1 BIP 39
First, we will see how to generate a mnemonic phrase in the framework of BIP39 [2]
and then how it is possible to obtain a seed from it.

4.1.1 Mnemonic Generation


In order to generate a Mnemonic phrase, we start from a given entropy, which can be seen as a large integer number. The way to obtain it is left to the user: he can type in arbitrarily chosen numbers (a poor source of randomness), roll a die many times, or use any other method he considers suitable. Many software tools provide a function that generates sufficiently random entropy, but anyone who is skeptical about the reliability of software randomness must provide such an integer number himself.

1 Electrum is a lightweight Bitcoin client, released on November 5, 2011



Let us call ENT the number of binary digits of the given entropy. ENT must belong to a given set:
ENT ∈ {128, 160, 192, 224, 256}.
The reason for these specific lengths will become clear in a moment.

Now we write the entropy in bytes format, obtaining a string of ENT/8 bytes. Then we compute its SHA256 hash and take only the first ENT/32 bits as a checksum. Finally, we append these bits to the end of the entropy, obtaining a binary string, called entropy_checked, of length ENT + ENT/32.

In Python:

    from hashlib import sha256

    entropy_bytes = entropy.to_bytes(ENT // 8, byteorder='big')
    checksum = sha256(entropy_bytes).digest()
    # bit strings of the entropy (ENT bits) and of the hash (256 bits)
    entropy_bin = bin(entropy)[2:].zfill(ENT)
    checksum_bin = bin(int.from_bytes(checksum, 'big'))[2:].zfill(256)
    entropy_checked = entropy_bin + checksum_bin[:ENT // 32]

where entropy is an integer, and entropy_bin and checksum_bin are strings of bits that can be concatenated.

Now the reason for the constraint on the length of the input entropy is clear:

(i) ENT must be a multiple of 32,

(ii) ENT < 128 might not be secure enough,

(iii) ENT > 256 is useless.

Point (i) is due to the structure proposed by BIP39. Taking the first ENT/32 bits as a checksum is only a convention; however, it is essential that the final length of the entropy plus the checksum is a multiple of 11, since the dictionary is a set of $2^{11}$ words.

Point (ii) is just a suggestion, because less entropy reduces security: it is easier for an attacker to guess your mnemonic phrase by trying all the possible combinations if fewer words are involved. It is important to remember that adding even a single bit of entropy doubles the difficulty of guessing it.

Point (iii) is another suggestion. A private key is a number smaller than $2^{256}$; therefore, it would be useless to generate a seed starting from an entropy with more than 256 bits.

Thanks to constraint (i), the length of entropy_checked is a multiple of 11:

$$\mathrm{len}(\text{entropy\_checked}) = ENT + \frac{ENT}{32} = \frac{33}{32}\,ENT = 11 \cdot \frac{3}{32}\,ENT.$$

Let us consider entropy_checked as a string of bits and divide it into substrings, each 11 bits long, obtaining $\frac{3}{32}\,ENT$ strings of bits.

Each of these strings represents an integer number that can take values between 0 and 2047, i.e. $2^{11} - 1$. Associate each of these numbers with a word in the chosen dictionary (here the English one, sorted alphabetically), write down all these words separated by a space, and obtain the Mnemonic Phrase.

All these steps can be summarized with the following scheme (ENT = 128):

entropy (hex)   = f012003974d093eda670121023cd03bb
entropy (bin)   = 1111000000010010000000...0111011
SHA256(entropy) = 0010 010001000001001...
                  (the first ENT/32 = 4 bits, 0010, are the checksum; the rest is ignored)
entropy_checked = 1111000000010010000000...0111011 | 0010
11-bit groups   = 11110000000 | 10010000000 | ... | 01110110010
indices         =        1920 |        1152 | ... |         946
words           =     useless |    mosquito | ... |        iron

obtaining a sequence of 12 words:

useless mosquito atom trust ankle walnut oil across awake bunker domain iron
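The index computation can be sketched without the word list itself. The function below (a hypothetical name) follows the steps of this section: it appends the first ENT/32 bits of SHA256 to the entropy and splits the result into 11-bit groups; mapping each index to a word would additionally require the BIP39 English dictionary, which is not reproduced here.

```python
import hashlib


def entropy_to_indices(entropy: bytes) -> list:
    """Return the 11-bit word indices of the BIP39 mnemonic for `entropy`."""
    ENT = len(entropy) * 8
    if ENT not in (128, 160, 192, 224, 256):
        raise ValueError('ENT must be 128, 160, 192, 224 or 256 bits')
    CS = ENT // 32                                               # checksum length in bits
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - CS)   # first CS bits of SHA256
    entropy_checked = (int.from_bytes(entropy, 'big') << CS) | checksum
    n_words = (ENT + CS) // 11
    return [(entropy_checked >> (11 * (n_words - 1 - i))) & 0x7FF
            for i in range(n_words)]


indices = entropy_to_indices(bytes.fromhex('f012003974d093eda670121023cd03bb'))
print(indices[:2])   # [1920, 1152]
```

The first two indices depend only on the entropy and match those of "useless" and "mosquito" in the example; the last index also contains the 4 checksum bits.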

4.1.2 From Mnemonic to Seed


Once the Mnemonic phrase is obtained, we need to derive the seed. To do so, a one-way hash-based function is used, so it is infeasible to derive the Mnemonic phrase from the seed.

The function used is PBKDF2, chosen to make brute-force attacks harder: its output has the same length as that of a standard hash function, but it takes longer to compute, because it applies the same hash function many times.

It receives as input:
Message: Mnemonic phrase.
Salt: ’mnemonic’ + passphrase.
Number of iterations: 2048.
Digest-module: SHA512.
Mac-module: HMAC.
Summing up it can be said that it calculates the same hash function (HMAC-SHA512)
2048 times.

In order to introduce more complexity in the seed computation, a Salt is used. If not specified, the standard salt is simply the word 'mnemonic'; otherwise, it is extended with an optional passphrase.

Although a human being is a poor source of randomness, the passphrase is usually chosen by the user. This is acceptable because the passphrase is not meant to introduce more entropy: it prevents attacks based on rainbow tables and gives the user the possibility of having different wallets with the same mnemonic phrase.

Remark The randomness should be guaranteed by the input entropy used to generate the
mnemonic phrase, not by the passphrase.

The Python code is the following:

    from hashlib import sha512
    from pbkdf2 import PBKDF2
    import hmac

    seed = PBKDF2(mnemonic, 'mnemonic' + passphrase, iterations=2048,
                  macmodule=hmac, digestmodule=sha512).read(64)

where mnemonic is the mnemonic phrase previously computed and passphrase is chosen by the user (if not specified, it is empty).
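The same computation can be written with the Python standard library alone, through hashlib.pbkdf2_hmac, instead of the third-party pbkdf2 package. The NFKD normalization of mnemonic and passphrase follows the BIP39 specification; the function name mnemonic_to_seed is hypothetical.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = '') -> bytes:
    """BIP39 seed: PBKDF2-HMAC-SHA512, 2048 iterations, salt 'mnemonic' + passphrase."""
    password = unicodedata.normalize('NFKD', mnemonic).encode('utf-8')
    salt = unicodedata.normalize('NFKD', 'mnemonic' + passphrase).encode('utf-8')
    return hashlib.pbkdf2_hmac('sha512', password, salt, 2048)


words = 'useless mosquito atom trust ankle walnut oil across awake bunker domain iron'
seed = mnemonic_to_seed(words)
print(len(seed) * 8)                                   # 512
print(seed != mnemonic_to_seed(words, 'passphrase'))   # True
```

The second line illustrates the remark above: the same mnemonic with a different passphrase yields an unrelated seed, and therefore a different wallet.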

Remark With this procedure we always produce a seed of a specific length: 512 bits. This is always enough, because every private key takes values in a smaller set (1 to order − 1).

4.2 Electrum Mnemonic


Although BIP39 has been proposed as a standard, it is not the only solution adopted by practitioners. One example is the one proposed by Electrum [3].

The main differences are in the way the mnemonic phrase is generated and in its purpose. Electrum assigns a version to the seed, so that it is possible to recognize the purpose of the keys and the way to generate them.

4.2.1 Mnemonic Generation


Whenever a new mnemonic phrase is required, Electrum starts from some entropy, generated through a random function. It is also possible to generate a valid mnemonic phrase from an entropy chosen by the user, if he is skeptical about, or does not want to rely on, the randomness of the random function.

To be consistent with the BIP39 section, consider the entropy as a large integer number and call ENT the number of its binary digits. Then ENT must be a multiple of 11, if the chosen dictionary is the same as in BIP39 (2048 words). However, the choice of the dictionary is not binding.

The first important difference with the BIP39 mnemonic is that the checksum is not computed on the entropy but directly on the Mnemonic phrase. In order to obtain a valid mnemonic phrase, the following instructions must be followed:

1. Divide the entropy into strings of 11 bits each.

2. Associate each string with a word from the chosen dictionary of 2048 words.

3. Write down all these words, separated by a space, to obtain a candidate Mnemonic phrase.

4. Compute a particular HASH function (HMAC-SHA512 with key "Seed version") on the candidate Mnemonic phrase and verify that the first hexadecimal digits correspond to the digits of the chosen version of the seed:

• '01' for a standard type seed,

• '100' for a segwit type seed,

• '101' for a two-factor authenticated type seed.

5. If the initial digits are different, increase the entropy by one and go back to point 1; otherwise, you have obtained a valid mnemonic phrase.

Most of the time, point 5 simply changes a single word of the mnemonic phrase, and this leads to a completely different HASH. This is repeated over and over again, until the first digits match the required version digits.

This procedure can be sketched with the following Python code:

    import hmac
    from hashlib import sha512

    def verify_mnemonic(mnemonic, version="standard"):
        # HMAC-SHA512 of the phrase, keyed with b"Seed version", in hexadecimal
        s = hmac.new(b"Seed version", mnemonic.encode('utf8'), sha512).digest().hex()
        if s[0:2] == '01':
            return version == "standard"
        elif s[0:3] == '100':
            return version == "segwit"
        elif s[0:3] == '101':
            return version == "2FA"
        else:
            return False

    def from_entropy_to_mnemonic(entropy, number_words, dictionary):
        # 11-bit groups, most significant first, each mapped to a dictionary word
        return ' '.join(dictionary[(entropy >> (11 * (number_words - 1 - i))) & 0x7FF]
                        for i in range(number_words))

    def generate_mnemonic(entropy, number_words, version, dictionary):
        is_verified = False
        while not is_verified:
            mnemonic = from_entropy_to_mnemonic(entropy, number_words, dictionary)
            is_verified = verify_mnemonic(mnemonic, version)
            if not is_verified:
                entropy = entropy + 1
        return mnemonic

Let’s see an example: suppose we are looking for a standard type seed, starting with ENT = 132:

entropy (hex) = ef938205cd78ab6d876398dcfd65dae32
entropy (bin) = 11101111100 | 10011100000 | 10000001011 | ... | 11000110010
indices       =        1916 |        1248 |        1035 | ... |        1586
words         =       usage |     orchard |        lift | ... |       shock

Mnemonic = usage orchard lift online melt replace budget indoor table twenty issue shock

HASH(Mnemonic) = 3d5d23737859601eeabe32d1e1...

Does HASH(Mnemonic) start with '01'? ⇒ NO ⇒ add 1 to the entropy.

entropy (hex) = entropy + 1 = ef938205cd78ab6d876398dcfd65dae33
entropy (bin) = 11101111100 | 10011100000 | 10000001011 | ... | 11000110011
indices       =        1916 |        1248 |        1035 | ... |        1587
words         =       usage |     orchard |        lift | ... |        shoe

Mnemonic = usage orchard lift online melt replace budget indoor table twenty issue shoe

HASH(Mnemonic) = 2a3b2cf6a0506844a77baf8175...

Does HASH(Mnemonic) start with '01'? ⇒ NO ⇒ add 1 to the entropy.

After another 443 attempts we obtain:

entropy (hex) = ef938205cd78ab6d876398dcfd65dafee
entropy (bin) = 11101111100 | 10011100000 | 10000001011 | ... | 11111101110
indices       =        1916 |        1248 |        1035 | ... |        2030
words         =       usage |     orchard |        lift | ... |       worry

Mnemonic = usage orchard lift online melt replace budget indoor table twenty issue worry

HASH(Mnemonic) = 01d133fdcc54bd0da3d717173e0f82127...

Does HASH(Mnemonic) start with '01'? ⇒ YES.

So we have finally found a valid Mnemonic phrase for the standard type seed.
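Since the standard version requires the first two hexadecimal digits of the HMAC to be 01, i.e. 8 fixed bits, a valid phrase is found on average after about $2^8 = 256$ attempts, while the 12-bit prefixes 100 and 101 require about $2^{12} = 4096$ attempts on average. The sketch below counts the attempts of the search; it uses a hypothetical placeholder dictionary (w0000 to w2047) in place of a real word list, so the phrases it produces are not real Electrum mnemonics, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Placeholder dictionary, for illustration only (not a real word list)
DICTIONARY = [f'w{i:04d}' for i in range(2048)]


def to_mnemonic(entropy: int, number_words: int) -> str:
    """Split the entropy into 11-bit groups, most significant group first."""
    return ' '.join(DICTIONARY[(entropy >> (11 * (number_words - 1 - i))) & 0x7FF]
                    for i in range(number_words))


def seed_version_hex(mnemonic: str) -> str:
    return hmac.new(b"Seed version", mnemonic.encode('utf8'), hashlib.sha512).hexdigest()


def search_standard(entropy: int, number_words: int = 12):
    """Increase the entropy until the HMAC starts with '01'; return phrase and attempts."""
    attempts = 1
    mnemonic = to_mnemonic(entropy, number_words)
    while not seed_version_hex(mnemonic).startswith('01'):
        entropy += 1
        attempts += 1
        mnemonic = to_mnemonic(entropy, number_words)
    return mnemonic, attempts


mnemonic, attempts = search_standard(secrets.randbits(132))
print(attempts)   # varies from run to run; about 256 on average
```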

4.2.2 From Mnemonic to Seed


Once the Mnemonic phrase is obtained, the seed is derived in the same way described in BIP39, with only one exception:

the Salt used for the PBKDF2 function does not contain the word 'mnemonic'; instead, it contains the word 'electrum'. It is always concatenated with a passphrase, chosen by the user.

The Python code is, therefore, the following:

    from hashlib import sha512
    from pbkdf2 import PBKDF2
    import hmac

    seed = PBKDF2(mnemonic, 'electrum' + passphrase, iterations=2048,
                  macmodule=hmac, digestmodule=sha512).read(64)

Once again, if the passphrase is not specified by the user, it is left empty.

4.3 Comparison
Having described how to generate the Mnemonic phrase for each of the two principal proposals, let’s analyze their advantages and disadvantages.

Both BIP39 and Electrum are secure against so-called brute-force attacks, since the function PBKDF2 is used to generate the seed from the mnemonic phrase, and so it is infeasible to guess a Mnemonic randomly generated by another user. In fact, each time a valid phrase has been found, it is necessary to compute the seed, then the first child keys, and then look at the public ledger, the blockchain, to see whether some of these private keys have been used to sign a transaction. All these steps are computationally expensive, and it would take too much time to try even a small part of all the possible combinations. Even with all the most powerful computers in the world working together, it would take a time many orders of magnitude greater than the age of the universe.

These two methods are both secure, but there is a difference. Suppose to be look-
ing for a 12 words mnemonic already used: for BIP39 it is "only" needed to try 128
bits of entropy and then the others 4 bits, the checksum, are obtained through a
HASH function, instead with Electrum it is needed to try all the 132 bits of entropy
and then check, through a HASH function, if they are valid. With this example, the
difference is in 4 bits, but it increases with the number of words:

Words   BIP39 (bits)   Electrum (bits)   Difference (bits)
12      128            132               4
15      160            165               5
18      192            198               6
21      224            231               7
24      256            264               8

The first column is the number of words in the mnemonic phrase; the second and third columns are, respectively, the bits of entropy that must be checked in order to find a specific BIP39 or Electrum mnemonic phrase; the last column is simply the difference between the previous two. Although this additional difficulty is not necessary, the difference between the two methods is not negligible: finding a specific 12-word Electrum phrase requires 16 (2^4) times the attempts needed for a 12-word BIP39 phrase, and with 24 words the factor becomes 256 (2^8).
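The table can be reproduced from the fact that each word encodes 11 bits. In BIP39 the phrase carries ENT bits of entropy plus ENT/32 checksum bits, so 11·words = ENT + ENT/32 and ENT = 32·words/3; in Electrum all 11·words bits are entropy. The helper name `entropy_bits` is hypothetical.

```python
def entropy_bits(words: int) -> tuple:
    """Return (BIP39 entropy bits, Electrum entropy bits) for a phrase length.

    Each word encodes 11 bits. BIP39 spends ENT/32 of them on the checksum,
    so ENT = 32 * words / 3; Electrum uses all 11 * words bits as entropy.
    """
    return 32 * words // 3, 11 * words


for words in (12, 15, 18, 21, 24):
    bip39, electrum = entropy_bits(words)
    diff = electrum - bip39
    print(f"{words:>5} {bip39:>6} {electrum:>9} {diff:>5}   attempts ratio 2**{diff} = {2 ** diff}")
```

The difference is always words/3 bits, which is why the ratio of attempts doubles every three extra words.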

Another difference is the way in which a phrase is considered valid. BIP39 uses a checksum computed on the input entropy, so knowing the mnemonic phrase alone is not enough to tell whether it is valid: the fixed dictionary is always required, in order to map each word back to its 11-bit index. Electrum, on the other hand, does not need the dictionary for validation, because the check is based directly on the hash of the mnemonic phrase. Furthermore, Electrum allows the use of any kind of dictionary, not only the standard ones.
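The two validity checks can be contrasted in a short sketch. The BIP39 function works on word indices, since converting words into indices is precisely the step that needs the fixed wordlist; the Electrum function needs only the phrase and reuses the "Seed version" HMAC and the prefixes ('01', '100', '101') of the implementation in Appendix B. Both function names are hypothetical, and Electrum's text normalization is omitted.

```python
import hashlib
import hmac


def bip39_checksum_ok(indices):
    """Check the BIP39 checksum given the word indices (0..2047) of a phrase."""
    if not indices or any(not 0 <= i < 2048 for i in indices):
        return False
    n_bits = 11 * len(indices)        # ENT + ENT/32 = 33 * ENT / 32
    if n_bits % 33:
        return False
    cs_bits = n_bits // 33            # checksum length ENT/32
    ent_bits = n_bits - cs_bits
    value = 0
    for i in indices:                 # concatenate the 11-bit groups
        value = (value << 11) | i
    entropy = value >> cs_bits
    checksum = value & ((1 << cs_bits) - 1)
    digest = hashlib.sha256(entropy.to_bytes(ent_bits // 8, "big")).digest()
    # the checksum is the first cs_bits bits of SHA256(entropy)
    return checksum == int.from_bytes(digest, "big") >> (256 - cs_bits)


def electrum_seed_version(mnemonic):
    """Return the Electrum seed type encoded by the phrase itself, or None."""
    h = hmac.new(b"Seed version", mnemonic.encode("utf-8"), hashlib.sha512).hexdigest()
    for prefix, version in (("01", "standard"), ("100", "segwit"), ("101", "2FA")):
        if h.startswith(prefix):
            return version
    return None
```

For instance, the all-zero 128-bit entropy corresponds to the indices `[0] * 11 + [3]` ("abandon" eleven times followed by "about" in the English list), which `bip39_checksum_ok` accepts; any string at all can be passed to `electrum_seed_version`.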

Finally, the most important difference is Electrum's introduction of a version type for the seed. While BIP39 only checks whether a mnemonic is valid, Electrum checks whether the phrase corresponds to a specific version. With Electrum it is therefore possible to understand, directly from the mnemonic, how to derive all the required keys and what their purpose is. With BIP39, on the other hand, it is impossible to say a priori what the purpose of the keys derived from that seed is.

To address this problem a new BIP was proposed: BIP43. In order to identify a particular purpose for a set of keys, it uses the following derivation scheme:

m / purpose' / *

Changing the derivation path also changes the purpose of the keys. For more information see the next chapter.

Chapter 5

How to use a HD Wallet

As already mentioned, there are various ways to use a Hierarchical Deterministic Wallet. This sequence of keys can also serve non-monetary purposes, outside the cryptocurrency world: the keys are simply numbers and can be used for any purpose. Here we focus only on the cryptocurrency application, in particular Bitcoin.

5.1 Derivation path


In order to easily recognize the path of a particular derivation, a common notation was introduced for the Hierarchical Deterministic Wallet.

Let us denote the extended master private key by m and the extended master public key by M.

The first normal private child derived from m will be denoted by the number 0:

m/0

The fourth normal private child of the first normal private child of the master private
key will use the following notation:

m/0/3

In order to indicate a hardened child, the index number is followed by an apostrophe. The first hardened child has index = 2^31, the second has index = 2^31 + 1 and so on, but in the path notation the 2^31 offset is omitted, so the first hardened child of the master key is written as:

m/0'

It is possible to mix hardened and normal derivation. For example, the 15th hardened
child of the 37th normal child of the 6th normal child of the 1019th hardened child of
m will be represented in the following way:

m/1018'/5/36/14'

Remark Although it is not recommended to use hardened derivation after a normal deriva-
tion, it is always possible to do so.
This nomenclature can also be used to represent extended public keys. In fact, with the normal derivation it is possible to derive an extended public key from the parent extended public key. The notation is the usual one: for example, the third public key derived from the sixth public key derived from M is represented as:
M/5/2

Remark It is useless to specify that children derived from M are normal and not hardened.
It is impossible to use the hardened derivation to derive the public key from a parent public
key.
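The notation can be turned into child indices mechanically. The sketch below, with the hypothetical function name `parse_path`, adds the omitted 2^31 offset to every index marked with an apostrophe and rejects hardened steps below M, following the remark above.

```python
HARDENED = 2 ** 31  # index of the first hardened child


def parse_path(path: str) -> tuple:
    """Split a path such as "m/1018'/5/36/14'" into its root and child indices.

    A trailing apostrophe marks a hardened child: i' stands for index 2**31 + i.
    Paths rooted at M (public) may only contain normal indices.
    """
    root, *parts = path.split("/")
    if root not in ("m", "M"):
        raise ValueError("path must start with 'm' or 'M'")
    indices = []
    for part in parts:
        hardened = part.endswith("'")
        number = int(part[:-1] if hardened else part)
        if not 0 <= number < HARDENED:
            raise ValueError(f"index out of range: {part}")
        if hardened and root == "M":
            raise ValueError("hardened derivation is impossible from a public key")
        indices.append(number + HARDENED if hardened else number)
    return root, indices
```

For example, `parse_path("m/1018'/5/36/14'")` returns `('m', [2147484666, 5, 36, 2147483662])`.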

5.2 BIP 43
This BIP introduces a "Purpose Field" in the Hierarchical Deterministic wallet [4]: the first-level hardened child of the extended master private key specifies the purpose of the entire branch.

m / purpose' / *

For example, purpose = 44' means that the branch is a multi-coin wallet (BIP44), while purpose = 49' means that the keys follow the BIP49 specification (P2WPKH nested in P2SH).

5.2.1 Multi-coin wallet BIP 44


The derivation scheme for a multi-coin wallet [5] should be the following:

m / purpose' / coin_type' / account' / change / address_index

Let’s see the meaning of each field:

• Purpose: it must be equal to 44’ (or 0x8000002C) and it indicates that the subtree
of this node is used according to this specification. Hardened derivation is
used at this level.

• Coin_type: this level creates a separate subtree for every cryptocoin, avoiding address reuse across cryptocoins and improving privacy. The coin type is a constant set for each cryptocoin. Hardened derivation is used at this level.

Some examples:

Path              Cryptocoin
m / 44' / 0'      Bitcoin (mainnet)
m / 44' / 1'      Bitcoin (testnet)
m / 44' / 2'      Litecoin
m / 44' / 3'      Dogecoin
m / 44' / 60'     Ethereum
m / 44' / 128'    Monero
m / 44' / 144'    Ripple
m / 44' / 1815'   Cardano
...               ...

The complete list of registered coin types, continuously updated, is available at:
https://github.com/satoshilabs/slips/blob/master/slip-0044.md

• Account: this level splits the key space into independent user identities, so the wallet never mixes coins across different accounts. Users can use these accounts to organize their funds in the same fashion as bank accounts: for donations, for savings, for common expenses, etc. Hardened derivation is used at this level.

• Change: it can take only two values, 0 and 1. 0 is used for addresses meant to be visible outside the wallet (e.g. for receiving payments); 1 is used for addresses not meant to be visible outside the wallet, i.e. for transaction change. Normal derivation is used at this level.

• Index: this is the last level, used to have many keys for each account. Addresses are numbered sequentially, starting from 0. Normal derivation is used at this level.
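A BIP44-style path can be assembled from its five fields as sketched below; the helper names `account_address_path` and `format_path` are hypothetical. Passing 49 instead of 44 as `purpose` yields the analogous BIP49 path described in the next section.

```python
HARDENED = 2 ** 31


def account_address_path(purpose, coin_type, account, change, address_index):
    """Child indices for m / purpose' / coin_type' / account' / change / address_index.

    The first three levels use hardened derivation, the last two normal.
    """
    if change not in (0, 1):
        raise ValueError("change must be 0 (external) or 1 (internal)")
    for value in (purpose, coin_type, account, address_index):
        if not 0 <= value < HARDENED:
            raise ValueError("index out of range")
    return [purpose + HARDENED, coin_type + HARDENED, account + HARDENED,
            change, address_index]


def format_path(indices):
    """Render child indices in the apostrophe notation of Section 5.1."""
    parts = [f"{i - HARDENED}'" if i >= HARDENED else str(i) for i in indices]
    return "/".join(["m"] + parts)


print(format_path(account_address_path(44, 0, 0, 0, 0)))
```

Here `format_path(account_address_path(44, 0, 0, 0, 0))` gives `m/44'/0'/0'/0/0`, the first receiving address of the first Bitcoin account, and the first index equals 0x8000002C.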

The principal advantage of this BIP is the possibility of easily backing up all the user's cryptocoins just by remembering a particular set of words (a BIP39 mnemonic phrase). If this method is used to store more than one coin, take care not to lose the master private key: otherwise, all the coins are lost.

5.2.2 SegWit addresses BIP 49


Since August 2017 a new transaction format, Segregated Witness (SegWit), has been active on the Bitcoin network. It is not the purpose of this thesis to go into this topic, but it is important to note that software that has not been updated to support SegWit cannot spend coins received in this format. A separate branch of the derivation scheme is therefore needed for SegWit addresses, so that only SegWit-aware software is able to spend those coins.

The derivation scheme introduced by BIP49 [6] is the following:

m / 49' / coin_type' / account' / change / address_index

BIP49 follows the logic of BIP44 and leaves the user the possibility of a multi-coin wallet, just by choosing a specific coin_type.

Remark To remain compatible with wallets that do not support native SegWit outputs, BIP49 nests the SegWit output (P2WPKH) in a pay-to-script-hash (P2SH), so the corresponding mainnet address begins with '3'.

Conclusion

The main purpose of this work has been the analysis of methods used to generate a
sequence of private and public keys and to store the seed from which the sequence
is derived.

First, we briefly described some simple deterministic derivations; then we analyzed the Hierarchical Deterministic Wallet. An extended key can be derived in two ways: normal and hardened. The normal derivation allows public-to-public derivation, that is, the derivation of a sequence of public keys from an extended public key without access to any private key; however, if both a parent extended public key and a child private key are stolen, the parent private key, and with it the whole subtree below it, is compromised. The hardened derivation prevents this problem, but it does not allow public-to-public derivation.
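The reason for this weakness fits in one line. Using BIP32's own notation (which may differ from the symbols of the earlier chapters), for a normal child index i the child private key is the parent private key plus I_L modulo the curve order n, where I_L is the left half of an HMAC-SHA512 whose inputs, the parent chain code and the parent public key, are both contained in the extended public key:

```latex
\begin{aligned}
I   &= \mathrm{HMAC\text{-}SHA512}\big(c_{par},\ \mathrm{ser}_P(K_{par}) \,\|\, \mathrm{ser}_{32}(i)\big), \qquad 0 \le i < 2^{31},\\
I_L &= \text{the first 32 bytes of } I \text{, read as an integer},\\
k_i &= (I_L + k_{par}) \bmod n
\quad\Longrightarrow\quad
k_{par} = (k_i - I_L) \bmod n .
\end{aligned}
```

For a hardened index the HMAC input is 0x00 || ser256(k_par) instead of the public key, so I_L cannot be computed without the parent private key.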

Then we focused on the two most widely used methods to generate the seed: the one proposed by BIP39 and the one proposed by Electrum. Both start from a given entropy to generate a mnemonic phrase, which is then used to obtain a seed. The two methods are very similar, but with some subtle differences, which this thesis has analyzed, showing the pros and cons of each method.

It was not a goal of this work to point out the better proposal, but to provide a complete and detailed overview of the various ways to generate asymmetric cryptographic keys.

"We often fear what we do not understand. Our best defense is knowledge."

Lieutenant Tuvok, Star Trek: Voyager



Appendix A

Bitcoin keys representation and addresses

In order to make keys easy to store and recognize, several encodings were designed in the Bitcoin framework.

In this appendix, we briefly describe the possible ways to write down a public key and a private key, and finally how to obtain a Bitcoin address (Pay-to-Public-Key-Hash).

All the examples below start from the following private key (expressed in hexadecimal digits):
2AFEED53F26EF06521E7E825F83CB36A4632791A070A782E353230EAE71EBDD3.

A.1 Public Key


A public key is a point on the elliptic curve and can be represented in two ways:

• Uncompressed.

• Compressed.

Both encodings contain the same information, and each can be obtained from the other.

A.1.1 Uncompressed
An uncompressed public key is a string of hexadecimal digits obtained by concatenating the x coordinate with the y coordinate (64 hexadecimal digits each) and prefixing 04, for a total of 130 hexadecimal digits.

Example of an uncompressed public key:

049B7D40BA6BD08C0D1C46048279947AFEA89E6BA5E0C08AEEEBC3472F38C792B72EC672B238AD98EECD29A2CD5F2465FEE3BB8205093CEBED8B94C8472FBA15E4.

A.1.2 Compressed
A compressed public key is a string of hexadecimal digits obtained by taking the x coordinate and prefixing 02 if the y coordinate is even, 03 otherwise. Its length is 66 hexadecimal digits.

Remark The curve is symmetric with respect to the x-axis, so it is enough to write down only the x coordinate. The curve equation (y^2 = x^3 + 7 for secp256k1) gives two possible values of y, one even and one odd, and the choice between them is made based on the first two digits of the compressed public key.

Example of a compressed public key:


029B7D40BA6BD08C0D1C46048279947AFEA89E6BA5E0C08AEEEBC3472F38C792B7.
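The two encodings, and the recovery of y from a compressed key, can be sketched in pure Python. The code hard-codes the standard secp256k1 parameters and uses a simple, non-constant-time double-and-add multiplication, so it is an illustration only and not suitable for handling real keys; all function names are hypothetical. Because PRIME ≡ 3 (mod 4), a square root of a modulo PRIME can be computed as a^((PRIME+1)/4) mod PRIME.

```python
# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_PRIME
PRIME = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Affine point addition; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % PRIME == 0:
        return None                                        # p2 == -p1
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, PRIME) % PRIME  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, PRIME) % PRIME   # chord slope
    x3 = (lam * lam - x1 - x2) % PRIME
    return x3, (lam * (x1 - x3) - y1) % PRIME


def point_mul(k, point=G):
    """k * point by double-and-add (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def serialize(point, compressed=True):
    """Hex encoding: 04 || x || y, or 02/03 || x according to the parity of y."""
    x, y = point
    if compressed:
        return ("02" if y % 2 == 0 else "03") + f"{x:064X}"
    return "04" + f"{x:064X}{y:064X}"


def decompress(pub_hex):
    """Recover (x, y) from a compressed key through the curve equation."""
    prefix, x = pub_hex[:2], int(pub_hex[2:], 16)
    if prefix not in ("02", "03"):
        raise ValueError("not a compressed public key")
    rhs = (x ** 3 + 7) % PRIME
    y = pow(rhs, (PRIME + 1) // 4, PRIME)   # square root, valid since PRIME % 4 == 3
    if y * y % PRIME != rhs:
        raise ValueError("x is not the abscissa of a curve point")
    if (y % 2 == 0) != (prefix == "02"):
        y = PRIME - y                       # take the root with the right parity
    return x, y


k = 0x2AFEED53F26EF06521E7E825F83CB36A4632791A070A782E353230EAE71EBDD3
K = point_mul(k)
print(serialize(K, compressed=False))
print(serialize(K))
```

Applied to the example private key, the two printed strings should reproduce the uncompressed and compressed public keys shown above.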

A.2 Private Key


A private key is an integer and, in the Bitcoin framework, it is usually represented in a particular format: WIF (Wallet Import Format).

In order to obtain a WIF private key, the following procedure is used:
• Write down the private key in hexadecimal format (64 digits).

• Prepend two hexadecimal digits as a version number (80 for Bitcoin mainnet). This makes it possible to recognize the purpose of the key.

• Append 01 to the private key for a compressed WIF, nothing for an uncompressed WIF. The difference between the two types is that a compressed public key is expected to be derived from a compressed WIF private key, and an uncompressed public key from an uncompressed one.

• Add a checksum: apply the SHA256 function twice to the bytes obtained so far, take the first 4 bytes (8 hexadecimal digits) of the result and append them to the string.

• Finally, encode the result in base 58, obtaining a string of 51 characters if uncompressed or 52 characters if compressed.
A compressed WIF private key starts with K or L, while an uncompressed one starts with 5.
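The WIF procedure maps directly onto a few lines of Python. This sketch implements Base58Check by hand with `hashlib`, so it does not depend on the `base58` package used in Appendix B; `b58encode_check` and `to_wif` are hypothetical names here, and only mainnet keys (version byte 0x80) are considered by default.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_check(payload: bytes) -> str:
    """Append the first 4 bytes of SHA256(SHA256(payload)), then encode in base 58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58_ALPHABET[r] + out
    # each leading zero byte is encoded as the character '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def to_wif(private_key: int, compressed: bool = True, version: bytes = b"\x80") -> str:
    """Version byte || 32-byte key || optional 0x01, then Base58Check."""
    payload = version + private_key.to_bytes(32, "big")
    if compressed:
        payload += b"\x01"
    return b58encode_check(payload)


k = 0x2AFEED53F26EF06521E7E825F83CB36A4632791A070A782E353230EAE71EBDD3
print(to_wif(k))
print(to_wif(k, compressed=False))
```

With `k` set to the example private key, the two printed strings should match the compressed and uncompressed WIF examples that follow.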

Example of a private key in compressed WIF:


KxfHiqW7h3N2puewVnHWBN4ucmBzK2iupiSUEGKjx1UNT8vvLwvP.

The same private key in uncompressed WIF:


5J9DrMWFf5AJBQFwVGhSeEqTshNXCnhm96K1H9TT3VevM4iSudq.
Remark The two types of WIF private key give the same information. The only difference
is in the public key that is expected to be derived from it.

A.3 Address
Among Bitcoin transactions, one of the most common types is Pay-to-Public-Key-Hash (P2PKH): the transaction does not contain the public key directly, but the hash of that public key.

The hash function used in this framework is HASH160, i.e. RIPEMD160 applied to the SHA256 of its input, here applied to the compressed public key. The result is a PubkeyHash and, since HASH160 is a one-way function, it is infeasible to recover the public key from the PubkeyHash.

In order to obtain a valid Bitcoin address, the PubkeyHash is encoded in base 58:

• Write down the PubkeyHash (160 bits).

• Prepend one version byte (00 for Bitcoin mainnet) to the PubkeyHash.

• Add a checksum: apply the SHA256 function twice to the bytes obtained so far, take the first 4 bytes of the result and append them to the string.

• Finally, encode the result in base 58, obtaining a string of (usually) 34 characters.

Example of an address:
1DFvgrsFE6qVfgX83E35SbLdpjiSFffY2q.

Remark This is not the only type of address in the Bitcoin framework, but it is the simplest
one, derived directly from the public key.
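The address encoding can be sketched in the same way, again with a hand-written Base58Check. HASH160 relies on `hashlib.new('ripemd160')`, which is only available when the underlying OpenSSL build provides RIPEMD-160, so `hash160` may raise `ValueError` on some systems; all function names are hypothetical.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_check(payload: bytes) -> str:
    """Append the first 4 bytes of SHA256(SHA256(payload)), then encode in base 58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58_ALPHABET[r] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes -> '1'
    return "1" * pad + out


def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(data)); raises ValueError if RIPEMD-160 is unavailable."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()


def p2pkh_address(pubkey_hash: bytes, version: bytes = b"\x00") -> str:
    """Version byte || 20-byte PubkeyHash, then Base58Check."""
    if len(pubkey_hash) != 20:
        raise ValueError("a PubkeyHash is 20 bytes (160 bits)")
    return b58encode_check(version + pubkey_hash)
```

For example, `p2pkh_address(hash160(bytes.fromhex(compressed_pubkey_hex)))`, with the compressed example public key as `compressed_pubkey_hex`, should reproduce the example address shown above, provided RIPEMD-160 is available.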

Appendix B

Python code

This appendix shows the most relevant parts of the Python code made for this thesis.

Remark These scripts only work when placed in the repository of Prof. Ferdinando M. Ametrano [13], whose helper modules (such as secp256k1) they import.

B.1 Deterministic Wallet


First, let's show the Python code related to the first two types of Deterministic Wallets.

B.1.1 Type-1
This is the script used to generate private and public keys, using the first type of
deterministic derivation.

from secp256k1 import order, G, pointMultiply
from hashlib import sha256
import secrets

# secret random number, in [1, order - 1]
# (secrets is used instead of random, which is not suitable for key generation)
r = secrets.randbelow(order - 1) + 1
print('\nr =', hex(r), '\n')

# number of key pairs to generate
nKeys = 3
p = [0] * nKeys
P = [(0, 0)] * nKeys

for i in range(nKeys):
    # H(i|r)
    H_i_r = int(sha256((hex(i) + hex(r)).encode()).hexdigest(), 16) % order
    p[i] = H_i_r
    P[i] = pointMultiply(p[i], G)
    print('prKey#', i, ':\n', hex(p[i]), sep='')
    print('PubKey#', i, ':\n', hex(P[i][0]), '\n', hex(P[i][1]), '\n', sep='')

B.1.2 Type-2
This is the script used to generate private and public keys, using the second type of
deterministic derivation.

from secp256k1 import order, G, pointAdd, pointMultiply
from hashlib import sha256
import secrets

# secret master private key, in [1, order - 1]
# (secrets is used instead of random, which is not suitable for key generation)
mp = secrets.randbelow(order - 1) + 1
print('\nsecret master private key:\n', hex(mp), '\n')

# public random number
r = secrets.randbelow(order - 1) + 1
print('public ephemeral key:\n', hex(r))

# Master PublicKey:
MP = pointMultiply(mp, G)
print('Master Public Key:\n', hex(MP[0]), '\n', hex(MP[1]), '\n')

# number of key pairs to generate
nKeys = 3
p = [0] * nKeys
P = [(0, 0)] * nKeys

# PubKeys can be calculated without using privKeys
for i in range(nKeys):
    # H(i|r)
    H_i_r = int(sha256((hex(i) + hex(r)).encode()).hexdigest(), 16) % order
    P[i] = pointAdd(MP, pointMultiply(H_i_r, G))

# check that PubKeys match with privKeys
for i in range(nKeys):
    # H(i|r)
    H_i_r = int(sha256((hex(i) + hex(r)).encode()).hexdigest(), 16) % order
    p[i] = (mp + H_i_r) % order
    assert P[i] == pointMultiply(p[i], G)
    print('prKey#', i, ':\n', hex(p[i]), sep='')
    print('PubKey#', i, ':\n', hex(P[i][0]), '\n', hex(P[i][1]), '\n', sep='')

B.2 Hierarchical Deterministic Wallet - BIP 32


This section shows the Python code related to the Hierarchical Deterministic Wallet,
defined by BIP 32.

from secp256k1 import order, G, pointMultiply, pointAdd, a, b, prime
from hmac import HMAC
from hashlib import new as hnew
from hashlib import sha512, sha256
from base58 import b58encode_check, b58decode_check
from FiniteFields import modular_sqrt

BITCOIN_PRIVATE = b'\x04\x88\xAD\xE4'
BITCOIN_PUBLIC = b'\x04\x88\xB2\x1E'
TESTNET_PRIVATE = b'\x04\x35\x83\x94'
TESTNET_PUBLIC = b'\x04\x35\x87\xCF'
BITCOIN_SEGWIT_PRIVATE = b'\x04\xb2\x43\x0c'
BITCOIN_SEGWIT_PUBLIC = b'\x04\xb2\x47\x46'
PRIVATE = [BITCOIN_PRIVATE, TESTNET_PRIVATE, BITCOIN_SEGWIT_PRIVATE]
PUBLIC = [BITCOIN_PUBLIC, TESTNET_PUBLIC, BITCOIN_SEGWIT_PUBLIC]

def h160(inp):
    # HASH160 = RIPEMD160(SHA256(inp))
    h1 = sha256(inp).digest()
    return hnew('ripemd160', h1).digest()

def bip32_isvalid_xkey(vbytes, depth, fingerprint, index, chain_code, key):
    assert len(key) == 33, "wrong length for key"
    if vbytes in PUBLIC:
        # compressed public key: 02/03 || x, with x a field element
        assert key[0] in (2, 3), "invalid public key prefix"
        assert int.from_bytes(key[1:33], 'big') < prime, "invalid key"
    elif vbytes in PRIVATE:
        # private key: 00 || k, with 0 < k < order
        assert key[0] == 0, "invalid private key prefix"
        assert 0 < int.from_bytes(key[1:33], 'big') < order, "invalid key"
    else:
        raise Exception("invalid version bytes '%s'" % vbytes.hex())
    assert len(depth) == 1, "wrong length for depth"
    assert len(fingerprint) == 4, "wrong length for fingerprint"
    assert len(index) == 4, "wrong length for index"
    assert len(chain_code) == 32, "wrong length for chain_code"

def bip32_parse_xkey(xkey):
    decoded = b58decode_check(xkey)
    assert len(decoded) == 78, "wrong length for decoded xkey"
    info = {"vbytes": decoded[:4],
            "depth": decoded[4:5],
            "fingerprint": decoded[5:9],
            "index": decoded[9:13],
            "chain_code": decoded[13:45],
            "key": decoded[45:]
            }
    bip32_isvalid_xkey(info["vbytes"], info["depth"], info["fingerprint"],
                       info["index"], info["chain_code"], info["key"])
    return info

def bip32_compose_xkey(vbytes, depth, fingerprint, index, chain_code, key):
    bip32_isvalid_xkey(vbytes, depth, fingerprint, index, chain_code, key)
    xkey = vbytes + depth + fingerprint + index + chain_code + key
    return b58encode_check(xkey)

def bip32_xprvtoxpub(xprv):
    decoded = b58decode_check(xprv)
    assert decoded[45] == 0, "not a private key"
    p = int.from_bytes(decoded[46:], 'big')
    P = pointMultiply(p, G)
    P_bytes = (b'\x02' if (P[1] % 2 == 0) else b'\x03') + P[0].to_bytes(32, 'big')
    network = PRIVATE.index(decoded[:4])
    xpub = PUBLIC[network] + decoded[4:45] + P_bytes
    return b58encode_check(xpub)

def bip32_master_key(seed, seed_bytes, vbytes=PRIVATE[0]):
    # left 32 bytes: master private key; right 32 bytes: master chain code
    hashValue = HMAC(b"Bitcoin seed", seed.to_bytes(seed_bytes, 'big'), sha512).digest()
    p = int(hashValue[:32].hex(), 16) % order
    p_bytes = b'\x00' + p.to_bytes(32, 'big')
    chain_code = hashValue[32:]
    xprv = bip32_compose_xkey(vbytes, b'\x00', b'\x00\x00\x00\x00',
                              b'\x00\x00\x00\x00', chain_code, p_bytes)
    return xprv

# Child Key Derivation
def bip32_ckd(extKey, child_index):
    parent = bip32_parse_xkey(extKey)
    depth = (int.from_bytes(parent["depth"], 'big') + 1).to_bytes(1, 'big')
    if parent["vbytes"] in PRIVATE:
        network = PRIVATE.index(parent["vbytes"])
        parent_prvkey = int.from_bytes(parent["key"][1:], 'big')
        P = pointMultiply(parent_prvkey, G)
        parent_pubkey = (b'\x02' if (P[1] % 2 == 0) else b'\x03') + P[0].to_bytes(32, 'big')
    else:
        network = PUBLIC.index(parent["vbytes"])
        parent_pubkey = parent["key"]
    fingerprint = h160(parent_pubkey)[:4]
    index = child_index.to_bytes(4, 'big')
    if index[0] >= 0x80:  # private (hardened) derivation
        assert parent["vbytes"] in PRIVATE, "Cannot do private (hardened) derivation from Pubkey"
        parent_key = parent["key"]  # 0x00 || parent private key
    else:
        parent_key = parent_pubkey
    hashValue = HMAC(parent["chain_code"], parent_key + index, sha512).digest()
    chain_code = hashValue[32:]
    p = int(hashValue[:32].hex(), 16)
    if parent["vbytes"] in PRIVATE:
        p = (p + parent_prvkey) % order
        p_bytes = b'\x00' + p.to_bytes(32, 'big')
        return bip32_compose_xkey(PRIVATE[network], depth, fingerprint, index,
                                  chain_code, p_bytes)
    else:
        P = pointMultiply(p, G)
        # recover the parent point from its compressed serialization
        X = int.from_bytes(parent_pubkey[1:], 'big')
        Y_2 = (X**3 + a*X + b) % prime
        Y = modular_sqrt(Y_2, prime)
        if (Y % 2 == 0):
            if (parent_pubkey[0] == 3):
                Y = prime - Y
        else:
            if (parent_pubkey[0] == 2):
                Y = prime - Y
        parentPoint = (X, Y)
        P = pointAdd(P, parentPoint)
        P_bytes = (b'\x02' if (P[1] % 2 == 0) else b'\x03') + P[0].to_bytes(32, 'big')
        return bip32_compose_xkey(PUBLIC[network], depth, fingerprint, index,
                                  chain_code, P_bytes)

B.3 Mnemonic phrase


This last section shows the Python code for generating the mnemonic phrase from a given entropy and for obtaining a seed from that set of words.

Both methods discussed in this thesis have been implemented.

B.3.1 BIP 39
These are the functions used to generate a valid BIP 39 Mnemonic phrase and the
related seed.

from hashlib import sha256, sha512
from pbkdf2 import PBKDF2
import hmac

def from_entropy_to_mnemonic_int(entropy, ENT):
    entropy_bytes = entropy.to_bytes(ENT // 8, byteorder='big')
    checksum = sha256(entropy_bytes).digest()
    # binary strings zero-padded to 256 and ENT bits respectively
    checksum_bin = format(int.from_bytes(checksum, byteorder='big'), '0256b')
    entropy_bin = format(entropy, '0%db' % ENT)
    # append the first ENT/32 bits of the checksum
    entropy_checked = entropy_bin + checksum_bin[:ENT // 32]
    assert (ENT + ENT // 32) % 11 == 0
    number_mnemonic = (ENT + ENT // 32) // 11
    # each 11-bit group is the index of a word in the dictionary
    return [int(entropy_checked[i*11:(i+1)*11], 2) for i in range(number_mnemonic)]

def from_mnemonic_int_to_mnemonic(mnemonic_int, dictionary_txt):
    with open(dictionary_txt, 'r') as f:
        dictionary = f.read().splitlines()
    return ' '.join(dictionary[j] for j in mnemonic_int)

def generate_mnemonic_bip39(entropy, number_words=24, dictionary='English_dictionary.txt'):
    ENT = number_words * 32 // 3
    mnemonic_int = from_entropy_to_mnemonic_int(entropy, ENT)
    return from_mnemonic_int_to_mnemonic(mnemonic_int, dictionary)

def from_mnemonic_to_seed(mnemonic, passphrase=''):
    PBKDF2_ROUNDS = 2048
    return PBKDF2(mnemonic, 'mnemonic' + passphrase, iterations=PBKDF2_ROUNDS,
                  macmodule=hmac, digestmodule=sha512).read(64).hex()

B.3.2 Electrum
These are the functions used to generate a valid Electrum Mnemonic phrase for a
chosen version and the related seed.

from hashlib import sha512
from pbkdf2 import PBKDF2
import hmac
import binascii

def from_entropy_to_mnemonic_int_electrum(entropy, number_words):
    assert entropy < 2**(11 * number_words)
    # binary string zero-padded to 11 bits per word
    entropy_bin = format(entropy, '0%db' % (11 * number_words))
    return [int(entropy_bin[i*11:(i+1)*11], 2) for i in range(number_words)]

def from_mnemonic_int_to_mnemonic_electrum(mnemonic_int, dictionary_txt):
    with open(dictionary_txt, 'r') as f:
        dictionary = f.read().splitlines()
    return ' '.join(dictionary[j] for j in mnemonic_int)

def bh2u(x):
    return binascii.hexlify(x).decode('ascii')

def verify_mnemonic_electrum(mnemonic, version="standard"):
    s = bh2u(hmac.new(b"Seed version", mnemonic.encode('utf8'), sha512).digest())
    if s[0:2] == '01':
        return version == "standard"
    elif s[0:3] == '100':
        return version == "segwit"
    elif s[0:3] == '101':
        return version == "2FA"
    else:
        return False

def generate_mnemonic_electrum(entropy, number_words=24, version="standard",
                               dictionary='English_dictionary.txt'):
    # increment the entropy until the phrase encodes the requested version
    is_verify = False
    while not is_verify:
        mnemonic_int = from_entropy_to_mnemonic_int_electrum(entropy, number_words)
        mnemonic = from_mnemonic_int_to_mnemonic_electrum(mnemonic_int, dictionary)
        is_verify = verify_mnemonic_electrum(mnemonic, version)
        if not is_verify:
            entropy = entropy + 1
    return mnemonic

def from_mnemonic_to_seed_electrum(mnemonic, passphrase=''):
    PBKDF2_ROUNDS = 2048
    return PBKDF2(mnemonic, 'electrum' + passphrase, iterations=PBKDF2_ROUNDS,
                  macmodule=hmac, digestmodule=sha512).read(64).hex()

Bibliography

[1] BIP32, Bitcoin Improvement Proposal number 32,
https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
[2] BIP39, Bitcoin Improvement Proposal number 39,
https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
[3] Electrum Bitcoin Wallet,
https://electrum.org
[4] BIP43, Bitcoin Improvement Proposal number 43,
https://github.com/bitcoin/bips/blob/master/bip-0043.mediawiki
[5] BIP44, Bitcoin Improvement Proposal number 44,
https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki
[6] BIP49, Bitcoin Improvement Proposal number 49,
https://github.com/bitcoin/bips/blob/master/bip-0049.mediawiki
[7] Andreas M. Antonopoulos, Mastering Bitcoin, 2nd Edition: Programming the Open Blockchain, 2017.
[8] Andrea Corbellini,
http://andrea.corbellini.name
[9] Bundesamt für Sicherheit in der Informationstechnik, Elliptic Curve Cryptography, 2007.
[10] Christof Paar, Jan Pelzl, Understanding Cryptography, 2010.
[11] Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
[12] Bitcoin Core,
https://bitcoincore.org
[13] Ferdinando M. Ametrano, Material for the Bitcoin & Blockchain Technology course,
https://github.com/fametrano/BitcoinBlockchainTechnology
