The Security of Hidden Field Equations (HFE)
Nicolas T. Courtois
Systèmes Information Signal (SIS), Université de Toulon et du Var
BP 132, F-83957 La Garde Cedex, France
courtois@minrank.org
http://hfe.minrank.org
Abstract. We consider the basic version of the asymmetric cryptosystem HFE from Eurocrypt 96.
We propose a notion of non-trivial equations as an attempt to account for a large class of attacks on one-way functions. We found equations that give experimental evidence that basic HFE can be broken in expected polynomial time for any constant degree d. This has been independently proven by Shamir and Kipnis [Crypto'99].
We designed and implemented a series of new advanced attacks that are much more efficient than the Shamir-Kipnis attack. They are practical for HFE degree $d \le 24$ and realistic up to d = 128. The 80-bit, 500\$ Patarin's 1st challenge on HFE can be broken in about $2^{62}$.
Our attack is subexponential and requires $n^{\frac{3}{2}\log d}$ computations. The original Shamir-Kipnis attack was in at least $n^{\log^2 d}$. We show how to improve the Shamir-Kipnis attack by using a better method of solving the underlying algebraic problem MinRank. Its complexity then becomes $n^{3\log d+O(1)}$.
All attacks fail for modified versions of HFE: HFE$^-$ (Asiacrypt'98), HFEv (Eurocrypt'99), Quartz (RSA'2000) and even for Flash (RSA'2000).
Key Words: asymmetric cryptography, finite fields, one-way functions, Hidden Field Equation, HFE problem, basic HFE, MinRank problem, short signatures.
1 Introduction
The HFE trapdoor function from Eurocrypt 96 [14], defined in section 4, is one of the most serious alternative trapdoor functions. It generalizes the earlier Matsumoto-Imai cryptosystem from Eurocrypt 88 [10], broken by Patarin in [13, 14].
HFE operates over finite fields. In this paper we restrict ourselves to the basic version of HFE, and to fields of characteristic 2. Thus we study a trapdoor function $F : GF(2^n) \to GF(2^n)$. We focus on the "cracking problem" of computing the inverse of the basic HFE encryption function, without trying to recover its secret key.
In section 2 we attempt to base a notion of a one-way function on algebraic criteria. We propose a "boosting model", which is nothing other than a kind of semantics of all deterministic cryptographic attacks. This approach, subsequently narrowed down, proves particularly relevant to HFE attacks. The security is expressed in terms of properties of implicit equations that relate the inputs $x_i$ and the outputs $y_i$ of a function. An equation substituted with a given output value may, or may not, produce a new non-trivial equation on the $x_i$. New equations boost the set of known linearly independent equations on the $x_i$, and at some point they should allow us to compute the actual values of the $x_i$.
There is no doubt that our problem is closely related to polynomial elimination (Gröbner bases, XL algorithm [21]). Thus in section 3 we study the NP-complete problem of solving multivariate quadratic equations, sometimes called MQ. The simple idea of linearizing and applying Gaussian elimination can indeed be seen as eliminating equations (a simple case of the Gröbner bases algorithm); however, we reinterpret it in section 3 in terms of implicit equations.
We distinguish between this "elimination paradigm" and our approach, called the "implicit equations paradigm". These methods are different and complementary. We do not combine equations formally, trying to eliminate among all the equations that we could construct within some size limitation. Instead, the problem is to find special subsets of such equations that, for algebraic reasons, might be related. We are not limited (at all) by the size of the equations, but only by the size of the subset we selected (!).
The whole idea that it is interesting to do so is the object of this paper. We may go back to the cryptanalysis of the Matsumoto-Imai cryptosystem, described briefly in section 4.1, to understand that algebraic reasons may suggest (or prove) the existence of some type of equations. The idea had several generalizations, such as the affine multiple attack by Jacques Patarin [13, 9] and others described here and in [4]. It has been known since [14] that some such equations will exist for basic HFE. In the present paper we show precisely what kind of equations exist and how to use them in realistic attacks.
Though it is very clear that the equations we have found in the present paper exist for algebraic reasons, we were not able to explain them. They have been found on a much more experimental basis, and it remains an open problem to understand them better. We ran several months of extensive computer simulations (section 5.6) to find memory-efficient types of equations that gave what is now the best known attack on basic HFE.
In the whole process of solving equations by finding other equations, we had to distinguish different types of equations (we denote them by expressions in x, y, X, Y, see section 5.1). We also distinguish several kinds of equations in terms of both their behaviour and the way they have been computed. Thus we had to invent some special vocabulary and notation, especially since some notions are informal.
A glossary of words that have a special meaning in this paper, usually double-quoted, along with common notations, is compiled at the end of the paper.
Section 5 shows precisely several classes of equations we have found and their immediate applications in an attack. Thus we get strong experimental evidence that basic HFE can be broken in expected polynomial time if the degree d is constant. The same result has just been independently found by Shamir and Kipnis at Crypto'99 [23].
We show that basic HFE is not secure for degree $d \le 24$, while the original paper [14] suggested the HFE degree d = 17 as secure enough. Further, as we show in section 5.10, in order to break the 500\$ HFE challenge with d = 96 we need $2^{62}$ computations and 33 Tb of memory.
We introduced successive improvements to this attack. First, it is in fact possible to recover, recompose and use only parts of the equations ("reconciliation attack"), see section 6.1. Secondly, the "distillation attack" of sections 6.1-6.3 also manages to remove other, "interference" equations that unfortunately appear when the parts are too small. The final output is a method that uses very long equations without ever computing them, which dramatically reduces the memory requirements for Challenge 1 to 390 Gb.
In section 7.1 we estimate the asymptotic complexity of our attacks. It is polynomial for a fixed HFE degree d and subexponential in general. The Shamir-Kipnis attack on (basic) HFE from Crypto'99 [23], though very different, gives similar results with a much worse complexity. In section 8 we introduce an improved version of it, which gives an asymptotic complexity similar to that of our attacks.
It is not true that HFE is broken. All attacks may have substantial complexity, and they completely fail for any modified version of HFE, see section 10.
2 Algebraic Paradigm for One-wayness
Let's consider any attack on any deterministic one-way function, which we suppose to be described as a set of explicit arithmetic formulae $y_i = F_i(x_1, \ldots, x_n)$. We point out that, following the first Gödel theorem, such equations can be written for any deterministic algorithm. The answer x we are looking for is also seen as a set of equations, though much simpler ones, $x_i = \ldots$, which a hypothetical attack would evaluate to. Therefore any deterministic attack is a series of transformations that starts from somewhat complex equations and eventually produces simpler ones. We call these "boosting transformations", as they boost the number of all equations with a known value, and produce simpler and therefore more "meaningful" equations. But what are simple or complex equations? We must adopt a necessarily restrictive approach with a notion of complexity.
One possible notion of complexity is the non-linear degree. Every boolean
function is a multivariate polynomial over GF(2) (algebraic normal form). It
seems to be an appropriate measure of complexity, especially to study HFE,
based itself on bounded degree (quadratic) equations.
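To make the degree measure concrete, here is a minimal Python sketch (not taken from the paper) that computes the algebraic normal form of a boolean function from its truth table with the binary Möbius transform and reads off the non-linear degree. The names `anf` and `degree`, and the encoding of the truth table (entry number x holds the value of f at the point whose coordinates are the binary digits of x), are assumptions made for this illustration.

```python
def anf(truth_table):
    """Binary Moebius transform: truth table of length 2**n -> ANF coefficients.

    Coefficient number m belongs to the monomial made of the variables x_i
    for which bit i of m is set (m = 0 is the constant term).
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for m in range(len(coeffs)):
            if m & step:
                coeffs[m] ^= coeffs[m ^ step]
    return coeffs


def degree(truth_table):
    """Non-linear degree: the largest number of variables in an ANF monomial."""
    coeffs = anf(truth_table)
    return max((bin(m).count("1") for m, c in enumerate(coeffs) if c), default=0)


# Majority of three bits: x1*x2 + x1*x3 + x2*x3 over GF(2), hence quadratic.
majority = [0, 0, 0, 1, 0, 1, 1, 1]
print(anf(majority))     # [0, 0, 0, 1, 0, 1, 1, 0]
print(degree(majority))  # 2
```

In this measure a quadratic function such as the majority of three bits is "simpler" than the product of three bits, whose degree is 3.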
We would like to define a secure cryptographic primitive. However, we do not say that there are no attacks, nor that all the attacks fail, which means little. We try to formalize how they fail.
The random oracle paradigm would be to ignore that the function formulae exist. It is used for symmetric primitives but is meaningless for asymmetric primitives. Indeed, they are usually described by some strikingly simple equations, e.g. $x \mapsto x^e$. Thus, after all, perhaps this belief that every attack is completely puzzled by the irreducible randomness of the answers to all possible questions is not necessary at all to achieve security?
We can even admit that some attacks exist, as long as they are hard to find and we cannot know the result before we have executed the whole attack (experimental attacks without theoretical basis). For such general attacks, we suppose that they fail in most cases, even if they always do output some new equations. In fact, it is very likely that we get only equations that are trivial combinations of those we already know and/or of higher degree than those given. Such a primitive would be considered secure.
Definition 2.0.1 (A one-way function - very informal). A one-way function is a function that admits only trivial equations.
This is an attempt to give an algebraic definition of a one-way function. We still need to specify what trivial and non-trivial equations are.
Definition 2.0.2 (Trivial equations - informal). Trivial equations are explicit bounded degree polynomials over the equations $Y_i$ and variables $x_i$ that do not exceed a given maximum size $size_{max}$ (or are of polynomial size) and such that their degree as a function of the $x_i$ does not collapse.
Definition 2.0.3 (Non-trivial equations - informal). Non-trivial equations are also bounded combinations of the $Y_i$ and $x_i$, limited in size in the same way, but their degree does collapse.
These equations, though they could be generated explicitly, are obtained in an attack in an implicit way. We solve equations on their coefficients that come from the expressions of the $Y_i$ or from a series of (cleartext, ciphertext) pairs (x, y).
3 Solving Quadratic Boolean Equations, MQ over GF(2)
In this paper we always consider $n_b$ quadratic equations $y_i = Y_i(x_1, \ldots, x_{n_a})$ with $n_a$ variables $x_i \in GF(q)$. Unless otherwise stated, $n_a = n_b = n$ and q = 2.
The general problem of solving quadratic equations is called MQ and was proved NP-complete in [18, 7], which guarantees (only) worst-case security. However, in the current state of knowledge, the MQ problem is hard even in the average case, see [21], and about as hard as exhaustive search in practice for n < 100 [21].
The Gaussian reduction that eliminates variables can also be applied to MQ if $n_b > n_a(n_a - 1)/2$. Thus the so-called linearization puts $z_{ik} = x_i x_k$ and eliminates the new variables. We say rather that it implies the existence of at least $n_b - n_a(n_a - 1)/2$ equations of the form:
$$\sum_i \alpha_i y_i = \sum_i \beta_i x_i + \gamma$$
We call these equations of "type X+Y" later on. The important point is that $n_b > n_a(n_a - 1)/2$ implies their existence, but the converse is obviously false. They may exist even for small $n_b$, and it is always interesting to check whether they do.
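The existence argument above can be checked on a toy instance. The following Python sketch (an illustration, not the paper's implementation) draws a random quadratic system over GF(2) with $n_b > n_a(n_a - 1)/2$, eliminates the quadratic monomials by Gaussian elimination while remembering which $y_i$ were combined, and verifies the resulting equations of type X+Y on a random input. The packing of each equation into a Python integer and the names `type_x_plus_y`, `monomial_values` and `evaluate` are assumptions of this example.

```python
import random
from itertools import combinations


def monomial_values(x, n_a):
    """Values of 1, x_j and x_j*x_k (j < k) at the point x, packed in an int."""
    m = 1
    for j in range(n_a):
        m |= x[j] << (1 + j)
    for idx, (j, k) in enumerate(combinations(range(n_a), 2)):
        m |= (x[j] & x[k]) << (n_a + 1 + idx)
    return m


def evaluate(eq, x, n_a):
    """Evaluate an equation given by its coefficient bitmask at the point x."""
    return bin(eq & monomial_values(x, n_a)).count("1") & 1


def type_x_plus_y(system, n_a):
    """Return pairs (combo, affine): the sum of Y_i for i in combo has no
    quadratic part and equals the affine polynomial `affine` in the x_j."""
    affine_bits = n_a + 1
    pivots = {}   # leading quadratic bit -> (vector, combination of the Y_i)
    found = []
    for i, v in enumerate(system):
        combo = 1 << i
        while v >> affine_bits:          # some quadratic monomial is left
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (v, combo)
                break
            v ^= pivots[lead][0]
            combo ^= pivots[lead][1]
        else:                            # quadratic part fully eliminated
            found.append((combo, v))
    return found


rng = random.Random(1)
n_a, n_b = 6, 20                         # n_b > n_a*(n_a-1)/2 = 15
width = 1 + n_a + n_a * (n_a - 1) // 2
system = [rng.getrandbits(width) for _ in range(n_b)]
found = type_x_plus_y(system, n_a)

x = [rng.randrange(2) for _ in range(n_a)]
y = [evaluate(eq, x, n_a) for eq in system]
for combo, affine in found:
    lhs = sum(y[i] for i in range(n_b) if (combo >> i) & 1) & 1
    assert lhs == evaluate(affine, x, n_a)
print(len(found), "equations of type X+Y, at least", n_b - n_a * (n_a - 1) // 2, "expected")
```

For a random system the count is expected to stay close to the lower bound; a structured system may yield more such equations, which is exactly what is worth checking.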
4 The HFE Problem
We give a simple mathematical description of the so-called "HFE problem". More details on various aspects of HFE can be found in [14, 5, 4, 15, 18].
The HFE problem defined below consists in finding one reverse image for a basic version of the HFE cryptosystem, exactly as initially proposed at Eurocrypt 1996 [14]. First we recall two basic facts from [14]:
Fact 4.0.4. Let P be a polynomial over $GF(q^n)$ of the special form:
$$P(a) = \sum_i \alpha_i a^{q^{s_i}+q^{t_i}}.$$ (1)
Then P can be written as n multivariate quadratic equations over the $a_i \in GF(q)$.
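Fact 4.0.4 can be observed on a toy field. The sketch below (an illustration under stated assumptions, not the paper's code) works in GF(2^4) built with the irreducible polynomial x^4 + x + 1, evaluates a polynomial of the special form (1) with q = 2, and checks that it has degree at most 2 in the bits of a by testing that all its second-order derivatives are constant. The field size, the sample coefficients and the names `gf_mul`, `gf_pow`, `hfe_poly` and `is_quadratic` are all choices made for this example.

```python
N = 4
MOD = 0b10011  # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiplication in GF(2^N): carry-less product reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r


def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^N)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def hfe_poly(a, terms):
    """Sum of coeff * a^(2^s + 2^t) over the given (coeff, s, t) triples."""
    acc = 0
    for coeff, s, t in terms:
        acc ^= gf_mul(coeff, gf_pow(a, (1 << s) + (1 << t)))
    return acc


def is_quadratic(f):
    """True if f has degree <= 2 in the input bits, i.e. if every second-order
    derivative f(x+u+v) + f(x+u) + f(x+v) + f(x) does not depend on x."""
    table = [f(x) for x in range(1 << N)]
    for u in range(1 << N):
        for v in range(1 << N):
            ref = table[u ^ v] ^ table[u] ^ table[v] ^ table[0]
            for x in range(1 << N):
                if table[x ^ u ^ v] ^ table[x ^ u] ^ table[x ^ v] ^ table[x] != ref:
                    return False
    return True


terms = [(3, 0, 1), (7, 1, 2), (1, 0, 0)]   # 3*a^3 + 7*a^6 + a^2
print(is_quadratic(lambda a: hfe_poly(a, terms)))   # True
print(is_quadratic(lambda a: gf_pow(a, 7)))         # False: 7 = 1 + 2 + 4
```

The second call shows that an exponent which is a sum of three powers of q falls outside the special form and is no longer quadratic in the bits.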
Fact 4.0.5 (HFE trapdoor). If P is a polynomial of degree at most d, then $P^{-1}(b)$ can be computed in time $d^2 (\ln d)^{O(1)} n^2$ GF(q) operations, see [14, 8].
Definition 4.0.6 (HFE Problem). Let S and T be two random secret bijective and affine multivariate variable changes. Let
$$F = T \circ P \circ S.$$ (2)
We believe that it is difficult to compute $F^{-1}$ as long as its decomposition $F^{-1} = S^{-1} \circ P^{-1} \circ T^{-1}$ remains secret.
4.1 Examples of HFE Problem
The simplest non-linear case of basic HFE is $P = a^{q^{s}+q^{t}}$. It is called the Matsumoto-Imai cryptosystem (or $C^*$) [...]
$$\sum \alpha_{ij} x_i y_j + \sum \beta_i x_i + \sum \gamma_j y_j + \delta = 0$$
The attack is as follows: first we recover these equations by Gaussian elimination on their coefficients. Then we recover x by substituting y in these equations.
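The first step of this attack can be sketched on a toy example. The Python code below is a hypothetical illustration, not the paper's implementation: it uses the monomial $b = a^{1+2}$ over GF(2^7), built with the irreducible polynomial x^7 + x + 1, and leaves out the secret affine changes S and T for brevity. It collects all (x, y) pairs, evaluates the monomials 1, $x_i$, $y_j$, $x_i y_j$ on each pair, and counts by Gaussian elimination how many independent equations on their coefficients are satisfied by every pair. The field size, the exponent and all function names are assumptions of this example.

```python
N = 7
MOD = 0b10000011  # x^7 + x + 1, irreducible over GF(2)
NUM_TERMS = 1 + 2 * N + N * N   # monomials 1, x_i, y_j, x_i*y_j


def gf_mul(a, b):
    """Multiplication in GF(2^N): carry-less product reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r


def toy_map(x):
    """Toy map b = a^3 (assumed example; no secret S and T applied)."""
    return gf_mul(x, gf_mul(x, x))


def monomial_row(x, y):
    """Values of 1, x_i, y_j, x_i*y_j at the point (x, y), packed in an int."""
    xs = [(x >> i) & 1 for i in range(N)]
    ys = [(y >> j) & 1 for j in range(N)]
    values = [1] + xs + ys + [xi & yj for xi in xs for yj in ys]
    row = 0
    for pos, bit in enumerate(values):
        row |= bit << pos
    return row


def rank_gf2(rows):
    """Rank over GF(2) of a list of bitmask rows (Gaussian elimination)."""
    pivots = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)


rows = [monomial_row(x, toy_map(x)) for x in range(1 << N)]
num_equations = NUM_TERMS - rank_gf2(rows)
print("independent equations satisfied by all pairs:", num_equations)
```

A non-zero count is guaranteed here because $b = a^3$ implies $a b^2 = a^4 b$, and both sides are bilinear in the bits of a and b. Once a basis of such equations is known, substituting a given y leaves equations that are affine in the $x_i$, which is the second step described above.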
4.2 HFE Challenge 1
It has been proposed by Jacques Patarin in the extended version of [14]. The HFE polynomial is of degree d = 96 over $GF(2^n)$ with n = 80 bits. The prize of 500\$ is promised for breaking the signature scheme, which amounts to computing $F^{-1}$ three times. An example of F can be downloaded from [5].
5 Implicit Equations Attack
5.1 Types of equations
We have a convention to describe an equation type:
1. The equation type is a union of terms in formal variables x, y, X, Y, for example: $XY \cup x^2$.
2. A term $x^k y^l$ denotes all the terms of degree exactly k in all the $x_i$, $i = 1..n_a$, and of degree exactly l in the $y_i$, $i = 1..n_b$.
Important: If the variables are in GF(q), the degrees must be in [0..q - 1].
3. The capital X, Y describe equation sets that include all the lower degree terms. For example: $XY \cup x^2 \equiv 1 \cup x \cup y \cup xy \cup x^2$.
4. If necessary we distinguish by $XY \cup x^2$ the set of terms used in the corresponding equation type, while $[XY \cup x^2]$ denotes the set of equations of this type.
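With this convention the number of terms of a type is a simple count. The sketch below assumes q = 2, so that a term $x^k y^l$ contains one monomial per choice of k distinct $x_i$ and l distinct $y_j$; the function name `size` and the encoding of a type as a list of (k, l) pairs are choices made for this example. Under that assumption the counts agree with the sizes quoted in sections 5.9, 5.10 and 6.3.

```python
from math import comb


def size(terms, n_a, n_b):
    """Number of monomials in a union of terms x^k y^l over GF(2)."""
    return sum(comb(n_a, k) * comb(n_b, l) for k, l in terms)


# Capital letters include the lower degree terms: XY = 1, x, y, xy.
XY = [(0, 0), (1, 0), (0, 1), (1, 1)]
TYPE_A = XY + [(2, 1), (1, 2)]        # XY, x^2 y, x y^2
TYPE_B = TYPE_A + [(3, 1), (2, 2)]    # ... plus x^3 y, x^2 y^2

print(size(TYPE_A, 64, 64))   # 262273
print(size(TYPE_B, 80, 80))   # 17070561
print(size(TYPE_B, 30, 80))   # 1831511
```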
5.2 Invariant Equations
Definition 5.2.1 (Invariant equations). A set of equations whose set of terms is invariant modulo any bijective affine variable changes S and T.
For example $[X^2 Y]$ is invariant but $[x^2 y]$ is not. The definition states that the sets of terms involved are invariant, which implies that the number of equations that exist for a given type is invariant (but not that each of the equations is invariant).
If the equations are invariant, the number of equations of a given type will be the same for any output value. Thus we can assume that we are solving $F^{-1}(y)$ with y = 0 without loss of generality. We make this assumption for all subsequent attacks. The problem with the invariant equations of higher degree is that they are still at least quadratic after substituting y.
5.3 Biased Equations
Definition 5.3.1 (Biased). Biased equations are the equations that, after substitution of y = 0, reduce to an affine equation of the $x_i$ (type X).
Proposition 5.3.2. If there are enough invariant equations, there exist enough biased equations.
"Enough" means equal to the number of terms remaining after substitution of y = 0. The proposition is trivial: we eliminate in a set of implicit equations all the terms of X [...] $(\lceil \log_q d \rceil, type)$ with a constant $(\lceil \log_q d \rceil, type)$. We postulate that:
Conjecture 5.7.1. A basic HFE (or the HFE problem) of degree d admits O(n) equations of type $[X \cup x^2 y \cup \ldots \cup x^{\frac{1}{2}\log_q d - 1} y]$.
In a later attack we will "cast" these equations over a smaller subspace, but we will see in section 6 that we can only recover them starting from a threshold $n_a = n_{art}(n, type)$, a threshold memory (usually in Terabytes) and a threshold computing power. It means that today's computers are not powerful enough to find what happens for equations more complex than the ones we have already studied (!).
5.8 The Complexity of the Attacks
The memory used in the attack is quadratic in size and is equal to $size^2/8$ bytes.
In terms of speed, the essential element of all the attacks is the Gaussian elimination. Though better algorithms exist in theory [3], they are not practical. We have implemented a trivial algorithm in $O(size^3)$. A structured version of it can go as fast as the CPU clock while working on a huge matrix on the disk (!). Assuming that a 64-bit XOR is done in one clock cycle, we estimate that the structured elimination takes $2 \cdot size^3/64$ CPU clocks.
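The memory formula is easy to evaluate: a square matrix over GF(2) with size rows and size columns holds $size^2$ bits, hence $size^2/8$ bytes. The short sketch below applies it to the three sizes quoted in this paper. Reading "Gb" and "Tb" as $2^{30}$ and $2^{40}$ bytes is an assumption of this example; under that reading the quoted figures are reproduced.

```python
def memory_bytes(size):
    """Bytes needed for a size x size bit matrix, one bit per coefficient."""
    return size * size / 8


GIB = 2 ** 30   # assumed meaning of "Gb" in the text
TIB = 2 ** 40   # assumed meaning of "Tb" in the text

print(round(memory_bytes(262273) / GIB, 1), "GiB")     # about 8.0
print(round(memory_bytes(17070561) / TIB, 1), "TiB")   # about 33.1
print(round(memory_bytes(1831511) / GIB, 1), "GiB")    # about 390.5
```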
5.9 Realistic HFE Attacks when $d \le 24$
We see in section 5.6 that for $d \le 24$ equations of type $XY \cup x^2y \cup xy^2$ give between O(n) and $O(n^2)$ equations, enough to break basic HFE. For example we consider an attack for n = 64 bits HFE with the degree $d \le 24$:
$$size_{XY \cup x^2y \cup xy^2}(64, 64) = O(n^3)$$ (3)
The precise computation yields size = 262 273 and thus the memory required in the attack is $size^2/8 = O(n^6) = 8$ Gb. The running time is $2 \cdot size^3/64 \approx 2^{48}$ CPU clocks, a few days on a PC, and it is not our best attack yet.
Thus basic HFE is not secure for $d \le 24$. The asymptotic complexity is at most $O(n^9)$.
5.10 Direct Attack on Challenge 1
Now we try to use the equations of type $XY \cup x^2y \cup xy^2 \cup x^3y \cup x^2y^2$ to break this degree 96 basic HFE. We have
$$size_{XY \cup x^2y \cup xy^2 \cup x^3y \cup x^2y^2}(80, 80) = 17\,070\,561$$ (4)
The memory required is not realistic: $size^2/8 = 33$ Terabytes. The running time is $2 \cdot size^3/64 \approx 2^{62}$ CPU clocks.
6 Advanced Attacks
6.1 Reconciliation Technique
Since the main problem of the attacks is the size of the equations, it is a very good idea to compute these equations only partly. We fix to zero all $x_i$ except $n_a$ of them. We call "cast" equations the equations we get from the initial equations. Unfortunately, if $n_a$ is too small, there are some more equations that we call "artificial" equations. We show that the cast equations of trivial equations are trivial and the cast equations of artificial equations are artificial. In [4] we have managed to predict the number of artificial equations with great accuracy.
For example, if $n = n_b = 80$ we computed:
$$n_{art}(XY \cup x^2y \cup xy^2 \cup x^3y \cup x^2y^2) = 38$$ (5)
It means that the cast (and non-trivial) equations are known modulo a linear combination of some "interference" equations (artificial equations), which make the resulting mix unusable for $n_a < 38$.
The reconciliation attack works before the threshold at which artificial equations arise. The necessary condition is thus $n_a \ge n_{art}$.
Moreover, the equations are recovered modulo a linear combination, and we need to make sure that it is possible to generate cast equations such that their intersections are big enough to recover uniquely their corresponding linear combinations. This leads to an additional condition.
Thus we will recover the equations from different casts. In fact we do not exactly recover the whole equations but only a part of them that contains, firstly, enough terms to combine different casts and, secondly, their constant coefficients and coefficients in $x_i$, as only those are necessary to compute x and break HFE.
6.2 The Distillation Technique
In the distillation attack we show that there is another, strictly lower threshold, and HFE can be broken in spite of the interference equations. The idea is very simple: the artificial equations alone do not make any sense in relation to the initial (huge) equations and can be eliminated from different casts.
In [4] we show that if the following distillation condition is true:
$artificial(n_a - 1, n_b)$ [...] $artificial(n_a, n_b)$. (6)
then a successful attack can be carried out.
6.3 Distillation Attack on Challenge 1
For $n_b = 80$ and type $XY \cup x^2y \cup xy^2 \cup x^3y \cup x^2y^2$, the solution for the distillation condition above is computed in [4] to be $n_a \ge 30$.
The working size of the attack is:
$$size_{XY \cup x^2y \cup xy^2 \cup x^3y \cup x^2y^2}(30, 80) = 1\,831\,511.$$ (7)
We need only $size^2/8 = 390$ Gb of memory instead of 33 Tb in the direct attack of section 5.10. Following [4], the running time is computed as $(80 - 30 + 1) \cdot 2 \cdot size^3/64 \approx 2^{62}$ CPU clocks.
6.4 Sparse methods
In the attacks above, we have to solve systems of several million equations with several million variables. Such equations could be sparse if we try to recover them in a slightly different way. We build a matrix with columns corresponding to each component of the equation, for example $y_1 y_4$ or $x_2 y_{55} y_9$. Each line of the matrix will correspond to a term, for example $x_3 x_5 x_7 x_{16}$. We only need to consider about as many terms as size (there are many more), though sparse methods [Lanczos, Wiedemann] could take advantage if we generated more.
Such a system of equations is sparse; for example the column $x_2 y_{55} y_9$ contains non-zero coefficients only for terms containing $x_2$, therefore for about 1/n of all terms.
In [12] we hear that with size = 1.3M (million), a system over GF(2) could be solved in a few hours on one processor of a Cray C90 using a modified Lanczos algorithm. Their system had only 39M non-zero coefficients, i.e. about 1/40000 of them. Assuming that sparse methods would combine with reconciliation and distillation, for our systems of size = 1.8M we have about 1/80 non-zero coefficients, much more.
Thus it is unclear whether any of the aforementioned sparse methods could improve on the attack.
7 Asymptotic Security of basic HFE
First, if d is fixed, we have found in section 5.6 experimental evidence that basic HFE can be broken in expected polynomial time. The same result has just been independently shown by Shamir and Kipnis at Crypto'99, see [23].
Our attack in its basic version, based on the conclusions from section 5.7 (no reconciliation, no distillation), gives about:
$$size \approx n^{\frac{1}{2}\log_q d}.$$ (8)
In [4] we show that the distillation attack gives roughly:
$size \approx n$ [...] $n^{\frac{1}{2}\log_q d}$ [...] $n^{\frac{1}{4}\log_q d}$. (9)
We retain a conservative approximation:
$$size \approx n^{\frac{1}{2}\log_q d}.$$ (10)
7.1 Results
Therefore the security of basic HFE is not better than:
$$security \le n^{\frac{3}{2}\log_q d}.$$ (11)
If the distillation attack works as well as estimated in [4], it would even give:
$$security \le n^{\frac{3}{4}\log_q d}.$$ (12)
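To get a feeling for these two bounds, the sketch below evaluates their base-2 logarithm for the parameters of Challenge 1 (n = 80, d = 96, q = 2). The function name `log2_security` and its `factor` argument, which stands for the constant 3/2 or 3/4 in front of $\log_q d$, are assumptions of this example.

```python
from math import log2


def log2_security(n, d, factor, q=2):
    """Base-2 logarithm of n ** (factor * log_q(d))."""
    return factor * (log2(d) / log2(q)) * log2(n)


print(round(log2_security(80, 96, 3 / 2), 1))   # about 62.4
print(round(log2_security(80, 96, 3 / 4), 1))   # about 31.2
```

The first value is in line with the roughly $2^{62}$ operations quoted for Challenge 1. The second one only shows what bound (12) would mean if the estimate of [4] holds.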
First, we compare it to the secret key operations of HFE. They require factoring the degree d polynomial P over a finite field. The asymptotically fastest known algorithm to solve a polynomial equation P over a finite field, due to von zur Gathen and Shoup [8], requires about $d^2 (\log_q d)^{O(1)} n^2$ operations. At any rate we need $d = n^{O(1)}$ to enable secret key computations [14]. Thus:
$$security \le n^{O(\log_q n)} \approx e^{O(\log_q^2 n)}.$$ (13)
In [4] it has been shown that the complexity of the Shamir-Kipnis attack is rather in $n^{O(\log_q^2 d)}$, which gives $e^{O(\log_q^3 n)}$. We are going to improve it to get a similar result.
8 Shamir-Kipnis Attack Revisited
The starting point here is the Shamir-Kipnis attack on basic HFE [23], which we do not describe due to lack of space. It shows that there exist $t_0, \ldots, t_{n-1} \in GF(q^n)$ such that the rank of
$$G = \sum_{k=0}^{n-1} t_k G_k$$ (14)
collapses to at most $r = 1 + \lceil \log_q d \rceil$, with $G_k$ being n public $n \times n$ matrices over $GF(q^n)$.
The underlying problem we are solving is called MinRank [6]. Shamir and Kipnis solved it by what is called relinearization, see [21] for improvements on it. We do not use it, and instead we solve MinRank directly. Our method is identical to the one previously used by Coppersmith, Stern and Vaudenay in [1, 2].
We write equations in the $t_0, \ldots, t_{n-1}$ saying that every $(r+1) \times (r+1)$ submatrix has determinant 0. Each submatrix gives a degree (r + 1) equation on the $t_0, \ldots, t_{n-1}$ over $GF(q^n)$. There are as many as $\binom{n}{r+1}^2$ such equations
and we hope that at least about
$$\binom{n}{r+1} \approx n^{r+1} \approx n^{r+O(1)} \approx n^{\log_q d+O(1)},$$ (15)
which gives similar results as our attacks:
$$security \le n^{O(\log_q d)}.$$ (16)
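The rank condition behind these equations can be illustrated on a small example. The following Python sketch is a simplification and not the attack itself: it works with fixed matrices over GF(2) instead of matrices over $GF(q^n)$ with unknown $t_k$, and only checks that a matrix of rank at most r has all its $(r+1) \times (r+1)$ minors equal to zero, while counting the $\binom{n}{r+1}^2$ submatrices. All function names are assumptions of this example.

```python
import random
from itertools import combinations
from math import comb


def det_mod2(m):
    """Determinant over GF(2), by Gaussian elimination on a copy of m."""
    m = [row[:] for row in m]
    k = len(m)
    for col in range(k):
        pivot = next((r for r in range(col, k) if m[r][col]), None)
        if pivot is None:
            return 0
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            if m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[col])]
    return 1


def nonzero_minors(matrix, k):
    """Number of k x k submatrices of `matrix` with non-zero determinant."""
    n = len(matrix)
    count = 0
    for rows in combinations(range(n), k):
        for cols in combinations(range(n), k):
            sub = [[matrix[i][j] for j in cols] for i in rows]
            count += det_mod2(sub)
    return count


def random_low_rank(n, r, rng):
    """Product of random n x r and r x n matrices over GF(2): rank <= r."""
    a = [[rng.randrange(2) for _ in range(r)] for _ in range(n)]
    b = [[rng.randrange(2) for _ in range(n)] for _ in range(r)]
    return [[sum(a[i][k] & b[k][j] for k in range(r)) & 1 for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
n, r = 5, 2
low_rank = random_low_rank(n, r, rng)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

print(comb(n, r + 1) ** 2)                # 100 submatrices of size 3 x 3
print(nonzero_minors(low_rank, r + 1))    # 0: every minor vanishes
print(nonzero_minors(identity, r + 1))    # 10: a full rank matrix fails the test
```

In the attack the entries of G depend on the unknown $t_k$, so each vanishing minor becomes a polynomial equation of degree r + 1 in the $t_k$ rather than a numerical check.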
9 Is basic HFE likely to be polynomial ?
MinRank is an NP-complete problem for e.g. r = n - 1 [24, 6]. It seems therefore unlikely that our attack on MinRank in $n^{O(r)}$ could ever be improved to remain polynomially bounded when r grows.
The same remark applies to our equational attacks. When d grows, the HFE problem (i.e. basic HFE) tends to the NP-complete MQ problem of solving random quadratic equations, see [14, 15, 4].
10 Conclusion
The best known HFE attack is our distillation attack for basic HFE. It is not proven to work for d >> 129 but relies on extensive experimental evidence. We also have the Shamir-Kipnis attack, or rather our improved version of it, which, though worse in practice, comes with a proof [23].
They both give complexities in $n^{O(\log_q d)}$ to break the basic HFE version. This is polynomial when d is fixed and subexponential in general. Both presented attacks on HFE are much better than any previously known.
Even with the significant progress we have made, the attacks still have complexity and memory requirements that can quickly go out of range. Though it is certain that attacks will be improved in the future, HFE can be considered secure for d > 128 and n > 80.
Perspectives
The basic version of HFE is broken for the initially proposed degree $d \le 17$ [14] and even for $d \le 24$. Our attacks have been tested to work for $d \le 128$, and thus the HFE Challenge 1 is broken in $2^{62}$.
HFE modifications that resist all known attacks.
Several HFE problem-based cryptosystems avoid all the attacks described in the present paper. We verified that our attacks rapidly collapse for these schemes:
HFE