Diophantine Equations: 2 Mordell's Equation


Diophantine equations

F.Beukers
Spring 2011

2 Mordell’s equation
2.1 Introduction
Let $d \in \mathbb{Z}$ with $d \neq 0$ and consider the equation $y^2 + d = x^3$ in $x, y \in \mathbb{Z}$. This equation is
known as Mordell’s equation. We shall prove the following Theorem.

Theorem 2.1.1 (Mordell, 1922) For given $d \neq 0$, the equation $y^2 + d = x^3$ in $x, y \in \mathbb{Z}$ has
at most finitely many solutions.

Actually Mordell proved a more general theorem, but we will come back to that later. It
should be emphasized that Mordell’s proof gives only a finiteness result: it provides no algorithm
to actually solve the equation. Nowadays we also have methods to solve the equation explicitly.
The first results in this direction are based on A. Baker’s technique of linear forms in logarithms,
starting in 1966. This work earned Baker the Fields medal. Here is a theorem based on Baker’s
methods.

Theorem 2.1.2 (Sprindzuk, 1982) There exists an effectively computable number $C > 0$
such that any solution $x, y \in \mathbb{Z}$ of $y^2 + d = x^3$ with $d \neq 0$ satisfies
\[ |x|, |y| \le \exp\left( C|d|(\log|d| + 1)^6 \right). \]

Note that the bound for x, y is roughly exponential in |d| with a very large constant C. One
expects that a much sharper bound holds. This is based on the following conjecture.

Conjecture 2.1.3 (Hall, 1971) To every $\epsilon > 0$ there is a positive real number $c(\epsilon)$ such
that
\[ |y^2 - x^3| > c(\epsilon)\, x^{1/2-\epsilon} \]
for any $x, y \in \mathbb{Z}_{>0}$ with $y^2 \neq x^3$.

Actually Hall conjectured the lower bound $Cx^{1/2}$ for some $C > 0$, but this is generally believed
not to be true.
As a consequence of Hall’s conjecture we see that $|x| \le c_1(\epsilon)|d|^{2+\epsilon}$ and hence, since $y^2 = x^3 - d$, that $|y| \le c_2(\epsilon)|d|^{3+\epsilon}$. In other words, the
expected upper bounds for $x, y$ are polynomial in $|d|$.
Nowadays there are explicit algorithms to solve Mordell’s equation. In [GPZ] all equations
with $|d| \le 10000$ are solved. A particularly spectacular example is $y^2 - 17 = x^3$. In 1930
T. Nagell showed that the complete set of solutions with $y > 0$ reads

(x, y) = (−2, 3), (−1, 4), (2, 5), (4, 9), (8, 23), (43, 282), (52, 375), (5234, 378661)
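The list above is easy to check by machine. The following is a minimal sketch of a naive search, not one of the explicit algorithms referred to in the text; the function name `mordell_solutions` and the search bound 6000 are our own choices for illustration. A search of this kind can confirm solutions but can never prove that the list is complete.

```python
from math import isqrt

def mordell_solutions(d, x_max):
    """Return all (x, y) with y >= 0, y^2 + d = x^3 and |x| <= x_max."""
    solutions = []
    for x in range(-x_max, x_max + 1):
        y_squared = x**3 - d
        if y_squared < 0:
            continue
        y = isqrt(y_squared)          # exact integer square root
        if y * y == y_squared:
            solutions.append((x, y))
    return solutions

# The eight solutions with y > 0 of y^2 - 17 = x^3 quoted in the text.
nagell = [(-2, 3), (-1, 4), (2, 5), (4, 9), (8, 23),
          (43, 282), (52, 375), (5234, 378661)]
found = mordell_solutions(-17, 6000)
print(found == nagell)
```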

References:

[M] L.J. Mordell, Diophantine Equations, Academic Press 1969, Chapter 26.

[LF] H. London, R. Finkelstein, On Mordell’s equation $y^2 - k = x^3$, Bowling Green State
University Press 1973.

[GPZ] J. Gebel, A. Pethö, H.G. Zimmer, On Mordell’s equation, Compositio Math. 110 (1998),
335-367.

2.2 Special cases


Using methods from elementary algebraic number theory we can deal with certain sets of
Mordell equations.

Proposition 2.2.1 Let $d > 0$ be square-free with $d \not\equiv -1 \pmod 4$, and suppose that $h(\mathbb{Q}(\sqrt{-d}))$ is not divisible
by 3. Suppose $y^2 + d = x^3$ has a solution. Then $d = 3a^2 \pm 1$ for some $a \ge 0$ and some choice
of $\pm$ sign. The solution set consists of $(x, y) = (4a^2 - 1, \pm(8a^3 - 3a))$ if $d = 3a^2 - 1$ and
$(x, y) = (4a^2 + 1, \pm(8a^3 + 3a))$ if $d = 3a^2 + 1$.

This proposition implies for example that $y^2 + 1 = x^3$ has no non-trivial solutions and that
$y^2 + 2 = x^3$ has $(x, y) = (3, \pm 5)$ as solution set. The latter fact was stated already by Fermat,
but not proved.
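The two families in Proposition 2.2.1 can be checked numerically. The sketch below only verifies that the stated pairs really satisfy $y^2 + d = x^3$; it says nothing about the hypotheses on $d$ or about completeness of the solution set. The function name and the parameter `sign` are ours.

```python
def proposition_221_solution(a, sign):
    """Return (d, x, y) with d = 3a^2 + sign, where sign is +1 or -1."""
    d = 3 * a * a + sign
    x = 4 * a * a + sign
    y = 8 * a**3 + 3 * sign * a
    return d, x, y

# Check the identity y^2 + d = x^3 for a range of parameters.
for a in range(0, 50):
    for sign in (1, -1):
        d, x, y = proposition_221_solution(a, sign)
        assert y * y + d == x**3

print(proposition_221_solution(1, -1))   # d = 2, Fermat's example
print(proposition_221_solution(0, 1))    # d = 1, the trivial solution
```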
Here is a proof of our Proposition. Suppose $y^2 + d = x^3$. First we note that $\gcd(x, 2d) = 1$.
For if an odd prime $p$ divides both $d$ and $x$ we see that $p$ should divide $y$ as well. But since $p^2$
divides both $x^3$ and $y^2$ we find that $p^2$ divides $d$, contradicting the fact that $d$ is square-free.
If $x$ were even, then $y^2 \equiv -d \pmod 8$. But since $-d \not\equiv 0, 1 \pmod 4$ we get a contradiction
again. So from now on $\gcd(x, 2d) = 1$.
We obtain the following factorisation
\[ (y + \sqrt{-d})(y - \sqrt{-d}) = x^3. \]
Since $d$ is square-free and $-d \not\equiv 1 \pmod 4$ the ring of integers in $\mathbb{Q}(\sqrt{-d})$ is $\mathbb{Z}[\sqrt{-d}]$. Let $\wp$ be
a prime ideal divisor of $(y + \sqrt{-d}, y - \sqrt{-d})$. Then it also divides $x$ and $2\sqrt{-d}$. Hence it
divides $x$ and $2d$. This contradicts $\gcd(x, 2d) = 1$ and we conclude that the principal ideals
$(y + \sqrt{-d})$ and $(y - \sqrt{-d})$ are relatively prime. Their product is a cube and so we conclude
that $(y + \sqrt{-d})$ itself is the cube of an ideal, which we call $I$. So we get $(y + \sqrt{-d}) = I^3$.
Note that $I^3$ is a principal ideal. Hence the order of $I$ in the ideal class group is either 1 or 3. But
we are given that the class number is not divisible by 3. So the order of $I$ in the ideal class
group is 1, hence $I$ is principal. There exist $a, b \in \mathbb{Z}$ such that $I = (a + b\sqrt{-d})$. Hence
\[ y + \sqrt{-d} = \epsilon (a + b\sqrt{-d})^3 \]
where $\epsilon$ is a unit in the ring of integers. When $d > 1$ we have $\mathbb{Z}[\sqrt{-d}]^* = \{\pm 1\}$ and
$\mathbb{Z}[\sqrt{-1}]^* = \{\pm 1, \pm i\}$. In both cases the unit group has order relatively prime to 3, hence
every unit can be considered as the cube of another unit. After redefining $a, b$ if necessary,
we get
\[ y + \sqrt{-d} = (a + b\sqrt{-d})^3. \]
Comparison of the coefficients before $\sqrt{-d}$ gives $1 = 3a^2 b - db^3 = b(3a^2 - db^2)$. We see that
$b = \pm 1$ and $3a^2 - db^2 = \pm 1$. If $b = 1$, then $3a^2 - d = 1$ and so $d = 3a^2 - 1$. The value of $y$ is
$a^3 - 3ab^2 d = a^3 - 3a(3a^2 - 1) = -(8a^3 - 3a)$. The value of $x$ is $a^2 + db^2 = a^2 + 3a^2 - 1 = 4a^2 - 1$.
When $b = -1$ we proceed similarly. qed
In a very similar way we can show that

Proposition 2.2.2 Let $d > 0$ be square-free with $d \equiv -5 \pmod 8$ and $h(\mathbb{Q}(\sqrt{-d}))$ not divisible
by 3. Suppose that $y^2 + d = x^3$ has a solution. Then one of the following cases holds,
1. There exist $a \in \mathbb{Z}$ and $\epsilon \in \{\pm 1\}$ such that $d = 3a^2 + \epsilon$. The solutions read $(x, y) =
(4a^2 + \epsilon, \pm(8a^3 + 3\epsilon a))$.

2. There exist $a \in \mathbb{Z}$ and $\epsilon \in \{\pm 1\}$ such that $d = 3a^2 + 8\epsilon$. The solutions read $(x, y) =
(a^2 + 2\epsilon, \pm(a^3 + 3\epsilon a))$.
We can apply this Proposition to $y^2 + 11 = x^3$. Note that $d = 11$ satisfies all of our conditions.
Moreover, $11 = 3 \cdot 1^2 + 8$, which gives rise to the solutions $(x, y) = (3, \pm 4)$. In addition,
$11 = 3 \cdot 2^2 - 1$ giving rise to $(x, y) = (15, \pm 58)$.
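The two cases of Proposition 2.2.2 can be turned into a small lookup. The sketch below, with names of our own choosing, only evaluates the two parametrisations for a given $d$; it does not test the hypotheses of the Proposition (square-freeness, the congruence condition, the class number), which are assumed to have been checked separately.

```python
from math import isqrt

def _square_root_or_none(n):
    """Return the non-negative integer r with r^2 = n, or None."""
    if n < 0:
        return None
    r = isqrt(n)
    return r if r * r == n else None

def proposition_222_candidates(d):
    """Pairs (x, y) with y >= 0 predicted by the two cases of Proposition 2.2.2."""
    solutions = set()
    for eps in (1, -1):
        # Case 1: d = 3a^2 + eps
        if (d - eps) % 3 == 0:
            a = _square_root_or_none((d - eps) // 3)
            if a is not None:
                solutions.add((4 * a * a + eps, abs(8 * a**3 + 3 * eps * a)))
        # Case 2: d = 3a^2 + 8*eps
        if (d - 8 * eps) % 3 == 0:
            a = _square_root_or_none((d - 8 * eps) // 3)
            if a is not None:
                solutions.add((a * a + 2 * eps, abs(a**3 + 3 * eps * a)))
    return sorted(solutions)

print(proposition_222_candidates(11))
```

For $d = 11$ this reproduces the two pairs found in the text, up to the sign of $y$.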
So far we dealt with $d > 0$ in our equation. Let us consider an example with $d < 0$, namely
$y^2 - 17 = x^3$. This is known to have the solution set

(x, y) = (−2, ±3), (−1, ±4), (2, ±5), (4, ±9), (8, ±23), (43, ±282), (52, ±375), (5234, ±378661)

We make a beginning with its solution. We factor as
\[ (y + \sqrt{17})(y - \sqrt{17}) = x^3 \]
in the field $K = \mathbb{Q}(\sqrt{17})$. In $K$ we have unique factorization into irreducibles and its group
of units is generated by $-1$ and $4 + \sqrt{17}$. A further remark is that $(5 \pm \sqrt{17})/2$ are the
irreducible divisors of 2. The ring of integers is $\mathbb{Z}[(1 + \sqrt{17})/2]$. Let $\pi$ be a common prime
divisor of $y + \sqrt{17}$ and $y - \sqrt{17}$. Then $\pi$ divides both $2y$ and $2\sqrt{17}$. If $\pi = \sqrt{17}$ then $y$ is
divisible by 17, as well as $x$. The difference $17 = y^2 - x^3$ would then be divisible by $17^2$ which
is not possible. We conclude that $\pi$ divides 2. But then $x$ is even. We separate according to
the cases $x$ even or odd.
Suppose $x$ is odd. Then, by the above, $y + \sqrt{17}$ and $y - \sqrt{17}$ have no common prime divisor
and hence, by unique factorization, there exist an integer $\alpha \in O_K$ and a unit $\eta$ such that
\[ y + \sqrt{17} = \eta\alpha^3. \]
The units $\eta$ are of the form $\pm(4 + \sqrt{17})^k$. Of course $-1$ is a cube and the unit $\eta$ is a cube
times 1, $4 + \sqrt{17}$ or $4 - \sqrt{17}$. Hence there exists an integer $\alpha \in O_K$ such that $y + \sqrt{17}$ equals
one of the following
\[ \alpha^3, \quad (4 + \sqrt{17})\alpha^3, \quad (4 - \sqrt{17})\alpha^3. \]
Let us write $\alpha = (a + b\sqrt{17})/2$ with $a, b \in \mathbb{Z}$ having the same parity. Then, in the case
$y + \sqrt{17} = \alpha^3$ comparison of the coefficients of $\sqrt{17}$ gives
\[ 8 = 3a^2 b + 17b^3 = b(3a^2 + 17b^2). \]
Since $b \neq 0$ implies $3a^2 + 17b^2 \ge 17$ we get a contradiction. Comparison of the
coefficients of $\sqrt{17}$ in $y + \sqrt{17} = (4 + \sqrt{17})\alpha^3$ yields
\[ 8 = a^3 + 12a^2 b + 51ab^2 + 68b^3. \]
Replace $a$ by $a - 4b$ to get
\[ 8 = a^3 + 3ab^2 - 8b^3. \]
Hence $a(a^2 + 3b^2) \equiv 0 \pmod 8$, which implies that $a, b$ should both be even. So replace $a, b$
by $2a, 2b$ to get
\[ 1 = a^3 + 3ab^2 - 8b^3. \]
We do not solve this equation, but note that the solutions $(a, b) = (1, 0), (-3, -2)$ give rise
to the solutions $(x, y) = (-1, 4), (43, -282)$ of the Mordell equation. The case $y + \sqrt{17} =
(4 - \sqrt{17})\alpha^3$ runs similarly.

Suppose now that $x$ is even. Then $y$ is odd and $y \pm \sqrt{17}$ is divisible by 2. Hence, upon replacing
$x$ by $2x$,
\[ \frac{y + \sqrt{17}}{2} \cdot \frac{y - \sqrt{17}}{2} = 2x^3. \]
From this we deduce the following possibilities
\[ \frac{y + \sqrt{17}}{2} = \frac{5 \pm \sqrt{17}}{2}\,\alpha^3, \quad \frac{5 \pm \sqrt{17}}{2}\,(4 \pm \sqrt{17})\,\alpha^3 \]
for some choice of $\pm$ signs and some algebraic integer $\alpha = (a + b\sqrt{17})/2$.
The first case with $+$ sign gives
\[ 8 = a^3 + 15a^2 b + 51ab^2 + 85b^3. \]
Replace $a$ by $a - 5b$ to get
\[ 8 = a^3 - 24ab^2 + 80b^3. \]
Hence $a$ is even. Replace $a$ by $2a$ to get
\[ 1 = a^3 - 6ab^2 + 10b^3. \]
We have the small solutions $(a, b) = (1, 0), (-3, 1)$. They give rise to the solutions $(x, y) =
(2, 5), (52, -375)$ of Mordell’s equation.
In a similar way the other cases also give rise to diophantine equations of the form f (x, y) = 1
for cubic homogeneous polynomials f ∈ Z[x, y].
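The passage from a solution $(a, b)$ of one of these cubic equations back to a solution of $y^2 - 17 = x^3$ can be traced by exact arithmetic in $\mathbb{Q}(\sqrt{17})$. The sketch below represents $r + s\sqrt{17}$ as a pair of fractions and undoes the substitutions made above; all function names are ours, and the sign of $y$ is whatever the computation produces.

```python
from fractions import Fraction

def mul(u, v):
    """Product of u = (r1, s1) and v = (r2, s2), each standing for r + s*sqrt(17)."""
    return (u[0] * v[0] + 17 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def cube(u):
    return mul(u, mul(u, u))

def integer_cube_root(n):
    """Exact cube root of an integer that is known to be a perfect cube."""
    r = round(abs(n) ** (1 / 3))
    r = r if n >= 0 else -r
    assert r**3 == n
    return r

def mordell_point(factor, alpha):
    """Read off (x, y) from y + sqrt(17) = factor * alpha^3."""
    y, sqrt_coefficient = mul(factor, cube(alpha))
    assert sqrt_coefficient == 1 and y.denominator == 1
    y = int(y)
    return integer_cube_root(y * y - 17), y

def odd_case(a, b):
    """(a, b) solves a^3 + 3ab^2 - 8b^3 = 1; here alpha = (a - 4b) + b*sqrt(17)."""
    assert a**3 + 3 * a * b * b - 8 * b**3 == 1
    alpha = (Fraction(a - 4 * b), Fraction(b))
    return mordell_point((Fraction(4), Fraction(1)), alpha)

def even_case(a, b):
    """(a, b) solves a^3 - 6ab^2 + 10b^3 = 1; here alpha = ((2a - 5b) + b*sqrt(17))/2."""
    assert a**3 - 6 * a * b * b + 10 * b**3 == 1
    alpha = (Fraction(2 * a - 5 * b, 2), Fraction(b, 2))
    return mordell_point((Fraction(5), Fraction(1)), alpha)

print(odd_case(1, 0), odd_case(-3, -2))
print(even_case(1, 0), even_case(-3, 1))
```

Since $(x, y)$ and $(x, -y)$ are solutions together, only $|y|$ matters when comparing with the solution list.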
In the following section we prove the following theorem.
Theorem 2.2.3 For any $k \in \mathbb{Z}$, $k \neq 0$, the solution of the diophantine equation $y^2 + k = x^3$
in $x, y \in \mathbb{Z}$ can be reduced to the solution of a finite set of diophantine equations of the form
$f(x, y) = 1$ in $x, y \in \mathbb{Z}$ where $f$ is a binary cubic form with integer coefficients. Moreover,
the set of forms $f$ can be computed explicitly.

2.3 Binary forms


A binary form is a homogeneous polynomial in two variables. The general shape of a binary form of degree
$n$ reads
\[ a_n X^n + a_{n-1} X^{n-1} Y + a_{n-2} X^{n-2} Y^2 + \cdots + a_1 XY^{n-1} + a_0 Y^n. \]
Two binary forms $f(X, Y), g(X, Y)$ are called equivalent if there exist $p, q, r, s$ with $ps - qr = 1$
such that $g(X, Y) = f(pX + qY, rX + sY)$.
In invariant theory one looks for polynomials in the coefficients $a_0, a_1, \ldots, a_n$ which are invariant
under equivalence transformations. The most familiar one is the discriminant of a form
$f(X, Y)$, defined by
\[ D = a_n^{2n-2} \prod_{i<j} (\alpha_i - \alpha_j)^2 \]
where $\alpha_1, \ldots, \alpha_n$ are the zeros of the polynomial $f(X, 1)$. One can show that $D \in \mathbb{Z}[a_0, a_1, \ldots, a_n]$.
Here are two examples,
Binary quadratic forms $aX^2 + 2bXY + cY^2$ with discriminant
\[ D = 4(b^2 - ac). \]
Binary cubic forms $aX^3 + 3bX^2 Y + 3cXY^2 + dY^3$ with discriminant
\[ D = 27(-a^2 d^2 + 6abcd + 3b^2 c^2 - 4ac^3 - 4db^3). \]
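Invariance of the discriminant under a substitution with $ps - qr = 1$ can be observed on an example. The sketch below stores a form as its plain list of coefficients of $X^n, X^{n-1}Y, \ldots, Y^n$ (so without the binomial factors 3 used in the text) and uses the standard formula for the discriminant of a general cubic; the function names are ours.

```python
def substitute(coeffs, p, q, r, s):
    """Coefficients of f(pX + qY, rX + sY); coeffs[i] multiplies X^(n-i) Y^i."""
    n = len(coeffs) - 1

    def multiply(u, v):
        # product of two forms given as coefficient lists indexed by the power of Y
        w = [0] * (len(u) + len(v) - 1)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                w[i + j] += ui * vj
        return w

    def power(base, e):
        result = [1]
        for _ in range(e):
            result = multiply(result, base)
        return result

    first, second = [p, q], [r, s]          # the linear forms pX + qY and rX + sY
    total = [0] * (n + 1)
    for i, c in enumerate(coeffs):
        term = multiply(power(first, n - i), power(second, i))
        for k, t in enumerate(term):
            total[k] += c * t
    return total

def cubic_discriminant(coeffs):
    """Discriminant of A X^3 + B X^2 Y + C X Y^2 + D Y^3."""
    A, B, C, D = coeffs
    return B*B*C*C - 4*A*C**3 - 4*B**3*D - 27*A*A*D*D + 18*A*B*C*D

f = [1, 0, -1, 1]                 # X^3 - X Y^2 + Y^3
g = substitute(f, 2, 1, 1, 1)     # ps - qr = 2*1 - 1*1 = 1
print(g, cubic_discriminant(f), cubic_discriminant(g))
```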
For quadratic and cubic forms the discriminant $D$ and polynomials in $D$ are the only invariants.
For quartic forms $a_4 X^4 + 4a_3 X^3 Y + 6a_2 X^2 Y^2 + 4a_1 XY^3 + a_0 Y^4$ there are two independent
invariants namely
\[ I_2 = a_0 a_4 - 4a_1 a_3 + 3a_2^2, \]
\[ I_3 = a_0 a_2 a_4 - a_0 a_3^2 - a_1^2 a_4 + 2a_1 a_2 a_3 - a_2^3. \]
The ring of invariants is the polynomial ring generated by $I_2$ and $I_3$. In particular, $D =
256(I_2^3 - 27I_3^2)$.
We shall now concentrate on binary forms with a0 , a1 , . . . , an ∈ Z and call them integral binary
forms. In particular the discriminant is an integer. Two integral forms f (X, Y ), g(X, Y ) will
be called SL(2, Z)-equivalent, or simply equivalent, if there exist p, q, r, s ∈ Z with ps−qr = 1
such that g(X, Y ) = ±f (pX + qY, rX + sY ). We have the following Theorem.
Theorem 2.3.1 The number of equivalence classes of binary integral forms of given degree
and given discriminant is finite.
For quadratic forms there is a very explicit reduction procedure from which finiteness of
the number of equivalence classes of discriminant $D$ follows. Let us start with an arbitrary
quadratic form $aX^2 + 2bXY + cY^2$ which we abbreviate by $[a, b, c]$. Note that we have chosen
the coefficient of $XY$ to be even. Such quadratic forms are called even. Although one could
also consider odd quadratic forms we concentrate here only on the even ones.
We keep on repeating the following steps. If $|b| > |a|/2$ we choose $k$ such that $|b + ka| \le
|a|/2$. Replace $X$ by $X + kY$ to get the new form $[a, b, c] := [a, b + ka, c + 2bk + ak^2]$. If
$|c| < |a|$ we make the substitution $(X, Y) \to (-Y, X)$ which changes our form into $[a, b, c] :=
[c, -b, a]$. Repeating this procedure we end up with an equivalent form $[a, b, c]$ which satisfies
the inequalities $|2b| \le |a| \le |c|$. From this we derive that $|D| \ge |ac| - b^2 \ge 3b^2$. Hence $|b|$ is
bounded by $\sqrt{|D|/3}$. This gives a finite number of values of $b$ and through $b^2 - ac = D$ we
get a finite number of values of $a, c$.
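The two reduction steps translate directly into a loop. The following is a minimal sketch, with a function name of our own, under the assumption that $b^2 - ac$ is not a perfect square, so that the leading coefficient never becomes zero; each swap strictly decreases $|a|$, which is why the loop stops.

```python
def reduce_even_form(a, b, c):
    """Reduce [a, b, c] = aX^2 + 2bXY + cY^2 until |2b| <= |a| <= |c|.

    Assumes b^2 - ac is not a perfect square, so a never becomes 0.
    """
    while True:
        if 2 * abs(b) > abs(a):
            m = abs(a)
            r = (b + m // 2) % m - m // 2     # representative of b mod a with |r| <= |a|/2
            k = (r - b) // a                  # exact division: r - b is a multiple of a
            a, b, c = a, r, c + 2 * b * k + a * k * k   # X -> X + kY
        elif abs(c) < abs(a):
            a, b, c = c, -b, a                # (X, Y) -> (-Y, X)
        else:
            return a, b, c

print(reduce_even_form(1, 5, 8))     # b^2 - ac = 17
print(reduce_even_form(3, 7, 10))    # b^2 - ac = 19
```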
Example. We determine all equivalence classes of even quadratic forms $aX^2 + 2bXY + cY^2$
with $b^2 - ac = 17$. According to the above reduction procedure we can restrict ourselves
to $a, b, c$ satisfying $|b| \le \sqrt{17/3}$. So $b = 0, \pm 1, \pm 2$. The corresponding $a, c$ follow from
$b^2 - ac = 17$ and $|c| \ge |a| \ge |2b|$. We get the following list of possibilities with $a > 0$,

$X^2 - 17Y^2$
$2X^2 \pm 2XY - 8Y^2$
$4X^2 \pm 2XY - 4Y^2$
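The finitely many triples allowed by the inequalities can be listed mechanically. The sketch below, with a function name of our own, enumerates all $[a, b, c]$ with $a > 0$, $b^2 - ac = D$ and $|2b| \le |a| \le |c|$; it lists reduced forms and makes no attempt to decide which of them are equivalent to each other.

```python
def reduced_even_forms(D):
    """All [a, b, c] with a > 0, b^2 - ac = D and |2b| <= |a| <= |c|."""
    forms = []
    b = 0
    while 3 * b * b <= abs(D):                # the bound |b| <= sqrt(|D|/3)
        for signed_b in sorted({b, -b}):
            product = signed_b * signed_b - D  # this must equal a*c
            for a in range(max(1, 2 * b), abs(product) + 1):
                if product != 0 and product % a == 0:
                    c = product // a
                    if a <= abs(c):
                        forms.append((a, signed_b, c))
        b += 1
    return forms

print(reduced_even_forms(17))
```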
We now turn to cubic forms $f(X, Y) = aX^3 + 3bX^2 Y + 3cXY^2 + dY^3$. We construct the
Hessian form
\[ H(X, Y) = -\frac{1}{36} \begin{vmatrix} f_{XX} & f_{XY} \\ f_{XY} & f_{YY} \end{vmatrix} \]
which turns out to be $H(X, Y) = (b^2 - ac)X^2 + (bc - ad)XY + (c^2 - bd)Y^2$. We also define
the cubic form
\[ G(X, Y) = \frac{1}{3} \begin{vmatrix} f_X & f_Y \\ H_X & H_Y \end{vmatrix}. \]
The forms $H, G$ are called the covariants of $f$ of degrees 2 and 3. The discriminant of $f$ equals
$27D_1$ where $D_1 = -a^2 d^2 + 6abcd + 3b^2 c^2 - 4ac^3 - 4db^3$. The discriminant of $H$ equals $-D_1$.
Proposition 2.3.2 Let notation be as above. Then
\[ G^2 + D_1 f^2 = 4H^3. \]
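The identity of Proposition 2.3.2 can be tested on random integer input. The sketch below evaluates $f$, $H$ and $G$ from the definitions above (with the factor 3 already divided out of the partial derivatives of $f$); it is a numerical sanity check on sample points, not a proof, and the function names are ours.

```python
import random

def covariants(a, b, c, d, X, Y):
    """Values of f, H, G at (X, Y) for f = aX^3 + 3bX^2Y + 3cXY^2 + dY^3."""
    f = a * X**3 + 3 * b * X * X * Y + 3 * c * X * Y * Y + d * Y**3
    A, B, C = b * b - a * c, b * c - a * d, c * c - b * d   # coefficients of H
    H = A * X * X + B * X * Y + C * Y * Y
    fX_over_3 = a * X * X + 2 * b * X * Y + c * Y * Y       # f_X / 3
    fY_over_3 = b * X * X + 2 * c * X * Y + d * Y * Y       # f_Y / 3
    HX, HY = 2 * A * X + B * Y, B * X + 2 * C * Y
    G = fX_over_3 * HY - fY_over_3 * HX                     # (1/3)(f_X H_Y - f_Y H_X)
    return f, H, G

def D1(a, b, c, d):
    return -a*a*d*d + 6*a*b*c*d + 3*b*b*c*c - 4*a*c**3 - 4*d*b**3

rng = random.Random(0)
for _ in range(1000):
    a, b, c, d, X, Y = (rng.randint(-9, 9) for _ in range(6))
    f, H, G = covariants(a, b, c, d, X, Y)
    assert G * G + D1(a, b, c, d) * f * f == 4 * H**3
```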

We can now make the following observation. Let $f$ be a binary cubic form with $D_1 = 4k$ such
that $f(x, y) = 1$ has the solution $x_0, y_0$. Then Mordell’s equation $y^2 + k = x^3$ has the solution
$y = G(x_0, y_0)/2$, $x = H(x_0, y_0)$. It turns out that the converse is also true.

Proposition 2.3.3 Consider the equation $y^2 + k = x^3$ and suppose that we have a solution
$(x, y) = (p, q)$. Then the cubic form $f(x, y) = x^3 - 3pxy^2 + 2qy^3$ has $D_1 = 4k$ and $p = H(1, 0)$, $q =
-G(1, 0)/2$. Note in addition that $H(X, Y) = pX^2 - 2qXY + p^2 Y^2$, so $H$ is an even form. We
also have $G(X, Y) = 2(-qX^3 + 3p^2 X^2 Y - 3pqXY^2 + (-p^3 + 2q^2)Y^3)$, i.e. $G(X, Y)/2$ is an
integral form.

The proof of this Proposition follows by direct computation.
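To see the correspondence at work, take the solution $(p, q) = (2, 5)$ of $y^2 - 17 = x^3$, so $k = -17$. The sketch below uses the explicit formulas for $f$, $H$ and $G/2$ from Proposition 2.3.3; the function name and the choice of test point $(-3, 1)$ are ours.

```python
def thue_to_mordell(p, q, X, Y):
    """Return (f, H, G/2) at (X, Y) for f = X^3 - 3pXY^2 + 2qY^3."""
    f = X**3 - 3 * p * X * Y * Y + 2 * q * Y**3
    H = p * X * X - 2 * q * X * Y + p * p * Y * Y
    half_G = (-q * X**3 + 3 * p * p * X * X * Y
              - 3 * p * q * X * Y * Y + (2 * q * q - p**3) * Y**3)
    return f, H, half_G

p, q = 2, 5
k = p**3 - q**2                      # k = -17
f, x, y = thue_to_mordell(p, q, -3, 1)
print(f, x, y, y * y + k == x**3)
```

Here $f(-3, 1) = 1$, and the point $(H, G/2)$ is again a solution of the same Mordell equation, in line with the observation preceding the Proposition.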


To solve Mordell’s equation it suffices to find a representing element of each equivalence class
of integral cubic forms with discriminant 108k. For each such form f we solve the equation
f (x, y) = 1 in x, y ∈ Z. The latter equation is known as a cubic Thue equation. In the
next section we will see that it has finitely many solutions. Assuming this we now see that
Mordell’s finiteness result 2.1.1 follows.

3 Thue’s equation
3.1 Introduction
Let F be an integral binary form and m a non-zero integer. The equation

F (x, y) = m

in x, y ∈ Z is called the Thue equation.

Theorem 3.1.1 (Thue, 1909) Let F be an integral binary form such that F (x, 1) has at
least three distinct zeros. Let m be a non-zero integer. Then the equation F (x, y) = m has at
most finitely many solutions.

Note that if $F$ is reducible over $\mathbb{Z}$ then we can restrict ourselves to equations of the form
$G(x, y) = m'$ where $G$ is an irreducible factor of $F$ and $m'$ a divisor of $m$. Notice also that the
requirement of at least three zeros is essential. An example of a quadratic equation would be
Pell’s equation $x^2 - dy^2 = 1$ which is known to have infinitely many solutions if $d$ is a positive
integer and not a square.
Using Thue’s theorem and Proposition 2.3.3 we conclude that Mordell’s equation has at
most finitely many solutions. Thue’s theorem is proved using methods from diophantine
approximation. Due to the nature of this technique Thue’s theorem is only a finiteness
statement. It does not give a method to solve the equation. We shall come back to this.
An effective method to solve Thue’s equation became available through A.Baker’s method on
linear forms in logarithms around 1966. As an application of these methods the following was
shown.

Theorem 3.1.2 (Feld’man, Baker) Suppose that $F(x, y)$ is a form in two variables such
that $F(x, 1)$ has at least three distinct zeros. Then there exist positive, effectively computable
numbers $C_1, C_2$, depending only on $F$ such that any solution $x, y \in \mathbb{Z}$ of $F(x, y) = m$ (with
$m \neq 0$) satisfies
\[ \log(\max(|x|, |y|)) \le C_1 |m|^{C_2}. \]

By ‘effective method’ we mean that the upper bound for x, y provides us with an algorithm
to determine the solution set. However, due to the enormous size of this bound the algorithm
is certainly not efficient. With the speed of present day computers a naive search of x, y up
to the bound given above would take the life time of the universe and more. So extra ideas
have to be invoked to solve the equation.
In the years before the 1980’s about the only such method was Skolem’s method, next to
simple-minded congruence considerations which work only in rare cases. In solving Thue’s equation
there is a big difference between the cases when $F$ has a positive or a negative discriminant,
as we shall see.
As a first example consider $f(x, y) = 1$ where $f(x, 1)$ is a monic cubic irreducible polynomial.
Let $K$ be the field $\mathbb{Q}[x]/(f(x, 1))$. Write $f(x, 1) = (x - \alpha)(x - \alpha')(x - \alpha'')$ where $\alpha \in K$ and
$\alpha', \alpha''$ are its algebraic conjugates. Then the equation $f(x, y) = 1$ implies that $x - \alpha y = \beta$
where $\beta$ is a unit in $K$. We also have the conjugate equations $x - \alpha' y = \beta'$ and $x - \alpha'' y = \beta''$.
As an exercise one can verify that
\[ \frac{1}{f'(\alpha)} + \frac{1}{f'(\alpha')} + \frac{1}{f'(\alpha'')} = \frac{\alpha}{f'(\alpha)} + \frac{\alpha'}{f'(\alpha')} + \frac{\alpha''}{f'(\alpha'')} = 0. \]
As a consequence we get
\[ \frac{\beta}{f'(\alpha)} + \frac{\beta'}{f'(\alpha')} + \frac{\beta''}{f'(\alpha'')} = 0. \]
Now suppose that $K$ has negative discriminant, which is equivalent with $r_1 = r_2 = 1$. By
Dirichlet’s unit theorem the units in $K$ are of the form $\pm\eta^n$ where $\eta$ is a fundamental unit
and $n \in \mathbb{Z}$. So our equation becomes
\[ \frac{\eta^n}{f'(\alpha)} + \frac{(\eta')^n}{f'(\alpha')} + \frac{(\eta'')^n}{f'(\alpha'')} = 0. \]
Note that we have turned our Thue equation into an exponential equation in the unknown
exponent $n$. In the next section we show how to deal with this equation.
Consider the explicit example
\[ x^3 - xy^2 + y^3 = 1. \]
Its solution set reads $(x, y) = (1, 0), (0, 1), (1, 1), (-1, 1), (4, -3)$, which was shown by T. Nagell.
The discriminant of the form $x^3 - xy^2 + y^3$ is $-23$. This is the minimal negative discriminant
possible for an irreducible cubic form. The polynomial $X^3 - X + 1$ has a real zero and two
complex ones; this is because the discriminant is negative. Let $\alpha$ be a zero. Then the field
$\mathbb{Q}(\alpha)$ has $r_1 = r_2 = 1$. Its ring of integers is $\mathbb{Z}[\alpha]$ and the group of units is $\mathbb{Z}[\alpha]^* = \{\pm\alpha^n \mid n \in \mathbb{Z}\}$.
We compute that $1/f'(\alpha) = (4 - 9\alpha - 6\alpha^2)/23$. So our exponential equation becomes
\[ \theta\alpha^n + \theta'(\alpha')^n + \theta''(\alpha'')^n = 0 \]
where $\theta = -4 + 9\alpha + 6\alpha^2$ and $\theta', \theta''$ are its conjugates.
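The link between units and solutions can be explored directly: a power $\alpha^n$ yields a solution exactly when its coefficient of $\alpha^2$ vanishes, because then $\pm\alpha^n = x - \alpha y$. The sketch below stores $c_0 + c_1\alpha + c_2\alpha^2$ as a triple and uses $\alpha^3 = \alpha - 1$; the function names and the range of exponents searched are our own choices, and the search of course proves nothing about exponents outside that range.

```python
def times_alpha(u):
    """Multiply c0 + c1*alpha + c2*alpha^2 by alpha, using alpha^3 = alpha - 1."""
    c0, c1, c2 = u
    return (-c2, c0 + c2, c1)

def times_alpha_inverse(u):
    """Multiply by alpha^(-1) = 1 - alpha^2 (the inverse of times_alpha)."""
    d0, d1, d2 = u
    return (d0 + d1, d2, -d0)

def thue_form(x, y):
    return x**3 - x * y * y + y**3

def solutions_from_unit_powers(n_max):
    """Map n -> (x, y) for every |n| <= n_max such that alpha^n has no alpha^2 term."""
    found = {}
    for step, sign in ((times_alpha, 1), (times_alpha_inverse, -1)):
        u = (1, 0, 0)
        for n in range(n_max + 1):
            if u[2] == 0:
                x, y = u[0], -u[1]          # alpha^(sign*n) = x - alpha*y
                if thue_form(x, y) == -1:   # use -alpha^n instead
                    x, y = -x, -y
                found[sign * n] = (x, y)
            u = step(u)
    return found

print(solutions_from_unit_powers(60))
```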

3.2 Skolem’s method


Here is a Proposition which solves exponential equations of the type we have just seen.

Proposition 3.2.1 (Skolem’s method) Let $\alpha_1, \alpha_2, \ldots, \alpha_k$ be a set of algebraic integers.
Choose an odd rational prime $p$ and $m \in \mathbb{N}$ such that there exist an integer $a \in \mathbb{Z}$, not divisible
by $p$, and algebraic integers $\beta_1, \ldots, \beta_k$ with the property that $\alpha_i^m = a + p\beta_i$ for $i = 1, 2, \ldots, k$.
Let $\theta_1, \ldots, \theta_k$ be algebraic integers and suppose that $\theta_1 + \cdots + \theta_k = 0$ and $\theta_1\beta_1 + \cdots + \theta_k\beta_k \not\equiv
0 \pmod p$. Then $\theta_1\alpha_1^{mn} + \cdots + \theta_k\alpha_k^{mn} = 0$ implies $n = 0$.

The proof really depends on the use of so-called p-adic numbers, but here we give a version
which avoids mentioning them.

Let us rewrite our equation by using the binomial theorem for $\alpha_i^{mn} = (a + p\beta_i)^n$. We get
\[ 0 = npa^{n-1}(\theta_1\beta_1 + \cdots + \theta_k\beta_k) + \sum_{r=2}^{n} \binom{n}{r} p^r a^{n-r} (\theta_1\beta_1^r + \cdots + \theta_k\beta_k^r). \]
Here we have used that $\theta_1 + \cdots + \theta_k = 0$. Now assume that $n$ is not zero and divide on both
sides by $npa^{n-1}$. Using $\frac{1}{n}\binom{n}{r} = \frac{1}{r}\binom{n-1}{r-1}$ we get,
\[ 0 = \theta_1\beta_1 + \cdots + \theta_k\beta_k + \sum_{r=2}^{n} \frac{p^{r-1}}{ra^{r-1}} \binom{n-1}{r-1} (\theta_1\beta_1^r + \cdots + \theta_k\beta_k^r). \]
Since $p$ is an odd prime the fraction $\frac{p^{r-1}}{ra^{r-1}}$ has the factor $p$ in its numerator for every $r \ge 2$.
In particular it follows from our equation that $\theta_1\beta_1 + \cdots + \theta_k\beta_k$ is divisible by $p$. This is
impossible by our assumption. Hence we conclude that $n = 0$. qed
Here is an application to the explicit equation of the previous section. Let $\alpha_1, \alpha_2, \alpha_3$ be the
zeros of $X^3 - X + 1$. We want to solve
\[ \mu_1\alpha_1^n + \mu_2\alpha_2^n + \mu_3\alpha_3^n = 0 \]
in $n \in \mathbb{Z}$, where $\mu_i = -4 + 9\alpha_i + 6\alpha_i^2$. We note that $\alpha_i^{24} = 1 + 5\beta_i$ where $\beta_i = 30 - 53\alpha_i + 40\alpha_i^2$.
We write $\mathrm{Tr}(\mu\alpha^n) = \mu_1\alpha_1^n + \mu_2\alpha_2^n + \mu_3\alpha_3^n$, which is a rational integer called the trace of the
algebraic number $\mu\alpha^n$.
Using PARI we see that $\mathrm{Tr}(\mu\alpha^n) \equiv 0 \pmod 5$ implies that $n \equiv 0, 1, 3, 11, 20 \pmod{24}$. Interestingly
enough each of these congruence classes corresponds to an exact solution. For $\mathrm{Tr}(\mu\alpha^n) =
0$ we have the solutions $n = -13, -4, 0, 1, 3$ and we expect these to be the only solutions. We
can check the latter expectation by solving $\mathrm{Tr}(\theta\alpha^{24n}) = 0$ for $\theta = \mu\alpha^{-13}, \mu\alpha^{-4}, \mu, \mu\alpha, \mu\alpha^3$
respectively. For all choices of $\theta$ except $\theta = \mu$ we can check that $\mathrm{Tr}(\theta\beta) \not\equiv 0 \pmod 5$, so our
Proposition applies to these four cases. We are left with the problem to solve $\mathrm{Tr}(\mu\alpha^{24n}) = 0$.
Let us now put $\gamma = \alpha^{24}$. Then we note that $\gamma^2 = 1 + 7\delta$ for $\delta = 18400 - 32290\alpha + 24375\alpha^2$.
We check that $\mathrm{Tr}(\mu\gamma) \not\equiv 0 \pmod 7$. Since $\gamma^n \equiv \gamma \pmod 7$ for odd $n$, $\mathrm{Tr}(\mu\alpha^{24n}) = 0$ implies that $n$ is even. Therefore it
remains to solve $\mathrm{Tr}(\mu\alpha^{48n}) = \mathrm{Tr}(\mu\gamma^{2n}) = 0$. We check that $\mathrm{Tr}(\mu\delta) \not\equiv 0 \pmod 7$ and now
apply our Proposition once more, where we take for $\alpha$ and $\beta$ the numbers $\gamma$ and $\delta$, $m = 2$ and $p = 7$.
We conclude that $n = 0$. In combination with the remarks at the end of the previous section
we now conclude the following theorem.
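The numerical claims used in this argument can be reproduced without PARI by exact arithmetic in $\mathbb{Z}[\alpha]$, with $\alpha^3 = \alpha - 1$. The sketch below is our own check, with function names chosen for illustration; it uses $\mathrm{Tr}(1) = 3$, $\mathrm{Tr}(\alpha) = 0$, $\mathrm{Tr}(\alpha^2) = 2$, which follow from the coefficients of $X^3 - X + 1$.

```python
def times_alpha(u):
    c0, c1, c2 = u
    return (-c2, c0 + c2, c1)               # alpha^3 = alpha - 1

def times_alpha_inverse(u):
    d0, d1, d2 = u
    return (d0 + d1, d2, -d0)               # alpha^(-1) = 1 - alpha^2

def alpha_power(n):
    u = (1, 0, 0)
    step = times_alpha if n >= 0 else times_alpha_inverse
    for _ in range(abs(n)):
        u = step(u)
    return u

def multiply(u, v):
    """Product in Z[alpha], by Horner's rule in the coefficients of u."""
    result = (0, 0, 0)
    for coefficient in reversed(u):
        result = times_alpha(result)
        result = tuple(r + coefficient * w for r, w in zip(result, v))
    return result

def trace(u):
    c0, c1, c2 = u
    return 3 * c0 + 2 * c2                  # Tr(1) = 3, Tr(alpha) = 0, Tr(alpha^2) = 2

mu = (-4, 9, 6)
gamma = alpha_power(24)
beta = ((gamma[0] - 1) // 5, gamma[1] // 5, gamma[2] // 5)
gamma_squared = multiply(gamma, gamma)
delta = ((gamma_squared[0] - 1) // 7, gamma_squared[1] // 7, gamma_squared[2] // 7)

# Residues n mod 24 with Tr(mu alpha^n) = 0 mod 5.
residues = [n for n in range(24) if trace(multiply(mu, alpha_power(n))) % 5 == 0]

# Tr(theta beta) mod 5 for theta = mu alpha^n, n = -13, -4, 0, 1, 3.
skolem_mod_5 = {n: trace(multiply(multiply(mu, alpha_power(n)), beta)) % 5
                for n in (-13, -4, 0, 1, 3)}

print(gamma, beta, delta)
print(residues, skolem_mod_5)
print(trace(multiply(mu, gamma)) % 7, trace(multiply(mu, delta)) % 7)
```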

Theorem 3.2.2 (T. Nagell) The equation $x^3 - xy^2 + y^3 = 1$ has the solutions

(x, y) = (1, 0), (0, 1), (1, 1), (−1, 1), (4, −3).

Skolem’s method is particularly suitable to solve cubic Thue equations with negative discriminant.
A negative discriminant of a cubic is equivalent to the form having one real and two
complex (non-real) zeros. To see what goes wrong in the case of a positive discriminant
we take the equation
\[ x^3 + x^2 y - 2xy^2 - y^3 = 1. \]

Baulin showed that the only solutions are

(x, y) = (1, 0), (0, −1), (−1, 1), (−1, −1), (2, −1), (−1, 2), (5, 4), (4, −9), (−9, 5).

As a start we notice that the polynomial $x^3 + x^2 - 2x - 1$ has discriminant 49. As a result
there are three real zeros. The cubic extension generated by one such zero $\alpha$ has three real
embeddings, i.e. $r_1 = 3$. Using PARI we see that this cubic field has ring of integers $\mathbb{Z}[\alpha]$ and
its unit group consists of the elements $\pm\alpha^k(1 + \alpha)^l$, with $k, l \in \mathbb{Z}$. A solution $x, y$ of the Thue
equation above implies that $x - \alpha y$ is a unit in $\mathbb{Z}[\alpha]$, in other words $x - \alpha y = \pm\alpha^k(1 + \alpha)^l$ with
$k, l \in \mathbb{Z}$. The difficulty now becomes clear: we have an exponential equation with two unknowns,
$k$ and $l$. Skolem’s method does not work here and we have to go over to finite extensions of
$\mathbb{Q}(\alpha)$ to be able to apply generalisations of Skolem’s method. This is the reason why Baulin’s
paper consists of some 40 pages.
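A naive search in a small box recovers Baulin's nine solutions. The sketch below, with a function name and a box size of our own choosing, is only a check of the list; the difficulty described above lies entirely in proving that nothing exists outside such a box.

```python
def baulin_form(x, y):
    return x**3 + x * x * y - 2 * x * y * y - y**3

def small_solutions(bound):
    """All (x, y) with |x|, |y| <= bound and baulin_form(x, y) = 1."""
    return sorted((x, y)
                  for x in range(-bound, bound + 1)
                  for y in range(-bound, bound + 1)
                  if baulin_form(x, y) == 1)

baulin = {(1, 0), (0, -1), (-1, 1), (-1, -1), (2, -1),
          (-1, 2), (5, 4), (4, -9), (-9, 5)}
print(set(small_solutions(50)) == baulin)
```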

3.3 Thue’s method


Let $\alpha$ be a fixed, real irrational number. Consider a rational approximation $\frac{p}{q}$ to $\alpha$ with
$p, q \in \mathbb{Z}$, $q > 0$ and $\gcd(p, q) = 1$. The quality of this approximation is the number $M > 0$
such that
\[ \left| \alpha - \frac{p}{q} \right| = \frac{1}{q^M} \]
or, if it does not exist, we take $M = 0$. As a first result we prove,
Theorem. Let $\alpha$ be an irrational number. Then there exist infinitely many approximations
to $\alpha$ of quality $\ge 2$.
This statement is part of the theory of continued fractions. But also without knowledge of
continued fractions it is not hard to show. Fix a large positive integer $Q$ and consider the set
of numbers $\{q\alpha\}$ for $q = 0, 1, 2, \ldots, Q$, where $\{x\}$ denotes the difference between $x$ and the
largest integer $\le x$. The set of $\{q\alpha\}$ is a set of $Q + 1$ numbers in the interval $[0, 1)$. So it tends
to be crowded when $Q$ gets large. In particular, there must be two values of $q$, say $q_1 < q_2$,
such that the difference between $\{q_1\alpha\}$ and $\{q_2\alpha\}$ is less than $1/Q$ in absolute value. Choose
integers $p_1, p_2$ such that $\{q_i\alpha\} = q_i\alpha - p_i$. Then $|(q_2 - q_1)\alpha - (p_2 - p_1)| < 1/Q$. Since clearly
$0 < q_2 - q_1 \le Q$ we see that $\frac{p_2 - p_1}{q_2 - q_1}$ is an approximation of quality at least 2. By choosing
increasingly large values for $Q$ we can produce an infinite sequence of such approximations.
qed
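The pigeonhole argument is constructive and can be run as it stands. The sketch below sorts the fractional parts $\{q\alpha\}$ and picks the two closest ones; it works in floating point, so it is only meaningful while $Q$ is small compared with the precision of the floats. The function name and the example $\alpha = \sqrt{2}$, $Q = 1000$ are ours.

```python
from math import floor, sqrt

def dirichlet_approximation(alpha, Q):
    """Return (p, q) with 0 < q <= Q and |q*alpha - p| < 1/Q, found by pigeonhole."""
    # fractional parts {q*alpha} for q = 0..Q, each remembered together with its q
    parts = sorted((q * alpha - floor(q * alpha), q) for q in range(Q + 1))
    # among Q + 1 points in [0, 1) two neighbours in sorted order are closer than 1/Q
    gap, (q1, q2) = min(
        (parts[i + 1][0] - parts[i][0], (parts[i][1], parts[i + 1][1]))
        for i in range(Q)
    )
    q = abs(q2 - q1)
    p = round(q * alpha)
    return p, q

p, q = dirichlet_approximation(sqrt(2), 1000)
print(p, q, abs(sqrt(2) - p / q) < 1 / q**2)
```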
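Here is the quality of the two famous rational approximations to $\pi$,
\[ \left| \frac{22}{7} - \pi \right| \approx \frac{1}{7^{3.429}}, \qquad \left| \frac{355}{113} - \pi \right| \approx \frac{1}{113^{3.201}}. \]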
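These two exponents follow from the definition of quality by taking logarithms. A minimal floating-point sketch, with a function name of our own:

```python
from math import log, pi

def quality(p, q, alpha):
    """Exponent M with |alpha - p/q| = q^(-M)."""
    return -log(abs(alpha - p / q)) / log(q)

print(round(quality(22, 7, pi), 3), round(quality(355, 113, pi), 3))
```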

One may wonder if an infinite number of such good quality approximations exist for π, or
any other irrational we are looking at. To that end we introduce the following concept.

Definition The irrationality measure of an irrational number α is defined as the limsup over
all qualities of all rational approximations and is denoted by µ(α).
We have taken the limsup in our definition rather than the maximum since we are for example
interested in the question whether π has infinitely many approximations of quality at least
3. The first two occurrences from the introduction may have been exceptional coincidences.
If we assume that π behaves like most other numbers, then there is very little chance that
µ(π) ≥ 3. This is shown by the following Theorem.
Theorem The set of irrational numbers with irrationality measure strictly larger than 2 has
Lebesgue measure zero.
This Theorem is not hard to prove. Let us restrict ourselves to the irrational numbers in the
interval $[0, 1]$. Choose $\epsilon > 0$. A number $\alpha$ with $\mu(\alpha) \ge 2 + 2\epsilon$ is, by definition, contained in
an interval of the form
\[ \left[ \frac{p}{q} - \frac{1}{q^{2+\epsilon}},\ \frac{p}{q} + \frac{1}{q^{2+\epsilon}} \right], \]
with $0 < p < q$ integers, infinitely many times. Let us give an upper bound for the total
length of these intervals with $q > Q$, where $Q$ is some large fixed positive integer. Such a
bound can be given by
\[ \sum_{q=Q+1}^{\infty} \sum_{p=1}^{q} \frac{2}{q^{2+\epsilon}}. \]
The inner sum is equal to $\frac{2}{q^{1+\epsilon}}$. The sum over $q$ can be estimated by the integral criterion,
\[ \sum_{q=Q+1}^{\infty} \frac{2}{q^{1+\epsilon}} < \int_Q^{\infty} \frac{2\,dx}{x^{1+\epsilon}} = \frac{2}{\epsilon Q^{\epsilon}}. \]

When we let Q → ∞ we see that the latter bound goes to zero. Hence the Lebesgue measure
of the numbers in [0, 1] with irrationality measure ≥ 2+ 2ϵ is zero. The set of numbers in [0, 1]
with irrationality measure > 2 is the union of all sets of numbers with irrationality measure
at least 2 + 2/n for n = 1, 2, 3, 4, . . .. Since a countable union of measure zero sets has again
measure zero, our result follows. qed
We note that numbers with irrationality measure $> 2$ do exist. In fact there exist irrational
numbers with irrationality measure $\infty$. These are the so-called Liouville numbers. An example
of such a number is given by
\[ \sum_{n \ge 0} \frac{1}{2^{n!}}. \]
The reader may wish to verify as an exercise that the truncated series form a sequence of
approximations whose qualities go to $\infty$. On the other hand, numbers like Liouville numbers
are a bit artificial. They are constructed for the purpose of having large irrationality
measures. It is expected that the irrationality measure for a naturally occurring number is 2.
Unfortunately, there are not many instances where this is known. A classical instance is $e$.
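The growth of the qualities for the Liouville number above can be observed with exact rational arithmetic. In the sketch below the number itself is replaced by a much longer partial sum (up to $n = 7$), which is an approximation we introduce for the purpose of the experiment; the function names are ours.

```python
from fractions import Fraction
from math import factorial, log2

def liouville_partial_sum(N):
    """Sum of 1/2^(n!) for n = 0..N, as an exact fraction."""
    return sum(Fraction(1, 2 ** factorial(n)) for n in range(N + 1))

def quality_of_truncation(N, reference_terms=7):
    """Quality of the N-th partial sum, measured against a longer partial sum."""
    approximation = liouville_partial_sum(N)
    error = liouville_partial_sum(reference_terms) - approximation
    q = approximation.denominator
    # log2 of the error, taken separately on numerator and denominator
    # because the error itself is far too small for a float
    log_error = log2(error.numerator) - log2(error.denominator)
    return -log_error / log2(q)

for N in range(2, 6):
    print(N, quality_of_truncation(N))
```

The printed qualities are close to $N + 1$, in line with the fact that the error after $N$ terms is about $2^{-(N+1)!}$ while the denominator is $2^{N!}$.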

The fact that µ(e) = 2 can easily be shown by using the continued fraction expansion of e
which, contrary to that of π, is completely known. Although it is expected that µ(π) = 2, it
is very hard to get any results on µ(π). It was only in 1953 that K.Mahler was able to show
for the first time that µ(π) is finite. Nowadays we know that µ(π) < 8.02. The following
statement is not hard to show.

Exercise 3.3.1 Prove that any real algebraic number of degree 2 has irrationality measure 2.

Exercise 3.3.2 Let α be an algebraic number of degree n ≥ 2. Prove that µ(α) ≤ n.

Let $\alpha$ be an algebraic number of degree $n$ with $n > 2$. The first non-trivial result on irrationality
measures is by A. Thue, who showed in 1909 that $\mu(\alpha) \le n/2 + 1$. This was improved
by C.L. Siegel who showed in 1929 that $\mu(\alpha) < 2\sqrt{n}$. Finally in 1955 K.F. Roth finished the
problem by showing that $\mu(\alpha) = 2$. This result won him the Fields medal in mathematics.
Using Thue’s upper bound for $\mu(\alpha)$ we can prove Theorem 3.1.1. Suppose that the equation
$F(x, y) = m$ has infinitely many solutions $x, y \in \mathbb{Z}$. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the zeros of $F(x, 1)$.
Then the inequality
\[ \prod_{i=1}^{n} \left| \alpha_i - \frac{x}{y} \right| \le \frac{|m|}{y^n} \qquad (1) \]
has infinitely many solutions in $x, y \in \mathbb{Z}$ and $y > 0$. Let $A = \min_{i \neq j} |\alpha_i - \alpha_j|/2$. Suppose
$x, y$ is a solution of the inequality and suppose that $y > |m|^{1/n}/A$. Then there exists an $i$
such that $|\alpha_i - \frac{x}{y}| < A$. By the definition of $A$ this means that $|\alpha_j - \frac{x}{y}| > A$ for all $j \neq i$.
Combining this with inequality (1) again, gives us
\[ \left| \alpha_i - \frac{x}{y} \right| \le \frac{|m|}{A^{n-1} y^n}. \qquad (2) \]
To every solution $x, y$ with $y > |m|^{1/n}/A$ there corresponds an $i$ such that the latter inequality
holds. Since there are infinitely many solutions, there exists an $i$ such that (2) has infinitely
many solutions. But this implies that $\mu(\alpha_i) \ge n$. This contradicts Thue’s inequality $\mu(\alpha_i) \le
n/2 + 1$ when $n \ge 3$. qed

3.4 Siegel’s Lemma


This subsection and the next ones will be devoted to a proof of Thue’s inequality $\mu(\alpha) \le
n/2 + 1$ for algebraic numbers of degree $n$. An important ingredient is the so-called Siegel
Lemma.

Theorem 3.4.1 (Siegel’s Lemma) Let $a_{ij}$ with $i = 1, \ldots, m$ and $j = 1, \ldots, n$ be integers,
not all zero, and suppose that $n > m$ and $A = \max_{i,j} |a_{ij}|$. Then the system of linear equations
\[ \sum_{j=1}^{n} a_{ij} x_j = 0, \qquad i = 1, \ldots, m \]
has a non-trivial solution in the integers $x_1, x_2, \ldots, x_n$ with the property that
\[ \max_j |x_j| \le (2nA)^{m/(n-m)}. \]

A remarkable application is for example the following. Take 10 integers $a_1, a_2, \ldots, a_{10}$ of ten
digits each. Suppose we want to find integers $x_1, x_2, \ldots, x_{10}$, not all zero, such that
\[ a_1 x_1 + a_2 x_2 + \cdots + a_{10} x_{10} = 0. \]
Siegel’s Lemma with $A = 10^{10}$, $m = 1$, $n = 10$ tells us that we can find such $x_i$ of absolute
value at most 18. This is surprisingly small given the size of the numbers $a_i$.
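The bound in this example, and the existence statement for a small case, can be checked as follows. The sketch computes $(2nA)^{m/(n-m)}$ in floating point and, for a single equation in three unknowns, simply searches the box allowed by the Lemma; the function names and the sample coefficients $7, 11, 13$ are ours, and the brute-force search is only feasible for very small $n$.

```python
from itertools import product

def siegel_bound(n, m, A):
    """Integer part of (2nA)^(m/(n-m)); floating point, so only for moderate sizes."""
    return int((2 * n * A) ** (m / (n - m)))

def small_solution(coefficients):
    """Brute-force a non-trivial solution of sum a_j x_j = 0 within Siegel's bound."""
    n = len(coefficients)
    A = max(abs(a) for a in coefficients)
    bound = siegel_bound(n, 1, A)
    for x in product(range(-bound, bound + 1), repeat=n):
        if any(x) and sum(a * xj for a, xj in zip(coefficients, x)) == 0:
            return x
    return None

print(siegel_bound(10, 1, 10**10))
print(small_solution([7, 11, 13]))
```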
Here is a proof of Siegel’s Lemma. Choose an integer $Q$. Let $B(Q)$ be the box consisting
of points $(x_1, \ldots, x_n)$ with $x_1, \ldots, x_n$ integers with $0 \le x_i \le Q$. Consider the map $\phi :
B(Q) \to \mathbb{Z}^m$ given by $\phi : (x_1, \ldots, x_n) \mapsto (\sum_{j=1}^{n} a_{1j} x_j, \ldots, \sum_{j=1}^{n} a_{mj} x_j)$. The image of $B(Q)$
is contained in the box $[-nAQ, nAQ]^m$. The number of points with integral coordinates in
this box is at most $(2nAQ + 1)^m$. The number of points in $B(Q)$ is precisely $(Q + 1)^n$. So
if $(Q + 1)^n > (2nAQ + 1)^m$, then $\phi$ is not injective and we find two distinct integral vectors $\mathbf{x}_1, \mathbf{x}_2$
in $B(Q)$ such that $\phi(\mathbf{x}_1 - \mathbf{x}_2) = 0$. In other words, $\mathbf{x}_1 - \mathbf{x}_2$ is a non-trivial solution of our system of
equations. In addition, the components of this difference are all bounded by $Q$ in absolute
value. A straightforward calculation shows that $(Q + 1)^n > (2nAQ + 1)^m$ is satisfied if we
choose $Q = [(2nA)^{m/(n-m)}]$. qed

3.5 Sketch of the proof of Thue’s Theorem


Let $\alpha$ be an algebraic number of degree $n$ and suppose that $\mu(\alpha) > n/2 + 1$. Hence there
exists $\theta > 0$ such that
\[ \left| \alpha - \frac{x}{y} \right| < \frac{1}{y^{n/2+1+\theta}} \qquad (3) \]
has infinitely many solutions $x, y \in \mathbb{Z}$ with $y > 0$. Without loss of generality we can assume
that $\alpha$ is an algebraic integer. Let $D$ be a large positive integer and $\epsilon > 0$. We construct
polynomials $P(x), Q(x) \in \mathbb{Z}[x]$ of degree $\le D$ such that $P(x) - \alpha Q(x)$ vanishes of order $m$ at $x = \alpha$,
where $m \approx (\frac{2}{n} - \epsilon)D$. This problem can be considered as a system of linear equations in the
coefficients of $P$ and $Q$ as follows,
\begin{align*}
0 &= P(\alpha) - \alpha Q(\alpha) \\
0 &= P'(\alpha) - \alpha Q'(\alpha) \\
0 &= \frac{1}{2!} P''(\alpha) - \frac{\alpha}{2!} Q''(\alpha) \\
&\ \ \vdots \\
0 &= \frac{1}{(m-1)!} P^{(m-1)}(\alpha) - \frac{\alpha}{(m-1)!} Q^{(m-1)}(\alpha)
\end{align*}
These are linear equations with coefficients in $\mathbb{Q}(\alpha)$. Each such equation can be rewritten
as $n$ equations with coefficients in $\mathbb{Z}$. So we have $mn$ equations with coefficients in $\mathbb{Z}$. The
coefficients are bounded by $C^D$, where $C$ is some number depending only on $\alpha$. There are
$2D + 2$ unknowns in $\mathbb{Z}$. We can apply Siegel’s Lemma and find polynomials $P(x), Q(x)$ with
coefficients whose absolute value is at most
\[ (4(D + 1)C^D)^{mn/(2D+2-mn)} < (4(D + 1)C^D)^{2D/(2D-(2-\epsilon n)D)} = (4(D + 1)C^D)^{2/(n\epsilon)}. \]
Notice that we can find another number $C_1$, depending only on $\alpha$ such that $(4(D + 1)C^D)^{2/n} <
C_1^D$. Numbers depending only on $\alpha$ will be denoted by $C_1, C_2, \ldots$ in the sequel. We conclude
that we have found non-trivial polynomials $P(x), Q(x)$ with integral coefficients bounded by
$C_1^{D/\epsilon}$ such that $P(x) - \alpha Q(x)$ vanishes of order at least $m$ in $x = \alpha$.
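Let $x_1/y_1, x_2/y_2$ be two very large solutions of (3) with $y_2 \gg y_1 \gg 1$. The idea is now to
find both upper and lower bounds for
\[ \Delta = \left| P\!\left(\frac{x_1}{y_1}\right) - \frac{x_2}{y_2} Q\!\left(\frac{x_1}{y_1}\right) \right|. \]
First an upper bound.
\begin{align*}
\Delta &\le \left| P\!\left(\frac{x_1}{y_1}\right) - \alpha Q\!\left(\frac{x_1}{y_1}\right) \right| + \left| \left(\alpha - \frac{x_2}{y_2}\right) Q\!\left(\frac{x_1}{y_1}\right) \right| \\
&< C_1^{D/\epsilon} \left| \alpha - \frac{x_1}{y_1} \right|^m + C_1^{D/\epsilon} \frac{1}{y_2^{n/2+1+\theta}} \\
&< C_1^{D/\epsilon} \frac{1}{y_1^{m(n/2+1+\theta)}} + C_1^{D/\epsilon} \frac{1}{y_2^{n/2+1+\theta}}
\end{align*}
A lower bound for $\Delta$ can be obtained if we assume that $\Delta \neq 0$. For then $\Delta$ is a non-zero
rational number with denominator dividing $y_2 y_1^D$, so that $\Delta \ge 1/(y_2 y_1^D)$. Combining this upper and lower bound we
get
\[ \frac{1}{y_2 y_1^D} < C_1^{D/\epsilon} \left( \frac{1}{y_1^{m(n/2+1+\theta)}} + \frac{1}{y_2^{n/2+1+\theta}} \right) \]
where $m \approx (2/n - \epsilon)D$. Now choose $\epsilon$ such that $(2/n - \epsilon)(n/2 + \theta) = 1 + \delta$ for some $\delta > 0$.
We choose $D$, and as a consequence $m$, in such a way that $y_1^m \approx y_2$. Then our inequality
simplifies to
\[ \frac{1}{y_2 y_1^D} < 2C_1^{D/\epsilon} \frac{1}{y_2 y_1^{(1+\delta)D}}. \]
Hence
\[ 1 < 2C_1^{D/\epsilon} y_1^{-\delta D}. \]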
But this gives a contradiction if y1 is large enough. Since there are infinitely many choices for
y1 we do arrive at a contradiction.
A problem arises if ∆ does vanish. To that end we prove the following Lemma.

Lemma 3.5.1 Let P (x), Q(x) be two non-trivial polynomials with rational coefficients and
of degree ≤ D. Let α be an algebraic number of degree n such that P (x) − αQ(x) vanishes
of order at least m at x = α. Suppose m > D/n. Then, for any numbers β, γ, with β not a
conjugate of α, the polynomial P (x) − γQ(x) has vanishing order at most 2D − (m − 1)n at
x = β.

Suppose $P(x) - \gamma Q(x)$ vanishes of order $\mu$ at $x = \beta$. Taking the derivative of $P(x) - \gamma Q(x)$ we
see that $P'(x) - \gamma Q'(x) = O((x - \beta)^{\mu-1})$. Elimination of $\gamma$ shows that $P'(x)Q(x) - P(x)Q'(x) =
O((x - \beta)^{\mu-1})$. In the same way we can show that $P'(x)Q(x) - P(x)Q'(x) = O((x - \alpha)^{m-1})$.
Since P, Q have coefficients in Q, we find that f (x)m−1 divides P ′ Q − P Q′ , where f (x) is the
minimal polynomial of α.
An important remark is that P ′ Q − P Q′ cannot be identically zero. If it were, then P (x) =
λQ(x) for some constant λ. But then P − αQ = O((x − α)m ) would imply that Q(x) is
divisible by (x − α)m and hence by f (x)m . But this is impossible by degree considerations,
since mn > D by assumption.
So P ′ Q − P Q′ is a non-trivial polynomial divisible by (x − β)µ−1 and by f (x)m−1 . By degree
considerations we get (m − 1)n + µ − 1 ≤ 2D − 1, which proves our Lemma. qed
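Application of the Lemma to our situation shows that $P(x) - \frac{x_2}{y_2} Q(x)$ has vanishing order at
most $2D - n(2/n - \epsilon)D = n\epsilon D$ at $x = \frac{x_1}{y_1}$. So there exists $\mu \le n\epsilon D$ such that
\[ \Delta_\mu = P^{(\mu)}\!\left(\frac{x_1}{y_1}\right) - \frac{x_2}{y_2} Q^{(\mu)}\!\left(\frac{x_1}{y_1}\right) \neq 0. \]
We now carry out our argument on $\Delta_\mu$ instead of $\Delta$.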
