(Steven Galbraith) Mathematics of Public Key Cryptography

Contents

1 Introduction
  1.1 Public Key Cryptography
  1.2 The Textbook RSA Cryptosystem
  1.3 Formal Definition of Public Key Cryptography
    1.3.1 Security of Encryption
    1.3.2 Security of Signatures

I Background
  2.18 Summary

II Algebraic Groups

5 Varieties
  5.1 Affine Algebraic Sets
  5.2 Projective Algebraic Sets
  5.3 Irreducibility
  5.4 Function Fields
  5.5 Rational Maps and Morphisms
  5.6 Dimension
  5.7 Weil Restriction of Scalars

8 Rational Maps on Curves and Divisors
  8.1 Rational Maps of Curves and the Degree
  8.2 Extensions of Valuations
  8.3 Maps on Divisor Classes
  8.4 Riemann-Roch Spaces
  8.5 Derivations and Differentials
  8.6 Genus Zero Curves
  8.7 Riemann-Roch Theorem and Hurwitz Genus Formula

9 Elliptic Curves
  9.1 Group Law
  9.2 Morphisms Between Elliptic Curves
  9.3 Isomorphisms of Elliptic Curves
  9.4 Automorphisms
  9.5 Twists
  9.6 Isogenies
  9.7 The Invariant Differential
  9.8 Multiplication by n and Division Polynomials
  9.9 Endomorphism Structure
  9.10 Frobenius Map
    9.10.1 Complex Multiplication
    9.10.2 Counting Points on Elliptic Curves
  9.11 Supersingular Elliptic Curves
  9.12 Alternative Models for Elliptic Curves
    9.12.1 Montgomery Model
    9.12.2 Edwards Model
    9.12.3 Jacobi Quartic Model
  9.13 Statistics of Elliptic Curves over Finite Fields
  9.14 Elliptic Curves over Rings

10 Hyperelliptic Curves
  10.1 Non-Singular Models for Hyperelliptic Curves
    10.1.1 Projective Models for Hyperelliptic Curves
    10.1.2 Uniformizers on Hyperelliptic Curves
    10.1.3 The Genus of a Hyperelliptic Curve
  10.2 Isomorphisms, Automorphisms and Twists
  10.3 Effective Affine Divisors on Hyperelliptic Curves
    10.3.1 Mumford Representation of Semi-Reduced Divisors
    10.3.2 Addition and Semi-Reduction of Divisors in Mumford Representation
    10.3.3 Reduction of Divisors in Mumford Representation
  10.4 Addition in the Divisor Class Group
    10.4.1 Addition of Divisor Classes on Ramified Models
    10.4.2 Addition of Divisor Classes on Split Models
  10.5 Jacobians, Abelian Varieties and Isogenies
  10.6 Elements of Order n
  10.7 Hyperelliptic Curves Over Finite Fields
  10.8 Endomorphisms
  10.9 Supersingular Curves

III

14 Pseudorandom Walks
  14.1 Birthday Paradox
  14.2 The Pollard Rho Method
    14.2.1 The Pseudorandom Walk
    14.2.2 Pollard Rho Using Floyd Cycle Finding
    14.2.3 Other Cycle Finding Methods
    14.2.4 Distinguished Points and Pollard Rho
    14.2.5 Towards a Rigorous Analysis of Pollard Rho
  14.3 Distributed Pollard Rho
    14.3.1 The Algorithm and its Heuristic Analysis
  14.4 Using Equivalence Classes
    14.4.1 Examples of Equivalence Classes
    14.4.2 Dealing with Cycles
    14.4.3 Practical Experience with the Distributed Rho Algorithm
  14.5 The Kangaroo Method
    14.5.1 The Pseudorandom Walk
    14.5.2 The Kangaroo Algorithm
    14.5.3 Heuristic Analysis of the Kangaroo Method
    14.5.4 Comparison with the Rho Algorithm
    14.5.5 Using Inversion
    14.5.6 Towards a Rigorous Analysis of the Kangaroo Method
  14.6 Distributed Kangaroo Algorithm
    14.6.1 Van Oorschot and Wiener Version
    14.6.2 Pollard Version
    14.6.3 Comparison of the Two Versions
  14.7 The Gaudry-Schost Algorithm
    14.7.1 Two-Dimensional Discrete Logarithm Problem
    14.7.2 Interval DLP using Equivalence Classes
  14.8 Parallel Collision Search in Other Contexts
    14.8.1 The Low Hamming Weight DLP
  14.9 Pollard Rho Factoring Method
  14.10 Pollard Kangaroo Factoring

15 Subexponential Algorithms
  15.1 Smooth Integers
  15.2 Factoring using Random Squares
    15.2.1 Complexity of the Random Squares Algorithm
    15.2.2 The Quadratic Sieve
    15.2.3 Summary
  15.3 Elliptic Curve Method Revisited
  15.4 The Number Field Sieve
  15.5 Index Calculus in Finite Fields
    15.5.1 Rigorous Subexponential Discrete Logarithms Modulo p
    15.5.2 Heuristic Algorithms for Discrete Logarithms Modulo p
    15.5.3 Discrete Logarithms in Small Characteristic
    15.5.4 Coppersmith's Algorithm for the DLP in F_2^n
    15.5.5 The Joux-Lercier Algorithm
    15.5.6 Number Field Sieve for the DLP
    15.5.7 Discrete Logarithms for all Finite Fields
  15.6 Discrete Logarithms on Hyperelliptic Curves
    15.6.1 Index Calculus on Hyperelliptic Curves
    15.6.2 The Algorithm of Adleman, De Marrais and Huang
    15.6.3 Gaudry's Algorithm
  15.7 Weil Descent
  15.8 Elliptic Curves over Extension Fields
    15.8.1 Semaev's Summation Polynomials
    15.8.2 Gaudry's Variant of Semaev's Method
    15.8.3 Diem's Algorithm for the ECDLP
  15.9 Further Results
    15.9.1 Diem's Algorithm for Plane Curves of Low Degree
    15.9.2 The Algorithm of Enge-Gaudry-Thomé and Diem
    15.9.3 Index Calculus for General Elliptic Curves

IV Lattices

16 Lattices
  16.1 Basic Notions on Lattices
  16.2 The Hermite and Minkowski Bounds
  16.3 Computational Problems in Lattices

  19.12 NTRU
  19.13 Knapsack Cryptosystems
    19.13.1 Public Key Encryption Using Knapsacks
    19.13.2 Cryptanalysis of Knapsack Cryptosystems

20 Diffie-Hellman Cryptography
  20.1 The Discrete Logarithm Assumption
  20.2 Key Exchange
    20.2.1 Diffie-Hellman Key Exchange
    20.2.2 Burmester-Desmedt Key Exchange
    20.2.3 Key Derivation Functions
  20.3 Textbook Elgamal Encryption
  20.4 Security of Textbook Elgamal Encryption
    20.4.1 OWE Against Passive Attacks
    20.4.2 OWE Security Under CCA Attacks
    20.4.3 Semantic Security Under Passive Attacks
  20.5 Security of Diffie-Hellman Key Exchange
  20.6 Efficiency of Discrete Logarithm Cryptography

21 The Diffie-Hellman Problem
  21.1 Variants of the Diffie-Hellman Problem
  21.2 Lower Bound on the Complexity of CDH for Generic Algorithms
  21.3 Random Self-Reducibility and Self-Correction of CDH
  21.4 The den Boer and Maurer Reductions
    21.4.1 Implicit Representations
    21.4.2 The den Boer Reduction
    21.4.3 The Maurer Reduction
  21.5 Algorithms for Static Diffie-Hellman
  21.6 Hard Bits of Discrete Logarithms
    21.6.1 Hard Bits for DLP in Algebraic Group Quotients
  21.7 Bit Security of Diffie-Hellman
    21.7.1 The Hidden Number Problem
    21.7.2 Hard Bits for CDH Modulo a Prime
    21.7.3 Hard Bits for CDH in Other Groups
  21.8 Further Topics

VI

VII

A Background Mathematics
  A.1 Basic Notation
  A.2 Groups
  A.3 Rings
  A.4 Modules
  A.5 Polynomials
    A.5.1 Homogeneous Polynomials
    A.5.2 Resultants
  A.6 Field Extensions
  A.7 Galois Theory
    A.7.1 Galois Cohomology
  A.8 Finite Fields
  A.9 Ideals
  A.10 Vector Spaces and Linear Algebra
    A.10.1 Inner Products and Norms
    A.10.2 Gram-Schmidt Orthogonalisation
    A.10.3 Determinants
  A.11 Hermite Normal Form
  A.12 Orders in Quadratic Fields
  A.13 Binary Strings
  A.14 Probability and Combinatorics
Acknowledgements
The book grew out of my lecture notes from the Master's course "Public Key Cryptography"
at Royal Holloway. I thank the students who took that course for asking questions and
doing their homework in unexpected ways.
The staff at Cambridge University Press have been very helpful during the preparation
of this book.
I also thank the following people for answering my questions, pointing out errors in
drafts of the book, helping with LaTeX, examples, proofs, exercises etc.: José de Jesús Angel
Angel, Olivier Bernard, Nicolas Bonifas, Nils Bruin, Ilya Chevyrev, Bart Coppens, Alex
Dent, Claus Diem, Marion Duporte, Andreas Enge, Victor Flynn, David Freeman, Pierrick Gaudry, Takuya Hayashi, Nadia Heninger, Florian Hess, Mark Holmes, Everett Howe,
David Jao, Jonathan Katz, Eike Kiltz, Kitae Kim, David Kohel, Cong Ling, Alexander
May, Esmaeil Mehrabi, Ciaran Mullan, Mats Näslund, Francisco Monteiro, James McKee,
James Nelson, Samuel Neves, Phong Nguyen, TaeHun Oh, Chris Peikert, Michael Phillips,
John Pollard, Francesco Pretto, Oded Regev, Christophe Ritzenthaler, Karl Rubin, Raminder Ruprai, Takakazu Satoh, Leanne Scheepers, Davide Schipani, Michael Schneider,
Peter Schwabe, Reza Sepahi, Victor Shoup, Igor Shparlinski, Andrew Shallue, Francesco
Sica, Alice Silverberg, Benjamin Smith, Martijn Stam, Damien Stehle, Anton Stolbunov,
Drew Sutherland, Garry Tee, Emmanuel Thome, Frederik Vercauteren, Timothy Vogel,
Anastasia Zaytseva, Chang-An Zhao, Paul Zimmermann.
The remaining errors and omissions are the author's responsibility.
Chapter 1
Introduction
This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography
by Steven Galbraith, available from http://www.isg.rhul.ac.uk/sdg/crypto-book/. The
copyright for this chapter is held by Steven Galbraith.
This book is now completed and an edited version of it will be published by Cambridge
University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be
different in the published version.
Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes.
All feedback on the book is very welcome and will be acknowledged.
Cryptography is an interdisciplinary field of great practical importance. The subfield of public key cryptography has notable applications, such as digital signatures. The
security of a public key cryptosystem depends on the difficulty of certain computational
problems in mathematics. A deep understanding of the security and efficient implementation of public key cryptography requires significant background in algebra, number theory
and geometry.
This book gives a rigorous presentation of most of the mathematics underlying public
key cryptography. Our main focus is mathematics. We put mathematical precision and
rigour ahead of generality, practical issues in real-world cryptography, or algorithmic
optimality. It is infeasible to cover all the mathematics of public key cryptography in one
book. Hence we primarily discuss the mathematics most relevant to cryptosystems that
are currently in use, or that are expected to be used in the near future. More precisely, we
focus on discrete logarithms (especially on elliptic curves), factoring based cryptography
(e.g., RSA and Rabin), lattices and pairings. We cover many topics that have never had
a detailed presentation in any textbook.
Due to lack of space, some topics are not covered in as much detail as others. For example, we do not give a complete presentation of algorithms for integer factorisation, primality testing, and discrete logarithms in finite fields, as there are several good references
for these subjects. Some other topics that are not covered in the book include hardware
implementation, side-channel attacks, lattice-based cryptography, cryptosystems based
on coding theory, multivariate cryptosystems and cryptography in non-Abelian groups.
In the future, quantum cryptography or post-quantum cryptography (see the book [50]
by Bernstein, Buchmann and Dahmen) may be used in practice, but these topics are also
not discussed in the book.
The reader is assumed to have at least a standard undergraduate background in groups,
rings, fields and cryptography. Some experience with algorithms and complexity is also assumed. For a basic introduction to public key cryptography and the relevant mathematics
the reader is recommended to consult Smart [568], Stinson [588] or Vaudenay [612].
An aim of the present book is to collect in one place all the necessary background
and results for a deep understanding of public key cryptography. Ultimately, the text
presents what I believe is the core mathematics required for current research in public
key cryptography and it is what I would want my PhD students to know.
The remainder of this chapter states some fundamental definitions in public key cryptography and illustrates them using the RSA cryptosystem.
1.1 Public Key Cryptography

1.2 The Textbook RSA Cryptosystem
We briefly describe the textbook RSA cryptosystem. The word "textbook" indicates
that, although the RSA cryptosystem as presented below appears in many papers and
books, this is definitely not how it should be used in the real world. In particular, public
key encryption is most commonly used to transmit keys (the functionality is often called
key transport or key encapsulation), rather than to encrypt data. Chapter 24 gives many
more details about RSA including, in Section 24.7, a very brief discussion of padding
schemes for use in real applications.
Alice chooses two large primes p and q of similar size and computes N = pq. Alice
also chooses e ∈ ℕ coprime to φ(N) = (p − 1)(q − 1) and computes d ∈ ℕ such that
ed ≡ 1 (mod φ(N)).
Alice's RSA public key is the pair of integers (N, e) and her private key is the integer
d. To encrypt a message to Alice, Bob does the following:
1. Obtain an authentic copy of Alice's public key (N, e). This step may require trusted
third parties and public key infrastructures, which are outside the scope of this book;
see Chapter 12 of Smart [568] or Chapter 12 of Stinson [588]. We suppress this issue
in the book.
2. Encode the message as an integer 1 ≤ m < N.
Note that m does not necessarily lie in (Z/NZ)*. However, if p, q ≈ √N then the
probability that gcd(m, N) > 1 is (p + q − 1)/(N − 1) ≈ 2/√N. Hence, in practice
one may assume that m ∈ (Z/NZ)*.¹
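The scheme can be sketched in a few lines of Python. The primes below are artificially small and there is no padding, so this is purely pedagogical; the remaining steps of the scheme are the standard ones, namely Bob computes c = m^e mod N and Alice recovers m = c^d mod N.

```python
import math

# Textbook RSA with artificially small primes -- for illustration only.
# KeyGen (Alice): choose primes p, q, set N = pq, pick e coprime to
# phi(N) = (p-1)(q-1), and compute d with e*d = 1 (mod phi(N)).
p, q = 1019, 1031
N = p * q
phi = (p - 1) * (q - 1)
e = 3
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)             # modular inverse of e mod phi(N) (Python 3.8+)

def encrypt(m, public_key):
    """Bob: c = m^e mod N."""
    N, e = public_key
    return pow(m, e, N)

def decrypt(c, private_key):
    """Alice: m = c^d mod N."""
    N, d = private_key
    return pow(c, d, N)

m = 123456                      # message encoded as an integer 1 <= m < N
c = encrypt(m, (N, e))
assert decrypt(c, (N, d)) == m  # correctness: Decrypt(Encrypt(m, pk), sk) = m
```

Note that the primes were chosen with p ≡ q ≡ 2 (mod 3) so that e = 3 is coprime to φ(N).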
1. Suppose the RSA cryptosystem is being used for an online election to provide privacy
of an individual's vote to everyone outside the electoral office.² Each voter encrypts
their vote under the public key of the electoral office and then sends their vote by
email. Voters don't want any other member of the public to know who they voted
for.
Suppose the eavesdropper Eve is monitoring internet traffic from Alice's computer
and makes a copy of the ciphertext corresponding to her vote. Since encryption is
deterministic and there is only a short list of possible candidates, it is possible for
Eve to compute each possible vote by encrypting each candidate's name under the
public key. Hence, Eve can deduce who Alice voted for.
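Eve's attack needs nothing beyond the public key: she re-encrypts every candidate and compares. A sketch (toy key size and made-up candidate encodings, purely illustrative):

```python
# Deterministic encryption + a small message space = total loss of privacy.
N, e = 1019 * 1031, 3                  # toy public key of the electoral office

def encrypt(m):
    return pow(m, e, N)                # textbook RSA is deterministic

votes = {"Smith": 101, "Jones": 202, "Garcia": 303}   # candidate -> encoding

intercepted = encrypt(votes["Garcia"])  # Alice's ciphertext, copied by Eve

# Eve encrypts each possible vote and matches it against the ciphertext.
recovered = next(name for name, m in votes.items() if encrypt(m) == intercepted)
assert recovered == "Garcia"
```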
2. To speed up encryption it is tempting to use small encryption exponents, such as
e = 3 (assuming that N = pq where p ≡ q ≡ 2 (mod 3)). Now suppose Bob is
only sending a very small message 0 < m < N^(1/3) to Alice; this is quite likely, since
public key cryptography is most often used to securely transmit symmetric keys.
Then c = m³ in ℕ, i.e., no modular reduction has taken place. An adversary can
therefore compute the message m from the ciphertext c by taking cube roots in ℕ
(using numerical analysis techniques).
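Recovering such a message requires only an exact integer cube root, no factoring. A sketch (the modulus below is a stand-in built from two Mersenne primes; any N with m³ < N shows the effect):

```python
# Small-message attack on textbook RSA with e = 3: if m < N^(1/3) then
# c = m^3 holds over the integers, so m is exactly the cube root of c.
N = (2**127 - 1) * (2**89 - 1)     # stand-in modulus (~216 bits)
m = 0xDEADBEEFCAFEBABE             # 64-bit message, so m^3 has ~192 bits < N
c = pow(m, 3, N)                   # "encryption": no reduction actually occurs

def icbrt(n):
    """Integer cube root of n by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** 3 < n:
            lo = mid + 1
        else:
            hi = mid
    return lo                      # smallest x with x^3 >= n

assert c == m ** 3                 # the reduction mod N never happened
assert icbrt(c) == m               # adversary recovers m from c alone
```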
3. A good encryption scheme should allow an adversary to learn absolutely nothing
about a message from the ciphertext. But with the RSA cryptosystem one can
compute the Jacobi symbol (m/N) of the message by computing (c/N) (this can be
computed efficiently without knowing the factorisation of N; see Section 2.4). The
details are Exercise 24.1.11.
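The leak is easy to demonstrate: since c ≡ m^e (mod N) with e odd, (c/N) = (m/N)^e = (m/N). A sketch with a standard Jacobi-symbol routine (toy modulus, illustrative only):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed without factoring n."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:          # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

N, e = 1019 * 1031, 3              # toy RSA public key
for m in (5, 17, 1234, 98765):
    c = pow(m, e, N)               # (c/N) = (m/N)^e = (m/N) since e is odd
    assert jacobi(c, N) == jacobi(m, N)
```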
The above three attacks may be serious attacks for some applications, but not for
others. However, a cryptosystem designer often has little control over the applications
in which their system is to be used. Hence it is preferable to have systems that are not
vulnerable to attacks of the above form. In Section 24.7 we will explain how to secure
RSA against these sorts of attacks, by making the encryption process randomised and by
using padding schemes that encode short messages as sufficiently large integers and that
destroy algebraic relationships between messages.
1.3 Formal Definition of Public Key Cryptography

Definition 1.3.1 specifies a public key encryption scheme in terms of:

the space of all possible messages;
the space of all possible public keys;
the space of all possible private keys;
the space of all possible ciphertexts;

and the algorithms KeyGen, Encrypt and Decrypt.

² Much more interesting electronic voting schemes have been invented. This unnatural example is
chosen purely for pedagogical purposes.
It is required that
Decrypt(Encrypt(m, pk), sk) = m
if (pk, sk) is a matching key pair. Typically we require that the fastest known attack on
the system requires at least 2^κ bit operations, where κ is the security parameter.
Example 1.3.2. We sketch how to write textbook RSA encryption in the format of
Definition 1.3.1. The KeyGen algorithm takes input κ and outputs a modulus N that is
a product of two randomly chosen primes of a certain length, as well as an encryption
exponent e.
Giving a precise recipe for the bit-length of the primes as a function of the security
parameter is non-trivial for RSA. The complexity of the best factoring algorithms implies
that we need 2^κ ≤ L_N(1/3, c) for some constant c (see Chapter 15 for this notation and
an explanation of factoring algorithms). This implies that log(N) = O(κ³) and so the
bit-length of the public key is bounded by a polynomial in κ. A typical benchmark is
that if κ = 128 (i.e., so that there is no known attack on the system performing fewer
than 2^128 bit operations) then N is a product of two 1536-bit primes.
As we will discuss in Chapter 12, one can generate primes in expected polynomial-time
and hence KeyGen is a randomised algorithm with expected polynomial-time complexity.
The message space M depends on the randomised padding scheme being used. The
ciphertext space C in this case is (Z/NZ)*, which does not agree with Definition 1.3.1 as
it does not depend only on κ. Instead one usually takes C to be the set of log₂(N)-bit
strings.
The Encrypt and Decrypt algorithms are straightforward (though the details depend
on the padding scheme). The correctness condition is easily checked.
1.3.1
Security of Encryption
We now give precise definitions for the security of public key encryption. An adversary
is a randomised polynomial-time algorithm that interacts with the cryptosystem in some
way. It is necessary to define the attack model, which specifies the way the adversary
can interact with the cryptosystem. It is also necessary to define the attack goal of
the adversary. For further details of these issues see Sections 10.2 and 10.6 of Katz and
Lindell [331], Section 1.13 of Menezes, van Oorschot and Vanstone [415], or Section 15.1
of Smart [568].
We first list the attack goals for public key encryption. The most severe one
is the total break, where the adversary computes a private key. There are three other
commonly studied attacks, and they are usually formulated as security properties (the
security property is the failure of an adversary to achieve its attack goal).
The word oracle is used below. This is just a fancy name for a magic box that takes
some input and then outputs the correct answer in constant time. Precise definitions are
given in Section 2.1.3.
One way encryption (OWE): Given a challenge ciphertext c the adversary cannot compute the corresponding message m.
Semantic security: An adversary learns no information at all about a message
from its ciphertext, apart from possibly the length of the message.
This concept is made precise as follows: Assume all messages in M have the same
length. A semantic security adversary is a randomised polynomial-time algorithm A that first chooses a function f : M → {0, 1} such that the probability, over
uniformly chosen m ∈ M, that f(m) = 1 is 1/2. The adversary A then takes as
input a challenge (c, pk), where c is the encryption of a random message m ∈ M,
and outputs a bit b. The adversary is successful if b = f(m).
Note that the standard definition of semantic security allows messages m M to
be drawn according to any probability distribution. We have simplified to the case
of the uniform distribution on M .
Indistinguishability (IND): An adversary cannot distinguish the encryption of
any two messages m_0 and m_1, chosen by the adversary, of the same length.
This concept is made precise by defining an indistinguishability adversary to
be a randomised polynomial-time algorithm A that plays the following game with a
challenger: First the challenger generates a public key and gives it to A. Then (this is
the first phase of the attack) A performs some computations (and possibly queries
to oracles) and outputs two equal length messages m_0 and m_1. The challenger
computes the challenge ciphertext c (which is an encryption of m_b where b ∈
{0, 1} is randomly chosen) and gives it to A. In the second phase the adversary A
performs more calculations (and possibly oracle queries) and outputs a bit b′. The
adversary is successful if b′ = b.
For a fixed value κ one can consider the probability that an adversary is successful over
all public keys pk output by KeyGen, and (except when studying a total break adversary)
all challenge ciphertexts c output by Encrypt, and over all random choices made by the
adversary. The adversary breaks the security property if the success probability of the
adversary is noticeable as a function of κ (see Definition 2.1.10 for the terms noticeable
and negligible). The cryptosystem achieves the security property if every polynomial-time
adversary has negligible success probability as a function of κ. An adversary that works
with probability 1 is called a perfect adversary.
We now list the three main attack models for public key cryptography.
Passive attack/chosen plaintext attack (CPA): The adversary is given the
public key.
Lunchtime attack (CCA1):3 The adversary has the public key and can also ask
for decryptions of ciphertexts of its choosing during the first stage of the attack
(i.e., before the challenge ciphertext is received).
Adaptive chosen-ciphertext attack (CCA): (Also denoted CCA2.) The adversary has the public key and is given access to a decryption oracle O that will
provide decryptions of any ciphertext of its choosing, with the restriction that O
outputs ⊥ in the second phase of the attack if the challenge ciphertext is submitted
to O.
3 The name comes from an adversary who breaks into someone's office during their lunch break,
interacts with their private key in some way, and then later in the day tries to decrypt a ciphertext.
One can consider an adversary against any of the above security properties in any of
the above attack models. For example, the strongest security notion is indistinguishability
under an adaptive chosen ciphertext attack. A cryptosystem that achieves this security
level is said to have IND-CCA security. It has become standard in theoretical cryptography to insist that all cryptosystems have IND-CCA security. This is not because
CCA attacks occur frequently in the real world, but because a scheme that has IND-CCA
security should also be secure against any real-world attacker.4
Exercise 1.3.3. Show that the textbook RSA cryptosystem does not have IND-CPA
security.
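The attack behind Exercise 1.3.3 hinges on textbook RSA encryption being deterministic. A minimal Python sketch of the IND-CPA adversary (the tiny modulus is an assumption purely for illustration; real RSA parameters are thousands of bits):

```python
import random

# Toy textbook RSA parameters: illustration only, nowhere near secure sizes.
p, q = 101, 113
N = p * q                      # public modulus
e = 3                          # public exponent; gcd(3, (p-1)*(q-1)) = 1

def encrypt(m):
    # Textbook RSA: deterministic, no randomised padding.
    return pow(m, e, N)

# IND-CPA game: the adversary chooses m0, m1; the challenger encrypts m_b.
m0, m1 = 2, 3
b = random.randrange(2)
c = encrypt([m0, m1][b])

# Since Encrypt is deterministic and the adversary has the public key,
# it simply re-encrypts m0 and m1 and compares with the challenge.
b_guess = 0 if c == encrypt(m0) else 1
assert b_guess == b            # the adversary wins with probability 1
```

Any deterministic public key encryption scheme fails IND-CPA in exactly this way, which is one motivation for randomised padding.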
Exercise 1.3.4. Show that the textbook RSA cryptosystem does not have OWE-CCA
security.
Exercise 1.3.5. Prove that if a cryptosystem has IND security under some attack model
then it has semantic security under the same attack model.
1.3.2
Security of Signatures
Definition 1.3.6. A signature scheme is defined, analogously to encryption, by message, signature and key spaces depending on a security parameter κ. There is a KeyGen
algorithm and algorithms:
Sign(m, sk), which takes a message and the private key and outputs a signature s;
Verify(m, s, pk), which takes a message, a signature and the public key and outputs valid or invalid.
We require that Verify(m, Sign(m, sk), pk) = valid. Typically, we require that all known
algorithms to break the signature scheme require at least 2^κ bit operations.
The main attack goals for signatures are the following (for more discussion see
Goldwasser, Micali and Rivest [257], Section 12.2 of Katz and Lindell [331], Section 15.4
of Smart [568], or Section 7.2 of Stinson [588]):
Total break: An adversary can obtain the private key for the given public key.
Selective forgery: (Also called target message forgery.) An adversary can
generate a valid signature for the given public key on any message.
Existential forgery: An adversary can generate a pair (m, s) where m is a message
and s is a signature for the given public key on that message.
The acronym UF stands for the security property unforgeable. In other words,
a signature scheme has UF security if every polynomial-time existential forgery
algorithm succeeds with only negligible probability. Be warned that some authors
use UF to denote universal forgery, which is another name for selective forgery.
As with encryption there are various attack models.
Passive attack: The adversary is given the public key only. This is also called a
public key only attack.
4 Of course, there are attacks that lie outside the attack model we are considering, such as side-channel
attacks or attacks by dishonest system administrators.
Known message attack: The adversary is given various sample message-signature
pairs for the public key.
Adaptive chosen-message attack (CMA): The adversary is given a signing
oracle that generates signatures for the public key on messages of their choosing.
In this case, signature forgery usually means producing a valid signature s for the
public key pk on a message m such that m was not already queried to the signing
oracle for key pk. Another notion, which we do not consider further in this book, is
strong forgery; namely to output a valid signature s on m for public key pk such
that s is not equal to any of the outputs of the signing oracle on m.
As with encryption, one says the signature scheme has the stated security property
under the stated attack model if there is no polynomial-time algorithm A that solves the
problem with noticeable success probability under the appropriate game. The standard
notion of security for digital signatures is UF-CMA security.
Exercise 1.3.7. Give a precise definition for UF-CMA security.
Exercise 1.3.8. Do textbook RSA signatures have selective forgery security under a
passive attack?
Exercise 1.3.9. Show that there is a passive existential forgery attack on textbook
RSA signatures.
Exercise 1.3.10. Show that, under a chosen-message attack, one can selectively forge
textbook RSA signatures.
Chapter 2
2.1
We assume the reader is already familiar with computers, computation and algorithms.
General references for this section are Chapter 1 of Cormen, Leiserson, Rivest and Stein [145],
Davis and Weyuker [166], Hopcroft and Ullman [292], Section 3.1 of Shoup [552], Sipser
[564] and Talbot and Welsh [596].
Rather than using a fully abstract model of computation, such as Turing machines,
we consider all algorithms as running on a digital computer with a typical instruction set,
an infinite number of bits of memory and constant-time memory access. This is similar
to the random access machine (or register machine) model; see Section 3.6 of [22], [138],
Section 2.2 of [145], Section 7.6 of [292] or Section 3.2 of [552]. We think of an algorithm
as a sequence of bit operations, though it is more realistic to consider word operations.
A computational problem is specified by an input (of a certain form) and an output
(satisfying certain properties relative to the input). An instance of a computational
problem is a specific input. The input size of an instance of a computational problem
is the number of bits required to represent the instance. The output size of an instance
of a computational problem is the number of bits necessary to represent the output. A
decision problem is a computational problem where the output is either yes or no.
As an example, we give one of the most important definitions in the book.
Definition 2.1.1. Let G be a group written in multiplicative notation. The discrete
logarithm problem (DLP) is: Given g, h ∈ G, to find a, if it exists, such that h = g^a.
In Definition 2.1.1 the input is a description of the group G together with the group
elements g and h, and the output is a or the failure symbol ⊥ (to indicate that h ∉ ⟨g⟩).
Typically G is an algebraic group over a finite field and the order of g is assumed to be
known. We stress that an instance of the DLP, according to Definition 2.1.1, includes
the specification of G, g and h; so one must understand that they are all allowed to vary
(note that, in many cryptographic applications one considers the group G and element
g as being fixed; we discuss this in Exercise 21.1.2). As explained in Section 2.1.2, a
computational problem should be defined with respect to an instance generator; in the
absence of any further information it is usual to assume that the instances are chosen
uniformly from the space of all possible inputs of a given size. In particular, for the
DLP it is usual to denote the order of g by r and to assume that h = g a where a is
chosen uniformly in Z/rZ. The output is the integer a (e.g., written in binary). The
input size depends on the specific group G and the method used to represent it. If h can
take all values in ⟨g⟩ then one needs at least log2(r) bits to specify h from among the r
possibilities. Hence, the input size is at least log2 (r) bits. Similarly, if the output a is
uniformly distributed in Z/rZ then the output size is at least log2 (r) bits.
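To make the input/output convention of Definition 2.1.1 concrete, here is a brute-force DLP solver in Python (the helper name and group encoding are assumptions for illustration; the running time is exponential in the input size log2(r), so this is usable only for toy groups):

```python
def dlp_bruteforce(g, h, r, mul, identity):
    """Solve h = g^a by trying a = 0, 1, ..., r-1 in turn.

    g, h: group elements; r: order of g; mul: group operation.
    Returns a, or None as the failure symbol when h is not in <g>.
    """
    x = identity
    for a in range(r):
        if x == h:
            return a
        x = mul(x, g)          # x = g^a after a steps
    return None

# Example in the multiplicative group modulo p = 101, where g = 2
# is a primitive root, so it has order r = 100.
p = 101
mul = lambda x, y: (x * y) % p
a = dlp_bruteforce(2, pow(2, 37, p), 100, mul, 1)
assert a == 37
```

Passing the group operation as a parameter reflects the point made above: the instance includes a specification of the group G, not just the elements g and h.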
An algorithm to solve a computational problem is called deterministic if it does not
make use of any randomness. We will study the asymptotic complexity of deterministic
algorithms by counting the number of bit operations performed by the algorithm expressed
as a function of the input size. Upper bounds on the complexity are presented using big
O notation. When giving complexity estimates using big O notation we implicitly assume
that there is a countably infinite number of possible inputs to the algorithm.
Definition 2.1.2. Let f, g : N → R_{>0}. Write f = O(g) if there are c ∈ R_{>0} and N ∈ N
such that
f(n) ≤ c g(n)
for all n ≥ N.
Similarly, if f(n_1, . . . , n_m) and g(n_1, . . . , n_m) are functions from N^m to R_{>0} then we
write f = O(g) if there are c ∈ R_{>0} and N_1, . . . , N_m ∈ N such that f(n_1, . . . , n_m) ≤
c g(n_1, . . . , n_m) for all (n_1, . . . , n_m) ∈ N^m with n_i ≥ N_i for all 1 ≤ i ≤ m.
Example 2.1.3. 3n^2 + 2n + 1 = O(n^2), n + sin(n) = O(n), n^100 + 2^n = O(2^n), log10(n) =
O(log(n)).
Exercise 2.1.4. Show that if f(n) = O(log(n)^a) and g(n) = O(log(n)^b) then (f + g)(n) =
f(n) + g(n) = O(log(n)^{max{a,b}}) and (f g)(n) = f(n)g(n) = O(log(n)^{a+b}). Show that
O(n^c) = O(2^{c log2(n)}).
We also present the little o, soft O, big Omega and big Theta notation.
These will only ever be used in this book for functions of a single argument.
Definition 2.1.5. Let f, g : N → R_{>0}. Write f(n) = o(g(n)) if
lim_{n→∞} f(n)/g(n) = 0.
Exercise 2.1.8. Show that n^{a log(log(n))} and n^{a log(n)}, for some a ∈ R_{>0}, are functions
that are Ω(n^c) and O(c^n) for all c ∈ R_{>1}.
For more information about computational complexity, including the definitions of
complexity classes such as P and NP, see Chapters 2 to 4 of [596], Chapter 13 of [292],
Chapter 15 of [166], Chapter 7 of [564] or Chapter 34 of [145]. Definition 2.1.7 is for
uniform complexity, as a single algorithm A solves all problem instances. One can also
consider non-uniform complexity, where one has an algorithm A and, for each n ∈ N, a
polynomially sized auxiliary input h(n) (the hint) such that if x is an n-bit instance of the
computational problem then A(x, h(n)) solves the instance. An alternative definition is a
sequence A_n of algorithms, one for each input size n ∈ N, and such that the description
of the algorithm is polynomially bounded. We stress that the hint is not required to be
efficiently computable. We refer to Section 4.6 of Talbot and Welsh [596] for details.
Complexity theory is an excellent tool for comparing algorithms, but one should always
be aware that the results can be misleading. For example, it can happen that there are
several algorithms to solve a computational problem and that the one with the best
complexity is slower than the others for the specific problem instance one is interested in
(for example, see Remark 2.2.5).
2.1.1
Randomised Algorithms
All our algorithms may be randomised, in the sense that they have access to a random
number generator. A deterministic algorithm should terminate after a finite number of
steps but a randomised algorithm can run forever if an infinite sequence of unlucky random choices is made.1 Also, a randomised algorithm may output an incorrect answer for
1 In algorithmic number theory it is traditional to allow algorithms that do not necessarily terminate,
whereas in cryptography it is traditional to consider algorithms whose running time is bounded (typically
by a polynomial in the input size). Indeed, in security reductions it is crucial that an adversary (i.e.,
randomised algorithm) always terminates. Hence, some of the definitions in this section (e.g., Las Vegas
algorithms) mainly arise in the algorithmic number theory literature.
2.1.2
Throughout the book we give very simple definitions (like Definition 2.1.1) for computational problems. However, it is more subtle to define what it means for a randomised
algorithm A to solve a computational problem. A perfect algorithm is one whose output is always correct (i.e., it always succeeds). We also consider algorithms that give the
correct answer only for some subset of the problem instances, or for all instances but only
with a certain probability.
The issue of whether an algorithm is successful is handled somewhat differently by
the two communities whose work is surveyed in this book. In the computational number
theory community, algorithms are expected to solve all problem instances with probability
of success close to 1. In the cryptography community it is usual to consider algorithms that
only solve some noticeable (see Definition 2.1.10) proportion of problem instances, and
even then only with some noticeable probability. The motivation for the latter community
2 An alternative definition is that a Las Vegas algorithm has finite expected running time, and outputs
either a correct result or the failure symbol ⊥.
2.1.3
Reductions
An oracle for a computational problem takes one unit of running time, independent of
the size of the instance, and returns an output. An oracle that always outputs a correct
answer is called a perfect oracle. One can consider oracles that only output a correct
answer with a certain noticeable probability (or advantage). For simplicity we usually
assume that oracles are perfect and leave the details in the general case as an exercise for
the reader. We sometimes use the word reliable for an oracle whose success probability
is overwhelming (i.e., success probability 1 − ϵ where ϵ is negligible) and unreliable for
an oracle whose success probability is small (but still noticeable).
Note that the behaviour of an oracle is only defined if its input is a valid instance of the
computational problem it solves. Similarly, the oracle performs with the stated success
probability only if it is given problem instances drawn with the correct distribution from
the set of all problem instances.
Definition 2.1.15. A reduction from problem A to problem B is a randomised algorithm to solve problem A (running in expected polynomial-time and having noticeable
success probability) by making queries to an oracle (which succeeds with noticeable probability) to solve problem B.
If there is a reduction from problem A to problem B then we write^5
A ≤_R B.
Theorem 2.1.16. Let A and B be computational problems such that A ≤_R B. If there
is a polynomial-time randomised algorithm to solve B then there is a polynomial-time
randomised algorithm to solve A.
A reduction between problems A and B therefore shows that if one can solve B
then one can solve A. This means that solving A has been reduced to solving problem
B, and we can infer that problem B is at least as hard as problem A, or that problem A
is no harder than problem B.
Since oracle queries take one unit of running time and reductions are polynomial-time
algorithms, a reduction makes only polynomially many oracle queries.
Definition 2.1.17. If there is a reduction from A to B and a reduction from B to A then
we say that problems A and B are equivalent and write A ≡_R B.
Some authors use the phrases polynomial-time reduction and polynomial-time
equivalent in place of reduction and equivalence. However, these terms have a technical
meaning in complexity theory that is different from reduction (see Section 34.3 of [145]).
Definition 2.1.15 is closer to the notion of Turing reduction, except that we allow randomised algorithms whereas a Turing reduction is a deterministic algorithm. We abuse
terminology and define the terms subexponential-time reduction and exponential-time reduction by relaxing the condition in Definition 2.1.15 that the algorithm be
polynomial-time (these terms are used in Section 21.4.3).
2.1.4
Random Self-Reducibility
There are two different ways that an algorithm or oracle can be unreliable: First, it may
be randomised and only output the correct answer with some probability; such a situation
is relatively easy to deal with by repeatedly running the algorithm/oracle on the same
input. The second situation, which is more difficult to handle, is when there is a subset
of problem instances for which the algorithm or oracle extremely rarely or never outputs
the correct solution; for this situation random self-reducibility is essential. We give a
definition only for the special case of computational problems in groups.
Definition 2.1.18. Let P be a computational problem for which every instance of the
problem is an n_1-tuple of elements of some cyclic group G of order r and such that the
solution is an n_2-tuple of elements of G together with an n_3-tuple of elements of Z/rZ
(where n_2 or n_3 may be zero).
The computational problem P is random self-reducible if there is a polynomial-time algorithm that transforms an instance of the problem (with elements in a group G)
into a uniformly random instance of the problem (with elements in the same group G)
such that the solution to the original problem can be obtained in polynomial-time from
the solution to the new instance.
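For the DLP this is concrete: given an instance (g, h) with h = g^a in a group of known order r, multiplying h by g^x for uniform x yields a uniformly random instance with the same g. A Python sketch of this simple version of the self-reduction (the toy group and the brute-force stand-in for the oracle are assumptions for illustration):

```python
import random

# Random self-reduction of the DLP (sketch): blind h = g^a by a uniform
# power of g, solve the blinded instance, then unblind the answer.
p, r, g = 101, 100, 2            # toy group: g = 2 generates (Z/101Z)^*, order 100
a_secret = 73
h = pow(g, a_secret, p)

x = random.randrange(r)          # uniform blinding exponent
h_new = (h * pow(g, x, p)) % p   # uniformly random instance (g, h_new)

# Pretend an oracle solves the new instance (brute force stands in here):
a_new = next(b for b in range(r) if pow(g, b, p) == h_new)

a = (a_new - x) % r              # unblind: log(h) = log(h_new) - x (mod r)
assert a == a_secret
```

The point is that an oracle which only works for some noticeable subset of instances can still be used on every instance, because the blinded instance is uniformly distributed.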
5 The subscript R denotes the word reduction and should also remind the reader that our reductions
are randomised algorithms.
The probability that all n trials are incorrect is at most (1 − ϵ)^n < (e^{−ϵ})^{log(1/δ)/ϵ} = δ.
2.2
Integer Operations
We now begin our survey of efficient computer arithmetic. General references for this
topic are Section 9.1 of Crandall and Pomerance [161], Section 3.3 of Shoup [552], Section
4.3.1 of Knuth [340], Chapter 1 of Brent-Zimmermann [100] and von zur Gathen and
Gerhard [237].
Integers are represented as a sequence of binary words. Operations like add or multiply
may correspond to many bit or word operations. The length of an unsigned integer a
represented in binary is
len(a) = ⌊log2(a)⌋ + 1 if a ≠ 0, and len(a) = 1 if a = 0.
For a signed integer we define len(a) = len(|a|) + 1.
4. Suppose |a| > |b|. One can compute q and r such that a = bq + r and 0 ≤ r < |b| in
O(log(b) log(q)) = O(log(b)(log(a) − log(b) + 1)) bit operations.
Proof: Only the final statement is non-trivial. The school method of long division
computes q and r simultaneously and requires O(log(q) log(a)) bit operations. It is more
efficient to compute q first by considering only the most significant log2(q) bits of a, and
then to compute r as a − bq. For more details see Section 4.3.1 of [340], Section 2.4 of [237]
or Section 3.3.4 of [552].
2.2.1
An important discovery is that it is possible to multiply integers more quickly than the
school method. General references for this subject include Section 9.5 of [161], Section
4.3.3 of [340], Section 3.5 of [552] and Section 1.3 of [100].
Karatsuba multiplication is based on the observation that one can compute (a_0 +
2^n a_1)(b_0 + 2^n b_1), where a_0, a_1, b_0 and b_1 are n-bit integers, in three multiplications of
n-bit integers rather than four.
Exercise 2.2.3. Prove that the complexity of Karatsuba multiplication of n-bit integers
is O(n^{log2(3)}) = O(n^{1.585}) bit operations.
[Hint: Assume n is a power of 2.]
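A compact Python sketch of Karatsuba's recursion (the base-case threshold of 2^32 is an arbitrary assumption; production code tunes the crossover to the school method, as discussed in Remark 2.2.5):

```python
def karatsuba(a, b):
    """Multiply non-negative integers using three half-size multiplications."""
    if a < 2**32 or b < 2**32:
        return a * b                        # base case: hardware multiplication
    n = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> n, a & ((1 << n) - 1)     # a = a0 + 2^n * a1
    b1, b0 = b >> n, b & ((1 << n) - 1)     # b = b0 + 2^n * b1
    low = karatsuba(a0, b0)                 # a0*b0
    high = karatsuba(a1, b1)                # a1*b1
    # One extra multiplication gives the middle term:
    mid = karatsuba(a0 + a1, b0 + b1) - low - high   # a0*b1 + a1*b0
    return low + (mid << n) + (high << (2 * n))

assert karatsuba(2**100 + 7, 2**90 + 11) == (2**100 + 7) * (2**90 + 11)
```

The three recursive calls on inputs of half the size give the recurrence T(n) = 3T(n/2) + O(n), which is the source of the O(n^{log2(3)}) bound in Exercise 2.2.3.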
Toom–Cook multiplication is a generalisation of Karatsuba. Fix a value k and
suppose a = a_0 + a_1 2^n + a_2 2^{2n} + · · · + a_k 2^{kn} and similarly for b. One can think of a and b as
being polynomials in x of degree k evaluated at x = 2^n, and we want to compute the product
c = ab, which is a polynomial of degree 2k in x evaluated at x = 2^n. The idea is to
compute the coefficients of the polynomial c using polynomial interpolation and therefore
to recover c. The arithmetic is fast if the polynomials are evaluated at small integer
values. Hence, we compute c(1) = a(1)b(1), c(−1) = a(−1)b(−1), c(2) = a(2)b(2), etc.
The complexity of Toom–Cook multiplication for n-bit integers is O(n^{log_{k+1}(2k+1)}) (e.g.,
when k = 2 the complexity is O(n^{1.465})). For more details see Section 9.5.1 of [161].
Exercise 2.2.4. Give an algorithm for Toom-Cook multiplication with k = 3.
Schönhage–Strassen multiplication multiplies n-bit integers in nearly linear time,
namely O(n log(n) log(log(n))) bit operations, using the fast Fourier transform (FFT).
The Fürer algorithm is slightly better. These algorithms are not currently used in the
implementation of RSA or discrete logarithm cryptosystems so we do not describe them in
this book. We refer to Sections 9.5.2 to 9.5.7 of Crandall and Pomerance [161], Chapter
8 of von zur Gathen and Gerhard [237], Chapter 2 of Brent and Zimmermann [100],
Turk [607] and Chapter 4 of Borodin and Munro [88] for details.
Another alternative is residue number arithmetic which is based on the Chinese remainder theorem. It reduces large integer operations to modular computations for some
set of moduli. This idea may be useful if one can exploit parallel computation (though for
any given application there may be more effective uses for parallelism). These methods
are not used frequently for cryptography so interested readers are referred to Section II.1.2
of [64], Section 14.5.1 of [415], Remark 10.53(ii) of [16], and Section 4.3.2 of [340].
Remark 2.2.5. In practice, the school method is fastest for small numbers. The
crossover point (i.e., when Karatsuba becomes faster than the school method) depends
on the word size of the processor and many other issues, but seems to be for numbers of
around 300–1000 bits (i.e., 90–300 digits) for most computing platforms. For a popular
32-bit processor Zimmermann [638] reports that Karatsuba beats the school method
for integers of 20 words (640 bits) and Toom–Cook with k = 3 beats Karatsuba at 77
words (2464 bits). Bentahar [43] reports crossovers of 23 words (i.e., about 700 bits)
and 133 words (approximately 4200 bits) respectively. The crossover point for the FFT
methods is much larger. Hence, for elliptic curve cryptography at current security levels
the school method is usually used, while for RSA cryptography the Karatsuba method
is usually used.
Definition 2.2.6. Denote by M(n) the number of bit operations to perform a multiplication of n-bit integers.
For the remainder of the book we assume that M(n) = c_1 n^2 for some constant c_1
when talking about elliptic curve arithmetic, and that M(n) = c_2 n^{1.585} for some constant
c_2 when talking about RSA.
Applications of Newton's Method
Recall that if F : R → R is differentiable and if x_0 is an approximation to a zero of F(x)
then one can efficiently get a very close approximation to the zero by running Newton's
iteration
x_{n+1} = x_n − F(x_n)/F′(x_n).
Newton's method has quadratic convergence, in general, so the precision of the approximation roughly doubles at each iteration.
Integer Division
There are a number of fast algorithms to compute ⌊a/b⌋ for a, b ∈ N. This operation has
important applications to efficient modular arithmetic (see Section 2.5). Section 10.5 of
[16] gives an excellent survey.
We now present an application of Newton's method to this problem. The idea is
to compute a good rational approximation to 1/a by finding a root of F(x) = x^{−1} − a.
Exercise 2.2.7. Show that the Newton iteration for F(x) = x^{−1} − a is x_{n+1} = 2x_n − a x_n^2.
First we recall that a real number α can be represented by a rational approximation
b/2^e where b, e ∈ Z. A key feature of this representation (based on the fact that division
by powers of 2 is easy) is that if we know that |α − b/2^e| < 1/2^k (i.e., the result is correct
to precision k) then we can renormalise the representation by replacing the approximation
b/2^e by ⌊b/2^{e−k}⌋/2^k.
Suppose 2^m ≤ a < 2^{m+1}. Then we take x_0 = b_0/2^{e_0} = 1/2^m as the first approximation
to 1/a. In other words, b_0 = 1 and e_0 = m. The Newton iteration in this case is
e_{n+1} = 2e_n and b_{n+1} = b_n(2^{e_n+1} − a b_n), which requires two integer multiplications.
To prevent exponential growth of the numbers b_n one can renormalise the representation
according to the expected precision of that step. One can show that the total complexity of
getting an approximation to 1/a of precision m is O(M(m)) bit operations. For details see
Section 3.5 of [552] (especially Exercise 3.35), Chapter 9 of [237] or, for a slightly different
formulation, Section 9.2.2 of [161]. Applications of this idea to modular arithmetic will
be given in Section 2.5.
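The iteration above can be run exactly on integers. A Python sketch without the renormalisation step (so the numerators b_n grow quickly; renormalising as described keeps them at the working precision):

```python
def reciprocal_newton(a, iters):
    """Newton iteration x_{n+1} = 2*x_n - a*x_n^2 for 1/a, with x_n = b_n/2^{e_n}.

    Implements the recurrence b_{n+1} = b_n*(2^{e_n + 1} - a*b_n), e_{n+1} = 2*e_n,
    starting from x_0 = 1/2^m where 2^m <= a < 2^{m+1}.
    """
    m = a.bit_length() - 1            # 2^m <= a < 2^{m+1}
    b, e = 1, m                       # x_0 = 1/2^m
    for _ in range(iters):
        b, e = b * ((1 << (e + 1)) - a * b), 2 * e
    return b, e                       # b/2^e approximates 1/a

a = 12345
b, e = reciprocal_newton(a, 6)
# Quadratic convergence: after 6 iterations the relative error of b/2^e
# as an approximation to 1/a is below 2^-50.
assert (1 << e) >= a * b
assert ((1 << e) - a * b) << 50 < (1 << e)
```

Each step exactly squares the relative error ϵ_n = 1 − a b_n/2^{e_n}, which is the quadratic convergence mentioned above.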
Integer Approximations to Real Roots of Polynomials
Let F(x) ∈ Z[x]. Approximations to roots of F(x) in R can be computed using Newton's
method. As a special case, integer square roots of m-bit numbers can be computed in
time proportional to the cost of a multiplication of two m-bit numbers. Similarly, other
roots (such as cube roots) can be computed in polynomial-time.
Exercise 2.2.8. Show that the Newton iteration for computing a square root of a is
x_{n+1} = (x_n + a/x_n)/2. Hence, write down an algorithm to compute an integer approximation to the square root of a.
Exercise 2.2.8 can be used to test whether an integer a is a square. An alternative
is to compute the Legendre symbol ( a/p ) for some random small primes p. For details see
Exercise 2.4.9.
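The algorithm requested in Exercise 2.2.8 can be sketched as follows; the termination condition is the standard one for the all-integer Newton iteration (this mirrors what `math.isqrt` provides in recent Python versions):

```python
def isqrt(n):
    """Integer square root floor(sqrt(n)) via Newton's method.

    Uses the iteration x_{n+1} = (x_n + n // x_n) // 2 from Exercise 2.2.8,
    started from an initial guess that is >= sqrt(n), so the sequence
    decreases until it reaches floor(sqrt(n)).
    """
    if n < 2:
        return n
    x = 1 << ((n.bit_length() + 1) // 2)   # 2^ceil(len(n)/2) >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:                          # no further decrease: x is the answer
            return x
        x = y

def is_square(n):
    # The square test mentioned above: compare n with isqrt(n)^2.
    r = isqrt(n)
    return r * r == n

assert isqrt(99) == 9 and isqrt(100) == 10 and isqrt(101) == 10
```
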
Exercise 2.2.9. Show that if N = p^e where p is prime and e ≥ 1 then one can factor N
in polynomial-time.
2.3
Euclid's Algorithm
For a, b ∈ N, Euclid's algorithm computes d = gcd(a, b). A simple way to express Euclid's
algorithm is by the recursive formula
gcd(a, 0) = a and gcd(a, b) = gcd(b, a (mod b)) if b ≠ 0.
The traditional approach is to work with positive integers a and b throughout the algorithm and to choose a (mod b) to be in the set {0, 1, . . . , b − 1}. In practice, the
algorithm can be used with a, b ∈ Z and it runs faster if we choose remainders in the
range {−⌊|b|/2⌋ + 1, . . . , −1, 0, 1, . . . , ⌊|b|/2⌋}. However, for some applications (especially
those related to Diophantine approximation) the version with positive remainders is the
desired choice.
In practice we often want to compute integers (s, t) such that d = as + bt, in which
case we use the extended Euclidean algorithm (due to Lagrange). This is presented in
Algorithm 1, where the integers ri , si , ti always satisfy ri = si a + ti b.
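Since Algorithm 1 is referred to throughout this section, here is a Python sketch of the extended Euclidean algorithm with positive remainders (variable names follow the invariant r_i = s_i a + t_i b maintained by the algorithm):

```python
def extended_gcd(a, b):
    """Extended Euclidean algorithm with positive remainders.

    Returns (d, s, t) with d = gcd(a, b) = s*a + t*b. The pairs of state
    variables hold consecutive triples (r_{i-1}, s_{i-1}, t_{i-1}) and
    (r_i, s_i, t_i), each satisfying r = s*a + t*b.
    """
    r0, s0, t0 = a, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                       # quotient of the current division
        r0, r1 = r1, r0 - q * r1           # remainder step
        s0, s1 = s1, s0 - q * s1           # update Bezout coefficients
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0

d, s, t = extended_gcd(513, 311)
assert d == 1 and 513 * s + 311 * t == 1
```

Running this on (a, b) = (513, 311) reproduces the triples (r_i, s_i, t_i) discussed in Example 2.3.4 below.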
Theorem 2.3.1. The complexity of Euclid's algorithm is O(log(a) log(b)) bit operations.
Proof: Each iteration of Euclid's algorithm involves computing the quotient and remainder of division of r_{i−2} by r_{i−1}, where we may assume |r_{i−2}| > |r_{i−1}| (except maybe for
i = 1). By Lemma 2.2.2 this requires ≤ c log(r_{i−1})(log(r_{i−2}) − log(r_{i−1}) + 1) bit operations
for some constant c ∈ R_{>0}. Hence the total running time is at most
c Σ_{i≥1} log(r_{i−1})(log(r_{i−2}) − log(r_{i−1}) + 1).
Now, 2|r_i| ≤ |r_{i−1}|, so 1 + log(r_i) ≤ log(r_{i−1}), hence all the terms in the above sum are
≥ 0 and the sum is bounded by O(log(a) log(b)). It follows that the algorithm performs
O(log(a) log(b)) bit operations.
Exercise 2.3.2. Show that the complexity of Algorithm 1 is still O(log(a) log(b)) bit
operations even when the remainders in line 6 are chosen in the range 0 ≤ r_i < r_{i−1}.
A more convenient method for fast computer implementation is the binary Euclidean
algorithm (originally due to Stein). This uses bit operations such as division by 2 rather
than taking general quotients; see Section 4.5.2 of [340], Section 4.7 of [22], Chapter 3
of [237], Section 9.4.1 of [161] or Section 14.4.3 of [415].
There are subquadratic versions of Euclid's algorithm. One can compute the extended
gcd of two n-bit integers in O(M(n) log(n)) bit operations. We refer to Section 9.4 of [161],
[579] or Section 11.1 of [237].
The rest of the section gives some results about Diophantine approximation that are
used later (for example, in the Wiener attack on RSA, see Section 24.5.1). We assume
that a, b > 0 and that the extended Euclidean algorithm with positive remainders is used
to generate the sequence of values (ri , si , ti ).
The integers s_i and t_i arising from the extended Euclidean algorithm are equal, up to
sign, to the convergents of the continued fraction expansion of a/b. To be precise, if the
convergents of a/b are denoted h_i/k_i for i = 0, 1, . . . then, for i ≥ 1, s_i = (−1)^{i−1} k_{i−1}
and t_i = (−1)^i h_{i−1}. Therefore, the values (s_i, t_i) satisfy various equations, summarised
below, that will be used later in the book. We refer to Chapter 10 of [275] or Chapter 7
of [465] for details on continued fractions.
Lemma 2.3.3. Let a, b ∈ N and let r_i, s_i, t_i ∈ Z be the triples generated by running
Algorithm 1 in the case of positive remainders 0 ≤ r_i < r_{i−1}.
1. For i ≥ 1, |s_i| < |s_{i+1}| and |t_i| < |t_{i+1}|.
2. If a, b > 0 then t_i > 0 when i ≥ 1 is even and t_i < 0 when i is odd (and vice versa
for s_i).
3. t_{i+1} s_i − t_i s_{i+1} = (−1)^{i+1}.
4. r_i s_{i−1} − r_{i−1} s_i = (−1)^i b and r_i t_{i−1} − r_{i−1} t_i = (−1)^{i−1} a. In other words, r_i |s_{i−1}| +
r_{i−1} |s_i| = b and r_i |t_{i−1}| + r_{i−1} |t_i| = a.
5. |a/b + t_i/s_i| ≤ 1/|s_i s_{i+1}|.
6. |r_i s_i| < |r_i s_{i+1}| ≤ |b| and |r_i t_i| < |r_i t_{i+1}| ≤ |a|.
7. If s, t ∈ Z are such that |a/b + t/s| < 1/(2s^2) then (s, t) is (up to sign) one of the
pairs (s_i, t_i) computed by Euclid's algorithm.
8. If r, s, t ∈ Z satisfy r = as + bt and |rs| < |b|/2 then (r, s, t) is (up to sign) one of
the triples (r_i, s_i, t_i) computed by Euclid's algorithm.
Proof: Statements 1, 2 and 3 are proved using the relations s_i = (−1)^{i−1} k_{i−1} and t_i =
(−1)^i h_{i−1}, where h_i/k_i are the continued fraction convergents to a/b. From Chapter 10
of [275] and Chapter 7 of [465] one knows that h_m = q_{m+1} h_{m−1} + h_{m−2} and k_m =
q_{m+1} k_{m−1} + k_{m−2}, where q_{m+1} is the quotient in iteration m + 1 of Euclid's algorithm.
The first statement follows immediately and the third statement follows from the fact
that h_m k_{m−1} − h_{m−1} k_m = (−1)^{m−1}. The second statement follows since a, b > 0 implies
h_i, k_i > 0.
Statement 4 can be proved by induction, using the fact that r_{i+1} s_i − r_i s_{i+1} = (r_{i−1} −
q_{i+1} r_i) s_i − r_i (s_{i−1} − q_{i+1} s_i) = −(r_i s_{i−1} − r_{i−1} s_i). Statement 5 is the standard result (equation
(10.7.7) of [275], Theorem 7.11 of [465]) that the convergents of a/b satisfy |a/b − h_m/k_m| <
1/|k_m k_{m+1}|. Statement 6 follows directly from statements 2 and 4. For example, a =
r_i (−1)^{i−1} t_{i−1} + r_{i−1} (−1)^i t_i and both terms on the right hand side are positive.
Statement 7 is also a standard result in Diophantine approximation; see Theorem 184
of [275] or Theorem 7.14 of [465].
Finally, to prove statement 8, suppose r, s, t ∈ Z are such that r = as + bt and
|rs| < |b|/2. Then
|a/b + t/s| = |(as + bt)/(bs)| = |r|/|bs| = |rs|/|bs^2| < 1/(2s^2).
The result follows from statement 7.
Example 2.3.4. The first few terms of Euclid's algorithm on a = 513 and b = 311 give

   i    r_i   q_i   s_i    t_i   |r_i s_i|   |r_i t_i|
  −1    513           1      0        513           0
   0    311           0      1          0         311
   1    202    1      1     −1        202         202
   2    109    1     −1      2        109         218
   3     93    1      2     −3        186         279
   4     16    1     −3      5         48          80
   5     13    5     17    −28        221         364
One can verify that |r_i s_i| < |b| and |r_i t_i| < |a| for i ≥ 0. Indeed, |r_i s_{i+1}| ≤ |b| and |r_i t_{i+1}| ≤ |a|, as stated in part 6 of Lemma 2.3.3.
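The rows of the table above can be reproduced with a few lines of code. The following is a sketch (the function name is ours, not from the text); it also checks the invariant a·s_i + b·t_i = r_i underlying the lemma.

```python
def extended_euclid_rows(a, b):
    """Rows (r_i, s_i, t_i) of the extended Euclidean algorithm on (a, b),
    starting with i = -1 and i = 0 as in the table above."""
    rows = [(a, 1, 0), (b, 0, 1)]
    while rows[-1][0] != 0:
        (r2, s2, t2), (r1, s1, t1) = rows[-2], rows[-1]
        q = r2 // r1  # the quotient q_i
        rows.append((r2 - q * r1, s2 - q * s1, t2 - q * t1))
    return rows[:-1]  # drop the final zero remainder

for r, s, t in extended_euclid_rows(513, 311):
    assert 513 * s + 311 * t == r  # invariant a*s_i + b*t_i = r_i
```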
Diophantine approximation is the study of approximating real numbers by rationals.
Statement 7 in Lemma 2.3.3 is a special case of one of the famous results; namely that
the best rational approximations to real numbers are given by the convergents in their
continued fraction expansion. Lemma 2.3.5 shows how the result can be relaxed slightly,
giving less good rational approximations in terms of convergents to continued fractions.
Lemma 2.3.5. Let α ∈ R, c ∈ R_{>0} and let s, t ∈ N be such that |α − t/s| < c/s². Then (t, s) = (u h_{n+1} ± v h_n, u k_{n+1} ± v k_n) for some n, u, v ∈ Z_{≥0} such that uv < 2c.
Let r be the period of the continued fraction expansion of √d and let h_n/k_n be its convergents. Then h_{nr−1}² − d k_{nr−1}² = (−1)^{nr} for n ∈ N. Furthermore, every solution of the equation x² − dy² = ±1 arises in this way.
Proof: See Corollary 7.23 of [465].
2.4 Computing Legendre and Jacobi Symbols

If n has prime factorisation n = ∏_i p_i^{e_i} then the Jacobi symbol is given by

( a/n ) = ∏_i ( a/p_i )^{e_i}.

Furthermore, ( −1/n ) = (−1)^{(n−1)/2}. In other words,

( −1/n ) = 1 if n ≡ 1 (mod 4), and −1 otherwise.

Similarly, ( 2/n ) = (−1)^{(n²−1)/8}. In other words,

( 2/n ) = 1 if n ≡ 1, 7 (mod 8), and −1 otherwise.
Proof: See Section II.2 of [345], Sections 3.1, 3.2 and 3.3 of [465] or Chapter 6 of [275].
An important fact is that it is not necessary to factor integers to compute the Jacobi
symbol.
Exercise 2.4.4. Write down an algorithm to compute Legendre and Jacobi symbols
using quadratic reciprocity.
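One possible shape for such an algorithm is sketched below (names ours). It computes the Jacobi symbol iteratively using the rules for (2/n) and quadratic reciprocity, and it never factors n.

```python
def jacobi(m, n):
    """Jacobi symbol (m/n) for odd n > 0, via quadratic reciprocity,
    without factoring n (a sketch of one answer to Exercise 2.4.4)."""
    assert n > 0 and n % 2 == 1
    m %= n
    result = 1
    while m != 0:
        while m % 2 == 0:          # pull out factors of 2: (2/n) = -1 iff n = 3, 5 (mod 8)
            m //= 2
            if n % 8 in (3, 5):
                result = -result
        m, n = n, m                # quadratic reciprocity for odd coprime arguments
        if m % 4 == 3 and n % 4 == 3:
            result = -result
        m %= n
    return result if n == 1 else 0
```

When n is prime this agrees with the Legendre symbol, which gives an easy way to test it against Euler's criterion.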
Exercise 2.4.5. Prove that the complexity of computing ( m/n ) is O(log(m) log(n)) bit operations.
Exercise 2.4.6. Give a randomised algorithm to compute a quadratic non-residue modulo p. What is the expected complexity of this algorithm?
Exercise 2.4.7. Several applications require knowing a quadratic non-residue modulo a prime p. Prove that the values a in the following table satisfy ( a/p ) = −1.

p                                    a
p ≡ 3 (mod 4)                        −1
p ≡ 1 (mod 4), p ≡ 2 (mod 3)         3
p ≡ 1 (mod 4), p ≢ 1 (mod 8)         √−1
p ≡ 1 (mod 8), p ≢ 1 (mod 16)        (1 + √−1)/√2
Remark 2.4.8. The problem of computing quadratic non-residues has several algorithmic implications. One conjectures that the least quadratic non-residue modulo p is O(log(p) log(log(p))). Burgess proved that the least quadratic non-residue modulo p is at most p^{1/(4√e)+o(1)} ≈ p^{0.151633+o(1)}, while Ankeny showed, assuming the extended Riemann hypothesis, that it is O(log(p)²). We refer to Section 8.5 of Bach and Shallit [22] for details and references. It follows that one can compute a quadratic non-residue in O(log(p)⁴) bit operations assuming the extended Riemann hypothesis.
Exercise 2.4.9. Give a Las Vegas algorithm to test whether a ∈ N is a square by computing ( a/p ) for some random small primes p. What is the complexity of this algorithm?
Exercise 2.4.10. Let p be prime. In Section 2.8 we give algorithms to compute modular exponentiation quickly. Compare the cost of computing ( a/p ) using quadratic reciprocity versus using Euler's criterion.
Remark 2.4.11. An interesting computational problem (considered, for example, by Damgård [163]) is: given a prime p, an integer k, and the sequence ( a/p ), ( (a+1)/p ), …, ( (a+k−1)/p ), to output ( (a+k)/p ). A potentially harder problem is to determine a given the sequence of values. It is known that if k is a little larger than log₂(p) then a is usually uniquely determined modulo p and so both problems make sense. No efficient algorithms are known to solve either of these problems. One can also consider the natural analogue for Jacobi symbols. We refer to [163] for further details. This is also discussed as Conjecture 2.1 of Boneh and Lipton [83]. The pseudorandomness of the sequence is discussed by Mauduit and Sárközy [398] and Sárközy and Stewart [506].
Finally, we remark that one can compute the Legendre or Jacobi symbol of n-bit
integers in O(M (n) log(n)) operations using an analogue of fast algorithms for computing
gcds. We refer to Exercise 5.52 (also see pages 343-344) of Bach and Shallit [22] or Brent
and Zimmermann [101] for the details.
2.5 Modular Arithmetic
For further details see Section 9.2.1 of [161], Section II.1.4 of [64], Section 11.1.2.b
of [16] or Section 2.2.4 of [273].
Faster Modular Reduction
Using Newton's method to compute ⌊a/n⌋ one can compute a (mod n) using only multiplication of integers. If a = O(n²) then the complexity is O(M(log(n))). The basic idea is to use Newton's method to compute a rational approximation to 1/n of the form b/2^e (see Section 2.2.1) and then compute q = ⌊a/n⌋ = ⌊ab/2^e⌋, so that r = a − nq is the remainder. See Exercises 3.35, 3.36 of [552] and Section 9.1 of [237] for details. For large a the cost of computing a (mod n) remains O(log(a) log(n)) as before. This idea gives rise to Barrett reduction; see Section 9.2.2 of [161], Section 2.3.1 of [100], Section 14.3.3 of [415], Section II.1.3 of [64], or Section 10.4.1 of [16].
Special Moduli
For cryptography based on discrete logarithms, especially elliptic curve cryptography, it is recommended to use primes of a special form to speed up arithmetic modulo p. Commonly used primes are of the form p = 2^k − c for some small c ∈ N, or the NIST primes p = 2^{n_k w} ± 2^{n_{k−1} w} ± ⋯ ± 2^{n_1 w} ± 1 where w = 16, 32 or 64. In these cases it is possible to compute reduction modulo p much more quickly than for general p. See Section 2.2.6 of [273], Section 14.3.4 of [415] or Section 10.4.3 of [16] for examples and details.
Modular Inversion
Suppose that a, n ∈ N are such that gcd(a, n) = 1. One can compute a^{−1} (mod n) using the extended Euclidean algorithm: computing integers s, t ∈ Z such that as + nt = 1 gives a^{−1} ≡ s (mod n). Hence, if 0 < a < n then one can compute a^{−1} (mod n) in O(log(n)²) bit operations, or faster using subquadratic versions of the extended Euclidean algorithm.
In practice, modular inversion is significantly slower than modular multiplication. For example, when implementing elliptic curve cryptography it is usual to assume that the cost of an inversion in F_p is between 8 and 50 times the cost of a multiplication in F_p (the actual figure depends on the platform and algorithms used).
Simultaneous Modular Inversion
One can compute a_1^{−1} (mod n), …, a_m^{−1} (mod n) with a single inversion modulo n and a number of multiplications modulo n using a trick due to Montgomery. Namely, one computes b = a_1 ⋯ a_m (mod n), computes b^{−1} (mod n), and then recovers the individual a_i^{−1}.
Exercise 2.5.5. Give pseudocode for simultaneous modular inversion and show that it requires one inversion and 3(m − 1) modular multiplications.
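Montgomery's trick can be sketched as follows (names ours): build the prefix products, invert only the total product, then peel off the individual inverses. The multiplication count matches the exercise: m − 1 for the prefixes and 2(m − 1) in the backward pass.

```python
def simultaneous_inverse(xs, n):
    """Invert every element of xs modulo n with one modular inversion
    and 3(m - 1) modular multiplications (Montgomery's trick)."""
    m = len(xs)
    prefix = [xs[0] % n] + [0] * (m - 1)   # prefix[i] = a_1 * ... * a_{i+1} (mod n)
    for i in range(1, m):
        prefix[i] = prefix[i - 1] * xs[i] % n      # m - 1 multiplications
    inv = pow(prefix[-1], -1, n)                   # the single inversion
    out = [0] * m
    for i in range(m - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % n           # 2(m - 1) more multiplications
        inv = inv * xs[i] % n                      # now inv = (a_1 ... a_i)^(-1)
    out[0] = inv
    return out
```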
2.6 The Chinese Remainder Theorem

The Chinese Remainder Theorem (CRT) states that if gcd(m_1, m_2) = 1 then there is a unique solution 0 ≤ x < m_1 m_2 to x ≡ c_i (mod m_i) for i = 1, 2. Computing x can be done in polynomial-time in various ways. One method is to use the formula
x = c_1 + ((c_2 − c_1) m_1^{−1} (mod m_2)) m_1.
This is a special case of Garner's algorithm (see Section 14.5.2 of [415] or Section 10.6.4 of [16]).
Exercise 2.6.1. Suppose m_1 < m_2 and 0 ≤ c_i < m_i. What is the input size of the instance of the CRT? What is the complexity of computing the solution?
Exercise 2.6.2. Let n > 2 and suppose pairwise coprime integers 2 ≤ m_1 < ⋯ < m_n and integers c_1, …, c_n such that 0 ≤ c_i < m_i for 1 ≤ i ≤ n are given. Let N = ∏_{i=1}^n m_i. For 1 ≤ i ≤ n define N_i = N/m_i and u_i = N_i^{−1} (mod m_i). Show that

x = Σ_{i=1}^n c_i u_i N_i    (2.1)

satisfies x ≡ c_i (mod m_i) for 1 ≤ i ≤ n. Now let p be a prime, let z = ⌊Σ_{i=1}^n c_i u_i / m_i⌋, and note that the solution 0 ≤ x < N satisfies x = Σ_{i=1}^n c_i u_i N_i − Nz. Deduce that

x ≡ Σ_{i=1}^n (c_i u_i (N_i (mod p)) (mod p)) − (N (mod p))(z (mod p)) (mod p).    (2.2)

Show that one can therefore compute x using equation (2.2), representing z as a floating point number, in a way that does not require knowing more than one of the values c_i at a time. Show that one can precompute N (mod p) and N_i (mod p) for 1 ≤ i ≤ n in O(n(log(m_n) log(p) + M(log(p)))) bit operations. Hence show that the complexity of solving the explicit CRT is (assuming the floating point operations can be ignored) at most O(n(log(m_n) log(p) + M(log(p)))) bit operations.
2.7 Linear Algebra
Let A be an n × n matrix over a field k. One can perform Gaussian elimination to solve the linear system Ax = b (or determine there are no solutions), to compute det(A), or to compute A^{−1}, in O(n³) field operations. When working over R a number of issues arise due to rounding errors, but no such problems arise when working over finite fields. We refer to Section 3.3 of Joux [314] for details.
A matrix is called sparse if almost all entries of each row are zero. To make this precise
one usually considers the asymptotic complexity of an algorithm on m n matrices, as
m and/or n tends to infinity, and where the number of non-zero entries in each row is
bounded by O(log(n)) or O(log(m)).
One can compute the kernel (i.e., a vector x such that Ax = 0) of an n × n sparse matrix A over a field in O(n²) field operations using the algorithms of Wiedemann [625] or Lanczos [360]. We refer to Section 3.4 of [314] or Section 12.4 of [237] for details.
Hermite Normal Form
When working over a ring the Hermite normal form (HNF) is an important tool for
solving or simplifying systems of equations. Some properties of the Hermite normal form
are mentioned in Section A.11.
Algorithms to compute the HNF of a matrix are given in Section 2.4.2 of Cohen [135],
Hafner and McCurley [272], Section 3.3.3 of Joux [314], Algorithm 16.26 of von zur
Gathen and Gerhard [237], Section 5.3 of Schrijver [527], Kannan and Bachem [328],
Storjohann and Labahn [589], and Micciancio and Warinschi [422]. Naive algorithms to
compute the HNF suffer from coefficient explosion, so computing the HNF efficiently in
practice, and determining the complexity of the algorithm, is non-trivial. One solution is
to work modulo the determinant (or a sub-determinant) of the matrix A (see Section 2.4.2
of [135], [272] or [589] for further details). Let A = (A_{i,j}) be an n × m matrix over Z and define ‖A‖∞ = max_{i,j}{|A_{i,j}|}. The complexity of the HNF algorithm of Storjohann and Labahn on A (using naive integer and matrix multiplication) is O(nm⁴ log(‖A‖∞)²) bit operations.
One can also use lattice reduction to compute the HNF of a matrix. For details see
page 74 of [527], Havas, Majewski and Matthews [279], or van der Kallen [325].
2.8 Modular Exponentiation

Exponentiation modulo n can be performed in polynomial-time by the square-and-multiply method. This method is presented in Algorithm 2; it is called a left-to-right algorithm as it processes the bits of the exponent m starting with the most significant bits. Algorithm 2 can be applied in any group, in which case the complexity is O(log(m)) times the complexity of the group operation. In this section we give some basic techniques to speed up the algorithm; further tricks are described in Chapter 11.
Lemma 2.8.1. The complexity of Algorithm 2 using naive modular arithmetic is O(log(m) log(n)²) bit operations.
Exercise 2.8.2. Prove Lemma 2.8.1.
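Algorithm 2 itself appears earlier in the book; a minimal left-to-right square-and-multiply sketch of the same method (name ours) is:

```python
def power_mod(g, m, n):
    """Left-to-right square-and-multiply: process the bits of the
    exponent m starting from the most significant bit."""
    result = 1
    for bit in bin(m)[2:]:         # most significant bit first
        result = result * result % n       # always square
        if bit == '1':
            result = result * g % n        # multiply when the bit is set
    return result
```

Each bit costs one squaring, plus one multiplication when the bit is 1, which is the O(log(m)) group-operation count quoted above.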
Lemma 2.8.3. If Montgomery multiplication (see Section 2.5) is used then the complexity of Algorithm 2 is O(log(n)² + log(m)M(log(n))) bit operations.
Proof: Convert g to Montgomery representation in O(log(n)²) bit operations. Algorithm 2 then proceeds using Montgomery multiplication in lines 5 and 7, which requires O(log(m)M(log(n))) bit operations. Finally Montgomery reduction is used to convert the output to standard form.
Exercise 2.8.6. Write pseudocode for the sliding window method. Show that the precomputation stage requires one squaring and 2^{w−1} − 1 multiplications.
Exercise 2.8.7. Show that the expected number of squarings between each multiply
in the sliding window algorithm is w + 1. Hence show that (ignoring the precomputation) exponentiation using sliding windows requires log(m) squarings and, on average,
log(m)/(w + 1) multiplications.
Exercise 2.8.8. Consider running the sliding window method in a group, with varying g
and m (so the powers of g must be computed for every exponentiation) but with unlimited
storage. For a given bound on len(m) one can compute the value for w that minimises
the total cost. Verify that the choices in the following table are optimal.
len(m)   80   160   300   800   2000
w        3    4     5     6     7
Exercise 2.8.9. Algorithm 2 processes the bits of the exponent m from left to right.
Give pseudocode for a modular exponentiation algorithm that processes the bits of the
exponent m from right to left.
[Hint: Have two variables in the main loop; one that stores g^{2^i} in the i-th iteration, and the other that stores the value g^{Σ_{j=0}^{i} a_j 2^j}.]
Exercise 2.8.10. Write pseudocode for a right to left sliding window algorithm for computing g^m (mod n), extending Exercise 2.8.9. Explain why this variant is not appropriate when using precomputation (hence, it is not so effective when g is fixed and g^m (mod n) is computed for many random m).
One can also consider the opposite scenario where one is computing g^m (mod n) for a fixed value m and varying g. Again, with some precomputation, and if there is sufficient storage available, one can get an improvement over the naive algorithm. The idea is to determine an efficient addition chain for m. This is a sequence of squarings and multiplications, depending on m, that minimises the number of group operations. More precisely, an addition chain of length l for m is a sequence m_1, m_2, …, m_l of integers such that m_1 = 1, m_l = m and, for each 2 ≤ i ≤ l, we have m_i = m_j + m_k for some 1 ≤ j ≤ k < i. One computes each of the intermediate values g^{m_i} for 2 ≤ i ≤ l with one group operation. Note that all these intermediate values are stored. The algorithm requires l group operations and l group elements of storage.
It is conjectured by Stolarsky that every integer m has an addition chain of length at most log₂(m) + log₂(wt(m)), where wt(m) is the Hamming weight of m (i.e., the number of ones in the binary expansion of m). There is a vast literature on addition chains; we refer to Section C6 of [271], Section 4.6.3 of [340] and Section 9.2 of [16] for discussion and references.
Exercise 2.8.11. Prove that an addition chain has length at least log₂(m).
2.9 Square Roots Modulo p

There are a number of situations in this book that require computing square roots modulo a prime. Let p be an odd prime and let a ∈ N. We have already shown that Legendre symbols can be computed in polynomial-time. The case p ≡ 3 (mod 4) is particularly simple.

Lemma 2.9.1. Let p ≡ 3 (mod 4) be prime. If ( a/p ) = 1 then x = a^{(p+1)/4} (mod p) satisfies x² ≡ a (mod p).

This result can be verified directly by computing x², but we give a more group-theoretic proof that helps to motivate the general algorithm.
Proof: Since p ≡ 3 (mod 4) it follows that q = (p − 1)/2 is odd. The assumption ( a/p ) = 1 implies that a^q ≡ a^{(p−1)/2} ≡ 1 (mod p) and so the order of a is odd. Therefore a square root of a is given by
x = a^{2^{−1} (mod q)} (mod p).
Now, 2^{−1} (mod q) is just (q + 1)/2 = (p + 1)/4.
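The lemma is one line of code (a sketch, name ours):

```python
def sqrt_3mod4(a, p):
    """Square root modulo a prime p = 3 (mod 4), assuming (a/p) = 1:
    return x = a^((p+1)/4) (mod p), so that x^2 = a (mod p)."""
    assert p % 4 == 3
    return pow(a, (p + 1) // 4, p)
```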
Lemma 2.9.2. Let p be a prime and suppose that a is a square modulo p. Write p − 1 = 2^e q where q is odd. Let w = a^{(q+1)/2} (mod p). Then w² ≡ ab (mod p) where b has order dividing 2^{e−1}.
Proof: We have
w² ≡ a^{q+1} ≡ a · a^q (mod p)
so b ≡ a^q (mod p). Now a has order dividing (p − 1)/2 = 2^{e−1} q so b has order dividing 2^{e−1}.
The value w is like a "first approximation" to the square root of a modulo p. To complete the computation it is therefore sufficient to compute a square root of b.
Lemma 2.9.3. Suppose 1 < n < p is such that ( n/p ) = −1. Then y ≡ n^q (mod p) has order 2^e.
Proof: The order of y is a divisor of 2^e. The fact n^{(p−1)/2} ≡ −1 (mod p) implies that y satisfies y^{2^{e−1}} ≡ −1 (mod p). Hence the order of y is equal to 2^e.
Since Z_p^* is a cyclic group, it follows that y generates the full subgroup of elements of order dividing 2^e. Hence, b ≡ y^i (mod p) for some 1 ≤ i ≤ 2^e. Furthermore, since the order of b divides 2^{e−1} it follows that i is even. Writing i = 2j and x = w/y^j (mod p) then
x² ≡ w²/y^{2j} ≡ ab/b ≡ a (mod p).
Hence, if one can compute i then one can compute the square root of a.
If e is small then the value i can be found by a brute-force search. A more advanced method is to use the Pohlig-Hellman method to solve the discrete logarithm of b to the base y (see Section 13.2 for an explanation of these terms). This idea leads to the Tonelli-Shanks algorithm for computing square roots modulo p (see Section 1.3.3 of [64] or Section 1.5 of [135]).
Lemma 2.9.5. The Tonelli-Shanks method is a Las Vegas algorithm with expected running time O(log(p)² M(log(p))) bit operations.
Proof: The first step of the algorithm is the requirement to find an integer n such that ( n/p ) = −1. This is Exercise 2.4.6 and it is the only part of the algorithm that is randomised and Las Vegas. The expected number of trials is 2. Since one can compute the Legendre symbol in O(log(p)²) bit operations, this gives O(log(p)²) expected bit operations, which is less than O(log(p)M(log(p))).
The remaining parts of the algorithm amount to exponentiation modulo p, requiring O(log(p)M(log(p))) bit operations, and the computation of the index j. Naively, this could require as many as p − 1 operations, but using the Pohlig-Hellman method (see Exercise 13.2.6 in Section 13.2) brings the complexity of this stage to O(log(p)² M(log(p))) bit operations.
As we will see in Exercise 2.12.6, the worst-case complexity O(log(p)2 M (log(p))) of the
Tonelli-Shanks algorithm is actually worse than the cost of factoring quadratic polynomials using general polynomial-factorisation algorithms. But, in most practical situations,
the Tonelli-Shanks algorithm is faster than using polynomial factorisation.
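The pieces above (Lemmas 2.9.2 and 2.9.3) assemble into the following sketch (names ours). For simplicity it finds the even index i by brute force rather than by Pohlig-Hellman, which is fine when 2^e is small.

```python
import random

def tonelli_shanks(a, p):
    """Square root of a modulo an odd prime p; returns None for a
    non-residue.  Brute-force search stands in for Pohlig-Hellman."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        return None
    q, e = p - 1, 0
    while q % 2 == 0:
        q, e = q // 2, e + 1
    n = 2                                  # find a non-residue (Las Vegas step)
    while pow(n, (p - 1) // 2, p) != p - 1:
        n = random.randrange(2, p)
    y = pow(n, q, p)                       # y has order exactly 2^e (Lemma 2.9.3)
    w = pow(a, (q + 1) // 2, p)            # w^2 = a*b with b = a^q (Lemma 2.9.2)
    b = pow(a, q, p)
    i, yi = 0, 1
    while yi != b:                         # brute force: b = y^i with i even
        i, yi = i + 1, yi * y % p
    return w * pow(pow(y, i // 2, p), -1, p) % p   # x = w / y^(i/2)
```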
Exercise 2.9.6. If one precomputes y for a given prime p then the square root algorithm
becomes deterministic. Show that the complexity remains the same.
Exercise 2.9.7. Show, using Remark 2.4.8, that under the extended Riemann hypothesis one can compute square roots modulo p in deterministic O(log(p)⁴) bit operations.
Exercise 2.9.8. Let r ∈ N. Generalise the Tonelli-Shanks algorithm so that it computes r-th roots in F_p (the only non-trivial case being when p ≡ 1 (mod r)).
Exercise 2.9.9. (Atkin) Let p ≡ 5 (mod 8) be prime and a ∈ Z such that ( a/p ) = 1. Let z = (2a)^{(p−5)/8} (mod p) and i = 2az² (mod p). Show that i² ≡ −1 (mod p) and that w = az(i − 1) satisfies w² ≡ a (mod p).
If p − 1 is highly divisible by 2 then an algorithm due to Cipolla, sketched in Exercise 2.9.10 below, is more suitable (see Section 7.2 of [22] or Section 3.5 of [415]). See Bernstein [44] for further discussion. There is a completely different algorithm due to Schoof that is deterministic and has polynomial-time complexity for fixed a as p tends to infinity.
Exercise 2.9.10. (Cipolla) Let p be prime and a ∈ Z. Show that if t ∈ Z is such that ( (t² − 4a)/p ) = −1 then x^{(p+1)/2} in F_p[x]/(x² − tx + a) is a square root of a modulo p. Hence write down an algorithm to compute square roots modulo p and show that it has expected running time O(log(p)M(log(p))) bit operations.
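One possible shape for Cipolla's method is sketched below (names ours; it assumes p is an odd prime and ( a/p ) = 1). Arithmetic in F_p[x]/(x² − tx + a) only needs pairs of coefficients, reduced using x² ≡ tx − a.

```python
import random

def cipolla(a, p):
    """Cipolla square root modulo an odd prime p, assuming (a/p) = 1:
    pick t with t^2 - 4a a non-residue, then compute x^((p+1)/2)
    in F_p[x]/(x^2 - t*x + a)."""
    a %= p
    if a == 0:
        return 0
    while True:                            # expected O(1) trials
        t = random.randrange(p)
        if pow((t * t - 4 * a) % p, (p - 1) // 2, p) == p - 1:
            break
    def mul(f, g):                         # (f0 + f1*x)(g0 + g1*x) mod x^2 - t*x + a
        f0, f1 = f
        g0, g1 = g
        h = f1 * g1 % p
        return ((f0 * g0 - a * h) % p, (f0 * g1 + f1 * g0 + t * h) % p)
    result, base, e = (1, 0), (0, 1), (p + 1) // 2
    while e:                               # square-and-multiply in the quotient ring
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    return result[0]                       # the x-coefficient vanishes when (a/p) = 1
```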
We remark that, in some applications, one wants to compute a Legendre symbol to test whether an element is a square and, if so, compute the square root. If one computes the Legendre symbol using Euler's criterion as a^{(p−1)/2} (mod p) then one will have already computed a^q (mod p) and so this value can be recycled. This is not usually faster than using quadratic reciprocity for large p, but it is relevant for applications such as Lemma 21.4.9.
A related topic is, given a prime p and an integer d > 0, to find integer solutions (x, y), if they exist, to the equation x² + dy² = p. The Cornacchia algorithm achieves this. The algorithm is given in Section 2.3.4 of Crandall and Pomerance [161], and a proof of correctness is given in Section 4 of Schoof [526] or Morain and Nicolas [434]. In brief, the algorithm computes p/2 < x₀ < p such that x₀² ≡ −d (mod p), then runs the Euclidean algorithm on p and x₀, stopping at the first remainder r < √p, then computes s = √((p − r²)/d) if this is an integer. The output is (x, y) = (r, s). The complexity is dominated by computing the square root modulo p, and so is an expected O(log(p)² M(log(p))) bit operations. A closely related algorithm finds solutions to x² + dy² = 4p.
2.10 Polynomial Arithmetic

The methods for fast modular reduction of integers can also be used with polynomials to compute F(x) (mod G(x)) in k[x]; see Section 9.6.2 of [161] or Section 11.1 of [237] for details. Fast variants of the extended Euclidean algorithm for polynomials in k[x] of degree bounded by d require O(M(d) log(d)) multiplications in k and O(d) inversions in k (Corollary 11.6 of [237]).
Kronecker substitution is a general technique which transforms polynomial multiplication into integer multiplication. It allows multiplication of two degree d polynomials in F_q[x] (where q is prime) in O(M(d(log(q) + log(d)))) = O(M(d log(dq))) bit operations; see Section 1.3 of [100], Section 8.4 of [237] or Section 18.6 of [552]. Kronecker substitution can be generalised to bivariate polynomials and hence to polynomials over F_q where q is a prime power. We write M(d, q) = M(d log(dq)) for the number of bit operations to multiply two degree d polynomials over F_q.
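Kronecker substitution can be sketched in a few lines (names ours): pack the coefficients into a single integer with enough zero padding that neighbouring coefficients of the product cannot overlap, multiply once, and unpack.

```python
def kronecker_multiply(f, g, q):
    """Multiply polynomials over F_q (coefficient lists, low degree
    first) via one big-integer multiplication."""
    d = max(len(f), len(g))
    b = (d * q * q).bit_length()     # each raw product coefficient is < d*q^2
    pack = lambda poly: sum(c << (b * i) for i, c in enumerate(poly))
    n = pack(f) * pack(g)
    out = []
    while n:
        out.append((n & ((1 << b) - 1)) % q)   # unpack one b-bit bin
        n >>= b
    return out or [0]
```

The single big-integer multiplication is where the M(d log(dq)) bound comes from.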
Exercise 2.10.4. Show that Montgomery reduction and multiplication can be implemented for arithmetic modulo a polynomial F(x) ∈ F_q[x] of degree d.
Exercise 2.10.5. One can evaluate a polynomial F(x) ∈ R[x] at a value a ∈ R efficiently using Horner's rule. More precisely, if F(x) = Σ_{i=0}^d F_i x^i then one computes F(a) as (⋯((F_d a + F_{d−1})a + F_{d−2})a + ⋯ + F_1)a + F_0. Write pseudocode for Horner's rule and show that the method requires d additions and d multiplications if deg(F(x)) = d.
2.11 Arithmetic in Finite Fields

Efficient algorithms for arithmetic modulo p have been presented, but we now consider arithmetic in finite fields F_{p^m} when m > 1. We assume F_{p^m} is represented using either a polynomial basis (i.e., as F_p[x]/(F(x))) or a normal basis. Our main focus is when either p is large and m is small, or vice versa. Optimal asymptotic complexities for the case when both p and m grow large require some care.
Exercise 2.11.1. Show that addition and subtraction of elements in F_{p^m} requires O(m) additions in F_p. Show that multiplication in F_{p^m}, represented by a polynomial basis and using naive methods, requires O(m²) multiplications modulo p and O(m) reductions modulo p.

If p is constant and m grows then multiplication in F_{p^m} requires O(m²) bit operations or, using fast polynomial arithmetic, O(M(m)) bit operations. If m is fixed and p goes to infinity then the complexity is O(M(log(p))) bit operations.
Inversion of elements in F_{p^m} = F_p[x]/(F(x)) can be done using the extended Euclidean algorithm in O(m²) operations in F_p. If p is fixed and m grows then one can invert elements in F_{p^m} in O(M(m) log(m)) bit operations.
Alternatively, for any vector space basis {θ_1, …, θ_m} for F_{q^m} over F_q there is an m × m matrix M over F_q such that the product ab for a, b ∈ F_{q^m} is given by
(a_1, …, a_m) M (b_1, …, b_m)^T = Σ_{i=1}^m Σ_{j=1}^m M_{i,j} a_i b_j
where (a_1, …, a_m) and (b_1, …, b_m) are the coefficient vectors for the representation of a and b with respect to the basis.
In particular, if F_{q^m} is represented by a normal basis {α, α^q, …, α^{q^{m−1}}} then multiplication of elements in normal basis representation is given by
(Σ_{i=0}^{m−1} a_i α^{q^i})(Σ_{j=0}^{m−1} b_j α^{q^j}) = Σ_{i=0}^{m−1} Σ_{j=0}^{m−1} a_i b_j α^{q^i + q^j}.
For q = 2 the inverse is given by
g^{−1} = ∏_{i=1}^{m−1} g^{2^i},
since g^{−1} = g^{2^m − 2} = g^{2(2^{m−1} − 1)} = g^{2(1 + 2 + 2² + ⋯ + 2^{m−2})}, using g^{2^m − 1} = 1.
The Itoh-Tsujii algorithm then follows from a further idea, which is that one can compute g^{2^{m−1} − 1} in fewer than m multiplications using an appropriate addition chain. We give the details only in the special case where m = 2^k + 1. Since 2^{m−1} − 1 = 2^{2^k} − 1 = (2^{2^{k−1}} − 1)2^{2^{k−1}} + (2^{2^{k−1}} − 1) it is sufficient to compute the sequence g^{2^{2^i} − 1} iteratively for i = 0, 1, …, k, each step taking some shifts and one multiplication in the field. In other words, the complexity in this case is O(k m² log(q)²) = O(log(m) m² log(q)²) bit operations. For details of the general case, and further discussion, we refer to [305, 269]. See, for example, Fong, Hankerson, López and Menezes [206] for more discussion about inversion for the fields relevant for elliptic curve cryptography.
Finally we remark that, for some computational devices, it is convenient to use finite fields F_{p^m} where p ≈ 2^{32} or p ≈ 2^{64}. These are called optimal extension fields and we refer to Section 2.4 of [273] for details.
2.12 Factoring Polynomials over Finite Fields
There is a large literature on polynomial factorisation and we only give a very brief
sketch of the main concepts. The basic ideas go back to Berlekamp and others. For full
discussion, historical background, and extensive references see Chapter 7 of Bach and
Shallit [22] or Chapter 14 of von zur Gathen and Gerhard [237]. One should be aware
that for polynomials over fields of small characteristic the algorithm by Niederreiter [463]
can be useful.
Let F(x) ∈ F_q[x] have degree d. If there exists G(x) ∈ F_q[x] such that G(x)² | F(x) then G(x) | F′(x), where F′(x) is the derivative of F(x). A polynomial is square-free if it has no repeated factor. It follows that F(x) is square-free if F′(x) ≠ 0 and gcd(F(x), F′(x)) = 1. If F(x) ∈ F_q[x] and S(x) = gcd(F(x), F′(x)) then F(x)/S(x) is square-free.
Exercise 2.12.1. Determine the complexity of testing whether a polynomial F(x) ∈ F_q[x] is square-free.
Exercise 2.12.2. Show that one can reduce polynomial factorisation over finite fields to
the case of factoring square-free polynomials.
Finding Roots of Polynomials in Finite Fields
Let F(x) ∈ F_q[x] have degree d. The roots of F(x) in F_q are precisely the roots of
R_1(x) = gcd(F(x), x^q − x).    (2.3)
If q is much larger than d then the efficient way to compute R_1(x) is to compute x^q (mod F(x)) using a square-and-multiply algorithm and then run Euclid's algorithm.

Exercise 2.12.3. Determine the complexity of computing R_1(x) in equation (2.3). Hence explain why the decision problem "Does F(x) have a root in F_q?" has a polynomial-time solution.

The basic idea of root-finding algorithms is to note that, if q is odd, x^q − x = x(x^{(q−1)/2} + 1)(x^{(q−1)/2} − 1). Hence, one can try to split R_1(x) by computing
gcd(R_1(x), x^{(q−1)/2} − 1).    (2.4)
Exercise 2.12.5. Write down pseudocode for the above root finding algorithm and show that its expected complexity (without using a fast Euclidean algorithm) is bounded by O(log(d)(log(q)M(d) + d²)) = O(log(q) log(d) d²) field operations.
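One way the pieces fit together is sketched below for odd q (all names ours; polynomials are coefficient lists, lowest degree first). It computes R_1 = gcd(F, x^q − x) and then repeatedly splits it with gcd(R_1, (x + u)^{(q−1)/2} − 1) for random shifts u, as in equation (2.4).

```python
import random

def poly_trim(f):
    while len(f) > 1 and f[-1] == 0:
        f = f[:-1]
    return f

def poly_mod(f, g, p):
    """Remainder of f divided by g over F_p."""
    f, g = f[:], poly_trim(g)
    while len(f) >= len(g):
        if f[-1] == 0:
            f = f[:-1]
            continue
        c = f[-1] * pow(g[-1], -1, p) % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        f = f[:-1]
    return poly_trim(f) or [0]

def poly_gcd(f, g, p):
    while any(g):
        f, g = g, poly_mod(f, g, p)
    return poly_trim(f)

def poly_pow_mod(base, e, f, p):
    """base^e modulo f over F_p by square-and-multiply."""
    def mul(u, v):
        out = [0] * (len(u) + len(v) - 1)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                out[i + j] = (out[i + j] + ui * vj) % p
        return poly_mod(out, f, p)
    result = [1]
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    return result

def find_root(f, p):
    """One root of f in F_p (or None), for odd prime p."""
    xp = poly_pow_mod([0, 1], p, f, p)       # x^p (mod f)
    xp = xp + [0] * (2 - len(xp))
    xp[1] = (xp[1] - 1) % p                  # x^p - x (mod f)
    r1 = poly_gcd(f, xp, p)                  # product of distinct linear factors
    while len(r1) > 2:
        u = random.randrange(p)
        h = poly_pow_mod([u, 1], (p - 1) // 2, r1, p)
        h[0] = (h[0] - 1) % p                # (x+u)^((p-1)/2) - 1
        g = poly_gcd(r1, h, p)
        if 1 < len(g) < len(r1):             # a proper split: keep the smaller piece
            r1 = g
    if len(r1) == 2:
        return (-r1[0]) * pow(r1[1], -1, p) % p
    return None
```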
Exercise 2.12.6. Let q be an odd prime power and R(x) = x² + ax + b ∈ F_q[x]. Show that the expected complexity of finding roots of R(x) using polynomial factorisation is O(log(q)M(log(q))) bit operations.
Exercise 2.12.7. Show, using Kronecker substitution, fast versions of Euclid's algorithm and other tricks, that one can compute one root in F_q (if any exist) of a degree d polynomial in F_q[x] in an expected O(log(qd)M(d, q)) bit operations.
When q is even (i.e., q = 2^m) then, instead of x^{(q−1)/2}, one considers the trace polynomial T(x) = Σ_{i=0}^{m−1} x^{2^i}. (A similar idea can be used over any field of small characteristic.)
Exercise 2.12.8. Show that the roots of the polynomial gcd(R_1(x), T(x)) are precisely the α ∈ F_q such that R_1(α) = 0 and Tr_{F_{2^m}/F_2}(α) = 0.
Taking random u(x) ∈ F_{2^m}[x] of degree < d and then computing gcd(R_1(x), T(u(x))) gives a Las Vegas root finding algorithm as before. See Section 21.3.2 of [552] for details.
Higher Degree Factors
Having found the roots in F_q one can try to find factors of larger degree. The same ideas can be used. Let
R_2(x) = gcd(F(x)/R_1(x), x^{q²} − x), R_3(x) = gcd(F(x)/(R_1(x)R_2(x)), x^{q³} − x), ….
Exercise 2.12.9. Show that all irreducible factors of R_i(x) over F_q[x] have degree i.
Exercise 2.12.10. Give an algorithm to test whether a polynomial F(x) ∈ F_q[x] of degree d is irreducible. What is the complexity?
When q is odd one can factor R_i(x) using similar ideas to the above, i.e., by computing
gcd(R_i(x), u(x)^{(q^i − 1)/2} − 1).
2.13 Hensel Lifting
Hensel lifting is a tool for solving polynomial equations of the form F(x) ≡ 0 (mod p^e) where p is prime and e ∈ N_{>1}. One application of Hensel lifting is the Takagi variant of RSA, see Example 24.1.6. The key idea is given in the following lemma.

Lemma 2.13.1. Let F(x) ∈ Z[x] be a polynomial and p a prime. Let x_k ∈ Z satisfy F(x_k) ≡ 0 (mod p^k) where k ∈ N. Suppose F′(x_k) ≢ 0 (mod p). Then one can compute x_{k+1} ∈ Z in polynomial-time such that F(x_{k+1}) ≡ 0 (mod p^{k+1}).

Proof: Write x_{k+1} = x_k + p^k z where z is a variable. Note that F(x_{k+1}) ≡ 0 (mod p^k). One has
F(x_{k+1}) ≡ F(x_k) + p^k F′(x_k) z (mod p^{k+1}).
Setting z = −(F(x_k)/p^k) F′(x_k)^{−1} (mod p) gives F(x_{k+1}) ≡ 0 (mod p^{k+1}).
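Iterating the lemma lifts a root modulo p all the way to a root modulo p^e; a minimal sketch (names ours):

```python
def hensel_lift(coeffs, x, p, e):
    """Lift a root x of F modulo p to a root modulo p^e, assuming
    F'(x) is invertible modulo p; coeffs = [F_0, ..., F_d]."""
    def F(t):
        return sum(c * t**i for i, c in enumerate(coeffs))
    def dF(t):
        return sum(i * c * t**(i - 1) for i, c in enumerate(coeffs) if i > 0)
    pk = p
    for _ in range(e - 1):
        # z = -(F(x)/p^k) * F'(x)^(-1) (mod p), as in the proof above
        z = -(F(x) // pk) * pow(dF(x) % p, -1, p) % p
        x += pk * z
        pk *= p
    return x
```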
2.14 Algorithms for Finite Fields

We present some algorithms for constructing finite fields F_{p^m} when m > 1, solving equations in them, and transforming between different representations of them.

2.14.1
Lemma A.5.1 implies a randomly chosen monic polynomial in F_q[x] of degree m is irreducible with probability at least 1/(2m). Hence, using the algorithm of Exercise 2.12.10 one can generate a random irreducible polynomial F(x) ∈ F_q[x] of degree m, using naive arithmetic, in O(m⁴ log(q)) operations in F_q. In other words, one can construct a polynomial basis for F_{q^m} in O(m⁴ log(q)) operations in F_q. This complexity is not the best known.
Constructing a Normal Basis
We briefly survey the literature on constructing normal bases for finite fields. We assume
that a polynomial basis for Fqm over Fq has already been computed.
The simplest randomised algorithm is to choose α ∈ F_{q^m} at random and test whether the set {α, α^q, …, α^{q^{m−1}}} is linearly independent over F_q. Corollary 3.6 of von zur Gathen and Giesbrecht [238] (also see Theorem 3.73 and Exercise 3.76 of [385, 386]) shows that a randomly chosen α is normal with probability at least 1/34 if m < q⁴ and probability at least 1/(16 log_q(m)) if m ≥ q⁴.
Exercise 2.14.1. Determine the complexity of constructing a normal basis by randomly choosing α.
When q > m(m − 1) there is a better randomised algorithm based on the following result.

Theorem 2.14.2. Let F(x) ∈ F_q[x] be irreducible of degree m and let θ ∈ F_{q^m} be any root of F(x). Define G(x) = F(x)/((x − θ)F′(θ)) ∈ F_{q^m}[x]. Then there are at least q − m(m − 1) elements u ∈ F_q such that α = G(u) generates a normal basis.

Proof: See Theorem 28 of Section II.N of Artin [14] or Section 3.1 of Gao [235].
Deterministic algorithms for constructing a normal basis have been given by Lüneburg [396] and Lenstra [376] (also see Gao [235]).
2.14.2
Exercise 2.14.5. Precompute √x as a polynomial in x. Let g = Σ_{i=0}^{m−1} a_i x^i ∈ F_{2^m} = F_2[x]/(F(x)). To compute √g write (assuming m is odd; the case of m even is similar)
g = (a_0 + a_2 x² + ⋯ + a_{m−1} x^{m−1}) + x(a_1 + a_3 x² + ⋯ + a_{m−2} x^{m−3}).
Show that
√g = (a_0 + a_2 x + ⋯ + a_{m−1} x^{(m−1)/2}) + √x (a_1 + a_3 x + ⋯ + a_{m−2} x^{(m−3)/2}).
Show that this computation takes roughly half the cost of one field multiplication, and hence O(m²) bit operations.

Exercise 2.14.6. Generalise Exercise 2.14.5 to computing p-th roots in F_{p^m}. Show that the method requires (p − 1) multiplications in F_{p^m}.
Next we consider solving equations of the form
x² + x = β    (2.5)
where β ∈ F_{2^m}.

Exercise 2.14.7. Prove that the equation x² + x = β has a solution x ∈ F_{2^m} if and only if Tr_{F_{2^m}/F_2}(β) = 0.
Lemma 2.14.8. If m is odd (we refer to Section II.2.4 of [64] for the case where m is even) then a solution to equation (2.5) is given by the half trace
x = Σ_{i=0}^{(m−1)/2} β^{2^{2i}}.    (2.6)
Exercise 2.14.9. Prove Lemma 2.14.8. Show that the complexity of solving quadratic equations in F_q when q = 2^m and m is odd is an expected O(m³) bit operations (or O(m²) bit operations when a normal basis is being used).
The expected complexity of solving quadratic equations in F_{2^m} when m is even is O(m⁴) bit operations, or O(m³) when a normal basis is being used. Hence, we can make the statement that the complexity of solving a quadratic equation over any field F_q is an expected O(log(q)⁴) bit operations.
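The half trace of equation (2.6) can be sketched as follows (names ours). Field elements of F_{2^m} = F_2[x]/(f) are encoded as integers whose bits are the coefficients, and each loop step replaces t by t⁴ so that t runs through β^{2^{2i}}.

```python
def gf2m_mul(a, b, f, m):
    """Multiply in F_2^m = F_2[x]/(f); bits of an integer encode the
    coefficients and f encodes the degree-m irreducible polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f        # reduce modulo f
    return r

def half_trace(beta, f, m):
    """Solve x^2 + x = beta for odd m via x = sum beta^(2^(2i)),
    i = 0, ..., (m-1)/2; assumes Tr(beta) = 0."""
    x, t = 0, beta
    for _ in range((m - 1) // 2 + 1):
        x ^= t
        t = gf2m_mul(t, t, f, m)
        t = gf2m_mul(t, t, f, m)   # t -> t^4
    return x
```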
2.14.3
isomorphisms between any two polynomial bases. We assume that one already has an isomorphism between the corresponding representations of the subfield F_q.
Let {θ_1, …, θ_m} be the vector space basis over F_q for one of the representations of F_{q^m}. The first task is to compute an isomorphism from this representation to a polynomial representation. To do this one computes a polynomial basis for F_{q^m} over F_q using the method above. One now has a monic irreducible polynomial F(x) ∈ F_q[x] of degree m and a representation x = Σ_{i=1}^m a_i θ_i for a root of F(x) in F_{q^m}. Determine the representations of x², x³, …, x^m over the basis {θ_1, …, θ_m}. This gives an isomorphism from F_q[x]/(F(x)) to the original representation of F_{q^m}. By solving a system of linear equations, one can express each of θ_1, …, θ_m with respect to the polynomial basis; this gives the isomorphism from the original representation to F_q[x]/(F(x)). The above ideas appear in a special case in the work of Zierler [637].
Exercise 2.14.12. Determine the complexity of the above algorithm to give an isomorphism
between an arbitrary vector space representation of F_{q^m} and a polynomial basis for F_{q^m}.
Finally, it remains to compute an isomorphism between any two polynomial representations
F_q[x]/(F_1(x)) and F_q[y]/(F_2(y)) for F_{q^m}. This is done by finding a root
a(y) ∈ F_q[y]/(F_2(y)) of the polynomial F_1(x). The function x ↦ a(y) extends to a field
isomorphism from F_q[x]/(F_1(x)) to F_q[y]/(F_2(y)). The inverse to this isomorphism is
computed by linear algebra.
Exercise 2.14.13. Determine the complexity of the above algorithm to give an isomorphism
between any two polynomial representations of F_{q^m}.
See Lenstra [376] for deterministic algorithms to solve this problem.
Random Sampling of Finite Fields
Let F_{p^m} be represented as a vector space over F_p with basis {θ_1, . . . , θ_m}. Generating
an element g ∈ F_{p^m} uniformly at random can be done by selecting m integers a_1, . . . , a_m
uniformly at random in the range 0 ≤ a_i < p and taking g = Σ_{i=1}^{m} a_i θ_i. Section 11.4
mentions some methods to get random integers modulo p from random bits.
To sample uniformly from F_{p^m}^* one can use the above method, repeating the process
if a_i = 0 for all 1 ≤ i ≤ m. This is much more efficient than choosing 0 ≤ a < p^m − 1
uniformly at random and computing g = γ^a where γ is a primitive root.
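The rejection method just described is short enough to show in full. The sketch below (function names are illustrative) uses the standard-library `secrets` module and returns coefficient vectors with respect to the basis {θ_1, . . . , θ_m}.

```python
# Uniform sampling from F_{p^m}, represented by coefficient vectors with
# respect to a fixed basis, and from F_{p^m}^* via rejection of the zero vector.
import secrets

def random_element(p, m):
    """Uniform g = sum a_i * theta_i, returned as its coefficient vector."""
    return [secrets.randbelow(p) for _ in range(m)]

def random_nonzero_element(p, m):
    """Uniform over the p^m - 1 nonzero elements: resample on the zero vector."""
    while True:
        a = random_element(p, m)
        if any(a):
            return a
```

Since the zero vector occurs with probability p^{−m}, the expected number of rejections is negligible.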
2.15 Computing Orders of Elements and Primitive Roots

Yet it seems to be impossible to correctly determine the order of every g ∈ F_q^* without
knowing the factorisation of q − 1.
Exercise 2.15.1. Write pseudocode for the basic algorithm for determining the order of
g and determine the complexity.
The next subsection gives an algorithm (also used in other parts of the book) that
leads to an improvement of the basic algorithm.
2.15.1 Computing a Set of Exponentiations

We now explain how to compute sets of the form {g^{(q−1)/l} : l | (q − 1)} efficiently. We
generalise the problem as follows. Let N_1, . . . , N_m ∈ ℕ and N = ∏_{i=1}^{m} N_i (typically the
integers N_i will be coprime, but it is not necessary to assume this). Let k = ⌈log_2(m)⌉
and, for m < i ≤ 2^k, set N_i = 1. Let G be a group and g ∈ G (where g typically has order
N). The goal is to efficiently compute
{g^{N/N_i} : 1 ≤ i ≤ m}.
The naive approach (computing each term separately and not using any window methods
etc) requires at least

Σ_{i=1}^{m} log(N/N_i) = m log(N) − Σ_{i=1}^{m} log(N_i) = (m − 1) log(N)

group operations. A better approach, illustrated here for m = 4, is to first compute

h_{1,1} = g^{N_3 N_4} and h_{1,2} = g^{N_1 N_2}

in 2(log_2(N_1 N_2) + log_2(N_3 N_4)) = 2 log_2(N) operations. One can then compute the
result

g^{N_1 N_2 N_3} = h_{1,2}^{N_4},  g^{N_1 N_2 N_4} = h_{1,2}^{N_3},  g^{N_1 N_3 N_4} = h_{1,1}^{N_2},  g^{N_2 N_3 N_4} = h_{1,1}^{N_1}.

This final step requires at most 2(log_2(N_3) + log_2(N_4) + log_2(N_1) + log_2(N_2)) = 2 log_2(N)
operations. The total complexity is at most 4 log_2(N) operations in the group.
The algorithm has a compact recursive description. Let F be the function that on
input (g, m, N_1, . . . , N_m) outputs the list of m values g^{N/N_i} for 1 ≤ i ≤ m, where N =
N_1 · · · N_m. Then F(g, 1, N_1) = g. For m > 1 one computes F(g, m, N_1, . . . , N_m) as follows:
let l = ⌈m/2⌉ and let h_1 = g^{N_1 · · · N_l} and h_2 = g^{N_{l+1} · · · N_m}. Then F(g, m, N_1, . . . , N_m)
is equal to the concatenation of F(h_1, (m − l), N_{l+1}, . . . , N_m) and F(h_2, l, N_1, . . . , N_l).
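The recursive description above translates directly into code. The sketch below is an illustration with made-up names; the group operation is multiplication modulo a small prime, and (unlike the concatenation order in the text) the outputs are returned in index order g^{N/N_1}, . . . , g^{N/N_m}.

```python
# F(g, [N1,...,Nm]) -> [g^(N/N1), ..., g^(N/Nm)] where N = N1*...*Nm,
# computed by the split-and-recurse method described in the text.

def all_cofactor_powers(g, Ns, power):
    """power(h, e) computes h^e in the group."""
    m = len(Ns)
    if m == 1:
        return [g]
    l = m // 2
    left, right = Ns[:l], Ns[l:]
    h1 = g
    for Ni in left:
        h1 = power(h1, Ni)      # h1 = g^(N1*...*Nl)
    h2 = g
    for Ni in right:
        h2 = power(h2, Ni)      # h2 = g^(N_{l+1}*...*Nm)
    # h2 already carries the right-hand factors, so it yields the left outputs.
    return (all_cofactor_powers(h2, left, power)
            + all_cofactor_powers(h1, right, power))

p = 2**13 - 1                   # small prime, purely for illustration
power = lambda h, e: pow(h, e, p)
Ns = [2, 3, 5, 7]
N = 2 * 3 * 5 * 7
assert all_cofactor_powers(3, Ns, power) == [pow(3, N // Ni, p) for Ni in Ns]
```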
We introduce some notation to express the algorithm in a non-recursive format.
Definition 2.15.3. Define S = {1, 2, 3, . . . , 2^k}. For 1 ≤ l ≤ k and 1 ≤ j ≤ 2^l define
S_{l,j} = {i ∈ S : (j − 1)2^{k−l} + 1 ≤ i ≤ j 2^{k−l}}.
Lemma 2.15.4. Let 1 ≤ l ≤ k and 1 ≤ j ≤ 2^l. The sets S_{l,j} satisfy:
1. #S_{l,j} = 2^{k−l};
2. S = ⋃_{j=1}^{2^l} S_{l,j};
3. S_{l,2j−1} ∪ S_{l,2j} = S_{l−1,j}.

Algorithm 4 Computing {g^{N/N_i} : 1 ≤ i ≤ m}
Input: g ∈ G and N_1, . . . , N_m
Output: {g^{N/N_i} : 1 ≤ i ≤ m}
1: Let k = ⌈log_2(m)⌉ and set N_i = 1 for m < i ≤ 2^k
2: Set h_{0,1} = g
3: for l = 1 to k do
4:   for j = 1 to 2^{l−1} do
5:     h_{l,2j−1} = h_{l−1,j}^{∏_{i ∈ S_{l−1,j} \ S_{l,2j−1}} N_i}
6:     h_{l,2j} = h_{l−1,j}^{∏_{i ∈ S_{l−1,j} \ S_{l,2j}} N_i}
7:   end for
8: end for
9: return {h_{k,1}, . . . , h_{k,m}}
Lemma 2.15.7. Algorithm 4 is correct and requires 2⌈log_2(m)⌉ log_2(N) group operations.
Proof: Almost everything is left as an exercise. The important observation is that lines
4 to 7 involve raising to the power N_i for all i ∈ S. Hence the cost for each iteration of
the loop in line 3 is at most 2 Σ_{i=1}^{2^k} log_2(N_i) = 2 log_2(N). □
This method works efficiently in all cases (i.e., it doesn't require m to be large).
However, Exercise 2.15.8 shows that for small values of m there may be more efficient
solutions.
Exercise 2.15.8. Let N = N_1 N_2 N_3 where N_i ≈ N^{1/3} for 1 ≤ i ≤ 3. One can compute
g^{N/N_i} for 1 ≤ i ≤ 3 using Algorithm 4 or in the naive way. Suppose one uses the
standard square-and-multiply method for exponentiation and assume that each of N_1, N_2
and N_3 has Hamming weight about half their bit-length.
Note that the exponentiations in the naive solution are all with respect to the fixed
base g. A simple optimisation is therefore to precompute all g^{2^j} for 1 ≤ j ≤ log_2(N^{2/3}).
Determine the number of group operations for each algorithm if this optimisation is
performed. Which is better?
Remark 2.15.9. Sutherland gives an improved algorithm (which he calls the snowball
algorithm) as Algorithm 7.4 of [592]. Proposition 7.3 of [592] states that the complexity
is
O(log(N ) log(m)/ log(log(m)))
(2.7)
group operations.
2.15.2 Computing the Order of a Group Element
We can now return to the original problem of computing the order of an element in a
finite field.
Theorem 2.15.10. Let g ∈ F_q^* and assume that the factorisation q − 1 = ∏_{i=1}^{m} l_i^{e_i} is
known. Then one can determine the order of g in O(log(q) log log(q)) multiplications in
F_q.
Proof: The idea is to use Algorithm 4 to compute all h_i = g^{(q−1)/l_i^{e_i}}. Since m = O(log(q)),
this takes O(log(q) log log(q)) multiplications in F_q. For each i one then finds, by repeated
powering by l_i, the smallest 0 ≤ f_i ≤ e_i such that h_i^{l_i^{f_i}} = 1; the order of g is
∏_{i=1}^{m} l_i^{f_i}. □
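The order computation behind Theorem 2.15.10 can be illustrated directly. The sketch below uses made-up names and illustrative parameters, and performs plain exponentiations rather than Algorithm 4, so it does not attain the theorem's complexity bound.

```python
# Computing the order of g in a group of order n with known factorisation
# n = prod l_i^{e_i}: for each i, reduce g to an element of l_i-power order
# and count how many powerings by l_i reach the identity.

def element_order(g, n, factors, power, identity):
    """factors = [(l1, e1), ...] with n = prod l_i^{e_i}."""
    order = 1
    for l, e in factors:
        h = power(g, n // l**e)   # h has order l^f for some 0 <= f <= e
        f = 0
        while h != identity:
            h = power(h, l)
            f += 1
        order *= l**f
    return order

p = 101                            # group F_101^* of order 100 = 2^2 * 5^2
power = lambda h, e: pow(h, e, p)
assert element_order(2, 100, [(2, 2), (5, 2)], power, 1) == 100   # 2 is a primitive root mod 101
assert element_order(10, 100, [(2, 2), (5, 2)], power, 1) == 4
```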
2.15.3 Computing Primitive Roots
Recall that F_q^* is a cyclic group and that a primitive root in F_q is an element of order
q − 1. We assume in this section that the factorisation of q − 1 is known.
One algorithm to generate primitive roots is to choose g ∈ F_q^* uniformly at random
and to compute the order of g using the method of Theorem 2.15.10 until an element
of order q − 1 is found. The probability that a random g ∈ F_q^* is a primitive root is
φ(q − 1)/(q − 1). Using Theorem A.3.1 this probability is at least 1/(6 log(log(q))). Hence
this gives an algorithm that requires O(log(q)(log(log(q)))^2) field multiplications in F_q.
We now present a better algorithm for this problem, which works by considering the
prime powers dividing q 1 individually. See Exercise 11.2 of Section 11.1 of [552] for
further details.
Theorem 2.15.11. Algorithm 5 outputs a primitive root. The complexity of the algorithm
is O(log(q) log log(q)) multiplications in F_q.

Algorithm 5 Computing a primitive root in F_q^*
Input: q and the factorisation q − 1 = ∏_{i=1}^{m} l_i^{e_i}
Output: a primitive root t
1: t = 1
2: for i = 1 to m do
3:   repeat
4:     Choose a ∈ F_q^* uniformly at random and set b = a^{(q−1)/l_i^{e_i}}
5:   until b^{l_i^{e_i−1}} ≠ 1
6:   t = t · b
7: end for
8: return t
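The standard approach behind Theorem 2.15.11 can be sketched in a few lines of Python (parameters illustrative): for each prime l_i dividing q − 1, find a with a^{(q−1)/l_i} ≠ 1, keep b_i = a^{(q−1)/l_i^{e_i}} (an element of order exactly l_i^{e_i}), and multiply the b_i together.

```python
# A sketch of primitive-root generation given the factorisation of q - 1.
# The product of elements of pairwise coprime orders l_i^{e_i} has order
# lcm of those orders, i.e. q - 1.
import random

def primitive_root(q, factors):
    """factors = [(l1, e1), ...] with q - 1 = prod l_i^{e_i}; q prime here."""
    t = 1
    for l, e in factors:
        while True:
            a = random.randrange(1, q)
            if pow(a, (q - 1) // l, q) != 1:
                break
        t = t * pow(a, (q - 1) // l**e, q) % q
    return t

g = primitive_root(101, [(2, 2), (5, 2)])
# g has order lcm(4, 25) = 100, so it generates F_101^*
assert pow(g, 100, 101) == 1 and pow(g, 50, 101) != 1 and pow(g, 20, 101) != 1
```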
2.16 Fast Evaluation of Polynomials at Multiple Points
We have seen that one can evaluate a univariate polynomial at a field element efficiently
using Horner's rule. For some applications, for example the attack on small CRT exponents
for RSA in Section 24.5.2, one must evaluate a fixed polynomial repeatedly at lots
of field elements. Naively repeating Horner's rule n times would give a total cost of n^2
multiplications. This section shows one can solve this problem more efficiently than the
naive method.
Theorem 2.16.1. Let F(x) ∈ k[x] have degree n and let x_1, . . . , x_n ∈ k. Then one can
compute {F(x_1), . . . , F(x_n)} in O(M(n) log(n)) field operations. The storage requirement
is O(n log(n)) elements of k.
Proof: (Sketch) Let t = ⌈log_2(n)⌉ and set x_i = 0 for n < i ≤ 2^t. For 0 ≤ i ≤ t and
1 ≤ j ≤ 2^{t−i} define

G_{i,j}(x) = ∏_{k=(j−1)2^i+1}^{j 2^i} (x − x_k).

One computes the G_{i,j}(x) for i = 0, 1, . . . , t using the formula G_{i,j}(x) = G_{i−1,2j−1}(x) G_{i−1,2j}(x).
(This is essentially the same trick as Section 2.15.1.) For each i one needs to store n elements
of k to represent all the polynomials G_{i,j}(x). Hence, the total storage is n log(n)
elements of k.
Once all the G_{i,j}(x) have been computed one defines, for 0 ≤ i ≤ t, 1 ≤ j ≤ 2^{t−i}, the
polynomials F_{i,j}(x) = F(x) (mod G_{i,j}(x)). One computes F_{t,1}(x) = F(x) (mod G_{t,1}(x))
and then computes F_{i,j}(x) efficiently as F_{i+1,⌈j/2⌉}(x) (mod G_{i,j}(x)) for i = t − 1
downto 0. Note that F_{0,j}(x) = F(x) (mod (x − x_j)) = F(x_j) as required.
One can show that the complexity is O(M (n) log(n)) operations in k. For details see
Theorem 4 of [607], Section 10.1 of [237] or Corollary 4.5.4 of [88].
Exercise 2.16.2. Show that Theorem 2.16.1 also holds when the field k is replaced by a
ring.
The inverse problem (namely, determining F (x) from the n pairs (xj , F (xj ))) can also
be solved in O(M (n) log(n)) field operations; see Section 10.2 of [237].
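The subproduct/remainder tree from the proof of Theorem 2.16.1 can be sketched as follows. This illustration uses naive polynomial arithmetic, so it runs in O(n^2) rather than the O(M(n) log n) of the theorem; the fast bound needs fast polynomial multiplication and division.

```python
# Multipoint evaluation over F_p via a subproduct tree and a remainder tree.
# Polynomials are coefficient lists, lowest degree first.

def poly_mul(a, b, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return r

def poly_mod(a, b, p):
    """Remainder of a modulo b (b monic)."""
    a = a[:]
    while len(a) >= len(b):
        c = a[-1]
        for i in range(len(b)):
            a[len(a) - len(b) + i] = (a[len(a) - len(b) + i] - c * b[i]) % p
        a.pop()
    return a

def multipoint_eval(F, xs, p):
    # Leaves are the linear polynomials x - x_k; build the subproduct tree.
    tree = [[[(-x) % p, 1] for x in xs]]
    while len(tree[-1]) > 1:
        prev, nxt = tree[-1], []
        for i in range(0, len(prev) - 1, 2):
            nxt.append(poly_mul(prev[i], prev[i + 1], p))
        if len(prev) % 2:
            nxt.append(prev[-1])
        tree.append(nxt)
    # Push F down the tree, reducing modulo each node.
    rems = [poly_mod(F, tree[-1][0], p)]
    for level in reversed(tree[:-1]):
        rems = [poly_mod(rems[i // 2], G, p) for i, G in enumerate(level)]
    return [(r[0] if r else 0) for r in rems]

p = 97
F = [5, 0, 3, 1]     # F(x) = x^3 + 3x^2 + 5
xs = [2, 11, 30, 55]
assert multipoint_eval(F, xs, p) == [(x**3 + 3 * x**2 + 5) % p for x in xs]
```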
2.17 Pseudorandom Generation
Many of the above algorithms, and also many cryptographic systems, require generation of
random or pseudorandom numbers. The precise definitions for random and pseudorandom
are out of the scope of this book, as is a full discussion of methods to extract almost perfect
randomness from the environment and methods to generate pseudorandom sequences from
a short random seed.
There are pseudorandom number generators related to RSA (the Blum-Blum-Shub
generator) and discrete logarithms. Readers interested in learning more about this topic
should consult Chapter 5 of [415], Chapter 3 of [340], Chapter 30 of [16], or [395].
2.18 Summary
Table 2.1 gives a brief summary of the complexities for the algorithms discussed in this
chapter. The notation used in the table is n ∈ ℕ, a, b ∈ ℤ, p is a prime, q is a prime
power and k is a field. Recall that M(m) is the number of bit operations to multiply
two m-bit integers (which is also the number of operations in k to multiply two degree-m
polynomials over a field k). Similarly, M(d, q) is the number of bit operations to multiply
two degree-d polynomials in F_q[x].
Table 2.1 gives the asymptotic complexity for the algorithms that are used in cryptographic
applications (i.e., for integers of, say, at most 10,000 bits). Many of the algorithms
are randomised and so the complexity in those cases is the expected complexity. The
reader is warned that the best possible asymptotic complexity may be different: sometimes
it is sufficient to replace M(m) by m log(m) log(log(m)) to get the best complexity,
but in other cases (such as constructing a polynomial basis for F_{q^m}) there are totally
different methods that have better asymptotic complexity. In cryptographic applications
M(m) typically behaves as M(m) = O(m^2) or M(m) = O(m^{1.585}).
The phrase "k-operations" includes additions, multiplications and inversions in k. If
inversions in k are not required in the algorithm then we say "k-multiplications".
Table 2.1: Expected complexity of basic algorithms for numbers of size relevant for cryptography
and related applications. The symbol ⋆ indicates that better asymptotic complexities
are known.

Computational problem
Multiplication of m-bit integers, M(m)
Compute ⌊a/n⌋ and a (mod n)
Compute ⌊√a⌋
Extended gcd(a, b) where a and b are m-bit integers
Legendre/Jacobi symbol (a/n), |a| < n
Multiplication modulo n
Inversion modulo n
Compute g^m (mod n)
Compute square roots in F_q (q odd)
Multiplication of two degree-d polys in k[x]
Multiplication of two degree-d polys in F_q[x], M(d, q)
Inversion in k[x]/(F(x)) where deg(F(x)) = d
Multiplication in F_{q^m}
Evaluate a degree-d polynomial at α ∈ k
Find all roots in F_q of a degree-d polynomial in F_q[x]
Find one root in F_q of a degree-d polynomial in F_q[x]
Determine if a degree-d poly over F_q is irreducible
Factor a degree-d polynomial over F_q
Construct a polynomial basis for F_{q^m}
Construct a normal basis for F_{q^m} given a poly basis
Solve quadratic equations in F_q
Compute the minimal poly over F_q of α ∈ F_{q^m}
Compute an isomorphism between representations of F_{q^m}
Compute the order of g ∈ F_q^* given the factorisation of q − 1
Compute a primitive root in F_q^* given the factorisation of q − 1
Compute f(α_j) ∈ k for f ∈ k[x] of degree n and α_1, . . . , α_n ∈ k
Chapter 3

Hash Functions and MACs

3.1 Security Properties of Hash Functions
In general one expects that for any y ∈ {0,1}^l there are infinitely many bitstrings x
such that H(x) = y. Hence, all the above problems will have many solutions.
To obtain a meaningful definition for collision-resistance it is necessary to consider
hash families rather than hash functions. The problem is that an efficient adversary for
collision-resistance against a fixed hash function H is only required to output a pair {x, x′}
of messages. As long as such a collision exists then there exists an efficient algorithm that
outputs one (namely, an algorithm that has the values x and x′ hard-coded into it). Note
that there is an important distinction here between the running time of the algorithm
and the running time of the programmer (who is obliged to compute the collision as part
of their task). A full discussion of this issue is given by Rogaway [497]; also see Section
4.6.1 of Katz and Lindell [331].
Intuitively, if one can compute preimages then one can compute second-preimages
(though some care is needed here to be certain that the value x′ output by a pre-image
oracle is not just x again; Note 9.20 of Menezes, van Oorschot and Vanstone [415] gives
an artificial hash function that is second-preimage-resistant but not preimage-resistant).
Similarly, if one can compute second-preimages then one can find collisions. Hence, in
practice we prefer to study hash families that offer collision-resistance. For more details
about these relations see Section 4.6.2 of [331], Section 4.2 of [588] or Section 10.3 of [568].
Another requirement of hash families is that they be entropy smoothing: if G is
a sufficiently large finite set (i.e., #G ≫ 2^l) with a sufficiently "nice" distribution
on it (but not necessarily uniform) then the distribution on {0,1}^l given by Pr(y) =
Σ_{x∈G : H(x)=y} Pr(x) is close to uniform. We do not make this notion precise, but refer
to Section 6.9 of Shoup [552].
In Section 23.2 we will need the following security notion for hash families (which is
just a re-statement of second-preimage resistance).
Definition 3.1.2. Let X and Y be finite sets. A hash family {H_k : X → Y : k ∈ K} is
called target-collision-resistant if there is no polynomial-time adversary A with non-negligible
advantage in the following game: A receives x ∈ X and a key k ∈ K, both chosen
uniformly at random, then outputs an x′ ∈ X such that x′ ≠ x and H_k(x′) = H_k(x).
For more details about target-collision-resistant hash families we refer to Section 5 of
Cramer and Shoup [160].
3.2 Birthday Attack
Computing pre-images for a general hash function with l-bit output is expected to take
approximately 2^l computations of the hash algorithm, but one can find collisions much
more efficiently. Indeed, one can find collisions in roughly 2^{l/2} applications of the
hash function using a randomised algorithm as follows: Choose a subset D ⊆ {0,1}^l
of distinguished points (e.g., those whose t least significant bits are all zero, for some
0 < t < l/4). Choose random starting values x_0 ∈ {0,1}^l (Joux [314] suggests that these
should be distinguished points) and compute the sequence x_n = H(x_{n−1}) for n = 1, 2, . . .
until x_n ∈ D. Store (x_0, x_n) (i.e., the starting point and the ending distinguished point)
and repeat. When the same distinguished point is found twice then, assuming the
starting points x_0 and x_0′ are distinct, one can find a collision in the hash function by
computing the full sequences x_i and x_j′ and determining the smallest integers i and j such
that x_i = x_j′; the collision is then H(x_{i−1}) = H(x_{j−1}′).
If we assume that the values x_i are close to uniformly distributed in {0,1}^l then, by
the birthday paradox, one expects to have a collision after √(π 2^l / 2) strings have been
encountered (i.e., that many evaluations of the hash function). The storage required is
expected to be

(#D / 2^l) √(π 2^l / 2)

pairs (x_0, x_n). For the choice of D as above, this would be about 2^{l/2 − t} bitstrings of
storage. For many more details on this topic see Section 7.5 of Joux [314], Section 9.7.1
of Menezes, van Oorschot and Vanstone [415] or Section 3.2 of Vaudenay [612].
This approach also works if one wants to find collisions under some constraint on the
messages (for example, all messages have a fixed prefix or suffix).
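The distinguished-point method can be demonstrated end-to-end on a toy example. The sketch below is an illustration only: the "hash" is SHA-256 truncated to 32 bits, and the parameter choices (T = 8 distinguished bits, the trail-length cap) are made up for the demonstration.

```python
# Collision finding with distinguished points on a toy 32-bit hash.
import hashlib

T = 8  # distinguished points: T least significant bits equal to zero

def H(x: int) -> int:
    return int.from_bytes(hashlib.sha256(x.to_bytes(8, "big")).digest()[:4], "big")

def replay(t1, t2):
    """Walk both stored trails fully; return the first pair colliding under H."""
    (s1, n1), (s2, n2) = t1, t2
    xs1 = [s1]
    for _ in range(n1):
        xs1.append(H(xs1[-1]))
    pos = {v: i for i, v in enumerate(xs1)}
    y = s2
    for _ in range(n2):
        if H(y) in pos:
            i = pos[H(y)]
            if i > 0 and xs1[i - 1] != y:
                return xs1[i - 1], y
            return None              # trails merged at a starting point
        y = H(y)
    return None

def find_collision():
    seen = {}                        # distinguished endpoint -> (start, length)
    for start in range(10**6):
        x, n = start, 0
        for _ in range(50 * (1 << T)):   # give up on abnormally long trails
            x, n = H(x), n + 1
            if x % (1 << T) == 0:
                break
        else:
            continue
        if x in seen and seen[x][0] != start:
            pair = replay(seen[x], (start, n))
            if pair is not None:
                return pair
        seen[x] = (start, n)
    raise RuntimeError("no collision found")

a, b = find_collision()
assert a != b and H(a) == H(b)
```

With a 32-bit output a collision is expected after about 2^16 hash evaluations, while only the (start, endpoint) pairs of roughly 2^8-step trails are stored.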
3.3 Message Authentication Codes
Message authentication codes are a form of symmetric cryptography. The main purpose is
for a sender and receiver who share a secret key k to determine whether a communication
between them has been tampered with.
A message authentication code (MAC) is a set of functions {MAC_k(x) : k ∈ K}
such that MAC_k : {0,1}^* → {0,1}^l. Note that this is exactly the same definition as a hash
family. The difference between MACs and hash families lies in the security requirement;
in particular the security model for MACs assumes the adversary does not know the key
k. Informally, a MAC is secure against forgery if there is no efficient adversary that, given
pairs (x_i, y_i) ∈ {0,1}^* × {0,1}^l such that y_i = MAC_k(x_i) (for some fixed, but secret, key
k) for 1 ≤ i ≤ n, can output a pair (x, y) ∈ {0,1}^* × {0,1}^l such that y = MAC_k(x)
but (x, y) ≠ (x_i, y_i) for all 1 ≤ i ≤ n. For precise definitions and further details of
MACs see Section 4.3 of Katz and Lindell [331], Section 9.5 of Menezes, van Oorschot
and Vanstone [415], Section 6.7.2 of Shoup [552], Section 4.4 of Stinson [588] or Section
3.4 of Vaudenay [612].
There are well-known constructions of MACs from hash functions (such as HMAC,
see Section 4.7 of [331], Section 4.4.1 of [588] or Section 3.4.6 of [612]) and from block
ciphers (such as CBC-MAC, see Section 4.5 of [331], Section 4.4.2 of [588] or Section 3.4.4
of [612]).
3.4 Constructions of Hash Functions
There is a large literature on constructions of hash functions and it is beyond the scope of
this book to give the details. The basic process is to first define a compression function
(namely a function that takes bitstrings of a fixed length to bitstrings of some shorter
fixed length) and then to build a hash function on arbitrary length bitstrings by iterating
the compression function (e.g., using the Merkle-Damgård construction). We refer to
Chapter 4 of Katz and Lindell [331], Sections 9.3 and 9.4 of Menezes, van Oorschot and
Chapter 4 of Katz and Lindell [331], Sections 9.3 and 9.4 of Menezes, van Oorschot and
Vanstone [415], Chapter 10 of Smart [568], Chapter 4 of Stinson [588] or Chapter 3 of
Vaudenay [612] for the details.
3.5 Number-Theoretic Hash Functions
We briefly mention some compression functions and hash functions that are based on
algebraic groups and number theory. These schemes are not usually used in practice as
the computational overhead is usually much too high.
An early proposal for hashing based on number theory, due to Davies and Price, was to
use the function H(x) = x^2 (mod N) where N is an RSA modulus whose factorisation is
not known. Inverting such a function or finding collisions (apart from the trivial collisions
H(x) = H(±x + yN) for y ∈ ℤ) is as hard as factoring N. There are a number of papers
that build on this idea.
Another approach to hash functions based on factoring is to let N be an RSA modulus
whose factorisation is unknown and let g ∈ (ℤ/Nℤ)^* be fixed. One can define the
compression function H : ℕ → (ℤ/Nℤ)^* by
H(x) = g^x (mod N).
Finding a collision H(x) = H(x′) is equivalent to finding a multiple of the order of g.
This is hard if factoring is hard, by Exercise 24.1.20. Finding pre-images is the discrete
logarithm problem modulo N, which is also as hard as factoring. Hence, we have a
collision-resistant compression function. More generally, fix g, h ∈ (ℤ/Nℤ)^* and consider
the compression function H : ℕ × ℕ → (ℤ/Nℤ)^* defined by H(x, y) = g^x h^y (mod N).
A collision leads to either finding the order of g or h, or essentially finding the discrete
logarithm of h with respect to g, and all these problems are as hard as factoring.
One can also base hash functions on the discrete logarithm problem in any group
G. Let g, h ∈ G have order r. One can now consider the compression function H :
{0, . . . , r − 1}^2 → G given by H(x, y) = g^x h^y. It is necessary to fix the domain of
the function since H(x, y) = H(x + r, y) = H(x, y + r). If one can find collisions in this
function then one can compute the discrete logarithm of h to the base g. A reference for
this scheme is Chaum, van Heijst and Pfitzmann [129].
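This discrete-logarithm based compression function is short enough to write down in full. The parameters below are toy, illustrative values (at this size discrete logarithms are trivial, so the function has no real security): p = 2q + 1 with q prime, and g, h non-identity squares modulo p, hence of order q.

```python
# The compression function H(x, y) = g^x h^y in a prime-order subgroup.
p, q = 1019, 509   # p = 2q + 1, both prime
g, h = 4, 9        # 2^2 and 3^2: non-identity squares, so both have order q

def H(x: int, y: int) -> int:
    assert 0 <= x < q and 0 <= y < q   # fixed domain, as in the text
    return pow(g, x, p) * pow(h, y, p) % p

# sanity check: g and h lie in the order-q subgroup
assert pow(g, q, p) == 1 and pow(h, q, p) == 1
```

A collision H(x, y) = H(x′, y′) would give g^{x−x′} = h^{y′−y}, from which log_g(h) can be computed since q is prime.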
3.6 Full Domain Hashing
Hash functions usually output binary strings of some fixed length l. Some cryptosystems,
such as full domain hash RSA signatures, require hashing uniformly (or, at least, very
close to uniformly) to ℤ/Nℤ, where N is large.
Bellare and Rogaway gave two methods to do this (one in Section 6 of [38] and another
in Appendix A of [41]). We briefly recall the latter. The idea is to take some hash function
H with fixed-length output and define a new function h(x) using a constant bitstring c
and a counter i as
h(x) = H(c ∥ 0 ∥ x) ∥ H(c ∥ 1 ∥ x) ∥ · · · ∥ H(c ∥ i ∥ x).
For the RSA application one can construct a bitstring that is a small amount larger than
N and then reduce the resulting integer modulo N (as in Example 11.4.2).
These approaches have been critically analysed by Leurent and Nguyen [382]. They
give a number of results that demonstrate that care is needed in assessing the security
level of a hash function with full domain output.
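The counter-based construction can be sketched as follows. This is an illustration, not the exact scheme of [41]: the constant c, the choice of SHA-256 for H and the 64 slack bits are made-up parameters.

```python
# Hashing into Z/NZ: concatenate H(c||0||x), H(c||1||x), ... until somewhat
# more bits than N has are available, then reduce modulo N.
import hashlib

def hash_to_ZN(x: bytes, N: int, c: bytes = b"\x01") -> int:
    bits_needed = N.bit_length() + 64        # slack bits keep the mod-N bias tiny
    nblocks = -(-bits_needed // 256)         # ceiling division; SHA-256 gives 256 bits
    out = b"".join(hashlib.sha256(c + bytes([i]) + x).digest()
                   for i in range(nblocks))
    return int.from_bytes(out, "big") % N

N = 2**127 - 1
v = hash_to_ZN(b"message", N)
assert 0 <= v < N
assert v == hash_to_ZN(b"message", N)        # deterministic
```

Producing 64 bits more than needed before reducing keeps the statistical distance from uniform on ℤ/Nℤ around 2^{−64}.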
3.7 The Random Oracle Model
The random oracle model is a tool for the security analysis of cryptographic systems. It is
a computational model that includes the standard model (i.e., the computational model
mentioned in Section 2.1) together with an oracle that computes a random function from
the set {0,1}^* (i.e., binary strings of arbitrary finite length) to {0,1}^∞ (i.e., bitstrings of
countably infinite length). Since the number of such functions is uncountable, care must
be taken when defining the word "random". In any given application, one has a fixed
bit-length l in mind for the output, and one also can bound the length of the inputs.
Hence, one is considering functions H : {0,1}^n → {0,1}^l and, since there are 2^{l 2^n} such
functions, we can define "random" to mean uniformly chosen from the set of all possible
functions. We stress that a random oracle is a function: if it is queried twice on the same
input then the output is the same.
A cryptosystem in the random oracle model is a cryptosystem where one or more
hash functions are replaced by oracle queries to the random function. A cryptosystem is
secure in the random oracle model if the cryptosystem in the random oracle model
is secure. This does not imply that the cryptosystem in the standard model is secure,
since there may be an attack that exploits some feature of the hash function. Indeed,
there are artificial cryptosystems that are proven secure in the random oracle model,
but that are insecure for any instantiation of the hash function (see Canetti, Goldreich
and Halevi [115]).
The random oracle model enables security proofs in several ways. We list three of
these ways, in increasing order of power.
1. It ensures that the output of H is truly random (rather than merely pseudorandom).
2. It allows the security proof to look inside the working of the adversary by learning
the values that are inputs to the hash function.
3. It allows the security proof to programme the hash function so that it outputs a
specific value at a crucial stage in the security game.
A classic example of a proof in the random oracle model is Theorem 20.4.11. An extensive
discussion of the random oracle model is given in Section 13.1 of Katz and Lindell [331].
Since a general function from {0,1}^n to {0,1}^l cannot be represented more compactly
than by a table of values, a random oracle requires l 2^n bits to describe. It follows that
a random oracle cannot be implemented in polynomial space. However, the crucial observation that is used in security proofs is that a random oracle can be simulated in
polynomial-time and space (assuming only polynomially many queries to the oracle are
made) by creating, on-the-fly, a table giving the pairs (x, y) such that H(x) = y.
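The on-the-fly simulation at the heart of such proofs is a few lines of code (names here are illustrative):

```python
# Simulating a random oracle in polynomial time and space: answers are
# generated lazily and stored in a table, so repeated queries are consistent
# and only polynomially many entries are ever created.
import secrets

class RandomOracle:
    def __init__(self, l: int):
        self.l = l        # output length in bits
        self.table = {}   # past queries: x -> H(x)

    def query(self, x: bytes) -> int:
        if x not in self.table:
            self.table[x] = secrets.randbits(self.l)  # fresh uniform answer
        return self.table[x]

ro = RandomOracle(128)
assert ro.query(b"abc") == ro.query(b"abc")  # a function: same input, same output
assert len(ro.table) == 1                    # only queried points are stored
```

A security proof can additionally "programme" such an oracle by writing chosen values into the table before they are queried.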
Chapter 4
Preliminary Remarks on
Algebraic Groups
This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography
by Steven Galbraith, available from http://www.isg.rhul.ac.uk/sdg/crypto-book/ The
copyright for this chapter is held by Steven Galbraith.
This book is now completed and an edited version of it will be published by Cambridge
University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be
different in the published version.
Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes.
All feedback on the book is very welcome and will be acknowledged.
For efficient public key cryptography based on discrete logarithms one would like to
have groups for which computing g^n is as fast as possible, the representation of group
elements is as small as possible, and for which the DLP (see Definition 2.1.1 or 13.0.2) is
(at least conjecturally) as hard as possible.
If g is a group element of order r then one needs at least log_2(r) bits to represent
an arbitrary element of ⟨g⟩. This optimal size can be achieved by using the exponent
representation, i.e., represent g^a as a ∈ ℤ/rℤ. However, the DLP is not hard when this
representation is used.
Ideally, for any cyclic group G of order r, one would like to be able to represent
arbitrary group elements (in a manner which does not then render the DLP trivial) using
roughly log2 (r) bits. This can be done in some cases (e.g., elliptic curves over finite fields
with a prime number of points) but it is unlikely that it can always be done. Using
algebraic groups over finite fields is a good way to achieve these conflicting objectives.
4.1 Informal Definition of an Algebraic Group
The subject of algebraic groups is large and has an extensive literature. Instead of
presenting the full theory, in this book we present only the algebraic groups that are currently
believed to be useful in public key cryptography. Informally¹, an algebraic group over
a field k is a group such that:
• Group elements are specified as n-tuples of elements in a field k;
¹We refrain from giving a formal definition of algebraic groups; mainly as it requires defining products
of projective varieties.
4.2 Examples of Algebraic Groups
The simplest examples of algebraic groups are the additive group G_a and multiplicative
group G_m of a field k. For G_a(k) the set of points is k and the group operation
is given by the polynomial mult(x, y) = x + y (for computing the group operation) and
inverse(x) = −x (for computing inverses). For G_m(k) the set of points is k^* = k \ {0} and
the group operation is given by the polynomial mult(x, y) = xy and the rational function
inverse(x) = 1/x (Example 5.1.5 shows how to express G_m(k) as an algebraic set).
The additive group is useless for cryptography since the discrete logarithm problem
is easy. The discrete logarithm problem is also easy for the multiplicative group over
certain fields (e.g., if g ∈ ℝ then the discrete logarithm problem in ⟨g⟩ ⊆ ℝ^* is easy due
to algorithms that compute approximations to the natural logarithm function). However,
G_m(F_q) is useful for cryptography and will be one of the main examples used in this book.
The other main examples of algebraic groups in public key cryptography are algebraic
tori (see Chapter 6), elliptic curves and divisor class groups of hyperelliptic curves.
4.3 Algebraic Group Quotients
Quotients of algebraic groups are used to reduce the storage and communication requirements
of public key cryptosystems. Let G be a group with an automorphism ψ such
that ψ^n = 1 (where 1 : G → G is the identity map and ψ^n is the n-fold composition
ψ ∘ · · · ∘ ψ). We define ψ^0 = 1. Define the orbit or equivalence class of g ∈ G under ψ
to be [g] = {ψ^i(g) : 0 ≤ i < n}. Define the quotient as the set of orbits under ψ. In
other words
G/ψ = {[g] : g ∈ G}.
We call G the covering group of a quotient G/ψ. In general, the group structure of
G does not induce a group structure on the quotient G/ψ. Nevertheless, we can define
exponentiation on the quotient by [g]^n = [g^n] for n ∈ ℤ. Since exponentiation is the
fundamental operation for many cryptographic applications it follows that quotients of
algebraic groups are sufficient for many cryptographic applications.
Lemma 4.3.1. Let n ∈ ℤ and [g] ∈ G/ψ. Then [g]^n is well-defined.
Proof: Since ψ is a group homomorphism we have ψ^i(g)^n = ψ^i(g^n) and so for each
g_1 ∈ [g] we have g_1^n ∈ [g^n]. □
The advantage of algebraic group quotients G/ is that they can require less storage
than the original algebraic group G. We now give an example of this.
Example 4.3.2. Let p be an odd prime. Consider the subgroup G ⊆ F_{p^2}^* of order p + 1.
Note that gcd(p − 1, p + 1) = 2 so G ∩ F_p^* = {1, −1}. If g ∈ G then we have g^{p+1} = 1,
which is equivalent to g^p = g^{−1}. Let ψ be the automorphism ψ(g) = g^p. Then ψ^2 = 1 in
F_{p^2}^* and the orbits [g] in G/ψ all have size 2 except for [1] and [−1].
The natural representation for elements of G ⊆ F_{p^2}^* is a pair of elements of F_p. However,
since #(G/ψ) = 2 + (p − 1)/2 one might expect to be able to represent elements of
G/ψ using just one element of F_p.
Let g ∈ G. Then the elements of [g] = {g, g^p} are the roots of the equation x^2 − tx + 1
in F_{p^2}, where t = g + g^p ∈ F_p. Conversely, each t ∈ F_p such that the roots of x^2 − tx + 1
are Galois conjugates corresponds to a class [g] (the values t = ±2 correspond to [1] and
[−1]). Hence, one can represent an element of G/ψ by the trace t. We therefore require
half the storage compared with the standard representation of G ⊆ F_{p^2}^*.
In Section 6.3.2 we show that, given the trace t of g, one can compute the trace t_n of
g^n efficiently using Lucas sequences (though there is a slight catch, namely that we have
to work with a pair (t_n, t_{n−1}) of traces).
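Example 4.3.2 can be checked numerically. The sketch below is an illustration with made-up parameters: it builds F_{p^2} as F_p(i) with i^2 = −1 (valid since the chosen p ≡ 3 (mod 4)), produces an element g of order dividing p + 1, and verifies that the traces t_n = g^n + g^{pn} lie in F_p and satisfy the Lucas-style recurrence t_{n+1} = t·t_n − t_{n−1}.

```python
# Trace representation of the order-(p+1) subgroup of F_{p^2}^*.
p = 103  # a prime with p = 3 (mod 4)

def mul(a, b):
    """(a0 + a1*i)(b0 + b1*i) in F_{p^2} with i^2 = -1."""
    return ((a[0] * b[0] - a[1] * b[1]) % p, (a[0] * b[1] + a[1] * b[0]) % p)

def power(a, n):
    r = (1, 0)
    while n:
        if n & 1:
            r = mul(r, a)
        a = mul(a, a)
        n >>= 1
    return r

g = power((2, 1), p - 1)          # g = u^(p-1) has order dividing p + 1
assert power(g, p + 1) == (1, 0)

t = 2 * g[0] % p                  # trace of g: g + g^p = 2 * Re(g)
t_prev, t_cur = 2, t              # traces of g^0 and g^1
for n in range(2, 20):
    t_prev, t_cur = t_cur, (t * t_cur - t_prev) % p
    assert t_cur == 2 * power(g, n)[0] % p   # t_n = g^n + g^(pn)
```

Only single F_p elements (the traces) are manipulated in the recurrence, matching the halved storage of the quotient representation.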
Another important example of an algebraic group quotient is elliptic curve arithmetic
using x-coordinates only. This is the quotient of an elliptic curve by the equivalence
relation P ∼ −P.
4.4 Algebraic Groups over Rings

Let N = p_1 · · · p_k be a product of distinct primes. By the Chinese remainder theorem
one is led to define

G(ℤ/Nℤ) = ⊕_{i=1}^{k} G(F_{p_i})    (4.1)

(where now ⊕ denotes the direct sum of groups). A problem is that this representation
for G(Z/N Z) does not satisfy the natural generalisation to rings of our informal definition
of an algebraic group. For example, group elements are not n-tuples over the ring, but
over a collection of different fields. Also the value n is no longer bounded.
The challenge is to find a representation for G(Z/N Z) that uses n-tuples over Z/N Z
and satisfies the other properties of the informal definition. Example 4.4.1 shows that
this holds for the additive and multiplicative groups.
Example 4.4.1. Let N = ∏_i p_i where the p_i are distinct primes. Then, using the
definition in equation (4.1),

G_a(ℤ/Nℤ) ≅ ⊕_i G_a(F_{p_i}) ≅ ⊕_i F_{p_i} ≅ ℤ/Nℤ.

Similarly,

G_m(ℤ/Nℤ) ≅ ⊕_i G_m(F_{p_i}) ≅ ⊕_i F_{p_i}^* ≅ (ℤ/Nℤ)^*.

Hence, both groups can naturally be considered as algebraic groups over ℤ/Nℤ.
Note that Gm (Z/N Z) is not cyclic when N is square-free but not prime.
To deal with non-square-free N it is necessary to define G(ℤ/p^nℤ). The details of
this depend on the algebraic group. For G_a and G_m it is straightforward and we still
have G_a(ℤ/Nℤ) ≅ ℤ/Nℤ and G_m(ℤ/Nℤ) ≅ (ℤ/Nℤ)^*. For other groups it can be more
complicated.
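The isomorphism of Example 4.4.1 for G_m can be verified numerically. The sketch below (illustrative small primes) checks that reduction modulo each prime factor is a group homomorphism from (ℤ/Nℤ)^* to F_p^* × F_q^*.

```python
# CRT decomposition of Gm(Z/NZ) for square-free N = p*q.
from math import gcd

p, q = 11, 13
N = p * q

def split(g):
    """The map (Z/NZ)^* -> F_p^* x F_q^*."""
    return (g % p, g % q)

units = [g for g in range(1, N) if gcd(g, N) == 1]
assert len(units) == (p - 1) * (q - 1)          # #(Z/NZ)^* = (p-1)(q-1)
for g in units[:20]:
    for h in units[:20]:
        gh = split(g * h % N)
        assert gh == (split(g)[0] * split(h)[0] % p,
                      split(g)[1] * split(h)[1] % q)  # homomorphism property
```

Since the map is also a bijection (by the Chinese remainder theorem), it is an isomorphism of groups.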
Chapter 5
Varieties
The purpose of this chapter is to state some basic definitions and results from algebraic
geometry that are required for the main part of the book. In particular, we define algebraic
sets, irreducibility, function fields, rational maps and dimension. The chapter is not
intended as a self-contained introduction to algebraic geometry. We make the following
recommendations to the reader:
1. Readers who want a very elementary introduction to elliptic curves are advised
to consult one or more of Koblitz [345], Silverman-Tate [563], Washington [622],
Smart [568] or Stinson [588].
2. Readers who wish to learn algebraic geometry properly should first read a basic
text such as Reid [494] or Fulton [215]. They can then skim this chapter and
consult Stichtenoth [585], Moreno [436], Hartshorne [277], Lorenzini [391] or Shafarevich [539] for detailed proofs and discussion.
5.1 Affine algebraic sets

Let k be a perfect field contained in a fixed algebraic closure k̄. All algebraic extensions k′/k are implicitly assumed to be subfields of k̄. We use the notation k[x] = k[x_1, ..., x_n] (in later sections we also use k[x] = k[x_0, ..., x_n]). When n = 2 or 3 we often write k[x, y] or k[x, y, z].
Define affine n-space over k as A^n(k) = k^n. We call A^1(k) the affine line and A^2(k) the affine plane over k. If k ⊆ k′ then we have the natural inclusion A^n(k) ⊆ A^n(k′). We write A^n for A^n(k̄) and so A^n(k) ⊆ A^n.

Definition 5.1.1. Let S ⊆ k[x]. Define

V(S) = {P ∈ A^n(k̄) : f(P) = 0 for all f ∈ S}.
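The definition ranges over the algebraic closure, so V(S) cannot be enumerated directly; however, the k-rational points V(S)(k) can be listed when k is a small finite field. A minimal sketch (the prime p = 7 and the choice of S are illustrative assumptions, not from the text):

```python
from itertools import product

# Brute-force the F_p-rational points of an affine algebraic set V(S),
# following Definition 5.1.1 but over the finite field F_p so that
# enumeration is possible.
p = 7

def V(polys, n):
    """Points P in A^n(F_p) with f(P) = 0 for every f in polys.
    Each poly is given as a Python function F_p^n -> F_p."""
    return [P for P in product(range(p), repeat=n)
            if all(f(*P) % p == 0 for f in polys)]

# S = {y^2 - x^3 - 1, x*y}: intersection of a cubic curve with the two axes.
S = [lambda x, y: y * y - x ** 3 - 1, lambda x, y: x * y]
points = V(S, 2)
print(points)
```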
4. V(fg) = V(f) ∪ V(g).
Exercise 5.1.10. Show that V(S)(k) = A^n(k) does not necessarily imply that S = {0}.
The following result assumes a knowledge of Galois theory. See Section A.7 for background.
Lemma 5.1.11. Let X = V(S) be an algebraic set with S ⊆ k[x] (i.e., X is a k-algebraic set). Let k′ be an algebraic extension of k. Let σ ∈ Gal(k̄/k′). For P = (P_1, ..., P_n) define σ(P) = (σ(P_1), ..., σ(P_n)).

1. If P ∈ X(k̄) then σ(P) ∈ X(k̄).

3. If X ⊆ Y then I_k(Y) ⊆ I_k(X).
4. I_k(X ∪ Y) = I_k(X) ∩ I_k(Y).

8. I_k(∅) = k[x].
Warning: Here k[X] does not denote polynomials in the variable X. Hartshorne and Fulton write A(X) and Γ(X) respectively for the affine coordinate ring.
Exercise 5.1.19. Prove that k[X] is a commutative ring with an identity.
Note that k[X] is isomorphic to the ring of all functions f : X → k̄ given by polynomials defined over k.
Hilbert's Nullstellensatz is a powerful tool for understanding I_k(X) and it has several other applications (for example, we use it in Section 7.5). We follow the presentation of Fulton [215]. Note that it is necessary to work over k̄.
Theorem 5.1.20. (Weak Nullstellensatz) Let X ⊆ A^n be an affine algebraic set defined over k̄ and let m be a maximal ideal of the affine coordinate ring k̄[X]. Then V(m) = {P} for some P = (P_1, ..., P_n) ∈ X(k̄) and m = (x_1 - P_1, ..., x_n - P_n).

Proof: Consider the field F = k̄[X]/m, which contains k̄. Note that F is finitely generated as a ring over k̄ by the images of x_1, ..., x_n. By Theorem A.6.2, F is an algebraic extension of k̄ and so F = k̄.
It follows that, for 1 ≤ i ≤ n, there is some P_i ∈ k̄ such that x_i - P_i ∈ m. Hence, n = (x_1 - P_1, ..., x_n - P_n) ⊆ m and, since k̄[X]/n ≅ k̄, it follows that m = n.
Finally, it is clear that P ∈ V(m) and if Q = (Q_1, ..., Q_n) ∈ X(k̄) - {P} then there is some 1 ≤ i ≤ n such that Q_i ≠ P_i and so (x_i - P_i)(Q) ≠ 0. Hence V(m) = {P}.
Corollary 5.1.21. If I is a proper ideal in k̄[x_1, ..., x_n] then V(I) ≠ ∅.
Write y = 1/x_{n+1} and let N = 1 + max_{1≤i≤m+1} {deg_{x_{n+1}}(a_i)}. One has

y^N = b_{m+1}(x_1, ..., x_n, y)(G - y) + Σ_{i=1}^{m} ...
Corollary 5.1.23. Let f(x, y) ∈ k̄[x, y] be irreducible over k̄ and let X = V(f(x, y)) ⊆ A^2(k̄). Then I(X) = (f(x, y)), i.e., the ideal over k̄[x, y] generated by f(x, y).

Proof: By Theorem 5.1.22 we have I(X) = rad_k̄((f(x, y))). Since k̄[x, y] is a unique factorisation domain and f(x, y) is irreducible, f(x, y) is prime. So g(x, y) ∈ rad_k̄((f(x, y))) implies g(x, y)^n = f(x, y)h(x, y) for some h(x, y) ∈ k̄[x, y], which implies f(x, y) | g(x, y) and g(x, y) ∈ (f(x, y)).
Theorem 5.1.24. Every affine algebraic set X is the intersection of a finite number of hypersurfaces.

Proof: By Hilbert's basis theorem (Corollary A.9.4) k[x] is Noetherian. Hence I_k(X) = (f_1, ..., f_m) and X = V(f_1) ∩ ⋯ ∩ V(f_m).
5.2 Projective Algebraic Sets
Studying affine algebraic sets is not sufficient for our applications. In particular, the set
of affine points of the Weierstrass equation of an elliptic curve (see Section 7.2) does not
form a group. Projective geometry is a way to complete the picture by adding certain
points at infinity.
For example, consider the hyperbola xy = 1 in A^2(R). Projective geometry allows an interpretation of the behaviour of the curve at x = 0 or y = 0; see Example 5.2.7.
Definition 5.2.1. Projective space over k of dimension n is

P^n(k) = {lines through (0, ..., 0) in A^{n+1}(k)}.

A convenient way to represent points of P^n(k) is using homogeneous coordinates: Let a_0, a_1, ..., a_n ∈ k with not all a_j = 0 and define (a_0 : a_1 : ⋯ : a_n) to be the equivalence class of (n + 1)-tuples under the equivalence relation

(a_0, a_1, ..., a_n) ≡ (λa_0, λa_1, ..., λa_n)

for any λ ∈ k^*. Thus P^n(k) = {(a_0 : ⋯ : a_n) : a_i ∈ k for 0 ≤ i ≤ n and a_i ≠ 0 for some 0 ≤ i ≤ n}. Write P^n = P^n(k̄).
In other words, the equivalence class (a_0 : ⋯ : a_n) is the set of points on the line between (0, ..., 0) and (a_0, ..., a_n) with the point (0, ..., 0) removed.
There is a map φ : A^n → P^n given by φ(x_1, ..., x_n) = (x_1 : ⋯ : x_n : 1). Hence A^n is identified with a subset of P^n.
Example 5.2.2. The projective line P^1(k) is in one-to-one correspondence with A^1(k) ∪ {∞} since P^1(k) = {(a_0 : 1) : a_0 ∈ k} ∪ {(1 : 0)}. The projective plane P^2(k) is in one-to-one correspondence with A^2(k) ∪ P^1(k).
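The bijection for P^1 can be made explicit by normalising homogeneous coordinates so that the last nonzero coordinate is 1. A sketch over the (illustrative) prime field F_5:

```python
# Enumerate P^1(F_q) by reducing each nonzero pair (a0 : a1) to a canonical
# representative: scale so that the last nonzero coordinate is 1.  This
# realises the bijection P^1(F_q) = {(a : 1) : a in F_q} u {(1 : 0)}
# of Example 5.2.2.
q = 5  # illustrative prime field

def normalise(a0, a1, q):
    if a1 % q != 0:
        inv = pow(a1, q - 2, q)      # a1^{-1} by Fermat's little theorem
        return ((a0 * inv) % q, 1)
    return (1, 0)                    # a1 = 0, a0 != 0 scales to (1 : 0)

points = {normalise(a0, a1, q)
          for a0 in range(q) for a1 in range(q)
          if (a0, a1) != (0, 0)}
assert len(points) == q + 1          # the projective line has q + 1 points
print(sorted(points))
```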
Definition 5.2.3. A point P = (P_0 : P_1 : ⋯ : P_n) ∈ P^n(k̄) is defined over k if there is some λ ∈ k̄^* such that λP_i ∈ k for all 0 ≤ i ≤ n.
Lemma 5.2.5. A point P ∈ P^n(k̄) is defined over k if and only if σ(P) = P for all σ ∈ Gal(k̄/k).

Proof: Let P = (P_0 : ⋯ : P_n) ∈ P^n(k̄) and suppose σ(P) ≡ P for all σ ∈ Gal(k̄/k). Then there is some ξ : Gal(k̄/k) → k̄^* such that σ(P_i) = ξ(σ)P_i for all 0 ≤ i ≤ n. One can verify² that ξ is a 1-cocycle in k̄^*. It follows by Theorem A.7.2 (Hilbert 90) that ξ(σ) = σ(λ)/λ for some λ ∈ k̄^*. Hence, σ(P_i/λ) = P_i/λ for all 0 ≤ i ≤ n and all σ ∈ Gal(k̄/k). Hence P_i/λ ∈ k for all 0 ≤ i ≤ n and the proof is complete.
Recall that if f is a homogeneous polynomial of degree d then f(λx_0, ..., λx_n) = λ^d f(x_0, ..., x_n) for all λ ∈ k̄ and all (x_0, ..., x_n) ∈ A^{n+1}(k̄).
Definition 5.2.6. Let f ∈ k̄[x_0, ..., x_n] be a homogeneous polynomial. A point P = (x_0 : ⋯ : x_n) ∈ P^n(k̄) is a zero of f if f(x_0, ..., x_n) = 0 for some (hence, every) point (x_0, ..., x_n) in the equivalence class (x_0 : ⋯ : x_n). We therefore write f(P) = 0. Let S be a set of polynomials and define

V(S) = {P ∈ P^n(k̄) : P is a zero of f(x) for all homogeneous f(x) ∈ S}.

A projective algebraic set is a set X = V(S) ⊆ P^n(k̄) for some S ⊆ k[x]. Such a set is also called a k-algebraic set. For X = V(S) and k′ an algebraic extension of k define

X(k′) = {P ∈ P^n(k′) : f(P) = 0 for all homogeneous f(x) ∈ S}.
Example 5.2.7. The hyperbola y = 1/x can be described as the affine algebraic set X = V(xy - 1) ⊆ A^2 over R. One can consider the corresponding projective algebraic set V(xy - z^2) ⊆ P^2 over R, whose points consist of the points of X together with the points (1 : 0 : 0) and (0 : 1 : 0). These two points correspond to the asymptotes x = 0 and y = 0 of the hyperbola and they essentially tie together the disconnected components of the affine curve to make a single closed curve in projective space.
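The two extra points of this example can be recovered mechanically; a sketch that lists the points of V(xy - z^2) with z = 0 over an illustrative prime field:

```python
from itertools import product

# Example 5.2.7: the projective closure of the hyperbola xy = 1 is
# V(x*y - z^2) in P^2.  Over a small prime field we can list the points
# with z = 0 (the "points at infinity") by brute force.
p = 11  # illustrative prime

def normalise(v, p):
    """Scale a projective point so that its last nonzero coordinate is 1."""
    last = max(i for i in range(3) if v[i] % p != 0)
    inv = pow(v[last], p - 2, p)
    return tuple((c * inv) % p for c in v)

at_infinity = {normalise((x, y, 0), p)
               for x, y in product(range(p), repeat=2)
               if (x, y) != (0, 0) and (x * y) % p == 0}
print(at_infinity)   # exactly the two points named in the example
```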
Exercise 5.2.8. Describe the sets V(x^2 + y^2 - z^2)(R) ⊆ P^2(R) and V(yz - x^2)(R) ⊆ P^2(R).
A set of homogeneous polynomials does not in general form an ideal as one cannot
simultaneously have closure under multiplication and addition. Hence it is necessary to
introduce the following definition.
Definition 5.2.11. A k[x_0, ..., x_n]-ideal I ⊆ k[x_0, ..., x_n] is a homogeneous ideal if for every f ∈ I with homogeneous decomposition f = Σ_i f_i we have f_i ∈ I.
Exercise 5.2.12. Let S ⊆ k[x] be a set of homogeneous polynomials. Define (S) to be the k[x]-ideal generated by S in the usual way, i.e., (S) = {Σ_{j=1}^{n} f_j(x)s_j(x) : n ∈ N, f_j(x) ∈ k[x_0, ..., x_n], s_j(x) ∈ S}. Prove that (S) is a homogeneous ideal. Prove that if I is a homogeneous ideal then I = (S) for some set of homogeneous polynomials S.
Definition 5.2.13. For any set X ⊆ P^n(k̄) define

I_k(X) = ({f ∈ k[x_0, ..., x_n] : f is homogeneous and f(P) = 0 for all P ∈ X}).

We stress that I_k(X) is not the stated set of homogeneous polynomials but the ideal generated by them. We write I(X) = I_k̄(X).
An algebraic set X ⊆ P^n is defined over k if I(X) can be generated by homogeneous polynomials in k[x].
²At least, one can verify the formula ξ(στ) = σ(ξ(τ))ξ(σ). The topological condition also holds, but we do not discuss this.
    ( 1 1 0 )
L = ( 0 1 0 )
    ( 0 0 1 )
Exercise 5.2.23. Show that if P, Q ∈ P^n(k) then there is always a linear change of variables L on P^n over k such that L(P), L(Q) ∈ U_n.
We already mentioned the map φ : A^n → P^n given by φ(x_1, ..., x_n) = (x_1 : ⋯ : x_n : 1), which has image equal to U_n. A useful way to study a projective algebraic set X is to consider X ∩ U_i for 0 ≤ i ≤ n and interpret X ∩ U_i as an affine algebraic set. We now introduce the notation for this.
Definition 5.2.24. Let φ_i : A^n(k̄) → U_i be the one-to-one correspondence

φ_i(y_1, ..., y_n) = (y_1 : ⋯ : y_i : 1 : y_{i+1} : ⋯ : y_n).

We write φ for φ_n. Let

φ_i^{-1}(x_0 : ⋯ : x_n) = (x_0/x_i, ..., x_{i-1}/x_i, x_{i+1}/x_i, ..., x_n/x_i)

be the map φ_i^{-1} : P^n(k̄) → A^n(k̄), which is defined only on U_i (i.e., φ_i^{-1}(X) = φ_i^{-1}(X ∩ U_i)).³ We write X ∩ A^n as an abbreviation for φ_n^{-1}(X ∩ U_n).
Let φ_i^* : k[x_0, ..., x_n] → k[y_1, ..., y_n] be the de-homogenisation map⁴ given by

φ_i^*(f)(y_1, ..., y_n) = f(y_1, ..., y_i, 1, y_{i+1}, ..., y_n).
Definition 5.2.29. Let X ⊆ A^n(k̄). Define the projective closure of X to be X̄ = V({f^* : f ∈ I(X)}) ⊆ P^n, where f^* denotes the homogenisation of f.
Proof: Let 0 ≤ i ≤ 2 be such that f(x_0, x_1, x_2) has a monomial that does not feature x_i (such an i must exist since f is irreducible). Without loss of generality, suppose i = 2. Write g(y_1, y_2) = φ^*(f) = f(y_1, y_2, 1). By part 6 of Lemma 5.2.25 the homogenisation of g is f.
Let Y = X ∩ A^2 = V(g). Note that g is k̄-irreducible (since g = g_1 g_2 implies, by taking homogenisations, f = g_1^* g_2^*). Let h ∈ I_k̄(X); then φ^*(h) ∈ I_k̄(Y) and so, by Corollary 5.1.23, φ^*(h) ∈ (g). In other words, there is some h_1(y_1, y_2) such that φ^*(h) = g h_1. Taking homogenisations gives f h_1^* | h and so h ∈ (f).
5.3 Irreducibility
Hence, it is clear that an algebraic set defined over k is a k-algebraic set. The converse
does not hold in general. However, if X is absolutely irreducible and k is a perfect field
then these notions are equivalent (see Corollary 10.2.2 of Fried and Jarden [213] and use
the fact that when X is absolutely irreducible then the algebraic closure of k in k(X) is
k). Note that Corollary 5.1.23 proves a special case of this result.
The next few results use the notation of Definitions 5.2.24, 5.2.27 and 5.2.29.
Corollary 5.3.8. Let X ⊆ A^n be a variety. Then X̄ is geometrically irreducible. Let X ⊆ P^n be a variety. Then X ∩ A^n is geometrically irreducible.
Proof: The case where X is empty is trivial so suppose X ≠ ∅. By Lemma 5.2.30, I(X̄) = I(X)^*. Hence, if g, h ∈ k̄[x_0, ..., x_n] are such that gh ∈ I(X̄) then φ^*(g)φ^*(h) = φ^*(gh) ∈ I(X) by part 4 of Lemma 5.2.25. Theorem 5.3.4 implies I(X) is a prime ideal and so either φ^*(g) or φ^*(h) is in I(X). Hence either g or h is in I(X̄).
For the converse, suppose X ∩ A^n ≠ ∅. If gh ∈ I(X ∩ A^n) then g^* h^* = (gh)^* ∈ I((X ∩ A^n)¯) ⊆ I(X). Hence g^* or h^* is in I(X) and so, by part 4 of Lemma 5.2.25, g or h is in I(X ∩ A^n).
Theorem 5.3.9. Let X ⊆ P^n be an algebraic set such that X ∩ A^n ≠ ∅. Then (X ∩ A^n)¯ ⊆ X. If X is a variety then (X ∩ A^n)¯ = X.

Proof: If f ∈ I(X) then φ^*(f) ∈ I(X ∩ A^n) and so f ∈ I((X ∩ A^n)¯). In other words, I(X) ⊆ I((X ∩ A^n)¯) and so (X ∩ A^n)¯ ⊆ X.
Let X_1 = (X ∩ A^n)¯ ∩ X and X_2 = X ∩ V(x_0). Then X = X_1 ∪ X_2 and so, if X is irreducible and X ∩ A^n ≠ ∅ then X = X_1.
Theorem 5.3.10. Let X = V(f(x, y)) ⊆ A^2 or X = V(f(x, y, z)) ⊆ P^2. Then X is geometrically irreducible if and only if f is irreducible over k̄.

Proof: If f = gh is a non-trivial factorisation of f then X = V(f) = V(g) ∪ V(h) is reducible. Hence, if X is geometrically irreducible then f is k̄-irreducible.
Conversely, by Corollary 5.1.23 (respectively, Theorem 5.2.31) we have I_k̄(V(f)) = (f). Since f is irreducible it follows that (f) is a prime ideal and so X is irreducible.

Example 5.3.11. It is necessary to work over k̄ for Theorem 5.3.10. For example, let f(x, y) = y^2 + x^2(x - 1)^2. Then V(f(x, y)) ⊆ A^2(R) consists of two points and so is reducible, even though f(x, y) is R-irreducible.
Lemma 5.3.12. Let X be a variety and U ⊆ X a non-empty set. If U is open (in the Zariski topology) in X then U is dense in X (i.e., the topological closure of U in X in the Zariski topology is X).

Proof: Let X_1 be the closure of U in X and X_2 = X - U. Then X = X_1 ∪ X_2 and X_1, X_2 are closed sets. Since X is irreducible and X_2 ≠ X it follows that X_1 = X.

Lemma 5.3.13. Let X be a variety and U a non-empty open subset of X. Then I_k(U) = I_k(X).

Proof: Since U ⊆ X we have I_k(X) ⊆ I_k(U). Now let f ∈ I_k(U). Then U ⊆ V(f) ∩ X. Write X_1 = V(f) ∩ X, which is an algebraic set, and X_2 = X - U, which is also an algebraic set. Then X = X_1 ∪ X_2 and, since X is irreducible and X_2 ≠ X, X = X_1. In other words, f ∈ I_k(X).
Exercise 5.3.14. Let X be an irreducible variety. Prove that if U_1, U_2 ⊆ X are non-empty open sets then U_1 ∩ U_2 ≠ ∅.
5.4 Function Fields
If X is a variety defined over k then I_k(X) is a prime ideal and so the affine or homogeneous coordinate ring is an integral domain. One can therefore consider its field of fractions. If X is affine then the field of fractions has a natural interpretation as a set of maps X → k̄. When X is projective then a ratio f/g of polynomials does not give a well-defined function on X unless f and g are homogeneous of the same degree.
Definition 5.4.1. Let X be an affine variety defined over k. The function field k(X) is the set

k(X) = {f_1/f_2 : f_1, f_2 ∈ k[X], f_2 ∉ I_k(X)}

of classes under the equivalence relation f_1/f_2 ≡ f_3/f_4 if and only if f_1 f_4 - f_2 f_3 ∈ I_k(X). In other words, k(X) is the field of fractions of the affine coordinate ring k[X] over k.
Let X be a projective variety. The function field is

k(X) = {f_1/f_2 : f_1, f_2 ∈ k[X] homogeneous of the same degree, f_2 ∉ I_k(X)}

with the equivalence relation f_1/f_2 ≡ f_3/f_4 if and only if f_1 f_4 - f_2 f_3 ∈ I_k(X).
Elements of k(X) are called rational functions. For a ∈ k̄ the rational function f : X → k̄ given by f(P) = a is called a constant function.
Exercise 5.4.2. Prove that the field of fractions of an integral domain is a field. Hence, deduce that if X is an affine variety then k(X) is a field. Prove also that if X is a projective variety then k(X) is a field.
We stress that, when X is projective, k(X) is not the field of fractions of k[X] and that k[X] ⊄ k(X). Also note that elements of the function field are not functions X → k̄ but maps X → k̄ (i.e., they are not necessarily defined everywhere).
Example 5.4.3. One has k(A^2) ≅ k(x, y) and k(P^2) ≅ k(x, y).
Definition 5.4.4. Let X be a variety and let f_1, f_2 ∈ k[X]. Then f_1/f_2 is defined or regular at P if f_2(P) ≠ 0. An equivalence class f ∈ k(X) is regular at P if it contains some f_1/f_2 with f_1, f_2 ∈ k[X] (if X is projective then necessarily deg(f_1) = deg(f_2)) such that f_1/f_2 is regular at P.
Note that there may be many choices of representative for the equivalence class of f ,
and only some of them may be defined at P .
Example 5.4.5. Let k be a field of characteristic not equal to 2. Let X be the algebraic set V(y^2 - x(x - 1)(x + 1)) ⊆ A^2(k̄). Consider the functions

f_1 = x(x - 1)/y   and   f_2 = y/(x + 1).

One can check that f_1 is equivalent to f_2. Note that f_1 is not defined at (0, 0), (1, 0) or (-1, 0), while f_2 is defined at (0, 0) and (1, 0) but not at (-1, 0). The equivalence class of f_1 is therefore regular at (0, 0) and (1, 0). Section 7.3 gives techniques to deal with these issues for curves, from which one can deduce that no function in the equivalence class of f_1 is defined at (-1, 0).
Exercise 5.4.6. Let X be a variety over k. Suppose f_1/f_2 and f_3/f_4 are equivalent functions on X that are both defined at P ∈ X(k̄). Show that (f_1/f_2)(P) = (f_3/f_4)(P).
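Exercise 5.4.6 can be checked numerically for the functions of Example 5.4.5; a sketch over the (illustrative) field F_13:

```python
# On the curve X = V(y^2 - x(x-1)(x+1)) the representatives
# f1 = x(x-1)/y and f2 = y/(x+1) of one equivalence class agree at
# every F_p-point where both are defined (Example 5.4.5 / Exercise 5.4.6).
p = 13  # illustrative prime (characteristic != 2)

curve = [(x, y) for x in range(p) for y in range(p)
         if (y * y - x * (x - 1) * (x + 1)) % p == 0]

def inv(a):
    return pow(a % p, p - 2, p)

both_defined = 0
for (x, y) in curve:
    if y % p != 0 and (x + 1) % p != 0:   # both representatives defined
        f1 = (x * (x - 1) * inv(y)) % p
        f2 = (y * inv(x + 1)) % p
        assert f1 == f2                    # the two representatives agree
        both_defined += 1
print(f"checked {both_defined} points of X(F_{p})")
```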
over k and if k′ is a finite Galois extension of k then I_{k′}(X) is an induced Galois module (see page 110 of Serre [538]) for Gal(k′/k). It follows from Section VII.1 of [538] that the Galois cohomology group H^1(Gal(k′/k), I_{k′}(X)) is trivial and hence, by Section X.3 of [538], that H^1(G, I_k̄(X)) = 0. One can therefore deduce, as in Exercise 1.12(a) of [560], that k̄[X]^G = k[X].
To show that k̄(X)^G = k(X) let (f_0 : f_1) : X → P^1 and let σ ∈ G. Then σ(f_0) = ξ_σ f_0 + G_{0,σ} and σ(f_1) = ξ_σ f_1 + G_{1,σ} where ξ_σ ∈ k̄^* and G_{0,σ}, G_{1,σ} ∈ I_k̄(X). One shows first that ξ ∈ H^1(G, k̄^*), which is trivial by Hilbert 90, and so ξ_σ = σ(λ)/λ for some λ ∈ k̄^*. Replacing f_0 by f_0/λ and f_1 by f_1/λ gives ξ = 1 and one can proceed to showing that G_{0,σ}, G_{1,σ} give trivial classes in H^1(G, I_k̄(X)) = 0 as above. The result follows.
For a different approach see Theorem 7.8.3 and Remark 8.4.11 below, or Corollary 2 of Section VI.5 (page 178) of Lang [361].
5.5 Rational Maps and Morphisms
Definition 5.5.1. Let X be an affine or projective variety over a field k and Y an affine variety in A^n over k. Let φ_1, ..., φ_n ∈ k(X). A map φ : X → A^n has the form

φ(P) = (φ_1(P), ..., φ_n(P)).     (5.1)

Similarly, for a projective variety Y ⊆ P^n one takes φ_0, ..., φ_n ∈ k(X) and considers maps of the form

φ(P) = (φ_0(P) : ⋯ : φ_n(P)).     (5.2)

A map of the form (5.2) is regular at a point P ∈ X(k̄) if there is some function g ∈ k(X) such that all gφ_i, for 0 ≤ i ≤ n, are regular at P and, for some 0 ≤ i ≤ n, one has (gφ_i)(P) ≠ 0.⁶ A rational map φ : X → Y defined over k is a map of the form (5.2) such that, for all P ∈ X(k̄) for which φ is regular at P, one has φ(P) ∈ Y(k̄).
We stress that a rational map is not necessarily defined at every point of the domain.
In other words, it is not necessarily a function.
Exercise 5.5.2. Let X and Y be projective varieties. Show that one can write a rational map in the form φ(P) = (φ_0(P) : ⋯ : φ_n(P)) where the φ_i(x) ∈ k[x] are all homogeneous polynomials of the same degree, not all φ_i(x) ∈ I_k(X), and for every f ∈ I_k(Y) we have f(φ_0(x), ..., φ_n(x)) ∈ I_k(X).
Example 5.5.3. Let X = V(x - y) ⊆ A^2 and Y = V(x - z) ⊆ P^2. Then

φ(x, y) = (x : xy : y)

is a rational map from X to Y. Note that this formula for φ is not defined at (0, 0). However, φ is regular at (0, 0) since taking g = x^{-1} gives the equivalent form φ(x, y) = (x^{-1}x : x^{-1}xy : x^{-1}y) = (1 : y : y/x), and y/x ≡ 1 in k(X). Also note that the image of φ is not equal to Y(k̄) as it misses the point (0 : 1 : 0).
Similarly, ψ(x : y : z) = (x/y, z/y) is a rational map from Y to X. This map is not regular at (1 : 0 : 1) but it is surjective to X. The composition ψ ∘ φ maps (x, y) to (1/y, 1/x).
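The computations in Example 5.5.3 are easy to confirm numerically; a sketch over the (illustrative) field F_11:

```python
# Sanity check of Example 5.5.3 over F_p: for points (x, x) on
# X = V(x - y) with x != 0, the map phi(x, y) = (x : x*y : y) lands on
# Y = V(x - z), and the composition with psi(x : y : z) = (x/y, z/y)
# equals (1/y, 1/x).
p = 11  # illustrative prime

def inv(a):
    return pow(a % p, p - 2, p)

def phi(x, y):
    return (x % p, (x * y) % p, y % p)

def psi(X, Y, Z):
    return ((X * inv(Y)) % p, (Z * inv(Y)) % p)

for x in range(1, p):
    P = (x, x)                                  # a point of X(F_p)
    Q = phi(*P)
    assert (Q[0] - Q[2]) % p == 0               # phi(P) lies on Y = V(x - z)
    assert psi(*Q) == (inv(P[1]), inv(P[0]))    # composition is (1/y, 1/x)
print("composition verified on", p - 1, "points")
```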
Note that if X and φ are defined over k then φ^* : k̄[Y] → O(U) restricts to φ^* : k[Y] → k(X). If φ^* is injective then one can extend it to get a homomorphism of the field of fractions of k[Y] to k(X).
Definition 5.5.23. Let X and Y be varieties over k and let φ : X → Y be a dominant rational map defined over k. Define the pullback φ^* : k̄(Y) → k̄(X) by φ^*(f) = f ∘ φ.
We will now sketch a proof that φ^* is a k-algebra homomorphism. Recall that a k-algebra homomorphism of fields is a field homomorphism that is the identity map on k.

Theorem 5.5.24. Let X and Y be varieties over k and let φ : X → Y be a dominant rational map defined over k. Then the pullback φ^* : k̄(Y) → k̄(X) is an injective k-algebra homomorphism.

Proof: Without loss of generality we may assume that X and Y are affine. The rational map is therefore given by φ(x) = (φ_1(x), ..., φ_n(x)). Let U ⊆ X be an open set on which φ is regular. Then φ : U → Y is a morphism and we know φ^* : k̄[Y] → O(U) is a ring homomorphism by Lemma 5.5.20. The field of fractions of k̄[Y] is k̄(Y) and the field of fractions of O(U) is k̄(X). The natural extension of φ^* to φ^* : k̄(Y) → k̄(X) is well-defined.
It immediately follows that φ^* is a ring homomorphism and that φ^* is the identity on k. Hence, φ^* is a k-algebra homomorphism. Furthermore, φ^* is injective by Lemma 5.5.22. Finally, since φ is defined over k it restricts to an injective homomorphism from k(Y) to k(X).
Example 5.5.25. Consider the rational maps from Example 5.5.16. The map φ(x, y) = (x, x) is not dominant and does not induce a well-defined function from k(x, y) to k(x, y) since, for example, φ^*(1/(x - y)) = 1/(x - x) = 1/0.
The map φ(x, y) = (x, xy) is dominant and φ^*(f(x, y)) = f(x, xy) is a field isomorphism.

Exercise 5.5.26. Let K_1, K_2 be fields containing a field k. Let θ : K_1 → K_2 be a k-algebra homomorphism. Show that θ is injective.
Theorem 5.5.27. Let X and Y be varieties over k and let θ : k(Y) → k(X) be a k-algebra homomorphism. Then θ induces a dominant rational map φ : X → Y defined over k.

Proof: If Y is projective it suffices to construct a rational map to an affine part, say Y ∩ A^n. Hence, we assume that Y ⊆ A^n is affine and described by coordinates (y_1, ..., y_n). The homomorphism θ maps each y_i to some φ_i(x) ∈ k(X) for 1 ≤ i ≤ n. Define φ : X → A^n by

φ(P) = (φ_1(P), ..., φ_n(P)).

We now show that if P ∈ X(k̄) and if φ is regular at P then φ(P) ∈ Y(k̄). Let f ∈ I(Y). Then

f(φ(P)) = f(φ_1(P), ..., φ_n(P)) = f(θ(y_1)(P), ..., θ(y_n)(P)).

Now, θ is a k-algebra homomorphism and f is a polynomial in k[y_1, ..., y_n]. Hence

f(θ(y_1), ..., θ(y_n)) = θ(f(y_1, ..., y_n)).

Since f(y_1, ..., y_n) ∈ I(Y) it follows that f(y_1, ..., y_n) = 0 in k(Y) and so θ(f) = 0. It follows that f(φ(P)) = θ(f)(P) = 0(P) = 0 for all f ∈ I(Y) and so φ(P) ∈ Y(k̄) by part 5 of Proposition 5.1.16.
Finally, by Exercise 5.5.26, θ is injective. Also, φ^* equals θ and so φ^* is injective. Hence, Lemma 5.5.22 implies that φ is dominant.
Theorem 5.5.28. Let X and Y be varieties over k. Then X and Y are birationally equivalent over k if and only if k(X) ≅ k(Y) (isomorphic as fields).

Proof: Let φ : X → Y and ψ : Y → X be the birational equivalence. First we must deduce that φ and ψ are dominant. There are subsets U ⊆ X and V ⊆ Y such that φ is regular on U, ψ is regular on V and ψ ∘ φ is the identity on U (in other words, φ : U → V is an isomorphism). The maps φ^* : k[V] → O(U) and ψ^* : k[X] → O(V) therefore satisfy φ^*(ψ^*(f)) = f ∘ (ψ ∘ φ) = f (at least, they are equal on U ∩ φ^{-1}(V), which can be shown to be open) and so are injective. It follows from Lemma 5.5.22 that φ and ψ are dominant.
Hence, φ induces a k-algebra homomorphism φ^* : k(Y) → k(X) and ψ induces a k-algebra homomorphism ψ^* : k(X) → k(Y). Finally, ψ ∘ φ induces a k-algebra homomorphism (ψ ∘ φ)^* = φ^* ∘ ψ^* : k(X) → k(X) that is the identity (since it is the identity on a dense open set). It follows that φ^* and ψ^* are isomorphisms.
For the converse, if θ : k(Y) → k(X) is an isomorphism then we associate a dominant rational map φ : X → Y to θ and ψ : Y → X to θ^{-1}. Since θ ∘ θ^{-1} is the identity it follows that ψ ∘ φ is the identity whenever it is regular.
Some authors prefer to study function fields rather than varieties, especially in the case
of dimension 1 (there are notable classical texts that take this point of view by Chevalley
and Deuring; see Stichtenoth [585] for a more recent version). By Theorem 5.5.28 (and
other results) the study of function fields up to isomorphism is the study of varieties up to
birational equivalence. A specific set of equations to describe a variety is called a model.
Definition 5.5.29. Let X and Y be varieties over k and let φ : X → Y be a rational map over k̄ given by φ(P) = (φ_1(P), ..., φ_n(P)) if Y is affine and φ(P) = (φ_0(P) : ⋯ : φ_n(P)) if Y is projective. Let σ ∈ Gal(k̄/k). Define σ(φ) : X → Y by σ(φ)(P) = (σ(φ_1)(P), ..., σ(φ_n)(P)) if Y is affine and σ(φ)(P) = (σ(φ_0)(P) : ⋯ : σ(φ_n)(P)) if Y is projective. Many authors act by Galois on the right and so write the action as φ^σ.

Lemma 5.5.30. Let X and Y be varieties over k and let φ : X → Y be a rational map over k̄. If σ(φ) = φ for all σ ∈ Gal(k̄/k) then φ is defined over k.

Proof: If Y is affine then φ(P) = (φ_1(P), ..., φ_n(P)) where φ_i ∈ k̄(X). If σ(φ) = φ then σ(φ_i) = φ_i for all 1 ≤ i ≤ n. Remark 5.4.14 therefore implies that φ_i ∈ k(X) for all i and so φ is defined over k.
If Y is projective then φ(P) = (φ_0(P) : ⋯ : φ_n(P)) where φ_i ∈ k̄(X) for 0 ≤ i ≤ n. If φ(P) = σ(φ)(P) then, for all σ ∈ Gal(k̄/k), there is some ξ(σ) ∈ k̄^* such that
5.6 Dimension

The natural notion of dimension (a point has dimension 0, a line has dimension 1, a plane has dimension 2, etc.) generalises to algebraic varieties. There are algebraic and topological ways to define dimension. We use an algebraic approach.⁸
We stress that the notion of dimension only applies to irreducible algebraic sets. For example X = V(x, y) ∪ V(x - 1) = V(x(x - 1), y(x - 1)) ⊆ A^2 is the union of a point and a line so has components of different dimension.
Recall the notion of transcendence degree of an extension k(X) over k from Definition A.6.3.
Corollary 5.6.9. Let X and Y be affine varieties over k such that Y is a proper subset of X. Then dim(Y) < dim(X).

Proof: Since Y ≠ X we have I_k(X) ⊊ I_k(Y) and both ideals are prime since X and Y are irreducible. It follows that the Krull dimension of k[X] is at least one more than the Krull dimension of k[Y].
Exercise 5.6.10. Show that a proper closed subset of a variety of dimension 1 is finite.
5.7 Weil Restriction of Scalars
Definition 5.7.3. Let X = V(S) ⊆ A^n be an affine algebraic set over F_{q^m}. Let the basis {θ_1, ..., θ_m} be as in Lemma 5.7.1. For each polynomial f(x_1, ..., x_n) ∈ S ⊆ F_{q^m}[x_1, ..., x_n] write

f(y_{1,1}θ_1 + ⋯ + y_{1,m}θ_m, y_{2,1}θ_1 + ⋯ + y_{2,m}θ_m, ..., y_{n,1}θ_1 + ⋯ + y_{n,m}θ_m)     (5.3)

as

f_1(y_{1,1}, ..., y_{n,m})θ_1 + f_2(y_{1,1}, ..., y_{n,m})θ_2 + ⋯ + f_m(y_{1,1}, ..., y_{n,m})θ_m     (5.4)

where each f_j ∈ F_q[y_{1,1}, ..., y_{n,m}]. Define S′ ⊆ F_q[y_{1,1}, ..., y_{n,m}] to be the set of all such polynomials f_j over all f ∈ S. The Weil restriction of scalars of X with respect to F_{q^m}/F_q is the affine algebraic set Y ⊆ A^{mn} defined by

Y = V(S′).
Example 5.7.4. Let p ≡ 3 (mod 4) and define F_{p^2} = F_p(i) where i^2 = -1. Consider the algebraic set X = V(x_1 x_2 - 1) ⊆ A^2. The Weil restriction of scalars of X with respect to F_{p^2}/F_p with basis {1, i} is

Y = V(y_{1,1}y_{2,1} - y_{1,2}y_{2,2} - 1, y_{1,1}y_{2,2} + y_{1,2}y_{2,1}) ⊆ A^4.

Recall from Example 5.1.5 that X is an algebraic group. The multiplication operation mult((x_1, x_2), (x_1′, x_2′)) = (x_1 x_1′, x_2 x_2′) on X corresponds to the operation

mult((y_{1,1}, y_{1,2}, y_{2,1}, y_{2,2}), (y_{1,1}′, y_{1,2}′, y_{2,1}′, y_{2,2}′))
= (y_{1,1}y_{1,1}′ - y_{1,2}y_{1,2}′, y_{1,1}y_{1,2}′ + y_{1,2}y_{1,1}′, y_{2,1}y_{2,1}′ - y_{2,2}y_{2,2}′, y_{2,1}y_{2,2}′ + y_{2,2}y_{2,1}′)

on Y.
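A quick numerical check of this example (the prime p = 7 and the sample point are illustrative choices):

```python
# Example 5.7.4 check: multiplication in F_{p^2} = F_p(i) (p = 3 mod 4)
# matches the coordinatewise "mult" formula on the Weil restriction Y,
# applied separately to each pair (y_{j,1}, y_{j,2}).
p = 7   # 7 = 3 mod 4, so x^2 + 1 is irreducible and F_49 = F_7(i)

def fp2_mul(a, b):
    """(a1 + a2*i)(b1 + b2*i) with i^2 = -1, coordinates mod p."""
    a1, a2 = a
    b1, b2 = b
    return ((a1 * b1 - a2 * b2) % p, (a1 * b2 + a2 * b1) % p)

def mult(u, v):
    """The induced operation on Y from Example 5.7.4."""
    y11, y12, y21, y22 = u
    z11, z12, z21, z22 = v
    return fp2_mul((y11, y12), (z11, z12)) + fp2_mul((y21, y22), (z21, z22))

def on_Y(P):
    """Defining equations of Y: y11*y21 - y12*y22 = 1, y11*y22 + y12*y21 = 0."""
    y11, y12, y21, y22 = P
    return ((y11 * y21 - y12 * y22 - 1) % p == 0 and
            (y11 * y22 + y12 * y21) % p == 0)

# A point of Y: x1 = 2 + i and x2 = 1/x1 = 6 + 4i in F_49.
x1, x2 = (2, 1), (6, 4)
assert fp2_mul(x1, x2) == (1, 0)   # indeed x1 * x2 = 1
P = x1 + x2
assert on_Y(P)
assert on_Y(mult(P, P))            # Y is closed under the group operation
print("Weil restriction multiplication verified")
```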
Exercise 5.7.5. Let p ≡ 3 (mod 4). Write down the Weil restriction of scalars of X = V(x^2 - 2i) ⊆ A^1 with respect to F_{p^2}/F_p.

Exercise 5.7.6. Let p ≡ 3 (mod 4). Write down the Weil restriction of scalars of V(x_1^2 + x_2^2 - (1 + 2i)) ⊆ A^2 with respect to F_{p^2}/F_p.
Theorem 5.7.7. Let X ⊆ A^n be an affine algebraic set over F_{q^m}. Let Y ⊆ A^{mn} be the Weil restriction of X. Let k ∈ N be coprime to m. Then there is a bijection between X(F_{q^{mk}}) and Y(F_{q^k}).

Proof: When gcd(k, m) = 1 it is easily checked that the map of Lemma 5.7.1 gives a one-to-one correspondence between A^{nm}(F_{q^k}) and A^n(F_{q^{mk}}).
Now, let P = (x_1, ..., x_n) ∈ X and write Q = (y_{1,1}, ..., y_{n,m}) for the corresponding point in A^{mn}. For any f ∈ S we have f(P) = 0. Writing f_1, ..., f_m for the polynomials in equation (5.4) we have

f_1(Q)θ_1 + f_2(Q)θ_2 + ⋯ + f_m(Q)θ_m = 0.

Since {θ_1, ..., θ_m} is also a vector space basis for F_{q^{mk}} over F_{q^k} we have

f_1(Q) = f_2(Q) = ⋯ = f_m(Q) = 0.

Hence f_j(Q) = 0 for all f_j ∈ S′ and so Q ∈ Y. Similarly, if Q ∈ Y then f_j(Q) = 0 for all such f_j and so f(P) = 0 for all f ∈ S.
Note that, as the following example indicates, when k is not coprime to m then X(F_{q^{mk}}) is not usually in one-to-one correspondence with Y(F_{q^k}).
Exercise 5.7.8. Consider the algebraic set X from Exercise 5.7.5. Show that X(F_{p^4}) = {1 + i, -1 - i}. Let Y be the Weil restriction of X with respect to F_{p^2}/F_p. Show that Y(F_{p^2}) = {(1, 1), (-1, -1), (i, -i), (-i, i)}.
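The second claim of the exercise can be confirmed by brute force in the smallest case p = 3 (an illustrative choice):

```python
from itertools import product

# Exercise 5.7.8 check for p = 3: X = V(x^2 - 2i) has Weil restriction
# Y = V(y1^2 - y2^2, 2*y1*y2 - 2) with respect to F_{p^2}/F_p and basis
# {1, i}.  We enumerate Y(F_{p^2}), representing F_9 = F_3(i) as pairs
# (a, b) = a + b*i with i^2 = -1.
p = 3
F9 = list(product(range(p), repeat=2))

def mul(u, v):
    return ((u[0] * v[0] - u[1] * v[1]) % p, (u[0] * v[1] + u[1] * v[0]) % p)

def sub(u, v):
    return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)

def scale(c, u):
    return ((c * u[0]) % p, (c * u[1]) % p)

zero, two = (0, 0), (2 % p, 0)

points = [(y1, y2) for y1, y2 in product(F9, repeat=2)
          if sub(mul(y1, y1), mul(y2, y2)) == zero      # y1^2 - y2^2 = 0
          and sub(scale(2, mul(y1, y2)), two) == zero]  # 2*y1*y2 - 2 = 0
assert len(points) == 4   # the four points listed in the exercise
print(points)
```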
Note that the Weil restriction of P^n with respect to F_{q^m}/F_q is not the projective closure of A^{mn}. For example, considering the case n = 1, P^1 has one point not contained in A^1, whereas the projective closure of A^m has an (m - 1)-dimensional algebraic set of points at infinity.
Exercise 5.7.9. Recall from Exercise 5.5.14 that there is a morphism from P^1 to Y = V(x^2 + y^2 - 1) ⊆ A^2. Determine the Weil restriction of scalars of Y with respect to F_{p^2}/F_p. It makes sense to call this algebraic set the Weil restriction of P^1 with respect to F_{p^2}/F_p.
Chapter 6

Tori, LUC and XTR

6.1

x^n - 1 = ∏_{d | n, 1 ≤ d ≤ n} Φ_d(x).
Φ_n(x) = ∏_{d | n} (x^{n/d} - 1)^{μ(d)}
       = ∏_{d | n} ∏_{j=1}^{n/d} (x - z_{n/d}^j)^{μ(d)}
       = ∏_{d | n} ∏_{j=1}^{n/d} (x - z_n^{dj})^{μ(d)}
       = ∏_{i=1}^{n} (x - z_n^i)^{Σ_{d | gcd(n,i)} μ(d)}.

Since Σ_{d|n} μ(d) is 0 when n > 1 and is 1 when n = 1 (Theorem 4.7 of [465]) the result follows.
Exercise 6.1.3. Show that Φ_1(x) = x - 1, Φ_2(x) = x + 1, Φ_6(x) = x^2 - x + 1 and Φ_l(x) = x^{l-1} + x^{l-2} + ⋯ + x + 1 if l is prime.

Exercise 6.1.4. Prove that if p | n then Φ_{pn}(x) = Φ_n(x^p) and that if p ∤ n then Φ_{pn}(x) = Φ_n(x^p)/Φ_n(x). Prove that if n > 1 is odd then Φ_{2n}(x) = Φ_n(-x).
[Hint: Use part 5 of Lemma 6.1.2.]
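The formula Φ_n(x) = ∏_{d|n} (x^{n/d} - 1)^{μ(d)} proved above can be implemented directly with exact rational arithmetic; a sketch (the helper names are our own):

```python
from fractions import Fraction

# Compute Phi_n(x) from Phi_n(x) = prod_{d | n} (x^{n/d} - 1)^{mu(d)}.
# Polynomials are coefficient lists, lowest degree first.

def mobius(n):
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0        # square factor
            result = -result
        d += 1
    return -result if n > 1 else result

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_div(a, b):
    """Exact division a / b (the remainder is known to be zero here)."""
    a = a[:]
    q = [Fraction(0)] * (len(a) - len(b) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = a[i + len(b) - 1] / b[-1]
        for j, bj in enumerate(b):
            a[i + j] -= q[i] * bj
    return q

def x_pow_minus_1(m):
    return [Fraction(-1)] + [Fraction(0)] * (m - 1) + [Fraction(1)]

def cyclotomic(n):
    num, den = [Fraction(1)], [Fraction(1)]
    for d in range(1, n + 1):
        if n % d == 0:
            mu = mobius(d)
            if mu == 1:
                num = poly_mul(num, x_pow_minus_1(n // d))
            elif mu == -1:
                den = poly_mul(den, x_pow_minus_1(n // d))
    return [int(c) for c in poly_div(num, den)]

assert cyclotomic(1) == [-1, 1]           # x - 1
assert cyclotomic(2) == [1, 1]            # x + 1
assert cyclotomic(6) == [1, -1, 1]        # x^2 - x + 1
assert cyclotomic(5) == [1, 1, 1, 1, 1]   # x^4 + x^3 + x^2 + x + 1
print("cyclotomic polynomials verified")
```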
It is well-known that Φ_n(x) is irreducible over Q; we do not need this result so we omit the proof.
Lemma 6.1.5. Let n ∈ N. The greatest common divisor of the polynomials (x^n - 1)/(x^d - 1) over all 1 ≤ d < n such that d | n is Φ_n(x).

Proof: Define I = {d ∈ N : 1 ≤ d < n, d | n}. By part 3 of Lemma 6.1.2 we have Φ_n(x) = (x^n - 1)/f(x) where f(x) = ∏_{d∈I} Φ_d(x) = lcm(x^d - 1 : d ∈ I). Hence

Φ_n(x) = (x^n - 1)/lcm(x^d - 1 : d ∈ I) = gcd((x^n - 1)/(x^d - 1) : d ∈ I).

¹One can find more elementary proofs of this fact in any book on polynomials.
Definition 6.1.6. Let n ∈ N and q a prime power. Define the cyclotomic subgroup G_{q,n} to be the subgroup of F_{q^n}^* of order Φ_n(q).
The subgroups G_{q,n} are of interest as most elements of G_{q,n} do not lie in any subfield of F_{q^n} (see Corollary 6.2.3 below). In other words, G_{q,n} is the hardest part of F_{q^n}^* from the point of view of the DLP. Note that G_{q,n} is trivially an algebraic group, by virtue of being a subgroup of the algebraic group F_{q^n}^* = G_m(F_{q^n}) (see Example 5.1.5). The goal of this subject area is to develop compact representations for the groups G_{q,n} and efficient methods to compute with them.
The two most important cases are G_{q,2}, which is the subgroup of F_{q^2}^* of order q + 1, and G_{q,6}, which is the subgroup of F_{q^6}^* of order q^2 - q + 1. We give compact representations of these groups in Sections 6.3 and 6.4.
6.2 Algebraic Tori

Algebraic tori are a classical object in algebraic geometry and their relevance to cryptography was first explained by Rubin and Silverberg [500]. An excellent survey of this area is [501].
Recall from Theorem 5.7.7 that the Weil restriction of scalars of A^1 with respect to F_{q^n}/F_q is A^n. Let n > 1 and let f : A^n(F_q) → F_{q^n} be a bijective F_q-linear function (e.g., corresponding to the fact that F_{q^n} is a vector space of dimension n over F_q). For any d | n define the norm N_{F_{q^n}/F_{q^d}}(g) = ∏_{i=0}^{n/d-1} g^{q^{di}}. The equation N_{F_{q^n}/F_{q^d}}(f(x_1, ..., x_n)) = 1 defines an algebraic set in A^n.

Definition 6.2.1. The algebraic torus² T_n is the algebraic set

V({N_{F_{q^n}/F_{q^d}}(f(x_1, ..., x_n)) - 1 : 1 ≤ d < n, d | n}) ⊆ A^n.
Note that there is a group operation on T_n(F_q), given by polynomials, inherited from multiplication in F_{q^n}^*. Hence (at least, ignoring for the moment the inverse map) T_n(F_q) satisfies our informal definition of an algebraic group.

Lemma 6.2.2. Let the notation be as above.

1. G_{q,n} = {g ∈ F_{q^n}^* : N_{F_{q^n}/F_{q^d}}(g) = 1 for all 1 ≤ d < n such that d | n}.
2. T_n(F_q) is isomorphic as a group to G_{q,n}.
3. #T_n(F_q) = Φ_n(q).

Proof: For the first statement note that

∏_{i=0}^{n/d-1} g^{q^{di}} = g^{(q^n - 1)/(q^d - 1)}
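The identity used in this proof, together with the order count of the lemma, can be checked in the smallest interesting case; a sketch assuming q = 3 and F_9 = F_3(i):

```python
from itertools import product

# Check of Lemma 6.2.2 / Definition 6.1.6 for q = 3, n = 2: the elements
# of F_{q^2}* with norm g^{q+1} = 1 form a subgroup of order
# Phi_2(q) = q + 1.  F_9 = F_3(i) with i^2 = -1, elements as pairs (a, b).
q = 3

def mul(u, v):
    return ((u[0] * v[0] - u[1] * v[1]) % q, (u[0] * v[1] + u[1] * v[0]) % q)

def power(g, e):
    result = (1, 0)
    for _ in range(e):
        result = mul(result, g)
    return result

units = [g for g in product(range(q), repeat=2) if g != (0, 0)]   # F_9*
G = [g for g in units if power(g, q + 1) == (1, 0)]   # norm-1 elements
assert len(G) == q + 1        # Phi_2(q) = q + 1
print("G_{3,2} =", G)
```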
Proof: Suppose g ∈ F_{q^d}^* for some 1 ≤ d < n such that d | n. Then 1 = N_{F_{q^n}/F_{q^d}}(g) =
6.3

Let F_{q^2} = F_q(θ) where

θ^2 + Aθ + B = 0     (6.1)

for some A, B ∈ F_q. For g = u + vθ ∈ F_{q^2} one has

g^{q+1} = N_{F_{q^2}/F_q}(g) = u^2 - Auv + Bv^2.     (6.2)

The group G_{q,2} is defined to be the set of elements g ∈ F_{q^2} such that g^{q+1} = 1. Equivalently, this is the set of u + vθ such that u^2 - Auv + Bv^2 = 1.
Exercise 6.3.1. Show that if g = u + vθ ∈ G_{q,2} then g^{-1} = g^q = u + vθ^q = (u - Av) + (-v)θ. Hence, inversion in G_{q,2} is cheaper than a general group operation (especially if A = 0 or A is small).
Exercise 6.3.2. Suppose q is not a power of 2. Suppose F_{q^2} = F_q(θ) where θ^2 + Aθ + B = 0 and multiplying an element of F_q by A or B has negligible cost (e.g., A = 0 and B = 1). Show that one can compute a product (respectively: squaring; inversion) in F_{q^2}
6.3.1
The Torus T2
Recall that G_{q,2} can be represented as the F_q-points of the algebraic torus T_2 = V(N_{F_{q^2}/F_q}(f(x, y))
− 1) ⊆ A^2, where f : A^2(F_q) → F_{q^2}. By equation (6.2), an affine equation for T_2 is
V(x^2 − Axy + By^2 − 1). Being a conic with a rational point, it is immediate from general
results in geometry (see Exercise 5.5.14 for a special case) that T_2 is birational with A^1.
The next two results give a more algebraic way to show that T2 is rational. Rather
than directly constructing a birational map from T2 to A1 we go via Gq,2 . Lemma 6.3.4
provides a map from A1 (Fq ) to Gq,2 while Lemma 6.3.6 provides a map from Gq,2 to
A1 (Fq ).
Lemma 6.3.4. The set G_{q,2} ⊆ F_{q^2}^* is equal to the set

    {(a + θ)/(a + θ̄) : a ∈ F_q} ∪ {1}.

Proof: Clearly, every element g = (a + θ)/(a + θ̄) satisfies g·g^q = 1. It is also easy to check
that (a + θ)/(a + θ̄) = (a′ + θ)/(a′ + θ̄) implies a = a′. Hence we have obtained q distinct
elements of G_{q,2}. The missing element is evidently 1 and the result follows.
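The parameterisation in Lemma 6.3.4 can also be verified numerically, using the explicit T_2 coordinates of the element (a + θ)/(a + θ̄) given in Lemma 6.3.10 below. A small sketch (the names and parameters are our own choices; x^2 + 3x + 3 is irreducible mod 11):

```python
def decomp(a, p, A, B):
    # (a + theta)/(a + theta^q) = ((a^2 - B) + (2a - A)*theta)/(a^2 - a*A + B);
    # the denominator is the norm of a + theta, hence nonzero for every a
    d = pow((a * a - a * A + B) % p, -1, p)
    return ((a * a - B) * d % p, (2 * a - A) * d % p)

def check_parameterisation(p, A, B):
    # Lemma 6.3.4: the image of a -> (a + theta)/(a + theta^q), together
    # with 1, is exactly the norm-1 group G_{p,2}.
    image = {decomp(a, p, A, B) for a in range(p)} | {(1, 0)}
    group = {(u, v) for u in range(p) for v in range(p)
             if (u * u - A * u * v + B * v * v) % p == 1}
    return image == group

print(check_parameterisation(11, 3, 3))  # True
```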
Exercise 6.3.5. Determine the value for a such that (a + θ)/(a + θ̄) = −1.
Lemma 6.3.6. Let g = u + vθ ∈ G_{q,2}, g ≠ ±1. Then u + vθ = (a + θ)/(a + θ̄) for the
unique value a = (u + 1)/v.
Proof: The value a must satisfy

    a + θ = (u + vθ)(a + θ̄) = ua + uθ̄ + avθ + vθθ̄ = (ua − Au + Bv) + (av − u)θ.

Equating coefficients of θ gives av = u + 1 and the result follows as long as v ≠ 0 (i.e.,
g ≠ ±1).
The above results motivate the following definition.
Definition 6.3.7. The T_2 decompression map is the function decomp_2 : A^1 → G_{q,2}
given by decomp_2(a) = (a + θ)/(a + θ̄).
The T_2 compression map is the function comp_2 : G_{q,2} − {1, −1} → A^1 given by
comp_2(u + vθ) = (u + 1)/v.
Lemma 6.3.8. The maps comp_2 and decomp_2 are injective. The compression map is
not defined at ±1. If g ∈ G_{q,2} − {1, −1} then decomp_2(comp_2(g)) = g.
Exercise 6.3.9. Prove Lemma 6.3.8.
Alert readers will notice that the maps comp_2 and decomp_2 are between G_{q,2} and
A^1, rather than between T_2 and A^1. For completeness we now give a map from G_{q,2} to
T_2 ⊆ A^2. From this one can deduce birational maps between T_2 and A^1, which prove
that T_2 is indeed rational.
Lemma 6.3.10. An element of the form (a + θ)/(a + θ̄) ∈ G_{q,2} corresponds to the element

    ( (a^2 − B)/(a^2 − aA + B) , (2a − A)/(a^2 − aA + B) )

of T_2.
For inversion note that

    decomp_2(a)^{−1} = (a + θ̄)/(a + θ) = (a + (−A − θ))/(a + θ).
6.3.2
Lucas Sequences
Lucas sequences4 can be used for efficient computation in quadratic fields.3 We give the
details for G_{q,2} ⊆ F_{q^2}^*. The name LUC cryptosystem is applied to any cryptosystem using
Lucas sequences to represent elements in an algebraic group quotient of G_{q,2}. Recall the
trace Tr_{F_{q^2}/F_q}(g) = g + g^q for g ∈ F_{q^2}.
3 This is analogous to using projective coordinates for efficient elliptic curve arithmetic; see Exercise 9.1.5.
4 They are named after Édouard Lucas (1842–1891), who apparently died due to a freak accident
involving broken crockery. Lucas sequences were used for primality testing and factorisation before their
cryptographic application was recognised.
Definition 6.3.13. Let g ∈ F_{q^2}^* satisfy g^{q+1} = 1. For i ∈ Z define V_i = Tr_{F_{q^2}/F_q}(g^i).
Lemma 6.3.14. Let g = v_1 + w_1 θ with v_1, w_1 ∈ F_q and θ as in equation (6.1). Suppose
g^{q+1} = 1 and let V_i be as in Definition 6.3.13. Then, for i, j ∈ Z,
1. V_0 = 2 and V_1 = Tr_{F_{q^2}/F_q}(g) = 2v_1 − Aw_1.
2. V_{−i} = V_i.
3. V_{i+1} = V_1 V_i − V_{i−1}.
4. V_{2i} = V_i^2 − 2.
5. V_{2i−1} = V_i V_{i−1} − V_1.
6. V_{2i+1} = V_i V_{i+1} − V_1.
7. V_{2i+1} = V_1 V_i^2 − V_i V_{i−1} − V_1.
8. V_{i+j} = V_i V_j − V_{i−j}.
Proof: Let ḡ = g^q = v_1 + w_1 θ̄. Then Tr_{F_{q^2}/F_q}(g) = g + ḡ = (v_1 + w_1 θ) + (v_1 + w_1(−A − θ)) =
2v_1 − Aw_1. Similarly, g^0 = 1 and the first statement is proven. The second statement
follows from g^{−1} = ḡ. Statements 3 to 6 are all special cases of statement 8, which follows
from g ḡ = g^{q+1} = 1 and the equation

    V_{i+j} = g^{i+j} + ḡ^{i+j} = (g^i + ḡ^i)(g^j + ḡ^j) − g^j ḡ^j (g^{i−j} + ḡ^{i−j}).

(An alternative proof of statement 3 is to use the fact that g satisfies g^2 = V_1 g − 1.)
Statement 7 then follows from 3 and 6.
Exercise 6.3.15. Define U_i = (g^i − ḡ^i)/(g − ḡ). Prove that U_{i+1} = Tr_{F_{q^2}/F_q}(g) U_i − U_{i−1},
U_{2i} = V_i U_i, and U_{i+j} = U_i U_{j+1} − U_{i−1} U_j.
Definition 6.3.16. Denote by G_{q,2}/⟨σ⟩ the set of equivalence classes of G_{q,2} under the
equivalence relation g ≡ σ(g) = g^q = g^{−1}. Denote the class of g ∈ G_{q,2} by [g] = {g, g^q}.
The main observation is that Tr_{F_{q^2}/F_q}(g) = Tr_{F_{q^2}/F_q}(g^q) and so a class [g] can be
identified with the value V = Tr_{F_{q^2}/F_q}(g). This motivates Definition 6.3.18. When q is
odd, the classes [1] and [−1] correspond to V = 2 and V = −2 respectively; apart from
these cases, the other possible values for V are those for which the polynomial x^2 − Vx + 1
is irreducible over F_q.
Exercise 6.3.17. Prove that if Tr_{F_{q^2}/F_q}(g) = Tr_{F_{q^2}/F_q}(g′) for g, g′ ∈ G_{q,2} then g′ ∈
{g, g^q}. Hence, show that when q is odd there are 2 + (q − 1)/2 values for Tr_{F_{q^2}/F_q}(g)
over g ∈ G_{q,2}.
The set G_{q,2}/⟨σ⟩ is not a group; however, for a class [g] ∈ G_{q,2}/⟨σ⟩ and n ∈ N one can
define [g]^n to be [g^n].
Definition 6.3.18. Let 𝒢_{q,2} = {Tr_{F_{q^2}/F_q}(g) : g ∈ G_{q,2}}. For V ∈ 𝒢_{q,2} and n ∈ N define
[n]V = Tr_{F_{q^2}/F_q}(g^n) for any g ∈ G_{q,2} such that V = Tr_{F_{q^2}/F_q}(g).
It follows that we may treat the set 𝒢_{q,2} as an algebraic group quotient. One method to
efficiently compute [n]V for n ∈ N is to take a root g ∈ F_{q^2} of x^2 − Vx + 1 = 0, compute g^n
in F_{q^2} using the square-and-multiply method, and then compute Tr_{F_{q^2}/F_q}(g^n). However,
we want to be able to compute [n]V directly using an analogue of the square-and-multiply
method.5 Lemma 6.3.14 shows that, although V_{2n} is determined by V_n and n, V_{n+1} is
not determined by V_n alone. Hence it is necessary to develop an algorithm that works on
a pair (V_n, V_{n−1}) of consecutive values. Such algorithms are known as ladder methods.
One starts the ladder computation with (V_1, V_0) = (V, 2).
Lemma 6.3.19. Given (V_i, V_{i−1}) and V one can compute (V_{2i}, V_{2i−1}) (i.e., squaring)
or (V_{2i+1}, V_{2i}) (i.e., square-and-multiply) in one multiplication, one squaring and two
or three additions in F_q.
Proof: One must compute V_i^2 and V_i V_{i−1} and then apply part 4 and either part 5 or 7
of Lemma 6.3.14.
Exercise 6.3.20. Write the ladder algorithm for computing [n]V using Lucas sequences
in detail.
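A minimal sketch of such a ladder (one possible answer to Exercise 6.3.20; the function name and the bit-scanning strategy are our own choices). Scanning the bits of n from the most significant end, it maintains (V_i, V_{i−1}) for the growing prefix i, using the identities V_{2i} = V_i^2 − 2, V_{2i−1} = V_i V_{i−1} − V_1 and V_{2i+1} = V_1 V_i^2 − V_i V_{i−1} − V_1:

```python
def lucas_ladder(V, n, q):
    # Compute [n]V = Tr(g^n) from V = Tr(g) alone, working modulo the prime q.
    hi, lo = V, 2                    # (V_1, V_0) = (V, 2)
    for bit in bin(n)[3:]:           # bits of n after the leading 1
        if bit == '0':
            # (V_i, V_{i-1}) -> (V_{2i}, V_{2i-1})
            hi, lo = (hi * hi - 2) % q, (hi * lo - V) % q
        else:
            # (V_i, V_{i-1}) -> (V_{2i+1}, V_{2i})
            hi, lo = (V * hi * hi - hi * lo - V) % q, (hi * hi - 2) % q
    return hi

# Example 6.3.21 below: q = 37, V = 7, n = 6.
print(lucas_ladder(7, 6, 37))  # 8
```

For n = 6 the intermediate pairs are (7, 2) → (26, 10) → (8, 31), matching the addition chain of Example 6.3.21.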
The storage requirement of the ladder algorithm is the same as when working in Fq2 ,
although the output value is compressed to a single element of Fq . Note however that
computing a squaring alone in Fq2 already requires more computation (at least when q is
not a power of 2) than Lemma 6.3.19.
We have shown that for V ∈ 𝒢_{q,2} one can compute [n]V using polynomial operations
starting with the pair (V, 2). Since 𝒢_{q,2} is in one-to-one correspondence with G_{q,2}/⟨σ⟩, it
is natural to consider 𝒢_{q,2} as being an algebraic group quotient.
Performing discrete logarithm based cryptography in 𝒢_{q,2} is sometimes called the
LUC cryptosystem.6 To solve the discrete logarithm problem in 𝒢_{q,2} one usually lifts
the problem to the covering group G_{q,2} ⊆ F_{q^2}^* by taking one of the roots in F_{q^2} of the
polynomial x^2 − Vx + 1.
Example 6.3.21. Define F_{37^2} = F_37(θ) where θ^2 − 3θ + 1 = 0. The element g = −1 + 3θ
has order 19 and lies in G_{37,2}. Write V = Tr_{F_{37^2}/F_37}(g) = 7. To compute [6]V one uses
the addition chain (V_1, V_0) = (7, 2) → (V_3, V_2) = (26, 10) → (V_6, V_5) = (8, 31); this is
because 6 = (110)_2 in binary so the intermediate values for i are (1)_2 = 1 and (11)_2 = 3.
Exercise 6.3.22. Using the same values as Example 6.3.21 compute [10]V.
Exercise 6.3.23. Compare the number of F_q multiplications and squarings to compute
a squaring or a square-and-multiply in the quotient 𝒢_{q,2} using Lucas sequences
with the cost for general arithmetic in G_{q,2} ⊆ F_{q^2}^*.
6.4
The Group G_{q,6}

The group G_{q,6} is the subgroup of F_{q^6}^* of order Φ_6(q) = q^2 − q + 1. The natural representation of elements of G_{q,6} requires 6 elements of F_q.
Assume (without loss of generality) that F_{q^6} = F_{q^3}(θ) where θ ∈ F_{q^2} and θ^2 + Aθ + B =
0 for some A, B ∈ F_q.
5 In practice it is often more efficient to use other processes instead of the traditional square-and-multiply method. We refer to Chapter 3 of [575] for more details.
6 The original LUC cryptosystem due to Smith and Lennon [570] used Lucas sequences modulo
a composite integer N; we refer to Section 6.6 for further discussion. The finite field version is only very
briefly mentioned in [570], but is further developed in [571].
6.4.1
The Torus T6
Recall that T_6 is a two-dimensional algebraic set in A^6 defined by the intersection of the
kernels of the norm maps N_{F_{q^6}/F_{q^3}} and N_{F_{q^6}/F_{q^2}}. It is known that T_6 is rational, so the
goal is to represent elements of G_{q,6} using only two elements of F_q.
The kernel of the norm map N_{F_{q^6}/F_{q^3}} is identified with T_2(F_{q^3}) ⊆ A^2(F_{q^3}). As in
Section 6.3.1, T_2 is birational to A^1(F_{q^3}) (which can then be identified with A^3(F_q)) via
the map decomp_2(a) = (a + θ)/(a + θ̄) where F_{q^6} = F_{q^3}(θ). The next step is to compute
the kernel of the norm map with respect to F_{q^6}/F_{q^2}.
Lemma 6.4.1. The Weil restriction of the kernel of NFq6 /Fq2 on T2 (Fq3 ) is birational
with a quadratic hypersurface U in A3 (Fq ).
Proof: First, we represent an element of T_2(F_{q^3}) − {1} as a single value a ∈ F_{q^3}. Now
impose the norm equation on the image of decomp_2(a):

    N_{F_{q^6}/F_{q^2}}(decomp_2(a)) = ((a + θ)/(a + θ̄)) · ((a^{q^2} + θ)/(a^{q^2} + θ̄)) · ((a^{q^4} + θ)/(a^{q^4} + θ̄)),

using that θ ∈ F_{q^2} is fixed by the q^2-power Frobenius.
To solve N_{F_{q^6}/F_{q^2}}(decomp_2(a)) = 1 one clears the denominator and equates coefficients
of θ, giving

    a^{1+q^2+q^4} + (a^{1+q^2} + a^{1+q^4} + a^{q^2+q^4})θ + (a + a^{q^2} + a^{q^4})θ^2 + θ^3
      = a^{1+q^2+q^4} + (a^{1+q^2} + a^{1+q^4} + a^{q^2+q^4})θ̄ + (a + a^{q^2} + a^{q^4})θ̄^2 + θ̄^3.

The crucial observations are that the cubic terms in a cancel and that θ^2 − θ̄^2 = −A(θ − θ̄)
and θ^3 − θ̄^3 = (A^2 − B)(θ − θ̄). Hence we obtain a single equation in a.
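The two identities for θ used in this step depend only on θ + θ̄ = −A and θθ̄ = B, so they can be sanity-checked with exact arithmetic in F_p(θ) for any small parameters (a throwaway check; all names are our own):

```python
def check_identities(p, A, B):
    # Elements of F_p(theta) as pairs (u, v) = u + v*theta,
    # reduced with theta^2 = -A*theta - B; theta_bar = -A - theta.
    def mul(x, y):
        (u1, v1), (u2, v2) = x, y
        w = v1 * v2
        return ((u1 * u2 - B * w) % p, (u1 * v2 + u2 * v1 - A * w) % p)

    def sub(x, y):
        return ((x[0] - y[0]) % p, (x[1] - y[1]) % p)

    def scale(c, x):
        return (c * x[0] % p, c * x[1] % p)

    theta, theta_bar = (0, 1), (-A % p, p - 1)
    d = sub(theta, theta_bar)

    # theta^2 - theta_bar^2 = -A * (theta - theta_bar)
    ok2 = sub(mul(theta, theta), mul(theta_bar, theta_bar)) == scale(-A % p, d)
    # theta^3 - theta_bar^3 = (A^2 - B) * (theta - theta_bar)
    ok3 = sub(mul(theta, mul(theta, theta)),
              mul(theta_bar, mul(theta_bar, theta_bar))) == scale((A * A - B) % p, d)
    return ok2 and ok3

print(check_identities(11, 3, 3))  # True
```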
Now, we identify a ∈ A^1(F_{q^3}) with a 3-tuple (a_0, a_1, a_2) ∈ A^3(F_q). Using the fact that
a ↦ a^q corresponds to an F_q-linear map on A^3(F_q), it follows that the single equation
given above is actually a quadratic polynomial in (a_0, a_1, a_2). In other words, the values
(a_0, a_1, a_2) corresponding to solutions of the norm equation are points on a quadratic
hypersurface in A^3(F_q), which we call U.
The general theory (see Rubin and Silverberg [501]) implies that U is irreducible, but
we do not prove this. It remains to give a rational parameterisation p_U : U → A^2 of the
hypersurface. This is done using essentially the same method as Example 5.5.14.
Lemma 6.4.2. An irreducible quadratic hypersurface U ⊆ A^3 over a field k is birational
over k to A^2.
Proof: (Sketch) Let P = (x_P, y_P, z_P) be a point on U and change variables so that the
tangent plane T to U at P is x = x_P. We have not discussed T in this book; the only
property we need is that T contains every line through P that is not contained in U and
that intersects U at P with multiplicity 2.
Let Q ∈ U(k) be such that Q ≠ P and such that the line between P and Q is not
contained in U (this is generically the case for an irreducible quadratic hypersurface).
Then the line between P and Q does not lie in T and so is given by an equation of the
form7

    (x, y, z) = P + t(1, a, b)                                          (6.3)

for some a, b ∈ k (in other words, the equations x = x_P + t, y = y_P + at, etc). Such a
line hits U at precisely one point Q′ ∈ U(k) with Q′ ≠ P. Writing U = V(F(x, y, z)) it
7 Here, and below, P + Q denotes the usual coordinate-wise addition of 3-tuples over a field.
follows that F(x_P + t, y_P + at, z_P + bt) = 0 has the form t(h(a, b)t − g(a, b)) = 0 for some
quadratic polynomial h(a, b) ∈ k[a, b] and some linear polynomial g(a, b) ∈ k[a, b]. Hence
we have a rational map A^2 → U given by

    (a, b) ↦ P + (g(a, b)/h(a, b))·(1, a, b).
    ( 1/(1 − a^2 + ab − b^2), a/(1 − a^2 + ab − b^2), b/(1 − a^2 + ab − b^2) )
Finally, the map decomp_6 from A^2 to G_{q,6} is (f(g(a, b)) + θ)/(f(g(a, b)) + θ̄). It is then
straightforward to compute comp_6.
Exercise 6.4.5. Let q be a prime power and ζ_9 a primitive 9-th root of unity in F̄_q.
Show that F_q(ζ_9) = F_{q^6} if and only if q ≡ 2, 5 (mod 9).
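Since [F_q(ζ_9) : F_q] equals the multiplicative order of q modulo 9, the exercise reduces to a computation that is easy to check by brute force (a small sketch with a name of our own choosing):

```python
def order_mod_9(q):
    # multiplicative order of q modulo 9 (q must be coprime to 9)
    k, t = 1, q % 9
    while t != 1:
        t = (t * q) % 9
        k += 1
    return k

# the residues mod 9 with multiplicative order exactly 6
print(sorted(r for r in range(1, 9)
             if r % 3 != 0 and order_mod_9(r) == 6))  # [2, 5]
```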
In principle one can write down the partial group operations on A^2 induced from
G_{q,6}, but this is not an efficient way to compute. Instead, to compute comp_6(g^n) from
comp_6(g) one decompresses to obtain an element g ∈ G_{q,6} (or G_{q^3,2}), computes g^n, and
then compresses again.
6.4.2
XTR

Since g ∈ G_{q,6} satisfies g^{q^2−q+1} = 1, as exponents of g one has q^2 ≡ q − 1, q^3 ≡ −1 and
q^4 ≡ −q. Hence

    s = g^{1+q^2} + g^{1+q^4} + g^{q^2+q^4} = g^q + g^{1−q} + g^{−1}.

… of order dividing (q^3 + 1)/(q + 1) = q^2 − q + 1. Show that … and g^{q^3+1} = 1. Let t = Tr_{F_{q^6}/F_{q^2}}(g) and …
6.5
Further Remarks
Granger and Vercauteren [265] have proposed an index calculus algorithm for T_n(F_{p^m})
where m > 1. Kohel [348] has shown that one might map the discrete logarithm problem
in an algebraic torus T_n(F_q) to the discrete logarithm problem in the generalised Jacobian
(which is a certain type of divisor class group) of a singular hyperelliptic curve over F_q.
This latter problem might be attacked using an index calculus method such as Gaudry's
algorithm (see Section 15.6.3). It seems this approach will not be faster than performing
index calculus methods in F_{p^n}, but further investigation would be of interest.
6.6
Algebraic Tori over Rings

Applications in factoring and primality testing motivate the study of tori over Z/NZ.
As mentioned in Section 4.4, the simplest approach is to restrict to N being square-free
and to use the Chinese remainder theorem to define the groups. First we explain how to
construct rings isomorphic to the direct product of finite fields.
Example 6.6.1. Let N = ∏_{i=1}^{k} p_i be square-free. Let F(x) = x^2 + Ax + B ∈ Z[x] be a
quadratic polynomial such that F(x) is irreducible modulo p_i for all 1 ≤ i ≤ k. Define
R = (Z/NZ)[x]/(F(x)). By the Chinese remainder theorem, R ≅ ∏_{i=1}^{k} F_{p_i^2}. We will usually
write θ for the image of x in R and θ̄ = −A − x = Bx^{−1}.
Define G_{N,2} to be the subgroup of R^* of order ∏_{i=1}^{k}(p_i + 1) isomorphic to the direct
sum of the groups G_{p_i,2}. Note that G_{N,2} is not usually cyclic.
We would like to represent a general element of G_{N,2} using a single element of Z/NZ.
In other words, we would like to have a map from Z/NZ to G_{N,2}. One can immediately
apply Definition 6.3.7 to obtain the map a ↦ (a + θ)/(a + θ̄). Since the reduction modulo
p_i of this map correctly maps to G_{p_i,2}, for each prime p_i, it follows that it correctly maps
to G_{N,2}. Hence, we can identify T_2(Z/NZ) with Z/NZ. The group operation from
Lemma 6.3.12 can also be applied in Z/NZ and its correctness follows from the Chinese
remainder theorem.
Note that the image of Z/NZ in G_{N,2} under this map has size N = ∏_i p_i, whereas
G_{N,2} has order ∏_i (p_i + 1). Hence, there are many elements of G_{N,2} that are missed by
the decompression map. Note that these missed elements are those which correspond
to the identity element of G_{p_i,2} for at least one prime p_i. In other words, they are of the
form g = u + vθ where gcd(v, N) > 1.
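A small numeric illustration of this count (the parameters N = 21 = 3·7 and F(x) = x^2 + 1, irreducible modulo both 3 and 7, are our own choice):

```python
N, A, B = 21, 0, 1   # R = (Z/21Z)[x]/(x^2 + 1); pairs (u, v) mean u + v*theta

# G_{N,2}: elements of norm u^2 - A*u*v + B*v^2 = 1; order (3+1)(7+1) = 32.
G = {(u, v) for u in range(N) for v in range(N)
     if (u * u - A * u * v + B * v * v) % N == 1}
print(len(G))        # 32

def decomp(a):
    # a -> (a + theta)/(a + theta_bar); the denominator a^2 + 1 is a unit
    # mod 21 for every a, since x^2 + 1 has no root mod 3 or mod 7
    d = pow((a * a - a * A + B) % N, -1, N)
    return ((a * a - B) * d % N, (2 * a - A) * d % N)

image = {decomp(a) for a in range(N)}
print(len(image))    # 21 = N; the remaining 32 - 21 elements of G are missed
```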
Similarly, Lucas sequences can be used modulo N when N is square-free, and their
properties follow from the properties modulo p_i for all prime factors p_i of N. However,
one should be careful when interpreting the Galois theory. In Section 6.3.2 the non-trivial
element of Gal(F_{q^2}/F_q) is written as σ(g) = g^q, but this formulation does not naturally
generalise to the ring R of Example 6.6.1. Instead, define σ(u + vθ) = u + vθ̄ so that
σ : R → R is a ring homomorphism and σ(g) (mod p_i) = σ(g (mod p_i)). One can then
define the trace map Tr_{R/(Z/NZ)}(g) = g + σ(g). The theory of Section 6.3.2 can then
immediately be adapted to give Lucas sequences modulo N.
Exercise 6.6.2. Let N = ∏_{i=1}^{k} p_i be a square-free integer and let R be as in Example 6.6.1. Let g ∈ G_{N,2}. Determine how many elements h ∈ G_{N,2}, in general, satisfy
Tr_{R/(Z/NZ)}(h) = Tr_{R/(Z/NZ)}(g). Show that roughly N/2^k of the values V ∈ Z/NZ correspond to the trace of an element in G_{N,2}.
Using similar methods to the above it is straightforward to adapt the torus T6 and
XTR to the ring Z/N Z when N is square-free. We leave the details to the reader.
Chapter 7
Curves and Divisor Class Groups
7.1
Non-Singular Varieties
The word "local" is used throughout analysis and topology to describe any property
that holds in a neighbourhood of a point. We now develop some tools to study local
properties of points of varieties. The algebraic concept of localisation is the main
technique used.
Definition 7.1.1. Let X be a variety over k. The local ring over k of X at a point
P ∈ X(k) is

    O_{P,k}(X) = {f ∈ k(X) : f is regular at P}.
Define

    m_{P,k}(X) = {f ∈ O_{P,k}(X) : f(P) = 0} ⊆ O_{P,k}(X).

When the variety X and field k are clear from the context we simply write O_P and m_P.
Lemma 7.1.2. Let the notation be as above. Then
1. OP,k (X) is a ring;
Lemma 7.1.6. Let X ⊆ A^n be an affine variety over k and let P ∈ X(k). Then
the quotient ring O_{P,k}(X)/m_{P,k}(X) is isomorphic to k as a k-algebra. Furthermore, the
quotient m_{P,k}(X)/m_{P,k}(X)^2 of O_{P,k}(X)-ideals is a k-vector space of dimension at most
n.
Exercise 7.1.7. Prove Lemma 7.1.6.
As the following example shows, the dimension of the vector space m_{P,k}(X)/m_{P,k}(X)^2
carries information about the local geometry of X at the point P.
Example 7.1.8. Let X = A^2 and P = (0, 0) ∈ X(k). We have m_P = (x, y), m_P^2 =
(x^2, xy, y^2) and so the k-vector space m_P/m_P^2 has dimension 2. Note that X has dimension
2.
Let X = V(y^2 − x) ⊆ A^2, which has dimension 1. Let P = (0, 0) ∈ X(k). Then
m_P = (x, y) and {x, y} span the k-vector space m_P/m_P^2. Since x = y^2 in k(X) it follows
that x ∈ m_P^2 and so x = 0 in m_P/m_P^2. Hence m_P/m_P^2 is a one-dimensional vector space
over k with basis vector y.
Consider now X = V(y^2 − x^3) ⊆ A^2, which has dimension 1. Let P = (0, 0). Again,
{x, y} spans m_P/m_P^2 over k. Unlike the previous example, there is no linear dependence
among the elements {x, y} (as there is no polynomial relation between x and y having a
non-zero linear component). Hence m_P/m_P^2 has basis {x, y} and has dimension 2.
Exercise 7.1.9. Let X = V(x^4 + x + yx − y^2) ⊆ A^2 over k and let P = (0, 0). Find a
basis for the k-vector space m_{P,k}(X)/m_{P,k}(X)^2. Repeat the exercise for X = V(x^4 + x^3 +
yx − y^2).
Example 7.1.8 motivates the following definition. One important feature of this definition is that it is in terms of the local ring at a point P and so applies equally to affine
and projective varieties.
Definition 7.1.10. Let X be a variety (affine or projective) over k and let P ∈ X(k̄)
be a point. Then P is non-singular if dim_{k̄} m_{P,k̄}(X)/m_{P,k̄}(X)^2 = dim(X) and is singular
otherwise.1 The variety X is non-singular or smooth if every point P ∈ X(k̄) is
non-singular.
Indeed, it follows from the arguments in this section that if P ∈ X(k) then P is
non-singular if and only if dim_k m_{P,k}(X)/m_{P,k}(X)^2 = dim(X). The condition of Definition 7.1.10 is inconvenient for practical computation. Hence, we now give an equivalent
condition (Corollary 7.1.13) for a point to be singular.
Suppose X ⊆ A^n is an affine variety and let P = (0, ..., 0). The key idea for Theorem 7.1.12 is to consider the map ∂ : k[x_1, ..., x_n] → k^n defined by

    ∂(f(x_1, ..., x_n)) = ( (∂f/∂x_1)(P), ..., (∂f/∂x_n)(P) ).

This is essentially the same map as used in the proof of Lemma 7.1.6, but there it was
defined on m_{P,k}(X) ⊆ O_{P,k}(X) whereas ∂ is defined on k[x_1, ..., x_n]. Note that ∂ is
k-linear. Let m_0(A^n) be the k[x_1, ..., x_n]-ideal (x_1, ..., x_n). Then ∂(m_0(A^n)) = k^n,
the restriction of ∂ to m_0(A^n) has kernel m_0(A^n)^2, and ∂ induces an isomorphism of
k-vector spaces m_0(A^n)/m_0(A^n)^2 ≅ k^n.
1 The dimension of the vector space m_{P,k}(X)/m_{P,k}(X)^2 is always greater than or equal to dim(X),
but we don't need this.
Lemma 7.1.11. Let X ⊆ A^n be an affine variety over k and let P ∈ X(k). Define2
m = {f ∈ k[X] : f(P) = 0}. Then k[X]/m ≅ k and m/m^2 ≅ m_{P,k}(X)/m_{P,k}(X)^2 as
k-vector spaces.
Proof: We assume without loss of generality that P = (0, ..., 0). Since k[X] = k[x_1, ..., x_n]/I_k(X)
it follows that m is the k[X]-ideal (x_1, ..., x_n). The first statement is then immediate.
For the second statement note that one has k[X] ⊆ O_{P,k}(X), m = m_{P,k}(X) ∩ k[X] and
m_{P,k}(X) is the O_{P,k}(X)-ideal generated by m. Similarly, m^2 = m_{P,k}(X)^2 ∩ k[X].
We now construct a ring isomorphism φ : O_{P,k}(X)/m_{P,k}(X)^2 → k[X]/m^2. Every
f ∈ O_{P,k}(X) has a representation f_1/f_2 where f_1, f_2 ∈ k[X] and f_2(P) ≠ 0. Write
f_2 = a_0 + f_3 + f_4 where a_0 ∈ k, a_0 ≠ 0, f_3 ∈ m and f_4 ∈ m^2. Define g = a_0^{−1} − a_0^{−2} f_3 ∉ m.
Then f_2 g − 1 ∈ m^2 and so g is f_2^{−1} in k[X]/m^2. It follows that

    f_1/f_2 ≡ f_1 g

in O_{P,k}(X)/m_{P,k}(X)^2. Hence, if f = f_1/f_2 ∈ O_{P,k}(X) with f_1, f_2 ∈ k[X] then we define
φ(f) = f_1 g. One can verify that φ is a well-defined ring homomorphism, that φ is
surjective, and that ker(φ) = m_{P,k}(X)^2. Hence φ is an isomorphism of rings as claimed.
Finally, if f = f_1/f_2 ∈ m_{P,k}(X) with f_1, f_2 ∈ k[X] then f_1 ∈ m and f_2 ∈ k[X] − m
and so φ(f) ∈ m. It follows that m_{P,k}(X)/m_{P,k}(X)^2 ≅ m/m^2.
Theorem 7.1.12. Let X = V(f_1, ..., f_m) ⊆ A^n be a variety defined over k and let
P ∈ X(k). Let d_1 be the dimension of the k-vector space m_{P,k}/m_{P,k}^2. Let d_2 be the rank
of the Jacobian matrix

    J_{X,P} = ( (∂f_i/∂x_j)(P) )_{1≤i≤m, 1≤j≤n}.

Then d_1 + d_2 = n.
Proof: By Exercise 7.1.5 we may assume without loss of generality that P = (0, ..., 0).
Let the notation be as in Lemma 7.1.11. We have d_1 = dim_k(m/m^2). Recall the map
∂ : k[x_1, ..., x_n] → k^n from above, which gives an isomorphism from m_0(A^n)/m_0(A^n)^2 to
k^n.
Now, m is the image of m_0(A^n) in k[X] = k[x_1, ..., x_n]/I_k(X). Similarly, m^2 is the image of m_0(A^n)^2 in k[X]. Hence m/m^2 is isomorphic as a k-vector space to m_0(A^n)/(m_0(A^n)^2, I_k(X)).
Similarly, the span of the rows of the matrix J_{X,P} in k^n is ∂(I_k(X)), which is isomorphic as a k-vector space to (I_k(X), m_0(A^n)^2)/m_0(A^n)^2. One therefore has dim_k(m/m^2) +
rank(J_{X,P}) = n.
Corollary 7.1.13. Let X = V(f_1(x), ..., f_m(x)) ⊆ A^n be an affine variety over k of
dimension d. Let P ∈ X(k). Then P is a singular point of X if and only if
the Jacobian matrix J_{X,P} has rank not equal to n − d. The point is non-singular if the
rank of J_{X,P} is equal to n − d.
Corollary 7.1.14. Let X = V(f(x_1, ..., x_n)) ⊆ A^n be irreducible and let P ∈ X(k).
Then P is singular if and only if

    (∂f/∂x_j)(P) = 0    for all 1 ≤ j ≤ n.
Exercise 7.1.15. Prove Corollaries 7.1.13 and 7.1.14.
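Corollary 7.1.14 makes singular points easy to find by brute force over a small finite field. For the curve V(y^2 − x^3) of Example 7.1.8 (a throwaway sketch; p = 13 is an arbitrary choice):

```python
p = 13

def f(x, y):
    return (y * y - x ** 3) % p

def grad(x, y):
    # (df/dx, df/dy) = (-3x^2, 2y)
    return ((-3 * x * x) % p, (2 * y) % p)

points = [(x, y) for x in range(p) for y in range(p) if f(x, y) == 0]
singular = [P for P in points if grad(*P) == (0, 0)]
print(singular)  # [(0, 0)] -- the cusp is the only singular point
```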
2 We stress that m is different from the ideals m_{P,k}(X) and m_0(A^n) above. One has m ⊆ m_{P,k}(X) and,
for P = (0, ..., 0), m = m_0(A^n)/I_k(X).
Exercise 7.1.16. Let k be a field such that char(k) ≠ 2 and let F(x) ∈ k[x] be such that
gcd(F(x), F′(x)) = 1. Show that

    X : y^2 = F(x)

is non-singular as an affine algebraic set. Now consider the projective closure X̄ ⊆ P^2.
Show that if deg(F(x)) ≥ 4 then there is a unique point in X̄ − X and that it is a singular
point.
Finally we can define what we mean by a curve.
Definition 7.1.17. A curve is a projective non-singular variety of dimension 1. A plane
curve is a curve that is given by an equation V(F(x, y, z)) ⊆ P^2.
Remark 7.1.18. We stress that in this book a curve is always projective and non-singular. Note that many authors (including Hartshorne [277] and Silverman [560]) allow
affine and/or singular dimension 1 varieties X to be curves. A fact that we won't prove is
that every finitely generated, transcendence degree 1 extension K of an algebraically closed
field k̄ is the function field k̄(C) of a curve (see Theorem I.6.9 of Hartshorne [277]; note
that working over k̄ is essential as there are finitely generated, transcendence degree 1
extensions of k that are not k(C) for a curve C defined over k). It follows that every
irreducible algebraic set of dimension 1 over k̄ is birational over k̄ to a non-singular curve
(see Theorem 1.1 of Moreno [436] for the details). Hence, in practice one often writes
down an affine and/or singular equation X that is birational to the projective, non-singular curve C one has in mind. In our notation, the commonly used phrase "singular
curve" is an oxymoron. Instead one can say "singular equation for a curve" or "singular
model for a curve".
The following result is needed in a later proof.
Lemma 7.1.19. Let C be a curve over k. Let P, Q ∈ C(k̄). Then O_{P,k̄} ⊆ O_{Q,k̄} implies
P = Q.
Proof: By Exercise 5.2.23 we may assume that P, Q ∈ U_n(k̄) ⊆ P^n(k̄) and, applying
φ_n^{−1}, we have P, Q ∈ φ_n^{−1}(C) ⊆ A^n(k̄). Let R = k̄[φ_n^{−1}(C)] and define m = m_{P,k̄} ∩ R =
{f ∈ R : f(P) = 0} as in Lemma 7.1.11. By Lemma 7.1.11, R/m ≅ k̄ and so m is a
maximal R-ideal. Finally, P ∈ V(m) since every polynomial in m vanishes at P, and
by the Nullstellensatz V(m) = {P}.
If O_{P,k̄} ⊆ O_{Q,k̄} then the inclusion map gives rise to O_{P,k̄} → O_{Q,k̄}/m_{Q,k̄} with kernel
O_{P,k̄} ∩ m_{Q,k̄}. In other words, O_{P,k̄}/(O_{P,k̄} ∩ m_{Q,k̄}) injects into O_{Q,k̄}/m_{Q,k̄} ≅ k̄. Hence
O_{P,k̄} ∩ m_{Q,k̄} is a maximal ideal and so m_{P,k̄} ⊆ m_{Q,k̄}. Therefore m ⊆ n := m_{Q,k̄} ∩ R. But
m is maximal in R and 1 ∉ n so m = n. Since V(m) = {P} and V(n) = {Q} we have
P = Q.
7.2
Weierstrass Equations

A Weierstrass equation over k is a projective equation of the form

    y^2 z + a_1 xyz + a_3 yz^2 = x^3 + a_2 x^2 z + a_4 xz^2 + a_6 z^3    (7.1)

with a_1, a_2, a_3, a_4, a_6 ∈ k. Setting z = 1 gives the affine Weierstrass equation

    y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6.                  (7.2)

…

    y^2 z = x^3 + a_4 xz^2 + a_6 z^3                                     (7.3)

for some a_4, a_6 ∈ k. This is called the short Weierstrass form. Show that this equation
is non-singular if and only if the discriminant 4a_4^3 + 27a_6^2 ≠ 0 in k.
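The exercise's criterion can be tested over a small field of characteristic not 2 or 3, comparing the vanishing of 4a_4^3 + 27a_6^2 with a direct search for affine singular points (a brute-force sketch; p = 13 is an arbitrary choice):

```python
p = 13

def has_singular_point(a4, a6):
    # a singular point of y^2 = x^3 + a4*x + a6 satisfies 2y = 0 and
    # 3x^2 + a4 = 0 and lies on the curve
    for x in range(p):
        for y in range(p):
            if ((y * y - x ** 3 - a4 * x - a6) % p == 0
                    and (3 * x * x + a4) % p == 0
                    and (2 * y) % p == 0):
                return True
    return False

assert all(has_singular_point(a4, a6) == ((4 * a4 ** 3 + 27 * a6 ** 2) % p == 0)
           for a4 in range(p) for a6 in range(p))
print("criterion verified for p =", p)
```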
Exercise 7.2.7. Show that if char(k) = 2 then every Weierstrass equation over k is
isomorphic over k to a Weierstrass equation
y 2 z + xyz = x3 + a2 x2 z + a6 z 3
or
y 2 z + yz 2 = x3 + a4 xz 2 + a6 z 3 .
(7.4)
Proof:
Write U for the affine algebraic set obtained from E by setting z = 1. Note that
U(k̄) ≠ ∅. Corollary 5.4.9 shows that k(E) ≅ k(U) and so it is sufficient to consider
functions on U. Every such function can be written in the form of equation (7.5) since
any denominators can be cleared by multiplying through by appropriate polynomials (the
polynomial (a(x) + b(x)y)(a(x) + b(x)(−y − a_1 x − a_3)) is a polynomial in x only) and y^n for n > 1
can be replaced using the equation y^2 = (x^3 + a_2 x^2 + a_4 x + a_6) − y(a_1 x + a_3). Both claims
of the Lemma follow immediately.
7.3
Uniformizers on Curves
Let C be a curve over k with function field k(C). It is necessary to formalise the notion
of multiplicity of a zero or pole of a function at a point. The basic definition will be that
f ∈ O_{P,k}(C) has multiplicity m at P if f ∈ m_{P,k}^m and f ∉ m_{P,k}^{m+1}. However, there are
a number of technicalities to be dealt with before we can be sure this definition makes
sense. We introduce uniformizers in this section as a step towards the rigorous treatment
of multiplicity of functions.
First we recall the definition of non-singular from Definition 7.1.10: if C is a non-singular curve over k and P ∈ C(k), then the quotient m_{P,k}(C)/m_{P,k}(C)^2 (which is a
k-vector space by Lemma 7.1.6) has dimension one as a k-vector space.
Lemma 7.3.1. Let C be a curve (in particular, non-singular) over a field k and let
P ∈ C(k). Then the ideal m_{P,k}(C) is principal as an O_{P,k}(C)-ideal.
Proof: Write m for m_{P,k}(C). Since C is non-singular, dim_k m_{P,k}(C)/m_{P,k}(C)^2 = 1. Let
x ∈ m be such that {m^2 + x} is a k-vector space basis for m/m^2. Let n be the O_{P,k}(C)-ideal
(x). Then n ⊆ m. For every y ∈ m we have y = f + ux where u ∈ k and f ∈ m^2. Hence,
m = (n, m^2). Let A be the O_{P,k}(C)-module m/n. We want to prove that A = 0. This
follows by Nakayama's Lemma (see Proposition 2.6 of [15]) but we give a direct proof.
First note that mA = m(m/n) = (m^2, n)/n = A (the middle equality since y(n + z) =
n + yz for all y, z ∈ m). Suppose now that A ≠ 0. Since O_{P,k}(C) is Noetherian it follows
that m is finitely generated as an O_{P,k}(C)-module and so A is finitely generated as an
O_{P,k}(C)-module. Let {a_1, ..., a_k} be a minimal set of generators for A. Since A = mA
we have

    a_1 = Σ_{j=1}^{k} m_j a_j

for m_j ∈ m. Hence,

    a_1(1 − m_1) = Σ_{j=2}^{k} m_j a_j.

Note that 1 − m_1 ∉ m and so, since m is a maximal ideal, (1 − m_1) is a unit in O_{P,k}(C).
Hence, a_1 ∈ (a_2, ..., a_k), which contradicts the minimality of the generating set. Hence
A = 0 and m = n = (x).
Definition 7.3.2. Let C be a curve (in particular, non-singular) over k and P ∈ C(k̄).
A uniformizer (or uniformizing parameter) at P is an element t_P ∈ O_{P,k̄}(C) such
that m_{P,k̄}(C) = (t_P) as an O_{P,k̄}(C)-ideal.
One can choose t_P to be any element of m_{P,k̄}(C) − m_{P,k̄}(C)^2; in other words, the
uniformizer is not unique. If P is defined over k then one can take t_P ∈ m_{P,k}(C) −
m_{P,k}(C)^2, i.e., take the uniformizer to be defined over k; this is typically what one does
in practice.
For our presentation it is necessary to know uniformizers on P1 and on a Weierstrass
equation. The next two examples determine such uniformizers.
Example 7.3.3. Let C = P^1. For a point (a : 1) ∈ U_1 ⊆ P^1 one can work instead with
the point a on the affine curve A^1 = φ_1^{−1}(U_1). One has m_a = (x − a) and so t_a = (x − a)
is a uniformizer at a. In terms of the projective equation, t_a = (x − az)/z is a
uniformizer. For the point ∞ = (1 : 0) ∈ U_0 ⊆ P^1 one again works with the corresponding
point 0 ∈ φ_0^{−1}(U_0). The uniformizer is t_∞ = z which, projectively, is t_∞ = z/x. A common
abuse of notation is to say that 1/x is a uniformizer at ∞ on A^1 = φ_1^{−1}(U_1).
Example 7.3.4. We determine uniformizers for the points on an elliptic curve. First
consider points (x_P, y_P) on the affine equation

    E(x, y) = y^2 + a_1 xy + a_3 y − (x^3 + a_2 x^2 + a_4 x + a_6).

Without loss of generality we can translate the point to P_0 = (0, 0), in which case write
a′_1, ..., a′_6 for the coefficients of the translated equation E′(x, y) = 0 (i.e., E′(x, y) =
E(x + x_P, y + y_P)). One can verify that a′_6 = 0, a′_3 = (∂E/∂y)(P) and a′_4 = −(∂E/∂x)(P).
Then m_{P_0} = (x, y) and, since the curve is not singular, at least one of a′_3 or a′_4 is non-zero.
If a′_3 = 0 then4

    x(x^2 + a′_2 x + a′_4 − a′_1 y) = y^2.

Since (x^2 + a′_2 x + a′_4 − a′_1 y)(P_0) = a′_4 ≠ 0 we have (x^2 + a′_2 x + a′_4 − a′_1 y)^{−1} ∈ O_{P_0} and so

    x = y^2 (a′_4 + a′_2 x + x^2 − a′_1 y)^{−1}.

In other words, x ∈ (y^2) ⊆ m_{P_0}^2 and y is a uniformizer at P_0.
Similarly, if a′_4 = 0 then y(a′_3 + a′_1 x + y) = x^2(x + a′_2) and so y ∈ (x^2) ⊆ m_{P_0}^2 and
x is a uniformizer at P_0. If a′_3, a′_4 ≠ 0 then either x or y can be used as a uniformizer.
(Indeed, any linear combination ax + by except a′_3 y − a′_4 x can be used as a uniformizer;
geometrically, any line through P, except the line which is tangent to the curve at P, is
a uniformizer.)
Now consider the point at infinity O_E = (x : y : z) = (0 : 1 : 0) on E. Taking y = 1
transforms the point to (0, 0) on the affine curve

    z + a_1 xz + a_3 z^2 = x^3 + a_2 x^2 z + a_4 xz^2 + a_6 z^3.         (7.6)

It follows that

    z(1 + a_1 x + a_3 z − a_2 x^2 − a_4 xz − a_6 z^2) = x^3

and so z ∈ (x^3) ⊆ m_P^3 and x is a uniformizer (which corresponds to x/y in homogeneous
coordinates).
In practice it is not necessary to move P to (0, 0) and compute the a′_i. We have shown
that if P = (x_P, y_P) then t_P = x − x_P is a uniformizer unless P = O_E, in which case
t_P = x/y, or P = ι(P),5 in which case t_P = y − y_P.
Lemma 7.3.5. Let C be a curve over k, let P ∈ C(k̄) and let t_P be a uniformizer at P.
Let σ ∈ Gal(k̄/k). Then σ(t_P) is a uniformizer at σ(P).
4 We will see later that a′_3 = 0 implies (0, 0) has order 2 (since ι(x, y) = (x, −y − a_1 x − a_3)).
5 i.e., P has order 2.
7.4
Valuations

The aim of this section is to define the multiplicity of a zero or pole of a function on a
curve. For background on discrete valuation rings see Chapter 1 of Serre [538], Section
I.7 of Lang [362] or Sections XII.4 and XII.6 of Lang [364].
Definition 7.4.1. Let K be a field. A discrete valuation on K is a function v : K^* → Z
such that:
1. for all f, g ∈ K^*, v(fg) = v(f) + v(g);
…
6. R_v is a local ring.
… v_P(f) = max{m : f/t_P^m ∈ O_{P,k}(C)} and f = t_P^{v_P(f)} u …
1. If f ∈ k^* then v_P(f) = 0.
…
3. If f_1, f_2 ∈ k(C)^* are such that v_P(f_1) ≠ v_P(f_2) then v_P(f_1 + f_2) = min{v_P(f_1), v_P(f_2)}.
4. Suppose C is defined over k and let P ∈ C(k̄). Let σ ∈ Gal(k̄/k). Then v_P(f) =
v_{σ(P)}(σ(f)).
Proof: Let t_P be a uniformizer at P. Then v_P(t_P) = 1, which proves the third property
of Definition 7.4.1. The property v_P(fg) = v_P(f) + v_P(g) follows by the same argument
as Exercise 7.4.9. Similarly, if f = t_P^v u_1 and g = t_P^w u_2 with v ≤ w and g ≠ −f then f + g =
t_P^v(u_1 + t_P^{w−v} u_2), so v_P(f + g) ≥ min{v_P(f), v_P(g)}. Hence v_P satisfies Definition 7.4.1.
We turn to the rest of the proof. The third statement is just a refinement of the
above argument. Without loss of generality, v_P(f_1) < v_P(f_2). Then f_1 = t_P^v u_1 and
f_2 = t_P^{v+m} u_2 for some u_1, u_2 ∈ O_P^*, v ∈ Z and m ∈ N. Then f_1 + f_2 = t_P^v(u_1 + t_P^m u_2) ≠ 0
and u_1 + t_P^m u_2 ∈ O_P^* (since its value at P is u_1(P) ≠ 0), so v_P(f_1 + f_2) = v_P(f_1).
The first statement follows since f(P) ≠ 0. Statement 2 is just a special case of
statement 3.
For the fourth statement, recall from Lemma 7.3.5 that one can take t_{σ(P)} = σ(t_P). If
f = t_P^v u where u(P) ≠ 0 then σ(f) = σ(t_P)^v σ(u) and σ(u)(σ(P)) = σ(u(P)) ≠ σ(0) = 0
(see Exercise 5.4.13). The result follows.
Having shown that every v_P is a discrete valuation on k̄(C), it is natural to ask whether every discrete valuation on k̄(C) is v_P for some point P ∈ C(k̄). To make this true over fields that are not algebraically closed requires a more general notion of a point of C defined over k. Instead of doing this, we continue to work with points over k̄ and show in Theorem 7.5.2 that every discrete valuation on k̄(C) is v_P for some P ∈ C(k̄). But first we give some examples.
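A standard first example of Definition 7.4.1 is the p-adic valuation on Q (the exponent of p in a nonzero rational). The following sketch, which is my own illustration and not from the book, checks the two valuation axioms on sample values.

```python
# The p-adic valuation v_p on Q* is a discrete valuation in the sense of
# Definition 7.4.1: v_p(q) is the exponent of p in q (negative if p divides
# the denominator).
from fractions import Fraction

def v_p(q, p):
    """Exponent of the prime p in the nonzero rational q."""
    q = Fraction(q)
    n = 0
    num, den = q.numerator, q.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

p = 3
f, g = Fraction(18), Fraction(5, 9)          # v_3(18) = 2, v_3(5/9) = -2
assert v_p(f * g, p) == v_p(f, p) + v_p(g, p)        # v(fg) = v(f) + v(g)
assert v_p(f + g, p) >= min(v_p(f, p), v_p(g, p))    # v(f+g) >= min{v(f), v(g)}
print(v_p(f, p), v_p(g, p))  # 2 -2
```

The valuation ring R_v here is the localization Z_(p), and m_v is its unique maximal ideal, matching the local-ring statement in Definition 7.4.1.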
Example 7.4.15. Let E : y² = x(x − 1)(x + 1) over k and let P = (1, 0) ∈ E(k). We determine v_P(x), v_P(x − 1), v_P(y) and v_P(x + y − 1).
First, x(P) = 1 so v_P(x) = 0. For the rest, since P = ι(P) we take the uniformizer to be t_P = y. Hence v_P(y) = 1. Since
x − 1 = y²/(x(x + 1))
and 1/(x(x + 1)) ∈ O_P* we have v_P(x − 1) = 2.
Finally, f(x, y) = x + y − 1 = y + (x − 1), so v_P(f(x, y)) = min{v_P(y), v_P(x − 1)} = min{1, 2} = 1. One can see this directly by writing f(x, y) = y(1 + y/(x(x + 1))).
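The valuations in Example 7.4.15 can be sanity-checked numerically: along the branch of the curve through P = (1, 0) parametrized by the uniformizer t = y, a function with v_P(f) = m scales like t^m as t → 0. The helper names below (`x_of_t`, `est_val`) are my own; this is an illustrative sketch, not a method from the book.

```python
# Numerically estimate v_P at P = (1, 0) on E: y^2 = x(x-1)(x+1),
# using the uniformizer t = y and the growth exponent of f along the curve.
import math

def x_of_t(t):
    # Newton's method for x*(x-1)*(x+1) = t^2 on the branch with x near 1.
    x = 1.0 + t * t / 2          # leading-order local expansion x = 1 + t^2/2
    for _ in range(50):
        g = x * (x - 1) * (x + 1) - t * t
        x -= g / (3 * x * x - 1)
    return x

def est_val(f, t1=1e-4, t2=1e-5):
    # v_P(f) ~ log(f(t1)/f(t2)) / log(t1/t2) as t -> 0.
    r = abs(f(t1, x_of_t(t1)) / f(t2, x_of_t(t2)))
    return round(math.log(r) / math.log(t1 / t2))

print(est_val(lambda t, x: t))          # 1, i.e. v_P(y) = 1
print(est_val(lambda t, x: x - 1))      # 2, i.e. v_P(x - 1) = 2
print(est_val(lambda t, x: x + t - 1))  # 1, i.e. v_P(x + y - 1) = 1
```

The three estimates agree with the exact values computed in Example 7.4.15.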
Lemma 7.4.16. Let E be an elliptic curve. Then v_{O_E}(x) = −2 and v_{O_E}(y) = −3.
Proof: We consider the projective equation, so that the functions become x/z and y/z, then set y = 1 so that we are considering x/z and 1/z on
z + a_1 xz + a_3 z² = x³ + a_2 x²z + a_4 xz² + a_6 z³.
As in Example 7.3.4 we have z | (x³) and so v_{O_E}(x) = 1, v_{O_E}(z) = 3. This implies v_{O_E}(1/z) = −3 and v_{O_E}(x/z) = 1 − 3 = −2 as claimed. □
7.5 Valuations and Points on Curves
Let C be a curve over k and P ∈ C(k̄). We have shown that v_P is a discrete valuation on k̄(C). The aim of this section is to show (using the weak Nullstellensatz) that every discrete valuation v on k̄(C) arises as v_P for some point P ∈ C(k̄).
Lemma 7.5.1. Let C be a curve over k̄ and let v be a discrete valuation on k̄(C). Write R_v, m_v for the corresponding valuation ring and maximal ideal (over k̄). Suppose C ⊆ Pⁿ with coordinates (x_0 : ⋯ : x_n). Then there exists some 0 ≤ i ≤ n such that k̄[ϕ_i⁻¹(C)] is a subring of R_v (where ϕ_i⁻¹ is as in Definition 5.2.24).
Proof: First we prove there exists some 0 ≤ i ≤ n such that x_0/x_i, …, x_n/x_i ∈ R_v. To do this define S_i = {j : 0 ≤ j ≤ n, x_i/x_j ∈ R_v}. We claim that S_0 ∩ ⋯ ∩ S_n ≠ ∅ and prove this by induction. First, note that i ∈ S_i so S_0 ≠ ∅. Suppose that j ∈ S_0 ∩ ⋯ ∩ S_k for some k ≥ 0. If j ∈ S_{k+1} then we are done. If j ∉ S_{k+1} then we have x_{k+1}/x_j ∉ R_v and so x_j/x_{k+1} ∈ R_v (for every f ∈ k̄(C)*, at least one of f and 1/f lies in R_v). Since x_i/x_j ∈ R_v for 0 ≤ i ≤ k by the inductive hypothesis, it follows that (x_i/x_j)(x_j/x_{k+1}) = x_i/x_{k+1} ∈ R_v for 0 ≤ i ≤ k + 1. It follows that S_0 ∩ ⋯ ∩ S_{k+1} ≠ ∅.
To prove the result, suppose i is such that x_0/x_i, …, x_n/x_i ∈ R_v. Then k̄[ϕ_i⁻¹(C)] = k̄[x_0/x_i, …, x_n/x_i] is a subring of R_v. □
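The combinatorial content of this proof is that the coordinate of minimal valuation works: if one writes down (formal) valuations v(x_j) of the coordinates, only the differences v(x_j) − v(x_i) = v(x_j/x_i) are meaningful, and picking i with v(x_i) minimal makes all these differences non-negative. A tiny sketch of my own (integer data standing in for valuations):

```python
# Choosing the chart index i in Lemma 7.5.1: with valuation data vals[j]
# for the coordinate functions, taking i minimizing vals[i] guarantees
# v(x_j / x_i) = vals[j] - vals[i] >= 0 for every j, i.e. all ratios lie in R_v.
def choose_chart(vals):
    i = min(range(len(vals)), key=lambda j: vals[j])
    assert all(vals[j] - vals[i] >= 0 for j in range(len(vals)))
    return i

print(choose_chart([3, -2, 0]))  # 1
```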
Theorem 7.5.2. Let C be a curve over k̄ and let v be a discrete valuation on k̄(C). Then v = v_P for some P ∈ C(k̄).
Proof: (Sketch) Let R_v be the valuation ring of v and m_v the maximal ideal. Let i be as in Lemma 7.5.1 so that R = k̄[ϕ_i⁻¹(C)] ⊆ R_v. Note that R is the affine coordinate ring of an affine curve.
By Lemma A.9.2, m = R ∩ m_v is a prime ideal in R. Furthermore, m ≠ (0) and m ≠ R. Since R has Krull dimension 1, m is a maximal ideal.
Theorem 5.1.20 (weak Nullstellensatz) shows that m is equal to m_P ∩ k̄[ϕ_i⁻¹(C)] for some point P ∈ C(k̄). It follows that the restriction of v to k̄[ϕ_i⁻¹(C)] is equal to v_P. Finally, since k̄(C) is the field of fractions of k̄[ϕ_i⁻¹(C)], it follows that v = v_P. □
For full details see Corollary I.6.6 of Hartshorne [277] or Theorem VI.9.1 of Lorenzini [391].
7.6
Divisors
A divisor is just a notation for a finite multi-set of points. As always, we work with points over an algebraically closed field k̄.
Definition 7.6.1. Let C be a curve over k (necessarily non-singular and projective). A divisor on C is a formal sum
D = Σ_{P ∈ C(k̄)} n_P (P)    (7.7)
where n_P ∈ Z and only finitely many n_P ≠ 0. The divisor with all n_P = 0 is written 0. The support of the divisor D in equation (7.7) is Supp(D) = {P ∈ C(k̄) : n_P ≠ 0}. Note that many authors use the notation |D| for the support of D. Denote by Div_k̄(C) the set of all divisors on C. Define −D = Σ_P (−n_P)(P). If D′ = Σ_{P ∈ C(k̄)} n′_P (P) then define
D + D′ = Σ_{P ∈ C(k̄)} (n_P + n′_P)(P).
Write D ≥ D′ if n_P ≥ n′_P for all P ∈ C(k̄). A divisor D with D ≥ 0 is called effective.
Example 7.6.2. Let E : y² = x³ + 2x − 3 over Q and let P = (2, 3), Q = (1, 0) ∈ E(Q). Then
D = 5(P) − 7(Q)
is a divisor on E. The support of D is Supp(D) = {P, Q} and D is not effective.
Definition 7.6.3. The degree of a divisor D = Σ_P n_P (P) is the integer
deg(D) = Σ_{P ∈ C(k̄)} n_P.
(We stress that this is a finite sum.) We write Div⁰_k̄(C) = {D ∈ Div_k̄(C) : deg(D) = 0}.
Lemma 7.6.4. Div_k̄(C) is a group under addition, and Div⁰_k̄(C) is a subgroup.
Exercise 7.6.5. Prove Lemma 7.6.4.
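Definitions 7.6.1 and 7.6.3 can be modelled directly: a divisor is a finitely supported map from points to Z. The following minimal sketch (points are arbitrary hashable labels; the class and method names are my own) illustrates the group structure of Lemma 7.6.4 on the divisor of Example 7.6.2.

```python
# Divisors as formal Z-linear sums of points: a dict point -> coefficient,
# with zero coefficients discarded so the support is always finite and exact.
from collections import defaultdict

class Divisor:
    def __init__(self, coeffs=None):
        self.n = {P: nP for P, nP in (coeffs or {}).items() if nP != 0}

    def __add__(self, other):
        out = defaultdict(int, self.n)
        for P, nP in other.n.items():
            out[P] += nP
        return Divisor(out)

    def __neg__(self):
        return Divisor({P: -nP for P, nP in self.n.items()})

    def degree(self):
        return sum(self.n.values())

    def support(self):
        return set(self.n)

    def is_effective(self):
        return all(nP >= 0 for nP in self.n.values())

# Example 7.6.2: D = 5(P) - 7(Q) on E: y^2 = x^3 + 2x - 3 over Q.
P, Q = (2, 3), (1, 0)
D = Divisor({P: 5, Q: -7})
print(D.degree(), D.is_effective())  # -2 False
```

Addition and negation make these a group, and the degree-zero divisors form the subgroup Div⁰ of Lemma 7.6.4 since deg(D + D′) = deg(D) + deg(D′).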
Definition 7.6.6. Let C be a curve over k and let D = Σ_{P ∈ C(k̄)} n_P (P) be a divisor on C. For σ ∈ Gal(k̄/k) define σ(D) = Σ_P n_P (σ(P)). Then D is defined over k if σ(D) = D for all σ ∈ Gal(k̄/k). Write Div_k(C) for the set of divisors on C that are defined over k.
Since Gal(k̄/k) is an enormous and complicated object, it is important to realise that testing the field of definition of any specific divisor is a finite task. There is an extension k′/k of finite degree containing the coordinates of all points in the support of D. Let k″ be the Galois closure of k′. Since k″ is normal over k, any σ ∈ Gal(k̄/k) is such that σ(k″) = k″. Hence, it is sufficient to study the behaviour of D under Gal(k″/k).
Example 7.6.7. Let C : x² + y² = 6 over Q and let P = (1 + √2, 1 − √2), Q = (1 − √2, 1 + √2) ∈ C(Q(√2)). Let D = (P) + (Q). It is sufficient to consider σ(D) for σ ∈ Gal(Q(√2)/Q). The only non-trivial element is σ(√2) = −√2, and one sees that σ(P) = Q and σ(Q) = P. Hence σ(D) = D for all σ ∈ Gal(Q(√2)/Q) and D is defined over Q. Note that C(Q) = ∅, so this example shows it is possible to have Div_k(C) ≠ {0} even if C(k) = ∅.
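The Galois-stability check in Example 7.6.7 is a finite computation, as claimed above. The sketch below (my own illustration) models Q(√2) as pairs (a, b) representing a + b√2, with the nontrivial automorphism sending b to −b, and verifies that conjugation swaps P and Q.

```python
# Elements of Q(sqrt 2) as pairs (a, b) meaning a + b*sqrt(2); the nontrivial
# element of Gal(Q(sqrt 2)/Q) maps (a, b) to (a, -b).
from fractions import Fraction as F

def mul(u, v):
    # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sigma(pt):
    # Galois conjugation applied coordinate-wise to a point.
    return tuple((a, -b) for a, b in pt)

P = ((F(1), F(1)), (F(1), F(-1)))   # (1 + sqrt2, 1 - sqrt2)
Q = ((F(1), F(-1)), (F(1), F(1)))   # (1 - sqrt2, 1 + sqrt2)

# Both points lie on x^2 + y^2 = 6, and sigma swaps them, so (P) + (Q) is
# fixed by the whole Galois group and hence defined over Q.
for x, y in (P, Q):
    assert add(mul(x, x), mul(y, y)) == (F(6), F(0))
assert sigma(P) == Q and sigma(Q) == P
print("sigma(D) = D")
```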
7.7
Principal Divisors
This section contains an important and rather difficult result, namely that the number of poles of a function on a curve (counted according to multiplicity) is finite and equal to the number of zeroes (counted according to multiplicity). The finiteness condition is essential to be able to represent the poles and zeroes of a function as a divisor. The other condition is required to show that the set of all divisors of functions is a subgroup of Div⁰_k̄(C).
In this chapter, finiteness of the sets of poles and zeroes is only proved for plane curves, and deg(div(f)) = 0 is proved only for elliptic curves. The general results are given in Section 8.3 of the next chapter.
Theorem 7.7.1. Let C be a curve over k and f ∈ k̄(C)*. Then f has finitely many poles and zeroes.
Proof: (Special case of plane curves.) Let C = V(F(x, y, z)) ⊆ P² where F is irreducible. If F(x, y, z) = z then the result follows from Exercise 5.2.35 (there are only finitely many points at infinity). So we can restrict to the affine case C = V(F(x, y)).
Let f = f_1(x, y)/f_2(x, y) with f_1, f_2 ∈ k̄[x, y]. Then f is regular whenever f_2(P) ≠ 0, so the poles of f are contained in C ∩ V(f_2). Without loss of generality, f_2(x, y) contains monomials featuring x. The resultant R_x(f_2(x, y), F(x, y)) is a polynomial in y with a finite number of roots, hence C ∩ V(f_2) is finite.
To show there are finitely many zeroes write f = f_1/f_2. The zeroes of f are contained in C ∩ (V(f_1) ∪ V(f_2)) and the argument above applies. □
Definition 7.7.2. Let f ∈ k̄(C)* and define the divisor of the function f (this is a divisor by Theorem 7.7.1)
div(f) = Σ_{P ∈ C(k̄)} v_P(f)(P).
The divisor of a function is also called a principal divisor. Note that some authors write div(f) as (f). Let
Prin_k̄(C) = {div(f) : f ∈ k̄(C)*}.
Exercise 7.7.3. Show that the zero element of Div_k̄(C) lies in Prin_k̄(C).
Lemma 7.7.4. Let C be a curve over k and let f, f′ ∈ k̄(C)*.
1. div(f f′) = div(f) + div(f′).
2. div(1/f) = −div(f).
3. div(f + f′) ≥ Σ_P min{v_P(f), v_P(f′)}(P) (when f + f′ ≠ 0).
4. div(f^n) = n div(f) for n ∈ Z.
Every degree zero divisor on P¹ is principal: write D = Σ_{i=1}^n e_i (P_i) with the P_i = (x_i : z_i) distinct and Σ_{i=1}^n e_i = 0, and define
f(x, z) = ∏_{i=1}^{n} (z_i x − x_i z)^{e_i}.    (7.8)
Since Σ_{i=1}^n e_i = 0 it follows that f(x, z) is a ratio of homogeneous polynomials of the same degree and therefore a rational function on P¹. Using the uniformizers on P¹ from Example 7.3.3 one can verify that v_{P_i}(f) = e_i when P_i = (x_i : z_i) and hence that D = div(f).
Note that if D is defined over k then one can show that the function f(x, z) in equation (7.8) is defined over k.
Exercise 7.7.9. Prove that if f ∈ k̄(P¹)* then deg(div(f)) = 0.
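For P¹ the computation in Exercise 7.7.9 is explicit: given f as a product of linear factors, the affine part of div(f) is read off from the exponents, and the order at infinity is minus the total exponent (the difference of the degrees of numerator and denominator). A small sketch of my own:

```python
# div(f) on P^1 for f = prod (x - a_i)^{e_i}: the affine coefficients are the
# exponents e_i, and v_inf(f) = -(sum of e_i), so deg(div(f)) = 0.
def div_on_P1(factors):
    # factors: dict root -> exponent, representing f = prod (x - a)^e
    D = dict(factors)
    D['inf'] = -sum(factors.values())
    return D

# f = (x - 1)^2 (x + 2) / x : double zero at 1, simple zero at -2,
# simple pole at 0, and a pole of order 2 at infinity.
D = div_on_P1({1: 2, -2: 1, 0: -1})
print(D, sum(D.values()))  # {1: 2, -2: 1, 0: -1, 'inf': -2} 0
```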
Lemma 7.7.10. Let E : y² + H(x)y = F(x) be a Weierstrass equation over k and let P = (x_i, y_i) ∈ E(k̄) be a non-singular point. Then div(x − x_i) = (P) + (ι(P)) − 2(O_E).
Proof: There are one or two points P ∈ E(k̄) with x-coordinate equal to x_i, namely P = (x_i, y_i) and ι(P) = (x_i, −y_i − H(x_i)) (and these are equal if and only if 2y_i + H(x_i) = 0). By Example 7.3.4 one can take the uniformizer t_P = t_{ι(P)} = (x − x_i) unless (∂E/∂y)(P) = 2y_i + H(x_i) = 0, in which case the uniformizer is t_P = (y − y_i).
In the former case we have v_P(x − x_i) = v_{ι(P)}(x − x_i) = 1. In the latter case write
F(x) = (x − x_i)g(x) + F(x_i) = (x − x_i)g(x) + y_i² + H(x_i)y_i and H(x) = (x − x_i)a_1 + H(x_i). Note that a_1 y_i − g(x_i) = (∂E/∂x)(P) ≠ 0 and so g_1(x) := 1/(a_1 y − g(x)) ∈ O_P. Then
0 = y² + H(x)y − F(x) = (y − y_i)(y + y_i + H(x_i)) + (x − x_i)(a_1 y − g(x)),
and since 2y_i + H(x_i) = 0 we have y + y_i + H(x_i) = y − y_i, so x − x_i = −(y − y_i)² g_1(x). Hence v_P(x − x_i) = 2 in this case. Combining the affine computation with v_{O_E}(x − x_i) = −2 (Lemma 7.4.16) gives the result. □

Theorem 7.7.11. Let E be an elliptic curve over k and let f ∈ k̄(E)*. Then deg(div(f)) = 0.⁶

Proof: First consider a function a(x) ∈ k̄[x] of degree d. Write a(x) = ∏_{i=1}^{n} (x − x_i)^{e_i} with Σ_{i=1}^n e_i = d. It suffices to compute the divisor of each (x − x_i) and show that it has degree 0. The result therefore follows from Lemma 7.7.10.
Now consider a function of the form a(x) + b(x)y on the affine curve E ∩ A². By Lemma 7.7.12 one has v_P(a(x) + b(x)y) = v_{ι(P)}(a(x) − b(x)(y + H(x))) for all points
⁶ This innocent-looking fact is actually the hardest result in this chapter to prove. There are several accessible proofs of the general result: Stichtenoth (Theorem I.4.11 of [585]; also see Moreno [436] Lemma 2.2) gives a proof based on weak approximation of valuations, and this is probably the simplest proof for a reader who has already got this far through the current book; Fulton [215] gives a proof for projective plane curves based on Bézout's theorem; Silverman [560], Shafarevich [539], Hartshorne [277] and Lorenzini [391] all give proofs that boil down to ramification theory of f : C → P¹, and this is the argument we will give in the next chapter.
P ∈ E(k̄). Hence, if div(a + by) = Σ_P n_P (P) then div(a − b(y + H)) = Σ_P n_P (ι(P)) and deg(div(a + by)) = deg(div(a − b(y + H))).
Since (a + by)(a − b(y + H)) = a² + ab(y − y − H) − b²(y² + Hy) = a² − Hab − Fb² is independent of y, it follows by the first part of the proof that the affine parts of the divisors of the functions a + by and a − b(y + H) have degree
max{2 deg(a), deg(H) + deg(a) + deg(b), 3 + 2 deg(b)}.    (7.9)
One can check that the degree in equation (7.9) is 2 deg(a) when deg(a) ≥ deg(b) + 2 and is 3 + 2 deg(b) when deg(a) ≤ deg(b) + 1.
To study the behaviour at infinity consider (a(x, z) + b(x, z)y)/z^d where d = max{deg(a), deg(b) + 1}. By the same argument as before one has v_{O_E}(a(x, z)/z^d) = −2 deg(a). Similarly, v_{O_E}(b(x, z)y/z^d) = v_{O_E}(b(x, z)/z^{d−1}) + v_{O_E}(y/z) = −2 deg(b) − 3. It follows by part 3 of Lemma 7.4.14 that deg(div((a(x, z) + b(x, z)y)/z^d)) = 0.
Finally, consider f(x, y, z) = f_1(x, y, z)/f_2(x, y, z) where f_1 and f_2 are homogeneous of degree d. By the above, deg(div(f_1(x, y, z)/z^d)) = deg(div(f_2(x, y, z)/z^d)) = 0 and the result follows. □
Corollary 7.7.13. Let C be a curve over k and let f ∈ k̄(C)*. The following are equivalent:
1. div(f) ≥ 0.
2. f ∈ k̄*.
3. div(f) = 0.
Proof: Certainly statement 2 implies statement 3, and 3 implies 1. So it suffices to prove 1 implies 2. Let f ∈ k̄(C)* be such that div(f) ≥ 0. Then f is regular everywhere, so choose some P_0 ∈ C(k̄) and define h = f − f(P_0) ∈ k̄(C). Then h(P_0) = 0. If h = 0 then f is the constant function f(P_0) and so f ∈ k̄*. To complete the proof suppose that h ≠ 0 in k̄(C). Since deg(div(h)) = 0 by Theorem 7.7.11 it follows that h must have at least one pole. But then f has a pole, which contradicts div(f) ≥ 0. □
Corollary 7.7.14. Let C be a curve over k̄. Let f, h ∈ k̄(C)*. Then div(f) = div(h) if and only if f = ch for some c ∈ k̄*.
Exercise 7.7.15. Prove Corollary 7.7.14.
7.8 Divisor Class Groups
We have seen that Prin_k̄(C) = {div(f) : f ∈ k̄(C)*} is a subgroup of Div⁰_k̄(C). Hence, since all the groups are Abelian, one can define the quotient group; we call this the divisor class group. It is common to use the notation Pic for the divisor class group since the divisor class group of a curve is isomorphic to the Picard group of a curve (even though the Picard group is usually defined differently, in terms of line bundles).
Definition 7.8.1. The (degree zero) divisor class group of a curve C over k is
Pic⁰_k̄(C) = Div⁰_k̄(C)/Prin_k̄(C).
We call two divisors D_1, D_2 ∈ Div⁰_k̄(C) linearly equivalent, and write D_1 ≡ D_2, if D_1 − D_2 ∈ Prin_k̄(C). The equivalence class (called a divisor class) of a divisor D ∈ Div⁰_k̄(C) under linear equivalence is denoted [D].
Theorem 7.8.3. Let C be a curve over k and let f ∈ k̄(C)*. If σ(f) = f for all σ ∈ Gal(k̄/k) then f ∈ k(C). If div(f) is defined over k then f = ch for some c ∈ k̄* and h ∈ k(C).
Proof: The first claim follows from Remark 5.4.14 (also see Remark 8.4.11 of Section 8.4).
For the second statement, let div(f) be defined over k. Then div(f) = σ(div(f)) = div(σ(f)), where the second equality follows from part 4 of Lemma 7.4.14. Corollary 7.7.14 then implies that σ(f) = c(σ)f for some c(σ) ∈ k̄* (the fact that c(στ) = σ(c(τ))c(σ) is immediate; the fact that c : Gal(k̄/k) → k̄* is continuous also follows). Hence, Theorem A.7.2 (Hilbert 90) implies that c(σ) = σ(γ)/γ for some γ ∈ k̄*. Setting h = f/γ gives σ(h) = h for all σ ∈ Gal(k̄/k), so h ∈ k(C) and f = γh. □
Proof: (Sketch) Let G = Gal(k̄/k). Theorem 7.8.3 already showed that Prin_k̄(C)^G = Prin_k(C), but we re-do the proof in a more explicitly cohomological way, as we need further consequences of the argument.
Take Galois cohomology of the exact sequence 1 → k̄* → k̄(C)* → Prin_k̄(C) → 0 to get
1 → k* → (k̄(C)*)^G → Prin_k̄(C)^G → H¹(G, k̄*) → H¹(G, k̄(C)*) → H¹(G, Prin_k̄(C)) → H²(G, k̄*).
Since (k̄(C)*)^G = k(C)* (Theorem 7.8.3) and H¹(G, k̄*) = 0 (Hilbert 90), we have Prin_k̄(C)^G = Prin_k(C). Further, H²(G, k̄*) = 0 when k is finite (see Section X.7 of [538]) and H¹(G, k̄(C)*) = 0 (see Silverman Exercise X.10). Hence, H¹(G, Prin_k̄(C)) = 0.
Now, take Galois cohomology of the exact sequence
0 → Prin_k̄(C) → Div⁰_k̄(C) → Pic⁰_k̄(C) → 0
to get
Prin_k(C) → Div⁰_k̄(C)^G → Pic⁰_k̄(C)^G → H¹(G, Prin_k̄(C)) = 0.
Now, Div⁰_k̄(C)^G = Div⁰_k(C) by definition, and so the result follows. □
We minimise the use of the word "Jacobian" in this book; however, we make a few remarks here. We have associated with a curve C over a field k the divisor class group Pic⁰_k(C). This group can be considered as an algebraic group. To be precise, there is a variety J_C (called the Jacobian variety of C) that is an algebraic group (i.e., there is a morphism + : J_C × J_C → J_C) and such that, for any extension K/k, there is a bijective map between Pic⁰_K(C) and J_C(K) that is a group homomorphism.
One can think of Pic⁰ as a functor that, given a curve C over k, associates with every field extension k′/k a group Pic⁰_{k′}(C). The Jacobian variety of the curve is a variety J_C over k whose k′-rational points J_C(k′) are in one-to-one correspondence with the elements of Pic⁰_{k′}(C) for all k′/k. For most applications it is sufficient to work in the language of divisor class groups rather than Jacobians (despite our remarks about algebraic groups in Chapter 4).
7.9
Elliptic Curves
The goal of this section is to show that the traditional "chord-and-tangent" rule for elliptic curves does give a group operation. Our approach is to show that this operation coincides with addition in the divisor class group of an elliptic curve. Hence, elliptic curves are an algebraic group.
First we state the chord-and-tangent rule without justifying any of the claims or assumptions made in the description. The results later in the section will justify these claims (see Remark 7.9.4). For more details about the chord-and-tangent rule see Washington [622], Cassels [122], Reid [494] or Silverman and Tate [563].
Let P_1 = (x_1, y_1) and P_2 = (x_2, y_2) be points on the affine part of an elliptic curve E. Draw the line l(x, y) = 0 between P_1 and P_2 (if P_1 ≠ P_2 then this is called a chord; if P_1 = P_2 then let the line be the tangent to the curve at P_1). Denote by R the third point of intersection (counted according to multiplicities) of the line with the curve E. Now draw the line v(x) = 0 between O_E and R (if R = O_E then this is the line at infinity, and if R is an affine point this is a vertical line, so a function of x only). Denote by S the third point of intersection of this line with the curve E. Then one defines P_1 + P_2 to be S. Over the real numbers this operation is illustrated in Figure 7.1.
We now transform the above geometric description into algebra, and show that the
points R and S do exist. The first step is to write down the equation of the line between
P1 = (x1 , y1 ) and P2 = (x2 , y2 ). We state the equation of the line as a definition and then
show that it corresponds to a function with the correct divisor.
Definition 7.9.1. Let E(x, y) be a Weierstrass equation for an elliptic curve over k. Let P_1 = (x_1, y_1), P_2 = (x_2, y_2) ∈ E(k̄) ∩ A². If P_1 = ι(P_2) then the line between P_1 and P_2 is v(x) = x − x_1.
If P_1 ≠ ι(P_2) then there are two subcases. If P_1 = P_2 then define λ = (3x_1² + 2a_2 x_1 + a_4 − a_1 y_1)/(2y_1 + a_1 x_1 + a_3), and if P_1 ≠ P_2 then define λ = (y_2 − y_1)/(x_2 − x_1). The line between P_1 and P_2 is then
l(x, y) = y − λ(x − x_1) − y_1.
We stress that whenever we write l(x, y) we are implicitly assuming that it is not a vertical line v(x).
Figure 7.1: Chord and tangent rule for elliptic curve addition.
Warning: Do not confuse the line v(x) with the valuation vP . The notation v(P ) means
the line evaluated at the point P . The notation vP (x) means the valuation of the function
x at the point P .
Exercise 7.9.2. Let the notation be as in Definition 7.9.1. Show that if P_1 = ι(P_2) then v(P_1) = v(P_2) = 0, and if P_1 ≠ ι(P_2) then l(P_1) = l(P_2) = 0.
Lemma 7.9.3. Let P_1 = (x_1, y_1) ∈ E(k̄) and let P_2 = ι(P_1). Let v(x) = (x − x_1) be as in Definition 7.9.1. Then div(v(x)) = (P_1) + (P_2) − 2(O_E).
Let P_1 = (x_1, y_1), P_2 = (x_2, y_2) ∈ E(k̄) be such that P_1 ≠ ι(P_2) and let l(x, y) = y − λ(x − x_1) − y_1 be as in Definition 7.9.1. Then there exists x_3 ∈ k̄ such that E(x, λ(x − x_1) + y_1) = ∏_{i=1}^{3} (x − x_i) and div(l(x, y)) = (P_1) + (P_2) + (R) − 3(O_E) where R = (x_3, λ(x_3 − x_1) + y_1).
Proof: The first part is just a restatement of Lemma 7.7.10.
For the second part, set G(x) = E(x, λ(x − x_1) + y_1), which is a monic polynomial over k̄ of degree 3. Certainly x_1 and x_2 are roots of G(x) over k̄, so if x_1 ≠ x_2 then G(x) has a third root x_3 over k̄. In the case x_1 = x_2 we have P_1 = P_2 ≠ ι(P_2). Make a linear change of variables so that (x_1, y_1) = (x_2, y_2) = (0, 0). The curve equation is E(x, y) = y² + a_1 xy + a_3 y − (x³ + a_2 x² + a_4 x) and a_3 ≠ 0 since ι(0, 0) ≠ (0, 0). Now, by definition, l(x, y) = y − a_4 x/a_3 and one has
G(x) = E(x, a_4 x/a_3) = (a_4 x/a_3)² + a_1 x(a_4 x/a_3) + a_4 x − (x³ + a_2 x² + a_4 x),
which is divisible by x². Hence G(x) splits completely over k̄.
For the final part we consider l(x, y) as a function on the affine curve. By Lemma 7.4.14 and Lemma 7.4.16 we have v_{O_E}(l(x, y)) = min{v_{O_E}(y), v_{O_E}(λx), v_{O_E}(1)} = −3. Since deg(div(l(x, y))) = 0, there are three affine zeroes counted according to multiplicity.
Define l̄(x, y) = y + (a_1 x + a_3) + λ(x − x_1) + y_1. Note that l̄ = −l ∘ ι, so v_P(l̄(x, y)) = v_{ι(P)}(l(x, y)) (also see Lemma 7.7.12). One can check that
l(x, y) l̄(x, y) = E(x, λ(x − x_1) + y_1) = ∏_{i=1}^{3} (x − x_i)    (7.10)
where the first equality is equivalence modulo E(x, y), not equality of polynomials. Hence, for any point P ∈ E(k̄),
v_P(l(x, y)) + v_P(l̄(x, y)) = v_P( ∏_{i=1}^{3} (x − x_i) ).
Write P_i = (x_i, y_i), let e_i be the multiplicity of x_i in the right hand side of equation (7.10), and recall that v_{P_i}(x − x_i) = 1 if P_i ≠ ι(P_i) and 2 otherwise. Also note that l(P_i) = 0 implies l̄(P_i) ≠ 0 unless P_i = ι(P_i), in which case v_{P_i}(l(x, y)) = v_{P_i}(l̄(x, y)). It follows that v_{P_i}(l(x, y)) = e_i, which proves the result. □
Remark 7.9.4. It follows from the above results that it does make sense to speak of the third point of intersection R of l(x, y) with E, and to call l(x, y) a tangent line in the case P_1 = P_2. Hence, we have justified the assumptions made in the informal description of the chord-and-tangent rule.
Exercise 7.9.5. Let E(x, y, z) be a Weierstrass equation for an elliptic curve. The line z = 0 is called the line at infinity on E. Show that z = 0 only passes through (0, 0) on the affine curve given by the equation E(x, 1, z) = 0.
Exercise 7.9.6. Prove that the following algebraic formulae for the chord-and-tangent rule are correct. Let P_1, P_2 ∈ E(k); we want to compute S = P_1 + P_2. If P_1 = O_E then S = P_2, and if P_2 = O_E then S = P_1. Hence we may now assume that P_1 = (x_1, y_1) and P_2 = (x_2, y_2) are affine. If x_1 = x_2 and y_2 = −y_1 − H(x_1) then S = O_E. Otherwise, set λ to be as in Definition 7.9.1 and compute x_3 = λ² + a_1 λ − a_2 − x_1 − x_2 and y_3 = −λ(x_3 − x_1) − y_1 − H(x_3). The sum is S = (x_3, y_3).
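The formulae of Exercise 7.9.6 simplify considerably for a short Weierstrass equation y² = x³ + ax + b (where a_1 = a_2 = a_3 = 0, so H(x) = 0). The following sketch, my own illustration rather than code from the book, implements that special case over Q and checks an instance of the associativity that Theorem 7.9.9 establishes via the divisor class group.

```python
# Chord-and-tangent addition on a short Weierstrass curve y^2 = x^3 + a*x + b.
from fractions import Fraction as F

O = None  # the point at infinity O_E, the identity element

def add(P, Q, a):
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y2 == -y1:        # Q = iota(P): vertical line, sum is O_E
        return O
    if P == Q:
        lam = (3 * x1 * x1 + a) / (2 * y1)   # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)          # chord slope
    x3 = lam * lam - x1 - x2
    y3 = -(lam * (x3 - x1) + y1)             # S = iota(R)
    return (x3, y3)

# E: y^2 = x^3 + 2x - 3 over Q, with P = (2, 3), Q = (1, 0) as in Example 7.6.2.
a, b = F(2), F(-3)
P, Q = (F(2), F(3)), (F(1), F(0))

def on_curve(R):
    return R is O or R[1] ** 2 == R[0] ** 3 + a * R[0] + b

S = add(P, Q, a)
assert on_curve(S)
# Q has y = 0, so Q = iota(Q) has order 2:
assert add(Q, Q, a) is O
# Associativity check: (P + P) + Q == P + (P + Q).
assert add(add(P, P, a), Q, a) == add(P, add(P, Q, a), a)
print(S)  # P + Q = (6, -15)
```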
Before proving the main theorem, we state the following technical result, whose proof
is postponed to the next chapter (Corollary 8.6.5).
Theorem 7.9.7. Let P_1, P_2 ∈ E(k̄) be points on an elliptic curve such that P_1 ≠ P_2. Then (P_1) − (P_2) is not a principal divisor.
We now consider the divisor class group Pic0k (E). The following result is usually
obtained as a corollary to the Riemann-Roch theorem, but we give an ad-hoc proof for
elliptic curves. One can consider this result as the Abel-Jacobi map in the case of genus
1 curves.
Theorem 7.9.8. There is a one-to-one correspondence between E(k̄) and Pic⁰_k̄(E), namely P ↦ (P) − (O_E).
Proof: We first show that the map is injective. Suppose (P_1) − (O_E) ≡ (P_2) − (O_E). Then (P_1) − (P_2) is principal, and so Theorem 7.9.7 implies P_1 = P_2.
It remains to show that the map is surjective. Let D = Σ_P n_P (P) be any effective divisor on E. We prove that D is equivalent to a divisor of the form
(P) + (deg(D) − 1)(O_E).    (7.11)
We will do this by replacing any term (P_1) + (P_2) by a term of the form (S) + (O_E) for some point S.
The key equations are (P) + (ι(P)) = 2(O_E) + div(v(x)), where v(x) is as in Definition 7.9.1, or, if P_1 ≠ ι(P_2), (P_1) + (P_2) = (S) + (O_E) + div(l(x, y)/v(x)). The first equation allows us to replace any pair (P) + (ι(P)), including the case P = ι(P), by 2(O_E). The second equation allows us to replace any pair (P_1) + (P_2), where P_1 ≠ ι(P_2) (but including the case P_1 = P_2), with (S) + (O_E). It is clear that any pair of affine points is included in one of these two cases, and so repeating these operations a finite number of times reduces any effective divisor to the form in equation (7.11).
Finally, let D be a degree zero divisor on E. Write D = D_1 − D_2 where D_1 and D_2 are effective divisors of the same degree. By the above argument we can write D_1 ≡ (S_1) + (deg(D_1) − 1)(O_E) and D_2 ≡ (S_2) + (deg(D_1) − 1)(O_E). Hence D ≡ (S_1) − (S_2). Finally, adding the divisor of the vertical line function through S_2 and subtracting the divisor of the line between S_1 and ι(S_2) gives D ≡ (S) − (O_E) for some point S, as required. □
Since E(k̄) is in bijection with the group Pic⁰_k̄(E), it follows that E(k̄) is a group, with the group law coming from the divisor class group structure of E. It remains to show that the group law is just the chord-and-tangent rule. In other words, this result shows that the chord-and-tangent rule is associative. Note that many texts prove that both E(k̄) and Pic⁰_k̄(E) are groups and then prove that the map P ↦ (P) − (O_E) is a group homomorphism; whereas we use this map to prove that E(k̄) is a group.
Theorem 7.9.9. Let E be an elliptic curve over a field k. The group law induced on
E(k) by pulling back the divisor class group operations via the bijection of Theorem 7.9.8
is the chord-and-tangent rule.
Proof: Let P_1, P_2 ∈ E(k̄). To add these points we map them to divisor classes (P_1) − (O_E) and (P_2) − (O_E) in Pic⁰_k̄(E). Their sum is (P_1) + (P_2) − 2(O_E), which is reduced to the form (S) − (O_E) by applying the rules in the proof of Theorem 7.9.8. In other words, we get (P_1) + (P_2) − 2(O_E) = (S) − (O_E) + div(f(x, y)), where f(x, y) = v(x) if P_1 = ι(P_2), or f(x, y) = l(x, y)/v(x) in the general case, where l(x, y) and v(x) are the lines from Definition 7.9.1. Since these are precisely the same lines as in the description of the chord-and-tangent rule, it follows that the point S is the same point as produced by the chord-and-tangent rule. □
A succinct way to describe the elliptic curve addition law (since there is a single point at infinity) is that three points sum to zero if and only if they lie on a line. This is simply a restatement of the fact that if P, Q and R lie on the line l(x, y, z) = 0 then the divisor (P) + (Q) + (R) − 3(O_E) is a principal divisor.
Exercise 7.9.10. One can choose any k-rational point P0 E(k) and define a group law
on E(k) such that P0 is the identity element. The sum of points P and Q is defined as
follows: let l be the line through P and Q (taking the tangent if P = Q, which uniquely
exists since E is non-singular). Then l hits E at a third point (counting multiplicities)
R. Draw a line v between P0 and R. This hits E at a third point (again counting with
multiplicities) S. Then P + Q is defined to be the point S. Show that this operation
satisfies the axioms of a group.
Chapter 8
Rational Maps on Curves and Divisors
8.1 Rational Maps of Curves and the Degree
Lemma 8.1.1. Let C be a curve over k and f ∈ k(C). One can associate with f a rational map φ : C → P¹ over k by φ = (f : 1). (Indeed, this is a morphism by Lemma 7.3.6.) Denote by ∞ the constant map ∞(P) = (1 : 0). Then there is a one-to-one correspondence between k(C) ∪ {∞} and the set of morphisms φ : C → P¹.
Exercise 8.1.2. Prove Lemma 8.1.1.
Note that since k(C) ∪ {∞} is not a field, it does not make sense to interpret the set of rational maps φ : C → P¹ as a field.
Lemma 8.1.3. Let C_1 and C_2 be curves over k (in particular, non-singular and projective) and let φ : C_1 → C_2 be a non-constant rational map over k. Then φ is a dominant morphism.
Proof: By Lemma 7.3.6, φ is a morphism. By Lemma 5.5.17 and Exercise 5.5.19 we know that the Zariski closure Z of φ(C_1) is an irreducible algebraic set. Suppose Z ≠ C_2.
Proof: Since φ has degree 1 it follows that φ*(k(C_2)) = k(C_1) and so k(C_2) ≅ k(C_1). The inverse of φ* induces a rational map φ⁻¹ : C_2 → C_1. Since C_1 and C_2 are non-singular and projective, it follows from Lemma 7.3.6 that φ : C_1 → C_2 and φ⁻¹ : C_2 → C_1 are actually morphisms. It follows that φ⁻¹ ∘ φ : C_1 → C_1 and φ ∘ φ⁻¹ : C_2 → C_2 are morphisms.
It remains to show that these maps are both the identity. Without loss of generality we consider φ⁻¹ ∘ φ. Suppose for contradiction that there are points P, Q ∈ C_1(k̄)
8.2 Extensions of Valuations
Lemma 8.2.2. Let F_1/F_2 be a finite extension of fields and let v′ | v be valuations on F_1 and F_2 respectively. Then R_v is a subring of R_{v′}, R_v = R_{v′} ∩ F_2 and m_v = m_{v′} ∩ F_2. In particular, for f ∈ F_2, v(f) = 0 if and only if v′(f) = 0.
Exercise 8.2.3. Prove Lemma 8.2.2.
Theorem 8.2.4. Let F_1/F_2 be a finite extension of fields and let v be a valuation on F_2. Then there is at least one (and only finitely many) valuation v′ of F_1 such that v′ | v.
Proof: See Theorem XII.4.1 and Corollary XII.4.9 of Lang [364] or Proposition III.1.7(b) of Stichtenoth [585]. □
Let φ : C_1 → C_2 be a morphism of curves and let F_2 = φ*(k̄(C_2)) and F_1 = k̄(C_1). We now explain the relation between extensions of valuations from F_2 to F_1 and pre-images of points under φ.
Lemma 8.2.5. Let φ : C_1 → C_2 be a non-constant morphism of curves over k (this is short-hand for C_1, C_2 and φ all being defined over k). Let P ∈ C_1(k̄) and Q ∈ C_2(k̄). Denote by v the valuation on φ*(k̄(C_2)) ⊆ k̄(C_1) defined by v(φ*(f)) = v_Q(f) for f ∈ k̄(C_2)*. If φ(P) = Q then v_P is an extension of v.
Proof: Let f ∈ k̄(C_2)*. Since φ(P) = Q, the function φ*(f) = f ∘ φ is regular at P if and only if f is regular at Q. Hence v_P(φ*(f)) ≥ 0 if and only if v_Q(f) ≥ 0. It follows that v_P | v. □
Proof: As mentioned above, one can see this by noting that φ*(O_Q) and φ*(k̄[U]) (for an open set U ⊆ C_2 with Q ∈ U) are Dedekind domains and studying the splitting of m_Q in their integral closures in k̄(C_1). For details see any of Propositions 1.10 and 1.11 of Serre [538], Corollary XII.6.3 of Lang [364], Proposition I.21 of Lang [362], Theorem III.3.5 of Lorenzini [391], Proposition II.6.9 of Hartshorne [277], or Theorem III.1.11 of Stichtenoth [585]. □
Corollary 8.2.13. If φ : C_1 → C_2 is a rational map of degree d and Q ∈ C_2(k̄) then there are at most d points P ∈ C_1(k̄) such that φ(P) = Q.
Furthermore, if φ is separable then there is an open subset U ⊆ C_2 such that for all Q ∈ U one has #φ⁻¹(Q) = d.
Proof: The first statement is immediate. The second follows by choosing U to be the complement of points corresponding to factors of the discriminant of k̄(C_1)/φ*(k̄(C_2)); see Proposition VII.5.7 of Lorenzini [391]. □
Example 8.2.14. Consider φ : A¹ → A¹ given by φ(x) = x² as in Example 8.1.4. This extends to the morphism φ : P¹ → P¹ given by φ((x : z)) = (x²/z² : 1), which is regular at ∞ = (1 : 0) via the equivalent formula (1 : z²/x²). One has φ⁻¹((a : 1)) = {(√a : 1), (−√a : 1)}, φ⁻¹((0 : 1)) = {(0 : 1)} and φ⁻¹((1 : 0)) = {(1 : 0)}. At a point Q = (a : 1) with a ≠ 0 one has uniformizer t_Q = x/z − a and
8.3 Maps on Divisor Classes
We can now define some important maps on divisors that will be used in several proofs later. In particular, this will enable an elegant proof of Theorem 7.7.11 for general curves.
Definition 8.3.1. Let φ : C_1 → C_2 be a non-constant morphism over k. Define the pullback
φ* : Div_k̄(C_2) → Div_k̄(C_1)
as follows. For Q ∈ C_2(k̄) define φ*(Q) = Σ_{P ∈ φ⁻¹(Q)} e_φ(P)(P) and extend to Div_k̄(C_2) by linearity, i.e.,
φ*( Σ_{Q ∈ C_2(k̄)} n_Q (Q) ) = Σ_{Q ∈ C_2(k̄)} n_Q φ*(Q).
Note that, since Div_k̄(C_2) and Div_k̄(C_1) are not varieties, it does not make sense to ask whether φ* is a rational map or morphism.
(Sketch) We have
φ_*(div(f)) = Σ_{P ∈ C_1(k̄)} v_P(f)(φ(P)) = Σ_{Q ∈ C_2(k̄)} ( Σ_{P ∈ C_1(k̄): φ(P)=Q} v_P(f) ) (Q).
To complete the proof it suffices to show that Σ_{P ∈ C_1(k̄): φ(P)=Q} v_P(f) = v_Q(N_{L/K}(f)). This requires some theory that has not been presented in the book, so we sketch the details here.
Write L = k̄(C_1) and K = φ*(k̄(C_2)) ⊆ L. Fix Q ∈ C_2(k̄) and write v for the valuation on K corresponding to v_Q on k̄(C_2). Write A = φ*(O_{Q,k̄}(C_2)) ⊆ K, which is a Dedekind domain, and let B be the integral closure of A in L. Write m for the maximal ideal of A corresponding to m_{Q,k̄}(C_2). If P ∈ C_1(k̄) is such that φ(P) = Q then m = m_{P,k̄}(C_1) ∩ A. Suppose first that L/K is Galois. Then for any B-ideal I one can define the norm N_{L/K}(I) = ∏_{σ ∈ Gal(L/K)} σ(I). Lemma IV.6.4 of Lorenzini [391] implies that N_{L/K}(m_{P,k̄}(C_1)) = m. When L/K is not Galois one can still define N_{L/K} so that N_{L/K}(m_{P,k̄}(C_1)) = m. Proposition IV.6.9 of [391] shows (also see Proposition I.22 of [362] in the case when L/K is separable) that N_{L/K}((f)) = (N_{L/K}(f)), where (f) denotes the principal B-ideal generated by f and where N_{L/K}(f) denotes the usual field-theoretic norm. Since
N_{L/K}((f)) = ∏_{m_P | m} m^{v_P(f)}  and  (N_{L/K}(f)) = m^{v(N_{L/K}(f))},
comparing exponents of m gives the claim.
Proof: The maps φ* and φ_* are well-defined on divisor classes by parts 2 and 4 of Theorem 8.3.8. The homomorphic property follows from the linearity of the definitions. □
Exercise 8.3.11. Show that if φ : C_1 → C_2 is an isomorphism of curves over k then Pic⁰_k(C_1) ≅ Pic⁰_k(C_2) (isomorphic as groups). Give an example to show that the converse is not true.
A further corollary of this result is that a rational map φ : E_1 → E_2 between elliptic curves such that φ(O_{E_1}) = O_{E_2} is automatically a group homomorphism (see Theorem 9.2.1).
Exercise 8.3.12. Let φ : P¹ → P¹ be defined by φ((x : z)) = (x²/z² : 1). Let D = ((1 : 1)) + ((1 : 0)) − ((0 : 1)). Compute φ*(D), φ_*(D), φ*(φ_*(D)) and φ_*(φ*(D)).
We now make an observation that was mentioned when we defined φ_* on k̄(C_1).
Lemma 8.3.13. Let φ : C_1 → C_2 be a non-constant morphism of curves over k. Let f ∈ k̄(C_1)* and Q ∈ C_2(k̄). Suppose that v_P(f) = 0 for all points P ∈ C_1(k̄) such that φ(P) = Q. Then
N_{k̄(C_1)/φ*(k̄(C_2))}(f)(Q) = ∏_{P ∈ C_1(k̄): φ(P)=Q} f(P)^{e_φ(P)}.
(Later in the book we will introduce the notation f(φ*(Q)) for the right hand side.) Another formulation would be: "f of the conorm of Q" equals "the norm of f at Q".
Proof: (Sketch) This uses similar ideas to the proof of part 4 of Theorem 8.3.8. We work over k̄.
As always, k̄(C_1) is a finite extension of φ*(k̄(C_2)). Let A = φ*(O_Q(C_2)) and let B be the integral closure of A in k̄(C_1). Then B is a Dedekind domain and the ideal φ*(m_Q)B splits as a product ∏_i m_{P_i}^{e_φ(P_i)} where the P_i ∈ C_1(k̄) are distinct points such that φ(P_i) = Q. By assumption, f has no poles at the P_i and so f ∈ B. Note that f(P_i) = c_i ∈ k̄ if and only if f ≡ c_i (mod m_{P_i}). Hence, the right hand side is
∏_i f(P_i)^{e_φ(P_i)} = ∏_i c_i^{e_φ(P_i)} = ∏_i (f (mod m_{P_i}))^{e_φ(P_i)}.
It remains to prove that this is equal to the norm of f evaluated at Q, and we sketch this when the extension is Galois and cyclic (the general case is simple linear algebra). The elements σ ∈ Gal(k̄(C_1)/φ*(k̄(C_2))) permute the m_{P_i} and the ramification indices e_φ(P_i) are all equal. Since c_i ∈ k̄ ⊆ φ*(k̄(C_2)) we have f ≡ c_i (mod m_{P_i}) if and only if σ(f) ≡ c_i (mod σ(m_{P_i})). Hence
N_{k̄(C_1)/φ*(k̄(C_2))}(f) = ∏_{σ ∈ Gal(k̄(C_1)/φ*(k̄(C_2)))} σ(f) ≡ ∏_i c_i^{e_φ(P_i)} (mod m_{P_1})
and since N_{k̄(C_1)/φ*(k̄(C_2))}(f) ∈ φ*(k̄(C_2)) this congruence holds modulo φ*(m_Q). The result follows. □
We now give an important application of Theorem 8.3.8, already stated as Theorem 7.7.11.
Theorem 8.3.14. Let C be a curve over k and let f ∈ k(C)*. Then f has only finitely
many zeroes and poles (i.e., div(f) is a divisor) and deg(div(f)) = 0.
8.4
Riemann-Roch Spaces
Definition 8.4.1. Let C be a curve over k and let D = Σ_{P∈C(k̄)} nP(P) be a divisor on C. Define the Riemann-Roch space
Lk(D) = {f ∈ k(C)* : div(f) + D ≥ 0} ∪ {0}.
Lemma 8.4.2. Let D, D′ be divisors on C over k. Then the following hold.
2. D ≤ D′ implies Lk(D) ⊆ Lk(D′).
4. Let P0 ∈ C(k). Then dimk(Lk(D + P0)/Lk(D)) ≤ 1 and if D ≤ D′ then dimk(Lk(D′)/Lk(D)) ≤ deg(D′) − deg(D).
Proof:
3. Clearly k ⊆ Lk(0). The converse follows from Corollary 7.7.13. The second statement follows since deg(div(f)) = 0.
4. Write D = Σ_{P∈C(k̄)} nP(P). Note that Lk(D) is a k-vector subspace of Lk(D + P0).
Let t ∈ k(C) be a function such that vP0(t) = nP0 + 1 (e.g., take t to be a power
of a uniformizer at P0). If f ∈ Lk(D + P0) then f·t ∈ OP0,k(C). We therefore have
a k-linear map ψ : Lk(D + P0) → k given by ψ(f) = (f t)(P0). The kernel of ψ is
Lk(D) and the first part of the statement follows. The second statement follows by
induction.
5. First, note that Lk(D) ⊆ Lk(D+). We then compute dimk Lk(D+) = 1 + dimk(Lk(D+)/Lk(0)).
By the previous part this is ≤ 1 + deg(D+) − deg(0) = 1 + deg(D+).
Exercise 8.4.3. Fill in the gaps in the proof of Lemma 8.4.2.
Exercise 8.4.4. Let D = Σ_{P∈C(k̄)} nP(P) be a divisor on C. Explain why {f ∈ k(C)* :
vP(f) = −nP for all P ∈ C(k̄)} ∪ {0} is not usually a k-vector space.
Definition 8.4.5. Let C be a curve over k and let D be a divisor on C. Define
ℓk(D) = dimk Lk(D).
Write ℓ(D) = ℓk̄(D).
Exercise 8.4.6. Show that ℓk(0) = 1 and, for f ∈ k(C)*, ℓk(div(f)) = 1.
Theorem 8.4.7. (Riemann's theorem) Let C be a curve over k (in particular, non-singular and projective). Then there exists a unique minimal integer g such that, for all
divisors D on C over k,
ℓk(D) ≥ deg(D) + 1 − g.
Proof: See Proposition I.4.14 of Stichtenoth [585], Section 8.3 (page 196) of Fulton [215]
or Theorem 2.3 of Moreno [436].
Definition 8.4.8. The number g in Theorem 8.4.7 is called the genus of C.
Note that the genus is independent of the model of the curve C and so one can associate
the genus with the function field or birational equivalence class of the curve.
Exercise 8.4.9. Show that on P1 over k̄ one has ℓk̄(D) = deg(D) + 1 for all divisors D
with deg(D) ≥ 0, and so the genus of P1 is zero. Note that if D is defined over k then ℓk(D) = deg(D) + 1
too. (More results about genus zero are given in Section 8.6.)
Exercise 8.4.10. Let k be a field and let E : y2 + a1xy + a3y = x3 + a2x2 + a4x + a6 be
an elliptic curve over k. Determine the spaces Lk(n(OE)) and their dimensions ℓk(n(OE))
for n = 0, 1, 2, 3, 4, 5, 6.
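The dimensions in this exercise can be counted mechanically: functions with a pole only at OE are spanned by monomials in x and y, where x has pole order 2 and y pole order 3 at OE. The following Python sketch (an illustration under those standard pole-order assumptions, not code from the text) counts them:

```python
# A function on E with poles only at O_E is a k-linear combination of
# monomials x^a * y^b with b in {0, 1} (the curve equation eliminates y^2).
# x has a pole of order 2 at O_E and y a pole of order 3, so x^a y^b has
# pole order 2a + 3b; count the monomials with pole order <= n.
def dim_L(n):
    return sum(1 for a in range(n + 1) for b in (0, 1) if 2 * a + 3 * b <= n)

dims = [dim_L(n) for n in range(7)]
print(dims)  # [1, 1, 2, 3, 4, 5, 6] -- matching l(n(O_E)) = n for n >= 1
```

Note that the answer agrees with Riemann's theorem for genus 1: ℓ(D) = deg(D) for deg(D) ≥ 1.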
Remark 8.4.11. We give an alternative justification for Remark 5.4.14. Suppose f ∈
k̄(C)* is such that σ(f) = f for all σ ∈ Gal(k̄/k). Write D = −div(f). Note that D is defined
over k. Then f ∈ Lk̄(D), which has dimension 1 by Exercise 8.4.6. Now, performing the
Brill-Noether proof of Riemann's theorem (e.g., see Section 8.5 of Fulton [215]) one can
show that Lk(D) contains a non-zero function h ∈ k(C). It follows that div(h) = −D and that
f = ch for some c ∈ k̄*. Hence Theorem 7.8.3 is proved.
8.5
Differentials
Differentials arise in differential geometry: a manifold is described by open patches homeomorphic to Rn (or Cn for complex manifolds) with coordinate functions x1 , . . . , xn and
the differentials dxi arise naturally. It turns out that differentials can be described in a
purely formal way (i.e., without reference to limits).
When working over general fields (such as finite fields) it no longer makes sense to
consider differentiation as a process defined by limits. But the formal description of
differentials makes sense and the concept turns out to be useful.
We first explain how to generalise partial differentiation to functions on curves. We can
then define differentials. Throughout this section, if F (x, y) is a polynomial or rational
function then ∂F/∂x denotes standard undergraduate partial differentiation.
Definition 8.5.1. Let C be a curve over k. A derivation on k(C) is a k-linear map
(treating k(C) as a k-vector space) δ : k(C) → k(C) such that δ(f1 f2) = f1 δ(f2) + f2 δ(f1).
F(T) for the minimal polynomial of f over k(x) in k(C); since the extension is separable
we have ∂F/∂T ≠ 0; as a function on C we have F(x, f) = 0 and so
$$0 = \delta(F(x, f)) = \frac{\partial F}{\partial x}\,\delta(x) + \frac{\partial F}{\partial T}\,\delta(f). \qquad (8.1)$$
$$\delta(f) = D_1(H) - \frac{D_1(F)}{D_2(F)}\,D_2(H) \qquad (8.3)$$
evaluated at y, where D1 denotes ∂/∂x and D2 denotes ∂/∂T. One can show that δ is well-defined, in the sense that if H(T) =
Q(T)F(T) + R(T) for Q, R ∈ k(x)[T] then f = H(y) = R(y) and the value of δ(f)
is the same regardless of whether H or R is used to compute it.
Let y be such that k(C) = k(x)(y) and write F(T) ∈ k(x)[T] for the minimal polynomial of y. For any f ∈ k(C) we have f = H(y) for some polynomial H(T) ∈ k(x)[T] and
so define δ(f) using equation (8.3). We show that δ is a derivation. The k-linearity of δ
is clear. To show that δ satisfies the product rule let g, h ∈ k(C) and write g = G(y) and
h = H(y) for G(T), H(T) ∈ k(x)[T]. Then note that
$$\begin{aligned}
\delta(gh) &= D_1(GH) - \frac{D_1(F)}{D_2(F)}\,D_2(GH) \\
&= G\,D_1(H) + H\,D_1(G) - \frac{D_1(F)}{D_2(F)}\left(G\,D_2(H) + H\,D_2(G)\right) \\
&= G\left(D_1(H) - \frac{D_1(F)}{D_2(F)}\,D_2(H)\right) + H\left(D_1(G) - \frac{D_1(F)}{D_2(F)}\,D_2(G)\right) \\
&= g\,\delta(h) + h\,\delta(g),
\end{aligned}$$
everything being evaluated at y.
The equivalence of the two definitions (i.e., equations (8.2) and (8.3)) follows from the
uniqueness of derivations extending a given derivation on k(x) (Lemma IV.1.3 of Stichtenoth [585]).
Example 8.5.12. Let C : y2 = x3 + x + 1 over Q. Note that x is a separating element.
To compute ∂y/∂x one uses the fact that F(x, y) = y2 − (x3 + x + 1) = 0 in k(C) and so
∂y/∂x = (3x2 + 1)/(2y).
Consider the function f(x, y) = xy and let δ(f) = ∂f/∂x. Then δ(f) = xδ(y) + y =
x(3x2 + 1)/(2y) + y = (3x3 + x + 2y2)/(2y) = (5x3 + 3x + 2)/(2y).
Exercise 8.5.13. Let k(C) be as in Example 8.5.12. Show that δ(y/x) = (x3 − x −
2)/(2yx2).
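Both computations reduce, after clearing denominators and using 2y2 = 2(x3 + x + 1), to polynomial identities in x alone, which can be checked numerically. A minimal Python sketch (the sample values are arbitrary):

```python
from fractions import Fraction

# On C: y^2 = x^3 + x + 1, multiplying d(xy)/dx = x(3x^2+1)/(2y) + y through
# by 2y and using 2y^2 = 2(x^3 + x + 1) gives the first identity below;
# multiplying d(y/x)/dx = (x^3 - x - 2)/(2yx^2) through by 2yx^2 gives the
# second.  A degree-3 identity checked at 6 points is proved.
for x in (Fraction(-3), Fraction(-1), Fraction(0),
          Fraction(1), Fraction(2), Fraction(7)):
    assert x * (3 * x**2 + 1) + 2 * (x**3 + x + 1) == 5 * x**3 + 3 * x + 2
    assert x * (3 * x**2 + 1) - 2 * (x**3 + x + 1) == x**3 - x - 2
print("identities verified")
```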
Lemma 8.5.14. Let C be a curve over k and let x, y ∈ k(C) be separating elements.
Then the corresponding derivations on k(C) satisfy the chain rule, namely
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial y}.$$
In particular, if x, y ∈ k(C) are separating elements then ∂x/∂y = 1/(∂y/∂x) ≠ 0.
Let t ∈ k(C). Then ∂t/∂x = 0 if and only if t is not a separating element.
Proof: See Lemma IV.1.6 of Stichtenoth [585].
Exercise 8.5.15. Let C = P1 over Fp with variable x and let δ(f) = ∂f/∂x. Show that
δ(x^p) = 0.
Now we have defined ∂f/∂x for general f ∈ k(C) we can introduce the differentials
on a curve over a field. Our definition is purely formal and the symbol dx is not assumed
to have any intrinsic meaning. We essentially follow Section IV.1 of Stichtenoth [585]; for
a slightly different approach see Section 8.4 of Fulton [215].
Definition 8.5.16. Let C be a curve over k. The set of differentials Ωk(C) (some
authors write Ω1k(C)) is the quotient of the free k(C)-module on symbols dx for x ∈ k(C)
under the relations
1. dx ≠ 0 if x is a separating element,
2. d(x1 + x2) = dx1 + dx2 for all x1, x2 ∈ k(C),
3. d(x1 x2) = x1 dx2 + x2 dx1 for all x1, x2 ∈ k(C).
Example 8.5.20. We determine Ωk(P1). Since k(P1) = k(x) the differentials are d(f(x)) =
(∂f/∂x)dx for f(x) ∈ k(x). Hence, they form a 1-dimensional vector space over k(C).
The following theorem, that all differentials on a curve are multiples of dx where x is
a separating element, is a direct consequence of the definition.
Theorem 8.5.21. Let C be a curve over k and let x be a separating element. Let ω ∈
Ωk(C). Then ω = h dx for some h ∈ k(C).
Exercise 8.5.22. Prove Theorem 8.5.21.
This result shows that Ωk(C) is a k(C)-vector space of dimension 1 (we know that
Ωk(C) ≠ {0} since dx ≠ 0 if x is a separating element). Therefore, for any ω1, ω2 ∈ Ωk(C)
with ω2 ≠ 0 there is a unique function f ∈ k(C) such that ω1 = f ω2. We define ω1/ω2 to
be f. (See Proposition II.4.3 of Silverman [560].)
We now define the divisor of a differential by using uniformizers. Recall from Lemma 8.5.8
that a uniformizer tP is a separating element and so dtP ≠ 0.
Definition 8.5.23. Let C be a curve over k. Let ω ∈ Ωk(C), ω ≠ 0 and let P ∈ C(k̄)
have uniformizer tP ∈ k̄(C). Then the order of ω at P is vP(ω) := vP(ω/dtP). The
divisor of a differential is
$$\operatorname{div}(\omega) = \sum_{P \in C(\overline{k})} v_P(\omega)(P).$$
Lemma 8.5.26. The functions vP(ω) and div(ω) in Definition 8.5.23 are well-defined
(both with respect to the choice of representative for ω and the choice of tP).
Exercise 8.5.27. Prove Lemma 8.5.26.
Lemma 8.5.28. Let C be a curve over k and ω, ω′ ∈ Ωk(C) non-zero. Then
1. deg(div(ω)) = deg(div(ω′)).
2. div(ω) is well-defined up to principal divisors (i.e., div(ω) = div(ω′) + div(f) for
some f ∈ k(C)*).
Exercise 8.5.29. Prove Lemma 8.5.28.
Definition 8.5.30. Any divisor div(ω) is called a canonical divisor. The set {div(ω) :
ω ∈ Ωk(C)} is the canonical divisor class.
Example 8.5.31. We determine the canonical class of C = P1 .
Let ω = dx. Since x is a uniformizer at the point 0 we have v0(ω) = v0(dx/dx) = 0.
More generally, for P ∈ k̄ we have (x − P) a uniformizer and vP(ω) = vP(dx/d(x − P)) =
vP(1) = 0. Finally, a uniformizer at ∞ is t = 1/x and dt = −(1/x2)dx, so dx = −x2 dt and v∞(ω) =
v∞(−x2) = −2. Hence div(ω) = −2(∞) and the degree of div(ω) is −2.
Example 8.5.32. We determine the divisor of a differential on an elliptic curve E in
Weierstrass form. Rather than computing div(dx) it is easier to compute div(ω) for
$$\omega = \frac{dx}{2y + a_1x + a_3}.$$
Let P ∈ E(k̄). There are three cases: if P = OE then one can take uniformizer t = x/y; if
P = (xP, yP) = −P then take uniformizer (y − yP) (and note that vP(2y + a1x + a3) = 1
in this case); and otherwise take uniformizer (x − xP) and note that vP(2y + a1x + a3) = 0.
We deal with the general case first. Since dx/d(x − xP) = ∂x/∂(x − xP) = 1 it follows
that vP(ω) = 0. For the case P = OE write x = t−2f and y = t−3h for some functions
f, h ∈ k(E) regular at OE and with f(OE), h(OE) ≠ 0. One can verify that
$$\frac{\omega}{dt} = \frac{-2t^{-3}f + t^{-2}f'}{2t^{-3}h + a_1t^{-2}f + a_3} = \frac{-2f + tf'}{2h + a_1tf + a_3t^3}$$
(where f′ denotes ∂f/∂t) and so vOE(ω) = 0. Finally, when P = −P we must consider
$$\frac{dx}{d(y - y_P)} = \frac{1}{\partial y/\partial x} = \frac{2y + a_1x + a_3}{3x^2 + 2a_2x + a_4 - a_1y}.$$
It follows that ω = (1/(3x2 + 2a2x + a4 − a1y))d(y − yP) and, since P is not a singular point,
3xP2 + 2a2xP + a4 − a1yP ≠ 0 and so vP(ω) = 0.
In other words, we have shown that div(ω) = 0. One can verify that
div(dx) = (P1) + (P2) + (P3) − 3(OE)
where P1, P2, P3 are the three affine points of order 2 in E(k̄).
Exercise 8.5.33. Show that
$$\frac{dx}{2y + a_1x + a_3} = \frac{dy}{3x^2 + 2a_2x + a_4 - a_1y}$$
on an elliptic curve.
8.6
Genus Zero Curves
Theorem 8.6.1. Let C be a curve over k (i.e., projective non-singular). The following
are equivalent.
1. C is birationally equivalent over k to P1.
2. The divisor class group of C over k is trivial and #C(k) ≥ 2.
3. There is a point P0 ∈ C(k) such that ℓk((P0)) ≥ 2.
Proof:
(1 ⇒ 2): Let C be birational to P1 over k. By Lemma 7.3.6 there is a
morphism from P1 to C and by Lemma 8.2.7 it is surjective. Since #P1(k) ≥ 2 it follows
that #C(k) ≥ 2. Also, since the divisor class group of P1 is trivial it follows from
Exercise 8.3.11 that Pic0k(C) = {0}.
(2 ⇒ 3): Let P, Q ∈ C(k) be distinct. Since (Q) − (P) is principal there exists a
function h with div(h) = (Q) − (P) and so Lk((P)) contains the linearly independent set
{1, h}; hence ℓk((P)) ≥ 2.
(3 ⇒ 1): Let P0 ∈ C(k) be such that ℓk((P0)) ≥ 2. Then there is some non-constant function h ∈ k(C)
and a point P ∈ C(k̄) such that div(h) = (P) − (P0). For any R ∈ C(k̄), R ≠ P0, the
function h − h(R) has a simple pole at P0 and a simple zero at R. One can therefore
deduce that h gives an injective rational map h : C → P1. Unfortunately, it is not trivial
to write down the inverse rational map h−1 : P1 → C, so to complete the proof we show
that k(C) ≅ k(P1).
Then vR(g) = vR(g′) and so div(g′) = div(g) and g′ = cg for some c ∈ k*. In other words,
f is a rational function of h, and so f ∈ k(h). Since f was arbitrary, k(C) = k(h) and so,
by Theorem 5.5.28, C is birational to P1.
Definition 8.6.2. A curve satisfying any of the above equivalent conditions is called a
genus 0 curve.
Exercise 8.6.3. Write down a curve C over a field k such that the divisor class group
Pic0k (C) is trivial but C is not birationally equivalent over k to P1 .
Theorem 8.6.4. An elliptic curve does not have genus 0.
Proof: We have seen in Examples 8.5.31 and 8.5.32 that the canonical divisor classes
on P1 and on an elliptic curve have different degrees. It follows that P1 is not isomorphic to an
elliptic curve. And since a birational map of smooth projective curves is an isomorphism
(Lemma 8.1.13 and Lemma 8.1.15) the result follows from Corollary 8.5.37.
There are a number of other proofs of this result: for example, Lemma 11.3 of Washington [622] gives an elementary one; it also follows from the general theorem that a
non-singular plane curve of degree d has genus (d − 1)(d − 2)/2, or from the Hurwitz genus
formula (see below).
Corollary 8.6.5. Let E be an elliptic curve and P1, P2 ∈ E(k̄). If P1 ≠ P2 then (P1) −
(P2) is not a principal divisor.
8.7
Riemann-Roch Theorem and Hurwitz Genus Formula
In this section we state, without proof, two very important results in algebraic geometry.
Neither will play a crucial role in this book.
Lemma 8.7.1. Let C be a curve over k of genus g and let ω ∈ Ωk(C) be non-zero. Then
1. deg(div(ω)) = 2g − 2.
2. ℓk(div(ω)) = g.
Proof: See Corollary I.5.16 of Stichtenoth [585] or Corollary 11.16 of Washington [622].
For non-singular plane curves see Sections 8.5 and 8.6 of Fulton [215].
Theorem 8.7.2. (Riemann-Roch) Let C be a non-singular projective curve over k of
genus g, ω ∈ Ωk(C) a non-zero differential and D a divisor. Then
ℓk(D) = deg(D) + 1 − g + ℓk(div(ω) − D).
Proof: There are several proofs. Section 8.6 of Fulton [215] gives the Brill-Noether proof
for non-singular plane curves. Theorem I.5.15 of Stichtenoth [585] and Theorem 2.5 of
Moreno [436] give proofs using repartitions.
Some standard applications of the Riemann-Roch theorem are to prove that every
genus 1 curve with a rational point is birational to an elliptic curve in Weierstrass form,
and to prove that every hyperelliptic curve of genus g is birational to an affine curve of
the form y2 + H(x)y = F(x) with deg(H(x)) ≤ g + 1 and deg(F(x)) ≤ 2g + 2.
Proof: See Theorem III.4.12 and Corollary III.5.6 of Stichtenoth [585], Theorem II.5.9
of Silverman [560] or Exercise 8.36 of Fulton [215].
A variant of the above formula is known in the case where some of the eφ(P) are
divisible by char(k).
Chapter 9
Elliptic Curves
This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography by Steven Galbraith, available from http://www.isg.rhul.ac.uk/sdg/crypto-book/
The copyright for this chapter is held by Steven Galbraith.
This book is now completed and an edited version of it will be published by Cambridge
University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be
different in the published version.
Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes.
All feedback on the book is very welcome and will be acknowledged.
This chapter summarises the theory of elliptic curves. Since there are already many
outstanding textbooks on elliptic curves (such as Silverman [560] and Washington [622])
we do not give all the details. Our focus is on facts relevant for the cryptographic applications, especially those for which there is not already a suitable reference.
9.1
Group law
Recall that an elliptic curve over a field k is given by a non-singular affine Weierstrass
equation
E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
(9.1)
where a1, a2, a3, a4, a6 ∈ k. There is a unique point OE on the projective closure that
does not lie on the affine curve.
We recall the formulae for the elliptic curve group law with identity element OE: For
all P ∈ E(k̄) we have P + OE = OE + P = P so it remains to consider the case where
P1, P2 ∈ E(k̄) are such that P1, P2 ≠ OE. In other words, P1 and P2 are affine points and
so write P1 = (x1, y1) and P2 = (x2, y2). Recall that Lemma 7.7.10 shows the inverse of
P1 = (x1, y1) is (−P1) = (x1, −y1 − a1x1 − a3). Hence, if x1 = x2 and y2 = −y1 − a1x1 − a3
(i.e., P2 = −P1) then P1 + P2 = OE. In the remaining cases let
$$\lambda = \begin{cases} \dfrac{3x_1^2 + 2a_2x_1 + a_4 - a_1y_1}{2y_1 + a_1x_1 + a_3} & \text{if } P_1 = P_2, \\[2ex] \dfrac{y_2 - y_1}{x_2 - x_1} & \text{if } P_1 \neq P_2. \end{cases} \qquad (9.2)$$
Exercise 9.1.1. It is possible to unify the two cases in equation (9.2). Show that if
P1 = (x1, y1) and P2 = (x2, y2) lie on y2 + (a1x + a3)y = x3 + a2x2 + a4x + a6 and
y2 ≠ −y1 − a1x1 − a3 then P1 + P2 can be computed using the formula
$$\lambda = \frac{x_1^2 + x_1x_2 + x_2^2 + a_2(x_1 + x_2) + a_4 - a_1y_1}{y_1 + y_2 + a_1x_2 + a_3}. \qquad (9.3)$$
In all these cases the sum is P1 + P2 = (x3, y3) where
$$x_3 = \lambda^2 + a_1\lambda - a_2 - x_1 - x_2 \qquad (9.4)$$
and
$$y_3 = -(\lambda(x_3 - x_1) + y_1) - a_1x_3 - a_3. \qquad (9.5)$$
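The group law above can be transcribed directly into code. The following Python sketch works over a prime field; the curve y2 = x3 + x over F11 and the sample points are arbitrary small choices for illustration, not taken from the text:

```python
# Affine Weierstrass group law over F_p, following lambda in (9.2) and the
# x3, y3 completion; O_E is represented by None.  y^2 = x^3 + x over p = 11
# is an arbitrary small example curve.
p = 11
a1, a2, a3, a4, a6 = 0, 0, 0, 1, 0

def neg(P):
    if P is None:
        return None
    x, y = P
    return (x, (-y - a1 * x - a3) % p)

def add(P1, P2):
    if P1 is None:
        return P2
    if P2 is None:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2 + a1 * x1 + a3) % p == 0:
        return None                          # P2 = -P1, so the sum is O_E
    if (x1, y1) == (x2, y2):                 # doubling: tangent slope
        lam = (3 * x1 * x1 + 2 * a2 * x1 + a4 - a1 * y1) \
              * pow(2 * y1 + a1 * x1 + a3, -1, p)
    else:                                    # chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p)
    lam %= p
    x3 = (lam * lam + a1 * lam - a2 - x1 - x2) % p
    y3 = (-(lam * (x3 - x1) + y1) - a1 * x3 - a3) % p
    return (x3, y3)

P, Q = (5, 3), (0, 0)                        # two points on y^2 = x^3 + x mod 11
assert add(P, neg(P)) is None
assert add(P, P) == (5, 8)
assert add(add(P, Q), P) == add(add(P, P), Q)  # a spot check of associativity
```

The three-argument `pow` computes modular inverses (Python 3.8+), so no extended-gcd helper is needed.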
9.2
Morphisms of Elliptic Curves
The goal of this section is to show that a morphism between elliptic curves is the composition of a group homomorphism and a translation. In other words, all geometric maps
between elliptic curves have a group-theoretic interpretation.
Theorem 9.2.1. Let E1 and E2 be elliptic curves over k and let φ : E1 → E2 be a
morphism of varieties such that φ(OE1) = OE2. Then φ is a group homomorphism.
Proof: (Sketch) The basic idea is to note that φ∗ : Pic0k(E1) → Pic0k(E2) (where Pic0k(Ei)
denotes the degree zero divisor class group of Ei over k) is a group homomorphism and
φ∗((P) − (OE1)) = (φ(P)) − (OE2). We refer to Theorem III.4.8 of [560] for the details.
Definition 9.2.2. Let E be an elliptic curve over k and let Q ∈ E(k). We define the
translation map to be the function τQ : E → E given by τQ(P) = P + Q.
Clearly, τQ is a rational map that is defined everywhere on E and so is a morphism.
Since τQ has inverse map τ−Q it follows that τQ is an isomorphism of the curve E to itself
(though be warned that in the next section we will define isomorphism for pointed curves
and τQ will not be an isomorphism in this sense).
Corollary 9.2.3. Let E1 and E2 be elliptic curves over k and let φ : E1 → E2 be a
rational map. Then φ is the composition of a group homomorphism and a translation
map.
Proof: First, by Lemma 7.3.6 a rational map to a projective curve is a morphism. Now
let φ(OE1) = Q ∈ E2(k). The composition ψ = τ−Q ◦ φ is therefore a morphism with
ψ(OE1) = OE2; by Theorem 9.2.1 it is a group homomorphism, and φ = τQ ◦ ψ.
Hence, every rational map between elliptic curves corresponds naturally to a map of
groups. Theorem 9.6.19 gives a partial converse.
Example 9.2.4. Let E : y2 = x3 + x and Q = (0, 0). We determine the map τQ on E.
Let P = (x, y) ∈ E(k̄) be a point such that P is neither Q nor OE. To add P and Q
to obtain (x3, y3) we compute λ = (y − 0)/(x − 0) = y/x. It follows that
$$x_3 = \lambda^2 - x - 0 = \frac{y^2}{x^2} - x = \frac{y^2 - x^3}{x^2} = \frac{x}{x^2} = \frac{1}{x}$$
and
$$y_3 = -\lambda(x_3 - 0) - 0 = -\frac{y}{x^2}.$$
Hence τQ(x, y) = (1/x, −y/x2) away from {OE, Q}. It is clear that τQ is a rational map
of degree 1 and hence an isomorphism of curves by Lemma 8.1.15. Indeed, it is easy to
see that the inverse of τQ is τQ itself (this is because Q has order 2).
One might wish to write τQ projectively (we write the rational map in the form
mentioned in Exercise 5.5.2). Replacing x by x/z and y by y/z gives τQ(x/z, y/z) =
(z/x, −yz/x2) from which we deduce
$$\tau_Q(x : y : z) = (xz : -yz : x^2). \qquad (9.7)$$
Note that this map is not defined at either OE = (0 : 1 : 0) or Q = (0 : 0 : 1), in the sense
that evaluating at either point gives (0 : 0 : 0).
To get a map defined at Q one can multiply the right hand side of equation (9.7)
through by y to get
(xyz : −y2z : x2y) = (xyz : −x3 − xz2 : x2y)
and dividing by x gives τQ(x : y : z) = (yz : −x2 − z2 : xy). One can check that
τQ(0 : 0 : 1) = (0 : −1 : 0) = (0 : 1 : 0) as desired. Similarly, to get a map defined at OE
one can multiply (9.7) by x, re-arrange, and divide by z to get
τQ(x : y : z) = (x2 : −xy : y2 − xz),
which gives τQ(0 : 1 : 0) = (0 : 0 : 1) as desired.
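The affine formula for τQ can be verified exhaustively over a small field. The following Python sketch (F11 is an arbitrary choice) compares τQ(x, y) = (1/x, −y/x2) against chord-line addition for every affine point other than Q:

```python
# Verify tau_Q(x, y) = (1/x, -y/x^2) for Q = (0, 0) on E: y^2 = x^3 + x by
# comparing against chord-line addition over F_11.
p = 11

def chord_add(P1, P2):
    # addition of affine points with distinct x-coordinates on y^2 = x^3 + x
    (x1, y1), (x2, y2) = P1, P2
    lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

Q = (0, 0)
points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - x**3 - x) % p == 0 and (x, y) != Q]
for P in points:
    x, y = P
    tau = (pow(x, -1, p), (-y * pow(x, -2, p)) % p)
    assert chord_add(P, Q) == tau
print("translation map agrees on", len(points), "points")
```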
9.3
Definition 9.3.5. Suppose char(k) ≠ 2, 3 and let a4, a6 ∈ k be such that 4a43 + 27a62 ≠ 0.
For the short Weierstrass equation y2z = x3 + a4xz2 + a6z3, define the j-invariant
$$j(E) = 1728\,\frac{4a_4^3}{4a_4^3 + 27a_6^2}.$$
Conversely, for j ∈ k with j ≠ 0, 1728, the curve
$$y^2 = x^3 + \frac{3j}{1728 - j}\,x + \frac{2j}{1728 - j}$$
has j-invariant equal to j.
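This round trip is easy to check exactly over the rationals. A minimal Python sketch (the sample j-values are arbitrary):

```python
from fractions import Fraction

def j_invariant(a4, a6):
    # j(E) = 1728 * 4 a4^3 / (4 a4^3 + 27 a6^2) for y^2 = x^3 + a4 x + a6
    return 1728 * 4 * a4**3 / (4 * a4**3 + 27 * a6**2)

# For j != 0, 1728 the curve with a4 = 3j/(1728 - j) and a6 = 2j/(1728 - j)
# should have j-invariant exactly j.
for j in (Fraction(1), Fraction(-5), Fraction(100), Fraction(123456)):
    c = j / (1728 - j)
    assert j_invariant(3 * c, 2 * c) == j
print("j-invariant round trip verified")
```

Using `Fraction` keeps the arithmetic exact, so the assertions test the algebraic identity rather than a floating-point approximation.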
Exercise 9.3.9. Let E1 , E2 be elliptic curves over Fq that are isomorphic over Fq . Show
that the discrete logarithm problem in E1 (Fq ) is equivalent to the discrete logarithm
problem in E2 (Fq ). In other words, the discrete logarithm problem on Fq -isomorphic
curves has exactly the same security.
To reduce storage in some applications it might be desirable to choose a model for
elliptic curves with coefficients as small as possible. Let p > 3 be prime. It has been
proven (see Section 5 of Banks and Shparlinski [27]) that almost all Fp -isomorphism
classes of elliptic curves over Fp have a model of the form y 2 = x3 + a4 x + a6 where
1 ≤ a4, a6 ≤ p^{1/2+o(1)}. Since there are O(p) isomorphism classes this result is optimal.
Note that finding such a small pair (a4 , a6 ) for a given j-invariant may not be easy.
9.4
Automorphisms
9.5
Twists
Twists of elliptic curves have several important applications such as point compression in
pairing-based cryptography (see Section 26.6.2), and efficient endomorphisms on elliptic
curves (see Exercise 11.3.24).
Definition 9.5.1. Let E be an elliptic curve over k. A twist of E is an elliptic curve Ẽ
over k such that there is an isomorphism φ : E → Ẽ over k̄ of pointed curves (i.e., such
that φ(OE) = OẼ). Two twists Ẽ1 and Ẽ2 of E are equivalent if there is an isomorphism
from Ẽ1 to Ẽ2 defined over k. A twist Ẽ of E is called a trivial twist if Ẽ is equivalent
to E. Denote by Twist(E) the set of equivalence classes of twists of E.
Example 9.5.2. Let k be a field such that char(k) ≠ 2. Let E : y2 = x3 + a4x + a6
over k and let d ∈ k*. Define the elliptic curve E(d) : y2 = x3 + d2a4x + d3a6. The map
φ(x, y) = (dx, d^{3/2}y) is an isomorphism from E to E(d) over k(√d). Hence E(d) is a twist of E.
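When d is a square in k the twist map is defined over k itself and can be checked pointwise. The following Python sketch (F13, d = 4 and the curve coefficients are arbitrary choices) pushes every affine point of E through (x, y) ↦ (dx, d^{3/2}y) and verifies the image lies on E(d):

```python
# Push the affine points of E: y^2 = x^3 + a4 x + a6 over F_13 through the
# twist map (x, y) -> (d x, d^{3/2} y) with d = 4 = 2^2 (a square, so the map
# is defined over F_13 and d^{3/2} = 2^3 = 8), and check the images lie on
# E^(d): y^2 = x^3 + d^2 a4 x + d^3 a6.
p, a4, a6 = 13, 3, 5
d, s = 4, 2                                  # s^2 = d
E_points = [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + a4 * x + a6)) % p == 0]
for x, y in E_points:
    X, Y = (d * x) % p, (s**3 * y) % p
    assert (Y * Y - (X**3 + d * d * a4 * X + d**3 * a6)) % p == 0
print(len(E_points), "points mapped onto the twist")
```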
9.6
Isogenies
We now return to more general maps between elliptic curves. Recall from Theorem 9.2.1
that a morphism : E1 E2 of elliptic curves such that (OE1 ) = OE2 is a group homomorphism. Hence, isogenies are group homomorphisms. Chapter 25 discusses isogenies
in further detail. In particular, Chapter 25 describes algorithms to compute isogenies
efficiently.
Definition 9.6.1. Let E1 and E2 be elliptic curves over k. An isogeny over k is a
morphism φ : E1 → E2 over k such that φ(OE1) = OE2. The zero isogeny is the
constant map φ : E1 → E2 given by φ(P) = OE2 for all P ∈ E1(k̄). If φ(x, y) =
(φ1(x, y), φ2(x, y)) is an isogeny then define −φ by (−φ)(x, y) = −(φ1(x, y), φ2(x, y)),
where −(X, Y) denotes, as always, the inverse for the group law. The kernel of an
isogeny is ker(φ) = {P ∈ E1(k̄) : φ(P) = OE2}. The degree of a non-zero isogeny is
the degree of the morphism. The degree of the zero isogeny is 0. If there is an isogeny
(respectively, isogeny of degree d) between two elliptic curves E1 and E2 then we say that
E1 and E2 are isogenous (respectively, d-isogenous). A non-zero isogeny is separable
if it is separable as a morphism (see Definition 8.1.6). Denote by Homk(E1, E2) the set
of isogenies from E1 to E2 defined over k. Denote by Endk(E1) the set of isogenies from
E1 to E1 defined over k; this is called the endomorphism ring of the elliptic curve.
Exercise 9.6.2. Show that if φ : E1 → E2 is an isogeny then so is −φ.
Theorem 9.6.3. Let E1 and E2 be elliptic curves over k. If φ : E1 → E2 is a non-zero
isogeny over k then φ : E1(k̄) → E2(k̄) is surjective.
Proof: This follows from Theorem 8.2.7.
We now relate the degree to the number of points in the kernel. First we remark the
standard group-theoretic fact that, for all Q ∈ E2(k̄), #φ−1(Q) = # ker(φ) (this is just
the fact that all cosets have the same size).
Lemma 9.6.4. A non-zero separable isogeny φ : E1 → E2 over k of degree d has
# ker(φ) = d.
Proof: It follows from Corollary 8.2.13 that a separable degree d map has #φ−1(Q) = d
for a generic point Q ∈ E2(k̄). Hence, by the above remark, #φ−1(Q) = d for all points
Q and # ker(φ) = d. (Also see Proposition 2.21 of [622] for an elementary proof.)
A morphism of curves φ : C1 → C2 is called unramified if eφ(P) = 1 for all P ∈ C1(k̄).
Let φ : E1 → E2 be a separable isogeny over k and let P ∈ E1(k̄). Since φ(P) = φ(P + R)
for all R ∈ ker(φ) it follows that a separable morphism of elliptic curves is unramified
(this also follows from the Hurwitz genus formula).
Exercise 9.6.5. Let E1 and E2 be elliptic curves over k and suppose φ : E1 → E2 is an
isogeny over k. Show that ker(φ) is defined over k (in the sense that P ∈ ker(φ) implies
σ(P) ∈ ker(φ) for all σ ∈ Gal(k̄/k)).
Lemma 9.6.6. Let E1 and E2 be elliptic curves over k. Then Homk(E1, E2) is a group
with addition defined by (φ1 + φ2)(P) = φ1(P) + φ2(P). Furthermore, Endk(E1) =
Homk(E1, E1) is a ring with addition defined in the same way and with multiplication
defined by composition.
Proof: The main task is to show that if φ1, φ2 : E1 → E2 are morphisms then so is
(φ1 + φ2). The case φ2 = −φ1 is trivial, so assume φ2 ≠ −φ1. Let U be an open set
such that: φ1 and φ2 are regular on U; P ∈ U implies φ1(P) ≠ OE2 and φ2(P) ≠
OE2; φ1(P) ≠ −φ2(P). That such an open set exists is immediate for all but the final
requirement; but one can also show that the points such that φ1(x, y) = −φ2(x, y) form a
closed subset of E1 as long as φ1 ≠ −φ2. Then using equation (9.3) one obtains a rational
map (φ1 + φ2) : E1 → E2. Finally, since composition of morphisms is a morphism it is
easy to check that Endk(E1) is a ring.
By Exercise 8.1.12, if φ1 : E1 → E2 and φ2 : E2 → E3 are non-constant isogenies then
deg(φ2 ◦ φ1) = deg(φ2) deg(φ1). This fact will often be used.
An important example of an isogeny is the multiplication by n map.
Exercise 9.6.7. Show that [n] is an isogeny.
Example 9.6.8. Let E : y2 = x3 + x. Then the map [2] : E → E is given by the rational
function
$$[2](x, y) = \left(\frac{(x^2 - 1)^2}{4(x^3 + x)},\; \frac{y(x^6 + 5x^4 - 5x^2 - 1)}{8(x^3 + x)^2}\right).$$
The kernel of [2] is OE together with the three points (xP, 0) such that xP3 + xP = 0. In
other words, the kernel is the set of four points of order dividing 2.
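This rational map can be checked against ordinary tangent-line doubling over a small field. A Python sketch (F11 is an arbitrary odd-characteristic choice; points with y = 0 are the 2-torsion and are excluded since they map to OE):

```python
# Compare the rational map for [2] on E: y^2 = x^3 + x against tangent-line
# doubling, over F_11.
p = 11
pts = [(x, y) for x in range(p) for y in range(1, p)
       if (y * y - x**3 - x) % p == 0]           # affine points with y != 0
for x, y in pts:
    lam = (3 * x * x + 1) * pow(2 * y, -1, p) % p   # tangent slope
    x2 = (lam * lam - 2 * x) % p
    y2 = (lam * (x - x2) - y) % p
    X = (x * x - 1)**2 * pow(4 * (x**3 + x), -1, p) % p
    Y = y * (x**6 + 5 * x**4 - 5 * x**2 - 1) * pow(8 * (x**3 + x)**2, -1, p) % p
    assert (x2, y2) == (X, Y)
print("rational map for [2] verified on", len(pts), "points")
```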
We now give a simple example of an isogeny that is not [n] for some n ∈ N. The
derivation of a special case of this example is given in Example 25.1.5.
Example 9.6.9. Let A, B ∈ k be such that B ≠ 0 and D = A2 − 4B ≠ 0. Consider
the elliptic curve E : y2 = x(x2 + Ax + B) over k, which has the point (0, 0) of order 2.
There is an elliptic curve Ẽ and an isogeny φ : E → Ẽ such that ker(φ) = {OE, (0, 0)}.
One can verify that
$$\phi(x, y) = \left(\frac{y^2}{x^2},\; \frac{y(B - x^2)}{x^2}\right) = \left(\frac{x^2 + Ax + B}{x},\; y\,\frac{B - x^2}{x^2}\right)$$
has the desired kernel, and the image curve is Ẽ : Y2 = X(X2 − 2AX + D).
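That the images of φ really do land on Ẽ can be verified pointwise over a small field. A Python sketch (F13 with A = 3, B = 1 is an arbitrary admissible choice satisfying B ≠ 0 and D ≠ 0):

```python
# Push the affine points of E: y^2 = x(x^2 + Ax + B) through
# phi(x, y) = (y^2/x^2, y(B - x^2)/x^2) and check every image satisfies
# E~: Y^2 = X(X^2 - 2AX + D) with D = A^2 - 4B.
p, A, B = 13, 3, 1
D = A * A - 4 * B
checked = 0
for x in range(1, p):                 # x = 0 gives the kernel point (0, 0)
    for y in range(p):
        if (y * y - x * (x * x + A * x + B)) % p == 0:
            X = y * y * pow(x * x, -1, p) % p
            Y = y * (B - x * x) * pow(x * x, -1, p) % p
            assert (Y * Y - X * (X * X - 2 * A * X + D)) % p == 0
            checked += 1
print("verified", checked, "points")
```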
Before proving the next result we need one exercise (which will also be used later).
$$(2y + a_1x + a_3)^2 = 4x^3 + (a_1^2 + 4a_2)x^2 + (2a_1a_3 + 4a_4)x + (a_3^2 + 4a_6). \qquad (9.9)$$
Note that when a1 = a3 = 0 this polynomial is simply 4 times the right hand side of the
elliptic curve equation. Show that this polynomial has distinct roots and so if char(k) ≠ 2
then #E[2] = 4.
Lemma 9.6.11. Let E and Ẽ be elliptic curves over k. If n ∈ N then [n] is not the zero
isogeny. Further, Homk(E, Ẽ) is torsion-free as a Z-module (i.e., if φ ∈ Homk(E, Ẽ) is
non-zero then [n] ◦ φ is non-zero for all n ∈ Z, n ≠ 0) and Endk(E) has no zero divisors.
Lemma 9.6.12. Let E : y2 + a1xy + a3y = x3 + a2x2 + a4x + a6 and Ẽ : Y2 + ã1XY + ã3Y =
X3 + ã2X2 + ã4X + ã6 be elliptic curves over k. Let φ : E → Ẽ be an isogeny of elliptic
curves over k. Then φ may be expressed by a rational function in the form
φ(x, y) = (φ1(x), yφ2(x) + φ3(x))
where
2φ3(x) = −ã1φ1(x) − ã3 + (a1x + a3)φ2(x).
Proof: Certainly, φ may be written as φ(x, y) = (φ1(x) + yf(x), yφ2(x) + φ3(x)) where
φ1(x), f(x), φ2(x) and φ3(x) are rational functions.
Since φ is a group homomorphism it satisfies φ(−P) = −φ(P). Writing P = (x, y)
the left hand side is
φ(−(x, y)) = φ(x, −y − a1x − a3).
It follows that (2y + a1x + a3)f(x) is a function that is zero for all points (x, y) ∈ E(k̄).
Since 2y + a1x + a3 is not the zero function (if it were zero then y ∈ k(x) and k(E) = k(x, y) = k(x),
which contradicts Theorem 8.6.4) it follows that f(x) = 0.
It then follows that
2φ3(x) = −ã1φ1(x) − ã3 + (a1x + a3)φ2(x).
Corollary 25.1.8 will give a more precise version of this result in a special case.
Proof: We have k(E) = k(x, y) being a quadratic extension of k(x), and k(Ẽ) = k(x̃, ỹ)
being a quadratic extension of k(x̃). Write φ1(x) = a(x)/b(x). Now φ1 gives a morphism φ1 : P1 → P1 and this
morphism has degree d = max{deg_x(a(x)), deg_x(b(x))} by Lemma 8.1.9. It follows that
k(x) is a degree d extension of φ1*(k(x̃)). We therefore have a diagram of field
extensions in which k(E)/k(x) and k(Ẽ)/k(x̃) are both quadratic and k(x)/φ1*(k(x̃)) has
degree d; it follows that [k(E) : φ*(k(Ẽ))] = d.
Theorem 9.6.17. Let p be a prime and E, Ẽ elliptic curves over Fp. Let φ : E → Ẽ be
a non-zero isogeny. Then there is an integer m and an elliptic curve E(q) (namely, the
curve obtained by applying the q = pm-power Frobenius map to the coefficients of E; the
reader should not confuse the notation E(q) with the quadratic twist E(d)) and a separable
isogeny ψ : E(q) → Ẽ of degree deg(φ)/q such that φ = ψ ◦ πq, where πq : E → E(q) is the
q-power Frobenius morphism.
whereas one can have groups such as E(Q) ≅ Z and Ẽ(Q) ≅ Z/2Z for which there is a
non-zero group homomorphism E(Q) → Ẽ(Q) whose kernel is infinite. It is natural to
ask whether every group homomorphism with finite kernel is an isogeny. The following
result shows that this is the case (the condition of being defined over k can be ignored by
taking a field extension).
Theorem 9.6.19. Let E be an elliptic curve over k. Let G ⊆ E(k̄) be a finite group
that is defined over k (i.e., σ(P) ∈ G for all P ∈ G and σ ∈ Gal(k̄/k)). Then there is a
unique (up to isomorphism over k̄) elliptic curve Ẽ over k and a (not necessarily unique)
isogeny φ : E → Ẽ over k such that ker(φ) = G.
Proof: See Proposition III.4.12 and Exercise 3.13(e) of [560]. We will give a constructive
proof (Vélu's formulae) in Section 25.1.1, which also proves that the isogeny is defined
over k.
The elliptic curve Ẽ in Theorem 9.6.19 is sometimes written E/G. As noted, the
isogeny in Theorem 9.6.19 is not necessarily unique, but Exercise 9.6.20 shows the only
way that non-uniqueness can arise.
Exercise 9.6.20. Let the notation be as in Theorem 9.6.19. Let ψ : E → Ẽ be another
isogeny over k such that ker(ψ) = G. Show that ψ = λ ◦ φ where λ is an automorphism
of Ẽ (or, if k is finite, the composition of an isogeny and a Frobenius map). Similarly,
if ψ : E → E2 is an isogeny over k with ker(ψ) = G then show that ψ = λ ◦ φ where
λ : Ẽ → E2 is an isomorphism over k of elliptic curves.
Proof: Let κ1 : E(k̄) → Pic0k̄(E) be the canonical map P ↦ (P) − (OE) and similarly
for κ2 : Ẽ(k̄) → Pic0k̄(Ẽ). We have φ̂ = κ1−1 ◦ φ* ◦ κ2 as above. We refer to Theorem III.6.1
of [560] (or Section 21.1 of [622] for elliptic curves over C) for the details.
Exercise 9.6.22. Suppose as in Theorem 9.6.21 that φ : E → Ẽ is a non-zero isogeny
over k of degree m. Show that if ψ : Ẽ → E is any isogeny such that ψ ◦ φ = [m] then
ψ = φ̂.
Definition 9.6.23. Let E and Ẽ be elliptic curves over k and let φ : E → Ẽ be a
non-zero isogeny over k. The isogeny φ̂ : Ẽ → E of Theorem 9.6.21 is called the dual
isogeny.
2. Let ψ : Ẽ → E3 be an isogeny. Then (ψ ◦ φ)^ = φ̂ ◦ ψ̂.
3. Let ψ : E → Ẽ be an isogeny. Then (φ + ψ)^ = φ̂ + ψ̂.
4. (φ̂)^ = φ.
Corollary 9.6.28. Let E be an elliptic curve over k and let m ∈ Z. Then [m]^ = [m] and
deg([m]) = m2.
Proof: The first claim follows by induction from part 3 of Theorem 9.6.27. The second
claim follows from part 1 of Theorem 9.6.27 and since [1]^ = [1]: write d = deg([m]) and
use [d] = [m]^ ◦ [m] = [m2]; since Endk(E) is torsion-free (Lemma 9.6.11) it follows that
d = m2.
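For m = 2 in odd characteristic the corollary is visible directly: [2] is separable, so by Lemma 9.6.4 its kernel has deg([2]) = 4 points. A Python sketch over F7 with the arbitrary curve y2 = x3 − x (chosen because its full 2-torsion is rational):

```python
# #E[2] = deg([2]) = 2^2 = 4: the points of order dividing 2 on
# E: y^2 = x^3 - x are O_E plus the points (x, 0) with x^3 - x = 0.
# Over F_7 the three roots 0, 1, -1 are all rational.
p = 7
roots = [x for x in range(p) if (x**3 - x) % p == 0]
two_torsion = 1 + len(roots)          # the 1 accounts for O_E
assert two_torsion == 4
print("#E[2] =", two_torsion)
```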
An important consequence of Corollary 9.6.28 is that it determines the possible group
structures of elliptic curves over finite fields. We return to this topic in Theorem 9.8.2.
We end this section with another example of an isogeny.
Exercise 9.6.29. Let k be a field such that char(k) ≠ 2, 3. Let E be an elliptic curve with
a subgroup of order 3 defined over k. Show that, after a suitable change of variable, one
has a point P = (0, v) such that [2]P = (0, −v) and v2 ∈ k. Show that E is k-isomorphic
to a curve of the form
$$y^2 = x^3 + \frac{a_4^2}{4a_6}x^2 + a_4x + a_6.$$
Show that there is a k-isomorphism to a curve of the form
Y2 = X3 + A(X + 1)2
where A ≠ 0, −27/4.
Exercise 9.6.30. (Doche, Icart and Kohel [182]) Let k be a field such that char(k) ≠ 2, 3. Let u ∈ k be such that u ≠ 0, 9/4. Consider the elliptic curve E : y² = x³ + 3u(x + 1)² as in Exercise 9.6.29. Then (0, √(3u)) has order 3 and G = {O_E, (0, ±√(3u))} is a k-rational subgroup of E(k̄). Show that

    φ(x, y) = ( (x³ + 4ux² + 12u(x + 1))/x² , y(1 − 12u(x + 2)/x³) )

is an isogeny from E to Ẽ : Y² = X³ − u(3X − 4u + 9)² with ker(φ) = G. Determine the dual isogeny to φ and show that φ̂ ∘ φ = [3].
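Since the formulae in Exercise 9.6.30 are easy to mistype, here is a small numerical sanity check (an illustrative sketch, not part of the original text; the choices p = 13 and u = 1 are arbitrary):

```python
# Verify the 3-isogeny of Exercise 9.6.30 numerically over F_13 with u = 1,
# so E : y^2 = x^3 + 3(x+1)^2 and Etilde : Y^2 = X^3 - (3X + 5)^2.
p, u = 13, 1

def on_E(x, y):   # E : y^2 = x^3 + 3u(x+1)^2
    return (y * y - (x ** 3 + 3 * u * (x + 1) ** 2)) % p == 0

def on_Et(X, Y):  # Etilde : Y^2 = X^3 - u(3X - 4u + 9)^2
    return (Y * Y - (X ** 3 - u * (3 * X - 4 * u + 9) ** 2)) % p == 0

E_pts = [(x, y) for x in range(p) for y in range(p) if on_E(x, y)]

def phi(x, y):    # the isogeny of the exercise; defined for x != 0
    ix = pow(x, p - 2, p)                         # x^{-1} mod p
    X = (x ** 3 + 4 * u * x * x + 12 * u * (x + 1)) * ix * ix % p
    Y = y * (1 - 12 * u * (x + 2) * ix ** 3) % p
    return (X, Y)

kernel = sorted(P for P in E_pts if P[0] == 0)    # affine part of ker(phi) = G
images_ok = all(on_Et(*phi(x, y)) for (x, y) in E_pts if x != 0)
```

Here 3u = 3 is a square modulo 13 (4² ≡ 3), so the kernel points are (0, ±4) and every other affine point of E lands on Ẽ.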
9.7 The Invariant Differential

Consider the differential

    ω_E = dx/(2y + a₁x + a₃)    (9.10)

on the Weierstrass equation for E, which was studied in Example 8.5.32. We showed that div(ω_E) = 0. For Q ∈ E(k̄) the translation map τ_Q is an isomorphism, so div(τ_Q^*(ω_E)) = 0 and the function f = τ_Q^*(ω_E)/ω_E satisfies div(f) = 0. It follows that τ_Q^*(ω_E) = c ω_E for some c ∈ k̄^*. The following result shows that c = 1 and so ω_E is fixed by translation maps. This explains why ω_E is called the invariant differential.

Theorem 9.7.1. Let E be an elliptic curve in Weierstrass form and let ω_E be the differential in equation (9.10). Then τ_Q^*(ω_E) = ω_E for all Q ∈ E(k̄).

Proof: See Proposition III.5.1 of [560].
An important fact is that the action of isogenies on differentials is linear.

Theorem 9.7.2. Let φ, ψ : E → Ẽ be isogenies over k and let ω_Ẽ be the invariant differential on Ẽ. Then (φ + ψ)^*(ω_Ẽ) = φ^*(ω_Ẽ) + ψ^*(ω_Ẽ).

Proof: See Theorem III.5.2 of [560].
Corollary 9.7.3. Let E be an elliptic curve over k. Let m ∈ Z. Then [m] is separable if and only if m is coprime to the characteristic of k. Let k = F_q and let π_q be the q-power Frobenius. Let m, n ∈ Z. Then [m] + [n]π_q is separable if and only if m is coprime to q.

Proof: (Sketch) Theorem 9.7.2 implies [m]^*(ω_E) = mω_E. So [m]^* maps the differentials on E to {0} if and only if the characteristic of k divides m. The first part then follows from Lemma 8.5.35. The second part follows by the same argument, using the fact that π_q is inseparable and so π_q^*(ω_E) = 0. For full details see Corollaries III.5.3 to III.5.5 of [560].
This result has the following important consequence.
Theorem 9.7.4. Let E and Ẽ be elliptic curves over a finite field F_q. If φ : E → Ẽ is an isogeny over F_q then #E(F_q) = #Ẽ(F_q).

Proof: Let π_q and π̃_q be the q-power Frobenius maps on E and Ẽ respectively. Since φ is defined over F_q we have φ ∘ π_q = π̃_q ∘ φ. Hence, φ ∘ (π_q − 1) = (π̃_q − 1) ∘ φ and so (applying Exercise 8.1.12 twice) deg(π̃_q − 1) = deg(π_q − 1). The result follows since #E(F_q) = deg(π_q − 1) and #Ẽ(F_q) = deg(π̃_q − 1).
The converse (namely, if E and Ẽ are elliptic curves over F_q and #E(F_q) = #Ẽ(F_q) then there is an isogeny from E to Ẽ over F_q) is Tate's isogeny theorem [597]. This can be proved for elliptic curves using the theory of complex multiplication (see Remark 25.3.10).
We now give a refinement of Lemma 9.6.12. This result shows that a separable isogeny is determined by φ₁(x) when char(k) ≠ 2.
Theorem 9.7.5. Let the notation be as in Lemma 9.6.12. Let φ : E → Ẽ be a separable isogeny over k. Then φ may be expressed by a rational function in the form

    φ(x, y) = (φ₁(x), c y φ₁′(x) + φ₃(x))

where φ₁′(x) = dφ₁(x)/dx is the (formal) derivative of the rational function φ₁(x), where c ∈ k^*, and where

    2φ₃(x) = −ã₁φ₁(x) − ã₃ + (a₁x + a₃) c φ₁′(x).

Proof: Write φ(x, y) = (φ₁(x), yφ₂(x) + φ₃(x)) as in Lemma 9.6.12. Pulling back the invariant differential on Ẽ gives

    φ^*(dX/(2Y + ã₁X + ã₃)) = φ₁′(x) dx/(2(yφ₂(x) + φ₃(x)) + ã₁φ₁(x) + ã₃).

Since φ is separable this is a non-zero multiple of ω_E = dx/(2y + a₁x + a₃); comparing coefficients of y, it follows that φ₂(x) = cφ₁′(x) for some c ∈ k^*, which proves the result.
In Section 25.1.1 we will make use of Theorem 9.7.5 in the case c = 1.
Exercise 9.7.6. Let the notation be as in Theorem 9.7.5 and suppose char(k) = 2. Show that there are only two possible values for the rational function φ₃(x).
9.8 Multiplication by n and Division Polynomials
Corollary 9.6.28 showed the fundamental fact that deg([m]) = m² and so there are at most m² points of order dividing m on an elliptic curve. There are several other explanations for this fact. One explanation is to consider elliptic curves over C: as a Riemann surface they are a complex torus C/L where L is a rank 2 lattice (see Chapter 5 of Silverman [560], especially Proposition 5.4) and it follows that there are m² points of order dividing m (this argument generalises immediately to Abelian varieties).

Another reason for this fact is that the group law is given by rational functions whose denominators are essentially quadratic polynomials in each variable. For comparison, see Exercise 9.8.1, which shows that a group law given by linear polynomials only has m points of order dividing m.
Exercise 9.8.1. Consider the multiplicative group G_m(k) = k^*. The group operation (x₁, x₂) ↦ x₁x₂ is linear in each of x₁ and x₂. The elements of order dividing m are the roots of the polynomial x^m − 1. Show that there are m points of order dividing m if gcd(m, char(k)) = 1, and that if p = char(k) then there is only 1 point of order dividing p.
It follows from Corollary 9.6.28 that #E[m] ≤ m², and elementary group theory implies #E[m] is therefore a divisor of m². Theorem 9.8.2 follows. A more precise version of this result is Theorem 9.10.13.
Theorem 9.8.2. Let E be an elliptic curve over a finite field Fq . Then E(Fq ) is isomorphic as a group to a product of cyclic groups of order n1 and n2 such that n1 | n2 .
Proof: (Sketch) Since E(F_q) is a finite Abelian group we apply the classification of finite Abelian groups (e.g., Theorem II.2.1 of [300]). Then use the fact that there are at most m² points in E(F_q) of order dividing m for every m ∈ N.
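Theorem 9.8.2 can be made concrete by brute force on a tiny example (an illustrative sketch, not from the text; the curve y² = x³ + 1 over F₇ is an arbitrary choice):

```python
# Determine the group structure of E : y^2 = x^3 + 1 over F_7 by brute force.
p, a4, a6 = 7, 0, 1
O = None  # the point at infinity O_E

def add(P, Q):  # affine chord-and-tangent addition on y^2 = x^3 + a4*x + a6
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a4) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

points = [O] + [(x, y) for x in range(p) for y in range(p)
                if (y * y - x ** 3 - a4 * x - a6) % p == 0]

def order(P):
    n, Q = 1, P
    while Q is not None:
        Q = add(Q, P); n += 1
    return n

orders = sorted(order(P) for P in points if P is not None)
num = len(points)                  # group order
exponent = max(orders)             # largest element order
two_torsion = 1 + orders.count(2)  # points of order dividing 2, including O
```

Here the group has order 12, exponent 6 and four points of order dividing 2, so E(F₇) ≅ Z/2Z × Z/6Z, with n₁ = 2 dividing n₂ = 6 as the theorem predicts.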
Since #E[m] ≤ m² (and, by Corollary 9.7.3, is equal to m² when m is coprime to the characteristic) it is natural to seek polynomials whose roots give the (affine) points of order dividing m. We already saw such polynomials in Exercise 9.6.10 for the case m = 2 (and this gave an alternative proof that, in general, there are three points (x, y) over k̄ of order 2 on an elliptic curve; namely the points (x, 0) where x is a root of the polynomial in equation (9.9)). Since [m]P = O_E if and only if [m](−P) = O_E one might expect to use polynomials in k[x], but when m is even it turns out to be more convenient to have polynomials that feature the variable y (one reason being that this leads to polynomials of lower degree). When m is odd the polynomials will be univariate and of degree (m² − 1)/2 as expected. We now determine these polynomials, first for the cases m = 3 and m = 4.
Exercise 9.8.3. Let E : y² = x³ + a₂x² + a₄x + a₆ be an elliptic curve over k (with char(k) ≠ 2). Show that if char(k) = 3, a₂ = 0 and a₄ ≠ 0 then there is no point (x, y) of order 3. Show that if char(k) = 3 and a₂ ≠ 0 then (x, y) has order 3 if and only if x³ = −a₆ + a₄²/(4a₂). Hence if char(k) = 3 then #E[3] ∈ {1, 3}.

Show that if char(k) ≠ 3 then (x, y) has order 3 if and only if

    3x⁴ + 4a₂x³ + 6a₄x² + 12a₆x + (4a₂a₆ − a₄²) = 0.
Exercise 9.8.4. Let E : y² = x³ + a₄x + a₆ be an elliptic curve over k with char(k) ≠ 2. Show that if P = (x, y) ∈ E(k̄) satisfies P ∈ E[4] and P ∉ E[2] then [2]P is of the form (x₂, 0) for some x₂ ∈ k̄. Hence show that x satisfies

    x⁶ + 5a₄x⁴ + 20a₆x³ − 5a₄²x² − 4a₄a₆x − (a₄³ + 8a₆²) = 0.
We now state the polynomials whose roots give affine points of order dividing m for
the case of elliptic curves in short Weierstrass form. The corresponding polynomials for
elliptic curves over fields of characteristic 2 are given in Section 4.4.5.a of [16] and Section
III.4.2 of [65]. Division polynomials for elliptic curves in general Weierstrass form are
discussed in Section III.4 of [65].
Definition 9.8.5. Let E : y² = x³ + a₄x + a₆ be an elliptic curve over k with char(k) ≠ 2. The division polynomials ψ_m(x, y) are defined by

    ψ₁(x, y) = 1,
    ψ₂(x, y) = 2y,
    ψ₃(x, y) = 3x⁴ + 6a₄x² + 12a₆x − a₄²,
    ψ₄(x, y) = 4y(x⁶ + 5a₄x⁴ + 20a₆x³ − 5a₄²x² − 4a₄a₆x − a₄³ − 8a₆²),
    ψ_{2m+1}(x, y) = ψ_{m+2}ψ_m³ − ψ_{m−1}ψ_{m+1}³   (m ≥ 2),
    ψ_{2m}(x, y) = ψ_m(ψ_{m+2}ψ_{m−1}² − ψ_{m−2}ψ_{m+1}²)/(2y)   (m ≥ 3),

where y² is replaced by x³ + a₄x + a₆ throughout, so that each ψ_m(x, y) is a polynomial of degree at most 1 in y.
Lemma 9.8.6. Let E be an elliptic curve in short Weierstrass form over k with char(k) ≠ 2. Let m ∈ N. Then ψ_m(x, y) ∈ k[x, y]. If m is odd then ψ_m(x, y) is a polynomial in x only and ψ_m(x, y) = mx^{(m²−1)/2} + ··· ∈ k[x]. If m is even then ψ_m(x, y) = yh(x) where h(x) = mx^{(m²−4)/2} + ··· ∈ k[x].
Proof: The cases m = 2, 3 and 4 are clear by inspection. The rest are easily proved by
induction.
Theorem 9.8.7. Let E be an elliptic curve in short Weierstrass form over k with char(k) ≠ 2, 3. Let m ∈ N and let ψ_m(x, y) be as above. Then P = (x_P, y_P) ∈ E(k̄) satisfies [m]P = O_E if and only if ψ_m(x_P, y_P) = 0. Furthermore, there are polynomials A_m(x) ∈ k[x] and B_m(x, y) ∈ k[x, y] such that

    [m](x, y) = ( A_m(x)/ψ_m(x, y)² , B_m(x, y)/ψ_m(x, y)³ ).
Proof: The first claim has already been proved for m = 3 and m = 4 in Exercises 9.8.3 and 9.8.4. The general result can be proved in various ways: Section 9.5 of Washington [622] gives a proof for elliptic curves over C and then deduces the result for general fields of characteristic not equal to 2; Charlap and Robbins [127] give a proof (Sections 7 to 9) using considerations about divisors and functions; other sources (such as Exercise 3.7 of [560]) suggest a (tedious) verification by induction.
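The recursive definition of the division polynomials is easy to evaluate at a point of a curve over a finite field. The following sketch (illustrative, not from the text; the curve y² = x³ + 2x + 3 over F₁₃ is an arbitrary choice) checks Theorem 9.8.7 numerically for all affine points with y ≠ 0; points with y = 0 are avoided because the recursion for ψ_{2m} divides by 2y:

```python
# Check that psi_m(P) = 0 iff [m]P = O_E on E : y^2 = x^3 + 2x + 3 over F_13.
p, a4, a6 = 13, 2, 3

def add(P, Q):  # affine short-Weierstrass addition; None represents O_E
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return None
    if P == Q: lam = (3 * x1 * x1 + a4) * pow(2 * y1, p - 2, p) % p
    else:      lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(m, P):  # naive scalar multiplication [m]P
    R = None
    for _ in range(m): R = add(R, P)
    return R

def psi(m, x, y):  # division polynomial evaluated at an affine point, y != 0
    if m == 0: return 0
    if m == 1: return 1
    if m == 2: return 2 * y % p
    if m == 3: return (3 * x ** 4 + 6 * a4 * x * x + 12 * a6 * x - a4 * a4) % p
    if m == 4: return 4 * y * (x ** 6 + 5 * a4 * x ** 4 + 20 * a6 * x ** 3
                               - 5 * a4 * a4 * x * x - 4 * a4 * a6 * x
                               - a4 ** 3 - 8 * a6 * a6) % p
    k, r = divmod(m, 2)
    if r:  # m = 2k + 1
        return (psi(k + 2, x, y) * psi(k, x, y) ** 3
                - psi(k - 1, x, y) * psi(k + 1, x, y) ** 3) % p
    #      m = 2k, divide by 2y using a modular inverse
    return psi(k, x, y) * (psi(k + 2, x, y) * psi(k - 1, x, y) ** 2
                           - psi(k - 2, x, y) * psi(k + 1, x, y) ** 2) \
           * pow(2 * y, p - 2, p) % p

pts = [(x, y) for x in range(p) for y in range(1, p)
       if (y * y - x ** 3 - a4 * x - a6) % p == 0]
ok = all((psi(m, x, y) == 0) == (mul(m, (x, y)) is None)
         for (x, y) in pts for m in range(2, 10))
```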
9.9 Endomorphism Structure

The aim of this section is to discuss the structure of the ring End_k(E). Note that Z ⊆ End_k(E) and that, by Lemma 9.6.11, End_k(E) is a torsion-free Z-module. For an isogeny φ : E → E and an integer m ∈ Z we write mφ for the isogeny [m] ∘ φ.
To understand the endomorphism rings of elliptic curves one introduces the Tate module T_l(E). This is defined, for any prime l ≠ char(k), to be the inverse limit of the groups E[l^i] (this is the same process as used to construct the l-adic numbers Z_l as the inverse limit of the rings Z/l^iZ). More precisely, for each i ∈ N fix a pair {P_{i,1}, P_{i,2}} of generators for E[l^i] such that P_{i−1,j} = [l]P_{i,j} for i > 1 and j ∈ {1, 2}. Via this basis we can identify E[l^i] with (Z/l^iZ)²; indeed, this is an isomorphism of (Z/l^iZ)-modules. It follows that T_l(E) is a Z_l-module that is isomorphic to Z_l² as a Z_l-module. Hence, the set End_{Z_l}(T_l(E)) of Z_l-linear maps from T_l(E) to itself is isomorphic as a Z_l-module to M₂(Z_l). We refer to Section III.7 of Silverman [560] for the details.
An isogeny φ : E → Ẽ gives rise to a linear map from E[l^i] to Ẽ[l^i] for each i. Writing φ(P_{i,1}) = [a]P̃_{i,1} + [b]P̃_{i,2} and φ(P_{i,2}) = [c]P̃_{i,1} + [d]P̃_{i,2} (where {P̃_{i,1}, P̃_{i,2}} is a basis for Ẽ[l^i]) we can represent φ as a matrix ( a b ; c d ) ∈ M₂(Z/l^iZ). It follows that φ corresponds to an element φ_l ∈ M₂(Z_l).
Write Hom_{Z_l}(T_l(E₁), T_l(E₂)) for the set of Z_l-module homomorphisms from T_l(E₁) to T_l(E₂). Since T_l(E) is isomorphic to Z_l² it follows that Hom_{Z_l}(T_l(E₁), T_l(E₂)) is a Z_l-module of rank 4. An important result is that the natural map

    Hom_k(E₁, E₂) ⊗ Z_l → Hom_{Z_l}(T_l(E₁), T_l(E₂))

is injective (Theorem III.7.4 of [560]). It follows that Hom_k(E₁, E₂) is a Z-module of rank at most 4.
The map φ ↦ φ̂ is an involution on End_k(E) and φ ∘ φ̂ = [d] where d = deg(φ) > 0 for non-zero φ. This constrains what sort of ring End_k(E) can be (Silverman [560] Theorem III.9.3). The result is as follows (for the definitions of orders in quadratic fields see Section A.12 and for quaternion algebras see Vignéras [618]).

Theorem 9.9.1. Let E be an elliptic curve over a field k. Then End_k(E) is either Z, an order in an imaginary quadratic field, or an order in a definite quaternion algebra.
Proof: See Corollary III.9.4 of [560].
When k is a finite field the case End_k(E) = Z is impossible (see Theorem V.3.1 of [560]).
Example 9.9.2. Let E : y² = x³ + x over F_p where p ≡ 3 (mod 4) is prime. Then ξ(x, y) = (−x, iy) is an isogeny, where i ∈ F_{p²} satisfies i² = −1. One can verify that ξ² = ξ ∘ ξ = [−1]. One can show that #E(F_p) = p + 1 (Exercise 9.10.5) and then Theorem 9.10.3 implies that the Frobenius map π_p(x, y) = (x^p, y^p) satisfies π_p² = [−p]. Finally, we have π_p(ξ(x, y)) = (−x^p, −iy^p) = −ξ(π_p(x, y)). Hence, End_{F̄_p}(E) is isomorphic to a subring of the quaternion algebra (be warned that we are recycling the symbol i here) Q[i, j] with i² = −1, j² = −p, ij = −ji. Note that End_{F_p}(E) is isomorphic to an order, containing Z[√−p], in the ring of integers of the imaginary quadratic field Q(√−p).
Every endomorphism of an elliptic curve satisfies a quadratic characteristic polynomial with integer coefficients.

Theorem 9.9.3. Let E be an elliptic curve over k and let φ ∈ End_k(E) be a non-zero isogeny. Let d = deg(φ). Then there is an integer t such that φ² − tφ + d = 0 in End_k(E). In other words, for all P ∈ E(k̄),

    φ(φ(P)) − [t]φ(P) + [d]P = O_E.
Proof: (Sketch) Choose an auxiliary prime l ≠ char(k). Then φ acts on the Tate module T_l(E) and so corresponds to a matrix M ∈ Hom_{Z_l}(T_l(E), T_l(E)). Such a matrix has a determinant d and a trace t. The trick is to show that d = deg(φ) and t = 1 + deg(φ) − deg(1 − φ) (which are standard facts for 2 × 2 matrices when deg is replaced by det). These statements are independent of l. Proposition V.2.3 of Silverman [560] gives the details (this proof uses the Weil pairing). A slightly simpler proof is given in Lemma 24.4 of [122].
Definition 9.9.4. The integer t in Theorem 9.9.3 is called the trace of the endomorphism.
Exercise 9.9.5. Show that if φ ∈ End_k(E) satisfies the equation T² − tT + d = 0 then so does φ̂.
9.10 Frobenius Map
We have seen that the q-power Frobenius π_q on an elliptic curve over F_q is a non-zero isogeny of degree q (Corollary 9.6.15) and that isogenies on elliptic curves satisfy a quadratic characteristic polynomial. Hence there is an integer t such that

    π_q² − tπ_q + q = 0.    (9.11)

Definition 9.10.1. The integer t in equation (9.11) is called the trace of Frobenius. The polynomial P(T) = T² − tT + q is the characteristic polynomial of Frobenius.

Note that End_{F_q}(E) always contains the order Z[π_q], which is an order of discriminant t² − 4q.
Example 9.10.2. Equation (9.11) implies

    ([t] − π_q) ∘ π_q = [q]

and so we have π̂_q = [t] − π_q.
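Equation (9.11) can be checked numerically on a small example. The sketch below (illustrative, not from the text) works with the arbitrary curve E : y² = x³ + 2 over F₇, for which t = −1, and verifies π_q² − tπ_q + q = 0 on all points over F₄₉ = F₇(s) with s² = 3 (3 is a non-square modulo 7):

```python
# Verify pi^2 - t*pi + p = 0 for E : y^2 = x^3 + 2 over F_7, on E(F_49).
p = 7  # elements of F_49 are pairs (a, b) representing a + b*s, s^2 = 3

def fadd(u, v): return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)
def fsub(u, v): return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)
def fmul(u, v):
    a, b, c, d = u[0], u[1], v[0], v[1]
    return ((a * c + 3 * b * d) % p, (a * d + b * c) % p)
def finv(u):  # (a + b*s)^-1 = (a - b*s)/(a^2 - 3*b^2)
    ni = pow((u[0] * u[0] - 3 * u[1] * u[1]) % p, p - 2, p)
    return (u[0] * ni % p, (-u[1]) * ni % p)
def frob(u):  # p-power Frobenius on F_49: s^7 = -s
    return (u[0], (-u[1]) % p)

ZERO = (0, 0)
def rhs(x): return fadd(fmul(fmul(x, x), x), (2, 0))  # x^3 + 2

def add(P, Q):  # chord-and-tangent on y^2 = x^3 + 2 (a4 = 0); None is O_E
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and fadd(y1, y2) == ZERO: return None
    if P == Q: lam = fmul(fmul((3, 0), fmul(x1, x1)), finv(fmul((2, 0), y1)))
    else:      lam = fmul(fsub(y2, y1), finv(fsub(x2, x1)))
    x3 = fsub(fsub(fmul(lam, lam), x1), x2)
    return (x3, fsub(fmul(lam, fsub(x1, x3)), y1))

def mul(m, P):
    if P is None: return None
    if m < 0: m, P = -m, (P[0], fsub(ZERO, P[1]))
    R = None
    for _ in range(m): R = add(R, P)
    return R

F49 = [(a, b) for a in range(p) for b in range(p)]
pts = [(x, y) for x in F49 for y in F49 if fmul(y, y) == rhs(x)]

nE = 1 + sum(1 for (x, y) in pts if x[1] == 0 == y[1])  # #E(F_7), with O_E
t = p + 1 - nE                                           # trace of Frobenius

def pi(P):  # Frobenius acting on points
    return None if P is None else (frob(P[0]), frob(P[1]))

ok = all(add(add(pi(pi(P)), mul(-t, pi(P))), mul(p, P)) is None for P in pts)
```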
Theorem 9.10.3. Let E be an elliptic curve over F_q and let P(T) be the characteristic polynomial of Frobenius. Then #E(F_q) = P(1).

Proof: We have E(F_q) = ker(π_q − 1) and, since π_q − 1 is separable, #E(F_q) = deg(π_q − 1). Now, P(1) = 1 − t + q and, as noted in the proof of Theorem 9.9.3, t = 1 + deg(π_q) − deg(1 − π_q), so deg(π_q − 1) = 1 + q − t = P(1).
Exercise 9.10.4. Let p ≡ 2 (mod 3). Show that the elliptic curve E : y² = x³ + a₆ for a₆ ∈ F_p^* has p + 1 points over F_p.
[Hint: re-arrange the equation.]
Theorem 9.10.6 (Hasse). Let E be an elliptic curve over F_q. Then #E(F_q) lies in the interval [q + 1 − 2√q, q + 1 + 2√q].
Corollary 9.10.7. Let E be an elliptic curve over F_q and let P(T) be the characteristic polynomial of Frobenius. Let α, β ∈ C be such that P(T) = (T − α)(T − β). Then β = q/α = ᾱ and |α| = |β| = √q.

Proof: It follows from the proof of Theorem 9.10.6 that if P(T) ∈ Z[T] has a real root then it is a repeated root (otherwise, the quadratic form is not positive definite). Obviously, if the root α is not real then β = ᾱ. Since the constant coefficient of P(T) is q it follows that q = αβ = αᾱ = |α|² and similarly for β.
2. m is even and t = ±2√q;
w₁ = u^{2p}/u² and w₂ = u^{3p}/u³. One can verify that w₁, w₂ ∈ F_{p²} (just show that w_i^{p²} = w_i) and that w₁^{p+1} = 1 and w₂^{p+1} = −1. Finally, one has ψ(ψ(x, y)) = (w₁^{p+1}x^{p²}, w₂^{p+1}y^{p²}) = (x^{p²}, −y^{p²}) = −π_{p²}(x, y) on E. On the other hand, by definition

    ψ² = ψ₁^{−1} ∘ (π̃_p)² ∘ ψ₁ = ψ₁^{−1} ∘ [−p] ∘ ψ₁ = [−p]
where
¹This result has been discovered by several authors. Schoof determined the group structures of supersingular elliptic curves in his thesis. The general statement was given by Tsfasman [606] in 1985, Rück [502] in 1987 and Voloch [619] in 1988.
4. if t = 0 then either the group is cyclic (i.e., all a_l = 0) or is Z/2Z × Z/((q + 1)/2)Z (i.e., all a_l = 0 except a₂ = 1).

Proof: See Voloch [619] or Theorem 3 of Rück [502] (note that it is necessary to prove that Rück's conditions imply those written above by considering possible divisors d | (q − 1) and d | (q − t + 1) in the supersingular cases).
Exercise 9.10.14. Let q be a prime power, gcd(t, q) = 1, and N = q + 1 − t a possible value for #E(F_q). Show that there exists an elliptic curve over F_q with N points and which is cyclic as a group.
Another useful result, which relates group structures and properties of the endomorphism ring, is Theorem 9.10.16. Exercise 9.10.15 shows that the final condition makes sense.

Exercise 9.10.15. Let E be an elliptic curve over F_q and let t = q + 1 − #E(F_q). Show that if n² | (q + 1 − t) and n | (q − 1) then n² | (t² − 4q).
Theorem 9.10.16. Let p be a prime, q = p^m, E an elliptic curve over F_q, and t = q + 1 − #E(F_q). Let n ∈ N be such that p ∤ n. Then E[n] ⊆ E(F_q) if and only if
9.10.1 Complex Multiplication
A lot of information about the numbers of points on elliptic curves arises from the theory
of complex multiplication. We do not have space to develop this theory in detail. Some
crucial tools are the lifting and reduction theorems of Deuring (see Sections 13.4 and
13.5 of Lang [363] or Chapter 10 of Washington [622]). We summarise some of the most
important ideas in the following theorem.
Theorem 9.10.18. Let O be an order in an imaginary quadratic field K. Then there is a number field L containing K (called the ring class field) and an elliptic curve E over L with End_L(E) ≅ O.

Let p be a rational prime that splits completely in L, and let ℘ be a prime of O_L above p (so that O_L/℘ ≅ F_p). If E has good reduction modulo ℘ (this holds if ℘ does not divide the discriminant of E), write Ē for the elliptic curve over F_p obtained as the reduction
²In fact, if gcd(q, t) ≠ 1 then the condition End_{F_q}(E) = Z[π_q] never holds.
Exercise 9.10.22. Let p be an odd prime such that p ≡ 1 (mod 3). Then there exist integers a, b such that p = a² + ab + b² (see Chapter 1 of [156] and note that p = x² + 3y² implies p = (x − y)² + (x − y)(2y) + (2y)²). Show that the number of points on y² = x³ + a₆ over F_p is p + 1 − t where

    t ∈ {±(2a + b), ±(2b + a), ±(b − a)}.
Example 9.10.23. The six values a₆ = 1, 2, 3, 4, 5, 6 all give distinct values for #E(F₇) for the curve E : y² = x³ + a₆, namely 12, 9, 13, 3, 7, 4 respectively.
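Example 9.10.23 is easy to reproduce by direct counting (illustrative script, not from the text):

```python
# Count points on E : y^2 = x^3 + a6 over F_7 for a6 = 1, ..., 6.
p = 7
counts = []
for a6 in range(1, 7):
    n = 1 + sum(1 for x in range(p) for y in range(p)
                if (y * y - x ** 3 - a6) % p == 0)   # +1 for the point O_E
    counts.append(n)
```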
9.10.2 Counting Points on Elliptic Curves

have determined #E(F_q). This process will not determine #E(F_q) uniquely if n ≤ 4√q. Mestre suggested determining the order of points on both E(F_q) and its quadratic twist. This leads to a randomised algorithm to compute #E(F_q) in Õ(q^{1/4}) bit operations. We refer to Section 3 of Schoof [526] for details.
A polynomial-time algorithm to compute #E(F_q) was given by Schoof [524, 526]. Improvements have been given by numerous authors, especially Atkin and Elkies. The crucial idea is to use equation (9.11). Indeed, the basis of Schoof's algorithm is that if P is a point of small prime order l then one can compute t (mod l) by solving the (easy) discrete logarithm problem

    π_q(π_q(P)) + [q (mod l)]P = [t (mod l)]π_q(P).

One finds a point P of order l using the division polynomials ψ_l(x, y) (in fact, Schoof never writes down an explicit P, but rather works with a generic point of order l by performing polynomial arithmetic modulo ψ_l(x, y)). Note that, when l is odd, ψ_l(x, y) is a polynomial in x only. Repeating this idea for different small primes l and applying the Chinese remainder theorem gives t. We refer to [526], Chapters VI and VII of [64], Chapter VI of [65] and Chapter 17 of [16] for details and references.
Exercise 9.10.24. Let E : y² = F(x) over F_q. Show that one can determine t (mod 2) by considering the number of roots of F(x) in F_q.
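One way to make Exercise 9.10.24 concrete: for q odd, t ≡ #E(F_q) (mod 2), and #E(F_q) is even exactly when there is a point of order 2, i.e., when F(x) has a root in F_q. The sketch below (illustrative, not from the text) verifies this for all non-singular curves y² = x³ + a₄x + a₆ over the arbitrary small field F₁₁:

```python
# t is even exactly when F(x) = x^3 + a4*x + a6 has a root in F_11.
p = 11
ok = True
for a4 in range(p):
    for a6 in range(p):
        if (4 * a4 ** 3 + 27 * a6 * a6) % p == 0:
            continue  # singular curve, skip
        nE = 1 + sum(1 for x in range(p) for y in range(p)
                     if (y * y - x ** 3 - a4 * x - a6) % p == 0)
        t = p + 1 - nE
        roots = sum(1 for x in range(p) if (x ** 3 + a4 * x + a6) % p == 0)
        ok = ok and (t % 2 == 0) == (roots >= 1)
```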
There are a number of point counting algorithms using p-adic ideas. We do not have
space to discuss these algorithms. See Chapter VI of [65] and Chapter IV of [16] for
details and references.
9.11 Supersingular Elliptic Curves

This section is about a particular class of elliptic curves over finite fields that have quite different properties to the general case. For many cryptographic applications these elliptic curves are avoided, though in pairing-based cryptography they have some desirable properties.
Exercise 9.11.1. Let q = p^m where p is prime and let E be an elliptic curve over F_q. Show using Exercise 9.10.9 that if #E(F_q) ≡ 1 (mod p) then #E(F_{q^n}) ≡ 1 (mod p) for all n ∈ N. Hence, show that E[p] = {O_E} for such an elliptic curve.
Theorem 9.11.2. Let E be an elliptic curve over F_{p^m} where p is prime. The following are equivalent:

1. #E(F_{p^m}) = p^m + 1 − t where p | t;
2. E[p] = {O_E};
3. End_{F̄_p}(E) is not commutative (hence, by Theorem 9.9.1, it is an order in a quaternion algebra);
4. The characteristic polynomial of Frobenius P(T) = T² − tT + p^m factors over C with roots α₁, α₂ such that α_i/√(p^m) are roots of unity. (Recall that a root of unity is a complex number z such that there is some n ∈ N with z^n = 1.)

Proof: The equivalence of properties 1, 2 and 3 is shown in Theorem V.3.1 of Silverman [560]. Property 4 is shown in Proposition 13.6.2 of Husemöller [301].
Definition 9.11.3. An elliptic curve E over F_{p^m} is supersingular if it satisfies any of the conditions of Theorem 9.11.2. An elliptic curve is ordinary if it does not satisfy any of the conditions of Theorem 9.11.2.
We stress that a supersingular curve is not singular as a curve.
Example 9.11.4. Let p ≡ 2 (mod 3) be prime and let a₆ ∈ F_p^*. The elliptic curve E : y² = x³ + a₆ is supersingular since, by Exercise 9.10.4, it has p + 1 points. Another way to show supersingularity for this curve is to use the endomorphism ξ(x, y) = (ζ₃x, y) as in Exercise 9.6.25 (where ζ₃ ∈ F_{p²} is such that ζ₃² + ζ₃ + 1 = 0). Since ξ does not commute with the p-power Frobenius map π_p (specifically, π_p ∘ ξ = ξ² ∘ π_p since ζ₃ ∉ F_p) the endomorphism ring is not commutative.

To determine the quaternion algebra one can proceed as follows. First show that ξ satisfies the characteristic polynomial T² + T + 1 = 0 (since ξ³(P) = P for all P ∈ E(F̄_p)). Then consider the isogeny φ = [1] − ξ, which has dual φ̂ = [1] − ξ². The degree d of φ satisfies [d] = φ̂ ∘ φ = ([1] − ξ²)([1] − ξ) = [1] − ξ − ξ² + [1] = [3]. Hence φ has degree 3. The trace of φ is t = 1 + deg(φ) − deg(1 − φ) = 1 + 3 − deg(ξ) = 3. One can show that (ξ − ξ²)² = [−3] and so the quaternion algebra is Q[i, j] with i² = −3 and j² = −p.
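The point count in Example 9.11.4 (and Exercise 9.10.4) can be confirmed numerically; the following illustrative sketch (not from the text) uses the arbitrary primes 5, 11, 17, 23, all congruent to 2 modulo 3:

```python
# For p = 2 (mod 3), x -> x^3 is a bijection of F_p, so y^2 = x^3 + a6
# has exactly one x for each y, giving p + 1 points in total.
def count(p, a6):
    return 1 + sum(1 for x in range(p) for y in range(p)
                   if (y * y - x ** 3 - a6) % p == 0)

checks = all(count(p, a6) == p + 1
             for p in (5, 11, 17, 23) for a6 in range(1, p))
```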
Example 9.11.5. Let p ≡ 3 (mod 4) be prime and a₄ ∈ F_p^*. Exercise 9.10.5 implies that E : y² = x³ + a₄x is supersingular. An alternative proof of supersingularity follows from Example 9.9.2, since ξ(x, y) = (−x, iy) does not commute with the p-power Frobenius.
Example 9.11.6. Let F_q be a finite field of characteristic 2 and F(x) ∈ F_q[x] a monic polynomial of degree 3. Then E : y² + y = F(x) is supersingular. This follows from the fact that (x, y) ∈ E(F_{q^n}) if and only if (x, y + 1) ∈ E(F_{q^n}), and hence #E(F_{q^n}) is odd for all n. It follows that there are no points of order 2 on E(F̄₂) and so E is supersingular.
Exercise 9.11.7. Use Waterhouse's theorem to show that, for every prime p and m ∈ N, there exists a supersingular curve over F_{p^m}.

Bröker [107] has given an algorithm to construct supersingular elliptic curves over finite fields using the CM method. The method has expected polynomial-time complexity, assuming a generalisation of the Riemann hypothesis is true.
characteristic polynomial of Frobenius. Then every non-square factor of (1/q)P(T√q) divides Φ_m(T²) in R[T] for some m ∈ {1, 2, 3, 4, 6}, where Φ_m(x) is the m-th cyclotomic polynomial (see Section 6.1).
Proof: Waterhouse's theorem gives the possible values for the characteristic polynomial P(T) = T² − tT + q of Frobenius. The possible values for t are 0, ±√q, ±2√q, ±√(2q) and ±√(3q). Write

    (T − α/√q)(T − ᾱ/√q) = (1/q)P(T√q).

So, write Q(T) = P(T√q)/q ∈ R[T]. The first three values for t in the above list give Q(T) equal to T² + 1, T² ∓ T + 1 and T² ∓ 2T + 1 = (T ∓ 1)² respectively. The result clearly holds in these cases (the condition about non-square factors is needed since (T − 1) divides Φ₁(T²) = (T − 1)(T + 1) but (T − 1)² does not divide any cyclotomic polynomial).

We now deal with the remaining two cases. Let t = ±2^{(m+1)/2} where q = 2^m. Then Q(T) = T² ∓ √2·T + 1 and we have

    (T² + √2·T + 1)(T² − √2·T + 1) = T⁴ + 1 = Φ₄(T²).

Similarly, for t = ±3^{(m+1)/2} where q = 3^m we have Q(T) = T² ∓ √3·T + 1 and

    (T² + √3·T + 1)(T² − √3·T + 1) = T⁴ − T² + 1 = Φ₆(T²).
Corollary 9.11.9. Let E be a supersingular elliptic curve over F_q. Then there is an integer m ∈ {1, 2, 3, 4, 6} such that π_q^m ∈ Z and the exponent of the group E(F_q) divides q^m − 1. Furthermore, the cases m = 3, 4, 6 only occur when q is a square, a power of 2, or a power of 3 respectively.
Exercise 9.11.10. Prove Corollary 9.11.9.
In general, the endomorphism ring of a supersingular elliptic curve is generated over Z by the Frobenius map and some complex multiplication isogeny. However, as seen in Example 9.10.12, the Frobenius can lie in Z, in which case two independent complex multiplications are needed (though, as in Example 9.10.12, one of them will be very closely related to a Frobenius map on a related elliptic curve).

It is known that the endomorphism ring End_{k̄}(E) of a supersingular elliptic curve E over k is a maximal order in a quaternion algebra (see Theorem 4.2 of Waterhouse [623]) and that the quaternion algebra is ramified at exactly p and ∞. Indeed, [623] shows that when t = ±2√q then all endomorphisms are defined over F_q and every maximal order arises. In other cases not all endomorphisms are defined over F_q and End_{F_q}(E) is an order that contains π_q and is maximal at p (i.e., the index is not divisible by p).
We now present some results on the number of supersingular curves over finite fields.
191
2h(p)
if p 3 (mod 8)
where h(d) is the usual ideal class number of the quadratic field Q( d).
Proof: The claim that j(E) ∈ F_{p²} is Theorem V.3.1(iii) of [560] or Theorem 5.6 of [301]. The formula for the number of supersingular j-invariants in F_{p²} is Theorem V.4.1(c) of [560] or Section 13.4 of [301]. The statement about the number of supersingular j-invariants in F_p is given in Theorem 14.18 of Cox [156] (the supersingular case is handled on page 322). The precise formula for H(4p) is equation (1.11) of Gross [267]. (Gross also explains the relation between isomorphism classes of supersingular curves and Brandt matrices.)
Lemma 9.11.12. Let E₁, E₂ be elliptic curves over F_q. If E₁ and E₂ are ordinary, #E₁(F_q) = #E₂(F_q) and j(E₁) = j(E₂) then they are isomorphic over F_q.

Proof: (Sketch) Since j(E₁) = j(E₂) the curves are isomorphic over F̄_q. If #E₁(F_q) = q + 1 − t and E₂ is not isomorphic to E₁ over F_q, then E₂ is a non-trivial twist of E₁. If j(E₁) ≠ 0, 1728 then #E₂(F_q) = q + 1 + t ≠ #E₁(F_q), since t ≠ 0 (this is where we use the fact that E₁ is ordinary). In the cases j(E₁) = 0, 1728 one needs to use the formulae of Example 9.10.20 and Exercise 9.10.22 and show that these group orders are distinct when t ≠ 0.

An alternative proof, using less elementary methods, is given in Proposition 14.19 (page 321) of Cox [156].
Exercise 9.11.13. Give an example of supersingular curves E₁, E₂ over F_p such that j(E₁) = j(E₂), #E₁(F_p) = #E₂(F_p) and E₁ is not isomorphic to E₂ over F_p.
9.12 Alternative Models for Elliptic Curves

We have introduced elliptic curves using Weierstrass equations, but there are many different models and some of them have computational advantages. We present the Montgomery model and the twisted Edwards model. A mathematically important model, which we do not discuss directly, is the intersection of two quadratic surfaces; see Section 2.5 of Washington [622] for details. It is not the purpose of this book to give an implementation guide, so we refrain from providing the optimised addition algorithms. Readers are advised to consult Sections 13.2 and 13.3 of [16] or the Explicit-Formulas Database [51].
9.12.1 Montgomery Model
This model, for elliptic curves over fields of odd characteristic, was introduced by Montgomery [433] in the context of efficient elliptic curve factoring using (x : z) coordinates. It is a very convenient model for arithmetic in (a projective representation of) the algebraic group quotient of E(k) by the equivalence relation P ≡ −P. Versions of the Montgomery model have been given in characteristic 2 but they are not so successful; we refer to Stam [574] for a survey.
Definition 9.12.1. Let k be a field such that char(k) ≠ 2. Let A, B ∈ k with B ≠ 0. The Montgomery model is

    By² = x³ + Ax² + x.    (9.13)

According to Definition 7.2.8, when B ≠ 1, the Montgomery model is not an elliptic curve. However, the theory all goes through in the more general case, and so we refer to curves in Montgomery model as elliptic curves.
Exercise 9.12.2. Show that the Montgomery model is non-singular if and only if B(A² − 4) ≠ 0.

Exercise 9.12.3. Show that there is a unique point at infinity on the Montgomery model of an elliptic curve. Show that this point is not singular, and is always k-rational.
Lemma 9.12.4. Let k be a field such that char(k) ≠ 2. Let E : y² = x³ + a₂x² + a₄x + a₆ be an elliptic curve over k in Weierstrass form. There is an isomorphism over k from E to a Montgomery model if and only if F(x) = x³ + a₂x² + a₄x + a₆ has a root x_P ∈ k such that (3x_P² + 2a₂x_P + a₄) is a square in k. This isomorphism maps O_E to the point at infinity on the Montgomery model and is a group homomorphism.

Proof: Let P = (x_P, 0) ∈ E(k). First move P to (0, 0) by the change of variable X = x − x_P. The map (x, y) ↦ (x − x_P, y) is an isomorphism to y² = X³ + a₂′X² + a₄′X where a₂′ = 3x_P + a₂ and a₄′ = 3x_P² + 2a₂x_P + a₄. Let w = √(a₄′), which lies in k by the assumption of the Lemma. Consider the isomorphism (X, y) ↦ (U, V) = (X/w, y/w) that maps to

    (1/w)V² = U³ + (a₂′/w)U² + U.

Taking A = a₂′/w, B = 1/w ∈ k gives the result.

Conversely, suppose By² = x³ + Ax² + x is a Montgomery model of an elliptic curve over k. Multiplying through by B³ gives (B²y)² = (Bx)³ + AB(Bx)² + B²(Bx) and so (U, V) = (Bx, B²y) satisfies the Weierstrass equation V² = U³ + ABU² + B²U. Taking a₂ = AB, a₄ = B² and a₆ = 0 one can check that the conditions in the statement of the Lemma hold (the polynomial F(x) has the root 0, and a₄ = B² is a square).

The maps extend to the projective curves and map (0 : 1 : 0) to (0 : 1 : 0). The fact that they are group homomorphisms follows from a generalisation of Theorem 9.2.1.
When the conditions of Lemma 9.12.4 hold we say that the elliptic curve E can be
written in Montgomery model. Throughout this section, when we refer to an elliptic
curve E in Montgomery model, we assume that E is specified by an affine equation as in
equation (9.13).
Lemma 9.12.5. Let P₁ = (x₁, y₁) and P₂ = (x₂, y₂) be points on the elliptic curve By² = x³ + Ax² + x such that x₁ ≠ x₂ and x₁x₂ ≠ 0. Then P₁ + P₂ = (x₃, y₃) where

    x₃ = B(x₂y₁ − x₁y₂)²/(x₁x₂(x₂ − x₁)²).

Writing P₁ − P₂ = (x₄, y₄) one finds

    x₃x₄ = (x₁x₂ − 1)²/(x₁ − x₂)².

For the case P₂ = P₁ we have [2](x₁, y₁) = (x₃, y₃) where

    x₃ = (x₁² − 1)²/(4x₁(x₁² + Ax₁ + 1)).
Proof: (Sketch) The formula for x₃ follows from the chord rule on the Montgomery model. Writing −P₂ = (x₂, −y₂) gives x₄ = B(x₂y₁ + x₁y₂)²/(x₁x₂(x₂ − x₁)²), and using By_i² = x_i³ + Ax_i² + x_i one computes

    B²(x₂y₁ − x₁y₂)²(x₂y₁ + x₁y₂)² = (B(x₂²y₁² − x₁²y₂²))²
        = (x₂²(x₁³ + Ax₁² + x₁) − x₁²(x₂³ + Ax₂² + x₂))²
        = x₁²x₂²(x₁x₂(x₁ − x₂) + (x₂ − x₁))²

from which we deduce that x₃x₄(x₂ − x₁)² = (x₁x₂ − 1)². In the case P₁ = P₂ we have x₃·4By₁² = (3x₁² + 2Ax₁ + 1)² − (A + 2x₁)4By₁², which implies 4x₁x₃(x₁² + Ax₁ + 1) = (x₁² − 1)².
In other words, one can compute the x-coordinate of [2]P using only the x-coordinate of P. Similarly, given the x-coordinates of P₁, P₂ and P₁ − P₂ (i.e., x₁, x₂ and x₄) one can compute the x-coordinate of P₁ + P₂. The next exercise shows how to do this projectively.
Exercise 9.12.6. Let P = (x_P, y_P) ∈ E(F_q) be a point on an elliptic curve given in Montgomery model. Define X₁ = x_P, Z₁ = 1, X₂ = (X₁² − 1)², Z₂ = 4x_P(x_P² + Ax_P + 1). Given (X_n, Z_n), (X_m, Z_m) and (X_{m−n}, Z_{m−n}), define

    X_{n+m} = Z_{m−n}(X_nX_m − Z_nZ_m)²
    Z_{n+m} = X_{m−n}(X_nZ_m − X_mZ_n)²

and

    X_{2n} = (X_n² − Z_n²)²
    Z_{2n} = 4X_nZ_n(X_n² + AX_nZ_n + Z_n²).
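The recurrences of Exercise 9.12.6 give an x-coordinate-only "Montgomery ladder". The following sketch (illustrative, not from the text; the curve y² = x³ + 6x² + x over F₁₀₀₉ is an arbitrary choice with B = 1) implements the ladder directly from the formulae above and checks it against naive affine arithmetic:

```python
p, A, B = 1009, 6, 1

def xDBL(X, Z):
    return ((X * X - Z * Z) ** 2 % p,
            4 * X * Z * (X * X + A * X * Z + Z * Z) % p)

def xADD(Xn, Zn, Xm, Zm, Xd, Zd):  # (Xd : Zd) is x of the difference
    return (Zd * (Xn * Xm - Zn * Zm) ** 2 % p,
            Xd * (Xn * Zm - Xm * Zn) ** 2 % p)

def ladder(k, xP):                 # returns (X : Z) with X/Z = x([k]P)
    R0, R1 = (1, 0), (xP, 1)       # (1 : 0) represents O_E; R1 - R0 = P always
    for bit in bin(k)[2:]:
        if bit == '0':
            R1 = xADD(*R0, *R1, xP, 1)
            R0 = xDBL(*R0)
        else:
            R0 = xADD(*R0, *R1, xP, 1)
            R1 = xDBL(*R1)
    return R0

def add(P, Q):                     # naive affine arithmetic for ground truth
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return None
    if P == Q: lam = (3*x1*x1 + 2*A*x1 + 1) * pow(2*B*y1, p - 2, p) % p
    else:      lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (B * lam * lam - A - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def order(P):
    n, Q = 1, P
    while Q is not None:
        Q = add(Q, P); n += 1
    return n

cands = ((x, y) for x in range(1, p) for y in range(1, p)
         if (B*y*y - x**3 - A*x*x - x) % p == 0)
P = next(pt for pt in cands if order(pt) > 100)  # avoid tiny-order edge cases

ok, Q = True, P
for n in range(2, 30):
    Q = add(Q, P)                  # Q = [n]P by naive addition
    X, Z = ladder(n, P[0])
    ok = ok and Z != 0 and X * pow(Z, p - 2, p) % p == Q[0]
```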
9.12.2 Edwards Model
Euler and Gauss considered the genus 1 curve x² + y² = 1 − x²y² and described a group operation on its points. Edwards generalised this to a wide class of elliptic curves (we refer to [189] for details and historical discussion). Further extensions were proposed by Bernstein, Birkner, Joye, Lange and Peters (see [48] and its references). Edwards curves have several important features: they give a complete group law on E(F_q) for some fields F_q (in other words, there is a single rational map + : E × E → E that computes addition for all⁴ possible inputs in E(F_q) × E(F_q)) and the addition formulae can be implemented extremely efficiently in some cases. Hence this model for elliptic curves is very useful for many cryptographic applications.
Definition 9.12.14. Let k be a field such that char(k) ≠ 2. Let a, d ∈ k satisfy a ≠ 0, d ≠ 0 and a ≠ d. The twisted Edwards model is

    ax² + y² = 1 + dx²y².
Exercise 9.12.15. Show that a curve in twisted Edwards model is non-singular as an affine curve. Show that if any of the conditions a ≠ 0, d ≠ 0 and a ≠ d are not satisfied then the affine curve has a singular point.
Bernstein, Lange and Farashahi [55] have also formulated an Edwards model for elliptic
curves in characteristic 2.
The Weierstrass model of an elliptic curve over k (where char(k) ≠ 2) is of the form y² = F(x) and so it would be natural to write the twisted Edwards model in the form y² = (1 − ax²)/(1 − dx²). A natural formulation of the group law would then be such that the inverse of a point (x, y) is (x, −y); this leads to having identity element (x, y) = (1/√a, 0). Instead, for historical reasons, it is traditional to think of the curve as

    x² = (1 − y²)/(a − dy²).

The identity element is then (0, 1) and the inverse of (x, y) is (−x, y).
The group operation on twisted Edwards models is

    (x₁, y₁) + (x₂, y₂) = ( (x₁y₂ + x₂y₁)/(1 + dx₁x₂y₁y₂) , (y₁y₂ − ax₁x₂)/(1 − dx₁x₂y₁y₂) ).    (9.14)
This is shown to be a group law in [52, 48]. A geometric description of the Edwards group
law on the singular curve is given by Arène, Lange, Naehrig and Ritzenthaler [12]. An
inversion-free (i.e., projective) version and explicit formulae for efficient arithmetic are
given in [48].
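The addition law (9.14) can be checked by brute force over a small field. In the sketch below the parameters are illustrative choices of ours: a = 1 is a square and d = 2 is a non-square modulo 13, the situation in which the law is known to be complete.

```python
# The twisted Edwards addition law (9.14) over F_13.  The parameters are
# illustrative: a is a square and d a non-square mod p, so the formulae
# are complete (the denominators never vanish on rational points).
p = 13
a, d = 1, 2

def inv(t):
    return pow(t, p - 2, p)

def ed_add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    t = d * x1 * x2 * y1 * y2
    x3 = (x1 * y2 + x2 * y1) * inv(1 + t) % p
    y3 = (y1 * y2 - a * x1 * x2) * inv(1 - t) % p
    return (x3, y3)

# all affine points of a*x^2 + y^2 = 1 + d*x^2*y^2 over F_13
points = [(x, y) for x in range(p) for y in range(p)
          if (a * x * x + y * y - 1 - d * x * x * y * y) % p == 0]

# completeness: the denominators 1 +/- d*x1*x2*y1*y2 never vanish
assert all((1 + d * x1 * x2 * y1 * y2) % p != 0 and
           (1 - d * x1 * x2 * y1 * y2) % p != 0
           for (x1, y1) in points for (x2, y2) in points)
```

The enumeration also confirms that (0, 1) acts as the identity and that the sum of two points is again a point of the curve.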
4 Note that this is a stronger statement than the unified group law of Exercise 9.1.1, as the group law on
(twisted) Edwards curves also includes addition of a point with its inverse or the identity element. Also,
the group law on (twisted) Edwards curves achieves this with no loss of efficiency, unlike Exercise 9.1.1.
On the other hand, we should mention that the group law on (twisted) Edwards curves is never complete
for the group E(F̄_q).
(1 : √(d/a) : 0 : 0) and (1 : 0 : √d : 0)

with X_3 = 0. To see that the points at infinity on X are non-singular, set X_0 = 1 and obtain the Jacobian matrix

    ( 2aX_1   2X_2   −2X_3 )
    ( −X_2    −X_1      1  )

which is seen to have rank 2 when evaluated at the points (√(d/a), 0, 0) and (0, √d, 0).
Let (X0 : X1 : X2 : X3 ) and (Z0 : Z1 : Z2 : Z3 ) be points on X and define the values
Hence, if either ax_2 + y_2 ≠ 0 or ax_2 − y_2 ≠ 0 then one can deduce that d is a square
in k*. On the other hand, if ax_2 + y_2 = ax_2 − y_2 = 0 one deduces that x_2 = 0. Both
cases are a contradiction.
It turns out that twisted Edwards curves and Montgomery curves cover exactly the
same k-isomorphism classes of elliptic curves.
Lemma 9.12.20. Let M : By^2 = x^3 + Ax^2 + x be a Montgomery model for an elliptic
curve over k (so B ≠ 0 and A^2 ≠ 4). Define a = (A + 2)/B and d = (A − 2)/B. Then
a ≠ 0, d ≠ 0 and a ≠ d. The map (x, y) ↦ (X = x/y, Y = (x − 1)/(x + 1)) is a birational
map over k from M to the twisted Edwards curve

    E : aX^2 + Y^2 = 1 + dX^2Y^2.

Conversely, if E is as above then define A = 2(a + d)/(a − d) and B = 4/(a − d). Then
(X, Y) ↦ (x = (1 + Y)/(1 − Y), y = (1 + Y)/(X(1 − Y))) is a birational map over k from
E to M.
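The two maps of the lemma can be verified numerically. In this sketch the prime and the Montgomery coefficients (B, A) are illustrative choices of ours; we pick a point where both maps are defined and check the round trip.

```python
# A numeric check of the birational maps of Lemma 9.12.20 over F_p.
# The prime p and the Montgomery coefficients (B, A) are illustrative.
p = 101
B, A = 1, 4                      # M : B*y^2 = x^3 + A*x^2 + x, with A^2 != 4

def inv(t):
    return pow(t, p - 2, p)

a = (A + 2) * inv(B) % p         # a = (A + 2)/B
d = (A - 2) * inv(B) % p         # d = (A - 2)/B

def mont_to_ed(x, y):
    return (x * inv(y) % p, (x - 1) * inv(x + 1) % p)

def ed_to_mont(X, Y):
    x = (1 + Y) * inv(1 - Y) % p
    y = (1 + Y) * inv(X * (1 - Y)) % p
    return (x, y)

# a point on M with y != 0 and x != -1, so that both maps are defined
P = next((x, y) for x in range(p) for y in range(1, p)
         if (B * y * y - (x ** 3 + A * x * x + x)) % p == 0 and x != p - 1)
X, Y = mont_to_ed(*P)
assert (a * X * X + Y * Y - 1 - d * X * X * Y * Y) % p == 0   # image lies on E
assert ed_to_mont(X, Y) == P                                   # round trip
```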
Exercise 9.12.21. Prove Lemma 9.12.20.
The birational map in Lemma 9.12.20 is a group homomorphism. Indeed, the proofs
of the group law in [52, 49] use this birational map to transfer the group law from the
Montgomery model to the twisted Edwards model.
Exercise 9.12.22. Show that the birational map from Montgomery model to twisted
Edwards model in Lemma 9.12.20 is undefined only for points P of order dividing 2 and
P = (−1, ±√((A − 2)/B)) (which has order 4). Show that the map from Edwards model
to Montgomery model is undefined only for points P = (0, ±1) and points at infinity.
Exercise 9.12.23. Show that a non-trivial quadratic twist of the twisted Edwards model
ax^2 + y^2 = 1 + dx^2y^2 over k is aux^2 + y^2 = 1 + dux^2y^2 where u ∈ k* is a non-square.
Exercise 9.12.24. Show that if an elliptic curve E can be written in twisted Edwards
model then the only non-trivial twist of E that can also be written in twisted Edwards
model is the quadratic twist.
Example 9.12.25. The curve

    x^2 + y^2 = 1 − x^2y^2

has an automorphism ρ(x, y) = (ix, 1/y) (which fixes the identity point (0, 1)) for i = √−1.
One has ρ^2 = [−1]. Hence this curve corresponds to a twist of the Weierstrass curve
y^2 = x^3 + x having j-invariant 1728.
Example 9.12.26. Elliptic curves with CM by D = −3 (equivalently, j-invariant 0) can
only be written in Edwards model if √−3 ∈ F_q. Taking d = (√−3 + 2)/(√−3 − 2) gives the
Edwards curve

    E : x^2 + y^2 = 1 + dx^2y^2,
(X, Y) ↦ (ζ_3X + (1 − ζ_3)/√−3, Y). Then we apply the map (X, Y) ↦ (X/Y, (X − 1)/(X + 1)).
9.12.3
Exercises 9.12.27 and 9.12.29 give some details of the Jacobi quartic model.
Exercise 9.12.27. Let k be a field of characteristic not equal to 2 and let a, d ∈ k be
such that a^2 ≠ d. Show that the algebraic set

    C : y^2 = dx^4 + 2ax^2 + 1    (9.15)

is irreducible. Show that the point at infinity on C is singular and that the affine curve
is non-singular over k̄. Verify that the map (x, y) ↦ (X, Y) = (a + (y + 1)/x^2, X/x) is a
birational map over k from C to

    E : 2Y^2 = X(X^2 − 2aX + (a^2 − d)).    (9.16)

Show that if d is a square then E(k) contains E[2].
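The map of Exercise 9.12.27 is easy to test numerically. In the sketch below the prime and the coefficients a, d are illustrative choices of ours (chosen with a^2 ≠ d); we find an affine point of C with x ≠ 0 and check that its image satisfies (9.16).

```python
# A numeric check of the birational map in Exercise 9.12.27 over F_p.
# The prime and the coefficients a, d are illustrative (we need a^2 != d).
p = 101
a, d = 2, 3

def inv(t):
    return pow(t, p - 2, p)

# an affine point on C : y^2 = d*x^4 + 2*a*x^2 + 1 with x != 0
x0, y0 = next((x, y) for x in range(1, p) for y in range(p)
              if (y * y - (d * x ** 4 + 2 * a * x * x + 1)) % p == 0)

X = (a + (y0 + 1) * inv(x0 * x0)) % p
Y = X * inv(x0) % p
# the image lies on E : 2*Y^2 = X*(X^2 - 2*a*X + (a^2 - d))
assert (2 * Y * Y - X * (X * X - 2 * a * X + (a * a - d))) % p == 0
```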
9.13
There are a number of questions, relevant for cryptography, about the set of all elliptic
curves over Fq .
The theory of complex multiplication states that if |t| < 2√q and gcd(t, q) = 1
then the number of isomorphism classes of elliptic curves E over F_q with #E(F_q) =
q + 1 − t is given by the Hurwitz class number H(4q − t^2). Theorem 9.11.11 gave a
similar result for the supersingular case. As noted in Section 9.10.1, this means that
the number of F_q-isomorphism classes of elliptic curves over F_q with q + 1 − t points is
O(√D log(D) log(log(D))), where D = 4q − t^2. We now give Lenstra's bounds on the
number of F_q-isomorphism classes of elliptic curves with group orders in a subset of the
Hasse interval.
Since the number of elliptic curves in short Weierstrass form (assuming now that
2 ∤ q) that are F_q-isomorphic to a given curve E is (q − 1)/#Aut(E), it is traditional
to count the number of F_q-isomorphism classes weighted by #Aut(E) (see Section 1.4 of
Lenstra [374] for discussion and precise definitions). In other words, each F_q-isomorphism
class of elliptic curves with j(E) = 0 or j(E) = 1728 contributes less than one to the total.
This makes essentially no difference to the asymptotic statements in Theorem 9.13.1. The
weighted sum of all F_p-isomorphism classes of elliptic curves over F_p is p.
Theorem 9.13.1. (Proposition 1.9 of Lenstra [374] with the improvement of Theorem 2
of McKee [410]) There exists a constant C_1 ∈ ℝ_{>0} such that, for any prime p > 3 and
any prime l, the weighted sum of all elliptic curves E over F_p such that l | #E(F_p) is
p/(l − 1) + O(l√p) if p ≢ 1 (mod l) and pl/(l^2 − 1) + O(l√p) if p ≡ 1 (mod l). (Here the
constants in the O are independent of l and p.)
This result was generalised by Howe [294] to count curves with N | #E(Fq ) where N
is not prime.
For cryptography it is important to determine the probability that a randomly chosen
elliptic curve over F_q (i.e., choosing coefficients a_4, a_6 ∈ F_q uniformly at random) has
#E(F_q) prime. A conjectural result was given by Galbraith and McKee [224].
Conjecture 9.13.3. Let P_1 be the probability that a number within 2√p of p + 1 is prime.
Then the probability that an elliptic curve over F_p (p prime) has a prime number of points
is asymptotic to c_p P_1 as p → ∞, where

    c_p = (2/3) ∏_{l>2} ( 1 − 1/(l − 1)^2 ) ∏_{l|(p−1), l>2} ( 1 + 1/((l + 1)(l − 2)) ).

Here the products are over all primes l satisfying the stated conditions.
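The constant c_p converges quickly, so it can be approximated by truncating both products over primes l. The sketch below is ours; the truncation bound 10^4 and the sample prime are arbitrary choices.

```python
# Approximating the constant c_p of Conjecture 9.13.3 by truncating both
# products over primes l at an arbitrary bound (10^4 here).
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def c_p(p, bound=10 ** 4):
    c = 2.0 / 3.0
    for l in primes_up_to(bound):
        if l > 2:
            c *= 1.0 - 1.0 / (l - 1) ** 2          # first product, all l > 2
            if (p - 1) % l == 0:
                c *= 1.0 + 1.0 / ((l + 1) * (l - 2))  # second product, l | p - 1
    return c
```

Note that the second product only inflates c_p for primes l dividing p − 1, so primes with smooth p − 1 have a somewhat higher chance of yielding a prime-order curve.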
Galbraith and McKee also give a precise conjecture for the probability that a random
elliptic curve E over F_p has #E(F_p) = kr where r is prime and k ∈ ℕ is small.
Related problems have also been considered. For example, Koblitz [342] studies the
probability that #E(Fp ) is prime for a fixed elliptic curve E over Q as p varies. A similar
situation arises in the Sato-Tate distribution; namely the distribution on [−1, 1] arising
from (#E(F_p) − (p + 1))/(2√p) for a fixed elliptic curve E over Q as p varies. We refer to
Murty and Shparlinski [444] for a survey of other results in this area (including discussion
of the Lang-Trotter conjecture).
9.14
The elliptic curve factoring method (and some other theoretical applications in cryptography) use elliptic curves over the ring Z/NZ. When N = ∏_{i=1}^{k} p_i is square-free
one can use the Chinese remainder theorem to interpret a triple (x, y, z) such that
y^2z + a_1xyz + a_3yz^2 ≡ x^3 + a_2x^2z + a_4xz^2 + a_6z^3 (mod N) as an element of the direct sum ⊕_{i=1}^{k} E(F_{p_i}) of groups of elliptic curves over fields. It is essential to use the
projective representation, since there can be points that are the point at infinity modulo
p_1 but not the point at infinity modulo p_2 (in other words, p_1 | z but p_2 ∤ z). Considering
triples (x, y, z) such that gcd(x, y, z) = 1 (otherwise, the point modulo some prime is
(0, 0, 0)) up to multiplication by elements in (Z/NZ)* leads to a projective elliptic curve
point in E(Z/N Z). The usual formulae for the group operations can be used modulo N
and, when they are defined, give a group law. We refer to Section 2.11 of Washington [622]
for a detailed discussion, including a set of formulae for all cases of the group law. For a
more theoretical discussion we refer to Lenstra [374, 375].
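A concrete instance of the phenomenon described above can be built by hand with the Chinese remainder theorem. The curve, primes and point in this sketch are illustrative choices of ours, not from the text.

```python
# A projective triple modulo N = p1*p2 that is the point at infinity
# modulo p1 but an affine point modulo p2.  Curve and primes illustrative.
from math import gcd

p1, p2 = 5, 7
N = p1 * p2
a4, a6 = 4, 4                        # E : y^2*z = x^3 + a4*x*z^2 + a6*z^3

def on_curve(x, y, z, m):
    return (y * y * z - (x ** 3 + a4 * x * z * z + a6 * z ** 3)) % m == 0

# CRT of the point at infinity (0 : 1 : 0) mod 5 with the affine point
# (0 : 2 : 1) mod 7 (indeed 2^2 = 4 = 0^3 + 4*0 + 4 mod 7).
x, y, z = 0, 16, 15                  # 16 = CRT(1, 2), 15 = CRT(0, 1)
assert gcd(gcd(x, y), z) == 1        # not (0, 0, 0) modulo any prime
assert on_curve(x, y, z, N)
assert z % p1 == 0 and z % p2 != 0   # infinity mod p1, affine mod p2
```

This is exactly why the affine representation cannot be used: no affine pair (x, y) over Z/35Z can describe this point.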
Chapter 10
Hyperelliptic Curves
This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography
by Steven Galbraith, available from http://www.isg.rhul.ac.uk/sdg/crypto-book/ The
copyright for this chapter is held by Steven Galbraith.
This book is now completed and an edited version of it will be published by Cambridge
University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be
different in the published version.
Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes.
All feedback on the book is very welcome and will be acknowledged.
Hyperelliptic curves are a natural generalisation of elliptic curves, and it was suggested
by Koblitz [343] that they might be useful for public key cryptography. Note that there
is not a group law on the points of a hyperelliptic curve; instead we use the divisor
class group of the curve. The main goals of this chapter are to explain the geometry of
hyperelliptic curves, to describe Cantor's algorithm [118] (and variants) to compute in
the divisor class group of hyperelliptic curves, and then to state some basic properties of
the divisor class group.
Definition 10.0.1. Let k be a perfect field. Let H(x), F(x) ∈ k[x] (we stress that
H(x) and F(x) are not assumed to be monic). An affine algebraic set of the form C :
y^2 + H(x)y = F(x) is called a hyperelliptic equation. The hyperelliptic involution
ι : C → C is defined by ι(x, y) = (x, −y − H(x)).
Exercise 10.0.2. Let C be a hyperelliptic equation over k. Show that if P ∈ C(k) then
ι(P) ∈ C(k).
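Exercise 10.0.2 can be checked by brute force over a small field. The hyperelliptic equation in this sketch is an arbitrary choice of ours.

```python
# Brute-force check that the hyperelliptic involution maps points of
# C : y^2 + H(x)*y = F(x) back onto C.  Curve over F_11 is illustrative.
p = 11
H = [1, 1]                  # H(x) = 1 + x       (coefficients, low degree first)
F = [1, 0, 0, 0, 0, 1]      # F(x) = 1 + x^5

def ev(poly, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(poly)) % p

def on_curve(x, y):
    return (y * y + ev(H, x) * y - ev(F, x)) % p == 0

def iota(x, y):             # the involution (x, y) -> (x, -y - H(x))
    return (x, (-y - ev(H, x)) % p)

points = [(x, y) for x in range(p) for y in range(p) if on_curve(x, y)]
assert all(iota(*P) in points for P in points)      # iota preserves C(k)
assert all(iota(*iota(*P)) == P for P in points)    # iota is an involution
```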
When the projective closure (in an appropriate space) of the algebraic set C in Definition 10.0.1 is irreducible, of dimension 1, non-singular and of genus g ≥ 2 then we will call it
a hyperelliptic curve. By definition, a curve is projective and non-singular. We will give
conditions for when a hyperelliptic equation is non-singular. Exercise 10.1.20 will give a
projective non-singular model, but in practice one can work with the affine hyperelliptic
equation. To see the points at infinity we will move them to points on a related affine
equation, namely the curve C of equation (10.3).
The classical definition of a hyperelliptic curve (over an algebraically closed field k)
is that it is a non-singular projective irreducible curve C over k (usually of genus g ≥ 2)
with a degree 2 rational map φ : C → P^1 over k. This is equivalent to C having an
(affine) equation of the form y^2 + H(x)y = F(x) over k. When C is defined over a
non-algebraically closed field k then the existence of a rational map φ : C → P^1 over k̄
does not imply the existence of such a map over k, and so C might not have an equation
over k of this form. This subtlety does not arise when working over finite fields (to show
this, combine Theorem 10.7.4 with the Riemann-Roch theorem), hence we will define
hyperelliptic curves using a generalisation of the Weierstrass equation.
The genus has already been defined (see Definition 8.4.8) as a measure of the complexity of a curve. The treatment of the genus in this chapter is very explicit. We
will give precise conditions (Lemmas 10.1.6 and 10.1.8) that explain when the degree of
a hyperelliptic equation is minimal. From this minimal degree we define the genus. In
contrast, the approach of most other authors is to use the Riemann-Roch theorem.
We remark that one can also consider the algebraic group quotient Pic^0_{F_q}(C)/[−1] of
equivalence classes {D, −D} where D is a reduced divisor. For genus 2 curves this object
can be described as a variety, called the Kummer surface. It is beyond the scope of this
book to give the details of this case. We refer to Chapter 3 of Cassels and Flynn [123] for
background. Gaudry [243] and Gaudry and Lubicz [246] have given fast algorithms for
computing with this algebraic group quotient.
10.1
Consider the singular points on the affine curve C(x, y) = y^2 + H(x)y − F(x) = 0. The
partial derivatives are ∂C(x, y)/∂y = 2y + H(x) and ∂C(x, y)/∂x = H′(x)y − F′(x),
so a singular point in particular satisfies 2F′(x) + H(x)H′(x) = 0. If H(x) = 0 and if
the characteristic of k is not 2 then C is non-singular over k if and only if F(x) has no
repeated root in k̄.
Exercise 10.1.1. Show that the curve y^2 + H(x)y = F(x) over k has no affine singular
points if and only if one of the following conditions holds.

1. char(k) = 2 and H(x) is a non-zero constant.

2. char(k) = 2, H(x) is a non-zero polynomial and gcd(H(x), F′(x)^2 + F(x)H′(x)^2) = 1.

3. char(k) ≠ 2, H(x) = 0 and gcd(F(x), F′(x)) = 1.

4. char(k) ≠ 2, H(x) ≠ 0 and gcd(H(x)^2 + 4F(x), 2F′(x) + H(x)H′(x)) = 1 (this
applies even when H′(x) = 0 or F′(x) = 0).
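Conditions like these can be tested mechanically with polynomial arithmetic over F_p. The sketch below (our own, using dense low-degree-first coefficient lists and illustrative test curves) implements condition 3: y^2 = F(x) is non-singular exactly when gcd(F, F′) = 1.

```python
# Testing condition 3 of Exercise 10.1.1 (char(k) != 2, H = 0) over F_p:
# y^2 = F(x) has no affine singular point iff gcd(F, F') = 1.
p = 7

def trim(f):                       # drop leading zeros, reduce mod p
    while f and f[-1] % p == 0:
        f = f[:-1]
    return [c % p for c in f]

def deriv(f):
    return trim([i * c for i, c in enumerate(f)][1:])

def polymod(f, g):                 # remainder of f divided by g (g != 0)
    f, g = trim(f), trim(g)
    inv_lead = pow(g[-1], p - 2, p)
    while f and len(f) >= len(g):
        c = f[-1] * inv_lead % p
        k = len(f) - len(g)
        f = trim([a - c * (g[i - k] if 0 <= i - k else 0)
                  for i, a in enumerate(f)])
    return f

def polygcd(f, g):
    f, g = trim(f), trim(g)
    while g:
        f, g = g, polymod(f, g)
    return f

def nonsingular(F):                # the affine curve y^2 = F(x)
    return len(polygcd(F, deriv(F))) == 1   # gcd is a nonzero constant

assert nonsingular([1, 0, 0, 1])            # F = x^3 + 1: distinct roots mod 7
assert not nonsingular([0, 1, -2, 1])       # F = x*(x - 1)^2: repeated root
```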
We will now give a simple condition for when a hyperelliptic equation is geometrically
irreducible and of dimension 1.
Lemma 10.1.2. Let C(x, y) = y^2 + H(x)y − F(x) over k be a hyperelliptic equation.
Suppose that deg(F(x)) is odd. Suppose also that there is no point P = (x_P, y_P) ∈
C(k̄) such that (∂C(x, y)/∂x)(P) = (∂C(x, y)/∂y)(P) = 0. Then the affine algebraic set
V(C(x, y)) is geometrically irreducible. The dimension of V(C(x, y)) is 1.

Proof: From Theorem 5.3.10, C(x, y) = 0 is k̄-reducible if and only if C(x, y) factors over
k̄[x, y]. By considering C(x, y) as an element of k̄(x)[y] it follows that such a factorisation
must be of the form C(x, y) = (y − a(x))(y − b(x)) with a(x), b(x) ∈ k̄[x]. Since deg(F)
is odd it follows that deg(a(x)) ≠ deg(b(x)) and that at least one of a(x) and b(x) is
non-constant. Hence a(x) − b(x) is a non-constant polynomial, so let x_P ∈ k̄ be a root of
a(x) − b(x) and set y_P = a(x_P) = b(x_P) so that (x_P, y_P) ∈ C(k̄). It is then easy to check
that both partial derivatives vanish at P. Hence, under the conditions of the Lemma,
V(C(x, y)) is k̄-irreducible and so is an affine variety.
Now that V(C(x, y)) is known to be a variety we can consider the dimension. The
function field of the affine variety is k(x)(y), which is a quadratic algebraic extension of
k(x) and so has transcendence degree 1. Hence the dimension of V(C(x, y)) is 1.
The proof of Lemma 10.1.2 shows that a hyperelliptic equation y^2 + H(x)y − F(x)
corresponds to a geometrically irreducible curve as long as it does not factorise as (y −
a(x))(y − b(x)) over k̄[x]. In practice it is not hard to determine whether or not there
exist polynomials a(x), b(x) ∈ k̄[x] such that H(x) = −(a(x) + b(x)) and F(x) = −a(x)b(x).
So determining if a hyperelliptic equation is geometrically irreducible is easy.
Let H(x), F(x) ∈ k[x] be such that y^2 + H(x)y = F(x) is a non-singular affine curve.
Define D = max{deg(F(x)), deg(H(x)) + 1}. The projective closure of C in P^2 is given
by

    y^2z^{D−2} + z^{D−1}H(x/z)y = z^D F(x/z).    (10.1)
Exercise 10.1.3. Show that if D > 2 then there are at most two points at infinity on
the curve of equation (10.1). Show further that if D > 3 and deg(F ) > deg(H) + 1 then
there is a unique point (0 : 1 : 0) at infinity, which is a singular point.
In Definition 10.1.15 we will define the genus of a hyperelliptic curve in terms of the
degree of the hyperelliptic equation. To do this it will be necessary to have conditions
that ensure that this degree is minimal. Example 10.1.4 and Exercise 10.1.5 show how a
hyperelliptic equation that is a variety can be isomorphic to an equation of significantly
lower degree (remember that isomorphism is only defined for varieties).
Example 10.1.4. The curve y^2 + xy = x^{200} + x^{101} + x^3 + 1 over F_2 (which is irreducible
and non-singular) is isomorphic over F_2 to the curve Y^2 + xY = x^3 + 1 via the map
(x, y) ↦ (x, y + x^{100}).
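The substitution behind this example can be verified symbolically with GF(2)[x] arithmetic on bitmasks (our own minimal sketch): in characteristic 2, substituting y = Y + c into y^2 + xy produces Y^2 + xY plus the correction c^2 + xc, and with c = x^100 that correction absorbs exactly the terms x^200 + x^101.

```python
# GF(2)[x] represented as Python ints (bit i = coefficient of x^i).
def sq(f):                        # squaring over GF(2) spreads the bits
    r, i = 0, 0
    while f:
        if f & 1:
            r |= 1 << (2 * i)
        f >>= 1
        i += 1
    return r

def mul(f, g):                    # carry-less (XOR) multiplication
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

x = 0b10                          # the polynomial x
c = 1 << 100                      # c = x^100
F_old = (1 << 200) | (1 << 101) | (1 << 3) | 1    # x^200 + x^101 + x^3 + 1
F_new = (1 << 3) | 1                              # x^3 + 1
# y^2 + x*y = F_old  becomes  Y^2 + x*Y = F_old + c^2 + x*c, which is F_new
assert F_old ^ sq(c) ^ mul(x, c) == F_new
```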
Exercise 10.1.5. Let k be any field. Show that the affine algebraic variety y^2 + (1 −
2x^3)y = −x^6 + x^3 + x + 1 is isomorphic to a variety having an equation of total degree 2.
Show that the resulting curve has genus 0.
Lemma 10.1.6. Let k be a perfect field of characteristic 2 and h(x), f(x) ∈ k[x]. Suppose
the hyperelliptic equation C : y^2 + h(x)y = f(x) is a variety. Then it is isomorphic over
k to Y^2 + H(x)Y = F(x) where one of the following conditions holds:

1. deg(F(x)) > 2 deg(H(x)) and deg(F(x)) is odd;

2. deg(F(x)) = 2 deg(H(x)) = 2d and the equation u^2 + H_du + F_{2d} = 0 has no solution in
k (where H(x) = H_dx^d + H_{d−1}x^{d−1} + ⋯ + H_0 and F(x) = F_{2d}x^{2d} + ⋯ + F_0);

3. deg(F(x)) < deg(H(x)).
Proof: Let d_H = deg(H(x)) and d_F = deg(F(x)). The change of variables y = Y + cx^i
transforms y^2 + H(x)y = F(x) to Y^2 + H(x)Y = F(x) + c^2x^{2i} + H(x)cx^i. Hence,
if deg(F(x)) > 2 deg(H(x)) and deg(F(x)) is even then one can remove the leading
coefficient by choosing i = deg(F(x))/2 and c = √F_{2i} (remember that char(k) = 2 and k
is perfect so c ∈ k). Similarly, if deg(H(x)) ≤ j = deg(F(x)) < 2 deg(H(x)) then one can
remove the leading coefficient F_jx^j from F by taking i = j − deg(H(x)) and c = F_j/H_{d_H}.
Repeating these processes yields the first and third claims. The second case follows easily.
Note that in the second case in Lemma 10.1.6 one can lower the degree using a k̄-isomorphism. Hence, geometrically (i.e., over k̄) one can assume that a hyperelliptic
equation is of the form of case 1 or 3.
10.1.1
For the rest of the chapter we will assume that our hyperelliptic equations are k-irreducible
and non-singular as affine algebraic sets. We also assume that when char(k) = 2 one of
the conditions of Lemma 10.1.6 holds and when char(k) 6= 2 one of the conditions of
Lemma 10.1.8 holds. The interpretation of deg(H(x)) and deg(F (x)) in terms of the
genus of the curve will be discussed in Section 10.1.3.
There are several ways to write down a non-singular projective model for a hyperelliptic curve. The simplest is to use weighted projective space.
Definition 10.1.10. Let k be a perfect field and H(x), F(x) ∈ k[x]. Let C : y^2 +
H(x)y = F(x) be a hyperelliptic equation. Write H_j for the coefficients of H(x) and
F_j for the coefficients of F(x). Define d_H = deg(H(x)) and d_F = deg(F(x)). Let
d = max{d_H, ⌈d_F/2⌉} and suppose d > 0. Set H_d = ⋯ = H_{d_H+1} = 0 and F_{2d} = ⋯ =
F_{d_F+1} = 0 if necessary.

The weighted projective hyperelliptic equation is the equation

    Y^2 + (H_dX^d + H_{d−1}X^{d−1}Z + ⋯ + H_0Z^d)Y = F_{2d}X^{2d} + F_{2d−1}X^{2d−1}Z + ⋯ + F_0Z^{2d}    (10.2)

in weighted projective space where X and Z have weight 1 and Y has weight d.
Points (x, y) on the affine equation correspond to points (x : y : 1) on the weighted
projective equation. If the original affine algebraic set is non-singular then the corresponding points on the weighted projective model are also non-singular (since singularity
is a local property). The map ι on C extends to ι(X : Y : Z) = (X : −Y − H(X, Z) : Z)
where H(X, Z) is the degree d homogenisation of H(x). Points with Z = 0 correspond to
the points at infinity. Lemma 10.1.11 shows that there are at most two points at infinity
on this equation and that they are not singular on this equation.
Lemma 10.1.11. The points at infinity on equation (10.2) are of the form (1 : γ :
0) where γ ∈ k̄ satisfies γ^2 + H_dγ − F_{2d} = 0. If the conditions of Lemma 10.1.6 or
Lemma 10.1.8 hold as appropriate, then the points at infinity are non-singular.
Proof: Let Z = 0. If X = 0 then Y = 0 (which is not a projective point) so we may
assume that X = 1. The points at infinity are therefore as claimed.

To study non-singularity, make the problem affine by setting X = 1. The equation is

    C : Y^2 + (H_d + H_{d−1}Z + ⋯ + H_0Z^d)Y = F_{2d} + F_{2d−1}Z + ⋯ + F_0Z^{2d}.    (10.3)
Definition 10.1.15. Let k be a perfect field. Let H(x), F(x) ∈ k[x] be such that:
deg(H(x)) ≥ 3 or deg(F(x)) ≥ 5; the affine hyperelliptic equation y^2 + H(x)y = F(x) is
k-irreducible and non-singular; and the conditions of Lemma 10.1.6 and Lemma 10.1.8 hold.
The non-singular projective curve of equation (10.2) is called a hyperelliptic curve.
The genus of the hyperelliptic curve is g = max{deg(H(x)) − 1, ⌊(deg(F(x)) − 1)/2⌋} (see
Section 10.1.3 for justification of this).
It looks like Definition 10.1.15 excludes some potentially interesting equations (such
as y 2 + H(x)y = F (x) where deg(F (x)) = 4 and deg(H(x)) = 2). In fact, it can be shown
that all the algebraic sets excluded by the definition are either k-reducible, singular over
k, or birational over k to a curve of genus 0 or 1 over k.
The equation γ^2 + H_dγ − F_{2d} = 0 in Lemma 10.1.11 can have a k-rational repeated
root, two roots in k, or two conjugate roots in k̄. It follows that there are three possible
behaviours at infinity: a single k-rational point, two distinct k-rational points, or a pair
of distinct points defined over a quadratic extension of k (which are Galois conjugates).
These three cases correspond to the fact that the place at infinity in k[x] is ramified, split
or inert respectively in the field extension k(C)/k(x). A natural terminology for the three
types of behaviour at infinity is therefore to call them ramified, split and inert.
Definition 10.1.16. Let C be a hyperelliptic curve as in Definition 10.1.15. We denote
the points at infinity on the associated hyperelliptic curve by ∞⁺ = (1 : γ⁺ : 0) and
∞⁻ = (1 : γ⁻ : 0) (when there is only one point, set ∞ = ∞⁺ = ∞⁻ = (1 : γ : 0)). If
there is a single point at infinity then equation (10.2) is called a ramified model of a
hyperelliptic curve. If there are two distinct points at infinity then when γ⁺, γ⁻ ∈ k
equation (10.2) is called a split model of a hyperelliptic curve and when γ⁺, γ⁻ ∉ k
it is an inert model of a hyperelliptic curve.
One finds in the literature the names imaginary hyperelliptic curve (respectively, real hyperelliptic curve) for ramified model and split model respectively. Exercise 10.1.18 classifies ramified hyperelliptic models. Exercise 10.1.19 shows that if
C(k) ≠ ∅ then one may transform C into a ramified or split model. Hence, when working
over finite fields, it is not usually necessary to deal with curves having an inert model.
Exercise 10.1.17. With notation as in Definition 10.1.16 show that ι(∞⁺) = ∞⁻.
Exercise 10.1.18. Let C : y^2 + H(x)y = F(x) be a hyperelliptic curve over k satisfying
all the conditions above. Let d = max{deg(H(x)), ⌈deg(F(x))/2⌉}. Show that this is a
ramified model if and only if (deg(H(x)) < d and deg(F(x)) = 2d − 1) or (char(k) ≠ 2,
deg(F(x)) = 2d, deg(H(x)) = d and F_{2d} = −(H_d/2)^2).
Exercise 10.1.19. Let C : y^2 + H(x)y = F(x) be a hyperelliptic curve over k and let
P ∈ C(k). Define the rational map

    ρ_P(x, y) = (1/(x − x_P), y/(x − x_P)^d).

Then ρ_P : C → C′ where C′ is also a hyperelliptic curve. Show that ρ_P is the
translation map sending P to (0, y_P) followed by the map (x, y) ↦ (1/x, y/x^d) and so is an isomorphism from C to
C′.

Show that if P = ι(P) then C is birational over k (using ρ_P) to a hyperelliptic curve
with ramified model. Show that if P ≠ ι(P) then C is birational over k to a hyperelliptic
curve with split model.
1. Give a birational map (assuming for the moment that the above model is a variety)
between the affine algebraic set C and the model of equation (10.4).
2. Show that the hyperelliptic involution extends to equation (10.4) as

    ι(Y : X_d : ⋯ : X_0) = (−Y − H_dX_d − H_{d−1}X_{d−1} − ⋯ − H_0X_0 : X_d : ⋯ : X_0).
3. Show that the points at infinity on equation (10.4) satisfy X_0 = X_1 = X_2 = ⋯ =
X_{d−1} = 0 and Y^2 + H_dX_dY − F_{2d}X_d^2 = 0. Show that if F_{2d} = H_d = 0 then there is
a single point at infinity.
4. Show that if the conditions of Lemma 10.1.6 or Lemma 10.1.8 hold then equation (10.4) is non-singular at infinity.
5. Show that equation (10.4) is a variety.
10.1.2
The aim of this section is to determine uniformizers for all points on hyperelliptic curves. We begin in Lemma 10.1.21 by determining uniformizers for the affine points of a hyperelliptic curve.
Lemma 10.1.21. Let P = (x_P, y_P) ∈ C(k̄) be a point on a hyperelliptic curve. If
P = ι(P) then (y − y_P) is a uniformizer at P (and v_P(x − x_P) = 2). If P ≠ ι(P) then
(x − x_P) is a uniformizer at P.
Proof: We have

    (y − y_P)(y + y_P + H(x)) = y^2 + H(x)y − (y_P^2 + H(x)y_P)
                              = (F(x) − F(x_P)) − (H(x) − H(x_P))y_P.

Now, use the general fact for any polynomial that F(x) ≡ F(x_P) + (x − x_P)F′(x_P) (mod (x −
x_P)^2). Hence, the above expression is congruent modulo (x − x_P)^2 to

    (x − x_P)(F′(x_P) − y_PH′(x_P)) (mod (x − x_P)^2).

When P = ι(P) then (y − y_P)(y + (y_P + H(x_P))) = (y − y_P)^2. Note also that F′(x_P) −
y_PH′(x_P) is not zero since 2y_P + H(x_P) = 0 and yet C is not singular. Writing G(x, y) =
(y − y_P)^2/(x − x_P) ∈ k̄[x, y] we have G(x_P, y_P) ≠ 0 and

    x − x_P = (y − y_P)^2 · (1/G(x, y)).
10.1.3
In Lemma 10.1.6 and Lemma 10.1.8 we showed that some hyperelliptic equations y^2 +
h(x)y = f(x) are birational to hyperelliptic equations y^2 + H(x)y = F(x) with deg(F(x)) <
deg(f(x)) or deg(H(x)) < deg(h(x)). Hence, it is natural to suppose that the geometry
of the curve C imposes a lower bound on the degrees of the polynomials H(x) and F(x)
in its curve equation. The right measure of the complexity of the geometry is the genus.
Indeed, the Riemann-Roch theorem implies that if C is a hyperelliptic curve over k
of genus g and there is a function x ∈ k(C) of degree 2 then C is birational over k to
an equation of the form y^2 + H(x)y = F(x) with deg(H(x)) ≤ g + 1 and deg(F(x)) ≤
2g + 2. Furthermore, the Hurwitz genus formula shows that if y^2 + H(x)y = F(x) is
non-singular and with degrees reduced as in Lemma 10.1.6 and Lemma 10.1.8 then the
genus is max{deg(H(x)) − 1, ⌈deg(F(x))/2⌉ − 1}. (Theorem 8.7.3, as it is stated, cannot
be applied for hyperelliptic curves in characteristic 2, but a more general version of the
Hurwitz genus formula proves the above statement about the genus.) Hence, writing
d = g + 1, the conditions of Lemma 10.1.6 and Lemma 10.1.8 together with

    deg(H(x)) = d    or    2d − 1 ≤ deg(F(x)) ≤ 2d    (10.5)
10.2
We consider maps between hyperelliptic curves in this section. We are generally interested
in isomorphisms over k rather than just k̄.
In the elliptic curve case (see Section 9.3) there was no loss of generality by assuming
that isomorphisms fix infinity (since any isomorphism can be composed with a translation
map). Since the points on a hyperelliptic curve do not, in general, form a group, one can
no longer make this assumption. Nevertheless, many researchers have restricted attention
to the special case of maps between curves that map points at infinity (with respect to an
affine model of the domain curve) to points at infinity on the image curve. Theorem 10.2.1
classifies this special case.
In this chapter, and in the literature as a whole, isomorphisms are usually not assumed
to fix infinity. For example, the isomorphism P defined earlier in Exercise 10.1.19 does
not fix infinity. Isomorphisms that map points at infinity to points at infinity map ramified
models to ramified models and unramified models to unramified models.
Theorem 10.2.1. Let C_1 : y_1^2 + H_1(x_1)y_1 = F_1(x_1) and C_2 : y_2^2 + H_2(x_2)y_2 = F_2(x_2) be
hyperelliptic curves over k of genus g. Then every isomorphism φ : C_1 → C_2 over k that
maps points at infinity of C_1 to points at infinity of C_2 is of the form

    φ(x_1, y_1) = (ux_1 + r, wy_1 + t(x_1))

where u, w ∈ k*, r ∈ k and t ∈ k[x_1]. If C_1 and C_2 have ramified models then deg(t) ≤ g. If
C_1 and C_2 have split or inert models then deg(t) ≤ g + 1 and the leading coefficient of
t(x_1) is not equal to the leading coefficient of wG⁺(x_1) or wG⁻(x_1) (where G⁺ and
G⁻ are as in Exercise 10.1.28).
Proof: (Sketch) The proof is essentially the same as the proof of Proposition 3.1(b) of
Silverman [560]; one can also find the ramified case in Proposition 1.2 of Lockhart [389].
One notes that the valuations at infinity of x_1 and x_2 have to agree, and similarly for y_1
and y_2. It follows that x_2 lies in the same Riemann-Roch spaces as x_1, and similarly for y_2
and y_1. The result follows (the final conditions are simply that the valuations at infinity
of y_1 and y_2 must agree, so we are prohibited from setting y_2 = w(y_1 + t(x_1)) in such a way that it
lowers the valuation of y_2).
We now introduce quadratic twists in the special case of finite fields. As mentioned
in Example 9.5.2, when working in characteristic zero there are infinitely many quadratic
twists.
Definition 10.2.2. Let C : y^2 = F(x) be a hyperelliptic curve over a finite field k where
char(k) ≠ 2. Let u ∈ k* be a non-square (i.e., there is no v ∈ k such that u = v^2) and
define C^(u) : y^2 = uF(x).

Let C : y^2 + H(x)y = F(x) be a hyperelliptic curve over a finite field k where char(k) =
2. Let u ∈ k be such that Tr_{k/F_2}(u) = 1. Define C^(u) : y^2 + H(x)y = F(x) + uH(x)^2.

In both cases the k-isomorphism class of the curve C^(u) is called the non-trivial
quadratic twist of C.
Exercise 10.2.3. Show that the quadratic twist is well-defined when k is a finite field.
In other words, show that in the case char(k) ≠ 2 if u and u′ are two different non-squares
in k then the corresponding curves C^(u) and C^(u′) as in Definition 10.2.2 are isomorphic
over k. Similarly, when char(k) = 2 and for two different choices of trace one elements
u, u′ ∈ k, show that the corresponding curves C^(u) and C^(u′) are isomorphic over k.
Exercise 10.2.4. Let C be a hyperelliptic curve over a finite field k and let C (u) be a
non-trivial quadratic twist. Show that #C(Fq ) + #C (u) (Fq ) = 2(q + 1).
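Exercise 10.2.4 can be confirmed by direct counting. The sketch below uses an illustrative genus 2 curve y^2 = F(x) with deg F = 5 (a ramified model, so each curve contributes a single point at infinity); the curve and non-square are our own choices.

```python
# Check #C(F_q) + #C^(u)(F_q) = 2*(q + 1) for an illustrative genus 2
# curve C : y^2 = F(x) over F_11 with deg(F) = 5.
q = 11
def F(x):
    return (x ** 5 + 3 * x + 1) % q

u = 2   # a non-square modulo 11 (the squares are 1, 3, 4, 5, 9)

def count(twist):
    # affine points of y^2 = twist*F(x), plus the one point at infinity
    affine = sum(1 for x in range(q) for y in range(q)
                 if (y * y - twist * F(x)) % q == 0)
    return affine + 1

assert count(1) + count(u) == 2 * (q + 1)
```

The identity holds because for each x the numbers of y-solutions on C and on C^(u) always total 2, whether F(x) is zero, a square or a non-square.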
Exercise 10.2.5. Let C : y^2 = F(x) be a hyperelliptic curve of genus g over k (where
char(k) ≠ 2). Show that C is isomorphic over k̄ to a curve of the form

    Y^2 = X(X − 1)(X − a_1)(X − a_2)⋯(X − a_{2g−1})

for some a_1, a_2, . . . , a_{2g−1} ∈ k̄.
Exercise 10.2.5 indicates that one generally needs 2g − 1 values to specify a hyperelliptic
curve of genus g (in fancy terminology: the moduli space of genus g hyperelliptic curves
has dimension 2g − 1). It is natural to seek an analogue of the j-invariant for hyperelliptic
curves (i.e., some parameters j_1, . . . , j_{2g−1} associated with each curve C such that C_1 is
isomorphic over k to C_2 if and only if the corresponding values j_1, . . . , j_{2g−1} are equal).
Such values have been given by Igusa in the case of genus 2 curves and Shioda [546] for
genus 3 curves. It is beyond the scope of this book to present and explain them. We refer
to Igusa [303] and Section 5.1.6 of [16] for details of the Igusa invariants.
A natural problem (analogous to Exercise 9.3.7 for the case of elliptic curves) is to write
down a genus 2 curve corresponding to a given triple of values for the Igusa invariants.
Mestre [417] has given an algorithm to do this for curves over finite fields1 (for details
also see Section 7 of Weng [624]).
We now consider automorphisms. Define Aut(C) to be the set of all isomorphisms
φ : C → C over k̄. As usual, Aut(C) is a group under composition.
Proof: The result follows from Theorem III.7.3 of Farkas and Kra [199] (which uses the
notion of Weierstrass points) and Corollaries 2 and 3 on page 102 of [199].
Exercise 10.2.7. Prove Lemma 10.2.6 in the special case of automorphisms that map
points at infinity to points at infinity. Show that, in this case, the induced map on P^1 has no denominator.
Example 10.2.8. Let p > 2 be a prime and C : y^2 = x^p − x over F_p. For a ∈ F_p^*, b ∈ F_p
one has isomorphisms

    α_a(x, y) = (ax, √a y)    and    β_b(x, y) = (x + b, y)

from C to itself (in both cases they fix the point at infinity). Hence, the subgroup of
Aut(C) consisting of maps that fix infinity is a group of at least 2p(p − 1) elements.
There is also the birational map γ(x, y) = (−1/x, y/x^{(p+1)/2}) that corresponds to an
isomorphism γ : C → C on the projective curve. This morphism does not fix infinity.
Since all the compositions β_b ◦ γ ◦ β_{b′} ◦ α_a are distinct one has 2p^2(p − 1) isomorphisms
of this form. Hence, Aut(C) has size at least 2p(p − 1) + 2p^2(p − 1) = 2p(p + 1)(p − 1).
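The maps of this example can be checked by brute force for p = 5, working in F_{p^2} so that the square root needed by α_a exists even for non-square a. The field construction and all names below are our own illustrative choices.

```python
# Brute-force check of the maps in Example 10.2.8 for p = 5, in
# F_25 = F_5(t) with t^2 = 2 (2 is a non-square mod 5).
p, NS = 5, 2

def fadd(u, v): return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)
def fneg(u):    return ((-u[0]) % p, (-u[1]) % p)
def fmul(u, v):
    return ((u[0] * v[0] + NS * u[1] * v[1]) % p,
            (u[0] * v[1] + u[1] * v[0]) % p)
def fpow(u, n):
    r = (1, 0)
    for _ in range(n):
        r = fmul(r, u)
    return r
def finv(u):
    return fpow(u, p * p - 2)

def on_curve(x, y):                 # y^2 = x^p - x
    return fadd(fpow(y, 2), fneg(fadd(fpow(x, p), fneg(x)))) == (0, 0)

pts = [((x0, x1), (y0, y1)) for x0 in range(p) for x1 in range(p)
       for y0 in range(p) for y1 in range(p)
       if on_curve((x0, x1), (y0, y1))]

a, r = (2, 0), (0, 1)               # r = sqrt(a): t^2 = 2 = a
assert all(on_curve(fmul(a, x), fmul(r, y)) for x, y in pts)       # alpha_a
assert all(on_curve(fadd(x, (1, 0)), y) for x, y in pts)           # beta_1
assert all(on_curve(fneg(finv(x)), fmul(y, finv(fpow(x, 3))))
           for x, y in pts if x != (0, 0))                         # gamma, (p+1)/2 = 3
```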
Exercise 10.2.9. Let p > 2 be a prime and C : y 2 = xp x + 1 over Fp . Show that the
subgroup of Aut(C) consisting of automorphisms that fix infinity has order 2p.
Exercise 10.2.10. Let p > 2 be a prime and C : y^2 = x^n + 1 over F_p with n ≠ p
(when n = p the equation is singular). Show that the subgroup of Aut(C) consisting of
automorphisms that fix infinity has order 2n.
We now give the important Hurwitz-Roquette theorem, which bounds the size of the
automorphism group.
Theorem 10.2.11. (Hurwitz-Roquette) Let C be a curve of genus g over a field k such
that char(k) > g + 1 and such that C is not isomorphic to the curve of Example 10.2.8.
Then #Aut(C) ≤ 84(g − 1).
Proof: The case char(k) = 0 is Exercise IV.2.5 of Hartshorne [277] and the general case
is due to Roquette [498].
Stichtenoth [583] has given the bound #Aut(C) ≤ 16g^4, which applies even when char(k) ≤ g + 1, for all curves C except the Hermitian curve y^q + y = x^{q+1}.
We refer to Chapter 2 of Gaudry's thesis [242] for a classification of Aut(C) when the genus is two. There are many challenges in determining and classifying Aut(C) for hyperelliptic curves; we do not attempt a complete analysis of the literature.
Exercise 10.2.12. Let p ≡ 1 (mod 8) and let C : y^2 = x^5 + Ax over F_p. Write ζ_8 ∈ F̄_p for a primitive 8-th root of unity. Show that ζ_8 ∈ F_p. Show that φ(x, y) = (ζ_8^2 x, ζ_8 y) is an automorphism of C. Show that φ^4 = ι, the hyperelliptic involution.
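A numerical sanity check of Exercise 10.2.12 (the choices p = 17 and A = 3 below are illustrative assumptions, not from the text):

```python
# Check that phi(x, y) = (zeta8^2 * x, zeta8 * y) is an automorphism of
# C: y^2 = x^5 + A*x over F_p, and that phi^4 is the hyperelliptic
# involution (x, y) -> (x, -y).
p, A = 17, 3                           # 17 = 1 (mod 8); A is arbitrary
zeta8 = next(z for z in range(2, p)
             if pow(z, 8, p) == 1 and pow(z, 4, p) != 1)  # order exactly 8

def phi(pt):
    x, y = pt
    return (zeta8 * zeta8 * x % p, zeta8 * y % p)

curve = {(x, y) for x in range(p) for y in range(p)
         if (y * y - (pow(x, 5, p) + A * x)) % p == 0}

assert pow(zeta8, 4, p) == p - 1                         # zeta8^4 = -1
assert all(phi(pt) in curve for pt in curve)             # phi maps C to C
assert all(phi(phi(phi(phi(pt)))) == (pt[0], (-pt[1]) % p) for pt in curve)
```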
10.3 Effective Affine Divisors on Hyperelliptic Curves
This section is about how to represent effective divisors on affine hyperelliptic curves,
and algorithms to compute with them. A convenient way to represent divisors is using
Mumford representation, and this is only possible if the divisor is semi-reduced.
Definition 10.3.1. Let C be a hyperelliptic curve over k and denote by C ∩ A^2 the affine curve. An effective affine divisor on C is

D = Σ_{P ∈ (C ∩ A^2)(k̄)} n_P (P)

where n_P ≥ 0 for all P and only finitely many n_P are non-zero.
10.3.1 Mumford Representation
u(x) = ∏_{i=1}^{l} (x − x_i)^{e_i} ∈ k[x].

Then there is a unique polynomial v(x) ∈ k[x] such that deg(v(x)) < deg(u(x)), v(x_i) = y_i for all 1 ≤ i ≤ l, and

v(x)^2 + H(x)v(x) − F(x) ≡ 0 (mod u(x)).     (10.6)
Exercise 10.3.8. Let u(x), v(x) ∈ k[x] be such that equation (10.6) holds. Let D be the corresponding semi-reduced divisor. Show that

D = Σ_{P ∈ (C ∩ A^2)(k̄)} min{v_P(u(x)), v_P(y − v(x))}(P).

This is called the greatest common divisor of div(u(x)) and div(y − v(x)) and is denoted div(u(x), y − v(x)).
Exercise 10.3.9. Let (u_1(x), v_1(x)) and (u_2(x), v_2(x)) be the Mumford representations of two semi-reduced divisors D_1 and D_2. Show that if gcd(u_1(x), u_2(x)) = 1 then Supp(D_1) ∩ Supp(D_2) = ∅.
Lemma 10.3.10. Let C be a hyperelliptic curve over k and let D be a semi-reduced divisor on C with Mumford representation (u(x), v(x)). Let σ ∈ Gal(k̄/k).

1. σ(D) is semi-reduced.
2. The Mumford representation of σ(D) is (σ(u(x)), σ(v(x))).
3. D is defined over k if and only if u(x), v(x) ∈ k[x].
Exercise 10.3.11. Prove Lemma 10.3.10.
Exercise 10.3.8 shows that the Mumford representation of a semi-reduced divisor D is
natural from the point of view of principal divisors. This explains why condition (10.6)
is the natural definition for the Mumford representation. There are two other ways to
understand condition (10.6). First, the divisor D corresponds to an ideal in the ideal
class group of the affine coordinate ring k[x, y] and condition (10.6) shows this ideal
is equal to the k[x, y]-ideal (u(x), y − v(x)). Second, from a purely algorithmic point
of view, condition (10.6) is needed to make the Cantor reduction algorithm work (see
Section 10.3.3).
A divisor class contains infinitely many divisors whose affine part is semi-reduced.
Later we will define a reduced divisor to be one whose degree is sufficiently small. One
can then consider whether there is a unique such representative of the divisor class. This
issue will be considered in Lemma 10.3.24 below.
Exercise 10.3.12 is relevant for the index calculus algorithms on hyperelliptic curves
and it is convenient to place it here.
Exercise 10.3.12. A semi-reduced divisor D defined over k with Mumford representation (u(x), v(x)) is said to be a prime divisor if the polynomial u(x) is irreducible over k. Show that if D is not a prime divisor, then D can be efficiently expressed as a sum of prime divisors by factoring u(x). More precisely, show that if u(x) = ∏_i u_i(x)^{c_i} is the complete factorization of u(x) over k, then D = Σ_i c_i div(u_i(x), y − v_i(x)) where v_i(x) = v(x) mod u_i(x).
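In the simplest situation u(x) splits into linear factors over F_p and the decomposition can be sketched with brute-force root finding (the general case needs genuine polynomial factorization over F_q, e.g. Cantor-Zassenhaus; everything below is an illustration):

```python
# Decompose D = div(u, y - v) into prime divisors when u(x) splits into
# linear factors over F_p.  Polynomials are coefficient lists, lowest
# degree first.
p = 11

def polyeval(f, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(f)) % p

def divide_linear(u, r):
    # exact division of u(x) by (x - r), assuming u(r) = 0
    n = len(u) - 1
    q = [0] * n
    q[n - 1] = u[n] % p
    for i in range(n - 1, 0, -1):
        q[i - 1] = (u[i] + r * q[i]) % p
    return q

def prime_decomposition(u, v):
    # returns [((x_i, y_i), c_i)] with D = sum_i c_i * div(x - x_i, y - y_i)
    out = []
    for r in range(p):
        c = 0
        while len(u) > 1 and polyeval(u, r) == 0:
            u = divide_linear(u, r)
            c += 1
        if c:
            out.append(((r, polyeval(v, r)), c))   # v_i = v mod (x - x_i)
    return out

# u(x) = x(x - 1)^2 = x^3 - 2x^2 + x,  v(x) = 3x + 5
u = [0, 1, (-2) % p, 1]
v = [5, 3]
print(prime_decomposition(u, v))   # [((0, 5), 1), ((1, 8), 2)]
```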
10.3.2 Addition and Semi-Reduction of Divisors

The generalisation of Cantor's algorithm to all hyperelliptic curves was given by Koblitz [343].
s_3(x_P) = 1/(2y_P + H(x_P)),

where G(x) = (H(x) − H(x_P) − H′(x_P)(x − x_P))/(x − x_P)^2. In other words, we have H(x) = H(x_P) + (x − x_P)H′(x_P) + (x − x_P)^2 G(x). Furthermore, note that

v(x) ≡ s_1(x)(x − x_P)y_P + s_3(x)(y_P^2 + F(x)) (mod (x − x_P)^2).
The core of Cantor's addition and semi-reduction algorithm is to decide which functions (x − x_P) are needed (and to which powers) to obtain a semi-reduced divisor equivalent to D_1 + D_2. The crucial observation is that if P is in the support of D_1 and ι(P) is in the support of D_2 then (x − x_P) | u_1(x), (x − x_P) | u_2(x) and v_1(x_P) = −v_2(x_P) − H(x_P), and so (x − x_P) | (v_1(x) + v_2(x) + H(x)). The exact formulae are given in Theorem 10.3.14. The process is called Cantor's addition algorithm or Cantor's composition algorithm.
Theorem 10.3.14. Let (u_1(x), v_1(x)) and (u_2(x), v_2(x)) be Mumford representations of two semi-reduced divisors D_1 and D_2. Let s(x) = gcd(u_1(x), u_2(x), v_1(x) + v_2(x) + H(x)) and let s_1(x), s_2(x), s_3(x) ∈ k[x] be such that

s(x) = s_1(x)u_1(x) + s_2(x)u_2(x) + s_3(x)(v_1(x) + v_2(x) + H(x)).

Define u_3(x) = u_1(x)u_2(x)/s(x)^2 and

v_3(x) = (s_1(x)u_1(x)v_2(x) + s_2(x)u_2(x)v_1(x) + s_3(x)(v_1(x)v_2(x) + F(x)))/s(x).   (10.7)

Then u_3(x), v_3(x) ∈ k[x] and the Mumford representation of the semi-reduced divisor D equivalent to D_1 + D_2 is (u_3(x), v_3(x)).
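Theorem 10.3.14 can be transcribed directly into code. The following sketch over F_p (an illustration using the example curve y^2 = x^5 + 1 over F_7; not an optimised implementation) composes two degree-one divisors and verifies that the output satisfies the Mumford condition (10.6):

```python
# Cantor composition (Theorem 10.3.14) for C: y^2 + H(x) y = F(x) over F_p.
# Polynomials are coefficient lists, lowest degree first.
p = 7
H = [0]                       # illustrative curve: y^2 = x^5 + 1 over F_7
F = [1, 0, 0, 0, 0, 1]

def trim(f):
    f = [c % p for c in f]
    while len(f) > 1 and f[-1] == 0:
        f = f[:-1]
    return f

def add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def neg(f):
    return [(-c) % p for c in f]

def mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def pdivmod(f, g):
    f, g = trim(f), trim(g)
    q, inv = [0] * max(1, len(f) - len(g) + 1), pow(g[-1], p - 2, p)
    while len(f) >= len(g) and f != [0]:
        c, d = f[-1] * inv % p, len(f) - len(g)
        q[d] = c
        f = trim([a - b for a, b in zip(f, [0] * d + [c * x for x in g])])
    return trim(q), f

def xgcd(f, g):
    # extended Euclid: returns (d, s, t) with s*f + t*g = d
    r0, r1, s0, s1, t0, t1 = trim(f), trim(g), [1], [0], [0], [1]
    while r1 != [0]:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, neg(mul(q, s1)))
        t0, t1 = t1, add(t0, neg(mul(q, t1)))
    return r0, s0, t0

def compose(u1, v1, u2, v2):
    w = add(add(v1, v2), H)
    d1, a, b = xgcd(u1, u2)               # a*u1 + b*u2 = d1
    s, c, s3 = xgcd(d1, w)                # c*d1 + s3*w = s
    s1, s2 = mul(c, a), mul(c, b)
    lam = pow(s[-1], p - 2, p)            # normalise s to be monic
    s, s1, s2, s3 = ([x * lam % p for x in f] for f in (s, s1, s2, s3))
    u3, r = pdivmod(mul(u1, u2), mul(s, s))
    assert r == [0]
    num = add(add(mul(mul(s1, u1), v2), mul(mul(s2, u2), v1)),
              mul(s3, add(mul(v1, v2), F)))
    v3, r = pdivmod(num, s)
    assert r == [0]
    return u3, pdivmod(v3, u3)[1]

# compose D1 = ((1,3)) and D2 = ((0,1)): u3 = x^2 - x, v3 = 2x + 1
u3, v3 = compose([6, 1], [3], [0, 1], [1])
assert (u3, v3) == ([0, 6, 1], [1, 2])
# the result satisfies the Mumford condition (10.6)
assert pdivmod(add(add(mul(v3, v3), mul(H, v3)), neg(F)), u3)[1] == [0]
```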
10.3.3 Reduction of Divisors
Suppose we have an affine effective divisor D with Mumford representation (u(x), v(x)). We wish to obtain an equivalent divisor (affine and effective) whose Mumford representation has u(x) of low degree. We will show in Theorem 10.3.21 and Lemma 10.4.6 that one can ensure deg(u(x)) ≤ g, where g is the genus; we will call such divisors reduced.
The idea is to consider

u′(x) = monic((v(x)^2 + H(x)v(x) − F(x))/u(x)),   v′(x) = −v(x) − H(x) (mod u′(x))   (10.11)

where monic(u_0 + u_1 x + ⋯ + u_k x^k), for u_k ≠ 0, is defined to be (u_0/u_k) + (u_1/u_k)x + ⋯ + x^k. Obtaining (u′(x), v′(x)) from (u(x), v(x)) is a Cantor reduction step. This operation appears in the classical reduction theory of binary quadratic forms.
Lemma 10.3.17. Let D be an affine effective divisor on a hyperelliptic curve C with Mumford representation (u(x), v(x)). Define (u′(x), v′(x)) as in equation (10.11). Then (u′(x), v′(x)) is the Mumford representation of a semi-reduced divisor D′, and D′ ∼ D on C ∩ A^2.

Proof: One checks that (u′(x), v′(x)) satisfies condition (10.6) and so there is an associated semi-reduced divisor D′. Write D = (P_1) + ⋯ + (P_n) (where the same point can appear more than once). Then div(y − v(x)) ∩ A^2 = (P_1) + ⋯ + (P_n) + (P_{n+1}) + ⋯ + (P_{n+m}) for some points P_{n+1}, . . . , P_{n+m} (not necessarily distinct from the earlier n points, or from each other) and div(v(x)^2 + H(x)v(x) − F(x)) ∩ A^2 = div((y − v(x))(y + v(x) + H(x))) ∩ A^2 = (P_1) + (ι(P_1)) + ⋯ + (P_{n+m}) + (ι(P_{n+m})). Now, div(u′(x)) = (P_{n+1}) + (ι(P_{n+1})) + ⋯ + (P_{n+m}) + (ι(P_{n+m})). It follows that D′ = (ι(P_{n+1})) + ⋯ + (ι(P_{n+m})) and that D′ = D − div(y − v(x)) ∩ A^2 + div(u′(x)) ∩ A^2.
Example 10.3.18. Consider

C : y^2 = F(x) = x^5 + 2x^4 − 8x^3 + 10x^2 + 40x + 1

over Q. Let P_1 = (−4, 1), P_2 = (−2, 5), P_3 = (0, 1) and D = (P_1) + (P_2) + (P_3). The Mumford representation of D is (u(x), v(x)) = (x(x + 2)(x + 4), −x^2 − 4x + 1), which is easily checked by noting that v(x_{P_i}) = y_{P_i} for 1 ≤ i ≤ 3.

To reduce D one sets u′(x) = monic((v(x)^2 − F(x))/u(x)) = monic(−x^2 + 5x − 6) = (x − 3)(x − 2) and v′(x) = −v(x) (mod u′(x)) = 9x − 7.

One can check that div(y − v(x)) = (P_1) + (P_2) + (P_3) + (P_4) + (P_5) where P_4 = (2, −11) and P_5 = (3, −20), that div(u′(x)) = (P_4) + (ι(P_4)) + (P_5) + (ι(P_5)) and that D ∼ div(u′(x), y − v′(x)) ∩ A^2 = (ι(P_4)) + (ι(P_5)). See Figure 10.1 for an illustration.
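The computations of Example 10.3.18 can be replayed with exact integer arithmetic; here u(x) = x(x + 2)(x + 4) interpolates the points P_1 = (−4, 1), P_2 = (−2, 5), P_3 = (0, 1):

```python
# Exact-arithmetic check of the Cantor reduction step of Example 10.3.18
# over Q.  Polynomials are integer coefficient lists, lowest degree first;
# every division below is exact, so plain integers suffice.
from itertools import zip_longest

def mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def sub(f, g):
    out = [a - b for a, b in zip_longest(f, g, fillvalue=0)]
    while len(out) > 1 and out[-1] == 0:
        out = out[:-1]
    return out

def pdivmod(f, g):
    q, r = [0] * max(1, len(f) - len(g) + 1), f[:]
    while len(r) >= len(g) and any(r):
        assert r[-1] % g[-1] == 0          # division is exact in this example
        c, d = r[-1] // g[-1], len(r) - len(g)
        q[d] = c
        r = sub(r, [0] * d + [c * x for x in g])
    return q, r

def ev(f, x):
    return sum(c * x ** i for i, c in enumerate(f))

F = [1, 40, 10, -8, 2, 1]                  # F = x^5 + 2x^4 - 8x^3 + 10x^2 + 40x + 1
u = mul([0, 1], mul([2, 1], [4, 1]))       # u = x(x + 2)(x + 4)
v = [1, -4, -1]                            # v = -x^2 - 4x + 1
assert all(ev(v, x) == y for x, y in [(-4, 1), (-2, 5), (0, 1)])   # v(x_Pi) = y_Pi

q, r = pdivmod(sub(mul(v, v), F), u)       # (v^2 - F)/u, exact
assert r == [0] and q == [-6, 5, -1]
uprime = [-c for c in q]                   # monic(-x^2 + 5x - 6) = x^2 - 5x + 6
vprime = pdivmod([-c for c in v], uprime)[1]   # v' = -v mod u' = 9x - 7
assert uprime == [6, -5, 1] and vprime == [-7, 9]
# D is equivalent to iota(P4) + iota(P5) = (2, 11) + (3, 20)
assert ev(vprime, 2) == 11 and ev(vprime, 3) == 20
assert ev(F, 2) == 11 ** 2 and ev(F, 3) == 20 ** 2
```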
Exercise 10.3.19. Show that the straight lines ℓ(x, y) and v(x) in the elliptic curve addition law (Definition 7.9.1) correspond to the polynomials y − v(x) and u′(x) (beware of the double meaning of v(x) here) in a Cantor reduction step.
Lemma 10.3.20. Let C : y^2 + H(x)y = F(x) and let (u(x), v(x)) be the Mumford representation of a semi-reduced divisor D. Write d_H = deg(H(x)), d_F = deg(F(x)), d_u = deg(u(x)) and d_v = deg(v(x)). Let d = max{d_H, ⌈d_F/2⌉}. Let (u′(x), v′(x)) be the polynomials arising from a Cantor reduction step.

1. If d_v ≥ d then deg(u′(x)) ≤ d_u − 2.
2. If d_F ≤ 2d − 1 and d_u ≥ d > d_v then deg(u′(x)) ≤ d − 1 (this holds even if d_H = d).
218
(P5 )
(P4 )
P2
P1
P3
P4
P5
It is notable that performing a Cantor reduction step on a divisor of degree d in this case usually yields another divisor of degree d. This phenomenon will be discussed in detail in Section 10.4.2.
We now consider the uniqueness of the reduced divisor of Theorem 10.3.21. Lemma 10.3.24
below shows that non-uniqueness can only arise with split or inert models. It follows that
there is a unique reduced divisor in every divisor class for hyperelliptic curves with ramified model. For hyperelliptic curves with split or inert model there is not necessarily a
unique reduced divisor.
Lemma 10.3.24. Let y^2 + H(x)y = F(x) be a hyperelliptic curve over k of genus g. Let d_H = deg(H(x)) and d_F = deg(F(x)). Let D_1 and D_2 be semi-reduced divisors of degree at most g. Assume that D_1 ≠ D_2 but D_1 ∼ D_2. Then d_F = 2g + 2 or d_H = g + 1.

Proof: First note that d_H ≤ g + 1 and d_F ≤ 2g + 2. Let D_3 = D_1 + ι(D_2), so that D_3 ∼ D_1 − D_2 ∼ 0 as an affine divisor. Let D_3′ be the semi-reduced divisor equivalent to D_3 (i.e., obtained by removing all occurrences of (P) + (ι(P))). Note that the degree of D_3′ is at most 2g and that D_3′ ≠ 0. Since D_3′ ∼ 0 and D_3′ is an effective affine divisor we have D_3′ = div(G(x, y)) on C ∩ A^2 for some non-zero polynomial G(x, y). Without loss of generality G(x, y) = a(x) − b(x)y. Furthermore, b(x) ≠ 0 (since div(a(x)) is not semi-reduced for any non-constant polynomial a(x)).

Exercise 10.1.26 shows that the degree of div(a(x) − b(x)y) on C ∩ A^2 is the degree of a(x)^2 + H(x)a(x)b(x) − F(x)b(x)^2. We need this degree to be at most 2g. This is easily achieved if d_F ≤ 2g (in which case d_H = g + 1 for the curve to have genus g). However, if 2g + 1 ≤ d_F ≤ 2g + 2 then we need either deg(a(x)^2) = deg(F(x)b(x)^2) or deg(H(x)a(x)b(x)) = deg(F(x)b(x)^2). The former case is only possible if d_F is even (i.e., d_F = 2g + 2). If d_F = 2g + 1 and d_H ≤ g then the latter case implies deg(a(x)) ≥ g + 1 + deg(b(x)), and so deg(a(x)^2) > deg(F(x)b(x)^2) and deg(G(x, y)) > 2g.
For hyperelliptic curves of fixed (small) genus it is possible to give explicit formulae for the general cases of the composition and reduction algorithms. For genus 2 curves this was done by Harley [276] (the basic idea is to formally solve for u′(x) such that u′(x)u(x) = monic(v(x)^2 + H(x)v(x) − F(x)) as in equation (10.11)). For extensive discussion and details (and also for non-affine coordinate systems for efficient hyperelliptic arithmetic) we refer to Sections 14.4, 14.5 and 14.6 of [16].
10.4 Addition in the Divisor Class Group
We now show how Cantor's addition and reduction algorithms for divisors on the affine curve can be used to perform arithmetic in the divisor class group of the projective curve. A first remark is that Lemma 10.3.3 implies that every degree zero divisor class on a hyperelliptic curve has a representative of the form D + n_+(∞_+) + n_−(∞_−) where D is a semi-reduced (hence, affine and effective) divisor and n_+, n_− ∈ Z (necessarily, deg(D) + n_+ + n_− = 0).
10.4.1 Addition of Divisor Classes on Ramified Models
On a hyperelliptic curve with ramified model there is only a single point at infinity. We will show in this section that, for such curves, one can compute in the divisor class group using only affine divisors.

We use the Cantor algorithms for addition, semi-reduction, and reduction. In general, if one has a semi-reduced divisor D then, by case 1 of Lemma 10.3.20, a reduction step reduces the degree of D by at least 2. Hence, at most deg(D)/2 reduction steps are possible.
Theorem 10.4.1. Let C be a hyperelliptic curve with ramified model. Then every degree 0 divisor class on C has a unique representative of the form D − n(∞) where D is semi-reduced and where 0 ≤ n ≤ g.

Proof: Theorem 10.3.21 showed that every affine divisor is equivalent to a semi-reduced divisor D such that 0 ≤ deg(D) ≤ g. This corresponds to the degree zero divisor D − n(∞) where n = deg(D). Uniqueness was proved in Lemma 10.3.24.
A degree zero divisor of the form D − n(∞), where D is a semi-reduced divisor of degree n and 0 ≤ n ≤ g, is called reduced. We represent D using the Mumford representation (u(x), v(x)) and we know that the polynomials u(x) and v(x) are unique. The divisor class is defined over k if and only if the corresponding polynomials u(x), v(x) ∈ k[x]. Addition of divisors is performed using Cantor's composition and reduction algorithms as above.
Exercise 10.4.2. Let C : y^2 + H(x)y = F(x) be a ramified model of a hyperelliptic curve over F_q. Show that the inverse (also called the negative) of a divisor class on C represented as (u(x), v(x)) is (u(x), −v(x) − (H(x) (mod u(x)))).
Exercise 10.4.3. Let C be a hyperelliptic curve over k of genus g with ramified model. Let D_1 and D_2 be reduced divisors on C. Show that one can compute a reduced divisor representing D_1 + D_2 in O(g^3) operations in k. Show that one can compute [n]D_1 in O(log(n)g^3) operations in k (here [n]D_1 means the n-fold addition D_1 + D_1 + ⋯ + D_1).
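The O(log(n)) factor in Exercise 10.4.3 comes from the usual double-and-add loop; each call to `add` below stands for one Cantor composition-plus-reduction (in this sketch, integers under addition stand in for reduced divisor classes):

```python
# Generic double-and-add: [n]D with O(log n) group additions.
def scalar_mul(n, D, add, zero):
    acc, pw = zero, D
    while n:
        if n & 1:
            acc = add(acc, pw)     # add in the current power-of-two multiple
        pw = add(pw, pw)           # double
        n >>= 1
    return acc

# stand-in group: integers under +, so [91](3) = 273
assert scalar_mul(91, 3, lambda a, b: a + b, 0) == 273
```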
When the genus is 2 (i.e., d = 3) and one adds two reduced divisors (i.e., effective
divisors of degree 2) then the sum is an effective divisor of degree at most 4 and so only
one reduction operation is needed to compute the reduced divisor. Similarly, for curves
of any genus, at most one reduction operation is needed to compute a reduced divisor
equivalent to D + (P ) where D is a reduced divisor (such ideas were used by Katagi,
Akishita, Kitamura and Takagi [330, 329] to speed up cryptosystems using hyperelliptic
curves).
For larger genus there are several variants of the divisor reduction algorithm. In Section 4 of [118], Cantor gives a method that uses higher degree polynomials than y − v(x) and requires fewer reduction steps. In Section VII.2.1 of [65], Gaudry presents a reduction algorithm, essentially due to Lagrange, that is useful when g ≥ 3. The NUCOMP algorithm (originally proposed by Shanks in the number field setting) is another useful alternative. We refer to Jacobson and van der Poorten [320] and Section VII.2.2 of [65] for details. It seems that NUCOMP should be used once the genus of the curve exceeds 10 (and possibly even for g ≥ 7).
Exercise 10.4.4. Let C be a hyperelliptic curve of genus 2 over a field k with a ramified model. Show that every k-rational divisor class has a unique representative of one of the following four forms:

1. (P) − (∞) where P ∈ C(k), including P = ∞. Here u(x) = (x − x_P) or u(x) = 1.
2. 2(P) − 2(∞) where P ∈ C(k), excluding points P such that P = ι(P). Here u(x) = (x − x_P)^2.
3. (P) + (Q) − 2(∞) where P, Q ∈ C(k) are such that P, Q ≠ ∞, P ≠ Q, P ≠ ι(Q). Here u(x) = (x − x_P)(x − x_Q).
4. (P) + (σ(P)) − 2(∞) where P ∈ C(K) − C(k) for a quadratic field extension K/k with Gal(K/k) = ⟨σ⟩, and σ(P) ∉ {P, ι(P)}. Here u(x) is an irreducible quadratic in k[x].
Exercise 10.4.5 can come in handy when computing pairings on hyperelliptic curves.

Exercise 10.4.5. Let D_1 = div(u_1(x), y − v_1(x)) ∩ A^2 and D_2 = div(u_2(x), y − v_2(x)) ∩ A^2 be semi-reduced divisors on a hyperelliptic curve with ramified model over k. Write d_1 = deg(u_1(x)) and d_2 = deg(u_2(x)). Let D_3 = div(u_3(x), y − v_3(x)) ∩ A^2 be a semi-reduced divisor of degree d_3 such that D_3 − d_3(∞) ∼ D_1 − d_1(∞) + D_2 − d_2(∞). Show that if d_2 = d_3 then D_1 − d_1(∞) ∼ D_3 − D_2.
10.4.2 Addition of Divisor Classes on Split Models

This section is rather detailed and can safely be ignored by most readers. It presents results of Paulus and Rück [476] and Galbraith, Harrison and Mireles [218].
Let C be a hyperelliptic curve of genus g over k with a split model. We have already observed that every degree zero divisor class has a representative of the form D + n_+(∞_+) + n_−(∞_−) where D is semi-reduced and n_+, n_− ∈ Z. Lemma 10.3.20 has shown that we may assume 0 ≤ deg(D) ≤ g + 1. One could consider the divisor to be reduced if this is the case, but this would not be optimal.

The Riemann-Roch theorem implies we should be able to take deg(D) ≤ g, but Cantor reduction becomes stuck if the input divisor has degree g + 1. The following simple trick allows us to reduce to semi-reduced divisors of degree at most g (and this essentially completes the proof of the Riemann-Roch theorem for these curves). Recall the polynomial G_+(x) of degree d = g + 1 from Exercise 10.1.28.
Lemma 10.4.6. Let y^2 + H(x)y = F(x) be a hyperelliptic curve of genus g over k with split model. Let (u(x), v(x)) be a Mumford representation such that deg(u(x)) = g + 1. Define

v′(x) = G_+(x) + (v(x) − G_+(x) (mod u(x))) ∈ k[x],

where we mean that v(x) − G_+(x) is reduced to a polynomial of degree at most deg(u(x)) − 1 = g. Define

u′(x) = monic((v′(x)^2 + H(x)v′(x) − F(x))/u(x)).   (10.12)

Then deg(u′(x)) ≤ g. Writing u′(x) = ∏_{i=1}^{l}(x − x_i)^{e_i}, one has

div(u′(x), y − v′(x)) ∩ A^2 = Σ_{i=1}^{l} e_i ((x_i, v′(x_i)))

and div(u′(x)) = div(u′(x), y − v′(x)) + div(u′(x), y + v′(x) + H(x)).
u′(x) = monic((v′(x)^2 − F(x))/u(x)) = x^2 + 5x + 2 and v′(x) = 3x + 5. The divisor div(u′(x), y − v′(x)) ∩ A^2 is a sum (P) + (σ(P)) where P ∈ C(F_{7^2}) − C(F_7) and σ is the non-trivial element of Gal(F_{7^2}/F_7).
The operation (u(x), v(x)) ↦ (u′(x), v′(x)) of equation (10.12) is called composition and reduction at infinity; the motivation for this name is given in equation (10.18) below. Some authors call it a baby step. This operation can be performed even when deg(u(x)) < d, and we analyse it in the general case in Lemma 10.4.14.
Exercise 10.4.8. Let the notation be as in Lemma 10.4.6. Let d_u = deg(u(x)), so that v′(x) agrees with G_+(x) in the leading d − d_u + 1 coefficients and hence m = deg(v′(x)^2 + H(x)v′(x) − F(x)) ≤ d + d_u − 1. Let d_{u′} = deg(u′(x)), so that m = d_{u′} + d_u. Show that v_{∞_−}(y − v′(x)) = −d, that

div(y − v′(x)) = div(u(x), y − v(x)) ∩ A^2 + div(u′(x), y + H(x) + v′(x)) ∩ A^2 − (d_u + d_{u′} − d)(∞_+) − d(∞_−),   (10.14)

and that v_{∞_+}(y − v′(x)) = −(d_u + d_{u′} − d).
We now discuss how to represent divisor classes. An obvious choice is to represent classes as D − d(∞_+) where D is an affine effective divisor of degree d (see Paulus and Rück [476] for a full discussion of this case). A more natural representation, as pointed out by Galbraith, Harrison and Mireles [218], is to use balanced representations at infinity. In other words, when g is even, to represent divisor classes as D − (g/2)((∞_+) + (∞_−)) where D is an effective divisor of degree g.
Definition 10.4.9. Let C be a hyperelliptic curve of genus g over k with split model. If g is even then define D_∞ = (g/2)((∞_+) + (∞_−)). If g is odd then define D_∞ = ((g+1)/2)(∞_+) + ((g−1)/2)(∞_−).

Let u(x), v(x) ∈ k[x] be the Mumford representation of a semi-reduced divisor D = div(u(x), y − v(x)) ∩ A^2 and let n ∈ Z. Then div(u(x), v(x), n) denotes the degree zero divisor

D + n(∞_+) + (g − deg(u(x)) − n)(∞_−) − D_∞.

If 0 ≤ deg(u(x)) ≤ g and 0 ≤ n ≤ g − deg(u(x)) then such a divisor is called reduced.
Uniqueness of this representation is shown in Theorem 10.4.19. When g is odd one could also represent divisor classes using D_∞ = ((g + 1)/2)((∞_+) + (∞_−)). This is applicable in the inert case too. A problem is that this would lead to polynomials of higher degree than necessary in the Mumford representation, and divisor class representatives would no longer necessarily be unique.
It is important to realise that u(x) and v(x) are only used to specify the affine divisor. The values of v_{∞_+}(y − v(x)) and v_{∞_−}(y − v(x)) have no direct influence on the degree zero divisor under consideration. Note also that we allow n ∈ Z in Definition 10.4.9 in general, but reduced divisors must have n ∈ Z_{≥0}.

For hyperelliptic curves with split model, ∞_+, ∞_− ∈ C(k) and so a divisor (u(x), v(x), n) is defined over k if and only if u(x), v(x) ∈ k[x]. Note that when the genus is even then D_∞ is k-rational even when the model is inert, though in this case a divisor (u(x), v(x), n) with n ≠ 0 is not defined over k even if u(x), v(x) ∈ k[x].
We may now consider Cantor's addition algorithm in this setting.

Lemma 10.4.10. Let C be a hyperelliptic curve over k of genus g with split model. Let div(u_1(x), v_1(x), n_1) and div(u_2(x), v_2(x), n_2) be degree zero divisors as above. Write D_i = div(u_i(x), y − v_i(x)) ∩ A^2 for i = 1, 2, let D_3 = div(u_3(x), y − v_3(x)) ∩ A^2 be the semi-reduced divisor equivalent to D_1 + D_2, and let s(x) be such that D_1 + D_2 = D_3 + div(s(x)) ∩ A^2. Let m = g/2 when g is even and m = (g + 1)/2 otherwise. Then

div(u_1, v_1, n_1) + div(u_2, v_2, n_2) ∼ div(u_3, v_3, n_1 + n_2 + deg(s) − m).   (10.15)
(10.18)
and so the operation corresponds to addition of D with the degree zero divisor (∞_−) − (∞_+). This justifies the name composition at infinity. To add (∞_+) − (∞_−) one should use G_−(x) instead of G_+(x) in Lemma 10.4.6.
Exercise 10.4.15. Prove Lemma 10.4.14.
We can finally put everything together and obtain the main result about reduced
divisors on hyperelliptic curves with split model.
Theorem 10.4.16. Let C be a hyperelliptic curve over k of genus g with split model. Then every divisor class contains a reduced divisor as in Definition 10.4.9.

Proof: We have shown the existence of a divisor in the divisor class with semi-reduced affine part, and hence of the form (u(x), v(x), n) with n ∈ Z. Cantor reduction and composition and reduction at infinity show that we can assume deg(u(x)) ≤ g. Finally, to show that one may assume 0 ≤ n ≤ g − deg(u(x)), note that Lemma 10.4.14 maps n to n′ = n − (g + 1) + deg(u(x)). Hence, if n > g − deg(u(x)) then n > n′ ≥ 0 and continuing the process gives a reduced divisor. On the other hand, if n < 0 then using G_−(x) instead one has n′ = n + g + 1 − deg(u′(x)) ≤ g − deg(u′(x)).
Exercise 10.4.17. Let C : y^2 + H(x)y = F(x) be a hyperelliptic curve of genus g over F_q with split model. If g is even, show that the inverse of div(u(x), v(x), n) is div(u(x), −v(x) − (H(x) (mod u(x))), g − deg(u(x)) − n). If g is odd, show that computing the inverse of a divisor may require performing composition and reduction at infinity.
10.5 Jacobians of Hyperelliptic Curves
As mentioned in Section 7.8, we can consider Pic^0_k(C) as an algebraic group, by considering the Jacobian variety J_C of the curve. The fact that the divisor class group is an algebraic group is not immediate from our description of the group operation as an algorithm (rather than a formula).

Indeed, J_C is an Abelian variety (namely, a projective algebraic group). The dimension of the variety J_C is equal to the genus of C. Unfortunately, we do not have space to introduce the theory of Abelian varieties and Jacobians in this book. We remark that the Mumford representation directly gives an affine part of the Jacobian variety of a hyperelliptic curve (see Propositions 1.2 and 1.3 of Mumford [442] for the details).

An explicit description of the Jacobian variety of a curve of genus 2 has been given by Flynn; we refer to Chapter 2 of Cassels and Flynn [123] for details, references and further discussion.
There are several important concepts in the theory of Abelian varieties that cannot be expressed in terms of divisor class groups.⁴ Hence, our treatment of hyperelliptic curves will not be as extensive as the case of elliptic curves. In particular, we do not give a rigorous discussion of isogenies (i.e., morphisms of varieties that are group homomorphisms with finite kernel) for Abelian varieties of dimension g > 1. However, we do mention one important result. The Poincaré reducibility theorem (see Theorem 1 of Section 19 (page 173) of Mumford [441]) states that if A is an Abelian variety over k and B is an Abelian subvariety of A (i.e., B is a subset of A that is an Abelian variety over k), then there is an Abelian subvariety B′ ⊆ A over k such that B ∩ B′ is finite and B + B′ = A. It follows that A is isogenous over k to B × B′. If an Abelian variety A over k has no proper non-trivial Abelian subvarieties over k then we call it simple. An Abelian variety is absolutely simple if it has no proper non-trivial Abelian subvarieties over k̄.
Despite not discussing isogenies in full generality, it is possible to discuss isogenies
that arise from maps between curves purely in terms of divisor class groups. We now give
some examples, but first introduce a natural notation.
Definition 10.5.1. Let C be a curve over a field k and let n ∈ N. For D ∈ Pic^0_k(C) define

[n]D = D + ⋯ + D (n times).

Indeed, we usually assume that [n]D is a reduced divisor representing the divisor class nD. Define

Pic^0_k(C)[n] = {D ∈ Pic^0_k(C) : [n]D = 0}.
Recall from Corollary 8.3.10 that if φ : C_1 → C_2 is a non-constant rational map (and hence a non-constant morphism) over k between two curves then there are corresponding group homomorphisms φ^* : Pic^0_k(C_2) → Pic^0_k(C_1) and φ_* : Pic^0_k(C_1) → Pic^0_k(C_2). Furthermore, by part 5 of Theorem 8.3.8 we have φ_*(φ^*(D)) = [deg(φ)]D on Pic^0_k(C_2).
4 There are two reasons for this: first the divisor class group is merely an abstract group and so does
not have the geometric structure necessary for some of these concepts; second, not every Abelian variety
is a Jacobian variety.
φ_2 : C → E_2 : Y^2 = X^3 + 2X^2 + 1

given by φ_2(x, y) = (1/x^2, y/x^3). The two elliptic curves E_1 and E_2 are neither isomorphic nor isogenous. One has #E_1(F_11) = 16, #E_2(F_11) = 14 and #Pic^0_{F_11}(C) = 14 · 16. It can be shown (this is not trivial) that ker(φ_{1,*}) ∩ ker(φ_{2,*}) is finite. Further, since deg(φ_1) = deg(φ_2) = 2 it can be shown that the kernel of φ_{1,*} × φ_{2,*} is contained in Pic^0_k(C)[2].
The Jacobian of a curve satisfies the following universal property. Let φ : C → A be a morphism, where A is an Abelian variety. Let P_0 ∈ C(k) be such that φ(P_0) = 0 and consider the Abel-Jacobi map ψ : C → J_C (corresponding to P ↦ (P) − (P_0)). Then there is a homomorphism of Abelian varieties φ′ : J_C → A such that φ = φ′ ∘ ψ. Exercise 10.5.4 gives a special case of this universal property.
Exercise 10.5.4. Let C : y^2 = x^6 + a_2x^4 + a_4x^2 + a_6 over k, where char(k) ≠ 2, and let φ(x, y) = (x^2, y) be a non-constant rational map φ : C → E over k, where E is an elliptic curve. Let P_0 ∈ C(k) be such that φ(P_0) = O_E. Show that the composition

C(k) → Pic^0_k(C) → E(k),

where the first map is the Abel-Jacobi map P ↦ (P) − (P_0) and the second map is φ_*, is just the original map φ.
Exercise 10.5.5. Let a_3, a_5 ∈ k, where char(k) ≠ 2. This exercise gives maps over k from the genus 2 curve C : y^2 = x^5 + a_3x^3 + a_5x to elliptic curves.

Choose α, β ∈ k̄ such that a_5 = α^2β^2 and a_3 = −(α^2 + β^2). In other words,

x^4 + a_3x^2 + a_5 = (x^2 − α^2)(x^2 − β^2) = (x − α)(x + α)(x − β)(x + β).
Now, set Y = y/(x − s)^3 and X = ((x + s)/(x − s))^2. Show that

Y^2 = ((X − 1)/(4s))(AX + B)(BX + A) = (AB/(4s))(X − 1)(X^2 + (B/A + A/B)X + 1).

Calling the above curve E_1, the rational map φ_1(x, y) = (X, Y) maps C to E_1. Similarly, taking Y = y/(x + s)^3 and X = ((x − s)/(x + s))^2 gives an elliptic curve E_2 : Y^2 = −((X − 1)/(4s))(BX + A)(AX + B) and a rational map φ_2 : C → E_2. Note that E_2 is a quadratic twist of E_1.
There is a vast literature on split Jacobians and we are unable to give a full survey.
We refer to Sections 4, 5 and 6 of Kuhn [353] or Chapter 14 of Cassels and Flynn [123]
for further examples.
10.6 Elements of Order n
We now bound the size of the set of elements of order dividing n in the divisor class group of a curve. As with many other results in this chapter, the best approach is via the theory of Abelian varieties. We state Theorem 10.6.1 for general curves, but without proof. The result is immediate for Abelian varieties over C, as they are isomorphic to C^g/L where L is a rank 2g lattice. The elements of order dividing n in C^g/L are given by the n^{2g} points in (1/n)L/L.
Theorem 10.6.1. Let C be a curve of genus g over k and let n ∈ N. If char(k) = 0 or gcd(n, char(k)) = 1 then #Pic^0_{k̄}(C)[n] = n^{2g}. If char(k) = p > 0 then #Pic^0_{k̄}(C)[p] = p^e where 0 ≤ e ≤ g.

Proof: See Theorem 4 of Section 7 of Mumford [441].
We now present a special case of Theorem 10.6.1 (at least, giving a lower bound), namely elements of order 2 in the divisor class group of a hyperelliptic curve with a ramified model.

Lemma 10.6.2. Let k be a field such that char(k) ≠ 2. Let F(x) ∈ k[x] be a monic polynomial of degree 2g + 1 and let C : y^2 = F(x). Let d be the number of roots of F(x) over k. Let B_k = {(x_1, 0), . . . , (x_d, 0)} where x_1, . . . , x_d ∈ k are the roots of F(x) in k. If d ≠ 2g + 1 then B_k generates a subgroup of Pic^0_k(C) of exponent 2 and order 2^d. If d = 2g + 1 then B_k generates a subgroup of Pic^0_k(C) of exponent 2 and order 2^{2g}.
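Lemma 10.6.2 is easy to illustrate over a small finite field: count the rational roots of F(x) and apply the two cases (the curve choices below are illustrative):

```python
# Order of the 2-torsion subgroup generated by the points (x_i, 0),
# following Lemma 10.6.2: 2^d in general, 2^(2g) when all 2g+1 roots of
# F are rational.  F is a coefficient list, lowest degree first.
def subgroup_order(p, F):
    d = sum(1 for x in range(p)
            if sum(c * pow(x, i, p) for i, c in enumerate(F)) % p == 0)
    g = (len(F) - 2) // 2              # deg F = 2g + 1
    return 2 ** (2 * g) if d == 2 * g + 1 else 2 ** d

# C: y^2 = x^5 - x over F_7: roots {0, 1, -1}, so the subgroup has order 2^3
assert subgroup_order(7, [0, -1, 0, 0, 0, 1]) == 8
# over F_5, x^5 - x has all 2g + 1 = 5 roots, so the order is 2^(2g) = 16
assert subgroup_order(5, [0, -1, 0, 0, 0, 1]) == 16
```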
Exercise 10.6.3. Prove Lemma 10.6.2 via the following method.

1. First, consider B_{k̄} = {P_1, . . . , P_{2g+1}} and for any subset T ⊆ {1, . . . , 2g + 1} define

D_T = Σ_{j∈T}(P_j) − #T(∞).
Lemma 10.6.2 describes 2^{2g} divisor classes over k̄ of order dividing 2. Since Theorem 10.6.1 states that there are exactly 2^{2g} such divisor classes over k̄, it follows that every 2-torsion divisor class has a representative of this form. A corollary is that any function f ∈ k̄(C) with divisor div(f) = 2(P_1) + 2(P_2) − 4(∞) is equal to c(x − x_1)(x − x_2) for some c, x_1, x_2 ∈ k̄.
A determination of the 2-torsion in Jacobians of hyperelliptic curves over finite fields
is given by Cornelissen; see [146] and its erratum.
Division Ideals
In some applications (particularly when generalising Schoof's algorithm for point counting) it is desired to determine the elements of a given order in the divisor class group. It is therefore necessary to have an analogue of the elliptic curve division polynomials. Early attempts on this problem, in the context of point counting algorithms, appear in the work of Pila and Kampkötter.
We sketch some results of Cantor [119]. Let C : y^2 = F(x) over k be such that F(x) is monic of degree 2g + 1 and char(k) ≠ 2 (though Section 9 of [119] does discuss how to proceed when char(k) = 2). Cantor defines polynomials ψ_n for n ≥ g + 1 such that for P = (x_P, y_P) ∈ C(k̄) we have ψ_n(x_P, y_P) = 0 if and only if n((x_P, y_P) − (∞)) lies in the set of all divisor classes with a Mumford representative of the form div(u(x), y − v(x)) having deg(u(x)) ≤ g − 1. While points P ∈ C(k̄) such that [n]((P) − (∞)) is principal will be roots of these polynomials, we stress that these polynomials do have other roots as well. Also, since most divisor classes do not have representatives of the form (P) − (∞) for a point P on the curve when g > 1, in general there are divisor classes of order n that are not of this form. Equation (8.1) of [119] gives explicit recurrence formulae for ψ_n in the case g = 2. Section 10 of [119] gives the first few values of ψ_n(x, y) for the genus 2 curve y^2 = x^5 + 1.
Let C be a genus 2 curve. Note that if D = (x_1, y_1) + (x_2, y_2) − 2(∞) has order n then either [n]((x_i, y_i) − (∞)) ≡ 0 or [n]((x_1, y_1) − (∞)) ≡ −[n]((x_2, y_2) − (∞)). Hence one can find general divisors of order n using formulae for computing [n]((x, y) − (∞)) and equating polynomials. In other words, to determine divisors of order n it is sufficient to obtain rational functions that give the Mumford representation of [n]((x, y) − (∞)).
Let n ∈ N and let C be a genus 2 curve over k with ramified model. There are polynomials d_{n,0}(x), d_{n,1}(x), d_{n,2}(x), e_{n,0}(x), e_{n,1}(x), e_{n,2}(x) ∈ k[x] of degrees respectively 2n^2 − 3, 2n^2 − 2, 2n^2 − 1, 3n^2 − 2, 3n^2 − 3, 3n^2 − 2, such that, for a generic point P = (x_P, y_P) ∈ C(k̄), the Mumford representation of [n]((x_P, y_P) − (∞)) is

( x^2 + (d_{n,1}(x_P)/d_{n,0}(x_P)) x + d_{n,2}(x_P)/d_{n,0}(x_P) ,  y_P ((e_{n,1}(x_P)/e_{n,0}(x_P)) x + e_{n,2}(x_P)/e_{n,0}(x_P)) ).
Indeed, this can be checked directly for any curve C and any prime n by running Cantor's algorithm in a computer algebra package. These formulae are not necessarily valid for all points P ∈ C(k̄) (such as those for which n((x_P, y_P) − (∞)) ≡ 0). For details we refer to Gaudry's theses (Section 4.4 of [240] and Section 7.2 of [242]). Information about the use of these, and other, ideals in point counting algorithms is given in Section 3 of Gaudry and Schost [247].
10.7 Hyperelliptic Curves over Finite Fields
There are finitely many points on a curve C of genus g over a finite field F_q. There are also finitely many possible values for the Mumford representation of a reduced divisor on a hyperelliptic curve over a finite field. Hence, the divisor class group Pic^0_{F_q}(C) of a curve over a finite field is a finite group. Since the affine part of a reduced divisor is a sum of at most g points (possibly defined over a field extension of degree bounded by g) it is not surprising that there is a connection between {#C(F_{q^i}) : 1 ≤ i ≤ g} and #Pic^0_{F_q}(C). Indeed, there is also a connection between {#Pic^0_{F_{q^i}}(C) : 1 ≤ i ≤ g} and #C(F_q). The aim of this section is to describe these connections. We also give some important bounds on these numbers (analogous to the Hasse bound for elliptic curves). Most results are presented for general curves (i.e., not only hyperelliptic curves).
One of the most important results in the theory of curves over finite fields is the
following theorem of Hasse and Weil. The condition that the roots of L(t) have absolute
value 1/√q can be interpreted as an analogue of the Riemann hypothesis. This result gives
precise bounds on the number of points on curves and divisor class groups over finite
fields.
Theorem 10.7.1. (Hasse–Weil) Let C be a curve of genus g over F_q. There exists a
polynomial L(t) ∈ ℤ[t] of degree 2g with the following properties.

1. L(1) = #Pic⁰_{F_q}(C).

2. One can write L(t) = ∏_{i=1}^{2g} (1 − α_i t) with α_i ∈ ℂ such that α_{g+i} is the complex conjugate of α_i and α_i α_{g+i} = q for 1 ≤ i ≤ g.

3. |α_i| = √q for all 1 ≤ i ≤ 2g.

Proof: The polynomial L(t) is the numerator of the zeta function of C. For details see
Section V.1 of Stichtenoth [585], especially Theorem V.1.15. The proof that |α_i| = √q
for all 1 ≤ i ≤ 2g is Theorem V.2.1 of Stichtenoth [585].
A proof of some parts of this result in a special case is given in Exercise 10.7.14.
Exercise 10.7.2. Show that part 3 of Theorem 10.7.1 follows immediately from part 2.
Definition 10.7.3. The polynomial L(t) of Theorem 10.7.1 is called the L-polynomial
of the curve C over Fq .
Theorem 10.7.4. (Schmidt) Let C be a curve of genus g over F_q. Then there exists a
divisor D on C of degree 1 that is defined over F_q.
We stress that this result does not prove that C has a point defined over F_q (though
when q is large compared with the genus, the existence of a point in C(F_q) follows from the
Weil bounds). The result implies that even a curve with no points defined over F_q does
have a divisor of degree 1 (hence, not an effective divisor) that is defined over F_q.
Proof: See Corollary V.1.11 of Stichtenoth [585].
We now describe the precise connection between the values α_i associated to the polynomial L(t)
(corresponding to Pic⁰_{F_q}(C)) and #C(F_{q^n}) for n ∈ ℕ.
Theorem 10.7.5. Let C be a curve of genus g over F_q and let α_i ∈ ℂ for 1 ≤ i ≤ 2g be
as in Theorem 10.7.1. Let n ∈ ℕ. Then

#C(F_{q^n}) = q^n + 1 − ∑_{i=1}^{2g} α_i^n.    (10.19)
An immediate consequence is the pair of Weil bounds

|#C(F_{q^n}) − (q^n + 1)| ≤ 2g √(q^n)

and

(√q − 1)^{2g} ≤ #Pic⁰_{F_q}(C) ≤ (√q + 1)^{2g}.
More precise bounds on #C(Fq ) are known; we refer to Section V.3 of Stichtenoth [585]
for discussion and references.
We now sketch the relationship between the above results and the q-power Frobenius
map π : C → C given by π(x, y) = (x^q, y^q). This is best discussed in terms of Abelian
varieties and so is strictly beyond the scope of the present book; however Exercise 10.7.11
shows how to consider the Frobenius map on Pic⁰_{F_q}(C). We refer to Section 21 of Mumford [441], especially the subsection entitled "Application II: The Riemann Hypothesis".
Briefly, the Frobenius map on C induces a morphism π : J_C → J_C where J_C is the Jacobian variety of C (note that J_C is defined over F_q). Note that π is not an isomorphism.
This morphism is a group homomorphism with ker(π) = {0} and so is an isogeny. More
generally, if A is an Abelian variety over F_q then there is a q-power Frobenius morphism
π : A → A. Just as in the case of elliptic curves one has A(F_{q^n}) = ker(π^n − 1) and
so #A(F_{q^n}) = #ker(π^n − 1) = deg(π^n − 1) (note that π is inseparable and π^n − 1 is a
separable morphism). By considering the action of π on the Tate module (the Tate module of an Abelian variety is defined in the analogous way to elliptic curves; see Section
19 of [441]) it can be shown that π satisfies a characteristic equation given by a monic
polynomial P_A(T) ∈ ℤ[T] of degree 2g. It follows that deg(π − 1) = P_A(1). Writing
P_A(T) = ∏_{i=1}^{2g} (T − α_i) over ℂ it can be shown that #A(F_{q^n}) = ∏_{i=1}^{2g} (1 − α_i^n). It follows
that the roots α_i are the same values as those used earlier, and that P(T) = T^{2g} L(1/T).
Definition 10.7.10. Let C be a curve over F_q. The characteristic polynomial of
Frobenius is the polynomial P(T) = T^{2g} L(1/T).
The Frobenius map π : C → C also induces the map π : Pic⁰_{F_q}(C) → Pic⁰_{F_q}(C), and
we abuse notation by calling it π as well. If D is any divisor representing a divisor class in
Pic⁰_{F_q}(C) then P(π)D ≡ 0. In other words, if P(T) = T^{2g} + a_1 T^{2g−1} + ⋯ + a_1 q^{g−1} T + q^g
then

π^{2g}(D) + [a_1] π^{2g−1}(D) + ⋯ + [a_1 q^{g−1}] π(D) + [q^g] D ≡ 0.    (10.20)
10.8 Endomorphisms
Let A_1, A_2 be Abelian varieties over k. One defines Hom_k(A_1, A_2) to be the set of all
morphisms of varieties from A_1 to A_2 over k that are group homomorphisms (see Section
19 of [441]). We define Hom(A_1, A_2) to be Hom_{k̄}(A_1, A_2). The endomorphism ring
of an Abelian variety A over k is defined to be End_k(A) = Hom_k(A, A). We write
End(A) = Hom_{k̄}(A, A).
It is beyond the scope of this book to give a complete treatment of the endomorphism
ring. However, we make a few general remarks. First, note that Hom_k(A_1, A_2) is a
ℤ-module. Second, recall that for elliptic curves every non-zero homomorphism is an
isogeny (i.e., has finite kernel). This is no longer true for Abelian varieties (for example,
let E be an elliptic curve and consider the homomorphism φ : E × E → E × E given
by φ(P, Q) = (P, O_E)). However, if A is a simple Abelian variety then End(A) ⊗_ℤ ℚ
is a division algebra and so every non-zero endomorphism is an isogeny in this case.
Furthermore, if an Abelian variety A is isogenous to ∏_i A_i^{n_i} with A_i simple (and A_i not
isogenous to A_j for i ≠ j) then End(A) ⊗_ℤ ℚ ≅ ∏_i M_{n_i}(End(A_i) ⊗_ℤ ℚ), where M_n(R) is the
ring of n × n matrices over the ring R (see Corollary 2 of Section 19 of Mumford [441]).
10.9 Supersingular Curves
Recall from Theorem 10.6.1 that if C is a curve of genus g over a field k of characteristic
p then #Pic⁰_{k̄}(C)[p] ≤ p^g.

Definition 10.9.1. Let k be a field such that char(k) = p > 0 and let C be a curve of
genus g over k. The p-rank of C is the integer 0 ≤ r ≤ g such that #Pic⁰_{k̄}(C)[p] = p^r.

An Abelian variety of dimension g over F_q is defined to be supersingular if it is
isogenous over F̄_q to E^g where E is a supersingular elliptic curve over F̄_q. A curve C over
F_q is supersingular if J_C is a supersingular Abelian variety. It follows that the p-rank of
a supersingular Abelian variety over F_{p^n} is zero. The converse is not true (i.e., p-rank zero
does not imply supersingular) when the dimension is 3 or more; see Example 10.9.8. If
the p-rank of a dimension g Abelian variety A over F_{p^n} is g then A is said to be ordinary.
Lemma 10.9.2. Suppose A is a supersingular Abelian variety over F_q and write P_A(T)
for the characteristic polynomial of Frobenius on A. The roots α of P_A(T) are such that
α/√q is a root of unity.

Proof: Since the isogeny to E^g is defined over some finite extension F_{q^n} it follows from … for all
1 ≤ i ≤ g.
Chapter 12

Primality Testing and Integer Factorisation Using Algebraic Groups

12.1 Primality Testing

12.1.1 Fermat Test
Let N ∈ ℕ. If N is prime then the algebraic group G_m(ℤ/Nℤ) = (ℤ/Nℤ)* over the ring
ℤ/Nℤ has N − 1 elements. In other words, if a is an integer such that gcd(a, N) = 1 and

a^{N−1} ≢ 1 (mod N)

then N is not prime. Such a number a is called a compositeness witness for N. The
hope is that if N is not prime then the order of the group G_m(ℤ/Nℤ) is not a divisor of
N − 1 and so a compositeness witness exists. Hence, the Fermat test is to choose a random
1 < a < N and compute a^{N−1} (mod N).
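The test is a few lines in Python; the following sketch (function names are ours, not from the text) makes the witness explicit:

```python
def is_fermat_witness(a, N):
    """True if a proves N composite, i.e. a^(N-1) is not 1 modulo N."""
    return pow(a, N - 1, N) != 1

def fermat_test(N, bases):
    """Run the Fermat test for each base; any witness proves compositeness."""
    if any(is_fermat_witness(a, N) for a in bases):
        return "composite"
    return "probable prime"
```

Note that passing the test proves nothing: as the next definition shows, some composite N pass for every base coprime to N.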
As is well-known, there are composite numbers N that are pseudoprimes for the
Fermat test.
Definition 12.1.1. An integer N ∈ ℕ is a Carmichael number if N is composite and

a^{N−1} ≡ 1 (mod N)

for all a ∈ ℕ such that gcd(a, N) = 1.

If N = ∏_{i=1}^{l} p_i^{e_i} is composite then G_m(ℤ/Nℤ) ≅ ∏_{i=1}^{l} G_m(ℤ/p_i^{e_i}ℤ) and has order
φ(N) and exponent λ(N) = lcm{p_i^{e_i−1}(p_i − 1) : 1 ≤ i ≤ l}.
Exercise 12.1.2. Show that all Carmichael numbers are odd. Show that N is a Carmichael
number if and only if λ(N) | (N − 1). Show that a composite number N ∈ ℕ is a
Carmichael number if and only if N = ∏_{i=1}^{l} p_i is a product of distinct primes such that
(p_i − 1) | (N − 1) for i = 1, . . . , l.
Exercise 12.1.3. Show that 561 = 3 · 11 · 17 is a Carmichael number.
It was shown by Alford, Granville and Pomerance [10] in 1992 that there are infinitely
many Carmichael numbers.
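The characterisation in Exercise 12.1.2 (Korselt's criterion) is easy to check by machine. A Python sketch, with names of our choosing:

```python
def is_carmichael(N):
    """Korselt's criterion: N is Carmichael iff N is composite, odd,
    squarefree, and (p - 1) | (N - 1) for every prime p dividing N."""
    if N < 3 or N % 2 == 0:
        return False
    n, factors, p = N, [], 2
    while p * p <= n:               # trial-division factorisation
        if n % p == 0:
            factors.append(p)
            n //= p
            if n % p == 0:          # a repeated factor: not squarefree
                return False
        else:
            p += 1
    if n > 1:
        factors.append(n)
    if len(factors) < 2:            # N is prime, hence not Carmichael
        return False
    return all((N - 1) % (p - 1) == 0 for p in factors)
```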
It is natural to replace G_m(ℤ/Nℤ) with any algebraic group or algebraic group quotient, such as the torus T_2, the algebraic group quotient corresponding to Lucas sequences
(this gives rise to the p + 1 test), or an elliptic curve of predictable group order.

Exercise 12.1.4. Design a primality test based on the algebraic group T_2(ℤ/Nℤ), which
has order N + 1 if N is prime. Also show how to use Lucas sequences to test N for primality
using the algebraic group quotient.

Exercise 12.1.5. Design a primality test for integers N ≡ 3 (mod 4) based on the
algebraic group E(ℤ/Nℤ) where E is a suitably chosen supersingular elliptic curve.

Exercise 12.1.6. Design a primality test for integers N ≡ 1 (mod 4) based on the
algebraic group E(ℤ/Nℤ) where E is a suitably chosen elliptic curve.
12.1.2 The Miller-Rabin Test
This primality test is also called the Selfridge-Miller-Rabin test or the strong prime test. It
is a refinement of the Fermat test, and works very well in practice. Rather than changing
the algebraic group, the idea is to make better use of the available information. It is based
on the following trivial lemma, which is false if p is replaced by a composite number N
(except for N = p^a where p is odd).

Lemma 12.1.7. Let p be prime. If x² ≡ 1 (mod p) then x ≡ ±1 (mod p).
For the Miller-Rabin test write N − 1 = 2^b m where m is odd and consider the sequence
a_0 ≡ a^m (mod N), a_1 ≡ a_0² ≡ a^{2m} (mod N), . . . , a_b ≡ a_{b−1}² ≡ a^{N−1} (mod N), where
gcd(a, N) = 1. If N is prime then this sequence must have the form (∗, ∗, . . . , ∗, −1, 1, . . . , 1)
or (−1, 1, . . . , 1) or (1, 1, . . . , 1) (where ∗ denotes numbers whose values are not relevant).
Any deviation from this form means that the number N is composite.
An integer N is called a base-a probable prime if the Miller-Rabin sequence has
the good form and is called a base-a pseudoprime if it is a base-a probable prime that
is actually composite.
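A Python sketch of the test (names ours): it computes the sequence a_0, a_1, . . . for random bases and reports any deviation from the good form.

```python
import random

def miller_rabin(N, trials=20):
    """Return 'composite' or 'probable prime'. Assumes N is odd and N > 3."""
    b, m = 0, N - 1
    while m % 2 == 0:               # write N - 1 = 2^b * m with m odd
        b += 1
        m //= 2
    for _ in range(trials):
        a = random.randrange(2, N - 1)
        x = pow(a, m, N)            # a_0 = a^m mod N
        if x == 1 or x == N - 1:
            continue                # sequence has one of the good forms
        for _ in range(b - 1):      # square up to a_{b-1}
            x = pow(x, 2, N)
            if x == N - 1:
                break               # good form: ..., -1, 1, ..., 1
        else:
            return "composite"      # deviation from the good form
    return "probable prime"
```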
Exercise 12.1.8. Let N = 561. Note that gcd(2, N) = 1 and 2^{N−1} ≡ 1 (mod N). Show
that the Miller-Rabin method with a = 2 demonstrates that N is composite. Show that
this failure allows one to immediately split N.
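In the spirit of Exercise 12.1.8, whenever the sequence reveals an x with x² ≡ 1 (mod N) but x ≢ ±1 (mod N), a factor of N drops out of a gcd. A sketch (names ours):

```python
from math import gcd

def split_via_miller_rabin(N, a):
    """If the Miller-Rabin sequence for base a hits x with x^2 = 1 (mod N)
    but x != +-1 (mod N), return gcd(x - 1, N), a nontrivial factor."""
    b, m = 0, N - 1
    while m % 2 == 0:
        b += 1
        m //= 2
    x = pow(a, m, N)
    for _ in range(b):
        y = pow(x, 2, N)
        if y == 1 and x != 1 and x != N - 1:
            return gcd(x - 1, N)
        x = y
    return None
```

For N = 561 and a = 2 the sequence is 263, 166, 67, 1, so x = 67 is a nontrivial square root of 1 and gcd(66, 561) = 33 splits N.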
Theorem 12.1.9. Let N > 9 be an odd composite integer. Then N is a base-a pseudoprime for at most φ(N)/4 bases a between 1 and N.

Proof: See Theorem 3.5.4 of [161] or Theorem 10.6 of Shoup [552].
Hence, if a number N passes the Miller-Rabin test for several randomly chosen
bases a then one can believe that with high probability N is prime (Section 5.4.2 of
Stinson [588] gives a careful analysis of the probability of success of a closely related
algorithm using Bayes' theorem). Such an integer is called a probable prime. In
practice one chooses O(log(N)) random bases a and runs the Miller-Rabin test for each.
The total complexity is therefore O(log(N)⁴) bit operations (which can be improved to
O(log(N)² M(log(N))), where M(m) is the cost of multiplying two m-bit integers).
12.1.3 Primality Proving
Agrawal, Kayal and Saxena [6] (AKS) discovered a deterministic algorithm that runs in
polynomial-time and determines whether or not N is prime. We refer to Section 4.5 of
[161] for details. The original AKS test has been improved significantly. A variant due to
Bernstein requires O(log(N)^{4+o(1)}) bit operations using fast arithmetic (see Section 4.5.4
of [161]).
There is also a large literature on primality proving using Gauss and Jacobi sums, and
using elliptic curves. We refer to Sections 4.4 and 7.6 of [161].
In practice the Miller-Rabin test is still widely used for cryptographic applications.
12.2 Generating Random Primes
Definition 12.2.1. Let X ∈ ℕ. Then π(X) is defined to be the number of primes 1 <
p < X.

The famous prime number theorem states that π(X) is asymptotically equal to
X/log(X) (as always, log denotes the natural logarithm). In other words, primes are
rather common among the integers. If one chooses a random integer 1 < p < X then the
probability that p is prime is therefore about 1/log(X) (equivalently, about log(X) trials
are required to find a prime between 1 and X). In practice, this probability increases
significantly if one chooses p to be odd and not divisible by 3.
Theorem 12.2.2. Random (probable) prime numbers of a given size X can be generated
using the Miller-Rabin algorithm in expected O(log(X)⁵) bit operations (or
O(log(X)³ M(log(X))) using fast arithmetic).
12.2.1 Primality Certificates
12.3 The p − 1 Method
First we recall the notion of a smooth integer. These are discussed in more detail in
Section 15.1.

Definition 12.3.1. Let N = ∏_{i=1}^{r} p_i^{e_i} ∈ ℕ (where we assume the p_i are distinct primes
and e_i ≥ 1) and let B ∈ ℕ. Then N is B-smooth if all p_i ≤ B, and N is B-power
smooth (or strongly B-smooth) if all p_i^{e_i} ≤ B.

Example 12.3.2. 528 = 2⁴ · 3 · 11 is 14-smooth but is not 14-power smooth.
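Both notions are easy to test by trial division; a short Python sketch (names ours):

```python
def factorise(N):
    """Trial-division factorisation: return a list of (prime, exponent) pairs."""
    factors, p = [], 2
    while p * p <= N:
        if N % p == 0:
            e = 0
            while N % p == 0:
                N //= p
                e += 1
            factors.append((p, e))
        else:
            p += 1
    if N > 1:
        factors.append((N, 1))
    return factors

def is_smooth(N, B):
    return all(p <= B for p, _ in factorise(N))

def is_power_smooth(N, B):
    return all(p**e <= B for p, e in factorise(N))
```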
The p − 1 method was published by Pollard [482].¹ The idea is to suppose that N
has prime factors p and q where p − 1 is B-power smooth but q − 1 is not B-power
smooth. Then if 1 < a < N is randomly chosen we have a^{B!} ≡ 1 (mod p) and, with high
probability, a^{B!} ≢ 1 (mod q). Hence gcd(a^{B!} − 1, N) splits N. Algorithm 11 gives the
Pollard p − 1 algorithm.
Example 12.3.3. Let N = 124639 and let B = 8. Choose a = 2. One can check that

gcd((a^{B!} (mod N)) − 1, N) = 113

from which one deduces that N = 113 · 1103.

This example worked because the prime p = 113 satisfies p − 1 = 2⁴ · 7 | 8! and so
2^{8!} ≡ 1 (mod p), while the other prime satisfies q − 1 = 2 · 19 · 29, which is not 8-smooth.
Of course, the factor returned from the gcd may be 1 or N . If the factor is not 1 or
N then we have split N as N = ab. We now test each factor for primality and attempt
to split any composite factors further.
Algorithm 11 Pollard p − 1 algorithm
Input: N ∈ ℕ
Output: Factor of N
1: Choose a suitable value for B
2: Choose a random 1 < a < N
3: b = a
4: for i = 2 to B do
5:    b = b^i (mod N)
6: end for
7: return gcd(b − 1, N)
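Algorithm 11 translates directly into Python. In this sketch we take a and B as inputs rather than choosing them inside the function:

```python
from math import gcd

def pollard_p_minus_1(N, B, a=2):
    """Pollard's p-1 method: compute a^(B!) mod N incrementally and
    return gcd(a^(B!) - 1, N), hopefully a nontrivial factor of N."""
    b = a
    for i in range(2, B + 1):
        b = pow(b, i, N)        # after this step, b = a^(i!) mod N
    return gcd(b - 1, N)
```

With N = 124639, B = 8 and a = 2 this returns 113, reproducing Example 12.3.3.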
Exercise 12.3.4. Factor N = 10028219737 using the p − 1 method.
Lemma 12.3.5. The complexity of Algorithm 11 is O(B log(B) M(log(N))) bit operations.

Proof: The main loop is repeated B times and contains an exponentiation modulo N to
a power i ≤ B. The cost of each exponentiation is O(log(B) M(log(N))) bit operations.
The algorithm is therefore exponential in B and so is only practical if B is relatively
small. If B = O(log(N)^i) then the algorithm is polynomial-time. Unfortunately, the
algorithm only splits numbers of a special form (namely, those for which there is a factor
p such that p − 1 is very smooth).
1 According to [630] the first stage of the method was also known to D. N. and D. H. Lehmer, though
they never published it.
= g^u (g^{B+vw−u} − 1) ∏_{i≠u} (g^{B+vw} − g^i).
12.4 The Elliptic Curve Method
Let N be an integer to be factored and let p | N be prime. One can view Pollard's p − 1
method as using an auxiliary group (namely, G_m(F_p)) that may have smooth order. The
idea is then to obtain an element modulo N (namely, a^{B!}) that is congruent modulo p
(but not modulo some other prime q | N) to the identity element of the auxiliary group.
Lenstra's idea was to replace the group G_m in the Pollard p − 1 method with the group
of points on an elliptic curve. The motivation was that even if p − 1 is not smooth, it is
reasonable to expect that there is an elliptic curve E over F_p such that #E(F_p) is rather
smooth. Furthermore, since there are lots of different elliptic curves over the field F_p we
have a chance to split N by trying the method with lots of different elliptic curves. We
refer to Section 9.14 for some remarks on elliptic curves modulo N.
If E is a randomly chosen elliptic curve modulo N with a point P on E modulo
N then one hopes that the point Q = [B!]P is congruent modulo p (but not modulo
some other prime q) to the identity element. One constructs E and P together, for
example choosing 1 < x_P, y_P, a_4 < N and setting a_6 = y_P² − x_P³ − a_4 x_P (mod N). If one
computes Q = (x : y : z) using inversion-free arithmetic and projective coordinates (as
in Exercise 9.1.5) then Q ≡ O_E (mod p) is equivalent to p | z. Here we are performing
elliptic curve arithmetic over the ring ℤ/Nℤ (see Section 9.14).
The resulting algorithm is known as the elliptic curve method or ECM and it
is very widely used, both as a general-purpose factoring algorithm in computer algebra
packages, and as a subroutine of the number field sieve. An important consequence of
Lenstra's suggestion of replacing the group F_p* by E(F_p) is that it motivated Miller and
Koblitz to suggest using E(F_p) instead of F_p* for public key cryptography.
Algorithm 12 gives a sketch of one round of the ECM algorithm. If the algorithm fails
then one should repeat it, possibly increasing the size of B. Note that it can be more
efficient to compute [B!]P as a single exponentiation rather than a loop as in line 5 of
Algorithm 12; see [49].
Algorithm 12 Elliptic curve factoring algorithm
Input: N ∈ ℕ
Output: Factor of N
1: Choose a suitable value for B
2: Choose random elements 0 ≤ x, y, a_4 < N
3: Set a_6 = y² − x³ − a_4 x (mod N)
4: Set P = (x : y : 1)
5: for i = 2 to B do
6:    Compute P = [i]P
7: end for
8: return gcd(N, z) where P = (x : y : z)
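The sketch below is a simplified variant of Algorithm 12: it uses affine rather than projective coordinates, so instead of testing p | z it catches a failed inversion modulo N, whose gcd with N reveals a factor. All names are ours and the parameter choices are illustrative:

```python
import random
from math import gcd

class FactorFound(Exception):
    def __init__(self, d):
        self.d = d

def ec_add(P, Q, a4, N):
    """Affine addition on y^2 = x^3 + a4*x + a6 modulo N; a non-invertible
    denominator reveals a factor of N (raised as FactorFound)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None                          # point at infinity
    num = (y2 - y1) % N if P != Q else (3 * x1 * x1 + a4) % N
    den = (x2 - x1) % N if P != Q else (2 * y1) % N
    d = gcd(den, N)
    if d > 1:
        raise FactorFound(d)                 # the slope is undefined mod N
    lam = num * pow(den, -1, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

def ec_mul(k, P, a4, N):
    """Double-and-add scalar multiplication [k]P."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a4, N)
        P = ec_add(P, P, a4, N)
        k >>= 1
    return R

def ecm_round(N, B):
    """One round of ECM: random curve and point, compute [B!]P."""
    x, y, a4 = (random.randrange(N) for _ in range(3))
    P = (x, y)                               # a6 is implicit, never needed
    try:
        for i in range(2, B + 1):
            P = ec_mul(i, P, a4, N)
            if P is None:
                return None                  # order smooth mod every p | N
    except FactorFound as e:
        if 1 < e.d < N:
            return e.d
    return None

def ecm(N, B=20):
    while True:
        d = ecm_round(N, B)
        if d:
            return d
```

Usage: `ecm(455839, B=30)` returns one of the prime factors 599 or 761 after a few random curves.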
Exercise 12.4.1. Show that the complexity of Algorithm 12 is O(B log(B)M (log(N )))
bit operations.
Exercise 12.4.2. Show that the complexity of Algorithm 12 can be lowered to O(BM (log(N )))
bit operations using the method of Exercise 12.3.6.
Many of the techniques used to improve the Pollard p − 1 method (such as the standard
continuation, though not Pollard's FFT continuation) also apply directly to the elliptic
curve method. We refer to Section 7.4 of [161] for details. One can also employ all
known techniques to speed up elliptic curve arithmetic. Indeed, the Montgomery model
for elliptic curves (Section 9.12.1) was discovered in the context of ECM rather than ECC.
In practice, we repeat the algorithm a number of times for random choices of B, x, y
and a_4. The difficult problems are to determine a good choice for B and to analyse the
probability of success. We discuss these issues in Section 15.3, where we state Lenstra's
conjecture that the elliptic curve method factors integers in subexponential time.
12.5 Pollard-Strassen Method
Chapter 13

Basic Discrete Logarithm Algorithms
Exercise 13.0.4. Let N be composite. Define the discrete logarithm problem DLP-MOD-N in the multiplicative group of integers modulo N. Show that FACTOR ≤_R
DLP-MOD-N.
Exercise 13.0.4 gives some evidence that cryptosystems based on the DLP should be
at least as secure as cryptosystems based on factoring.
13.1
Exhaustive Search
The simplest algorithm for the DLP is to sequentially compute g^a for 0 ≤ a < r and
test equality of each value with h. This requires at most r − 2 group operations and r
comparisons.
Exercise 13.1.1. Write pseudocode for the exhaustive search algorithm for the DLP and
verify the claims about the worst-case number of group operations and comparisons.
If the cost of testing equality of group elements is O(1) group operations then the
worst-case running time of the algorithm is O(r) group operations. It is natural to
assume that testing equality is always O(1) group operations, and this will always be true
for the algebraic groups considered in this book. However, as Exercise 13.1.2 shows, such
an assumption is not entirely trivial.
Exercise 13.1.2. Suppose projective coordinates are used for elliptic curves E(Fq ) to
speed up the group operations in the exhaustive search algorithm. Show that testing
equality between a point in projective coordinates and a point in affine or projective
coordinates requires at least one multiplication in Fq (and so this cost is not linear).
Show that, nevertheless, the cost of testing equality is less than the cost of a group
operation.
For the rest of this chapter we assume that groups are represented in a compact way
and that operations involving the representation of the group (e.g., testing equality) all
cost less than the cost of one group operation. This assumption is satisfied for all the
algebraic groups studied in this book.
13.2 The Pohlig-Hellman Method

Let g have order N and, for a prime power l^e exactly dividing N, define φ_{l^e}(u) = u^{N/l^e}. This
is a group homomorphism from ⟨g⟩ to the unique cyclic subgroup of ⟨g⟩ of order l^e. Hence,
if h = g^a then

φ_{l^e}(h) = φ_{l^e}(g)^{a (mod l^e)}.
1 The paper [479] is authored by Pohlig and Hellman and so the method is usually referred to by this
name, although R. Silver, R. Schroeppel, H. Block, and V. Nechaev also discovered it.
Using φ_{l^e} one can reduce the DLP to subgroups of prime power order. To reduce the
problem to subgroups of prime order we do the following. Suppose g_0 has order l^e and
h_0 = g_0^a; then we can write a = a_0 + a_1 l + ⋯ + a_{e−1} l^{e−1} where 0 ≤ a_i < l. Let g_1 = g_0^{l^{e−1}}.
Raising to the power l^{e−1} gives

h_0^{l^{e−1}} = g_1^{a_0}

from which one can find a_0 by trying all possibilities (or using baby-step-giant-step or
other methods).
To compute a_1 we define h_1 = h_0 g_0^{−a_0} so that

h_1 = g_0^{a_1 l + a_2 l² + ⋯ + a_{e−1} l^{e−1}}

and

h_1^{l^{e−2}} = g_1^{a_1}.

To obtain the next value we set h_2 = h_1 g_0^{−a_1 l} and repeat. Continuing gives the full
solution modulo l^e. Once a is known modulo l_i^{e_i} for all l_i^{e_i} ∥ N one computes a using the
Chinese remainder theorem. The full algorithm (in a slightly more efficient variant) is
given in Algorithm 13.
Algorithm 13 Pohlig-Hellman algorithm
Input: g, h = g^a, {(l_i, e_i) : 1 ≤ i ≤ n} such that the order of g is N = ∏_{i=1}^{n} l_i^{e_i}
Output: a
1: Compute {g^{N/l_i^{f_i}}, h^{N/l_i^{f_i}} : 1 ≤ i ≤ n, 1 ≤ f_i ≤ e_i}
2: for i = 1 to n do
3:    a_i = 0
4:    for j = 1 to e_i do    ▷ Reducing DLP of order l_i^{e_i} to cyclic groups
5:       Let g_0 = g^{N/l_i^j} and h_0 = h^{N/l_i^j}    ▷ These were already computed in line 1
6:       Compute u = g_0^{a_i} and h_0 = h_0 u^{−1}
7:       if h_0 ≠ 1 then
8:          Let g_0 = g^{N/l_i}, b = 1, T = g_0    ▷ Already computed in line 1
9:          while h_0 ≠ T do    ▷ Exhaustive search
10:            b = b + 1, T = T g_0
11:         end while
12:         a_i = a_i + b l_i^{j−1}
13:      end if
14:   end for
15: end for
16: Use the Chinese remainder theorem to compute a ≡ a_i (mod l_i^{e_i}) for 1 ≤ i ≤ n
17: return a
Example 13.2.3. Let p = 19, g = 2 and h = 5. The aim is to find an integer a such
that h ≡ g^a (mod p). Note that p − 1 = 2 · 3². We first find a modulo 2. We have
(p − 1)/2 = 9, so define g_0 = g^9 ≡ −1 (mod 19) and h_0 = h^9 ≡ 1 (mod 19). It follows that
a ≡ 0 (mod 2).
Now we find a modulo 9. Since (p − 1)/9 = 2 we first compute g_0 = g² ≡ 4 (mod 19)
and h_0 ≡ h² ≡ 6 (mod 19). To get information modulo 3 we compute (this is a slight
change of notation from Algorithm 13)

g_1 = g_0³ ≡ 7 (mod 19) and h_0³ ≡ 7 (mod 19).

It follows that a ≡ 1 (mod 3). To get information modulo 9 we remove the modulo 3 part
by setting h_1 = h_0/g_0 = 6/4 ≡ 11 (mod 19). We now solve h_1 ≡ g_1^{a_1} (mod 19), which has
the solution a_1 ≡ 2 (mod 3). It follows that a ≡ 1 + 3 · 2 ≡ 7 (mod 9).
Finally, by the Chinese remainder theorem we obtain a ≡ 16 (mod 18).
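The method of this section can be sketched in Python (names ours), using exhaustive search for the prime-order subproblems and the Chinese remainder theorem at the end:

```python
def dlog_prime(g, h, l, p):
    """Solve g^x = h (mod p) for 0 <= x < l by exhaustive search."""
    t = 1
    for x in range(l):
        if t == h:
            return x
        t = t * g % p
    raise ValueError("no solution")

def pohlig_hellman(g, h, p, factors):
    """Solve g^a = h (mod p), where g has order N = prod l^e (mod p)
    and factors = [(l, e), ...]."""
    N = 1
    for l, e in factors:
        N *= l ** e
    residues = []
    for l, e in factors:
        g0 = pow(g, N // l**e, p)        # has order l^e
        h0 = pow(h, N // l**e, p)
        g1 = pow(g0, l**(e - 1), p)      # has order l
        a, h_cur = 0, h0
        for j in range(e):               # digits of a modulo l^e
            d = dlog_prime(g1, pow(h_cur, l**(e - 1 - j), p), l, p)
            a += d * l**j
            h_cur = h_cur * pow(g0, -d * l**j, p) % p
        residues.append((a, l**e))
    a, M = 0, 1                          # Chinese remainder theorem
    for r, m in residues:
        t = (r - a) * pow(M, -1, m) % m
        a += M * t
        M *= m
    return a
```

Running it on Example 13.2.3 with factors [(2, 1), (3, 2)] recovers a = 16.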
Exercise 13.2.4. Let p = 31, g = 3 and h = 22. Solve the discrete logarithm problem
of h to the base g using the Pohlig-Hellman method.
We recall that an integer is B-smooth if all its prime factors are at most B.

Theorem 13.2.5. Let g ∈ G have order N. Let B ∈ ℕ be such that N is B-smooth. Then
Algorithm 13 solves the DLP in G using O(log(N)² + B log(N)) group operations.²
Proof: One can factor N using trial division in O(B M(log(N))) bit operations, where
M(n) is the cost of multiplying n-bit integers. We assume that M(log(N)) is O(1) group
operations (this is true for all the algebraic groups of interest in this book). Hence, we
may assume that the factorisation of N is known.
Computing all φ_{l_i^{e_i}}(g) and φ_{l_i^{e_i}}(h) can be done naively in O(log(N)²) group operations,
but we prefer to do it in O(log(N) log log(N)) group operations using the method of
Section 2.15.1.
Lines 5 to 13 run ∑_{i=1}^{n} e_i = O(log(N)) times and, since each l_i ≥ 2, we have ∑_{i=1}^{n} e_i ≤
log₂(N). The computation of u in line 6 requires O(e_i log(l_i)) group operations. Together
this gives a bound of O(log(N)²) group operations for the running time. (Note that when
N = 2^e then the cost of these lines is e² log(2) = O(log(N)²) group operations.)
Solving each DLP in a cyclic group of order l_i using naive methods requires O(l_i)
group operations (this can be improved using the baby-step-giant-step method). There
are at most log₂(N) such computations to perform, giving O(log(N) B) group operations.
The final step is to use the Chinese remainder theorem to compute a, requiring
O(log(N) M(log(N))) bit operations, which is again assumed to cost at most O(log(N))
group operations.
Due to this method, small primes give no added security in discrete logarithm systems.
Hence one generally uses elements of prime order r for cryptography.
Exercise 13.2.6. Recall the Tonelli-Shanks algorithm for computing square roots modulo
p from Section 2.9. A key step of the algorithm is to find a solution j to the equation
b ≡ y^{2j} (mod p) where y has order 2^e. Write down the Pohlig-Hellman method to solve
this problem. Show that the complexity is O(log(p)² M(log(p))) bit operations.
Exercise 13.2.7. Let B ∈ ℕ_{>3}. Let N = ∏_{i=1}^{n} x_i where 2 ≤ x_i ≤ B. Prove that

∑_{i=1}^{n} x_i ≤ B log(N)/log(B).

Hence, show that the Pohlig-Hellman method performs O(log(N)² + B log(N)/log(B))
group operations.
Remark 13.2.8. As we will see, replacing exhaustive search by the baby-step-giant-step
algorithm improves the complexity to O(log(N)² + √B log(N)/log(B)) group operations
(at the cost of more storage).
Algorithm 13 can be improved, when there is a prime power l^e dividing N with e large,
by structuring it differently. Section 11.2.3 of Shoup [552] gives a method to compute
the DLP in a group of order l^e in O(e√l + e log(e) log(l)) group operations (this is using
baby-step-giant-step rather than exhaustive search). Algorithm 1 and Corollary 1 of
Sutherland [594] give an algorithm that requires
²By this we mean that the constant implicit in the O(·) is independent of B and N.
13.3 Baby-Step-Giant-Step (BSGS) Algorithm

This algorithm, usually credited to Shanks³, exploits an idea called the time/memory
tradeoff. Suppose g has prime order r and that h = g^a for some 0 ≤ a < r. Let
m = ⌈√r⌉. Then there are integers a_0, a_1 such that a = a_0 + m a_1 and 0 ≤ a_0, a_1 < m. It
follows that

g^{a_0} = h (g^{−m})^{a_1}

and this observation leads to Algorithm 14. The algorithm requires storing a large list of
values and it is important, in the second stage of the algorithm, to be able to efficiently
determine whether or not an element lies in the list. There are a number of standard
solutions to this problem including using binary trees, hash tables, or sorting the list after
line 7 of the algorithm (see, for example, Parts II and III of [145] or Section 6.3 of [314]).
Algorithm 14 Baby-step-giant-step (BSGS) algorithm
Input: g, h ∈ G of order r
Output: a such that h = g^a, or ⊥
1: m = ⌈√r⌉
2: Initialise an easily searched structure (such as a binary tree or a hash table) L
3: x = 1
4: for i = 0 to m do    ▷ Compute baby steps
5:    Store (x, i) in L, easily searchable on the first coordinate
6:    x = xg
7: end for
8: u = g^{−m}
9: y = h, j = 0
10: while (y, ·) ∉ L do    ▷ Compute giant steps
11:    y = yu, j = j + 1
12: end while
13: if there is (x, i) ∈ L with x = y then
14:    return i + mj
15: else
16:    return ⊥
17: end if
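A Python sketch of Algorithm 14 (names ours), using a dictionary as the easily searched structure:

```python
from math import isqrt

def bsgs(g, h, r, p):
    """Solve g^a = h (mod p) for 0 <= a < r, where g has order r modulo p.
    Returns a, or None if h is not in the subgroup generated by g."""
    m = isqrt(r) + 1                  # m >= ceil(sqrt(r))
    L = {}
    x = 1
    for i in range(m + 1):            # baby steps: store g^i -> i
        L.setdefault(x, i)
        x = x * g % p
    u = pow(g, -m, p)                 # u = g^(-m) mod p
    y = h % p
    for j in range(m + 1):            # giant steps: y = h * g^(-mj)
        if y in L:
            return L[y] + m * j
        y = y * u % p
    return None
```

For instance, with the data of Example 13.2.3 (g = 2, h = 5, r = 18, p = 19) the algorithm returns a = 16.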
Note that the BSGS algorithm is deterministic. The algorithm also solves the decision
problem (is h ∈ ⟨g⟩?) though, as discussed in Section 11.6, there are usually faster
solutions to the decision problem.
Theorem 13.3.1. Let G be a group of order r. Suppose that elements of G are represented
using O(log(r)) bits and that the group operations can be performed in O(log(r)²) bit operations.
3 Nechaev
Proof: The algorithm performs ⌈√r⌉ group operations for the baby steps. The cost of
inserting each group element into the easily searched structure is O(log(r)²) bit operations,
since comparisons require O(log(r)) bit operations (this is where the assumption on the
Exercise 13.3.7. Suppose one considers the DLP in a group G where computing the
inverse g^{−1} is much faster than multiplication in the group. Show how to solve the DLP
Exercise 13.3.9. Let g ∈ G have order N = mr where r is prime and m is log(N)-smooth. Suppose h = g^x and w are given such that 0 ≤ x < w. Show how one
can compute x by combining the Pohlig-Hellman method and the BSGS algorithm in
O(log(N)² + √(w/m)) group operations.
13.4 Lower Bound on the DLP for Generic Algorithms
This section presents a lower bound for the complexity of the discrete logarithm problem
in groups of prime order for algorithms that do not exploit the representation of the
group; such algorithms are called generic algorithms. The main challenge is to formally
model such algorithms. Babai and Szemerédi [19] defined a black box group to be a
group with elements represented (not necessarily uniquely) as binary strings and where
multiplication, inversion and testing whether an element is the identity are all performed
using oracles. Nechaev [449] used a different model (for which equality testing does not
require an oracle query) and obtained Ω(√r) time and space complexity.
Nechaev's paper concerns deterministic algorithms, and so his result does not cover
the Pollard algorithms. Shoup [549] gave yet another model for generic algorithms (his
model allows randomised algorithms) and proved Ω(√r) time complexity for the DLP
and some related problems. This lower bound is often called the birthday bound on
the DLP.
Shoup's formulation has proven to be very popular with other authors and so we
present it in detail. We also describe the model of generic algorithms due to Maurer [401].
Further results in this area, and extensions of the generic algorithm model (such as working with groups of composite order, working with groups endowed with pairings, providing
access to decision oracles, etc.), have been given by Maurer and Wolf [404], Maurer [401],
Boneh and Boyen [76, 77], Boyen [95], and Rupp, Leander, Bangerter, Dent and Sadeghi [504].
13.4.1 Shoup's Model for Generic Algorithms
Fix a constant t ∈ ℝ_{>0}. When G is the group of points on an elliptic curve of prime order
(and log means log₂ as usual) one can take t = 2.

Definition 13.4.1. An encoding of a group G of order r is an injective function σ :
G → {0, 1}^{⌈t log(r)⌉}.
A generic algorithm for a computational problem in a group G of order r is a
probabilistic algorithm that takes as input r and (σ(g_1), . . . , σ(g_k)) such that g_1, . . . , g_k ∈
G and returns a sequence (a_1, . . . , a_l, σ(h_1), . . . , σ(h_m)) for some a_1, . . . , a_l ∈ ℤ/rℤ and
h_1, . . . , h_m ∈ G (depending on the computational problem in question). The generic
algorithm is given access to a perfect oracle O such that O(σ(g_1), σ(g_2)) returns σ(g_1 g_2^{−1}).
Note that one can obtain the encoding σ(1) of the identity element as O(σ(g_1), σ(g_1)).
One can then compute the encoding of g^{−1} from the encoding of g as O(σ(1), σ(g)).
Defining O′(σ(g_1), σ(g_2)) = O(σ(g_1), O(σ(1), σ(g_2))) gives an oracle for multiplication in
G.
Example 13.4.2. A generic algorithm for the DLP in ⟨g⟩ where g has order r takes
input (r, σ(g), σ(h)) and outputs a such that h = g^a. A generic algorithm for CDH (see
Definition 20.2.1) takes input (σ(g), σ(g^a), σ(g^b)) and outputs σ(g^{ab}).
In Definition 13.4.1 we insisted that a generic algorithm take as input the order of
the group, but this is not essential. Indeed, it is necessary to relax this condition if one
wants to consider generic algorithms for, say, (ℤ/Nℤ)* when N is an integer of unknown
factorisation. To do this one considers an encoding function to {0, 1}^l and it follows that
the order r of the group is at most 2^l. If the order is not given then one can consider
a generic algorithm whose goal is to compute the order of a group. Theorem 2.3 and
Corollary 2.4 of Sutherland [592] prove an Ω(r^{1/3}) lower bound on the complexity of
a generic algorithm to compute the order r of a group, given a bound M such that
M < r < 2M.
13.4.2 Maurer's Model for Generic Algorithms
Maurer's formulation of generic algorithms [401] does not use any external representation of group elements (in particular, there are no randomly chosen encodings). Maurer
considers a black box containing registers, specified by indices i ∈ ℕ, that store group
elements. The model considers a set of operations and a set of relations. An oracle query
O(op, i_1, . . . , i_{t+1}) causes register i_{t+1} to be assigned the value of the t-ary operation op
on the values in registers i_1, . . . , i_t. Similarly, an oracle query O(R, i_1, . . . , i_t) returns the
value of the t-ary relation R on the values in registers i_1, . . . , i_t.
A generic algorithm in Maurer's model is an algorithm that takes as input the order
of the group (as with Shoup's model, the order of the group can be omitted), makes oracle
queries, and outputs the value of some function of the registers (for example, the value
of one of the registers; Maurer calls such an algorithm an extraction algorithm).
Example 13.4.3. To define a generic algorithm for the DLP in Maurer's model one imagines a black box that contains in the first register the value 1 (corresponding to g) and in the second register the value a (corresponding to h = g^a). Note that the black box is viewed as containing the additive group Z/rZ. The algorithm has access
to an oracle O(+, i, j, k) that assigns register k the sum of the elements in registers i and
j, an oracle O(−, i, j) that assigns to register j the inverse of the element in register i, and
an oracle O(=, i, j) that returns true if and only if registers i and j contain the same
group element. The goal of the generic algorithm for the DLP is to output the value of
the second register.
To implement the baby-step-giant-step algorithm or the Pollard rho algorithm in Maurer's
model it is necessary to allow a further oracle that computes a well-ordering relation on
the group elements.
We remark that the Shoup and Maurer models have been used to prove the security
of cryptographic protocols against adversaries that behave like generic algorithms. Jager
and Schwenk [306] have shown that both models are equivalent for this purpose.
13.4.3 The Lower Bound for the DLP

We present the main result of this section using Shoup's model. A similar result can be obtained using Maurer's model (except that it is necessary to either ignore the cost of equality queries or else allow a total order relation on the registers).
We start with a result attributed by Shoup to Schwartz. In this section we only use
the result when k = 1, but the more general case is used later in the book.
Lemma 13.4.4. Let F(x_1, ..., x_k) ∈ F_r[x_1, ..., x_k] be a non-zero polynomial of total degree d. Then for P = (P_1, ..., P_k) chosen uniformly at random in F_r^k the probability that F(P_1, ..., P_k) = 0 is at most d/r.
Proof: If k = 1 then the result is standard. We prove the result by induction on k. Write

F(x_1, ..., x_k) = F_e(x_1, ..., x_{k−1}) x_k^e + F_{e−1}(x_1, ..., x_{k−1}) x_k^{e−1} + ··· + F_0(x_1, ..., x_{k−1}),

where F_i(x_1, ..., x_{k−1}) ∈ F_r[x_1, ..., x_{k−1}] has total degree at most d − i for 0 ≤ i ≤ e and e ≤ d. If P = (P_1, ..., P_{k−1}) ∈ F_r^{k−1} is such that all F_i(P) = 0 then all r choices for P_k lead to a solution. The probability of this happening is at most (d − e)/r (this is the probability that F_e(P) = 0, by the inductive hypothesis). On the other hand, if some F_i(P) ≠ 0 then there are at most e choices for P_k that give a root of the polynomial. The total probability is therefore at most (d − e)/r + e/r = d/r. □
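For k = 2 the bound can be checked exhaustively on a small example; the polynomial and prime below are illustrative choices.

```python
# F(x, y) = x*y - 1 over F_r has total degree d = 2; count its zeros directly.
r = 101          # a small prime
d = 2
zeros = sum(1 for x in range(r) for y in range(r) if (x * y - 1) % r == 0)
assert zeros == r - 1            # one solution y for each non-zero x
assert zeros / r**2 <= d / r     # consistent with the d/r bound of the lemma
```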
Theorem 13.4.5. Let G be a cyclic group of prime order r. Let A be a generic algorithm for the DLP in G that makes at most m oracle queries. Then the probability, over uniformly chosen a ∈ Z/rZ and uniformly chosen encoding function σ : G → {0,1}^t (where t ≥ log_2(r)), that A(σ(g), σ(g^a)) = a is O(m²/r).
Proof: Instead of choosing a random encoding function in advance, the method of proof
is to create the encodings on the fly. The algorithm to produce the encodings is called
the simulator. We also do not choose the instance of the DLP until the end of the game.
The simulation will be perfect unless a certain bad event happens, and we will analyse
the probability of this event.
Let S = {0,1}^t. The simulator begins by uniformly choosing two distinct σ_1, σ_2 in S and running A(σ_1, σ_2). Algorithm A assumes that σ_1 = σ(g) and σ_2 = σ(h) for some g, h ∈ G and some encoding function σ, but it is not necessary for the simulator to fix values for g and h.
It is necessary to ensure that the encodings are consistent with the group operations.
This cannot be done perfectly without choosing g and h, but the following idea takes
care of trivial consistency. The simulator maintains a list of pairs (σ_i, F_i) where σ_i ∈ S and F_i ∈ F_r[x]. The initial values are (σ_1, 1) and (σ_2, x). Whenever A makes an oracle query on (σ_i, σ_j) the simulator computes F = F_i − F_j. If F appears as F_k in the list of pairs then the simulator replies with σ_k and does not change the list. Otherwise, a σ ∈ S distinct from the previously used values is chosen uniformly at random, (σ, F) is added to the simulator's list, and σ is returned to A.
After making at most m oracle queries A outputs b ∈ Z/rZ. The simulator now chooses a uniformly at random in Z/rZ. Algorithm A wins if b = a.
Let the simulator's list contain precisely k polynomials {F_1(x), ..., F_k(x)} for some k ≤ m + 2. Let E be the event that F_i(a) = F_j(a) for some pair 1 ≤ i < j ≤ k. The probability that A wins is

Pr(A wins | E) Pr(E) + Pr(A wins | ¬E) Pr(¬E).    (13.3)

For each pair 1 ≤ i < j ≤ k the probability that (F_i − F_j)(a) = 0 is at most 1/r by Lemma 13.4.4 (the polynomial F_i − F_j is non-zero of degree at most 1). Hence, the probability of event E is at most k(k − 1)/(2r) = O(m²/r). On the other hand, if event E does not occur then all A knows about a is that it lies in the set X of possible values for a for which F_i(a) ≠ F_j(a) for all 1 ≤ i < j ≤ k. Let N = #X ≥ r − m²/2. Then Pr(¬E) = N/r and Pr(A wins | ¬E) = 1/N.
Putting it all together, the probability that A wins is O(m²/r). □
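The simulator of the proof can be made concrete: every encoding handed to A is tied to a linear polynomial c_0 + c_1·x in F_r[x], and the secret a is drawn only at the end. The following sketch (toy parameters and a hypothetical two-query adversary, not from the text) illustrates the bookkeeping.

```python
import random

r = 1009
rng = random.Random(42)

table = {}                        # polynomial (c0, c1) -> encoding string
def encoding_of(poly):
    if poly not in table:
        table[poly] = format(rng.getrandbits(32), "032b")
    return table[poly]

sigma1 = encoding_of((1, 0))      # plays the role of sigma(g): F1 = 1
sigma2 = encoding_of((0, 1))      # plays the role of sigma(h): F2 = x

def O(s1, s2):
    # division-oracle query: answer with the encoding of F_i - F_j
    inv = {s: p for p, s in table.items()}
    (c0, c1), (d0, d1) = inv[s1], inv[s2]
    return encoding_of(((c0 - d0) % r, (c1 - d1) % r))

# a two-query "adversary": after m queries there are at most m + 2 polynomials
s = O(sigma1, sigma2)             # polynomial 1 - x
t = O(s, sigma2)                  # polynomial 1 - 2x
assert len(table) == 4

# the instance is only chosen now; event E is a collision of values at a
a = rng.randrange(r)
values = [(c0 + c1 * a) % r for (c0, c1) in table]
collision = len(set(values)) < len(values)   # each pair agrees with probability 1/r
```

Until event E occurs, A's view is independent of a, which is exactly why the late choice of a is legitimate.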
Exercise 13.4.6. Prove Theorem 13.4.5 using Maurer's model for generic algorithms.
[Hint: The basic method of proof is exactly the same. The difference is in formulation
and analysis of the success probability.]
Corollary 13.4.7. Let A be a generic algorithm for the DLP. If A succeeds with noticeable probability 1/log(r)^c for some c > 0 then A must make Ω(√(r/log(r)^c)) oracle queries.
13.5 Generalised Discrete Logarithm Problems
A number of generalisations of the discrete logarithm problem have been proposed over
the years. The motivation for such problems varies: sometimes the aim is to enable new
cryptographic functionalities; other times the aim is to generate hard instances of the
DLP more quickly than previous methods.
Definition 13.5.1. Let G be a finitely generated Abelian group. The multidimensional discrete logarithm problem or representation problem⁴ is: given g_1, g_2, ..., g_l, h ∈ G and S_1, S_2, ..., S_l ⊆ Z, to find a_j ∈ S_j for 1 ≤ j ≤ l, if they exist, such that

h = g_1^{a_1} g_2^{a_2} ··· g_l^{a_l}.

The product discrete logarithm problem⁵ is: given g, h ∈ G and S_1, S_2, ..., S_l ⊆ Z, to find a_j ∈ S_j for 1 ≤ j ≤ l, if they exist, such that

h = g^{a_1 a_2 ··· a_l}.
Remark 13.5.2. A natural variant of the product DLP is to compute only the product a_1 a_2 ··· a_l rather than the l-tuple (a_1, ..., a_l). This is just the DLP with respect to a specific instance generator (see the discussion in Section 2.1.2). Precisely, consider an instance generator that, on input a security parameter κ, outputs a group element g of prime order r and then chooses a_j ∈ S_j for 1 ≤ j ≤ l and computes h = g^{a_1 a_2 ··· a_l}. The stated variant of the product DLP is the DLP with respect to this instance generator.
⁴ This computational problem seems to have been first explicitly stated in the work of Brands [97] from 1993, in the case S_i = Z.
5 The idea of using product exponents for improved efficiency appears in Knuth [340] where it is called
the factor method.
Note that the representation problem can be defined whether or not G = ⟨g_1, ..., g_l⟩
is cyclic. The solution to Exercise 13.5.4 applies in all cases. However, there may be
other ways to tackle the non-cyclic case (e.g., exploiting efficiently computable group
homomorphisms, see [230] for example), so the main interest is the case when G is cyclic
of prime order r.
Example 13.5.3. The representation problem can arise when using the GLV method (see Section 11.3.3) with intentionally small coefficients. In this case g_2 = ψ(g_1), ⟨g_1, g_2⟩ is a cyclic group of order r, and h = g_1^{a_1} g_2^{a_2} where 0 ≤ a_1, a_2 < √r.
The number of possible choices for h in both the representation problem and the product DLP is at most ∏_{j=1}^{l} #S_j (it could be smaller if the same h can arise from many different combinations of (a_1, ..., a_l)). If l is even and #S_j = #S_1 for all j then there is an easy time/memory tradeoff algorithm requiring O(#S_1^{l/2}) group operations.
Exercise 13.5.4. Write down an efficient BSGS algorithm to solve the representation
problem. What is the running time and storage requirement?
Exercise 13.5.5. Give an efficient BSGS algorithm to solve the product DLP. What is
the running time and storage requirement?
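As a hedged sketch of the meet-in-the-middle idea behind these exercises, here is the case l = 2 of the representation problem in a toy group; the group, generators and secret exponents below are illustrative choices, not from the text.

```python
# Meet-in-the-middle (BSGS-style) sketch for the representation problem
# with l = 2: the group is <2> of order 100 inside (Z/101Z)^*.
p = 101
g1 = 2
g2 = pow(2, 10, p)          # g2 = g1^10
S = range(10)               # S1 = S2 = {0, ..., 9}

h = pow(2, 73, p)           # h = g1^3 * g2^7, so (a1, a2) = (3, 7)

# baby steps: table of g1^a1 keyed by the group element
table = {pow(g1, a1, p): a1 for a1 in S}
# giant steps: test h * g2^(-a2) against the table
inv_g2 = pow(g2, p - 2, p)  # inverse via Fermat's little theorem
solution = None
for a2 in S:
    x = (h * pow(inv_g2, a2, p)) % p
    if x in table:
        solution = (table[x], a2)
        break

assert solution == (3, 7)
assert (pow(g1, 3, p) * pow(g2, 7, p)) % p == h
```

The cost here is O(#S_1 + #S_2) group operations and #S_1 storage; for general even l one splits the l generators into two halves, matching the O(#S_1^{l/2}) tradeoff stated above.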
It is natural to ask whether one can do better than the naive baby-step-giant-step
algorithms for these problems, at least for certain values of l. The following result shows
that the answer in general turns out to be no.
Lemma 13.5.6. Assume l is even and #S_j = #S_1 for all 2 ≤ j ≤ l. A generic algorithm for the representation problem with noticeable success probability 1/log(#S_1)^c needs Ω(#S_1^{l/2}/log(#S_1)^{c/2}) group operations.
Proof: Suppose A is a generic algorithm for the representation problem. Let G be a group of order r and let g, h ∈ G. Set m = ⌈r^{1/l}⌉, S_j = {a ∈ Z : 0 ≤ a < m} and let g_j = g^{m^j} for 0 ≤ j ≤ l − 1. If h = g^a for some a ∈ Z then the base-m expansion a = a_0 + a_1 m + ··· + a_{l−1} m^{l−1} is such that

h = g^a = ∏_{j=0}^{l−1} g_j^{a_j}.

Hence, if A solves the representation problem then we have solved the DLP using a generic algorithm. Since we have shown that a generic algorithm for the DLP with success probability 1/log(#S_1)^c needs Ω(√(r/log(#S_1)^c)) group operations, the result is proved. □
13.6 Low Hamming Weight DLP
Recall that the Hamming weight of an integer is the number of ones in its binary
expansion.
Definition 13.6.1. Let G be a group and let g ∈ G have prime order r. The low Hamming weight DLP is: given h ∈ ⟨g⟩ and integers n, w, to find an integer a (if it exists) whose binary expansion has length n and Hamming weight w such that h = g^a.
This definition makes sense even for n > log2 (r). For example, squaring is faster than
multiplication in most representations of algebraic groups, so it could be more efficient to
compute g^a by taking longer strings with fewer ones in their binary expansion.
Coppersmith developed a time/memory tradeoff algorithm to solve this problem. A
thorough treatment of these ideas was given by Stinson in [587]. Without loss of generality
we assume that n and w are even (just add one to them if not).
The idea of the algorithm is to reduce solving h = g^a, where a has length n and Hamming weight w, to solving hg^{−a_2} = g^{a_1} where a_1 and a_2 have Hamming weight w/2. One does this by choosing a set B ⊆ I = {0, 1, ..., n − 1} of size n/2. The set B is the set of possible bit positions for the bits of a_1 and I − B is the set of possible bit positions for the bits of a_2. The detailed algorithm is given in Algorithm 15. Note that one can compactly represent subsets Y ⊆ I as n-bit strings.
Algorithm 15 Coppersmith's baby-step-giant-step algorithm for the low Hamming weight DLP
Input: g, h ∈ G of order r, n and w
Output: a of bit-length n and Hamming weight w such that h = g^a, or ⊥
1: Choose B ⊆ {0, ..., n − 1} such that #B = n/2
2: Initialise an easily searched structure L (such as a binary tree, a heap, or a hash table)
3: for Y ⊆ B : #Y = w/2 do
4:    Compute b = Σ_{j∈Y} 2^j and x = g^b
5:    Store (x, Y) in L ordered according to first coordinate
6: end for
7: for Y ⊆ (I − B) : #Y = w/2 do
8:    Compute b = Σ_{j∈Y} 2^j and y = hg^{−b}
9:    if y = x for some (x, Y_1) ∈ L then
10:      a = Σ_{j∈Y∪Y_1} 2^j
11:      return a
12:   end if
13: end for
14: return ⊥
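A minimal sketch of Algorithm 15 in a toy subgroup of (Z/pZ)^*; the parameters are illustrative, and the fixed split B = {0, ..., n/2 − 1} is assumed to balance the weight of the secret exponent (in general one runs the algorithm for each B in a splitting system, discussed below in the text).

```python
from itertools import combinations

p = 1019                      # prime, so (Z/pZ)^* has order 1018
g = 2
n, w = 8, 4                   # secret exponent: 8 bits, Hamming weight 4
I = set(range(n))
B = set(range(n // 2))        # bit positions 0..3

a_secret = 0b10100101         # weight 4: bits {0, 2, 5, 7}; exactly 2 bits lie in B
h = pow(g, a_secret, p)

inv_g = pow(g, -1, p)
# baby steps: g^b for each half-weight subset Y of B
L = {}
for Y in combinations(sorted(B), w // 2):
    b = sum(1 << j for j in Y)
    L[pow(g, b, p)] = Y
# giant steps: h * g^(-b) for each half-weight subset of I \ B
answer = None
for Y in combinations(sorted(I - B), w // 2):
    b = sum(1 << j for j in Y)
    y = (h * pow(inv_g, b, p)) % p
    if y in L:
        answer = sum(1 << j for j in L[y] + Y)   # union of the two bit sets
        break

assert answer == a_secret
```

A match y = g^{b_1} with y = hg^{−b_2} gives h = g^{b_1 + b_2}, which is exactly line 10 of the algorithm.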
Exercise 13.6.2. Write down an algorithm, to enumerate all Y ⊆ B such that #Y = w/2, which requires O((n/2 choose w/2)·n) bit operations.
Algorithm 15 is not guaranteed to succeed, since the set B might not exactly correspond to a splitting of the bit positions of the integer a into two sets each of Hamming weight w/2. We now give a collection of subsets of I that is guaranteed to contain a suitable B.
Definition 13.6.5. Fix even integers n and w. Let I = {0, ..., n − 1}. A splitting system is a set B of subsets of I of size n/2 such that for every Y ⊆ I with #Y = w there is a set B ∈ B such that #(B ∩ Y) = w/2.
Lemma 13.6.6. For any even integers n and w there exists a splitting system B of size n/2.

Proof: We expect to repeat the algorithm 1/p_{Y,B} times. One can show, using the fact that 2^{2k}/(2k choose k) ≤ 2√k, that 1/p_{Y,B} ≤ c√w for some constant c (see Stinson [587]). The result follows. □
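One standard construction (stated here as an assumption, since part of the surrounding argument is abridged) takes the n/2 cyclic blocks B_i = {i, i+1, ..., i + n/2 − 1} (mod n): the quantity #(B_i ∩ Y) changes by at most 1 as i increases and is negated when i advances by n/2, so by an intermediate-value argument some block meets Y in exactly w/2 positions. This can be verified exhaustively for small parameters:

```python
from itertools import combinations

def cyclic_splitting_system(n):
    # B_i = {i, i+1, ..., i + n/2 - 1} taken modulo n, for 0 <= i < n/2
    return [frozenset((i + j) % n for j in range(n // 2)) for i in range(n // 2)]

def is_splitting_system(n, w, family):
    # every weight-w subset Y of {0,...,n-1} must be split evenly by some block
    return all(any(len(B & set(Y)) == w // 2 for B in family)
               for Y in combinations(range(n), w))

# exhaustive check for small even parameters
for n in (4, 6, 8, 10):
    for w in range(2, n + 1, 2):
        assert is_splitting_system(n, w, cyclic_splitting_system(n))
```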
Exercise 13.6.10. As with all baby-step-giant-step methods, the bottleneck for this
method is the storage requirement. Show how to modify the algorithm for the case where
only M group elements of storage are available.
Exercise 13.6.11. Adapt Coppersmith's algorithm to the DLP for low weight signed
expansions (for example, NAFs, see Section 11.1.1).
All the algorithms in this section have large storage requirements. An approach due
to van Oorschot and Wiener for solving such problems using less storage is presented in
Section 14.8.1.
13.7 Low Hamming Weight Product Exponents
Let G be an algebraic group (or algebraic group quotient) over F_p (p small) and let g ∈ G(F_{p^n}) with n > 1. Let π_p be the p-power Frobenius on G, acting on G as g ↦ g^p. Hoffstein and Silverman [289] proposed computing random powers of g efficiently by taking products of low Hamming weight Frobenius expansions.
In particular, for Koblitz elliptic curves (i.e., p = 2) they suggested using three sets, taking S_j for 1 ≤ j ≤ 3 to be the set of Frobenius expansions of length n and weight 7. The baby-step-giant-step algorithm in Section 13.5 applies to this problem, but the running time is not necessarily optimal since #S_1 #S_2 ≠ #S_3. Kim and Cheon [335] generalised the results of Section 13.6 to allow a more balanced time/memory tradeoff. This gives a small improvement to the running time.
Cheon and Kim [133] give a further improvement to the attack, which is similar to
the use of equivalence classes in Pollard rho (see Section 14.4). They noted that the sets S_j in the Hoffstein-Silverman proposal have the property that for every a ∈ S_j there is
13.8 Wagner's Algorithm for the l-sum Problem
This section presents an algorithm due to Wagner [621] (though a special case was discovered earlier by Camion and Patarin), which has a similar form to the baby-step-giant-step
algorithm. This algorithm is not useful for solving the DLP in groups of relevance to
public key cryptography, but it is an example of how a non-generic algorithm can beat
the birthday bound. Further examples of non-generic algorithms that beat the birthday
bound are given in Chapter 15. For reasons of space we do not present all the details.
Definition 13.8.1. Suppose one is given large sets L_j of n-bit strings, for 1 ≤ j ≤ l. The l-sum problem is to find x_j ∈ L_j for 1 ≤ j ≤ l such that

x_1 ⊕ x_2 ⊕ ··· ⊕ x_l = 0.    (13.4)
each x_2 ∈ L_2 test whether there exists x_1 ∈ L_1 such that LSB_m(x_1) = LSB_m(x_2). If the sets L_i are sufficiently random then it is reasonable to suppose that the size of L_{j,j+1} is #L_j #L_{j+1}/2^m ≈ 2^m. To each pair (x_j, x_{j+1}) ∈ L_{j,j+1} we can associate the (n − m)-bit string obtained by removing the m least significant bits of x_j ⊕ x_{j+1}.
The second step is to find (x_1, x_2) ∈ L_{1,2} and (x_3, x_4) ∈ L_{3,4} such that x_1 ⊕ x_2 ⊕ x_3 ⊕ x_4 = 0. This is done by sorting the (n − m)-bit truncated x_1 ⊕ x_2 corresponding to (x_1, x_2) ∈ L_{1,2} and then, for each (x_3, x_4) ∈ L_{3,4}, testing whether the (n − m)-bit truncated x_3 ⊕ x_4 is in the list. Since #L_{1,2}, #L_{3,4} ≈ 2^m and n − m ≤ 2m then, if the sets L_{j,j+1} are sufficiently random, there is a good chance that a solution will exist.
The above arguments lead to the following heuristic result.
Heuristic 13.8.4. Let n ∈ N and m = ⌈n/3⌉. Suppose the sets L_i ⊆ {0,1}^n for 1 ≤ i ≤ 4 are randomly chosen and that #L_j #L_{j+1} ≈ 2^{2m} for j = 1, 3. Then Wagner's algorithm should find a solution (x_1, ..., x_4) to equation (13.4) in the case l = 4. The running time is Õ(2^m) = Õ(2^{n/3}) bit operations and the algorithm requires Õ(2^m) = Õ(2^{n/3}) bits of storage.
The algorithm has cube-root complexity, which beats the usual square-root complexity bound for such problems. The reason is that we are working in the group (F_2^n, ⊕) and the algorithm is not a generic algorithm: it exploits the fact that the group operation and group representation satisfy the property LSB_m(x) = LSB_m(y) if and only if LSB_m(x ⊕ y) = 0.
The algorithm is not expected to succeed in the case when #L_j ≈ 2^{n/4}, since it is finding a solution to equation (13.4) of a very special form (namely, one with LSB_m(x_1 ⊕ x_2) = LSB_m(x_3 ⊕ x_4) = 0).
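The two merge steps for l = 4 can be sketched as follows; the bit length is a toy choice, and since success is only heuristic the final check is conditional.

```python
import random

n = 24
m = n // 3
rng = random.Random(7)
# slightly oversized lists to make a solution heuristically very likely
lists = [[rng.getrandbits(n) for _ in range(2 ** (m + 1))] for _ in range(4)]

def low(x):                       # the m least significant bits
    return x & ((1 << m) - 1)

def merge(La, Lb):
    # first step: keep pairs whose XOR vanishes on the low m bits
    by_low = {}
    for x in La:
        by_low.setdefault(low(x), []).append(x)
    return [(x, y) for y in Lb for x in by_low.get(low(y), [])]

L12 = merge(lists[0], lists[1])
L34 = merge(lists[2], lists[3])

# second step: match the (n - m)-bit truncations of the pairwise XORs
top = {(x1 ^ x2) >> m: (x1, x2) for (x1, x2) in L12}
solution = next(((top[(x3 ^ x4) >> m] + (x3, x4))
                 for (x3, x4) in L34 if ((x3 ^ x4) >> m) in top), None)

if solution is not None:          # heuristically very likely with these sizes
    x1, x2, x3, x4 = solution
    assert x1 ^ x2 ^ x3 ^ x4 == 0
```

Since both pairwise XORs already vanish on the low m bits, matching their top n − m bits forces the full 4-way XOR to zero.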
Exercise 13.8.5. Generalise this algorithm to the case l = 2^k. Show that the algorithm is heuristically expected to require time and space Õ(l·2^{n/(1+k)}). What is the minimum size for the L_j (assuming they are all of equal size)?
The 4-sum problem can be put into a more general framework: Let S, S′ and S″ be sets such that #S = N, fix an element 0 ∈ S″, and let f_1, f_2 : S × S → S′ and f : S′ × S′ → S″ be functions. Let L_1, L_2, L_3, L_4 ⊆ S be randomly chosen subsets of size #L_i ≈ N^{1/3} and suppose one wants to find x_j ∈ L_j for 1 ≤ j ≤ 4 such that

f(f_1(x_1, x_2), f_2(x_3, x_4)) = 0.
Wagner's algorithm can be applied to solve this problem if there is a distinguished set D ⊆ S′ such that the following five conditions hold:

1. #D ≈ N^{2/3}.

2. Pr(f(y_1, y_2) = 0 : y_1, y_2 ∈ D) ≈ N^{−2/3}.

3. Pr(f_1(x_1, x_2) ∈ D : x_1 ∈ L_1, x_2 ∈ L_2) ≈ Pr(f_2(x_3, x_4) ∈ D : x_3 ∈ L_3, x_4 ∈ L_4) ≈ N^{−1/3}.

4. For j = 1, 2 one can determine, in Õ(N^{1/3}) bit operations, the lists

L_{J,J+1} = {(x_J, x_{J+1}) ∈ L_J × L_{J+1} : f_j(x_J, x_{J+1}) ∈ D}

where J = 2j − 1.

5. Given L_{1,2} and L_{3,4} as above one can determine, in Õ(N^{1/3}) bit operations, the set

{((x_1, x_2), (x_3, x_4)) ∈ L_{1,2} × L_{3,4} : f(f_1(x_1, x_2), f_2(x_3, x_4)) = 0}.
284
Exercise 13.8.6. Show that the original Wagner algorithm for S = S′ = S″ = {0,1}^n fits this formulation. What is the set D?

Exercise 13.8.7. Describe Wagner's algorithm in the more general formulation.
Exercise 13.8.8. Let S = S′ = S″ be the additive group (Z/NZ, +) of integers modulo N. Let L_1, L_2, L_3, L_4 ⊆ Z/NZ be such that #L_i ≈ N^{1/3}. Let f_1(x_1, x_2) = f_2(x_1, x_2) = f(x_1, x_2) = x_1 + x_2 (mod N). Let D = {y ∈ Z : −N^{2/3}/2 ≤ y ≤ N^{2/3}/2}. Show that the above five properties hold in this setting. Can you think of any better method to solve the problem in this setting?
Exercise 13.8.9. Let S ⊆ Z and S′ = S″ = F_p. Let (g_1, g_2, g_3, g_4, h) be an instance of the representation problem in F_p^*. Consider the functions

f_1(x_1, x_2) = g_1^{x_1} g_2^{x_2} (mod p),  f_2(x_3, x_4) = h g_3^{−x_3} g_4^{−x_4} (mod p)

and f(y_1, y_2) = y_1 − y_2 (mod p). Finding a solution to f(f_1(x_1, x_2), f_2(x_3, x_4)) = 0 solves the representation problem.
Let m = ⌊log_2(p)/3⌋ and define LSB_m(y) for y ∈ F_p by representing y as an integer in the range 0 ≤ y < p and outputting the m least significant bits. Let D = {y ∈ F_p : LSB_m(y) = 0}. Explain why property 4 of the above list does not seem to hold for this example.
Chapter 14

Factoring and Discrete Logarithms Using Pseudorandom Walks

14.1 Birthday Paradox
The algorithms in this chapter rely on results in probability theory. The first tool we need
is the so-called birthday paradox. This name comes from the following application,
¹ Pollard's paper [484] contains the remark "We are not aware of any particular need for such index calculations" (i.e., computing discrete logarithms), even though [484] cites the paper of Diffie and Hellman. Pollard worked on the topic before hearing of the cryptographic applications. Hence Pollard's work is an excellent example of research pursued for its intrinsic interest, rather than motivated by practical applications.
which surprises most people: among a set of 23 or more randomly chosen people, the
probability that two of them share a birthday is greater than 0.5 (see Example 14.1.4).
Theorem 14.1.1. Let S be a set of N elements. If elements are sampled uniformly at random from S then the expected number of samples to be taken before some element is sampled twice is less than √(πN/2) + 2.

The element that is sampled twice is variously known as a repeat, match or collision. For the rest of the chapter, we will ignore the +2 and say that the expected number of samples is √(πN/2).
Proof: Let X be the random variable giving the number of elements selected from S (uniformly at random) before some element is selected twice. After l distinct elements have been selected, the probability that the next element selected is also distinct from the previous ones is (1 − l/N). Hence the probability Pr(X > l) is given by

p_{N,l} = 1·(1 − 1/N)(1 − 2/N) ··· (1 − (l − 1)/N).

Note that p_{N,l} = 0 when l > N. We now use the standard fact that 1 − x ≤ e^{−x} for x ≥ 0. Hence,

p_{N,l} ≤ e^{−1/N} e^{−2/N} ··· e^{−(l−1)/N} = e^{−Σ_{j=0}^{l−1} j/N} = e^{−(l−1)l/2N} ≤ e^{−(l−1)²/2N}.

The expected value of X is

E(X) = Σ_{l=1}^{∞} l Pr(X = l) = Σ_{l=0}^{∞} Pr(X > l) = 1 + Σ_{l=1}^{∞} p_{N,l} ≤ 1 + Σ_{l=1}^{∞} e^{−(l−1)²/2N}.

We estimate this sum using the integral

∫_0^∞ e^{−x²/2N} dx.

Since e^{−x²/2N} is monotonically decreasing and takes values in [0,1], the difference between the value of the sum and the value of the integral is at most 1. Making the change of variable u = x/√(2N) gives

√(2N) ∫_0^∞ e^{−u²} du.

A standard result in analysis (see Section 11.7 of [337] or Section 4.4 of [632]) is that this integral is √π/2. Hence, the expected value of X is at most √(πN/2) + 2. □
The proof only gives an upper bound on the probability of a collision after l trials. A lower bound of e^{−l²/2N − l³/6N²} for N ≥ 1000 and 0 ≤ l ≤ √(2N log(N)) is given in Wiener [627]; it is also shown that the expected value of the number of trials is > √(πN/2) − 0.4. A more precise analysis of the birthday paradox is given in Example II.10 of Flajolet and Sedgewick [204] and Exercise 3.1.12 of Knuth [340]. The expected number of samples is √(πN/2) + 2/3 + O(1/√N).
We remind the reader of the meaning of expected value. Suppose the experiment of sampling elements of a set S of size N until a collision is found is repeated t times, and each time we count the number l of elements sampled. Then the average of l over all t trials tends to √(πN/2) as t goes to infinity.
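This convergence is easy to observe numerically; the following Monte Carlo sketch compares the empirical mean with the refined estimate √(πN/2) + 2/3 mentioned above (the set size and trial count are illustrative choices).

```python
import math
import random

def samples_until_collision(N, rng):
    # sample uniformly from {0, ..., N-1} until some element repeats
    seen = set()
    while True:
        x = rng.randrange(N)
        if x in seen:
            return len(seen) + 1
        seen.add(x)

N = 365
rng = random.Random(1)
trials = 20000
mean = sum(samples_until_collision(N, rng) for _ in range(trials)) / trials

predicted = math.sqrt(math.pi * N / 2) + 2 / 3    # about 24.6 for N = 365
assert abs(mean - predicted) < 0.5
```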
Exercise 14.1.2. Show that the number of elements that need to be selected from S to get a collision with probability 1/2 is ≈ √(2 log(2) N) ≈ 1.177√N.
Exercise 14.1.3. One may be interested in the number of samples required when one
is particularly unlucky. Determine the number of trials so that with probability 0.99 one
has a collision. Repeat the exercise for probability 0.999.
The name birthday paradox arises from the following application of the result.

Example 14.1.4. In a room containing 23 or more randomly chosen people, the probability is greater than 0.5 that two people have the same birthday. This follows from √(2 log(2)·365) ≈ 22.49. Note also that √(π·365/2) = 23.944....

The expected number of samples until k > 1 collisions are found is approximately √(2kN). A detailed proof of this fact is given by Kuhn and Struik as Theorem 1 of [352].
14.2 The Pollard Rho Method
Let g be a group element of prime order r and let G = ⟨g⟩. The discrete logarithm problem (DLP) is: given h ∈ G, to find a, if it exists, such that h = g^a. In this section we assume (as is usually the case in applications) that one has already determined that h ∈ ⟨g⟩.
The starting point of the rho algorithm is the observation that if one can find a_i, b_i, a_j, b_j ∈ Z/rZ such that

g^{a_i} h^{b_i} = g^{a_j} h^{b_j}    (14.1)

and b_i ≢ b_j (mod r), then one can solve the DLP as

h = g^{(a_i − a_j)(b_j − b_i)^{−1} (mod r)}.
x_i = x_j occurs then x_{i+t} = x_{j+t} for all t ∈ N. Pollard's original proposal used a cycle-finding method due to Floyd to find a self-collision in the sequence; we present this in Section 14.2.2. A better approach is to use distinguished points to find collisions; we present this in Section 14.2.4.
14.2.1 The Pseudorandom Walk
Pollard simulates a random function from G to itself as follows. The first step is to decompose G into n_S disjoint subsets (usually of roughly equal size) so that G = S_0 ∪ S_1 ∪ ··· ∪ S_{n_S−1}. Traditional textbook presentations use n_S = 3 but, as explained in Section 14.2.5, it is better to take larger values for n_S; typical values in practice are 32, 256 or 2048.
The sets S_i are defined using a selection function S : G → {0, ..., n_S − 1} by S_i = {g ∈ G : S(g) = i}. For example, in any computer implementation of G one represents an element g ∈ G as a unique² binary string b(g), and interpreting b(g) as an integer one could define S(g) = b(g) (mod n_S) (taking n_S to be a power of 2 makes this computation especially easy). To obtain different choices for S one could apply an F_2-linear map L to the sequence of bits b(g), so that S(g) = L(b(g)) (mod n_S). These simple methods can
be a poor choice in practice, as they are not sufficiently random. Some other ways to
determine the partition are suggested in Section 2.3 of Teske [601] and Bai and Brent [24].
The strongest choice is to apply a hash function or randomness extractor to b(g), though
this may lead to an undesirable computational overhead.
Definition 14.2.1. The rho walks are defined as follows. Precompute g_j = g^{u_j} h^{v_j} for 0 ≤ j ≤ n_S − 1, where the 0 ≤ u_j, v_j < r are chosen uniformly at random. Set x_1 = g. The original rho walk is

x_{i+1} = f(x_i) = { x_i^2    if S(x_i) = 0,
                   { x_i g_j  if S(x_i) = j, j ∈ {1, ..., n_S − 1}.    (14.2)

The additive rho walk is

x_{i+1} = f(x_i) = x_i g_{S(x_i)}.    (14.3)
An important feature of the walks is that each step requires only one group operation.
Once the selection function S and the values uj and vj are chosen, the walk is deterministic. Even though these values may be chosen uniformly at random, the function
f itself is not a random function as it has a compact description. Hence, the rho walks
can only be described as pseudorandom. To analyse the algorithm we will consider the
expectation of the running time over different choices for the pseudorandom walk. Many
authors consider the expectation of the running time over all problem instances and random choices of the pseudorandom walk; they therefore write expected running time for
what we are calling average-case expected running time.
It is necessary to keep track of the decomposition

x_i = g^{a_i} h^{b_i}.

The values a_i, b_i ∈ Z/rZ are obtained by setting a_1 = 1, b_1 = 0 and updating (for the original rho walk)

a_{i+1} = { 2a_i (mod r)             if S(x_i) = 0,
          { a_i + u_{S(x_i)} (mod r) if S(x_i) > 0,

and

b_{i+1} = { 2b_i (mod r)             if S(x_i) = 0,
          { b_i + v_{S(x_i)} (mod r) if S(x_i) > 0.    (14.4)
2 One often uses projective coordinates to speed up elliptic curve arithmetic, so it is natural to use
projective coordinates when implementing these algorithms. But to define the pseudorandom walk one
needs a unique representation for points, so projective coordinates are not appropriate. See Remark 13.3.2.
One can write (x_{i+1}, a_{i+1}, b_{i+1}) = walk(x_i, a_i, b_i) for the random walk function. But it is important to remember that x_{i+1} only depends on x_i and not on the whole triple (x_i, a_i, b_i).
Exercise 14.2.2. Give the analogue of equation (14.4) for the additive walk.
14.2.2 Pollard Rho Using Floyd Cycle Finding

Since G is finite, the walk eventually repeats and then cycles. Write l_t for the tail length and l_h for the cycle length of the walk, i.e., the minimal integers l_t, l_h ≥ 1 such that

x_{l_t + l_h} = x_{l_t}.    (14.5)
Floyd's cycle finding algorithm³ is to compare x_i and x_{2i}. Lemma 14.2.3 shows that this will find a collision in at most l_t + l_h steps. The crucial advantage of comparing x_{2i} and x_i is that it only requires storing two group elements. The rho algorithm with Floyd cycle finding is given in Algorithm 16.
Algorithm 16 The rho algorithm
Input: g, h ∈ G
Output: a such that h = g^a, or ⊥
1: Choose randomly the function walk as explained above
2: x_1 = g, a_1 = 1, b_1 = 0
3: (x_2, a_2, b_2) = walk(x_1, a_1, b_1)
4: while (x_1 ≠ x_2) do
5:    (x_1, a_1, b_1) = walk(x_1, a_1, b_1)
6:    (x_2, a_2, b_2) = walk(walk(x_2, a_2, b_2))
7: end while
8: if b_1 ≡ b_2 (mod r) then
9:    return ⊥
10: else
11:   return (a_2 − a_1)(b_1 − b_2)^{−1} (mod r)
12: end if
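A runnable sketch of Algorithm 16, using the group of Example 14.2.6 below (p = 809, g = 89 of order r = 101, h = 799) but with a randomly drawn walk with n_S = 8 rather than the walk of that example.

```python
import random

p, r = 809, 101        # g = 89 has order 101 in (Z/809Z)^*
g, h = 89, 799
nS = 8
rng = random.Random(3)

def rho():
    while True:
        # draw a fresh random walk: g_j = g^{u_j} h^{v_j}
        u = [rng.randrange(r) for _ in range(nS)]
        v = [rng.randrange(r) for _ in range(nS)]
        gj = [(pow(g, u[j], p) * pow(h, v[j], p)) % p for j in range(nS)]

        def walk(state):
            x, a, b = state
            j = x % nS                      # selection function S
            if j == 0:
                return (x * x) % p, (2 * a) % r, (2 * b) % r
            return (x * gj[j]) % p, (a + u[j]) % r, (b + v[j]) % r

        s1 = (g, 1, 0)                      # (x_1, a_1, b_1)
        s2 = walk(s1)
        while s1[0] != s2[0]:               # Floyd: compare x_i and x_2i
            s1 = walk(s1)
            s2 = walk(walk(s2))
        if (s1[2] - s2[2]) % r != 0:        # is b_1 - b_2 invertible mod r?
            return (s2[1] - s1[1]) * pow(s1[2] - s2[2], -1, r) % r
        # otherwise retry with a freshly chosen walk

a = rho()
assert pow(g, a, p) == h
```

Instead of returning ⊥ on a non-invertible b_1 − b_2, this sketch simply redraws the walk and tries again.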
³ Apparently this algorithm first appears in print in Knuth [340], but is credited there to Floyd.
Lemma 14.2.3. Let the notation be as above. Then x_{2i} = x_i if and only if l_h | i and i ≥ l_t. Further, there is some l_t ≤ i < l_t + l_h such that x_{2i} = x_i.

Proof: If x_i = x_j then we must have l_h | (i − j). Hence the first statement of the Lemma is clear. The second statement follows since there is some multiple of l_h between l_t and l_t + l_h. □
Exercise 14.2.4. Let p = 347, r = 173, g = 3, h = 11 ∈ F_p^*. Let n_S = 3. Determine l_t and l_h for the values (u_1, v_1) = (1, 1), (u_2, v_2) = (13, 17). What is the smallest value of i for which x_{2i} = x_i?

Exercise 14.2.5. Repeat Exercise 14.2.4 for g = 11, h = 3, (u_1, v_1) = (4, 7) and (u_2, v_2) = (23, 5).
The smallest index i such that x_{2i} = x_i is called the epact. The expected value of the epact is conjectured to be approximately 0.823√(πr/2); see Heuristic 14.2.9.
Example 14.2.6. Let p = 809 and consider g = 89, which has prime order 101 in F_p^*. Let h = 799, which lies in the subgroup generated by g.
Let n_S = 4. To define S(g) write g in the range 1 ≤ g < 809, represent this integer in its usual binary expansion and then reduce modulo 4. Choose (u_1, v_1) = (37, 34), (u_2, v_2) = (71, 69), (u_3, v_3) = (76, 18), so that g_1 = 343, g_2 = 676, g_3 = 627. One computes the table of values (x_i, a_i, b_i) as follows:

  i   x_i   a_i   b_i   S(x_i)
  1    89     1     0      1
  2   594    38    34      2
  3   280     8     2      0
  4   736    16     4      0
  5   475    32     8      3
  6   113     7    26      1
  7   736    44    60      0

It follows that l_t = 4 and l_h = 3, and so the first collision detected by Floyd's method is x_6 = x_{12}. We leave it as an exercise to verify that the discrete logarithm in this case is 50.
Exercise 14.2.7. Let p = 569 and let g = 262 and h = 5, which can be checked to have order 71 modulo p. Use the rho algorithm to compute the discrete logarithm of h to the base g modulo p.

Exercise 14.2.8. One can simplify Definition 14.2.1 and equation (14.4) by replacing g_j by either g^{u_j} or h^{v_j} (independently for each j). Show that this saves one modular addition in each iteration of the algorithm. Explain why this optimisation should not affect the success of the algorithm, as long as the walk uses all values for S(x_i) with roughly equal probability.
Algorithm 16 always terminates, but there are several things that can go wrong:

- The value (b_1 − b_2) may not be invertible modulo r. Hence, we can only expect to prove that the algorithm succeeds with a certain probability (extremely close to 1).

- The cycle may be very long (as big as r), in which case the algorithm is slower than brute force search. Hence, we can only expect to prove an expected running time for the algorithm. We recall that the expected running time in this case is the average, over all choices for the function walk, of the worst-case running time of the algorithm over all problem instances.

Note that the algorithm always halts, but it may fail to output a solution to the DLP. Hence, this is a Monte Carlo algorithm.
It is an open problem to give a rigorous running time analysis for the rho algorithm. Instead it is traditional to make the heuristic assumption that the pseudorandom walk defined above behaves sufficiently closely to a random walk. The rest of this section is devoted to showing that the heuristic running time of the rho algorithm with Floyd cycle finding is (3.093 + o(1))√r group operations (asymptotic as r → ∞).
Before stating a precise heuristic we determine an approximation to the expected value of the epact in the case of a truly random walk.⁴
Heuristic 14.2.9. Let x_i be a sequence of elements of a group G of order r obtained as above by iterating a random function f : G → G. Then the expected value of the epact (i.e., the smallest positive integer i such that x_{2i} = x_i) is approximately (ζ(2)/2)√(πr/2) ≈ 0.823√(πr/2), where ζ(2) = π²/6 is the value of the Riemann zeta function at 2.
Argument: Fix a specific sequence x_i and let l be the length of the rho, so that x_{l+1} lies in {x_1, x_2, ..., x_l}. Since x_{l+1} can be any one of the x_i, the cycle length l_h can be any value 1 ≤ l_h ≤ l and each possibility happens with probability 1/l.
The epact is the smallest multiple of l_h which is bigger than l_t = l − l_h. Hence, if l/2 ≤ l_h ≤ l then the epact is l_h, and if l/3 ≤ l_h < l/2 then the epact is 2l_h. In general, if l/(k + 1) ≤ l_h < l/k then the epact is kl_h. The largest possible value of the epact is l − 1, which occurs when l_h = 1.
The expected value of the epact when the rho has length l is therefore
El =
X
l
X
klh Pl (k, lh )
k=1 lh =1
where Pl (k, lh ) is the probability that klh is the epact. By the above discussion, P (k, lh ) =
1/l if l/(k + 1) lh < l/k or (k, lh ) = (1, l) and zero otherwise. Hence
El =
1
l
l1
X
k=1
1
2
El
lh
l/(k+1)lh <l/k
or (k,lh )=(1,l)
(l/k)2 (l/(k + 1))2 gives
l
2
X
k k12
1
(k+1)2
k=1
k=1
4I
and
k=1
292
Hence El l/2(1 + (2) 1). It is well-known that (2) 1.645. Finally, write Pr(e) for
the probability the epact is e, Pr(l) for the probability the rho length is l, and Pr(e | l)
for the conditional probability that the epact is e given that the rho has length l. The
expectation of e is then
E(e) =
e Pr(e) =
e=1
=
=
l=1
X
l=1
X
X
e
Pr(e | l) Pr(l)
e=1
Pr(l)
X
e=1
l=1
e Pr(e | l)
Pr(l)El ((2)/2)E(l)
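The constant can be checked by simulation: iterate a random function with Floyd's method and average the observed epact (the set size, trial count and tolerance below are illustrative choices; the tolerance is loose since the estimate is asymptotic).

```python
import math
import random

def epact(N, rng):
    memo = {}                       # lazily sampled random function on {0..N-1}
    def f(x):
        if x not in memo:
            memo[x] = rng.randrange(N)
        return memo[x]
    slow = fast = 0
    i = 0
    while True:                     # Floyd: the least i with x_{2i} = x_i
        slow = f(slow)
        fast = f(f(fast))
        i += 1
        if slow == fast:
            return i

N = 4096
rng = random.Random(5)
trials = 3000
mean = sum(epact(N, rng) for _ in range(trials)) / trials

predicted = (math.pi ** 2 / 12) * math.sqrt(math.pi * N / 2)   # zeta(2)/2 factor
assert abs(mean - predicted) / predicted < 0.15
```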
Theorem. Let the notation be as above and assume Heuristic 14.2.10. Then the rho algorithm with Floyd cycle finding has expected running time of (3.093 + o(1))√r group operations. The probability that the algorithm fails is negligible.

Proof: The number of iterations of the main loop in Algorithm 16 is the epact. By Heuristic 14.2.10 the expected value of the epact is (0.823 + o(1))√(πr/2).
Algorithm 16 performs three calls to the function walk in each iteration. Each call to walk results in one group operation and two additions modulo r (we ignore these additions as they cost significantly less than a group operation). Hence the expected number of group operations is (3 · 0.823 + o(1))√(πr/2) = (3.093 + o(1))√r.
14.2.3 Other Cycle Finding Methods
Floyd cycle finding is not a very efficient way to find cycles. Though any cycle finding method requires computing at least l_t + l_h group operations, Floyd's method needs on average 2.47(l_t + l_h) group operations (2.47 ≈ 3 × 0.823: three group operations are performed for each of the roughly 0.823(l_t + l_h) iterations until the epact). Also, the slower sequence x_i is visiting group elements which have already been computed during the walk of the faster sequence x_{2i}. Brent [98] has given an
improved cycle finding method that still only requires storage for two group elements but which requires fewer group operations. Montgomery has given an improvement to Brent's method in [433].
One can do even better by using more storage, as was shown by Sedgewick, Szymanski and Yao [531], Schnorr and Lenstra [523] (also see Teske [599]) and Nivasch [464]. The rho algorithm using Nivasch cycle finding has the optimal expected running time of √(πr/2) ≈ 1.253√r group operations and is expected to require polynomial storage.
Finally, a very efficient way to find cycles is to use distinguished points. More importantly, distinguished points allow us to think about the rho method in a different way
and this leads to a version of the algorithm that can be parallelised. We discuss this in
the next section. Hence, in practice one always uses distinguished points.
14.2.4 Distinguished Points
The idea of using distinguished points in search problems apparently goes back to Rivest.
The first application of this idea to computing discrete logarithms is by van Oorschot and
Wiener [470].
Definition 14.2.11. An element g ∈ G is a distinguished point if its binary representation b(g) satisfies some easily checked property. Denote by D ⊆ G the set of distinguished points. The probability #D/#G that a uniformly chosen group element is a distinguished point is denoted θ.
A typical example is the following.
Example 14.2.12. Let E be an elliptic curve over F_p. A point P ∈ E(F_p) that is not the point at infinity is represented by an x-coordinate 0 ≤ x_P < p and a y-coordinate 0 ≤ y_P < p. Let H be a hash function, whose output is interpreted as being in Z_{≥0}. Fix an integer n_D. Define D to be the points P ∈ E(F_p) such that the n_D least significant bits of H(x_P) are zero. Note that O_E ∉ D. In other words

    D = {P = (x_P, y_P) ∈ E(F_p) : H(x_P) ≡ 0 (mod 2^{n_D}) where 0 ≤ x_P < p}.

Then θ ≈ 1/2^{n_D}.
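A predicate in the spirit of Example 14.2.12 can be sketched as follows; SHA-256 is an arbitrary stand-in for the unspecified hash function H, and n_D = 4 (so θ ≈ 1/16) is an illustrative choice.

```python
import hashlib

def is_distinguished(x, n_D=4):
    """True iff the n_D least significant bits of H(x) are zero,
    with SHA-256 standing in for the hash function H of Example 14.2.12."""
    digest = hashlib.sha256(x.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << n_D) == 0
```

Over many x-coordinates the fraction of distinguished values should be close to θ = 1/2⁴.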
The rho algorithm with distinguished points is as follows. First, choose integers 0 ≤ a_1, b_1 < r uniformly and independently at random, compute the group element x_1 = g^{a_1} h^{b_1} and run the usual deterministic pseudorandom walk until a distinguished point x_n = g^{a_n} h^{b_n} is found. Store (x_n, a_n, b_n) in some easily searched data structure (searchable on x_n). Then choose a fresh random group element x_1 = g^{a_1} h^{b_1} and repeat. Eventually two walks will visit the same group element, in which case their paths will continue to the same distinguished point. Once a distinguished group element is found twice then the DLP can be solved with high probability.
Exercise 14.2.13. Write down pseudocode for this algorithm.
We stress the most significant difference between this method and the method of the previous section: the previous method had one long walk with a tail and a cycle, whereas the new method has many short walks. Note that this algorithm does not require self-collisions in the walk and so there is no 'rho shape' anymore; the word 'rho' in the name of the algorithm is therefore a historical artifact, not an intuition about how the algorithm works.
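The method just described can be sketched concretely in a toy group (cf. Exercise 14.2.13). The following works in the order-r subgroup of (Z/pZ)*; the walk with n_S random multipliers, the distinguished-point test on the low bits of the residue, and the walk-length cap are all illustrative choices, not the book's Algorithm 16.

```python
import random

def rho_dp(p, r, g, h, n_S=8, theta_bits=3, seed=1):
    """Sketch: solve h = g^a in the order-r subgroup <g> of (Z/pZ)* using
    the rho method with distinguished points (x distinguished iff its low
    theta_bits bits are zero, so theta is roughly 1/2^theta_bits)."""
    rng = random.Random(seed)
    # precomputed steps g_j = g^{u_j} h^{v_j}
    steps = []
    for _ in range(n_S):
        u, v = rng.randrange(r), rng.randrange(r)
        steps.append((pow(g, u, p) * pow(h, v, p) % p, u, v))
    table = {}  # distinguished point -> (a, b)
    while True:
        a, b = rng.randrange(r), rng.randrange(r)
        x = pow(g, a, p) * pow(h, b, p) % p
        for _ in range(100 * r):  # cap the walk length
            if x % (1 << theta_bits) == 0:  # distinguished point
                if x in table:
                    a2, b2 = table[x]
                    if (b - b2) % r != 0:
                        # g^a h^b = g^a2 h^b2  =>  log_g(h) = (a2-a)/(b-b2) mod r
                        return (a2 - a) * pow(b - b2, -1, r) % r
                else:
                    table[x] = (a, b)
                break  # start a fresh random walk
            s, u, v = steps[x % n_S]  # selection function S(x) = x mod n_S
            x = x * s % p
            a = (a + u) % r
            b = (b + v) % r
```

For example, with p = 1019, r = 509 and g = 4 (which has order 509 modulo 1019), rho_dp recovers the exponent of h = g^123.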
Note that, since the group is finite, collisions must eventually occur, and so the algorithm halts. But the algorithm may fail to solve the DLP (with low probability). Hence,
this is a Monte Carlo algorithm.
In the analysis we assume that we are sampling group elements (we sometimes call
them points) uniformly and independently at random. It is important to determine the
expected number of steps before landing on a distinguished point.
Lemma 14.2.14. Let θ be the probability that a randomly chosen group element is a distinguished point. Then:

1. The probability that one chooses ⌈1/θ⌉ group elements, none of which are distinguished, is approximately e^{−1} when 1/θ is large.

2. The expected number of group elements to choose before getting a distinguished point is 1/θ.

3. If one has already chosen i group elements, none of which are distinguished, then the expected number of group elements to further choose before getting a distinguished point is 1/θ.

Proof: The probability that i chosen group elements are not distinguished is (1 − θ)^i. So the probability of choosing ⌈1/θ⌉ points, none of which are distinguished, is

    (1 − θ)^{1/θ} ≈ (e^{−θ})^{1/θ} = e^{−1}.
The second statement is the standard formula for the expected value of a geometric random variable; see Example A.14.1.
For the final statement, suppose one has already sampled i points without finding a distinguished point. Since the trials are independent, the probability of choosing a further j points which are not distinguished remains (1 − θ)^j. Hence the expected number of extra points to be chosen is still 1/θ.
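The estimates in Lemma 14.2.14 are easy to check numerically; in this sketch θ = 1/16 and the trial count are arbitrary choices.

```python
import math, random

theta = 1.0 / 16

# Part 1: the chance of ceil(1/theta) consecutive non-distinguished samples
p_miss = (1 - theta) ** round(1 / theta)
print(p_miss, math.exp(-1))  # approx 0.356 vs 0.368

# Part 2: the mean number of samples until a distinguished point (geometric)
rng = random.Random(42)
def draws_until_distinguished():
    n = 1
    while rng.random() >= theta:  # each sample is distinguished with probability theta
        n += 1
    return n

trials = 100000
mean = sum(draws_until_distinguished() for _ in range(trials)) / trials
print(mean)  # close to 1/theta = 16
```

Both quantities match the lemma: the miss probability is close to e^{−1} ≈ 0.368 and the empirical mean is close to 1/θ = 16.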
We now make the following assumption. We believe this is reasonable when r is sufficiently large, n_S > log(r), distinguished points are sufficiently common and specified using a good hash function (and hence D is well distributed), θ > log(r)/√r, and when the function walk is chosen at random.
Heuristic 14.2.15.

1. Walks reach a distinguished point in significantly fewer than √r steps (in other words, there are no cycles in the walks and walks are not excessively longer than 1/θ).

2. The expected number of group elements sampled before a collision is √(πr/2).
Theorem 14.2.16. Let the notation be as above and assume Heuristic 14.2.15. Then the rho algorithm with distinguished points has expected running time of (√(π/2) + o(1))√r ≈ (1.253 + o(1))√r group operations. The probability the algorithm fails is negligible.
Proof: Heuristic 14.2.15 states there are no cycles or wasted walks (in the sense that their steps do not contribute to potential collisions). Hence, before the first collision, after N steps of the algorithm we have visited N group elements. By Heuristic 14.2.15, the expected number of group elements to be sampled before the first collision is √(πr/2). The collision is not detected until both walks hit a distinguished point, which adds a further 2/θ to the number of steps. Hence, the total number of steps (calls to the function walk) is expected to be √(πr/2) + 2/θ, which is (√(π/2) + o(1))√r since θ > log(r)/√r.
Exercise 14.2.17. Show that if θ = log(r)/√r then the expected storage of the rho algorithm, assuming it takes O(√r) steps, is O(log(r)) group elements (which is typically O(log(r)²) bits).
Exercise 14.2.18. The algorithm requires storing a triple (x_n, a_n, b_n) for each distinguished point. Give some strategies to reduce the number of bits that need to be stored.
Exercise 14.2.19. Let G = ⟨g_1, g_2⟩ be a group of order r² and exponent r. Design a rho algorithm that, on input h ∈ G, outputs (a_1, a_2) such that h = g_1^{a_1} g_2^{a_2}. Determine the complexity of this algorithm.
Exercise 14.2.20. Show that the Pollard rho algorithm with distinguished points has
better average-case running time than the baby-step-giant-step algorithm (see Exercises 13.3.3 and 13.3.4).
Exercise 14.2.21. Explain why taking D = G (i.e., all group elements distinguished)
leads to an algorithm that is much slower than the baby-step-giant-step algorithm.
Suppose one is given g, h_1, . . . , h_L (where 1 < L < r^{1/4}) and is asked to find all a_i for 1 ≤ i ≤ L such that h_i = g^{a_i}. Kuhn and Struik [352] propose and analyse a method to solve all L instances of the DLP, using Pollard rho with distinguished points, in roughly √(2rL) group operations. A crucial trick, attributed to Silverman and Stapleton, is that once the i-th DLP is known one can re-write all distinguished points g^a h_i^b in the form g^{a′} (with a′ = a + a_i b (mod r)). As noted by Hitchcock, Montague, Carter and Dawson [286] one must be careful to choose a random walk function that does not depend on the elements h_i (however, the random starting points do depend on the h_i).
Exercise 14.2.22. Write down pseudocode for the Kuhn-Struik algorithm for solving L
instances of the DLP, and explain why the algorithm works.
Section 14.3 explains why the rho algorithm with distinguished points can be easily parallelised. That section also discusses a number of practical issues relating to the use of distinguished points.
Cheon, Hong and Kim [132] sped up the Pollard rho method in F_p^* by using a 'look ahead' strategy; essentially they determine in which partition the next value of the walk lies, without performing a full group operation. A similar idea for elliptic curves has been used by Bos, Kaihara and Kleinjung [89].
14.2.5 Towards a Rigorous Analysis of Pollard Rho

The reason for believing Heuristic 14.2.15 is that experiments with the rho algorithm (see Section 14.4.3) confirm the estimate for the running time.
Since the algorithm is fundamental to an understanding of elliptic curve cryptography
(and torus/trace methods) it is natural to demand a complete and rigorous treatment
of it. Such an analysis is not yet known, but in this section we mention some partial
results on the problem. The methods used to obtain the results are beyond the scope of
this book, so we do not give full details. Note that all existing results are in an idealised
model where the selection function S is a random function.
We stress that, in practice, the algorithm behaves as the heuristics predict. Furthermore, from a cryptographic point of view, it is sufficient for the task of determining key
sizes to have a lower bound on the running time of the algorithm. Hence, in practice, the
absence of proved running time is not necessarily a serious issue.
The main results for the original rho walk (with nS = 3) are due to Horwitz and
Venkatesan [293], Miller and Venkatesan [423], and Kim, Montenegro, Peres and Tetali [334,
333]. The basic idea is to define the rho graph, which is a directed graph with vertex
set ⟨g⟩ and an edge from x_1 to x_2 if x_2 is the next step of the walk when at x_1. Fix an integer n. Define the distribution D_n on ⟨g⟩ obtained by choosing x_1 ∈ ⟨g⟩ uniformly at random, running the walk for n steps, and recording the final point in the walk. The crucial property to study is the 'mixing time' which, informally, is the smallest integer n such that D_n is sufficiently close to the uniform distribution. For these results, the squaring operation in the original walk is crucial. We state the main result of Miller and Venkatesan [423] below.
Theorem 14.2.23. (Theorem 1.1 of [423]) Fix ε > 0. Then the rho algorithm using the original rho walk with n_S = 3 finds a collision in O_ε(√r log(r)³) group operations with probability at least 1 − ε, where the probability is taken over all partitions of ⟨g⟩ into three sets S_1, S_2 and S_3. The notation O_ε means that the implicit constant in the O depends on ε.
Kim, Montenegro, Peres and Tetali improved this result in [333] to the desired O_ε(√r) group operations. Note that all these works leave the implied constant in the O_ε unspecified.
Note that the idealised model of S being a random function is not implementable with constant (or even polynomial) storage. Hence, these results cannot be applied to the algorithm presented above, since our selection functions S are very far from uniformly chosen over all possible partitions of the set ⟨g⟩. The number of possible partitions of ⟨g⟩ into three subsets of equal size is (for convenience suppose that 3 | r)

    C(r, r/3) · C(2r/3, r/3)

which, using C(a, b) ≥ (a/b)^b, is at least 6^{r/3}. On the other hand, a selection function parameterised by a key of c log₂(r) bits (e.g., a selection function obtained from a keyed hash function) only leads to r^c different partitions.
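The counting argument can be checked with exact binomials for small r; the value r = 30 below is an arbitrary choice.

```python
import math

def partitions_into_three(r):
    """Number of ways to split a set of size r into three labelled
    subsets of size r/3 each, i.e. C(r, r/3) * C(2r/3, r/3)."""
    assert r % 3 == 0
    return math.comb(r, r // 3) * math.comb(2 * r // 3, r // 3)

r = 30
print(partitions_into_three(r), 6 ** (r // 3))  # the first far exceeds the second
```

The product of binomials equals the multinomial r!/((r/3)!)³ and already dwarfs 6^{r/3} at r = 30.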
Sattler and Schnorr [509] and Teske [600] have considered the additive rho walk. One key feature of their work is to discuss the effect of the number of partitions n_S. Sattler and Schnorr show (subject to a conjecture) that if n_S ≥ 8 then the expected running time for the rho algorithm is c√(πr/2) group operations for an explicit constant c. Teske shows, using results of Hildebrand, that the additive walk should approximate the uniform distribution after fewer than √r steps once n_S ≥ 6. She recommends using the additive walk with n_S ≥ 20 and, when this is done, conjectures that the expected rho length is √(πr/2), as for a truly random walk.
14.3 Distributed Pollard Rho

In this section we explain how the Pollard rho algorithm can be parallelised. Rather than a parallel computing model we consider a distributed computing model. In this model there is a server and N_P ≥ 1 clients (we also refer to the clients as processors). There is no shared storage or direct communication between the clients. Instead, the server can send messages to clients and each client can send messages to the server. In general we prefer to minimise the amount of communication between server and clients.
To solve an instance of the discrete logarithm problem the server will activate a number
of clients, providing each with its own individual initial data. The clients will run the rho
pseudorandom walk and occasionally send data back to the server. Eventually the server
will have collected enough information to solve the problem, in which case it sends all
clients a termination instruction. The rho algorithm with distinguished points can very
naturally be used in this setting.
The best one can expect for any distributed computation is a linear speedup compared with the serial case (since if the overall total work in the distributed case was less than the serial case then this would lead to a faster algorithm in the serial case). In other words, with N_P clients one hopes to solve the problem roughly N_P times faster than with a single client.
14.3.1 The Algorithm and its Heuristic Analysis
All processors perform the same pseudorandom walk (x_{i+1}, a_{i+1}, b_{i+1}) = walk(x_i, a_i, b_i) as in Section 14.2.1, but each processor starts from a different random starting point. Whenever a processor hits a distinguished point then it sends the triple (x_i, a_i, b_i) to the server and re-starts its walk at a new random point (x_0, a_0, b_0). If one processor ever visits a point visited by another processor then the walks from that point agree and both walks end at the same distinguished point. When the server receives two triples (x, a, b) and (x, a′, b′) for the same group element x but with b ≢ b′ (mod r) then it has g^a h^b = g^{a′} h^{b′} and can solve the DLP as in the serial (i.e., non-parallel) case. The server then solves for the discrete logarithm and sends a terminate signal to all processors. Pseudocode for both server and clients is given in Algorithms 17 and 18. By design, if the algorithm halts then the answer is correct.
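The server's side of this protocol can be sketched as a dictionary of reported triples; this illustrative fragment is not the book's Algorithm 17 (whose pseudocode is not reproduced here).

```python
def server_receive(table, r, triple):
    """Sketch of the server processing one reported triple (x, a, b):
    store it, or solve the DLP if x was already reported with a different b.
    Returns the discrete logarithm, or None if no useful collision yet."""
    x, a, b = triple
    if x in table:
        a2, b2 = table[x]
        if (b - b2) % r != 0:
            # g^a h^b = g^a2 h^b2  =>  log_g(h) = (a2 - a)/(b - b2) mod r
            return (a2 - a) * pow(b - b2, -1, r) % r
    else:
        table[x] = (a, b)
    return None
```

Two triples for the same x with distinct b immediately yield the logarithm; all other reports just grow the table.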
We now analyse the performance of this algorithm. To get a clean result we assume that no client ever crashes, that communications between server and client are perfectly reliable, that all clients have the same computational efficiency and are running continuously (in other words, each processor computes the same number of group operations in any given time period).
(There are numerous examples of such distributed computation over the internet. Two notable examples are the Great Internet Mersenne Prime Search (GIMPS) and the Search for Extraterrestrial Intelligence (SETI). One observes that the former search has been more successful than the latter.)
It is appropriate to ignore the computation performed by the server and instead to focus on the number of group operations performed by each client running Algorithm 18. Each execution of the function walk(x, a, b) involves a single group operation. We must also count the number of group operations performed in line 3 of Algorithm 18, though this term is negligible if walks are long on average (i.e., if D is a sufficiently small subset of G).
It is an open problem to give a rigorous analysis of the distributed rho method. Hence, we make the following heuristic assumption. We believe this assumption is reasonable when r is sufficiently large, n_S is sufficiently large, log(r)/√r < θ, the set D of distinguished points is determined by a good hash function, the number N_P of clients is sufficiently small (e.g., N_P < √(πr/2)/log(r); see Exercise 14.3.3), and the function walk is chosen at random.
Heuristic 14.3.1.

1. The expected number of group elements to be sampled before the same element is sampled twice is √(πr/2).

2. Walks reach a distinguished point in significantly fewer than √r/N_P steps (in other words, there are no cycles in the walks and walks are not excessively long). More realistically, one could assume that only a negligible proportion of the walks fall into a cycle before hitting a distinguished point.
Theorem 14.3.2. Let the notation be as above; in particular, let N_P be the (fixed, independent of r) number of clients. Let θ be the probability that a group element is a distinguished point and suppose log(r)/√r < θ. Assume Heuristic 14.3.1 and the above assumptions about the reliability and equal power of the processors hold. Then the expected number of group operations performed by each client of the distributed rho method is (1 + 2θ log(r))√(πr/2)/N_P + 1/θ. This is (√(π/2)/N_P + o(1))√r group operations when θ < 1/log(r)². The storage requirement on the server is θ√(πr/2) + N_P points.

Proof: Heuristic 14.3.1 states that we expect to sample √(πr/2) group elements in total before a collision arises. Since this work is distributed over N_P clients of equal speed, it follows that each client is expected to call the function walk about √(πr/2)/N_P times. The total number of group operations per client is therefore √(πr/2)/N_P plus 2θ log(r)√(πr/2)/N_P for the work of line 3 of Algorithm 18. The server will not detect the collision until the second client hits a distinguished point, which is expected to take 1/θ further steps by the heuristic (part 3 of Lemma 14.2.14). Hence each client needs to run an expected √(πr/2)/N_P + 1/θ steps of the walk.
Exercise 14.2.17 shows that the complexity in the case N_P = 1 can be taken to be (1 + o(1))√(πr/2) group operations with polynomial storage.
Exercise 14.3.3. When distributing the algorithm it is important to ensure that, with very high probability, each processor finds at least one distinguished point in less than its total expected running time. Show that this will be the case if 1/θ ≤ √(πr/2)/(N_P log(r)).

Since each processor only travels a distance of about √(πr/2)/N_P, it follows that we should take θ > N_P/√r. In practice one tends to determine the available storage first (say, c group elements where c > 10^9) and to set θ = c/√(πr/2) so that the total number of distinguished points visited is expected to be c. The results of [529] validate this approach. In particular, it is extremely unlikely that there is a self-collision (and hence a cycle) before hitting a distinguished point.
14.4 Speeding up the Rho Algorithm using Equivalence Classes
Gallant, Lambert and Vanstone [231] and Wiener and Zuccherato [628] showed that one can speed up the rho method in certain cases by defining the pseudorandom walk not on the group ⟨g⟩ but on a set of equivalence classes. This is essentially the same thing as working in an algebraic group quotient instead of the algebraic group.
Suppose there is an equivalence relation ∼ on ⟨g⟩. Denote by x̄ the equivalence class of x ∈ ⟨g⟩. Let N_C be the size of a generic equivalence class. We require the following properties:

1. One can define a unique representative (also denoted x̄) of each equivalence class.

2. Given (x_i, a_i, b_i) such that x_i = g^{a_i} h^{b_i}, one can efficiently compute (x̄_i, ā_i, b̄_i) such that x̄_i = g^{ā_i} h^{b̄_i}.
We give some examples in Section 14.4.1 below.
One can implement the rho algorithm on equivalence classes by defining a pseudorandom walk function walk(x_i, a_i, b_i) as in Definition 14.2.1. More precisely, set x_1 = g, a_1 = 1, b_1 = 0 and define the sequence x_i by (this is the original walk)

    x_{i+1} = f(x̄_i) = x̄_i²      if S(x̄_i) = 0,
    x_{i+1} = f(x̄_i) = x̄_i g_j   if S(x̄_i) = j, j ∈ {1, . . . , n_S − 1},      (14.6)

where the selection function S and the values g_j = g^{u_j} h^{v_j} are as in Definition 14.2.1. When using distinguished points one defines an equivalence class to be distinguished if the unique equivalence class representative has the distinguished property.
There is a very serious problem with cycles that we do not discuss yet; see Section 14.4.2 for the details.
Exercise 14.4.1. Write down the formulae for updating the values ai and bi in the
function walk.
Exercise 14.4.2. Write pseudocode for the distributed rho method on equivalence classes.
the expected running time of the rho algorithm on equivalence classes is

    (√(π/(2N_C)) + o(1)) √r (C_1 + C_2)

bit operations (here C_1 is the cost in bit operations of a step of the walk and C_2 the cost of computing an equivalence class representative). As usual, this becomes (√(π/(2N_C)) + o(1)) (√r/N_P) (C_1 + C_2) bit operations per client when using N_P processors of equal computational power.
Exercise 14.4.4. Prove this theorem.
Theorem 14.4.3 assumes a perfect random walk. For walks defined on n_S partitions of the set of equivalence classes it is shown in Appendix B of [25] (also see Section 2.2 of [91]) that one predicts a slightly improved constant compared with the usual factor c_{n_S} = √(n_S/(n_S − 1)) mentioned at the end of Section 14.2.5.
We mention a potential paradox with this idea. In general, computing a unique equivalence class representative involves listing all elements of the equivalence class, and hence needs O(N_C) bit operations. Hence, naively, the running time is O(√(N_C r/2)) bit operations, which is worse than doing the rho algorithm without equivalence classes. However, in practice one only uses this method when C_2 < C_1, in which case the speedup can be significant.
14.4.1 Examples of Equivalence Classes
We now give some examples of useful equivalence relations on some algebraic groups.
Example 14.4.5. For a group G with efficiently computable inverse (e.g., elliptic curves E(F_q) or algebraic tori T_n with n > 1; see Section 6.3) one can define the equivalence relation x ∼ x^{−1}. We have N_C = 2 (though note that some elements, namely the identity and elements of order 2, are equal to their inverse, so these classes have size 1).
If x_i = g^{a_i} h^{b_i} then clearly x_i^{−1} = g^{−a_i} h^{−b_i}. One defines a unique representative x̄ for the equivalence class by, for example, imposing a lexicographical ordering on the binary representation of the elements in the class.
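For Example 14.4.5 in the toy group (Z/pZ)*, a representative and the matching exponent update can be sketched as follows; taking the smaller residue as the "lexicographically first" element of the class is an illustrative convention.

```python
def class_rep(x, a, b, p, r):
    """Unique representative of the class {x, x^{-1}} in the order-r
    subgroup of (Z/pZ)*, chosen as the smaller residue.  The exponents
    are updated so that the representative is still g^a h^b."""
    x_inv = pow(x, -1, p)
    if x_inv < x:
        # x^{-1} = g^{-a} h^{-b}, so negate both exponents modulo r
        return x_inv, (-a) % r, (-b) % r
    return x, a, b
```

Whichever element of the pair is returned, the relation "representative = g^a h^b" is preserved.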
We can generalise this example as follows.
Example 14.4.6. Let G be an algebraic group over F_q with an automorphism group Aut(G) of size N_C (see examples in Sections 9.4 and 11.3.3). Suppose that for g ∈ G of order r one has ψ(g) ∈ ⟨g⟩ for each ψ ∈ Aut(G). Furthermore, assume that for each ψ ∈ Aut(G) one can efficiently compute the eigenvalue λ_ψ ∈ Z such that ψ(g) = g^{λ_ψ}. Then for x ∈ G one can define x̄ = {ψ(x) : ψ ∈ Aut(G)}.
Again, one defines the unique representative by listing the elements of x̄ as bitstrings and choosing the first one under lexicographical ordering.
Another important class of examples comes from orbits under the Frobenius map.
Example 14.4.7. Let G be an algebraic group defined over F_q but with the group considered over F_{q^d} (for examples see Sections 11.3.2 and 11.3.3). Let π_q be the q-power Frobenius map on G(F_{q^d}). Let g ∈ G(F_{q^d}) and suppose that π_q(g) = g^λ ∈ ⟨g⟩ for some known λ ∈ Z.
Define the equivalence relation on G(F_{q^d}) so that the equivalence class of x ∈ G(F_{q^d}) is the set x̄ = {π_q^i(x) : 0 ≤ i < d}. We assume that, for elements x of interest, x̄ ⊆ ⟨g⟩. Then N_C = d, though there can be elements defined over proper subfields for which the equivalence class is smaller.
If one uses a normal basis for F_{q^d} over F_q then one can efficiently compute the elements π_q^i(x) and select a unique representative of each equivalence class using a lexicographical ordering of binary strings.
Example 14.4.8. For some groups (e.g., Koblitz elliptic curves E/F_2 considered as a group over F_{2^m}; see Exercise 9.10.10) we can combine both equivalence classes above. Let m be prime, #E(F_{2^m}) = hr for some small cofactor h, and P ∈ E(F_{2^m}) of order r. Then π_2(P) ∈ ⟨P⟩ and we define the equivalence class P̄ = {±π_2^i(P) : 0 ≤ i < m} of size 2m. Since m is odd, this class can be considered as the orbit of P under the map −π_2. The distributed rho algorithm on equivalence classes for such curves is expected to require approximately √(π2^m/(4m)) group operations.
14.4.2 Dealing with Cycles
One problem that can arise is walks that fall into a cycle before they reach a distinguished point. We call these 'useless cycles'.
Exercise 14.4.9. Suppose the equivalence relation is such that x ∼ x^{−1}. Fix x_i = x̄_i and let x_{i+1} = x̄_i g_j, where j = S(x̄_i). Suppose that x̄_{i+1} = x_{i+1}^{−1} and that S(x̄_{i+1}) = S(x̄_i). Show that x_{i+2} ∼ x_i and so there is a cycle of order 2. Suppose the equivalence classes generically have size N_C. Show, under the assumptions that the function S is perfectly random and that x̄ is a randomly chosen element of the equivalence class, that the probability that a randomly chosen x_i leads to a cycle of order 2 is 1/(N_C n_S).
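The probability 1/(N_C n_S) in Exercise 14.4.9 can be checked empirically. In this sketch the group is all of (Z/pZ)* with the relation x ∼ x^{−1} (so N_C = 2), the representative is min(x, x^{−1}), and the selection function is a table of random values — all illustrative choices.

```python
import random

def two_cycle_fraction(p, n_S, trials, seed=7):
    """Estimate the probability that a random x_i leads to a cycle of
    order 2 for the relation x ~ x^{-1} in (Z/pZ)*, with representative
    min(x, x^{-1}) and a random selection function S."""
    rng = random.Random(seed)
    S_map = {}
    def S(x):  # a fixed random function G -> {0, ..., n_S - 1}
        if x not in S_map:
            S_map[x] = rng.randrange(n_S)
        return S_map[x]
    steps = [rng.randrange(2, p - 1) for _ in range(n_S)]  # random multipliers g_j
    rep = lambda x: min(x, pow(x, -1, p))
    hits = 0
    for _ in range(trials):
        x0 = rep(rng.randrange(2, p - 1))      # a random representative
        x1 = x0 * steps[S(x0)] % p             # one step of the walk
        r1 = rep(x1)
        x2 = r1 * steps[S(r1)] % p             # next step, from the representative
        if x2 == x0 or x2 == pow(x0, -1, p):   # x2 ~ x0: a 2-cycle
            hits += 1
    return hits / trials
```

With n_S = 10 the observed fraction should be close to 1/(N_C n_S) = 1/20 = 0.05.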
A theoretical discussion of cycles was given in [231] and by Duursma, Gaudry and
Morain [185]. An obvious way to reduce the probability of cycles is to take nS to be
very large compared with the average length 1/ of walks. However, as argued by Bos,
Kleinjung and Lenstra [91], large values for nS can lead to slower algorithms (for example,
due to the fact that the precomputed steps do not all fit in cache memory). Hence, as
Exercise 14.4.9 shows, useless cycles will be regularly encountered in the algorithm. There
are several possible ways to deal with this issue. One approach is to use a look-ahead
technique to avoid falling in 2-cycles. Another approach is to detect small cycles (e.g.,
by storing a fixed number of previous values of the walk or, at regular intervals, using
a cycle-finding algorithm for a small number of steps) and to design a well-defined exit
strategy for short cycles; Gallant, Lambert and Vanstone call this 'collapsing the cycle';
see Section 6 of [231]. To collapse a cycle one must be able to determine a well-defined
element in it; from there one can take a step (different to the steps used in the cycle from
that point) or use squaring to exit the cycle. All these methods require small amounts
of extra computation and storage, though Bernstein, Lange and Schwabe [56] argue that
the additional overhead can be made negligible. We refer to [56, 91] for further discussion
of these issues.
Gallant, Lambert and Vanstone [231] presented a different walk that does not, in general, lead to short cycles. Let G be an algebraic group with an efficiently computable endomorphism ψ of order m (i.e., ψ^m is the identity map). Let g ∈ G of order r be such that ψ(g) = g^λ, so that ψ(x) = x^λ for all x ∈ ⟨g⟩. Define the equivalence classes x̄ = {ψ^j(x) : 0 ≤ j < m}. We define a pseudorandom sequence x_i = g^{a_i} h^{b_i} by using x̄_i to select an endomorphism (1 + ψ^j) and then acting on x_i with this map. More precisely, j is some function of x̄_i (e.g., the function S in Section 14.2.1) and

    x_{i+1} = x_i ψ^j(x_i)

(the above equation looks more plausible when the group operation is written additively: x_{i+1} = x_i + ψ^j(x_i) = (1 + ψ^j)x_i). One can check that the map is well-defined on equivalence classes and that x_{i+1} = g^{a_{i+1}} h^{b_{i+1}} where a_{i+1} = (1 + λ^j)a_i (mod r) and b_{i+1} = (1 + λ^j)b_i (mod r).
We stress that this approach still requires finding a unique representative of each
equivalence class in order to define the steps of the walk in a well-defined way. Hence, one
can still use distinguished points by defining a class to be distinguished if its representative
is distinguished. One suggestion, originally due to Harley, is to use the Hamming weight
of the x-coordinate to derive the selection function.
One drawback of the Gallant, Lambert, Vanstone idea is that there is less flexibility
in the design of the pseudorandom walk.
Exercise 14.4.10. Generalise the Gallant-Lambert-Vanstone walk to use (c + ψ^j) for any c ∈ Z. Why do we prefer to only use c = 1?
Exercise 14.4.11. Show that taking n_S = log(r) means the total overhead from handling cycles is o(√r), while the additional storage (group elements for the random walks) is O(log(r)) group elements.
Exercise 14.4.11 together with Exercise 14.2.17 shows that (as long as computing equivalence class representatives is fast) one can solve the discrete logarithm problem using equivalence classes of generic size N_C in (1 + o(1))√(πr/(2N_C)) group operations and O(log(r)) group elements storage.
14.4.3 Practical Experience with the Distributed Rho Algorithm
Real computations are not as simple as the idealised analysis above: one doesn't know in advance how many clients will volunteer for the computation; not all clients have the same performance or reliability; clients may decide to withdraw from the computation at any time; the communications between client and server may be unreliable; etc. Hence, in
practice one needs to choose the distinguished points to be sufficiently common that even
the weakest client in the computation can hit a distinguished point within a reasonable
time (perhaps after just one or two days). This may mean that the stronger clients are
finding many distinguished points every hour.
The largest discrete logarithm problems solved using the distributed rho method are mainly the Certicom challenge elliptic curve discrete logarithm problems. The current records are for the groups E(F_p) where p ≈ 2^108 + 2^107 (by a team coordinated by Chris Monico in 2002) and where p = (2^128 − 3)/76439 ≈ 2^111 + 2^110 (by Bos, Kaihara and Montgomery in 2009), and for E(F_{2^109}) (again by Monico's team in 2004). None of these computations used the equivalence class {P, −P}.
We briefly summarise the parameters used for these large computations. For the 2002 result the curve E(F_p) has prime order, so r ≈ 2^108 + 2^107. The number of processors was over 10,000 and they used θ = 2^{−29}. The number of distinguished points found was 68228567, which is roughly 1.32 times the expected number θ√(πr/2) of points to be
collected.
collected. Hence, this computation was unlucky in that it ran about 1.3 times longer than
the expected time. The computation ran for about 18 months.
The 2004 result is for a curve over F_{2^109} with group order 2r where r ≈ 2^108. The computation used roughly 2000 processors, θ = 2^{−30}, and the number of distinguished points found was 16531676. This is about 0.79 times the expected number θ√(π2^108/2).
This computation took about 17 months.
The computation by Bos, Kaihara and Montgomery [90] was innovative in that the work was done using a cluster of 200 computer game consoles. The random walk used n_S = 16 and θ = 1/2^24. The total number of group operations performed was 8.5 × 10^16 (which is 1.02 times the expected value) and 5 × 10^9 distinguished points were stored.
Exercise 14.4.12. Verify that the parameters θ used in the above computations satisfy the requirement θ > log(r)/√r stated before Heuristic 14.2.15.
14.5 The Kangaroo Method
This algorithm is designed for the case where the discrete logarithm is known to lie in a short interval. Suppose g ∈ G has order r and that h = g^a where a lies in a short interval b ≤ a < b + w of width w. We assume that the values of b and w are known. Of course, one can solve this problem using the rho algorithm, but if w is much smaller than the order of g then this will not necessarily be optimal.
The kangaroo method was originally proposed by Pollard [484]. Van Oorschot and Wiener [470] greatly improved it by using distinguished points. We present the improved version in this section.
For simplicity, compute h′ = hg^{−b}. Then h′ = g^{a−b} with 0 ≤ a − b < w. Hence, there is no loss of generality by assuming that b = 0. Thus, from now on our problem is: given g, h, w, find a such that h = g^a and 0 ≤ a < w.
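The reduction to b = 0 is a single multiplication; a sketch in the toy group (Z/pZ)*:

```python
def reduce_to_base_interval(h, g, b, p):
    """Replace the instance h = g^a with b <= a < b+w by the equivalent
    instance h' = h*g^{-b} = g^{a-b} with 0 <= a-b < w, in (Z/pZ)*."""
    return h * pow(g, -b, p) % p
```

Solving the reduced instance for a − b and adding b back recovers a.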
As with the rho method, the kangaroo method relies on a deterministic pseudorandom walk. The steps in the walk are pictured as the jumps of the kangaroo, and the group elements visited are the kangaroo's footprints. The idea, as explained by Pollard, is to catch a wild kangaroo using a tame kangaroo. The tame kangaroo is a sequence x_i = g^{a_i} where a_i is known. The wild kangaroo is a sequence y_j = hg^{b_j} where b_j is known. Eventually, a footprint of the tame kangaroo will be the same as a footprint of the wild kangaroo (this is called the 'collision'). After this point, the tame and wild footprints are the same. The tame kangaroo 'lays traps' at regular intervals (i.e., at distinguished points) and, eventually, the wild kangaroo 'falls in one of the traps'. More precisely, at the first distinguished point after the collision, one finds a_i and b_j such that g^{a_i} = hg^{b_j}, and the DLP is solved as h = g^{a_i − b_j}.
There are two main differences between the kangaroo method and the rho algorithm.
9 A collision between two different walks can be drawn in the shape of the letter $\lambda$. Hence Pollard
also suggested this be called the lambda method. However, other algorithms (such as the distributed
rho method) have collisions between different walks, so this naming is ambiguous. The name kangaroo
method emphasises the fact that the jumps are small. Hence, as encouraged by Pollard, we do not use
the name lambda method in this book.
10 Actually, the wild kangaroo can be in front of the tame kangaroo, in which case it is better to think
of each kangaroo trying to catch the other.
Jumps are small. This is natural since we want to stay within (or at least, not
too far outside) the interval.
When a kangaroo lands on a distinguished point one continues the pseudorandom
walk (rather than restarting the walk at a new randomly chosen position).
14.5.1 The Pseudorandom Walk
The pseudorandom walk for the kangaroo method has some significant differences to
the rho walk: steps in the walk correspond to known small increments in the exponent
(in other words, kangaroos make small jumps of known distance in the exponent). We
therefore do not include the squaring operation $x_{i+1} = x_i^2$ (as the jumps would be too
big) or multiplication by $h$ (we would not know the length of the jump in the exponent).
We now describe the walk precisely.
As in Section 14.2.1 we use a function $S : G \to \{0, \ldots, n_S - 1\}$ which partitions $G$
into sets $S_i = \{g \in G : S(g) = i\}$ of roughly similar size.
For $0 \le j < n_S$ choose exponents $1 \le u_j < w$. Define $m = \left(\sum_{j=0}^{n_S-1} u_j\right)/n_S$ to be
the mean step size.
14.5.2 The Kangaroo Algorithm
We need to specify where to start the tame and wild kangaroos, and what the mean
step size should be. The wild kangaroo starts at $y_0 = h = g^a$ with $0 \le a < w$. To
minimise the distance between the tame and wild kangaroos at the start of the algorithm,
we start the tame kangaroo at $x_0 = g^{\lfloor w/2 \rfloor}$, which is the middle of the interval. We take
alternate jumps and store the values $(x_i, a_i)$ and $(y_i, b_i)$ as above (i.e., so that $x_i = g^{a_i}$
and $y_i = hg^{b_i}$). Whenever $x_i$ (respectively, $y_i$) is distinguished we store $(x_i, a_i)$ (resp.,
$(y_i, b_i)$) in an easily searched structure. The storage can be reduced by using the ideas of
Exercise 14.2.18.
When the same distinguished point is visited twice then we have two entries $(x, a)$
and $(x, b)$ in the structure and so either $hg^a = g^b$ or $g^a = hg^b$. The ambiguity is resolved
by seeing which of $a - b$ and $b - a$ lies in the interval (or just testing whether $h = g^{a-b}$).
As we will explain in Section 14.5.3, the optimal choice for the mean step size is
$m = \sqrt{w}/2$.
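Putting the pieces of Sections 14.5.1 and 14.5.2 together gives the following minimal Python sketch of the serial kangaroo method for a multiplicative group modulo a prime. The jump set, partition function and distinguished-point rule (low bits of $x$ equal to zero) are illustrative choices of the sketch.

```python
import hashlib

def kangaroo(g, h, w, p, n_S=8, theta_bits=4):
    """Solve h = g^a (mod p) with 0 <= a < w by the kangaroo method.

    A sketch only: the jump sizes, partition function and the
    distinguished-point rule (low theta_bits bits of x are zero)
    are illustrative choices.
    """
    jumps = [2**j for j in range(n_S)]
    g_jumps = [pow(g, u, p) for u in jumps]

    def S(x):
        return hashlib.sha256(str(x).encode()).digest()[0] % n_S

    def distinguished(x):
        return x % (1 << theta_bits) == 0

    traps = {}  # distinguished element -> (exponent, kind)
    tame = [pow(g, w // 2, p), w // 2]  # starts in the middle: x = g^{w/2}
    wild = [h % p, 0]                   # y = h * g^0
    while True:
        for kang, kind in ((tame, 'tame'), (wild, 'wild')):
            j = S(kang[0])
            kang[0] = (kang[0] * g_jumps[j]) % p
            kang[1] += jumps[j]
            if distinguished(kang[0]):
                if kang[0] in traps and traps[kang[0]][1] != kind:
                    d = traps[kang[0]][0]
                    # tame exponent minus wild exponent gives a
                    a = (d - kang[1]) if kind == 'wild' else (kang[1] - d)
                    if pow(g, a % (p - 1), p) == h % p:
                        return a % (p - 1)
                else:
                    traps[kang[0]] = (kang[1], kind)

p, g = 10007, 5      # toy prime; 5 is assumed to generate (Z/10007Z)^*
h = pow(g, 77, p)
assert kangaroo(g, h, w=200, p=p) == 77
```

Note that, as in the text, a kangaroo continues walking after a distinguished point is stored, so a genuine tame/wild collision is detected at the first distinguished point after the two walks merge.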
Figure 14.1: Kangaroo walk. Tame kangaroo walk pictured above the axis and wild
kangaroo walk pictured below. The dot indicates the first collision.
[Table for Example 14.5.2: the successive values $(x_i, a_i)$ of the tame kangaroo and $(y_j, b_j)$ of the wild kangaroo; both walks reach the distinguished point 162, the tame kangaroo with exponent 34 and the wild kangaroo with exponent 22.]
The collision is detected when the distinguished point 162 is visited twice. The solution
to the discrete logarithm problem is therefore $34 - 22 = 12$.
Exercise 14.5.3. Using the same parameters as Example 14.5.2, solve the DLP for
h = 78.
14.5.3 Heuristic Analysis of the Kangaroo Method
The analysis of the algorithm does not rely on the birthday paradox; instead, the mean
step size is the crucial quantity. We sketch the basic probabilistic argument now. A more
precise analysis is given in Section 14.5.6. The following heuristic assumption seems to be
reasonable when $w$ is sufficiently large, $n_S > \log(w)$, distinguished points are sufficiently
common and specified using a good hash function (and hence are well distributed), $\theta >
\log(w)/\sqrt{w}$, and when the function walk is chosen at random.
Heuristic 14.5.4.
1. Walks reach a distinguished point in significantly fewer than $\sqrt{w}$ steps (in other
words, there are no cycles in the walks and walks are not excessively longer than
$1/\theta$).
2. The footprints of a kangaroo are uniformly distributed in the region over which it
has walked with, on average, one footprint in each interval of length m.
3. The footsteps of tame and wild kangaroos are independent of one another before
the time when the walks collide.
Theorem 14.5.5. Let the notation be as above and assume Heuristic 14.5.4. Then the
kangaroo algorithm with distinguished points has average-case expected running time of
$(2 + o(1))\sqrt{w}$ group operations. The probability the algorithm fails is negligible.
Proof: We don't know whether the discrete logarithm of $h$ is greater or less than $w/2$.
So, rather than speaking of tame and wild kangaroos we will speak of the front and
rear kangaroos. Since one kangaroo starts in the middle of the interval, the distance
between the starting point of the rear kangaroo and the starting point of the front
kangaroo is between 0 and $w/2$ and is, on average, $w/4$. Hence, on average, $w/(4m)$ jumps
are required for the rear kangaroo to pass the starting point of the front kangaroo.
After this point, the rear kangaroo is travelling over a region that has already been
jumped over by the front kangaroo. By our heuristic assumption, the footprints of the
tame kangaroo are uniformly distributed over the region with, on average, one footprint
in each interval of length m. Also, the footprints of the wild kangaroo are independent,
and with one footprint in each interval of length m. The probability, at each step, that
the wild kangaroo does not land on any of the footprints of the tame kangaroo is therefore
heuristically $1 - 1/m$. By exactly the same arguments as Lemma 14.2.14 it follows that
the expected number of jumps until a collision is $m$.
Note that there is a minuscule possibility that the walks never meet (this does not
require working in an infinite group, it can even happen in a finite group if the orbits
of the tame and wild walks are disjoint subsets of the group). If this happens then the
algorithm never halts. Since the walk function is chosen at random, the probability of
this eventuality is negligible. On the other hand, if the algorithm halts then its result is
correct. Hence, this is a Las Vegas algorithm.
The overall number of jumps made by the rear kangaroo until the first collision is
therefore, on average, $w/(4m) + m$. One can easily check that this is minimised by taking
$m = \sqrt{w}/2$. The kangaroo is also expected to perform a further $1/\theta$ steps to the next
distinguished point. Since there are two kangaroos the expected total number of group
operations performed is $2\sqrt{w} + 2/\theta = (2 + o(1))\sqrt{w}$.
This result is proved by Montenegro and Tetali [431] under the assumption that S is
a random function and that the distinguished points are well-distributed. Pollard [485]
shows it is valid when the $o(1)$ is replaced by $\epsilon$ for some $0 < \epsilon \le 0.06$.
Note that the expected distance, on average, travelled by a kangaroo is $w/4 + m^2 = w/2$.
Hence, since the order of the group is greater than $w$, we do not expect any self-collisions in the kangaroo walk.
We stress that, as with the rho method, the probability of success is considered over
the random choice of pseudorandom walk, not over the space of problem instances. Exercise 14.5.6 considers a different way to optimise the expected running time.
Exercise 14.5.6. Show that, with the above choice of $m$, the expected number of group
operations in the worst case is $(3 + o(1))\sqrt{w}$, and determine the mean step size that minimises the worst-case expected running time.
14.5.4 Comparison with the Rho Algorithm
We now consider whether one should use the rho or kangaroo algorithm when solving a
general discrete logarithm problem (i.e., where the width $w$ of the interval is equal to, or
close to, $r$). If $w = r$ then the rho method requires roughly $1.25\sqrt{r}$ group operations while
the kangaroo method requires roughly $2\sqrt{r}$ group operations. The heuristic assumptions
underlying both methods are similar, and in practice they work as well as the theory
predicts. Hence, it is clear that the rho method is preferable, unless w is much smaller
than r.
Exercise 14.5.10. Determine the interval size below which it is preferable to use the
kangaroo algorithm over the rho algorithm.
14.5.5 Using Inversion
Galbraith, Ruprai and Pollard [225] showed that one can improve the kangaroo method
by exploiting inversion in the group.11 Suppose one is given $g, h, w$ and told that $h = g^a$
with $0 \le a < w$. We also require that the order $r$ of $g$ is odd (this will always be the case,
due to the Pohlig-Hellman algorithm). Suppose, for simplicity, that $w$ is even. Replacing
$h$ by $hg^{-w/2}$ we have $h = g^a$ with $-w/2 \le a < w/2$. One can perform a version of
the kangaroo method with three kangaroos: one tame kangaroo starting from $g^u$ for an
appropriate value of $u$ and two wild kangaroos starting from $h$ and $h^{-1}$ respectively.
The algorithm uses the usual kangaroo walk (with mean step size to be determined
later) to generate three sequences $(x_i, a_i)$, $(y_i, b_i)$, $(z_i, c_i)$ such that $x_i = g^{a_i}$, $y_i = hg^{b_i}$
and $z_i = h^{-1}g^{c_i}$. The crucial observation is that a collision between any two sequences
leads to a solution to the DLP. For example, if $x_i = y_j$ then $h = g^{a_i - b_j}$ and if $y_i = z_j$ then
11 This research actually grew out of writing this chapter. Sometimes it pays to work slowly.
$hg^{b_i} = h^{-1}g^{c_j}$ and so, since $g$ has odd order $r$, $h = g^{(c_j - b_i)2^{-1} \bmod r}$. The algorithm uses
distinguished points to detect a collision. We call this the three-kangaroo algorithm.
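The recovery of the discrete logarithm from a collision between the two wild walks can be checked numerically. The following Python fragment constructs such a collision in a toy subgroup of odd prime order and recovers $h$ via $2^{-1} \bmod r$; all group parameters are illustrative.

```python
# A wild-wild collision in the three-kangaroo algorithm, checked on a
# toy subgroup of odd prime order r (all parameters illustrative).
p = 2039                  # prime with p - 1 = 2 * 1019
r = 1019                  # odd prime order of the subgroup
g = pow(3, 2, p)          # squaring forces g into the order-r subgroup
a = 345
h = pow(g, a, p)

# At a collision between the two wild walks, h * g^b = h^{-1} * g^c,
# so h^2 = g^(c-b) and h = g^((c-b) * 2^{-1} mod r) since r is odd.
b = 100
c = (b + 2 * a) % r       # exponents at such a collision (constructed)
assert (h * pow(g, b, p)) % p == (pow(h, -1, p) * pow(g, c, p)) % p
recovered = ((c - b) * pow(2, -1, r)) % r
assert pow(g, recovered, p) == h
assert recovered == a
```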
Exercise 14.5.11. Write down pseudocode for the three-kangaroo algorithm using distinguished points.
We now give a brief heuristic analysis of the three-kangaroo algorithm. Without loss
of generality we assume $0 \le a \le w/2$ (taking negative $a$ simply swaps $h$ and $h^{-1}$, so does
not affect the running time). The distance between the starting points of the two wild
kangaroos is $2a$. The distance between the starting points of the tame and right-most
wild kangaroo is $|a - u|$. The extreme cases (in the sense that the closest pair of kangaroos
are as far apart as possible) are when $2a = u - a$ or when $a = w/2$. Making all these cases
equal leads to the equation $2a = u - a = w/2 - u$. Calling this distance $l$ it follows that
$w/2 = 5l/2$ and $u = 3w/10$. The average distance between the closest pair of kangaroos is
then w/10 and the closest pair of kangaroos can be thought of as performing the standard
kangaroo method in an interval of length $2w/5$. Following the analysis of the standard
kangaroo method it is natural to take the mean step size to be $m = \frac{1}{2}\sqrt{2w/5} = \sqrt{w/10}$,
in which case the heuristic average-case expected running time (counting only the
closest pair of kangaroos) would be $\frac{3}{2} \cdot 2\sqrt{2w/5} \approx 1.897\sqrt{w}$ group operations. A more careful analysis takes
into account the possibility of collisions between any pair of kangaroos. We refer to [225]
for the details and merely remark that the correct mean step size is $m \approx 0.375\sqrt{w}$ and
the average-case expected number of group operations is approximately $1.818\sqrt{w}$.
Exercise 14.5.12. The distance between $a$ and $-a$ is even, so a natural trick is to use
jumps of even length. Since we don't know whether $a$ is even or odd, if this is done
we don't know whether to start the tame kangaroo at $g^u$ or $g^{u+1}$. However, one can
consider a variant of the algorithm with two wild kangaroos (one starting from $h$ and one
from $h^{-1}$) and two tame kangaroos (one starting from $g^u$ and one from $g^{u+1}$) and with
jumps of even length. This is called the four-kangaroo algorithm. Explain why the
correct choice for the mean step size is $m \approx 0.375\sqrt{2w}$ and why the heuristic average-case
expected running time is approximately $1.714\sqrt{w}$ group operations.
14.5.6 Towards a Rigorous Analysis of the Kangaroo Method
Montenegro and Tetali [431] have analysed the kangaroo method using jumps which are
powers of 2, under the assumption that the selection function S is random and that
the distinguished points are well-distributed.
They prove that the average-case expected
number of group operations is $(2 + o(1))\sqrt{w}$. It is beyond the scope of
this book to present their methods.
We now present Pollard's analysis of the kangaroo method from his paper [485], though
these results have been superseded by [431]. We restrict to the case where the selection
function $S$ maps $G$ to $\{0, 1, \ldots, n_S - 1\}$ and the kangaroo jumps are taken to be $2^{S(x)}$ (i.e.,
the set of jumps is $\{1, 2, 4, \ldots, 2^{n_S - 1}\}$ and the mean of the jumps is $m = (2^{n_S} - 1)/n_S$).
We assume $n_S > 2$. Pollard argues in [485] that if one only uses two jumps $\{1, 2n\}$ (for
some $n$) then the best one can hope for is an algorithm with running time $O(w^{2/3})$ group
operations.
Pollard also makes the usual assumption that S is a truly random function.
$$F(i) = 1 + \frac{1}{n_S}\left(\sum_{j=0,\,2^j < i}^{n_S - 1} F(i - 2^j) + \sum_{j=0,\,2^j \ge i}^{n_S - 1} B(2^j - i)\right)$$
and
$$B(i) = 1 + \frac{1}{n_S}\left(\sum_{j=0,\,2^j < i}^{n_S - 1} B(i - 2^j) + \sum_{j=0,\,2^j \ge i}^{n_S - 1} F(2^j - i)\right)$$
Pollard then considers the expected value of the number of steps of the wild kangaroo to
a collision, namely
$$\sum_{i=1}^{2(2^{n_S} - 1)} q(i)F(i),$$
which we write as $mC(n_S)$ for some $C(n_S) \in \mathbb{R}$. In [485] one finds numerical data for
$C(n_S)$ which suggest that it is between 1 and 1.06 when $n_S \ge 12$. Pollard also conjectures
that $\lim_{n_S \to \infty} C(n_S) = 1$.
Given an interval of size $w$ one chooses $n_S$ such that the mean $m = (2^{n_S} - 1)/n_S$ is
as close as possible to $\sqrt{w}/2$. One runs the tame kangaroo, starting at $g^w$, for $mC(n_S)$
steps and sets the trap. The wild kangaroo is expected to need $w/(2m)$ steps to pass the
start of the tame kangaroo followed by $mC(n_S)$ steps to fall into the trap. Hence, the
expected number of group operations for the kangaroo algorithm (for a random function
$S$) is
$$w/(2m) + 2mC(n_S).$$
Taking $m \approx \sqrt{w}/2$ this is approximately
$$(1 + C(n_S))\sqrt{w}$$
group operations.
In practice one would slightly adjust the jumps $\{1, 2, 4, \ldots, 2^{n_S - 1}\}$ (while hoping that
this does not significantly change the value of $C(n_S)$) to arrange that $m = \sqrt{w/C(n_S)}/2$.
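The parameter selection just described can be sketched as follows; the search range for $n_S$ is an arbitrary choice of this sketch.

```python
import math

def choose_ns(w, target_scale=0.5):
    """Pick n_S so the power-of-2 jump mean (2^n_S - 1)/n_S is as close
    as possible to target_scale * sqrt(w).  A sketch; the search range
    2..63 is an arbitrary choice."""
    target = target_scale * math.sqrt(w)
    return min(range(2, 64), key=lambda n: abs((2**n - 1) / n - target))

assert choose_ns(2**30) == 18   # mean 262143/18 is nearest to 2^14
```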
14.6 The Distributed Kangaroo Algorithm
Let $N_P$ be the number of processors or clients. A naive way to parallelise the kangaroo
algorithm is to divide the interval $[0, w)$ into $N_P$ sub-intervals of size $w/N_P$ and then run
the kangaroo algorithm in parallel on each sub-interval. This gives an algorithm with
running time $O(\sqrt{w/N_P})$ group operations per client, which is not a linear speedup.
Since we are using distinguished points one should be able to do better. But the
kangaroo method is not as straightforward to parallelise as the rho method (a good
exercise is to stop reading now and think about it for a few minutes). The solution is
to use a herd of $N_P/2$ tame kangaroos and a herd of $N_P/2$ wild kangaroos. These are
"super-kangaroos" in the sense that they take much bigger jumps (roughly $N_P/2$ times
longer) than in the serial case. The goal is to have a collision between one of the wild
kangaroos and one of the tame kangaroos. We imagine that both herds are setting traps,
each trying to catch a kangaroo from the other herd (regrettably, they may sometimes
catch one of their own kind).
When a kangaroo lands on a distinguished point one continues the pseudorandom walk
(rather than restarting the walk at a new randomly chosen position). In other words, the
herds march ever onwards with an occasional individual hitting a distinguished point and
sending information back to the server. See Figure 14.2 for a picture of the herds in
action.
There are two versions of the distributed algorithm, one by van Oorschot and Wiener [470]
and another by Pollard [485]. The difference is how they handle the possibility of collisions
between kangaroos of the same herd. The former has a mechanism to deal with
this, which we will explain later. The latter paper elegantly ensures that there will not
be collisions between individuals of the same herd.
14.6.1 The van Oorschot and Wiener Version
We first present the algorithm of van Oorschot and Wiener. The herd of tame kangaroos
starts around the midpoint of the interval [0, w), and the kangaroos are spaced a (small)
distance s apart (as always, we describe kangaroos by their exponent). Similarly, the wild
kangaroos start near $a = \log_g(h)$, again spaced a distance $s$ apart. As we will explain
later, the mean step size of the jumps should be $m \approx N_P\sqrt{w}/4$.
Here walk$(x_i, a_i)$ is the function which returns $x_{i+1} = x_i g^{u_{S(x_i)}}$ and $a_{i+1} = a_i + u_{S(x_i)}$.
Each client has a variable type which takes the value tame or wild.
If there is a collision between two kangaroos of the same herd then it will eventually be
detected when the second one lands on the same distinguished point as the first. In [470]
it is suggested that in this case the server should instruct the second kangaroo to take a
jump of random length so that it no longer follows the path of the front kangaroo. Note
that Teske [602] has shown that the expected number of collisions within the same herd
is 2, so this issue can probably be ignored in practice.
We now give a very brief heuristic analysis of the running time. The following assumption seems to be reasonable when $w$ is sufficiently large, $n_S$ is sufficiently large,
Figure 14.2: Distributed kangaroo walk (van Oorschot and Wiener version). The herd
of tame kangaroos is pictured above the axis and the herd of wild kangaroos is pictured
below. The dot marks the collision.
Algorithm 19 The distributed kangaroo algorithm (van Oorschot and Wiener version):
Server side
Input: $g, h \in G$, interval length $w$, number of clients $N_P$
Output: $a$ such that $h = g^a$
Algorithm 20 The distributed kangaroo algorithm (van Oorschot and Wiener version):
Client side
Input: $(x_1, a_1, \mathrm{type}) \in G \times \mathbb{Z}/r\mathbb{Z} \times \{\mathrm{tame}, \mathrm{wild}\}$, function walk
1: while terminate signal not received do
2:   $(x_1, a_1) = \mathrm{walk}(x_1, a_1)$
3:   if $x_1 \in D$ then
4:     Send $(x_1, a_1, \mathrm{type})$ to server
5:     if Receive jump instruction then
6:       Choose random $1 < u < 2m$ (where $m$ is the mean step size)
7:       Set $a_1 = a_1 + u$, $x_1 = x_1 g^u$
8:     end if
9:   end if
10: end while
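A serial simulation of the client/server interplay may help clarify the method. The sketch below steps a herd of tame and a herd of wild kangaroos round-robin and keeps the server's distinguished-point table as a dictionary; the spacing, jump set and distinguished-point rule are illustrative, and the "take a random jump" mechanism for same-herd collisions is omitted for brevity.

```python
import hashlib

def distributed_kangaroo(g, h, w, p, n_clients=4, n_S=8, theta_bits=4):
    """Serial simulation of the van Oorschot-Wiener distributed kangaroo.

    A sketch: spacing s, jump set and distinguished-point rule are
    illustrative, and same-herd collisions are simply ignored.
    """
    jumps = [2**j for j in range(n_S)]
    g_jumps = [pow(g, u, p) for u in jumps]

    def S(x):
        return hashlib.sha256(str(x).encode()).digest()[0] % n_S

    def distinguished(x):
        return x % (1 << theta_bits) == 0

    s = 1
    herd = []  # one entry [x, exponent, kind] per simulated client
    for i in range(n_clients // 2):
        herd.append([pow(g, w // 2 + i * s, p), w // 2 + i * s, 'tame'])
        herd.append([(h * pow(g, i * s, p)) % p, i * s, 'wild'])

    server = {}  # distinguished element -> (exponent, kind)
    while True:
        for kang in herd:  # round-robin: one jump per client per round
            j = S(kang[0])
            kang[0] = (kang[0] * g_jumps[j]) % p
            kang[1] += jumps[j]
            if distinguished(kang[0]):
                if kang[0] in server and server[kang[0]][1] != kang[2]:
                    d = server[kang[0]][0]
                    a = (d - kang[1]) if kang[2] == 'wild' else (kang[1] - d)
                    if pow(g, a % (p - 1), p) == h % p:
                        return a % (p - 1)
                server.setdefault(kang[0], (kang[1], kang[2]))

p, g = 10007, 5
h = pow(g, 123, p)
assert distributed_kangaroo(g, h, w=500, p=p) == 123
```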
Heuristic 14.6.1.
1. Walks reach a distinguished point in significantly fewer than $\sqrt{w}$ steps (in other
words, there are no cycles in the walks and walks are not excessively longer than
$1/\theta$).
2. When two kangaroos with mean step size $m$ walk over the same interval, the expected number of group elements sampled before a collision is $m$.
3. Walks of kangaroos in the same herd are independent.12
Theorem 14.6.2. Let $N_P$ be the number of clients (fixed, independent of $w$). Assume
Heuristic 14.6.1 and that all clients are reliable and have the same computing power. The
average-case expected number of group operations performed by the distributed kangaroo
method for each client is $(2 + o(1))\sqrt{w}/N_P$.
Proof: Since we don't know where the wild kangaroo is, we speak of the front herd and
the rear herd. The distance (in the exponent) between the front herd and the rear herd
is, on average, w/4. So it takes w/(4m) steps for the rear herd to reach the starting point
of the front herd.
We now consider the footsteps of the rear herd in the region already visited by the front
herd of kangaroos. Assuming the NP /2 kangaroos of the front herd are independent, the
region already covered by these kangaroos is expected to have $N_P/2$ footprints in each
interval of length $m$. Hence, under our heuristic assumptions, the probability that a
random footprint of one of the rear kangaroos lands on a footprint of one of the front
kangaroos is $N_P/(2m)$. Since there are $N_P/2$ rear kangaroos, all mutually independent,
the probability of one of the rear kangaroos landing on a tame footprint is $N_P^2/(4m)$. By
the heuristic assumption, the expected number of footprints to be made before a collision
occurs is $4m/N_P^2$.
12 This assumption is very strong, and indeed is false in general (since there is a chance that walks
collide). The assumption is used for only two purposes. First, to amplify the second assumption in the
heuristic from any pair of kangaroos to the level of herds. Second, to allow us to ignore collisions between
kangaroos in the same herd (Teske, in Section 7 of [602], has argued that such collisions are rare). One
could replace the assumption of independence by these two consequences.
Finally, the collision will not be detected until a distinguished point is visited. Hence,
one expects a further $1/\theta$ steps to be made.
The expected number of group operations made by each client in the average case is
therefore $w/(4m) + 4m/N_P^2 + 1/\theta$. Ignoring the $1/\theta$ term, this expression is minimised
by taking $m = N_P\sqrt{w}/4$. The result follows.
The remarks made in Section 14.3.1 about parallelisation (for example, Exercise 14.3.3)
apply equally for the distributed kangaroo algorithm.
Exercise 14.6.3. The above analysis is optimised for the average-case running time.
Determine the mean step size to optimise the worst-case expected running time. Show
that the heuristic optimal running time is $(3 + o(1))\sqrt{w}/N_P$ group operations.
Exercise 14.6.4. Give distributed versions of the three-kangaroo and four-kangaroo
algorithms of Section 14.5.5.
14.6.2 Pollard Version
Pollards version reduces the computation to essentially a collection of serial versions, but
in a clever way so that a linear speed-up is still obtained. One merit of this approach is
that the analysis of the serial kangaroo algorithm can be applied; we no longer need the
strong heuristic assumption that kangaroos in the same herd are mutually independent.
Let $N_P$ be the number of processors and suppose we can write $N_P = U + V$ where
$\gcd(U, V) = 1$ and $U, V \approx N_P/2$. The number of tame kangaroos is $U$ and the number of
wild kangaroos is $V$. The (super) kangaroos perform the usual pseudorandom walk with
steps $\{UVu_0, \ldots, UVu_{n_S - 1}\}$ having mean $m \approx N_P\sqrt{w}/4$ (this is $UV$ times the mean step
size for solving the DLP in an interval of length $w/UV \approx 4w/N_P^2$). As usual we choose
either $u_j \approx 2^j$ or else random values between 0 and $2m/UV$.
The $U$ tame kangaroos start at
$$g^{w/2 + iV}$$
for $0 \le i < U$. The $V$ wild kangaroos start at $hg^{jU}$ for $0 \le j < V$. Each kangaroo then
uses the pseudorandom walk to generate a sequence of values $(x_n, a_n)$ where $x_n = g^{a_n}$ or
$x_n = hg^{a_n}$. Whenever a distinguished point is hit the kangaroo sends data to the server
and continues the same walk.
Lemma 14.6.5. Suppose the walks do not cover the whole group, i.e., $0 \le a_n < r$. Then
there is no collision between two tame kangaroos or two wild kangaroos. There is a unique
pair of tame and wild kangaroos who can collide.
Proof: Each element of the sequence generated by the $i$th tame kangaroo is of the form
$$g^{w/2 + iV + lUV}$$
for some $l \in \mathbb{Z}$. To have a collision between two different tame kangaroos one would need
$$w/2 + i_1V + l_1UV = w/2 + i_2V + l_2UV$$
and reducing modulo $U$ implies $i_1 \equiv i_2 \pmod{U}$, which is a contradiction. To summarise,
the values an for the tame kangaroos all lie in disjoint equivalence classes modulo U . A
similar argument shows that wild kangaroos do not collide.
Finally, if $h = g^a$ then $i \equiv (a - w/2)V^{-1} \pmod{U}$ and $j \equiv (w/2 - a)U^{-1} \pmod{V}$
are the unique pair of indices such that the $i$th tame kangaroo and the $j$th wild kangaroo
can collide.
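Lemma 14.6.5 can be checked numerically for small parameters. In the sketch below the exponents visited by each kangaroo are reduced modulo $UV$; the herd sizes, interval and exponent are illustrative toy values.

```python
from math import gcd

# Lemma 14.6.5 on a toy instance: residues modulo U*V separate the herds.
U, V = 5, 4                  # herd sizes with gcd(U, V) = 1 (illustrative)
assert gcd(U, V) == 1
w, a = 1000, 347             # interval width and the (unknown) exponent

# Tame kangaroo i visits exponents w/2 + i*V + l*U*V; wild kangaroo j
# visits a + j*U + l*U*V, so each kangaroo stays in one class mod U*V.
tame_classes = {(w // 2 + i * V) % (U * V) for i in range(U)}
wild_classes = {(a + j * U) % (U * V) for j in range(V)}
assert len(tame_classes) == U      # no tame-tame collisions possible
assert len(wild_classes) == V      # no wild-wild collisions possible
assert len(tame_classes & wild_classes) == 1  # exactly one lucky pair

# The lucky indices solve w/2 + i*V = a + j*U (mod U*V):
i = ((a - w // 2) * pow(V, -1, U)) % U
j = ((w // 2 - a) * pow(U, -1, V)) % V
assert (w // 2 + i * V) % (U * V) == (a + j * U) % (U * V)
```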
The analysis of the algorithm therefore reduces to the serial case, since we have one
tame kangaroo and one wild kangaroo who can collide. This makes the heuristic analysis
simple and immediate.
Theorem 14.6.6. Let the notation be as above. Assume Heuristic 14.5.4 and that all
clients are reliable and have the same computational power. Then the average-case expected running time for each client is $(1 + o(1))\sqrt{w/UV} = (2 + o(1))\sqrt{w}/N_P$ group
operations.
Proof: The action is now constrained to an equivalence class modulo $UV$, so the clients
behave like the serial kangaroo method in an interval of size $w/UV$ (see Exercise 14.5.8
for reducing a DLP in a congruence class to a DLP in a smaller interval). The mean step
size is therefore $m \approx UV\sqrt{w/UV}/2 \approx N_P\sqrt{w}/4$. Applying Theorem 14.5.5 gives the
result.
14.6.3 Comparison of the Two Versions
Both versions of the distributed kangaroo method have the same heuristic running time
of $(2 + o(1))\sqrt{w}/N_P$ group operations.13 So which is to be preferred in practice? The
answer depends on the context of the computation. For genuine parallel computation in
a closed system (e.g., using special-purpose hardware) then either could be used.
In distributed environments then both methods have drawbacks. For example, the
van Oorschot-Wiener method needs a communication from server to client in response
to uploads of distinguished point information (the "take a random jump" instruction);
though Teske [602] has remarked that this can probably be ignored.
More significantly, both methods require knowing the number NP of processors at
the start of the computation, since this value is used to specify the mean step size. This
causes problems if a large number of new clients join the computation after it has begun.
With the van Oorschot and Wiener method, if further clients want to join the computation after it has begun, then they can be easily added (half the new clients tame and
half wild) by starting them at further shifts from the original starting points of the herds.
With Pollards method it is less clear how to add new clients. Even worse, since only one
pair of lucky clients has the potential to solve the problem, if either of them crashes or
withdraws from the computation then the problem will not be solved. As mentioned in
Section 14.4.3 these are serious issues which do arise in practice.
On the other hand, these issues can be resolved by over-estimating $N_P$ and by issuing
clients with fresh problem instances once they have produced sufficiently many distinguished points from their current instance. Note that this also requires communication
from server to client.
14.7 The Gaudry-Schost Algorithm
Gaudry and Schost [248] give a different approach to solving discrete logarithm problems
using pseudorandom walks. As we see in Exercise 14.7.6, this method is slower than the
rho method when applied to the whole group. However, the approach leads to low-storage
algorithms for the multi-dimensional discrete logarithm problems (see Definition 13.5.1);
and the discrete logarithm problem in an interval using equivalence classes. This is
interesting since, for both problems, it is not known how to adapt the rho or kangaroo
methods to give a low-memory algorithm with the desired running time.
13 Though the analysis by van Oorschot and Wiener needs the stronger assumption that the kangaroos
in the same herd are mutually independent.
The basic idea of the Gaudry-Schost algorithm is as follows. One has pseudorandom
walks in two (or more) subsets of the group such that a collision between walks of different
types leads to a solution to the discrete logarithm problem. The sets are smaller than the
whole group, but they must overlap (otherwise, there is no chance of a collision). Typically, one of the sets is called a tame set and the other a wild set. The pseudorandom
walks are deterministic, so that when two walks collide they continue along the same path
until they hit a distinguished point and stop. Data from distinguished points is held in
an easily searched database held by the server. After reaching a distinguished point, the
walks re-start at a freshly chosen point.
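The basic idea can be sketched in one dimension. The following Python fragment is illustrative only: the tame set is taken to be $\{g^x : 0 \le x < N\}$ and the wild set $\{hg^x : 0 \le x < N\}$, and the group, jump set and distinguished-point rule are toy assumptions of the sketch.

```python
import hashlib, random

def gaudry_schost(g, h, N, p, n_S=8, theta_bits=4, seed=1):
    """One-dimensional Gaudry-Schost sketch: solve h = g^a, 0 <= a < N.

    Tame walks start at random exponents in [0, N), wild walks at
    h * g^x for random x in [0, N); walks stop at distinguished points
    and then restart.  All parameters are illustrative.
    """
    rng = random.Random(seed)
    jumps = [2**j for j in range(n_S)]
    g_jumps = [pow(g, u, p) for u in jumps]

    def S(x):
        return hashlib.sha256(str(x).encode()).digest()[0] % n_S

    def distinguished(x):
        return x % (1 << theta_bits) == 0

    store = {}  # distinguished element -> (exponent, kind)
    while True:
        for kind in ('tame', 'wild'):
            e = rng.randrange(N)  # fresh random start in the tame/wild set
            x = pow(g, e, p) if kind == 'tame' else (h * pow(g, e, p)) % p
            while not distinguished(x):
                j = S(x)
                x, e = (x * g_jumps[j]) % p, e + jumps[j]
            if x in store and store[x][1] != kind:
                d = store[x][0]
                a = (d - e) if kind == 'wild' else (e - d)
                if pow(g, a % (p - 1), p) == h % p:
                    return a % (p - 1)
            store[x] = (e, kind)

p, g = 10007, 5
h = pow(g, 444, p)
assert gaudry_schost(g, h, N=1000, p=p) == 444
```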
14.7.1 Two-Dimensional Discrete Logarithm Problem
An important practical consideration is that walks will sometimes go outside the tame
or wild regions. One might think that this issue can be solved by simply taking the values
x and y into account and altering the walk when close to the boundary, but then the
crucial property of the walk function (that once two walks collide, they follow the same
path) would be lost. By taking distinguished points to be quite common (i.e., increasing
the storage) and making M relatively small one can minimise the impact of this problem.
Hence, we ignore it in our analysis.
We now briefly explain the heuristic complexity of the algorithm. The key observation
is that a collision can only occur in the region where the two sets overlap. Let $A = T \cap W$.
If one samples uniformly at random in $A$, alternately writing elements down on a tame
list and a wild list, the expected number of samples until the two lists have an element in
common is $\sqrt{\pi\#A} + O(1)$ (see, for example, Selivanov [532] or [222]).
The following heuristic assumption seems to be reasonable when $N$ is sufficiently
large, $n_S > \log(N)$, distinguished points are sufficiently common and specified using a
good hash function (and hence are well-distributed), $\theta > \log(N)/N$, walks are sufficiently
local that they do not go outside $T$ (respectively, $W$) but also not too local, and when
the function walk is chosen at random.
Heuristic 14.7.4.
1. Walks reach a distinguished point in significantly fewer than $N$ steps (in other
words, there are no cycles in the walks and walks are not excessively longer than
$1/\theta$).
2. Walks are uniformly distributed in T (respectively, W ).
Theorem 14.7.5. Let the notation be as above, and assume Heuristic 14.7.4. Then
the average-case expected number of group operations performed by the Gaudry-Schost
algorithm is $(\sqrt{\pi}\,(2(2 - \sqrt{2}))^2 + o(1))N \approx (2.43 + o(1))N$.
Proof: We first compute $\#(T \cap W)$. When $(a_1, a_2) = (N/2, N/2)$ then $W = T$ and so
$\#(T \cap W) = N^2$. In all other cases the intersection is smaller. The extreme case is when
$(a_1, a_2) = (0, 0)$ (similar cases are $(a_1, a_2) = (N - 1, N - 1)$ etc.). Then $W = \{(x, y) \in
\mathbb{Z}^2 : -N/2 \le x, y < N/2\}$ and $\#(T \cap W) = N^2/4$. By symmetry it suffices to consider
the case $0 \le a_1, a_2 < N/2$ in which case we have $\#(T \cap W) \approx (N/2 + a_1)(N/2 + a_2)$ (here
we are approximating the number of integer points in a set by its area).
Let $A = T \cap W$. To sample $\sqrt{\pi\#A}$ elements in $A$ it is necessary to sample $(\#T/\#A)\sqrt{\pi\#A}$
elements in $T$ (respectively, $W$). Hence, the number of group elements to be selected overall is
$$\frac{\#T}{\#A}\sqrt{\pi\#A} + O(1) = (\sqrt{\pi}\,\#T + o(1))(\#A)^{-1/2} = (\sqrt{\pi}N^2 + o(1))(\#A)^{-1/2}.$$
Note that
$$\int_0^{N/2} (N - x)^{-1/2}\,dx = \sqrt{N}(2 - \sqrt{2}).$$
Averaging $(\#A)^{-1/2}$ over the choices of $(a_1, a_2)$, using this integral in each coordinate, gives the result as stated.
The Gaudry-Schost algorithm has a number of parameters that can be adjusted (such
as the type of walks, the sizes of the tame and wild regions etc). This gives it a lot of
flexibility and makes it suitable for a wide range of variants of the DLP. Indeed, Galbraith
and Ruprai [226] have improved the running time to (2.36 + o(1))N group operations by
using smaller tame and wild sets (also, the wild set is a different shape). One drawback
is that it is hard to fine-tune all these parameters to get an implementation that achieves
the theoretically optimal running time.
Exercise 14.7.6. Determine the complexity of the Gaudry-Schost algorithm for the
standard DLP in G, when one takes T = W = G.
Exercise 14.7.7. Generalise the Gaudry-Schost algorithm to the n-dimensional DLP
(see Definition 13.5.1). What is the heuristic average-case expected number of group
operations?
14.7.2 Discrete Logarithm Problem in an Interval Using Equivalence Classes
Galbraith and Ruprai [227] used the Gaudry-Schost algorithm to solve the DLP in an
interval of length $N < r$ faster than is possible using the kangaroo method, when the
group has an efficiently computable inverse (e.g., elliptic curves or tori). First, shift the
discrete logarithm problem so that it is of the form $h = g^a$ with $-N/2 < a \le N/2$. Define
the equivalence relation $u \equiv u^{-1}$ for $u \in G$ as in Section 14.4 and determine a rule that
leads to a unique representative of each equivalence class. Design a pseudorandom walk
on the set of equivalence classes. The tame set is the set of equivalence classes coming
from elements of the form $g^x$ with $-N/2 < x \le N/2$. Note that the tame set has $1 + N/2$
elements and every equivalence class $\{g^x, g^{-x}\}$ arises in two ways, except the singleton
class $\{1\}$ and the class $\{g^{N/2}, g^{-N/2}\}$.
A natural choice for the wild set is the set of equivalence classes coming from elements
of the form $hg^x$ with $-N/2 < x \le N/2$. Note that the size of the wild set now depends
on the discrete logarithm problem: if $h = g^0 = 1$ then the wild set has $1 + N/2$ elements
while if $h = g^{N/2}$ then the wild set has $N$ elements. Even more confusingly, sampling
from the wild set by uniformly choosing $x$ does not, in general, lead to uniform sampling
from the wild set. This is because the equivalence class $\{hg^x, (hg^x)^{-1}\}$ can arise in either
one or two ways, depending on $h$. To analyse the algorithm it is necessary to use a non-uniform version of the birthday paradox (see, for example, Galbraith and Holmes [222]).
The main result of [227] is an algorithm that solves the DLP in heuristic average-case
expected $(1.36 + o(1))\sqrt{N}$ group operations.
14.8 Parallel Collision Search in Other Contexts
Van Oorschot and Wiener [470] propose a general method, motivated by Pollard's rho
algorithm, for finding collisions of functions using distinguished points and parallelisation.
They give applications to cryptanalysis of hash functions and block ciphers that are
beyond the scope of this book. But they also give applications of their method for
algebraic meet-in-the-middle attacks, so we briefly give the details here.
First we sketch the parallel collision search method. Let f : S S be a function
mapping some set S of size N to itself. Define a set D of distinguished points in S.
Each client chooses a random starting point x1 S, iterates xn+1 = f (xn ) until it hits
a distinguished point, and sends (x1 , xn , n) to the server. The client then restarts with a
319
new random starting point. Eventually the server gets two triples (x1 , x, n) and (x1 , x, n )
for the same distinguished point. As long as we dont have a Robin Hood14 (i.e., one
walk is a subsequence of another) the server can use the values (x1 , n) and (x1 , n ) to
efficientlypfind a collision f (x) = f (y) with x 6= y. The expected running time for each
client is N/2/NP + 1/, using the notation of this chapter. The storage requirement
depends on the choice of .
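A serial Python sketch of this client/server scheme follows. The distinguished set D is taken to be points whose low-order bits are zero, and the helper names (walk_to_distinguished, locate_collision, collision_search) are ours, not from [470]:

```python
import random

def walk_to_distinguished(f, x1, theta_bits, max_steps):
    """Iterate f from x1 until a distinguished point (low bits zero) is hit;
    returns (point, walk length), or None if trapped in a small cycle."""
    x, n = x1, 0
    while x & ((1 << theta_bits) - 1):
        if n >= max_steps:
            return None
        x, n = f(x), n + 1
    return x, n

def locate_collision(f, s1, n1, s2, n2):
    """Replay two walks ending at the same distinguished point; return
    x != y with f(x) == f(y), or None for a 'Robin Hood'."""
    if n1 < n2:
        s1, n1, s2, n2 = s2, n2, s1, n1
    for _ in range(n1 - n2):          # align the longer walk
        s1 = f(s1)
    if s1 == s2:
        return None                   # one walk is a subsequence of the other
    while f(s1) != f(s2):
        s1, s2 = f(s1), f(s2)
    return s1, s2

def collision_search(f, domain_size, theta_bits=4, seed=1):
    """Serial sketch of van Oorschot-Wiener parallel collision search."""
    rng = random.Random(seed)
    server = {}                       # distinguished point -> (start, length)
    while True:
        x1 = rng.randrange(domain_size)
        hit = walk_to_distinguished(f, x1, theta_bits, 50 << theta_bits)
        if hit is None:
            continue
        d, n = hit
        if d in server and server[d][0] != x1:
            pair = locate_collision(f, server[d][0], server[d][1], x1, n)
            if pair is not None:
                return pair
        server[d] = (x1, n)
```

Here θ = 2^{−theta_bits}; the server stores only one (start, length) pair per distinguished point, which is the source of the low storage requirement.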
We now consider the application to meet-in-the-middle attacks. A general meet-in-the-middle attack has two sets S_1 and S_2 and functions f_i : S_i → R for i = 1, 2. The
goal is to find a_1 ∈ S_1 and a_2 ∈ S_2 such that f_1(a_1) = f_2(a_2). The standard solution
(as in baby-step-giant-step) is to compute and store all (f_1(a_1), a_1) in an easily searched
structure and then test for each a_2 ∈ S_2 whether f_2(a_2) is in the structure. The running
time is #S_1 + #S_2 function evaluations and the storage is proportional to #S_1.
The idea of [470] is to phrase this as a collision search problem for a single function
f. For simplicity we assume that #S_1 = #S_2 = N. We write I = {0, 1, ..., N − 1}
and assume one can construct bijective functions σ_i : I → S_i for i = 1, 2. One defines a
surjective map
η : R → I × {1, 2}
and a set S = I × {1, 2}. Finally, define f : S → S as f(x, i) = η(f_i(σ_i(x))). Clearly,
the desired collision f_1(a_1) = f_2(a_2) can arise from f(σ_1^{−1}(a_1), 1) = f(σ_2^{−1}(a_2), 2), but
collisions can also arise in other ways (for example, due to collisions in η). Indeed, since
#S = 2N one expects there to be roughly 2N pairs (a_1, a_2) ∈ S^2 such that a_1 ≠ a_2
but f(a_1) = f(a_2). In many applications there is only one collision (van Oorschot and
Wiener call it the golden collision) that actually leads to a solution of the problem. It
is therefore necessary to analyse the algorithm carefully to determine the expected time
until the problem is solved.
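The following toy script shows the single-function construction on a baby-step-giant-step instance: solving g^s ≡ t (mod P) with s = a_1 + K·a_2 makes (a_1, a_2) a golden collision of f_1(a) = g^a and f_2(a) = t·g^{−Ka}. All parameters are illustrative, σ_1 and σ_2 are taken to be the identity, and for brevity we enumerate S with a dictionary rather than run the low-memory collision search:

```python
from collections import defaultdict

# Toy DLP by meet-in-the-middle: solve g^s = t (mod P) with s = a1 + K*a2
# and a1, a2 in I = {0, ..., K-1}; all parameters are illustrative.
P, g, K = 10007, 5, 101              # K^2 > P - 1, so every exponent is covered
t = pow(g, 1234, P)                  # target (1234 plays the 'unknown' log)

def f1(a):                           # f1 : I -> R, baby steps
    return pow(g, a, P)

def f2(a):                           # f1(a1) = f2(a2)  <=>  g^(a1 + K*a2) = t
    return t * pow(g, -K * a, P) % P

def eta(y):                          # surjective map R -> I x {1, 2}
    v = (2654435761 * y + 1) % (2 * K)
    return (v // 2, 1 + (v % 2))

def f(state):                        # the single function on S = I x {1, 2}
    x, i = state
    return eta(f1(x) if i == 1 else f2(x))

buckets = defaultdict(list)          # enumerate f; colliding inputs share a bucket
for i in (1, 2):
    for x in range(K):
        buckets[f((x, i))].append((x, i))
golden = [(x1, x2) for b in buckets.values() for (x1, i1) in b
          for (x2, i2) in b if i1 == 1 and i2 == 2 and f1(x1) == f2(x2)]
a1, a2 = golden[0]
s = a1 + K * a2                      # the discrete logarithm: g^s = t (mod P)
```

Most pairs landing in the same bucket collide only through η and are useless; the golden pairs satisfy f_1(a_1) = f_2(a_2), and each one yields s with g^s = t.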
Let N_P be the number of clients and let N_M be the total number of group elements
that can be stored on the server. Van Oorschot and Wiener give a heuristic argument
that the algorithm finds a useful collision after 2.5·√((2N)^3 / N_M)/N_P group operations per
client. This is taking θ = 2.25·√(N_M / (2N)) for the probability of a distinguished point. We
refer to [470] for the details.
14.8.1 The Low Hamming Weight DLP
roughly proportional to √(wM).
To solve the low Hamming weight DLP using parallel collision search one sets R = ⟨g⟩
and S_1, S_2 to be sets of integers of binary length n/2 and Hamming weight roughly w/2.
Define the functions f_1(a) = g^a and f_2(a) = h·g^{−2^{n/2}·a} so that a collision f_1(a_1) = f_2(a_2)
solves the problem. Note that there is a unique choice of (a_1, a_2) such that f_1(a_1) = f_2(a_2)
but when one uses the construction of van Oorschot and Wiener to get a single function f
then there will be many useless collisions in f. We have N = #S_1 = #S_2 = (n/2 choose w/2) ≈ √M
14 Robin Hood is a character of English folklore who is expert in archery. His prowess allows him to
shoot a second arrow on exactly the same trajectory as the first, so that the second arrow splits the first.
Chinese readers may substitute the name Houyi.
and so get an algorithm whose number of group operations is proportional to N^{3/2} = M^{3/4}
yet requires low storage. This is a significant improvement over the naive low-storage
method, but still slower than baby-step-giant-step.
Exercise 14.8.1. Write this algorithm in pseudocode and give a more careful analysis
of the running time.
It remains an open problem to give a low-memory algorithm for the low Hamming
weight DLP with complexity proportional to √(wM) as with the BSGS methods.
14.9 The Rho Algorithm for Factoring
This algorithm was proposed in [483] and was the first algorithm invented by Pollard that
exploited pseudorandom walks. As more powerful factoring algorithms exist, we keep the
presentation brief. For further details see Section 5.6.2 of Stinson [588] or Section 5.2.1
of Crandall and Pomerance [161].
Let N be a composite integer to be factored and let p | N be a prime (usually p is
the smallest prime divisor of N ). We try to find a relation that holds modulo p but not
modulo other primes dividing N .
The basic idea of the rho factoring algorithm is to consider the pseudorandom walk
x1 = 2 and
xi+1 = f (xi ) (mod N )
where the usual choice for f (x) is x2 + 1 (or f (x) = x2 + a for some small integer a).
Consider the values x_i (mod p) where p | N. The sequence x_i (mod p) is a pseudorandom
sequence of residues modulo p, and so after about √(πp/2) steps we expect there to be
indices i and j such that x_i ≡ x_j (mod p). We call this a collision. If x_i ≢ x_j (mod N)
then we can split N as gcd(x_i − x_j, N).
Example 14.9.1. Let p = 11. Then the rho iteration modulo p is
2, 5, 4, 6, 4, 6, 4, . . .
Let p = 19. Then the sequence is
2, 5, 7, 12, 12, 12, . . .
As with the discrete logarithm algorithms, the walk is deterministic in the sense that
a collision leads to a cycle. Let l_t be the length of the tail and l_h be the length of the
cycle. Then the first collision is
x_{l_t + l_h} ≡ x_{l_t} (mod p).
We can use Floyd's cycle finding algorithm to detect the collision. The details are given
in Algorithm 21. Note that it is not efficient to compute the gcd in line 5 of the algorithm
for each iteration; Pollard [483] gave a solution to reduce the number of gcd computations
and Brent [98] gave another.
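In Python, the Floyd-based loop of Algorithm 21 is only a few lines. This sketch computes a gcd at every iteration for clarity, which is exactly the inefficiency the remark above refers to:

```python
from math import gcd

def rho_factor(N, a=1, x1=2, max_iter=1_000_000):
    """Pollard rho with Floyd cycle finding and f(x) = x^2 + a (mod N).

    Compares the pairs (x_i, x_{2i}); returns a non-trivial factor of N,
    or None if the walk fails (retry with another a or x1)."""
    f = lambda x: (x * x + a) % N
    x, y = x1, (x1 * x1 + a) % N      # the pair (x_1, x_2)
    for _ in range(max_iter):
        d = gcd(x - y, N)
        if d == N:
            return None               # collided modulo every prime at once
        if d > 1:
            return d
        x, y = f(x), f(f(y))          # (x_i, x_{2i}) -> (x_{i+1}, x_{2i+2})
    return None
```

For example, rho_factor(144493) returns 131, matching Example 14.9.3 below.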
We now briefly discuss the complexity of the algorithm. Note that the algorithm
may not terminate, for example if the length of the cycle and tail are the same for all
p | N then the gcd will always be either 1 or N . In practice one would stop the algorithm
after a certain number of steps and repeat with a different choice of x1 and/or f (x). Even
if it terminates, the length of the cycle of the rho may be very large. Hence, the usual
approach is to make the heuristic assumption that the rho pseudorandom walk behaves
Proof: (Sketch) Let p be a prime dividing N such that p ≤ √N. Define the values l_t
and l_h corresponding to the sequence x_i (mod p). If the walk behaves sufficiently like a
random walk then, by the birthday paradox, we will have l_h, l_t ≈ √(πp/8). Similarly, for
some other prime q | N one expects that the walk modulo q has different values l_h and
l_t. Hence, after O(√p) iterations of the loop one expects to split N. □
Bach [21] has given a rigorous analysis of the rho factoring algorithm. He proves
that if 0 ≤ x, y < N are chosen randomly and the iteration is x_1 = x, x_{i+1} = x_i^2 + y,
then the probability of finding the smallest prime factor p of N after k steps is at least
k(k − 1)/(2p) + O(p^{−3/2}) as p goes to infinity, where the constant in the O depends on k.
Bach's method cannot be used to analyse the rho algorithm for discrete logarithms.
Example 14.9.3. Let N = 144493. The values (x_i, x_{2i}) for i = 1, 2, ..., 7 are
(2, 5), (5, 677), (26, 9120), (677, 81496), (24851, 144003), (9120, 117992), (90926, 94594)
and one can check that gcd(x_{14} − x_7, N) = 131.
The reason for this can be seen by considering the values x_i modulo p = 131. The
sequence of values starts
2, 5, 26, 22, 92, 81, 12, 14, 66, 34, 109, 92
and we see that x_{12} = x_5 = 92. The tail has length l_t = 5 and the head has length l_h = 7.
Clearly, x_{14} ≡ x_7 (mod p).
Exercise 14.9.4. Factor the number 576229 using the rho algorithm.
Exercise 14.9.5. The rho algorithm usually uses the function f(x) = x^2 + 1. Why do
you think this function is used? Why are the functions f(x) = x^2 and f(x) = x^2 − 2 less
suitable?
Exercise 14.9.6. Show that if N is known to have a prime factor p ≡ 1 (mod m) for
m > 2 then it is preferable to use the polynomial f(x) = x^m + 1.
Exercise 14.9.7. Floyd's and Brent's cycle finding methods are both useful for the
rho factoring algorithm. Explain why one cannot use the other cycle finding methods
listed in Section 14.2.2 (Sedgewick-Szymanski-Yao, Schnorr-Lenstra, Nivasch, distinguished points) for the rho factoring method.
14.10 Factoring Using the Kangaroo Method
One can also use the kangaroo method to obtain a factoring algorithm. This is a much
more direct application of the discrete logarithm algorithm we have already presented.
The hope is to find, in time O(√p·log(N)^2), an integer x such that g^x ≡ g^N (mod p).
If x were found then g^{N−x} ≡ 1 (mod p) and so one can
split N as gcd(g^{N−x} − 1 (mod N), N). However, it seems to be impossible to construct
an algorithm based on this idea. Explain why.
Chapter 15
Factoring and Discrete Logarithms in Subexponential Time
15.1 Smooth Integers
Recall from Definition 12.3.1 that an integer is B-smooth if all its prime divisors are at
most B. We briefly recall some results on smooth integers; see Granville [266] for a survey
2. Let 0 < a_1, a_2 < 1 and 0 < c_1, c_2. Then, where the term o(1) is as N → ∞,

   L_N(a_1, c_1) + L_N(a_2, c_2) =  O(L_N(a_1, c_1))                    if a_1 > a_2,
                                    O(L_N(a_1, max{c_1, c_2} + o(1)))   if a_1 = a_2,
                                    O(L_N(a_2, c_2))                    if a_2 > a_1.

3.
4. Let 0 < b < 1 and 0 < d. If M = L_N(a, c) then L_M(b, d) = L_N(ab, d·c^b·a^{1−b} + o(1))
as N → ∞.

6. L_N(a, c)·log(N)^m = O(L_N(a, c + o(1))) as N → ∞ for any m ∈ ℕ. Hence, one can
always replace O(L_N(a, c)·log(N)^m) by O(L_N(a, c + o(1))).
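To get a feel for these quantities it helps to evaluate L_N(a, c) numerically. The helper below (our own illustration; the name L_bits is an assumption, not notation from the text) reports bit-sizes:

```python
from math import log

def L_bits(N_bits, a, c):
    """log2 of L_N(a, c) = exp(c * log(N)^a * loglog(N)^(1 - a)) for an
    N_bits-bit modulus N (natural logarithms inside, answer in bits)."""
    logN = N_bits * log(2)
    return c * logN ** a * log(logN) ** (1 - a) / log(2)

# For a 1024-bit modulus, compare N^(1/4) = 2^256 with the subexponential costs:
qs_cost  = L_bits(1024, 1/2, 1.0)    # about 98.5 bits
nfs_cost = L_bits(1024, 1/3, 1.923)  # about 87 bits
```

These are the magnitudes quoted later in this chapter (Exercise 15.2.12 and Section 15.4).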
15.2 Factoring Using Random Squares
The goal of this section is to present a simple version of Dixons random squares factoring
algorithm. This algorithm is easy to describe and analyse, and already displays many of
the important features of the algorithms in this chapter. Note that the algorithm is not
used in practice. We give a complexity analysis and sketch how subexponential running
times naturally arise. Further details about this algorithm can be found in Section 16.3
of Shoup [552] and Section 19.5 of von zur Gathen and Gerhard [237].
Let N N be an integer to be factored. We assume in this section that N is odd,
composite and not a perfect power. As in Chapter 12 we focus on splitting N into a
product of two smaller numbers (neither of which is necessarily prime). The key idea is
that if one can find congruent squares
x^2 ≡ y^2 (mod N)
such that x ≢ ±y (mod N) then one can split N by computing gcd(x − y, N).
Exercise 15.2.1. Let N be an odd composite integer and m be the number of distinct
primes dividing N. Show that the equation x^2 ≡ 1 (mod N) has 2^m solutions modulo N.
A general way to find congruent squares is the following.¹ Select a factor base
B = {p_1, ..., p_s} consisting of the primes ≤ B for some B ∈ ℕ. Choose uniformly at
random an integer 1 ≤ x < N, compute a = x^2 (mod N) reduced to the range 1 ≤ a < N
and try to factor a as a product in B (e.g., using trial division).² If a is B-smooth then
this succeeds, in which case we have a relation

x^2 ≡ ∏_{i=1}^{s} p_i^{e_i} (mod N).    (15.2)

¹ This idea goes back to Kraitchik in the 1920s; see [486] for some history.
² To obtain non-trivial relations one should restrict to integers in the range √N < x < N − √N. But
it turns out to be simpler to analyse the algorithm for the case 1 ≤ x < N. Note that the probability
that a randomly chosen integer 1 ≤ x < N satisfies 1 ≤ x < √N is negligible.
The values x for which a relation is found are stored as x_1, x_2, ..., x_t. The corresponding
exponent vectors e_j = (e_{j,1}, ..., e_{j,s}) for 1 ≤ j ≤ t are also stored. When enough relations
have been found we can use linear algebra modulo 2 to obtain congruent squares. More
precisely, compute λ_j ∈ {0, 1} such that not all λ_j = 0 and

∑_{j=1}^{t} λ_j·e_j = (2f_1, ..., 2f_s)    (15.3)

and

X ≡ ∏_{j=1}^{t} x_j^{λ_j} (mod N),   Y ≡ ∏_{i=1}^{s} p_i^{f_i} (mod N).    (15.4)

One then has X^2 ≡ Y^2 (mod N) and one can hope to split N by computing gcd(X − Y, N)
(note that this gcd could be 1 or N, in which case the algorithm has failed). We present
the above method as Algorithm 22.
Algorithm 22 Random squares factoring algorithm
Input: N ∈ ℕ
Output: Factor of N
1: Select a suitable B ∈ ℕ and construct the factor base B = {p_1, ..., p_s} consisting of
   all primes ≤ B
2: repeat
3:    Choose an integer 1 ≤ x < N uniformly at random and compute a = x^2 (mod N)
      reduced to the range 1 ≤ a < N
4:    Try to factor a as a product in B (e.g., using trial division)
5:    if a is B-smooth then
6:       store the value x and the exponent row vector e = (e_1, ..., e_s) as in equation (15.2) in a matrix
7:    end if
8: until there are s + 1 rows in the matrix
9: Perform linear algebra over F_2 to find a non-trivial linear dependence among the
   vectors e_j modulo 2
10: Define X and Y as in equation (15.4)
11: return gcd(X − Y, N)
We emphasise that the random squares algorithm has two distinct stages. The first
stage is to generate enough relations. The second stage is to perform linear algebra.
The first stage can easily be distributed or parallelised, while the second stage is hard to
parallelise.
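A complete, if naive, implementation of Algorithm 22 fits in under fifty lines of Python. This is a sketch for toy inputs only: trial division for smoothness, dense Gaussian elimination over F_2 via integer bitmasks, and a retry loop for the failure case; the helper names are ours:

```python
from math import gcd
import random

def trial_factor(a, primes):
    """Exponent vector of a over the factor base, or None if not B-smooth."""
    e = []
    for p in primes:
        k = 0
        while a % p == 0:
            a, k = a // p, k + 1
        e.append(k)
    return e if a == 1 else None

def gf2_dependency(masks):
    """Return a set of indices of bitmask rows whose XOR is zero."""
    basis = {}                            # pivot bit -> (reduced row, index set)
    for j, v in enumerate(masks):
        combo = {j}
        while v:
            piv = v.bit_length() - 1
            if piv not in basis:
                basis[piv] = (v, combo)
                break
            bv, bc = basis[piv]
            v, combo = v ^ bv, combo ^ bc
        else:
            return combo                  # these rows sum to zero over F_2

def random_squares(N, primes, seed=1):
    """Sketch of Algorithm 22 (Dixon-style random squares)."""
    rng = random.Random(seed)
    while True:
        xs, rows = [], []
        while len(rows) <= len(primes):   # collect s + 1 relations
            x = rng.randrange(2, N)
            a = x * x % N
            e = trial_factor(a, primes) if a else None
            if e is not None:
                xs.append(x)
                rows.append(e)
        masks = [sum((c & 1) << i for i, c in enumerate(e)) for e in rows]
        S = gf2_dependency(masks)
        X = 1
        for j in S:
            X = X * xs[j] % N
        Y = 1
        for i, p in enumerate(primes):
            Y = Y * pow(p, sum(rows[j][i] for j in S) // 2, N) % N
        d = gcd(X - Y, N)
        if 1 < d < N:
            return d                      # otherwise draw fresh relations
```

For instance, random_squares(551, [2, 3, 5]) returns 19 or 29, as in Example 15.2.2 below.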
Example 15.2.2. Let N = 19·29 = 551 and let B = {2, 3, 5}. One finds the following
congruences (in general 4 relations would be required, but we are lucky in this case)

34^2 ≡ 2·3^3 (mod N)
52^2 ≡ 2^2·5^3 (mod N)
55^2 ≡ 2·3^3·5 (mod N).

The corresponding matrix of exponent vectors is

( 1  3  0 )
( 2  0  3 )
( 1  3  1 )

and the sum of the three rows is (4, 6, 4). Let

X = 264 ≡ 34·52·55 (mod 551) and Y = 496 ≡ 2^2·3^3·5^2 (mod 551).

It follows that X^2 ≡ Y^2 (mod N), and computing gcd(X − Y, N) = gcd(−232, 551) = 29 splits N.
Exercise 15.2.3. Factor N = 3869 using the above method and factor base {2, 3, 5, 7}.
15.2.1 Complexity of the Random Squares Algorithm
There are a number of issues to deal with when analysing this algorithm. The main
problem is to decide how many primes to include in the factor base. The prime number
theorem implies that s = #B ≈ B/log(B). If we make B larger then the chances of
finding a B-smooth number increase, but on the other hand, we need more relations and
the linear algebra takes longer. We will determine an optimal value for B later. First we
must write down an estimate for the running time of the algorithm, as a function of s.
Already this leads to various issues:
- What is the probability that a random value x^2 (mod N) factors over the factor
  base B?
- How many relations do we require until we can be sure there is a non-trivial vector
  λ?
- What are the chances that computing gcd(X − Y, N) splits N?
We deal with the latter two points first. It is immediate that s + 1 relations are sufficient
for line 9 of Algorithm 22 to succeed. The question is whether 1 < gcd(X − Y, N) < N
for the corresponding integers X and Y. There are several ways the algorithm can fail to
split N. For example, it is possible that a relation in equation (15.2) is such that all e_i
are even and x ≡ ±∏_i p_i^{e_i/2} (mod N). One way that such relations could arise is from
1 ≤ x < √N or N − √N < x < N; this situation occurs with negligible probability.

Lemma 15.2.4. Let the notation be as above. Then the probability that gcd(X − Y, N)
splits N is at least 1/2.
Proof: Let X and Y be the integers computed in line 10 of Algorithm 22. We treat Y
as fixed, and consider the probability distribution for X. By Exercise 15.2.1, the number
of solutions Z to Z^2 ≡ Y^2 (mod N) is 2^m where m ≥ 2 is the number of distinct primes
dividing N. The two solutions Z = ±Y are useless but the other 2^m − 2 solutions will all
split N.
Since the values for x are chosen uniformly at random it follows that X is a randomly
chosen solution to the equation X^2 ≡ Y^2 (mod N). It follows that the probability to split
N is (2^m − 2)/2^m ≥ 1/2. □
Exercise 15.2.5. Show that if one takes s + l relations where l ≥ 2 then the probability
of splitting N is at least 1 − 1/2^l.
We now consider the probability of smoothness. We first assume the probability that
x2 (mod N ) is smooth is the same as the probability that a random integer modulo N is
smooth.3
Lemma 15.2.6. Let the notation be as above. Let T_B be the expected number of trials
until a randomly chosen integer modulo N is B-smooth. Assuming that squares modulo N
are as likely to be smooth as random integers of the same size, Algorithm 22 has expected
running time at most
c_1·(#B)^2·T_B·M(log(N)) + c_2·(#B)^3
bit operations for some constants c_1, c_2 (where M(n) is the cost of multiplying two n-bit
integers).
Proof: Suppose we compute the factorisation of x^2 (mod N) over B by trial division.
This requires O(#B·M(log(N))) bit operations for each value of x. We need #B + 1
relations to have a soluble linear algebra problem. As said above, the expected number
of trials of x to get a B-smooth value of x^2 (mod N) is T_B. Hence the cost of finding the
relations is O((#B + 1)·T_B·#B·M(log(N))), which gives the first term.
The linear algebra problem can be solved using Gaussian elimination (we are ignoring
that the matrix is sparse) over F_2, which takes O((#B)^3) bit operations. This gives the
second term. □
It remains to choose B as a function of N to minimise the running time. By the discussion in Section 15.1, it is natural to approximate T_B by u^u where u = log(N)/log(B).
We now explain how subexponential functions naturally arise in such algorithms. Since
increasing B makes the linear algebra slower, but makes relations more likely (i.e., lowers
T_B), a natural approach to selecting B is to try to equate both terms of the running time
in Lemma 15.2.6. This leads to u^u = #B. Putting u = log(N)/log(B), #B ≈ B/log(B),
taking logs, and ignoring log(log(B)) terms, gives
log(N)·log(log(N))/log(B) ≈ log(B).
This implies log(B)^2 ≈ log(N)·log(log(N)) and so B ≈ L_N(1/2, 1). The overall complexity for this choice of B would be L_N(1/2, 3 + o(1)) bit operations.
A more careful argument is to set B = L_N(1/2, c) and use Corollary 15.1.3. It follows that T_B = L_N(1/2, 1/(2c) + o(1)) as N → ∞. Putting this into the equation of
Lemma 15.2.6 gives complexity L_N(1/2, 2c + 1/(2c) + o(1)) + L_N(1/2, 3c) bit operations.
The function x + 1/x is minimised at x = 1, hence we should take c = 1/2.
Theorem 15.2.7. Let the notation be as above. Under the same assumptions as Lemma 15.2.6,
Algorithm 22 has complexity
L_N(1/2, 2 + o(1))
bit operations as N → ∞.
³ Section 16.3 of Shoup [552] gives a modification of the random squares algorithm for which one can
avoid this assumption. The trick is to note that at least one of the cosets of (Z/NZ)^* / ((Z/NZ)^*)^2 has
at least as great a proportion of smooth numbers as random integers up to N (Shoup credits Rackoff
for this trick). The idea is to work in one of these cosets by choosing at random some 1 < δ < N and
considering relations coming from smooth values of δ·x^2 (mod N).
15.2.2 The Quadratic Sieve
To improve the result of the previous section it is necessary to reduce the cost of the
linear algebra and to reduce the cost of decomposing smooth elements as products of
primes. We sketch the quadratic sieve algorithm of Pomerance. We do not have space
to present all the details of this algorithm (interested readers should see Section 6.1 of
[161] or Section 16.4.2 of [552]).
A crucial idea, which seems to have first appeared in the work of Schroeppel⁴, is
sieving. The point is to consider a range of values of x and simultaneously determine
the decompositions of x2 (mod N ) over the factor base. It is possible to do this so that
the cost of each individual decomposition is only O(log(B)) bit operations.
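The sketch below shows sieving on a toy instance. For each factor-base prime p only the residues x with x^2 ≡ N (mod p) are touched, and a log-accumulator plays the role of the cheap per-candidate bookkeeping; prime powers are ignored and flagged candidates are confirmed by trial division, simplifications a real sieve refines (the function names are ours):

```python
from math import isqrt, log

def smooth_part(v, primes):
    """Divide out all factor-base primes; v is smooth iff the result is 1."""
    for p in primes:
        while v % p == 0:
            v //= p
    return v

def qs_candidates(N, primes, radius):
    """Sieve x = m + i for m = isqrt(N) + 1 and |i| <= radius, returning
    the offsets i with x^2 - N smooth over the factor base.

    Only positions with x^2 = N (mod p) are touched, so each prime costs
    about (interval length)/p additions; flagged positions are then
    confirmed exactly by trial division."""
    m = isqrt(N) + 1
    lo = m - radius
    vals = [(lo + k) ** 2 - N for k in range(2 * radius + 1)]
    score = [0.0] * len(vals)
    for p in primes:
        for r in range(p):                    # roots of x^2 = N (mod p)
            if (r * r - N) % p == 0:
                for k in range((r - lo) % p, len(vals), p):
                    score[k] += log(p)        # one cheap log-add per hit
    return [lo + k - m for k, v in enumerate(vals)
            if v != 0 and score[k] > 0 and smooth_part(abs(v), primes) == 1]
```

For example, qs_candidates(1649, [2, 3, 5], 10) returns [-9, -2, 0, 2]; in particular 41^2 − 1649 = 2^5 and 43^2 − 1649 = 2^3·5^2, and the product 32·200 = 80^2 already gives congruent squares that split 1649 = 17·97.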
Another crucial observation is that the relation matrix is sparse; in other words, rows
of the matrix have rather few non-zero entries. In such a case, the cost of linear algebra can
be reduced from O((#B)^3) bit operations to O((#B)^{2+o(1)}) bit operations (as #B → ∞).
The best methods are due to Lanczos or Wiedemann; see Section 6.1.3 of Crandall and
Pomerance [161] or Section 3.4 of Joux [314] for references and discussion.
A further trick is to choose x = ⌈√N⌉ + i where i = 0, 1, −1, 2, −2, .... The idea is
that if x = ⌈√N⌉ + ε then either x^2 − N or N − x^2 is a positive integer of size ≈ 2√N·|ε|.
Since these integers are much smaller than N they have a much better chance of being
smooth than the integers x^2 (mod N) in the random squares algorithm. To allow for the
case of ε < 0 we need to add −1 to our factor base and use the fact that a factorisation
N − x^2 = ∏_{i=1}^{s} p_i^{e_i} corresponds to a relation x^2 ≡ (−1)·∏_{i=1}^{s} p_i^{e_i} (mod N).
Since we are now only considering values x of the form ⌈√N⌉ + ε where |ε| is small it is
necessary to assume the probability that x^2 − N or N − x^2 (as appropriate) is B-smooth
is the same as the probability that a randomly chosen integer of that size is B-smooth.
This is a rather strong assumption (though it is supported by numerical evidence) and so
the running time estimates of the quadratic sieve are only heuristic.
The heuristic complexity of the quadratic sieve is determined in Exercise 15.2.8. Note
that, since we will need to test L_N(1/2, 1 + o(1)) values (here o(1) is as N → ∞) for
smoothness, we have |i| = L_N(1/2, 1 + o(1)). It follows that the integers being tested for
smoothness have size √N·L_N(1/2, 1 + o(1)) = N^{1/2+o(1)}.
Exercise 15.2.8. Let T_B be the expected number of trials until an integer of size
2√N·L_N(1/2, 1) is B-smooth. Show that the running time of the quadratic sieve is at
most
c_1·#B·T_B·log(B)·M(log(N)) + c_2·(#B)^{2+o(1)}
bit operations for some constants c_1, c_2 as N → ∞.
Let B = L_N(1/2, 1/2). Show that the natural heuristic assumption (based on Corollary 15.1.8) is that T_B = L_N(1/2, 1/2 + o(1)). Hence, show that the heuristic complexity
of the quadratic sieve is L_N(1/2, 1 + o(1)) bit operations as N → ∞.
⁴ See [368, 486] for some remarks on the history of integer factoring algorithms.
x^2 (mod N)        e
−2^6·3             (1, 6, 1, 0)
Not 5-smooth
−2^4               (1, 4, 0, 0)
3·5^2              (0, 0, 1, 2)
Exercise 15.2.10. Show that in the quadratic sieve one can also use values x = ⌈√(kN)⌉ + i
where k ∈ ℕ is very small and i = 0, 1, −1, 2, −2, ....
Exercise 15.2.11. Show that using sieving and fast linear algebra, but not restricting to
values x^2 (mod N) of size N^{1/2+o(1)}, gives an algorithm with heuristic expected running
time of L_N(1/2, √2 + o(1)) bit operations as N → ∞.
Exercise 15.2.12. A subexponential algorithm is asymptotically much faster than an
O(N^{1/4}) algorithm. Verify that if N = 2^1024 then N^{1/4} = 2^256 while L_N(1/2, 2) ≈ 2^197
and L_N(1/2, 1) ≈ 2^98.5.
The best proven asymptotic complexity for factoring integers N is LN (1/2, 1 + o(1))
bit operations. This result is due to Pomerance and Lenstra [378].
15.2.3 Summary
We briefly highlight the key ideas in the algorithms of this section. The crucial concept
of smooth elements of the group (Z/N Z) arises from considering an integer modulo N
as an element of Z. The three essential properties of smooth numbers that were used in
the algorithm are:
1. One can efficiently decompose an element of the group as a product of smooth
elements, or determine that the element is not smooth.
2. The probability that a random element is smooth is sufficiently high.
3. There is a way to apply linear algebra to the relations obtained from smooth elements to solve the computational problem.
We will see analogues of these properties in the algorithms below.
There are other general techniques that can be applied in most algorithms of this
type. For example, the linear algebra problems are usually sparse and so the matrices
and algorithms should be customised for this. Another general concept is large prime
variation which, in a nutshell, is to also store nearly smooth relations (i.e., elements
that are the product of a smooth element with one or two prime elements that are not
too large) and perform some elimination of these large primes before doing the main
linear algebra stage (this is similar to, but more efficient than, taking a larger factor base).
Finally we remark that the first stage of these algorithms (i.e., collecting relations) can
always be distributed or parallelised.
15.3 The Elliptic Curve Method
The elliptic curve method (ECM) works well in practice but, as with the Pollard
p − 1 method, its complexity depends on the size of the smallest prime dividing N.
It is not a polynomial-time algorithm because, for any constant c > 0 and over all N
and p | N, a randomly chosen elliptic curve over F_p is not likely to have O(log(N)^c)-smooth order. As we have seen, the theorem of Canfield, Erdős and Pomerance [117] says
it is more reasonable to hope that integers have a subexponential probability of being
subexponentially smooth. Hence, one might hope that the elliptic curve method has
subexponential complexity. Indeed, Lenstra [374] makes the following conjecture (which
is essentially that the Canfield-Erdős-Pomerance result holds in small intervals).
Conjecture 15.3.1. (Lenstra [374], page 670) The probability that an integer, chosen uniformly at random in the range (X − √X, X + √X), is L_X(1/2, c)-smooth is
L_X(1/2, −1/(2c) + o(1)) as X tends to infinity.⁵
One can phrase Conjecture 15.3.1 as saying that, if p_s is the probability that a random
integer between 1 and X is Y-smooth, then Ψ(X + 2√X, Y) − Ψ(X, Y) ≈ 2√X·p_s. More
generally, one would like to know that, for sufficiently large⁶ X, Y and Z,
Ψ(X + Z, Y) − Ψ(X, Y) ≈ Z·Ψ(X, Y)/X    (15.5)
or, in other words, that integers in a short interval at X are about as likely to be Y-smooth
as integers in a large interval at X.
We now briefly summarise some results in this area; see Granville [266] for details and
references. Harman (improved by Lenstra, Pila and Pomerance [377]) showed, for any
fixed β > 1/2 and X ≥ Y ≥ exp(log(X)^{2/3+o(1)}), where the o(1) is as X → ∞, that
Ψ(X + X^β, Y) − Ψ(X, Y) > 0.
Obtaining results for the required value β = 1/2 seems to be hard and the experts refer
to the √X barrier for smooth integers in short intervals. It is known that this barrier
can be broken most of the time: Hildebrand and Tenenbaum showed that, for any ε > 0,
equation (15.5) holds when X ≥ Y ≥ exp(log(X)^{5/6+ε}) and Y·exp(log(X)^{1/6}) ≤ Z ≤ X
for all but at most M/exp(log(M)^{1/6}) integers 1 ≤ X ≤ M. As a special case, this
result shows that, for almost all primes p, the interval [p − √p, p + √p] contains a Y-smooth
integer where Y = exp(log(X)^{5/6+ε}) (i.e., subexponential smoothness).
Using Conjecture 15.3.1 one obtains the following complexity for the elliptic curve
method (we stress that the complexity is in terms of the smallest prime factor p of N,
rather than N itself).
Theorem 15.3.2. (Conjecture 2.10 of [374]) Assume Conjecture 15.3.1. One can find
a non-trivial factor of N in an expected L_p(1/2, √2 + o(1))·M(log(N)) bit operations,
where p is the smallest prime factor of N.
Proof: Guess the size of p and choose B = L_p(1/2, 1/√2) (since the size of p is not known
one actually runs the algorithm repeatedly for slowly increasing values of B). Then each
run of Algorithm 12 requires O(B·log(B)·M(log(N))) = L_p(1/2, 1/√2 + o(1))·M(log(N))
bit operations. By Conjecture 15.3.1 one needs to repeat the process L_p(1/2, 1/√2 + o(1))
times. The result follows. □
Exercise 15.3.3. Let N = pq where p is prime and p < √N < 2p. Show that
L_p(1/2, √2 + o(1)) = L_N(1/2, 1 + o(1)). Hence, in the worst case, the complexity of
ECM is the same as the complexity of the quadratic sieve.
For further details on the elliptic curve method we refer to Section 7.4 of [161]. We
remark that Lenstra, Pila and Pomerance [377] have considered a variant of the elliptic
curve method using divisor class groups of hyperelliptic curves of genus 2. The Hasse-Weil
interval for such curves contains an interval of the form (X, X + X^{3/4}) and Theorem 1.3
of [377] proves that such intervals contain L_X(2/3, c_1)-smooth integers (for some constant
c_1) with probability 1/L_X(1/3, 1). It follows that there is a rigorous factoring algorithm
with complexity L_p(2/3, c_2) bit operations for some constant c_2. This algorithm is not
used in practice, as the elliptic curve method works fine already.
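For concreteness, here is a minimal stage-1 ECM in Python. It is a sketch only: affine short-Weierstrass arithmetic with a random curve and point per trial (the coefficient b is implicit), a single smoothness bound B, and none of the practical refinements (Montgomery curves, stage 2, batched gcds); all names and parameters are ours:

```python
from math import gcd
import random

class FactorFound(Exception):
    """Raised when a modular inversion fails, revealing a factor of N."""

def inv_mod(u, N):
    d = gcd(u % N, N)
    if d != 1:
        raise FactorFound(d)
    return pow(u, -1, N)

def ec_add(P, Q, a, N):
    """Affine addition on y^2 = x^3 + a*x + b over Z/NZ; None = infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, N) % N
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, N) % N
    x3 = (lam * lam - x1 - x2) % N
    return (x3, (lam * (x1 - x3) - y1) % N)

def ec_mul(k, P, a, N):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, N)
        P = ec_add(P, P, a, N)
        k >>= 1
    return R

def ecm(N, B=100, curves=200, seed=3):
    """Stage-1 ECM sketch: on each random curve multiply a random point by
    every prime power <= B; a failed inversion yields a proper gcd."""
    rng = random.Random(seed)
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    for _ in range(curves):
        a = rng.randrange(N)
        P = (rng.randrange(N), rng.randrange(N))   # b is implicit
        try:
            for p in primes:
                pe = p
                while pe * p <= B:
                    pe *= p
                P = ec_mul(pe, P, a, N)
                if P is None:
                    break
        except FactorFound as e:
            if 1 < e.args[0] < N:
                return e.args[0]
    return None
```

The failed inversion that reveals the factor is exactly the event that the point has smooth order modulo one prime factor of N but not the other; e.g. ecm(10403) finds 101 or 103.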
Exercise 15.3.4. Suppose a sequence of values 1 < x < N is chosen uniformly at
random. Show that one can find such a value that is L_N(2/3, c)-smooth, together with
its factorisation, in expected L_N(1/3, c′ + o(1)) bit operations for some constants c and c′.
Remark 15.3.5. It is tempting to conjecture that the Hasse interval contains a polynomially-smooth integer (indeed, this has been done by Maurer and Wolf [404]; see equation (21.9)).
This is not relevant for the elliptic curve factoring method, since such integers would
be very rare. Suppose the probability that an integer of size X is Y-smooth is exactly 1/u^u, where u = log(X)/log(Y) (by Theorem 15.1.2, this is reasonable as long as
Y ≥ log(X)^{1+ε}). It is natural to suppose that the interval [X − 2√X, X + 2√X] is likely
to contain a Y-smooth integer if 4√X > u^u. Let Y = log(X)^c. Taking logs of both sides
of the inequality gives the condition
log(4) + (1/2)·log(X) > (log(X)/(c·log(log(X))))·(log(log(X)) − log(c·log(log(X)))).
It is therefore natural to conclude that when c ≥ 2 there is a good chance that the Hasse
interval of an elliptic curve over F_p contains a log(p)^c-smooth integer. Proving such a
claim seems to be far beyond the reach of current techniques.
15.4 The Number Field Sieve
The most important integer factorisation algorithm for large integers is the number field
sieve (NFS). A special case of this method was invented by Pollard.⁷ The algorithm
requires algebraic number theory and a complete discussion of it is beyond the scope of
this book. Instead, we just sketch some of the basic ideas. For full details we refer to
Lenstra and Lenstra [369], Section 6.2 of Crandall and Pomerance [161], Section 10.5 of
Cohen [135] or Stevenhagen [580].
As we have seen from the quadratic sieve, reducing the size of the values being tested
for smoothness yields a better algorithm. Indeed, in the quadratic sieve the numbers were
reduced from size O(N) to O(N^{1/2+o(1)}) and, as shown by Exercise 15.2.11, this trick alone
lowers the complexity from O(L_N(1/2, √2 + o(1))) to O(L_N(1/2, 1 + o(1))). To break
the O(L_N(1/2, c)) barrier one must make the numbers being tested for smoothness
dramatically smaller. A key observation is that if the numbers are of size O(L_N(2/3, c′))
then they are O(L_N(1/3, c′′))-smooth, for some constants c′ and c′′, with probability
approximately 1/u^u = 1/L_N(1/3, c′/(3c′′) + o(1)). Hence, one can expect an algorithm
⁷ The goal of Pollard's method was to factor integers of the form n^3 + k where k is small. The
algorithm in the case of numbers of a special form is known as the special number field sieve.
with running time O(L_N(1/3, c + o(1))) bit operations, for some constant c, by considering
smaller values for smoothness.
It seems to be impossible to directly choose values x such that x^2 (mod N) is of size
L_N(2/3, c + o(1)) for some constant c. Hence, the number field sieve relies on two factor
bases B_1 and B_2. Using smooth elements over B_1 (respectively, B_2) and linear algebra one
finds an integer square u^2 and an algebraic integer square v^2. The construction allows us
to associate an integer w modulo N to v such that u^2 ≡ w^2 (mod N) and hence one can
try to split N.
We briefly outline the ideas behind the algorithm. First, choose a monic irreducible
polynomial P(x) ∈ Z[x] of degree d (where d grows like (3·log(N)/log(log(N)))^{1/3})
with a root m = ⌊N^{1/d}⌋ modulo N (i.e., P(m) ≡ 0 (mod N)). Factor base B_1 is the primes
up to B = L_N(1/3, c) and factor base B_2 is the small prime ideals in the ring Z[θ] in the
number field K = Q(θ) = Q[x]/(P(x)) (i.e., θ is a generic root of P(x)). The algorithm
exploits, in the final step, the ring homomorphism φ : Z[x]/(P(x)) → Z/NZ given by
φ(θ) = m (mod N). Suppose the ideal (a − bθ) is a product of prime ideals in B_2 (one
factors the ideal (a − bθ) by factoring its norm in Z), say

(a − bθ) = ∏_{i=1}^{r} l_i^{e_i},

and suppose the integer a − bm is a product of primes in B_1, say

a − bm = ∏_{j=1}^{s} p_j^{f_j}.

If these equations hold then we call (a − bθ) and a − bm smooth and store a, b and the
sequences of e_i and f_j. We do not call this a relation as there is no direct relationship
between the prime ideals l_i and the primes p_j. Indeed, the l_i are typically non-principal
ideals and do not necessarily contain an element of small norm. Hence, the two products
are modelled as being independent.
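The polynomial-selection step at the start of this outline is easy to demonstrate: the base-m expansion of N gives a degree-d polynomial with P(m) = N. This sketch (helper names ours) ignores the monicity requirement and the coefficient-size optimisations used in real NFS implementations:

```python
def iroot(N, d):
    """Integer d-th root by binary search (floats are unsafe for big N)."""
    lo, hi = 1, 1 << (N.bit_length() // d + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** d <= N:
            lo = mid
        else:
            hi = mid - 1
    return lo

def base_m_polynomial(N, d):
    """Base-m polynomial selection: m = floor(N^(1/d)) and the base-m
    digits of N are the coefficients of a degree-d polynomial P with
    P(m) = N, hence P(m) = 0 (mod N)."""
    m = iroot(N, d)
    coeffs, t = [], N
    while t:
        coeffs.append(t % m)
        t //= m
    return m, coeffs          # P(x) = sum(coeffs[i] * x^i)
```

With such a P in hand, the map φ sends a − bθ to a − bm (mod N), linking the algebraic and rational sides.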
It is important to estimate the probability that both the ideal (a − bθ) and the integer
a − bm are smooth. One shows that taking integers |a|, |b| ≤ L_N(1/3, c_0 + o(1)) for a suitable
constant c_0 gives (a − bθ) of norm L_N(2/3, c_1 + o(1)) and a − bm of size L_N(2/3, c_2 + o(1))
for certain constants c_1 and c_2. To obtain a fast algorithm one uses sieving to determine
within a range of values for a and b the pairs (a, b) such that both a − bm and (a − bθ)
factor over the appropriate factor base.
Performing linear algebra on both sides gives a set S of pairs (a, b) such that (ignoring issues with units and non-principal ideals)

∏_{(a,b)∈S} (a − bm) = u^2   and   ∏_{(a,b)∈S} (a − bθ) = v^2
for some u ∈ Z and v ∈ Z[θ]. Finally we can link the two factor bases: applying the ring homomorphism φ : Z[θ] → Z/NZ gives u^2 ≡ φ(v)^2 (mod N) and hence we have a chance to split N. A non-trivial task is computing the actual numbers u and φ(v) modulo N so that one can compute gcd(u − φ(v), N).
Since one is only considering integers a − bm in a certain range (and ideals in a certain range) for smoothness, one relies on heuristic assumptions about the smoothness probability. The conjectural complexity of the number field sieve is O(L_N(1/3, c + o(1))) bit operations as N → ∞, where c = (64/9)^{1/3} ≈ 1.923. Note, comparing with Exercise 15.2.12, that if N ≈ 2^{1024} then L_N(1/3, 1.923) ≈ 2^{87}.
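As a quick sanity check on these numbers, one can evaluate L_N(1/3, c) numerically (an illustrative sketch, not part of the algorithm itself; the helper function L below is our own):

```python
import math

def L(N_bits: int, alpha: float, c: float) -> float:
    """Return log2 of L_N(alpha, c) = exp(c * ln(N)^alpha * ln(ln(N))^(1-alpha))."""
    lnN = N_bits * math.log(2)          # ln(N) for N = 2^N_bits
    lnlnN = math.log(lnN)
    return c * lnN**alpha * lnlnN**(1 - alpha) / math.log(2)

c_nfs = (64 / 9) ** (1 / 3)             # the NFS constant, approximately 1.923
print(round(L(1024, 1/3, c_nfs)))       # → 87, matching the estimate above
```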
15.5
We now explain how ideas similar to the above have been used to obtain subexponential algorithms for the discrete logarithm problem in finite fields. The original idea is due to Kraitchik [351]. While all subexponential algorithms for the DLP share certain basic concepts, the specific details vary quite widely (in particular, precisely what linear algebra is required). In this section we present an algorithm that is very convenient when working in subgroups of prime order r in F_q^*, as it relies only on linear algebra over the field F_r.
Let g ∈ F_q^* have prime order r and let h ∈ ⟨g⟩. The starting point is the observation that if one can find integers 0 < Z1, Z2 < r such that

g^{Z1} h^{Z2} = 1        (15.6)

in F_q^* then log_g(h) ≡ −Z1 Z2^{−1} (mod r). The idea is to find such a relation using a factor base and linear algebra. Such algorithms go under the general name of index calculus algorithms; the reason for this is that "index" is another word for discrete logarithm, and the construction of a solution to equation (15.6) is done by calculations using indices.
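The identity behind equation (15.6) is easy to check numerically; the following sketch uses the toy parameters p = 23, r = 11, g = 2 (chosen here purely for illustration, not taken from the text):

```python
p, r = 23, 11          # 2 has prime order 11 modulo 23
g = 2
h = pow(g, 7, p)       # h = g^7, so log_g(h) = 7

# A relation of the form g^Z1 * h^Z2 = 1 in F_p^*:
Z1, Z2 = 1, 3
assert (pow(g, Z1, p) * pow(h, Z2, p)) % p == 1

# Recover the discrete logarithm as -Z1 * Z2^(-1) mod r:
log_h = (-Z1 * pow(Z2, -1, r)) % r
print(log_h)  # → 7
```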
15.5.1
Choose a factor base B = {p_1, . . . , p_s} consisting of the primes up to some bound. One generates relations by choosing random exponents z_i and random elements γ_i in the subgroup G of order (p − 1)/r until g^{z_i} γ_i (mod p) is B-smooth, giving

g^{z_i} γ_i ≡ ∏_{j=1}^{s} p_j^{e_{i,j}} (mod p).        (15.7)
335
The values z_i are stored in a vector and the values e_i = (e_{i,1}, . . . , e_{i,s}) are stored as a row in a matrix. We need s relations of this form. We also need at least one relation involving h (alternatively, we could have used a power of h in every relation in equation (15.7)), so try random values z_{s+1} and γ_{s+1} ∈ G until g^{z_{s+1}} h γ_{s+1} (mod p) is B-smooth. One
performs linear algebra modulo r to find integers 0 ≤ λ_1, . . . , λ_{s+1} < r such that

∑_{i=1}^{s+1} λ_i e_{i,j} ≡ 0 (mod r)   for all 1 ≤ j ≤ s.

Setting Z1 = ∑_{i=1}^{s+1} λ_i z_i (mod r) and Z2 = λ_{s+1} gives

g^{Z1} h^{Z2} ∏_{i=1}^{s+1} γ_i^{λ_i} = ∏_{j=1}^{s} p_j^{∑_i λ_i e_{i,j}}.        (15.8)

Since g^{Z1} h^{Z2} ∈ ⟨g⟩ and the other terms are all in G, it follows from Exercise 15.5.1 that g^{Z1} h^{Z2} ≡ 1 (mod p) as required. We stress that it is not necessary to compute ∏_i γ_i^{λ_i} or the right hand side of equation (15.8).
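A naive relation search by trial division can be sketched as follows (illustrative code; the parameters p = 223, r = 37, g = 15, g_1 = 184 and B = {2, 3, 5, 7} are those of Example 15.5.4):

```python
import random

random.seed(1)
p, r, g = 223, 37, 15
B = [2, 3, 5, 7]                      # factor base of small primes
g1 = 184                              # generator of the subgroup G of order (p-1)/r = 6

def smooth_factor(m, primes):
    """Return the exponent vector of m over `primes`, or None if m is not smooth."""
    exps = []
    for q in primes:
        e = 0
        while m % q == 0:
            m //= q
            e += 1
        exps.append(e)
    return exps if m == 1 else None

relations = []
while len(relations) < len(B):
    z = random.randrange(1, r)
    k = random.randrange(6)           # gamma = g1^k is a random element of G
    val = (pow(g, z, p) * pow(g1, k, p)) % p
    e = smooth_factor(val, B)
    if e is not None:                 # store (z, k, exponent vector)
        relations.append((z, k, e))
print(len(relations))  # → 4
```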
The algorithm succeeds as long as Z2 = λ_{s+1} ≢ 0 (mod r) (and if λ_{s+1} = 0 then there is a linear dependence among the earlier relations, which can be removed by deleting one or more rows of the relation matrix).
Exercise 15.5.3. Show that if one replaces equation (15.7) by g^{z_{1,i}} h^{z_{2,i}} γ_i for random z_{1,i}, z_{2,i} and γ_i then one obtains an algorithm that succeeds with probability 1 − 1/r.
Example 15.5.4. Let p = 223. Then g = 15 has prime order r = 37. Suppose h = 68 is the instance of the DLP we want to solve. Let B = {2, 3, 5, 7}. Choose the element g_1 = 184 of order (p − 1)/r = 6. One can check that we have the following relations.
In each case g^{z_i} γ_i (mod p) (with h also included in the final relation) is B-smooth:

g^1 g_1 ≡ 84 = 2^2 · 3 · 7 (mod 223)
g^33 ≡ 56 = 2^3 · 7 (mod 223)
g^8 g_1 ≡ 45 = 3^2 · 5 (mod 223)
g^7 ≡ 120 = 2^3 · 3 · 5 (mod 223)
g^7 h g_1^2 ≡ 72 = 2^3 · 3^2 (mod 223)

so z = (1, 33, 8, 7, 7) and the exponent matrix (columns corresponding to the primes 2, 3, 5, 7) is

( 2  1  0  1 )
( 3  0  0  1 )
( 0  2  1  0 )
( 3  1  1  0 )
( 3  2  0  0 )
Now perform linear algebra modulo 37. One finds the non-trivial kernel vector v = (1, 36, 20, 17, 8). Computing Z1 = v · z ≡ 7 (mod 37) and Z2 = 8 we find g^{Z1} h^{Z2} ≡ 1 (mod 223) and so the solution is log_g(h) ≡ −Z1 Z2^{−1} ≡ 13 (mod 37).
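One can verify this example directly (a quick numerical check of the values above):

```python
p, r, g, h = 223, 37, 15, 68

Z1, Z2 = 7, 8
assert (pow(g, Z1, p) * pow(h, Z2, p)) % p == 1   # the relation g^Z1 h^Z2 = 1

a = (-Z1 * pow(Z2, -1, r)) % r                    # log_g(h) = -Z1 * Z2^(-1) mod r
print(a)          # → 13
assert pow(g, a, p) == h                          # indeed 15^13 = 68 (mod 223)
```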
Exercise 15.5.5. Write the above algorithm in pseudocode (using trial division to determine the smooth relations).
Exercise 15.5.6. Let the notation be as above. Let T_B be the expected number of trials of random integers modulo p until one is B-smooth. Show that the expected running time of this algorithm (using naive trial division for the relations and using the Lanczos or Wiedemann methods for the linear algebra) is

O((#B)^2 T_B M(log(p)) + (#B)^{2+o(1)} M(log(r)))

bit operations as p → ∞.
Exercise 15.5.7. Show that taking B = L_p(1/2, 1/2) is the optimal value to minimise the complexity of the above algorithm, giving a complexity of O(L_p(1/2, 2 + o(1))) bit operations for the discrete logarithm problem in F_p^* as p → ∞. (Note that, unlike many of the results in this chapter, this result does not rely on any heuristics.)
We remark that, in practice, rather than computing a full exponentiation g^z one might use a pseudorandom walk as done in the Pollard rho method. For further implementation tricks see Sections 5.1 to 5.5 of Odlyzko [466].
If g does not have prime order (e.g., suppose g is a generator of F_p^* and has order p − 1) then there are several options: One can apply Pohlig-Hellman to reduce to subgroups of prime order and apply index calculus in each subgroup (or at least the ones of large order). Alternatively, one can apply the algorithm as above and perform the linear algebra modulo the order of g. There will usually be difficulties with non-invertible elements in the linear algebra, and there are several solutions, such as computing the Hermite normal form of the relation matrix or using the Chinese remainder theorem; we refer to Section 5.5.2 of Cohen [135] and Section 15.2.1 of Joux [314] for details.
Exercise 15.5.8. Give an algorithm similar to the above that works when r^2 | (p − 1).
Exercise 15.5.9. This exercise is about solving many different discrete logarithm instances h_i = g^{a_i} (mod p), for 1 ≤ i ≤ n, to the same base g. Once sufficiently many relations are found, determine the cost of solving each individual instance of the DLP. Hence show that one can solve any constant number of instances of the DLP to a given base g ∈ F_p^* in O(L_p(1/2, 2 + o(1))) bit operations as p → ∞.
15.5.2
To get a faster algorithm it is necessary to improve the time to find smooth relations. It is natural to seek methods to sieve, rather than factoring each value by trial division, but it is not known how to do this for relations of the form in equation (15.7). It would also be natural to find an analogue of Pomerance's method of considering residues of size about the square root of p, rather than of size p; Exercise 15.5.10 gives an approach to this, but it does not lower the complexity.
Exercise 15.5.10. (Blake, Fuji-Hara, Mullin and Vanstone [61]) Once one has computed w = g^z (mod p) one can apply the Euclidean algorithm to find integers w_1, w_2 such that w ≡ w_1 w_2^{−1} (mod p) with |w_1|, |w_2| ≤ √p, and then test whether both w_1 and w_2 are B-smooth.
Coppersmith, Odlyzko and Schroeppel [144] proposed an algorithm for the DLP in F_p^* that uses sieving. Their idea is to let H = ⌈√p⌉ and define the factor base to be

B = {q : q prime, q < L_p(1/2, 1/2)} ∪ {H + c : 1 ≤ c ≤ L_p(1/2, 1/2 + ε)}.

Since H^2 (mod p) is of size p^{1/2} it follows that if (H + c_1), (H + c_2) ∈ B then (H + c_1)(H + c_2) (mod p) is of size p^{1/2+o(1)}. One can therefore generate relations in B. Further, it is shown in Section 4 of [144] how to sieve over the choices for c_1 and c_2. A heuristic analysis of the algorithm gives complexity L_p(1/2, 1 + o(1)) bit operations.
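The size claim is easy to confirm numerically (an illustrative sketch; the modulus below is arbitrary, and primality is not needed for the size estimate):

```python
import math

p = 10**12 + 39                      # an arbitrary modulus near 10^12
H = math.isqrt(p) + 1                # H = ceil(sqrt(p))

c1, c2 = 5, 11                       # two small offsets, as in the factor base
residue = ((H + c1) * (H + c2)) % p

# (H+c1)(H+c2) mod p = (H^2 - p) + (c1+c2)*H + c1*c2, which is O(sqrt(p)),
# far smaller than a random residue modulo p.
print(residue < 100 * math.isqrt(p))  # → True
```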
The number field sieve (NFS) is an algorithm for the DLP in F_p^* with heuristic complexity O(L_p(1/3, c + o(1))) bit operations. It is closely related to the number field sieve for factoring and requires algebraic number theory. As with the factoring algorithm, there are two factor bases. Introducing the DLP instance requires an extra algorithm (we will see an example of this in Section 15.5.4). We do not have space to give the details and instead refer to Schirokauer, Weber and Denny [515] or Schirokauer [511, 513].
15.5.3
The algorithm then follows exactly the ideas of the previous section. Suppose g has prime order r | (p^n − 1) and h ∈ ⟨g⟩. The factor base is

B = {P(x) ∈ F_p[x] : P(x) is monic, irreducible and deg(P(x)) ≤ b}

for some integer b to be determined later. Note that #B = I(1) + I(2) + · · · + I(b) ≈ p^{b+1}/(b(p − 1)) (see Exercise 15.5.14). We compute random powers of g multiplied by a suitable γ ∈ G (where, if r^2 ∤ (p^n − 1), G ⊆ F_{p^n}^* is the subgroup of order (p^n − 1)/r; when r^2 | (p^n − 1) then use the method of Exercise 15.5.8), reduce to polynomials in F_p[x] of degree less than n, and try to factor them into products of polynomials from B. By Exercise 2.12.11 the cost of factoring the b-smooth part of a polynomial of degree n is O(b n log(n) log(p) M(log(p))) = O(log(p^n)^3) bit operations (in any case, polynomial-time). As previously, we are generating polynomials of degree less than n uniformly at random and so, by Theorem 15.5.12, the expected number of trials to get a relation is u^{u(1+o(1))} where u = n/b, as u → ∞. We need to obtain #B relations in general. Then we obtain a single relation of the form h g^a = ∏_{P∈B} P^{e_P}, perform linear algebra, and hence solve the DLP.
Exercise 15.5.13. Write the above algorithm in pseudocode.
Exercise 15.5.14. Show that ∑_{i=1}^{b} I(i) ≤ (1/b) p^b (1 + 2/(p − 1)) + O(b p^{b/2}). Show that a very rough approximation to this is p^{b+1}/(b(p − 1)).
Exercise 15.5.15. Let the notation be as above. Show that the complexity of this algorithm is at most

c_1 #B u^{u(1+o(1))} log(q)^3 + c_2 (#B)^{2+o(1)} M(log(r))

bit operations (for some constants c_1 and c_2) as n → ∞ in q = p^n.
For the complexity analysis it is natural to arrange that #B ≈ L_{p^n}(1/2, c) for a suitable constant c. Recall that #B ≈ p^b/b. To have p^b/b = L_{p^n}(1/2, c) then, taking logs,

b log(p) − log(b) = c √(n log(p)(log(n) + log(log(p)))).

It follows that b ≈ c √(n log(n)/log(p)).
Exercise 15.5.16. Show that one can compute discrete logarithms in F_{p^n}^* in expected O(L_{p^n}(1/2, 2 + o(1))) bit operations for fixed p and as n → ∞. (Note that this result does not rely on any heuristic assumptions.)
Exercise 15.5.17. Adapt the trick of Exercise 15.5.10 to this algorithm. Explain why the complexity of the algorithm remains the same, but is now heuristic.
Lovorn Bender and Pomerance [394] give a rigorous complexity of L_{p^n}(1/2, √2 + o(1)) bit operations as p^n → ∞ with p ≤ n^{o(n)} (i.e., p is not fixed).
15.5.4
This algorithm (inspired by the "systematic equations" of Blake, Fuji-Hara, Mullin and Vanstone [61]) was the first algorithm in computational number theory to have heuristic subexponential complexity of the form L_q(1/3, c + o(1)).
The method uses a polynomial basis for F_{2^n} of the form F_2[x]/(F(x)) for F(x) = x^n + F_1(x), where F_1(x) has very small degree. For example, F_{2^127} = F_2[x]/(x^127 + x + 1).
The systematic equations of Blake et al are relations among elements of the factor base that come almost for free. For example, in F_{2^127}, if A(x) ∈ F_2[x] is an irreducible polynomial in the factor base then A(x)^128 = A(x^128) ≡ A(x^2 + x) (mod F(x)), and A(x^2 + x) is either irreducible or is a product P(x)P(x + 1) of irreducible polynomials of the same degree (Exercise 15.5.18). Hence, for many polynomials A(x) in the factor base one gets a non-trivial relation.
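This congruence can be checked by direct computation; the sketch below (our own illustration) represents F_2[x] polynomials as Python integer bit-masks, bit i being the coefficient of x^i:

```python
# F_2[x] arithmetic with integers as coefficient bit-masks.
def pmul(a, b):
    """Multiply two F_2[x] polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, m):
    """Reduce a modulo m in F_2[x]."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def ppowmod(a, e, m):
    """Compute a^e modulo m by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = pmod(pmul(r, a), m)
        a = pmod(pmul(a, a), m)
        e >>= 1
    return r

def pcompose(a, s, m):
    """Evaluate a(s(x)) modulo m by Horner's rule."""
    r, i = 0, a.bit_length() - 1
    while i >= 0:
        r = pmod(pmul(r, s), m)
        if (a >> i) & 1:
            r ^= 1
        i -= 1
    return r

F = (1 << 127) | 0b11             # F(x) = x^127 + x + 1
A = 0b1011                        # A(x) = x^3 + x + 1 (irreducible)
lhs = ppowmod(A, 128, F)          # A(x)^128 mod F(x)
rhs = pcompose(A, 0b110, F)       # A(x^2 + x) mod F(x)
print(lhs == rhs)  # → True
```

Here A(x^2 + x) factors as (x^3 + x + 1)(x^3 + x^2 + 1) = P(x)P(x + 1), illustrating the second case of Exercise 15.5.18.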
Exercise 15.5.18. Let A(x) ∈ F_2[x] be an irreducible polynomial. Show that A(x^2 + x) is either irreducible or a product of two polynomials of the same degree.
Coppersmith [139] extended the idea as follows: Let b ∈ N be such that b ≈ c n^{1/3} log(n)^{2/3} for a suitable constant c (later we take c = (2/(3 log(2)))^{2/3}), let k ∈ N be such that 2^k ≈ √(n/b), and let l = ⌈n/2^k⌉. Choose polynomials A(x), B(x) ∈ F_2[x] of small degrees d_A and d_B, set C(x) = A(x)x^l + B(x), and note that

C(x)^{2^k} ≡ A(x^{2^k}) x^{2^k l − n} F_1(x) + B(x^{2^k}) (mod F(x)).        (15.9)
Write D(x) for the right hand side of equation (15.9). We have deg(C(x)) ≤ max{d_A + l, d_B} ≈ l ≈ n^{2/3} log(n)^{1/3} and deg(D(x)) ≤ max{2^k d_A + (2^k l − n) + deg(F_1(x)), 2^k d_B} ≈ 2^k b ≈ n^{2/3} log(n)^{1/3}.
Example 15.5.19. (Thomé [604]) Let n = 607 and F_1(x) = x^9 + x^7 + x^6 + x^3 + x + 1. Let b = 23, d_A = 21, d_B = 28, 2^k = 4, l = 152. The degrees of C(x) and D(x) are 173 and 112 respectively.
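These degree bounds can be checked by plugging the example's parameters into the formulas above:

```python
n, dF1 = 607, 9                  # F(x) = x^607 + F_1(x), with deg F_1 = 9
b, dA, dB, twok, l = 23, 21, 28, 4, 152

deg_C = max(dA + l, dB)
deg_D = max(twok * dA + (twok * l - n) + dF1, twok * dB)
print(deg_C, deg_D)  # → 173 112
```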
We have two polynomials C(x), D(x) of degree ≈ n^{2/3} that we wish to be b-smooth, where b ≈ n^{1/3} log(n)^{2/3}. We will sketch the complexity later, under the heuristic assumption that, from the point of view of smoothness, these polynomials are independent. We will also assume that the resulting relations are essentially random (and so with high probability there is a non-trivial linear dependence once #B + 1 relations have been collected).
Having generated enough relations among elements of the factor base, it is necessary to find some relations involving the elements g and h of the DLP instance. This is not trivial. All DLP algorithms having complexity L_q(1/3, c + o(1)) feature a process called special-q descent that achieves this. The first step is to express g (respectively, h) as a product ∏_i G_i(x) of polynomials of degree at most b_1 = n^{2/3} log(n)^{1/3}; this can be done by multiplying g (resp. h) by random combinations of elements of B and factoring (one can also apply the Blake et al trick as in Exercise 15.5.10). We now have a list of around 2n^{1/3} < n polynomials G_i(x) of degree ≈ n^{2/3} that need to be smoothed further.
Section VII of [139] gives a method to do this: essentially one performs the same sieving as earlier, except that A(x) and B(x) are chosen so that G_i(x) | C(x) = A(x)x^l + B(x) (not necessarily with the same value of l or the same degrees for A(x) and B(x)). Defining D(x) = C(x)^{2^k} (mod F(x)) (not necessarily the same value of k as before) one hopes that C(x)/G_i(x) and D(x) are b-smooth. After sufficiently many trials one has a relation that expresses G_i(x) in terms of elements of B. Repeating for the polynomially many values G_i(x) one eventually has the values g and h expressed in terms of elements of B. One can then do linear algebra modulo the order of g to find integers Z1, Z2 such that g^{Z1} h^{Z2} = 1, and the DLP is solved.
Example 15.5.20. We give an example of Coppersmith's method for F_{2^15} = F_2[x]/(F(x)) where F(x) = x^15 + x + 1. We consider the subgroup of F_{2^15}^* of order r = 151 (note that (2^15 − 1)/r = 7 · 31 = 217). Let g = x^11 + x^7 + x^5 + x^2 + 1 and h = x^14 + x^11 + x^10 + x^9 + 1 be the DLP instance.
First note that n^{1/3} ≈ 2.5 and n^{2/3} ≈ 6.1. We choose b = 3 and so B = {x, x + 1, x^2 + x + 1, x^3 + x + 1, x^3 + x^2 + 1}. We hope to be testing polynomials of degree around 6 to 8 for smoothness.
First, we find some systematic equations. We obviously have the relation x^15 = x + 1. We also have (x + 1)^16 = x^2 + x + 1 and (x^3 + x + 1)^16 = (x^3 + x + 1)(x^3 + x^2 + 1).
Now we apply Coppersmith's method. We must choose 2^k ≈ √(n/b) = √5 ≈ 2.2, so take 2^k = 2. Let l = ⌈n/2^k⌉ = 8, choose A(x) and B(x) of degree at most 2, set C(x) = A(x)x^8 + B(x) and D(x) = C(x)^2 (mod F(x)), and test C(x) and D(x) for smoothness over B. We find the following pairs (A(x), B(x)) such that both C(x) and D(x) factor over B.
A(x)  B(x)  C(x)                                      D(x)
1     1     (x + 1)^8                                 x^2 + x + 1
1     x     x(x + 1)(x^3 + x + 1)(x^3 + x^2 + 1)      x
1     x^2   x^2 (x + 1)^2 (x^2 + x + 1)^2             x(x^3 + x + 1)
Together with the systematic equations this gives the relation matrix (columns corresponding to x, x + 1, x^2 + x + 1, x^3 + x + 1, x^3 + x^2 + 1):

( 15  −1   0   0   0 )
(  0  16  −1   0   0 )
(  0   0   0  15  −1 )        (15.10)
(  1   2   0   2   2 )
(  3   4   4  −1   0 )
To solve the DLP one can now try to express g and h over the factor base. One has

g^22 = x(x + 1)(x^2 + x + 1)^2 (x^3 + x^2 + 1).

For h we find

h g^30 = x^6 (x + 1)^4 G(x)

where G(x) = x^4 + x + 1 is a "large prime". To smooth G(x) we choose A(x) = 1, B(x) = A(x)x^8 (mod G(x)) = x^2 + 1, C(x) = A(x)x^8 + B(x) and D(x) = C(x)^2 (mod F(x)). One finds C(x) = G(x)^2 and D(x) = (x + 1)(x^3 + x^2 + 1). In other words, G(x)^4 = (x + 1)(x^3 + x^2 + 1).
There are now two ways to proceed. Following the algorithm description above we add to the matrix the two rows (1, 1, 2, 0, 1) and 4 · (6, 4, 0, 0, 0) + (0, 1, 0, 0, 1) = (24, 17, 0, 0, 1), corresponding to g^22 and h^4 g^120. Finding a non-trivial kernel vector modulo 151, such as (1, 114, 0, 132, 113, 133, 56), gives the relation

1 = (g^22)^{133} (h^4 g^120)^{56} = g^{133} h^{73}

from which we deduce h = g^23.
An alternative approach to the linear algebra is to diagonalise the system in equation (15.10) using linear algebra over Z (or at least modulo 2^15 − 1) to get x + 1 = x^15, x^2 + x + 1 = x^240, x^3 + x + 1 = x^1023 and x^3 + x^2 + 1 = x^15345. One then gets

g^22 = x(x + 1)(x^2 + x + 1)^2 (x^3 + x^2 + 1) = x^{1+15+2·240+15345} = x^15841

and so g = x^{15841 · 22^{−1} (mod 2^15 − 1)} = x^26040.
Similarly, G(x)^4 = (x + 1)(x^3 + x^2 + 1) = x^{15+15345} = x^15360 and so G(x) = x^3840. Finally,

h = g^{−30} x^6 (x + 1)^4 G(x) = x^{−30·26040+6+4·15+3840} = x^9114 = (x^217)^42

and so, since g = x^26040 = (x^217)^120, we get h = g^{42 · 120^{−1} (mod 151)} = g^23.
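All of these claims can be verified mechanically; as in the earlier sketch, F_2[x] polynomials are represented as integer bit-masks (an illustrative check of the example, not part of the original text):

```python
def pmul(a, b):
    """Multiply two F_2[x] polynomials (integers as coefficient bit-masks)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, m):
    """Reduce a modulo m in F_2[x]."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def ppowmod(a, e, m):
    """Compute a^e modulo m by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = pmod(pmul(r, a), m)
        a = pmod(pmul(a, a), m)
        e >>= 1
    return r

F = (1 << 15) | 0b11                              # F(x) = x^15 + x + 1
g = (1 << 11) | (1 << 7) | (1 << 5) | 0b101       # x^11 + x^7 + x^5 + x^2 + 1
h = (1 << 14) | (1 << 11) | (1 << 10) | (1 << 9) | 1

assert ppowmod(g, 151, F) == 1        # g lies in the subgroup of order 151
assert ppowmod(2, 26040, F) == g      # g = x^26040  (2 encodes the polynomial x)
assert ppowmod(2, 9114, F) == h       # h = x^9114
print(ppowmod(g, 23, F) == h)  # → True: h = g^23
```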
Note that, to compare with Exercise 15.2.12, if q = 2^{1024} then L_q(1/3, (32/9)^{1/3}) ≈ 2^{69}.
This conjecture would hold if the probability that the polynomials C(x) and D(x) are smooth were the same as for independent random polynomials of the same degrees. We now give a justification for the constant. Let b = c n^{1/3} log(n)^{2/3}. Note that 2^k ≈ √(n/b) ≈ (n/(c log(n)))^{1/3} and l ≈ √(nb). We need around 2^b/b relations, and note that log(2^b/b) ≈ b log(2) = c log(2) n^{1/3} log(n)^{2/3}. We have deg(C(x)) ≈ d_A + l and deg(D(x)) ≈ 2^k d, where d = max{d_A, d_B} ≈ b. The number of trials until C(x) is b-smooth is u^u where u = (d_A + l)/b ≈ l/b ≈ √(n/b) = (1/√c)(n/log(n))^{1/3}. Hence, log(u^u) = u log(u) ≈ (1/(3√c)) n^{1/3} log(n)^{2/3}. Similarly, the number of trials until D(x) is b-smooth is approximately u′^{u′} where u′ = (2^k d)/b ≈ 2^k ≈ √(n/b), and the same argument applies. Since both events must occur, the expected number of trials to get a relation is exp((2/(3√c)) n^{1/3} log(n)^{2/3}). Hence, the total expected time to generate enough relations is

exp((c log(2) + 2/(3√c)) n^{1/3} log(n)^{2/3}).
This is optimised when c^{3/2} log(2) = 2/3, which leads to the stated complexity for the first stage of the algorithm. The linear algebra is O((2^b/b)^{2+o(1)} M(log(r))) bit operations, which is of the same complexity, and the final stage of solving the DLP has lower complexity (it is roughly the same as the cost of finding polynomially many smooth relations, rather than finding 2^b/b of them). For more details about the complexity of Coppersmith's method we refer to Section 2.4 of Thomé [604].
Since one can detect smoothness of polynomials in polynomial-time it is not necessary, from a complexity-theoretic point of view, to sieve. However, in practice sieving can be worthwhile, and a method to do this was given by Gordon and McCurley [262].
Coppersmith's idea is a special case of a more general approach to index calculus algorithms known as the function field sieve. Note that Coppersmith's algorithm only has one factor base, whereas the function field sieve works using two factor bases.
15.5.5
The function field sieve of Adleman is a general algorithm for discrete logarithms in Fpn
where p is relatively small compared with n. Joux and Lercier gave a much simpler and
better algorithm. We will sketch this algorithm, but refer to Joux and Lercier [315] and
Section 15.2 of [314] for full details. We also refer to [513] for a survey of the function
field sieve.
Exercise 15.5.22. Let n = 15. Find polynomials F_1(t), F_2(t) ∈ F_2[t] of degree 4 such that F_2(F_1(t)) − t has an irreducible factor of degree 15.
Now consider the polynomial ring A = F_p[x, y] and two ring homomorphisms ψ_1 : A → A_1 = F_p[x] given by ψ_1(y) = F_1(x), and ψ_2 : A → A_2 = F_p[y] given by ψ_2(x) = F_2(y). Define φ_1 : A_1 → F_{p^n} by φ_1(x) = t (mod F(t)) and φ_2 : A_2 → F_{p^n} by φ_2(y) = F_1(t) (mod F(t)).
Exercise 15.5.23. Let the notation be as above and let G(x, y) ∈ F_p[x, y]. Show that φ_1(ψ_1(G(x, y))) = φ_2(ψ_2(G(x, y))) in F_{p^n}.
Let B_1 ⊆ A_1 = F_p[x] and B_2 ⊆ F_p[y] be the sets of linear polynomials. The idea of the algorithm is simply to consider polynomials in F_p[x, y] of the form G(x, y) = xy + ax + by + c. If ψ_1(G(x, y)) = (x + b)F_1(x) + (ax + c) factors over B_1 as ∏_{i=1}^{d+1} (x − u_i) and if ψ_2(G(x, y)) = (y + a)F_2(y) + (by + c) factors over B_2 as ∏_{j=1}^{d+1} (y − v_j) then we have a relation. The point is that such a relation corresponds to

∏_{i=1}^{d+1} (t − u_i) = ∏_{j=1}^{d+1} (F_1(t) − v_j)

in F_{p^n}.
One also needs to introduce the DLP instance by using a special-q descent: given an irreducible polynomial q(x) one constructs polynomials a(x), b(x) such that q(x) | (a(x)F_1(x) + b(x)), and one hopes that (a(x)F_1(x) + b(x))/q(x) has small factors and that a(F_2(y))y + b(F_2(y)) has small factors, and hence one iterates the process. When enough relations are collected (including at least one systematic equation, to remove the parasitic solution explained on page 442 of Joux [314]) one can perform linear algebra to solve the DLP. The heuristic complexity of this algorithm is shown in [315] and Section 15.2.1.2 of [314] to be between L_{p^n}(1/3, 3^{1/3} + o(1)) and L_{p^n}(1/3, (32/9)^{1/3} + o(1)) for p ≤ L_{p^n}(1/3, (4/9)^{1/3} + o(1)).
15.5.6
Concepts from the number field sieve for factoring have been applied in the setting of the DLP. Again, one uses two factor bases, corresponding to ideals in the ring of integers of some number field (one of the number fields may be Q). As with Coppersmith's method, once sufficiently many relations have been found among elements of the factor bases, a special-q descent is needed to solve a general instance of the DLP. We refer to Schirokauer [513] for details of the NFS algorithm for the DLP, and also for the heuristic arguments that one can solve the DLP in F_p^* in L_p(1/3, (64/9)^{1/3} + o(1)) bit operations. When p has a special form (e.g., p = 2^n − 1) then the special number field sieve (SNFS) can be used to solve the DLP in (heuristic) L_p(1/3, (32/9)^{1/3} + o(1)) bit operations; see [514].
We should also mention the special function field sieve (SFFS) for solving the DLP in F_{p^n}, which has heuristic complexity L_{p^n}(1/3, (32/9)^{1/3} + o(1)) bit operations as p^n → ∞.
15.5.7
We have sketched algorithms for the DLP in F_p^* when p is large, and in F_{p^n}^* when p is relatively small. We have not considered the case F_q where q = p^n with p large and n > 1. The basic concepts can be extended to cover all cases, but ensuring that subexponential complexity is achieved for all combinations of p and n is non-trivial. Adleman and DeMarrais [2] were the first to give a heuristic subexponential algorithm for all finite fields. They split the problem space into p > n and p ≤ n; in the latter case they have complexity L_q(1/2, 3 + o(1)) bit operations as q → ∞ and in the former case heuristic complexity L_q(1/2, c + o(1)) for a non-explicit constant c.
Heuristic algorithms with complexity L_q(1/3, c + o(1)) for all finite fields are given by Joux and Lercier [315] and Joux, Lercier, Smart and Vercauteren [316].
15.6
Some index calculus algorithms for the discrete logarithm problem in finite fields generalise
naturally to solving the DLP in the divisor class group of a curve. Indeed, some of these
algorithms also apply to the ideal class group of a number field, but we do not explore that
situation in this book. An excellent survey of discrete logarithm algorithms for divisor
class groups is Chapter VII of [65].
We consider hyperelliptic curves C : y^2 + H(x)y = F(x) over F_q of genus g, so deg(H(x)) ≤ g + 1 and deg(F(x)) ≤ 2g + 2. Recall that elements of the divisor class group have a Mumford representation (u(x), y − v(x)) (for curves with a split model there is also an integer 0 ≤ n ≤ g − deg(u(x)) to take into account the behaviour at infinity). Let D_1 and D_2 be reduced divisors representing divisor classes of order r (where r is a prime such that r^2 ∤ #Pic^0_{F_q}(C)). The goal is to compute a ∈ Z/rZ such that D_2 ≡ [a]D_1.
Recall from Exercise 10.3.12 that a reduced divisor with Mumford representation (u(x), v(x)) is said to be a prime divisor if the polynomial u(x) is irreducible over F_q. The degree of the effective affine divisor is deg(u(x)). Any effective affine divisor D can be written as a sum of prime effective affine divisors by factoring the u(x) polynomial of its Mumford representation. Hence, it is natural to define D to be b-smooth if it is a sum of prime effective divisors of degree at most b. This suggests selecting the factor base B to consist of all prime effective affine divisors of degree at most b, for some smoothness bound 1 ≤ b ≤ g.
We assume that B generates the group Pic^0_{F_q}(C); this is immediate when the group has prime order and B contains a non-trivial element. Voloch [620] has proved that degree 1 primes generate Pic^0_{F_q}(C) whenever q > (8g(C) − 2)^2, where g(C) is the genus of C.
One can obtain an algorithm for the DLP of a familiar form by generating reduced divisors and testing whether they are smooth. One issue is that our smoothness results for polynomials apply when polynomials are sampled uniformly from the set of all polynomials of degree n in F_q[x], whereas we now need to apply the results to the set of polynomials u(x) ∈ F_q[x] of degree ≤ g that arise in Mumford's representation. This issue is handled using Theorem 15.6.1.
There are two rather different ways to generate reduced divisors, both of which are
useful for the algorithm.
1. One can take random group elements of the form [n]D_1 or [n_1]D_1 + [n_2]D_2 and compute the Mumford representation of the corresponding reduced effective affine divisor. This is the same approach as used in Section 15.5.1 and, in the context of ideal/divisor class groups, is sometimes called the Hafner-McCurley algorithm. If the divisor is B-smooth then we obtain a relation between elements of B and D_1 and D_2.
2. One can consider the effective affine divisor of the function a(x) + yb(x) for random
polynomials a(x), b(x). This idea is due to Adleman, DeMarrais and Huang [4].
To introduce the instance of the DLP into the system it is necessary to have some relations involving D_1 and D_2. This can either be done using the first method, or by choosing a(x) and b(x) so that points in the support of either D_1 or D_2 lie in the support of div(a(x) + y b(x)) (we have seen this kind of idea already, e.g., in Coppersmith's algorithm).
It is convenient to add to B all points at infinity and all points P ∈ C(F_q) such that P = ι(P), where ι is the hyperelliptic involution (equivalently, all F_q-rational prime divisors with this property). Since the latter divisors all have order 2, one automatically obtains relations that can be used to eliminate them during the linear algebra stage of the algorithm. Hence, we say that a reduced divisor D = div(u(x), y − v(x)) in Mumford representation is b-smooth if u(x) is b-smooth after any factors corresponding to points of order 2 have been removed.
Let C be a hyperelliptic curve over F_q of genus g and let 1 ≤ b < g. Prime effective affine divisors on C of degree b correspond to irreducible polynomials u(x) of degree b (and for roughly half of all such polynomials u(x) there are two solutions v(x) to v(x)^2 + v(x)H(x) − F(x) ≡ 0 (mod u(x))). Hence, it is natural to expect that there are approximately q^b/b such divisors. It follows that #B should be around ∑_{i=1}^{b} q^i/i ≈ (1/b) q^b (1 + 2/(q − 1)), by the same argument as Exercise 15.5.14.
For the analysis, one needs to estimate the probability that a randomly chosen reduced
divisor is smooth.
Theorem 15.6.1. (Theorem 6 of Enge and Stein [197]) Let C be a hyperelliptic curve of genus g over F_q. Let c > 1 and let b = ⌈log_q(L_{q^g}(1/2, c))⌉. Then the number of b-smooth reduced divisors of degree g is at least

q^g / L_{q^g}(1/2, 1/(2c) + o(1))

for fixed q and g → ∞.
Note that the smoothness bound in the above result is the ceiling of a real number.
Hence one cannot deduce subexponential running time unless the genus is sufficiently
large compared with the field size.
15.6.1
Suppose that r | N = #Pic^0_{F_q}(C) and r^2 ∤ N. Suppose D̄_1, D̄_2 are two divisor classes on C over F_q of order r, represented by reduced divisors D_1 and D_2. The algorithm of Section 15.5.1 immediately applies to solve the DLP: choose the factor base as above; generate random reduced divisors by computing [n_1]D_1 + [n_2]D_2 + γ (where γ is uniformly chosen⁹ from the subgroup G ⊆ Pic^0_{F_q}(C) of order N/r); store the resulting smooth relations; perform linear algebra modulo r to find integers a, b such that [a]D_1 + [b]D_2 ≡ 0 (extra care is needed when there are two points at infinity, to be sure the relation is correct).
Exercise 15.6.2. Show that the expected running time of this algorithm is (rigorously!) L_{q^g}(1/2, 2 + o(1)) bit operations as g → ∞.
We refer to Section VII.5 of [65] for practical details of the algorithm. Note that the
performance can be improved using the sieving method of Flassenberg and Paulus [205].
9 We assume that generators for this group are known so that it is easy to sample uniformly from this
group.
15.6.2
This algorithm, from [4], uses the same factor base as the method of the previous section. The main difference is to generate relations by decomposing principal divisors div(A(x) + yB(x)). An advantage of this approach is that group operations in the divisor class group are not required.
By Exercise 10.1.26 it is easy to compute v_P(A(x) + yB(x)) by computing the norm A(x)^2 − H(x)A(x)B(x) − F(x)B(x)^2 and factoring it as a polynomial. If deg(A(x)) = d_A < g and deg(B(x)) = d_B < g then the norm has degree at most max{2d_A, (g + 1) + d_A + d_B, 2g + 2 + 2d_B}, which is much larger in general than the degree ≤ g polynomial in a reduced Mumford representation, but still O(g).
We need to make the heuristic assumption that the probability that the norm is b-smooth is the same as the probability that a random polynomial of the same degree is b-smooth. We therefore assume that the expected number of trials to get an L_{q^g}(1/2, c)-smooth polynomial is L_{q^g}(1/2, 1/(2c) + o(1)) as g tends to infinity.
We also need some relations involving D_1 and D_2. Adleman et al do this by first decomposing D_1 and D_2 as sums of prime divisors. Then they smooth each prime divisor div(u(x), y − v(x)) by choosing polynomials B(x), W(x) ∈ F_q[x], setting A′(x) = B(x)(v(x) + H(x)) (mod u(x)) and then A(x) = A′(x) + u(x)W(x). One computes N(x) = A(x)^2 − H(x)A(x)B(x) − F(x)B(x)^2. By construction, u(x) | N(x), and one continues randomly choosing B(x) and W(x) until N(x)/u(x) is b-smooth.
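The divisibility claim u(x) | N(x) can be illustrated on a toy curve (entirely hypothetical parameters: the genus-2 curve y^2 = x^5 + x + 3 over F_7, with H(x) = 0, is chosen only for this demonstration):

```python
# Polynomial helpers over F_p, coefficients stored lowest-degree first.
p = 7

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def pmul(a, b):
    r = [0] * max(len(a) + len(b) - 1, 0)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return trim(r)

def padd(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a)); b = b + [0] * (n - len(b))
    return trim([(x + y) % p for x, y in zip(a, b)])

def psub(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a)); b = b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])

def pmod(a, m):
    a = trim(a[:])
    while len(a) >= len(m):
        c = a[-1] * pow(m[-1], -1, p) % p
        s = len(a) - len(m)
        for i, mi in enumerate(m):
            a[s + i] = (a[s + i] - c * mi) % p
        a = trim(a)
    return a

def peval(a, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(a)) % p

# Hypothetical curve y^2 = F(x), so H(x) = 0 and the norm is A^2 - F*B^2.
F = [3, 1, 0, 0, 0, 1]                   # F(x) = x^5 + x + 3

# Find a prime divisor (u, v): u an irreducible quadratic with v^2 = F (mod u).
found = None
for u0 in range(p):
    for u1 in range(p):
        u = [u0, u1, 1]
        if any(peval(u, x) == 0 for x in range(p)):
            continue                      # u has a root, hence is reducible
        for v0 in range(p):
            for v1 in range(p):
                v = trim([v0, v1])
                if pmod(psub(pmul(v, v), F), u) == []:
                    found = (u, v)
        if found:
            break
    if found:
        break
u, v = found

# ADH smoothing: A'(x) = B(x) v(x) (mod u), A(x) = A'(x) + u(x) W(x);
# then u(x) divides the norm N(x) = A(x)^2 - F(x) B(x)^2.
B, W = [2, 1], [1, 1]                     # arbitrary choices of B(x) and W(x)
A = padd(pmod(pmul(B, v), u), pmul(u, W))
N = psub(pmul(A, A), pmul(F, pmul(B, B)))
print(pmod(N, u) == [])  # → True: u(x) divides N(x)
```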
The details of the algorithm are then the same as the algorithm in Section 15.5.1: one uses linear algebra modulo r to get a relation [a]D_1 + [b]D_2 ≡ 0 (again, care is needed when there are two points at infinity). We leave the details as an exercise.
Exercise 15.6.3. Write pseudocode for the Adleman, DeMarrais, Huang algorithm.
The heuristic complexity of the algorithm is of the same form as the earlier algorithm (the cost of smoothing the divisors D_1 and D_2 is heuristically the same as finding fewer than 2g relations, and so is negligible). One obtains a heuristic asymptotic complexity of L_{q^g}(1/2, 2 + o(1)) bit operations as g tends to infinity. This is much better than the complexity claimed in [4], since that paper also gives an algorithm to compute the group structure (and so the linear algebra requires computing the Hermite normal form).
These ideas will be used again in Section 15.9.1.
15.6.3
Gaudry's Algorithm
Gaudry [241] considered the algorithm of Section 15.6.1 for fixed genus, rather than asymptotically as g → ∞. In particular he chose the smoothness bound b = 1 (so the factor base B only consists of degree one prime divisors, i.e., points). Good surveys of Gaudry's algorithm are given in Chapter VII of [65] and Section 21.2 of [16].
Exercise 15.6.4. Let C be a hyperelliptic curve of genus g over a finite field F_q. Show that the number of prime divisors on C of degree 1 is #C(F_q) = q(1 + o(1)) for fixed g as q → ∞. Hence, show that the probability that a randomly chosen reduced divisor is 1-smooth is (1/g!)(1 + o(1)) as q → ∞.
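The 1/g! heuristic can be checked by brute force for small parameters, for instance by counting which monic degree-2 polynomials over F_q split into linear factors (a quick illustration, not taken from the text):

```python
q, g = 101, 2    # count 1-smooth monic polynomials of degree g = 2 over F_q

split = 0
for b in range(q):
    for c in range(q):
        # x^2 + b*x + c splits into linear factors iff it has a root in F_q
        if any((x * x + b * x + c) % q == 0 for x in range(q)):
            split += 1

ratio = split / q**2
print(abs(ratio - 1 / 2) < 0.01)  # → True: roughly 1/g! = 1/2 of them split
```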
Exercise 15.6.5. Following Exercise 15.6.4, it is natural to conjecture that one needs to choose O(g! q (1 + o(1))) divisors (again, this is for fixed g as q → ∞, in which case it is more common to write it as O(q(1 + o(1)))) to find enough relations to have a non-trivial linear dependence in B. Under this assumption, show that the heuristic expected running time of Gaudry's algorithm is at most

c_1 g^2 g! q (1 + o(1)) M(log(q)) + c_2 g^3 q^2 M(log(q)) = O(q^2 M(log(q))(1 + o(1)))        (15.11)
15.7
Weil Descent
As we have seen, there are subexponential algorithms for the DLP in the divisor class group of a hyperelliptic curve of high genus. A natural approach to solve the DLP on elliptic curves is therefore to transform the problem into a DLP on a high genus curve. However, the naive way to do this embeds a small problem into a big one, and does not help to solve the DLP. Frey [211] proposed¹⁰ using Weil restriction of scalars to transform the DLP on an elliptic curve E(F_{q^n}) for n > 1 to the DLP on a curve of genus g ≥ n over F_q. Frey called this idea Weil descent.
Geometrically the principle is to identify the Weil restriction of an open affine subset
of E(Fqn ) (see Section 5.7) with an open affine subset of an Abelian variety A over Fq of
dimension n. One can then try to find a curve C on A, so that there is a map from the
Jacobian of C to A. Following Gaudry, Hess and Smart [245] it is more convenient to
express the situation in terms of function fields and divisor class groups. We only sketch
the details since an excellent survey is provided by Hess in Chapter VIII of [65] and many
important details are explained by Diem in [171].
Let E be an elliptic curve over K = Fq^n and let k = Fq. The function field of E is
K(E). The idea (called in this setting a covering attack) is to find a curve C over K
such that K(C) is a finite extension of K(E) (so that there is a map C → E of finite
degree) and such that there is an automorphism σ of order n on K(C) extending the
q-power Frobenius, so that the fixed field of K(C) under ⟨σ⟩ is k(C′) for some curve C′.
The composition of the conorm map from E(K) to Pic^0_C(K) and the norm map from
10 The standard reference is a lecture given by Frey at the ECC 1998 conference. His talk was mostly
about a different (constructive) application of Weil restriction of scalars. However, he did mention the
possibility of using this idea for an attack. Galbraith and Smart developed the details further in [228]
and many works followed.
Pic^0_C(K) to Pic^0_{C′}(k) transfers the DLP from E(K) to Pic^0_{C′}(k). Hence, as long as the
composition of these maps is not trivial, one has reduced the DLP from E(K) to
the divisor class group of a curve C′ over k. One can then solve the DLP using an index
calculus algorithm, which is feasible if the genus of C′ is not too large.
A variant of the Weil descent concept that avoids function fields and divisor class
groups is to perform index calculus directly on Abelian varieties. This variant is the
subject of the following section.
15.8
We now discuss some related algorithms, which can be applied to elliptic curves over
extension fields. We start by recalling Semaev's idea of summation polynomials.
15.8.1
Suppose that E is an elliptic curve defined over a prime field Fp, and that elements of Fp
are represented as integers in the interval [0, p − 1]. Semaev [535] considered a factor base

B = {(x, y) ∈ E(Fp) : 0 ≤ x ≤ p^{1/n}}

for some fixed integer n ≥ 2. Note that #B ≈ p^{1/n}.
Semaev hoped to perform an index calculus algorithm similar to the one in Section 15.5.1. For random points R = [a]P + [b]Q the task is to write R as a sum of points
in B. To accomplish this, Semaev introduced the notion of a summation polynomial.
Definition 15.8.1. Let E : y^2 = x^3 + a4 x + a6 be an elliptic curve defined over Fq,
where the characteristic of Fq is neither 2 nor 3 (this condition can be avoided). The
summation polynomials Summ_n ∈ Fq[x1, x2, . . . , xn] for n ≥ 2 are defined as follows:

Summ_2(x1, x2) = x1 − x2.

Summ_n(x1, x2, . . . , xn) = R_x(Summ_{n−1}(x1, . . . , x_{n−2}, x), Summ_3(x_{n−1}, xn, x)) for
n ≥ 4, where R_x(F, G) is the resultant of the polynomials F and G with respect to
the variable x.
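As a quick sanity check of this definition, the following Python sketch (our own illustration, not the book's code) verifies numerically that the third summation polynomial vanishes on the x-coordinates of three points summing to the identity. The explicit formula for Summ_3 used below is the standard one for short Weierstrass curves (it is not stated in the extract above), and the prime p and coefficients A, B are arbitrary choices.

```python
# Check Semaev's third summation polynomial numerically: for points
# P, Q on E and R = P + Q, we have P + Q + (-R) = O, so the
# x-coordinates satisfy Summ_3(x_P, x_Q, x_R) = 0 (note x_{-R} = x_R).
p = 101
A, B = 2, 3          # E : y^2 = x^3 + A*x + B over F_p (arbitrary choices)

def summ3(x1, x2, x3):
    # standard explicit formula for the third summation polynomial
    return ((x1 - x2) ** 2 * x3 ** 2
            - 2 * ((x1 + x2) * (x1 * x2 + A) + 2 * B) * x3
            + (x1 * x2 - A) ** 2 - 4 * B * (x1 + x2)) % p

def add(P, Q):
    # affine addition of two points with distinct x-coordinates
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# brute-force the affine points of E(F_p)
pts = [(x, y) for x in range(p) for y in range(p)
       if (y * y - x ** 3 - A * x - B) % p == 0]
P = pts[0]
Q = next(pt for pt in pts if pt[0] != P[0])
R = add(P, Q)
print(summ3(P[0], Q[0], R[0]))   # prints 0
```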
For many more details see Section 3 of [176]. The following result is from [535].
(15.12)
If such a solution exists and can be found then one finds the corresponding y-coordinates
yi. Suppose that each yi ∈ Fp. Then each Pi = (xi, yi) is in B and by Theorem 15.8.2
there exist si ∈ {−1, 1} such that s1 P1 + · · · + sn Pn = R. The sign bits si can be found
by exhaustive search, thereby yielding a relation. Since #{P1 + P2 + · · · + Pn : Pi ∈ B} ≈
(p^{1/n})^n / n! = p/n!, the expected number of points R that have to be selected before a
relation is obtained is about n!.
Unfortunately, no efficient algorithm is known for solving the polynomial equation (15.12),
even for n = 5 (in which case the equation has degree 16 in each of its 5 variables). Coppersmith's method (see Section 19.2) seems not to be useful for this task.
In reference to the remarks of Section 15.2.3 we see that all requirements for an index
calculus algorithm are met, except that it is not efficient to decompose a smooth element
over the factor base.
15.8.2
Gaudry [244] realised that it might be possible to take roots of summation polynomials
if one was working with elliptic curves over extension fields. Gaudry's algorithm may
be viewed as doing Weil descent without divisor class groups. Indeed, the paper [244]
presents a general approach to index calculus on Abelian varieties and so the results apply
in greater generality than just Weil descent of elliptic curves.
Suppose that E is an elliptic curve defined over a finite field Fq^n with n > 1.
Gaudry [244] defines a factor base

B = {(x, y) ∈ E(Fq^n) : x ∈ Fq}

so that #B ≈ q. Gaudry considers this as the set of Fq-rational points on the algebraic
set F formed by intersecting the Weil restriction of scalars of E with respect to Fq^n/Fq
by n − 1 hyperplanes V(xi) for 2 ≤ i ≤ n, where x = x1 θ1 + · · · + xn θn (with θ1 = 1) as
in Lemma 5.7.1. If the algebraic set F is irreducible then it is a 1-dimensional variety.
In the relation generation stage, one attempts to decompose a randomly selected point
R E(Fqn ) as a sum of points in B. Gaudry observed that this can be accomplished by
finding solutions

(x1, x2, . . . , xn) ∈ Fq^n

such that

Summ_{n+1}(x1, x2, . . . , xn, x_R) = 0.    (15.13)
Note that Summ_{n+1}(x1, . . . , xn, x_R) ∈ Fq^n[x1, . . . , xn] since E is defined over Fq^n and
x_R ∈ Fq^n. The conditions xj ∈ Fq in equation (15.13) can be expressed algebraically as
follows. Select a basis {θ1, . . . , θn} for Fq^n over Fq and write

Summ_{n+1}(x1, . . . , xn, x_R) = Σ_{i=1}^n Gi(x1, . . . , xn) θi    (15.14)
basis for the ideal generated by the Gi and then taking roots in Fq of a sequence of univariate polynomials, each of which has degree at most 2^{n(n−1)}. This is predicted to take
O(2^{cn(n−1)} M(log(q))) bit operations for some constant c. Alternatively one could add
some field equations xj^q − xj to the ideal, to ensure it is zero-dimensional, but this could
have an adverse effect on the complexity. Gaudry makes a further heuristic assumption,
namely that the smoothness probability behaves as expected when using the large prime
variant.
The size of the set {P1 + P2 + · · · + Pn : Pi ∈ B} is approximately q^n/n! and so the
expected number of points R that have to be selected before a relation is obtained is about
n!. One needs approximately #B ≈ q relations to be able to find a non-trivial element in
the kernel of the relation matrix and hence integers a and b such that [a]D1 + [b]D2 ≡ 0.
It follows that the heuristic expected running time of Gaudry's algorithm is

O(2^{cn(n−1)} n! q M(log(q)) + q^{2+o(1)})    (15.15)
bit operations as q → ∞. This is exponential in terms of n and log(q). However, for fixed
n, the running time can be expressed as Õ(q^2) bit operations.
Gaudry's focus was on n fixed and relatively small. For any fixed n ≥ 5, Gaudry's
heuristic algorithm for solving the ECDLP over Fq^n is asymptotically faster than Pollard's
rho method. The double large prime variant (mentioned in Section 15.6.3) can also be
used in this setting. The complexity therefore becomes (heuristic) Õ(q^{2−2/n}) bit operations.
Hence Gaudry's algorithm is asymptotically faster than Pollard rho even for n = 3 and
n = 4, namely Õ(q^{4/3}) rather than O(q^{3/2}) for n = 3 and Õ(q^{3/2}) rather than O(q^2) for
n = 4.
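The crossover against Pollard rho can be seen by comparing exponents of q: the double large prime variant costs roughly q^{2−2/n} while rho costs q^{n/2}. A quick back-of-the-envelope sketch (our own illustration):

```python
# Exponent of q in the double-large-prime variant of Gaudry's
# algorithm (2 - 2/n) versus Pollard rho (n/2).  The exponents tie at
# n = 2 and Gaudry wins for every fixed n >= 3.
for n in range(2, 7):
    gaudry, rho = 2 - 2 / n, n / 2
    print(f"n={n}: Gaudry exponent {gaudry:.3f}, rho exponent {rho:.3f}")
```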
15.8.3
Gaudry's focus was on the DLP in E(Fq^n) when n is fixed. This yields an exponential-time algorithm. Diem [172, 176] considered the case where n is allowed to grow, and
obtained a subexponential-time algorithm.
The crux of Diem's method is remarkably simple: he assumes n ≈ √(log(q)) and obtains
an algorithm for the DLP in E(Fq^n) with complexity O(q^c) for some constant c (note that
even some exponential-time computations in n are polynomial in q, as e^{n^2} ≈ q). Now, q^c =
exp(c log(q)) and log(q^n) = n log(q) ≈ log(q)^{3/2}, so q^c ≈ exp(c log(q^n)^{2/3}) < L_{q^n}(2/3, c).
Diem's algorithm is very similar to Gaudry's. In Gaudry's algorithm, the factor base
consists of points whose x-coordinates lie in Fq. Diem defines a function φ = x ∘ α,
where α is an automorphism of P^1 over Fq^n that satisfies a certain condition, and defines
the factor base to be B = {P ∈ E(Fq^n) : φ(P) ∈ P^1(Fq)}. The process of generating
relations proceeds in the standard way. Some important contributions of [176] are to
prove that the algebraic set defined by the summation polynomials has a good chance of
having dimension zero, and that when this is the case the points can be found by taking
resultants of multihomogeneous polynomials in time polynomial in e^{n^2} log(q) (which is
exponential in n but polynomial in q).
The main result of [176] is the following. We stress that this result does not rely on
any heuristics.
Theorem 15.8.4. (Diem) Let a, b ∈ R be such that 0 < a < b. There is an algorithm
for the DLP in E(Fq^n) such that if q is a prime power and n ∈ N is such that

a √(log(q)) ≤ n ≤ b √(log(q))

then the algorithm solves the DLP in E(Fq^n) in an expected e^{O(log(q^n)^{2/3})}
bit operations.
15.9 Further Results
To end the chapter we briefly mention some methods for non-hyperelliptic curves. It
is beyond the scope of the book to present these algorithms in detail. We then briefly
summarise the argument that there is no subexponential algorithm for the DLP on elliptic
curves in general.
15.9.1
Diem [174] used the Adleman-DeMarrais-Huang idea of generating relations using principal divisors a(x) − yb(x) for the DLP on plane curves F(x, y) = 0 of low degree (the
degree of such a curve is the total degree of F(x, y) as a polynomial). Such curves are
essentially the opposite case to hyperelliptic curves (which have rather high degree in x
relative to their genus). The trick is simply to note that if F(x, y) has relatively low degree
compared to its genus then so does b(x)^d F(x, a(x)/b(x)), and so the divisor of the function
a(x) − yb(x) has relatively low weight. The main result is an algorithm with heuristic
complexity Õ(q^{2−2/(d−2)}) bit operations for a curve of degree d over Fq.
In the case of non-singular plane quartics (genus 3 curves C over Fq) Diem takes the
factor base to be a large set of points B ⊆ C(Fq). He generates relations by choosing two
distinct points P1, P2 ∈ B and intersecting the line y = bx + c between them with the
curve C. There are two other points of intersection, corresponding to the roots of the
quadratic polynomial F(x, bx + c)/((x − x_{P1})(x − x_{P2})), and so with probability roughly
1/2 we expect to get a relation in the divisor class group among points in C(Fq). Diem
15.9.2
The algorithms for the DLP in the divisor class group of a hyperelliptic curve in Sections 15.6.1 and 15.6.2 had complexity L_{q^g}(1/2, √2 + o(1)) bit operations as q → ∞. A
natural problem is to find algorithms with complexity L_{q^g}(1/3, c + o(1)), and this is still
open in general. However, an algorithm is known for curves of the form y^n + F(x, y) = 0
where deg_y(F(x, y)) ≤ n − 1 and deg_x(F(x, y)) = d, for n ≈ g^{1/3} and d ≈ g^{2/3}. We
do not have space to give the details, so simply quote the results and refer to Enge and
Gaudry [195], Enge, Gaudry and Thomé [196] and Diem [173]. An algorithm to compute
the group structure of Pic^0_C(Fq) is given with heuristic complexity L_{q^g}(1/3, c + o(1))
bit operations for some constant c. For the discrete logarithm problem the algorithm has
heuristic complexity L_{q^g}(1/3, c + o(1)) bit operations where c is a constant.
Unlike the L_N(1/3, c + o(1)) algorithms for factoring or DLP in finite fields, the algorithm does not use two different factor bases. Instead, the algorithm is basically the
same idea as Sections 15.6.2 and 15.9.1, with a complexity analysis tailored for curves of
a certain form.
15.9.3
In this section we briefly discuss why there does not seem to be a subexponential algorithm
for the DLP on general elliptic curves.
An approach to an index calculus algorithm for elliptic curves was already discussed by
Miller [425] in the paper that first proposed elliptic curves for cryptography. In particular
he considered lifting an elliptic curve E over Fp to an elliptic curve Ẽ over Q (i.e., so
that reducing the coefficients of Ẽ modulo p yields E). The factor base B was defined to
be the points of small height (see Section VIII.6 of [560] for details of heights) in Ẽ(Q).
The theory of descent (see Chapter VIII of Silverman [560]) essentially gives an algorithm
to decompose a point as a sum of points of small height (when this is possible). The idea
would therefore be to take random points [a]P + [b]Q ∈ E(Fp), lift them to Ẽ(Q) and
then decompose them over the factor base. There are several obstructions to this method.
First, lifting a random point from E(Fp) to Ẽ(Q) seems to be hard in general. Indeed,
Miller argued (see also [562]) that there are very few points of small height in Ẽ(Q) and
so (since we are considering random points [a]P + [b]Q from the exponentially large set
E(Fp)) it would be necessary to lift to exponentially large points in Ẽ(Q). Second, the
lifting itself seems to be a non-trivial computational task (essentially, solving a non-linear
Diophantine equation over Z).
Silverman proposed the Xedni calculus attack¹¹, which was designed to solve the lifting problem. This algorithm was analysed in [319], where it is shown that the probability
of finding useful relations is too low.
By now, many people have tried and failed to discover an index calculus algorithm for
the DLP on general elliptic curves. However, this does not prove that no such algorithm
exists, or that a different paradigm could not lead to faster attacks on the elliptic curve
DLP.
11 Xedni is "index" spelled backwards.
Chapter 16
Lattices
This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography
by Steven Galbraith, available from http://www.isg.rhul.ac.uk/sdg/crypto-book/ The
copyright for this chapter is held by Steven Galbraith.
This book is now completed and an edited version of it will be published by Cambridge
University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be
different in the published version.
Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes.
All feedback on the book is very welcome and will be acknowledged.
The word lattice has two different meanings in mathematics. One meaning is related
to the theory of partial orderings on sets (for example, the lattice of subsets of a set).
The other meaning, which is the one relevant to us, is discrete subgroups of Rn .
There are several reasons for presenting lattices in this book. First, there are hard
computational problems on lattices that have been used as a building block for public key cryptosystems (e.g., the Goldreich-Goldwasser-Halevi (GGH) cryptosystem, the
NTRU cryptosystem, the Ajtai-Dwork cryptosystem, and the LWE cryptosystem); however, we do not present these applications in this book. Second, lattices are used as a
fundamental tool for cryptanalysis of public key cryptosystems (e.g., lattice attacks on
knapsack cryptosystems, Coppersmith's method for finding small solutions to polynomial
equations, attacks on signatures, and attacks on variants of RSA). Third, there are applications of lattices to efficient implementation of discrete logarithm systems (such as
the GLV method; see Section 11.3.3). Finally, lattices are used as a theoretical tool for
security analysis of cryptosystems, for example the bit security of Diffie-Hellman key exchange using the hidden number problem (see Section 21.7) and the security proofs for
RSA-OAEP.
Some good references for lattices, applications of lattices and/or lattice reduction algorithms are: Cassels [122], Siegel [559], Cohen [135], von zur Gathen and Gerhard [237],
Grötschel, Lovász and Schrijver [268], Nguyen and Stern [459, 460], Micciancio and Goldwasser [419], Hoffstein, Pipher and Silverman [288], Lenstra's chapter in [113], Micciancio
and Regev's chapter in [50] and the proceedings of the conference LLL+25.
16.1
A lattice is a subset of the vector space Rm. We write all vectors as rows; be warned that
many books and papers write lattice vectors as columns. We denote by ‖v‖ the Euclidean
norm of a vector v ∈ Rm, though some statements also hold for other norms.
Definition 16.1.1. Let {b1, . . . , bn} be a linearly independent set of (row) vectors in Rm
(m ≥ n). The lattice generated by {b1, . . . , bn} is the set

L = { Σ_{i=1}^n li bi : li ∈ Z }

of integer linear combinations of the bi. The vectors b1, . . . , bn are called a lattice basis.
The lattice rank is n and the lattice dimension is m. If n = m then L is said to be a
full rank lattice.
Let L ⊆ Rm be a lattice. A sublattice is a subset L′ ⊆ L that is a lattice.
A basis matrix B of a lattice L is an n × m matrix formed by taking the rows to be
basis vectors bi. Thus B_{i,j} is the j-th entry of the row bi and

L = {xB : x ∈ Zn}.

By assumption the rows of a basis matrix are always linearly independent.
Example 16.1.2. The lattice in R2 generated by {(1, 0), (0, 1)} is L = Z2. The corresponding basis matrix is the 2 × 2 identity matrix B = ( 1 0 ; 0 1 ). Any 2 × 2 integer matrix B′ of determinant ±1 is
also a basis matrix for L.
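The last remark of the example can be checked mechanically: multiplying a basis matrix on the left by any integer matrix of determinant ±1 gives another basis matrix of the same lattice, and the (basis-independent) determinant is unchanged. A small self-contained sketch (our own illustration; the matrix U below is an arbitrary choice):

```python
# B is the basis matrix of Z^2 from Example 16.1.2; U is a unimodular
# integer matrix, so B2 = U*B is another basis matrix of the same
# lattice, with the same determinant up to sign.
B = [[1, 0], [0, 1]]
U = [[2, 3], [1, 2]]                      # det(U) = 2*2 - 3*1 = 1

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

B2 = matmul(U, B)
print(B2, abs(det2(B2)) == abs(det2(B)))  # [[2, 3], [1, 2]] True
```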
We will mainly assume that the basis vectors bi for a lattice have integer entries. In
cryptographic applications this is usually the case. We interchangeably use the words
points and vectors for elements of lattices. The vectors in a lattice form an Abelian
group under addition. When n ≥ 2 there are infinitely many choices for the basis of a
lattice.
An alternative approach to lattices is to define L = Zn and to have a general length
function q(v). One finds this approach in books on quadratic forms or optimisation
problems, e.g., Cassels [121] and Schrijver [527]. In particular, Section 6.2 of [527] presents
the LLL algorithm in the context of reducing the lattice L = Zn with respect to a length
function corresponding to a positive-definite rational matrix.
We now give an equivalent definition of lattice, which is suitable for some applications.
A subset L ⊆ Rm is called discrete if, for any real number r > 0, the set {v ∈ L : ‖v‖ ≤ r}
is finite. It is clear that a lattice is a subgroup of Rm that is discrete. The following result
shows the converse.
Lemma 16.1.3. Every discrete subgroup of Rm is a lattice.
Proof: (Sketch) Let {v1, . . . , vn} be a linearly independent subset of L of maximal size.
The result is proved by induction. The case n = 1 is easy (since L is discrete there is
an element of minimal non-zero length). When n > 1 consider V = span{v1, . . . , v_{n−1}}
and set L′ = L ∩ V. By induction, L′ is a lattice and so has a basis b1, . . . , b_{n−1}. The set
L ∩ { Σ_{i=1}^{n−1} xi bi + xn vn : 0 ≤ xi < 1 for 1 ≤ i ≤ n − 1 and 0 < xn ≤ 1 } is finite and so
has an element with smallest xn, call it bn. It can be shown that {b1, . . . , bn} is a basis
for L. For full details see Theorem 6.1 of [582].
Lemma 16.1.9. The determinant of a lattice is independent of the choice of basis matrix
B and the choice of projection P .
Proof: Let P and P′ be two projection matrices corresponding to orthogonal bases
{v1, . . . , vn} and {v′1, . . . , v′n} for V = span{b1, . . . , bn}. Then, by Lemma A.10.3, P′ =
PW for some orthogonal matrix W (hence det(W) = ±1). It follows that |det(BP)| does
not depend on the choice of P.
Let B and B′ be two basis matrices for a lattice L. Then B′ = UB where U is an n × n
matrix such that det(U) = ±1. Then det(L) = |det(BP)| = |det(UBP)| = |det(B′P)|.
We have seen that there are many different choices of basis for a given lattice L. A
fundamental problem is to compute a nice lattice basis for L; specifically one where the
vectors are relatively short and close to orthogonal. The following exercise shows that
these properties are intertwined.
Exercise 16.1.10. Let L be a rank 2 lattice in R2 and let {b1, b2} be a basis for L.
1. Show that

det(L) = ‖b1‖ ‖b2‖ |sin(θ)|    (16.1)

where θ is the angle between b1 and b2.
Proof: Consider first the case where m = n. Then det(L)^2 = det(B) det(B^T) =
det(BB^T) = det((⟨bi, bj⟩)_{i,j}). Hence, when m > n and B′ = BP, det(L) = |det(B′)| =
√(det(B′(B′)^T)). Now, the (i, j)-th entry of B′(B′)^T = (BP)(BP)^T is ⟨biP, bjP⟩, which is
equal to the (i, j)-th entry of BB^T by Lemma 16.1.5. The result follows.
Note that an integer lattice of non-full rank may not have integer determinant.
Exercise 16.1.13. Find an example of a lattice of rank 1 in Z2 whose determinant is
not an integer.
Lemma 16.1.14. Let b1, . . . , bn be an ordered basis for a lattice L in Rm and let b*1, . . . , b*n
be the Gram-Schmidt orthogonalisation. Then det(L) = Π_{i=1}^n ‖b*i‖.
Proof: The case m = n is already proved in Lemma A.10.8. For the general case let
vi = b*i/‖b*i‖ be the orthonormal basis required for the construction of the projection P.
Then P(b*i) = ‖b*i‖ ei. Write B and B* for the n × m matrices formed by the rows bi and
b*i respectively. It follows that B*P is an n × n diagonal matrix with diagonal entries
‖b*i‖. Finally, by the Gram-Schmidt construction, B = UB* for some n × n matrix U
such that det(U) = 1. Combining these facts gives

det(L) = |det(BP)| = |det(UB*P)| = |det(B*P)| = Π_{i=1}^n ‖b*i‖.
Exercise 16.1.15. Let {b1, . . . , bn} be an ordered lattice basis in Rm and let {b*1, . . . , b*n}
be the Gram-Schmidt orthogonalisation. Show that ‖b*i‖ ≤ ‖bi‖ and hence det(L) ≤
Π_{i=1}^n ‖bi‖.
Exercise 16.1.17. Show that the orthogonality defect of {b1 , . . . , bn } is 1 if and only if
the basis is orthogonal.
Definition 16.1.18. Let L ⊆ Rm be a lattice of rank n. The successive minima of L
are λ1, . . . , λn ∈ R such that, for 1 ≤ i ≤ n, λi is minimal such that there exist i linearly
independent vectors v1, . . . , vi ∈ L with ‖vj‖ ≤ λi for 1 ≤ j ≤ i.
It follows that 0 < λ1 ≤ λ2 ≤ · · · ≤ λn. In general there is not a basis consisting
of vectors whose lengths are equal to the successive minima, as the following example
shows.
Example 16.1.19. Let L ⊆ Zn be the set

L = {(x1, . . . , xn) : x1 ≡ x2 ≡ · · · ≡ xn (mod 2)}.

It is easy to check that this is a lattice. The vectors 2ei ∈ L for 1 ≤ i ≤ n are linearly
independent and have length 2. Every other vector x ∈ L with even entries has length
≥ 2. Every vector x ∈ L with odd entries has all xi ≠ 0 and so ‖x‖ ≥ √n.
If n = 2 the successive minima are λ1 = λ2 = √2 and if n = 3 the successive minima
are λ1 = λ2 = λ3 = √3.
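The claims about this parity lattice can be confirmed by brute force for small n. The following sketch (our own illustration) computes the squared first minimum; the search window {−2, . . . , 2} per coordinate suffices because 2e1 ∈ L already has squared length 4.

```python
from itertools import product

# First minimum of L = {x in Z^n : x_1 = ... = x_n (mod 2)} by brute
# force: minimise the squared norm over non-zero vectors with all
# entries of the same parity.
def min_norm_sq(n):
    best = None
    for x in product(range(-2, 3), repeat=n):
        if any(x) and len({xi % 2 for xi in x}) == 1:
            nrm = sum(xi * xi for xi in x)
            best = nrm if best is None else min(best, nrm)
    return best

print([(n, min_norm_sq(n)) for n in range(2, 6)])
# squared first minimum: 2 for n=2, 3 for n=3, 4 for n=4 and n=5
```

For n = 5 the minimum 4 is attained only by vectors like ±2ei, which do not generate L; this is the sense in which no basis achieves the successive minima.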
16.2
We state the following results without rigorously defining the term volume and without
giving proofs (see Section 1.3 of Micciancio and Goldwasser [419], Chapter 1 of Siegel [559],
Chapter 6 of Hoffstein, Pipher and Silverman [288] or Chapter 12 of Cassels [121] for
details).
Theorem 16.2.2. (Minkowski convex body theorem) Let L be a lattice in Rm with basis
{b1, . . . , bn} and let S be any convex set such that S ⊆ span{bi : 1 ≤ i ≤ n}, 0 ∈ S and if
v ∈ S then −v ∈ S. If the volume of S is > 2^n det(L) then there exists a non-zero lattice
point v ∈ S ∩ L.
Proof: See Section III.2.2 of Cassels [121], Theorem 6.28 of Hoffstein, Pipher and Silverman [288], Theorem 1.4 of Micciancio and Goldwasser [419], or Theorem 6.1 of Stewart
and Tall [582].
The convex body theorem is used to prove Theorem 16.2.3. The intuition behind this
result is that if the shortest non-zero vector in a lattice is large then the volume of the
lattice cannot be small.
Theorem 16.2.3. Let n ∈ N. There is a constant 0 < γn ≤ n such that, for any lattice
L of rank n in Rn having first minimum λ1 (for the Euclidean norm),

λ1^2 ≤ γn det(L)^{2/n}.
Proof: See Theorem 1.5 of [419], Theorem 6.25 of [288], or Theorem 12.2.1 of [121].
Exercise 16.2.4. Show that the convex body theorem is tight. In other words, find a
lattice L in Rn for some n and a symmetric convex subset S ⊆ Rn such that the volume
of S is 2^n det(L) and yet S ∩ L = {0}.
Exercise 16.2.5. Show that, with respect to the ℓ∞ norm, λ1 ≤ det(L)^{1/n}. Show that,
with respect to the ℓ1 norm, λ1 ≤ (n! det(L))^{1/n} ≈ n det(L)^{1/n}/e.
Exercise 16.2.6. Let a, b ∈ N. Show that there is a solution r, s, t ∈ Z to r = as + bt
such that √(s^2 + r^2) ≤ √(2b).
Definition 16.2.7. Let n ∈ N. The smallest real number γn such that

λ1^2 ≤ γn det(L)^{2/n}

for all lattices L of rank n is called the Hermite constant.
4. Show that the lattice L ⊆ R2 with basis {(1, 0), (1/2, √3/2)} satisfies λ1^2 =
(2/√3) det(L).
(Optional) Show that L is equal to the ring of algebraic integers of Q(√−3). Show
that centering balls of radius 1/2 at each point of L gives the most dense lattice
packing of balls in R2.
Section 6.5.2 of Nguyen [453] lists the first 8 values of γn, gives the bounds

n/(2πe) + o(1) ≤ γn ≤ (n/(πe))(1 + o(1)),

and gives further references. One also has the following bound on the product of the
successive minima:

(Π_{i=1}^n λi)^{1/n} < √n det(L)^{1/n}.

Proof: See Theorem 12.2.2 of [121]. (The term √n can be replaced by √γn.)
The Gaussian heuristic states that the shortest non-zero vector in a random
lattice L of dimension n in Rn is expected to have length approximately

√(n/(2πe)) det(L)^{1/n}.
We refer to Section 6.5.3 of [453] and Section 6.5.3 of [288] for discussion and references.
16.3
There are several natural computational problems relating to lattices. We start by listing some problems that can be efficiently solved using linear algebra (in particular, the
Hermite normal form).
1. lattice membership: Given an n × m basis matrix B for a lattice L ⊆ Zm and a
vector v ∈ Zm, determine whether v ∈ L.
2. lattice basis: Given a set of vectors b1, . . . , bn in Zm (possibly linearly dependent),
find a basis for the lattice generated by them.
3. kernel lattice: Given an m × n integer matrix A, compute a basis for the lattice
ker(A) = {x ∈ Zm : xA = 0}.
4. kernel lattice modulo M: Given an m × n integer matrix A and an integer M,
compute a basis for the lattice {x ∈ Zm : xA ≡ 0 (mod M)}.
Exercise 16.3.1. Describe explicit algorithms for the above problems and determine
their complexity.
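For the full rank case, the membership problem already reduces to exact linear algebra: v ∈ L if and only if vB^{-1} has integer entries. The sketch below (our own illustration; the general algorithm asked for in the exercise would instead use the Hermite normal form, and the basis matrix here is an arbitrary choice) makes this concrete for a rank 2 lattice using rational arithmetic.

```python
from fractions import Fraction

# Membership test for a full rank lattice: v lies in L exactly when
# the coordinate vector x = v * B^{-1} is integral.
B = [[1001, 0], [0, 2008]]          # basis matrix (rows are basis vectors)

def inv2(M):
    # exact inverse of a 2x2 integer matrix, as Fractions
    d = Fraction(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

def member(v, B):
    Binv = inv2(B)
    x = [v[0] * Binv[0][j] + v[1] * Binv[1][j] for j in range(2)]
    return all(c.denominator == 1 for c in x)

print(member([5005, 6024], B), member([5432, 6000], B))   # True False
```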
Now we list some computational problems that seem to be hard in general.
Definition 16.3.2. Let L be a lattice in Zm.
1. The shortest vector problem (SVP) is the computational problem: given a
basis matrix B for L, compute a non-zero vector v ∈ L such that ‖v‖ is minimal
(i.e., ‖v‖ = λ1).
2. The closest vector problem (CVP) is the computational problem: given a basis
matrix B for L and a vector w ∈ Qm (one can work with high-precision approximations in Rm, but this is essentially still working in Qm), compute v ∈ L such that
‖w − v‖ is minimal.
3. The decision closest vector problem (DCVP) is: given a basis matrix B for a
lattice L, a vector w ∈ Qm and a real number r > 0, decide whether or not there is
a vector v ∈ L such that ‖w − v‖ ≤ r.
4. The decision shortest vector problem is: given a basis matrix B for a lattice
L and a real number r > 0, decide whether or not there is a non-zero v ∈ L such
that ‖v‖ ≤ r.
5. Fix γ > 1. The approximate SVP is: given a basis matrix B for L, compute a
non-zero vector v ∈ L such that ‖v‖ ≤ γ λ1.
6. Fix γ > 1. The approximate CVP is: given a basis matrix B for L and a vector
w ∈ Qm, compute v ∈ L such that ‖w − v‖ ≤ γ ‖w − xB‖ for all x ∈ Zn.
In general, these computational problems are known to be hard2 when the rank is
sufficiently large. It is known that CVP is NP-hard (this is shown by relating CVP with
subset-sum; for details see Chapter 3 of [419]). Also, SVP is NP-hard under randomised
reductions and non-uniform reductions (see Chapter 4 of [419] for explanation of these
terms and proofs). Nguyen [453] gives a summary of the complexity results and current
best running times of algorithms for these problems.
On the other hand, if a lattice is sufficiently nice then these problems may be easy.
Example 16.3.3. Let L ⊆ R2 be the lattice with basis matrix

B = ( 1001 0 ; 0 2008 ).

Then every lattice vector is of the form (1001a, 2008b) where a, b ∈ Z. Hence the shortest
non-zero vectors are clearly (1001, 0) and (−1001, 0). Similarly, the closest vector to
w = (5432, 6000) is clearly (5005, 6024).
Why is this example so easy? The reason is that the basis vectors are orthogonal.
Even in large dimensions, the SVP and CVP problems are easy if one has an orthogonal
basis for a lattice. When given a basis that is not orthogonal it is less obvious whether
there exists a non-trivial linear combination of the basis vectors that gives a vector strictly
shorter than the shortest basis vector. A basis for a lattice that is as close to orthogonal
as it can be is therefore convenient for solving some computational problems.
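With an orthogonal (here diagonal) basis, CVP is solved coordinate-wise by rounding, as the following sketch (our own illustration, using the numbers of Example 16.3.3) shows.

```python
# CVP for a lattice with an orthogonal diagonal basis: round each
# coordinate of the target to the nearest multiple of the diagonal
# entry.  Basis and target from Example 16.3.3.
diag = (1001, 2008)
w = (5432, 6000)

closest = tuple(round(wi / d) * d for wi, d in zip(w, diag))
print(closest)   # (5005, 6024)
```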
2 We do not give details of complexity theory in this book; in particular we do not define the term
NP-hard.
Chapter 17
1 The algorithm was first written down by Lagrange and later by Gauss, but is usually called the
Gauss algorithm. We refer to [452] or Chapter 2 of [461] for the original references.
2 Chapter 1 of [461] gives an excellent survey of the historical development of the algorithm.
17.1
Let b1, b2 ∈ R2 be linearly independent vectors and denote by L the lattice for which they
are a basis. The goal is to output a basis for the lattice such that the lengths of the basis
vectors are as short as possible (in this case, equal to the successive minima). Lagrange and Gauss
gave the following criteria for a basis to be reduced and then developed Algorithm 23 to
compute such a basis.
Definition 17.1.1. An ordered basis b1, b2 for R2 is Lagrange-Gauss reduced if ‖b1‖ ≤
‖b2‖ ≤ ‖b2 + qb1‖ for all q ∈ Z.
The following theorem shows that the vectors in a Lagrange-Gauss reduced basis are
as short as possible. This result holds for any norm, though the algorithm presented
below is only for the Euclidean norm.
Theorem 17.1.2. Let λ1, λ2 be the successive minima of L. If L has an ordered basis
{b1, b2} that is Lagrange-Gauss reduced then ‖bi‖ = λi for i = 1, 2.
Proof: By definition we have

‖b2 + qb1‖ ≥ ‖b2‖ ≥ ‖b1‖

for all q ∈ Z. Let v = l1 b1 + l2 b2 be any non-zero point in L. If l2 = 0 then ‖v‖ ≥ ‖b1‖. If l2 ≠ 0
then write l1 = ql2 + r with q, r ∈ Z such that 0 ≤ r < |l2|. Then v = rb1 + l2(b2 + qb1)
and, by the triangle inequality,

‖v‖ ≥ |l2| ‖b2 + qb1‖ − r‖b1‖ ≥ ‖b2‖ + (|l2| − 1)‖b2‖ − r‖b1‖ ≥ ‖b2‖,    (17.1)

since |l2| − 1 ≥ r and ‖b2‖ ≥ ‖b1‖. The result follows.

The quantity ‖b2 − λb1‖²
is minimised at λ = ⟨b1, b2⟩/B1, where B1 = ⟨b1, b1⟩ (to see this, note that the graph as a function of λ is a
parabola and that the minimum can be found by differentiating with respect to λ). Since
we are working in a lattice we therefore replace b2 by b2 − ⌈λ⌋b1 where ⌈λ⌋ is the nearest
integer to λ. Hence lines 3 and 9 of Algorithm 23 reduce the size of b2 as much as possible
using b1. In the one-dimensional case the formula b2 − ⌈λ⌋b1 is the familiar operation
r_{i+1} = r_{i−1} − ⌊r_{i−1}/r_i⌋ r_i from Euclid's algorithm.
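The reduce-and-swap loop just described can be sketched in a few lines of Python (this follows the description above rather than the book's Algorithm 23 verbatim, and uses floating-point rounding, which is exact enough for small examples):

```python
def lagrange_gauss(b1, b2):
    # 2-dimensional lattice basis reduction for the Euclidean norm:
    # repeatedly subtract the nearest-integer multiple of b1 from b2,
    # and swap whenever b2 becomes the shorter vector.
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]
    while True:
        mu = round(dot(b1, b2) / dot(b1, b1))   # nearest integer to <b1,b2>/<b1,b1>
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
        if dot(b2, b2) < dot(b1, b1):
            b1, b2 = b2, b1                     # swap and continue reducing
        else:
            return b1, b2

print(lagrange_gauss((23, 24), (24, 25)))   # ((-1, 0), (0, 1))
```

The input basis {(23, 24), (24, 25)} generates Z², and the output is a pair of orthogonal vectors of length 1, matching the successive minima.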
Lemma 17.1.4. An ordered basis {b1, b2} is Lagrange-Gauss reduced if and only if
‖b1‖ ≤ ‖b2‖ ≤ ‖b2 ± b1‖.
Proof: The forward implication is trivial. For the converse, suppose ‖b2‖ ≤ ‖b2 ± b1‖.
We use the fact that the graph of F(λ) = ‖b2 + λb1‖² is a parabola. It follows that the
17.1.1
λ = ⟨b1, b2⟩/⟨b1, b1⟩ = (s_{i−1} s_i + r_{i−1} r_i)/(s_i^2 + r_i^2).
Hence the operation v = b2 − ⌈λ⌋b1 is v = (s_{i−1} − q s_i, r_{i−1} − q r_i), which agrees with
Euclid's algorithm. Finally, the Lagrange-Gauss algorithm compares the lengths of the
vectors v and b1 to see if they should be swapped. When s_{i+1} is small compared with r_{i+1}
then ‖v‖ is smaller than ‖b1‖. Hence the vectors are swapped and the matrix becomes

( s_{i−1} − q s_i   r_{i−1} − q r_i ; s_i   r_i )

just as in Euclid's algorithm.
The algorithms start to deviate once the s_i become large (this can already happen on the
second iteration, as the below example shows). Further, Euclid's algorithm runs until
r_i = 0 (in which case s_i ≈ b) whereas Lagrange-Gauss reduction stops when r_i ≈ s_i.
Example 17.1.11. Let a = 19 and b = 8. The sequence of remainders in the signed
Euclidean algorithm is 3, 1 while the Lagrange-Gauss lattice basis reduction algorithm
computes remainders 3, 2.
Example 17.1.12. Consider a = 8239876 and b = 1020301, which have gcd equal to
one. Let

B = ( 0 b ; 1 a ).

Running the Lagrange-Gauss algorithm on this matrix gives

( 540 379 ; 619 −1455 ).

One can verify that

379 = 540a + t4 b where t4 = −4361

and

−1455 = 619a + t5 b where t5 = −4999.
17.2
This section presents the crucial definition from [370] and some of its consequences. The
main result is Theorem 17.2.12, which shows that an LLL-reduced lattice basis does have
good properties.
Recall first that if b1, . . . , bn is a set of vectors in Rm then one can define the Gram-Schmidt orthogonalisation b*1, . . . , b*n as in Section A.10.2. We use the notation μ_{i,j} =
⟨bi, b*j⟩/⟨b*j, b*j⟩ throughout.
As we have noted in Example 16.3.3, computational problems in lattices can be easy
if one has a basis that is orthogonal, or sufficiently close to orthogonal. A simple but
important observation is that one can determine when a basis is close to orthogonal by
considering the lengths of the Gram-Schmidt vectors. More precisely, a lattice basis is
close to orthogonal if the lengths of the Gram-Schmidt vectors do not decrease too
rapidly.
Example 17.2.1. Two bases for $\mathbb{Z}^2$ are $\{(1, 0), (0, 1)\}$ and $\{(23, 24), (24, 25)\}$. In the first case, the Gram-Schmidt vectors both have length 1. In the second case the Gram-Schmidt vectors are $b_1^* = (23, 24)$ and $b_2^* = (24/1105, -23/1105)$, which have lengths $\sqrt{1105} \approx 33.24$ and $1/\sqrt{1105} \approx 0.03$ respectively. The fact that the lengths of the Gram-Schmidt vectors dramatically decrease reveals that the original basis is not of good quality.
The Lovász condition states that
$$B_i \ge \left( \delta - \mu_{i,i-1}^2 \right) B_{i-1}.$$
1. The Lovász condition implies $B_i \ge (\tfrac{3}{4} - \tfrac{1}{4}) B_{i-1} = \tfrac{1}{2} B_{i-1}$ and the result follows by induction.
2. From $b_i = b_i^* + \sum_{j=1}^{i-1} \mu_{i,j} b_j^*$ we have
$$\|b_i\|^2 = \langle b_i, b_i \rangle = \Big\langle b_i^* + \sum_{j=1}^{i-1} \mu_{i,j} b_j^*,\ b_i^* + \sum_{j=1}^{i-1} \mu_{i,j} b_j^* \Big\rangle = B_i + \sum_{j=1}^{i-1} \mu_{i,j}^2 B_j,$$
which, using $\mu_{i,j}^2 \le \tfrac{1}{4}$ and part 1, is at most $B_i \big( 1 + \tfrac{1}{4} \sum_{j=1}^{i-1} 2^{i-j} \big) = B_i \big( 1 + \tfrac{1}{4}(2^i - 2) \big)$.
Lemma 17.2.9. Let $\{b_1, \dots, b_n\}$ be an LLL-reduced basis with $\delta = 1/4 + 1/\sqrt{2} \approx 0.957$ for a lattice $L \subseteq \mathbb{R}^m$. Let the notation be as above. In particular, $\|b\|$ is the Euclidean norm.
1. $B_j \le 2^{(i-j)/2} B_i$ for $1 \le j \le i \le n$.
2. $B_i \le \|b_i\|^2 \le \big( \tfrac{1}{6} + 2^{(i-1)/2} \big) B_i$ for $1 \le i \le n$.
3. $\|b_j\| \le 2^{i/4} \|b_i^*\|$ for $1 \le j \le i \le n$.
Exercise 17.2.10. Prove Lemma 17.2.9.
Lemma 17.2.11. Let $\{b_1, \dots, b_n\}$ be an ordered basis for a lattice $L \subseteq \mathbb{R}^m$ and let $\{b_1^*, \dots, b_n^*\}$ be the Gram-Schmidt orthogonalisation. Let $\lambda_1$ be the length of the shortest non-zero vector in the lattice. Then
$$\lambda_1 \ge \min_{1 \le i \le n} \|b_i^*\|.$$
Proof:
1. From part 1 of Lemma 17.2.8 we have $\|b_i^*\| \ge 2^{-(i-1)/2} \|b_1\|$. Hence, Lemma 17.2.11 implies
$$\lambda_1 \ge \min_{1 \le i \le n} \|b_i^*\| \ge 2^{(1-n)/2} \|b_1\|.$$
3. Since $\{b_1, \dots, b_i\}$ are linearly independent we have $\lambda_i \le \max_{1 \le j \le i} \|b_j\|$ and by part 3 of Lemma 17.2.8 each $\|b_j\| \le 2^{(i-1)/2} \|b_i^*\|$. Using $\|b_i^*\| \le \|b_i\|$ we obtain the lower bound on $\|b_i\|$.
4. By Lemma 16.1.14 we have $\det(L) = \prod_{i=1}^{n} \|b_i^*\|$. The result follows from $\|b_i^*\| \le \|b_i\| \le 2^{(i-1)/2} \|b_i^*\|$.
Corollary 17.2.13. If $\|b_1\| \le \|b_i^*\|$ for all $1 \le i \le n$ then $b_1$ is a correct solution to SVP.
Exercise 17.2.14. Prove Corollary 17.2.13.
Exercise 17.2.15. Suppose $L$ is a lattice in $\mathbb{Z}^m$ and let $\{b_1, \dots, b_n\}$ be an LLL-reduced basis. Rename these vectors as $v_1, \dots, v_n$ such that $1 \le \|v_1\| \le \|v_2\| \le \dots \le \|v_n\|$. Show that one does not necessarily have $\|v_1\| = \|b_1\|$. Show that, for $1 \le i \le n$,
$$\|v_i\| \le 2^{n(n-1)/4} \det(L)^{1/(n+1-i)}.$$
As a final remark, the results in this section have only given upper bounds on the
sizes of kbi k in an LLL-reduced lattice basis. In many practical instances, one finds that
LLL-reduced lattice vectors are much shorter than these bounds might suggest.
17.3 The Gram-Schmidt Algorithm
The LLL algorithm requires computing a Gram-Schmidt basis. For the complexity analysis of the LLL algorithm it is necessary to give a more careful description and analysis of the Gram-Schmidt algorithm than was done in Section A.10.2. We present pseudocode in Algorithm 24 (the "downto" in line 4 is not necessary, but we write it that way for future reference in the LLL algorithm).
Algorithm 24 Gram-Schmidt algorithm
Input: $\{b_1, \dots, b_n\}$ in $\mathbb{R}^m$
Output: $\{b_1^*, \dots, b_n^*\}$ in $\mathbb{R}^m$
1: $b_1^* = b_1$
2: for $i = 2$ to $n$ do
3:   $v = b_i$
4:   for $j := i - 1$ downto $1$ do
5:     $\mu_{i,j} = \langle b_i, b_j^* \rangle / \langle b_j^*, b_j^* \rangle$
6:     $v = v - \mu_{i,j} b_j^*$
7:   end for
8:   $b_i^* = v$
9: end for
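Algorithm 24 translates almost line for line into code. The following sketch uses exact rational arithmetic, as the analysis in Theorem 17.3.4 below assumes; the function and variable names are illustrative.

```python
from fractions import Fraction

def gram_schmidt(basis):
    """Algorithm 24: Gram-Schmidt orthogonalisation, exact over Q."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    bstar = []
    for b in basis:                  # i = 1, ..., n
        v = [Fraction(x) for x in b]
        for bs in reversed(bstar):   # j = i-1 downto 1
            mu = dot(b, bs) / dot(bs, bs)
            v = [vi - mu * wi for vi, wi in zip(v, bs)]
        bstar.append(v)
    return bstar

# Example 17.2.1: the second basis of Z^2
bs = gram_schmidt([(23, 24), (24, 25)])
```

Running it on the basis of Example 17.2.1 reproduces $b_2^* = (24/1105, -23/1105)$, whose tiny length signals the poor quality of that basis.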
Hence $d_{i-1} b_i^* \in L \subseteq \mathbb{Z}^n$ as required.
4. By the previous results we have $d_j \mu_{i,j} = d_{j-1} B_j \langle b_i, b_j^* \rangle / B_j = \langle b_i, d_{j-1} b_j^* \rangle \in \mathbb{Z}$.
Exercise 17.3.3. Consider the vector $v = b_i - \sum_{k=j}^{i-1} \mu_{i,k} b_k^*$ during iteration $j$. Show that
$$\|v\|^2 = \|b_i\|^2 - \sum_{k=j}^{i-1} \mu_{i,k}^2 \|b_k^*\|^2.$$
Deduce that $\|v\| \le \|b_i\|$ and that $d_{i-1} v \in \mathbb{Z}^m$ throughout the loop in line 4 of the algorithm.
Theorem 17.3.4. Let $b_1, \dots, b_n$ be vectors in $\mathbb{Z}^m$. Let $X \in \mathbb{Z}_{\ge 2}$ be such that $\|b_i\|^2 \le X$ for $1 \le i \le n$. Then the Gram-Schmidt algorithm performs $O(n^4 m \log(X)^2)$ bit operations. The output size is $O(n^2 m \log(X))$.
Proof: One runs Algorithm 24 using exact arithmetic over $\mathbb{Q}$ for the vectors $b_i^*$. Lemma 17.3.2 shows that the denominators in $b_i^*$ are all factors of $d_{i-1}$, which has size $\prod_{j=1}^{i-1} B_j \le \prod_{j=1}^{i-1} \|b_j\|^2 \le X^{i-1}$. Also, $\|b_i^*\| \le \|b_i\| \le \sqrt{X}$, so the numerators are bounded by $X^i$. The size of each vector $b_i^*$ and, by Exercise 17.3.3, the intermediate steps $v$ in the computation are therefore $O(mi\log(X))$ bits, which gives the output size of the algorithm. The computation $\langle b_i, b_j^* \rangle$ requires $O(mn\log(X)^2)$ bit operations and the computation $\langle b_j^*, b_j^* \rangle$ requires $O(mn^2\log(X)^2)$ bit operations. As there are $O(n^2)$ vector operations to perform, one gets the stated running time.
Corollary 17.3.5. Let the notation be as in Theorem 17.3.4 and let $L$ be the lattice in $\mathbb{Z}^m$ with basis $\{b_1, \dots, b_n\}$. Then one can compute $\det(L)^2$ in $O(n^4 m \log(X)^2)$ bit operations.$^4$
Proof: Lemma 16.1.14 implies $\det(L)^2 = \prod_{i=1}^{n} \|b_i^*\|^2$. One computes the $b_i^*$ using exact (naive) arithmetic over $\mathbb{Q}$ in $O(n^4 m \log(X)^2)$ bit operations. One computes each $\|b_i^*\|^2 \in \mathbb{Q}$ in $O(mn^2 \log(X)^2)$ bit operations. Since $\|b_i^*\|^2 \le X$ and $d_{i-1} \|b_i^*\|^2 \in \mathbb{Z}$ it follows that $\|b_i^*\|^2$ is a ratio of integers bounded by $X^n$. One computes the product of the $\|b_i^*\|^2$ in $O(n^3 \log(X)^2)$ bit operations (since the integers in the product are bounded by $X^{n^2}$). Finally, one can reduce the fraction using Euclid's algorithm and division in $O(n^4 \log(X)^2)$ bit operations.
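Corollary 17.3.5 can be illustrated directly. The sketch below computes $\det(L)^2$ as the product of the $\|b_i^*\|^2$ using exact rational arithmetic (naively, with none of the complexity bookkeeping of the proof).

```python
from fractions import Fraction

def det_squared(basis):
    """det(L)^2 = product of ||b_i*||^2, computed exactly over Q."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    bstar, d2 = [], Fraction(1)
    for b in basis:
        v = [Fraction(x) for x in b]
        for bs in bstar:
            # subtract the projection of b onto each earlier b_j*
            v = [vi - (dot(b, bs) / dot(bs, bs)) * wi for vi, wi in zip(v, bs)]
        bstar.append(v)
        d2 *= dot(v, v)
    return d2
```

For the basis $\{(23, 24), (24, 25)\}$ of $\mathbb{Z}^2$ this returns 1, as it must.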
17.4 The LLL Algorithm
Lemma 17.4.1. Throughout the LLL algorithm the values $b_i^*$ and $B_i$ for $1 \le i \le n$ and $\mu_{i,j}$ for $1 \le j < i \le n$ are all correct Gram-Schmidt values.
$^4$Since $\det(L)^2 \in \mathbb{Z}$ while $\det(L)$ may not be rational if $n < m$, we prefer to work with $\det(L)^2$.
Exercise 17.4.2. Prove Lemma 17.4.1. In other words, show that line 6 of the LLL algorithm does not change $b_i^*$ or $B_i$ for $1 \le i \le n$. Similarly, line 12 of the algorithm does not change any values except those mentioned in line 13.
It is illuminating to compare the LLL algorithm with the Lagrange-Gauss reduction algorithm. The basic concept of size reduction followed by a swap is the same; however, there are two crucial differences.
1. The size reduction operation in the Lagrange-Gauss algorithm gives the minimal value for $\|b_2 - qb_1\|$ over $q \in \mathbb{Z}$. In LLL the coefficient $\lfloor \mu_{k,j} \rceil$ is chosen to depend on $b_k$ and $b_j^*$ so it does not necessarily minimise $\|b_k\|$. Indeed, $\|b_k\|$ can grow during the algorithm. Of course, in the two-dimensional case of LLL then $\lfloor \mu_{2,1} \rceil$ is the same as the value used in the Lagrange-Gauss algorithm and so the size reduction step is the same.
2. The size check in LLL (the Lovász condition) is on the lengths of the Gram-Schmidt vectors, unlike the size check in the Lagrange-Gauss algorithm, which is on the lengths of the basis vectors themselves.
These features of LLL may seem counterintuitive, but they are essential to the proof that
the algorithm runs in polynomial-time.
Lemma 17.4.3. If $b_k$ and $b_{k-1}$ are swapped then the Gram-Schmidt vectors $b_i^*$ for $1 \le i \le n$ are changed as follows.
1. For $1 \le i < k-1$ and $k < i \le n$, $b_i^*$ is unchanged.
2. The new value for $b_{k-1}^*$ is $b_k^* + \mu_{k,k-1} b_{k-1}^*$ and the new value for $B_{k-1}$ is $B_{k-1}' = B_k + \mu_{k,k-1}^2 B_{k-1}$.
3. The new value for $B_k$ is $B_k' = B_{k-1} B_k / B_{k-1}'$.
Proof: Denote by $b_i'$ the new basis (i.e., $b_{k-1}' = b_k$ and $b_k' = b_{k-1}$), by $b_i'^*$ and $\mu_{i,j}'$ the new Gram-Schmidt values, and by $B_i'$ the squares of the lengths of the $b_i'^*$. Clearly $b_i'^* = b_i^*$ for $1 \le i < k-1$ and $\mu_{i,j}' = \mu_{i,j}$ for $1 \le j < i < k-1$. Now
$$b_{k-1}'^* = b_k - \sum_{j=1}^{k-2} \mu_{k,j} b_j^* = b_k^* + \mu_{k,k-1} b_{k-1}^*.$$
Hence, $B_{k-1}' = B_k + \mu_{k,k-1}^2 B_{k-1}$. Next,
$$b_k'^* = b_{k-1} - \sum_{j=1}^{k-2} \mu_{k-1,j} b_j^* - \frac{\langle b_{k-1}, b_{k-1}'^* \rangle}{B_{k-1}'} b_{k-1}'^* = b_{k-1}^* - \frac{\mu_{k,k-1} B_{k-1}}{B_{k-1}'} b_{k-1}'^* = \Big( 1 - \frac{\mu_{k,k-1}^2 B_{k-1}}{B_{k-1}'} \Big) b_{k-1}^* - \frac{\mu_{k,k-1} B_{k-1}}{B_{k-1}'} b_k^*.$$
The result for $b_k'^*$ follows since $1 - \mu_{k,k-1}^2 B_{k-1} / B_{k-1}' = B_k / B_{k-1}'$. Finally,
$$B_k' = \langle b_k'^*, b_k'^* \rangle = \big( B_k^2 B_{k-1} + \mu_{k,k-1}^2 B_{k-1}^2 B_k \big) / B_{k-1}'^2 = B_{k-1} B_k / B_{k-1}'.$$
Exercise 17.4.4. Give explicit formulae for updating the other Gram-Schmidt values in lines 7 and 13 of Algorithm 25.
Exercise 17.4.5. Show that it is not necessary to store or update the values $b_i^*$ for $1 \le i \le n$ in the LLL algorithm once the values $B_i$ have been computed.
Exercise 17.4.6. Show that the condition in line 9 of Algorithm 25 can be checked immediately after $\mu_{k,k-1}$ has been computed. Hence, show that the cases $1 \le j < k-1$ in the loop in lines 5 to 8 of Algorithm 25 can be postponed to line 10.
Lemma 17.4.7. If the LLL algorithm terminates then the output basis is LLL-reduced.
Exercise 17.4.8. Prove Lemma 17.4.7. Indeed, the fact that the Lovász conditions are satisfied is immediate. Prove the bounds on the $\mu_{i,j}$ using the three following steps. Let $1 \le j < k$ and let $b_k' = b_k - \lfloor \mu_{k,j} \rceil b_j$.
1. Prove that $\langle b_j, b_j^* \rangle = \langle b_j^*, b_j^* \rangle$ and $\langle b_j, b_i^* \rangle = 0$ if $j < i$.
2. Hence, writing $\mu_{k,j}' = \langle b_k', b_j^* \rangle / \langle b_j^*, b_j^* \rangle$, prove that $|\mu_{k,j}'| \le 1/2$ for $1 \le j < k$.
3. For $j < i < k$ denote $\mu_{k,i}' = \langle b_k', b_i^* \rangle / \langle b_i^*, b_i^* \rangle$. Prove that $\mu_{k,i}' = \mu_{k,i}$.
In the next section we show that the LLL algorithm does terminate. Before then we
give an example and some further discussion.
Example 17.4.9. Let $L$ be the lattice with basis matrix
$$B = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 2 & 15 \\ 0 & 0 & 3 \end{pmatrix}.$$
We compute $\mu_{2,1} = 4$ and so size reduction replaces $b_2$ by $b_2 - 4b_1 = (0, 2, 15)$. Now $B_1 = 1$, $B_2 = 229$ and $B_2 > (3/4 - \mu_{2,1}^2)B_1$ and so we set $k = 3$. Now consider $b_3$. We compute $\mu_{3,2} = 45/229 \approx 0.19$ and, since $q_2 = 0$ there is no reduction to be performed on $b_3$. We compute $\mu_{3,1} = 0$, so again no size reduction is required. We now compute
$$b_3^* = b_3 - \tfrac{45}{229} b_2^*.$$
We have $B_2 = 229$ and $B_3 = \langle b_3^*, b_3^* \rangle = 8244/52441 \approx 0.157$. From this one can check that $B_3 < (3/4 - \mu_{3,2}^2)B_2 \approx 162.9$. Hence we swap $b_2$ and $b_3$ and set $k = 2$.
At this point we have the vectors
$$b_1 = (1, 0, 0) \quad \text{and} \quad b_2 = (0, 0, 3)$$
and $b_1^* = b_1$, $b_2^* = b_2$. First check that $\mu_{2,1} = 0$ and so no size reduction on $b_2$ is required. Second, $B_1 = 1$ and $B_2 = 9$ and one checks that $B_2 > (3/4 - \mu_{2,1}^2)B_1 = 0.75$. Hence we set $k = 3$. Now
$$b_3 = (0, 2, 15)$$
and we compute $\mu_{3,2} = 45/9 = 5$. Hence we reduce
$$b_3 = b_3 - 5b_2 = (0, 2, 0).$$
Now compute $\mu_{3,1} = 0$, so no reduction is required.
One computes $\mu_{3,2} = 0$, $b_3^* = b_3$ and $B_3 = 4$. Hence, $B_3 < (3/4 - \mu_{3,2}^2)B_2 = 27/4 = 6.75$ and so we swap $b_2$ and $b_3$ and set $k = 2$. One can check that the $k = 2$ phase runs without making any changes. We have $B_1 = 1$ and $B_2 = 4$. Consider now $k = 3$ again. We have $\mu_{3,2} = \mu_{3,1} = 0$ and so $b_3$ remains unchanged. Finally, $B_3 = 9 > (3/4 - \mu_{3,2}^2)B_2 = 3$ and so we set $k = 4$ and halt.
Exercise 17.4.10. Perform the LLL algorithm by hand on the basis
{(1, 5, 0), (2, 5, 0), (8, 6, 16)} .
Exercise 17.4.11. Perform the LLL algorithm by hand on the basis
{(0, 3, 4), (1, 3, 3), (5, 4, 7)} .
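Since Algorithm 25 itself is lengthy, the following deliberately simple sketch can be used to check Example 17.4.9 and the two exercises above. It recomputes the Gram-Schmidt data at every step rather than updating it as Algorithm 25 does, so it is far less efficient, but the size-reduction and Lovász-swap logic is the same.

```python
from fractions import Fraction

def lll(basis, delta=Fraction(3, 4)):
    """A simple LLL sketch: size reduction plus Lovász-condition swaps."""
    b = [[Fraction(x) for x in row] for row in basis]
    n = len(b)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def gso():
        bstar = []
        for i in range(n):
            v = b[i][:]
            for w in bstar:
                v = [vi - (dot(b[i], w) / dot(w, w)) * wi for vi, wi in zip(v, w)]
            bstar.append(v)
        return bstar

    k = 1
    while k < n:
        bstar = gso()
        for j in range(k - 1, -1, -1):       # size-reduce b_k
            q = round(dot(b[k], bstar[j]) / dot(bstar[j], bstar[j]))
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
        mu = dot(b[k], bstar[k - 1]) / dot(bstar[k - 1], bstar[k - 1])
        Bk, Bk1 = dot(bstar[k], bstar[k]), dot(bstar[k - 1], bstar[k - 1])
        if Bk >= (delta - mu * mu) * Bk1:    # Lovász condition
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in b]

reduced = lll([[1, 0, 0], [4, 2, 15], [0, 0, 3]])   # Example 17.4.9
```

On the basis of Example 17.4.9 this reproduces the final basis $(1,0,0)$, $(0,2,0)$, $(0,0,3)$ obtained in the walkthrough.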
Remark 17.4.12. Part 1 of Theorem 17.2.12 shows we have $\|b_1\| \le 2^{(n-1)/2} \lambda_1$. In other words, the LLL algorithm solves SVP up to an exponential factor but is not guaranteed to output a shortest vector in the lattice. Hence, LLL does not officially solve SVP. In practice, at least for relatively small dimensions, the vector $b_1$ output by the LLL algorithm is often much closer to the shortest vector than this bound would suggest, and in many cases will be a shortest vector in the lattice. In Example 17.4.9, the theoretical bound gives $\|b_1\| \le 2$, so $(0, 2, 0)$ would have been a possible value for $b_1$ (but it wasn't).
17.5
Complexity of LLL
We now show that the LLL algorithm terminates and runs in polynomial-time. The original paper of Lenstra, Lenstra and Lovász [370] proves polynomial termination for any lattice in $\mathbb{R}^m$ but only gives a precise complexity for lattices in $\mathbb{Z}^m$.
Theorem 17.5.1. Let $L$ be a lattice in $\mathbb{Z}^m$ with basis $b_1, \dots, b_n$ and let $X \in \mathbb{Z}_{\ge 2}$ be such that $\|b_i\|^2 \le X$ for $1 \le i \le n$. Let $1/4 < \delta < 1$. Then the LLL algorithm with factor $\delta$ terminates and performs $O(n^2 \log(X))$ iterations.
Proof: We need to bound the number of backtracks in Algorithm 25. This number is at most $n$ plus the number of swaps. So it suffices to bound the number of swaps by $O(n^2 \log(X))$.
For $1 \le i \le n-1$ define the $i \times m$ matrix $B_{(i)}$ formed by the first $i$ basis vectors for the lattice. Define $d_i = \det(B_{(i)} B_{(i)}^T) \in \mathbb{Z}$, which is the square of the volume of the sublattice generated by the rows of $B_{(i)}$. Hence
$$d_i = \prod_{j=1}^{i} B_j = \prod_{j=1}^{i} \|b_j^*\|^2 \le \prod_{j=1}^{i} \|b_j\|^2 \le X^i.$$
Define
$$D = \prod_{i=1}^{n-1} d_i = \prod_{i=1}^{n-1} B_i^{n-i}.$$
It follows that $D \le X^{(n-1)n/2}$.
Two vectors $b_k$ and $b_{k-1}$ are swapped when $B_k < (\delta - \mu_{k,k-1}^2) B_{k-1}$. Let $d_i'$ be the new values for the $d_i$. By Lemma 17.4.3 we have $d_i' = d_i$ when $1 \le i < k-1$. By the Lovász condition $B_{k-1}' < \delta B_{k-1}$. Hence, $d_{k-1}' < \delta\, d_{k-1}$. Finally, since $B_{k-1}' B_k' = B_{k-1} B_k$ we
$$b_i^* = b_i - \sum_{j=1}^{i-1} \mu_{i,j} b_j^* \qquad (17.2)$$
This gives a worst-case complexity of O(n8 m4 log(X)3 ) bit operations for lattice
basis reduction.
Some applications such as simultaneous Diophantine approximation (see Section 19.5)
and the hidden number problem (see Section 21.7.1) have at most m non-integer
entries, giving a complexity of O(n5 m4 log(X)3 ) bit operations.
17.6 Variants of the LLL Algorithm
There are many refinements of the LLL algorithm that are beyond the scope of the brief
summary in this book. We list some of these now.
As mentioned earlier, it is necessary to use floating-point arithmetic to obtain a fast version of the LLL algorithm. A variant of floating-point LLL whose running time grows quadratically in $\log(X)$ (rather than cubically, as usual) is the $L^2$ algorithm of Nguyen and Stehlé [451] (also see Stehlé [578]).
Schnorr-Euchner deep insertions. The idea is that, rather than just swapping $b_k$ and $b_{k-1}$ in the LLL algorithm, one can move $b_k$ much earlier in the list of vectors if $B_k$ is sufficiently small. With standard LLL we have shown that swapping $b_k$ and $b_{k-1}$ changes $B_{k-1}$ to $B_k + \mu_{k,k-1}^2 B_{k-1}$. A similar argument shows that inserting $b_k$ between $b_{i-1}$ and $b_i$ for some $1 < i < k$ changes $B_i$ to
$$B = B_k + \sum_{j=i}^{k-1} \mu_{k,j}^2 B_j.$$
Hence, one can let $i$ be the smallest index such that $B < \tfrac{3}{4} B_i$ and insert $b_k$ between $b_{i-1}$ and $b_i$ (i.e., reorder the vectors $b_i, \dots, b_k$ as $b_k, b_i, \dots, b_{k-1}$). We refer to Schnorr and Euchner [522] and Section 2.6.2 of Cohen [135] for more details.
Our presentation of the LLL algorithm was for the Euclidean norm. The algorithm has been extended to work with any norm by Lovász and Scarf [393] (also see Kaib and Ritter [322]).
In practice, if one wants results for a norm other than the Euclidean norm, one
usually performs ordinary LLL reduction with respect to the Euclidean norm and
then uses the standard relations between norms (Lemma A.10.2) to determine the
quality of the resulting vectors.
Another important approach to lattice basis reduction is the block Korkine-Zolotarev
algorithm due to Schnorr [517]. We mention this further in Section 18.5.
Chapter 18
Exercise 18.0.2. Let $L = \mathbb{Z}^n$ and $w = (1/2, \dots, 1/2)$. Show that $\|w - v\| \ge \sqrt{n}/2$ for all $v \in L$. Hence, show that if $n > 4$ then $\|w - v\| > \lambda_n$ for all $v \in L$.
18.1 Babai's Nearest Plane Method
Let $L$ be a full rank lattice given by an (ordered) basis $\{b_1, \dots, b_n\}$ and let $\{b_1^*, \dots, b_n^*\}$ be the corresponding Gram-Schmidt basis. Let $w \in \mathbb{R}^n$. Babai [18] presented a method to inductively find a lattice vector close to $w$. The vector $v \in L$ output by Babai's method is not guaranteed to be such that $\|w - v\|$ is minimal. Theorem 18.1.6 shows that if the
Figure 18.1: Illustration of the Babai nearest plane method. The x-axis represents the
subspace U (which has dimension n 1) and the y-axis is perpendicular to U .
lattice basis is LLL-reduced then kw vk is within an exponential factor of the minimal
value.
We now describe the method. Define $U = \mathrm{span}\{b_1, \dots, b_{n-1}\}$ and let $L' = L \cap U$ be the sublattice spanned by $\{b_1, \dots, b_{n-1}\}$. The idea of the nearest plane method is to find a vector $y \in L$ such that the distance from $w$ to the plane $U + y$ is minimal. One then sets $w'$ to be the orthogonal projection of $w$ onto the plane $U + y$ (in other words, $w' \in U + y$ and $w' - w \in U^\perp$). Let $w'' = w' - y \in U$. Note that if $w \notin L$ then $w'' \notin L'$. One inductively solves the (lower dimensional) closest vector problem of $w''$ in $L'$ to get $y' \in L'$. The solution to the original instance of the CVP is $v = y + y'$.
We now explain how to algebraically find $y$ and $w'$.
Lemma 18.1.1. Let
$$w = \sum_{j=1}^{n} l_j b_j^* \qquad (18.1)$$
with $l_j \in \mathbb{R}$, and let $y = \lfloor l_n \rceil b_n$. Then the distance from $w$ to the plane $U + y$ is minimal over elements $y \in L$.
Proof: Let $w$ be as in equation (18.1) and let $y = \sum_{j=1}^{n} l_j' b_j$ be any element of $L$ for $l_j' \in \mathbb{Z}$. One can write $y = \sum_{j=1}^{n-1} l_j'' b_j^* + l_n' b_n^*$ for some $l_j'' \in \mathbb{R}$, $1 \le j \le n-1$.
It follows that one must take $l_n' = \lfloor l_n \rceil$, and so the choice of $y$ in the statement of the Lemma is correct (note that one can add any element of $L'$ to $y$ and it is still a valid choice).
The vector $w'$ satisfies
$$w' - y = \sum_{j=1}^{n-1} l_j b_j^* + l_n' (b_n^* - b_n) \in U$$
and
$$w - w' = \sum_{j=1}^{n} l_j b_j^* - \sum_{j=1}^{n-1} l_j b_j^* - l_n' b_n^* = (l_n - l_n') b_n^*. \qquad (18.2)$$
Exercise 18.1.3. Let $\{b_1, \dots, b_n\}$ be an ordered basis for a lattice $L$. Let $w \in \mathbb{R}^n$ and suppose that there is an element $v \in L$ such that $\|v - w\| < \tfrac{1}{2} \|b_i^*\|$ for all $1 \le i \le n$. Prove that the nearest plane algorithm outputs $v$.
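The method itself is short enough to sketch in Python. The basis used below is the worked CVP instance with target $w = (10, 6, 5)$ appearing later in this chapter; the signs of its entries are an assumption, reconstructed to match the Gram-Schmidt data quoted there.

```python
from fractions import Fraction

def nearest_plane(basis, w):
    """Babai's nearest plane method (a sketch; expects an LLL-reduced basis)."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Gram-Schmidt basis, exact over Q
    bstar = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for bs in bstar:
            v = [vi - (dot(b, bs) / dot(bs, bs)) * wi for vi, wi in zip(v, bs)]
        bstar.append(v)

    t = [Fraction(x) for x in w]
    v = [Fraction(0)] * len(w)
    for b, bs in zip(reversed(basis), reversed(bstar)):  # i = n downto 1
        c = round(dot(t, bs) / dot(bs, bs))              # nearest plane index
        t = [ti - c * bi for ti, bi in zip(t, b)]
        v = [vi + c * bi for vi, bi in zip(v, b)]
    return tuple(int(x) for x in v)

# assumed reconstruction of the basis from the worked example
v = nearest_plane([(1, 2, 3), (3, 0, -3), (3, -7, 3)], (10, 6, 5))
```

For this instance the output lattice vector is at squared distance 5 from $w$ (the tie in the final rounding step means either of two equally close vectors may be returned, depending on the rounding convention).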
The following Lemma is needed to prove the main result, namely Theorem 18.1.6.
Lemma 18.1.4. Let $\{b_1, \dots, b_n\}$ be LLL-reduced (with respect to the Euclidean norm, and with factor $\delta = 3/4$). If $v$ is the output of Babai's nearest plane algorithm on input $w$ then
$$\|w - v\|^2 \le \frac{2^n - 1}{4} \|b_n^*\|^2.$$
Proof: By induction,
$$\|w - v\|^2 = \|w - w' + w' - (y + y')\|^2 = \|w - w'\|^2 + \|w'' - y'\|^2 \le \tfrac{1}{4} \|b_n^*\|^2 + \frac{2^{n-1} - 1}{4} \|b_{n-1}^*\|^2,$$
and since the basis is LLL-reduced we have $\|b_{n-1}^*\|^2 \le 2 \|b_n^*\|^2$, which gives the result.
Exercise 18.1.5. Prove that if $v$ is the output of the nearest plane algorithm on input $w$ then
$$\|v - w\|^2 \le \frac{1}{4} \sum_{i=1}^{n} \|b_i^*\|^2.$$
Theorem 18.1.6. If the basis $\{b_1, \dots, b_n\}$ is LLL-reduced (with respect to the Euclidean norm and with factor $\delta = 3/4$) then the output of the Babai nearest plane algorithm on $w \in \mathbb{R}^n$ is a vector $v$ such that $\|v - w\| < 2^{n/2} \|u - w\|$ for all $u \in L$.
Proof: We prove the result by induction. For $n = 1$, $v$ is a correct solution to the closest vector problem and so the result holds.
Let $n \ge 2$ and let $u \in L$ be a closest vector to $w$. Let $y$ be the vector chosen in the first step of the Babai method. We consider two cases.
1. Case $u \in U + y$. Then $\|u - w\|^2 = \|u - w'\|^2 + \|w' - w\|^2$ so $u$ is also a closest vector to $w'$. Hence $u - y$ is a closest vector to $w'' = w' - y \in U$. Let $y'$ be the output of the Babai nearest plane algorithm on $w''$. By the inductive hypothesis,
$$\|y' - w''\| < 2^{(n-1)/2} \|u - y - w''\|.$$
Substituting $w' - y$ for $w''$ gives
$$\|y + y' - w'\| < 2^{(n-1)/2} \|u - w'\|.$$
Now
$$\|v - w\|^2 = \|y + y' - w'\|^2 + \|w' - w\|^2 < 2^{n-1} \|u - w'\|^2 + \|w' - w\|^2.$$
Using $\|u - w'\|, \|w' - w\| \le \|u - w\|$ and $2^{n-1} + 1 \le 2^n$ gives the result.
2. Case $u \notin U + y$. Since the distance from $w$ to $U + y$ is at most $\tfrac{1}{2} \|b_n^*\|$, we have $\|w - u\| \ge \tfrac{1}{2} \|b_n^*\|$. By Lemma 18.1.4 we find
$$\|w - u\| \ge \tfrac{1}{2} \|b_n^*\| \ge \tfrac{1}{2} \sqrt{\frac{4}{2^n - 1}}\, \|w - v\|.$$
Hence, $\|w - v\| < 2^{n/2} \|w - u\|$.
This completes the proof.
One can obtain a better result by using the result of Lemma 17.2.9:
$$\|v - w\| < \frac{2^{n/4}}{\sqrt{\sqrt{2} - 1}} \|u - w\| < (1.6)\, 2^{n/4} \|u - w\|$$
for all $u \in L$.
Example 18.1.11. Consider the basis matrix
$$B = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 0 & -3 \\ 3 & -7 & 3 \end{pmatrix}$$
and the vector $w = (10, 6, 5) \in \mathbb{R}^3$. We perform the nearest plane method to find a lattice vector close to $w$.
First compute the Gram-Schmidt basis $b_1^* = (1, 2, 3)$, $b_2^* = (24/7, 6/7, -12/7)$ and $b_3^* = (10/3, -20/3, 10/3)$. Then
$$w = \tfrac{37}{14} b_1^* + 2 b_2^* + \tfrac{3}{20} b_3^*.$$
Since $\lfloor 3/20 \rceil = 0$ we have $y = \lfloor 3/20 \rceil b_3 = 0$ and $w'' = w' = \tfrac{37}{14} b_1^* + 2 b_2^* = (19/2, 7, 9/2)$. The process
Show that the set
$$\Big\{ w + \sum_{j=1}^{n} l_j b_j^* : l_j \in \mathbb{R},\ |l_j| \le \tfrac{1}{2} \Big\}$$
is a parallelepiped centered on $w$. Show that this parallelepiped has volume equal to the volume of the lattice. Hence show that if $w$ does not lie in the lattice then there is exactly one lattice point in this parallelepiped.
$$\tfrac{1}{2} \min\{ \|b_i^*\| : 1 \le i \le n \}$$
Some improvements to the Babai nearest plane algorithm are listed in Section 3.4 of
[255]. Similar methods (but using a randomised choice of plane) were used by Klein [338]
to solve the CVP when the target vector is particularly close to a lattice point. Another
variant of the nearest plane algorithm is given by Lindner and Peikert [387]. The nearest plane algorithm is known by the name V-BLAST in the communications community (see [437]).
18.2 Babai's Rounding Technique
An alternative to the nearest plane method is the rounding technique. This is simpler to
compute in practice, since it does not require the computation of a Gram-Schmidt basis,
but harder to analyse in theory. This method is also not guaranteed to solve CVP. Let
$b_1, \dots, b_n$ be a basis for a full rank lattice in $\mathbb{R}^n$. Given a target $w \in \mathbb{R}^n$ we can write
$$w = \sum_{i=1}^{n} l_i b_i$$
with $l_i \in \mathbb{R}$. One computes the coefficients $l_i$ by solving the system of linear equations (since the lattice is full rank we can also compute the vector $(l_1, \dots, l_n)$ as $w B^{-1}$). The rounding technique is simply to set
$$v = \sum_{i=1}^{n} \lfloor l_i \rceil b_i,$$
where $\lfloor l \rceil$ means take the closest integer to the real number $l$. This procedure can be
performed using any basis for the lattice. Babai has proved that kv wk is within an
exponential factor of the minimal value if the basis is LLL-reduced. The method trivially
generalises to non-full-rank lattices as long as w lies in the R-span of the basis.
Theorem 18.2.1. Let $b_1, \dots, b_n$ be an LLL-reduced basis (with respect to the Euclidean norm and with factor $\delta = 3/4$) for a lattice $L \subseteq \mathbb{R}^n$. Then the output $v$ of the Babai rounding method on input $w \in \mathbb{R}^n$ satisfies
$$\|w - v\| \le \big( 1 + 2n(9/2)^{n/2} \big) \|w - u\|$$
for all $u \in L$.
Proof: See Babai [18].
Babai rounding gives a lattice point $v$ such that $w - v = \sum_{i=1}^{n} m_i b_i$ where $|m_i| \le 1/2$. In other words, $v$ lies in the parallelepiped, centered at $w$, defined by the basis vectors.
Since the volume of the parallelepiped is equal to the volume of the lattice, if w is not in
the lattice then there is exactly one lattice point in the parallelepiped. The geometry of
the parallelepiped determines whether or not an optimal solution to the CVP is found.
Hence, though the rounding method can be used with any basis for a lattice, the result
depends on the quality of the basis.
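The rounding method is a few lines of code once the coefficients $l_i$ are available. The sketch below solves for them exactly over $\mathbb{Q}$ by Gaussian elimination (the helper is illustrative; any linear solver would do).

```python
from fractions import Fraction

def babai_rounding(basis, w):
    """Babai rounding: solve w = sum l_i b_i exactly, then round each l_i."""
    n = len(basis)
    # augmented matrix of the system: columns are the basis vectors
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(w[i])]
         for i in range(n)]
    for col in range(n):                 # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    l = [M[i][n] / M[i][i] for i in range(n)]
    coeffs = [round(x) for x in l]       # the rounding step
    v = [sum(c * basis[i][t] for i, c in enumerate(coeffs))
         for t in range(len(w))]
    return tuple(v)

# Example 18.2.2: w = (-0.4, 0.4) rounds to (-1, 0), not to the true CVP answer (0, 0)
v = babai_rounding([(3, 2), (2, 1)], (Fraction(-2, 5), Fraction(2, 5)))
```

With the orthogonal basis $\{(1, 0), (0, 1)\}$ of the same lattice, the same routine returns $(0, 0)$, illustrating that the result depends on the quality of the basis.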
Example 18.2.2. Let $b_1 = (3, 2)$ and $b_2 = (2, 1)$ generate the lattice $\mathbb{Z}^2$. Let $w = (-0.4, 0.4)$ so that the solution to CVP is $(0, 0)$. One can verify that $(-0.4, 0.4) = 1.2 b_1 - 2 b_2$ and so Babai rounding yields $b_1 - 2 b_2 = (-1, 0)$. Figure 18.2 shows the parallelepiped centered at $w$ corresponding to the basis. One can see that $(-1, 0)$ is the only lattice point within that parallelepiped.
Figure 18.2: Parallelepiped centered at $(-0.4, 0.4)$ corresponding to lattice basis $(3, 2)$ and $(2, 1)$.
Exercise 18.2.3. Consider the vector $w = (-0.4, 0.4)$ as in Example 18.2.2 again. Using the basis $\{(1, 0), (0, 1)\}$ for $\mathbb{Z}^2$ use the Babai rounding method to find the closest lattice vector in $\mathbb{Z}^2$ to $w$. Draw the parallelepiped centered on $w$ in this case.
We stress that the rounding method is not the same as the nearest plane method. The
next example shows that the two methods can give different results.
Example 18.2.4. Consider the CVP instance in Example 18.1.11. We have
$$w = \tfrac{141}{40} b_1 + \tfrac{241}{120} b_2 + \tfrac{3}{20} b_3.$$
Hence the rounding method computes $v = 4 b_1 + 2 b_2 = (10, 8, 6)$.
$$\begin{pmatrix} 7 & 0 & 1 \\ 1 & 17 & 1 \\ 3 & 0 & 10 \end{pmatrix}$$
and let
18.3 The Embedding Technique
Another way to solve CVP is the embedding technique, due to Kannan (see page 437
onwards of [327]). Let B be a basis matrix for a lattice L and suppose w Rn (in practice
we assume $w \in \mathbb{Q}^n$). A solution to the CVP corresponds to integers $l_1, \dots, l_n$ such that
$$w \approx \sum_{i=1}^{n} l_i b_i.$$
The crucial observation is that $e = w - \sum_{i=1}^{n} l_i b_i$ is such that $\|e\|$ is small.
The idea of the embedding technique is to define a lattice $L'$ that contains the short vector $e$. Let $M \in \mathbb{R}_{>0}$ (for example $M = 1$). The lattice $L'$ is defined by the vectors (which are a basis for $\mathbb{R}^{n+1}$)
$$(b_1, 0), \dots, (b_n, 0), (w, M). \qquad (18.3)$$
One sees that taking the linear combination of rows with coefficients $(-l_1, \dots, -l_n, 1)$ gives the vector
$$(e, M).$$
Hence, we might be able to find $e$ by solving the SVP problem in the lattice $L'$. One can then solve the CVP by subtracting $e$ from $w$.
Example 18.3.1. Consider the basis matrix
$$B = \begin{pmatrix} 35 & 72 & 100 \\ 10 & 0 & 25 \\ 20 & 279 & 678 \end{pmatrix}$$
for a lattice in R3 . We solve the CVP instance with w = (100, 100, 100).
Apply the LLL algorithm to the basis matrix (taking $M = 1$)
$$\begin{pmatrix} 35 & 72 & 100 & 0 \\ 10 & 0 & 25 & 0 \\ 20 & 279 & 678 & 0 \\ 100 & 100 & 100 & 1 \end{pmatrix}.$$
This gives an LLL-reduced basis matrix.
The first row is (0, 1, 0, 1), so we know that (0, 1, 0) is the difference between w and a
lattice point v. One verifies that v = (100, 100, 100) (0, 1, 0) = (100, 99, 100) is a lattice
point.
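The construction of equation (18.3) is mechanical. The following sketch builds the embedded basis for a small illustrative instance (not from the text) and checks that the coefficients $(-l_1, \dots, -l_n, 1)$ indeed yield the short vector $(e, M)$; any LLL routine can then be applied to the resulting rows.

```python
def embed(basis, w, M=1):
    """Build the embedded basis of equation (18.3): rows (b_i, 0) and (w, M)."""
    rows = [list(b) + [0] for b in basis]
    rows.append(list(w) + [M])
    return rows

# toy instance: the closest point to w = (3, 4) in the lattice with basis
# (2, 0), (1, 3) is v = 1*(2,0) + 1*(1,3) = (3, 3), so e = w - v = (0, 1)
L = embed([(2, 0), (1, 3)], (3, 4))
l = (1, 1)
combo = [(-l[0]) * L[0][t] + (-l[1]) * L[1][t] + 1 * L[2][t] for t in range(3)]
```

Here `combo` is the lattice vector $(e, M) = (0, 1, 1)$, which is indeed very short compared to the rows of the embedded basis.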
The success of the embedding technique depends on the size of $e$ compared with the lengths of short vectors in the original lattice $L$. As we have seen in Exercise 18.0.2, $e$ can be larger than $\lambda_n$, in which case the embedding technique is not likely to be a good way to solve the closest vector problem.
Lemma 18.3.2. Let $\{b_1, \dots, b_n\}$ be a basis for a lattice $L \subseteq \mathbb{Z}^n$ and denote by $\lambda_1$ the shortest Euclidean length of a non-zero element of $L$. Let $w \in \mathbb{R}^n$ and let $v \in L$ be a closest lattice point to $w$. Define $e = w - v$. Suppose that $\|e\| < \lambda_1/2$ and let $M = \|e\|$. Then $(e, M)$ is a shortest non-zero vector in the lattice $L'$ of the embedding technique.
Proof: All vectors in the lattice $L'$ are of the form
$$l_{n+1}(e, M) + \sum_{i=1}^{n} l_i (b_i, 0)$$
for some $l_1, \dots, l_{n+1} \in \mathbb{Z}$. Every non-zero vector with $l_{n+1} = 0$ is of length at least $\lambda_1$. Since
$$\|(e, M)\|^2 = \|e\|^2 + M^2 = 2M^2 < 2\lambda_1^2/4$$
the vector $(e, M)$ has length less than $\lambda_1/\sqrt{2}$. Since $v$ is a closest vector to $w$ it follows that $\|e\| \le \|e + x\|$ for all $x \in L$ and so every other vector $(u, M) \in L'$ has length at least as large. Finally, suppose $|l_{n+1}| \ge 2$. Then
$$\|(u, l_{n+1} M)\|^2 \ge \|(0, l_{n+1} M)\|^2 \ge (2M)^2$$
and so $\|(u, l_{n+1} M)\| \ge \sqrt{2}\, \|(e, M)\|$.
Lemma 18.3.2 shows that the CVP can be reduced to SVP as long as the target vector
is very close to a lattice vector, and assuming one has a good guess M for the distance.
However, when using algorithms such as LLL that solve the approximate SVP it is not
possible, in general, to make rigorous statements about the success of the embedding technique. As mentioned earlier, the LLL algorithm often works better than the theoretical
analysis predicts. Hence the embedding technique can potentially be useful even when w
is not so close to a lattice point. For further discussion see Lemma 6.15 of Kannan [327].
Exercise 18.3.3. Let $\{b_1, \dots, b_n\}$ be a basis for a lattice in $\mathbb{R}^n$ and let $w \in \mathbb{R}^n$. Let $M = \max_{1 \le i \le n} \|b_i\|$. Show that the output $(e, M)$ of the embedding technique (using LLL) on the basis of equation (18.3) is the same as the output of the Babai nearest plane algorithm when run on the LLL-reduced basis.
Exercise 18.3.4. Solve the following CVP instance using the embedding technique and a computer algebra package:
$$B = \begin{pmatrix} 265 & 287 & 56 \\ 460 & 448 & 72 \\ 50 & 49 & 8 \end{pmatrix}, \qquad w = (100, 80, 100).$$
18.4 Enumerating Short Vectors
We present a method to enumerate all short vectors in a lattice, given any basis. We will
show later that the performance of this enumeration algorithm depends on the quality of
the lattice basis. Throughout this section, kvk denotes the Euclidean norm.
The first enumeration method was given by Pohst in 1981. Further variants were given by Fincke and Pohst, Kannan [326, 327], Helfrich [280] and Schnorr and Euchner [522].
These methods are all deterministic and are guaranteed to output a non-zero vector
of minimum length. The time complexity is exponential in the lattice dimension, but
the storage requirements are polynomial. This approach is known by the name sphere
decoding in the communications community (see [437]).
Exercise 18.4.1. Let $\{b_1, \dots, b_n\}$ be an (ordered) basis in $\mathbb{R}^m$ for a lattice and let $\{b_1^*, \dots, b_n^*\}$ be the Gram-Schmidt orthogonalisation. Let $v \in \mathbb{R}^m$. Show that the projection of $v$ onto $b_i^*$ is
$$\frac{\langle v, b_i^* \rangle}{\|b_i^*\|^2}\, b_i^*.$$
Show that if $v = \sum_{j=1}^{n} x_j b_j$ then this projection is
$$\Big( x_i + \sum_{j=i+1}^{n} x_j \mu_{j,i} \Big) b_i^*.$$
Lemma 18.4.2. Let $\{b_1, \dots, b_n\}$ be an (ordered) basis for a lattice and let $\{b_1^*, \dots, b_n^*\}$ be the Gram-Schmidt orthogonalisation. Fix $A \in \mathbb{R}_{>0}$ and write $B_i = \|b_i^*\|^2$. Let $v = \sum_{i=1}^{n} x_i b_i$ be such that $\|v\|^2 \le A$. For $1 \le i \le n$ define
$$z_i = x_i + \sum_{j=i+1}^{n} \mu_{j,i} x_j.$$
Then
$$\sum_{i=1}^{n} z_i^2 B_i \le A.$$
Proof: Exercise 18.4.1 gives a formula $z_i b_i^*$ for the projection of $v$ onto each $b_i^*$. Since the vectors $b_i^*$ are orthogonal we have
$$\|v\|^2 = \sum_{i=1}^{n} \|z_i b_i^*\|^2 = \sum_{i=1}^{n} z_i^2 B_i.$$
Theorem 18.4.3. Let the notation be as in Lemma 18.4.2. Then one has
$$x_n^2 \le A / \|b_n^*\|^2$$
and, for $1 \le i < n$,
$$\Big( x_i + \sum_{j=i+1}^{n} \mu_{j,i} x_j \Big)^2 B_i \le A - \sum_{j=i+1}^{n} z_j^2 B_j.$$
Proof: Note that $z_n = x_n$ and Lemma 18.4.2 implies $z_n^2 B_n \le A$, which proves the first statement. The second statement is also just a re-writing of Lemma 18.4.2.
We now sketch the enumeration algorithm for finding all short lattice vectors $v = \sum_{i=1}^{n} x_i b_i$, which follows from the above results. First, without loss of generality we may assume that $x_n \ge 0$. By Theorem 18.4.3 we know $0 \le x_n \le \sqrt{A/B_n}$. For each candidate $x_n$ one knows that
$$(x_{n-1} + \mu_{n,n-1} x_n)^2 B_{n-1} \le A - x_n^2 B_n$$
and so
$$|x_{n-1} + \mu_{n,n-1} x_n| \le \sqrt{(A - x_n^2 B_n)/B_{n-1}}.$$
To phrase this as a bound on $x_{n-1}$ one uses the fact that for any $a \in \mathbb{R}$, $b \in \mathbb{R}_{\ge 0}$, the solutions $x \in \mathbb{R}$ to $|x + a| \le b$ satisfy $-(b + a) \le x \le b - a$. Hence, writing $M_1 = \sqrt{(A - x_n^2 B_n)/B_{n-1}}$ one has
$$-(M_1 + \mu_{n,n-1} x_n) \le x_{n-1} \le M_1 - \mu_{n,n-1} x_n.$$
Exercise 18.4.4. Generalise the above discussion to show that for $1 \le i < n$ one has
$$-(M_1 + M_2) \le x_i \le M_1 - M_2$$
where
$$M_1 = \sqrt{\Big( A - \sum_{j=i+1}^{n} z_j^2 B_j \Big) / B_i} \quad \text{and} \quad M_2 = \sum_{j=i+1}^{n} \mu_{j,i} x_j.$$
Exercise 18.4.5. Write pseudocode for the algorithm to enumerate all short vectors of
a lattice.
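One possible answer to Exercise 18.4.5 is the recursive sketch below. It uses floating-point square roots for the interval bounds, so very large or badly conditioned inputs would need more careful numerical treatment.

```python
import math
from fractions import Fraction

def enumerate_short(basis, A):
    """List all v = sum x_i b_i with ||v||^2 <= A, via the bounds of Theorem 18.4.3."""
    n = len(basis)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Gram-Schmidt data: bstar and the coefficients mu[j][i] for j > i
    bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [a - mu[i][j] * c for a, c in zip(v, bstar[j])]
        bstar.append(v)
    B = [float(dot(bs, bs)) for bs in bstar]

    results, x = [], [0] * n

    def search(i, remaining):
        if i < 0:
            results.append(tuple(sum(x[k] * basis[k][t] for k in range(n))
                                 for t in range(len(basis[0]))))
            return
        c = sum(float(mu[j][i]) * x[j] for j in range(i + 1, n))  # z_i = x_i + c
        r = math.sqrt(max(remaining, 0.0) / B[i])
        for xi in range(math.ceil(-r - c), math.floor(r - c) + 1):
            x[i] = xi
            z = xi + c
            search(i - 1, remaining - z * z * B[i])
        x[i] = 0

    search(n - 1, float(A))
    return results

vs = enumerate_short([[1, 0, 0], [4, 2, 15], [0, 0, 3]], 1)
```

On the basis of Example 17.4.9 with $A = 1$ this returns exactly the vectors $(0,0,0)$ and $\pm(1,0,0)$, even though the input basis is far from reduced; for illustration the sketch enumerates both $v$ and $-v$ rather than restricting to $x_n \ge 0$.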
The algorithm to find a non-zero vector of minimal length is then straightforward. Set $A = \|b_1\|^2$, enumerate all vectors $v$ with $\|v\|^2 \le A$ and, for each vector, compute the length. One is guaranteed to find a shortest vector in the lattice. Schnorr and Euchner [522] organised the search in a manner to minimise the running time.
The running time of this algorithm depends on the quality of the basis in several ways. First, it is evidently important to have a good bound $A$ for the length of the shortest vector. Taking $A = \|b_1\|^2$ is only sensible if $b_1$ is already rather short; alternatively one may choose, say, $A = \frac{n}{2\pi e} \det(L)^{2/n}$ using the Gaussian heuristic (one can choose a small bound for $A$ and then, if the search fails, increase $A$ accordingly). Second, one sees that if $\|b_n^*\|$ is very short then the algorithm searches a huge range of values for $x_n$, and similarly if $\|b_{n-1}^*\|$ is very short etc. Hence, the algorithm performs best if the values $\|b_i^*\|$ decrease rather gently.
To solve SVP in practice using enumeration one first performs LLL and other precomputation to get a sufficiently nice basis. We refer to Kannan [326, 327], Schnorr and Euchner [522] and Agrell et al [7] for details. The best complexity statement in the literature is due to Hanrot and Stehlé.
Theorem 18.4.6. (Hanrot and Stehlé [274]) There exists a polynomial $p(x, y) \in \mathbb{R}[x, y]$ such that, for any $n$-dimensional lattice $L$ in $\mathbb{Z}^m$ with basis consisting of vectors with coefficients bounded by $B$, one can compute all the shortest non-zero vectors in $L$ in at most $p(\log(B), m)\, n^{n/(2e) + o(n)}$ bit operations.
Exercise 18.4.7. Let $L$ be a lattice in $\mathbb{Z}^n$ that contains $q\mathbb{Z}^n$ for some integer $q$. Let $M \in \mathbb{N}$ be a fixed bound. Give an algorithm based on Wagner's technique (see Section 13.8) for finding vectors in $L$ with all entries bounded by $M$. Determine the complexity of this algorithm.
Due to lack of space we refer to the original papers for further details about enumeration algorithms. Pujol and Stehlé [488] give an analysis of issues related to floating-point implementation.
In practice the most efficient enumeration methods for the SVP are heuristic pruning
methods. These methods are still exponential in the lattice dimension, and are not
guaranteed to output the shortest vector. The extreme pruning algorithm of Gama,
Nguyen and Regev [234] is currently the most practical method.
A quite different approach, leading to non-deterministic algorithms (in other words,
the output is a non-zero vector in the lattice that, with high probability, has minimal
length) is due to Ajtai, Kumar and Sivakumar (see [354] for a survey). The running time
and storage requirements of the algorithm are both exponential in the lattice dimension.
For some experimental results we refer to Nguyen and Vidick [462]. Micciancio and
Voulgaris [421] have given an improved algorithm, still requiring exponential time and
storage.
18.4.1 Enumerating Close Vectors
The above ideas can be adapted to list lattice points close to some $w \in \mathbb{R}^n$. Let $A \in \mathbb{R}_{>0}$ and suppose we seek all $v \in L$ such that $\|v - w\|^2 \le A$. Write $v = \sum_{i=1}^{n} x_i b_i = \sum_{i=1}^{n} z_i b_i^*$ as before and write
$$w = \sum_{i=1}^{n} y_i b_i^*.$$
Then $\|v - w\|^2 \le A$ is equivalent to
$$\sum_{i=1}^{n} (z_i - y_i)^2 \|b_i^*\|^2 \le A.$$
It follows that
$$y_n - \sqrt{A/B_n} \le x_n \le y_n + \sqrt{A/B_n}$$
and so on. Write
$$M_i = \sqrt{\Big( A - \sum_{j=i+1}^{n} (z_j - y_j)^2 B_j \Big) / B_i} \quad \text{and} \quad N_i = \sum_{j=i+1}^{n} \mu_{j,i} x_j$$
for $1 \le i \le n$. If $v = \sum_{i=1}^{n} x_i b_i$ then
$$y_i - M_i - N_i \le x_i \le y_i + M_i - N_i.$$
18.5
Korkine-Zolotarev Bases
We present a notion of reduced lattice basis that has better properties than an LLL-reduced basis.
Definition 18.5.1. Let $L$ be a lattice of rank $n$ in $\mathbb{R}^m$. An ordered basis $\{b_1, \dots, b_n\}$ for $L$ is Korkine-Zolotarev reduced if
1. $b_1$ is a non-zero vector of minimal length in $L$;
2. $|\mu_{i,1}| \le 1/2$ for $2 \le i \le n$;
3. the basis $\{b_2 - \mu_{2,1} b_1, \dots, b_n - \mu_{n,1} b_1\}$ is Korkine-Zolotarev reduced (this is the orthogonal projection of the basis of $L$ onto the orthogonal complement of $b_1$),
where $b_i^*$ is the Gram-Schmidt orthogonalisation and $\mu_{i,j} = \langle b_i, b_j^* \rangle / \langle b_j^*, b_j^* \rangle$.
One problem is that there is no known polynomial-time algorithm to compute a
Korkine-Zolotarev basis.
1. $\dfrac{4}{i+3}\, \lambda_i^2 \le \|b_i\|^2 \le \dfrac{i+3}{4}\, \lambda_i^2$ for $1 \le i \le n$;
2. $\displaystyle \prod_{i=1}^{n} \|b_i\|^2 \le \Big( \prod_{i=1}^{n} \frac{i+3}{4} \Big) \det(L)^2.$
Proof: See Theorems 2.1 and 2.3 of Lagarias, Lenstra and Schnorr [358].
As we have seen, for lattices of relatively small dimension it is practical to enumerate
all short vectors. Hence one can compute a Korkine-Zolotarev basis for lattices of small
dimension. Schnorr has developed the block Korkine-Zolotarev lattice basis reduction
algorithm, which computes a Korkine-Zolotarev basis for small dimensional projections
of the original lattice and combines this with the LLL algorithm. The output basis can
be proved to be of a better quality than an LLL-reduced basis. This is the most powerful
algorithm for finding short vectors in lattices of large dimension. Due to lack of space we
are unable to present this algorithm; we refer to Schnorr [517] for details.
Chapter 19
19.1

19.1.1

Let M, X ∈ N and let F(x) = Σ_{i=0}^d a_i x^i ∈ Z[x]. Associate with the polynomial F(x) the row vector

    b_F = (a_0, a_1 X, a_2 X², …, a_d X^d).    (19.1)

Vice versa, any such row vector corresponds to a polynomial. Throughout this section we will interpret polynomials as row vectors, and row vectors as polynomials, in this way.
Theorem 19.1.2. (Howgrave-Graham [295]) Let F(x), X, M, b_F be as above (i.e., there is some x_0 such that |x_0| ≤ X and F(x_0) ≡ 0 (mod M)). If ||b_F|| < M/√(d+1) then F(x_0) = 0.
Proof: Recall the Cauchy-Schwarz inequality (Σ_{i=1}^n x_i y_i)² ≤ (Σ_{i=1}^n x_i²)(Σ_{i=1}^n y_i²) for x_i, y_i ∈ R. Taking x_i ≥ 0 and y_i = 1 for 1 ≤ i ≤ n one has

    Σ_{i=1}^n x_i ≤ √n √(Σ_{i=1}^n x_i²).

Now

    |F(x_0)| = |Σ_{i=0}^d a_i x_0^i| ≤ Σ_{i=0}^d |a_i| |x_0|^i ≤ Σ_{i=0}^d |a_i| X^i ≤ √(d+1) ||b_F|| < √(d+1) M/√(d+1) = M,

where the third inequality is Cauchy-Schwarz. So −M < F(x_0) < M. But F(x_0) ≡ 0 (mod M) and so F(x_0) = 0. □
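The theorem can be sanity-checked numerically. In the sketch below (toy values of our own choosing, not from the text) the coefficient vector of F is small enough that the hypothesis ||b_F|| < M/√(d+1) holds, so every small root of F modulo M must already be a root over Z:

```python
import math

M, X = 10007, 5
a = [-10, 1, 2]                                   # F(x) = 2x^2 + x - 10
d = len(a) - 1
bF = [a[i] * X**i for i in range(d + 1)]          # row vector as in equation (19.1)
assert math.sqrt(sum(t * t for t in bF)) < M / math.sqrt(d + 1)

F = lambda x: sum(a[i] * x**i for i in range(d + 1))
roots_mod = [x for x in range(-X, X + 1) if F(x) % M == 0]
assert all(F(x) == 0 for x in roots_mod)          # small modular roots are integer roots
print(roots_mod)
```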
Let F(x) = Σ_{i=0}^d a_i x^i be a monic polynomial. We assume that F(x) has at least one solution x_0 modulo M such that |x_0| < X for some specified integer X. If F(x) is not monic but gcd(a_d, M) = 1 then one can multiply F(x) by a_d^{−1} (mod M) to make it monic. If gcd(a_d, M) > 1 then one can split M and reduce the problem to two (typically easier) problems. As explained above, to find x_0 it will be sufficient to find a polynomial G(x) with the same root x_0 modulo M but with sufficiently small coefficients.
To do this, consider the d + 1 polynomials G_i(x) = M x^i for 0 ≤ i < d and F(x). They all have the solution x = x_0 modulo M. Define the lattice L with basis corresponding to these polynomials (by associating with a polynomial the row vector in equation (19.1)). Therefore, the basis matrix for the lattice L is

    B = ( M     0      ⋯   0               0   )
        ( 0     MX     ⋯   0               0   )
        ( ⋮            ⋱                   ⋮   )        (19.2)
        ( 0     0      ⋯   MX^{d−1}        0   )
        ( a_0   a_1X   ⋯   a_{d−1}X^{d−1}  X^d )

Every element of this lattice is a row vector that can be interpreted (via equation (19.1)) as a polynomial having x_0 as a root modulo M.
Lemma 19.1.3. The dimension of the lattice L defined in equation (19.2) above is d + 1 and the determinant is

    det(L) = M^d X^{d(d+1)/2}.

Exercise 19.1.4. Prove Lemma 19.1.3.
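Since the matrix in equation (19.2) is lower triangular, the determinant is the product of the diagonal entries; the sketch below confirms the formula of Lemma 19.1.3 for d = 3 with a generic determinant routine of our own (the numbers M, X and the low coefficients are those of Example 19.1.6, up to sign, which does not affect the determinant):

```python
from fractions import Fraction

def det(mat):
    # Determinant by Gaussian elimination over the rationals.
    n = len(mat)
    m = [[Fraction(x) for x in row] for row in mat]
    result = Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if m[r][i]), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            result = -result
        result *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return result

M, X, d = 10001, 10, 3
a = [222, 5000, 10]                       # low coefficients of a monic cubic F
B = [[0] * (d + 1) for _ in range(d + 1)]
for i in range(d):
    B[i][i] = M * X**i
B[d] = [a[i] * X**i for i in range(d)] + [X**d]
assert det(B) == M**d * X**(d * (d + 1) // 2)     # Lemma 19.1.3
```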
One now runs the LLL algorithm on this (row) lattice basis. Let G(x) be the polynomial corresponding to the first vector b_1 of the LLL-reduced basis (since every row of B has the form of equation (19.1) then so does b_1).

Theorem 19.1.5. Let the notation be as above and let G(x) be the polynomial corresponding to the first vector in the LLL-reduced basis for L. Set c_1(d) = 2^{−1/2}(d+1)^{−1/d}. If X < c_1(d) M^{2/(d(d+1))} then any root x_0 of F(x) modulo M such that |x_0| ≤ X satisfies G(x_0) = 0 in Z.

Proof: Recall that b_1 satisfies

    ||b_1|| ≤ 2^{(n−1)/4} det(L)^{1/n} = 2^{d/4} M^{d/(d+1)} X^{d/2}.

If X < c_1(d) M^{2/(d(d+1))} then this bound is less than M/√(d+1), and the result follows from Theorem 19.1.2. □
Example 19.1.6. Let M = 10001, let F(x) = x³ + 10x² + 5000x − 222 and take X = 10. The basis matrix is

    B = ( M      0      0     0  )
        ( 0      MX     0     0  )
        ( 0      0      MX²   0  )
        ( −222   5000X  10X²  X³ )

Running LLL on this matrix gives a reduced basis, the first row of which is

    (444, 10, −2000, −2000).

The polynomial corresponding to this vector is

    G(x) = 444 + x − 20x² − 2x³.

Running Newton's root finding method on G(x) gives the solution x_0 = 4.
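The computation can be reproduced with a compact LLL implementation over exact rationals (our own sketch; production code should use a dedicated library such as fplll or SageMath). It is applied here to the Example 19.1.6 lattice with F(x) = x³ + 10x² + 5000x − 222, whose constant term's sign is forced by F(4) ≡ 0 (mod 10001):

```python
from fractions import Fraction

def gso(b):
    # Gram-Schmidt over Q: returns the mu coefficients and squared norms.
    n = len(b)
    mu = [[Fraction(0)] * n for _ in range(n)]
    bstar, Bsq = [], []
    for i in range(n):
        v = [Fraction(x) for x in b[i]]
        for j in range(i):
            mu[i][j] = sum(Fraction(b[i][k]) * bstar[j][k] for k in range(n)) / Bsq[j]
            v = [v[k] - mu[i][j] * bstar[j][k] for k in range(n)]
        bstar.append(v)
        Bsq.append(sum(x * x for x in v))
    return mu, Bsq

def lll(b, delta=Fraction(3, 4)):
    b = [list(row) for row in b]
    n, k = len(b), 1
    while k < n:
        for j in range(k - 1, -1, -1):            # size reduction of b_k against b_j
            mu, _ = gso(b)
            q = round(mu[k][j])
            if q:
                b[k] = [b[k][i] - q * b[j][i] for i in range(n)]
        mu, Bsq = gso(b)
        if Bsq[k] >= (delta - mu[k][k - 1] ** 2) * Bsq[k - 1]:   # Lovász condition
            k += 1
        else:
            b[k - 1], b[k] = b[k], b[k - 1]
            k = max(k - 1, 1)
    return b

M, X = 10001, 10
B = [[M, 0, 0, 0],
     [0, M * X, 0, 0],
     [0, 0, M * X**2, 0],
     [-222, 5000 * X, 10 * X**2, X**3]]
v = lll(B)[0]
coeffs = [v[i] // X**i for i in range(4)]  # undo the scaling of equation (19.1)
G = lambda x: sum(coeffs[i] * x**i for i in range(4))
roots = [x for x in range(-X, X + 1) if G(x) == 0]
print(roots)
```

For this lattice the first reduced vector is short enough that the corresponding polynomial vanishes at x_0 = 4 over Z.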
19.1.2
The method in the previous section allows one to find small roots of modular polynomials, but it can be improved further. Looking at the proof of Theorem 19.1.5 one sees that the requirement for success is essentially det(L) < M^n (more precisely, it is 2^{d/4} M^{d/(d+1)} X^{d/2} < M/√(d+1)). There are two strategies to extend the utility of the method (i.e., to allow bigger values for X). The first is to increase the dimension n by adding rows to L that contribute less than M to the determinant. The second is to increase the power of M on the right hand side. One can increase the dimension without increasing the power of M by using the so-called x-shift polynomials xF(x), x²F(x), …, x^kF(x); Example 19.1.7 gives an example of this. One can increase the power of M on the right hand side by using powers of F(x) (since if F(x_0) ≡ 0 (mod M) then F(x_0)^k ≡ 0 (mod M^k)).
Example 19.1.7. Consider the problem of Example 19.1.6. The lattice has dimension 4 and determinant M³X⁶. The condition for LLL to output a sufficiently small vector is

    2^{3/4} (M³X⁶)^{1/4} ≤ M/√4,

which, taking M = 10001, leads to X ≤ 2.07. (Note that the method worked for a larger value of x_0; this is because the bound used on LLL only applies in the worst case.)

Consider instead the basis matrix that also includes rows corresponding to the polynomials xF(x) and x²F(x)

    B = ( M      0       0       0       0      0  )
        ( 0      MX      0       0       0      0  )
        ( 0      0       MX²     0       0      0  )
        ( −222   5000X   10X²    X³      0      0  )
        ( 0      −222X   5000X²  10X³    X⁴     0  )
        ( 0      0       −222X²  5000X³  10X⁴   X⁵ )

The dimension is 6 and the determinant is M³X¹⁵. The condition for LLL to output a sufficiently small vector is

    2^{5/4} (M³X¹⁵)^{1/6} ≤ M/√6,

which leads to X ≤ 3.11. This indicates that some benefit can be obtained by using x-shifts.
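The two bounds on X can be recomputed directly from the displayed conditions (plain floating-point arithmetic, solving each inequality for X):

```python
M = 10001
# dim 4, det M^3 X^6:  2^(3/4) (M^3 X^6)^(1/4) <= M / 4^(1/2)
X_basic = (2 ** (-7 / 4) * M ** (1 / 4)) ** (2 / 3)
# dim 6, det M^3 X^15: 2^(5/4) (M^3 X^15)^(1/6) <= M / 6^(1/2)
X_shift = (2 ** (-5 / 4) * 6 ** (-1 / 2) * M ** (1 / 2)) ** (2 / 5)
print(X_basic, X_shift)    # roughly 2.07 and 3.1
```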
Exercise 19.1.8. Let G(x) be a polynomial of degree d. Show that taking the d x-shifts G(x), xG(x), …, x^{d−1}G(x) gives a method that works for X ≈ M^{1/(2d−1)}.

Exercise 19.1.8 shows that when d = 3 we have improved the result from X ≈ M^{1/6} to X ≈ M^{1/5}. Coppersmith [140] exploits both x-shifts and powers of F(x). We now present the method in full generality.
Theorem 19.1.9. (Coppersmith) Let 0 < ε < min{0.18, 1/d}. Let F(x) be a monic polynomial of degree d with one or more small roots x_0 modulo M such that |x_0| < ½M^{1/d−ε}. Then x_0 can be found in time bounded by a polynomial in d, 1/ε and log(M).

Proof: Let h > 1 be an integer that depends on d and ε and will be determined in equation (19.3) below. Consider the lattice L corresponding (via the construction of the previous section) to the polynomials G_{i,j}(x) = M^{h−1−j} F(x)^j x^i for 0 ≤ i < d, 0 ≤ j < h. Note that G_{i,j}(x_0) ≡ 0 (mod M^{h−1}). The dimension of L is dh. One can represent L by a lower triangular basis matrix with diagonal entries M^{h−1−j} X^{jd+i}. Hence the determinant of L is

    det(L) = M^{(h−1)hd/2} X^{(dh−1)dh/2}.

Running LLL on this basis outputs an LLL-reduced basis with first vector b_1 satisfying

    ||b_1|| < 2^{(dh−1)/4} det(L)^{1/(dh)} = 2^{(dh−1)/4} M^{(h−1)/2} X^{(dh−1)/2}.

This vector corresponds to a polynomial G(x) of degree dh − 1 such that G(x_0) ≡ 0 (mod M^{h−1}). By Theorem 19.1.2, G(x_0) = 0 over Z provided ||b_1|| < M^{h−1}/√(dh), which is equivalent to

    c(d, h) X < M^{(h−1)/(dh−1)}   where   c(d, h) = √2 (dh)^{1/(dh−1)}.

Note that when dh = 1 + (d − 1)/(dε) one has c(d, h) = √2 (1 + (d − 1)/(dε))^{dε/(d−1)}, which converges to √2 as ε → 0. Choose h minimal such that

    dh ≥ max{7, 1 + (d − 1)/(dε)},    (19.3)

so that (h − 1)/(dh − 1) ≥ 1/d − ε and c(d, h) ≤ 2. Since X < ½M^{1/d−ε} this gives c(d, h)X < M^{1/d−ε} ≤ M^{(h−1)/(dh−1)}, as required. The lattice dimension dh is polynomial in d and 1/ε, so the whole computation runs in the stated time. □
Consider a monic polynomial F(x) = a_0 + a_1x + a_2x² + x³ which has a root x_0 modulo M such that |x_0| ≤ 2^{14}. Set X = 2^{14}. Note that X ≈ M^{1/4.4}. One can verify that the basic method in Section 19.1.1 does not find the small root.
Consider the basis matrix (this is of smaller dimension than the lattice in the proof of Theorem 19.1.9 in the case d = 3 and h = 3)

    ( M²    0        0                0                0              0       0  )
    ( 0     M²X      0                0                0              0       0  )
    ( 0     0        M²X²             0                0              0       0  )
    ( Ma_0  Ma_1X    Ma_2X²           MX³              0              0       0  )
    ( 0     Ma_0X    Ma_1X²           Ma_2X³           MX⁴            0       0  )
    ( 0     0        Ma_0X²           Ma_1X³           Ma_2X⁴         MX⁵     0  )
    ( a_0²  2a_0a_1X (a_1²+2a_0a_2)X² (2a_0+2a_1a_2)X³ (a_2²+2a_1)X⁴  2a_2X⁵  X⁶ )

corresponding to the polynomials M², M²x, M²x², MF(x), MxF(x), Mx²F(x) and F(x)².
The dimension is 7 and the determinant is M⁹X²¹. The first vector of the LLL-reduced basis is

    (369928294330603367352173305173409792, 1451057442025994832259962670402797568, …).

This corresponds to the polynomial

    369928294330603367352173305173409792 + 88565517701781911148679362207202x − 3439987357258441728608570659x² + 446358057645551896819258x³ + 4564259979987386926x⁴ − 1728007960413053x⁵ − 21177681998x⁶.
Remark 19.1.13. It is natural to wonder whether one can find roots right up to the limit X = M^{1/d}. Indeed, the term ε can be eliminated by performing an exhaustive search over the top few bits of the root x_0. An alternative way to proceed is to set ε = 1/log₂(M), break the range |x_0| < M^{1/d} of size 2M^{1/d} into M^{2ε} = 4 intervals of size 2M^{1/d−2ε} = ½M^{1/d}, and perform Coppersmith's algorithm for each subproblem in turn.
Another question is whether one can go beyond the boundary X = M^{1/d}. A first observation is that for X > M^{1/d} one does not necessarily expect a constant number of solutions; see Exercise 19.1.14. Coppersmith [141] gives further arguments why M^{1/d} is the best one can hope for.
Exercise 19.1.14. Let M = p² and consider F(x) = x² + px. Show that if X = M^{1/2+ε} where 0 < ε < 1/2 then the number of solutions |x| < X to F(x) ≡ 0 (mod M) is approximately 2M^ε.
Exercise 19.1.15. Let N = pq be a product of two primes of similar size and let e ∈ N be a small integer such that gcd(e, φ(N)) = 1. Let 1 < a, y < N be such that there is an integer 0 ≤ x < N^{1/e} satisfying (a + x)^e ≡ y (mod N). Show that, given N, e, a, y, one can compute x in polynomial-time.
19.2
Suppose one is given F(x, y) ∈ Z[x, y] and integers X, Y and M and is asked to find one or more roots (x_0, y_0) of F(x, y) ≡ 0 (mod M) such that |x_0| < X and |y_0| < Y. One can proceed using similar ideas to the above, hoping to find two polynomials F_1(x, y), F_2(x, y) ∈ Z[x, y] such that F_1(x_0, y_0) = F_2(x_0, y_0) = 0 over Z, and such that the resultant R_x(F_1(x, y), F_2(x, y)) ≠ 0 (i.e., F_1(x, y) and F_2(x, y) are algebraically independent). This yields a heuristic method in general, since it is hard to guarantee the independence of F_1(x, y) and F_2(x, y).
Theorem 19.2.1. Let F(x, y) ∈ Z[x, y] be a polynomial of total degree d (i.e., every monomial x^iy^j satisfies i + j ≤ d). Let X, Y, M ∈ N be such that XY < M^{1/d−ε} for some 0 < ε < 1/d. Then one can compute (in time polynomial in log(M) and 1/ε) polynomials F_1(x, y), F_2(x, y) ∈ Z[x, y] such that, for all (x_0, y_0) ∈ Z² with |x_0| < X, |y_0| < Y and F(x_0, y_0) ≡ 0 (mod M), one has F_1(x_0, y_0) = F_2(x_0, y_0) = 0 over Z.

Proof: We refer to Jutla [321] and Section 6.2 of Nguyen and Stern [460] for a sketch of the details. □
19.3
We now consider F(x, y) ∈ Z[x, y] and seek a root (x_0, y_0) ∈ Z² such that both |x_0| and |y_0| are small. Coppersmith has proved the following important result.

Theorem 19.3.1. Let F(x, y) ∈ Z[x, y] and let d ∈ N be such that deg_x(F(x, y)), deg_y(F(x, y)) ≤ d. Write

    F(x, y) = Σ_{0≤i,j≤d} F_{i,j} x^i y^j.

For X, Y ∈ N define

    W = max_{0≤i,j≤d} |F_{i,j}| X^i Y^j.

If XY < W^{2/(3d)} then there is an algorithm that takes as input F(x, y), X, Y, runs in time (bit operations) bounded by a polynomial in log(W) and 2^d, and outputs all pairs (x_0, y_0) ∈ Z² such that F(x_0, y_0) = 0, |x_0| ≤ X and |y_0| ≤ Y.
The proof considers, for 0 ≤ a, b < k, the shift polynomials x^a y^b F(x, y) together with suitable multiples M x^i y^j. Writing S for the k² × k² block of coefficients of the shift polynomials, T for a k² × w block of their remaining coefficients, and I_{k²}, I_w for identity matrices, elementary row operations transform the resulting block matrix (built from S, T, M I_{k²} and M I_w) into a block triangular matrix, and further row operations yield a matrix of the form

    ( S′  ∗ )
    ( 0   ∗ )
It follows that the resultant Rx (F, G) is a non-zero polynomial, and so one can find
all solutions by finding the integer roots of Rx (F, G)(y) and then solving for x.
To determine the complexity it is necessary to compute the determinant of T and to bound M. Coron shows that the method works if XY < W^{2/(3d)−1/k} 2^{−9d}. To get the stated running time for XY < W^{2/(3d)} Coron proposes setting k = log(W) and performing exhaustive search on the O(d) highest-order bits of x_0 (i.e., running the algorithm a number of times polynomial in 2^d).
Example 19.3.2. Consider F(x, y) = axy + bx + cy + d = 127xy − 1207x − 1461y + 21 with X = 30, Y = 20. Let M = 127⁴ (see below).

Consider the 13 × 9 matrix B (this is taking k = 2 in the above proof and introducing the powers X^iY^j from the start) whose columns correspond to the monomials x²y², x²y, xy², xy, x², y², x, y, 1 (the coefficient of x^iy^j being scaled by X^iY^j). The first four rows are the coefficient vectors of xyF, xF, yF and F:

    ( aX²Y²  bX²Y  cXY²  dXY  0    0    0   0   0 )
    ( 0      aX²Y  0     cXY  bX²  0    dX  0   0 )
    ( 0      0     aXY²  bXY  0    cY²  0   dY  0 )
    ( 0      0     0     aXY  0    0    bX  cY  d )

and the remaining nine rows are the diagonal rows MX²Y², MX²Y, …, M corresponding to the polynomials Mx^iy^j.

Performing elementary row operations on B yields a matrix B″ whose first four rows are triangular with diagonal entries aX²Y², aX²Y, aXY², aXY, and whose remaining non-zero entries involve the values 16129 = 127², 2048383 = 127³ and 260144641 = 127⁴ (blanks being zeroes and other entries not written down). Let L be the 5 × 5 diagonal matrix formed of columns 5 to 9 of rows 5 to 9 of B″. Performing LLL-reduction on L gives a matrix whose first row is

    (16129X², 16129Y², 1048258X, 983742Y, 28446222).
19.4
19.4.1
As discussed in Chapter 1, it is necessary to use padding schemes for RSA encryption (for example, to increase the length of short messages and to prevent algebraic relationships between the messages and ciphertexts). One simple proposal for ℓ-bit RSA moduli is to take a κ-bit message and pad it by putting (ℓ − κ) ones to the left hand side of it. This brings a short message to full length. This padding scheme is sometimes called fixed pattern padding; we discuss it further in Section 24.4.5.

Suppose short messages (for example, 128-bit AES keys K) are being encrypted using this padding scheme with ℓ = 1024. Then

    m = 2^{1024} − 2^{128} + K.
Suppose also that the encryption exponent is e = 3. Then the ciphertext is

    c = m³ (mod N).

If such a ciphertext is intercepted then the cryptanalyst only needs to find the value for K. In this case we know that K is a solution to the polynomial

    F(x) = (2^{1024} − 2^{128} + x)³ − c ≡ 0 (mod N).

This is a polynomial of degree 3 with a root modulo N of size at most N^{128/1024} = N^{1/8}. So Coppersmith's method finds the solution K in polynomial-time.
Example 19.4.1. Let N = 8873554201598479508804632335361 (which is a 103-bit integer) and suppose Bob is sending 10-bit keys K to Alice using the padding scheme

    m = 2^{100} − 2^{10} + K.
Writing F(x) = (2^{100} − 2^{10} + x)³ − c ≡ a_0 + a_1x + a_2x² + x³ (mod N), the basis matrix is

    B = ( N     0     0      0  )
        ( 0     NX    0      0  )
        ( 0     0     NX²    0  )
        ( a_0   a_1X  a_2X²  X³ )
Performing lattice reduction and taking the first row vector gives the polynomial with factorisation

    (x − 987)(920735567540915376297 + 726745175435904508x + 277605904865853x²).

One can verify that the message is K = 987.
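The toy parameters of Example 19.4.1 are small enough that the attack can even be simulated by exhaustive search over K (our own sketch; for realistic key sizes this loop is exactly what Coppersmith's method replaces):

```python
N = 8873554201598479508804632335361
K_secret = 987                        # Bob's 10-bit key
m = 2**100 - 2**10 + K_secret         # fixed pattern padding
c = pow(m, 3, N)                      # textbook RSA with e = 3

# The attacker knows N, c and the padding scheme, and searches for K.
found = [K for K in range(2**10)
         if pow(2**100 - 2**10 + K, 3, N) == c]
print(found)
```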
19.4.2
Proof: Write F(x) = (x + p̃) and note that √(N/2) ≤ p ≤ √N. Let X = ½N^{1/4}.

We describe the lattice to be used. Let h ≥ 4 be an integer to be determined later and let k = 2h. Consider the k + 1 polynomials

    N^h, N^{h−1}F(x), N^{h−2}F(x)², …, NF(x)^{h−1}, F(x)^h, xF(x)^h, …, x^{k−h}F(x)^h.

Note that if p = p̃ + x_0 and if G(x) is one of these polynomials then G(x_0) ≡ 0 (mod p^h). Consider the lattice corresponding to the above polynomials. More precisely, a basis for the lattice is obtained by taking each polynomial G(x) above and writing the vector of coefficients of G(x) as in equation (19.1). The lattice has dimension k + 1 and determinant N^{h(h+1)/2} X^{k(k+1)/2}.

Applying LLL gives a short vector and, to apply Howgrave-Graham's result, we need

    2^{k/4} det(L)^{1/(k+1)} < p^h/√(k + 1).

Hence, since p > (N/2)^{1/2}, it is sufficient that

    2^{k/4} N^{h(h+1)/(2(k+1))} X^{k/2} < (N/2)^{h/2}/√(k + 1).

Taking h sufficiently large and simplifying gives the result. The case we have shown corresponds, in the notation of Exercise 19.4.5, to β = 1/2 and the bound |x_0| ≤ ½N^{1/4}. For details see Exercise 19.4.5 or Theorems 6 and 7 of May [407]. □
Example 19.4.3. Let N = 16803551, p̃ = 2830 and X = 10.

Let F(x) = (x + p̃) and consider the polynomials N, F(x), xF(x) = x² + p̃x and x²F(x), which all have the same small solution x_0 modulo p.

We build the lattice corresponding to these polynomials (with the usual method of converting a polynomial into a row vector). This lattice has basis matrix

    ( N    0     0     0  )
    ( p̃    X     0     0  )
    ( 0    p̃X    X²    0  )
    ( 0    0     p̃X²   X³ )

The first row of the output of the LLL algorithm on this matrix is (105, −1200, 800, 1000), which corresponds to the polynomial

    G(x) = x³ + 8x² − 120x + 105.

The polynomial has the root x = 7 over Z. We can check that p = p̃ + 7 = 2837 is a factor of N.
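The example is easy to verify by machine: the polynomial produced by LLL has the integer root 7, which reveals the factor p:

```python
N, p_approx, X = 16803551, 2830, 10
G = lambda x: x**3 + 8 * x**2 - 120 * x + 105    # polynomial from the reduced basis
roots = [x for x in range(-X, X + 1) if G(x) == 0]
p = p_approx + roots[0]                          # x = 7 gives p = 2837
assert N % p == 0
print(p, N // p)                                 # the factorisation of N
```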
Exercise 19.4.4. Let N = 22461580086470571723189523 and suppose you are given the approximation p̃ = 2736273600000 to p, which is correct up to an additive error 0 ≤ x < X = 50000. Find the prime factorisation of N using Coppersmith's method.
Exercise 19.4.5. Let ε > 0. Let F(x) be a polynomial of degree d such that F(x_0) ≡ 0 (mod M) for some M | N with M ≥ N^β and |x_0| ≤ ½N^{β²/d}. Generalise the proof of Theorem 19.4.2 to show that given F(x) and N one can compute x_0 in time polynomial in log(N), d and 1/ε.
Exercise 19.4.6. Coppersmith showed that one can factor N in time polynomial in log(N) given p̃ such that |p − p̃| < N^{1/4}. Prove this result.
Exercise 19.4.7. Use Coppersmith's method to give an integer factorisation algorithm requiring Õ(N^{1/4}) bit operations. (A factoring algorithm with this complexity was also given in Section 12.5.)
Exercise 19.4.8. Show that the method of this section also works if given p̃ such that |p̃ − kp| < N^{1/4} for some integer k such that gcd(k, N) = 1.
Exercise 19.4.9. Coppersmith also showed that one can factor N in time polynomial in log(N) given p̃ such that p ≡ p̃ (mod M) where M > N^{1/4}. Prove this result.
Exercise 19.4.10. Let N = pq with p < q. Show that if one knows half the high order bits of p then one also knows approximately half the high order bits of q as well.
19.4.3 Factoring p^r q
As mentioned in Section 24.1.2, moduli of the form p^r q, where p and q are distinct primes and r ∈ N, can be useful for some applications. When r is large then p is relatively small compared with N and so a natural attack is to try to factor N using the elliptic curve method.

Boneh, Durfee and Howgrave-Graham [79] considered using Coppersmith's method to factor integers of the form N = p^r q when r is large. They observed that if one knows r and an approximation p̃ to p then there is a small root of the polynomial equation

    F(x) = (p̃ + x)^r ≡ 0 (mod p^r)

and that p^r is a large factor of N. One can therefore apply the technique of Section 19.4.2. The algorithm is to repeat the above for all p̃ in a suitably chosen set. An analysis of the complexity of the method is given in [79]. It is shown that if r ≈ log(p) then the algorithm runs in polynomial-time and that if r = √(log₂(p)) then the algorithm is asymptotically faster than using the elliptic curve method. One specific example mentioned in [79] is that if p, q ≈ 2^{512} and r = 23 then N = p^r q should be factored more quickly by their method than with the elliptic curve method.
Exercise 19.4.11. Let N = p^r q where p ≈ q, and so p ≈ N^{1/(r+1)}. Show that one can factor N in O(N^{1/(r+1)²+ε}) bit operations. In particular, one can factor integers N = p²q in roughly O(N^{1/9}) bit operations and integers N = p³q in roughly O(N^{1/16}) bit operations.
When r is small it is believed that moduli of the form N = p^r q are still hard to factor. For 3072-bit moduli, taking r = 3 and p, q ≈ 2^{768} should be such that the best known attack requires at least 2^{128} bit operations.
Exercise 19.4.12. The integer 876701170324027 is of the form p³q where |p − 5000| < 10. Use the method of this section to factor N.
19.4.4
Boneh [75], building on work of Goldreich, Ron and Sudan [256], used ideas very similar to Coppersmith's method to give an algorithm for the following problem in certain cases.
Definition 19.4.13. Let X, p_1, …, p_n, r_1, …, r_n ∈ Z_{≥0} be such that p_1 < p_2 < ⋯ < p_n and 0 ≤ r_i < p_i for all 1 ≤ i ≤ n. Let 1 ≤ e ≤ n be an integer. The Chinese remaindering with errors problem (or CRT list decoding problem) is to compute an integer 0 ≤ x < X (if it exists) such that

    x ≡ r_i (mod p_i)

for all but e of the indices 1 ≤ i ≤ n.
Note that it is not assumed that the integers p_i are coprime, though in many applications they will be distinct primes or prime powers. Also note that there is not necessarily a solution to the problem (for example, if X and/or e are too small).
Exercise 19.4.14. A naive approach to this problem is to run the Chinese remainder algorithm for all subsets S ⊆ {p_1, …, p_n} such that #S = n − e. Determine the complexity of this algorithm. What is the input size of a Chinese remainder with errors instance when 0 ≤ r_i < p_i? Show that this algorithm is not polynomial in the input size if e > log(n).
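The naive algorithm of Exercise 19.4.14 is easy to implement (our own sketch, using five small primes as moduli and one corrupted residue):

```python
from itertools import combinations

def crt(residues, moduli):
    # Incremental Chinese remaindering for pairwise coprime moduli.
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        t = ((r - x) * pow(m, -1, p)) % p     # lift x so that x = r (mod p)
        x, m = x + m * t, m * p
    return x

def naive_crt_with_errors(X, residues, moduli, e):
    # Try every subset of n - e congruences; keep consistent solutions below X.
    n = len(moduli)
    sols = set()
    for idx in combinations(range(n), n - e):
        x = crt([residues[i] for i in idx], [moduli[i] for i in idx])
        if x < X:
            sols.add(x)
    return sols

primes = [101, 103, 107, 109, 113]
res = [42 % p for p in primes]
res[2] = 5                                     # corrupt one residue
print(naive_crt_with_errors(1000, res, primes, 1))
```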
Proof: Boneh [75] gives a direct proof, but we follow Section 4.7 of May [408] and derive the result using Exercise 19.4.5.

Let 0 ≤ x < X be an integer with M = amp(x) being divisible by at least n − e of the values p_i (here P = p_1 ⋯ p_n). We have n log(p_1) ≤ log(P) ≤ n log(p_n) and (n − e) log(p_1) ≤ log(M) ≤ n log(p_n). Write M = P^β. Then Coppersmith's algorithm finds x in time polynomial in n and log(p_n) if X < P^{β²} (note that Exercise 19.4.5 states the result for X ≤ ½P^{β²}, but we can remove the ½ using the same ideas as Remark 19.1.13). Hence, it is sufficient to give a bound on e so that log(X)/log(P) < β² (i.e., β > √(log(X)/log(P))). Now, β = log(M)/log(P) ≥ (n − e) log(p_1)/(n log(p_n)). Hence, it is sufficient that

    (n − e) log(p_1)/(n log(p_n)) ≥ √(log(X)/log(P)),

which is equivalent to the equation in the Theorem. □
For convenience we briefly recall how to perform the computation. One chooses appropriate integers a, a′ ∈ N and considers the lattice corresponding to polynomials G_i(x) and H_i(x) that, by assumption, have at least one common small root x_0 modulo M^a. Using lattice basis reduction one finds a polynomial F(x) that has small coefficients and that still has the same root x_0 modulo M^a. Applying Theorem 19.1.2 one finds that F(x_0) = 0 over Z if M^a is sufficiently large compared with x_0.
Exercise 19.4.16. Suppose p_1, …, p_n are the first n primes. Show that the above algorithm works when e ≤ n − √(n log(X) log(n)). Hence verify that Boneh's algorithm is polynomial-time in situations where the naive algorithm of Exercise 19.4.14 would be superpolynomial-time.
Bleichenbacher and Nguyen [70] discuss a variant of the Chinese remaindering with errors problem (namely, solving x ≡ r_i (mod p_i) for small x, where each r_i lies in a set of m possible values) and a related problem in polynomial interpolation. Section 5 of [70] gives some algorithms for this "noisy CRT" problem.
For example, with R the common residue and P the product of the moduli, one obtains the basis matrix

    ( P⁴     0       0       0       0      0     0  )
    ( −RP³   P³X     0       0       0      0     0  )
    ( R²P²   −2RP²X  P²X²    0       0      0     0  )
    ( −R³P   3R²PX   −3RPX²  PX³     0      0     0  )
    ( R⁴     −4R³X   6R²X²   −4RX³   X⁴     0     0  )
    ( 0      R⁴X     −4R³X²  6R²X³   −4RX⁴  X⁵    0  )
    ( 0      0       R⁴X²    −4R³X³  6R²X⁴  −4RX⁵ X⁶ )

corresponding to the polynomials P⁴, (x − R)P³, (x − R)²P², (x − R)³P, (x − R)⁴, x(x − R)⁴ and x²(x − R)⁴. Note that the algorithm does not output 108 = 2²·3³, since that number does not have a very large gcd with P.
19.5
    B = ( ε/Q   α_1   α_2   ⋯   α_n )
        ( 0     1     0     ⋯   0   )
        ( 0     0     1     ⋯   0   )        (19.5)
        ( ⋮                 ⋱   ⋮   )
        ( 0     0     0     ⋯   1   )

The dimension is n + 1 and the determinant is ε/Q = 2^{−n(n+1)/4} ε^{n+1}. Every vector in the lattice is of the form (qε/Q, qα_1 − p_1, qα_2 − p_2, …, qα_n − p_n). The entries of the lattice are ratios of integers with absolute value bounded by max{X, 2^{n(n+1)/4}/ε^{n+1}}.

Note that the lattice L does not have a basis with entries in Z, but rather in Q. By Remark 17.5.5 the LLL algorithm applied to L runs in O(n⁶ max{n log(X), n² + n log(1/ε)}³) bit operations (which is polynomial in the input size) and outputs a non-zero vector v = (qε/Q, qα_1 − p_1, …, qα_n − p_n) such that

    ||v|| ≤ 2^{n/4} det(L)^{1/(n+1)} = 2^{n/4} 2^{−n/4} ε = ε < 1.

If q = 0 then v = (0, −p_1, …, −p_n) with some p_i ≠ 0 and so ||v|| ≥ 1, a contradiction; hence q ≠ 0. Without loss of generality, q > 0. Since ||v||_∞ ≤ ||v|| it follows that qε/Q < 1 and so 0 < q < Q/ε = 2^{n(n+1)/4} ε^{−(n+1)}. Similarly, |qα_i − p_i| < ε for all 1 ≤ i ≤ n.

Exercise 19.5.4. Let α_1 = 1.555111, α_2 = 0.771111 and α_3 = 0.333333. Let ε = 0.01 and Q = 10⁶. Use the method of this section to find a good simultaneous rational approximation to these numbers.
See Section 17.3 of [237] for more details and references.
19.6
The basic problem is the following. Suppose positive integers a and b exist such that d = gcd(a, b) is large. Suppose that one is not given a and b, but only approximations ã, b̃ to them. The problem is to find d, a and b. One issue is that there can be surprisingly many solutions to the problem (see Example 19.6.4), so it may not be feasible to compute all solutions for certain parameters. On the other hand, in the case b̃ = b (i.e., one of the values is known exactly, which often happens in practice) there are relatively few solutions.

Howgrave-Graham [296] has considered these problems and has given algorithms that apply in various situations. We present one of the basic ideas. Let a = ã + x and b = b̃ + y. Suppose ã < b̃ and define q_a = a/d and q_b = b/d. Then, since q_a/q_b = a/b, we have

    ã/b̃ − q_a/q_b = (q_a y − q_b x)/(b̃ q_b).    (19.6)
If the right hand side of equation (19.6) is small then performing Euclid's algorithm on ã/b̃ gives a sequence of possible values for q_a/q_b. For each such value one can compute

    b̃/q_b = (dq_b − y)/q_b = d − y/q_b.

If |y| < ½q_b then one has computed d exactly and can solve ã + x ≡ b̃ + y ≡ 0 (mod d). Note that one must use the basic extended Euclidean algorithm, rather than the improved method using negative remainders as in Algorithm 1.
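The continued-fraction step can be sketched as follows (our own implementation: the convergents come from the plain Euclidean algorithm, and each convergent q_a/q_b yields the candidate d ≈ b̃/q_b):

```python
from fractions import Fraction

def convergents(num, den):
    # Convergents p/q of the continued fraction of num/den.
    p0, p1 = 0, 1     # p_{-2}, p_{-1}
    q0, q1 = 1, 0     # q_{-2}, q_{-1}
    while den:
        a, (num, den) = num // den, (den, num % den)
        p0, p1 = p1, a * p1 + p0
        q0, q1 = q1, a * q1 + q0
        yield p1, q1

def approx_gcd_candidates(a_t, b_t):
    # Each convergent qa/qb of a_t/b_t suggests the guess d ~ b_t/qb.
    return [round(Fraction(b_t, qb)) for _, qb in convergents(a_t, b_t) if qb]

# toy instance: d = gcd(a, b) = 1000003, with noisy approximations a_t, b_t
d = 1000003
a_t, b_t = 1409 * d + 7, 1440 * d - 5
print(d in approx_gcd_candidates(a_t, b_t))
```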
Exercise 19.6.1. Show that if ã < b̃, b^{2/3} < d < 2b^{2/3} and |x|, |y| < ¼b^{1/3} then the above method finds d, a and b.
Exercise 19.6.2. Let the notation be as above. Suppose |x|, |y| < b^α and d = b^β. Explain why it is natural to assume β > α. Show that the above method succeeds if (ignoring constant factors) α < −1 + 2β and α < 1 − β.
Exercise 19.6.3. Re-formulate this method in terms of finding a short vector in a lattice defined by a 2 × 2 matrix. Derive the same conditions on α and β as in Exercise 19.6.2.
The convergents q_a/q_b to ã/b̃ are 1, 45/46, 91/93, 409/418, 500/511, 1409/1440 and 1909/1951. Computing approximations to ã/q_a and b̃/q_b for these values (except the first) gives the following table.

    q_a/q_b      ã/q_a         b̃/q_b
    45/46        13717403.5    13714436.6
    91/93        6783331.4     6783484.8
    409/418      1509249.8     1509244.2
    500/511      1234566.3     1234567.7
    1409/1440    438100.2      438100.1
    1909/1951    323354.2      323354.2

Any values around these numbers can be used as a guess for d. For example, taking d = 13717403 one finds ã − 22 ≡ b̃ + 136456 ≡ 0 (mod d), which is a not particularly good solution.

The four values d_1 = 1234566, d_2 = 1234567, d_3 = 438100 and d_4 = 323354 lead to the solutions ã − 157 ≡ b̃ − 856 ≡ 0 (mod d_1), ã + 343 ≡ b̃ − 345 ≡ 0 (mod d_2), ã − 257 ≡ b̃ − 82 ≡ 0 (mod d_3) and ã − 371 ≡ b̃ − 428 ≡ 0 (mod d_4).
Howgrave-Graham gives a more general method for solving the problem that does not require such a strict condition on the size of y. The result relies on heuristic assumptions about Coppersmith's method for bivariate integer polynomials. We state this result as Conjecture 19.6.5.

Conjecture 19.6.5. (Algorithm 14 and Section 4 of [296]) Let 0 < β < 2/3 and α < 1 − β/2 − √(1 − β − β²/2). There is a polynomial-time algorithm that takes as input ã < b̃ and outputs all integers d > b̃^β such that there exist integers x, y with |x|, |y| < b̃^α and d | (ã + x) and d | (b̃ + y).
Exercise 19.6.6. Let ã, b̃, X, Y ∈ N be given with X < ã < b̃. Give a brute force algorithm to output all d > Y such that there exist x, y ∈ Z with |x|, |y| ≤ X and d = gcd(ã + x, b̃ + y). Show that the complexity of this algorithm is O(X² log(b̃)²) bit operations.
We now mention the case when b̃ = b (in other words, b is known exactly). The natural approach is to consider the polynomial F(x) = ã + x, which has a small solution to the equation F(x) ≡ 0 (mod d) for some d | b. Howgrave-Graham applies the method used in Section 19.4.2 to solve this problem.
Theorem 19.6.7. (Algorithm 12 and Section 3 of [296]) Let 0 < β < 1 and α < β². There is a polynomial-time algorithm that takes as input ã, b and outputs all integers d > b^β such that there exists an integer x with |x| < b^α and d | (ã + x) and d | b.
19.7
The learning with errors problem was proposed by Regev. There is a large literature on this problem; we refer to Micciancio and Regev [420] and Regev [493] for background and references.

Definition 19.7.1. Let q ∈ N (typically prime), σ ∈ R_{>0}, and n, m ∈ N with m > n.¹ Let s ∈ (Z/qZ)^n. The LWE distribution is the distribution on (Z/qZ)^{m×n} × (Z/qZ)^m corresponding to choosing uniformly at random an m × n matrix A with entries in Z/qZ and a length m vector

    c ≡ As + e (mod q),

where the vector e has entries chosen independently from a discretised normal distribution² on Z with mean 0 and standard deviation σ. The learning with errors problem (LWE) is: given (A, c) drawn from the LWE distribution, to compute the vector s. The decision learning with errors problem (DLWE) is: given A as above and a vector c ∈ (Z/qZ)^m, to determine whether (A, c) is drawn from the uniform distribution or the LWE distribution.

¹ For theoretical applications one should not assume a fixed number m of rows for A. Instead, the attacker is given an oracle that outputs pairs (a, c) where a is a row of A and c = a·s + e (mod q).
It is necessary to argue that LWE is well-defined since, for any choice s′, the value c − As′ (mod q) is a possible choice for e. But, when m is sufficiently large, one value for s is much more likely to have been used than any of the others. Hence, LWE is a maximum likelihood problem. Similarly, DLWE is well-defined when m is sufficiently large: if c is chosen uniformly at random and independent of A then there is not likely to be a choice for s′ such that c − As′ (mod q) is significantly smaller than the other values c − As (mod q). We do not make these arguments precise. It follows that m must be significantly larger than n for these problems to be meaningful. It is also clear that increasing m (but keeping n fixed) does not make the LWE problem harder.
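An LWE instance can be sampled as follows (our own sketch; for simplicity the error uses a rounded continuous Gaussian as a stand-in for a properly discretised normal distribution, and the parameters are illustrative, far below cryptographic size):

```python
import random

def lwe_sample(n, m, q, sigma, s=None):
    # Draw (A, c) from the LWE distribution: c = A*s + e (mod q).
    if s is None:
        s = [random.randrange(q) for _ in range(n)]
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [round(random.gauss(0, sigma)) for _ in range(m)]
    c = [(sum(aij * sj for aij, sj in zip(row, s)) + ei) % q
         for row, ei in zip(A, e)]
    return A, c, s, e

A, c, s, e = lwe_sample(n=8, m=24, q=97, sigma=1.0)
```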
We refer to [420] and [493] for surveys of cryptographic applications of LWE and
reductions, from computational problems in lattices that are believed to be hard, to LWE.
Note that the values m, q and in an LWE instance are usually determined by constraints
coming from the cryptographic application, while n is the main security parameter.
Example 19.7.2. Table 3 of Micciancio and Regev [420] suggests the parameters

    (n, m, q, σ) = (233, 4536, 32749, 2.8).

Lindner and Peikert [387] suggest (using Figure 4 and the condition m ≥ 2n + 128)

    (n, m, q, σ) = (256, 640, 4093, 3.3).
Exercise 19.7.3. Show that if one can determine e then one can solve LWE efficiently.
Exercise 19.7.4. Show that, when q is prime, LWE ≤_R DLWE. Show that DLWE ≤_R LWE.
We now briefly sketch two lattice attacks on LWE. These attacks can be avoided by
taking appropriate parameters. For other attacks on LWE see [493].
Example 19.7.5. (Lattice attack on DLWE using short vectors in the kernel lattice modulo q.) Suppose one can find a short vector w in the lattice

    {w ∈ Z^m : wA ≡ 0 (mod q)}.

Then w·c ≡ wAs + w·e ≡ w·e (mod q). If w is short enough then one might expect that w·e is a small integer. On the other hand, if c is independent of A then w·c (mod q) is a random integer modulo q. Hence, one might be able to distinguish the LWE distribution from the uniform distribution using short enough vectors w.
Note that one is not obliged to use all the rows of A in this attack, and so one can replace m by a much smaller value m′. For analysis of the best value for m′, and for parameters that resist this attack, see Section 5.4.1 (especially equation (10)) of [420].
Example 19.7.6. (Reducing LWE to CVP.) We now consider a natural approach to solving LWE using lattices. Since we always use row lattices, it is appropriate to take

² In other words, the probability that the value x ∈ Z is sampled is proportional to e^{−x²/(2σ²)}.
19.8
There are a number of other applications of lattices in cryptography. We briefly list some
of them.
The improvement by Boneh and Durfee of Wiener's attack on small private exponent RSA. This is briefly mentioned in Section 24.5.1.
Solving the hidden number problem in finite fields and its applications to bit security
of Diffie-Hellman key exchange. See Section 21.7.1.
The attack by Howgrave-Graham and Smart on digital signature schemes in finite
fields when there is partial information available about the random nonces. See
Section 22.3.
The deterministic reduction by Coron and May from knowing φ(N) to factoring N. This is briefly mentioned in Section 24.1.3.
Cryptosystems Based on Lattices

This is a chapter from version 1.1 of the book Mathematics of Public Key Cryptography by Steven Galbraith, available from http://www.isg.rhul.ac.uk/~sdg/crypto-book/ The copyright for this chapter is held by Steven Galbraith.

This book is now completed and an edited version of it will be published by Cambridge University Press in early 2012. Some of the Theorem/Lemma/Exercise numbers may be different in the published version.

Please send an email to S.Galbraith@math.auckland.ac.nz if you find any mistakes. All feedback on the book is very welcome and will be acknowledged.
We present some cryptosystems whose common feature is that they all rely on computational problems in lattices for their security. The subject of lattice based cryptography
is very active and there have recently been new ideas that revolutionised the field. It is
beyond the scope of this book to survey these recent developments.
19.9
The Goldreich-Goldwasser-Halevi (GGH) cryptosystem relies on the difficulty of the closest vector problem (CVP) in a lattice. The system is reminiscent of the McEliece cryptosystem, which we briefly recall in the next paragraph. Encryption for both systems is
randomised.
In the McEliece cryptosystem one chooses an error correcting code (some references for error correcting codes are van Lint [388] and Chapter 18 of [605]) over a finite field F_q (typically F_2) given by a k × n generator matrix G (where k < n) and publishes a disguised version G′ = SGP where S and P are suitable invertible matrices (we refer to Section 8.5 of Menezes, van Oorschot and Vanstone [415] for details). The public key is G′ and the private key is (S, G, P). To encrypt a message m ∈ F_q^k one computes c = mG′ + e where e ∈ F_q^n is a randomly chosen error vector of low Hamming weight; note that this computation is over F_q. To decrypt one uses the decoding algorithm for the error correcting code.
The basic GGH public key encryption scheme is similar; we give an informal sketch of the idea now. One chooses a "nice" basis B for a full rank lattice L ⊆ Z^n and publishes a disguised basis B′ = UB for L, where U is a random unimodular matrix. A message m ∈ Z^n is encrypted as c = mB′ + e where e is a randomly chosen short error vector; note that this computation is over Z. To decrypt one solves the closest vector problem, using the nice basis B, to obtain the lattice point mB′ close to c; one can then obtain m.
    U = [ 2  3 ]
        [ 3  5 ]

giving

    B′ = UB = [ 34  57 ]
              [ 51  95 ].

Let the message be m = (−2, 5) and take e = (−1, 1) (this is GGH encryption with σ = 1). Then

    c = mB′ + e = (186, 362).

To decrypt one computes

    cB^{−1} ≈ (10.94, 19.05)

(note that mU = (11, 19) and eB^{−1} ≈ (−0.06, 0.05)). We round the above to (11, 19) and recover the message as m = (11, 19)U^{−1} = (−2, 5).
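The decryption-by-rounding step can be checked in a few lines. The private basis B = diag(17, 19) and U = [[2, 3], [3, 5]] below are assumptions inferred from the numbers in the example (e.g. cB^{−1} = (186/17, 362/19) ≈ (10.94, 19.05)); with these choices the message m = (−2, 5) and error e = (−1, 1) reproduce the ciphertext (186, 362).

```python
from fractions import Fraction

# Toy GGH round-trip, assuming (hypothetically) the private basis
# B = diag(17, 19) and unimodular U = [[2, 3], [3, 5]], which are
# consistent with the numbers in the example above.

def matvec(m, M):
    # Row vector times matrix.
    return [sum(m[i] * M[i][j] for i in range(len(m))) for j in range(len(M[0]))]

B = [[17, 0], [0, 19]]
U = [[2, 3], [3, 5]]
Bpub = [[34, 57], [51, 95]]          # B' = U B, the public basis

def encrypt(m, e):
    return [x + ei for x, ei in zip(matvec(m, Bpub), e)]

def decrypt(c):
    # c B^{-1} is easy since B is diagonal; round each entry to get mU.
    t = [round(Fraction(c[i], B[i][i])) for i in range(2)]
    # m = (mU) U^{-1}; here U^{-1} = [[5, -3], [-3, 2]] since det U = 1.
    Uinv = [[5, -3], [-3, 2]]
    return matvec(t, Uinv)

c = encrypt([-2, 5], [-1, 1])        # gives (186, 362)
```

The rounding succeeds because the entries of eB^{−1} have absolute value below 1/2, exactly the condition of Exercise 19.9.7.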
Exercise 19.9.7. Show that GGH decryption gives the correct result as long as the entries of eB^{−1} are real numbers of absolute value < 1/2. Let β be the maximum, in the ℓ1-norm, of the columns of B^{−1}. Show that if σ < 1/(2β) then decryption gives the correct result.
Exercise 19.9.8. For the public key in Example 19.9.6 decrypt the ciphertext c =
(220, 400).
As mentioned, the ciphertext in GGH encryption is considerably larger than the message. A precise analysis of this depends on the sizes of the entries in B′ (which in turn depend on the specific choices for B and U). We do not give any estimates for the ciphertext expansion.
Micciancio [418] proposed a variant of the GGH cryptosystem. The first idea is, instead of choosing the public basis to be B′ = UB for a random matrix U ∈ SL_n(Z), one can choose B′ to be the Hermite normal form (HNF) of B. There is no loss of security by doing this, since anyone can compute the HNF of UB and get the same result. The second idea is to encode the message in the error vector rather than in the lattice point (this is the same idea as discussed in Exercise 19.9.2) and to reduce it to the orthogonalized parallelepiped (see Exercise 19.9.9). This results in significantly shorter ciphertexts than the original GGH system and makes the encryption process deterministic. We refer to [418] for further details.
    P = { Σ_{i=1}^{n} x_i b_i : 0 ≤ x_i < 1 }.
19.10 Cryptanalysis of GGH Encryption
We now discuss the one-way encryption (OWE) security of the GGH cryptosystem under passive attacks. There are three natural ways to attack the GGH cryptosystem:
1. Try to obtain the private key B from the public key B′.
2. Try to obtain information about the message from the ciphertext, given that the error vector is small.
3. Try to solve the CVP of c with respect to the lattice L defined by B′.
We also present a fourth attack, due to Nguyen, which exploits the particular format of the error vectors in the GGH cryptosystem. Lattice basis reduction algorithms have a role to play in the first and third of these attacks.
Computing a Private Key
For the first attack, we simply run a lattice basis reduction algorithm (such as LLL) on the public basis matrix B′. If we are lucky then it will output a basis that is good enough to allow the efficient solution of the required closest vector instances.
Example 19.10.1. Let

    B = [  7   0   0 ]
        [  0  23   0 ]
        [  0   0  99 ]

and

    U = [  1   0   0 ] [ 1  3  10 ]
        [  8   1   0 ] [ 0  1   6 ]
        [ 11  -5   1 ] [ 0  0   1 ]

giving the public basis

    B′ = UB = [  7   69   990 ]
              [ 56  575  8514 ]
              [ 77  644  8019 ].
For the second attack we exploit the fact that c = mB′ + e where e is a vector with small entries. A naive attack is to try all values of the error vector e until c − e lies in the lattice Z^n B′. A more subtle idea is to compute c(B′)^{−1} = m + e(B′)^{−1} and try to deduce possible values for some entries of e(B′)^{−1}. For example, if the j-th column of (B′)^{−1} has particularly small norm then one can deduce that the j-th entry of e(B′)^{−1} is always small and hence get an accurate estimate for the j-th entry of m. We refer to Section 4.2 of [255] for further discussion. To defeat this attack one should not naively encode the message as a vector m ∈ Z^n. Instead, one should only use some low-order bits of some entries of m to carry information, or use an appropriate randomised padding scheme.
Solving the CVP Directly
For the third attack one can consider any of the algorithms listed in Chapter 18 for solving
the CVP. For example, one can use the Babai nearest plane algorithm or the embedding
technique.
Example 19.10.2. We use the public key and ciphertext from Example 19.10.1 and recover the message using the embedding technique. Construct

    A = [   7    69    990   0 ]
        [  56   575   8514   0 ]
        [  77   644   8019   0 ]
        [ 274  2368  30592   1 ].
Running the LLL algorithm on A gives a reduced basis whose first row is (1, −1, 1, 1). As desired, the first row is (e, 1). From this one can compute the message as m = (c − e)(B′)^{−1}.
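The embedding technique in this example can be reproduced with a textbook LLL implementation. The sketch below assumes the public basis and the ciphertext c = (274, 2368, 30592) read off the rows of A; the LLL routine recomputes the Gram-Schmidt data from scratch after every change, which is slow but keeps the code short.

```python
from fractions import Fraction

def lll(basis, delta=Fraction(3, 4)):
    # Textbook LLL over exact rationals (size reduction + Lovasz condition).
    b = [[Fraction(x) for x in row] for row in basis]
    n = len(b)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def gso():
        bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            v = b[i][:]
            for j in range(i):
                mu[i][j] = dot(b[i], bstar[j]) / dot(bstar[j], bstar[j])
                v = [x - mu[i][j] * y for x, y in zip(v, bstar[j])]
            bstar.append(v)
        return bstar, mu

    bstar, mu = gso()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size reduction
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gso()
        if dot(bstar[k], bstar[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1]):
            k += 1
        else:                                    # Lovasz condition fails: swap
            b[k - 1], b[k] = b[k], b[k - 1]
            bstar, mu = gso()
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in b]

def solve(M, t):
    # Solve M x = t exactly by Gaussian elimination over the rationals.
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(ti)] for row, ti in zip(M, t)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

Bpub = [[7, 69, 990], [56, 575, 8514], [77, 644, 8019]]
c = [274, 2368, 30592]
A = [row + [0] for row in Bpub] + [c + [1]]
red = lll(A)
row = next(r for r in red if all(abs(x) == 1 for x in r))  # the row +-(e, 1)
if row[-1] == -1:
    row = [-x for x in row]
e = row[:-1]
# Recover m from m B' = c - e, i.e. solve Bpub^T x = (c - e)^T.
m = [int(x) for x in solve([list(col) for col in zip(*Bpub)], [ci - ei for ci, ei in zip(c, e)])]
```

The short vector (e, 1) is found because it is far shorter than any other primitive vector in the embedding lattice.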
Exercise 19.10.3. For the public key from Example 19.10.1 use the embedding technique
to decrypt the ciphertexts c = (120, 1220, 18017) and c = (83, 714, 9010).
To defeat such attacks it is necessary that the lattice dimension is sufficiently large and that the solution to the CVP instance is not too special. In particular, the error vector should not be too short compared with the vectors in the lattice.
Nguyen's Attack

Nguyen noted that the choice of the error vector in the original GGH cryptosystem made it extremely vulnerable to attack. Write s = (σ, σ, . . . , σ) ∈ Z^n, where the entries of the error vector e lie in {−σ, σ}. The crucial observation is that if c is a GGH ciphertext then c + s ≡ mB′ (mod 2σ). If B′ is invertible modulo 2σ (or even modulo a factor of 2σ) then one can already extract significant information about the message m. Furthermore, if one successfully computes m_0 ≡ m (mod 2σ), then one obtains the simpler closest vector instance

    (c − m_0 B′)/(2σ) = m′B′ + e/(2σ)

where m = m_0 + 2σm′.
19.11 GGH Signatures
from a lattice (when given a sufficiently good basis) such that the output is statistically close to a Gaussian distribution. Hence, their paper gives a secure implementation of the GGH signature concept.
19.12 NTRU
The NTRU cryptosystem was invented by Hoffstein, Pipher and Silverman. The original
proposal is phrased in terms of polynomial rings. We refer to Hoffstein, Pipher and
Silverman [287], Section 6.10 of [288] or Section 17.4 of [605] for a description of the
system in these terms.
The NTRU encryption scheme can also be described as a special case of Micciancio's variant of the GGH encryption scheme. The public key is a 2n × 2n matrix

    B = [ qI_n   0  ]
        [  H    I_n ]

in Hermite normal form, where I_n is the n × n identity matrix, q is an integer (typically q = 2^8 or 2^10) and H is an n × n matrix with entries in {0, 1, . . . , q − 1}. The crucial property of NTRU is that the matrix H is a circulant matrix; in other words, the rows of H are just cyclic rotations of the first row of H. This means that to specify the NTRU public key one only needs to specify q and the first row of H; the public key requires O(n log_2(q)) bits.
The matrix H is constructed by the user in a special way so that they know a basis
for the lattice generated by B consisting of short vectors. Encryption proceeds as in the
Micciancio scheme. We refer to Section 5.2 of Micciancio and Regev [420] for further
details.
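The shape of the NTRU public basis can be illustrated as follows. This is a sketch only: q = 8 and the first row (3, 1, 4, 1) are arbitrary toy values, and the rotation direction chosen for the circulant is one of the two possible conventions.

```python
def ntru_public_basis(q, h):
    # Build the 2n x 2n basis  [ qI_n   0  ]
    #                          [  H    I_n ]
    # from q and the first row h of the circulant matrix H.
    n = len(h)
    H = [[h[(j - i) % n] for j in range(n)] for i in range(n)]  # cyclic rotations of h
    top = [[q if j == i else 0 for j in range(n)] + [0] * n for i in range(n)]
    bottom = [H[i] + [1 if j == i else 0 for j in range(n)] for i in range(n)]
    return top + bottom

B = ntru_public_basis(8, [3, 1, 4, 1])   # a 8 x 8 toy public basis
```

Since the whole matrix is determined by q and h, storing the public key really does take only O(n log_2(q)) bits, as stated above.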
The details of the NTRU scheme have evolved over time. In particular, earlier parameter choices for NTRU had a noticeable probability of decryption failures, and this
property was used to develop active (i.e., not passive) attacks [297]. Hence, the currently
recommended parameters for NTRU have negligible probability of decryption failures.
The security of the NTRU cryptosystem relies on the difficulty of computing short
vectors in the NTRU lattice. One remark is that the NTRU lattice has a number of
special properties that can be used to improve the standard algorithms for finding short
vectors. In particular, if v is a short vector in the NTRU lattice then so are the n cyclic
rotations of v. As a sample of the literature on special properties of the NTRU lattice
we refer to May and Silverman [409], Gama, Howgrave-Graham and Nguyen [233] and
Gentry [250].
19.13 Knapsack Cryptosystems
Knapsack cryptosystems were proposed by Merkle and Hellman in 1978. As with NTRU, the original description of knapsack cryptosystems made no reference to lattices. However, there is a general attack on knapsacks using lattices (indeed, this was the first application of lattice basis reduction to public key cryptanalysis) and so it is natural to consider them as lattice-based cryptosystems. Though not used in practice, we briefly present knapsack cryptosystems as they are an excellent source of exercises in cryptanalysis.
The name knapsack is a mis-use of subset sum. It comes from the idea of finding out what is in a knapsack (a type of bag) just from its weight. The subset sum problem is NP-complete.
Exercise 19.13.2. A decisional variant of Definition 19.13.1 is, given {b_1, . . . , b_n} and s ∈ N, to decide whether or not there are x_i ∈ {0, 1} such that s = Σ_{i=1}^{n} x_i b_i. Prove that these two computational problems are equivalent.
Exercise 19.13.3. Let notation be as in Definition 19.13.1 and let B = Σ_{i=1}^{n} b_i. Give a time-memory tradeoff algorithm to find the solution x_i ∈ {0, 1}, or show none exists, in O(n 2^{n/2} log(B)^2) bit operations and with O(n 2^{n/2} log(B)) bits of storage.
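The time-memory tradeoff idea can be sketched as a meet-in-the-middle search: tabulate all subset sums of one half of the weights, then scan subsets of the other half for a matching complement. This simple dictionary version illustrates the O(2^{n/2}) idea but does not implement the exact sorting-based complexity of the exercise. The test weights are those of Exercise 19.13.35 later in this section.

```python
from itertools import combinations

def subset_sum_mitm(b, s):
    # Meet-in-the-middle: all 2^{n/2} subset sums of the left half go into a
    # dictionary; each subset of the right half is checked for a complement.
    n = len(b)
    left, right = b[:n // 2], b[n // 2:]

    def table(part):
        t = {}
        for r in range(len(part) + 1):
            for idx in combinations(range(len(part)), r):
                t.setdefault(sum(part[i] for i in idx), idx)
        return t

    ltab = table(left)
    for r in range(len(right) + 1):
        for idx in combinations(range(len(right)), r):
            rest = s - sum(right[i] for i in idx)
            if rest in ltab:
                x = [0] * n
                for i in ltab[rest]:
                    x[i] = 1
                for i in idx:
                    x[len(left) + i] = 1
                return x
    return None

weights = [2381, 1094, 2188, 2442, 2280, 1129, 1803, 2259, 1665]
x = subset_sum_mitm(weights, 7598)
```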
The attack of Exercise 19.13.3 has been greatly improved by Shamir and Schroeppel (we do not have space for the details; see Section 8.1.2 of Joux [314]). A further improvement has been given by Howgrave-Graham and Joux [299]. Wagner's algorithm (see Section 13.8) does not seem to be directly applicable to the subset sum problem, though it has been used to solve the modular subset sum problem (i.e., given {b_i}, s and m to find x_i ∈ {0, 1} such that Σ_{i=1}^{n} x_i b_i ≡ s (mod m)) by Wagner (also see the work of Lyubashevsky and Shallue).
Exercise 19.13.4. Show that every subset sum instance can be reduced to an instance
where the weights satisfy gcd(b1 , . . . , bn ) = 1.
The motivating idea of a knapsack cryptosystem is that computing s = Σ_{i=1}^{n} x_i b_i is a one-way function. The remaining problem is to design subset sum instances that can be efficiently solved using a private key. To do this one first considers easy instances of the subset sum problem.
Definition 19.13.5. A sequence b_1, . . . , b_n in N is superincreasing if, for each 2 ≤ i ≤ n,

    b_i > Σ_{j=1}^{i-1} b_j.
There is an efficient greedy algorithm to solve the subset sum problem if the bi are a
superincreasing sequence: Just subtract the largest possible value from s and repeat.
Example 19.13.6. The sequence
1, 2, 4, 8, . . . , 2^{n-1}
is a superincreasing sequence. Decomposing an integer s with respect to this sequence is
the same as writing it in binary.
Exercise 19.13.7. Consider the superincreasing sequence
1, 5, 7, 20, 35, 80, 170.
Decompose s = 112 with respect to this sequence.
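The greedy algorithm for superincreasing sequences can be sketched as follows (applied here to the sequence of Exercise 19.13.7):

```python
def solve_superincreasing(b, s):
    # Greedy: working from the largest weight down, take it whenever it fits.
    # Superincreasingness guarantees this is the only possible choice.
    x = [0] * len(b)
    for i in range(len(b) - 1, -1, -1):
        if b[i] <= s:
            x[i] = 1
            s -= b[i]
    return x if s == 0 else None

x = solve_superincreasing([1, 5, 7, 20, 35, 80, 170], 112)
```

Correctness follows from Definition 19.13.5: if the largest weight not exceeding s were skipped, all remaining weights together would sum to less than s.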
19.13.1 The Merkle-Hellman Cryptosystem
The idea of the Merkle-Hellman knapsack cryptosystem is to have a superincreasing sequence as the private key but to disguise this for the public key. We briefly sketch the
algorithms for the textbook Merkle-Hellman knapsack cryptosystem (for more details
see Section 8.6 of [415]). The length n of the sequence is a security parameter.
KeyGen(n): Generate a superincreasing sequence b_1, . . . , b_n in N. Choose a modulus M > Σ_{i=1}^{n} b_i and a random integer W coprime to M. Select a random permutation π of the integers {1, . . . , n}. Define a_i = W b_{π(i)} (mod M). The public key is a = (a_1, . . . , a_n) and the private key is (π, W, M, b = (b_1, . . . , b_n)).
The message space is M_n = {0, 1}^n (i.e., binary strings of length n).
To Encrypt a message m = (m_1, . . . , m_n) where m_i ∈ {0, 1} a user computes the integer

    c = m · a = Σ_{i=1}^{n} m_i a_i.
To Decrypt, the user with the private key multiplies c by W^{-1} (mod M) to obtain 0 ≤ s < M. The user can solve the subset sum problem for s with respect to the superincreasing sequence. (If there is no solution then the decryption algorithm outputs the invalid ciphertext symbol ⊥.) The message is then obtained by permuting the sequence x_i using π^{-1}.
Exercise 19.13.13. Show that decryption does recover the message.
Example 19.13.14. Consider the superincreasing sequence from Exercise 19.13.7
1, 5, 7, 20, 35, 80, 170.
We disguise using modulus 503 and multiplier 430 (and taking π to be the identity permutation for simplicity) to get the public key
430, 138, 495, 49, 463, 196, 165.
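The textbook scheme, with the parameters of Example 19.13.14 (and the identity permutation, as in that example), can be sketched as:

```python
def mh_keygen(b, M, W):
    # Textbook Merkle-Hellman key generation with the identity permutation:
    # a_i = W * b_i (mod M).
    assert M > sum(b)
    return [W * bi % M for bi in b]

def mh_encrypt(a, m):
    return sum(ai * mi for ai, mi in zip(a, m))

def mh_decrypt(b, M, W, c):
    s = c * pow(W, -1, M) % M             # undo the disguise
    x = [0] * len(b)
    for i in range(len(b) - 1, -1, -1):   # greedy superincreasing solver
        if b[i] <= s:
            x[i] = 1
            s -= b[i]
    return x if s == 0 else None          # None plays the role of the symbol ⊥

b = [1, 5, 7, 20, 35, 80, 170]
a = mh_keygen(b, 503, 430)                # gives the public key of the example
```

For instance, encrypting the message (1, 0, 0, 1, 1, 0, 0) under this public key gives the integer 430 + 49 + 463, and decryption reduces it to an easy superincreasing instance modulo 503.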
In general one prefers the ciphertext to have a similar size to n. Exercise 19.13.21 shows that it is impossible to have a ciphertext of exactly the same bit-length as the message when using knapsacks.
Exercise 19.13.21. Show that a ciphertext in the textbook Merkle-Hellman scheme is expected to require at least n + log_2(n) − 2 bits.
Exercise 19.13.22. It is sometimes stated in the literature that a Merkle-Hellman public
key must have density less than 1. Show that this is not the case.
To avoid attacks (to be described in the next section) it was proposed to iterate the Merkle-Hellman procedure t times. In other words, first choose a superincreasing sequence b_1, . . . , b_n, choose (M_1, W_1) such that M_1 > Σ_{i=1}^{n} b_i and compute

    a_{1,i} = W_1 b_i (mod M_1)

for 1 ≤ i ≤ n. Then choose (M_2, W_2) such that M_2 > Σ_{i=1}^{n} a_{1,i} and compute a_{2,i} = W_2 a_{1,i} (mod M_2) for 1 ≤ i ≤ n, and so on. The public key is a_{t,1}, . . . , a_{t,n}. One can then apply a permutation to the public key if necessary. The original Merkle-Hellman cryptosystem is the case t = 1, which is sometimes given the anachronistic name single-iterated Merkle-Hellman.
Exercise 19.13.23. Give the decryption algorithm for the iterated Merkle-Hellman system.
Exercise 19.13.24. Show that in iterated Merkle-Hellman one expects M_{i+1} > (n/2)M_i for 1 ≤ i < t. Hence, show that the ciphertext in an iterated Merkle-Hellman system is at least t(log_2(n) − 1) + n + log_2(n) − 2 bits. Determine the expected density of the public key.
It follows that one cannot iterate the Merkle-Hellman construction too many times.
In the next section we will sometimes assume that b_1 b_2 < M. Exercise 19.13.25 shows that if this is not the case then ciphertexts are roughly double the length of the message, and hence are less desirable for practical applications.

Exercise 19.13.25. Let b_1, . . . , b_n be a superincreasing sequence and suppose M > Σ_{i=1}^{n} b_i is such that b_1 b_2 > M. Show that the average ciphertext size is at least 2n + log_2(n) − 6 bits.
19.13.2 Cryptanalysis of Knapsack Cryptosystems
We now give a number of attacks on the knapsack cryptosystem, some of which are
easy exercises. For a thorough discussion of the history of these attacks see Brickell and
Odlyzko [104] and Odlyzko [467].
We remark that there is not necessarily a unique private key for a given Merkle-Hellman knapsack public key, since {W^{-1} a_i (mod M) : 1 ≤ i ≤ n} might be a superincreasing sequence for more than one choice of (M, W).
Exercise 19.13.26. Show that, given a Merkle-Hellman knapsack public key (a1 , . . . , an ),
one can efficiently determine whether a guess for (M, W ) provides a useful private key.
We now show that the scheme is insecure if the first elements of the superincreasing
sequence are known.
Example 19.13.27. Let a1 , . . . , an be a Merkle-Hellman knapsack public key. Suppose
one knows the first two elements b1 and b2 of the superincreasing sequence. We show how
to recover the private key.
First, suppose no permutation is used. Then a_1 ≡ W b_1 (mod M) and a_2 ≡ W b_2 (mod M). It follows that a_1 b_2 ≡ a_2 b_1 (mod M) and so M is a factor of (a_1 b_2 − a_2 b_1). Since b_1 and b_2 are small, a_i ≈ 2^n (perhaps n = 256) and (a_1 b_2 − a_2 b_1) is not expected to have any special form, it is natural to assume that this factoring problem is fairly easy. Furthermore, since max{a_i : 1 ≤ i ≤ n} < M and we expect M < 2 max{a_i : 1 ≤ i ≤ n} (i.e., not all a_i < M/2), there are few possible values for M.
For each possible value of M one can compute W = a_1 b_1^{-1} (mod M) and then test whether the values W^{-1} a_i (mod M) look like a superincreasing sequence.
To deal with the permutation just repeat the attack for all pairs (a_i, a_j) with 1 ≤ i, j ≤ n distinct. If (i, j) does not correspond to the correct permutation of (1, 2) then probably either (a_i b_2 − a_j b_1) does not have a factor of the right size, or the corresponding W does not yield a superincreasing sequence.
Exercise 19.13.28. Perform the attack of Example 19.13.27 for the Merkle-Hellman
public key
8391588, 471287, 8625204, 906027, 8328886
given that b1 = 44899 and b2 = 1048697 (with no permutation used).
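A sketch of the attack follows, assuming (as in the exercise) that no permutation is used. Since M > max(a_i) is larger than the square root of D = a_1 b_2 − a_2 b_1 here, we enumerate the small cofactors d = D/M rather than M itself; the superincreasing test is the one described in Example 19.13.27.

```python
def recover_key(a, b1, b2):
    # Attack of Example 19.13.27: M divides D = a1*b2 - a2*b1 and we
    # expect max(a) < M < 2*max(a), so enumerate cofactors d = D // M.
    D = a[0] * b2 - a[1] * b1
    lo, hi = max(a), 2 * max(a)
    for d in range(max(D // hi, 1), D // lo + 1):
        if D % d:
            continue
        M = D // d
        if not (lo < M < hi):
            continue
        try:
            W = a[0] * pow(b1, -1, M) % M       # from a1 = W*b1 (mod M)
            Winv = pow(W, -1, M)
        except ValueError:                      # non-invertible: wrong M
            continue
        b = [Winv * ai % M for ai in a]
        if all(b[i] > sum(b[:i]) for i in range(1, len(b))):
            return M, W, b
    return None

res = recover_key([8391588, 471287, 8625204, 906027, 8328886], 44899, 1048697)
```

Any (M, W) returned is a useful private key in the sense of Exercise 19.13.26, whether or not it equals the one originally used.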
In practice, due to Example 19.13.27, one would take b_1 and b_2 to have around λ/2 bits each for security parameter λ.
We now show why M must be kept secret.
Example 19.13.29. Suppose M is known to the attacker, who wants to compute W and hence the superincreasing sequence b_1, . . . , b_n.
First, assume no permutation is used. Since

    a_i ≡ b_i W (mod M)

we have a_i U ≡ b_i (mod M) where U = W^{-1} (mod M). Writing a_i U = b_i + k_i M for integers k_i and dividing by a_i M gives

    U/M − k_i/a_i = b_i/(a_i M).        (19.7)

Since the b_i are superincreasing we have b_i < M/2^{n-i} and so 0 ≤ U/M − k_i/a_i < 1/(a_i 2^{n-i}). In particular, U/M − k_1/a_1 < 1/(a_1 2^{n-1}) is very small.
We now observe that to break the Merkle-Hellman knapsack it is sufficient to find any pair (u, m) of positive integers such that u a_i (mod m) is a superincreasing sequence (or at least is similar enough to such a sequence that one can solve the subset sum problem). We show in the next paragraph that if k_1/a_1 is close enough to U/M then taking (u, m) = (k_1, a_1) will suffice.
Subtracting the case i = 1 of equation (19.7) from the i-th gives

    k_1/a_1 − k_i/a_i = b_i/(a_i M) − b_1/(a_1 M) = (a_1 b_i − a_i b_1)/(a_1 a_i M)

and so, for 2 ≤ i ≤ n,

    |a_i k_1 − a_1 k_i| = |a_1 b_i − a_i b_1|/M < 2M b_i/M = 2b_i < M/2^{n-i-1}.        (19.8)
Hence, consider the following basis matrix (where 0 < ε < 1 is a parameter analogous to 1/Q in equation (19.5) and where 1 < l ≤ n)

    B = [ ε   a_2   a_3   · · ·   a_l ]
        [ 0  −a_1    0    · · ·    0  ]
        [ 0    0   −a_1   · · ·    0  ]        (19.9)
        [ ·    ·     ·      ·      ·  ]
        [ 0    0     0    · · ·  −a_1 ]

so that (k_1, k_2, . . . , k_l)B = (εk_1, a_2 k_1 − a_1 k_2, . . . , a_l k_1 − a_1 k_l) is short by equation (19.8).
Reducing this basis with ε = 1/32 and l = 4 one recovers k_1 = 409 (and the first few values a_i k_1 − a_1 k_i). One can even get the result using ε = 1/8 and l = 3, where the first entry of the first row of the LLL-reduced basis is 409/8, again giving k_1 = 409.
Note that s′ = Σ_{i=1}^{n} a_i − s is the subset sum instance of the complement solution (1 − x_1, . . . , 1 − x_n). Since one can repeat any attack on s and s′ in turn we may always assume that at most half the entries of the solution are non-zero.
The basic method is to consider the lattice L with basis

    [ I_n  a ]
    [  0   s ]        (19.10)

where I_n is an n × n identity matrix and a is the list of weights represented as a column vector. Then

    v = (x_1, x_2, . . . , x_n, 0)

is a vector in the lattice. Since ||v|| ≤ √n this vector is very short and so one could hope to find it using lattice basis reduction.
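The construction of the basis (19.10), and the fact that the solution yields the short lattice vector v = (x, 0), can be sketched directly. The weights and message used here are taken from Example 19.13.14, an assumption consistent with the numbers in the example below.

```python
def subset_sum_basis(a, s):
    # Rows of the (n+1) x (n+1) basis: [ I_n | a ] stacked over [ 0 | s ],
    # as in equation (19.10).
    n = len(a)
    rows = [[1 if j == i else 0 for j in range(n)] + [a[i]] for i in range(n)]
    rows.append([0] * n + [s])
    return rows

a = [430, 138, 495, 49, 463, 196, 165]
x = [1, 0, 0, 1, 1, 0, 0]
s = sum(ai * xi for ai, xi in zip(a, x))    # the ciphertext / target sum
rows = subset_sum_basis(a, s)
# The integer combination (sum_i x_i * row_i) - row_{n+1} is the short vector (x, 0):
v = [sum(x[i] * rows[i][j] for i in range(len(a))) - rows[-1][j] for j in range(len(a) + 1)]
```

The last coordinate of v vanishes precisely because Σ x_i a_i = s, which is why lattice reduction on (19.10) can hope to expose x.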
Example 19.13.34. Consider the subset sum instance from Example 19.13.14. Reducing the basis

    [ 1 0 0 0 0 0 0 430 ]
    [ 0 1 0 0 0 0 0 138 ]
    [ 0 0 1 0 0 0 0 495 ]
    [ 0 0 0 1 0 0 0  49 ]
    [ 0 0 0 0 1 0 0 463 ]
    [ 0 0 0 0 0 1 0 196 ]
    [ 0 0 0 0 0 0 1 165 ]
    [ 0 0 0 0 0 0 0 942 ]
using the LLL algorithm gives a reduced basis whose first row (the smallest vector) is (1, 0, 0, 1, 1, 0, 0, 0). One sees that the message (1, 0, 0, 1, 1, 0, 0) appears, extended by a zero, as the first row of the reduced basis.
Exercise 19.13.35. Consider the knapsack public key
2381, 1094, 2188, 2442, 2280, 1129, 1803, 2259, 1665
and ciphertext 7598. Determine the message using the direct lattice method.
Lagarias and Odlyzko analysed the method for random subset sum instances of a given size. They showed (Theorem 3.3 of [359], also see Section 2 of [153]) that for randomly chosen weights a_i of size 2^{λn} with λ > 1.54725 (i.e., random subset sum instances of density at most 0.6463), with overwhelming probability (as n tends to infinity) the desired solution vector x is the shortest non-zero vector in the lattice. If one can solve the shortest vector problem then one therefore can break the cryptosystem.
There are therefore two problems to overcome. First, the statement only holds for
randomly generated weights of a given size and so does not say anything concrete about
specific instances. Second, there is no known efficient algorithm to solve SVP exactly.
This latter point is a serious problem: as seen in Example 19.13.34 there are many
very small vectors in the lattice that do not have entries only in {0, 1} (these are often
called parasitic solutions to the subset sum instance). Hence, for large n, it is quite
possible that LLL outputs a short basis that does not include the desired solution vector.
Nevertheless, the LLL algorithm does work well in practice and can be used to solve
subset sum instances when the density is not too high.
Theorem 3.5 of Lagarias and Odlyzko [359] shows (for randomly chosen weights a_i of size 2^{(1/2+ε)n^2} with ε > 0) that with overwhelming probability (as n tends to infinity) the desired solution vector x is computed using the LLL algorithm. This is done by showing that the parasitic solutions all have significantly larger size. The problem with this result is that it only applies when the density satisfies d ≤ (1/2 + ε)^{-1}(1/n), which is extremely low.
Coster, Joux, LaMacchia, Odlyzko, Schnorr and Stern [153] improved the method by
replacing the last row of the lattice in equation (19.10) by (1/2, 1/2, . . . , 1/2, s). Under the
same simplifying assumptions as used by Lagarias and Odlyzko they showed the attack
could be applied for instances with density d < 0.9408. Again, although their method
officially requires an efficient algorithm for SVP, solving the approximate SVP using LLL
works well in practice as long as n is not too large. An alternative formulation of this
method is given in Section 3.2 of Nguyen and Stern [460].
The direct lattice attacks require lattices of dimension n so can be defeated by choosing n sufficiently large. Hence, the high-density subset sum problem remains hard in
general. The problem with knapsack cryptosystems is that one needs to iterate the basic
Merkle-Hellman construction sufficiently many times to avoid the attacks presented earlier. Iterating the Merkle-Hellman method lowers the lattice density and this can make
Chapter 20
20.1 The Discrete Logarithm Assumption
The discrete logarithm problem (DLP) was defined in Definition 13.0.2. Our main interest is the DLP in an algebraic group or algebraic group quotient over a finite field F_q (for example, elliptic curves, the multiplicative group of a finite field, tori etc). We always use multiplicative notation for groups in this chapter. As discussed in Section 13.2, in practice we usually restrict to groups of prime order r.
Recall that the difficulty of the DLP is defined with respect to an instance generator that runs on input a security parameter λ. An algorithm to solve the DLP with respect to a given instance generator is only required to succeed with a noticeable probability. The discrete logarithm assumption is that there exist instance generators that, on input λ, output instances of the DLP such that no algorithm A running in polynomial-time in λ can solve the DLP apart from with negligible (in λ) probability. The cryptosystems in this chapter rely on the discrete logarithm assumption (and other assumptions).
20.2 Key Exchange

20.2.1 Diffie-Hellman Key Exchange
The starting point of discrete logarithms (indeed, of public key cryptography) is the
seminal paper of Diffie and Hellman [178] from 1976 (more recently it became known that
this idea was also found by Williamson at GCHQ in 1974).
Suppose Alice and Bob want to agree on a random key K. Assume they both know an algebraic group or algebraic group quotient G and some element g ∈ G of prime order r (everyone in the world could use the same g). They perform the following protocol:
• Alice chooses a random integer 0 < a < r and sends c_1 = g^a to Bob.
• Bob chooses a random integer 0 < b < r and sends c_2 = g^b to Alice.
• On receiving c_2 Alice computes K = c_2^a.
• On receiving c_1 Bob computes K = c_1^b.
Hence, both players share the key K = g^{ab}. One can derive (see Definition 20.2.10 below) a bitstring from the group element K for use as the key of a symmetric encryption scheme. Hence, encryption of data or other functionalities can be implemented using traditional symmetric cryptography. The key K is called the session key and the values c_1, c_2 in the protocol are called messages or ephemeral keys.
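The protocol above can be sketched in a few lines. The parameters here (the order-11 subgroup of F_23^* generated by g = 2) are toy values far too small for security; they only illustrate the flow of the messages.

```python
import secrets

# Toy Diffie-Hellman in the order-11 subgroup of F_23^* generated by g = 2.
# These parameters are illustrative only and offer no security.
p, r, g = 23, 11, 2

a = 1 + secrets.randbelow(r - 1)          # Alice: random 0 < a < r
b = 1 + secrets.randbelow(r - 1)          # Bob:   random 0 < b < r
c1 = pow(g, a, p)                         # Alice -> Bob
c2 = pow(g, b, p)                         # Bob -> Alice
K_alice = pow(c2, a, p)                   # = g^{ab}
K_bob = pow(c1, b, p)                     # = g^{ab}
```

Both sides compute the same session key K = g^{ab}, while the eavesdropper sees only (g, c_1, c_2), i.e. a CDH instance.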
We discuss the security of key exchange protocols (in particular, person-in-the-middle
attacks and authenticated key exchange) in Section 20.5. For the remainder of this section
we consider the simplest possible attacker. A passive attacker or eavesdropper (i.e.,
an attacker who learns g, c1 and c2 , but does not actively interfere with the protocol)
cannot determine K unless they can solve the following computational problem.
Definition 20.2.1. The Computational Diffie-Hellman problem (CDH) is: given the triple (g, g^a, g^b) of elements of G, to compute g^{ab}.
An extensive discussion of the computational Diffie-Hellman problem will be given in
Chapter 21.
Exercise 20.2.2. What is the solution to the CDH instance (2, 4, 7) in the group F_11^*?
Suppose one is an eavesdropper on a Diffie-Hellman session and tries to guess the
session key K shared by Alice and Bob. The following computational problem is precisely
the problem of determining whether the guess for K is correct. This problem arises again
later in the chapter in the context of Elgamal encryption.
Definition 20.2.3. Let G be a group and g ∈ G. The Decisional Diffie-Hellman problem (DDH) is: given a quadruple (g, g^a, g^b, g^c) of elements in ⟨g⟩, to determine whether or not g^c = g^{ab}.
Saying that a computational problem such as DDH is hard is slightly less straightforward than with problems like DLP or CDH, since if (g, g^a, g^b, g^c) are chosen uniformly at random in G^4 then the solution to the DDH problem is "no" with overwhelming probability. Clearly, an algorithm that says "no" all the time is not solving the DDH problem, so our notion of success must capture this. The correct approach is to define a DDH solver to be an algorithm that can distinguish two distributions on G^4, namely the distribution of Diffie-Hellman tuples (i.e., the uniform distribution on tuples of the form (g, g^a, g^b, g^{ab}) ∈ G^4) and the uniform distribution on G^4.
(Footnote: This assumption comes in two flavours, depending on whether g is fixed or variable. We discuss this issue in more detail later. But, as is the convention in this book, whenever we write "Given . . . compute . . ." one should understand that all of the inputs are considered as variables.)
Definition 20.2.4. Let (G_n, r_n) be a family of cyclic groups G_n of order r_n, for n ∈ N. A DDH algorithm for the family G_n is an algorithm A that takes as input a quadruple in G_n^4 and outputs "yes" or "no". The advantage of the DDH algorithm A is

    Adv(A) = | Pr(A(g, g^a, g^b, g^{ab}) = "yes" : g ∈ G_n, a, b ∈ Z/r_nZ)
             − Pr(A(g, g^a, g^b, g^c) = "yes" : g ∈ G_n, a, b, c ∈ Z/r_nZ) |.

A DDH algorithm is called successful if the advantage is noticeable. The DDH assumption for the family of groups is that all polynomial-time (i.e., running time O(log(r_n)^c) for some constant c) DDH algorithms have negligible advantage.
Lemma 20.2.5. DDH ≤_R CDH ≤_R DLP.
Exercise 20.2.6. Prove Lemma 20.2.5.
Exercise 20.2.7. Definition 20.2.3 states that r is prime. Show that if (g, g^a, g^b, g^c) is a quadruple of elements such that the order of g is n for some integer n, where n has some small factors (e.g., factors l | n such that l ≤ log_2(n)), then one can eliminate some quadruples (g, g^a, g^b, g^c) ∈ G^4 that are not valid DDH tuples by reducing to DDH instances in subgroups of prime order. Show that this is enough to obtain a successful DDH algorithm according to Definition 20.2.4.
20.2.2
20.2.3 Key Derivation Functions
The result of Diffie-Hellman key exchange is a group element g^{ab}. Typically this should be transformed into an l-bit string for use as a symmetric key (where l < log_2(r)).
Definition 20.2.10. Let G be an algebraic group (or algebraic group quotient) and let l be an integer. A key derivation function is a function kdf : G → {0, 1}^l. The output distribution of a key derivation function is the probability distribution on {0, 1}^l induced by kdf(g) over uniformly distributed g ∈ G. A key derivation function is preimage-resistant if there is no polynomial-time algorithm known that, on input x ∈ {0, 1}^l, computes g ∈ G such that kdf(g) = x.
In general, a key derivation function should have output distribution statistically very close to the uniform distribution on {0, 1}^l. For many applications it is also necessary that kdf be preimage-resistant.
A typical instantiation for kdf is to take a binary representation of K ∈ G, apply a cryptographic hash function (see Chapter 3) to obtain a bit string, and concatenate/truncate as required. See the IEEE P1363 or ANSI X9.42 standards, Section 8
of Cramer and Shoup [160] or Section 6.1 of Raymond and Stiglic [492] for more details;
also see Section 3 of [46] for a specific key derivation function for elliptic curves.
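The hash-then-truncate approach can be sketched as follows. This is a minimal illustration, not any of the cited standards: the byte encoding of the group element and the choice of SHA-256 are assumptions, and l is restricted to a multiple of 8 of at most 256 bits since a single hash call is used.

```python
import hashlib

def kdf(element_bytes: bytes, l: int) -> bytes:
    # Hash a canonical byte encoding of the group element, then truncate
    # the digest to l bits (l/8 bytes).
    assert l % 8 == 0 and l <= 256
    return hashlib.sha256(element_bytes).digest()[: l // 8]

# E.g. derive a 128-bit key from a toy group element encoded as two bytes.
key = kdf((186).to_bytes(2, "big"), 128)
```

Standards such as IEEE P1363 additionally mix in a counter and optional shared data so that keys longer than one digest can be produced.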
20.3 Textbook Elgamal Encryption
In this section we present textbook Elgamal public key encryption. This is historically the first public key encryption scheme based on the discrete logarithm problem. As
we will see, the scheme has a number of security weaknesses and so is not recommended
for practical use. In Chapter 23 we will present secure methods for public key encryption
based on computational problems in cyclic groups.
We actually present two textbook versions of Elgamal. The first we call classic
textbook Elgamal as it is essentially the version that appears in [191]. It requires G to
be a group (i.e., we cannot use algebraic group quotients) and requires the message m
to be encoded as an element of G. Encoding messages as group elements is not difficult,
but it is un-natural and inconvenient. The second version, which we call semi-textbook
Elgamal, is more practical as it treats messages as bitstrings. As we will see, the security
properties of the two versions are slightly different.
For both schemes, λ denotes a security parameter (so that all attacks should require at least 2^λ bit operations). Figure 20.1 gives classic textbook Elgamal and Figure 20.2 gives semi-textbook Elgamal. We call the sender Bob and the recipient Alice. Messages in the former scheme are group elements and in the latter are l-bit strings, where l depends on the security parameter. Semi-textbook Elgamal also requires a cryptographic hash function H : G → {0, 1}^l where G is the group.
Remarks
1. Both versions of textbook Elgamal encryption are best understood as a static Diffie-Hellman key exchange followed by symmetric encryption. By this we mean that the sender (Bob) is essentially doing a Diffie-Hellman key exchange with the recipient (Alice): he sends g^k and Alice's component is her fixed (i.e., static) public key g^a. Hence the shared key is g^{ak}, which can then be used as a key for any symmetric encryption scheme (this general approach is known as hybrid encryption). The two variants of textbook Elgamal vary in the choice of symmetric encryption scheme: the first uses the map m ↦ mg^{ak} from G to itself while the second uses the map m ↦ m ⊕ H(g^{ak}) from {0, 1}^l to itself.
(Footnote: Some authors write "ElGamal" and others write "El Gamal". Reference [191] uses "ElGamal", but we follow the format apparently used nowadays by Elgamal himself.)
KeyGen($\kappa$): Run a parameter generation algorithm on security parameter $\kappa$ that outputs an algebraic group G over a finite field $\mathbb{F}_q$ such that $\#G(\mathbb{F}_q)$ has a prime divisor r
and all known algorithms for the discrete logarithm problem in a subgroup of $G(\mathbb{F}_q)$ of
order r require at least $2^\kappa$ bit operations.
Compute $g \in G$ of prime order r.
Choose a random integer $0 < a < r$ and set $h = g^a$. The public key is (G, g, h) and the
private key is a.
The message space is $M = G$.
The ciphertext space is $C = G \times G$.
Encrypt(m): (where $m \in G$).
Obtain the public key h of the recipient, Alice.
Choose a random $0 < k < r$ and set $c_1 = g^k$.
Set $c_2 = m h^k$.
Transmit the ciphertext $(c_1, c_2)$.
Decrypt($c_1, c_2$): Check that $c_1, c_2 \in G$. If so, compute and output $m = c_2 c_1^{-a}$.
Figure 20.1: Classic Textbook Elgamal Encryption.
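Figure 20.1 can be sketched directly in code. The following Python sketch uses a toy subgroup of $\mathbb{F}_p^*$ (p = 227, r = 113, and g = 4 of order r); these parameters are illustrative only and far too small to be secure.

```python
import secrets

# Toy parameters (illustrative only, far too small to be secure):
# p = 227 is prime, r = 113 is a prime divisor of p - 1, and
# g = 4 generates the subgroup of order r in (Z/pZ)*.
p, r, g = 227, 113, 4

def keygen():
    a = 1 + secrets.randbelow(r - 1)          # private key 0 < a < r
    return a, pow(g, a, p)                    # private a, public h = g^a

def encrypt(h, m):
    k = 1 + secrets.randbelow(r - 1)          # ephemeral 0 < k < r
    return pow(g, k, p), (m * pow(h, k, p)) % p   # (c1, c2) = (g^k, m*h^k)

def decrypt(a, c1, c2):
    # m = c2 * c1^(-a): invert c1^a modulo p
    return (c2 * pow(pow(c1, a, p), -1, p)) % p
```

Any message in the subgroup round-trips: `decrypt(a, *encrypt(h, m)) == m`.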
2. Elgamal encryption requires two exponentiations in G and decryption requires one.
Hence encryption and decryption are polynomial-time and efficient.
3. Elgamal encryption is randomised, so encrypting the same message with the same
public key twice will, in general, yield two different ciphertexts.
4. Unlike RSA, all users in a system can share the same group G. So typically G and
g are fixed for all users, and only the value $h = g^a$ changes. Values that are shared
by all users are usually called system parameters.
20.4 Security of Textbook Elgamal
We now briefly review the security properties for the textbook Elgamal cryptosystem.
First, note that the encryption algorithm should use a good pseudorandom number generator to compute the values for k. A simple attack when this is not the case is given in
Exercise 20.4.1.
Exercise 20.4.1. Suppose the random values k used by the sender are generated using the
linear congruential generator $k_{i+1} = A k_i + B \pmod r$ for some $1 \le A, B < r$. Suppose
an adversary knows A and B and sees two classic textbook Elgamal ciphertexts $(c_1, c_2)$
and $(c_1', c_2')$, for the same public key, generated using consecutive outputs $k_i$ and $k_{i+1}$ of
the generator. If both ciphertexts are encryptions of the same message then show how
the adversary can compute the message. If the ciphertexts are encryptions of different
messages then show how to decrypt both ciphertexts using one query to a decryption
oracle.
KeyGen($\kappa$): Generate an algebraic group or algebraic group quotient G as in Figure 20.1. Choose a random $g \in G$ of prime order r.
Choose a message size $l$ and a cryptographic hash function $H : G \to \{0,1\}^l$.
Choose a random integer $0 < a < r$ and set $h = g^a$. The public key is (G, H, g, h) and
the private key is a.
The message space is $M = \{0,1\}^l$.
The ciphertext space is $C = G \times \{0,1\}^l$.
Encrypt(m): (where $m \in \{0,1\}^l$).
Obtain the public key of the recipient, Alice.
Choose a random $0 < k < r$ and set $c_1 = g^k$.
Set $c_2 = m \oplus H(h^k)$.
Transmit the ciphertext $(c_1, c_2)$.
Decrypt($c_1, c_2$): Check that $c_1 \in G$ and $c_2 \in \{0,1\}^l$. If so, compute and output
$m = c_2 \oplus H(c_1^a)$.
Figure 20.2: Semi-Textbook Elgamal Encryption.
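A matching sketch of Figure 20.2 in the same toy subgroup. Instantiating H by truncated SHA-256 is an assumption made purely for illustration; the scheme only requires some cryptographic hash $H : G \to \{0,1\}^l$.

```python
import hashlib
import secrets

p, r, g = 227, 113, 4    # toy subgroup parameters (not secure)
l = 16                   # message length in bits

def H(x):
    # Stand-in for H : G -> {0,1}^l: the first l bits of SHA-256 of the element
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[: l // 8], "big")

def encrypt(h, m):                        # m is an l-bit integer
    k = 1 + secrets.randbelow(r - 1)
    return pow(g, k, p), m ^ H(pow(h, k, p))    # c2 = m XOR H(h^k)

def decrypt(a, c1, c2):
    return c2 ^ H(pow(c1, a, p))                # m = c2 XOR H(c1^a)
```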
20.4.1 OWE Security Under Passive Attacks
Theorem 20.4.2. The computational problem of breaking OWE security of classic textbook Elgamal under passive attack is equivalent to CDH in $\langle g \rangle$.
Proof: We prove the result only for perfect oracles. To prove OWE-CPA $\le_R$ CDH, let
A be a perfect oracle that solves CDH in the subgroup of order r in G. Call $A(g, h_A, c_1)$
to get u and set $m = c_2 u^{-1}$.
To prove CDH $\le_R$ OWE-CPA let A be a perfect adversary that takes an Elgamal
public key $(g, h_A)$ and an Elgamal ciphertext $(c_1, c_2)$ and returns the corresponding message m. We will use this to solve CDH. Let the CDH instance be $(g, g_1, g_2)$. Then choose
a random element $c_2 \in \langle g \rangle$ and call $A(g, g_1, g_2, c_2)$ to get m. Return $c_2 m^{-1}$ as the solution
to the CDH instance. □
One can also consider a non-perfect adversary (for example, maybe an adversary
can only decrypt some proportion of the possible ciphertexts). It might be possible to
develop methods to self-correct the adversary using random self-reductions, but this is
considered to be the adversary's job. Instead, it is traditional to simply give a formula
for the success probability of the algorithm that breaks the computational assumption in
terms of the success probability of the adversary. In the context of Theorem 20.4.2, if
the adversary can decrypt with noticeable probability $\epsilon$ then we obtain a CDH algorithm
that is correct with probability $\epsilon$.
Exercise 20.4.3. Prove OWE-CPA $\le_R$ CDH for semi-textbook Elgamal. Explain why
the proof CDH $\le_R$ OWE-CPA cannot be applied in this case.
20.4.2 OWE Security Under CCA Attacks
We now show that neither variant of textbook Elgamal has OWE security against
an adaptive (CCA) attacker (and hence neither has IND-CCA security). Recall that such
an attacker has access to a decryption oracle that will decrypt every ciphertext except
the challenge.
Lemma 20.4.4. Let $(c_1, c_2)$ be a ciphertext for classic textbook Elgamal with respect to
the public key (G, g, h). Suppose A is a decryption oracle. Then under a CCA attack one
can compute the message corresponding to $(c_1, c_2)$.
Proof: Assume that A is perfect. Call A on the ciphertext $(c_1, c_2 g) \ne (c_1, c_2)$ to obtain
a message $m'$. Then the message corresponding to the original ciphertext is $m = m' g^{-1}$. □
More generally, if A succeeds only with noticeable probability $\epsilon$ then we have a CCA2
attack that succeeds with noticeable probability $\epsilon$.
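The malleability used in the proof is easy to demonstrate. Below is a sketch with toy parameters (p = 227, r = 113, g = 4; illustrative only), in which the decryption oracle is simulated using the private key.

```python
p, r, g = 227, 113, 4    # toy parameters (not secure)
a = 7                    # the victim's private key
h = pow(g, a, p)

def decryption_oracle(c1, c2):
    # Simulates the CCA decryption oracle: m = c2 * c1^(-a)
    return (c2 * pow(pow(c1, a, p), -1, p)) % p

# Challenge ciphertext, whose message the attacker may not query directly
m = 50
k = 11
c1, c2 = pow(g, k, p), (m * pow(h, k, p)) % p

# CCA attack: query the oracle on the *different* ciphertext (c1, c2*g),
# which decrypts to m*g, then strip off the known factor g.
m_prime = decryption_oracle(c1, (c2 * g) % p)
recovered = (m_prime * pow(g, -1, p)) % p
assert recovered == m
```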
Another version of this attack follows from Exercises 23.3.3 and 23.3.2.
Exercise 20.4.5. Show that semi-textbook Elgamal encryption does not have the OWE
security property under a CCA attack.
We have seen how a CCA attack can lead to an adversary learning the contents of
a message. Exercise 20.4.6 gives an example of a general class of attacks called small
subgroup attacks or invalid parameter attacks that can allow a CCA (even a CCA1)
adversary to obtain the private key of a user. Such attacks can be performed in many
scenarios. One example is when working in a prime order subgroup of $\mathbb{F}_p^*$ where $p - 1$ has
many small factors. Another example is when using elliptic curves $E : y^2 = x^3 + a_4 x + a_6$;
since the addition formula does not feature the value $a_6$ one can pass an honest user a
point of small order on some curve $E'(\mathbb{F}_p)$. A related example is that, when using x-coordinate
only arithmetic on elliptic curves, one can choose an x-coordinate corresponding to a point
that lies on the quadratic twist. Further discussion is given in Section 4.3 of [273] and a
summary of the history of these results is given in Section 4.7 of [273]. We stress that such
attacks do not only arise in encryption, but also in authenticated key exchange protocols,
undeniable signatures, etc. The general way to avoid such attacks is for all parties to test
membership of group elements in every step of the protocol (see Section 11.6).
Exercise 20.4.6. Show how a CCA1 attacker on classic textbook Elgamal can compute
$u^a$ for a group element u of their choice, where a is the private key of a user. Show that if
this attack can be repeated for sufficiently many elements u of coprime small orders then
the private key a can be computed.
20.4.3 Semantic Security Under Passive Attacks
A serious problem with the classic textbook Elgamal cryptosystem is that, even though
encryption is randomised, it does not necessarily provide semantic security under passive
attacks.
Example 20.4.7. Consider the case $G = \mathbb{F}_p^*$, $M = G$. Let $g \in G$ have prime order r.
Since g has odd order it is a square modulo p, so the Legendre symbol of g is $(\frac{g}{p}) = 1$.
Hence, the Legendre symbol of the message m satisfies
$$\left(\frac{m}{p}\right) = \left(\frac{c_2}{p}\right)$$
and so can be computed in polynomial-time from the public key and the ciphertext.
To prevent the attack in Example 20.4.7 one can restrict the message space to elements
of $\mathbb{F}_p^*$ with Legendre symbol 1. However, this attack is just a special case of a more general
phenomenon. The Legendre symbol is a homomorphism $\mathbb{F}_p^* \to G_1$ where $G_1 = \{1, -1\} \subseteq \mathbb{F}_p^*$
is the subgroup of order 2. The attack can be performed for any homomorphism onto
a subgroup of order coprime to r (this is a slightly different application of the ideas of
Section 13.2).
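A small numerical check of Example 20.4.7 with toy parameters (p = 227, r = 113, g = 4; since g has odd order it is a square modulo p, so every $h^k$ has Legendre symbol 1):

```python
p, r, g = 227, 113, 4    # toy parameters; g has odd order r, hence is a square mod p

def legendre(x):
    s = pow(x, (p - 1) // 2, p)   # Euler's criterion
    return -1 if s == p - 1 else s

a, k = 7, 11
h = pow(g, a, p)
for m in (5, 6, 10, 226):                  # arbitrary messages in F_p^*
    c2 = (m * pow(h, k, p)) % p
    assert legendre(c2) == legendre(m)     # the ciphertext leaks (m/p)
```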
Example 20.4.8. (Boneh, Joux and Nguyen [82]) Let p be a 3072-bit prime and let
$r \mid (p-1)$ be a 256-bit prime. Let $g \in \mathbb{F}_p^*$ have order r. Suppose, in violation of the
description of classic textbook Elgamal in Section 20.3, one chooses the message space to
be
$$M = \{1, 2, \ldots, 2^{32} - 1\}$$
interpreted as a subset of $\mathbb{F}_p^*$. We identify M with $\{0,1\}^{32} \setminus \{0\}$. Let $(c_1 = g^k, c_2 = m h^k)$
be a challenge ciphertext for classic textbook Elgamal encryption, where $m \in M$. Then,
since $h^k$ has order dividing r,
$$c_2^r = m^r.$$
One expects that, with overwhelming probability, the $2^{32}$ values $m^r$ are distinct, and
hence one can obtain m with at most $2^{32}$ exponentiations in $\mathbb{F}_p^*$.
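The attack of Example 20.4.8 scaled down to toy parameters (here a plain brute-force search, not the time/memory trade-off of the exercise below):

```python
p, r, g = 227, 113, 4        # toy parameters (not secure)
a, k = 7, 11
h = pow(g, a, p)

M = range(1, 16)             # a small message space inside F_p^*
m = 13                       # the unknown message, m in M
c2 = (m * pow(h, k, p)) % p  # second component of the challenge ciphertext

# h^k lies in the subgroup of order r, so c2^r = m^r * (h^k)^r = m^r (mod p).
target = pow(c2, r, p)
candidates = [x for x in M if pow(x, r, p) == target]
assert m in candidates
```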
Exercise 20.4.9. (Boneh, Joux and Nguyen [82]) Let p and $r \mid (p-1)$ be prime and let
$g \in \mathbb{F}_p^*$ have order r. Suppose one uses classic textbook Elgamal with restricted message
space $M = \{0,1\}^m \setminus \{0\}$ as in Example 20.4.8 where $\#M = 2^m - 1 < p/r$. Extend
the attack of Example 20.4.8 using the baby-step-giant-step method, so that it requires
$O(2^{m/2+\epsilon})$ exponentiations in G to find m with noticeable probability, for $\epsilon > 0$.
One way to avoid these attacks is to restrict the message space to $\langle g \rangle$. It is then
intuitively clear that IND security under passive attacks depends on the decisional Diffie-Hellman problem.
Theorem 20.4.10. Classic textbook Elgamal with $M = \langle g \rangle$ has IND-CPA security if and
only if the DDH problem is hard.
Proof: (For perfect oracles.) First we show IND-CPA $\le_R$ DDH: Let A be an oracle to
solve DDH. Let $(c_1, c_2)$ be a ciphertext that is an encryption of either $m_0$ or $m_1$. Call
$A(g, c_1, h_A, c_2 m_0^{-1})$ and if the answer is "yes" then the message is $m_0$ and if the answer is
"no" then the message is $m_1$.
For the converse (i.e., DDH $\le_R$ IND-CPA of Elgamal): Let A be an oracle that
breaks indistinguishability of Elgamal. Then A takes as input a public key (g, h), a pair
of messages $m_0, m_1$ and a ciphertext $(c_1, c_2)$ and outputs either 0 or 1. (We assume that A
outputs either 0 or 1 even if the ciphertext corresponds to neither message.) Given a DDH
instance $(g, g_1, g_2, g_3)$ we repeatedly do the following: choose two random messages $m_0$
and $m_1$ in $\langle g \rangle$, choose a random $i \in \{0, 1\}$, and call A on the input $(g, g_1, m_0, m_1, g_2, m_i g_3)$.
If A outputs i every time then we return "yes" as the answer to the DDH instance. If A only outputs
the correct answer i about half of the time, then we return "no". To be sure the decryption
oracle is not just being lucky one should repeat the experiment $\Omega(\log(r))$ times. □
If the hash function is sufficiently good then one does not have to make as strong an
assumption as DDH to show that semi-textbook Elgamal encryption has IND security.
Instead, the IND security intuitively only depends on CDH. Theorem 20.4.11 is a basic
example of a security proof in the random oracle model (see Section 3.7 for background
on this model). We give the proof as it illustrates one of the ways the random oracle
model is used in theoretical cryptography.
Theorem 20.4.11. In the random oracle model, semi-textbook Elgamal encryption has
IND-CPA security if CDH is hard.
Proof: (Sketch) Let A be an adversary for the IND-CPA game on semi-textbook Elgamal
encryption. Let $(g, g^a, g^b)$ be a CDH instance. We will describe a simulator S that will
solve the CDH problem using A as a subroutine.
First S runs the adversary A with public key $(g, g^a)$.
The simulator must handle the queries made by A to the random oracle. To do this it
stores a list of hash values, initially empty. Let $g_i$ be the input for the i-th hash query.
If $g_i = g_j$ for some $1 \le j < i$ then we respond with the same value as used earlier. If not
then the simulator chooses uniformly at random an element $H_i \in \{0,1\}^l$, stores $(g_i, H_i)$
in the list, and answers the query $H(g_i)$ with $H_i$. This is a perfect simulation of a random
oracle, at least until the challenge ciphertext is issued below.
At some point A outputs a pair of messages $m_0$ and $m_1$. The simulator sets $c_1 = g^b$,
chooses $c_2$ uniformly at random in $\{0,1\}^l$ and responds with the challenge ciphertext
$(c_1, c_2)$. The adversary A may make further hash function queries (which are answered
using the algorithm above) and eventually A outputs $b' \in \{0, 1\}$ (of course A may crash,
or run for longer than its specified running time, in which case S treats this as the output
0).
The logic of the proof is as follows: If A never queries the random oracle H on $g^{ab}$
then A has no information on $H(g^{ab})$ and so cannot determine whether the answer should
be 0 or 1. Hence, for A to succeed, one of the queries to H must have been on $g^{ab}$.
Once this query is made then the simulator is seen to be fake, as the adversary can check
that $c_2$ is not equal to $m_{b'} \oplus H(g^{ab})$ for $b' \in \{0, 1\}$. However, the simulator is not concerned
with this issue since it knows that $g^{ab}$ occurs somewhere in the list of hash queries.
The simulator therefore chooses a random index i and responds with $g_i$ as its solution
to the CDH instance. □
Exercise 20.4.12. Fill the gaps in the proof of Theorem 20.4.11 and determine the exact
probability of success in terms of the success of the adversary and the number of queries
to the random oracle.
The power of the random oracle model is clear: we have been able to "look inside"
the adversary's computation.
Exercise 20.4.13. Prove the converse to Theorem 20.4.11.
Indeed, the same technique leads to a much stronger result.
Theorem 20.4.14. In the Random Oracle Model, semi-textbook Elgamal encryption has
OWE-CPA security if CDH is hard.
Exercise 20.4.15. Prove Theorem 20.4.14.
20.5 Security of Diffie-Hellman Key Exchange
A discussion of security models for key exchange is beyond the scope of this book. We
refer to Bellare and Rogaway [39], Bellare, Pointcheval and Rogaway [37], Bellare, Canetti
and Krawczyk [32], Canetti and Krawczyk [116], Shoup [550], Boyd and Mathuria [94] and
Menezes, van Oorschot and Vanstone [415] for details. However, as a rough approximation
we can consider three types of adversary:
Passive adversary (also called benign in [39]). This attacker obtains all messages
sent during executions of the key exchange protocol but does not modify or delete
any messages. This attacker is also called an eavesdropper.
Weak active adversary. This attacker obtains all messages sent during executions
of the key exchange protocol and can modify or delete messages. This attacker can
also initiate protocol executions with any player.
Active adversary. This is as above, but the attacker is allowed to corrupt any honest
player who has completed an execution of the protocol and thus obtain the agreed
key.
There are two possible goals of an adversary:
To obtain the shared session key.
To distinguish the session key from a random key. To make this notion more precise
consider a game between an adversary and a challenger. The challenger performs
one or more executions of the key exchange protocol and obtains a key K. The
challenger also chooses uniformly at random a key K' from the space of possible
session keys. The challenger gives the adversary either K or K' (each with probability
1/2). The adversary has to decide whether the received key is K or not. This is
called real or random security.
The Diffie-Hellman key exchange protocol is vulnerable to a person-in-the-middle attack. Unlike similar attacks on public key encryption, the attacker in this case does not
need to replace any users' public keys.
Imagine that an adversary Eve can intercept all communication between Alice and
Bob. When Alice sends $c_1 = g^a$ to Bob, Eve stores $c_1$ and sends $g^e$ to Bob, for some
random integer e known to Eve. Similarly, when Bob sends $c_2 = g^b$ to Alice, Eve stores
$c_2$ and sends $g^e$ to Alice. Alice computes the key $g^{ae}$ and Bob computes the key $g^{be}$. Eve
can compute both keys. If Alice later sends an encrypted message to Bob using the key
$g^{ae}$ then Eve can decrypt it, read it, re-encrypt using the key $g^{be}$, and forward it to Bob.
Hence Alice and Bob might never learn that their security has been compromised.
One way to overcome person-in-the-middle attacks is for Alice to send a digital signature on her value $g^a$ (and similarly for Bob). As long as Alice and Bob each hold
authentic copies of the other's public keys then this attack fails. Note that this solution
does not prevent all attacks on the Diffie-Hellman key exchange protocol.
Another solution is given by authenticated key exchange protocols such as STS, KEA,
MTI, MQV, etc (see Chapter 11 of Stinson [588] and the references listed earlier).
We illustrate the basic idea behind most protocols of this type using the MTI/A0
protocol: Alice and Bob have public keys $h_A = g^a$ and $h_B = g^b$. We assume that Alice
and Bob have authentic copies of each other's public keys. They perform Diffie-Hellman
key exchange in the usual way (Alice sends $g^x$ and Bob sends $g^y$). Then the value agreed
by both players is
$$g^{ay+bx}.$$
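A quick sanity check of the MTI/A0 key computation in a toy subgroup (p = 227, r = 113, g = 4; all values illustrative only):

```python
p, r, g = 227, 113, 4        # toy parameters (not secure)

a, b = 7, 19                 # long-term private keys of Alice and Bob
hA, hB = pow(g, a, p), pow(g, b, p)
x, y = 23, 31                # ephemeral exponents: Alice sends tA, Bob sends tB
tA, tB = pow(g, x, p), pow(g, y, p)

# Alice combines Bob's ephemeral value with her long-term key and her
# ephemeral exponent with Bob's public key; Bob mirrors the computation.
key_alice = (pow(tB, a, p) * pow(hB, x, p)) % p   # (g^y)^a * (g^b)^x = g^(ay+bx)
key_bob   = (pow(tA, b, p) * pow(hA, y, p)) % p   # (g^x)^b * (g^a)^y = g^(bx+ay)

assert key_alice == key_bob == pow(g, (a * y + b * x) % r, p)
```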
Exercise 20.5.1. Explain why the person-in-the-middle attack fails for this protocol
(assuming the public key authentication process is robust).
Exercise 20.5.2. Consider a key exchange protocol where Alice and Bob have public
keys $h_A = g^a$ and $h_B = g^b$, where Alice sends $g^x$ and Bob sends $g^y$, and where the shared
key is $g^{ab+xy}$. Show that if corrupt queries are allowed then this key exchange protocol
does not provide authentication.
Exercise 20.5.3. Give a person-in-the-middle attack on the Burmester-Desmedt protocol.
20.6 Efficiency Considerations for Discrete Logarithm Cryptography
All cryptographic protocols whose security is related to the DLP involve computations
of the form $g^a$ at some stage, and this is usually the most demanding computation in
terms of time and computing resources. To make the cryptosystem fast it is natural to
try to speed up exponentiation. One could try working in a smaller group; however, it is
important to ensure that the security of the system is maintained. Indeed, many of the
main topics in this book (e.g., tori, elliptic curves and hyperelliptic curves) are attempts
to get the most efficient group for a given security level.
A number of methods to speed up exponentiation in certain groups have already been
presented. Section 11.1 discussed signed expansions, which are suitable for groups (such
as elliptic and hyperelliptic curves or tori) where inversion is very efficient. Section 11.3
presented Frobenius expansions and the GLV method, which are suitable for elliptic
curves. Those methods all assume that the exponent a takes any value.
One can also consider methods that do not correspond to values a chosen uniformly at
random. Such methods can be much faster than the general methods already mentioned,
but understanding the security implications can be more complicated. We do not have
space to describe any of these methods in detail, but we briefly mention some of them.
1. Choose a to have low Hamming weight. This is mentioned by Agnew, Mullin,
Onyszchuk and Vanstone [5] and Schnorr [519].
2. Choose a to be a random Frobenius expansion of low Hamming weight. This is
credited to H. W. Lenstra Jr. in Section 6 of Koblitz [344].
3. Choose a to be given by a random addition chain (or addition-subtraction chain).
This is proposed in Section 3.3 of Schroeppel, Orman, O'Malley and Spatscheck [528].
4. Choose a to be a product of integers of low Hamming weight. This was proposed
and analysed by Hoffstein and Silverman [289].
5. Choose a to be a random element in GLV representation, possibly with smaller
than typical coefficients.
6. Generate random elements using large amounts of precomputation. A solution that
can be used in any group is given by Boyko, Peinado and Venkatesan [96]. The
method requires precomputing and storing random powers $g_j = g^{a_j}$. One generates
a random pair $(a, g^a)$ by taking the product of a random subset of the $g_j$ and setting
$a = \sum a_j \pmod r$. This method is presented as the "simple solution" in [151].
A more sophisticated method for Koblitz curves is given by Coron, M'Raïhi and
Tymen [151]. They use repeated application of sparse Frobenius expansions on
elements of the precomputed table. They also give a security analysis.
Chapter 21
The Diffie-Hellman Problem
21.1 Variants of the Diffie-Hellman Problem
We present some computational problems related to CDH, and prove reductions among
them. The main result is to prove that CDH and Fixed-CDH are equivalent. Most of the
results in this section apply to both algebraic groups (AG) and algebraic group quotients
(AGQ) of prime order r (some exceptions are Lemma 21.1.9, Lemma 21.1.16 and, later,
Lemma 21.3.1). For the algebraic group quotients G considered in this book one can
obtain all the results by lifting from the quotient to the covering group and applying
the results there.
A subtle distinction is whether the base element g G is considered fixed or variable
in a CDH instance. To a cryptographer it is most natural to assume the generator is
fixed, since that corresponds to the usage of cryptosystems in the real world (the group
G and element g G are fixed for all users). Hence, an adversary against a cryptosystem
leads to an oracle for a fixed generator problem. To a computational number theorist it
is most natural to assume the generator is variable, since algorithms in computational
number theory usually apply to all problem instances. Hence both problems are studied
in the literature and when an author writes CDH it is sometimes not explicit which of
the variants is meant. Definition 20.2.1 was for the case when g varies. Definition 21.1.1
below is the case when g is fixed. This issue is discussed in Section 5 of Shoup [550] and
$$g_1^a = (g^a)^a = g^{a^2}$$
as required. □
Proof: Let $(g, g^a, g^b)$ be a CDH instance. Let O be a perfect oracle for Square-DH. Call
$O(g, g^a)$ to get $g_1 = g^{a^2}$, $O(g, g^b)$ to get $g_2 = g^{b^2}$ and $O(g, g^a g^b)$ to get $g_3 = g^{a^2 + 2ab + b^2}$.
Now compute
$$(g_3/(g_1 g_2))^{2^{-1} \pmod r},$$
which is $g^{ab}$ as required. □
Exercise 21.1.10. Let G be a group of prime order r. Show that Inverse-DH and Square-DH are random self-reducible. Hence give a self-corrector for Square-DH. Finally, show
that Lemma 21.1.9 holds for non-perfect oracles. (Note that it seems to be hard to give
a self-corrector for Inverse-DH directly, though one can do this via Lemma 21.1.8.)
Note that the proofs of Lemmas 21.1.5 and 21.1.8 require oracle queries where the
first group element in the input is not g. Hence, these proofs do not apply to variants of
these problems where g is fixed. We now define the analogous problems for fixed g and
give reductions between them.
Definition 21.1.11. Let g have prime order r and let $G = \langle g \rangle$. The computational
problem Fixed-Inverse-DH is: given $g^a \ne 1$, to compute $g^{a^{-1} \pmod r}$. Similarly, the
computational problem Fixed-Square-DH is: given $g^a$, to compute $g^{a^2}$.
Exercise 21.1.12. Show that Fixed-Inverse-DH and Fixed-Square-DH are random self-reducible.
Lemma 21.1.13. Let $g \in G$. Let A be a perfect Fixed-CDH oracle. Let $h = g^a$ and let
$n \in \mathbb{N}$. Then one can compute $g^{a^n \pmod r}$ using $\le 2 \log_2(n)$ queries to A.
$$w = g^{(1+a)^{-1}} g^{(1-a)^{-1}} = g^{2(1-a^2)^{-1}}.$$
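The computation in Lemma 21.1.13 is just square-and-multiply carried out in the exponent, with each exponent multiplication replaced by one oracle call. Below is a sketch in which the Fixed-CDH oracle is simulated by brute-force discrete logarithms in a toy group (possible only because the group is tiny).

```python
p, r, g = 227, 113, 4      # toy parameters (not secure)

def dlog(h):
    # Brute-force discrete logarithm; used only to simulate the oracle.
    x = 1
    for i in range(r):
        if x == h:
            return i
        x = (x * g) % p
    raise ValueError("not in <g>")

queries = 0
def fixed_cdh(h1, h2):
    # Perfect Fixed-CDH oracle: (g^x, g^y) -> g^(xy)
    global queries
    queries += 1
    return pow(g, (dlog(h1) * dlog(h2)) % r, p)

def power_an(h, n):
    # Given h = g^a, compute g^(a^n mod r) by left-to-right square-and-multiply
    # on the exponent; each squaring or multiplication is one oracle query.
    acc = h                          # invariant: acc = g^(a^e), starting at e = 1
    for bit in bin(n)[3:]:           # bits of n below the leading one
        acc = fixed_cdh(acc, acc)    # e -> 2e
        if bit == "1":
            acc = fixed_cdh(acc, h)  # e -> e + 1
    return acc

a, n = 7, 13
assert power_an(pow(g, a, p), n) == pow(g, pow(a, n, r), p)
assert queries <= 2 * (n.bit_length() - 1)   # at most 2*log2(n) queries
```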
is correct with probability $1 - 1/\log(r)^{c'}$ for some constant $c'$; by Theorem 21.3.8 this
requires $O(\log(r)^c \log\log(r))$ oracle queries.
Exercise 21.1.19. It was assumed throughout this section that G has prime order r.
Suppose instead that G has order $r_1 r_2$ where $r_1$ and $r_2$ are odd primes and that g is
a generator for G. Which of the results in this section no longer necessarily hold? Is
Fixed-CDH in $\langle g \rangle$ equivalent to Fixed-CDH in $\langle g^{r_1} \rangle$?
We end with a variant of the DDH problem.
Exercise 21.1.20. Let g have prime order r and let $\{x_1, \ldots, x_n\} \subseteq \mathbb{Z}/r\mathbb{Z}$. For a subset
$A \subseteq \{1, \ldots, n\}$ define
$$g^A = g^{\prod_{i \in A} x_i}.$$
The group decision Diffie-Hellman problem (GDDH) is: given g, $g^A$ for all proper
subsets $A \subsetneq \{1, \ldots, n\}$, and h, to distinguish $h = g^c$ (where $c \in \mathbb{Z}/r\mathbb{Z}$ is chosen uniformly
at random) from $h = g^{x_1 x_2 \cdots x_n}$. Show that GDDH $\le_R$ DDH.
21.2 Lower Bound on the Complexity of CDH for Generic Algorithms
We have seen (Theorem 13.4.5) that a generic algorithm requires $\Omega(\sqrt{r})$ group operations
to solve the DLP in a group of order r. Shoup proved an analogue of this result for CDH.
As before, fix $t \in \mathbb{R}_{>0}$ and assume that all group elements are represented by bitstrings
of length at most $t \log(r)$.
Theorem 21.2.1. Let G be a cyclic group of prime order r. Let A be a generic algorithm for CDH in G that makes at most m oracle queries. Then the probability that
$A(\sigma(g), \sigma(g^a), \sigma(g^b)) = \sigma(g^{ab})$ over $a, b \in \mathbb{Z}/r\mathbb{Z}$ and an encoding function $\sigma : G \to S \subseteq \{0,1\}^{\lceil t \log(r) \rceil}$ chosen uniformly at random is $O(m^2/r)$.
Proof: The proof is almost identical to the proof of Theorem 13.4.5. Let $S = \{0,1\}^{\lceil t \log(r) \rceil}$.
The simulator begins by uniformly choosing three distinct $\sigma_1, \sigma_2, \sigma_3$ in S and running
$A(\sigma_1, \sigma_2, \sigma_3)$. The encoding function is then specified at the two points $\sigma_1 = \sigma(g)$ and
$\sigma_2 = \sigma(h)$. From the point of view of A, g and h are independent distinct elements of G.
It is necessary to ensure that the encodings are consistent with the group operations.
This cannot be done perfectly without knowledge of a and b, but using polynomials as
previously ensures there are no trivial inconsistencies. The simulator maintains a list
of pairs $(\sigma_i, F_i)$ where $\sigma_i \in S$ and $F_i \in \mathbb{F}_r[x, y]$ (indeed, the $F_i(x, y)$ will always be linear).
The initial values are $(\sigma_1, 1)$, $(\sigma_2, x)$ and $(\sigma_3, y)$. Whenever A makes an oracle query on
$(\sigma_i, \sigma_j)$ the simulator computes $F = F_i + F_j$. If F appears as $F_k$ in the list of pairs then
the simulator replies with $\sigma_k$ and does not change the list. Otherwise, an element $\sigma \in S$,
distinct from the previously used values, is chosen uniformly at random, $(\sigma, F)$ is added
to the simulator's list, and $\sigma$ is returned to A.
After making at most m oracle queries, A outputs $\sigma_4 \in S$. The simulator now
chooses a and b uniformly at random in $\mathbb{Z}/r\mathbb{Z}$. Algorithm A wins if $\sigma_4 = \sigma(g^{ab})$. Note
that if $\sigma_4$ is not $\sigma_1$, $\sigma_2$ or one of the strings output by the oracle then the probability of
success is at most $1/(2^{\lceil t \log(r) \rceil} - m - 2)$. Hence we assume that $\sigma_4$ is on the simulator's
list.
Let the simulator's list contain precisely k polynomials $\{F_1(x, y), \ldots, F_k(x, y)\}$ for
some $k \le m + 3$. Let E be the event that $F_i(a, b) = F_j(a, b)$ for some pair $1 \le i < j \le k$
or $F_i(a, b) = ab$. The probability that A wins is
$$\Pr(A \text{ wins} \mid E) \Pr(E) + \Pr(A \text{ wins} \mid \neg E) \Pr(\neg E). \quad (21.1)$$
For each pair $1 \le i < j \le k$ the probability that $(F_i - F_j)(a, b) = 0$ is $\le 1/r$ by Lemma 13.4.4.
Similarly, the probability that $F_i(a, b) - ab = 0$ is $\le 2/r$. Hence, the probability of event E
is at most $k(k+1)/(2r) + 2k/r = O(m^2/r)$. On the other hand, if event E does not occur
then all A knows about (a, b) is that it lies in the set
$$X = \{(a, b) \in (\mathbb{Z}/r\mathbb{Z})^2 : F_i(a, b) \ne F_j(a, b) \text{ for all } 1 \le i < j \le k \text{ and } F_i(a, b) \ne ab \text{ for all } 1 \le i \le k\}.$$
Let $N = \#X$. Then $\Pr(\neg E) = N/r^2$ and $\Pr(A \text{ wins} \mid \neg E) = 1/N$.
Hence, the probability that A wins is $O(m^2/r)$. □
21.3 Random Self-Reducibility and Self-Correction of CDH
We defined random self-reducibility in Section 2.1.4. Lemma 2.1.19 showed that the
DLP in a group G of prime order r is random self-reducible. Lemma 2.1.20 showed how
to obtain an algorithm with arbitrarily high success probability for the DLP from an
algorithm with noticeable success probability.
Lemma 21.3.1. Let g have order r and let $G = \langle g \rangle$. Then CDH in G is random self-reducible.
Proof: Let $X = (G \setminus \{1\}) \times G^2$. Let $(g, h_1, h_2) = (g, g^a, g^b) \in X$ be the CDH instance.
Choose uniformly at random $1 \le u < r$ and $0 \le v, w < r$ and consider the triple
$(g^u, h_1^u g^{uv}, h_2^u g^{uw}) = (g^u, (g^u)^{a+v}, (g^u)^{b+w}) \in X$. Then every triple in X arises from
exactly one triple (u, v, w). Hence, the new triples are uniformly distributed in X. If
$Z = (g^u)^{(a+v)(b+w)}$ is the solution to the new CDH instance then the solution to the
original CDH instance is
$$Z^{u^{-1} \pmod r} h_1^{-w} h_2^{-v} g^{-vw}.$$ □
Exercise 21.3.2. Show that Fixed-CDH is random self-reducible in a group of prime
order r.
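The self-reduction of Lemma 21.3.1 can be checked numerically. In the sketch below the oracle's answer on the randomised instance is simulated using the known exponents (which a real attacker would not have); the group parameters are toy values.

```python
import secrets

p, r, g = 227, 113, 4                         # toy parameters (not secure)

a, b = 7, 19                                  # unknown to the attacker in reality
h1, h2 = pow(g, a, p), pow(g, b, p)           # CDH instance (g, g^a, g^b)

# Randomise the instance as in Lemma 21.3.1
u = 1 + secrets.randbelow(r - 1)
v, w = secrets.randbelow(r), secrets.randbelow(r)
g_new = pow(g, u, p)
x1 = (pow(h1, u, p) * pow(g, u * v, p)) % p   # (g^u)^(a+v)
x2 = (pow(h2, u, p) * pow(g, u * w, p)) % p   # (g^u)^(b+w)

# Simulate a correct oracle answer on the randomised instance
Z = pow(g_new, (a + v) * (b + w), p)

# Map the oracle's answer back to the original instance
u_inv = pow(u, -1, r)
ans = (pow(Z, u_inv, p) * pow(h1, -w, p) * pow(h2, -v, p) * pow(g, -v * w, p)) % p
assert ans == pow(g, a * b, p)                # recovers g^(ab)
```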
(The Static-DH problem seems to have been first studied by Brown and Gallant [111].)
It is easy to turn a DLP oracle that succeeds with noticeable probability into one that
succeeds with probability arbitrarily close to 1, since one can check whether a solution
to the DLP is correct. It is less easy to amplify the success probability for a non-perfect
CDH oracle.
A natural (but flawed) approach is just to run the CDH oracle on random self-reduced
instances of CDH until the same value appears twice. We now explain why this approach
will not work in general. Consider a Fixed-CDH oracle that, on input $(g^a, g^b)$, returns
$g^{ab+z}$ where $z \in \mathbb{Z}$ is uniformly chosen between $-\log(r)$ and $\log(r)$. Calling the
oracle on instances arising from the random self-reduction of Exercise 21.3.2 one gets a
sequence of values $g^{ab+z}$. Eventually the correct value $g^{ab}$ will occur twice, but it is quite
likely that some other value will occur twice before that time.
We present Shoup's self-corrector for CDH or Fixed-CDH from [549].2 Also see Cash,
Kiltz and Shoup [120].
Theorem 21.3.8. Fix $l \in \mathbb{N}$. Let g have prime order r. Let A be a CDH (resp.
Fixed-CDH) oracle with success probability at least $\epsilon > \log(r)^{-l}$. Let $(g, g^a, g^b)$ be a
CDH instance, and suppose $1 > \epsilon > 1/r$. Then one can obtain an oracle that solves the CDH
(resp. Fixed-CDH) instance with probability at least $1 - \epsilon - \log(2r)^2/(r\epsilon^2)$ and that makes at most
$2\lceil \log(2/\epsilon)/\epsilon \rceil$ queries to A (where log is the natural logarithm).
Proof: Define $c = \log(2/\epsilon) \in \mathbb{R}$ so that $e^{-c} = \epsilon/2$. First call the oracle $n = \lceil c/\epsilon \rceil$ times
on random-self-reduced instances (if the oracle is a CDH oracle then use Lemma 21.3.1 and
if the oracle is a Fixed-CDH oracle then use Exercise 21.3.2) of the input problem $(g, g^a, g^b)$
and store the resulting guesses $Z_1, \ldots, Z_n$ for $g^{ab}$ in a list $L_1$. Note that $n = O(\log(r)^{l+1})$.
The probability that $L_1$ contains at least one copy of $g^{ab}$ is at least $1 - (1 - \epsilon)^{c/\epsilon} \ge 1 - e^{-c} =
1 - \epsilon/2$.
Now choose uniformly at random integers $1 \le s_1, s_2 < r$ and define $X_2 = g^{s_1}/(g^a)^{s_2}$.
One can show that $X_2$ is uniformly distributed in $G = \langle g \rangle$ and is independent of $X_1 = g^a$.
Call the oracle another n times on random-self-reduced versions of the CDH instance
$(g, X_2, g^b)$ and store the results $Z_1', \ldots, Z_n'$ in a list $L_2$.
Hence, with probability at least $(1 - \epsilon/2)^2 \ge 1 - \epsilon$ there is some $Z_i \in L_1$ and some $Z_j' \in L_2$
such that $Z_i = g^{ab}$ and $Z_j' = g^{b(s_1 - a s_2)}$. For each $1 \le i, j \le n$ test whether
$$Z_i^{s_2} = (g^b)^{s_1}/Z_j'. \quad (21.2)$$
If there is a unique solution $(Z_i, Z_j')$ then output $Z_i$, otherwise output $\perp$. Finding $Z_i$ can
be done efficiently by sorting $L_1$ and then, for each $Z_j' \in L_2$, checking whether the value
of the right hand side of equation (21.2) lies in $L_1$.
We now analyse the probability that the algorithm fails. The probability that there is no
pair $(Z_i, Z_j')$ satisfying equation (21.2), or that there are such pairs but none of them has
$Z_i = g^{ab}$, is at most $\epsilon$. Hence, we now assume that a good pair $(Z_i, Z_j')$ exists and we
want to bound the probability that there is also a bad pair, i.e., a solution $(Z, Z')$ to equation (21.2)
with $Z \ne g^{ab}$. With $Y = g^b$, such a pair leads to a relation of the form
$$(Z/Y^a)^{s_1} = Y^a/Z'. \quad (21.4)$$
2 Maurer and Wolf [402] were the first to give a self-corrector for CDH, but Shoup's method is more
efficient.
If precisely one of $Z = Y^a$ or $Z' = Y^a$ holds then this equation does not hold. Hence,
$Z \ne Y^a$ and $Z' \ne Y^a$, in which case there is precisely one value for $s_1$ for which
equation (21.4) holds. Considering all $n^2$ pairs $(Z, Z') \in L_1 \times L_2$ it follows that there are at
most $n^2$ values for $s_1$ that would lead to an incorrect output for the self-corrector. Since
$s_1$ is chosen uniformly at random, the probability of an incorrect output is at most $n^2/r$.
Since $n \le \log(2r)/\epsilon$ one gets the result. Note that $\log(2r)^2/(r\epsilon^2) = O(\log(r)^{2+2l}/r)$. □
Exercise 21.3.9. Extend Lemma 21.1.13 to the case of a non-perfect Fixed-CDH oracle.
What is the number of oracle queries required?
21.4 The den Boer and Maurer Reductions
The goal of this section is to discuss reductions from DLP to CDH or Fixed-CDH in groups
of prime order r. Despite having proved that Fixed-CDH and CDH are equivalent, we
prefer to treat them separately in this section. The first such reduction (assuming a
perfect Fixed-CDH oracle) was given by den Boer [168] in 1988. Essentially, den Boer's
method involves solving a DLP in $\mathbb{F}_r^*$, and so it requires $r - 1$ to be sufficiently smooth.
Hence there is no hope of this approach giving an equivalence between Fixed-CDH and
DLP for all groups of prime order.
The idea was generalised by Maurer [399] in 1994, by replacing the multiplicative
group $\mathbb{F}_r^*$ with an elliptic curve group $E(\mathbb{F}_r)$. Maurer and Wolf [402, 403, 405] extended
the result to non-perfect oracles. If $\#E(\mathbb{F}_r)$ is sufficiently smooth then the reduction is
efficient. Unfortunately, there is no known algorithm to efficiently generate such smooth
elliptic curves. Hence Maurer's result also does not prove equivalence between Fixed-CDH
and DLP for all groups. A subexponential-time reduction that conjecturally applies to
all groups was given by Boneh and Lipton [83]. An exponential-time reduction (but
still faster than known algorithms to solve DLP) that applies to all groups was given by
Muzereau, Smart and Vercauteren [445], and Bentahar [42, 43].
21.4.1 Implicit Representations
Definition 21.4.1. Let G be a group and let g ∈ G have prime order r. For a ∈ Z/rZ
we call h = g^a an implicit representation of a.
In this section we call the usual representation of a ∈ Z/rZ the explicit representation of a.
Lemma 21.4.2. There is an efficient (i.e., computable in polynomial-time) mapping from
Z/rZ to the implicit representations of Z/rZ. One can test equality of elements in Z/rZ
given in implicit representation. If h1 is an implicit representation of a and h2 is an
implicit representation of b then h1·h2 is an implicit representation of a + b and h1^{−1} is an
implicit representation of −a.
In other words, we can compute in the additive group Z/rZ using implicit representations.
Lemma 21.4.3. If h is an implicit representation of a and b ∈ Z/rZ is known explicitly,
then h^b is an implicit representation of ab.
Let O be a perfect Fixed-CDH oracle with respect to g. Suppose h1 is an implicit
representation of a and h2 is an implicit representation of b. Then h = O(h1 , h2 ) is an
implicit representation of ab.
In other words, if one can solve Fixed-CDH then one can compute multiplication
modulo r using implicit representations.
Exercise 21.4.4. Prove Lemmas 21.4.2 and 21.4.3.
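As a concrete illustration of Lemmas 21.4.2 and 21.4.3, the following Python sketch works in the order-11 subgroup of (Z/23Z)^*. The toy group and the brute-force `fixed_cdh` simulation (standing in for a hypothetical Fixed-CDH solver) are illustrative assumptions, not part of the text:

```python
# Toy demonstration of Lemmas 21.4.2 and 21.4.3.
# Group: the order-11 subgroup of (Z/23Z)* generated by g = 2.
p, r, g = 23, 11, 2          # g has prime order r in (Z/pZ)*

def implicit(a):             # explicit a in Z/rZ  ->  implicit rep g^a
    return pow(g, a % r, p)

def imp_add(h1, h2):         # implicit reps of a, b  ->  implicit rep of a+b
    return (h1 * h2) % p

def imp_neg(h1):             # implicit rep of a  ->  implicit rep of -a
    return pow(h1, p - 2, p) # group inverse of h1

def imp_scale(h1, b):        # implicit rep of a, explicit b  ->  implicit rep of a*b
    return pow(h1, b, p)

def fixed_cdh(h1, h2):
    # Simulated Fixed-CDH oracle: returns g^(xy) given g^x, g^y.
    # We "cheat" with a brute-force discrete log; a real oracle would
    # be the hypothetical CDH solver.
    x = next(i for i in range(r) if pow(g, i, p) == h1)
    return pow(h2, x, p)

a, b = 3, 5
h1, h2 = implicit(a), implicit(b)
assert imp_add(h1, h2) == implicit(a + b)    # addition of unknowns
assert imp_neg(h1) == implicit(-a)           # negation
assert imp_scale(h1, b) == implicit(a * b)   # multiplication by a known value
assert fixed_cdh(h1, h2) == implicit(a * b)  # multiplication of two unknowns
```

The last assertion is exactly the content of Lemma 21.4.3's second part: a Fixed-CDH oracle turns the additive implicit-representation arithmetic into full ring arithmetic modulo r.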
Lemma 21.4.5. Let g have order r. Let h1 be an implicit representation of a such that
h1 ≠ 1 (in other words, a ≢ 0 (mod r)).
1. Given a perfect CDH oracle one can compute an implicit representation for a^{−1} (mod r)
using one oracle query.
2. Given a perfect Fixed-CDH oracle with respect to g one can compute an implicit
representation for a^{−1} (mod r) using 2 log2(r) oracle queries.
Proof: Given a perfect CDH oracle A one calls A(g^a, g, g) = g^{a^{−1} (mod r)}. Given a perfect
Fixed-CDH oracle one computes g^{a^{r−2} (mod r)} as was done in Lemma 21.1.15. □
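The Fixed-CDH case can be sketched as follows: raise a to the power r − 2 in implicit representation by square-and-multiply, using one oracle query per squaring or multiplication. The toy parameters and the brute-force oracle simulation are illustrative assumptions:

```python
# Sketch of part 2 of Lemma 21.4.5: from h1 = g^a compute an implicit
# representation of a^(-1) mod r, since a^(r-2) = a^(-1) mod r by Fermat.
p, r, g = 23, 11, 2          # toy group: g has prime order r in (Z/pZ)*

def fixed_cdh(h1, h2):       # simulated oracle: g^x, g^y -> g^(xy)
    x = next(i for i in range(r) if pow(g, i, p) == h1)
    return pow(h2, x, p)

queries = 0
def imp_pow(h, e):
    """Implicit rep of a, explicit e  ->  implicit rep of a^e mod r."""
    global queries
    acc = g                  # implicit rep of a^0 = 1
    for bit in bin(e)[2:]:
        acc = fixed_cdh(acc, acc); queries += 1       # square
        if bit == '1':
            acc = fixed_cdh(acc, h); queries += 1     # multiply
    return acc

a = 4
h1 = pow(g, a, p)
h_inv = imp_pow(h1, r - 2)
assert h_inv == pow(g, pow(a, -1, r), p)
assert queries <= 2 * r.bit_length()    # at most 2 log2(r) oracle queries
```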
21.4.2 The den Boer Reduction
We now present the den Boer reduction [168], which applies when r − 1 is smooth. The
crucial idea is that the Pohlig-Hellman and baby-step-giant-step methods only require the
ability to add, multiply and compare group elements. Hence, if a perfect CDH oracle is
given then these algorithms can be performed using implicit representations.
Theorem 21.4.6. Let g ∈ G have prime order r. Suppose l is the largest prime factor of
r − 1. Let A be a perfect oracle for the Fixed-CDH problem with respect to g. Then one
can solve the DLP in ⟨g⟩ using O(log(r) log(log(r))) oracle queries, O(log(r)(√l/log(l) +
log(r))) multiplications in F_r and O(√l log(r)^2/log(l)) operations in G (where the constant
implicit in the O() does not depend on l).
Proof: Let the challenge DLP instance be g, h = g^a. If h = 1 then return a = 0.
Hence, we now assume 1 ≤ a < r. We can compute a primitive root γ ∈ F_r^* in
O(log(r) log(log(r))) operations in F_r (see Section 2.15). The (unknown) logarithm of
h satisfies
    a ≡ γ^u (mod r)    (21.5)
for some integer u. To compute a it is sufficient to compute u.³ The idea is to solve the
DLP in equation (21.5) using the implicit representation of a. Since r − 1 is assumed to
be smooth we can use the Pohlig-Hellman (PH) method, followed by the baby-step-giant-step
(BSGS) method in each subgroup. We briefly sketch the details.
Write r − 1 = ∏_{i=1}^n l_i^{e_i} where the l_i are prime. The PH method involves projecting a
and γ into the subgroup of F_r^* of order l_i^{e_i}. In other words, we must compute
    h_i = g^{a^{(r−1)/l_i^{e_i}}}
for 1 ≤ i ≤ n. Using the Fixed-CDH oracle to perform computations in implicit representation,
Algorithm 4 computes all the h_i together in O(log(r) log log(r)) oracle queries.⁴ A
further O(log(r)) oracle queries are required to compute all g^{a^{(r−1)/l_i^f}} where 0 ≤ f < e_i.
Similarly one computes all x_i = γ^{(r−1)/l_i^{e_i}} in O(log(r) log log(r)) multiplications in F_r.
We then have
    h_i = g^{x_i^{u (mod l_i^{e_i})}}.

³ It may seem crazy to try to work out u without knowing a, but it works!
⁴ Remark 2.15.9 does not lead to a better bound, since the value n (which is m in the notation of that
remark) is not necessarily large.
Following Section 13.2 one reduces these problems to ∑_{i=1}^n e_i instances of the DLP in
groups of prime order l_i. This requires O(log(r)^2) group operations and field operations
overall (corresponding to the computations in line 6 of Algorithm 13).
For the baby-step-giant-step algorithm, suppose we wish to solve g^{γ^u} = g^a (where, for
simplicity, we redefine a and γ so that they now have order l modulo r). Set m = ⌈√l⌉
and write u = u0 + m·u1 where 0 ≤ u0, u1 < m. From
    g^a = g^{γ^u} = g^{γ^{u0 + m·u1}}    (21.6)
one has
    (g^a)^{(γ^{−m})^{u1}} = g^{γ^{u0}}.    (21.7)
i
We compute and store (in a sorted structure) the baby steps g for i = 0, 1, 2, . . . , m 1
i+1
i
(this involves computing one exponentiation in G at each step, as g
= (g ) , which
is at most 2 log2 (r) operations in G).
mj
We then compute the giant steps (g a )
. This involves computing w0 = m (mod r)
mj
and then the sequence wj =
(mod r) as wj+1 = wi w0 (mod r); this requires
O(log(m) + m) multiplications in Fr . We also must compute (g a )wj , each of which requires 2 log2 (r) operations in G.
When we find a match then we have solved the DLP in the subgroup of order l. The
BSGS algorithm for each prime l requires O(√l log(r)) group operations and O(√l +
log(r)) operations in F_r. There are O(log(r)) primes l for which the BSGS must be run,
but a careful analysis of the cost (using the result of Exercise 13.2.7) gives an overall
running time of O(log(r)^2 √l/log(l)) group operations and O(log(r)^2 + log(r)√l/log(l))
multiplications in F_r. Note that the CDH oracle is not required for the BSGS algorithm.
Once u is determined modulo all prime powers l^e | (r − 1) one uses the Chinese
remainder theorem to compute u ∈ Z/(r − 1)Z. Finally, one computes a = γ^u (mod r).
These final steps require O(log(r)) operations in F_r. □
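The shape of the reduction can be made concrete with a toy Python implementation. Everything specific below is an illustrative assumption: the group is the subgroup of (Z/27733Z)^* of prime order r = 2311, chosen so that r − 1 = 2310 = 2·3·5·7·11 is squarefree and smooth; the Fixed-CDH oracle is simulated by brute-force discrete logarithms; and exhaustive search replaces BSGS in each prime-order subgroup to keep the code short:

```python
# Toy den Boer reduction: solve DLP in <g> given only a Fixed-CDH oracle.
p, r = 27733, 2311           # both prime, p = 12*r + 1
g = pow(2, (p - 1) // r, p)  # generator of the order-r subgroup of (Z/pZ)*

def dlog_brute(X):           # only used to SIMULATE the oracle
    x, cur = 0, 1
    while cur != X:
        cur = cur * g % p; x += 1
    return x

def fixed_cdh(h1, h2):       # simulated oracle: g^x, g^y -> g^(xy)
    return pow(h2, dlog_brute(h1), p)

def imp_pow(h, e):           # g^a, e -> g^(a^e mod r), square-and-multiply
    acc = g
    for bit in bin(e)[2:]:
        acc = fixed_cdh(acc, acc)
        if bit == '1':
            acc = fixed_cdh(acc, h)
    return acc

primes = [2, 3, 5, 7, 11]    # r - 1 is squarefree here, so all e_i = 1
gamma = next(c for c in range(2, r)
             if all(pow(c, (r - 1) // q, r) != 1 for q in primes))

def den_boer(h):             # recover a from h = g^a (a != 0) via the oracle
    residues = []            # u mod l for each prime l | r - 1, where a = gamma^u
    for l in primes:
        h_l = imp_pow(h, (r - 1) // l)      # g^(a^((r-1)/l)) = g^(x_l^u)
        x_l = pow(gamma, (r - 1) // l, r)   # element of F_r^* of order l
        u_l = next(u for u in range(l)
                   if pow(g, pow(x_l, u, r), p) == h_l)
        residues.append(u_l)
    # Chinese remainder theorem (brute force): recover u mod (r-1)
    u = next(x for x in range(r - 1)
             if all(x % l == u_l for l, u_l in zip(primes, residues)))
    return pow(gamma, u, r)                 # a = gamma^u mod r

a = 1234
assert den_boer(pow(g, a, p)) == a
```

Note that the oracle is only needed for the projections `imp_pow`; the per-subgroup search and the final reconstruction use explicit arithmetic, matching the remark in the proof that the CDH oracle is not required for the BSGS stage.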
Corollary 21.4.7. Let A() be an algorithm that outputs triples (g, h, r) such that r is
a κ-bit prime, g has order r, r − 1 is O(log(r)^2)-smooth, and h ∈ ⟨g⟩. Then DLP ≤_R
Fixed-CDH for the problem instances output by A.
Proof: Suppose one has a perfect Fixed-CDH oracle. Putting l = O(log(r)^2) into Theorem
21.4.6 gives a reduction with O(log(r) log log(r)) oracle queries and O(log(r)^3) group
and field operations. □
The same results trivially hold if one has a perfect CDH oracle.
Exercise 21.4.8. Determine the complexity in Theorem 21.4.6 if one has a Fixed-CDH
oracle that only succeeds with probability ε.
Cherepnev [134] iterates the den Boer reduction to show that if one has an efficient
CDH algorithm for arbitrary groups then one can solve DLP in a given group in subexponential time. This result is of a very different flavour to the other reductions in this
chapter (which all use an oracle for a group G to solve a computational problem in the
same group G) so we do not discuss it further.
21.4.3 The Maurer Reduction
The den Boer reduction can be seen as solving the DLP in the algebraic group G_m(F_r),
performing all computations using implicit representation. Maurer's idea was to replace
G_m(F_r) by any algebraic group G(F_r), in particular the group of points on an elliptic
curve E(F_r). As with Lenstra's elliptic curve factoring method, even when r − 1 is not
smooth there might be an elliptic curve E such that #E(F_r) is smooth.
When one uses a general algebraic group G there are two significant issues that did
not arise in the den Boer reduction.
• The computation of the group operation in G may require inversions. This is true
for elliptic curve arithmetic using affine coordinates.
• Given h = g^a one must be able to compute an element P ∈ G(F_r), in implicit
representation, such that once P has been determined in explicit representation one
can compute a. For an elliptic curve E one could hope that P = (a, b) ∈ E(F_r) for
some b ∈ F_r.
Before giving the main result we address the second of these issues. In other words,
we show how to embed a DLP instance into an elliptic curve point.
Lemma 21.4.9. Let g have prime order r and let h = g^a. Let E : y^2 = x^3 + Ax + B be an
affine elliptic curve over F_r. Given a perfect Fixed-CDH oracle there is an algorithm that
outputs an implicit representation (g^X, g^Y) of a point (X, Y) ∈ E(F_r) and some extra
data, and makes an expected O(log(r)) oracle queries and performs an expected O(log(r))
group operations in ⟨g⟩. Furthermore, given the explicit value of X and the extra data
one can compute a.
Proof: The idea is to choose β uniformly at random with 0 ≤ β < r and set X = a + β.
An implicit representation of X can be computed as h1 = h·g^β using O(log(r)) group
operations. If we store β then, given X, we can compute a. Hence, the extra data is β.
Given the implicit representation for X one determines an implicit representation for
θ = X^3 + AX + B using two oracle queries. Given g^θ one can compute (here (θ/r) ∈ {−1, 1}
is the Legendre symbol)
    h2 = g^{(θ/r)} = g^{θ^{(r−1)/2}}.    (21.8)
Since there are at least (r − 2√r)/2 possible x-coordinates of points in E(F_r) it follows
that if one chooses X uniformly at random in F_r then the expected number of trials until
X is the x-coordinate of a point in E(F_r) is approximately two.
Once θ is a square modulo r one can compute an implicit representation for Y =
√θ (mod r) using the Tonelli-Shanks algorithm with implicit representations. We use
the notation of Algorithm 3. The computation of the non-residue n is expected to require
O(log(r)) operations in F_r and can be done explicitly. The computation of the terms w and
b requires O(log(r)) oracle queries, some of which can be avoided by storing intermediate
values from the computation in equation (21.8). The computation of i using a Pohlig-Hellman-style
algorithm is done as follows. First compute the sequence b, b^2, . . . , b^{2^{e−1}}
using O(log(r)) oracle queries and the sequence y, y^2, . . . , y^{2^{e−1}} using O(log(r)) group
operations. With a further O(log(r)) group operations one can determine the bits of i. □
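The first part of the proof (two oracle queries to form θ, then the implicit Legendre-symbol test of equation (21.8)) can be sketched as follows. The toy group, the curve coefficients and the brute-force oracle simulation are all illustrative assumptions:

```python
# Implicit test of whether X is the x-coordinate of a point on E over F_r.
p, r, g = 23, 11, 2          # toy group: g has prime order r = 11 in (Z/23Z)*
A, B = 4, 3                  # illustrative curve E : y^2 = x^3 + 4x + 3 over F_11

def fixed_cdh(h1, h2):       # simulated oracle: g^x, g^y -> g^(xy)
    x = next(i for i in range(r) if pow(g, i, p) == h1)
    return pow(h2, x, p)

def imp_pow(h, e):           # g^a, e -> g^(a^e mod r)
    acc = g
    for bit in bin(e)[2:]:
        acc = fixed_cdh(acc, acc)
        if bit == '1':
            acc = fixed_cdh(acc, h)
    return acc

def on_curve_test(h1):
    """Given h1 = g^X, return g^(theta^((r-1)/2)) for theta = X^3 + A*X + B.

    The result equals g exactly when theta is a nonzero square mod r."""
    h_sq = fixed_cdh(h1, h1)            # g^(X^2)   -- oracle query 1
    h_cube = fixed_cdh(h_sq, h1)        # g^(X^3)   -- oracle query 2
    h_theta = h_cube * pow(h1, A, p) * pow(g, B, p) % p  # g^(X^3 + A*X + B)
    return imp_pow(h_theta, (r - 1) // 2)                # equation (21.8)

# The implicit test agrees with the explicit Legendre computation for all X.
for X in range(r):
    theta = (X**3 + A * X + B) % r
    assert on_curve_test(pow(g, X, p)) == pow(g, pow(theta, (r - 1) // 2, r), p)
```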
Theorem 21.4.10. Let B ∈ N. Let g ∈ G have order r. Let E be an elliptic curve over
F_r such that E(F_r) is a cyclic group. Suppose that the order of E(F_r) is known and is
B-smooth. Given a perfect Fixed-CDH oracle with respect to g one can solve the DLP in
⟨g⟩ using an expected O(log(r)^2 log(log(r))) oracle queries.
Indeed, there are two variants of the reduction, one using exhaustive search and one
using the baby-step-giant-step algorithm. One can also consider the case of a perfect
CDH oracle. The following table gives the full expected complexities (where the constant
implicit in the O() is independent of B). We use the abbreviation l(x) = log(x), so that
l(l(r)) = log(log(r)).
Oracle     | Reduction | Oracle queries                     | Group operations  | F_r operations
Fixed-CDH  | PH only   | O(l(r)^2 l(l(r)))                  | O(B l(r)^2/l(B))  | O(B l(r)^2/l(B))
Fixed-CDH  | PH+BSGS   | O(√B l(r)^2/l(B) + l(r)^2 l(l(r))) | O(√B l(r)^2/l(B)) | O(√B l(r)^2/l(B))
CDH        | PH only   | O(l(r) l(l(r)))                    | O(B l(r)^2/l(B))  | O(B l(r)^2/l(B))
CDH        | PH+BSGS   | O(√B l(r)/l(B) + l(r) l(l(r)))     | O(√B l(r)^2/l(B)) | O(√B l(r)^2/l(B))
Proof: Let the discrete logarithm instance be (g, h = g^a). Write N = #E(F_r) = ∏_{i=1}^k l_i^{e_i}.
We assume that affine coordinates are used for arithmetic in E(F_r). Let P be a generator
of E(F_r).
The reduction is conceptually the same as the den Boer reduction. One difference is
that elliptic curve arithmetic requires inversions (which are performed using the method
of Lemma 21.1.13 and Lemma 21.1.15), hence the number of Fixed-CDH oracle queries
must increase. A sketch of the reduction in the case of exhaustive search is given in
Algorithm 27.
The first step is to use Lemma 21.4.9 to associate with h the implicit representation
of a point Q ∈ E(F_r). This requires an expected O(log(r)) oracle queries and O(log(r))
group operations for all four variants. Then Q ∈ ⟨P⟩ where P is the generator of the
cyclic group E(F_r).
The idea is again to use Pohlig-Hellman (PH) and baby-step-giant-step (BSGS) to
solve the discrete logarithm of Q with respect to P in E(Fr ). If we can compute an
integer u such that Q = [u]P (with computations done in implicit representation) then
computing [u]P and using Lemma 21.4.9 gives the value a explicitly.
First we consider the PH algorithm. As with the den Boer reduction, one needs to
compute explicit representations (i.e., standard affine coordinates) for [N/l_i^{e_i}]P and
implicit representations for [N/l_i^{e_i}]Q. It is possible that [N/l_i^{e_i}]Q = O_E so this case must be
handled. As in Section 2.15.1, computing these points requires O(log(r) log log(r)) elliptic
curve operations. Hence, for the multiples of P we need O(log(r) log log(r)) operations in
F_r while for the multiples of Q we need O(log(r)^2 log log(r)) Fixed-CDH oracle queries and
O(log(r) log log(r)) group operations. (If a CDH oracle is available then this stage only
requires O(log(r) log log(r)) oracle queries, as an inversion in implicit representation can
be done with a single CDH oracle query.) Computing the points [N/l_i^f]P for 1 ≤ f < e_i
and all i requires at most a further 2·∑_{i=1}^k e_i log2(l_i) = 2 log2(N) = O(log(r)) group
operations. Similarly, computing the implicit representations of the remaining [N/l_i^f]Q
requires O(log(r)^2) Fixed-CDH oracle queries and O(log(r)) group operations.
The computation of [u_i]P0 in line 8 of Algorithm 27 requires O(log(r)) operations in
F_r followed by O(1) operations in G and oracle queries.
The exhaustive search algorithm for the solution to the DLP in a subgroup of prime
order l_i is given in lines 9 to 16 of Algorithm 27. The point P0 in line 8 has already been
computed, and computing Q0 requires only one elliptic curve addition (i.e., O(log(r))
Fixed-CDH oracle queries). The while loop in line 12 runs for at most B iterations, and each
iteration involves a constant number of field operations to compute T + P0 followed by
O(log(r)) group operations for the comparisons in line 12. In the BSGS variant the giant
steps require O(√B) field operations, O(√B log(r)) group operations and O(√B log(r))
Fixed-CDH oracle queries.
Since ∑_{i=1}^k e_i ≤ log2(N) the exhaustive search or BSGS subroutine is performed
O(log(r)) times. A more careful analysis using Exercise 13.2.7 means the complexity
is multiplied by log(r)/log(B). The Chinese remainder theorem and later stages are
negligible. The result follows. □
Algorithm 27 Maurer reduction
Input: g, h = g^a, E(F_r)
Output: a
1:  Associate to h an implicit representation for a point Q = (X, Y) ∈ E(F_r) using Lemma 21.4.9
2:  Compute a point P ∈ E(F_r) that generates E(F_r). Let N = #E(F_r) = ∏_{i=1}^k l_i^{e_i}
3:  Compute explicit representations of {[N/l_i^j]P : 1 ≤ i ≤ k, 1 ≤ j ≤ e_i}
4:  Compute implicit representations of {[N/l_i^j]Q : 1 ≤ i ≤ k, 1 ≤ j ≤ e_i}
5:  for i = 1 to k do
6:      u_i = 0
7:      for j = 1 to e_i do    ▹ Reducing DLP of order l_i^{e_i} to cyclic groups
8:          Let P0 = [N/l_i^j]P and Q0 = [N/l_i^j]Q − [u_i]P0
9:          if Q0 ≠ O_E then
10:             Let (h_{0,x}, h_{0,y}) be the implicit representation of Q0
11:             P0 = [N/l_i]P, n = 1, T = P0 = (x_T, y_T)    ▹ Exhaustive search
12:             while h_{0,x} ≠ g^{x_T} or h_{0,y} ≠ g^{y_T} do
13:                 n = n + 1, T = T + P0
14:             end while
15:             u_i = u_i + n·l_i^{j−1}
16:         end if
17:     end for
18: end for
19: Use the Chinese remainder theorem to compute u from u ≡ u_i (mod l_i^{e_i}) for 1 ≤ i ≤ k
20: Compute (X, Y) = [u]P and hence compute a
21: return a
Remark 21.4.11. We have seen that reductions involving a Fixed-CDH oracle are less
efficient (i.e., require more oracle queries) than reductions using a CDH oracle. A solution⁶
to this is to work with projective coordinates for elliptic curves. Line 12 of Algorithm 27
tests whether the point Q0 given in implicit representation is equal to the point (x_T, y_T)
given in affine representation. When Q0 = (x0 : y0 : z0) then the test h_{0,x} = g^{x_T} in line
12 can be performed without inversions, for example by testing whether g^{x0} = (g^{z0})^{x_T}.
Hence the number of oracle queries in the first line of the table in Theorem 21.4.10 can
be reduced to O(log(r) log log(r)). As mentioned in Remark 13.3.2, one cannot use the
BSGS algorithm with projective coordinates, as the non-uniqueness of the representation
means one can't efficiently detect a match between two lists.

⁶ This idea is briefly mentioned in Section 3 of [399], but was explored in detail by Bentahar [42].
Exercise 21.4.12. Generalise the Maurer algorithm to the case where the group of
points on the elliptic curve is not necessarily cyclic. Determine the complexity if l1 is the
largest prime for which E(Fr )[l1 ] is not cyclic and l2 is the largest prime dividing #E(Fr )
for which E(Fr )[l2 ] is cyclic.
Exercise 21.4.13. If r + 1 is smooth then one can use the algebraic group G_{2,r} ≅ T_2(F_r)
(see Section 6.3) instead of G_m(F_r) or E(F_r). There are two approaches: the first is to
use the usual representation {a + b·θ ∈ F_{r^2} : N_{F_{r^2}/F_r}(a + b·θ) = 1} for G_{2,r} and the second
is to use the representation A^1(F_r) for T_2(F_r) − {1} corresponding to the map decomp_2
from Definition 6.3.7. Determine the number of (perfect) oracle queries in the reductions
from Fixed-CDH to DLP for these two representations. Which is better? Repeat the
exercise when one has a CDH oracle.
Corollary 21.4.14. Let c ∈ R_{>1}. Let (G_n, g_n, r_n) be a family of groups for n ∈ N
where g_n ∈ G_n has order r_n and r_n is an n-bit prime. Suppose we are given auxiliary
elliptic curves (E_n, N_n) for the family, where E_n is an elliptic curve over F_{r_n} such that
#E_n(F_{r_n}) = N_n and N_n is O(log(r_n)^c)-smooth. Then the DLP in ⟨g_n⟩ is equivalent to
the Fixed-CDH problem in ⟨g_n⟩.
Exercise 21.4.15. Prove Corollary 21.4.14.
We now state the conjecture of Maurer and Wolf that all Hasse intervals contain
a polynomially smooth integer. Define σ(r) to be the minimum, over all integers n in
the Hasse interval [r + 1 − 2√r, r + 1 + 2√r], of the largest prime factor of n. The
conjecture is that σ(r) = O(log(r)^c) for some constant c.    (21.9)
See Remark 15.3.5 for discussion of this. Muzereau, Smart and Vercauteren [445] note
that if r is a pseudo-Mersenne prime (as is often used in elliptic curve cryptography) then
the Hasse interval usually contains a power of 2. Similarly, as noted by Maurer and Wolf
in [402], one can first choose a random smooth integer n and then search for a prime r
close to n and work with a group G of order r.
Exercise 21.4.16. Show how to use the algorithm of Section 19.4.4 to construct a
smooth integer in the Hasse interval. Construct a 2^40-smooth integer (not equal to 2^255)
close to p = 2^255 − 19 using this method.
Remark 21.4.17. There are two possible interpretations of Corollary 21.4.14. The first
interpretation is: if there exists an efficient algorithm for CDH or Fixed-CDH in a group
G = hgi of prime order r and if there exists an auxiliary elliptic curve over Fr with
sufficiently smooth order then there exists an efficient algorithm to solve the DLP in
G. Maurer and Wolf [405] (also see Section 3.5 of [406]) claim this gives a non-uniform
reduction from DLP to CDH; however, the validity of this claim depends on the DLP
instance generator.
In other words, if one believes that there does not exist a non-uniform polynomial-time
algorithm for DLP in G (for certain instance generators) and if one believes the conjecture
that the Hasse interval around r contains a polynomially smooth integer, then one must
believe there is no polynomial-time algorithm for CDH or Fixed-CDH in G. Hence, one
can use the results to justify the assumption that CDH is hard. We stress that this is
purely a statement of existence of algorithms; it is independent of the issue of whether or
not it is feasible to write the algorithms down.
A second interpretation is that CDH might be easy and that this reduction yields
the best algorithm for solving the DLP. If this were the case (or if one wants a uniform
reduction) then, in order to solve a DLP instance, the issue of how to implement the DLP
algorithm becomes important. The problem is that there is no known polynomial-time
algorithm to construct auxiliary elliptic curves E(Fr ) of smooth order. An algorithm to
construct smooth curves (based on the CM method) is given in Section 4 of [402] but it
has exponential complexity. Hence, if one can write down an efficient algorithm for CDH
then the above ideas alone do not allow one to write down an efficient algorithm for DLP.
Boneh and Lipton [83] handle the issue of auxiliary elliptic curves by giving a
subexponential-time reduction between Fixed-CDH and DLP. They make the natural assumption
(essentially Conjecture 15.3.1; as used to show that the elliptic curve factoring method is
subexponential-time) that, for sufficiently large primes, the probability that a randomly
O(r^{1/3} log(r)) field operations. It is natural to conjecture⁸ that suitable auxiliary elliptic
curves exist for each prime r. One can construct auxiliary curves by choosing random
curves, counting points and factoring; one expects only polynomially many trials, but the
factoring computation is subexponential. We refer to [445, 42, 43] for further details.
Exercise 21.4.18. Write down the algorithm for the Muzereau-Smart-Vercauteren reduction using projective coordinates. Prove that the algorithm has the claimed complexity.
Exercise 21.4.19. Show how to generate in heuristic expected polynomial-time primes
r, p ≡ 2 (mod 3) such that r | (p + 1), r + 1 is B-smooth, and 2^{λ−1} ≤ r < p ≤ 2^{λ+3}. Hence,
by Exercise 9.10.4, taking E : y^2 = x^3 + 1 then E(F_p) is a group of order divisible by r
and E(F_r) has B-smooth order and is a suitable auxiliary elliptic curve for the Maurer
reduction.
Finally, we remark that the den Boer and Maurer reductions cannot be applied to
relate CDH and DLP in groups of unknown order. For example, let N be composite and
g ∈ (Z/NZ)^* of unknown order M. Given a perfect Fixed-CDH oracle with respect to
g one can still compute with the algebraic group G_m(Z/MZ) in implicit representation
(or with projective equations for E(Z/MZ)), but if M is not known then the order of G =
G_m(Z/MZ) (respectively, G = E(Z/MZ)) is also not known and so one cannot perform
the Pohlig-Hellman algorithm in G. Later we will mention how a CDH oracle in (Z/NZ)^*
can be used to factor N (see Exercise 24.2.23) and hence avoid this problem in that group.
21.5 Algorithms for Static Diffie-Hellman
Brown and Gallant [111] studied the relationship between Static-DH and DLP. Their main
result is an algorithm to solve an instance of the DLP using a perfect Static-DH oracle.
Cheon [130] independently discovered this algorithm in a different context, showing that
a variant of the DLP (namely, the problem of computing a given g, g^a and g^{a^d}; we call
this Cheon's variant of the DLP) can be significantly easier than the DLP. We now
present the algorithm of Brown-Gallant and Cheon, and discuss some of its applications.
Theorem 21.5.1. Let g have prime order r and let d | (r − 1). Given h1 = g^a and
h_d = g^{a^d} then one can compute a in O((√((r − 1)/d) + √d) log(r)) group operations,
O(√((r − 1)/d) + √d) group elements of storage and O(√((r − 1)/d) + √d) multiplications
in F_r.⁹
Proof: First, the case a ≡ 0 (mod r) is easy, so we assume a ≢ 0 (mod r). The idea
is essentially the same as the den Boer reduction. Let γ be a primitive root modulo r.
Then a ≡ γ^u (mod r) for some 0 ≤ u < r − 1 and it suffices to compute u. The den Boer
reduction works by projecting the unknown a into prime order subgroups of F_r^* using a
Diffie-Hellman oracle. In our setting, we already have an implicit representation of the
projection a^d into the subgroup of F_r^* of order (r − 1)/d.
The first step is to solve h_d = g^{a^d} = g^{γ^{du}} for some 0 ≤ u ≤ (r − 1)/d. Let m =
⌈√((r − 1)/d)⌉ and write u = u0 + m·u1 with 0 ≤ u0, u1 < m. This is exactly the setting
of equations (21.6) and (21.7) and hence one can compute (u0, u1) using a baby-step-giant-step
algorithm. This requires m multiplications in F_r and 2m exponentiations
in the group. Thus the total complexity is O(√((r − 1)/d) log(r)) group operations and
O(√((r − 1)/d)) field operations.

⁸ This conjecture seems to be possible to prove using current techniques, but I am not aware of any
reference for it.
⁹ As usual, we are being careless with the O()-notation. What we mean is that there is a constant c,
independent of r, d, g and a, such that the algorithm requires c(√((r − 1)/d) + √d) log(r) group operations.
We now have a^d = γ^{du} and so a = γ^{u+v(r−1)/d} for some 0 ≤ v < d. It remains to
compute v. Let
    h′ = h1^{γ^{−u}} = g^{a·γ^{−u}} = g^{γ^{v(r−1)/d}}.
Set m = ⌈√d⌉ and write v = v0 + m·v1 where 0 ≤ v0, v1 < m. Using the same ideas as
above (since γ is known explicitly the powers are computed efficiently) one can compute
(v0, v1), and hence v and a, using a baby-step-giant-step algorithm. □
As noted in [111] and [130] one can replace the baby-step-giant-step algorithms by
Pollard methods. Brown and Gallant¹⁰ suggest a variant of the Pollard rho method, but
with several non-standard features: one needs to find the precise location of the collision
(i.e., steps x_i ≠ x_j in the walk such that x_{i+1} = x_{j+1}) and there is only a (heuristic) 0.5
probability that a collision leads to a solution of the DLP. Cheon [130] suggests using the
Kangaroo method, which is a more natural choice for this application.
Exercise 21.5.5. Design a pseudorandom walk for the Pollard kangaroo method to solve
the DLP in implicit representation arising in the proof of Theorem 21.5.1.
Brown and Gallant use Theorem 21.5.1 to obtain the following result.
Theorem 21.5.6. Let g have prime order r and let d | (r − 1). Let h = g^a and suppose
A is a perfect oracle for the static Diffie-Hellman problem with respect to (g, h) (i.e.,
A(h1) = h1^a). Then one can compute a using d oracle queries, O((√((r − 1)/d) + √d) log(r))
group operations, O(√((r − 1)/d) + √d) group elements of storage and O(√((r − 1)/d) + √d)
multiplications in F_r.
¹⁰ Appendix B.2 of the first version of [111]. This does not appear in the June 2005 version.
respect to a) that does not check that the inputs are group elements of the correct order.
Hence, the Brown-Gallant result is primarily interesting in the case where the Static-DH
oracle does perform these checks.
Corollary 21.5.7. Let g have prime order r and suppose r − 1 has a factor d such that
d ≈ r^{1/3}. Given h = g^a and a perfect Static-DH oracle with respect to (g, h) then one can
compute a in O(r^{1/3}) oracle queries and O(r^{1/3} log(r)) group operations.
Exercise 21.5.8. Prove Corollary 21.5.7.
Brown and Gallant use Theorem 21.5.6 to give a lower bound on the difficulty of
Static-DH under the assumption that the DLP is hard.
Exercise 21.5.9. Let g have order r. Assume that the best algorithm to compute a,
given h = g^a, requires √r group operations. Suppose that r − 1 has a factor d = c1 log(r)^2
for some constant c1. Prove that the best algorithm to solve Static-DH with respect to
(g, h) requires at least c2 √r/log(r)^2 group operations for some constant c2.
All the above results are predicated on the existence of a suitable factor d of r − 1. Of
course, r − 1 may not have a factor of the correct size; for example if r − 1 = 2l where l
is prime then we have shown that given (g, g^a, g^{a^2}) one can compute a in O(√(r/2) log(r))
group operations, which is no better than general methods for the DLP. To increase the
applicability of these ideas, Cheon also gives a method for when there is a suitable factor
d of r + 1. The method in this case is not as efficient as the r − 1 case, and requires more
auxiliary data.
Theorem 21.5.10. Let g have prime order r and let d | (r + 1). Given h_i = g^{a^i} for
1 ≤ i ≤ 2d then one can compute a in O((√((r + 1)/d) + d) log(r)) group operations,
O(√((r + 1)/d) + √d) group elements of storage and O((√((r + 1)/d) + d) log(r))
multiplications in F_r.
Proof: As in Exercise 21.4.13 the idea is to work in the algebraic group G_{2,r}, which
has order r + 1. Write F_{r^2} = F_r(θ) where θ^2 = −t with t ∈ F_r. By Lemma 6.3.10 each element
β ∈ G_{2,r} − {1} ⊆ F_{r^2}^* is of the form β0 + β1·θ where
    β0 = (a^2 − t)/(a^2 + t),    β1 = 2a/(a^2 + t)
for some a ∈ F_r. For each d ∈ N there exist polynomials f_{d,0}(x), f_{d,1}(x) ∈ F_r[x] of degree
≤ 2d such that, for β as above, one has
    β^d = (f_{d,0}(a) + f_{d,1}(a)·θ)/(a^2 + t)^d.
The idea is to encode the DLP instance g^a into the element β ∈ G_{2,r} as
    β = (a^2 − t)/(a^2 + t) + (2a/(a^2 + t))·θ.
We do not know β, but we can compute (a^2 − t), (a^2 + t) and 2a in implicit representation.
Let γ be a generator for G_{2,r}, known explicitly. Then β = γ^u for some 0 ≤ u < r + 1.
It suffices to compute u.
The first step is to project β into the subgroup of order (r + 1)/d. We have β^d = γ^{du} for
some 0 ≤ u < (r + 1)/d. Let m = ⌈√((r + 1)/d)⌉ so that u = u0 + m·u1 for some 0 ≤ u0, u1 < m.
Write γ^i = γ_{i,0} + γ_{i,1}·θ. Then β^d·γ^{−d·u0} = γ^{d·m·u1} and so
    (f_{d,0}(a) + f_{d,1}(a)·θ)(γ_{−d·u0,0} + γ_{−d·u0,1}·θ) = (a^2 + t)^d·(γ_{d·m·u1,0} + γ_{d·m·u1,1}·θ).
Hence
    (g^{f_{d,0}(a)})^{γ_{−d·u0,0}}·(g^{f_{d,1}(a)})^{−t·γ_{−d·u0,1}} = (g^{(a^2+t)^d})^{γ_{d·m·u1,0}}
and similarly for the implicit representation of the coefficient of θ. It follows that one
can perform the baby-step-giant-step algorithm in this setting to compute (u0, u1) and
hence u (mod (r + 1)/d). Note that computing g^{f_{d,0}(a)}, g^{f_{d,1}(a)} and g^{(a^2+t)^d} requires 6d
exponentiations. The stated complexity follows.
For the second stage, we have β = γ^{u+v(r+1)/d} where 0 ≤ v < d. Giving a baby-step-giant-step
algorithm here is straightforward and we leave the details as an exercise. □
One derives the following result. Note that it is not usually practical to consider a
computational problem whose input is an O(r^{1/3})-tuple of group elements, hence this result
is mainly of theoretical interest.
Corollary 21.5.11. Let g have prime order r and suppose r + 1 has a factor d such that
d ≈ r^{1/3}. Given h_i = g^{a^i} for 1 ≤ i ≤ 2d then one can compute a in O(r^{1/3} log(r)) group
operations.
Corollary 21.5.12. Let g have prime order r and suppose r + 1 has a factor d such that
d ≈ r^{1/3}. Given h = g^a and a perfect Static-DH oracle with respect to (g, h) then one can
compute a in O(r^{1/3}) oracle queries and O(r^{1/3} log(r)) group operations.
Exercise 21.5.13. Fill in the missing details in the proof of Theorem 21.5.10 and prove
Corollaries 21.5.11 and 21.5.12.
Satoh [507] extends Cheon's algorithm to algebraic groups of order Φ_n(r) (essentially,
to the groups G_{n,r}). He also improves Theorem 21.5.10 in the case of d | (r + 1) to only
require h_i = g^{a^i} for 1 ≤ i ≤ d.
A natural problem is to generalise Theorem 21.5.10 to other algebraic groups, such as
elliptic curves. The obvious approach does not seem to work (see Remark 1 of [130]), so
it seems a new idea is needed to achieve this. Finally, Section 5.2 of [131] shows that, at
least asymptotically, most primes r are such that r − 1 or r + 1 has a useful divisor.
Both [111] and [130] remark that a decryption oracle for classic textbook Elgamal
leads to a Static-DH oracle: given an Elgamal public key (g, g^a) and any h1 ∈ ⟨g⟩ one
can ask for the decryption of the ciphertext (c1, c2) = (h1, 1) (one can also make this
less obvious using random self-reducibility of Elgamal ciphertexts) to get c2·c1^{−a} = h1^{−a}.
From this one computes h1^a. By performing this repeatedly one can compute a sequence
h_i = g^{a^i} as required. The papers [111, 130] contain further examples of cryptosystems
that provide Static-DH oracles, or computational assumptions that contain values of the
form h_i = g^{a^i}.
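This observation can be sketched directly; the toy parameters below are illustrative assumptions:

```python
# A textbook Elgamal decryption oracle yields a Static-DH oracle, and
# iterating it produces the sequence h_i = g^(a^i) used by Cheon's algorithm.
p, r, g = 23, 11, 2          # toy group: g has prime order r in (Z/pZ)*
a = 7                        # the decryption oracle's secret key
pub = pow(g, a, p)           # Elgamal public key (g, g^a)

def decrypt(c1, c2):         # textbook Elgamal decryption: m = c2 * c1^(-a)
    return c2 * pow(c1, -a, p) % p

def static_dh(h):            # Static-DH oracle built from decryption of (h, 1)
    return pow(decrypt(h, 1), -1, p)    # (h^(-a))^(-1) = h^a

h, seq = g, []
for i in range(1, 5):        # h_i = g^(a^i) by repeated Static-DH calls
    h = static_dh(h)
    seq.append(h)
    assert h == pow(g, pow(a, i, r), p)
```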
21.6 Hard Bits of Discrete Logarithms
Saying that a computational problem is hard is the same as saying that it is hard to
write down a binary representation of the answer. Some bits of a representation of the
answer may be easy to compute (at least, up to a small probability of error) but if a
computational problem is hard then there must be at least one bit of any representation
of the answer that is hard to compute. In some cryptographic applications (such as key
derivation or designing secure pseudorandom generators) it is important to be able to
locate some of these hard bits. Hence, the main challenge is to prove that a specific bit
is hard. A potentially easier problem is to determine a small set of bits, at least one of
which is hard. A harder problem is to prove that some set of bits are all simultaneously
hard (for this concept see Definition 21.6.14).
The aim of this section is to give a rigorous definition for the concept of hard bits
and to give some easy examples (hard bits of the solution to the DLP). In Section 21.7 we
will consider related problems for the CDH problem. We first show that certain individual
bits of the DLP, for any group, are as hard to compute as the whole solution.
Definition 21.6.1. Let g ∈ G have prime order r. The computational problem DL-LSB
is: given (g, g^a) where 0 ≤ a < r, to compute the least significant bit of a.
Exercise 21.6.2. Show that DL-LSB ≤_R DLP.
Theorem 21.6.3. Let G be a group of prime order r. Then DLP ≤_R DL-LSB.
Proof: Let A be a perfect oracle that, on input (g, g^a), outputs the least significant bit of 0 ≤ a < r. In other words, if the binary expansion of a is Σ_{i=0}^{m} a_i 2^i then A outputs a_0. We will use A to compute a.
The first step is to call A(g, h) to get a_0. Once this has been obtained we set h′ = h g^{−a_0}. Then h′ = g^{2a_1 + 4a_2 + ···}. Let u = 2^{−1} = (r + 1)/2 (mod r) and define

    h_1 = (h′)^u.

Then h_1 = g^{a_1 + 2a_2 + ···}, so calling A(g, h_1) gives a_1. For i = 2, 3, . . . compute h_i = (h_{i−1} g^{−a_{i−1}})^u and a_i = A(g, h_i), which computes the binary expansion of a. This reduction runs in polynomial-time and requires polynomially many calls to the oracle A.
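The reduction in this proof is short enough to sketch in code. The following is an illustrative sketch, not from the text: the toy group (the order-11 subgroup of F_23^* generated by 4) and the brute-force simulation of the oracle are assumptions made purely for demonstration.

```python
def recover_dlog(g, h, r, p, lsb_oracle):
    """Recover a from h = g^a (mod p) using a perfect DL-LSB oracle."""
    u = (r + 1) // 2                 # u = 2^{-1} (mod r)
    a, hi = 0, h
    for i in range(r.bit_length()):
        bit = lsb_oracle(g, hi)      # bit i of a
        a |= bit << i
        # h_{i+1} = (h_i * g^{-bit})^u: strip the known bit, halve the exponent
        hi = pow(hi * pow(g, -bit, p) % p, u, p)
    return a

# Toy demonstration: g = 4 has order r = 11 in F_23^*.  The "oracle" cheats by
# solving the DLP by brute force, which is only feasible at this tiny size.
p, r, g = 23, 11, 4

def lsb_oracle(base, x):
    t, a = 1, 0
    while t != x:
        t = t * base % p
        a += 1
    return a & 1

assert all(recover_dlog(g, pow(g, a, p), r, p, lsb_oracle) == a for a in range(r))
```

Each oracle call reveals one bit of the exponent, so the reduction makes only ⌈log_2(r)⌉ queries.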
Exercise 21.6.4. Give an alternative proof of Theorem 21.6.3 based on bounding the unknown a in the range

    (l − 1)r/2^j ≤ a < l r/2^j.

Initially one sets l = 1 and j = 0. At step j, if one has (l − 1)r/2^j ≤ a < l r/2^j then, if a is even, (l − 1)r/2^{j+1} ≤ a/2 < l r/2^{j+1}, and if a is odd, (2^j + l − 1)r/2^{j+1} ≤ (a + r)/2 < (2^j + l)r/2^{j+1}. Show that when j = ⌈log_2(r)⌉ one can compute 2^{−j} a (mod r) exactly and hence deduce a.
Exercise 21.6.5. Since one can correctly guess the least significant bit of the DLP with
probability 1/2, why does Theorem 21.6.3 not prove that DLP is easy?
One should also consider the case of a DL-LSB oracle that only works with some noticeable probability ε. It is then necessary to randomise the calls to the oracle, but the problem is to determine the LSB of a given the LSBs of some algebraically related values. The trick is to guess some u = O(log(1/ε)) = O(log(log(r))) most significant bits of a and set them to zero (i.e., replace h by h′ = g^{a′} where the u most significant bits of a′ are zero). One can then call the oracle on h′ g^y for random 0 ≤ y < r − r/2^u and take a majority vote to get the result. For details of the argument see Blum and Micali [73].
We conclude that computing the LSB of the DLP is as hard as computing the whole
DLP. Such bits are called hardcore bits since if DLP is hard then computing the LSB
of the DLP is hard.
Definition 21.6.6. Let f : {0,1}^* → {0,1}^* be a function computable in polynomial-time (i.e., there is some polynomial p(n) such that for x ∈ {0,1}^n one can compute f(x) in at most p(n) bit operations). A predicate b : {0,1}^* → {0,1} is a hardcore predicate (or hardcore bit) for f if b is computable in polynomial-time and if, for all probabilistic polynomial-time algorithms A, the advantage

    | Pr_{x ∈ {0,1}^n} ( A(f(x)) = b(x) ) − 1/2 |

is negligible as a function of n.
We now give some candidate hardcore predicates for the DLP. We also restate the
meaning of hardcore bit for functions defined on {0, 1, . . . , r − 1} rather than {0, 1}^*.
Definition 21.6.7. For all n ∈ N let (G_n, g_n, r_n) be such that G_n is a group and g_n ∈ G_n is an element of order r_n, where r_n is an n-bit prime. We call this a family of groups. For n ∈ N define the function f_n : {0, 1, . . . , r_n − 1} → G_n by f_n(a) = g_n^a. For n ∈ N define i(n) ∈ {0, 1, . . . , n − 1}. The predicate b_{i(n)} : {0, 1, . . . , r_n − 1} → {0, 1} is defined so that b_{i(n)}(a) is bit i(n) of a, when a is represented as an n-bit string. Then b_{i(n)} is a hardcore predicate for the DLP (alternatively, bit i(n) is a hardcore bit for the DLP) if, for all probabilistic polynomial-time algorithms A, the advantage

    | Pr_{a ∈ {0, 1, . . . , r_n − 1}} ( A(f_n(a)) = b_{i(n)}(a) ) − 1/2 |

is negligible as a function of n.
The least significant bit (LSB) is the case i(n) = 0 in the above definition. If the DLP
is hard then Theorem 21.6.3 shows that the LSB is a hardcore bit.
Example 21.6.8. Fix m ∈ N. Let g have prime order r > 2^m. Suppose A is a perfect oracle such that, for x ∈ {0, 1, . . . , r − 1}, A(g^x) is the predicate b_m(x) (i.e., bit m of x). One can use A to solve the DLP by guessing the m − 1 least significant bits of x and then using essentially the same argument as Theorem 21.6.3. Hence, if m is fixed and g varies in a family of groups as in Definition 21.6.7 then b_m(x) is a hardcore predicate for the DLP. A similar result holds if m is allowed to grow, but is bounded as m = O(log(log(r))).
We now give an example of a hardcore predicate that is not just a bit of the DLP.
Exercise 21.6.9. Let g have prime order r. Let f : {0, 1, . . . , r − 1} → G be f(x) = g^x. Define the predicate b : {0, 1, . . . , r − 1} → {0, 1} by b(x) = x_1 ⊕ x_0, where x_0 and x_1 are the two least significant bits of x. Show that b is a hardcore predicate for f.
It is not true that any bit of the DLP is necessarily hardcore. For example, one can consider the most significant bit of a, which is b_{n−1}(a) in Definition 21.6.7.
Example 21.6.10. Let r = 2^ℓ + u be a prime where 0 < u < 2^{ℓ−ε}. Let 0 ≤ a < r be chosen uniformly at random and interpreted as an (ℓ + 1)-bit string. Then the most significant bit of a is equal to 1 with probability u/r < u/2^ℓ < 1/2^ε and is equal to 0 with probability at least 1 − 1/2^ε. Hence, when ε ≥ 1 the most significant bit is not a hardcore bit for the DLP. Note that the function g^a is not used here; the result merely follows from the distribution of integers modulo r.
Exercise 21.6.11. Let r = 2^ℓ + 2^{ℓ−1} + u where 0 < u < 2^{ℓ/2}. Let 0 ≤ a < r be uniformly chosen and represented as an (ℓ + 1)-bit string. Show that neither the most significant bit (i.e., bit ℓ) nor bit ℓ − 1 of a is hardcore for the DLP.
The above examples show that for some primes the most significant bit is easy to
predict. For other primes the most significant bit can be hard.
Exercise 21.6.12. Suppose r = 2^ℓ − 1 is a Mersenne prime and let g have order r. Fix 0 ≤ i < ℓ. Show that if O(g, h) is a perfect oracle that returns the i-th bit of the DLP of h with respect to g then one can compute the whole DLP.
To summarise, low order bits of the DLP are always as hard as the DLP, while high
order bits may or may not be hard. However, our examples of cases where the high
order bits are easy are due not to any weakness of the DLP, but rather to statistical
properties of residues modulo r. One way to deal with this issue is to define a bit as being
hard if it cannot be predicted better than the natural statistical bias (see, for example,
Definition 6.1 of Håstad and Näslund [278]). However, this approach is less satisfactory for
cryptographic applications if one wants to use the DLP as a source of unpredictable bits.
Hence, it is natural to introduce a more statistically balanced predicate to use in place of
high order bits. In practice, it is often more efficient to compute the least significant bit
than to evaluate this predicate.
Exercise 21.6.13. Let g have order r. Let f : {0, 1, . . . , r − 1} → G be f(x) = g^x. Define b(x) = 0 if 0 ≤ x < r/2 and b(x) = 1 if r/2 ≤ x < r. Show, using the method of Exercise 21.6.4, that b(x) is a hardcore bit for f.
We do not cover all results on hard bits for the DLP. See Section 9 of Håstad and Näslund [278] for a general result and further references.
So far we only discussed showing that single bits of the DLP are hard. There are
several approaches to defining the notion of a set of k bits being simultaneously hard. One
definition states that the bits are hard if, for every non-constant function B : {0,1}^k → {0,1}, given an oracle that takes as input g^x and computes B on the k bits of x in question, one can use the oracle to solve the DLP. Another definition, which seems to be
more useful in practice, is in terms of distinguishing the bits from random.
Definition 21.6.14. Let f : {0,1}^n → {0,1}^m be a one-way function and let S ⊆ {1, . . . , n}. We say the bits labelled by S are simultaneously hard if there is no polynomial-time algorithm that, given f(x), can distinguish the sequence (x_i : i ∈ S) from a random #S-bit string.
Peralta [477] (using next-bit-predictability instead of hardcore predicates or Definition 21.6.14) proves that O(log(log(r))) least significant bits of the DLP are hard.
Schnorr [520] (using Definition 21.6.14) proves that essentially any O(log(log(r))) bits
of the DLP are simultaneously hard (using the bits of Exercise 21.6.13 for the most
significant bits).
Patel and Sundaram [475] showed, under a stronger assumption, that many more bits are simultaneously hard. Let g be an element of prime order r, let ℓ ∈ N and set k = ⌊log_2(r)⌋ − ℓ. The ideas of Patel and Sundaram lead to the following result: if, given g^x, the k least significant bits of x are not simultaneously hard, then there is an efficient algorithm to solve the DLP in an interval of length 2^ℓ (see Exercise 13.3.6 for the definition of this problem). Hence, under the assumption that the DLP in an interval of length 2^ℓ is hard, one can output many bits. Taking ℓ = log(log(p))^{1+ε} gives an essentially optimal asymptotic bit security result for the DLP.
21.6.1 Hard Bits of the DLP in Algebraic Group Quotients
One can consider hard bits for the DLP in algebraic group quotients. In other words, let O_i be a perfect oracle that on input the equivalence class [g^a] of an element outputs bit i of a. The first problem is that there is more than one value a for each class [g^a] and so the bit is not necessarily well-defined.
Section 7 of Li, Näslund and Shparlinski [384] considers this problem for LUC. To make the problem well-defined they consider an element g ∈ F_{p^2}^* of prime order r and an oracle A such that A(t) = a_i, where a_i is the i-th bit of a for the unique 0 ≤ a < r/2 such that t = Tr_{F_{p^2}/F_p}(g^a). The idea of their method is, given t, to compute the two roots h_1 = g^a and h_2 = g^{r−a} of X^2 − tX + 1 in F_{p^2} and then use previous methods (e.g., Theorem 21.6.3 or Exercise 21.6.4) on each of them to compute either a or r − a (whichever is smaller).
Exercise 21.6.15. Work out the details of the Li, Näslund and Shparlinski result for the
case of the least significant bit of the DLP in LUC.
Exercise 21.6.16. Consider the algebraic group quotient corresponding to elliptic curve arithmetic using x-coordinates only. Fix P ∈ E(F_q) of prime order r. Let A be an oracle that on input u ∈ F_q outputs a_0, where a_0 is the 0-th bit of a such that 0 ≤ a < r/2 and x([a]P) = u. Show that the method of Li, Näslund and Shparlinski can be applied to show that this bit is a hard bit for the DLP.
Li, Näslund and Shparlinski remark that it seems to be hard to obtain a similar result for XTR. Theorem 3 of Jiang, Xu and Wang [313] claims to be such a result, but it does not seem to be proved in their paper.
21.7 Hard Bits of CDH
We now consider which bits of the CDH problem are hard. Since the solution to a CDH
instance is a group element it is natural to expect, in contrast with our discussion of the
DLP, that the hardcore bits and the proof techniques will depend on which group is being
studied.
We first consider the case g ∈ F_p^* where p is a large prime and g is a primitive root. Our presentation follows Boneh and Venkatesan [85]. We assume every element x ∈ F_p^* is represented as an element of the set {1, 2, . . . , p − 1} and we interpret x (mod p) as returning a value in this set.
Definition 21.7.1. Let p be odd and let x ∈ {1, 2, . . . , p − 1}. Define

    MSB_1(x) = 0 if 1 ≤ x < p/2, and MSB_1(x) = 1 otherwise.

For k ∈ N let 0 ≤ t < 2^k be the integer such that

    t p/2^k ≤ x < (t + 1) p/2^k

and define MSB_k(x) = t.

An alternative definition, which is commonly used in the literature and sometimes used in this book, is MSB_k(x) = u ∈ Z such that |x − u| ≤ p/2^{k+1} (e.g., u = t p/2^k + p/2^{k+1}). For this definition it is unnecessary to assume k ∈ N and so one can allow k ∈ R_{>0}.
Note that these are not bits of the binary representation of x. Instead, as in Exercise 21.6.13, they correspond to membership of x in a certain partition of {1, 2, . . . , p − 1}.
Ideally we would like to show that, say, MSB_1 is a hardcore bit for CDH. This seems to be out of reach for F_p^*. Instead, we will show that, for k ≈ √(log_2(p)), if one can compute MSB_k(g^{ab} (mod p)) then one can compute g^{ab} (mod p). A consequence of this result is that there exists some predicate defined on MSB_k(g^{ab} (mod p)) whose value is a hardcore bit for CDH.
The central idea of most results on the bit security of CDH is the following. Let p be an odd prime and let g ∈ F_p^* be a primitive root. Let h_1 = g^a, h_2 = g^b be a CDH instance where b is coprime to p − 1. For k ∈ N let A_k be a perfect oracle such that

    A_k(g, g^a, g^b) = MSB_k(g^{ab}).

Choose a random element 1 ≤ x < p and set u = A_k(g, h_1 g^x, h_2). One has

    u = MSB_k(g^{(a+x)b}) = MSB_k(g^{ab} t)

where

    t = h_2^x.

In other words, the oracle A_k gives the most significant bits of multiples of the unknown g^{ab} by uniformly random elements t ∈ F_p^*. The problem of using this information to compute g^{ab} is (a special case of) the hidden number problem.
21.7.1 The Hidden Number Problem
Definition 21.7.2. Let p be an odd prime and k ∈ R_{>1}. Let α ∈ F_p^* and let t_1, . . . , t_n ∈ F_p^* be chosen uniformly at random. The hidden number problem (HNP) is, given (t_i, u_i = MSB_k(α t_i (mod p))) for 1 ≤ i ≤ n, to compute α.
Throughout this section we will allow any k ∈ R_{>1} and define MSB_k(x) to be any integer u such that |x − u| < p/2^{k+1}.
Before giving the main results we discuss two easy variants of Definition 21.7.2 where
the values ti can be chosen adaptively.
Lemma 21.7.3. Let p be an odd prime and 1 ≤ α < p. Suppose one has a perfect oracle A_1 such that A_1(t) = MSB_1(α t (mod p)). Then one can compute α using O(log(p)) oracle queries.
Exercise 21.7.4. Prove Lemma 21.7.3.
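One way to prove Lemma 21.7.3 is a binary search; the sketch below (the oracle interface and function names are illustrative) maintains the invariant α ∈ [cp/2^j, (c+1)p/2^j) and queries t = 2^j, since MSB_1(2^j α (mod p)) reveals which half of the current interval contains α.

```python
def recover_hidden_number(p, msb1_oracle):
    """Recover alpha from a perfect adaptive oracle t -> MSB_1(alpha*t mod p)."""
    c, j = 0, 0
    # Invariant: alpha lies in [c*p/2^j, (c+1)*p/2^j)
    while 2 ** j <= p:                   # stop once the interval is shorter than 1
        bit = msb1_oracle(pow(2, j, p))  # which half of the interval?
        c, j = 2 * c + bit, j + 1
    return -(-c * p // 2 ** j)           # ceil(c*p/2^j): the unique integer left

# Simulated perfect oracle for a toy prime (for illustration only)
p0, alpha0 = 101, 37
assert recover_hidden_number(p0, lambda t: 0 if alpha0 * t % p0 < p0 / 2 else 1) == alpha0
```

The loop runs while 2^j ≤ p, so the number of oracle queries is O(log(p)), as the lemma states.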
Lemma 21.7.5. Let p be an odd prime and 1 ≤ α < p. Suppose one has a perfect oracle A such that A(t) = LSB_1(α t (mod p)), where LSB_1(x) is the least significant bit of the binary representation of 0 ≤ x < p. Then one can compute α using O(log_2(p)) oracle queries.
Exercise 21.7.6. Prove Lemma 21.7.5.
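One approach to Lemma 21.7.5 is to query powers of 2^{−1} (mod p): writing x_0 = α and x_{i+1} = x_i·2^{−1} (mod p), one has x_i = 2x_{i+1} − b_i p where b_i = LSB_1(x_i) is what the oracle returns, so α ≡ −p·Σ_i b_i 2^i (mod 2^n). A sketch (oracle interface and names illustrative):

```python
def recover_hidden_number_lsb(p, lsb_oracle):
    """Recover alpha from a perfect adaptive oracle t -> LSB_1(alpha*t mod p)."""
    n = p.bit_length() + 1
    inv2 = (p + 1) // 2                # 2^{-1} (mod p)
    B, t = 0, 1
    for i in range(n):
        B |= lsb_oracle(t) << i        # b_i = LSB_1(alpha * 2^{-i} mod p)
        t = t * inv2 % p
    # alpha = 2^n * x_n - p*B for some integer x_n, so reduce -p*B modulo 2^n
    return (-p * B) % (1 << n)

p0 = 1009                              # toy prime for illustration
assert recover_hidden_number_lsb(p0, lambda t: 777 * t % p0 & 1) == 777
```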
Lemmas 21.7.3 and 21.7.5 show that the hidden number problem can be easy if the
values ti in Definition 21.7.2 are chosen adaptively. However, it intuitively seems harder
to solve the hidden number problem when the ti are randomly chosen. On the other
hand, as k grows the HNP becomes easier; the case k = log2 (p) being trivial. Hence, one
could hope to be able to solve the HNP as long as k is sufficiently large. We now explain
the method of Boneh and Venkatesan [85] to solve the HNP using lattices.
Definition 21.7.7. Let (t_i, u_i = MSB_k(α t_i)) for 1 ≤ i ≤ n be as in Definition 21.7.2. Define the lattice L ⊆ R^{n+1} generated by the rows of the basis matrix

    B = ( p    0    0   ···  0    0
          0    p    0   ···  0    0
          ⋮              ⋱        ⋮
          0    0    0   ···  p    0
          t_1  t_2  t_3 ···  t_n  1/2^{k+1} ).

Define the vector u = (u_1, u_2, . . . , u_n, 0) ∈ R^{n+1} where |u_i − (α t_i (mod p))| < p/2^{k+1}.
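The basis matrix can be written down directly; the sketch below (function name illustrative) builds it with exact rational entries, since only the bottom-right entry is non-integral. Because B is triangular, det(L) = p^n/2^{k+1}, which is the first claim of Lemma 21.7.8.

```python
from fractions import Fraction

def hnp_basis(p, ts, k):
    """Rows of the HNP basis matrix: diag(p, ..., p) with (t_1..t_n, 1/2^{k+1}) below."""
    n = len(ts)
    rows = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        rows[i][i] = Fraction(p)          # p on the diagonal of the first n rows
    for j, t in enumerate(ts):
        rows[n][j] = Fraction(t)          # last row: the multipliers t_i
    rows[n][n] = Fraction(1, 2 ** (k + 1))
    return rows

B = hnp_basis(17, [3, 5], 2)              # toy values for illustration
# The matrix is triangular, so det = product of diagonal = p^n / 2^{k+1}
assert B[0][0] * B[1][1] * B[2][2] == Fraction(17 ** 2, 2 ** 3)
```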
Lemma 21.7.8. Let L, u and n be as in Definition 21.7.7. Then det(L) = p^n/2^{k+1} and there exists a vector v ∈ L such that ‖u − v‖ < √(n + 1) p/2^{k+1}.

Proof: The first statement is trivial. For the second, note that u_i = MSB_k(α t_i (mod p)) is the same as saying α t_i = u_i + ε_i + l_i p for some ε_i, l_i ∈ Z such that |ε_i| ≤ p/2^{k+1}, for 1 ≤ i ≤ n. Now define v ∈ L by

    v = (−l_1, −l_2, . . . , −l_n, α) B
      = (α t_1 − l_1 p, . . . , α t_n − l_n p, α/2^{k+1})
      = (u_1 + ε_1, . . . , u_n + ε_n, α/2^{k+1}).

Since |ε_i| ≤ p/2^{k+1} and 0 < α < p it follows that ‖u − v‖ < √(n + 1) p/2^{k+1}.
Theorem 21.7.9. Let p be prime and α ∈ F_p^*. Let n = 2⌈√(log_2(p))⌉ and μ = ½√(log_2(p)) + 3. Let t_1, . . . , t_n ∈ F_p^* be chosen uniformly and independently at random, and let L and u be as in Definition 21.7.7. Then, with probability at least 1 − 2^{−n} over the choice of t_1, . . . , t_n, every vector v ∈ L with ‖v − u‖ < p/2^{μ+1} is of the form

    v = (β t_1 − l_1 p, . . . , β t_n − l_n p, β/2^{k+1})

with β ≡ α (mod p).

Proof: (Sketch) Let β ∈ {0, 1, . . . , p − 1} be such that β ≢ α (mod p), and let A be the probability, over a uniformly chosen t ∈ F_p^*, that |(β − α)t (mod p)| < p/2^μ, where x (mod p) is here taken as the representative in [−(p − 1)/2, (p − 1)/2]. Since (β − α)t ranges uniformly over F_p^* we have

    A ≤ 2⌊p/2^μ⌋/(p − 1) < 4/2^μ.

Since there are n uniformly and independently chosen t_1, . . . , t_n ∈ F_p^*, the probability that |(β − α)t_i (mod p)| < p/2^μ for all 1 ≤ i ≤ n is A^n. Finally, there are p − 1 choices for β ∈ {0, 1, . . . , p − 1} such that β ≢ α (mod p). Hence, the probability over all such β and all t_1, . . . , t_n that ‖v − u‖ < p/2^{μ+1} for some such v is at most

    (p − 1)A^n < (p − 1)4^n/2^{μn} < 2^{log_2(p) + 2n − μn}.

Now, μn = (½√(log_2(p)) + 3) · 2⌈√(log_2(p))⌉ ≥ log_2(p) + 3n, so (p − 1)A^n < 2^{−n}. Since n ≥ 6 the result follows.
Corollary 21.7.10. Let p > 2^{32} be prime, let n = 2⌈√(log_2(p))⌉ and let k = ⌈√(log_2(p))⌉ + ⌈log_2(log_2(p))⌉. Given (t_i, u_i = MSB_k(α t_i)) for 1 ≤ i ≤ n as in Definition 21.7.2 one can compute α in polynomial-time.
Proof: One constructs the basis matrix B for the lattice L in polynomial-time. Note that n = O(√(log(p))), so the matrix requires O(log(p)^2) bits of storage.
Running the LLL algorithm with factor δ = 1/4 + 1/√2 is a polynomial-time computation (the lattice is not a subset of Z^{n+1}, so Remark 17.5.5 should be applied, noting that only one column has non-integer entries), which returns an LLL-reduced basis. Let u be as above. The Babai nearest plane algorithm finds v such that ‖v − u‖ < (1.6) 2^{(n+1)/4} √(n + 1) p/2^{k+1} by Theorem 18.1.7 and Lemma 21.7.8. This computation requires O(log(p)^{4.5}) bit operations by Exercise 18.1.9. To apply Theorem 21.7.9 we need the vector v output from the Babai algorithm to be within p/2^{μ+1} of u, where μ = ½√(log_2(p)) + 3. Hence, we need

    (1.6) 2^{(n+1)/4} √(n + 1) / 2^{k+1} < 1/2^{μ+1},

which is μ + log_2(1.6) + (n + 1)/4 + log_2(√(n + 1)) < k = ⌈√(log_2(p))⌉ + ⌈log_2(log_2(p))⌉. Since n = 2⌈√(log_2(p))⌉ the result follows whenever p is sufficiently large (the reader can check that p > 2^{32} is sufficient).
It follows from Theorem 21.7.9 that, with probability at least 63/64, the vector v = (v_1, . . . , v_{n+1}) ∈ R^{n+1} output by the Babai algorithm is such that α ≡ v_{n+1} 2^{k+1} (mod p). It follows that the hidden number α can be efficiently computed.
Note that if p ≈ 2^{160} then μ ≈ 9.32. In practice, the algorithm works well for primes of this size. For example, Howgrave-Graham and Smart [298] present results of practical experiments where 8 of the most significant bits are provided by an oracle. We stress that these results do not show that all of the k = ⌈√(log_2(p))⌉ + ⌈log_2(log_2(p))⌉ most significant bits are hard. Instead, one can only deduce that there is a predicate defined on these k bits that is a hardcore predicate for CDH.
Nguyen and Shparlinski [456] also remark that one could use other methods than
LLL and the Babai nearest plane algorithm. They show that if one uses the Ajtai,
Kumar and Sivakumar algorithm for CVP then one only needs k = log(log(p)) bits to
obtain an algorithm for the hidden number problem with complexity of pO(1/ log(log(p)))
bit operations. They further show that if one has a perfect oracle for CVP (with respect
to the ℓ_∞ norm) then one can solve the hidden number problem in polynomial-time given only k = 1 + ε bits for any ε > 0.
One final remark: the methods in this section assume a perfect oracle that outputs MSB_k(α t (mod p)). Since there seems to be no way to determine whether the output of
the oracle is correct, it is an open problem to get results in the presence of an oracle that
sometimes makes mistakes (though, as we mention in the next section, when applying the
hidden number problem to the bit security of CDH then there is a solution in the case
of oracles with a relatively low probability of giving an incorrect answer). For further
discussion and applications of the hidden number problem see Shparlinski [555].
21.7.2 Hard Bits of CDH Modulo a Prime
Theorem 21.7.11. Let p > 2^{32} be prime, let g be a primitive root modulo p and let k = ⌈√(log_2(p))⌉ + ⌈log_2(log_2(p))⌉. Suppose there is no polynomial-time algorithm to solve^{11} CDH in F_p^*. Then there is no polynomial-time algorithm to compute the k most significant bits of g^{ab} when given g, g^a and g^b.
Proof: Let (g, g^a, g^b) be an instance of the CDH problem in ⟨g⟩ and write α = g^{ab} for the solution. We assume that gcd(b, p − 1) = 1 (this requirement is removed by González Vasco and Shparlinski [260]; other work mentioned below allows g to have prime order, in which case this restriction disappears).
Given a polynomial-time algorithm A such that A(g, g^x, g^y) = MSB_k(g^{xy} (mod p)), one can call A(g, g^a g^r, g^b) polynomially many times for uniformly random r ∈ {1, 2, . . . , p − 2} to get MSB_k(α t) where t = g^{br} (mod p). Applying Corollary 21.7.10 gives a polynomial-time algorithm to compute α.
A number of significant open problems remain:
1. Theorem 21.7.11 shows it is hard to compute all of MSB_k(g^{ab}), but that does not imply that, say, MSB_1(g^{ab}) is hard. A stronger result would be to determine specific hardcore bits for CDH, or at least to extend the results to MSB_k for smaller values
of k. Boneh and Venkatesan [86] give a method that works for k = 2 log(log(p))
bits (where g is a primitive root in Fp ) but which needs a hint depending on p and
g; they claim this is a non-uniform result but this depends on the instance generator
(see the footnote of Section 21.4.3). For k = log(log(p)) one can also consider the
approach of Nguyen and Shparlinski [456] mentioned above.
Akavia [8] uses a totally different approach to prove that MSB1 is hard for CDH,
but the method is again at best non-uniform (i.e., needs polynomial-sized auxiliary
information depending on p and g b ).
2. We assumed perfect oracles for computing MSB_k(α t) in the above results. For non-perfect oracles one can use the above methods to generate a list of candidate values for g^{ab} and then apply the CDH self-corrector of Section 21.3. We refer to González Vasco, Näslund and Shparlinski [259] for details.
The method of Akavia [8] also works when the oracle for MSB1 is unreliable.
3. The above results assumed that g is a primitive root modulo p, whereas in practice one chooses g to lie in a small subgroup of F_p^* of prime order. The proof of Theorem 21.7.11 generates values t that lie in ⟨g⟩ and so they are not uniformly at random in F_p^*. González Vasco and Shparlinski have given results that apply when the order of g is less than p − 1 (see Chapter 14 of [554] for details and references). Shparlinski and Winterhof [556, 557], building on work of Bourgain and Konyagin, have obtained results when the order of g is at least log(p)/log(log(p))^{1−ε}.
Exercise 21.7.12. This exercise concerns a static Diffie-Hellman key exchange protocol due to Boneh and Venkatesan [85] for which one can prove that the most significant bit is a hardcore bit. Suppose Alice chooses a prime p, an integer 1 ≤ a < p − 1 such that gcd(a, p − 1) = 1, and sets g = 2^{a^{−1} (mod p−1)} (mod p). Alice makes p and g public and keeps a private. When Bob wants to communicate with Alice he sends g^x for random 1 ≤ x < p − 1, so that Alice and Bob share the key 2^x. Prove that MSB_1(2^x) is a hardcore bit.
11 As we have seen, to make such a statement precise one needs an instance generator that outputs
groups from a family.
[Hint: Suppose one has a perfect oracle A that on input g^y outputs MSB_1(2^y). Then one can store Bob's transmission g^x and call A(g^x g^y) to get MSB_1(α 2^y), where α = 2^x is the desired hidden number. Then apply Lemma 21.7.3.]
Exercise 21.7.13. Let g ∈ F_p^* be a primitive root and let ε > 0. Show that if one has a perfect oracle for MSB_{1+ε}(g^{ab}) then one can solve DDH in F_p^*.
21.7.3 Hard Bits of CDH in Other Groups
So far we have only considered CDH in (subgroups of) F_p^* where p is prime. It is natural to consider CDH in subgroups of F_{p^m}^*, in algebraic tori, in trace systems such as LUC and XTR, and in elliptic curves. The first issue is what is meant by "bits" of such a value.
In practice, elements in such a group are represented as an n-tuple of elements in Fp and
so it is natural to consider one component in Fp and take bits of it as done previously.
When p is small one can consider a sequence of bits, each from different components. An
early reference for bit security of CDH in this setting is Verheul [615].
It is possible to extend the results to traces relatively easily. The idea is that if {θ_1, . . . , θ_m} is a basis for F_{p^m} over F_p, if α = Σ_{j=1}^{m} α_j θ_j is hidden and if t_i = Σ_{j=1}^{m} t_{i,j} θ_j are known, then Tr(α t_i) is a linear equation in the unknowns α_j. Li, Näslund and Shparlinski [384] have studied the bit security of CDH in LUC and XTR. We refer to Chapters 6 and 19 of Shparlinski [554] for further details and references.
Exercise 21.7.14. Let F_{2^m} be represented using a normal basis and let g ∈ F_{2^m}^*. Suppose one has a perfect oracle A such that A(g, g^a, g^b) returns the first coefficient of the normal basis representation of g^{ab}. Show how to use A to compute g^{ab}. Hence, conclude that the first coefficient is a hardcore bit for CDH in F_{2^m}^*.
Exercise 21.7.15. Let F_{2^m} = F_2[x]/(F(x)) and let g ∈ F_{2^m}^* have prime order r > m. Suppose one has a perfect oracle A such that A(g, g^a, g^b) returns the constant coefficient of the polynomial basis representation of g^{ab}. Show how to use A to compute g^{ab}. Hence, conclude that the constant coefficient is a hardcore bit for CDH in F_{2^m}^*.
Hard Bits for Elliptic Curve Diffie-Hellman
We now consider the case of elliptic curves E over Fq . A typical way to extract bits from
an elliptic curve point P is to consider the x-coordinate x(P ) as an element of Fq and
then extract bits of this. It seems hard to give results for the bit security of CDH using
an oracle A(P, [a]P, [b]P) = MSB_k(x([ab]P)); the natural generalisation of the previous approach is to call A(P, [a]P + [z]P, [b]P) = MSB_k(x([ab]P + [zb]P)), but the problem is that it is difficult to infer anything useful about x([ab]P) from x([ab]P + [zb]P) (similarly for least significant bits); see Jao, Jetchev and Venkatesan [307] for some results. However,
Boneh and Shparlinski [84] had the insight to consider a more general oracle.
Definition 21.7.16. Let p be an odd prime and k ∈ N. Let A_{x,k}(A, B, P, [a]P, [b]P) be an oracle that returns LSB_k(x([ab]P)), where P ∈ E(F_p) for the elliptic curve E : y^2 = x^3 + Ax + B. Similarly, let A_{y,k}(A, B, P, [a]P, [b]P) be an oracle that returns LSB_k(y([ab]P)).
The crucial idea is that, given a point P = (x_P, y_P) ∈ E(F_p) where E : y^2 = x^3 + Ax + B, one can consider an isomorphism φ(x, y) = (u^2 x, u^3 y) with φ(P) ∈ E′(F_p), where E′ : Y^2 = X^3 + u^4 A X + u^6 B. Hence, instead of randomising instances of CDH in a way analogous to that done earlier, one calls the oracle A_{x,k}(u^4 A, u^6 B, φ(P), φ([a]P), φ([b]P)) to get LSB_k(x(φ([ab]P))) = LSB_k(u^2 x([ab]P) (mod p)), where u is controlled by the attacker. This is very similar to the easy case of the hidden number problem in F_p^* from Lemma 21.7.5.
Lemma 21.7.17. Suppose p ≡ 2 (mod 3). Then LSB_1(y([ab]P)) is a hardcore bit for CDH on elliptic curves over F_p.

Proof: We suppose A_{y,1} is a perfect oracle for LSB_1(y([ab]P)) as above. Calling

    A_{y,1}(u^4 A, u^6 B, φ(P), φ([a]P), φ([b]P))

gives LSB_1(u^3 y([ab]P)). Since gcd(3, p − 1) = 1 it follows that cubing is a permutation of F_p^* and one can perform the method of Lemma 21.7.5 to compute y([ab]P). Given y([ab]P) there are at most 3 choices for x([ab]P) and so CDH is solved with noticeable probability.
In the general case (i.e., when p ≢ 2 (mod 3)) Boneh and Shparlinski have to work harder. They use the method of Alexi, Chor, Goldreich and Schnorr [9], or the simplified version by Fischlin and Schnorr [202], to extend the idea to non-perfect oracles.^{12} Once this is done, the following trick can be applied to determine LSB_1(t x([ab]P)): when t is a square one calls the oracle for LSB_1(u^2 x([ab]P)) with u = √t (mod p), and when t is not a square one flips a coin. The resulting non-perfect oracle for LSB_1 therefore solves the problem. We refer to [84] for the details.
We make some remarks.
1. A nice feature of the elliptic curve results is that they are independent of the order
of the point P and so work for subgroups of any size.
2. The literature does not seem to contain bit security results for CDH on elliptic
curves over non-prime fields. This would be a good student project.
3. Jetchev and Venkatesan [312] use isogenies to extend the applicability of the Boneh-Shparlinski method. Their motivation is that if one has an LSB_1(x([ab]P)) oracle
that works with only small (but noticeable) probability then it is possible to have
a CDH instance on an elliptic curve E for which the oracle does not work for any
twist of E. By moving around the isogeny class they claim that the probability of
success increases. However, it is still possible to have a CDH instance on an elliptic
curve E for which the oracle does not work for any elliptic curve in the isogeny class
of E.
21.8 Further Topics
There are a number of other results related to the Diffie-Hellman problem that we do not have space to cover. For example, Coppersmith and Shparlinski considered the existence of polynomial relations between g^x, g^y and g^{xy}. Canetti, Friedlander and Shparlinski considered the distribution of Diffie-Hellman triples (g^x, g^y, g^{xy}) in G^3. We refer to [554] for a survey of these topics and references.
12 This is why Boneh and Shparlinski consider least significant bits rather than most significant bits for their result. The technique of Alexi et al is to randomise the query LSB_1(α t) as LSB_1(α s) ⊕ LSB_1(α(t + s)) for suitable values s. A good student project would be to obtain an analogous result for other bits (e.g., most significant bits).
Chapter 22

Digital Signatures Based on Discrete Logarithms

22.1 Schnorr Signatures

22.1.1 The Schnorr Identification Scheme

Let g have prime order r. The Prover's private key is an integer 0 ≤ a < r and the corresponding public key is h = g^a. To identify themself the Prover chooses a random 0 ≤ k < r and sends the commitment s_0 = g^k to the Verifier; the Verifier replies with a random challenge 0 ≤ s_1 < r; the Prover answers with s_2 = k + a s_1 (mod r). The Verifier then checks whether

    g^{s_2} = s_0 h^{s_1}                                    (22.1)
and accepts the proof if this is the case. In other words, the Prover has successfully
identified themself to the Verifier if the Verifier accepts the proof.
Exercise 22.1.1. Show that the Verifier in an execution of the Schnorr identification
scheme does accept the proof when the Prover follows the steps correctly.
Exercise 22.1.2. Let p = 311 and r = 31 | (p − 1). Let g = 169, which has order r. Let a = 11 and h = g^a ≡ 47 (mod p). Which of the following is a transcript (s_0, s_1, s_2) of a correctly performed execution of the Schnorr identification scheme?
(15, 10, 12), (15, 10, 27), (16, 10, 12), (15, 16, 0).
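A transcript can be checked mechanically against the verification equation (assumed here in its standard Schnorr form, g^{s_2} = s_0 h^{s_1} (mod p)); a quick sketch with the parameters of the exercise:

```python
p, g, h = 311, 169, 47          # parameters from the exercise

def verify(s0, s1, s2):
    # Verification equation: g^{s2} == s0 * h^{s1} (mod p)
    return pow(g, s2, p) == s0 * pow(h, s1, p) % p

transcripts = [(15, 10, 12), (15, 10, 27), (16, 10, 12), (15, 16, 0)]
print([verify(*t) for t in transcripts])   # -> [False, True, False, True]
```

Note this only checks the verification equation; the exercise also asks which transcripts are consistent with a correctly performed execution for the given private key a = 11.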
Security of the Private Key
Unlike public key encryption (at least, under passive attacks), with identification schemes
and digital signature schemes a user is always outputting the results of computations
involving their private key. Hence, it is necessary to ensure that we do not leak information
about the private key. An attack of this type on GGH signatures was given by Nguyen
and Regev; see Section 19.11. Hence, we now explain why executions of the Schnorr
identification protocol do not leak the private key.
A protocol (involving a secret) that does not leak any information about the secret
is known as zero knowledge. It is beyond the scope of this book to discuss this topic
in detail, but we make a few remarks in the setting of the Schnorr identification scheme.
First, consider a Verifier who really does choose s1 independently and uniformly at random
(rather than as a function of s0 and h). It is easy to see that anyone can produce triples
(s0 , s1 , s2 ) that satisfy equation (22.1), without knowing the private key (just choose s1
and s2 first and solve for s0 ). Hence, a protocol transcript (s0 , s1 , s2 ) itself carries no
information about a (this shows that the protocol is honest verifier zero knowledge).
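This observation can be sketched in code (assuming the verification equation g^{s_2} = s_0 h^{s_1} (mod p) and reusing the toy parameters of Exercise 22.1.2): sample s_1 and s_2 first, then solve for s_0.

```python
import random

p, r, g, h = 311, 31, 169, 47   # toy parameters from Exercise 22.1.2

def simulate_transcript():
    # An honest-verifier zero-knowledge simulator: no private key needed
    s1, s2 = random.randrange(r), random.randrange(r)
    s0 = pow(g, s2, p) * pow(h, -s1, p) % p   # solve the verification equation for s0
    return (s0, s1, s2)

s0, s1, s2 = simulate_transcript()
assert pow(g, s2, p) == s0 * pow(h, s1, p) % p   # the Verifier would accept
```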
However, this argument does not imply that the protocol leaks no information about the
secret to an adversary who chooses s1 carefully. We now argue that the protocol is secure
in this setting. The idea is to consider any pair (s_1, s_2). Then, for every 1 ≤ a < r, there is some integer 0 ≤ k < r such that s_2 ≡ k + a s_1 (mod r). Now, if k were known
to the verifier then they could solve for a. But, since the discrete logarithm problem is
hard, it is computationally infeasible to determine any significant information about the
distribution of k from s0 . Hence s2 leaks essentially no information about a. Furthermore,
there are no choices for s1 that more readily allow the Verifier to determine a.
For security, k must be chosen uniformly at random; see Exercise 22.1.3 and Section 22.3 for attacks if some information on k is known. We stress that such attacks are
much stronger than the analogous attacks for Elgamal encryption (see Exercise 20.4.1);
there the adversary only learns something about a single message, whereas here they learn
the private key!
Exercise 22.1.3. Suppose the random values k used by a prover are generated using the linear congruential generator k_{i+1} = A k_i + B (mod r) for some 1 ≤ A, B < r. Suppose an adversary knows A and B and sees two protocol transcripts (s_0, s_1, s_2) and (s_0′, s_1′, s_2′) generated using consecutive outputs k_i and k_{i+1} of the generator. Show how the adversary can determine the private key a.
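Eliminating k_i and k_{i+1} from s_2 = k_i + a s_1, s_2′ = k_{i+1} + a s_1′ and k_{i+1} = A k_i + B (mod r) gives a(A s_1 − s_1′) ≡ A s_2 + B − s_2′ (mod r). A sketch (all parameter values below are illustrative):

```python
def lcg_attack(r, A, B, s1, s2, s1p, s2p):
    """Recover the private key a from two transcripts whose nonces satisfy
    k' = A*k + B (mod r).  Assumes A*s1 - s1p is invertible mod r, which
    holds with overwhelming probability for prime r."""
    return (A * s2 + B - s2p) * pow(A * s1 - s1p, -1, r) % r

# Toy demonstration with r = 31, private key a = 11, nonces k = 10, k' = 3*10+5 mod 31
r, a, A, B, k = 31, 11, 3, 5, 10
kp = (A * k + B) % r
s1, s1p = 7, 13                                   # the two challenges
s2, s2p = (k + a * s1) % r, (kp + a * s1p) % r    # the two responses
assert lcg_attack(r, A, B, s1, s2, s1p, s2p) == a
```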
A generalisation of Exercise 22.1.3, where the modulus for the linear congruential
generator is not r, is given by Bellare, Goldwasser and Micciancio [34].
Security Against Impersonation
Now we explain why the Verifier is convinced that the prover must know the private
key a. The main ideas will also be used in the security proof of Schnorr signatures, so
we go through the argument in some detail. First, we define an adversary against an
identification protocol.
Definition 22.1.4. An adversary against an identification protocol (with an honest
verifier) is a polynomial-time randomised algorithm A that takes as input a public key,
plays the role of the Prover in the protocol with an honest Verifier, and tries to make
the Verifier accept the proof. The adversary repeatedly and adaptively sends a value s0 ,
receives a challenge s1 and answers with s2 (indeed, the sessions of the protocol can be
interleaved). The adversary is successful if the Verifier accepts the proof with noticeable
probability (i.e., the probability, over all outputs s0 by A and all choices for s1 , that the
adversary can successfully respond with s2 is at least one over a polynomial function of
the security parameter). The protocol is secure if there is no successful adversary.
An adversary is just an algorithm A so it is reasonable to assume that A can be run
in very controlled conditions. In particular, we will assume throughout this section that
A can be repeatedly run so that it always outputs the same first commitment s0 (think
of A as a computer programme that calls a function Random to obtain random bits and
then simply arrange that the function always returns the same values to A). This will
allow us to respond to the same commitment with various different challenges s1 . Such
an attack is sometimes known as a rewinding attack (Pointcheval and Stern [480] call
it the "oracle replay attack"): If A outputs s0, receives a challenge s1, and answers with s2, then re-running A on a new challenge s1′ is the same as rewinding the clock back to when A had just output s0 and then giving it the different challenge s1′.
Theorem 22.1.5. The Schnorr identification scheme is secure against impersonation (in
the sense of Definition 22.1.4) if the discrete logarithm problem is hard.
We first prove the result for perfect adversaries (namely, those that impersonate the
user successfully every time the protocol is run). Later we discuss the result for more
general adversaries.
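The heart of the argument is that two accepting transcripts with the same commitment s0 reveal the private key: from s2 = k + a·s1 and s2′ = k + a·s1′ one gets a = (s2 − s2′)(s1 − s1′)^{−1} (mod r). A numeric sketch of this extraction step, using toy parameters (p = 311, r = 31, g = 169) chosen only for illustration:

```python
# Extraction step of the rewinding argument: an honest prover answering two
# different challenges for the same commitment s0 leaks the private key a.
p, r, g = 311, 31, 169   # toy group: g has prime order r in F_p^*
a = 11                   # private key
h = pow(g, a, p)         # public key

k = 20
s0 = pow(g, k, p)        # the shared commitment
c1, c2 = 5, 9            # two different challenges
z1 = (k + a * c1) % r    # response to the first challenge
z2 = (k + a * c2) % r    # response after "rewinding"

extracted = (z1 - z2) * pow(c1 - c2, -1, r) % r
assert extracted == a
```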
Proof: (In the case of a perfect adversary) We build an expected polynomial-time algorithm (called the simulator) that solves a DLP instance (g, h), where g has prime order r and h = g^a with 0 ≤ a < r chosen uniformly at random.
The simulator will play the role of the Verifier and will try to solve the DLP by
interacting with A. First, the simulator starts A by giving it h as the public key and
giving some choice for the function Random. The adversary outputs a value s0 , receives a
22.1.2 Schnorr Signatures
We now present the Schnorr signature scheme [518, 519], which has very attractive
security and efficiency. The main idea is to make the identification protocol of the previous
section non-interactive by replacing the challenge s1 by a random integer that depends
on the message being signed. This idea is known as the Fiat-Shamir transform. By
Exercise 22.1.6 it is important that s1 cannot be predicted and so it is also necessary to
make it depend on s0 .
More precisely, one sets s1 = H(m ‖ s0) where H is a cryptographic hash function from {0,1}^* to {0,1}^l for some parameter l, and where m and s0 are interpreted as binary strings (and where ‖ denotes concatenation of binary strings as usual).
One would therefore obtain the following signature scheme, which we call naive Schnorr signatures: To sign a message m choose a random 0 ≤ k < r, compute s0 = g^k, s1 = H(m ‖ s0) and s2 = k + a·s1 (mod r), and send the signature (s0, s2) together with m. A verifier, given m, (s0, s2) and the public key h, would compute s1 = H(m ‖ s0) and accept the signature if
g^{s2} = s0 h^{s1}.    (22.2)
Schnorr makes the further observation that instead of sending (s0 , s2 ) one could send
(s1, s2). This has major implications for the size of signatures. For example, g may be an element of order r in F_p^* (for example, with r ≈ 2^{256} and p ≈ 2^{3072}). In this case, s0 = g^k requires 3072 bits, s2 requires 256 bits, and s1 may require as little as 128 bits. In other words, signatures would have 3072 + 256 = 3328 bits in the naive scheme, whereas Schnorr signatures only require 128 + 256 = 384 bits.
We present the precise Schnorr signature scheme in Figure 22.1.
KeyGen: This is the same as classic textbook Elgamal encryption. It outputs an algebraic group, an element g of prime order r, a public key h = g^a and a private key 1 ≤ a < r, where a is chosen uniformly at random.
Sign(g, a, m): Choose uniformly at random 0 ≤ k < r, compute s0 = g^k, s1 = H(m ‖ s0) and s2 = k + a·s1 (mod r), where the binary string s1 is interpreted as an integer in the usual way. The signature is (s1, s2).
Verify(g, h, m, (s1, s2)): Ensure that h is a valid public key for the user in question, then test whether
s1 = H(m ‖ g^{s2} h^{−s1}).
Figure 22.1: Schnorr Signature Scheme.
Example 22.1.9. Let p = 311 and r = 31 | (p − 1). Let g = 169, which has order r. Let a = 11 and h = g^a ≡ 47 (mod p).
To sign a message m (a binary string) let k = 20 and s0 = g^k ≡ 225 (mod p). The binary expansion of s0 is (11100001)_2. We must now compute s1 = H(m ‖ 11100001). Since we don't want to get into the details of H, let's just suppose that the output length of H is 4 and that s1 is the binary string 1001. Then s1 corresponds to the integer 9. Finally, we compute s2 = k + a·s1 ≡ 20 + 11·9 ≡ 26 (mod r). The signature is (s1, s2) = (9 = (1001)_2, 26). To verify the signature one computes
g^{s2} h^{−s1} = 169^{26} · 47^{−9} ≡ 225 (mod p)
and checks that s1 = H(m ‖ 11100001).
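A runnable sketch of the scheme of Figure 22.1 over the toy group of Example 22.1.9. SHA-256 truncated to l bits stands in for H, and the encoding of s0 into the hash input is an arbitrary illustrative choice, not part of the scheme:

```python
# Schnorr signatures over a toy group; assumptions: SHA-256 truncated to
# l bits as H, and str(s0) as the byte encoding of the group element s0.
import hashlib

p, r, g = 311, 31, 169   # g has prime order r in F_p^*
l = 16                   # hash output length in bits (toy-sized)

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") >> (256 - l)

def keygen(a: int) -> int:
    return pow(g, a, p)              # public key h = g^a

def sign(a: int, m: bytes, k: int):
    s0 = pow(g, k, p)                # commitment g^k
    s1 = H(m + str(s0).encode())     # s1 = H(m || s0)
    s2 = (k + a * s1) % r
    return s1, s2

def verify(h: int, m: bytes, s1: int, s2: int) -> bool:
    R = pow(g, s2, p) * pow(h, -s1, p) % p   # g^s2 h^(-s1) = g^k
    return s1 == H(m + str(R).encode())

a = 11
h = keygen(a)
sig = sign(a, b"hello", k=20)
assert verify(h, b"hello", *sig)
```

Note that the verifier reconstructs the commitment as R = g^{s2} h^{−s1}, which is why only (s1, s2) needs to be transmitted.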
22.1.3 Security of Schnorr Signatures
The security of Schnorr signatures essentially follows from the same ideas as used in
Theorem 22.1.5. In particular, the security depends on the discrete logarithm problem
(rather than CDH or DDH as is the case for Elgamal encryption). However, since the
challenge is now a function of the message m and s0 , the exact argument of Theorem 22.1.5
cannot be used directly.
One approach is to replace the hash function by a random oracle H (see Section 3.7).
The simulator can then control the values of H and the proof of Theorem 22.1.5 can be
adapted to work in this setting. A careful analysis of Schnorr signatures in the random
oracle model, using this approach and the forking lemma, was given by Pointcheval and
Stern [480]. We refer to Theorem 14 of their paper for a precise result in the case where the output of H is (Z/rZ)^*. A proof is also given in Section 10.4.2 of Vaudenay [612].
An analysis of the case where the hash function H maps to {0,1}^l where l < log2(r) is given by Neven, Smart and Warinschi [450].
There is no known proof of the security of Schnorr signatures in the standard model
(even under very strong assumptions about the hash function). Paillier and Vergnaud [473]
give evidence that one cannot give a reduction, in the standard model, from signature
forgery for Schnorr signatures (with H mapping to Z/rZ) to DLP. More precisely, they
show that if there is a reduction of a certain type (which they call an algebraic reduction)
in the standard model from signature forgery for Schnorr signatures to DLP, then there
is an algorithm for the one-more DLP. We refer to [473] for the details.
We now discuss some specific ways to attack the scheme:
1. Given a signature (s1, s2) on message m, if one can find a message m′ such that H(m ‖ g^{s2} h^{−s1}) = H(m′ ‖ g^{s2} h^{−s1}), then one has a signature also for the message m′. This fact can be used to obtain an existential forgery under a chosen-message attack. While one expects to be able to find hash collisions after roughly 2^{l/2} computations of H (see Section 3.2), what is needed here is not a general hash collision. Instead, we need a collision of the form H(m ‖ R) = H(m′ ‖ R) where R = g^{s2} h^{−s1} is not known until a signature (s1, s2) on m has been obtained. Hence, the adversary must first output a message m, then get the signature (s1, s2) on m, then find m′ such that H(m ‖ R) = H(m′ ‖ R). This is called the random-prefix second-preimage problem in Definition 4.1 of [450]. When R is sufficiently large it seems that solving this problem is expected to require around 2^l computations of H.
2. There is a passive existential forgery attack on Schnorr signatures if one can compute pre-images of H of a certain form. Precisely, choose any (s1, s2) (for example, if the output of H is highly non-uniform then choose s1 to be a very likely output of H), compute R = g^{s2} h^{−s1}, then find a bitstring m such that H(m ‖ R) = s1. This attack is prevented if the hash function is hard to invert.
Hence, given a security parameter κ (so that breaking the scheme is required to take more than 2^κ bit operations) one can implement the Schnorr signature scheme with r ≈ 2^{2κ} and l = κ. For example, taking κ = 128, 2^{255} < r < 2^{256} and l = 128 gives signatures of 384 bits.
Exercise 22.1.11. Fix g ∈ G of order r and m ∈ {0,1}^*. Can a pair (s1, s2) be a
Schnorr signature on the same message m for two different public keys? Are there any
security implications of this fact?
22.1.4 Efficiency Considerations for Schnorr Signatures
The Sign algorithm performs one exponentiation, one hash function evaluation, and one computation modulo r. The Verify algorithm performs a multi-exponentiation g^{s2} h^{−s1}, where 0 ≤ s2 < r and 1 ≤ s1 < 2^l, and one hash function evaluation. Hence, signing is
faster than verifying.
There are a number of different avenues to speed up signature verification, depending
on whether g is fixed for all users, whether one is always verifying signatures with respect
to the same public key h or whether h varies, etc. We give a typical optimisation in
Example 22.1.13. More dramatic efficiency improvements are provided by online/offline
signatures (see Section 22.4), server-aided signature verification etc.
Exercise 22.1.12. Show how to modify the Schnorr signature scheme (with no loss of
security) so that the verification equation becomes
s1 = H(m ‖ g^{s2} h^{s1}).
Example 22.1.13. Suppose a server must verify many Schnorr signatures (using the variant of Exercise 22.1.12), always for the same value of g but for varying values of h. Suppose that 2^{2l−1} < r < 2^{2l} (where l is typically also the output length of the hash function). One strategy to speed up signature verification is for the server to precompute and store the group element g1 = g^{2^l}.
Given a signature (s1, s2) with 0 ≤ s1 < 2^l and 0 ≤ s2 < r one can write s2 = s2,0 + 2^l s2,1 with 0 ≤ s2,0, s2,1 < 2^l. The computation of g^{s2} h^{s1} is performed as the 3-dimensional multi-exponentiation (see Section 11.2)
g^{s2,0} g1^{s2,1} h^{s1}.
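The splitting step is a pure exponent identity, so it can be checked numerically; the values below (including l = 4 and s2 = 200) are arbitrary toy choices:

```python
# Check that g^s2 * h^s1 equals the 3-dimensional multi-exponentiation
# g^s2_0 * g1^s2_1 * h^s1 after writing s2 = s2_0 + 2^l * s2_1.
p, g, h = 311, 169, 47   # toy values; h plays the role of a public key
l = 4                    # toy window size
g1 = pow(g, 2 ** l, p)   # precomputed g^(2^l)

s1, s2 = 9, 200          # arbitrary exponents for the check
s2_0, s2_1 = s2 % 2 ** l, s2 >> l       # s2 = s2_0 + 2^l * s2_1

lhs = pow(g, s2, p) * pow(h, s1, p) % p
rhs = pow(g, s2_0, p) * pow(g1, s2_1, p) * pow(h, s1, p) % p
assert lhs == rhs
```

In practice the gain comes from evaluating the right-hand side as a single multi-exponentiation with short exponents, not as three separate exponentiations.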
22.2 Other Public Key Signature Schemes
The Schnorr signature scheme is probably the best public key signature scheme for practical applications.2 A number of similar schemes have been discovered, the most well-known
of which are Elgamal and DSA signatures. We discuss these schemes very briefly in this
section.
2 However, Schnorr signatures are not very widely used in practice. The reason for their lack of use
may be the fact that they were patented by Schnorr.
22.2.1 Elgamal Signatures
Elgamal [191] proposed the first efficient digital signature based on the discrete logarithm
problem. We present the scheme for historical reasons, and because it gives rise to some
nice exercises in cryptanalysis. For further details see Section 11.5.2 of [415] or Section
7.3 of [588].
Assume that g is an element of prime³ order r in an algebraic group G. In this section we always think of G as being the full algebraic group (such as F_q^* or E(F_q)) and assume that testing membership g ∈ G is easy. The public key of user A is h = g^a and the private key is a, where 1 ≤ a < r is chosen uniformly at random.
The Elgamal scheme requires a function F : G → Z/rZ. The only property required of this function is that the output distribution of F restricted to ⟨g⟩ should be close to uniform (in particular, F is not required to be hard to invert). In the case where G = F_p^* it is usual to define F : {0, 1, . . . , p − 1} → {0, 1, . . . , r − 1} by F(n) = n (mod r). If G is the set of points on an elliptic curve over a finite field then one could define F((x, y)) by interpreting x (or x and y) as binary strings, letting n be the integer whose binary expansion is x (or x ‖ y), and then computing n (mod r).
To sign a message m with hash H(m) ∈ Z/rZ one chooses a random integer 1 ≤ k < r, computes s1 = g^k, computes s2 = k^{−1}(H(m) − a·F(s1)) (mod r), and returns (s1, s2). To verify the signature (s1, s2) on message m one checks whether s1 ∈ ⟨g⟩, 0 ≤ s2 < r, and
h^{F(s1)} s1^{s2} = g^{H(m)}
in G. Elgamal signatures are the same size as naive Schnorr signatures.
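A hedged sketch of Elgamal signing and verification as just described, over the toy subgroup p = 311, r = 31, g = 169 with F(n) = n mod r; SHA-256 reduced mod r is an illustrative stand-in for H:

```python
# Elgamal signatures with toy parameters. Assumptions: G = F_p^*,
# F(n) = n mod r, and SHA-256 mod r as the hash H.
import hashlib

p, r, g = 311, 31, 169   # toy subgroup parameters, for illustration only

def H(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % r

def F(n: int) -> int:
    return n % r          # F : G -> Z/rZ for G = F_p^*

def sign(a: int, m: bytes, k: int):
    s1 = pow(g, k, p)
    s2 = pow(k, -1, r) * (H(m) - a * F(s1)) % r
    return s1, s2

def verify(h: int, m: bytes, s1: int, s2: int) -> bool:
    # check s1 in <g> and 0 <= s2 < r, then h^F(s1) * s1^s2 = g^H(m)
    if pow(s1, r, p) != 1 or not 0 <= s2 < r:
        return False
    return pow(h, F(s1), p) * pow(s1, s2, p) % p == pow(g, H(m), p)

a = 11
h = pow(g, a, p)
sig = sign(a, b"message", k=20)
assert verify(h, b"message", *sig)
```

The subgroup-membership test uses s1^r ≡ 1 (mod p), which is exactly the "implicit check" discussed in the exercises below.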
A striking feature of the scheme is the way that s1 appears both as a group element
and as an exponent (this is why we need the function F ). In retrospect, this is a poor
design choice for both efficiency and security. The following exercises explore these issues
in further detail. Pointcheval and Stern give a variant of Elgamal signatures (the trick is to replace H(m) by H(m ‖ s1)) and prove its security in Sections 3.3.2 and 3.3.3 of [480].
Exercise 22.2.1. Show that the Verify algorithm succeeds if the Sign algorithm is run
correctly.
Exercise 22.2.2. Show that one can verify Elgamal signatures by computing a single 3-dimensional multi-exponentiation. Show that the check s1 ∈ ⟨g⟩ can therefore be omitted if gcd(s2, #G) = 1. Hence, show that the time to verify an Elgamal signature when F and H map to Z/rZ is around twice the time of the method in Example 22.1.13 to verify a Schnorr signature. Explain why choosing F and H to map to l-bit integers where l ≈ log2(r)/2 does not lead to a verification algorithm as fast as the one in Example 22.1.13.
Exercise 22.2.3. (Elgamal [191]) Suppose the hash function H is omitted from Elgamal signatures (i.e., we are signing messages m ∈ Z/rZ). Give a passive existential forgery in this case (i.e., an attack that only requires the public key).
Exercise 22.2.4. Consider the Elgamal signature scheme in F_p^* with the function F(n) = n (mod r). Suppose the function F(n) computes n (mod r) for all n ∈ N (not just 0 ≤ n < p) and that the check s1 ∈ ⟨g⟩ does not include any check on the size of the integer s1 (for example, it could simply be the check that s1^r ≡ 1 (mod p) or the implicit check of Exercise 22.2.2). Give a passive selective forgery attack.
³ The original Elgamal signature scheme specifies that g is a primitive root in F_p^*, but for compatibility with all other cryptographic protocols in this book we have converted it to work with group elements of prime order in any algebraic group.
Exercise 22.2.5. Consider the following variant of Elgamal signatures in a group ⟨g⟩ of order r: The signature on a message m for public key h is a pair (s1, s2) such that 0 ≤ s1, s2 < r and
h^{s1} g^{H(m)} = g^{s2}.
Show how to determine the private key of a user given a valid signature.
Exercise 22.2.6. (Bleichenbacher [66]) Consider the Elgamal signature scheme in F_p^* with the function F(n) = n (mod r). Suppose the checks s1 ∈ ⟨g⟩ and 0 ≤ s2 < r are not performed by the Verify algorithm. Show how an adversary who has maliciously chosen the system parameter g can produce selective forgeries for any public key under a passive attack.
Exercise 22.2.7. (Vaudenay [611]) Let H be a hash function with l-bit output. Show how to efficiently compute an l-bit prime r, and messages m1, m2, such that H(m1) ≡ H(m2) (mod r). Hence, show that if one can arrange for an algebraic group with a subgroup of order r to be used as the system parameters for a signature scheme then one can obtain a signature on m1 for any public key h by obtaining from user A a signature on m2.
A convenient feature of Elgamal signatures is that one can verify a batch of signatures
faster than individually verifying each of them. Some details are given in Exercise 22.2.8.
Early work on this problem was done by Naccache, M'Raïhi, Vaudenay and Raphaeli [446]
(in the context of DSA) and Yen and Laih [633]. Further discussion of the problem is
given by Bellare, Garay and Rabin [33].
Exercise 22.2.8. Let (s1,i, s2,i) be purported signatures on messages mi with respect to public keys hi for 1 ≤ i ≤ t. A verifier can choose random integers 1 ≤ wi < r and verify all signatures together by testing whether s1,i ∈ ⟨g⟩ and 0 ≤ s2,i < r for all i and the single equation
( ∏_{i=1}^{t} s1,i^{wi s2,i} ) ( ∏_{i=1}^{t} hi^{wi F(s1,i)} ) = g^{∑_{i=1}^{t} wi H(mi)}.    (22.3)
Show that if all the signatures (s1,i, s2,i) are valid then the batch is declared valid. Show that if there is at least one invalid signature in the batch then the probability the batch is declared valid is at most 1/(r − 1). Show how to determine, with high probability, the invalid signatures using a binary search.
If one uses the methods of Exercise 22.2.2 then verifying the t signatures separately requires t three-dimensional multi-exponentiations. One can break equation (22.3) into about 2t/3 three-dimensional multi-exponentiations. So, for groups where testing s1,i ∈ ⟨g⟩ is easy (e.g., elliptic curves of prime order), the batch is asymptotically verified in about 2/3 of the time of verifying the signatures individually. Show how to speed up verification of a batch of signatures further if the public keys hi are all equal. How much faster is this than verifying the signatures individually?
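Equation (22.3) can be exercised directly; the following sketch batch-verifies three toy Elgamal signatures (parameters p = 311, r = 31, g = 169 and SHA-256 mod r as a stand-in hash are illustrative assumptions):

```python
# Batch verification of Elgamal signatures per equation (22.3).
# Toy parameters; H and F are illustrative stand-ins.
import hashlib, random

p, r, g = 311, 31, 169

def H(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % r

def F(n: int) -> int:
    return n % r

def sign(a: int, m: bytes, k: int):
    s1 = pow(g, k, p)
    s2 = pow(k, -1, r) * (H(m) - a * F(s1)) % r
    return s1, s2

def batch_verify(sigs) -> bool:
    # sigs is a list of (h_i, m_i, (s1_i, s2_i)) triples
    ws = [random.randrange(1, r) for _ in sigs]
    lhs, exp = 1, 0
    for w, (h, m, (s1, s2)) in zip(ws, sigs):
        if pow(s1, r, p) != 1 or not 0 <= s2 < r:   # s1 in <g>, range check
            return False
        lhs = lhs * pow(s1, w * s2, p) * pow(h, w * F(s1), p) % p
        exp += w * H(m)
    return lhs == pow(g, exp % r, p)

batch = []
for i, a in enumerate([3, 11, 27]):               # three different signers
    m = f"message {i}".encode()
    batch.append((pow(g, a, p), m, sign(a, m, k=2 * i + 5)))
assert batch_verify(batch)
```

A valid batch always passes, whatever the random wi; the exercise's 1/(r − 1) bound concerns batches containing an invalid signature.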
Yen and Laih [633] consider batch verification of naive Schnorr signatures as mentioned in Section 22.1.2. Given t signatures (s0,i, s2,i) on messages mi and for keys hi, Yen and Laih choose 1 ≤ wi < 2^l (for a suitable small value of l; they suggest l = 15) and verify the batch by testing s0,i ∈ ⟨g⟩, 0 ≤ s2,i < r and
g^{∑_{i=1}^{t} wi s2,i} = ( ∏_{i=1}^{t} s0,i^{wi} ) ( ∏_{i=1}^{t} hi^{wi H(mi ‖ s0,i)} ).
Give the verification algorithm when the public keys are all equal. Show that the cost is
roughly l/(3 log2 (r)) times the cost of verifying t Elgamal signatures individually.
22.2.2 DSA
A slight variant of the Elgamal signature scheme was standardised by NIST⁴ as a digital signature standard. This is often called DSA.⁵ In the case where the group G is an elliptic curve the scheme is often called ECDSA.
In brief, the scheme has the usual public key h = g^a, where g is an element of prime order r in an algebraic group G and 1 ≤ a < r is chosen uniformly at random. As with Elgamal signatures, a function F : G → Z/rZ is required. To sign a message with hash value H(m) one chooses a random 1 ≤ k < r and computes s1 = F(g^k). If s1 = 0 then repeat⁶ for a different value of k. Then compute s2 = k^{−1}(H(m) + a·s1) (mod r) and, if s2 = 0, repeat for a different value of k. The signature on message m is (s1, s2). To verify the signature one first checks that 1 ≤ s1, s2 < r, then computes
u1 = H(m) s2^{−1} (mod r),  u2 = s1 s2^{−1} (mod r),    (22.4)
then determines whether or not
s1 = F(g^{u1} h^{u2}).    (22.5)
⁴ NIST stands for National Institute of Standards and Technology and is an agency that develops technology standards for the USA.
⁵ DSA stands for "digital signature algorithm".
⁶ The events s1 = 0 and s2 = 0 occur with negligible probability and so do not affect the performance of the signing algorithm.
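The signing and verification equations above can be sketched as follows. The group p = 311, r = 31, g = 169 is a toy choice (not the standardised parameters), and H is a deliberately trivial stand-in hash (byte sum mod r) used only so the example is self-contained:

```python
# DSA signing and verification with toy parameters. Assumptions: G = F_p^*,
# F(n) = n mod r, and a trivial (non-cryptographic) stand-in hash H.
p, r, g = 311, 31, 169

def H(m: bytes) -> int:
    return sum(m) % r    # stand-in hash into Z/rZ, NOT cryptographic

def F(n: int) -> int:
    return n % r

def sign(a: int, m: bytes, k: int):
    # if s1 = 0 or s2 = 0 the scheme retries with a fresh k
    while True:
        s1 = F(pow(g, k, p))
        s2 = pow(k, -1, r) * (H(m) + a * s1) % r
        if s1 != 0 and s2 != 0:
            return s1, s2
        k = k % (r - 1) + 1          # next nonce, stays in [1, r-1]

def verify(h: int, m: bytes, s1: int, s2: int) -> bool:
    if not (1 <= s1 < r and 1 <= s2 < r):
        return False
    w = pow(s2, -1, r)
    u1 = H(m) * w % r                # u1 = H(m) * s2^(-1) mod r
    u2 = s1 * w % r                  # u2 = s1 * s2^(-1) mod r
    return s1 == F(pow(g, u1, p) * pow(h, u2, p) % p)

a = 11
h = pow(g, a, p)
sig = sign(a, b"hello dsa", k=20)
assert verify(h, b"hello dsa", *sig)
```

The verification works because k = s2^{−1}(H(m) + a·s1) = u1 + a·u2 (mod r), so g^{u1} h^{u2} = g^k.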
Show that one can also verify the signature by checking, for any 1 ≤ v < r, whether
s0^v = g^{u1 v} h^{u2 v}.    (22.6)
Show that one can efficiently compute an integer 1 ≤ v < r such that equation (22.6) can be verified more quickly than equation (22.5).
There is no proof of security for DSA signatures in the standard or random oracle model. A proof of security in the random oracle model of a slightly modified version of DSA (the change is to replace H(m) with H(m ‖ s1)) was given by Pointcheval and Vaudenay [481, 105] (also see Section 10.4.2 of [612]). A proof of security for DSA in the generic group model⁷ was given by Brown; see Chapter II of [65].
Exercise 22.2.13. Consider a digital signature scheme where a signature on message m with respect to public key h is an integer 0 ≤ s < r such that
s = H(m ‖ h^s).
What is the problem with this signature scheme?
22.2.3 Signatures Secure in the Standard Model
None of the signature schemes considered so far has a proof of security in the standard model. Indeed, as mentioned, Paillier and Vergnaud [473] give evidence that Schnorr signatures cannot be proven secure in the standard model. In this section we briefly mention a signature scheme due to Boneh and Boyen [76, 77] that is secure in the standard model. However, the security relies on a very different computational assumption than DLP and the scheme needs groups with an extra feature (namely, a pairing; see Definition 22.2.14). We present a simple version of their scheme that is unforgeable under a weak chosen-message attack if the q-strong Diffie-Hellman assumption holds (these notions are defined below).
We briefly introduce pairing groups (more details are given in Chapter 26). We use
multiplicative notation for pairing groups, despite the fact that G1 and G2 are typically
subgroups of elliptic curves over finite fields and hence are usually written additively.
Definition 22.2.14. (Pairing groups) Let G1, G2, GT be cyclic groups of prime order r. A pairing is a map e : G1 × G2 → GT such that
1. e is non-degenerate and bilinear, i.e., g1 ∈ G1 − {1} and g2 ∈ G2 − {1} implies e(g1, g2) ≠ 1, and e(g1^a, g2^b) = e(g1, g2)^{ab} for a, b ∈ Z;
2. e is efficiently computable.
For the Boneh-Boyen scheme we also need there to be an efficiently computable injective group homomorphism ψ : G2 → G1 (for example, a distortion map; see Section 26.6.1). We will assume that elements in G1 have a compact representation (i.e., requiring not much more than log2(r) bits) whereas elements of G2 do not necessarily have a compact representation. The signature is an element of G1 and hence is very short. Figure 22.2 gives the (weakly secure) Boneh-Boyen signature scheme.
Exercise 22.2.15. Show that if the Verify algorithm for weakly secure Boneh-Boyen signatures accepts (m, s) then s = g1^{(m+a)^{−1} (mod r)}.
7 The generic group model assumes that any algorithm to attack the scheme is a generic algorithm for
the group G. This seems to be a reasonable assumption when using elliptic curves.
Sign(a, m): compute s = g1^{(m+a)^{−1} (mod r)} and return s.
This problem may look rather strange at first sight since the value q can vary. The
problem is mainly of interest when q is polynomial in the security parameter (otherwise,
reading the problem description is not polynomial-time). Problems (or assumptions)
like this are sometimes called parameterised since there is a parameter (in this case q)
that determines the size for a problem instance. Such problems are increasingly used
in cryptography, though many researchers would prefer to have systems whose security
relies on more familiar assumptions.
There is evidence that the computational problem is hard. Theorem 5.1 of Boneh and Boyen [76] shows that a generic algorithm for q-SDH needs to make Ω(√(r/q)) group operations to have a good chance of success. The algorithms of Section 21.5 can be used to solve q-SDH. In particular, if q | (r − 1) (and assuming q < √r) then Theorem 21.5.1 gives an algorithm to compute a with complexity O(√(r/q)) group operations, which meets the lower bound for generic algorithms.
Exercise 22.2.18. Show that one can use ψ and e to verify that the input to an instance of the q-SDH problem is correctly formed. Similarly, show how to use e to verify that a solution to a q-SDH instance is correct.
Theorem 22.2.19. If the q-SDH problem is hard then the weak Boneh-Boyen signature
scheme is secure under a weak chosen message attack, where the adversary requests at
most q − 1 signatures.
Proof: (Sketch) Let (g1, g2, g2^a, g2^{a^2}, . . . , g2^{a^q}) be a q-SDH instance and let A be an adversary against the scheme. Suppose A outputs messages m1, . . . , mt with t < q. Without loss of generality, t = q − 1 (since one can just add dummy messages). The simulator must provide a public key together with signatures g̃1^{(mi+a)^{−1}} for all 1 ≤ i ≤ t, but the problem is that we don't know a. The trick is to note that F(a) = ∏_{i=1}^{t}(mi + a) = ∑_{i=0}^{t} Fi a^i is a polynomial in a with explicitly computable coefficients in Z/rZ. One can therefore compute g̃2 = g2^{F(a)}, g̃1 = ψ(g̃2) and h̃ = g2^{a F(a)} using, for example,
g̃2 = ∏_{i=0}^{t} (g2^{a^i})^{Fi}.
Similarly, one can compute signatures for all the messages mi. Hence, the simulator provides to A the public key (g̃1, g̃2, h̃, z = e(g̃1, g̃2)) and all t signatures.
Eventually, A outputs a forgery (m, s) such that m ≠ mi for 1 ≤ i ≤ t. If t < q − 1 and q is polynomial in the security parameter then m is one of the dummy messages with negligible probability (q − 1 − t)/r. One has
s = g̃1^{(m+a)^{−1} (mod r)} = g1^{F(a)(m+a)^{−1} (mod r)}.
The final trick is to note that the polynomial F(a) can be written as G(a)(a + m) + c for some explicitly computable values G(a) ∈ (Z/rZ)[a] and c ∈ (Z/rZ)^*. Hence, the rational function F(a)/(a + m) can be written as
F(a)/(a + m) = G(a) + c/(a + m).
Since g1^{G(a)} is computable from the instance (apply ψ to the elements g2^{a^i} and combine them using the coefficients of G), the simulator obtains g1^{(a+m)^{−1} (mod r)} = (s · g1^{−G(a)})^{c^{−1} (mod r)}, which solves the q-SDH instance.
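The decomposition F(a) = G(a)(a + m) + c is ordinary synthetic division over Z/rZ, with c = F(−m) = ∏(mi − m) ≠ 0 whenever m differs from every mi. A numeric check with toy values (all chosen for illustration):

```python
# Synthetic division of F(x) = prod_i (x + m_i) by (x + m) over Z/rZ,
# checking F(x) = G(x)*(x + m) + c with a nonzero constant c.
r = 31              # toy modulus
msgs = [3, 7, 12]   # hypothetical queried messages m_i
m = 5               # forgery message, distinct from every m_i

# coefficients of F(x) = prod_i (x + m_i), lowest degree first
F = [1]
for mi in msgs:
    nxt = [0] * (len(F) + 1)
    for j, cf in enumerate(F):
        nxt[j] = (nxt[j] + mi * cf) % r      # contribution mi * x^j
        nxt[j + 1] = (nxt[j + 1] + cf) % r   # contribution x^(j+1)
    F = nxt

# synthetic division by (x + m), i.e. Horner evaluation at x = -m
hi = F[::-1]        # highest degree first
acc, G = 0, []
for cf in hi[:-1]:
    acc = (acc * (-m) + cf) % r
    G.append(acc)   # quotient G(x), highest degree first
c = (acc * (-m) + hi[-1]) % r
assert c != 0       # c = F(-m) = prod_i (m_i - m), nonzero since m != m_i

def ev(coeffs_hi, x):
    v = 0
    for cf in coeffs_hi:
        v = (v * x + cf) % r
    return v

for x in range(r):  # F(x) = G(x)*(x + m) + c as functions on Z/rZ
    assert ev(hi, x) == (ev(G, x) * (x + m) + c) % r
```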
22.3 Lattice Attacks on Signatures
As mentioned earlier, there is a possibility that signatures could leak information about
the private key. Indeed, Nguyen and Regev [454] give such an attack on lattice-based
signatures.
The aim of this section is to present an attack due to Howgrave-Graham and Smart [298].
They determine the private key when given some signatures and given some bits of the
random values k (for example, due to a side-channel attack or a weak pseudorandom
number generator). The analysis of their attack was improved by Nguyen and Shparlinski [456, 457] (also see Chapter 16 of [554]).
with −r/2^{l+1} ≤ z ≤ r/2^{l+1}. Then one can re-write any of the above linear equations in the form
z ≡ ta − u (mod r)    (22.7)
for some t, u ∈ Z that are known. In other words, we know a pair (t, u) with
u = MSB_l(at (mod r)),    (22.8)
22.4 Other Signature Functionalities
There are many topics that are beyond the scope of this book. We briefly mention some
of them now.
One-time signatures. These are fundamental in provable security and are used as
a tool in many theoretical public key schemes. However, since these are usually
realised without using the number theoretic structures presented in this book we do
not give the details. Instead, we refer the reader to Section 11.6 of [415], Section 12.5
of [331] and Section 7.5.1 of [588].
Online/offline signatures. The goal here is to design public key signature schemes
that possibly perform some (slow) precomputations when they are offline but
that generate a signature on a given message m extremely quickly. The typical
application is smart cards or other tokens that may have extremely constrained
computing power.
The first to suggest a solution to this problem seems to have been Schnorr in his
paper [518] on efficient signatures for smart cards. The Schnorr signature scheme already has this functionality: if s0 = g^k is precomputed during the idle time of the device, then generating a signature on message m only requires computing s1 = H(m ‖ s0) and s2 = k + a·s1 (mod r). The computation of s1 and s2 is relatively fast since no group operations are performed.
A simple idea due to Girault [253] (proposed for groups of unknown order, typically (Z/NZ)^*) is to make Schnorr signatures even faster by omitting the modular reduction in the computation of s2. In other words, k, a, s1 are all treated as integers and s2 is computed as the integer k + a·s1. To maintain security it is necessary to take k to be bigger than 2^l r (i.e., bigger than any possible value for the integer a·s1).
This idea was fully analysed (and generalised to groups of known order) by Girault,
Poupard and Stern [254].
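A sketch of Girault's variant under the assumptions above: s2 is computed over the integers with k > 2^l·r, and verification is unchanged. The group p = 311, r = 31, g = 169 is a toy choice and SHA-256 truncated to l bits stands in for H:

```python
# Girault's online/offline variant: no reduction mod r when computing s2.
# Toy parameters; SHA-256 truncated to l bits as a stand-in hash.
import hashlib

p, r, g = 311, 31, 169
l = 8

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") >> (256 - l)

a = 11
h = pow(g, a, p)

# offline phase: pick k > 2^l * r and precompute s0 = g^k
k = 2 ** l * r + 12345
s0 = pow(g, k, p)

# online phase: only integer arithmetic, no group operations
m = b"smart card message"
s1 = H(m + str(s0).encode())
s2 = k + a * s1                  # note: NOT reduced mod r

# verification is as for Schnorr: s1 = H(m || g^s2 h^(-s1))
R = pow(g, s2, p) * pow(h, -s1, p) % p
assert R == s0
assert s1 == H(m + str(R).encode())
```

The identity R = g^k holds even without reducing s2, since exponents act modulo the order of g in any case.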
Threshold signatures. The idea is to have a signature that can only be generated
by a collection of users. There is a large literature on this problem and we do not
attempt a full treatment of the subject here.
A trivial example is when two users hold additive shares a1, a2 of a private key (in other words, h = g^{a1+a2} = g^{a1} g^{a2} is the public key). A Schnorr signature on message m can be computed as follows: User i ∈ {1, 2} chooses a random integer 0 ≤ ki < r, computes g^{ki}, and sends it to the other. Both users can compute s0 = g^{k1} g^{k2}. User i ∈ {1, 2} can then compute s1 = H(m ‖ s0) and s2,i = ki + ai·s1 (mod r). The signature is (s0, s2,1 + s2,2 (mod r)).
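The two-party protocol above can be sketched as follows (toy group p = 311, r = 31, g = 169; SHA-256 truncated to 8 bits as a stand-in hash); the result is an ordinary Schnorr signature under the joint key:

```python
# Two-party threshold Schnorr signing with additive key shares a1 + a2.
# Toy parameters; the hash H is an illustrative stand-in.
import hashlib

p, r, g = 311, 31, 169

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") >> 248

a1, a2 = 7, 19                           # additive shares of the private key
h = pow(g, a1 + a2, p)                   # joint public key

k1, k2 = 5, 23                           # each user's secret nonce
s0 = pow(g, k1, p) * pow(g, k2, p) % p   # both users can compute s0

m = b"jointly signed"
s1 = H(m + str(s0).encode())
s21 = (k1 + a1 * s1) % r                 # user 1's partial signature
s22 = (k2 + a2 * s1) % r                 # user 2's partial signature
s2 = (s21 + s22) % r

# the combined value is an ordinary Schnorr signature: g^s2 = s0 * h^s1
assert pow(g, s2, p) == s0 * pow(h, s1, p) % p
```

Neither user ever learns the other's share ai or nonce ki; only the partial signatures are exchanged.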
Signatures with message recovery. Usually a signature and a message are sent
together. Signatures with message recovery allow some (or all) of the message
to be incorporated in the signature. The whole message is recovered as part of
the signature verification process. We refer to Section 11.5.4 of [415] for Elgamal
signatures with message recovery.
Undeniable signatures. These are public key signatures that can only be verified by
interacting with the signer (or with some other designated verifier). A better name
would perhaps be invisible signatures or unverifiable signatures. We refer to
Section 7.6 of [588].
Identity-Based Signatures. Identity-based cryptography is a concept introduced by
Shamir. The main feature is that a user's public key is defined to be a function
of their identity (for example, their email address) together with some master
public key. Each user obtains their private key from a Key Generation Center that
possesses the master secret. One application of identity-based cryptography is to
simplify public-key infrastructures.
An identity-based signature is a public-key signature scheme for which it is not
necessary to verify a public key certificate on the signer's key before verifying the signature (though note that it may still be necessary to verify a certificate for the master key of the system). There are many proposals in the literature, but we do not discuss them in this section. One example is given in Section 24.6.3.
Chapter 23
Encryption from Discrete Logarithms
23.1 CCA Secure Elgamal Encryption
Recall that security notions for public key encryption were given in Section 1.3.1. As we
have seen, the textbook Elgamal encryption scheme does not have OWE-CCA security,
since one can easily construct a related ciphertext whose decryption yields the original
message. A standard way to prevent such attacks is to add a message authentication code
algebraic group quotient) elements (g^k, h^k) rather than just h^k. This case is presented in Section 10 of Cramer and Shoup [160]. As explained in Section 10.7 of [160], this variant can yield a tighter security reduction.
23.1.1 The KEM/DEM Paradigm
Shoup introduced a formalism for public key encryption that has proved to be useful.
The idea is to separate the public key part of the system (i.e., the value c1 in Figure 23.1) from the symmetric part (i.e., (c2 , c3 ) in Figure 23.1). A key encapsulation
mechanism (or KEM) outputs a public key encryption of a random symmetric key (this
functionality is very similar to key transport; the difference being that a KEM generates a fresh random key as part of the algorithm). A data encapsulation mechanism
(or DEM) is the symmetric part. The name hybrid encryption is used to describe an
encryption scheme obtained by combining a KEM with a DEM.
More formally, a KEM is a triple of algorithms (KeyGen, Encrypt and Decrypt)¹ that depend on a security parameter κ. Instead of a message space, there is a space K_κ of possible keys to be encapsulated. The randomised algorithm Encrypt takes as input a public key and outputs a ciphertext c and a symmetric key K ∈ K_κ (where κ is the security parameter). One says that c encapsulates K. The Decrypt algorithm for a KEM takes as input a ciphertext c and the private key and outputs a symmetric key K (or ⊥ if the decryption fails). The Encrypt algorithm for a DEM takes as input a message and a symmetric key K and outputs a ciphertext. The Decrypt algorithm of a DEM takes a ciphertext and a symmetric key K and outputs either ⊥ or a message.
The simplest way to obtain a KEM from Elgamal is given in Figure 23.2. The DEM corresponding to the hybrid encryption scheme in Section 23.1 takes as input m and K, parses K as K1 ‖ K2, computes c2 = Enc_{K1}(m) and c3 = MAC_{K2}(c2), and outputs (c2, c3).
KeyGen(κ): This is the same as standard Elgamal; see Figure 23.1.
Encrypt(h): Choose random 0 ≤ k < r and set c = g^k and K = kdf(h^k). Return the ciphertext c and the key K.
Decrypt(c, a): Return ⊥ if c ∉ ⟨g⟩. Otherwise return kdf(c^a).
Figure 23.2: Elgamal KEM.
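A minimal sketch of the Elgamal KEM of Figure 23.2, with SHA-256 as an illustrative kdf and the toy group p = 311, r = 31, g = 169 (both assumptions for the demo, not real parameters):

```python
# Elgamal KEM: c = g^k encapsulates K = kdf(h^k); decapsulation recovers
# the same key from c^a. Toy group and an illustrative kdf.
import hashlib

p, r, g = 311, 31, 169   # toy group, for illustration only

def kdf(x: int) -> bytes:
    return hashlib.sha256(str(x).encode()).digest()   # illustrative kdf

def encap(h: int, k: int):
    c = pow(g, k, p)
    return c, kdf(pow(h, k, p))

def decap(c: int, a: int):
    if pow(c, r, p) != 1:     # c not in <g>: the "return ⊥" case
        return None
    return kdf(pow(c, a, p))

a = 11
h = pow(g, a, p)              # public key
c, K = encap(h, k=20)
assert decap(c, a) == K
```

Correctness is just h^k = (g^a)^k = (g^k)^a = c^a; the subgroup check on c is the part that matters for the CCA security discussion below.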
Shoup has defined an analogue of IND-CCA security for a KEM. We refer to Section
7 of Cramer and Shoup [160] for precise definitions for KEMs, DEMs and their security
properties, but give an informal statement now.
Definition 23.1.2. An IND-CCA adversary for a KEM is an algorithm A that plays the following game: The input to A is a public key. The algorithm A can also query a decryption oracle that will provide decryptions of any ciphertext of its choosing. At some point the adversary requests a challenge, which is a KEM ciphertext c* together with a key K* ∈ K_κ. The challenger chooses K* to be either the key corresponding to the ciphertext c* or an independently chosen random element of K_κ (both cases with probability 1/2). The game continues with the adversary able to query the decryption oracle with any ciphertext c ≠ c*. Finally, the adversary outputs a guess for whether K* is the key corresponding to c* or a random key (this is the same as the "real or random" security notion for key exchange in Section 20.5). Denote by Pr(A) the success
¹ Sometimes the names Encap and Decap are used instead of Encrypt and Decrypt.
probability of A in this game and define the advantage Adv(A) = |Pr(A) − 1/2|. The KEM is IND-CCA secure if every polynomial-time adversary has negligible advantage.
Theorem 5 of Section 7.3 of [160] shows that, if a KEM satisfies IND-CCA security
and if a DEM satisfies an analogous security property, then the corresponding hybrid
encryption scheme has IND-CCA security. Due to lack of space we do not present the
details.
23.1.2 Security of the Elgamal KEM
We now sketch a proof that the Elgamal KEM of Figure 23.2 has IND-CCA security. The
proof is in the random oracle model. The result requires a strong assumption (namely, the
Strong-Diffie-Hellman or gap-Diffie-Hellman assumption). Do not be misled by the use
of the word strong! This computational problem is not harder than the Diffie-Hellman
problem. Instead, the assumption that this problem is hard is a stronger (i.e., less likely
to be true) assumption than the assumption that the Diffie-Hellman problem is hard.
Definition 23.1.3. Let G be a group of prime order r. The strong Diffie-Hellman
problem (Strong-DH) is: Given g, g^a, g^b ∈ G (where 1 ≤ a, b < r), together with a
decision static Diffie-Hellman oracle (DStatic-DH oracle) A_(g,g^a)(h1, h2) (i.e.,
A_(g,g^a)(h1, h2) = 1 if and only if h2 = h1^a), to compute g^(ab).
An instance generator for Strong-DH takes as input a security parameter κ, outputs
a group G and an element g of prime order r (with r > 2^(2κ)) and elements g^a, g^b ∈ G
where 1 ≤ a, b < r are chosen uniformly at random. As usual, we say that Strong-DH
is hard for the instance generator if all polynomial-time algorithms to solve the problem
have negligible success probability. The strong Diffie-Hellman assumption is that
there is an instance generator for which the Strong-DH problem is hard.
It may seem artificial to include access to a decision oracle as part of the assumption.
Indeed, it is a significant drawback of the encryption scheme that such an assumption is
needed for the security. Nevertheless, the problem is well-defined and seems to be hard
in groups for which the DLP is hard. A related problem is the gap Diffie-Hellman
problem: again the goal is to compute g^(ab) given (g, g^a, g^b), but this time one is given
a full DDH oracle. In some situations (for example, when using supersingular elliptic
or hyperelliptic curves) one can use pairings to provide a DDH oracle and the artificial
nature of the assumption disappears. The proof of Theorem 23.1.4 does not require a full
DDH oracle and so it is traditional to only make the Strong-DH assumption.
Theorem 23.1.4. The Elgamal KEM of Figure 23.2, with the key derivation function
replaced by a random oracle, is IND-CCA secure if the strong Diffie-Hellman problem is
hard.
Proof: (Sketch) Let (g, g^a, g^b) be the Strong-DH instance and let A_(g,g^a) be the DStatic-DH oracle. Let B be an IND-CCA adversary against the KEM. We want to use B to
solve our Strong-DH instance. To do this we will simulate the game that B is designed to
play. The simulation starts B by giving it the public key (g, g^a). Note that the simulator
does not know the corresponding private key.
Since the key derivation function is now a random oracle, it is necessary for B to
query the simulator whenever it wants to compute kdf; this fact is crucial for the proof.
Indeed, the whole idea of the proof is that when B requests the challenge ciphertext we
reply with c* = g^b and with a randomly chosen K* ∈ K. Since kdf is a random oracle,
the adversary can have no information about whether or not c* encapsulates K* unless
the query kdf((c*)^a) is made. Finally, note that (c*)^a = g^(ab) is precisely the value the
simulator wants to find.
More precisely, let E be the event (on the probability space of Strong-DH instances
and random choices made by B) that B queries kdf on (c*)^a = g^(ab). The advantage of B is
Adv(B) = |Pr(B) − 1/2|, where Pr(B) is the probability that B wins the IND-CCA security
game. Note that Pr(B) = Pr(B | E) Pr(E) + Pr(B | ¬E) Pr(¬E). When kdf is a random
oracle we have Pr(B | ¬E) = 1/2. Writing Pr(B | E) = 1/2 + u for some −1/2 ≤ u ≤ 1/2
we have Pr(B) = 1/2 + u Pr(E) and so Adv(B) = |u| Pr(E). Since Adv(B) is assumed
to be non-negligible it follows that Pr(E) is non-negligible. In other words, a successful
adversary makes an oracle query on the value g^(ab) with non-negligible probability.
To complete the proof it is necessary to explain how to simulate kdf and Decrypt
queries, and to analyse the probabilities. The simulator maintains a list of all queries
to kdf. The list is initially empty. Every time that kdf(u) is queried the simulator first
checks whether u ∈ G and returns ⊥ if not. Then the simulator checks whether an entry
(u, K) appears in the list of queries and, if so, returns K. If no entry appears in the list
then use the oracle A_(g,g^a) to determine whether u = g^(ab) (i.e., if A_(g,g^a)(g^b, u) = 1). If this
is the case then g ab has been computed and the simulation outputs that value and halts.
In all other cases, a value K is chosen uniformly and independently at random from K ,
(u, K) is added to the list of kdf queries, and K is returned to B.
Similarly, when a decryption query on ciphertext c is made then one checks, for each
pair (u, K) in the list of kdf values, whether A_(g,g^a)(c, u) = 1. If this is the case then return
K. If there is no such pair then return a random K ∈ K.
One can check that the simulation is sound (in the sense that Decrypt does perform
the reverse of Encrypt) and that the outputs of kdf are indistinguishable from random.
Determining the advantage of the simulator in solving the strong-DH problem is then
straightforward. We refer to Section 10.4 of Cramer and Shoup [160] for a careful proof
using the game hopping methodology (actually, that proof applies to the variant in
Exercise 23.1.5, but it is easily adapted to the general case).
Exercise 23.1.5. A variant of the scheme has the key derivation function applied to
the pair (g^k, h^k) in Encrypt instead of just h^k (respectively, (c1, c1^a) in Decrypt). Adapt
the security proof to this case. What impact does this have on the running time of the
simulator?
The IND-CPA security of the Elgamal KEM can be proved in the standard model
(the proof is analogous to the proof of Theorem 20.4.10) under the assumption of Definition 23.1.6. The IND-CCA security can also be proved in the standard model under an
interactive assumption called the oracle Diffie-Hellman assumption. We refer to Abdalla,
Bellare and Rogaway [1] for the details of both these results.
Definition 23.1.6. Let G be a group and kdf : G → K a key derivation function.
The hash Diffie-Hellman problem (Hash-DH) is to distinguish the distributions
(g, g^a, g^b, kdf(g^(ab))) and (g, g^a, g^b, K) where K is chosen uniformly from K. The hash
Diffie-Hellman assumption is that there exist instance generators such that all polynomial-time algorithms for Hash-DH have negligible advantage.
Exercise 23.1.7. Let G be a group of prime order r and let kdf : G → {0, 1}^l be
a key derivation function such that log2(r)/2 < l < log2(r) and such that the output
distribution is statistically close to uniform. Show that DDH ≤_R Hash-DH ≤_R CDH.
An elegant variant of Elgamal, with IND-CCA security in the random oracle model
depending only on CDH rather than strong Diffie-Hellman, is given by Cash, Kiltz and
Shoup [120].
23.2 Cramer-Shoup Encryption
In their landmark paper [158], Cramer and Shoup gave an encryption scheme with a proof
of CCA security in the standard model. Due to lack of space we will only be able to give
a sketch of the security analysis of the scheme.
To motivate how they achieve their result, consider the proof of security for the Elgamal KEM (Theorem 23.1.4). The simulator uses the adversary to solve an instance of
the CDH problem. To do this one puts part of the CDH instance in the public key (and
hence, one does not know the private key) and part in the challenge ciphertext. To prove
CCA security we must be able to answer decryption queries without knowing the private
key. In the proof of Theorem 23.1.4 this requires a DDH oracle (to determine correct
ciphertexts from incorrect ones) and also the use of the random oracle model (to be able
to see some internal operations of the adversary).
In the random oracle model one generally expects to be able to prove security under an
assumption of similar flavour to CDH (see Theorem 20.4.11 and Theorem 23.1.4). On the
other hand, in the standard model one only expects2 to be able to prove security under
a decisional assumption like DDH (see Theorem 20.4.10). The insight of Cramer and
Shoup is to design a scheme whose security depends on DDH and is such that the entire
DDH instance can be incorporated into the challenge ciphertext. The crucial consequence
is that the simulator can now generate public and private keys for the scheme, run the
adversary, and be able to handle decryption queries.
The proof of security hinges (among other things) on the following result.
Lemma 23.2.1. Let G be a group of prime order r. Let g1, g2, u1, u2, h ∈ G with
(g1, g2) ≠ (1, 1). Consider the set
X_(g1,g2,h) = {(z1, z2) ∈ (Z/rZ)^2 : h = g1^z1 g2^z2}.
Then #X_(g1,g2,h) = r. Let 0 ≤ k < r be such that u1 = g1^k. If u2 = g2^k then u1^z1 u2^z2 = h^k
for all (z1, z2) ∈ X_(g1,g2,h). If u2 ≠ g2^k then
{u1^z1 u2^z2 : (z1, z2) ∈ X_(g1,g2,h)} = G.
Exercise 23.2.2. Prove Lemma 23.2.1.
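Lemma 23.2.1 can be checked numerically in a toy group (the parameters r = 11, p = 23 and the generators below are ad hoc choices for illustration, not security parameters):

```python
# Order-11 subgroup of (Z/23Z)^*: both 2 and 3 have order 11 modulo 23.
r, p = 11, 23
g1, g2 = 2, 3
h = 13                       # an element of the subgroup

# Enumerate X_{g1,g2,h} = {(z1, z2) : g1^z1 * g2^z2 = h}.
X = [(z1, z2) for z1 in range(r) for z2 in range(r)
     if (pow(g1, z1, p) * pow(g2, z2, p)) % p == h]
assert len(X) == r           # first claim of the lemma: #X = r

k = 5
u1, u2 = pow(g1, k, p), pow(g2, k, p)
vals = {(pow(u1, z1, p) * pow(u2, z2, p)) % p for (z1, z2) in X}
assert vals == {pow(h, k, p)}   # u2 = g2^k: every pair in X gives h^k

u2_bad = pow(g2, k + 1, p)      # now u2 != g2^k
vals = {(pow(u1, z1, p) * pow(u2_bad, z2, p)) % p for (z1, z2) in X}
assert len(vals) == r           # every element of the order-r group appears
```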
Figure 23.3 presents the basic Cramer-Shoup encryption scheme. The scheme
requires a group G of prime order r and the message m is assumed to be an element of G.
Of course, it is not necessary to encode data into group elements; in practice one would
use the Cramer-Shoup scheme as a KEM; we briefly describe a Cramer-Shoup KEM at
the end of this section. The scheme requires a target-collision-resistant hash function
H : G^3 → Z/rZ (see Definition 3.1.2) chosen at random from a hash family.
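The scheme of Figure 23.3 can be sketched in code along these lines (a toy Python sketch in a small prime-order subgroup; the group parameters and the SHA-256 stand-in for the target-collision-resistant hash H are illustrative assumptions only):

```python
import hashlib
import secrets

p, r = 23, 11                     # toy group: order-11 subgroup of (Z/23Z)^*
g1, g2 = 2, 3                     # two generators of the subgroup

def H(u1, u2, e):
    """Ad hoc stand-in for the target-collision-resistant hash H: G^3 -> Z/rZ."""
    digest = hashlib.sha256(f"{u1},{u2},{e}".encode()).digest()
    return int.from_bytes(digest, "big") % r

def keygen():
    x1, x2, y1, y2, z1, z2 = (secrets.randbelow(r) for _ in range(6))
    c = (pow(g1, x1, p) * pow(g2, x2, p)) % p
    d = (pow(g1, y1, p) * pow(g2, y2, p)) % p
    h = (pow(g1, z1, p) * pow(g2, z2, p)) % p
    return (c, d, h), (x1, x2, y1, y2, z1, z2)

def encrypt(m, pk):
    c, d, h = pk
    k = secrets.randbelow(r)
    u1, u2 = pow(g1, k, p), pow(g2, k, p)
    e = (pow(h, k, p) * m) % p            # e = h^k * m
    a = H(u1, u2, e)                      # a plays the role of alpha
    v = (pow(c, k, p) * pow(d, (k * a) % r, p)) % p   # v = c^k * d^(k*alpha)
    return u1, u2, e, v

def decrypt(ct, sk):
    u1, u2, e, v = ct
    x1, x2, y1, y2, z1, z2 = sk
    if pow(u1, r, p) != 1 or pow(u2, r, p) != 1:      # u1, u2 must lie in G
        return None
    a = H(u1, u2, e)
    # the test of equation (23.1): v = u1^(x1+y1*alpha) * u2^(x2+y2*alpha)
    if v != (pow(u1, (x1 + y1 * a) % r, p) * pow(u2, (x2 + y2 * a) % r, p)) % p:
        return None
    s = (pow(u1, z1, p) * pow(u2, z2, p)) % p
    return (e * pow(s, -1, p)) % p        # m = e * (u1^z1 * u2^z2)^(-1)

pk, sk = keygen()
m = 13                                    # message must be a group element
assert decrypt(encrypt(m, pk), sk) == m
```

Tampering with any component of a ciphertext makes the check of equation (23.1) fail with overwhelming probability, so Decrypt rejects.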
Exercise 23.2.3. Show that the value v = c^k d^(kα) computed in the Encrypt algorithm
does satisfy equation (23.1).
Exercise 23.2.4. Show that the tests u1, u2 ∈ G and equation (23.1) imply that v ∈ G.
Exercise 23.2.5. Show that the final stage of Decrypt in the Cramer-Shoup scheme can
be efficiently performed using multiexponentiation as
m = e·u1^(r−z1)·u2^(r−z2).
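A quick numeric check of this identity, with ad hoc toy parameters (an order-11 subgroup of (Z/23Z)^*; all values below are arbitrary illustrative choices):

```python
p, r, g1, g2 = 23, 11, 2, 3          # toy order-11 subgroup of (Z/23Z)^*
z1, z2, k, m = 4, 7, 5, 13           # private z1, z2; ephemeral k; message m in G
h = (pow(g1, z1, p) * pow(g2, z2, p)) % p
u1, u2 = pow(g1, k, p), pow(g2, k, p)
e = (pow(h, k, p) * m) % p           # e = h^k * m, as in Encrypt
# u1 and u2 have order r, so u^(r - z) = u^(-z): no modular inversion is needed,
# and the product e * u1^(r-z1) * u2^(r-z2) is a single multiexponentiation.
assert (e * pow(u1, r - z1, p) * pow(u2, r - z2, p)) % p == m
```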
² Unless performing bit-by-bit encryption, which is a design approach not considered in this book.
Figure 23.3 (the basic Cramer-Shoup encryption scheme): in Decrypt one computes α = H(u1, u2, e) and rejects the ciphertext unless
v = u1^(x1+y1α) u2^(x2+y2α).    (23.1)
Example 23.2.6. One checks that u1^(x1+y1α) u2^(x2+y2α) = u1^30 u2^20 = 89 = v, so
the ciphertext passes this test. One then computes m = e·u1^(r−z1)·u2^(r−z2) = 265.
Exercise 23.2.7. Using the same private keys as Example 23.2.6, which of the following
ciphertexts are valid, and for those that are, what is the corresponding message? Assume
that H(243, 83, 13) = 2.
(243, 83, 13, 97), (243, 83, 13, 89), (243, 83, 13, 49).
We now turn to the security analysis. Note that the condition of equation (23.1) does
not imply that the ciphertext (u1, u2, e, v) was actually produced by the Encrypt algorithm. However, we now show that, if u1 and u2 are not of the correct form, then the
probability that a randomly chosen v satisfies this condition is 1/r. Indeed, Lemma 23.2.8
shows that even an adversary who can solve the discrete logarithm problem cannot construct an invalid ciphertext that satisfies this equation with probability better than 1/r.
Lemma 23.2.8. Let G be a cyclic group of prime order r. Let g1, g2, c, d, v ∈ G and
α ∈ Z/rZ be fixed. Suppose u1 = g1^k and u2 = g2^(k+k′) where k′ ≢ 0 (mod r). Then
the probability, over all choices (x1, x2, y1, y2) such that c = g1^x1 g2^x2 and d = g1^y1 g2^y2, that
v = u1^(x1+y1α) u2^(x2+y2α) is 1/r.
Proof: Write g2 = g1^w, c = g1^w1 and d = g1^w2 for some 0 ≤ w, w1, w2 < r with w ≠ 0. The
values c and d imply that x1 + wx2 = w1 and y1 + wy2 = w2. Now u1^(x1+y1α) u2^(x2+y2α) equals
g1 to the power
k(x1 + y1α) + (k + k′)w(x2 + y2α) = k(x1 + wx2) + kα(y1 + wy2) + k′w(x2 + y2α)
= kw1 + kαw2 + k′w(x2 + y2α).
The values w, w1, w2, k, k′, α are all uniquely determined but, by Lemma 23.2.1, x2 and
y2 can take any values between 0 and r − 1. Hence, u1^(x1+y1α) u2^(x2+y2α) equals any fixed value
v for exactly r of the r^2 choices for (x2, y2).
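The counting argument can also be verified exhaustively in a toy group (all parameters below are arbitrary small values satisfying the hypotheses of Lemma 23.2.8):

```python
# Order-11 subgroup of (Z/23Z)^*; w, k, k', alpha, w1, w2 and v are arbitrary.
p, r, g1 = 23, 11, 2
w, k, kp, a = 3, 5, 2, 4                   # g2 = g1^w; u2 = g2^(k+k'); alpha = a
g2 = pow(g1, w, p)
u1, u2 = pow(g1, k, p), pow(g2, k + kp, p)
w1, w2 = 6, 9                              # fixes c = g1^w1 and d = g1^w2
v = 8                                      # an arbitrary target value in G

count = 0
for x2 in range(r):
    for y2 in range(r):
        # x1 and y1 are determined by c and d: x1 + w*x2 = w1, y1 + w*y2 = w2
        x1, y1 = (w1 - w * x2) % r, (w2 - w * y2) % r
        t = (pow(u1, (x1 + y1 * a) % r, p) * pow(u2, (x2 + y2 * a) % r, p)) % p
        if t == v:
            count += 1
assert count == r        # exactly r of the r^2 choices of (x2, y2) give v
```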
Theorem 23.2.9. The basic Cramer-Shoup encryption scheme is IND-CCA secure if
DDH is hard and if the function H is target-collision-resistant.
Proof: (Sketch) Let A be an adversary against the Cramer-Shoup scheme. Given a
DDH instance (g1 , g2 , u1 , u2 ) the simulator performs the KeyGen algorithm using the
given values g1 , g2 . Hence, the simulator knows the private key (x1 , x2 , y1 , y2 , z1 , z2 ). The
simulator runs A with this public key.
The algorithm A makes decryption queries and these can be answered correctly by the
simulator since it knows the private key. Eventually, A outputs two messages (m0 , m1 )
and asks for a challenge ciphertext. The simulator chooses a random b {0, 1}, computes
e = u1^z1 u2^z2 mb, α = H(u1, u2, e) and v = u1^(x1+y1α) u2^(x2+y2α). Here, and throughout this
proof, u1 and u2 denote the values in the DDH instance that was given to the simulator.
The simulator returns
c* = (u1, u2, e, v)
to A. The adversary A continues to make decryption queries, which are answered as
above. Eventually, A outputs a bit b′. The simulator returns "valid" as the answer to the
DDH instance if b′ = b and "invalid" otherwise.
The central idea is that if the input is a valid DDH tuple then c* is a valid encryption
of mb and so A ought to be able to guess b correctly with non-negligible probability. On
the other hand, if the input is not a valid DDH tuple then, by Lemma 23.2.1, u1^z1 u2^z2
could be any element in G (with equal probability) and so c* could be an encryption of
any message m ∈ G. Hence, given c*, both messages m0 and m1 are equally likely and so
the adversary can do no better than output a random bit. (Of course, A may actually
output a fixed bit in this case, such as 0, but this is not a problem since b was randomly
chosen.)
There are several subtleties remaining in the proof. First, by Lemma 23.2.8, before the
challenge ciphertext has been received there is a negligible probability that a ciphertext
that was not produced by the Encrypt algorithm satisfies equation (23.1). Hence, the
simulation is correct with overwhelming probability. However, the challenge ciphertext is
potentially an example of a ciphertext that satisfies equation (23.1) and yet is not a valid
output of the algorithm Encrypt. It is necessary to analyse the probability that A can
somehow produce another ciphertext that satisfies equation (23.1) without just running
the Encrypt algorithm. The target-collision-resistance of the hash function enters at this
point (since a ciphertext of the form (u1, u2, e′, v) such that H(u1, u2, e′) = H(u1, u2, e)
would pass the test). Due to lack of space we refer to Section 4 of [158] (for a direct
proof) or Section 6.2 of [160] (for a proof using game hopping).
A number of variants of the basic scheme are given by Cramer and Shoup [160] and
other authors. In particular, one can design a KEM based on the Cramer-Shoup scheme
(see Section 9 of [160]): just remove the component e of the ciphertext and set the encapsulated key to be K = kdf(g1^k, h^k). An alternative KEM (with even shorter ciphertext)
was proposed by Kurosawa and Desmedt [356]. Their idea was to set K = kdf(v) where
v = c^k d^(kα) for α = H(u1, u2). The KEM ciphertext is therefore just (u1, u2) = (g1^k, g2^k).
The security again follows from Lemma 23.2.8: informally, querying the decryption oracle
on badly formed (u1 , u2 ) gives no information about the key K.
Exercise 23.2.10. Write down a formal description of the Cramer-Shoup KEM.
Exercise 23.2.11. Show that an adversary against the Cramer-Shoup scheme who knows
any pair (z1, z2) such that h = g1^z1 g2^z2 can decrypt valid ciphertexts.
Exercise 23.2.12. Suppose an adversary against the Cramer-Shoup scheme knows x1 , x2 , y1 , y2 .
Show how the adversary can win the OWE-CCA security game.
Exercise 23.2.13. Suppose the checks that u1, u2 ∈ G are omitted in the Cramer-Shoup
cryptosystem. Suppose G ⊆ F_p^* and that l | (p − 1) is a small prime (say l < 2^10). Suppose
the Decrypt algorithm uses the method of Exercise 23.2.5. Show how to determine, using
a decryption oracle, z1 (mod l) and z2 (mod l). Show that if p − 1 has many such small
factors l then one could recover the values z1 and z2 in the private key of the Cramer-Shoup scheme.
Cramer and Shoup [159] have shown how the above cryptosystem fits into a general
framework for constructing secure encryption schemes using universal hash proof
systems. We do not have space to present this topic.
23.3
There are many variants of public key encryption (such as threshold decryption, server-aided decryption, etc.). In this section we briefly sketch two important variants: homomorphic encryption and identity-based encryption.
23.3.1 Homomorphic Encryption
Let c1 , . . . , ck be ciphertexts that are encryptions under some public key of messages
m1 , . . . , mk . The goal of homomorphic encryption is for any user to be able to efficiently
compute a ciphertext that encrypts F (m1 , . . . , mk ) for any function F , given only a description of the function F and the ciphertexts c1 , . . . , ck . An encryption scheme that has
this property is called fully homomorphic.
Homomorphic encryption schemes allow third parties to perform computations on encrypted data. A common additional security requirement is that the resulting ciphertexts
do not reveal to a user with the private key what computation was performed (except its
result). A typical application of homomorphic encryption is voting: If users encrypt either
0 or 1 under a certain public key³ then a trusted third party can compute a ciphertext
that is an encryption of the sum of all the users' votes, and then this ciphertext can be
decrypted to give the total number of votes. If the user with the private key never sees
the individual votes then they cannot determine how an individual user voted. A general
survey on homomorphic encryption that gives some references for applications is Fontaine
and Galand [207].
For many applications it is sufficient to consider encryption schemes that only allow
a user to compute F (m1 , . . . , mk ) for certain specific functions (for example, addition in
the voting application). In this section we focus on the case where F (m1 , m2 ) is a group
operation.
³ It is necessary for users to prove that their vote lies in {0, 1}.
23.3.2 Identity-Based Encryption
Each user with identity id obtains the private key H1(id)^s from the key generation
center (here s denotes the master secret).
To Encrypt a message m ∈ {0, 1}^l to the user with identity id one obtains the master
key (g, g′), computes Qid = H1(id), chooses a random 1 ≤ k < r and computes c1 =
g^k, c2 = m ⊕ H2(e(Qid, g′)^k). The ciphertext is (c1, c2).
To Decrypt the ciphertext (c1, c2) the user with private key H1(id)^s computes
m = c2 ⊕ H2(e(H1(id)^s, c1)).
This completes the description of the basic Boneh-Franklin scheme.
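The algebra behind correctness can be illustrated with a toy model of the pairing. In the sketch below, elements of G1 and G2 are represented by their discrete logarithms, so e(g^x, g^y) = gT^(xy) is trivially computable; this removes all security and serves purely to check the encryption/decryption equations. All parameters, hash stand-ins and identities are hypothetical:

```python
import hashlib
import secrets

# Toy model: G1 and G2 are Z/rZ (an element "g^x" is stored as x), and the
# pairing maps into the order-r subgroup of (Z/23Z)^* generated by gT = 2.
p, r, gT = 23, 11, 2
L = 16                                     # message length l in bits

def H1(identity: str) -> int:
    """Hash onto G1 (here: a nonzero element of Z/rZ)."""
    d = hashlib.sha256(identity.encode()).digest()
    return 1 + int.from_bytes(d, "big") % (r - 1)

def H2(gt_elem: int) -> int:
    """Hash a GT element to an l-bit mask."""
    d = hashlib.sha256(str(gt_elem).encode()).digest()
    return int.from_bytes(d[:L // 8], "big")

def pairing(x: int, y: int) -> int:
    return pow(gT, (x * y) % r, p)         # e(g^x, g^y) = gT^(xy)

s = secrets.randbelow(r - 1) + 1           # master secret
master_pub = s                             # models g' = g^s in G2

def extract(identity: str) -> int:
    return (H1(identity) * s) % r          # private key H1(id)^s

def encrypt(m: int, identity: str):
    k = secrets.randbelow(r - 1) + 1
    c1 = k                                 # models c1 = g^k
    c2 = m ^ H2(pow(pairing(H1(identity), master_pub), k, p))
    return c1, c2

def decrypt(c1, c2, priv):
    # e(H1(id)^s, g^k) = e(H1(id), g)^(sk), matching the mask used in encrypt
    return c2 ^ H2(pairing(priv, c1))

key = extract("alice@example.com")
m = 0x1234
c1, c2 = encrypt(m, "alice@example.com")
assert decrypt(c1, c2, key) == m
```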
Exercise 23.3.6. Show that the Decrypt algorithm does compute the correct message
when (c1 , c2 ) are the outputs of the Encrypt algorithm.
Exercise 23.3.7. Show that the basic Boneh-Franklin scheme does not have IND-CCA
security.
The security model for identity-based encryption takes into account that an adversary
can ask for private keys on various identities. Hence, the IND security game allows an
adversary to output a challenge identity id and two challenge messages m0 , m1 . The
adversary is not permitted to know the private key for identity id (though it can receive
private keys for any other identities of its choice). The adversary then receives an encryption with respect to identity id of mb for randomly chosen b {0, 1} and must output a
guess for b.
Exercise 23.3.8. Suppose there is an efficiently computable group homomorphism ψ :
G2 → G1. Show that if an adversary knows ψ and can compute preimages of the hash
function H1 then it can determine the private key for any identity by making a private
key query on a different identity.
If the output of H2 is indistinguishable from random l-bit strings then it is natural
to believe that obtaining the message from a ciphertext under a passive attack requires
computing
e(Qid^s, c1) = e(Qid^s, g^k) = e(Qid, g)^(sk).
Hence, it is natural that the security (at least, in the random oracle model) depends on
the following computational problem.
Definition 23.3.9. Let G1, G2 and GT be groups of prime order r and let e : G1 ×
G2 → GT be a non-degenerate bilinear pairing. The bilinear Diffie-Hellman problem
(BDH) is: Given Q ∈ G1, g ∈ G2, g^a and g^b, where 1 ≤ a, b < r, to compute
e(Q, g)^(ab).
Exercise 23.3.10. Show that if one can solve CDH in G2 or in GT then one can solve
BDH.
As seen in Exercise 23.3.7, the basic Boneh-Franklin scheme does not have IND-CCA
security. To fix this one needs to provide some extra components in the ciphertext.
Alternatively, one can consider the basic Boneh-Franklin scheme as an identity-based
KEM: The ciphertext is c1 = g^k and the encapsulated key is K = kdf(e(Qid, g′)^k). In the
random oracle model (treating both H1 and kdf as random oracles) one can show that
the Boneh-Franklin identity-based KEM has IND-CCA security (in the security model for
identity-based encryption as briefly mentioned above) assuming that the BDH problem
is hard. We refer to Boneh and Franklin [80, 81] for full details and security proofs.
There is a large literature on identity-based encryption and its extensions, including
schemes that are secure in the standard model. We do not discuss these topics further in
this book.
Chapter 24
The RSA and Rabin Cryptosystems
24.1 The Textbook RSA Cryptosystem
Figure 24.1 recalls the textbook RSA cryptosystem, which was already presented in
Section 1.2. We remind the reader that the main application of RSA encryption is to
transport symmetric keys, rather than to encrypt actual documents. For digital signatures
we always sign a hash of the message, and it is necessary that the hash function used in
signatures is collision-resistant.
In Section 1.3 we noted that the security parameter is not necessarily the same as
the bit-length of the RSA modulus. In this chapter it will be convenient to ignore this,
and use the symbol κ to denote the bit-length of an RSA modulus N. We always assume
that κ is even.
As we have seen in Section 1.2, certain security properties can only be satisfied if the
encryption process is randomised. Since the RSA encryption algorithm is deterministic
it follows that the message m used in RSA encryption should be obtained from some
randomised padding scheme. For example, if N is a 3072-bit modulus then the
In textbooks, the message space and ciphertext space are usually taken to be C =
M = (Z/NZ)^*, but it fits Definition 1.3.1 better (and is good training) to define them
to be C = {0, 1}^κ and M = {0, 1}^(κ/2) or {0, 1}^(κ−1).
Encrypt(m, (N, e)): Assume that m ∈ M.
Compute c = m^e (mod N) (see later for padding schemes).
24.1.1 Efficient Implementation
As we have seen in Section 12.2, κ/2-bit probable primes can be found in expected time
of O(κ^5) bit operations (or O(κ^(4+o(1))) using fast arithmetic). One can make this provable
using the AKS method, with asymptotic complexity O(κ^(5+o(1))) bit operations using fast
arithmetic. In any case, RSA key generation is polynomial-time. A more serious challenge
is to ensure that encryption and decryption (equivalently, signing and verification) are as
fast as possible.
Encryption and decryption are exponentiation modulo N and thus require O(log(N)M(log(N)))
bit operations, which is polynomial-time. For current RSA key sizes, Karatsuba multiplication is most appropriate, hence one should assume that M(log(N)) = log(N)^1.58.
Many of the techniques mentioned in earlier chapters to speed up exponentiation can be
used in RSA, particularly sliding window methods. Since e and d are fixed one can also
pre-compute addition chains to minimise the cost of exponentiation.
In practice the following two improvements are almost always used.
• Small public exponents e (also called low-exponent RSA). Traditionally e = 3
was proposed, but these days e = 65537 = 2^16 + 1 is most common. Encryption
requires only 16 modular squarings and a modular multiplication.
• Use the Chinese remainder theorem (CRT) to decrypt.² Let dp ≡ e^(−1) (mod p − 1)
and dq ≡ e^(−1) (mod q − 1). These are called the CRT private exponents. Given
a ciphertext c one computes mp = c^dp (mod p) and mq = c^dq (mod q). The message
m is then computed using the Chinese remainder theorem (it is convenient to use
the method of Exercise 2.6.3 for the CRT).
For this system the private key is then sk = (p, q, dp, dq). If we denote by T =
c log(N)M(log(N)) the cost of a single exponentiation modulo N to a power d ≈ N
then the cost using the Chinese remainder theorem is approximately 2c(log(N)/2)M(log(N)/2)
(this is assuming the cost of the Chinese remaindering is negligible). When using
Karatsuba multiplication this speeds up RSA decryption by a factor of approximately 3 (in other words, the new running time is a third of the old running time).
24.1.2
Variants of RSA
There has been significant effort devoted to finding more efficient variants of the RSA
cryptosystem. We briefly mention some of these now.
Example 24.1.4. (Multiprime-RSA³) Let p1, . . . , pk be primes of approximately κ/k
bits and let N = p1 · · · pk. One can use N as a public modulus for the RSA cryptosystem.
Using the Chinese remainder theorem for decryption has cost roughly the same as k
exponentiations to powers of bit-length κ/k and modulo primes of bit-length κ/k. Hence,
the cost is reduced by a factor of k/k^2.58 = 1/k^1.58, i.e., the speedup is a factor of k^1.58.
To put this in context, going from a single exponentiation to using the Chinese remainder theorem in the case of 2 primes gave a speedup by a factor of 3. Using 3 primes
gives an overall speedup by a factor of roughly 5.7, which is a further speedup of a factor
1.9 over the 2-prime case. Using 4 primes gives an overall speedup of about 8.9, which is
an additional speedup over 3 primes by a factor 1.6.
However, there is a limit to how large k can be, as the complexity of the elliptic curve
factoring method mainly depends on the size of the smallest factor of N .
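Under the cost model used above, an exponentiation with a b-bit exponent and b-bit modulus costs roughly c·b·M(b) = c·b^2.58 with Karatsuba arithmetic, and the quoted speedup factors can be reproduced in a few lines (the constant c cancels):

```python
# Speedup of k-prime CRT decryption over a single full exponentiation,
# using the Karatsuba cost model M(b) = b^1.58.
def speedup(k: int, kappa: float = 1.0) -> float:
    full = kappa ** 2.58                  # one exponentiation mod N
    crt = k * (kappa / k) ** 2.58         # k exponentiations at kappa/k bits
    return full / crt                     # simplifies to k^1.58

for k in (2, 3, 4):
    print(k, round(speedup(k), 1))        # matches the factors 3, 5.7, 8.9 above
```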
² This idea is often credited to Quisquater and Couvreur [490] but it also appears in Rabin [491].
³ This idea was proposed (and patented) by Collins, Hopkins, Langford and Sabin.
(24.1)
gives a linear equation in x modulo p. Note that computing mi^e (mod p^(i+1)) in equation (24.1) is only efficient when e is small. If e is large then the Hensel lifting stage is no
faster than just computing c^(e^(−1) (mod φ(p^r))) (mod p^r) directly.
The total cost is two full exponentiations to compute mp and mq, r − 1 executions of
the Hensel lifting stage, plus one execution of the Chinese remainder theorem. Ignoring
everything except the two big exponentiations one has an algorithm whose cost is
2/(r + 1)^2.58 times that of naive textbook RSA decryption. Taking r = 2 this is about 9
times faster than standard RSA (i.e., about 1.6 times faster than using 3-prime RSA) and
taking r = 3 is about 18 times faster than standard RSA (i.e., about 2 times faster than
using 4-prime RSA).
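Takagi decryption with the Hensel lifting step can be sketched as follows (toy parameters p, q, r, e chosen for illustration, with gcd(e, p − 1) = gcd(e, q − 1) = 1; the lift solves (mi + t·p^i)^e ≡ c (mod p^(i+1)) for t):

```python
# Toy sketch of Takagi decryption for N = p^r * q with small public exponent e.
p, q, r, e = 10007, 10037, 2, 3
N = p**r * q
dp = pow(e, -1, p - 1)
dq = pow(e, -1, q - 1)

def decrypt(c: int) -> int:
    mq = pow(c, dq, q)                 # full exponentiation mod q
    m = pow(c, dp, p)                  # full exponentiation mod p: m mod p
    for i in range(1, r):              # Hensel-lift the root from p^i to p^(i+1)
        pi = p**i
        # (m + t*p^i)^e = m^e + e*m^(e-1)*t*p^i (mod p^(i+1)) gives a linear
        # equation in t modulo p; computing m^e here is cheap since e is small.
        t = ((c - pow(m, e, pi * p)) // pi) % p
        t = (t * pow(e * pow(m, e - 1, p), -1, p)) % p
        m += t * pi
    # combine m (mod p^r) and mq (mod q) with the Chinese remainder theorem
    pr = p**r
    h = ((mq - m) * pow(pr, -1, q)) % q
    return m + h * pr

m = 123456789
c = pow(m, e, N)
assert decrypt(c) == m
```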
Exercise 24.1.7. Let N = (2^20 + 7)^3 (2^19 + 21) and let c = 474776119073176490663504
be the RSA encryption of a message m using public exponent e = 3. Determine the
message using the Takagi decryption algorithm.
Exercise 24.1.8. Describe and analyse the RSA cryptosystem using moduli of the form
N = p^r q^s. Explain why it is necessary that r ≠ s.
Exercise 24.1.9. Write pseudocode for Takagi-RSA decryption.
Exercise 24.1.10. (Shamir's RSA for paranoids [544]) Let N = pq where q is much
larger than p (for example, p ≈ 2^500 and q ≈ 2^4500). The assumption is that factoring
numbers of this form is much harder than factoring N = pq where p, q ≈ 2^2500. Suppose
one is encrypting a (padded) message m such that 1 ≤ m < p and suppose we use public
exponent e > 2 log2(N)/log2(p) (so that, typically, m^e > N^2). Encryption is computing
c = m^e (mod N) as usual. Shamir's observation is that one can decrypt by computing
m = c^dp (mod p) where e·dp ≡ 1 (mod p − 1).
How much faster is this than RSA decryption using CRT with a 5000-bit modulus if
the primes have equal size? If no padding scheme is used (i.e., every 1 ≤ m < p is a valid
message) give an adaptive (CCA1) attack on this scheme that yields the factorisation of
a users modulus.
24.1.3 Security of Textbook RSA
We have presented textbook RSA above. This is unsuitable for practical applications for
many reasons. In practice, RSA should only be used with a secure randomised padding
scheme. Nevertheless, it is instructive to consider the security of textbook RSA with
respect to the security definitions presented earlier.
Exercise 1.3.4 showed that textbook RSA encryption does not have OWE-CCA security and Exercise 1.3.9 showed that textbook RSA signatures do not have existential
forgery security even under a passive attack. We recall one more easy attack.
Exercise 24.1.11. Show that one can use the Jacobi symbol to attack the IND-CPA
security of RSA encryption.
Despite the fact that RSA is supposed to be related to factoring, the security actually
relies on the following computational problem.
Definition 24.1.12. Let N, e be such that gcd(e, φ(N)) = 1. The RSA problem
(also called the e-th roots problem) is: Given y ∈ (Z/NZ)^* to compute x such that
x^e ≡ y (mod N).
It is clear that the RSA problem is not harder than factoring.
Lemma 24.1.13. The OWE-CPA security of textbook RSA is equivalent to the RSA
problem.
Proof: (Sketch) We show that an algorithm to break OWE-CPA security of textbook
RSA can be used to build an algorithm to solve the RSA problem. Let A be an adversary
against the OWE-CPA security of RSA. Let (N, e, c) be a challenge RSA problem instance.
If 1 < gcd(c, N ) < N then split N and solve the RSA problem. Otherwise, call the
adversary A on the public key (N, e) and offer the challenge ciphertext c. If A returns
the message m then we are done. If A returns ⊥ (e.g., because the decryption of c does
not lie in M) then replace c by c·r^e (mod N) for a random 1 < r < N and repeat. When
M = {0, 1}^(κ−1) then, with probability at least 1/4, the reduction will succeed, and so one
expects to perform 4 trials. The converse is also immediate.
Exercise 24.1.14. Show that textbook RSA has selective signature forgery under passive
attacks if and only if the RSA problem is hard.
One of the major unsolved problems in cryptography is to determine the relationship
between the RSA problem and factoring. There is no known reduction of factoring to
breaking RSA. Indeed, there is some indirect evidence that breaking RSA with small
public exponent e is not as hard as factoring: Boneh and Venkatesan [87] show that an
efficient algebraic reduction⁴ from FACTOR to low-exponent RSA can be converted
into an efficient algorithm for factoring. Similarly, Coppersmith [140] shows that some
variants of the RSA problem, where e is small and only a small part of an e-th root x is
unknown, are easy (see Exercise 19.1.15).
Definition 24.1.15 describes some computational problems underlying the security of
RSA. The reader is warned that some of these names are non-standard.
Definition 24.1.15. Let S be a set of integers, for example S = N or S = {pq : p and q
are primes such that p < q < 2p}. We call the latter set the set of RSA moduli.
⁴ We do not give a formal definition. Essentially this is an algorithm that takes as input N, queries an
oracle for the RSA problem, and outputs a finite set of short algebraic formulae, one of which splits the
integer N .
Exercise 24.1.20. Suppose one has a perfect oracle A that takes as input pairs (N, g),
where N is an RSA modulus and g is uniformly chosen in (Z/NZ)^*, and returns the
order of g modulo N . Show how to use A to factor an RSA modulus N in expected
polynomial-time.
Exercise 24.1.21. The STRONG-RSA problem is: Given an RSA modulus N and
y ∈ (Z/NZ)^* to find any pair (x, e) of integers such that e > 1 and
x^e ≡ y (mod N).
Give a reduction from STRONG-RSA to RSA. This shows that the STRONG-RSA problem is not harder than the RSA problem.⁶
We end with some cryptanalysis exercises.
Exercise 24.1.22. Let N = pq be an RSA modulus. Let A be an oracle that takes as
input an integer a and returns a (mod φ(N)). Show how to use A to factor N.
An interesting question is to study the bit security of RSA. More precisely, if (N, e) is
an RSA public key one considers a (not necessarily perfect) oracle which computes the
least significant bit of x given y = xe (mod N ). It can be shown that if one has such an
oracle then one can compute x. One approach is to use the binary Euclid algorithm; due
to lack of space we simply refer to Alexi, Chor, Goldreich and Schnorr [9] for details of
the method and a comprehensive list of references. A simpler approach, which does not
use the binary Euclid algorithm, was given by Fischlin and Schnorr [202]. A complete
analysis of the security of any bit (not just the least significant bit) was completed by
H
astad and N
aslund [278].
Exercise 24.1.23. Consider the following variant of RSA encryption. Alice has a public
key N and two public exponents e_1, e_2 such that e_1 ≠ e_2 and gcd(e_i, φ(N)) = 1 for i = 1, 2.
To encrypt to Alice one is supposed to send c_1 = m^{e_1} (mod N) and c_2 = m^{e_2} (mod N).
Show that if gcd(e_1, e_2) = 1 then an attacker can determine the message given the public
key and a ciphertext (c_1, c_2).
Exercise 24.1.24. Consider the following signature scheme based on RSA. The public
key is an integer N = pq, an integer e coprime to φ(N) and an integer a such that
gcd(a, N) = 1. The private key is the inverse of e modulo φ(N), as usual. Let H be a
collision-resistant hash function. The signature on a message m is an integer s such that

    s^e ≡ a^{H(m)} (mod N)

where H(m) is interpreted as an integer. Explain how the signer can generate signatures
efficiently. Find a known message attack on this system that allows an adversary to make
selective forgery of signatures.
24.2 The Textbook Rabin Cryptosystem
The textbook Rabin cryptosystem [491] is given in Figure 24.2. Rabin is essentially RSA
with the optimal choice of e, namely e = 2.⁷ As we will see, the security of Rabin is
6 The word strong is supposed to indicate that the assumption that STRONG-RSA is hard is a
stronger assumption than the assumption that RSA is hard. Of course, the computational problem is
weaker than RSA, in the sense that it might be easier to solve STRONG-RSA than RSA.
7 The original paper [491] proposed encryption as E_{N,b}(x) = x(x + b) (mod N) for some integer b.
However, there is a gain in efficiency with no loss of security by taking b = 0.
more closely related to factoring than RSA. We first have to deal with the problem that if
N = pq where p and q are distinct primes then squaring is a four-to-one map (in general)
so it is necessary to have a rule to choose the correct solution in decryption.
Lemma 24.2.1. Suppose p and q are primes such that p ≡ q ≡ 3 (mod 4). Let N = pq
and 1 < x < N be such that (x/N) = 1. Then either x or N − x is a square modulo N.
Exercise 24.2.2. Prove Lemma 24.2.1.
Definition 24.2.3. Let p ≡ q ≡ 3 (mod 4) be primes. Then N = pq is called a Blum
integer.
KeyGen(κ): Generate two random κ/2-bit primes p and q such that p ≡ q ≡ 3 (mod 4)
and set N = pq. Output the public key pk = N and the private key sk = (p, q).
The message space and ciphertext space depend on the redundancy scheme (suppose
for the moment that they are C = M = (Z/NZ)^*).
Encrypt(m, N): Compute c = m^2 (mod N) (with some redundancy or padding
scheme).
Decrypt(c, (p, q)): We want to compute √c (mod N), and this is done by the following
method: Compute m_p = c^{(p+1)/4} (mod p) and m_q = c^{(q+1)/4} (mod q) (see Section 2.9).
Test that m_p^2 ≡ c (mod p) and m_q^2 ≡ c (mod q), and if not then output ⊥. Use the
Chinese remainder theorem (Exercise 2.6.3) to obtain four possibilities for m (mod N)
such that m ≡ ±m_p (mod p) and m ≡ ±m_q (mod q). Use the redundancy (see later) to
determine the correct value m, and return ⊥ if there is no such value.
Sign(m, (p, q)): Ensure that (m/N) = 1 (possibly by adding some randomness). Then
either m or N − m is a square modulo N. Compute s = √(±m) (mod N) by computing (±m)^{(p+1)/4} (mod p), (±m)^{(q+1)/4} (mod q) and applying the Chinese remainder
theorem.
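The decryption step above can be sketched computationally. The primes below are toy values chosen purely for this illustration, and the redundancy handling is omitted; only the computation of the four square roots via the CRT is shown.

```python
from itertools import product

def rabin_decrypt_roots(c, p, q):
    """Return the four square roots of c modulo N = p*q, for p = q = 3 (mod 4)."""
    N = p * q
    mp = pow(c, (p + 1) // 4, p)           # square root of c modulo p
    mq = pow(c, (q + 1) // 4, q)           # square root of c modulo q
    if (mp * mp - c) % p or (mq * mq - c) % q:
        return None                        # c is not a square modulo N: output "bottom"
    inv_p = pow(p, -1, q)
    roots = []
    # Chinese remainder theorem for the four sign combinations (+-mp, +-mq)
    for sp, sq in product((mp, p - mp), (mq, q - mq)):
        h = ((sq - sp) * inv_p) % q
        roots.append((sp + p * h) % N)
    return sorted(set(roots))

p, q = 347, 359                            # toy Blum primes, p = q = 3 (mod 4)
N = p * q
m = 4242
c = pow(m, 2, N)
roots = rabin_decrypt_roots(c, p, q)
assert m in roots and len(roots) == 4
```

In practice the redundancy scheme (discussed next) is what selects the correct root among the four.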
24.2.1 Redundancy Schemes
To ensure that decryption returns the correct message it is necessary to have some redundancy in the message, or else to send some extra bits. We now describe three solutions
to this problem.
Redundancy in the message for Rabin: For example, insist that the least
significant l bits (where l > 2 is some known parameter) of the binary string m are
all ones. (Note 8.14 of [415] suggests repeating the last l bits of the message.) If l is
big enough then it is unlikely that two different choices of square root would have
the right pattern in the l bits.
Exercise 24.2.4. Prove all the unproved claims in the above discussion of the Williams
redundancy scheme.
Exercise 24.2.5. Let N = (259 + 21)(220 + 7). The three ciphertexts below are Rabin
encryptions for each of the three redundancy schemes above (in the first case, l = 5).
Determine the corresponding message in each case.
273067682422,  (309135051204, 1, 0),  17521752799.
24.2.2 Variants of Rabin
Let i be the index such that |r_i| < √N < |r_{i−1}|. Then r_i ≡ u_i s (mod N) and so

    r_i^2 ≡ u_i^2 s^2 ≡ u_i^2 H(m) (mod N).

One can therefore send u_i as the signature. Verification is to compute w = u_i^2 H(m) (mod N)
and check that w is a perfect square in Z (e.g., using the method of Exercise 2.4.9 or
Exercise 2.2.8). Part 6 of Lemma 2.3.3 states |r_{i−1} u_i| ≤ N and so |u_i| < √N. Hence, this
approach compresses the signature to half the size.
Example 24.2.10. (Bernstein; Coron and Naccache) Another way to compress Rabin
signatures is to send the top half of the bits of s. In other words, the signature is
s′ = ⌊s/2^{κ/2}⌋ if N < 2^κ. To verify s′ one uses Coppersmith's method to find the small
solution x to the equation

    (s′·2^{κ/2} + x)^2 − H(m) ≡ 0 (mod N).

Verification of this signature is much slower than the method of Example 24.2.9.
24.2.3 Security of Textbook Rabin
Since the Rabin cryptosystem involves squaring it is natural to assume the security is
related to computing square roots modulo N , which in turn is equivalent to factoring.
Hence, an important feature of Rabin compared with RSA is that the hardness of breaking
Rabin can be shown to be equivalent to factoring.
Definition 24.2.11. Let S = ℕ or S = {pq : p, q ≡ 3 (mod 4) primes}. The computational problem SQRT-MOD-N is: Given N ∈ S and y ∈ Z/NZ, to output ⊥ if y is not
a square modulo N, or a solution x to x^2 ≡ y (mod N).
Lemma 24.2.12. SQRT-MOD-N is equivalent to FACTOR.
Proof: Suppose we have a FACTOR oracle and are given a pair (N, y). Then one can
use the oracle to factor N and then solve SQRT-MOD-N using square roots modulo p and
Hensel lifting and the Chinese remainder theorem. This reduction is polynomial-time.
Conversely, suppose we have a SQRT-MOD-N oracle A and let N be given. First, if
N = pe then we can factor N in polynomial time (see Exercise 2.2.9). Hence we may now
assume that N has at least two distinct prime factors.
Choose a random x ∈ (Z/NZ)^* and set y = x^2 (mod N). Call A on y to get x′. We have
x^2 ≡ (x′)^2 (mod N) and there are at least four possible solutions x′. All but two of
these solutions will give a non-trivial value of gcd(x − x′, N). Hence, since x was chosen
randomly, there is probability at least 1/2 that we can split N . Repeating this process
splits N (the expected number of trials is at most 2). As in Lemma 24.1.17 one can
repeat the process to factor N in O(log(N )) iterations. The entire reduction is therefore
polynomial-time.
An important remark about the above proof is that the oracle A is not assumed to
output a random square root x′ of y. Indeed, A could be deterministic. The randomness
comes from the choice of x in the reduction.
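The reduction above can be sketched as follows. So that the sketch is self-contained, the deterministic oracle is simulated from the (secret) factorisation; in the reduction proper, A is a black box.

```python
import random
from math import gcd

p, q = 347, 359                       # secret toy primes, p = q = 3 (mod 4)
N = p * q

def sqrt_oracle(y):
    """A deterministic SQRT-MOD-N oracle: the CRT of the principal roots mod p, q."""
    mp = pow(y, (p + 1) // 4, p)
    mq = pow(y, (q + 1) // 4, q)
    h = ((mq - mp) * pow(p, -1, q)) % q
    return (mp + p * h) % N

def factor_with_oracle(N, rng=random.Random(1)):
    while True:
        x = rng.randrange(2, N)
        if gcd(x, N) != 1:
            return gcd(x, N)          # lucky: x already shares a factor with N
        xp = sqrt_oracle(pow(x, 2, N))
        # With probability 1/2 over the random choice of x, xp is not +-x,
        # and then gcd(x - xp, N) is a non-trivial factor.
        if xp not in (x, N - x):
            return gcd(x - xp, N)

d = factor_with_oracle(N)
assert d in (p, q)
```

The randomness lives entirely in the choice of x, exactly as remarked above.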
Exercise 24.2.13. Consider the computational problem FOURTH-ROOT: Given y ∈ Z/NZ,
compute a solution to x^4 ≡ y (mod N) if such a solution exists. Give reductions that show
that FOURTH-ROOT is equivalent to FACTOR in the case N = pq with p, q distinct
odd primes.
It is intuitively clear that any algorithm that breaks the one-way encryption property
(or selective signature forgery) of Rabin under passive attacks must compute square roots
modulo N . We have seen that SQRT-MOD-N is equivalent to FACTOR. Thus we expect
breaking Rabin under passive attacks to be as hard as factoring. However, giving a precise
security proof involves taking care of the redundancy scheme.
Theorem 24.2.14. Let N = pq, where p ≡ q ≡ 3 (mod 4) are primes, and define S_{N,l} =
{1 ≤ x < N : gcd(x, N) = 1, 2^l | (x + 1)}. Assume the probability, over x ∈ (Z/NZ)^* − S_{N,l},
that there exists y ∈ S_{N,l} with x ≠ y but x^2 ≡ y^2 (mod N), is 1/2^{l−1}. Then breaking
the one-way encryption security property of the Rabin cryptosystem with the redundancy
in the message redundancy scheme, where l = O(log(log(N))), under passive attacks is
equivalent to factoring Blum integers.
Theorem 24.2.15. Breaking the one-way encryption security property of the Rabin cryptosystem with the extra bits redundancy scheme under passive attacks is equivalent to
factoring products N = pq of primes p ≡ q ≡ 3 (mod 4).
Theorem 24.2.16. Breaking the one-way encryption security property of the Rabin cryptosystem with the Williams redundancy scheme under passive attacks is equivalent to factoring products N = pq of primes p ≡ q ≡ 3 (mod 4), p ≢ q (mod 8).
Note that Theorem 24.2.14 gives a strong security guarantee when l is small, but in
that case decryption failures are frequent. Indeed, there is no choice of l for the Rabin
scheme with redundancy in the message that provides both a tight reduction to factoring
and negligible probability of decryption failure.
We prove the first and third of these theorems and leave Theorem 24.2.15 as an
exercise.
Proof: (Theorem 24.2.14) Let A be an oracle that takes a Rabin public key N and
a ciphertext c (with respect to the redundancy in the message padding scheme) and
returns either the corresponding message m or an invalid ciphertext symbol ⊥.
Choose a random x ∈ (Z/NZ)^* such that neither x nor N − x satisfies the redundancy scheme
(i.e., the l least significant bits are not all 1). Set c = x^2 (mod N) and call the oracle A
on c. The oracle A answers with either a message m or ⊥.
According to the (heuristic) assumption in the theorem, the probability that exactly
one of the two (unknown) square roots of c modulo N has the correct l least significant
bits is 2^{−(l−1)}. If this is the case then calling the oracle A on c will output a value m such
that, writing x′ = 2^l m + (2^l − 1), we have (x′)^2 ≡ x^2 (mod N) and x′ ≢ ±x (mod N).
Hence gcd(x′ − x, N) will split N.
We expect to require approximately 2l1 trials before factoring N with this method.
Hence, the reduction is polynomial-time if l = O(log(log(N ))).
Proof: (Proof of Theorem 24.2.16; following Williams [629]) Let A be an oracle that
takes a Rabin public key N and a ciphertext c (with respect to the Williams redundancy
scheme) and returns either the corresponding message m or an invalid ciphertext symbol ⊥.
Choose a random integer x such that (x/N) = 1, e.g., let x = 2z^2 (mod N) for
random z ∈ (Z/NZ)^*. Set c = x^2 (mod N) and call A on (N, c). The oracle computes
the unique even integer 1 < x′ < N such that (x′)^2 ≡ c (mod N) and (x′/N) = 1. The
oracle then attempts to decode x′ to obtain the message. If 8 ∤ x′ (which happens with
probability 3/4) then decoding succeeds and the corresponding message m is output by
the oracle. Given m we can recover the value x′ as 2(2m + 1) or 4(2m + 1), depending on
which of the two values squares to c modulo N.
If 8 | x′ then the oracle outputs ⊥, so we compute c′ = c·2^{−4} (mod N) and call the
oracle on c′. The even integer x′′ computed by the oracle is equal to x′/4 and so the
above argument may apply. In extremely rare cases one might have to repeat the process
(1/2)·log_2(N) times, but the expected number of trials is constant.
Exercise 24.2.17. Prove Theorem 24.2.15.
Exercise 24.2.18. Prove Theorem 24.2.14 when the message space is {0, 1}l2.
The above theorems show that the hardness guarantee for the Rabin cryptosystem is
often stronger than for the RSA cryptosystem (at least, under passive attacks). Hence
the Rabin cryptosystem is very attractive: it has faster public operations and also has a
stronger security guarantee than RSA. On the other hand, the ideas used in the proofs of
these theorems can also be used to give adaptive (CCA) attacks on the Rabin scheme that
allow the attacker to determine the private key (i.e., the factorisation of the modulus). In
comparison, a CCA attack on textbook RSA only decrypts a single message rather than
computes the private key.
Example 24.2.19. We describe a CCA attacker giving a total break of Rabin with
redundancy in the message.
As in the proof of Theorem 24.2.14 the adversary chooses a random x ∈ (Z/NZ)^* such that
neither x nor N − x satisfies the redundancy scheme (i.e., the l least significant bits are
not all 1). Set c = x^2 (mod N) and call the decryption oracle on c. The oracle answers
with either a message m or ⊥. Given m one computes x′ such that gcd(x′ − x, N) splits
N.
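The attack of Example 24.2.19 can be sketched with a simulated decryption oracle for the redundancy in the message scheme. All parameters below are toy values, and the encoding x′ = 2^l·m + (2^l − 1) follows the proof of Theorem 24.2.14.

```python
import random
from math import gcd
from itertools import product

p, q = 347, 359                         # the victim's secret key; p = q = 3 (mod 4)
N, l = p * q, 5
PAD = (1 << l) - 1                      # redundancy: l least significant bits all 1

def square_roots(c):
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    if (mp * mp - c) % p or (mq * mq - c) % q:
        return []
    inv = pow(p, -1, q)
    return [(sp + p * (((sq - sp) * inv) % q)) % N
            for sp, sq in product((mp, p - mp), (mq, q - mq))]

def decryption_oracle(c):
    """Returns the message if exactly one root satisfies the redundancy, else None."""
    ok = [r for r in square_roots(c) if r & PAD == PAD]
    return ok[0] >> l if len(ok) == 1 else None

rng = random.Random(7)
while True:
    x = rng.randrange(1, N)
    # Pick x such that neither x nor N - x satisfies the redundancy scheme.
    if gcd(x, N) != 1 or x & PAD == PAD or (N - x) & PAD == PAD:
        continue
    m = decryption_oracle(pow(x, 2, N))
    if m is not None:
        xp = (m << l) + PAD             # the oracle's root; xp is not +-x by construction
        f = gcd(xp - x, N)
        if 1 < f < N:
            break
assert f in (p, q)
```

Each query succeeds with probability roughly 2^{−(l−1)}, so the expected number of decryption queries is about 2^{l−1}.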
Exercise 24.2.20. Give CCA attacks giving a total break of Rabin when using the other
two redundancy schemes (extra bits and Williams).
As we have seen, the method to prove that Rabin encryption has one-way security
under a passive attack is also the method to give a CCA attack on Rabin encryption. It
was remarked by Williams [629] that such a phenomenon seems to be inevitable. This
remark has been formalised and discussed in detail by Paillier and Villar [474].
Exercise 24.2.21. Generalise Rabin encryption to N = pq where p ≡ q ≡ 1 (mod 3)
and encryption is c = m^3 (mod N). How can one specify redundancy? Is the security
related to factoring in this case?
Exercise 24.2.22. Consider the following public key cryptosystem related to Rabin: A
user's public key is a product N = pq where p and q are primes congruent to 3 modulo
4. To encrypt a message 1 < m < N to the user compute and send
    c_1 = m^2 (mod N)  and  c_2 = (m + 1)^2 (mod N).
24.2.4 Other Computational Problems Related to Factoring
We now give some other computational problems in algebraic groups modulo N that are
related to factoring.
Exercise 24.2.23. Let N = pq be a product of two large primes p ≡ q ≡ 3 (mod 4)
and let G = {x^2 : x ∈ (Z/NZ)^*}. Let A be an oracle for CDH in G (i.e., A(g, g^a, g^b) =
g^{ab}). Show how to use A to factor N.
Exercise 24.2.24. Let N = pq. Show how to factor N when given M = (p + 1)(q + 1).
More generally, given N = pq and M = Φ_k(p)Φ_k(q) one can split N as follows: Write
F_1(x, y) = xy − N and F_2(x, y) = Φ_k(x)Φ_k(y) − M. One then takes the resultant of
F_1(x, y) and F_2(x, y) to get a polynomial G(x). Note that G(x) has p as a root, so one
can find p by taking real roots of G(x) to high precision.
Exercise 24.2.25. Let N = pq = 1125907426181141 and M = (p2 + p + 1)(q 2 + q + 1) =
1267668742445499725931290297061. Determine p and q using resultants as above.
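For the special case M = (p + 1)(q + 1) of Exercise 24.2.24 no resultant is needed: M − N − 1 = p + q, so p and q are the roots of a quadratic with integer coefficients. A minimal sketch, with illustrative primes:

```python
from math import isqrt

def factor_given_M(N, M):
    """Factor N = pq given M = (p + 1)(q + 1) = N + p + q + 1."""
    s = M - N - 1                     # s = p + q
    disc = s * s - 4 * N              # discriminant (p - q)^2
    t = isqrt(disc)
    assert t * t == disc
    return (s - t) // 2, (s + t) // 2

p, q = 1009, 2003                     # illustrative primes
N, M = p * q, (p + 1) * (q + 1)
assert factor_given_M(N, M) == (p, q)
```

The resultant method of the text handles the general cyclotomic case Φ_k, of which this is k = 2.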
Exercise 24.2.26. Let N = pq where p ≡ q ≡ 1 (mod 4) are primes. Recall that the
torus T_2(Z/NZ) has order (p + 1)(q + 1). Let G = {g^2 : g ∈ T_2(Z/NZ)}. Let A be an
oracle for CDH in G. Use the method of Exercise 24.2.23 to factor N using A.
Exercise 24.2.27. Let N = pq where p and q are odd primes and let E : y^2 = x^3 + a_4x + a_6
be an elliptic curve. Suppose A is an oracle that, on input (P, [a]P, [b]P), where P has
odd order, outputs [ab]P ∈ ⟨P⟩. Explain why one cannot immediately factor N using
this oracle. Consider now an oracle A, taking input (a_4, a_6, P, [a]P, [b]P), where P lies on
the elliptic curve y^2 = x^3 + a_4x + a_6 modulo N and P has odd order, that outputs [ab]P.
Show how to use A to factor N.
There are two approaches to using information about #E(Z/N Z) to split N . One is
more suitable when one has an oracle that computes #E(Z/N Z) and the other is more
suitable when E is fixed.
Example 24.2.28. (Kunihiro and Koyama [355]) Let N = pq be a product of two primes.
Let A be an oracle that takes as input (N, a_4, a_6) and returns M = #E(Z/NZ) where
E : y^2 = x^3 + a_4x + a_6.
Given the oracle A one can split N using exactly the same method as Lemma 24.1.17.
First choose a random elliptic curve E together with a point P on it modulo N . Use
the oracle A to compute M = #E(Z/N Z). Now, find small prime factors l of M (such
as l = 2) and compute [M/l]P = (x : y : z) in projective coordinates. There is a good
chance that l divides both #E(Fp ) and #E(Fq ) and that P (mod p) has order divisible
by l but P (mod q) does not. Hence, gcd(z, N ) splits N .
Exercise 24.2.29. (Martín Molleví, Morillo and Villar [397]) Use the method of Example 24.2.28 to show how to factor N given an oracle A that takes as input (N, a_4, a_6, x_P, y_P)
and returns the order of the point P = (x_P, y_P) ∈ E(Z/NZ).
Example 24.2.30. Let N = pq be a product of two primes. Let E : y^2 = x^3 + a_4x + a_6
be an elliptic curve modulo N such that E is not supersingular modulo p or q. Let
M = #E(Z/NZ) be given.
Now choose a random integer 1 ≤ x_P < N. There may not be a point on E(Z/NZ)
with x-coordinate x_P. Indeed, we hope that there is not. Then there is a quadratic
twist E′ : uY^2 = X^3 + a_4X + a_6 of E with a point P = (x_P, y_P) ∈ E′(Z/NZ). With
probability 1/2 we have #E(F_p) = #E′(F_p) but #E(F_q) ≠ #E′(F_q) (or vice versa). It
is not necessary to compute y_P or to determine E′. Using x-coordinate only arithmetic
on E one can compute the projective representation (x_Q : z_Q) for the x-coordinate of
Q = [M](x_P, y_P) on E′. Then gcd(z_Q, N) splits N.
Exercise 24.2.31. Adapt the methods in Examples 24.2.28 and 24.2.30 to give alternative methods to factor N = pq given #T2 (Z/N Z) or #T6 (Z/N Z).
24.3 Homomorphic Encryption
Homomorphic encryption was defined in Section 23.3.1. We first remark that the textbook
RSA scheme is homomorphic for multiplication modulo N: If c_1 ≡ m_1^e (mod N) and c_2 ≡
m_2^e (mod N) then c_1c_2 ≡ (m_1m_2)^e (mod N). Indeed, this property is behind the CCA
attack on textbook RSA encryption. Padding schemes can destroy this homomorphic
feature.
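A minimal numerical check of the multiplicative property (toy parameters only; real moduli are far larger):

```python
p, q = 347, 359                     # toy primes
N, phi = p * q, (p - 1) * (q - 1)
e = 5                               # public exponent, coprime to phi
d = pow(e, -1, phi)                 # private exponent
m1, m2 = 111, 222
c1, c2 = pow(m1, e, N), pow(m2, e, N)
# The product of the ciphertexts decrypts to the product of the messages:
assert pow(c1 * c2, d, N) == (m1 * m2) % N
```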
Exercise 24.3.1. Show that textbook Rabin encryption is not homomorphic for multiplication when using any of the redundancy schemes of Section 24.2.1.
We now give a scheme that is homomorphic for addition, and that allows a much
larger range of values for the message compared with the scheme in Exercise 23.3.5.
Example 24.3.2. (Paillier [471]) Let N = pq be a user's public key. To encrypt a
message m ∈ Z/NZ to the user choose a random integer 1 < u < N (note that, with
overwhelming probability, u ∈ (Z/NZ)^*) and compute the ciphertext

    c = (1 + Nm)u^N (mod N^2).

To decrypt compute

    c^{φ(N)} ≡ 1 + φ(N)Nm (mod N^2)

and hence determine m (mod N) (this requires multiplication by φ(N)^{−1} (mod N)).
The homomorphic property is: if c_1 and c_2 are ciphertexts encrypting m_1 and m_2
respectively, then

    c_1c_2 ≡ (1 + N(m_1 + m_2))(u_1u_2)^N (mod N^2)

encrypts m_1 + m_2 (mod N).
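The whole scheme fits in a few lines; the sketch below uses toy primes (a real deployment would use far larger parameters and the CRT speed-up of Exercise 24.3.7).

```python
from math import gcd

p, q = 347, 359                       # toy primes
N = p * q
N2 = N * N
phi = (p - 1) * (q - 1)               # the decryption key

def encrypt(m, u):
    assert 0 <= m < N and gcd(u, N) == 1
    return (1 + N * m) * pow(u, N, N2) % N2

def decrypt(c):
    x = pow(c, phi, N2)               # x = 1 + N*(m*phi mod N)  (mod N^2)
    return ((x - 1) // N) * pow(phi, -1, N) % N

c1, c2 = encrypt(1234, 17), encrypt(55, 23)
assert decrypt(c1) == 1234
# Homomorphic addition: multiplying ciphertexts adds the messages modulo N.
assert decrypt(c1 * c2 % N2) == (1234 + 55) % N
```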
Exercise 24.3.3. Verify the calculations in Example 24.3.2.
As always, one cannot obtain CCA secure encryption using a homomorphic scheme.
Hence, one is only interested in passive attacks. Checking whether or not a Paillier ciphertext c corresponds to a specific message m is precisely solving the following computational
problem.
Definition 24.3.4. Let N = pq. The composite residuosity problem is: Given
y ∈ Z/N^2Z to determine whether or not y ≡ u^N (mod N^2) for some 1 < u < N.
Exercise 24.3.5. Show that the Paillier encryption scheme has IND-CPA security if and
only if the composite residuosity problem is hard.
Exercise 24.3.6. Show that composite residuosity is not harder than factoring.
Exercise 24.3.7. Show how to use the Chinese remainder theorem to speed up Paillier
decryption.
Encryption using the Paillier scheme is rather slow, since one needs an exponentiation
to the power N modulo N 2 . One can use sliding windows for this exponentiation, though
since N is fixed one might prefer to use an addition chain optimised for N . Exercises 24.3.8
and 24.3.9 suggest variants with faster encryption. The disadvantage of the scheme in
Exercise 24.3.8 is that it requires a different computational assumption. The disadvantage
of the scheme in Exercise 24.3.9 is that it is no longer homomorphic.
Exercise 24.3.8. Consider the following efficient variant of the Paillier cryptosystem.
The public key of a user consists of N and an integer h = u^N (mod N^2) where 1 < u < N
is chosen uniformly at random. To encrypt a message m to the user, choose a random
integer 0 ≤ x < 2^k (e.g., with k = 256) and set

    c ≡ (1 + Nm)h^x (mod N^2).
State the computational assumption underlying the IND-CPA security of the scheme.
Give an algorithm to break the IND-CPA security that requires O(2k/2 ) multiplications
modulo N 2 . Use multi-exponentiation to give an even more efficient variant of the Paillier
cryptosystem, at the cost of even larger public keys.
24.4 Attacks on Textbook RSA and Rabin
The goal of this section is to briefly describe a number of relatively straightforward attacks
on the textbook RSA and Rabin cryptosystems. These attacks can all be prevented if one
uses a sufficiently good padding scheme. Indeed, by studying these attacks one develops
a better idea of what properties are required of a padding scheme.
24.4.1 The Håstad Attack
We now present an attack that can be mounted on the RSA or Rabin schemes in a multi-user
setting. Note that such attacks are not covered by the standard security model for
encryption as presented in Chapter 1.
8 This scheme was proposed by Choi, Choi and Won at ICISC 2001 and an attack was given by Sakurai
and Takagi at ACISP 2002.
Example 24.4.1. Suppose three users have RSA public keys N1 , N2 , N3 and all use
encryption exponent e = 3. Let 0 < m < min{N1 , N2 , N3 } be a message. If m is
encrypted to all three users then an attacker can determine m from the three ciphertexts
c_1, c_2 and c_3 as follows: The attacker uses the Chinese remainder theorem to compute
1 < c < N_1N_2N_3 such that c ≡ m^3 (mod N_i) for 1 ≤ i ≤ 3. It follows that c = m^3 over Z
and so one can determine m using root finding algorithms.
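The attack is exact integer arithmetic throughout; a sketch with toy moduli (all names and values are illustrative):

```python
def icbrt(n):
    """Integer cube root: largest x with x**3 <= n (Newton iteration)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    M = 1
    for n in moduli:
        M *= n
    c = 0
    for r, n in zip(residues, moduli):
        Mi = M // n
        c += r * Mi * pow(Mi, -1, n)
    return c % M

# Three users with e = 3 receive encryptions of the same message m.
moduli = [101 * 103, 107 * 109, 113 * 127]
m = 1234
cts = [pow(m, 3, Ni) for Ni in moduli]
c = crt(cts, moduli)          # c equals m^3 as an integer, since m^3 < N1*N2*N3
assert icbrt(c) == m
```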
This attack is easily prevented by using randomised padding schemes (assuming that
the encryptor is not so lazy that they re-use the same randomness each time). Nevertheless, this attack seems to be one of the reasons why modern systems use e = 65537 = 216 +1
instead of e = 3.
Exercise 24.4.2. Show that the Håstad attack applies when the same message is sent
using textbook Rabin encryption (with any of the three redundancy schemes) to two
users.
Exercise 24.4.3. Two users have Rabin public keys N1 = 144946313 and N2 = 138951937.
The same message m is encrypted using the extra bits padding scheme to the two users,
giving ciphertexts
C1 = (48806038, 1, 1) and C2 = (14277753, 1, 1).
Use the Håstad attack to find the corresponding message.
24.4.2 Algebraic Attacks
We already discussed a number of easy algebraic attacks on textbook RSA, all of which
boil down to exploiting the multiplicative property
    m_1^e m_2^e ≡ (m_1m_2)^e (mod N).
We also noted that, since textbook RSA is deterministic, it can be attacked by trying all
messages. Hence, if one knows that 1 ≤ m < 2^k (for example, if m is a k-bit symmetric
key) then one can attack the system in at most 2^k exponentiations modulo N. We now
show that one can improve this to roughly 2^{k/2} exponentiations in many cases.
Exercise 24.4.4. (Boneh, Joux, Nguyen [82]) Suppose c = m^e (mod N) where 1 ≤ m <
2^k. Show that if m = m_1m_2 for two integers 1 < m_1, m_2 < B then one can determine m
in O(B) exponentiations modulo N. If B = 2^{k/2+ε} then the probability that m splits in
this way is noticeable.
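The splitting idea can be sketched as a meet-in-the-middle computation: store the powers m_2^e, then test whether c/m_1^e matches a stored value. Toy parameters throughout.

```python
p, q = 101, 103
N, e = p * q, 7                      # gcd(7, (p-1)*(q-1)) = 1
m = 35                               # a "small" message that splits as 5 * 7
c = pow(m, e, N)
B = 20

# Baby steps: table of m2^e mod N for all candidate factors m2.
table = {pow(m2, e, N): m2 for m2 in range(2, B + 1)}

# Giant steps: test whether c / m1^e mod N appears in the table.
found = None
for m1 in range(2, B + 1):
    t = c * pow(pow(m1, e, N), -1, N) % N
    if t in table:
        found = m1 * table[t]
        break
assert found == 35
```

The cost is O(B) exponentiations plus O(B) table lookups, as in the exercise.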
24.4.3 Desmedt-Odlyzko Attack
    ∏_{i=1}^{r} s_i^{f_i}.
This attack is not feasible if messages are random elements between 1 and N (as the
probability of smoothness is usually negligible) but it can be effective if messages in the
system are rather small.
Exercise 24.4.5. Let N = 9178628368309 and e = 7 be an RSA public key. Suppose one learns that the signatures of 2, 3 and 5 are 872240067492, 6442782604386 and
1813566093366 respectively. Determine the signatures for messages m = 6, 15, 12 and 100.
An analogous attack applies to encryption: Ask for decryptions of the first r primes
(treating them as ciphertexts) and then, given a challenge ciphertext c, if c = ∏_{i=1}^{r} p_i^{e_i}
then one can work out the decryption of c. Since ciphertexts (even of small messages) are
of size up to N this attack is usually not faster than factoring the modulus.
This idea, together with a number of other techniques, has been used by Coron,
Naccache, Tibouchi and Weinmann [152] to attack real-world signature proposals.
24.4.4 Related Message Attacks
This attack is due to Franklin and Reiter.9 Consider textbook RSA with small exponent
e or textbook Rabin (e = 2). Suppose we obtain ciphertexts c1 and c2 (with respect to the
same public key (N, e)) for messages m and m + a for some known integer a. Then m is a
common root modulo N of the two polynomials F_1(x) = x^e − c_1 and F_2(x) = (x + a)^e − c_2
(in the case of Rabin we may have polynomials like F_1(x) = (2^l(x + 1) − 1)^2 − c_1 or
F_1(x) = (2(2x + 1))^2 − c_1). Hence one can run Euclid's algorithm on F_1(x) and F_2(x)
in (Z/NZ)[x] and this will either lead to a factor of N (since performing polynomial
division in (Z/NZ)[x] involves computing inverses modulo N) or will output, with high
probability, a linear polynomial G(x) = x − m.
Euclid's algorithm for polynomials of degree e has complexity O(e^2 M(log(N))) or
O(M(e) log(e)M(log(N))) bit operations. Hence, this method is feasible only when e is
rather small (e.g., e < 2^30).
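The attack is a short computation with polynomials over Z/NZ. The sketch below implements naive polynomial division and Euclid's algorithm for the case e = 3, with toy parameters chosen for this illustration.

```python
def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def polmod(f, g, N):
    """f mod g over Z/NZ; polynomials are coefficient lists, lowest degree first."""
    f, g = [c % N for c in f], [c % N for c in g]
    trim(f); trim(g)
    while len(f) >= len(g):
        lead = f[-1] * pow(g[-1], -1, N) % N   # a failed inversion would factor N
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - lead * b) % N
        trim(f)
    return f

def polgcd(f, g, N):
    while g:
        f, g = g, polmod(f, g, N)
    return f

# Toy example with e = 3 and a known message difference a.
N, e, a = 83 * 89, 3, 17
m = 1234
c1, c2 = pow(m, e, N), pow(m + a, e, N)
F1 = [-c1, 0, 0, 1]                    # x^3 - c1
F2 = [a**3 - c2, 3 * a**2, 3 * a, 1]   # (x + a)^3 - c2
G = polgcd(F1, F2, N)
# G is (a scalar multiple of) x - m, so the message is recovered as:
recovered = (-G[0] * pow(G[1], -1, N)) % N
assert recovered == m
```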
Exercise 24.4.6. Extend the Franklin-Reiter attack to ciphertexts c_1 and c_2 (again, for
the same public key) where c_1 is an encryption of m and c_2 is an encryption of am + b for
known integers a and b.
Exercise 24.4.7. Let N = 2157212598407 and e = 3. Suppose we have ciphertexts

    c_1 = 1429779991932 and c_2 = 655688908482

such that c_1 is the encryption of m and c_2 is the encryption of m + 2^10. Determine the
message m.
These ideas have been extended by Coppersmith, Franklin, Patarin and Reiter [143].
Among other things they study how to break related encryptions for any polynomial
relation by using resultants (see Exercise 24.4.8).
Exercise 24.4.8. Let (N, e) be an RSA key. Suppose one is given c_1 = m_1^e (mod N),
c_2 = m_2^e (mod N) and a polynomial P(x, y) ∈ Z[x, y] such that P(m_1, m_2) ≡ 0 (mod N).
Let d be a bound on the total degree of P(x, y). Show how to compute m_1 and m_2 in
O((d + e)^3 d^2 M(log(N))) bit operations.
24.4.5 Fixed Pattern Padding
The aim of this section is to present a simple padding scheme, often called fixed pattern
padding for RSA. We then sketch why this approach may not be sufficient to obtain
RSA signatures secure against adaptive attackers. These ideas originate in the work
of De Jonge and Chaum and later work by Girault and Misarsky. We present the more
recent attacks by Brier, Clavier, Coron and Naccache [106]. An attack on RSA encryption
with fixed padding is given in Section 19.4.1.
Example 24.4.9. Suppose we are using moduli of length 3072 bits and that messages
(or message digests) m are of length at most 1000 bits.
The padding scheme uses a fixed value P = 2^3071 and the signature on the message
digest m (such that 0 ≤ m < 2^1000) is

    s = (P + m)^d (mod N).

The verifier computes s^e (mod N) and checks it is of the correct form P + m with
0 ≤ m < 2^1000.
The following method (from [106]) forges signatures if messages are roughly N^{1/3} in
size. We assume that a signing oracle is available (we assume the signing oracle will
only generate signatures if the input is correctly padded) and that a hash function is not
applied to the messages. Suppose m is the target message, so we want to compute the
d-th power of z = P + m. The idea is to find small values u, v, w such that
z(z + u) (z + v)(z + w) (mod N ).
(24.2)
Running the extended Euclidean algorithm on r_1 = N = 1043957 and r_2 = z = P + m = 524791
(quotients 1, 1, 92) gives:

    r_i       s_i    t_i
    1043957     1      0
    524791      0      1
    519166      1     −1
    5625       −1      2
    1666       93   −185
This gives the solution −185z ≡ 1666 (mod N) and |−185| ≈ N^{1/3}. So set s = −185
and r = 1666. We try to factor r and are lucky that 1666 = 2·7^2·17. So choose v = 34
and w = 49. Finally, choose u = v + w − 185 = −102. One can check that

    z(z + u) ≡ (z + v)(z + w) (mod N)

and that z + u, z + v and z + w are all between P and P + 2^10. Hence, if one obtains
signatures s_1, s_2, s_3 on m + u = 401, m + v = 537 and m + w = 552 then one has the
signature on z as s_2s_3s_1^{−1} (mod N).
The success of this attack depends on the cost of factoring r and the probability
that it can be written as a product of integers of similar size. Hence, the attack has
subexponential complexity. For fixed m the attack may not succeed (since r might not
factor as a product of integers of the required size). On the other hand, if m can vary
a little (this is now more like an existential forgery) then the attack should succeed. A
method for existential forgery that does not require factoring is given in Example 24.4.13.
Exercise 24.4.11. Give a variant of the above attack for the case where messages can
be of size N^{1/2} and for which it is only necessary to obtain signatures on two messages.
Exercise 24.4.12. One could consider affine padding Am + B instead of P + m, where
A and B are fixed integers and m is small. Show that, from the point of view of attacks,
the two padding schemes are equivalent.
Example 24.4.13. We now sketch the existential forgery from [106]. As before we seek
messages m_1, . . . , m_4 of size ≈ N^{1/3} such that

    (P + m_1)(P + m_2) ≡ (P + m_3)(P + m_4) (mod N).

Writing m_1 = x + t, m_2 = y + t, m_3 = t and m_4 = x + y + z + t the equation is seen to be
equivalent to Pz ≡ xy − tz (mod N). One again uses Euclid to find s ≈ N^{1/3}, r ≈ N^{2/3}
such that Ps ≡ r (mod N). One sets z = s and then wants to find x, y, t such that
xy = r + tz. To do this choose a random integer N^{1/3} < y < 2N^{1/3} such that gcd(y, z) = 1
and set t ≡ −z^{−1}r (mod y). One then easily solves for the remaining values and one can
check that the m_i are roughly of the right size.
For further details and results we refer to Brier, Clavier, Coron and Naccache [106]
and Lenstra and Shparlinski [371].
This idea, together with other techniques, has been used to cryptanalyse the ISO/IEC
9796-1 signature standard with great success. We refer to Coppersmith, Coron, Grieu,
Halevi, Jutla, Naccache and Stern [142].
24.4.6 Attacks of Bleichenbacher
Example 24.4.14. (Bleichenbacher) Consider a padding scheme for RSA signatures with
e = 3 that is of the following form.
    00 01 | FF FF ... FF | special block | H(m)
In other words, to verify a signature s one computes s3 (mod N ) and checks if the
resulting integer corresponds to a binary string of the above form.
Suppose now that the verification algorithm parses the binary string from the left
hand side (most significant bit) and does not check that H(m) sits in the least significant
bits (this was the case for some padding schemes in practice). In other words, a signature
will verify if s^3 (mod N) is an integer whose binary representation is as follows, where r
is any binary string.

    00 01 | FF FF ... FF | special block | H(m) | r
Bleichenbacher noticed that a forger could choose r to ensure that the integer is a
cube in Z.
Precisely, suppose 2^3071 < N < 2^3072, that the special block is 00 00 00 00 (i.e., 32 zero
bits), and that H has 256-bit output. Let m be a message such that H(m) ≡ 1 (mod 3).
We want to find r such that

    y = 2^3057 − 2^2360 + H(m)2^2072 + r

is a cube. Note that

    (2^1019 − (2^288 − H(m))2^34/3)^3 = 2^3057 − 2^2360 + H(m)2^2072 + 3·2^1087((2^288 − H(m))/3)^2 + z

where |z| < 2^980, and so this is of the right form. To find the right value one can take an
integer of the form y, take its cube root in R and then round up to the nearest integer.
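The cube-root rounding can be checked directly: the window of admissible values has 2^2072 free low-order bits, far wider than the gap between consecutive cubes near 2^1019, so rounding the real cube root up always lands inside it. The hash value below is an arbitrary illustration.

```python
def icbrt(n):
    """Integer cube root (floor), by Newton iteration."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

# Integers whose top bits parse as 00 01 FF..FF | 00000000 | H(m),
# with 2072 don't-care low-order bits.
H = 4
lo = 2**3057 - 2**2360 + H * 2**2072
hi = lo + 2**2072
s = icbrt(lo)
if s**3 < lo:
    s += 1                      # round the real cube root up to the next integer
forged = s**3
assert lo <= forged < hi        # the cube has the required bit pattern
assert forged >> 2072 == lo >> 2072
```

The forged signature is s itself: since s^3 < N, reducing modulo N changes nothing and the broken verifier accepts.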
Exercise 24.4.15. Compute a signature using the method of Example 24.4.14 for the
hash value H(m) = 4. Check your answer.
Bleichenbacher has also given a chosen ciphertext attack on RSA encryption when
using a fixed padding scheme [67]. More precisely, suppose a message m is padded as in
the figure below to form an integer x, and is then encrypted as c = xe (mod N ).
    00 02 | random nonzero bytes | 00 | m
24.5 Attacks on RSA Parameters
In this section we briefly recall some attacks on certain choices of RSA public key.
24.5.1 The Wiener Attack
We now sketch Wiener's idea [626]. We assume the key generation of Figure 24.1 is
used, so that N = pq where p < q < 2p. Consider the equation defining e and d

    ed = 1 + kφ(N)

(a similar attack can be mounted using the equation ed = 1 + kλ(N), see Exercise 24.5.5).
Writing u = p + q − 1, so that φ(N) = N − u, this gives

    kN − ed = ku − 1 < 3k√N.    (24.3)
If d is smaller than N /3 then the right hand side is < N . Hence, one could try to find
d by running the extended Euclidean algorithm on (e, N ) and testing the coefficient of e
to see if it is a candidate value for d (e.g., by testing whether (xe )d x (mod N ) for
a random 1 < x < N ). Note that one must use the basic extended Euclidean algorithm
rather than the faster variant of Algorithm 1. We now explain that this method is
guaranteed to find d when it is sufficiently small.
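A minimal sketch of this procedure in Python (the toy key below, with p = 10007, q = 10009 and d = 23, is my own illustrative choice, not an example from the text):

```python
def wiener_attack(e, N):
    # Run the basic extended Euclidean algorithm on (e, N), tracking the
    # coefficient of e; its absolute values run through the denominators of
    # the convergents of e/N, one of which equals d when d < N**0.25 / 3.
    r0, r1 = e, N
    s0, s1 = 1, 0              # coefficients of e
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        d = abs(s0)
        # test the candidate d: does (x^e)^d == x (mod N) for x = 2?
        if d and pow(pow(2, e, N), d, N) == 2:
            return d
    return None
```

For example, with e = pow(23, -1, 10006 * 10008) and N = 10007 * 10009, the loop recovers d = 23.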
Theorem 24.5.2. Let N = pq where p < q < 2p are primes. Let e ≡ d^{−1} (mod φ(N))
where 0 < d < N^{1/4}/3. Then given (N, e) one can compute d in polynomial time.

For example, running the extended Euclidean algorithm on (e, N) with
N = 86063528783122081 and e = 14772019882186053 gives the following values, where
r_i = s_i N + t_i e and q_i is the quotient at each step; the values |t_i| are the
candidate values for d.

    i    q_i    r_i                  s_i      t_i
    0     -     86063528783122081      1        0
    1     -     14772019882186053      0        1
    2     5     12203429372191816      1       -5
    3     1      2568590509994237     -1        6
    4     4      1929067332214868      5      -29
    5     1       639523177779369     -6       35
    6     3        10497798876761     23     -134
    7    60         9655245173709  -1386     8075
    8     1          842553703052   1409    -8209
Exercise 24.5.5. Show how to perform the Wiener attack when φ(N) is replaced by
λ(N). What is the bound on the size of d for which the attack works?
Exercise 24.5.6. Let (N, e) = (63875799947551, 4543741325953) be an RSA public key
where N = pq with gcd(p − 1, q − 1) > 2 and small private exponent d such that
ed ≡ 1 (mod λ(N)). Use the Wiener attack to find d.
Exercise 24.5.7. Show that one can prevent the Wiener attack by adding a sufficiently
large multiple of φ(N) to e.
Wiener's result has been extended in several ways. Dujella [183] and Verheul and van
Tilborg [617] show how to extend the range of d, while still using Euclid's algorithm.
Their algorithms are exponential time. Boneh and Durfee [78] used lattices to extend the
attack to d < N^{0.284} and, with significant further work, extended the range to d < N^{0.292}.
Blömer and May [71] give a simpler formulation of the Boneh-Durfee attack for d < N^{0.284}.
Some unsuccessful attempts to extend Wiener's method to larger d are discussed by
Suk [591] and Bauer [31].
24.5.2 Small CRT Private Exponents

Suppose the CRT private exponent d_p = d (mod p − 1) is small, say 0 ≤ d_p < K, and
set L = ⌈√K⌉. Choose a random integer m and write d_p = d_1 L + d_0 with 0 ≤ d_0, d_1 < L.
Since e d_p ≡ 1 (mod p − 1) we have m^{e d_p} ≡ m (mod p), and so p divides
gcd(m^{e(d_1 L + d_0) − 1} − 1, N). To search over all pairs (d_0, d_1) efficiently, define

    G(x) = ∏_{j=0}^{L−1} (m^{ej−1} x − 1) (mod N).

This polynomial has degree L and can be constructed using the method in the proof
of Theorem 2.16.1 in O(M(L) log(L) M(log(N))) bit operations. The polynomial G(x)
requires L log_2(N) bits of storage.
Now, compute c = m^{Le} (mod N). We wish to evaluate G(c^{d_1}) (mod N) for each of
the candidate values 0 ≤ d_1 < L (to obtain a list of L values). This can be performed
using Theorem 2.16.1 in O(L log(L)^2 log(log(L)) M(log(N))) bit operations. For each
value G(c^{d_1}) (mod N) in the list we can compute

    gcd(G(c^{d_1}), N)

to see if we have split N. The total running time of the attack is
O(√K log(K)^2 log(log(K)) M(log(N))) bit operations.
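A naive sketch of the attack in Python (computing each G(c^{d_1}) by direct multiplication rather than with fast polynomial arithmetic, so it uses O(L^2) ring operations; the toy key in the usage note is my own construction):

```python
from math import gcd, isqrt

def crt_exponent_attack(N, e, K, m=2):
    # Search for d_p < K with m^(e*d_p) ≡ m (mod p), writing
    # d_p = d1*L + d0 with 0 <= d0, d1 < L.
    L = isqrt(K - 1) + 1                  # so that L*L >= K
    me = pow(m, e, N)
    baby = []                             # baby[j] = m^(e*j - 1) mod N
    t = pow(m, -1, N)
    for j in range(L):
        baby.append(t)
        t = t * me % N
    c = pow(me, L, N)                     # c = m^(e*L) mod N
    giant = 1                             # giant = c^d1
    for d1 in range(L):
        G = 1                             # G = prod_j (baby[j]*giant - 1) mod N
        for b in baby:
            G = G * (b * giant - 1) % N
        g = gcd(G, N)
        if 1 < g < N:
            return g                      # a non-trivial factor of N
        if g == N:                        # unlucky: both primes divide G
            acc = 1
            for b in baby:
                acc = acc * (b * giant - 1) % N
                g = gcd(acc, N)
                if 1 < g < N:
                    return g
        giant = giant * c % N
    return None
```

For example, with p = 10007, q = 20011 and e = pow(1235, -1, p - 1) (so that d_p = 1235 < 2048), crt_exponent_attack(p*q, e, 2048) returns a factor of N.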
Exercise 24.5.9. (Galbraith, Heneghan and McKee [219]) Suppose one chooses private
CRT exponents of bit-length n and Hamming weight w. Use the ideas of this section
together with those of Section 13.6 to give an algorithm to compute a CRT private
exponent, given n and w, with complexity O(√w W log(W)^2 log(log(W)) M(log(N))) bit
operations, where W = (n/2 choose w/2).
When e is also small (e.g., when using the key generation method of Exercise 24.1.5)
then there are lattice attacks on small CRT private exponents. We refer to Bleichenbacher
and May [69] for details.
24.5.3 Large Common Factor of p − 1 and q − 1
Variants of RSA have been proposed for moduli N = pq where there is some integer r
greater than 2 such that both r | (p − 1) and r | (q − 1). For example, as follows from
the solution to Exercise 24.5.5, one can prevent the Wiener attack by taking r large. We
explain in this section why such variants of RSA must be used with caution.
First we remark, following McKee and Pinch [411], that r should not be considered
as a secret. This is because r is a factor of N − 1 and so the elliptic curve method or
the Pollard rho factoring method can be used to compute a, usually short, list of possible
values for r. Note that there is no way to determine the correct value of r from the list, but
the attacks mentioned below can be repeated for each candidate value for r. Certainly, if
r is small then it can be easily found this way. Even if r is large, since factoring N − 1 in
this setting is not harder than factoring N, it follows that the problem of computing r is
not harder than the most basic assumption underlying the scheme.
Even if r is not known, as noted by McKee and Pinch [411], it cannot be too large:
applying the Pollard rho method by iterating the function

    x ↦ x^{N−1} + 1 (mod N)

will produce a sequence that repeats modulo p after O(√(p/r)) terms, on average. Hence,
if r is too large then the factorisation of N will be found even without knowing r.
We now explain a method to factor N when r is known. Suppose N = pq where
p < q < 2p are primes. Write

    p = xr + 1,   q = yr + 1.

Then

    (N − 1)/r = xyr + (x + y) = ur + v                              (24.4)

where u and v (with 0 ≤ v < r) are known and x, y are unknown.
Exercise 24.5.10. Let the notation be as above. Show that if r > 3N^{1/4} then one can
determine x and y in polynomial-time.
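When r > 3N^{1/4} one has x + y = v and xy = u exactly, so x and y are the roots of a quadratic. A sketch (the small example in the usage note, with r = 210, p = 421, q = 631, is my own construction):

```python
from math import isqrt

def factor_given_r(N, r):
    # Assumes p = x*r + 1 and q = y*r + 1 with r > 3*N**0.25, so that in
    # (N - 1)/r = u*r + v we have u = x*y and v = x + y exactly.
    u, v = divmod((N - 1) // r, r)
    disc = v * v - 4 * u           # x, y are the roots of T^2 - v*T + u = 0
    if disc < 0:
        return None
    t = isqrt(disc)
    if t * t != disc:
        return None
    x, y = (v - t) // 2, (v + t) // 2
    p, q = x * r + 1, y * r + 1
    return (p, q) if p * q == N else None
```

For example, factor_given_r(265651, 210) returns (421, 631), since 210 divides both 421 − 1 and 631 − 1.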
Exercise 24.5.11. (McKee and Pinch [411]) Let the notation be as above and suppose
that r < 3N^{1/4}. Write

    x + y = v + cr,   xy = u − c

for some integer c ≥ 0.
Exercise 24.5.12. Let the notation be as above. Show that the exponent of (Z/NZ)^*
divides xyr. Hence, deduce that

    z^{ur} = z^{xyr + cr} ≡ z^{cr} (mod N)

for every z ∈ (Z/NZ)^*. Given r, show how to find c (and hence split N) in an expected
O(N^{1/4} M(log(N))/r) bit operations using the Pollard kangaroo algorithm (one could also
use baby-step-giant-step).
Exercise 24.5.13. Suppose N = pq where p and q are 1536-bit primes such that p − 1
and q − 1 have a large common factor r. Show that, to ensure security against an attacker
who can perform 2^128 operations, one should impose the restriction 1 ≤ r < 2^640.
Exercise 24.5.14. Generalise the above attacks to the case where r | (p+1) and r | (q+1).
24.6 Digital Signatures Based on RSA and Rabin
There are numerous signature schemes based on RSA and Rabin. Due to lack of space
we just sketch two schemes in the random oracle model. Hohenberger and Waters [291]
have given an RSA signature scheme in the standard model whose security relies only on
the Strong-RSA assumption.
24.6.1 Full Domain Hash
A simple way to design RSA signatures that are secure in the random oracle model is to
assume each user has a hash function H : {0,1}^* → (Z/NZ)^* where N is their public
key.^11 Such a hash function is called a full domain hash, since the hash output is the
entire domain of the RSA trapdoor permutation. Constructing such a hash function is
not completely trivial; we refer to Section 3.6. The signature on a message m in this case
is s = H(m)d (mod N ). These ideas were formalised by Bellare and Rogaway, but we
present the slightly improved security result due to Coron.
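Before the security result, the scheme itself can be sketched in a few lines of Python (the counter-based hash expansion below is a naive stand-in for the constructions of Section 3.6, and the tiny modulus in the usage note is for illustration only):

```python
import hashlib

def full_domain_hash(m: bytes, N: int) -> int:
    # Naive full-domain hash: expand SHA-256 in counter mode, reduce mod N.
    k = (N.bit_length() // 256) + 2
    out = b"".join(hashlib.sha256(m + i.to_bytes(4, "big")).digest()
                   for i in range(k))
    return int.from_bytes(out, "big") % N

def fdh_sign(m, d, N):
    # s = H(m)^d (mod N)
    return pow(full_domain_hash(m, N), d, N)

def fdh_verify(m, s, e, N):
    return pow(s, e, N) == full_domain_hash(m, N)
```

For example, with p = 10007, q = 20011, e = 7 and d = e^{−1} (mod λ(N)), a signature on b"message" verifies, while the same signature fails on any other message.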
Theorem 24.6.1. RSA signatures with full domain hash (FDH-RSA) have UF-CMA
security in the random oracle model (i.e., where the full domain hash function is replaced
by a random oracle) if the RSA problem is hard.
Proof: (Sketch) Let A be a perfect^12 adversary playing the UF-CMA game. We build a
simulator that takes an instance (N, e, y) of the RSA problem and, using A as a subroutine,
tries to solve the RSA problem.
The simulator in this case starts by running the adversary A on input (N, e). The
adversary will make queries to the hash function H, and will make signing queries.
The adversary will eventually output a pair (m*, s*) such that s* is a valid signature on
m*. To explain the basic idea of the simulator we remark that if one could arrange that
H(m*) = y then (s*)^e ≡ y (mod N) and the RSA instance is solved.
The simulator simulates the random oracle H in the following way. First, the simulator
will maintain a list of pairs (m, H(m)) where m was a query to the random oracle and
H(m) ∈ (Z/NZ)^* was the value returned. This list is initially empty. For each query
m to the random oracle the simulator first checks if m has already been queried and, if
^11 In practice one designs H : {0,1}^* → {0, 1, ..., N − 1} since the probability that a random element
of Z/NZ does not lie in (Z/NZ)^* is negligible.
^12 The proof in the general case, where the adversary succeeds with non-negligible probability ε, requires
minor modifications.
so, responds with the same value H(m). If not, the simulator chooses a random element
1 < r < N, computes gcd(r, N) (and if this is not 1 then factors N, solves the RSA
instance, and halts), computes z = r^e (mod N) and with some probability 1 − p (we
determine p at the end of the proof) returns z as H(m), and with probability p returns
yz (mod N). The information (m, H(m), r) is stored.
When the simulator is asked by the adversary to sign a message m it performs the
following: First it computes H(m) and the corresponding value r. If H(m) = r^e (mod N)
then the simulator returns s = r. If H(m) = y r^e (mod N) then the simulator fails.
Eventually, the adversary outputs a pair (m*, s*). If H(m*) = y r^e (mod N) where r
is known to the simulator, and if (s*)^e ≡ H(m*) (mod N), then y = (s* r^{−1})^e (mod N)
and so the simulator returns s* r^{−1} (mod N). Otherwise, the simulator fails.
To complete the proof it is necessary to argue that the simulator succeeds with non-negligible
probability. If the adversary makes q_S signing queries then the probability that
the simulator can answer all of them is (1 − p)^{q_S}. The probability that the message m*
corresponds to a random oracle query that allows us to solve the RSA problem is p. Hence,
the probability of success is (ignoring some other negligible factors) (1 − p)^{q_S} p. Assume
that q_S ≥ 1 and that q_S is known (in practice, one can easily learn a rough estimate of
q_S by experimenting with the adversary A). Choose p = 1/q_S so that the probability of
success is (1 − 1/q_S)^{q_S} (1/q_S), which tends to 1/(e q_S) for large q_S (where e = 2.71828...).
Since a polynomial-time adversary can only make polynomially many signature queries
the result follows. We refer to Coron [147] for all the details.
One problem with the full domain hash RSA scheme is the major loss of security (by
a factor of q_S) in Theorem 24.6.1. In other words, the reduction is not tight. This can
be avoided by including an extra random input to the hash function. In other words, an
RSA signature is (s1, s2) such that s2^e ≡ H(m‖s1) (mod N). Then, when the simulator
is asked to output a signature on message m, it can choose a fresh value s1 and define
H(m‖s1) = (s2)^e (mod N) as above. This approach avoids previous queries to H(m‖s1)
with high probability. Hence, the simulator can answer standard hash queries with
y r^e (mod N) and special hash queries during signature generation with r^e (mod N).
This scheme is folklore, but the details are given in Appendix A of Coron [148]. The
drawback is that the extra random value s1 must be included as part of the signature.
The PSS signature padding scheme was designed by Bellare and Rogaway [41] precisely
to allow extra randomness in this way without increasing the size of the signature. We
refer to [148] for a detailed analysis of RSA signatures using the PSS padding.
Exercise 24.6.2. Give a security proof for the RSA full domain hash signature scheme
with verification equation s2^e ≡ H(m‖s1) (mod N).
The above results are all proved in the random oracle model. Paillier [472] has given
some evidence that full domain hash RSA and RSA using PSS padding cannot be proved
secure in the standard model. Theorem 1 of [472] states that if one has a black box
reduction from the RSA problem to selective forgery for a signature scheme under a passive
attack, then under an adaptive chosen message attack one can, in polynomial-time, forge
any signature for any message.
24.6.2 Secure Rabin-Williams Signatures
In this section we give a tight security result, due to Bernstein [47], for Rabin signatures.
We assume throughout that N = pq is a Williams integer; in other words, a product of
primes p ≡ 3 (mod 8) and q ≡ 7 (mod 8) (such integers were discussed in Section 24.2.1).
Exercise 24.6.3. Let N = pq be a Williams integer.
Then (−1/p) = (−1/q) = (2/p) = −1 while (2/q) = 1. Show that, for any integer h ∈ (Z/NZ)^*,
there are unique integers e ∈ {−1, 1} and f ∈ {1, 2} such that efh is a square modulo N.
The signature scheme for public key N is as follows. For a message m ∈ {0,1}^* one
computes H(m) and interprets it as an integer modulo N (with overwhelming probability,
H(m) ∈ (Z/NZ)^*). The signer determines the values e and f as in Exercise 24.6.3 and
determines the four square roots s1, s2, s3, s4 satisfying s_i^2 ≡ H(m)ef (mod N). The
signer then deterministically chooses one of the values s_i (for example, by ordering the
roots as integers s1 < s2 < s3 < s4 and then generating an integer i ∈ {1, 2, 3, 4} using a
pseudorandom number generator with a secret key on input m). The signature is the triple
s = (e, f, (ef)^{−1} s_i (mod N)). It is crucially important that, if one signs the same message
twice, then the same signature is output. To verify a signature s = (e, f, s) for public key
N and message m one computes H(m) and then checks that ef s^2 ≡ H(m) (mod N).
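The tweak (e, f) and the root computation can be sketched in Python (the primes and hash values below are my own toy choices; a real signer would also apply the deterministic root selection described above):

```python
def rw_sign(h, p, q):
    # p ≡ 3 (mod 8), q ≡ 7 (mod 8), h coprime to N = p*q.
    # Exactly one choice of e in {1,-1}, f in {1,2} makes e*f*h a square mod N.
    N = p * q
    for e in (1, -1):
        for f in (1, 2):
            t = e * f * h % N
            if pow(t, (p - 1) // 2, p) == 1 and pow(t, (q - 1) // 2, q) == 1:
                rp = pow(t, (p + 1) // 4, p)   # square roots exist: p, q ≡ 3 (mod 4)
                rq = pow(t, (q + 1) // 4, q)
                s_i = (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % N
                return e, f, pow(e * f, -1, N) * s_i % N

def rw_verify(h, sig, N):
    e, f, s = sig
    return e * f * s * s % N == h % N
```

For example, with p = 10091 and q = 10007, rw_sign produces signatures that rw_verify accepts for any h coprime to N.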
Exercise 24.6.4. Show that if a signer outputs two signatures (e1, f1, s1) and (e2, f2, s2)
for the same message m such that s1 ≢ ±s2 (mod N) then one can factor the modulus.
Exercise 24.6.5. Show that it is not necessary to compute the Jacobi symbol (H(m)/N)
when generating Rabin-Williams signatures as above. Instead, one can compute
H(m)^{(p+1)/4} (mod p) and H(m)^{(q+1)/4} (mod q), as is needed to compute the s_i, and
determine e and f with only a little additional computation.
Theorem 24.6.6. The Rabin-Williams signature scheme sketched above has UF-CMA
security in the random oracle model (i.e., if H is replaced by a random oracle) if factoring
Williams integers is hard and if the pseudorandom generator is indistinguishable from a
random function.
Proof: (Sketch) Let A be a perfect adversary against the Rabin-Williams signature
scheme and let N be a Williams integer to be factored. The simulator runs the adversary
A on N .
The simulator must handle the queries made by A to the random oracle. To do this
it maintains a list of hash values, which is initially empty. When A queries H on m
the simulator first checks whether m appears on the list of hash values, and, if it does,
responds with the same value as previously. If H has not been previously queried on
m the simulator chooses random s ∈ (Z/NZ)^*, e ∈ {−1, 1}, f ∈ {1, 2} and computes
h = ef s^2 (mod N). If 0 ≤ h < 2^ℓ (where {0,1}^ℓ is the range of the random oracle) then
return h and store (m, e, f, s, h) in the list. If h is too big then repeat with a different
choice for (s, e, f). Since N < 2^{ℓ+O(log(ℓ))} the expected number of trials is polynomial
in log(N).
When A makes a signature query on m the simulator first queries H(m) and gets the
values (e, f, s) from the hash list such that H(m) ≡ ef s^2 (mod N). The simulator can
therefore answer with (e, f, s), which is a valid signature. (It is necessary to show that the
values (e, f, s) output in this way are indistinguishable from the values output in the real
cryptosystem, and this requires that the pseudorandom choice of s from among the four
possible roots be computationally indistinguishable from random; note that the adversary
cannot detect whether or not a pseudorandom generator has actually been used since it
does not know the secret key for the generator.)
Finally, A outputs a signature (e*, f*, s*) on a message m*. Recalling the values
(e, f, s) from the construction of H(m*) we have e* = e, f* = f and (s*)^2 ≡ s^2 (mod N).
With probability 1/2 we can factor N as gcd(N, s* − s). We refer to Section 6 of
Bernstein [47] for the full details.
Exercise 24.6.7. Show that if one can find a collision for the hash function H in
Bernstein's variant of Rabin-Williams signatures and one has access to a signing oracle then
one can factor N with probability 1/2.
Exercise 24.6.8. Suppose the pseudorandom function used to select the square root in
Rabin-Williams signatures is a function of H(m) rather than m. Show that, in contrast
to Exercise 24.6.7, finding a collision in H no longer leads to an algorithm to split N . On
the other hand, show that if one can compute a preimage for H and one has access to a
signing oracle then one can factor N with probability 1/2.
Exercise 24.6.9. Adapt the proof of Theorem 24.6.6 to the case where H has full domain
output.
Exercise 24.6.10. Adapt the proof of Theorem 24.6.1 to the case where H : {0,1}^* →
{0,1}^ℓ where 2^ℓ < N < 2^{ℓ+O(log(ℓ))}.
24.6.3 Other Signature and Identification Schemes
Example 24.6.11. Let (N, e) be an RSA public key for a trusted authority, where e is
a large prime. Shamir [543] proposed the following identity-based signature scheme. A
user with identity id ∈ {0,1}^* has a corresponding public key H1(id) ∈ (Z/NZ)^*, where
H1 is a cryptographic hash function. The user obtains their private key s_id such that

    s_id^e ≡ H1(id) (mod N)

from the trusted authority.
To sign a message m ∈ {0,1}^* the user chooses a random integer 1 < r < N, computes
s1 = r^e (mod N), then computes s2 = s_id r^{H2(m‖s1)} (mod N) where H2 : {0,1}^* →
{0, 1, ..., N − 1} is a cryptographic hash function and s1 is represented as a binary string.
The signature is (s1, s2).
To verify a signature one checks that

    s2^e ≡ H1(id) s1^{H2(m‖s1)} (mod N).
Exercise 24.6.12. Show that outputs (s1 , s2 ) of the Shamir signature algorithm do satisfy
the verification equation.
The security of Shamir signatures was analysed by Bellare, Namprempre and Neven [35]
(also see Section 4 of [332]). They show, in the random oracle model, that the scheme is
secure against existential forgery if the RSA problem is hard.
Exercise 24.6.13. Show how to break the Shamir signature scheme under a known
message attack if e is small (for example, e = 3).
Exercise 24.6.14. Consider the following modified version of the Shamir identity-based
signature scheme: The verification equation for signature (s1, s2) on message m and
identity id is

    s2^e ≡ H1(id) s1^{H2(m)} (mod N).

Show how the user with private key s_id can generate signatures. Give a selective forgery
under a known message attack for this scheme.
Exercise 24.6.15. Consider the following modified version of the Shamir identity-based
signature scheme: The verification equation for signature s on message m and identity id
is

    s^e ≡ H1(id)^{H2(m)} (mod N).

Show how the user with private key s_id can generate signatures. Show how to compute
the private key for this scheme under a known message attack.
We now briefly present two interactive identification schemes that are convenient
alternatives to RSA and Rabin signatures for constrained devices. The notion of an
identification scheme was sketched in Section 22.1.1; recall that it is an interactive protocol
between a prover and a verifier.
Example 24.6.16. (Feige, Fiat and Shamir [200]) A prover has public key (N, v_1, ..., v_k),
where N = pq is an RSA modulus and 1 < v_j < N for 1 ≤ j ≤ k. The private key is a
list of integers u_1, ..., u_k such that v_j ≡ u_j^2 (mod N) for 1 ≤ j ≤ k. The identification
protocol is as follows: The prover chooses a random integer 1 < r < N and sends s1 =
r^2 (mod N) to the verifier. The verifier sends a challenge c = (c_1, ..., c_k) ∈ {0,1}^k. The
prover computes

    s2 = r ∏_{j=1}^{k} u_j^{c_j} (mod N).

The verifier checks that

    s2^2 ≡ s1 ∏_{j=1}^{k} v_j^{c_j} (mod N).
One can try to impersonate a user by guessing the challenge c and defining s1 accordingly;
this succeeds with probability 1/2^k. The protocol can be repeated a number of times if
necessary. For example, one might choose k = 20 and repeat the protocol 4 times.
The point is that both signing and verification are only a small number of modular
multiplications (in contrast to Rabin signatures, for which signing requires computing two
large exponentiations). The security is based on factoring (see Exercise 24.6.17).
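One round of the protocol fits in a few lines of Python (toy modulus, honest prover, so the verifier's check always passes; the parameter sizes are my own choices):

```python
import random

def ffs_round(p=10007, q=20011, k=8):
    # One round of Feige-Fiat-Shamir identification with toy parameters.
    N = p * q
    u = [random.randrange(2, N) for _ in range(k)]   # private key
    v = [uj * uj % N for uj in u]                    # public key: v_j = u_j^2
    r = random.randrange(2, N)
    s1 = r * r % N                                   # prover's commitment
    c = [random.randrange(2) for _ in range(k)]      # verifier's challenge bits
    s2 = r
    for uj, cj in zip(u, c):
        if cj:
            s2 = s2 * uj % N                         # prover's response
    # verifier's check: s2^2 == s1 * prod_j v_j^{c_j} (mod N)
    rhs = s1
    for vj, cj in zip(v, c):
        if cj:
            rhs = rhs * vj % N
    return s2 * s2 % N == rhs
```

Note that both sides perform only modular multiplications, no full-size exponentiations, which is the efficiency point made above.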
The public key can be shortened by choosing v1 , . . . , vk to be the first k primes.
Alternatively, the scheme can be made identity-based by defining v1 , . . . , vk as a function
of the identity of a user; the user can get the values ui from the trusted authority who
knows the factorisation of N . The identification scheme can be turned into a signature
scheme (either public key or identity-based) by choosing the value c as the output of
a hash function H(m‖s1); but then k should be taken to be very large. For further
discussion of these schemes see Sections 10.4.2 and 11.4.1 of Menezes, van Oorschot and
Vanstone [415].
Exercise 24.6.17. Let A be an algorithm that takes as input a public key for the
Feige-Fiat-Shamir scheme, outputs a value s1, receives two distinct challenges c1 and c2, and
outputs values s_{2,1} and s_{2,2} satisfying the verification equation for c1 and c2 respectively.
Show how to use A to compute square roots modulo N.
Example 24.6.18. (Guillou and Quisquater [270]) A prover has public key (N, e, u)
where N = pq is an RSA modulus, e is an RSA exponent (i.e., gcd(e, φ(N)) = 1) and
1 < u < N is a randomly chosen integer. The private key is an integer s such that
s^e ≡ u (mod N). The identification protocol is as follows: The prover chooses a random
integer 1 < r < N and sends s1 = r^e (mod N) to the verifier. The verifier sends a
challenge 0 ≤ c < 2^k to the prover, who replies with s2 = r s^c (mod N). The verifier
checks that

    s2^e ≡ s1 u^c (mod N).
When e and c are small then both the prover and verifier have easy computations (in
contrast with RSA, where at least one party must perform a large exponentiation). Hence
this scheme can be more suitable than RSA in constrained environments. The security
depends on the RSA problem (see Exercise 24.6.19).
One can try to impersonate a user by guessing the challenge c and defining s1
accordingly; this succeeds with probability 1/2^k. The protocol can be repeated a number of
times if necessary.
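A round of the protocol can be sketched in the same style as before (toy parameters of my own choosing; an honest prover, so the check passes):

```python
import random

def gq_round(p=10007, q=20011, e=65537, k=16):
    # One round of Guillou-Quisquater identification with toy parameters.
    N = p * q
    s = random.randrange(2, N)        # private key
    u = pow(s, e, N)                  # public key: u = s^e (mod N)
    r = random.randrange(2, N)
    s1 = pow(r, e, N)                 # prover's commitment
    c = random.randrange(2**k)        # verifier's challenge
    s2 = r * pow(s, c, N) % N         # prover's response
    # verifier's check: s2^e == s1 * u^c (mod N)
    return pow(s2, e, N) == s1 * pow(u, c, N) % N
```

Here the exponentiations involve only the small values e and c, which is why the scheme suits constrained environments.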
The scheme can be made identity-based by replacing u with H1(id). Each user can
get their private key s_id such that s_id^e ≡ H1(id) (mod N) from the trusted authority.
The scheme can be turned into a signature scheme (either public key or identity-based)
by setting c = H(m‖s1); but then k should be sufficiently large. A variant with lower
bandwidth is to send only some bits of s1 and change the verification equation to checking
that the appropriate bits of s2^e u^{−c} (mod N) equal those of s1. For further discussion of
these schemes see Sections 10.4.3 and 11.4.2 of Menezes, van Oorschot and Vanstone [415].
Exercise 24.6.19. Let A be an algorithm that takes as input a public key for the
Guillou-Quisquater scheme, outputs a value s1, receives two distinct challenges c1 and c2, and
outputs values s_{2,1} and s_{2,2} satisfying the verification equation for c1 and c2 respectively.
Show how to use A to solve the RSA problem.
24.7 Public Key Encryption Based on RSA and Rabin
24.7.1 Padding Schemes for RSA Encryption
To prevent algebraic attacks such as those mentioned in Sections 1.2 and 24.4 it is
necessary to use randomised padding schemes for encryption with RSA. Three goals of
padding schemes for RSA are listed below.
1. To introduce randomness into the message and expand short messages to full size.
2. To ensure that algebraic relationships among messages do not lead to algebraic
relationships between the corresponding ciphertexts.
3. To ensure that random elements of Z/NZ do not correspond to valid ciphertexts. This
means that access to a decryption oracle in CCA attacks is not useful.
Example 24.7.1. Consider the following naive padding scheme when using 3072-bit RSA
moduli: suppose messages are restricted to be at most 256 bits, and suppose a random
2815-bit value r is appended to the message to bring it to full length (i.e., the value input
to the RSA function is m + 2^256 r). This certainly adds randomness and destroys many
algebraic relationships. However, there is an easy CCA attack on the OWE-security of
RSA with this padding scheme.
To see this, suppose 2^3071 < N < 2^3072 and let c be the ciphertext under attack. With
probability 1/2 the most significant bit of r is zero and so c′ = c 2^e (mod N) is a valid
padding of 2m (except that the most significant bit of m is lost). Hence, by decrypting c′
one determines all but the most significant bit of m. Similarly, if the least significant bit
of m is zero then one can make a decryption query on c′ = 2^{−e} c (mod N).
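The first of these queries is easy to demonstrate with a scaled-down analogue (a 28-bit modulus, 8-bit messages and a fixed "random" padding value; all parameters are my own toy choices):

```python
from math import gcd

p, q = 10007, 20011
N = p * q                                  # toy 28-bit modulus
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
e = 7
d = pow(e, -1, lam)

m = 0xA5                                   # 8-bit message
r = 12345                                  # padding value with top bit zero
x = m + 2**8 * r                           # naive padding: 2*x < N here
c = pow(x, e, N)

# CCA attack: ask the decryption oracle for c' = c * 2^e (mod N)
c2 = c * pow(2, e, N) % N
x2 = pow(c2, d, N)                         # the oracle's answer is 2*x
assert x2 == 2 * x
assert x2 % 2**8 == (2 * m) % 2**8         # low byte reveals all but the top bit of m
```
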
Exercise 24.7.2. Consider the situation of Example 24.7.1 again. Find a CCA attack on
RSA with the padding scheme 1 + 2m + 2^257 r + 2^3071, where 0 ≤ m < 2^256 and 0 ≤ r < 2^2813.
24.7.2 OAEP
In this section we present the OAEP (Optimal Asymmetric Encryption Padding) scheme,
which was developed by Bellare and Rogaway [40] (for more details see Section 5.9.2 of
Stinson [588]). The word optimal refers to the length of the padded message compared
with the length of the original message: the idea is that the additional bits in an OAEP
padding of a message are as small as can be.
The padding scheme takes as input an n-bit message m and outputs an ℓ-bit binary
string S that can then be encrypted (in the case of RSA one encrypts by interpreting S as
an element of Z/NZ and raising to the power e modulo N). Similarly, to decrypt a ciphertext
c corresponding to an RSA-OAEP message one computes c^d (mod N), interprets this
number as a bitstring S, and then unpads to get m or ⊥. Figure 24.3 describes the OAEP
scheme in detail. As usual, a‖b denotes the concatenation of two binary strings and a ⊕ b
denotes the bitwise XOR of two binary strings of the same length.
System Parameters: Suppose we are using (ℓ + 1)-bit RSA moduli (e.g., ℓ = 3071).
Let ℓ0 and ℓ1 be chosen so that no adversary can perform 2^{ℓ_i} operations, e.g. ℓ0 =
ℓ1 = 128. Set n = ℓ − ℓ0 − ℓ1. Messages are defined to be n-bit strings.
Let G be a cryptographic hash function mapping ℓ0-bit strings to (n + ℓ1)-bit strings and
H be a cryptographic hash function from (n + ℓ1)-bit strings to ℓ0-bit strings.
Pad: Let m be an n-bit string.
1. Choose a random ℓ0-bit string R.
2. Set S1 = (m‖0^{ℓ1}) ⊕ G(R), where 0^{ℓ1} denotes an ℓ1-bit string of zeroes.
3. Set S2 = R ⊕ H(S1).
4. Set S = S1‖S2.
Unpad: Given an ℓ-bit string S one calls the low ℓ0 bits S2 and the high n + ℓ1 bits S1.
1. Compute R = S2 ⊕ H(S1).
2. Compute S3 = S1 ⊕ G(R).
3. Check whether the low ℓ1 bits of S3 are all zero; if not, output ⊥ and halt.
4. Otherwise, output the high n bits of S3 as the message m.
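A byte-oriented sketch of the pad/unpad algorithms (SHA-256 in counter mode stands in for G and H, and the toy lengths of 16 bytes each for n, ℓ0 and ℓ1 are for illustration only):

```python
import hashlib

NB, L0, L1 = 16, 16, 16             # n, l0, l1 measured in bytes (toy sizes)

def G(R: bytes) -> bytes:           # l0 bytes -> n + l1 bytes
    return b"".join(hashlib.sha256(b"G" + R + bytes([i])).digest()
                    for i in range(2))[:NB + L1]

def H(S1: bytes) -> bytes:          # n + l1 bytes -> l0 bytes
    return hashlib.sha256(b"H" + S1).digest()[:L0]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_pad(m: bytes, R: bytes) -> bytes:
    S1 = xor(m + b"\x00" * L1, G(R))     # mask the zero-padded message
    S2 = xor(R, H(S1))                   # mask the seed
    return S1 + S2

def oaep_unpad(S: bytes):
    S1, S2 = S[:NB + L1], S[NB + L1:]
    R = xor(S2, H(S1))
    S3 = xor(S1, G(R))
    if S3[NB:] != b"\x00" * L1:
        return None                      # reject: redundancy check failed
    return S3[:NB]
```

Padding a 16-byte message and unpadding returns the message, while tampering with the padded string makes the redundancy check fail with overwhelming probability.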
24.7.3 Rabin-SAEP
Boneh [74] has considered a padding scheme (which he calls SAEP) that is simpler than
OAEP and is suitable for encrypting short messages with Rabin. Note that the restriction
to short messages is not a serious problem in practice, since public key encryption is mainly
used to transport symmetric keys; nevertheless, this restriction means that SAEP is not
an optimal padding. The scheme can also be used for RSA with extremely short public
exponents (such as e = 3). We sketch the proof of security of Rabin encryption using this
padding, as it is relatively straightforward and it is also a nice application of some of the
cryptanalysis techniques already seen in Section 24.4.
Let ℓ be a security parameter (for example, ℓ = 3070). Suppose N = pq is an
RSA modulus such that 2^{ℓ+1} < N < 2^{ℓ+1} + 2^ℓ. We will also assume that p ≡ q ≡
3 (mod 4). Suppose 0 < ℓ0, ℓ1 and ℓ0 + ℓ1 < ℓ (later we will insist that ℓ1 > (ℓ + 2)/2
and 0 < n = ℓ − ℓ0 − ℓ1 < ℓ/4). Let H : {0,1}^{ℓ1} → {0,1}^{n+ℓ0} be a cryptographic hash
function. The SAEP padding of a message m ∈ {0,1}^n is as follows:
1. Set S0 = m‖0^{ℓ0}.
2. Choose a random element R ∈ {0,1}^{ℓ1}.
3. Set S1 = S0 ⊕ H(R).
4. Output S = S1‖R.
To unpad an SAEP bitstring S one first writes S as S1‖R where R ∈ {0,1}^{ℓ1}, then
computes S0 = S1 ⊕ H(R), then writes S0 = m‖S2 where S2 ∈ {0,1}^{ℓ0}. If S2 is not the
all zero string then return ⊥, else return m.
Rabin-SAEP encryption proceeds by taking a message m ∈ {0,1}^n, padding it as
above, interpreting the bitstring as an integer 0 ≤ x < 2^ℓ < N/2, and computing c =
x^2 (mod N). As usual, with overwhelming probability we have gcd(x, N) = 1, so we
assume that this is the case.
To decrypt the ciphertext c one computes the four square roots 1 ≤ x1, ..., x4 < N
of c modulo N in the usual way. Writing these such that x3 = N − x1 and x4 = N − x2, it
follows that exactly two of them are less than N/2. Hence, at most two are less than
2^ℓ. For each root x such that x < 2^ℓ perform the unpad algorithm to get either ⊥ or a
message m. If there are two choices for x, and either both give ⊥ or both give messages,
then output ⊥. Otherwise, output m. An integer c is said to be a valid ciphertext if it
decrypts to a message m, otherwise it is an invalid ciphertext.
Exercise 24.7.3. Show that if S is the output of the pad algorithm on m ∈ {0,1}^n then
m is the output of unpad on S. Then show that if c is a Rabin-SAEP encryption of a
message m ∈ {0,1}^n then the decryption algorithm on c outputs m with overwhelming
probability.
We will now study the security of the scheme. Intuitively, a CCA attacker of the
Rabin-SAEP scheme either computes at least one square root of the challenge ciphertext
c*, or else computes a ciphertext c related to c* in the sense that if c* is an encryption
of m then c is an encryption of m ⊕ z for some bitstring z. In this latter case, there is a
square root of c that differs from the desired square root of c* only in some of the most
significant n bits. The proof of Theorem 24.7.5 shows how either situation leads to the
factorisation of N.
Lemma 24.7.4. Let ℓ ∈ ℕ (with ℓ > 10). Let N = pq, where p ≡ q ≡ 3 (mod 4)
are prime. Suppose further that 2^{ℓ+1} < N < 2^{ℓ+1} + 2^ℓ. Let y = x^2 (mod N) where
0 < x < 2^ℓ is randomly chosen. Then there is an integer 0 < x′ < 2^ℓ such that x′ ≠ x
but y ≡ (x′)^2 (mod N) with probability at least 1/3.
24.7.4 Further Topics
Hybrid Encryption
It is standard in cryptography that encryption is performed using a hybrid system. A
typical scenario is to use public key cryptography to encrypt a random session key K1 kK2 .
Then the document is encrypted using a symmetric encryption scheme such as AES with
the key K1 . Finally a MAC (message authentication code) with key K2 of the symmetric
ciphertext is appended to ensure the integrity of the transmission.
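The pattern can be sketched in a few lines (a SHA-256 counter-mode stream stands in for AES, HMAC plays the role of the MAC, and `pk_encrypt`/`pk_decrypt` are placeholders for any public key scheme; all names here are my own):

```python
import hashlib, hmac, os

def stream(K1: bytes, n: int) -> bytes:
    # SHA-256 in counter mode as a toy substitute for AES
    return b"".join(hashlib.sha256(K1 + i.to_bytes(4, "big")).digest()
                    for i in range(n // 32 + 1))[:n]

def hybrid_encrypt(pk_encrypt, doc: bytes):
    K1, K2 = os.urandom(16), os.urandom(16)         # session key K1||K2
    ct = bytes(a ^ b for a, b in zip(doc, stream(K1, len(doc))))
    tag = hmac.new(K2, ct, hashlib.sha256).digest() # MAC of the ciphertext
    return pk_encrypt(K1 + K2), ct, tag

def hybrid_decrypt(pk_decrypt, key_ct, ct, tag):
    K = pk_decrypt(key_ct)
    K1, K2 = K[:16], K[16:]
    if not hmac.compare_digest(hmac.new(K2, ct, hashlib.sha256).digest(), tag):
        return None                                  # reject tampered ciphertext
    return bytes(a ^ b for a, b in zip(ct, stream(K1, len(ct))))
```
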
Encryption Secure in the Standard Model
Cramer and Shoup [159] have given an encryption scheme, using the universal hash proof
systems framework, based on the Paillier cryptosystem. Their scheme (using N = pq where
p and q are safe primes) has IND-CCA security in the standard model if the composite
residuosity problem is hard (see Definition 24.3.4). We do not have space to present this
scheme.
Hofheinz and Kiltz [290] have shown that the Elgamal encryption scheme of
Section 23.1, when implemented in a certain subgroup of (Z/NZ)^*, has IND-CCA security
in the random oracle model if factoring is hard, and in the standard model if a certain
higher residuosity assumption holds.
Chapter 25
Isogenies of Elliptic Curves
25.1 Isogenies and Kernels
d such that φ(O_E) = O_Ẽ. Such a map is automatically a group homomorphism and has
kernel of size dividing d.
Theorem 9.7.5 states that a separable isogeny φ : E → Ẽ over k may be written in
the form

    φ(x, y) = (φ1(x), c y φ1′(x) + φ3(x)),                          (25.1)

where φ1(x), φ3(x) ∈ k(x), where φ1′(x) = dφ1(x)/dx is the (formal) derivative of the
rational function φ1(x), where c ∈ k is a non-zero constant, and where (writing ã_i for
the coefficients of Ẽ)

    2φ3(x) = −ã1 φ1(x) − ã3 + c(a1 x + a3)φ1′(x).
Lemma 9.6.13 showed that if φ1(x) = a(x)/b(x), in lowest terms, in equation (25.1) then
the degree of φ is max{deg_x(a(x)), deg_x(b(x))}. The kernel of an isogeny with
φ1(x) = a(x)/b(x) is {O_E} ∪ {P = (x_P, y_P) ∈ E(k̄) : b(x_P) = 0}. The kernel of a
separable isogeny of degree d has d elements.
Let E be an elliptic curve over a field k and G a finite subgroup of E(k̄) that is defined
over k. Theorem 9.6.19 states that there is a unique elliptic curve Ẽ (up to isomorphism)
and a separable isogeny φ : E → Ẽ over k such that ker(φ) = G. We sometimes write
Ẽ = E/G. Let ℓ be a prime such that gcd(ℓ, char(k)) = 1. Since E[ℓ] is isomorphic (as a
group) to the product of two cyclic groups, there are ℓ + 1 different subgroups of E[ℓ] of
order ℓ. It follows that there are ℓ + 1 isogenies of degree ℓ, not necessarily defined over
k, from E to other curves (some of these isogenies may map to the same image curve).
As implied by Theorem 9.6.18 and discussed in Exercise 9.6.20, an isogeny is essentially
determined by its kernel. We say that two separable isogenies φ1, φ2 : E → Ẽ are
equivalent isogenies if ker(φ1) = ker(φ2).
Exercise 25.1.1. Let φ : E → Ẽ be a separable isogeny. Show that if λ ∈ Aut(Ẽ)
then λ ∘ φ is equivalent to φ. Explain why φ ∘ λ is not necessarily equivalent to φ for
λ ∈ Aut(E).
Theorem 25.1.2 shows that isogenies can be written as chains of prime-degree isogenies. Hence, in practice one can restrict to studying isogenies of prime degree. This
observation is of crucial importance in the algorithms.
Theorem 25.1.2. Let E and Ẽ be elliptic curves over k and let φ : E → Ẽ be a separable
isogeny that is defined over k. Then φ = φ1 ∘ ⋯ ∘ φk ∘ [n] where φ1, ..., φk are isogenies
of prime degree that are defined over k and deg(φ) = n^2 ∏_{i=1}^{k} deg(φi).
Proof: Theorem 9.6.19 states that φ is essentially determined by its kernel subgroup G and that φ is defined over k if and only if G is. We will also repeatedly use Theorem 9.6.18, which states that an isogeny φ : E → Ẽ defined over k factors as φ = φ₂ ∘ φ₁ (where φ₁ : E → E₁ and φ₂ : E₁ → Ẽ are isogenies over k) whenever ker(φ) has a subgroup G₁ = ker(φ₁) defined over k.

First, let n be the largest integer such that E[n] ⊆ G = ker(φ) and note that φ = φ′ ∘ [n] where [n] : E → E is the usual multiplication by n map. Set i = 1, define E₀ = E and set G = G/E[n]. Now, let ℓ | #G be a prime and let P ∈ G have prime order ℓ. There is an isogeny φᵢ : E_{i−1} → Eᵢ of degree ℓ with kernel ⟨P⟩. Let σ ∈ Gal(k̄/k). Since σ(P) ∈ G but E[ℓ] ⊄ G it follows that σ(P) ∈ ⟨P⟩ and so ⟨P⟩ is defined over k. It follows that φᵢ is defined over k. Replace G by φᵢ(G) ≅ G/⟨P⟩ and repeat the argument. □
Exercise 25.1.3. How must the statement of Theorem 25.1.2 be modified if the requirement that φ be separable is removed?
25.1.1 Vélu's Formulae

We now present explicit formulae, due to Vélu [613], for computing a separable isogeny from an elliptic curve E with given kernel G. These formulae work in any characteristic. As motivation for Vélu's formulae we now revisit Example 9.6.9.
Example 25.1.5. Let E : y² = x³ + x and consider the subgroup of order 2 generated by the point (0, 0). From Example 9.2.4 we know that the translation by (0, 0) map is given by

    τ_{(0,0)}(x, y) = (1/x, −y/x²).

Hence, it follows that functions invariant under this translation map include

    X = x + 1/x = (x² + 1)/x  and  Y = y − y/x² = y(x² − 1)/x²,

which satisfy

    Y² = (x⁶ − x⁴ − x² + 1)/x³ = X³ − 4X.

It follows that the map

    φ(x, y) = ((x² + 1)/x, y(x² − 1)/x²)

is an isogeny from E to Ẽ : Y² = X³ − 4X.

We remark that φ can also be written as

    φ(x, y) = (y²/x², y(x² − 1)/x²)

and can be written projectively as

    φ(x : y : z) = (x(x² + z²) : y(x² − z²) : x²z)
                 = (y(x² + z²) : xy² − x²z − z³ : xyz)
                 = (y²z : y(x² − z²) : x²z)
                 = (xy² : y(y² − 2xz) : x³).
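The identities in Example 25.1.5 can be checked numerically. The following sketch (plain Python) maps a few points of E(F_p) through φ and verifies that the images satisfy the equation of Ẽ; the prime p = 10007 is an arbitrary illustrative choice, not one from the text.

```python
# Numerical check of Example 25.1.5 over a finite field: the map
#   phi(x, y) = ((x^2 + 1)/x, y(x^2 - 1)/x^2)
# sends points of E: y^2 = x^3 + x to points of Etilde: Y^2 = X^3 - 4X.
# The prime p = 10007 is an arbitrary illustrative choice.
p = 10007

def on_E(x, y):
    return (y * y - (x**3 + x)) % p == 0

def on_Etilde(X, Y):
    return (Y * Y - (X**3 - 4 * X)) % p == 0

def phi(x, y):
    inv_x = pow(x, p - 2, p)                  # x^(-1) mod p, p prime
    X = (x * x + 1) * inv_x % p
    Y = y * (x * x - 1) * inv_x * inv_x % p
    return X, Y

# Pick out ten affine points of E(F_p) away from the kernel point (0, 0).
sqrt_mod = {}
for y in range(p):
    sqrt_mod.setdefault(y * y % p, y)

points = []
for x in range(1, p):
    rhs = (x**3 + x) % p
    if rhs in sqrt_mod:
        points.append((x, sqrt_mod[rhs]))
    if len(points) == 10:
        break

assert all(on_E(x, y) for x, y in points)
assert all(on_Etilde(*phi(x, y)) for x, y in points)
print("all ten image points lie on Etilde")
```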
Theorem 25.1.6. (Vélu) Let E be an elliptic curve over k defined by the polynomial

    F(x, y) = x³ + a₂x² + a₄x + a₆ − y² − a₁xy − a₃y = 0.

Let G be a finite subgroup of E(k̄). Let G₂ be the set of points in G − {O_E} of order 2 and let G₁ be such that #G = 1 + #G₂ + 2#G₁ and

    G = {O_E} ∪ G₂ ∪ G₁ ∪ {−Q : Q ∈ G₁}.

Write

    F_x = ∂F/∂x = 3x² + 2a₂x + a₄ − a₁y  and  F_y = ∂F/∂y = −2y − a₁x − a₃.

For Q = (x_Q, y_Q) ∈ G₁ ∪ G₂ define

    t(Q) = F_x(Q) if Q ∈ G₂,    t(Q) = 2F_x(Q) − a₁F_y(Q) if Q ∈ G₁,
    u(Q) = F_y(Q)²,
    t(G) = Σ_{Q∈G₁∪G₂} t(Q),    w(G) = Σ_{Q∈G₁∪G₂} (u(Q) + x_Q t(Q)),

and set

    A₁ = a₁, A₂ = a₂, A₃ = a₃, A₄ = a₄ − 5t(G), A₆ = a₆ − (a₁² + 4a₂)t(G) − 7w(G).

Then the map φ : (x, y) ↦ (X, Y) where

    X = x + Σ_{Q∈G₁∪G₂} ( t(Q)/(x − x_Q) + u(Q)/(x − x_Q)² )

and

    Y = y − Σ_{Q∈G₁∪G₂} ( u(Q)(2y + a₁x + a₃)/(x − x_Q)³ + t(Q)(a₁(x − x_Q) + y − y_Q)/(x − x_Q)² + (a₁u(Q) − F_x(Q)F_y(Q))/(x − x_Q)² )

is a separable isogeny from E to Ẽ : Y² + A₁XY + A₃Y = X³ + A₂X² + A₄X + A₆ with ker(φ) = G, and

    φ*( dX/(2Y + A₁X + A₃) ) = dx/(2y + a₁x + a₃).
Proof: (Sketch) The basic idea (as used in Example 25.1.5) is that the function

    X(P) = Σ_{Q∈G} x(P + Q)

on E is invariant under G (in the sense that X = X ∘ τ_Q for all Q ∈ G) and so can be considered as defined on E/G. To simplify some calculations sketched below it turns out to be more convenient to subtract the constant Σ_{Q∈G−{O_E}} x(Q) from X. (Note that x(Q) = x_Q.) Let t = x/y be a uniformizer on E at O_E (one could also take t = −x/y, but this makes the signs more messy). The function x can be written as

    x = t⁻² − a₁t⁻¹ − a₂ − a₃t − (a₁a₃ + a₄)t² − ⋯

(for more details about the expansions of x, y and ω_E in terms of power series see Section IV.1 of Silverman [560]). It follows that

    X = t⁻² − a₁t⁻¹ + ⋯

and so v_{O_E}(X) = −2.

One can also show that y = t⁻³ − a₁t⁻² − a₂t⁻¹ − ⋯. The function Y(P) = Σ_{Q∈G} y(P + Q) is invariant under G and has v_{O_E}(Y) = −3. One can therefore show (see Section 12.3 of Washington [622]) that the subfield k(X, Y) of k(x, y) is the function field of an elliptic curve Ẽ (Washington [622] does this in Lemma 12.17 using the Hurwitz genus formula). The map φ : (x, y) ↦ (X, Y) is therefore an isogeny of elliptic curves. By considering the expansions in terms of t one can show that the equation for the image curve is Y² + A₁XY + A₃Y = X³ + A₂X² + A₄X + A₆ where the coefficients Aᵢ are as in the statement of the Theorem.

Now, let ω_E = dx/(2y + a₁x + a₃). One has dx = (−2t⁻³ + a₁t⁻² + ⋯)dt and 2y + a₁x + a₃ = 2t⁻³ − a₁t⁻² + ⋯ and so ω_E = (−1 + ⋯)dt. Similarly,

    φ*(ω_Ẽ) = d(X ∘ φ)/(2(Y ∘ φ) + A₁(X ∘ φ) + A₃) = d(t⁻² − a₁t⁻¹ + ⋯)/(2t⁻³ + ⋯) = (−1 + ⋯)dt.

It follows that the isogeny is separable and that φ*(ω_Ẽ) = f ω_E for some function f. Further, div(ω_E) = 0 and div(φ*(ω_Ẽ)) = φ*(div(ω_Ẽ)) = 0 (by Lemma 8.5.36, since φ is unramified¹) and so div(f) = 0. It follows that f is a constant, and the power series expansions in terms of t imply that f = 1 as required.
Write the isogeny as φ(x, y) = (φ₁(x), yφ₂(x) + φ₃(x)). By Theorem 9.7.5 the isogeny is determined by φ₁(x) (for the case char(k) = 2 see Exercise 9.7.6). Essentially, one only has to prove Vélu's formula for φ₁(x); we do this now. First, change the definition of X to

    X(P) = x_P + Σ_{Q∈G−{O_E}} (x_{P+Q} − x_Q)

where P is a generic point (i.e., P = (x_P, y_P) where x_P and y_P are variables) on the elliptic curve and Q ∈ G − {O_E}. Let F(x, y) be as in the statement of the Theorem and let y = l(x) be the equation of the line through P and Q (so that l(x) = λ(x − x_Q) + y_Q where λ = (y_P − y_Q)/(x_P − x_Q)). Define

    F₁(x) = F(x, l(x)) = (x − x_Q)(x − x_P)(x − x_{P+Q}).

Further,

    (∂F₁/∂x)(Q) = (x_Q − x_P)(x_Q − x_{P+Q})

and

    ∂F₁/∂x = ∂F/∂x + (∂F/∂y)(∂l/∂x) = F_x + λF_y.

Hence, x_{P+Q} − x_Q = F_x(Q)/(x_P − x_Q) + (y_P − y_Q)F_y(Q)/(x_P − x_Q)². One now considers two cases: When [2]Q = O_E then F_y(Q) = 0. When [2]Q ≠ O_E then it is convenient to consider

    (x_{P+Q} − x_Q) + (x_{P−Q} − x_{−Q}).

Now, x_{−Q} = x_Q, y_{−Q} = y_Q + F_y(Q), F_x(−Q) = F_x(Q) − a₁F_y(Q) and F_y(−Q) = −F_y(Q). The formula for φ₁(x) follows.

Now we sketch how to obtain the formula for the Y-coordinate of the isogeny in the case char(k) ≠ 2. Note that φ₁(x) = x + Σ_Q ( t(Q)/(x − x_Q) + u(Q)/(x − x_Q)² ) and so φ₁′(x) = 1 − Σ_Q ( t(Q)/(x − x_Q)² + 2u(Q)/(x − x_Q)³ ). Using φ₃(x) = (−A₁φ₁(x) − A₃ + (a₁x + a₃)φ₁′(x))/2 and A₁ = a₁, A₃ = a₃ one computes

    Y = yφ₁′(x) + φ₃(x)
      = y ( 1 − Σ_Q ( t(Q)/(x − x_Q)² + 2u(Q)/(x − x_Q)³ ) )
        − ((a₁x + a₃)/2) Σ_Q ( t(Q)/(x − x_Q)² + 2u(Q)/(x − x_Q)³ )
        − (a₁/2) Σ_Q ( t(Q)/(x − x_Q) + u(Q)/(x − x_Q)² )
      = y − Σ_Q ( u(Q)(2y + a₁x + a₃)/(x − x_Q)³
        + t(Q)(y + a₁(x − x_Q) − y_Q)/(x − x_Q)²
        + ( t(Q)((a₁x_Q + a₃)/2 + y_Q) + a₁u(Q)/2 )/(x − x_Q)² ).

It suffices to show that the numerator of the final term in the sum is equal to a₁u(Q) − F_x(Q)F_y(Q). However, this follows easily by noting that (a₁x_Q + a₃)/2 + y_Q = −F_y(Q)/2, u(Q) = F_y(Q)², and using the facts that F_y(Q) = 0 when [2]Q = O_E and t(Q) = 2F_x(Q) − a₁F_y(Q) otherwise. □

¹This was already discussed in Section 9.6. One can directly see that separable isogenies are unramified since if φ(P₁) = P₂ then the set of pre-images under φ of P₂ is {P₁ + Q : Q ∈ ker(φ)}.
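Theorem 25.1.6 is easy to instantiate in code for the simplest case. The sketch below (plain Python, toy parameters of my own choosing) applies the formulae to a short Weierstrass curve y² = x³ + ax + b over F_p with kernel generated by a single 2-torsion point, and checks the result against Example 25.1.5.

```python
# A minimal sketch of Velu's formulae (Theorem 25.1.6), specialised to a
# short Weierstrass curve E: y^2 = x^3 + a*x + b over F_p and a kernel
# G = {O, Q} generated by one 2-torsion point Q = (xQ, 0).
# (So a1 = a2 = a3 = 0, G1 is empty and G2 = {Q}.)
def velu_2isogeny(p, a, b, xQ):
    t = (3 * xQ * xQ + a) % p        # t(Q) = Fx(Q), since Q is in G2
    u = 0                            # u(Q) = Fy(Q)^2 = (2*0)^2 = 0
    w = (u + xQ * t) % p             # w(G)
    A4 = (a - 5 * t) % p             # A4 = a4 - 5 t(G)
    A6 = (b - 7 * w) % p             # A6 = a6 - 7 w(G)  (the b2-term vanishes)

    def phi(x, y):
        inv = pow(x - xQ, p - 2, p)
        X = (x + t * inv + u * inv * inv) % p
        # With a1 = a3 = 0, u(Q) = 0 and yQ = 0 the Y-formula reduces to
        #   Y = y - t(Q) * y / (x - xQ)^2
        Y = (y - t * y * inv * inv) % p
        return X, Y

    return A4, A6, phi

# Example: E: y^2 = x^3 + x over F_11 with kernel point (0, 0).
p = 11
A4, A6, phi = velu_2isogeny(p, 1, 0, 0)
assert (A4, A6) == ((-4) % p, 0)     # image curve Y^2 = X^3 - 4X (Example 25.1.5)
X, Y = phi(5, 3)                     # (5, 3) lies on E since 3^2 = 5^3 + 5 mod 11
assert (Y * Y - (X**3 + A4 * X + A6)) % p == 0
print("image point", (X, Y), "lies on the codomain curve")
```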
Corollary 25.1.7. Let E be an elliptic curve defined over k and G a finite subgroup of E(k̄) that is defined over k. Then there is an elliptic curve Ẽ = E/G defined over k and an isogeny φ : E → Ẽ defined over k with ker(φ) = G.

Proof: It suffices to show that the values t(G), w(G) and the rational functions X and Y in Theorem 25.1.6 are fixed by any σ ∈ Gal(k̄/k). □
Corollary 25.1.8. Let φ : E → Ẽ be a separable isogeny of odd degree ℓ between elliptic curves over k. Write φ(x, y) = (φ₁(x), φ₂(x, y)), where φ₁(x) and φ₂(x, y) are rational functions. Then φ₁(x) = u(x)/v(x)², where deg(u(x)) = ℓ and deg(v(x)) = (ℓ − 1)/2. Also, φ₂(x, y) = (y w₁(x) + w₂(x))/v(x)³, where deg(w₁(x)) ≤ 3(ℓ − 1)/2 and deg(w₂(x)) ≤ (3ℓ − 1)/2.

Exercise 25.1.9. Prove Corollary 25.1.8.
Definition 25.1.10. An isogeny φ : E → Ẽ is normalised if φ*(ω_Ẽ) = ω_E.

Vélu's formulae give a normalised isogeny. Note that normalised isogenies are incompatible with Theorem 9.7.2 (which, for example, implies [m]*(ω_E) = mω_E). For this reason, in many situations one needs to take an isomorphism from Ẽ to obtain the desired isogeny. Example 25.1.12 shows how this works.
have the same j-invariant, but they are clearly not the same Weierstrass equation. Hence, the Vélu isogeny with kernel E[2] is not the isogeny [2] : E → E.

To recover the map [2] one needs to find a suitable isomorphism from Ẽ to E. The isomorphism will have the form (X, Y) ↦ (u²X + r, u³Y + su²X + t) where we must have u = 1/2 to have the correct normalisation for the action of the isogeny on the invariant differential (see Exercise 25.1.13). One can verify that taking r = 291, s = 233 and t = 67 gives the required isomorphism from Ẽ to E and that the composition of the Vélu isogeny and this isomorphism is the map [2].
Exercise 25.1.13. Show that if ψ : (x, y) ↦ (u²x + r, u³y + su²x + t) is an isomorphism from E to Ẽ then ψ*(ω_Ẽ) = (1/u)ω_E.
Exercise 25.1.14. Determine the complexity of constructing and computing the Vélu isogeny. More precisely, show that if d = #G and G ⊆ E(F_{q^n}) then O(dM(n, q)) bit operations are sufficient, where M(n, q) = M(n log(nq)) is the number of bit operations to multiply two degree n polynomials over F_q.

Further, show that if d is an odd prime then n ≤ d − 1 and so the complexity can be written as O(d^{2+ε} log(q)^{1+ε}) bit operations.
where the sᵢ are the i-th symmetric polynomials in the roots of ψ(x) (equivalently, in the x-coordinates of elements of G₁). Define b₂ = a₁² + 4a₂, b₄ = 2a₄ + a₁a₃ and b₆ = a₃² + 4a₆. Then there is an isogeny φ : E → Ẽ, with ker(φ) = G, of the form φ(x, y) = (A(x)/ψ(x)², B(x, y)/ψ(x)³) where A(x) and B(x, y) are polynomials. Indeed,

    A(x)/ψ(x)² = (2d + 1)x − 2s₁ − (4x³ + b₂x² + 2b₄x + b₆)(ψ′(x)/ψ(x))′ − (6x² + b₂x + b₄)(ψ′(x)/ψ(x)).

The proof of Lemma 25.1.16 is given as a sequence of exercises.
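The displayed formula for A(x)/ψ(x)² can be checked against the raw sums of Theorem 25.1.6 in a small case. The sketch below uses the 3-isogeny on E : y² = x³ + 1 with kernel generated by (0, 1), so ψ(x) = x and d = 1; the derivative of ψ′/ψ is entered by hand, and the sample points are arbitrary rationals.

```python
# Consistency check of Lemma 25.1.16 against Theorem 25.1.6 for the
# 3-isogeny on E: y^2 = x^3 + 1 with kernel generated by Q = (0, 1):
# psi(x) = x, d = 1, l = 2d + 1 = 3, and b2 = b4 = 0, b6 = 4.
from fractions import Fraction

xQ, d = 0, 1
b2, b4, b6 = 0, 0, 4
tQ = 6 * xQ**2 + b2 * xQ + b4                     # t(Q), cf. Exercise 25.1.17
uQ = 4 * xQ**3 + b2 * xQ**2 + 2 * b4 * xQ + b6    # u(Q) = Fy(Q)^2

def velu_X(x):
    # X = x + t(Q)/(x - xQ) + u(Q)/(x - xQ)^2, summed over G1 = {Q}
    return x + Fraction(tQ) / (x - xQ) + Fraction(uQ) / (x - xQ) ** 2

def lemma_X(x):
    # A(x)/psi(x)^2 = (2d+1)x - 2 s1
    #   - (4x^3 + b2 x^2 + 2 b4 x + b6) (psi'/psi)'
    #   - (6x^2 + b2 x + b4) (psi'/psi)
    # For psi(x) = x:  psi'/psi = 1/x  and  (psi'/psi)' = -1/x^2.
    s1 = xQ
    log_d = Fraction(1) / x                 # psi'(x)/psi(x)
    log_dd = -Fraction(1) / x ** 2          # its derivative
    return ((2 * d + 1) * x - 2 * s1
            - (4 * x**3 + b2 * x**2 + 2 * b4 * x + b6) * log_dd
            - (6 * x**2 + b2 * x + b4) * log_d)

for x in [Fraction(1), Fraction(2), Fraction(5, 3), Fraction(-7, 2)]:
    assert velu_X(x) == lemma_X(x)
print("lemma formula matches the Velu sum at all sample points")
```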
Exercise 25.1.17. Let the notation be as in Lemma 25.1.16. Let F_x(Q), F_y(Q), t(Q) and u(Q) be as in Theorem 25.1.6. Show that

    t(Q) = 6x_Q² + b₂x_Q + b₄

and

    u(Q) = 4x_Q³ + b₂x_Q² + 2b₄x_Q + b₆.
Exercise 25.1.18. Let the notation be as in Lemma 25.1.16. Let F_x(Q), F_y(Q), t(Q) and u(Q) be as in Theorem 25.1.6. Show that

    x_Q/(x − x_Q)  = x/(x − x_Q) − 1,
    x_Q/(x − x_Q)² = x/(x − x_Q)² − 1/(x − x_Q),
    x_Q²/(x − x_Q)  = x²/(x − x_Q) − x − x_Q,
    x_Q²/(x − x_Q)² = x²/(x − x_Q)² − 2x/(x − x_Q) + 1,
    x_Q³/(x − x_Q)² = x³/(x − x_Q)² − 3x²/(x − x_Q) + 2x + x_Q.

Show that S₁ = Σ_Q 1/(x − x_Q) = ψ′(x)/ψ(x) and that S₂ = Σ_Q 1/(x − x_Q)² = −(ψ′(x)/ψ(x))′ = ((ψ′(x))² − ψ(x)ψ′′(x))/ψ(x)², where the sums are over the roots x_Q of ψ(x).

Exercise 25.1.20. Complete the proof of Lemma 25.1.16.
Exercise 25.1.21. Determine the complexity of using Lemma 25.1.16 to compute isogenies over finite fields. More precisely, show that if G ⊆ E(F_{q^n}) is defined over F_q and d = #G then one can compute ψ(x) in O(d²) operations in F_q. Once ψ(x) ∈ F_q[x] is computed, show that one can compute the polynomials A(x) and B(x, y) for the isogeny in O(d) operations in F_q.
25.2 Isogenies from j-invariants

Vélu's formulae require that one knows the kernel of the desired isogeny. But in some applications one wants to take a k-rational isogeny of a given degree d (assuming such an isogeny exists) from E to another curve Ẽ (where Ẽ may or may not be known), and one does not know a specific kernel. By Theorem 25.1.2 one can restrict to the case when d = ℓ is prime. We usually assume that ℓ is odd, since the case ℓ = 2 is handled by points of order 2 and Vélu's formulae.

One solution is to choose a random point P ∈ E[ℓ] that generates a k-rational subgroup of order ℓ. To find such a point, compute the ℓ-division polynomial (which has degree (ℓ² − 1)/2 when ℓ is odd) and find irreducible factors of it in k[x] of degree up to (ℓ − 1)/2. Roots of such factors are x-coordinates of points of order ℓ, and one can determine whether or not they generate a k-rational subgroup by computing all points in the subgroup. Roots of factors of degree d′ > (ℓ − 1)/2 cannot give rise to k-rational subgroups of order ℓ. This approach is expensive when ℓ is large for a number of reasons. For a start, finding the factors of degree at most (ℓ − 1)/2 of a polynomial of degree (ℓ² − 1)/2 in F_q[x] takes Θ(ℓ³ log(ℓ) log(q)) bit operations.
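For ℓ = 3 and a curve in short Weierstrass form the division polynomial is explicit, ψ₃(x) = 3x⁴ + 6ax² + 12bx − a², so the naive method can be illustrated directly. A toy search over F₇ (parameters chosen purely for illustration):

```python
# Naive method for l = 3 on E: y^2 = x^3 + a*x + b over F_q: the
# 3-division polynomial psi_3(x) = 3x^4 + 6a x^2 + 12b x - a^2 has degree
# (l^2 - 1)/2 = 4, and roots x0 in F_q give candidate rational kernels
# {O, (x0, y0), (x0, -y0)}.  Toy example: E: y^2 = x^3 + 1 over F_7.
q, a, b = 7, 0, 1

def psi3(x):
    return (3 * x**4 + 6 * a * x**2 + 12 * b * x - a * a) % q

roots = [x for x in range(q) if psi3(x) == 0]

kernels = []
for x0 in roots:
    rhs = (x0**3 + a * x0 + b) % q
    ys = [y for y in range(q) if y * y % q == rhs]
    # The subgroup {O, (x0, y0), (x0, -y0)} is F_q-rational whenever
    # x0 lies in F_q; here y0 happens to lie in F_q as well.
    if ys:
        kernels.append((x0, ys[0]))

print("x-coordinates of 3-torsion in F_q:", roots)
print("rational order-3 kernels generated by:", kernels)
```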
A more elegant approach is to use the ℓ-th modular polynomial. It is beyond the scope of this book to present the theory of modular functions and modular curves (some basic references are Sections 5.2 and 5.3 of Lang [363] and Section 11.C of Cox [156]). The fundamental fact is that there is a symmetric polynomial, called the modular polynomial², Φ_ℓ(x, y) ∈ Z[x, y] such that if E is an elliptic curve over a field k and Ẽ is an elliptic curve over k then there is a separable isogeny of degree ℓ (where gcd(ℓ, char(k)) = 1) with cyclic kernel from E to Ẽ if and only if Φ_ℓ(j(E), j(Ẽ)) = 0 (see Theorem 5, Section 5.3 of Lang [363]). The modular polynomial Φ_ℓ(x, y) is a singular model for the modular curve X₀(ℓ) over Q. This modular curve is a moduli space in the sense that a (non-cusp) point of X₀(ℓ)(k) corresponds to a pair (E, G) where E is an elliptic curve over k and where G is a cyclic subgroup of E, defined over k, of order ℓ. Note that it is possible to have an elliptic curve E together with two distinct cyclic subgroups G₁ and G₂ of order ℓ such that the image curves E/G₁ and E/G₂ are isomorphic; in this case (E, G₁) and (E, G₂) are distinct points of X₀(ℓ) but correspond to a repeated root of Φ_ℓ(j(E), y) (it follows from the symmetry of Φ_ℓ(x, y) that this is a singular point on the model). In other words, a repeated root of Φ_ℓ(j(E), y) corresponds to non-equivalent ℓ-isogenies from E to some elliptic curve Ẽ.

Since there are ℓ + 1 cyclic subgroups of E[ℓ] it follows that Φ_ℓ(j(E), y) has degree ℓ + 1. Indeed, Φ_ℓ(x, y) = x^{ℓ+1} + y^{ℓ+1} − (xy)^ℓ + ⋯ with all other terms of lower degree (see Theorem 11.18 of Cox [156] or Theorem 3 of Section 5.2 of Lang [363]). The coefficients of Φ_ℓ(x, y) are large (as seen in Example 25.2.1, even when ℓ = 2 the coefficients are large).

Example 25.2.1.

    Φ₂(x, y) = x³ + y³ − x²y² + 1488(x²y + xy²) − 162000(x² + y²)
               + 40773375xy + 8748000000(x + y) − 157464000000000.
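Example 25.2.1 can be exercised directly. The sketch below evaluates Φ₂, checks the classical facts Φ₂(0, y) = (y − 54000)³ and Φ₂(1728, 1728) = 0 (the curve y² = x³ + x of Example 25.1.5 is 2-isogenous to y² = x³ − 4x, and both have j = 1728), and finds the roots of Φ₂(j(E), y) over a small prime field; the prime 1009 is an arbitrary illustrative choice.

```python
# The classical modular polynomial Phi_2 of Example 25.2.1, used to find
# j-invariants 2-isogenous to a given j.
def phi2(x, y):
    return (x**3 + y**3 - x**2 * y**2
            + 1488 * (x**2 * y + x * y**2)
            - 162000 * (x**2 + y**2)
            + 40773375 * x * y
            + 8748000000 * (x + y)
            - 157464000000000)

# Two standard sanity checks over Z.
assert phi2(1728, 1728) == 0
assert all(phi2(0, y) == (y - 54000) ** 3 for y in range(-5, 6))

# Roots of Phi_2(j, y) over F_p give the candidate 2-isogenous j-invariants.
p = 1009
roots = [y for y in range(p) if phi2(1728 % p, y) % p == 0]
print("j-invariants 2-isogenous to 1728 over F_1009:", roots)
```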
Let ℓ be prime. Cohen [137] showed that the number of bits in the largest coefficient of Φ_ℓ(x, y) is O(ℓ log(ℓ)) (see Bröker and Sutherland [110] for a more precise bound). Since there are roughly ℓ² coefficients it follows that Φ_ℓ(x, y) can be written down using O(ℓ³ log(ℓ)) bits, and it is believed that this space requirement cannot be reduced. Hence, one expects to perform at least O(ℓ³ log(ℓ)) = O(ℓ^{3+ε}) bit operations³ to compute Φ_ℓ(x, y). Indeed, using methods based on modular functions one can conjecturally⁴ compute Φ_ℓ(x, y) in O(ℓ^{3+ε}) bit operations (see Enge [193]). Using modular functions other than the j-function can lead to polynomials with smaller coefficients, but this does not affect the asymptotic complexity.

The fastest method to compute modular polynomials is due to Bröker, Lauter and Sutherland [109]. This method exploits some of the ideas explained later in this chapter (in particular, isogeny volcanoes). The method computes Φ_ℓ(x, y) modulo small primes and then determines Φ_ℓ(x, y) by the Chinese remainder theorem. Under the Generalized Riemann Hypothesis (GRH) the complexity is O(ℓ³ log(ℓ)³ log(log(ℓ))) bit operations. For the rest of the chapter we abbreviate the cost as O(ℓ^{3+ε}) bit operations. The method can also be used to compute Φ_ℓ(x, y) modulo p, in which case the space requirements are O(ℓ² log(ℓ)² + ℓ² log(p)) bits.

The upshot is that, given an elliptic curve E over k, the j-invariants of elliptic curves Ẽ that are ℓ-isogenous over k (where gcd(ℓ, char(k)) = 1) are given by the roots of Φ_ℓ(j(E), y) in k. When E is ordinary, Theorem 25.4.6 implies that Φ_ℓ(j(E), y) has either 0, 1, 2 or ℓ + 1 roots in k (counted with multiplicities).

²The reader should not confuse the modular polynomial Φ_ℓ(x, y) with the cyclotomic polynomial Φ_m(x).
³Recall that a function f(ℓ) is O(ℓ^{3+ε}) if, for every ε > 0, there is some C(ε), L(ε) ∈ R_{>0} such that f(ℓ) < C(ε)ℓ^{3+ε} for all ℓ > L(ε).
⁴Enge needs an assumption that rounding errors do not affect the correctness of the output.
Exercise 25.2.2. Given the polynomial Φ_ℓ(x, y) and a value j ∈ F_q show that one can compute F(y) = Φ_ℓ(j, y) ∈ F_q[y] in O(ℓ²(log(ℓ) log(q) + M(log(q)))) bit operations. Show also that one can then compute the roots j̃ ∈ F_q of F(y) = Φ_ℓ(j(E), y) (or determine that there are no roots) in expected time bounded by O(ℓ² log(ℓ) log(q)) field operations (which is O(ℓ^{2+ε} log(q)³) bit operations).
For the rest of this section we consider algorithms to compute an ℓ-isogeny φ : E → Ẽ given an elliptic curve E and the j-invariant of Ẽ.

Exercise 25.2.3. Let E be an elliptic curve over F_q and let E′ over F_q be a twist of E. Show that there is an F_q-rational isogeny of degree ℓ from E (to some elliptic curve) if and only if there is an F_q-rational isogeny of degree ℓ from E′. Show that End(E) ≅ End(E′) (where ≅ denotes ring isomorphism).
25.2.1 Elkies' Algorithm

Let ℓ > 2 be a prime and let E be an elliptic curve over k where char(k) = 0 or char(k) > ℓ + 2. Assume j(E) ∉ {0, 1728} (for the case j(E) ∈ {0, 1728} one constructs isogenies using the naive method or the methods of the following sections). Let j̃ ∈ k be such that Φ_ℓ(j(E), j̃) = 0. We also assume that j̃ is a simple root of Φ_ℓ(j(E), y) (more precisely, (∂Φ_ℓ/∂x)(j, j̃) ≠ 0 and (∂Φ_ℓ/∂y)(j, j̃) ≠ 0); see page 248 of Schoof [526] for a discussion of why this condition is not too severe.

Elkies gave a method to determine an explicit equation for an elliptic curve Ẽ, such that j(Ẽ) = j̃, and a polynomial giving the kernel of an ℓ-isogeny from E to Ẽ. Elkies' original motivation (namely, algorithms for point counting) only required computing the kernel polynomial of the isogeny, but as we have seen, from this information one can easily compute the rational functions describing the isogeny. The method also works when ℓ > 2 is composite, but that is not of practical relevance. The condition that char(k) not be small (if it is non-zero) is essential.
We use the same notation as in Lemma 25.1.16: ψ(x) is the polynomial of degree (ℓ − 1)/2 whose roots are the x-coordinates of affine points in the kernel G of the isogeny and sᵢ are the i-th symmetric polynomials in these roots. We also define

    pᵢ = Σ_{P∈G−{O_E}} x_P^i

so that p₁ = 2s₁ and p₂ = 2(s₁² − 2s₂) (these are Newton's formulae; see Lemma 10.7.6).
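The relations p₁ = 2s₁ and p₂ = 2(s₁² − 2s₂) are immediate to verify for a toy kernel: if ψ(x) has roots r₁, r₂ (the x-coordinates of G₁), each x-coordinate occurs twice among the affine points of G (once for Q and once for −Q). A quick sketch with hypothetical roots:

```python
# Sanity check of the Newton relations for a toy kernel polynomial psi(x)
# with (hypothetical) roots 2 and 3; each root is the x-coordinate of a
# pair of points {Q, -Q} in G, so the power sums over G - {O} double up.
r = [2, 3]                        # illustrative roots of psi(x)
s1, s2 = r[0] + r[1], r[0] * r[1] # elementary symmetric polynomials
p1 = 2 * sum(r)                   # sum of x_P over P in G - {O}
p2 = 2 * sum(x * x for x in r)

assert p1 == 2 * s1
assert p2 == 2 * (s1**2 - 2 * s2)
print("Newton relations verified:", p1, p2)
```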
While the value j̃ specifies the equation for the isogenous curve Ẽ (up to isomorphism) it does not, in general, determine the isogeny (see pages 37 and 44 of Elkies [192] for discussion). It is necessary to have some extra information, and for this the coefficient p₁ suffices and can be computed using partial derivatives of the modular polynomial (this is why the condition on the partial derivatives is needed).
The explanation of Elkies' algorithm requires theory that we do not have space to present. We refer to Schoof [526] for a good summary of the details (also see Elkies [192] for further discussion). The basic idea is to use the fact (Deuring lifting theorem) that the isogeny lifts to an isogeny between elliptic curves over C. One can then interpret the ℓ-isogeny in terms of Tate curves C*/q^Z (we have not presented the Tate curve in this book; see Section C.14 of [560] or Section 5.3 of [561]) as a map from the curve with parameter q(z) to the curve with parameter q(z)^ℓ. As discussed on page 40 of Elkies [192], this isogeny is not normalised. There are a number of relations between the modular j-function, certain Eisenstein series, the equation of the elliptic curve (in short Weierstrass form) and the kernel polynomial of the isogeny. These relations give rise to formulae that must also hold over k. Hence, one can work entirely over k and obtain the kernel polynomial.
The details of this approach are given in Sections 7 and 8 of Schoof [526]. In particular: Theorem 7.3 shows how to get j̃′ (the derivative); Proposition 7.1 allows one to compute the coefficients of the elliptic curve; Proposition 7.2 gives the coefficient p₁ of the kernel polynomial (which is a function of values specified in Proposition 7.1 and Theorem 7.3). The coefficients of the kernel polynomial are related to the coefficients of the series expansion of the Weierstrass ℘-function (see Theorem 8.3 of [526]).

The algorithm is organised as follows (see Algorithm 28). One starts with an ordinary elliptic curve E : y² = x³ + Ax + B over k and j = j(E). We assume that j ∉ {0, 1728}. One computes the derivative j′, the partial derivatives Φ_x, Φ_y, Φ_xx = (∂²Φ_ℓ/∂x²)(j, j̃), Φ_yy = (∂²Φ_ℓ/∂y²)(j, j̃) and Φ_xy = (∂²Φ_ℓ/∂x∂y)(j, j̃), and the corresponding values for E₄ and E₆. Given j̃ one computes j̃′ and then the coefficients Ã and B̃ of the image curve Ẽ. Finally one computes p₁, from which it is relatively straightforward to compute all the coefficients of the kernel polynomial ψ(x).
1: Compute Φ_x, Φ_y, Φ_xx, Φ_yy and Φ_xy
   ⋮
4: Let Ã = −m̃k̃/48 and B̃ = −m̃²k̃/864
5: Let r = −(j′²Φ_xx + 2ℓj′j̃′Φ_xy + ℓ²j̃′²Φ_yy)/(j′Φ_x)
   ⋮
   Compute p₁
   ⋮
   for n = 3 to d − 1 do
       Let t_{n+1} = …
   end for
   Let s₀ = 1
   Compute the symmetric functions s_n of the roots of ψ(x):
   for n = 1 to d do
       Let s_n = (1/n) Σ_{i=1}^{n} (−1)^{i−1} t_i s_{n−i}
   end for
   return ψ(x) = Σ_{i=0}^{d} (−1)^i s_i x^{d−i}
Exercise 25.2.4. Show that Elkies' algorithm requires O(d²) = O(ℓ²) operations in k.
Bostan, Morain, Salvy and Schost [93] have given algorithms (exploiting fast arithmetic on power series) based on Elkies' methods. The algorithms apply when the characteristic of the field is zero or is sufficiently large compared with ℓ. There is a slight difference in scope: Elkies starts with only j-invariants whereas Bostan et al assume that one is given elliptic curves E and Ẽ in short Weierstrass form such that there is a normalised isogeny of degree ℓ over k from E to Ẽ. In general, one needs to perform Elkies' method before one has such an equation for Ẽ and so the computations with modular curves still dominate the cost. Theorem 2 of [93] states that one can compute the rational functions giving the isogeny in O(M(ℓ)) operations in k when char(k) > 2ℓ − 1 and when the coefficient p₁ is known. Note that Bostan et al are not restricted to prime degree isogenies. An application of the result of Bostan et al is to determine whether there is a normalised isogeny from E to Ẽ without needing to compute modular polynomials. Lercier and Sirvent [381] (again, assuming one is given explicit equations for E and Ẽ such that there is a normalised ℓ-isogeny between them) have shown how to achieve a similarly fast method even when the characteristic of the field is small.

A number of calculations can fail when char(k) is non-zero but small compared with ℓ, due to divisions by small integers arising in the power series expansions for the modular functions. Algorithms for the case of small characteristic will be mentioned in Section 25.2.3.
25.2.2 Stark's Algorithm

Stark [577] gave a method to compute the rational function giving the x-coordinate of an endomorphism φ : E → E corresponding to a complex number σ (interpreting End(E) as a subset of C). The idea is to use the fact that, for an elliptic curve E over the complex numbers given by a short Weierstrass form,

    ℘(σz) = A(℘(z))/B(℘(z))    (25.3)

where A and B are polynomials and where ℘(z) = z⁻² + 3G₄z² + ⋯ is the Weierstrass function (see Theorem VI.3.5 of Silverman [560]). This isogeny is not normalised (since ℘(σz) = σ⁻²z⁻² + ⋯ it follows that the pullback of ω_E under φ is σω_E). Stark's idea is to express ℘ as a (truncated) power series in z; the coefficients of this power series are determined by the coefficients of the elliptic curve E. One computes A and B by taking the continued fraction expansion of the left hand side of equation (25.3). One can apply this algorithm to curves over finite fields by applying the Deuring lifting theorem. Due to denominators in the power series coefficients of ℘(z) the method only works when char(k) = 0 or char(k) is sufficiently large. Stark's paper [577] gives a worked example in the case σ = √−2.
The idea generalises to normalised isogenies φ : E → Ẽ by writing ℘_Ẽ(z) = A(℘_E(z))/B(℘_E(z)) where now the power series for ℘_E(z) and ℘_Ẽ(z) are different since the elliptic curve equations are different. Note that it is necessary to have actual curve equations for the normalised isogeny, not just j-invariants. We refer to Section 6.2 of Bostan, Morain, Salvy and Schost [93] for further details and complexity estimates.
25.2.3 Algorithms for Small Characteristic

As we have seen, the Elkies and Stark methods require the characteristic of the ground field to be either zero or relatively large since they use lifting to short Weierstrass forms over C and since the power series expansions have rational coefficients that are divisible by various small primes. Hence, there is a need for algorithms that handle the case when char(k) is small (especially char(k) = 2). A number of such methods have been developed by Couveignes, Lercier, Morain and others. We briefly sketch Couveignes' second method [154].

Let p be the characteristic of the field. We assume that p is small (in the sense that an algorithm performing p operations is considered efficient). Let E and Ẽ be ordinary⁶ elliptic curves over F_{p^m}.

The basic idea is to use the fact that if φ : E → Ẽ is an isogeny of odd prime degree ℓ ≠ p (isogenies of degree p are easy: they are either Frobenius or Verschiebung) then φ maps points of order p^k on E to points of order p^k on Ẽ. Hence, one can try to determine the rational functions describing φ by interpolation from their values on E[p^k]. One could interpolate using any torsion subgroup of E, but using E[p^k] is the best choice since it is a cyclic group and so there are only φ(p^k) = p^k − p^{k−1} points to check (compared with Θ(n²) if using E[n]). The method can be applied to any elliptic curve Ẽ in the isomorphism class, so in general it will not return a normalised isogeny.

Couveignes' method is as follows: First, compute points P ∈ E[p^k] − E[p^{k−1}] and P̃ ∈ Ẽ[p^k] − Ẽ[p^{k−1}] over F̄_p and guess that φ(P) = P̃. Then try to determine the rational function φ₁(x) = u(x)/v(x) by interpolating φ₁(x([i]P)) = x([i]P̃); if this does not work then try another guess for P̃. The interpolation is done as follows (we assume p^k > 2ℓ). First, compute a polynomial A(x) of degree d where 2ℓ < d ≤ p^k such that A(x([i]P)) = x([i]P̃) for 1 ≤ i ≤ d. Also compute B(x) = ∏_{i=1}^{d} (x − x([i]P)). If the guess for P̃ is correct then A(x) ≡ u(x)/v(x) (mod B(x)) where deg(u(x)) = ℓ, deg(v(x)) = ℓ − 1 and v(x) is a square. Writing this equation as A(x)v(x) = u(x) + B(x)w(x) it follows that u(x) and v(x) can be computed using Euclid's algorithm. The performance of the algorithm depends on the method used to determine points in E[p^k], but is dominated by the fact that these points lie in an extension of the ground field of large degree, and that one expects to try around p^k/2 choices for P̃ before hitting the right one. The complexity is polynomial in ℓ, p and m (where E is over F_{p^m}). When p = 2 the fastest method was given by Lercier [379]. For further details we refer to the surveys by Lercier and Morain [380] and De Feo [201].
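The Euclidean step at the heart of this interpolation can be sketched in code. Below (plain Python, toy parameters) we recover φ₁(x) = (x² + 1)/x of Example 25.1.5 from its values at d = 4 points over F₁₀₁ by Lagrange interpolation followed by the extended Euclidean algorithm stopped at degree ℓ = 2; the polynomial helper routines are minimal illustrative implementations, not a library API.

```python
# Rational-function reconstruction: interpolate A(x) with A(x_i) = u(x_i)/v(x_i),
# set B(x) = prod (x - x_i), and run extended Euclid on (B, A) until the
# remainder has degree <= l, giving A*v = u + B*w.
# Polynomials are coefficient lists over F_p, lowest degree first.
p = 101

def pdeg(f):
    return len(f) - 1

def pmul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def psub(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    h = [(a - b) % p for a, b in zip(f, g)]
    while len(h) > 1 and h[-1] == 0:
        h.pop()
    return h

def pdivmod(f, g):
    quo, r = [0] * max(1, len(f) - len(g) + 1), f[:]
    while pdeg(r) >= pdeg(g) and r != [0]:
        shift = pdeg(r) - pdeg(g)
        c = r[-1] * pow(g[-1], p - 2, p) % p
        quo[shift] = c
        r = psub(r, pmul([0] * shift + [c], g))
    return quo, r

# Sample phi_1(x) = (x^2 + 1)/x at d = 4 points and interpolate A(x).
xs = [2, 3, 5, 7]
ys = [(x * x + 1) * pow(x, p - 2, p) % p for x in xs]
A = [0]
for xi, yi in zip(xs, ys):
    num, den = [1], 1
    for xj in xs:
        if xj != xi:
            num = pmul(num, [(-xj) % p, 1])
            den = den * (xi - xj) % p
    term = pmul(num, [yi * pow(den, p - 2, p) % p])
    A = psub(A, [(-c) % p for c in term])        # A = A + term
B = [1]
for xi in xs:
    B = pmul(B, [(-xi) % p, 1])

# Extended Euclid on (B, A), stopping when deg(remainder) <= l = 2.
r0, r1 = B, A
t0, t1 = [0], [1]
while pdeg(r1) > 2:
    quo, r = pdivmod(r0, r1)
    r0, r1 = r1, r
    t0, t1 = t1, psub(t0, pmul(quo, t1))
u, v = r1, t1

# Normalise so that v is monic; we expect u = x^2 + 1 and v = x.
c = pow(v[-1], p - 2, p)
u = [a * c % p for a in u]
v = [a * c % p for a in v]
assert u == [1, 0, 1] and v == [0, 1]
print("recovered phi_1 = (x^2 + 1)/x")
```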
25.3 Isogeny Graphs

Let E be an elliptic curve over F_q. The F_q-isogeny class of E is the set of F_q-isomorphism classes of elliptic curves over F_q that are isogenous over F_q to E. Tate's isogeny theorem states that two elliptic curves E and Ẽ over F_q are F_q-isogenous if and only if #E(F_q) = #Ẽ(F_q) (see Theorem 9.7.4 for one implication).

We have seen in Sections 9.5 and 9.10 that the number of F_q-isomorphism classes of elliptic curves over F_q is roughly 2q and that there are roughly 4√q possible values for #E(F_q). Hence, if isomorphism classes were distributed uniformly across all group orders one would expect around √q/2 elliptic curves in each isogeny class. The theory of complex multiplication gives a more precise result (as mentioned in Section 9.10.1). We denote by π_q the q-power Frobenius map; see Section 9.10 for its properties. The number of F_q-isomorphism classes of ordinary elliptic curves over F_q with q + 1 − t points is the Hurwitz class number of the ring Z[π_q] = Z[(t + √(t² − 4q))/2]. This is the sum of the ideal class numbers h(O) over all orders Z[π_q] ⊆ O ⊆ O_K. It follows (see Remark 9.10.19) that there are O(q^{1/2} log(q) log(log(q))) elliptic curves in each isogeny class. For supersingular curves see Theorem 9.11.11.

⁶The restriction to ordinary curves is not a significant problem. In practice we are interested in elliptic curves over F_{p^m} where m is large, whereas supersingular curves are all defined over F_{p²}. Indeed, for small p there are very few supersingular curves, and isogenies of small degree between them can be computed by factoring division polynomials and using Vélu's formulae.
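Tate's theorem can be illustrated by brute force over a toy field: bucketing all short Weierstrass curves over F₇ by group order puts F_q-isogenous curves in the same bucket (toy parameters, and curve equations rather than isomorphism classes, so this is only a rough picture of class sizes).

```python
# Bucket all nonsingular curves y^2 = x^3 + a*x + b over F_7 by #E(F_7).
# By Tate's isogeny theorem, curves in the same bucket are F_q-isogenous.
from collections import Counter

q = 7

def order(a, b):
    n = 1  # the point at infinity
    for x in range(q):
        rhs = (x**3 + a * x + b) % q
        n += len([y for y in range(q) if y * y % q == rhs])
    return n

sizes = Counter()
curves = 0
for a in range(q):
    for b in range(q):
        if (4 * a**3 + 27 * b**2) % q != 0:   # keep nonsingular curves only
            curves += 1
            sizes[order(a, b)] += 1

print("curve equations per group order:", dict(sorted(sizes.items())))
# Hasse bound: every order lies in [q + 1 - 2*sqrt(q), q + 1 + 2*sqrt(q)].
assert all(abs(n - (q + 1)) <= 2 * q**0.5 for n in sizes)
assert sum(sizes.values()) == curves
```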
Definition 25.3.1. Let E be an elliptic curve over a field k of characteristic p. Let S be a finite set of primes. Define X_{E,k,S} to be the (directed) graph⁷ with vertex set being the k-isogeny class of E. Vertices are typically labelled by j(Ẽ), though we also speak of "the vertex Ẽ".⁸ There is a (directed) edge (j(E₁), j(E₂)) labelled by ℓ for each equivalence class of ℓ-isogenies from E₁ to E₂ defined over k for some ℓ ∈ S. We usually treat this as an undirected graph, since for every ℓ-isogeny φ : E₁ → E₂ there is a dual isogeny φ̂ : E₂ → E₁ of degree ℓ (though see Remark 25.3.2 for an unfortunate, though rare, complication).

Remark 25.3.2. Edges in the isogeny graph correspond to equivalence classes of isogenies. It can happen that two non-equivalent isogenies from E₁ to E₂ have equivalent dual isogenies from E₂ to E₁. It follows that there are two directed edges in the graph from E₁ to E₂ but only one directed edge from E₂ to E₁. (Note that this does not contradict the fact that isogenies satisfy φ̂̂ = φ, as we are speaking here about isogenies up to equivalence.) Such an issue was already explained in Exercise 25.1.1; it only arises if #Aut(E₂) > 2 (i.e., if j(E₂) ∈ {0, 1728}).
Definition 25.3.3. A (directed) graph is k-regular if every vertex has (out-)degree k
(a loop is considered as having degree 1). A path in a graph is a sequence of (directed)
edges between vertices, such that the end vertex of one edge is the start vertex of the
next. We will also describe a path as a sequence of vertices. A graph is connected if there
is a path from every vertex to every other vertex. The diameter of a connected graph is
the maximum, over all pairs v1 , v2 of vertices in the graph, of the length of the shortest
path from v1 to v2 .
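The terms in Definition 25.3.3 are standard graph theory; as a concrete illustration, a minimal breadth-first search computes shortest-path distances and hence the diameter of a small undirected graph (the two example graphs below are hypothetical, not isogeny graphs).

```python
# Diameter of a connected undirected graph via breadth-first search.
# The graph is stored as an adjacency list: vertex -> list of neighbours.
from collections import deque

def distances(graph, start):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(graph):
    best = 0
    for v in graph:
        dist = distances(graph, v)
        assert len(dist) == len(graph), "graph is not connected"
        best = max(best, max(dist.values()))
    return best

# A path on 4 vertices has diameter 3; a 4-cycle has diameter 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
assert diameter(path) == 3 and diameter(cycle) == 2
print("diameters:", diameter(path), diameter(cycle))
```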
There are significant differences (both in the structure of the isogeny graph and the
way it is used in applications) between the ordinary and supersingular cases. So we
present them separately.
25.3.1 Ordinary Isogeny Graphs

Fix an ordinary elliptic curve E over F_q such that #E(F_q) = q + 1 − t. The isogeny graph of elliptic curves isogenous over F_q to E can be identified, using the theory of complex multiplication, with a graph whose vertices are ideal classes (in certain orders). The goal of this section is to briefly sketch this theory in the special case (the general case is given in Section 25.4) of the sub-graph where all elliptic curves have the same endomorphism ring, in which case the edges correspond to multiplication by prime ideals. We do not have space to give all the details; good references for the background are Cox [156] and Lang [363].

⁷Some authors would use the name "multi-graph", since there can be loops and/or multiple edges between vertices.
⁸In the supersingular case one can label vertices as j(Ẽ) without ambiguity only when k is algebraically closed: when k is finite then, in the supersingular case, one has two distinct vertices in the graph for Ẽ and its quadratic twist. For the ordinary case there is no ambiguity by Lemma 9.11.12 (also see Exercise 25.3.8).
The p
endomorphism ring of E (over Fq ) is an order O in the quadratic imaginary field
K = Q( t2 4q). (We refer to Section A.12 for background
p on orders and conductors.)
Let OK be the ring of integers of K. Then Z[q ] = Z[(t + t2 4q)/2] O OK and
if OK = Z[] then O = Z[c] where c is the conductor of O. The ideal class group Cl(O)
is defined to be the classes of invertible O-ideals that are prime to the conductor; see
Section 7 of [156] or Section 8.1 of [363]. There is an explicit formula for the order h(O)
of the ideal class group Cl(O) in terms of the class number h(OK ) of the number field;
see Theorem 7.24 of [156] or Theorem 8.7 of [363].
There is a one-to-one correspondence between the set of isomorphism classes of elliptic
curves E over Fq with End(E) = O and the set Cl(O). Precisely, to an invertible O-ideal
a one associates the elliptic curve E = C/a over C. An O-ideal a is equivalent to a in
Cl(O) if and only if C/a is isomorphic to E. One can show that End(E) = O. The theory
of complex multiplication shows that E is defined over a number field (called the ring
class field) and has good reduction modulo the characteristic p of Fq . This correspondence
is not canonical, since the reduction modulo p map is not well-defined (it depends on a
choice of prime ideal above p in the ring class field).
Let a be an invertible O-ideal and E = C/a. Let l be an invertible O-ideal and, interpreting l ⊆ End(E), consider the set E[l] = {P ∈ E(C) : φ(P) = O_E for all φ ∈ l}. Since O ⊆ C we can interpret l ⊆ C, in which case

E[l] ≅ l⁻¹a/a.

It follows that #E[l] is equal to the norm of the ideal l. The identity map on C induces the isogeny

C/a → C/l⁻¹a

with kernel l⁻¹a/a ≅ E[l]. The above remarks apply to elliptic curves over C, but the theory reduces well to elliptic curves over finite fields, and indeed, every isogeny from E to an elliptic curve Ẽ with End(E) = End(Ẽ) arises in this way. This shows that, not only do ordinary elliptic curves correspond to ideals in O, but so do their isogenies.
Exercise 25.3.4. Show that if l = (ℓ) where ℓ ∈ N then the isogeny C/a → C/l⁻¹a is [ℓ].

Exercise 25.3.5. Suppose the prime ℓ splits in O as (ℓ) = l1 l2. Let φ : E → Ẽ correspond to the ideal l1. Show that φ̂ corresponds to l2.
Let ℓ be a prime. Then ℓ splits in OK = Z[θ] if and only if the minimal polynomial of θ factors modulo ℓ with two linear factors. If D is the discriminant of K then ℓ splits if and only if the Kronecker symbol satisfies (D/ℓ) = +1. Note that the Kronecker symbol is the Legendre symbol when ℓ is odd and

(D/2) = 0 if D ≡ 0 (mod 4),  1 if D ≡ 1 (mod 8),  −1 if D ≡ 5 (mod 8).    (25.4)

Let E be an elliptic curve over Fq with End(E) = O and let ℓ be coprime to the conductor of O. There are 1 + (D/ℓ) prime ideals l above ℓ, and so there are this many isogenies of degree ℓ from E. It follows that there are ℓ-isogenies in the isogeny graph for roughly half the primes ℓ.9
9 Of course, there are still ℓ + 1 isogenies of degree ℓ for each ℓ, but the rest of them are not to curves Ẽ such that End(Ẽ) = O.
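Equation (25.4), together with Euler's criterion for odd ℓ, is easy to turn into code. The following sketch (an illustration, not an algorithm from the text) computes the Kronecker symbol (D/ℓ) for a prime ℓ; then 1 + (D/ℓ) counts the horizontal ℓ-isogenies for ℓ coprime to the conductor:

```python
def kronecker(D, ell):
    """Kronecker symbol (D / ell) for a prime ell, following (25.4)."""
    if ell == 2:
        if D % 4 == 0:
            return 0
        return 1 if D % 8 == 1 else -1   # D = 1 mod 8 -> +1, D = 5 mod 8 -> -1
    if D % ell == 0:
        return 0
    return 1 if pow(D % ell, (ell - 1) // 2, ell) == 1 else -1  # Euler's criterion

# Number of horizontal ell-isogenies from a curve of CM discriminant D:
assert 1 + kronecker(-7, 2) == 2   # 2 splits in Q(sqrt(-7)): two horizontal 2-isogenies
assert 1 + kronecker(-7, 3) == 0   # 3 is inert: no horizontal 3-isogenies
```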
25.3.2
Expander Graphs
Let X be a finite (directed) graph on vertices labelled {1, . . . , n}. The adjacency matrix of X is the n × n integer matrix A with A_{i,j} being the number of edges from vertex i to vertex j. The eigenvalues of a finite graph X are defined to be the eigenvalues of its adjacency matrix A. For the rest of this section we assume that all graphs are un-directed. Since the adjacency matrix of an un-directed graph is real and symmetric, the eigenvalues are real.
Lemma 25.3.11. Let X be a k-regular graph. Then k is an eigenvalue, and all eigenvalues λ are such that |λ| ≤ k.
Proof: The first statement follows since (1, 1, . . . , 1) is an eigenvector with eigenvalue
k. The second statement is also easy (see Proposition 1.1.2 of Davidoff, Sarnak and
Valette [165] or Theorem 1 of Murty [443]).
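A quick sanity check of Lemma 25.3.11 in code (our own illustration): for a k-regular graph the all-ones vector is an eigenvector of the adjacency matrix with eigenvalue k. Sketch using the complete graph K4, which is 3-regular:

```python
def adjacency_matrix(n, edges):
    """A[i][j] = number of edges between vertices i and j (undirected)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
    return A

# K4 is 3-regular: every row of A sums to k = 3 ...
A = adjacency_matrix(4, [(i, j) for i in range(4) for j in range(i + 1, 4)])
assert all(sum(row) == 3 for row in A)
# ... so (1, 1, 1, 1) is an eigenvector with eigenvalue k = 3.
ones = [1] * 4
assert [sum(A[i][j] * ones[j] for j in range(4)) for i in range(4)] == [3] * 4
```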
Let X be a k-regular graph. We denote by λ(X) the maximum of the absolute values of all the eigenvalues whose absolute value is not equal to k. Alon and Boppana showed that the lim inf of λ(X) over any family of k-regular graphs (as the number of vertices goes to ∞) is at least 2√(k − 1) (see Theorem 1.3.1 of Davidoff, Sarnak and Valette [165], Theorem 3.2 of Pizer [478] or Theorem 10 of Murty [443]). The graph X is said to be Ramanujan if λ(X) ≤ 2√(k − 1). Define λ1(X) to be the largest eigenvalue that is strictly less than k.
Let G be a finite group and S a subset of G such that g ∈ S implies g⁻¹ ∈ S (we also allow S to be a multi-set). The Cayley graph of G is the graph X with vertex set G and an edge between g and gs for all g ∈ G and all s ∈ S. Murty [443] surveys criteria for when a Cayley graph is a Ramanujan graph. If certain character sums are small then X may be a Ramanujan graph; see Section 2 of [443].
Definition 25.3.12. Let X be a graph and A a subset of vertices of X. The vertex boundary of A in X is

∂v(A) = {v ∈ X − A : there is an edge between v and a vertex in A}.

Let E_X be the set of edges (x, y) in X. The edge boundary of A in X is

∂e(A) = {(x, y) ∈ E_X : x ∈ A and y ∈ X − A}.

Let c > 0 be real. A k-regular graph X with #X vertices is a c-expander if, for all subsets A ⊆ X such that #A ≤ #X/2,

#∂v(A) ≥ c · #A.
Exercise 25.3.13. Show that #∂v(A) ≤ #∂e(A) ≤ k · #∂v(A).

Exercise 25.3.14. Let X be a k-regular graph with n vertices that is a c-expander. Show that if n is even then 0 ≤ c ≤ 1 and if n is odd then 0 ≤ c ≤ (n + 1)/(n − 1).
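The boundaries of Definition 25.3.12 can be computed directly; this sketch (ours, not the book's) checks the inequality of Exercise 25.3.13 on a 6-cycle, a 2-regular graph:

```python
def vertex_boundary(adj, A):
    """Vertices outside A adjacent to some vertex in A."""
    return {w for v in A for w in adj[v]} - set(A)

def edge_boundary(adj, A):
    """Edges (v, w) with v in A and w outside A."""
    return {(v, w) for v in A for w in adj[v] if w not in A}

# 6-cycle (k = 2), with A = {0, 1, 2}
adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
A = {0, 1, 2}
dv, de = vertex_boundary(adj, A), edge_boundary(adj, A)
assert dv == {3, 5} and de == {(2, 3), (0, 5)}
assert len(dv) <= len(de) <= 2 * len(dv)   # Exercise 25.3.13 with k = 2
```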
Expander graphs have a number of theoretical applications; one important property
is that random walks on expander graphs reach the uniform distribution quickly.
Let X be a k-regular graph. Then

#∂e(A) ≥ ((k − λ1(X))/2) · #A    (25.5)

when #A ≤ #X/2 (see Theorem 1.3.1 of Davidoff, Sarnak and Valette [165] or Section 4 of Murty [443]10). Hence #∂v(A) ≥ (1/2 − λ1(X)/(2k)) · #A and so Ramanujan graphs are expander graphs. Indeed, small values for λ1(X) give large expansion factors. We refer to [165, 443] for further details, and references.
25.3.3
The Supersingular Isogeny Graph
For the supersingular isogeny graph we work over F̄p. The graph is finite. Indeed, Theorem 9.11.11 implies p/12 − 1 < #X_{E,F̄p,S} < p/12 + 2. Note that it suffices to consider elliptic curves defined over F_{p²} (although the isogenies between them are over F̄p in general).
In contrast to the ordinary case, the supersingular graph is always connected using isogenies of any fixed degree. A proof of this result, attributed to Serre, is given in Section 2.4 of Mestre [416].

Theorem 25.3.17. Let p be a prime and let E and Ẽ be supersingular elliptic curves over F̄p. Let ℓ be a prime different from p. Then there is an isogeny from E to Ẽ over F̄p whose degree is a power of ℓ.
10 Note that the proof on page 16 of [443] is for ∂e(A), not ∂v(A) as stated.
11 This example was shown to me by David Kohel.
Exercise 25.3.20. Find a prime p such that the set of isomorphism classes of supersingular elliptic curves over Fp does not form a connected subgraph of XE,Fp ,{2} .
There is a one-to-one correspondence between supersingular elliptic curves E over F̄p and projective right modules of rank 1 of a maximal order of the quaternion algebra over Q ramified at p and ∞ (see Section 5.3 of Kohel [347] or Gross [267]). Pizer has exploited
this structure (and connections with Brandt matrices and Hecke operators) to show that
the supersingular isogeny graph is a Ramanujan graph. Essentially, the Brandt matrix
gives the adjacency matrix of the graph. A good survey is [478], though be warned that
the paper does not mention the connection to supersingular elliptic curves.
The supersingular isogeny graph has been used by Charles, Goren and Lauter [128] to
construct a cryptographic hash function. It has also been used by Mestre and Oesterlé [416]
for an algorithm to compute coefficients of modular forms.
25.4
The Structure of the Ordinary Isogeny Graph
This section presents Kohel's results on the structure of the isogeny graph of ordinary elliptic curves over finite fields. Section 25.4.2 gives Kohel's algorithm to compute End(E) for a given ordinary elliptic curve over a finite field.
25.4.1
Isogeny Volcanoes
Let E be an ordinary elliptic curve over Fq and let #E(Fq) = q + 1 − t. Denote by K the number field Q(√(t² − 4q)) and by OK the ring of integers of K. We know that End(E) = End_{F̄q}(E) is an order in OK that contains the order Z[π_q] = Z[(t + √(t² − 4q))/2] of discriminant t² − 4q. Let Δ_K be the discriminant of K, namely Δ_K = (t² − 4q)/c² where c is the largest positive integer such that Δ_K is an integer congruent to 0 or 1 modulo 4. The integer c is the conductor of the order Z[π_q].
Suppose E1 and E2 are elliptic curves over Fq such that End(Ei) = Oi, for i = 1, 2, where O1 and O2 are orders in K containing Z[π_q]. We now present some results about the isogenies between such elliptic curves.

Lemma 25.4.1. Let φ : E → Ẽ be an isogeny of elliptic curves over Fq. If [End(E) : End(Ẽ)] = ℓ (or vice versa) then the degree of φ is divisible by ℓ.

An ℓ-isogeny φ : E → Ẽ with End(E) = O is called horizontal (respectively, ascending, descending) if End(Ẽ) = O (respectively, [End(Ẽ) : O] = ℓ, [O : End(Ẽ)] = ℓ).
Example 25.4.4. We now give a picture of how the orders relate to one another. Suppose the conductor of Z[π_q] is 6 (e.g., q = 31 and t = 4), so that [OK : Z[π_q]] = 6. Write OK = Z[θ]. Then the orders O2 = Z[2θ] and O3 = Z[3θ] are contained in OK and both contain O6 = Z[6θ] = Z[π_q].

[Diagram: the lattice of orders, with OK at the top, O2 and O3 below it, and Z[π_q] = O6 at the bottom; each edge is labelled by its index ℓ ∈ {2, 3}.]
Definition 25.4.5. Let the notation be as above. If End(E) = OK then E is said to be on the surface of the isogeny graph.12 If End(E) = Z[π_q] then E is said to be on the floor of the isogeny graph.
By the theory of complex multiplication, the number of isomorphism classes of elliptic
curves over Fq on the surface is equal to the ideal class number of the ring OK .
Theorem 25.4.6. Let E be an ordinary elliptic curve over Fq as above and let O = End(E) be an order in OK containing Z[π_q]. Let c = [OK : O] and let ℓ be a prime. Every ℓ-isogeny φ : E → Ẽ arises from one of the following cases.
Since the ideal class number of Q(√−7) is 1, it follows that E is the unique elliptic curve up to isomorphism on the surface of the isogeny graph.
Since 2 splits in Z[(1 + √−7)/2] there are two 2-isogenies from E to elliptic curves on the surface (i.e., to E itself) and so there is only one 2-isogeny down from E. Using the modular polynomial we deduce that the 2-isogeny down maps to the isomorphism class
12 Kohel's metaphor was intended to be aquatic: the floor represents the ocean floor and the surface represents the surface of the ocean.
Figure 25.2: A 2-isogeny graph with two volcanoes. The symbols on the left hand side
denote the endomorphism ring of curves on that level, using the same notation as Example 25.4.4.
of elliptic curves with j-invariant 14. One can verify that the only 2-isogeny over Fq from
j = 14 is the ascending isogeny back to j = 42.
We have (−7/3) = −1 so there are no horizontal 3-isogenies from E. Hence, we expect four 3-isogenies down from E. Using the modular polynomial we compute the corresponding j-invariants to be 33, 35, 51 and 57. One can now consider the 2-isogeny graphs
containing these elliptic curves on their surfaces. It turns out that the graph is connected,
and that there is a cycle of horizontal 2-isogenies from j = 33 to j = 51 to j = 35 to
j = 57. For each vertex we therefore only expect one 2-isogeny down to the floor. The corresponding j-invariants are 44, 4, 18 and 32 respectively. Figure 25.2 gives the 2-isogeny
graph in this case.
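The 2-isogeny graph of Example 25.4.8 can be recorded as a table of outgoing 2-isogenies and checked against the expected degrees (each vertex above the floor has ℓ + 1 = 3 outgoing ℓ-isogenies, each floor vertex has 1). This encoding is ours; the pairing of floor vertices with crater vertices follows the order listed in the text:

```python
# Outgoing 2-isogenies (up to isomorphism) read off from Example 25.4.8.
isogenies = {
    42: [42, 42, 14],            # two horizontal (2 splits), one descending
    14: [42],                    # floor: only the ascending isogeny
    33: [51, 57, 44], 51: [33, 35, 4], 35: [51, 57, 18], 57: [35, 33, 32],
    44: [33], 4: [51], 18: [35], 32: [57],   # floor of the second volcano
}
floor = {14, 44, 4, 18, 32}
for j, out in isogenies.items():
    assert len(out) == (1 if j in floor else 3)  # degree 1 on the floor, l+1 = 3 above
```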
Exercise 25.4.9. Draw the 3-isogeny graph for the elliptic curves in Example 25.4.8. Is
XE,Fq ,{2,3} connected? If so, what is its diameter?
Fix a prime ℓ | c where c is the conductor of Z[π_q]. Consider the sub-graph of the isogeny graph corresponding to isogenies whose degree is equal to ℓ. We call this the ℓ-isogeny graph. This graph is often not connected (for example, it is not connected when c is not a power of ℓ or when primes above ℓ do not generate Cl(OK)). Even when ℓ splits and c is 1 or a power of ℓ, the graph is often not connected (the graph is connected only when the prime ideals above ℓ generate the ideal class group). Theorem 25.4.6 shows that each component of the ℓ-isogeny graph has a particular shape (that Fouquet and Morain [208] call a volcano).
We now give a precise definition for volcanoes. Let #E(Fq) = q + 1 − t, let c be the conductor of Z[π_q] and suppose ℓ^m ∥ c (i.e., ℓ^m is the exact power of ℓ dividing c). Let K = Q(√(t² − 4q)) and denote by OK the maximal order in K. A volcano is a connected component of the graph X_{E,Fq,{ℓ}}. A volcano has m + 1 levels V0, . . . , Vm, being sub-graphs of the ℓ-isogeny graph; vertices in Vi (i.e., on level i) correspond to isomorphism classes of elliptic curves Ẽ such that ℓ^i ∥ [OK : End(Ẽ)]. In other words, V0 is on the surface of this component of the ℓ-isogeny graph (but not necessarily on the surface of the entire isogeny graph X_{E,Fq,S}) and Vm is on the floor of this component of the ℓ-isogeny graph (though, again, not necessarily on the floor of the whole isogeny graph). The surface of a volcano (i.e., V0) is also called the crater. The graph V0 is a connected regular graph with each vertex of degree at most 2. For all 0 < i ≤ m and every vertex v ∈ Vi there is a unique edge from v up to a vertex in V_{i−1}. For all 0 ≤ i < m and every v ∈ Vi, the degree of v is ℓ + 1. Every vertex in Vm has degree 1.
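These degree conditions can be checked on an abstract volcano built in code. This sketch (ours, assuming m ≥ 1 and a crater that is a cycle of length at least 3) attaches to each non-floor vertex exactly enough descending edges to give it degree ℓ + 1:

```python
from collections import defaultdict

def build_volcano(ell, m, crater_len):
    """Abstract ell-volcano: a crater cycle V0 of length crater_len (>= 3),
    m levels below it, every vertex above the floor of degree ell + 1."""
    adj = defaultdict(set)
    crater = [("V0", i) for i in range(crater_len)]
    for i in range(crater_len):                 # the crater is a cycle
        adj[crater[i]].add(crater[(i + 1) % crater_len])
        adj[crater[(i + 1) % crater_len]].add(crater[i])
    level, next_id = crater, 0
    for depth in range(1, m + 1):
        new_level = []
        for v in level:
            children = ell - 1 if depth == 1 else ell   # so that deg(v) = ell + 1
            for _ in range(children):
                w = ("V%d" % depth, next_id)
                next_id += 1
                adj[v].add(w)
                adj[w].add(v)
                new_level.append(w)
        level = new_level
    return adj, set(level)                      # adjacency and the floor V_m

adj, floor = build_volcano(2, 2, 4)
assert all(len(adj[v]) == 1 for v in floor)                  # floor: degree 1
assert all(len(adj[v]) == 3 for v in adj if v not in floor)  # ell + 1 elsewhere
```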
25.4.2
Kohel's Algorithm to Compute End(E)
Kohel used the results of Section 25.4.1 to develop deterministic algorithms for computing End(E) (i.e., determining the level) for an elliptic curve E over Fq. We sketch the algorithm for ordinary curves. Two facts are of crucial importance in Kohel's algorithm. The first (Corollary 25.4.7) is that one can recognise the floor when standing on it. The second fact is that if one starts a chain of ℓ-isogenies with a descending isogeny, and avoids backtracking, then all the isogenies in the chain are descending.
Before going any further, we discuss how to compute a non-backtracking chain of isogenies. Given j(E) one can compute the j-invariants of ℓ-isogenous curves over Fq by computing the roots of F(y) = Φℓ(j(E), y) in Fq. Recall that one computes Φℓ(x, y) in O(ℓ^{3+ε}) bit operations and finds the roots of F(y) in Fq in expected time bounded by O(ℓ² log(ℓ) log(q)) operations in Fq. Let j0 = j(E) and let j1 be one of the roots of F(y). We want to find, if possible, j2 ∈ Fq such that there are ℓ-isogenies from E to E1 (where j(E1) = j1) and from E1 to E2 (where j(E2) = j2) and such that j2 ≠ j0 (so the second isogeny is not the dual of the first). The trick is to find roots of Φℓ(j1, y)/(y − j0). This process can be repeated to compute a chain j0, j1, j2, . . . of j-invariants of ℓ-isogenous curves. As mentioned earlier, an alternative approach to walking in the isogeny graph is to find Fq-rational factors of the ℓ-division polynomial and use Vélu's formulae; this is less efficient in general and the method to detect backtracking is to compute the image curve using Vélu's formulae and then compute its j-invariant.
The basic idea of Kohel's algorithm is, for each prime ℓ dividing14 the conductor of Z[π_q], to find a chain of ℓ-isogenies from E to an elliptic curve on the floor. Suppose ℓ is a prime and ℓ^m ∥ c. Kohel (on page 46 of [347]) suggests to take two non-backtracking chains of ℓ-isogenies of length at most m from E. If E is on the floor then this is immediately detected. If E is not on the surface then at least one of the initial ℓ-isogenies is descending, so in at most m steps one finds oneself on the floor. So if after m steps neither chain of isogenies has reached the floor then it follows that we must have started on the surface (and some or all of the ℓ-isogenies in the chain were along the surface). Note that, apart from the algorithm for computing roots of polynomials, the method is deterministic.
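The non-backtracking walk and the floor test can be sketched with a hypothetical neighbour oracle standing in for the root-finding of Φℓ(j, y) (the toy graph below is illustrative, not computed from modular polynomials):

```python
def non_backtracking_chain(neighbors, j0, steps):
    """Walk ell-isogenies without taking the dual back; stop on the floor.
    neighbors(j) stands in for the roots of Phi_ell(j, y) in Fq."""
    chain, prev, cur = [j0], None, j0
    for _ in range(steps):
        options = [j for j in neighbors(cur) if j != prev]  # strip (y - j_prev)
        if not options:
            break            # only the backtracking isogeny exists: the floor
        prev, cur = cur, options[0]
        chain.append(cur)
    return chain

# Toy graph: 10 is 2-isogenous to 11, 29, 31; 11 and 31 are on the floor.
g = {10: [11, 29, 31], 11: [10], 31: [10], 29: [10, 29, 29]}
assert non_backtracking_chain(lambda j: g[j], 10, 5) == [10, 11]  # reached the floor
```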
Exercise 25.4.10. Let E : y² = x³ + 3x + 6 over F37 be an elliptic curve. Note that #E(F37) = 37 + 1 − 6 and 6² − 4 · 37 = −2⁴ · 7. Hence the conductor of Z[π_q] is 4. We have
j(E) = 10. Using the modular polynomial one finds the following j-invariants of elliptic
curves 2-isogenous to E: 11, 29, 31. Further, there is a single 2-isogeny from j-invariants
11, 31 (in both cases, back to j = 10). But from 29 there is a 2-isogeny to j = 10 and two
2-isogenies to j = 29. What is End(E)? Give j-invariants for a curve on the floor and a
curve on the surface.
The worst case of Kohel's algorithm is when the conductor is divisible by one or more very large primes ℓ (since determining the j-invariant of an ℓ-isogenous curve is polynomial in ℓ and so exponential in the input size). Since c can be as big as q^{1/2} the above method (i.e., taking isogenies to the floor) would therefore have worst-case complexity of at least
14 It is necessary to find the square factors of t² − 4q, which can be done in deterministic time O(q^{1/6}); see Exercise 12.5.1.
q^{1/2} bit operations (indeed, it would be O(q^{3/2+ε}) operations in Fq if one includes the cost of generating modular polynomials). Kohel (pages 53 to 57 of [347]) noted that when ℓ is very large one can more efficiently resolve the issue of whether or not ℓ divides the conductor by finding elements in the ideal class group that are trivial for the maximal order but non-trivial for an order whose conductor is divisible by ℓ; one can then test such a relation using isogenies. Using these ideas Kohel proves in Theorem 24 of [347] that, assuming a certain generalisation of the Riemann hypothesis, his algorithm requires O(q^{1/3+ε}) bit operations. Kohel also considers the case of supersingular curves.
Bisson and Sutherland [58] consider a randomised version of Kohel's method using ideas from index calculus algorithms in ideal class groups. Their algorithm has heuristic subexponential expected complexity of O(L_q(1/2, √3/2)) bit operations. We do not present the details.
Remark 25.4.11. An important remark is that neither of the two efficient ways to
generate elliptic curves over finite fields is likely to lead to elliptic curves E such that the
conductor of End(E) is divisible by a large prime.
• When generating elliptic curves by choosing a random curve over a large prime field and counting points, t² − 4q behaves like a random integer and so is extremely unlikely to be divisible by the square of a very large prime.
• When using the CM method it is automatic that the curves have q + 1 − t points where t² − 4q has a very large square factor. It is easy to arrange that the square factor is divisible by a large prime. However, the elliptic curve itself output by the CM method has End(E) being the maximal order. To get End(E) to be a non-maximal order one can either use class polynomials corresponding to non-maximal orders or use descending isogenies. Either way, it is infeasible to compute a curve E such that a very large prime divides the conductor of End(E). Furthermore, Kohel's algorithm is not needed in this case, since by construction one already knows End(E).
Hence, in practice, the potential problems with large primes dividing the conductor of
End(E) do not arise. It is therefore safe to assume that determining End(E) in practice
is easy.
25.5
Constructing Isogenies Between Elliptic Curves
The isogeny problem for elliptic curves is: given two elliptic curves E and Ẽ over Fq with the same number of points, to compute an isogeny between them. Solving this problem is usually considered in two stages:
1. Performing a pre-computation, that computes a chain of prime-degree isogenies from E to Ẽ. The chain is usually computed as a sequence of explicit isogenies, though one could store just the Elkies information for each isogeny in the chain.
2. Given a specific point P ∈ E(Fq), to compute the image of P under the isogeny.
The precomputation is typically slow, while it is desirable that the computation of the isogeny be fast (since it might be performed for a large number of points).
An algorithm to solve the isogeny problem, requiring exponential time and space in
terms of the input size, was given by Galbraith [216]. For the case of ordinary elliptic
curves, an improved algorithm with low storage requirements was given by Galbraith,
Hess and Smart [220]. We briefly sketch both algorithms in this section.
We now make some preliminary remarks in the ordinary case. Let c1 be the conductor of End(E) and c2 the conductor of End(Ẽ). If there is a large prime ℓ that divides c1 but not c2 (or vice versa), then any isogeny between E and Ẽ will have degree divisible by ℓ and hence the isogeny will be slow to compute. Since the conductor is a square factor of t² − 4q it can be, in theory, as big as q^{1/2}. It follows that one does not expect an efficient algorithm for this problem in general. However, as mentioned in Remark 25.4.11, in practice one can ignore this bad case and assume the primes dividing the conductor are moderate.
For the rest of this section, in the ordinary case, we assume that End(E) = End(Ẽ) = O. (If this is not the case then take vertical isogenies from E to reduce to it.) Usually O is the maximal order. This is desirable, because the class number of the maximal order is typically smaller (and never larger) than the class number of the sub-orders, and so the algorithm to find the isogeny works more quickly in this case. However, for the sake of generality we do not insist that O is the maximal order. The general case could appear if there is a very large prime dividing the conductor of O.
25.5.1
The Galbraith Algorithm
The algorithm of Galbraith [216] finds a path between two vertices in the isogeny graph X_{E,k,S} using a breadth-first search (see Section 22.2 of [145]). This algorithm can be used in both the ordinary and supersingular cases. More precisely, one starts with sets X0 = {j(E)} and Y0 = {j(Ẽ)} (we are assuming the vertices of the isogeny graph are labelled by j-invariants) and, at step i, computes Xi = X_{i−1} ∪ ∂v(X_{i−1}) and Yi = Y_{i−1} ∪ ∂v(Y_{i−1}) where ∂v(X) is the set of vertices in the graph that are connected to a vertex in X by an edge. Computing ∂v(X) involves finding the roots in k of Φℓ(j, y) for every j ∈ X and every ℓ ∈ S. In the supersingular case the set S of possible isogeny degrees usually consists of a single prime ℓ. In the ordinary case S could have as many as log(q) elements, and one might not compute the whole of ∂v(X) but just the boundary in a subgraph corresponding to a (random) subset of S. In either case, the cost of computing ∂v(X) is clearly proportional to #X.15 The algorithm stops when Xi ∩ Yi ≠ ∅, in which case it is easy to compute the isogeny from E to Ẽ.
Exercise 25.5.1. Write pseudocode for the above algorithm.
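One possible answer to Exercise 25.5.1, sketched in Python with a generic neighbour oracle standing in for the root-finding of Φℓ(j, y), and assuming edges are undirected (every isogeny has a dual):

```python
def meet_in_middle(neighbors, j_start, j_end, max_steps=50):
    """Grow balls around j(E) and j(E~) until they meet; return the path of
    j-invariants. neighbors(j) stands in for the roots of Phi_ell(j, y)."""
    parentX, parentY = {j_start: None}, {j_end: None}
    X, Y = {j_start}, {j_end}
    for _ in range(max_steps):
        meet = X & Y
        if meet:
            j = meet.pop()
            path, v = [], j
            while v is not None:               # walk back to j_start
                path.append(v)
                v = parentX[v]
            path.reverse()
            v = parentY[j]
            while v is not None:               # walk forward to j_end
                path.append(v)
                v = parentY[v]
            return path
        side, parent = (X, parentX) if len(X) <= len(Y) else (Y, parentY)
        for j in list(side):                   # expand the smaller ball
            for w in neighbors(j):
                if w not in parent:
                    parent[w] = j
                    side.add(w)
    return None

# Toy 10-cycle as the isogeny graph:
path = meet_in_middle(lambda j: [(j - 1) % 10, (j + 1) % 10], 0, 5)
assert path[0] == 0 and path[-1] == 5
assert all((b - a) % 10 in (1, 9) for a, b in zip(path, path[1:]))
```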
Under the (probably unreasonable) assumption that new values in ∂v(Xi) behave like uniformly chosen elements in the isogeny graph, one expects from the birthday paradox that the two sets have non-empty intersection when #Xi + #Yi ≥ √(#X_{E,k,S}). Since the graph is an expander, we know that #Xi = #X_{i−1} + #∂v(X_{i−1}) ≥ (1 + c)#X_{i−1} when X_{i−1} is small, and so #Xi ≥ (1 + c)^i.
In the supersingular case we have #X_{E,k,S} = O(q) and in the ordinary case we have #X_{E,k,S} = h(O) = O(q^{1/2} log(q)). In both cases, one expects the algorithm to terminate after O(log(q)) steps. Step i involves, for every vertex j ∈ Xi (or j ∈ ∂v(X_{i−1})) and every ℓ ∈ S, computing roots of Φℓ(j, y) in Fq. One can check that if all ℓ are polynomially bounded in log(q) then the expected number of bit operations is bounded by √(#X_{E,k,S}) times a polynomial in log(q).
In the supersingular case the algorithm performs an expected O(q^{1/2}) bit operations. In the ordinary case, by Bach's result (and therefore, assuming various generalisations of the Riemann hypothesis) we can restrict to isogenies of degree at most 6 log(q)² and so each step is polynomial-time (the dominant cost of each step is finding roots of the modular polynomial; see Exercise 25.2.2). The total complexity is therefore an expected O(q^{1/4}) bit operations. The storage required is expected to be O(q^{1/4} log(q)²) bits.
15 When all ℓ ∈ S are used at every step, to compute ∂v(Xi) it suffices to consider only vertices j ∈ ∂v(X_{i−1}).
Exercise 25.5.2. Let m ∈ N. Suppose all ℓ ∈ S are such that ℓ = O(log(q)^m). Let φ : E → Ẽ be the isogeny output by the Galbraith algorithm. Show, under the same heuristic assumptions as above, that one can evaluate φ(P) for P ∈ E(Fq) in polynomial-time.
Exercise 25.5.3. Isogenies of small degree are faster to compute than isogenies of large degree. Hence, the average cost to compute an ℓ-isogeny can be used as a weight for the edges in the isogeny graph corresponding to ℓ-isogenies. It follows that there is a well-defined notion of shortest path in the isogeny graph between two vertices. Show how Dijkstra's algorithm (see Section 24.3 of [145]) can be used to find a chain of isogenies between two elliptic curves that can be computed in minimal time. What is the complexity of this algorithm?
25.5.2
The Galbraith–Hess–Smart Algorithm
We now restrict to the ordinary isogeny graph and sketch the algorithm of Galbraith, Hess and Smart [220]. The basic idea is to replace the breadth-first search by a random walk, similar to that used in the kangaroo algorithm.
Let H be a hash function from Fq to a set S of prime ideals of small norm. One starts random walks at x0 = j(E) and y0 = j(Ẽ) and stores ideals a0 = (1), b0 = (1). One can think of (x0, a0) as a tame walk and (y0, b0) as a wild walk. Each step of the algorithm computes new values xi and yi from x_{i−1} and y_{i−1}: To compute xi set l = H(x_{i−1}) and ℓ = N(l); find the roots of Φℓ(x_{i−1}, z); choose the root corresponding to the ideal l (using the trick mentioned in Remark 25.3.9) and call it xi. The same process is used (with the same function H) for the sequence yi. The ideals are also updated as ai = a_{i−1} l (reduced in the ideal class group to some short canonical representation of ideals). If xi = yj then the walks follow the same path. We designate certain elements of Fq as being distinguished points, and if xi or yi is distinguished then it is stored together with the corresponding ideal a or b. After a walk hits a distinguished point there are two choices: it could be allowed to continue or it could be restarted at a j-invariant obtained by taking a short random isogeny chain (perhaps corresponding to primes not in S) from E or Ẽ.
Once a collision is detected one has an isogeny corresponding to ideal a from j(E) to some j, and an isogeny corresponding to ideal b from j(Ẽ) to j. Hence, the ideal ab⁻¹ gives the isogeny from j(E) to j(Ẽ).
Stolbunov has noted that, since the ideal class group is Abelian, it is not advisable to choose S such that l, l⁻¹ ∈ S (since such a choice means that walks remain close to the original j-invariant, and cycles in the random walk might arise). It is also faster to use isogenies of small degree more often than those with large degree. We refer to Galbraith and Stolbunov [229] for further details.
The remaining problem is that the ideal ab⁻¹ is typically of large norm. By construction, it is a product of exponentially many small primes. Since the ideal class group is
commutative, such a product has a short representation (storing the exponents for each
prime), but this leads to an isogeny that requires exponential computation. The proposal
from [220] is to represent ideals using the standard representation for ideals in quadratic
fields, and to smooth the ideal using standard techniques from index calculus algorithms in ideal class groups. It is beyond the scope of this book to discuss these ideas
in detail. However, we note that the isogeny then has subexponential length and uses
570
primes of subexponential degree. Hence, the second stage of the isogeny computation is
subexponential-time; this is not as fast as it would be with the basic Galbraith algorithm.
The idea of smoothing an isogeny has also been used by Bröker, Charles and Lauter [108] and Jao and Soukharev [310].
Since the ordinary isogeny graph is conjecturally an expander graph, we know that a random walk on it behaves close to the uniform distribution after sufficiently many steps. We make the heuristic assumption that the pseudorandom walk proposed above has this property when the number of different primes used is sufficiently large and the hash function H is good. Then, by the birthday paradox, one expects a collision after √(h(O)) vertices have been visited. As a result, the heuristic expected running time of the algorithm is O(q^{1/4}) isogeny chain steps, and the storage can be made low by making distinguished elements rare. The algorithm can be distributed: using L processors of equal power one solves the isogeny problem in O(q^{1/4}/L) bit operations.
25.6
Relating the Discrete Logarithm Problem on Isogenous Curves
The main application of the algorithms in Section 25.5 is to relate the discrete logarithm problem on curves with the same number of points. More precisely, let E and Ẽ be elliptic curves over Fq with #E(Fq) = #Ẽ(Fq). Let r be a large prime dividing #E(Fq). A natural question is whether the discrete logarithm problem (in the subgroup of order r) has the same difficulty in both groups E(Fq) and Ẽ(Fq). To study this question one wants to reduce the discrete logarithm problem from E(Fq) to Ẽ(Fq). If we have an isogeny φ : E → Ẽ of degree not divisible by r, and if φ can be efficiently computed, then we have such a reduction.
As we have seen, if there is a very large prime dividing the conductor of End(E) but not the conductor of End(Ẽ) (or vice versa) then it is not efficient to compute an isogeny from E to Ẽ. In this case one cannot make any inference about the relative difficulty of the DLP in the two groups. No example is known of elliptic curves E and Ẽ of this form (i.e., with a large conductor gap) but for which the DLP on one is known to be significantly easier than the DLP on another. The nearest we have to an example of this phenomenon is with elliptic curves E with #Aut(E) > 2 (and so one can accelerate the Pollard rho method using equivalence classes as in Section 14.4) but with an isogeny from E to Ẽ with #Aut(Ẽ) = 2.
On the other hand, if the conductors of End(E) and End(Ẽ) have the same very large prime factors (or no large prime factors) then we can (conditional on a generalised Riemann hypothesis) compute an isogeny between them in O(q^{1/4}) bit operations. This is not a polynomial-time reduction. But, since the current best algorithms for the DLP on elliptic curves run in O(q^{1/2}) bit operations, it shows that from a practical point of view the two DLPs are equivalent.
Jao, Miller and Venkatesan [308] have a different, and perhaps more useful, interpretation of the isogeny algorithms in terms of random self-reducibility of the DLP in an isogeny class of elliptic curves. The idea is that if E is an elliptic curve over Fq then by taking a relatively short random walk in the isogeny graph one arrives at a random (again ignoring the issue of large primes dividing the conductor) elliptic curve Ẽ over Fq such that #Ẽ(Fq) = #E(Fq). Hence, one easily turns a specific instance of the DLP (i.e., for a specific elliptic curve) into a random instance. It follows that if there were a large set of weak instances of the DLP in the isogeny class of E then, after enough trials, one should be able to reduce the DLP from E to one of the elliptic curves in the weak class. One concludes that either the DLP is easy for most curves in an isogeny class, or is hard for most curves in an isogeny class.
Appendix B
Hints and Solutions to Exercises
2.4.6: Choose random 1 < a < p and compute (a/p) until the Legendre symbol is −1. Since the probability of success is at least 0.5 for each trial, the expected number of trials is 2. This is a Las Vegas algorithm (it may never terminate, but the answer is always correct). See the proof of Lemma 2.9.5 for further details.
2.4.9: This goes back to A. Cobham. See Shoup [553] or Bach and Sorensen [23] for
the analysis.
2.4.10: Computing Legendre symbols using quadratic reciprocity requires O(log(p)²) bit operations while computing a^{(p−1)/2} (mod p) needs O(log(p)M(log(p))) bit operations (see Corollary 2.8.4). So using quadratic reciprocity is (theoretically) always better.
2.5.5: First compute and store b2, . . . , bm where bi = ∏_{j=1}^{i} aj, then b_m^{−1}, then, for i = m downto 2 compute a_i^{−1} = b_{i−1} b_i^{−1} and b_{i−1}^{−1} = a_i b_i^{−1}. See Algorithm 11.15 of [16], Section 2.4.1 of [100] or Section 2.2.5 of [273].
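The simultaneous-inversion trick of this solution in code (our sketch; the single inversion is done here with Fermat's little theorem):

```python
def batch_invert(a, p):
    """Invert a_1, ..., a_m mod a prime p with a single field inversion."""
    m = len(a)
    b = [0] * m
    b[0] = a[0] % p
    for i in range(1, m):            # b_i = a_1 * ... * a_i (prefix products)
        b[i] = b[i - 1] * a[i] % p
    inv = pow(b[-1], p - 2, p)       # the one inversion: b_m^{-1}
    res = [0] * m
    for i in range(m - 1, 0, -1):
        res[i] = b[i - 1] * inv % p  # a_i^{-1} = b_{i-1} * b_i^{-1}
        inv = inv * a[i] % p         # b_{i-1}^{-1} = a_i * b_i^{-1}
    res[0] = inv
    return res

assert batch_invert([2, 3, 4], 11) == [6, 4, 3]
```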
2.6.4:
2.8.6: For pseudocode of the algorithm see Algorithm IV.4 of [64] or Algorithm 9.10
of [16].
2.8.7: Consider a window of length w representing an odd integer. If the bits of m are uniformly distributed then the probability the next most significant bit after the window is 0 is 0.5. Hence the expected number of zeroes before the first 1 is 0.5 · 0 + 0.25 · 1 + · · ·
and so on. There are d/2 steps, each taking O(log(q)d²) field operations, so the overall complexity is O(d³ log(q)). See Algorithm IPT of Section 21.1 of [552] or Section 3 of [367].
2.12.11: One first computes gcd(x^{q^b} − x, F(x)) in time O(b log(q)) operations on polynomials of degree at most d. The rest of the algorithm is the same as the root finding algorithm of Exercise 2.12.5 where we raise polynomials to at most the power q^b.
2.14.1: Construct {θ, θ^q, . . . , θ^{q^{m−1}}} in O(m³) operations in Fq (using the fact that q-th powering is linear) and then O(m³) operations in Fq for the Gaussian elimination. Hence the complexity is an expected O(m³ log_q(m)) bit operations.
2.14.4: Suppose θ^2 = d. Given g = g_0 + g_1θ we must solve for x, y ∈ F_q such that (x + yθ)^2 = g_0 + g_1θ. Expanding gives x^2 + dy^2 = g_0 and 2xy = g_1, and one can eliminate y to get 4x^4 − 4g_0x^2 + dg_1^2 = 0, which can be solved using two square roots in F_q.
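A sketch of this square-root computation, under the extra assumption p ≡ 3 (mod 4) so that square roots in F_p are a single exponentiation (all names are ours):

```python
def sqrt_fp2(g0, g1, p, d):
    """Square root of g0 + g1*theta in F_{p^2} = F_p(theta), theta^2 = d a
    non-residue, via two square roots in F_p as in the hint above.
    Assumes p = 3 (mod 4), g1 != 0 and that the input is a square.
    Returns (x, y) with (x + y*theta)^2 = g0 + g1*theta, or None."""
    def sqrt_fp(a):
        r = pow(a % p, (p + 1) // 4, p)
        return r if r * r % p == a % p else None
    # x^2 + d y^2 = g0 and 2xy = g1; eliminating y: 4x^4 - 4 g0 x^2 + d g1^2 = 0
    s = sqrt_fp((g0 * g0 - d * g1 * g1) % p)   # sqrt of the discriminant
    if s is None:
        return None
    inv2 = pow(2, -1, p)
    for x2 in ((g0 + s) * inv2 % p, (g0 - s) * inv2 % p):
        x = sqrt_fp(x2)                        # only one choice of sign is a square
        if x is not None and x != 0:
            y = g1 * pow(2 * x, -1, p) % p
            return x, y
    return None
```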
2.14.5: See Fong, Hankerson, López and Menezes [206].
2.14.10: Computing Sm requires O(m3 ) operations in Fq . Computing a linear dependence among m vectors can be done using Gaussian elimination in O(m3 ) field operations.
2.14.11: Since 1 − 1/q ≥ 1/2, the number of trials is at most 2. The complexity is therefore an expected O(m^3) operations in F_q.
2.14.12: Expected O(m^3) operations in F_{q^m}.
2.14.13: The cost of finding a root of the polynomial F_1(x) of degree m is an expected O(log(m) log(q) m^2) operations in F_q and this dominates the cost of the isomorphism algorithm in the forwards direction. The cost of the linear algebra to compute the inverse of this isomorphism is O(m^3). (If one repeats the algorithm with the roles of F_1 and F_2 swapped then one gets a field isomorphism from F_q[y]/(F_2(y)) to F_q[x]/(F_1(x)), but it is not necessarily the inverse of the function computed in the first step.)
2.15.1: Writing q − 1 = ∏_{i=1}^{m} l_i^{e_i} we have m = O(log(q)), and then computing g^{(q−1)/l_i} requires O(log(q)^3) bit operations.
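The corresponding order computation can be sketched as follows (a hypothetical helper, assuming the factorization of q − 1 is given):

```python
def element_order(g, q, factorization):
    """Order of g in (Z/qZ)^*, given the factorization of q - 1 as a list
    of (l, e) pairs.  For each prime l, divide l out of the candidate order
    while g raised to the smaller exponent is still 1."""
    order = q - 1
    for l, e in factorization:
        for _ in range(e):
            if pow(g, order // l, q) == 1:
                order //= l
            else:
                break
    return order
```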
2.15.8: The naive solution requires (2/3)log_2(N) group operations in the precomputation, followed by 3·(1/2)log_2(N^{2/3}) = log_2(N) group operations. The same trick can be used in the first stage of Algorithm 4, giving (2/3)log_2(N) group operations. But the three exponentiations in the second stage are all to different bases and so the total work is (3/2)log_2(N) group operations. The naive solution is therefore better. The naive method becomes even faster compared with Algorithm 4 if one uses window methods.
Chapter 5: Varieties
5.1.3: V(y − x^2), V(y^2 − x), V((x − 1)^3 − y^2).
5.1.4: If f(x, y) has no monomials featuring y then it is a polynomial in x only; take any root a ∈ k̄ and then {(a, b) : b ∈ k̄} ⊆ V(f). For the remaining case, for each a ∈ k̄ the polynomial f(a, y) is a polynomial in y and so has at least one root b ∈ k̄.
5.1.7: Let z = x + iy ∈ F_{p^2}. Then z^p = x − iy. Hence z^{p+1} = 1 if and only if 1 = (x + iy)(x − iy) = x^2 + y^2. It follows that the map x + iy ↦ (x, y) is a bijection from G to X(F_p). Showing that this is a group isomorphism is straightforward.
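The bijection can be checked by brute force for a small prime p ≡ 3 (mod 4), so that i ∉ F_p (a naive sketch; names are ours):

```python
def circle_points(p):
    """All (x, y) with x^2 + y^2 = 1 (mod p): the set X(F_p) above."""
    return [(x, y) for x in range(p) for y in range(p)
            if (x * x + y * y) % p == 1]

def circle_mul(P, Q, p):
    """Group law inherited from multiplying x + iy values:
    (x1 + i y1)(x2 + i y2) = (x1 x2 - y1 y2) + i (x1 y2 + y1 x2)."""
    (x1, y1), (x2, y2) = P, Q
    return ((x1 * x2 - y1 * y2) % p, (x1 * y2 + y1 * x2) % p)
```

For p = 7 this yields exactly p + 1 = 8 points, matching the order of the norm-one group G ≤ F_{p^2}^*.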
5.1.10: With X = A^1(F_p) we have X = V(x^p − x).
5.2.8: V(x^2 + y^2 − z^2)(ℝ) is the same as the circle V(x^2 + y^2 − 1)(ℝ) ⊆ A^2(ℝ). Over ℂ it would also contain the points (1 : ±i : 0).
V(yz − x^2) is the parabola y = x^2 in A^2 together with the single point (0 : 1 : 0) at infinity. Note that this algebraic set is exactly the same as the one in Example 5.2.7 under a change of coordinates.
5.2.21: (1 : 0 : 0), (0 : 1 : 0), (0 : 0 : 1).
and use μ(dp) = −μ(d). If p | n then
Φ_{np}(x) = ∏_{d|n} (x^{np/d} − 1)^{μ(d)} · ∏_{d|np, d∤n} (x^{np/d} − 1)^{μ(d)},
and note that if d | np but d ∤ n then p^2 | d and so μ(d) = 0. The final statement follows since, when n is odd, z^n = 1 if and only if (−z)^{2n} = 1.
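The identity Φ_{np}(x) = Φ_n(x^p)/Φ_n(x) for p ∤ n can be checked with a short computation of cyclotomic polynomials from the product formula (a naive sketch):

```python
def poly_mul(f, g):
    """Multiply integer polynomials given as coefficient lists (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_divexact(f, g):
    """Exact division of integer polynomials; g must be monic."""
    f = f[:]
    q = [0] * (len(f) - len(g) + 1)
    for i in range(len(q) - 1, -1, -1):
        c = f[i + len(g) - 1] // g[-1]
        q[i] = c
        for j, b in enumerate(g):
            f[i + j] -= c * b
    assert all(c == 0 for c in f)          # division was exact
    return q

def cyclotomic(n):
    """Phi_n(x) = (x^n - 1) / prod over proper divisors d of n of Phi_d(x)."""
    f = [-1] + [0] * (n - 1) + [1]         # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            f = poly_divexact(f, cyclotomic(d))
    return f
```

For example, with n = 4 and p = 3 one checks Φ₄(x³)/Φ₄(x) = Φ₁₂(x).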
6.3.2: To compute (u + vθ)(u′ + v′θ) compute uu′, vv′ and (u + v)(u′ + v′). To square (u + vθ) compute u^2, v^2 and (u + v)^2. One inverts (u + vθ) by computing (u − Av − vθ)/(u^2 − Auv + Bv^2).
6.3.3: The first two parts are similar to Exercise 6.3.2. For inversion, note that (u + vi)^{−1} = (u − vi)/(u^2 + v^2). For square roots, note that (u + vi)^2 = (u^2 − v^2) + 2uv·i, so to compute the square root of a + bi requires solving these equations. We assume here that a + bi is a square, and so ((a^2 + b^2)/q) = 1. One finds that u^4 − au^2 − b^2/4 = 0 and so u^2 = (a ± √(a^2 + b^2))·2^{−1}. Only one of the two choices is a square, so one must compute the Legendre symbol ((a + √(a^2 + b^2))/q) to determine the answer (taking into account the value of (2/q)).
centered at (t_{2n}, t_{2n}) or (t_{2n+1}, t_{2n−1}). See Gong and Harn [258] and also Stam [575] for further details.
6.4.15: See Shirase et al [547].
6.6.2: Modulo each prime p_i there are usually two values h such that Tr_{F_{p_i^2}/F_{p_i}}(h) = Tr_{F_{p_i^2}/F_{p_i}}(g). Hence, by the Chinese remainder theorem, there are 2^k values for h in general.
The second claim follows immediately, or one can prove it using the same argument as the first part: modulo each prime p_i, 2 + (p_i − 1)/2 ≈ p_i/2 values arise as the trace of an element in G_{p_i,2}.
Chapter 7: Curves and Divisor Class Groups
7.1.7: The first claim is immediate, since f ∈ O_P implies f(P) ∈ k is defined, and (f − f(P)) ∈ m_P. For the second claim, note that m_P/m_P^2 is an O_P/m_P-module, i.e., a k-vector space. Without loss of generality, P = (0, …, 0) and m_P is the O_P-ideal (x_1, …, x_n). Then every monomial x_ix_j for 1 ≤ i, j ≤ n lies in m_P^2. Therefore every element of m_P is of the form a_1x_1 + ⋯ + a_nx_n + g where a_i ∈ k for 1 ≤ i ≤ n and g ∈ m_P^2. Hence there is a map d : m_P → k^n given by
d(a_1x_1 + ⋯ + a_nx_n + g) = (a_1, …, a_n).
7.1.9: x = y^2 − yx − x^4 ∈ m_P^2 so {y} is a basis. For the second example there is no linear relation between x and y and so {x, y} is a basis.
7.1.15: (Corollary 7.1.14) By Exercise 5.6.5 the dimension of X is d = n − 1, so n − d = 1. Now, the Jacobian matrix is a single row vector and so the rank is 1 unless the vector is the zero vector. Corollary 7.1.14 is just a restatement of this.
7.2.4: Putting z = 0 gives the equation x^3 = 0 and so x = 0. Hence, (0 : 1 : 0) is the only point at infinity. Taking the affine part by setting y = 1 gives the equation E(x, z) = z + a_1xz + a_3z^2 − (x^3 + a_2x^2z + a_4xz^2 + a_6z^3) and one can check that (∂E/∂z)(0, 0) = 1 ≠ 0.
7.3.7: Write φ(P) = (φ_0(P) : ⋯ : φ_n(P)) and let N = −min{v_P(φ_i) : 0 ≤ i ≤ n}. Let t_P be a uniformizer at P. Then (t_P^N φ_0 : ⋯ : t_P^N φ_n) is regular at P. See Proposition II.2.1 of Silverman [560] for further details.
7.4.3: The first 3 statements are straightforward. The definitions also directly imply
that mv is an Rv -ideal. The proof that mv is maximal and that Rv is a local ring is
essentially the same as the proof of Lemma 7.1.2. Statement 5 follows from Statement 2.
7.4.6: The result follows since m_{P,k(C)}^m = k(C) ∩ m_{P,k̄(C)}^m for any m ∈ ℤ_{≥0}.
7.4.8: m_P^m = (t_P^m), so f ∈ m_P^m and f ∉ m_P^{m+1} is equivalent to f = t_P^m u where u ∈ O_P − m_P, and hence u ∈ O_P^*.
7.4.9: Write f = t_P^a u and h = t_P^b v where u, v ∈ O_P^* and note that fh = t_P^{a+b}uv.
7.4.12: Let f = f_1/f_2 with f_1, f_2 ∈ O_{P,k}(C). By Lemma 7.4.7 we have f_1 = t_P^{v_P(f_1)}u_1 and f_2 = t_P^{v_P(f_2)}u_2 with u_1, u_2 ∈ O_P^*, and hence f = t_P^{v_P(f_1)−v_P(f_2)}u_1u_2^{−1}.
7.7.9: Write f as a ratio of homogeneous polynomials of the same degree and then
factor them as in equation (7.8).
7.7.15: f/h ∈ k(C)^* has div(f/h) = 0 and so f/h ∈ k^*.
7.9.6: See Cassels [122], Reid [494] or Silverman and Tate [563].
Chapter 8: Rational Maps on Curves and Divisors
8.1.2: Any non-constant morphism has a representative of the form (φ_1 : φ_2) where φ_2 is not identically zero, and so define f = φ_1/φ_2. That f is a morphism follows from Lemma 7.3.6.
8.1.10: k(x, y/x) = k(x, y).
8.1.11: k(x, y) = k(x^2, y)(x) = (φ^*(k(C_2)))(x), which is a quadratic extension.
8.2.3: See Proposition III.1.4 of Stichtenoth [585].
8.2.10: By Lemma 7.4.7, t is a uniformizer at Q if and only if vQ (t) = 1. The problem
is therefore equivalent to Lemma 8.2.9.
8.2.11: Since φ^*(k(C_2)) = k(C_1) it follows directly from the definition in terms of O_P and m_P that if φ(P) = Q and f ∈ k(C_2)^* then v_Q(f) = v_P(φ^*(f)).
8.2.16: Follows from Lemma 8.2.9.
8.3.12: The function can also be written φ((x : z)) = (1 : z^2/x^2). One has φ_*(D) = (1 : 1) + (1 : 0) − (0 : 1), φ^*(D) = (i : 1) + (−i : 1) + 2(1 : 0) − 2(0 : 1), φ_*(φ^*(D)) = 2(−1 : 1) + 2(1 : 0) − 2(0 : 1) = 2D and φ^*(φ_*(D)) = (1 : 1) + (−1 : 1) + 2(1 : 0) − 2(0 : 1).
8.3.15: Follows from Exercise 8.2.17.
8.4.3: See Section 8.2 of Fulton [215] or I.4.5 to I.4.9 of Stichtenoth [585].
8.4.6: By Corollary 7.7.13 all non-constant functions have a pole. Hence L_k(0) = k. Since div(f) ≥ 0, the second result follows from part 6 of Lemma 8.4.2.
8.4.10: The dimensions are 1, 1, 2, 3, 4, 5, 6.
8.5.15: d(x^p) = p·x^{p−1} dx = 0.
8.5.27: If ω_1 = f_1 dt_P and ω_2 = ω_1 then ω_2 = f_2 dt_P with f_1 = f_2 in k(C), so there isn't anything to show here.
For the second point suppose t and u are two uniformizers at P. Write ω = f_1 dt = f_2 du, so that f_1/f_2 = du/dt. We must show that v_P(f_1) = v_P(f_2). Since u ∈ m_{P,k(C)} = (t) we have u = ht for some h ∈ k(C) such that h(P) ≠ 0. Then du/dt = d(ht)/dt = h + t(dh/dt). It follows that v_P(du/dt) = 0 and so v_P(f_1) = v_P(f_2).
8.5.29: If ω′ = fω then div(ω′) = div(ω) + div(f).
8.6.3: An elliptic curve with only one point, e.g., y 2 + y = x3 + x + 1 over F2 .
Chapter 9: Elliptic Curves
9.1.1: It is clear that the formula gives the doubling formula when x_1 = x_2, y_1 = y_2. For the other case it is sufficient to compute
((y_1 − y_2)/(x_1 − x_2)) · ((y_1 + y_2 + (a_1x_2 + a_3))/(y_1 + y_2 + (a_1x_2 + a_3)))
and simplify to the above formula.
9.1.4: (These ideas originate in the work of Seroussi and Knudsen.) Dividing the equation by x_P^2 gives (y_P/x_P)^2 + (y_P/x_P) = x_P + a_2 + a_6/x_P^2 and the result follows by Exercise 2.14.7.
The formula for λ_Q is immediate from equation (9.2) and the formulae for x_P and y_P also follow immediately. If P = [2]Q then x_P = λ_Q^2 + λ_Q + a_2 for some λ_Q ∈ F_{2^m} and so Tr_{F_{2^m}/F_2}(x_P + a_2) = 0. Since Tr_{F_{2^m}/F_2}(x_P + a_2 + a_6/x_P^2) = 0 it follows that Tr_{F_{2^m}/F_2}(a_6/x_P^2) = 0.
Conversely, if Tr_{F_{2^m}/F_2}(a_6/x_P^2) = 0 then one can solve for x_Q ∈ F_{2^m} the equation x_Q^2 + x_P + a_6/x_Q^2 = 0. Further, Tr_{F_{2^m}/F_2}(x_Q + a_2 + a_6/x_Q^2) = Tr_{F_{2^m}/F_2}(x_Q + x_Q^2 + a_2 + x_P) = 0, so there is an element y_Q ∈ F_{2^m} such that Q = (x_Q, y_Q) ∈ E(F_{2^m}). It then follows that P = [2]Q.
Finally, the formulae for point halving follow from substituting y_Q = x_Q(λ_Q + x_Q) into the formula y_P = (x_P + x_Q)λ_Q + x_Q + y_Q.
9.1.5: We have λ = (y_1/z_1 − y_2/z_2)/(x_1/z_1 − x_2/z_2) = (y_1z_2 − y_2z_1)/(x_1z_2 − x_2z_1).
Similarly, putting x1 /z1 , x2 /z2 and y1 /z1 in the addition formulae in place of x1 , x2 and
y1 gives the above formulae.
9.3.2: There is an inverse morphism which is a group homomorphism, also defined
over k by definition, such that the composition is the identity.
9.3.8: From E_1 to E_2 it is just (x, y) ↦ (x + 1, y + x). From E_1 to E_3 it is (x, y) ↦ (x + s^2, y + sx + t) where s ∈ F_{2^4} satisfies s^4 + s + 1 = 0 and t ∈ F_{2^8} satisfies t^2 + t = s^6 + s^2.
9.4.5: For t ∈ F_{2^4} show that Tr_{F_{2^4}/F_2}(s^6 + s^2) = 1 + u(s^4 + s + 1)^2 ≠ 0. Theorem 9.3.4 implies every element of Aut(E) is of this form. Since there are 24 choices for (u, s, t), each giving a different function on E(F̄_2), one has #Aut(E) = 24. The fact that Aut(E) is non-Abelian can be shown as follows. Let φ_i(x, y) = (u_i^2x + s_i^2, u_i^3y + u_i^2s_ix + t_i) for i = 1, 2. One can check that the x-coordinates of φ_1 ∘ φ_2 and φ_2 ∘ φ_1 are u_1^2u_2^2x + u_1^2s_2^2 + s_1^2 and u_1^2u_2^2x + u_2^2s_1^2 + s_2^2 respectively. These are not equal when, for example, u_1 = 1, s_1 = s^3, u_2 = s^3 and s_2 is arbitrary.
9.5.5: The first part of the exercise follows from Theorem 9.3.4. Points P = (x_P, y_P) either satisfy a_1x_P + a_3 = 0, and so P = −P, or a_1x_P + a_3 ≠ 0, in which case y_P is a solution to
y^2 + y = F(x_P)/(a_1x_P + a_3)^2.
By Exercise 2.14.7 this is soluble if and only if Tr_{F_{2^n}/F_2}(F(x_P)/(a_1x_P + a_3)^2) = 0. Hence, every x_P ∈ F_{2^n} is such that either a_1x_P + a_3 = 0 (in which case both E and Ẽ have a single point each with x-coordinate x_P) or precisely one of y^2 + y = F(x_P)/(a_1x_P + a_3)^2 and y^2 + y = F(x_P)/(a_1x_P + a_3)^2 + γ has a solution. The result follows.
9.5.6: If σ(φ) = φ for all σ then σ(φ^{−1}) = σ(φ)^{−1} = φ^{−1} for all σ, and so φ^{−1} is defined over k.
9.5.8: Follow the method of Lemma 9.5.7; also see Proposition 1 of Hess, Smart and
Vercauteren [283].
9.5.9: We have j(E) = 0, and the only elliptic curves Ẽ/F_2 with j(Ẽ) = 0 in short Weierstrass form are y^2 + y = x^3 + a_4x + a_6 with (a_4, a_6) ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}. One can verify by direct calculation that for any pair E_1, E_2 of distinct curves in this form, the isomorphism between them is not defined over F_2.
9.6.2: Let E_2 : y^2 + H(x)y = F(x). If φ(x, y) = (φ_1(x, y), φ_2(x, y)) is a morphism then so is φ̃(x, y) = (φ_1(x, y), −φ_2(x, y) − H(φ_1(x, y))).
9.6.7: We know that [0] and [1] are isogenies and that the inverse of an isogeny is an isogeny, so we may assume that n ≥ 2. Lemma 9.6.6 shows that the sum of two isogenies is an isogeny. It follows by induction that [n] = [n − 1] + [1] is an isogeny.
9.6.10: Use the fact that [2]P = O_E if and only if P = −P. Everything is direct calculation except for the claim that the quartic has distinct roots, which follows by observing that if x_1 is a repeated root then the corresponding point (x_1, y_1) would be singular.
9.6.16: There are many proofs: using differentials; showing that ker(π_q) = {O_E}; determining k(E)/π_q^*(k(E)). The statement about the degree follows by Lemma 9.6.13.
9.6.20: This is an immediate consequence of Theorem 9.6.18 (the fact that is an isomorphism comes from considering degrees). The details are also given in Proposition 12.12
of Washington [622].
9.6.22: We have ( · )^ = [0] and so the result follows by the same argument using degrees as in the proof of Lemma 9.6.11. One can also use Theorem 9.6.18.
9.6.25: This follows since φ^{−1} is essentially (x, y) ↦ (ζ_3^{−1}x, y) = (ζ_3^2x, y).
9.6.29: A point of order 3 means [2]P = −P, and the subgroup being rational means either P is defined over k or P is defined over k′ ⊋ k with σ(P) = −P for all non-trivial σ ∈ Gal(k′/k). It follows that k′/k is quadratic. Translating x to 0 gives a point (0, v). Since char(k) ≠ 2 we assume E : y^2 = x^3 + a_2x^2 + a_4x + a_6 and clearly v^2 = a_6. The condition [2](0, v) = (0, −v) implies a_2 = (a_4/(2v))^2 = a_4^2/(4a_6) and re-arranging gives the first equation. For the twist write w = a_4/(2a_6), X = wx, Y = w^{3/2}y and A = a_6w^3. Then
X^3 + A(X + 1)^2 = w^3x^3 + (A/a_6^2)(a_6wx + a_6)^2 = w^3(x^3 + (A/(w^3a_6^2))((a_4/2)x + a_6)^2) = w^3y^2 = Y^2.
It is easy to check that this final equation is singular if and only if A = 0 or 27/4.
9.6.30: See Doche, Icart and Kohel [182].
9.7.6: Let φ(x, y) = (φ_1(x), c·y·φ_1′(x) + φ_3(x)) be an isogeny from E to Ẽ over a field k of characteristic 2. By definition, X = φ_1(x) and Y = c·y·φ_1′(x) + φ_3(x) satisfy the curve equation Y^2 + (ã_1X + ã_3)Y = X^3 + ã_2X^2 + ã_4X + ã_6. Now
Y^2 + (ã_1X + ã_3)Y = (c·y·φ_1′(x) + φ_3(x))^2 + (ã_1φ_1(x) + ã_3)(c·y·φ_1′(x) + φ_3(x))
and ã_1φ_1(x) + ã_3 = c(a_1x + a_3)φ_1′(x). Hence
Y^2 + (ã_1X + ã_3)Y = (cφ_1′(x))^2(y^2 + (a_1x + a_3)y) + φ_3(x)^2 + (ã_1φ_1(x) + ã_3)φ_3(x).
It follows that φ_3(x) must satisfy the following quadratic equation over k(x):
φ_3(x)^2 + (ã_1φ_1(x) + ã_3)φ_3(x) + (cφ_1′(x))^2(x^3 + a_2x^2 + a_4x + a_6) + φ_1(x)^3 + ã_2φ_1(x)^2 + ã_4φ_1(x) + ã_6 = 0.
Since k(x) is a field there are at most two possible values for φ_3(x).
9.8.3: [3]P = O_E if and only if [2]P = −P. Hence, if P = (x, y) is such that [3]P = O_E then λ^2 − a_2 − 3x = 0 where λ = (3x^2 + 2a_2x + a_4)/(2y). The statements all follow from this.
9.9.5: Use the fact that the dual isogeny of ψ^2 − [t]ψ + [d] is ψ̂^2 − [t]ψ̂ + [d], since the dual of [t] is [t], etc.
9.10.4: For each y ∈ F_p there is a unique solution to x^3 = y^2 − a_6 and so there are p solutions to the affine equation. Counting the point at infinity gives the result.
9.10.5: Write the curve as y^2 = x(x^2 + a_4). Since −1 is not a square, for any x ∈ F_p we have either x(x^2 + a_4) = 0 or exactly one of x(x^2 + a_4) and −x(x^2 + a_4) is a square. Hence there are p solutions to the affine equation.
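This count is easy to confirm by brute force (a naive sketch; −1 must be a non-residue mod p for the argument to apply):

```python
def count_points(p, a4):
    """Count points on y^2 = x(x^2 + a4) over F_p, including infinity.
    When -1 is a non-residue mod p the count is exactly p + 1, as argued above."""
    count = 1                                  # the point at infinity
    for x in range(p):
        rhs = x * (x * x + a4) % p
        count += sum(1 for y in range(p) if y * y % p == rhs)
    return count
```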
9.10.9: Maintain a pair of the form (t_i, t_{i+1}). So start with (t_1 = t, t_2 = t^2 − 2q). Given (t_i, t_{i+1}) we can compute (t_{2i}, t_{2i+1}) by first computing t_{2i} and then t_{2i+1} using the above formulae. To go from (t_i, t_{i+1}) to (t_{2i+1}, t_{2i+2}) one computes t_{2i+1} = t_it_{i+1} − q^i·t and t_{2i+2} = t_{i+1}^2 − 2q^{i+1}. One can then perform a ladder algorithm (working through the binary expansion of n) as in Lemma 6.3.19.
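The ladder can be sketched as follows (the function name is ours; the direct linear recurrence t_{i+1} = t·t_i − q·t_{i−1} serves as a check):

```python
def trace_ladder(t, q, n):
    """Compute t_n, where t_0 = 2, t_1 = t and t_{i+1} = t*t_i - q*t_{i-1},
    by maintaining the pair (t_i, t_{i+1}) as in the hint above."""
    if n == 0:
        return 2
    a, b = t, t * t - 2 * q           # (t_1, t_2)
    qi = q                            # q^i for the current index i = 1
    for bit in bin(n)[3:]:            # binary expansion of n, top bit consumed
        if bit == '0':                # (t_i, t_{i+1}) -> (t_{2i}, t_{2i+1})
            a, b = a * a - 2 * qi, a * b - qi * t
            qi = qi * qi
        else:                         # (t_i, t_{i+1}) -> (t_{2i+1}, t_{2i+2})
            a, b = a * b - qi * t, b * b - 2 * qi * q
            qi = qi * qi * q
    return a
```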
9.10.10: The first statement is easy to verify by counting points. The second statement follows since if m | n then E_a(F_{2^m}) is a subgroup of E_a(F_{2^n}) and so #E_a(F_{2^m}) | #E_a(F_{2^n}). Also, if m ≥ 3 then #E_a(F_{2^m}) > 4. Hence, it suffices to restrict attention to prime values for n in the third part. The values for E_1 are 5, 7, 11, 17, 19, 23, 101, 107, 109, 113, 163 and the values for E_0 are 5, 7, 9, 13, 19, 23, 41, 83, 97, 103, 107, 131.
9.10.21: The possibilities are p + 1 ± 46 and p + 1 ± 60. One can test which arises by choosing a random point P ∈ E(F_p) and computing [p + 1 − t]P for each choice of t. One finds that the orders are p + 1 + 46, p + 1 − 60, p + 1 − 46 and p + 1 − 46 respectively.
9.11.10: We have (ignoring for the moment the easy case P(T) = (T ± √q)^2) that q^{−1}P(√q·T) | Φ_m(T^2) | T^{2m} − 1 = (T^m − 1)(T^m + 1) in ℝ[T] for some m ∈ {1, 2, 3, 4, 6}. It follows (this requires some work) that P(T) | (T^m − q^{m/2})(T^m + q^{m/2}) in ℤ[T]. Hence, if θ is a root of P(T) then θ^m = ±q^{m/2} ∈ ℤ. Similarly, #E(F_q) = P(1) | (q^m − 1).
9.11.13: Take any supersingular elliptic curve E_1 over F_p with #E_1(F_p) = p + 1 and let E_2 be a non-trivial quadratic twist.
9.12.6: The statements about (X1 , Z1 ) and (X2 , Z2 ) are immediate. For the others,
substitute Xn /Zn , Xm /Zm and Xmn /Zmn for x1 , x2 and x4 in Lemma 9.12.5.
9.12.7: The idea is to store (X_n, Z_n, X_{n+1}, Z_{n+1}) (taking m = n + 1 we have (X_{m−n} : Z_{m−n}) = (x_P : 1) and so the above formulae may be used). The exponentiation is done using a ladder algorithm (analogous to Lemma 6.3.19) which computes either (X_{2n}, Z_{2n}, X_{2n+1}, Z_{2n+1}) or (X_{2n+1}, Z_{2n+1}, X_{2n+2}, Z_{2n+2}). Every step is therefore a doubling and an addition. See Algorithm 13.35 of [16]. For the improved formulae see [433] or Section 13.2.3.a of [16].
9.12.8: One has
[2](x, y) = (Bλ^2 − 2x − a_2, …)
where λ = (3x^2 + 2a_2x + a_4)/(2By) and where the formula for the y-coordinate is not relevant here. Solving [2](x, y) = (0, 0) is solving Bλ^2 = 2x + a_2, and so 0 = (3x^2 + 2a_2x + a_4)^2 − 4(x^3 + a_2x^2 + a_4x)(2x + a_2) = (x^2 − a_4)^2. Hence, x = ±√a_4 and the y-values are ±√(x(x^2 + a_2x + a_4)/B) as stated.
9.12.10: The point (1, 1) on (A + 2)y^2 = x(x^2 + Ax + 1) has order 4.
9.12.17: Homogenising gives the equation (ax^2 + y^2)z^2 = z^4 + dx^2y^2 and setting z = 0 leads to the equation dx^2y^2 = 0 and hence the two points (1 : 0 : 0) and (0 : 1 : 0). To show that (1 : 0 : 0) is singular it suffices to set x = 1 and show that the point (0, 0) on (a + y^2)z^2 = z^4 + dy^2 is singular, which is easy. The other case is similar.
9.12.23: Writing the curve equation as x^2 = (1 − y^2)/(a − dy^2), the quadratic twist is ux^2 = (1 − y^2)/(a − dy^2), which gives the result.
9.12.24: Follows directly from Theorem 9.12.12.
9.12.27: Irreducibility follows since y^2 − g(x) must factor as (y + g_1(x))(y + g_2(x)), and it follows that g_2(x) = −g_1(x) (which is prohibited by a^2 ≠ d). The point (0 : 1 : 0) on y^2z^2 = x^4 + 2ax^2z^2 + z^4 is singular. An affine singular point on C must satisfy y = 0, 4x(dx^2 + a) = 0 and dx^4 + 2ax^2 + 1 = 0, which again is impossible if a^2 ≠ d.
The birational map is easy to verify: 2Y^2 = X(2X/x^2) and X(X^2 − 2aX + (a^2 − d)) = X(a^2 + 2a(y + 1)/x^2 + (y + 1)^2/x^4 − 2a^2 − 2a(y + 1)/x^2 + a^2 − d), and the result follows by substituting for y^2. Finally, the discriminant of X^2 − 2aX + (a^2 − d) is 4d, so when d is a square in k there are three points (0, 0), (a ± √d, 0) of order 2.
9.12.29: Let P be a point of order 2 and move P to (0, 0) so that the curve is Y^2 = X(X^2 + AX + B). Write a = A/2 and d = a^2 − B so that a twist of the curve is in the form of equation (9.16). The result follows from Exercise 9.12.27.
Chapter 10: Hyperelliptic Curves
10.1.3: When we substitute z = 0 we get the equation H_{D−1}x^{D−1}y = F_Dx^D, where H_{D−1} (respectively, F_D) is the coefficient of the monomial x^{D−1} (resp., x^D) in H(x) (resp., F(x)). The possible solutions are x = 0, giving the point (x : y : z) = (0 : 1 : 0), or H_{D−1}y = F_Dx, giving the point (H_{D−1} : F_D : 0).
For the second part, we have H_{D−1} = 0 and so the only point at infinity is (0 : 1 : 0). Making the curve affine by setting y = 1 yields z^{D−2} + z^{D−1}H(x/z) = z^DF(x/z) and when D > 3 every monomial has degree at least 2. Hence the two partial derivatives vanish.
10.1.5: Make the change of variable y = Y + x^3 + 1. Then
y^2 + (1 − 2x^3)y = Y^2 + 3Y − x^6 + x^3 + 2
and so the affine curve is isomorphic to Y^2 + 3Y = x − 1. This curve is birational to P^1 (taking the map (x, Y) ↦ Y) and hence, by Theorem 8.6.1 and Definition 8.6.2, the curve has genus 0.
10.1.13: If H_0 = F_1 = F_0 = 0 then (0, 0) ∈ C(k̄) is singular. (This also follows since C is birational to C̃ and the genus is a birational invariant.)
10.1.20: 1. One way is (x, y) ↦ (y : x^d : x^{d−1} : ⋯ : x : 1). The other way is (Y : X_d : X_{d−1} : ⋯ : X_1 : X_0) ↦ (Y, X_d/X_{d−1}).
2. The statement about the map is clear.
3. Identifying X_i = x^iz^{d−i} it follows that points at infinity (i.e., with z = 0) have X_i = 0 when 0 ≤ i < d.
4. Set X_d = 1 to give the point (Y, X_{d−1}, …, X_0) = (γ, 0, …, 0) on the affine curve
Y^2 + H(X_{d−1}, …, X_0)Y = F(X_{d−1}, …, X_0)
together with various other equations, including X_i = X_{⌈(d+i)/2⌉}X_{⌊(d+i)/2⌋} for 0 ≤ i ≤ d − 2. One must determine the Jacobian matrix (see Theorem 7.1.12 and Corollary 7.1.13) at the point (γ, 0, …, 0). The number of variables is d + 1 and the dimension is 1, so we need to show that the rank is d. Note that each of the d − 1 equations X_i = X_{⌈(d+i)/2⌉}X_{⌊(d+i)/2⌋} yields a row (0, 0, …, 0, 1, 0, …, 0) in the Jacobian matrix (with the 1 corresponding to variable X_i). Hence, the rank is at least d − 1. The curve equation yields the row
(2γ + H_d, γH_{d−1} − F_{2d−1}, 0, …, 0)
in the Jacobian matrix, so to complete the proof we must show that either 2γ + H_d ≠ 0 or γH_{d−1} − F_{2d−1} ≠ 0. Note that at least one of {F_{2d}, F_{2d−1}, H_d} is non-zero, otherwise we would replace d by d − 1.
If char(k) ≠ 2 and both terms are zero then γ = −H_d/2, which implies that F_{2d} = −(H_d/2)^2 and F_{2d−1} = −H_dH_{d−1}/2, which violates the condition of Lemma 10.1.8. If char(k) = 2 then we must consider the three cases in Lemma 10.1.6. The first case is γ = 0 and F_{2d−1} ≠ 0, and so γH_{d−1} − F_{2d−1} ≠ 0. The second and third cases have H_d ≠ 0 and so 2γ + H_d = H_d ≠ 0.
10.1.23: The statement v_∞(x) = −2 follows from the fact that v(Z) = 2 as in Lemma 10.1.21. The second claim follows since ∞ = (1 : 0 : 0) and 1/x = c(y/x^d)^2 for some constant c, in which case c(y/x^d) = x^{d−1}/y. The third claim is immediate from Lemma 10.1.22.
10.1.25: Write d = g + 1. Note that
v_∞(y) = v_∞((y/x^d)·x^d) = v_∞(y/x^d) + d·v_∞(x).
follows that v_{∞+}(y − v′(x)) = −(d_u + d_{u′} − d). (There is also a direct proof of this fact.)
10.4.15: This is very similar to the proofs of Lemma 10.4.6 and Lemma 10.4.11. Let d_u = deg(u(x)). The leading d − (d_u − 1) coefficients of v′(x) agree with G_+(x) and hence deg(v′(x)^2 + H(x)v′(x) − F(x)) ≤ 2d − (d − (d_u − 1)) = d + d_u − 1. It follows that deg(u′(x)) ≤ d − 1. Since v_{∞−}(y − v′(x)) = −d and since div(y − v′(x)) has degree 0, one has v_{∞+}(y − v′(x)) = −(deg(u(x)) + deg(u′(x)) − d). The result follows from the analogue of equation (10.17).
10.4.17: Clearly,
σ(D + n(∞_+) + (g − d_u − n)(∞_−) − D_∞) = σ(D) + n(∞_−) + (g − d_u − n)(∞_+) − σ(D_∞).
When g is even then σ(D_∞) = D_∞ and the result is immediate by Exercise 10.4.2. When g is odd note that σ(D_∞) = D_∞ + (∞_−) − (∞_+) and so
σ(D + n(∞_+) + (g − deg(u(x)) − n)(∞_−) − D_∞) = σ(D) + (n − 1)(∞_−) + (g − deg(u(x)) − n + 1)(∞_+) − D_∞.
If n = 0 then this divisor is not effective and so it is necessary to perform composition and reduction at infinity.
10.4.20: This is the special case D_1 = D_2 = 0, n_1 = 0, n_2 = g/2 + 1 of Theorem 10.4.19, since D = (1, 0, 0) is principal.
10.4.21: This follows from Theorem 10.4.19.
10.4.22: Let ψ_P(x, y) = (1/(x − x_P), y/(x − x_P)^{g+1}) map P to ∞_+. If div(f(x, y)) = n((P) − (σ(P))) then div(f ∘ ψ_P^{−1}) = n((∞_+) − (∞_−)).
10.5.2: By Lemma 8.1.3 and Theorem 8.2.7 we know φ is surjective. Hence, for any P ∈ E(k̄) choose any point P′ ∈ C(k̄) such that φ(P′) = P, and let P″ ∈ C(k̄) be such that φ(P″) = O_E. Then φ_*((P′) − (P″)) = (P) − (O_E) as required. The second statement follows since the kernel of φ^* is contained in the kernel of φ_* ∘ φ^* = [deg(φ)].
10.6.3: See Mumford [442] pages 3.31-3.32.
10.7.7: An elegant proof using power series is given in Exercises 17.19 and 17.20 of
Shoup [552].
10.7.8: One has t_1 = 0, t_2 = 42 and t_3 = 0. Hence a_1 = 0, a_2 = 21 and a_3 = 0. It follows that #Pic^0_{F_7}(C) = 512.
10.7.14: First, one needs to count various types of points in C(F_q) and C(F_{q^2}). Note that there is a single point at infinity on C(F_{q^n}) for all n ∈ ℕ. Write m for the number of roots of F(x) in F_q. Let u = ½(N_1 − 1 − m) = ½(q − t_1 − m). Then there are u values for x_P ∈ F_q such that there are points P = (x_P, y_P) ∈ C(F_q) with P ≠ σ(P). Hence #C(F_q) = 1 + m + 2u. There are q possible values for x_P ∈ F_q. We have shown that m + u of them arise as x-coordinates of points in C(F_q) (i.e., F(x_P) is a quadratic residue for those values). For the remaining q − m − u values x_P it follows that F(x_P) is not a square in F_q. But F(x_P) is a square in F_{q^2}, so x_P must arise as the x-coordinate of a point in C(F_{q^2}). Hence there are q − m − u = ½(q + t_1 − m) such values for x_P. Hence, there are ½(q − m + t_1) points P = (x_P, y_P) ∈ C(F_{q^2}) with x_P ∈ F_q, y_P ∉ F_q and P ≠ σ(P).
Exercise 10.4.4 classifies divisor classes, so it is sufficient to count the number of representatives for each case. Case 1 gives N_1 divisor classes.
For case 2, let m be the number of roots of F(x) in F_q. Therefore, there are m + 1 points P ∈ C(F_q) such that P = σ(P). Hence, case 2 gives N_1 − 1 − m divisor classes.
For case 3, we first count the set of pairs (P, Q) such that P, Q ≠ ∞ and Q ∉ {P, σ(P)}. There are N_1 − 1 choices for P, and for each there are either N_1 − 2 or N_1 − 3 choices for Q (depending on whether P = σ(P) or not). Finally, since in this case we always have P ≠ Q, the number of choices for {P, Q} is half the number of pairs (P, Q). Hence the total number of divisor classes in case 3 is
½(m(N_1 − 2) + (N_1 − 1 − m)(N_1 − 3)).
We finally consider case 4 of Exercise 10.4.4. Note that K = F_{q^2} is the unique quadratic extension of F_q. We have π(P) = P if and only if P ∈ C(F_q). Hence, the number of choices for {P, π(P)} is ½(N_2 − N_1). From these choices we must subtract the number of pairs {P, π(P)} such that π(P) = σ(P). This happens when x_P ∈ F_q but y_P ∉ F_q, and by the above argument there are ½(q − m + t_1) such points. Hence, the total number of divisor classes in case 4 is
½(q^2 + 1 − t_2 − (q + 1 − t_1) − (q − m + t_1)) = ½(q^2 − 2q − t_2 + m).
an earlier value using one additional multiplication. The second claim of the exercise is
clear.
11.2.3: For windows of length w one must precompute all u_{b_1,…,b_n} = ∏_{i=1}^{n} g_i^{b_i} for 0 ≤ b_i < 2^w under the constraint that at least one of the b_i is odd. The number of such values is 2^{nw} − 2^{n(w−1)}, of which n come for free. Hence the precomputation requires 2^{nw} − 2^{n(w−1)} − n multiplications (actually, when w > 1 the computation can be arranged so that some of these are squarings). See Yen, Laih and Lenstra [634] for details.
11.2.4: See Algorithm 3.51 of [273].
11.3.1: Given P = (x_P, y_P) compute a solution λ_Q to λ_Q^2 + λ_Q = x_P + a_2, determine whether the corresponding value of x_Q^2 satisfies the trace condition (and if not, replace λ_Q by λ_Q + 1), compute a square root (which is easy in finite fields of characteristic two), and solve for y_Q. See Knudsen [339] for details.
11.3.4: The cost depends on the number of non-zero values for ai , which is the weight.
The number of elliptic curve additions is one less than the weight. We ignore the cost of
computing the Frobenius maps.
11.3.10: Let n_0 = 107 and n_1 = 126, so a_0 = 1 and the new values for (n_0, n_1) are (180, 54). The Frobenius expansion is
1 − π^5 − π^7 − π^9 + π^{11} + π^{13} + π^{16}.
11.3.13: P ∈ E(F_{p^m}) if and only if (π^m − 1)P = O_E.
11.3.16: It is clear that P is in the kernel of the endomorphism corresponding to this polynomial. Now, note that (x^{19} − 1)/(x − 1) ≡ 457 + 314x (mod x^2 + x + 2) and that 457 + 314π has norm r. Rounding n(457 + 314π̄)/r gives 67 − 148π, from which we get
n − (67 − 148π)(457 + 314π) = 107 + 126π.
decompress map to obtain an element of G6,q . This process misses several points in G6,q ,
such as the element 1 and the element 2 of order 3 (which corresponds to the point
(0, 0, 0) not appearing in the image of the map g in Example 6.4.4).
11.4.10: The expected number of trials to get a soluble quadratic equation is roughly 2 (precisely, there are at least (q^2 − q)/2 values for x_0 such that y^2 + H(x_0)y − F(x_0) is a
where the second product is over all primes l. Hence one expects O(log(log(q))) iterations
overall.
For generalisations and more detail see Section 6.1 of Miller [426].
11.6.2: Use point halving to attempt r − 1 halvings and then check whether the point can be halved further over F_q (this is only possible if the original point has odd order).
11.7.2: First, we know that Tr(x_P) = Tr(a_2) = 0. We may assume that x_P ≠ 0 and so
(y_P/x_P)^2 + (y_P/x_P) = x_P + a_2 + a_6/x_P^2.
Hence, Tr(a_6/x_P^2) = 0 and so Tr(√a_6/x_P) = 0. It is immediate that (0, √a_6) has order 2, and since Tr(0) = 0 it follows that it can be halved in E(F_{2^n}). The statement about (x_P, y_P) + (0, √a_6) follows from the formulae for the group law. Since (x_P, y_P) and (0, √a_6) can both be halved, their sum can also be halved; this also follows directly since Tr(√a_6/x_P) = 0.
The receiver gets an n − 1 bit string representing either x_P or √a_6/x_P. The receiver then computes the corresponding value z ∈ {x_P, √a_6/x_P}. One can verify that, in both cases, Tr(z + a_2 + a_6/z^2) = 0. Hence, one can solve u^2 + u = z + a_2 + a_6/z^2 for u ∈ F_{2^n}; one solution is
u = y_P/x_P + √a_6/x_P + x_P + 1,
which does satisfy Tr(u) = 0. In both cases, one determines the corresponding value y as y = uz.
It remains to determine the two cases. Suppose 2^i‖#E(F_{2^n}). Then P can be halved at least i times while (0, √a_6) can only be halved i − 1 times. It follows that P + (0, √a_6) can only be halved i − 1 times. Hence, if one can repeatedly halve (z, y) at least i times
12.3.7: List the primes B < Q_1 < Q_2 < ⋯ < Q_k ≤ B′ and use the fact that b^{Q_{i+1}} ≡ b^{Q_i}·b^{Q_{i+1}−Q_i} (mod N). One precomputes all the increments b^{Q_{i+1}−Q_i} (mod N).
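A sketch of this precomputation (in the context of, for example, the second stage of the p − 1 method; all names are ours):

```python
def primes_in(lo, hi):
    """Primes Q with lo < Q <= hi (trial division; fine for a sketch)."""
    return [n for n in range(lo + 1, hi + 1)
            if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))]

def stage2_values(b, B1, B2, N):
    """Yield (Q_i, b^{Q_i} mod N) for the primes B1 < Q_i <= B2, computing
    each value from the previous one by a single multiplication with a
    precomputed increment b^{Q_{i+1}-Q_i} mod N, as in the hint above."""
    qs = primes_in(B1, B2)
    gaps = {g: pow(b, g, N)
            for g in set(q2 - q1 for q1, q2 in zip(qs, qs[1:]))}
    cur = pow(b, qs[0], N)
    yield qs[0], cur
    for q1, q2 in zip(qs, qs[1:]):
        cur = cur * gaps[q2 - q1] % N
        yield q2, cur
```

Only the small set of distinct prime gaps needs a full exponentiation; every subsequent prime costs one modular multiplication.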
12.3.8: Apply Theorem 2.16.1.
12.3.9: See Williams [630] (note that he credits Guy and Conway for the suggestion).
12.5.1: Exercise 2.2.9 showed that one can determine if N is a prime power (and factor it) in polynomial time. So suppose N has at least two distinct prime factors. If p^2 | N then either p < N^{1/3} or N/p^2 < N^{1/3}. Use the Pollard–Strassen method with B = N^{1/6} to find all prime factors of N that are less than N^{1/3} in Õ(N^{1/6}) bit operations. Then test whether the remaining unfactored part of N is a perfect power.
Chapter 13: Basic Discrete Logarithm Algorithms
13.0.4: Split N using a Miller–Rabin style idea once one knows the order of a random g modulo N.
13.2.7: First show, using calculus, that f(x) = x/log_2(x) is monotonically increasing for x ≥ 2.7183. Hence deduce that if B ≥ 4 and 2 ≤ x ≤ B then x ≤ B·log_2(x)/log_2(B). Finally, prove the result using induction on n.
running time 2√(3r/2) group operations and which requires storing the same number of group elements. See Section 3 of Pollard [485].
13.3.6: It is convenient to replace h by hg^{−b} so that h = g^a with 0 ≤ a < w. Then set m = ⌈√(w/2)⌉ and run Algorithm 14 with trivial modifications (the while loop now runs to 2m).
13.3.10: One can just run BSGS twice, but a better solution is to set m = ⌈√(2w)⌉, h_1 = hg^{−b_1} and h_2 = hg^{−b_2}. One computes m baby steps as usual and then computes m/2 giant steps starting from h_1 and another m/2 giant steps starting from h_2.
13.3.11: Solve the DLP instance 1 = g^a.
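For reference, the basic algorithm these variants modify can be sketched as follows (a textbook version, not Algorithm 14 itself):

```python
from math import isqrt

def bsgs(g, h, r, p):
    """Baby-step giant-step in (Z/pZ)^*: find a with g^a = h (mod p),
    where g has order r.  O(sqrt(r)) group operations and storage."""
    m = isqrt(r) + 1
    baby = {}
    x = 1
    for j in range(m):               # baby steps: store g^j -> j
        baby.setdefault(x, j)
        x = x * g % p
    giant = pow(g, -m, p)            # g^{-m}
    y = h % p
    for i in range(m):               # giant steps: h * g^{-im}
        if y in baby:
            return i * m + baby[y]
        y = y * giant % p
    return None
```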
13.3.13: Blackburn and Teske [60].
13.3.14: By Exercise 11.1.15 the number of NAFs is 2^{n+2}/3. If a is a NAF then a = a_0 + 2^{n/2}a_1 where a_0 and a_1 are NAFs of length l = n/2. There is an efficient procedure to list all NAFs a_0 of length l (see Exercise 11.1.16), so one can compute and store all g^{a_0} in a sorted list. This gives a BSGS algorithm requiring O(2^{n/2+2}/3) time and space.
13.4.6: See Maurer [401].
13.5.4: Let m = l/2. Use the equation
g_1^{a_1} ⋯ g_m^{a_m} = h·g_{m+1}^{−a_{m+1}} ⋯ g_l^{−a_l}.
Let N_1 = #S_1 ⋯ #S_m and N_2 = #S_{m+1} ⋯ #S_l. Compute and store the left hand side of the equation in time and space O(N_1), then compute the right hand side and check for a match. The running time is O(N_1 + N_2) group operations and the storage is O(N_1) group elements.
13.5.5: Define S_1 = {a_1 ⋯ a_{l/2} : a_j ∈ S_j for 1 ≤ j ≤ l/2} and S_2 = {a_{l/2+1} ⋯ a_l : a_j ∈ S_j}. We assume #S_1 ≤ #S_2. Compute and store (g^{a_1}, a_1), sorted according to the first component, for all a_1 ∈ S_1, then compute each h^{a_2^{−1}} for a_2 ∈ S_2 and check for a match with the earlier list. The running time is O(#S_1 + #S_2) group operations and the storage is O(#S_1) group elements.
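A sketch of this meet-in-the-middle idea (names are ours; the elements of each S_j must be invertible modulo the group order r):

```python
from itertools import product

def product_dlp(g, h, sets, r, p):
    """Find (a_1, ..., a_l), a_j in sets[j], with h = g^(a_1 * ... * a_l) mod p,
    where g has order r.  Store g^a for products a over the first half of the
    sets, then match h^{a'^{-1} mod r} for products a' over the second half."""
    half = len(sets) // 2
    def prods(ss):
        for tup in product(*ss):
            v = 1
            for t in tup:
                v = v * t % r
            yield v, tup
    left = {pow(g, v, p): tup for v, tup in prods(sets[:half])}
    for v, tup in prods(sets[half:]):
        cand = pow(h, pow(v, -1, r), p)   # h^{a'^{-1}} = g^{(first-half product)}
        if cand in left:
            return left[cand] + tup
    return None
```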
13.6.10: Find the largest R such that … ≥ 1/2 and M ≤ …. Compute M·… baby steps and then … giant steps.
13.8.2: A solution is to sort L_1 and then, for each x_2 ∈ L_2, check whether x_2 ∈ L_1. For the hash-join solution see Wagner [621].
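The sort-then-binary-search approach can be sketched as follows (illustrative code; the function name is ours):

```python
import bisect

def intersect_sorted(L1, L2):
    """Return the elements x2 of L2 that also occur in L1, by sorting
    L1 once and binary-searching it for each x2 (an alternative to the
    hash-join approach)."""
    L1 = sorted(L1)
    matches = []
    for x2 in L2:
        i = bisect.bisect_left(L1, x2)
        if i < len(L1) and L1[i] == x2:
            matches.append(x2)
    return matches
```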
13.8.5: See Wagner [621].
13.8.6: D = {x ∈ {0,1}^n : LSB_m(x) = 0}.
13.8.8: See Wagner [621].
Chapter 14: Factoring and Discrete Logarithms using Pseudorandom Walks
14.1.3: By solving e^{-l²/(2N)} < 1 − p for l, for probability p = 0.99 the number of trials needed is 3.035√N and for probability 0.999 the number of trials is 3.717√N.
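The two constants can be checked numerically; a small sketch, assuming the model that e^{-l²/(2N)} is the probability of no collision among l trials:

```python
import math

def collision_trial_coefficient(prob):
    """Return c such that l = c*sqrt(N) trials give a collision with
    probability `prob`, by solving exp(-l^2/(2N)) = 1 - prob for l."""
    return math.sqrt(2 * math.log(1 / (1 - prob)))

# collision_trial_coefficient(0.99)  is about 3.035
# collision_trial_coefficient(0.999) is about 3.717
```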
14.2.4: l_t = 20, l_h = 10, so x_{20} ≡ x_{40} (mod p). The walk in this example is mostly squarings, which is not very random and which justifies why l_t + l_h is so much larger than √(πr/2).
14.2.5: lt = 7, lh = 6, so first collision is x12 = x24 .
14.3.3: Solving e^{-ε} = 1/r (the probability to simply guess the correct solution to the DLP) gives ε = log(r).
14.2.18: One can store some bits of H(x_n), where H is a hash function. A further
variant is for the clients not to compute or store the values ai and bi . Instead, the
algorithm is structured so that the values (a0 , b0 ) of the initial point x0 = g a0 hb0 are a
function of a short random seed; the client then sends the end-point of the walk and the
seed to the server. The server then re-calculates the walk, this time including the values
ai and bi , once a collision has been detected. We refer to Bailey et al. [25] for details.
14.2.21: Each step of the algorithm requires computing g^{a_0} h^{b_0} for random 0 < a_0, b_0 < r, which (naively) requires 3 log_2(r) group operations. The cost can be reduced to around log_2(r) (and even further) with modest precomputation and storage. In any case the total cost of the algorithm would be around 1.25 √r log_2(r) group operations on average, compared with 1.41 √r for baby-step-giant-step. The storage requirements are similar.
14.4.9: x_{i+2} = (x_i g)^{-1} g = x_i^{-1}. To have the cycle we need S(x_{i+1}) = S(x_i), which occurs with probability 1/n_S, and we need x_{i+1} = x_{i+1}^{-1}, which occurs with probability 1/N_C.
14.5.6: The worst case is when h = g^a with a = 0 or a = w − 1. The expected cost is then √(w/2). The worst-case complexity is then w/(2m) + m. To minimise this take m = √(w/2).
14.5.10: If w ≥ πr/8 ≈ 0.4r then one should use the rho algorithm. If w is significantly smaller than 0.4r then one would use the kangaroo method. Determining the exact crossover would require careful experimentation.
14.5.12: See Galbraith, Pollard and Ruprai [225].
14.6.3: The heuristic cost is w/(2m) + 4m/N_P², which is minimised by m = N_P √(w/8).
14.7.6: We always have #(T ∪ W) = r where r = #G. The heuristic complexity is therefore (1 + o(1))√(πr) = (1.77 + o(1))√r group operations, which is slower than Pollard rho.
14.7.7: See Galbraith and Ruprai [226].
14.8.1: See van Oorschot and Wiener [470].
14.9.5: x2 + 1 is fast to compute and does not have any trivial fixed points or short
cycles. The other suggestions have fixed points independent of p | N .
14.9.6: The idea is that f (x) is now m-to-1, so the expected length of the rho is
shorter (this is the same as the reason why taking nS > 3 is a good idea; see the end of
Section 14.2.5). For more discussion of this specific case see Brent and Pollard [99].
14.9.7: One cannot see a match modulo p without computing a gcd.
Chapter 15: Factoring and Discrete Logarithms in Subexponential Time
15.1.4: We have the error term like exp(u(log(u) + c log(log(u)))) for a suitable function c ≈ 1. This splits as u^u log(u)^{cu}. If the second term is u^{f(u)} for some function f then taking logs gives f(u) log(u) = cu log(log(u)) and so f(u) = cu log(log(u))/log(u) = o(u).
15.1.7: The first 4 properties are straightforward to verify. Property 5 follows since log(log(N)^m) = m log(log(N)) < c log(N)^a log(log(N))^{1-a} = log(L_N(a, c)) for sufficiently large N. Property 6 follows from property 5 and property 2. To prove property 7 let f(N) = m(log(log(N))/log(N))^a and note that L_N(a, f(N)) = exp(m log(log(N))) = log(N)^m. It is easy to check that lim_{N→∞} f(N) = 0 and so f(N) = o(1). Properties 8 and 9 are easily checked.
15.1.9: Let B = L_N(1/2, c) and

u = log(N)/log(B) = log(N)/(c√(log(N) log(log(N)))) = (1/c)√(log(N)/log(log(N))).
2^2 · 3^3 · 5 · 7 (mod N)
2^3 · 3^2 · 5 · 7 (mod N)
2^7 · 3^3 (mod N)
2^5 · 3 · 5^2 (mod N)

The matrix of exponents is

( 2  3  1  1 )
( 3  2  1  1 )
( 7  3  0  0 )
( 5  1  2  0 ).
Adding the first three rows modulo 2 gives the all zero vector (again, we are lucky that 5 relations are not required). Let X = 1717 · 2365 · 1757 (mod N) and Y = 2^6 · 3^4 · 5 · 7 (mod N). Then gcd(X − Y, N) = 73. One has N = 53 · 73.
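The arithmetic of this example can be checked directly; the sketch below assumes the values 1717, 2365 and 1757 correspond to the first three relations in the order listed above:

```python
from math import gcd

N = 3869                                 # = 53 * 73
relations = [(1717, (2, 3, 1, 1)),       # x^2 = 2^2 3^3 5 7 (mod N)
             (2365, (3, 2, 1, 1)),       # x^2 = 2^3 3^2 5 7 (mod N)
             (1757, (7, 3, 0, 0))]       # x^2 = 2^7 3^3     (mod N)

X, exps = 1, [0, 0, 0, 0]
for x, e in relations:
    # Verify the relation, then accumulate the product.
    assert x * x % N == (2**e[0] * 3**e[1] * 5**e[2] * 7**e[3]) % N
    X = X * x % N
    exps = [a + b for a, b in zip(exps, e)]

assert all(e % 2 == 0 for e in exps)     # exponent sum (12, 8, 2, 2)
Y = (2**6 * 3**4 * 5 * 7) % N            # square root via half exponents
print(gcd(X - Y, N))                     # 73
```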
15.2.5: See Exercise 16.5 of Shoup [552].
15.2.8: First note that (#B)² = L_N(1/2, 1 + o(1)) as required. For the second part, write u = log(N^{1/2+o(1)})/log(B) and note that one expects T_B ≈ u^u. Now,

u = (1/2 + o(1)) log(N)/((1/2)√(log(N) log(log(N)))) = (1 + o(1))√(ln(N)/ln(ln(N))).

Hence,

log(u^u) = u log(u) = (1/2 + o(1))√(log(N) log log(N)).

Hence #B·T_B = O(L_N(1/2, 1 + o(1))) = L_N(1/2, 1 + o(1)). See Section 6.1 of [161] or Section 16.4.2 of [552] for further details.
… trials before getting a B-smooth value for w_1 is (u')^{u'} where u' = log(√p)/log(B) = (1/2) log(p)/log(B). However, since we need both w_1 and w_2 to be smooth the number of …
For the approximation just note that p^b/b + p^{b-1}/(b − 1) + ⋯ ≈ (p^b/b)(1 + 1/p + 1/p² + ⋯) = p^b/(b(1 − 1/p)).
15.5.16: Following the discussion earlier in this section, let b = c√(n log(n)/log(p)) and u = n/b = (1/c)√(n log(p)/log(n)). Then #B < p^b and so #B = O(L_{p^n}(1/2, c)). Now

log(u^u) = u log(u) = (1/c)√(n log(p)/log(n)) (log(1/c) + (1/2)(log(n) + log(log(p)) − log(log(n)))) = (1/(2c) + o(1))√(n log(n) log(p)).

Hence u^u = O(L_{p^n}(1/2, 1/(2c) + o(1))) as usual. The total running time is therefore O(L_{p^n}(1/2, c + 1/(2c) + o(1)) + L_{p^n}(1/2, 2c + o(1))) bit operations. This is minimised when 2c² = 1, i.e., c = 1/√2, which gives the stated complexity. We refer to Section 4 of Odlyzko [466] for more details, but note that Odlyzko writes the complexity as O(exp(c√(n log(n)))) = O(L_{p^{n/log(n)}}(1/2, c)).
15.5.18: Note that F_{2^d} = F_2[x]/(A(x)) where d = deg(A(x)). Let θ ∈ F_{2^d} be a root of A(x). If the equation x² + x = θ has no solutions then A(x² + x) is irreducible. If the equation x² + x = θ has two solutions (namely, β and β + 1) in F_{2^d} then A(x² + x) is the product of their minimal polynomials.
15.5.22: For example F1 (t) = t4 + t2 + t + 1 and F2 (t) = t4 + t2 .
15.5.23: Note that σ_1(ψ_1(x)) = t, σ_1(ψ_1(y)) = σ_1(F_1(x)) = F_1(t), σ_2(ψ_2(x)) = σ_2(F_1(y)) = F_2(F_1(t)) ≡ t (mod F(t)) and σ_2(ψ_2(y)) = σ_2(y) = F_1(t).
15.6.2: Set b = log_q(L_{q^g}(1/2, c)). Construct the factor base B as above, generate random group elements using the algorithm of Section 15.5.1 and use Theorem 15.6.1 to determine the expected number of trials to get a smooth relation. The group operations and polynomial factorisation are all polynomial-time and can be ignored. The algorithm therefore has the usual complexity #B·L_{q^g}(1/2, 1/(2c) + o(1)) + (#B)^{2+o(1)}, which is L_{q^g}(1/2, c + 1/(2c) + o(1)) + L_{q^g}(1/2, 2c + o(1)) when g is sufficiently large. This is optimised by taking c = 1/√2, giving the stated running time. For the technical details see Enge and Gaudry [194] or Section VII.6.1 of [65].
15.6.3: See page 36 of Adleman, DeMarrais and Huang [4] for the case of one point at
infinity.
15.6.4: Clearly degree 1 prime divisors are points and #C(F_q) = q + 1 + t where |t| ≤ …
… P_{n-1} + P_n + (R)) = O_E. The symmetry is obvious, and the statement about degrees follows by induction and using the fact that the resultant of a quadratic and a degree 2^{n-2} polynomial is a degree 2^{n-1} polynomial.
Chapter 16: Lattices
16.1.4: A hint for the last part is that the lattice contains the sublattice with basis {M e_i}.
16.1.13: b_1 = (1, 1) has volume √2.
16.2.5: (You need to have studied the proof of Theorem 16.2.3.) A radius r = 1 + ε hypercube around the origin has volume (2r)^n. Taking ε > 0 arbitrarily small gives the result. For the second claim, consider the convex region {v ∈ R^n : ‖v‖_1 ≤ r}, which is a hyper-tetrahedron of volume 2^n r^n/n!. The approximation is using Stirling's formula.
16.2.6: Consider the lattice basis {(1, a), (0, b)} of rank n = 2 and determinant b. Every element of the lattice is of the form (s, as + bt) for some s, t ∈ Z. By Minkowski, the disk of radius u = √(2b) has volume πu² > 4b and so contains a non-zero lattice point. Hence, 0 < s² + r² < u² = 2b.
16.3.1: Problem 1 is achieved by solving xB = v over Q (or with sufficiently accurate
floating point arithmetic, rounding to the nearest integer solution and then checking).
To solve problem 2, compute the HNF of the matrix B whose rows are b1 , . . . , bn and
discard the zero rows. For problem 3, let A' = UA be the HNF of A. If the first r rows of A' are zero then the first r rows of U are a basis for ker(A) (since if x is any vector with last m − r entries zero then 0 = xA' = (xU)A).
To solve problem 4, concatenate the n rows of the n × n matrix M·I_n to the matrix A to obtain an extended matrix A''. Then use the method used to solve problem 3 on the matrix A''. To see this is correct note that (x_1, ..., x_m)A ≡ 0 (mod M) if and only if there are integers x_{m+1}, ..., x_{m+n} such that (x_1, ..., x_{m+n})A'' = 0.
Chapter 17: Lattice Basis Reduction
17.1.6: Clearly each change of basis is invertible and so the output is a basis of the lattice. Since B_1 ⋯ B_n is strictly decreasing the algorithm must terminate after a finite number of steps.
17.1.8: {(1, 0), (0, 2)}.
17.2.5: Yes, no, yes, no, yes.
17.2.7: An example is b1 = (1, 0) and b2 = (0.49, 0.8).
17.2.10: The proof closely follows the proof of Lemma 17.2.8. For part 2 one should find ‖b_i‖² ≤ (1 + (1/4)Σ_{k=1}^{i-1} 2^{k/2})B_i ≤ (0.146 + 2^{i/2}/1.66)B_i and one gets the result. Since 1/6 ≤ (√2 − 1)2^{(i-1)/2} for i ≥ 1 one has ‖b_j‖² ≤ 2^{j/2} B_j and part 3 follows.
17.2.15: An example of the situation ‖v_1‖ ≠ ‖b_1‖ was seen in Exercise 17.2.5. For the second part, we have 2^{n(n-1)/4} det(L) ≥ Π_{j=1}^n ‖v_j‖ ≥ ‖v_i‖^{n+1-i}.
17.4.4: See [370] or Section 2.6.1 of [135].
17.5.2: Let L be any lattice of dimension n. We have proved that an LLL-reduced basis for L exists. Hence, by part 5 of Theorem 17.2.12, λ_1 ≤ 2^{(n-1)/4} det(L)^{1/n}. Also see Section 12.2 of Cassels [121].
Chapter 18: Algorithms for the Closest and Shortest Vector Problem:
18.1.9: The complexity of computing the Gram-Schmidt basis is given by Theorem 17.3.4. Using the same techniques, one can show that w_i can be represented using exact Q-arithmetic with denominators bounded by X^{n-1} and numerators bounded by X^n.
18.1.10: The inductive process uses the fact that the orthogonal complement of b_n is the span of {b_1, ..., b_{n-1}}. If one starts at b_1 instead of b_n then one needs an analogue of Lemma 18.1.1.
18.2.5: If w = Σ_{i=1}^n l_i b_i^* and u = Σ_{i=1}^n l_i' b_i^* then ‖w − u‖² = Σ_{i=1}^n (l_i − l_i')² ‖b_i^*‖². The result follows.
18.2.7: w = 24.09 b_1 + 12.06 b_2 + 26.89 b_3, so v = (99, 204, 306) and ‖v − w‖ = ‖(1, 1, 1)‖ = √3.
18.3.4: (100, 77, 104).
18.4.5: See Figure 1 of Hanrot and Stehlé [274] or Algorithm 10.6 of Joux [314].
18.4.7: See Section 3 (under Combinatorial methods) of Micciancio and Regev [420].
( N     0        0        0       0   )
( 0     NX       0        0       0   )
( 0     0        NX^2     0       0   )
( p^3   3p^2X    3pX^2    X^3     0   )
( 0     p^3X     3p^2X^2  3pX^3   X^4 )
… c(log(n) + log log(p_n)); whereas if e > log(n) then the logarithm of the running time of the algorithm is log((n/e)^e) > log(n)² − log(n) log log(n).
19.4.16: p_n ≈ n log(n) so P > n! > (n/e)^n (warning: the e here is 2.71828... from Stirling's formula) and so log(P) > n(log(n) − 1). Then

√(log(X)/log(P)) · log(p_n) ≈ √(log(X)/(n log(n))) · log(n) = √(log(X) log(n)/n).

The claimed value for e follows. This is obviously < log(n) for large n and sufficiently small X.
19.4.17: Let m = ⌊log_2(U)⌋. Then 2^m ≤ U < 2^{m+1} and 2^{m+1} ≤ 2U ≤ V.
19.4.19: For the first case we get x = 347641 and for the second x = 512.
19.5.2: Given q let p_i = ⌊qθ_i⌉ so that |qθ_i − p_i| ≤ 1/2.
19.5.4: The first row output by LLL on the matrix is approximately (0.0000225, 0.0002499, 0.0002499, 0.0007500, …). Now compute q = 0.0000225·Q/… = 2250. One finds p_1 = 3499, p_2 = 1735, p_3 = 750.
19.6.1: Note that (1/2)b^{1/3} < q_b < b^{1/3} and so |y| < (1/2)q_b as required. Also, note that the continued fraction method finds q_a/q_b as long as |(a/b) − (q_a/q_b)| < 1/(2q_b²). First note that b^{-1/3} < 1/q_b < 2b^{-1/3} so (1/2)b^{-2/3} < 1/(2q_b²). Now, the difference (a/b) − (q_a/q_b) is given by equation (19.6), whose numerator is bounded by (1/2)b^{1/3} · (1/4)b^{1/3}. Since 1/q_b < 2b^{-1/3} we have the difference bounded by (1/2)b^{1/3}/b. It follows that the difference is less than 1/(2q_b²) as required and so Euclid solves the problem.
19.6.2: If … < … then the target gcd is smaller than the errors; one therefore expects very many solutions. The first condition comes from the Diophantine approximation step: namely that the right hand side of equation (19.6) (which is 1/b^{1−…}) is at most 1/(2q_b²) ≈ 1/b^{2(1−…)}. The second condition comes from the requirement that |y| < (1/2)q_b.
19.6.3: The process is the same as the generalisation of Diophantine approximation in Section 19.5. The lattice is generated by (b, 0) and (a, …); it has determinant roughly b^{1+…} and contains a short vector (q_a b, q_b x − q_a y) of length roughly b^{1+…}. The result follows.
19.7.4: See [493] for the first reduction. The second is almost immediate, but needs
some care.
Chapter 19a: Cryptosystems Based on Lattices
19.9.1: Given a ciphertext c one can check if it is an encryption of m by testing whether c − mG (respectively, c − mB) is a valid error vector for the system.
19.9.2: Check if c is an encryption of m by testing if c − m lies in the code (respectively, lattice) corresponding to the public key.
19.9.3: Given c add m'G (respectively, m'B) to get a ciphertext c' ≠ c which is an encryption of m + m'. Alternatively, add an extremely small error vector e to get c' and call the decryption oracle; hopefully this will return the message m.
19.9.8: (2, 3).
19.9.9: Set w = v and, for i = n downto 1, do w = w − ⌊⟨w, b_i⟩/⟨b_i, b_i⟩⌉ b_i.
19.10.3: m = (1, 2, 0), e = (1, 1, 1); m = (1, 0, 1), e = (1, 1, 1).
19.13.2: There is a solution s = Σ_{i=1}^n x_i b_i if and only if s' = s − x_n b_n is a subset sum of {b_1, ..., b_{n-1}}.
19.13.3: Assume n even. Compute all 2^{n/2} integers Σ_{i=1}^{n/2} x_i b_i for x_i ∈ {0, 1} and store them in a sorted list, binary tree or hash table. Then compute all 2^{n/2} values s − Σ_{i=n/2+1}^{n} x_i b_i and determine whether or not each appears in the list. For details see Algorithm 3.94 of [415].
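A sketch of this meet-in-the-middle method (illustrative code, not Algorithm 3.94 of [415] itself), using a dictionary as the hash table:

```python
def subset_sum_mitm(b, s):
    """Find x in {0,1}^n with sum x_i * b_i = s in O(2^(n/2)) time
    and space, by splitting b into two halves."""
    n = len(b)
    half = n // 2
    # Store all subset sums of the first half in a hash table.
    left = {}
    for mask in range(1 << half):
        t = sum(b[i] for i in range(half) if mask >> i & 1)
        left.setdefault(t, mask)
    # For each subset of the second half, look up the complement.
    for mask in range(1 << (n - half)):
        t = s - sum(b[half + i] for i in range(n - half) if mask >> i & 1)
        if t in left:
            lo = left[t]
            return ([lo >> i & 1 for i in range(half)] +
                    [mask >> i & 1 for i in range(n - half)])
    return None
```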
19.13.4: Just divide everything by the gcd.
19.13.7: Given s = 112 we just subtract the largest possible element, in this case 80, which leaves 112 − 80 = 32. We then subtract the largest element less than 32 to get 32 − 20 = 12. We then take 12 − 7 = 5. Hence, we have computed that 112 = 5 + 7 + 20 + 80, which is the solution vector (0, 1, 1, 1, 0, 1, 0).
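The greedy algorithm for superincreasing sequences can be sketched as follows; the sequence b below is hypothetical (the exercise's actual values are not reproduced here) but is chosen so that s = 112 reproduces the arithmetic above:

```python
def solve_superincreasing(b, s):
    """Solve a subset sum over a superincreasing sequence b by scanning
    from the largest element down and taking each element that fits."""
    x = [0] * len(b)
    for i in range(len(b) - 1, -1, -1):
        if b[i] <= s:
            x[i] = 1
            s -= b[i]
    return x if s == 0 else None

b = [2, 5, 7, 20, 41, 80, 160]           # hypothetical superincreasing set
print(solve_superincreasing(b, 112))     # [0, 1, 1, 1, 0, 1, 0]
```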
19.13.10: n/(n 1) = 1 + 1/(n 1).
19.13.11: 8/ log2 (430) 0.9145.
19.13.12: By Exercise 19.13.8, b_n ≥ 2^{n-1} and so the density is at most n/(n − 1) = 1 + 1/(n − 1).
19.13.15: 0110010.
19.13.16: 0.782
19.13.17: (154, 184, 43, 69, 125, 62), c = 384.
19.13.18: Encryption is deterministic.
19.13.19: Given a ciphertext c one can call c + a1 and c a1 to the decryption oracle.
One of them is a valid ciphertext and corresponds to the original message with the first
bit flipped.
19.13.21: Since 0 < a_i < M we expect an average ciphertext to be a sum of around n/2 integers of size M/2. Hence c ≈ nM/4 on average (and, in all cases, 0 ≤ c < nM). Since M ≥ 2^n we expect c to require at least log_2(nM/4) > log_2(n) + n − 2 bits.
19.13.22: Take b_i = 2^{i-1} for 1 ≤ i ≤ n, M = 2^n + 1 and any W such that W b_i ≢ 2^n (mod M) for all i. Then the density is > 1. Of course, it is easy to compute the private key corresponding to such a public key.
19.13.24: Since the integers a_{i,j} are like random integers modulo M_i we expect Σ_{j=1}^n a_{i,j} ≈ nM_i/2 and so M_{i+1} > nM_i/2. Hence, M_t > (n/2)^t M_1 ≥ (n/2)^t 2^n. The ciphertext is expected to be a sum of around n/2 integers of size M_t/2 and so around nM_t/4. Therefore, on average,

log_2(c) = log_2(n(n/2)^t M_1/4) > log_2(n) + t(log_2(n) − 1) + n − 2.

Since the a_{t,i} are somewhat like randomly chosen integers modulo M_t we expect to have max{a_{t,i}} ≈ M_t. We conservatively assume in what follows that, on average, max{a_{t,i}} > M_t/2. Hence the density is at most n/log_2(M_t/2) < n/(t(log_2(n) − 1) + n − 1). For example, n = 200 and t = 5 gives expected density less than 0.87; in any case, the density of an iterated Merkle-Hellman knapsack is always significantly less than 1.
19.13.25: From Exercise 19.13.8 we have b_n > 2^{n-2} b_1 and so M > (2^{n-2} + ⋯ + 2 + 1 + 1)b_1 = 2^{n-1} b_1. So b_1 b_2 > M > 2^{n-1} b_1 implies b_2 > 2^{n-1}. Exercise 19.13.8 also shows M > 2^{n-3} b_2 and so M > 2^{2n-4}. Then log_2(nM/4) > log_2(n) + (2n − 4) − 2.
19.13.26: See if W^{-1} a_i (mod M) are small and allow efficient solution of the subset sum problem using a greedy algorithm.
19.13.28: a_1 b_2 − a_2 b_1 = 7 · 233 · 37589 · 143197. The only factor of size ≈ a_3 is M = 233 · 37589 = 8758237. This gives W = a_1 b_1^{-1} (mod M) = 5236910. One verifies that W^{-1} a_i (mod M) is a superincreasing sequence.
19.13.30: Suppose W is known and assume no permutation is used. Note that there are integers k_i for 1 ≤ i ≤ n such that

a_i = b_i W − k_i M

and k_i < b_i. So a_1 ≡ −k_1 M (mod W) and a_2 ≡ −k_2 M (mod W). Hence, writing c = a_2 a_1^{-1} (mod W), we have k_1 c ≡ k_2 (mod W). If k_1 k_2 < M (which is plausible since k_1 k_2 < b_1 b_2 < M and W is usually about the same size as M) one can apply the same methods as used in Example 19.13.29 to find (k_1, k_2) and hence M.
19.13.32: 11100001.
19.13.33: 10111100.
Chapter 20: The Diffie-Hellman Problem and Cryptographic Applications
20.2.7: Suppose l | n is prime and write g_1 = g^{n/l}. Then g^c = g^{ab} implies g^{cn/l} = g^{abn/l} and so (g_1, g_1^a, g_1^b, g_1^c) = (g^{n/l}, (g^a)^{n/l}, (g^b)^{n/l}, (g^c)^{n/l}) is a valid Diffie-Hellman tuple. If l = O(log(n)) then one can solve the DLP in ⟨g_1⟩ (this is just Pohlig-Hellman) and hence test DDH in ⟨g_1⟩. If (g, g^a, g^b, g^c) is a random tuple in G^4 then with probability 1 − 1/l the resulting tuple in ⟨g_1⟩ is not a valid Diffie-Hellman tuple. The algorithm therefore has a noticeable advantage in Definition 20.2.4.
20.4.1: In the first case, c_2 = mh^k and c_2' = mh^{Ak+B}. Hence, c_2^A h^B/c_2' = m^{A-1} and so one can compute m assuming that (A − 1) is coprime to the order of the group G. In the second case, query the decryption oracle on (c_1, c_2) to get m and hence h^k. One can then compute (h^k)^A h^B and decrypt c_2'.
20.4.3: As above we can self-correct the CDH oracle to make it reliable. Once one knows g^{ax} then one can decrypt to get M. For the second part: the problem is that given the message M corresponding to the public key (g, h) and ciphertext (c_1, c_2) one can compute M ⊕ c_2 = H(g^{ax}), but if H is hard to invert then one cannot compute g^{ax}.
20.4.6: Given (c_1, c_2) the user returns m = c_2 c_1^{-a}, so set c_1 = h_1 and c_2 arbitrary and get h_1^a = m^{-1} c_2. If the group contains an element h_1 with relatively small order l then one can easily solve the DLP to get a (mod l). If this can be repeated for sufficiently many coprime values l then a can be determined using the Chinese remainder theorem.
20.4.9: See Boneh, Joux and Nguyen [82] for the full details of this attack. The basic idea is as follows: for a suitable constant c one has m = m_1 m_2 where 1 ≤ m_i ≤ c·2^{m/2+…} with a certain noticeable probability (for c = 1, [82] states the probability is at least log(1 + √2)).
20.5.2: Eve, pretending to be Bob, sends g y (where y is known to Eve). She makes
a corrupt query to Alice and, knowing g xy , can compute g ab . Eve can now compute a
shared key with Alice, whereas Alice believes she is sharing with Bob.
Chapter 21: The Diffie-Hellman Problem
21.1.10: For both problems let the space of instances be (G \ {1})². Given an Inverse-DH oracle A and instance (g, g^a) one has g^{a^{-1}} = A(g^x, (g^a)^{xy})^{yx^{-1}} for 1 ≤ x, y < r. Given a Square-DH oracle A and instance (g, g^a) one has g^{a²} = A(g^x, (g^a)^{xy})^{(xy²)^{-1}}.
For the self-correction: run the non-perfect Square-DH oracle repeatedly to produce a list L which is expected to contain g^{a²}. Then choose 0 ≤ u < r and repeat the process on (g, g^a g^u). This gives a list L' which is expected to contain g^{(a+u)²}. Finally, determine whether there is a unique pair (Z, Z') in L × L' such that Z(g^a)^{2u} g^{u²} = Z', and if so return Z. The precise details are similar to Theorem 21.3.8.
21.3.9: Suppose A is correct with noticeable probability ε. Since the reduction makes at least log_2(r) oracle queries, the probability that the result is correct is at most ε^{log_2(r)}, which is negligible. Instead, one should self-correct the oracle A to obtain an oracle A' with success probability greater than 1 − 1/(4 log_2(r)). By Theorem 21.3.8 this requires O(log(log(r))/ε) oracle queries. One can then perform the reduction of Lemma 21.1.13 using A', with success probability ≥ 1/2.
21.1.19: The problem Inverse-DH is only defined when a is coprime to the group order. If a = r_1, as would be required in several places, then we cannot make a meaningful query to an Inverse-DH oracle. Also, in the proof of Lemma 21.1.5 then g ∉ ⟨g_1⟩ so a CDH oracle can't work. Indeed, Shoup has shown (Theorem 5 of [549]) that a generic algorithm for Fixed-CDH with respect to g^{r_1}, even when given a perfect Fixed-CDH oracle with …
21.1.20: For the proof, historical references and a major generalisation of such problems see [102].
21.4.12: For example, with a perfect Fixed-CDH oracle, Pohlig-Hellman only and using projective coordinates, the number of oracle queries is O(log(r) log log(r)) and the number of group operations is O((l_1² + l_2) log(r)²/log(max{l_1, l_2})).
21.4.13: We give the results only for the case of Pohlig-Hellman combined with exhaustive search. Note that not every a ∈ F_r corresponds to an element a + bθ of G_{2,r}, whereas every a ∈ A¹(F_r) corresponds to an element of T_2(F_r). Hence, embedding the
DLP instance into the algebraic group G2,r requires an expected O(log(r)) oracle queries
(computing Legendre symbols and taking square roots).
The group operation using the first representation requires no inversions but the group
operation for the second representation requires inversions. The number of Fixed-CDH
oracle queries respectively is O(log(r) log log(r)) and O(log(r)² log log(r)). Hence, the
first representation is better. If one has a CDH oracle then inversions are a single oracle
query and so the number of oracle queries is O(log(r) log log(r)) in both cases.
21.5.5: See Cheon [130].
21.5.9: See Brown and Gallant [111].
21.6.4: See Kaliski [324] or Fischlin and Schnorr [202].
21.6.9: Let A be a perfect oracle which, given h = g^x for x ∈ {0, 1, ..., r − 1}, outputs b(x). It suffices to show how to use A to determine the least significant bit of x. We may assume that h ≠ g^{-1}, since if h = g^{-1} then we know x. If A(h) = 1 then (x_1, x_0) = (0, 1) or (1, 0), so A(gh) determines the LSB. Similarly, if A(h) = 0 then (x_1, x_0) = (0, 0) or (1, 1) and so A(gh) determines the LSB (since h ≠ g^{-1} there is no carry when computing gh).
21.6.11: The integer r in binary begins as 11000 so the highest order 2 bits of a
random integer modulo r are essentially 00, 01 or 10 with probability close to 1/3 each.
Hence, both predicates are 0 with probability close to 2/3.
21.6.13: Let A be an oracle such that A(g^x) = b(x). Let h = g^a. Set j = 1 and if A(h) = 0 then set l = 1, and if A(h) = 1 then set l = 2. Given, at step j, (l − 1)r/2^j ≤ a < lr/2^j, one calls A(h^{2^j}). If A(h^{2^j}) = 0 then (2l − 2)r/2^{j+1} ≤ a < (2l − 1)r/2^{j+1}; otherwise, (2l − 1)r/2^{j+1} ≤ a < 2lr/2^{j+1}.
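This bisection can be simulated in Python; the oracle below is a stand-in that evaluates the predicate b(x) directly from a known exponent (which a real attacker cannot do), and the recovery routine uses only the oracle's output bits:

```python
def msb_recover(r, A):
    """Recover a in [0, r) given an oracle A with A(j) = 0 iff
    (2^j * a) mod r < r/2, i.e. the predicate applied to h^(2^j).
    If (l-1)r/2^j <= a < l*r/2^j then (2^j * a) mod r = 2^j*a - (l-1)r,
    so each query reveals which half of the current interval holds a."""
    l, j = (1, 1) if A(0) == 0 else (2, 1)
    while 2**j < r:                       # interval width r/2^j still > 1
        l = 2 * l - 1 if A(j) == 0 else 2 * l
        j += 1
    return ((l - 1) * r + 2**j - 1) // 2**j   # ceil((l-1)*r / 2^j)

def make_oracle(r, a):
    """Simulated oracle for a secret exponent a."""
    return lambda j: 0 if 2 * ((pow(2, j, r) * a) % r) < r else 1

print(msb_recover(1009, make_oracle(1009, 777)))   # 777
```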
We remark that Blum and Micali [73] (generalised by Long and Wigderson [390]) use
Legendre symbols and square roots modulo p to show this predicate is hardcore in the
group Fp when g is a primitive element (their method does not work in a prime order
cyclic subgroup).
21.6.15: See Section 7 of Li, Näslund and Shparlinski [384].
21.7.4: This is the same argument as Exercise 21.6.13. One bounds α as (l − 1)p/2^j ≤ α < lp/2^j and refines (l, j) by computing A_1(2^j α (mod p)) = MSB_1(2^j α (mod p)).
21.7.6: Use the same argument as Exercise 21.6.4. See Fischlin and Schnorr [202] for
details.
21.7.13: Given a DDH instance (g, g^a, g^b, g^c) one can compute MSB_{1+ε}(g^c) and compare with the result of the oracle. If g^c = g^{ab} then the results will agree. Repeating for random self-reduced versions of the original DDH instance gives the result. We refer to Blake and Garefelakis [62] and Blake, Garefelakis and Shparlinski [63] for details and generalisation to small subgroups of F_p^* and to elliptic curve groups.
21.7.14: Just call A(g, (g^a)^{2^i}, g^b) to get the i-th bit of the representation of g^{ab}.
21.7.15: Call A(g, g a g z , g b ) about m times for uniformly random z. Each output
yields a linear equation over F2 in the unknown bits of g ab . Solving the system of linear
equations over F2 gives g ab .
Chapter 22: Digital Signatures Based on Discrete Logarithms
22.1.2: The second and fourth are valid transcripts (i.e., equation (22.1) is satisfied). In the third case one even has s_0 ∉ ⟨g⟩.
22.1.3: As_2 + B − s_2' ≡ a(As_1 − s_1') (mod r), which can be solved for a.
22.1.6: If s_1 can be guessed then choose any 0 ≤ s_2 < r and set s_0 = g^{s_2} h^{-s_1}. Send s_0 in the first stage of the protocol and respond to the challenge s_1 with s_2.
22.1.8: We need a multi-exponentiation (in equation (22.1)) and this is not well-defined
in algebraic group quotients.
22.1.11: If m is a message and (s_1, s_2) is a signature which is valid for two distinct public keys h_A and h_B then we have

s_1 = H(m ∥ g^{s_2} h_A^{-s_1}) = H(m ∥ g^{s_2} h_B^{-s_1}).

If s_1 = 0 then the signature is valid for all public keys. If s_1 ≠ 0 then g^{s_2} h_A^{-s_1} ≠ g^{s_2} h_B^{-s_1} and so we have a hash collision of a very special form: namely H(m ∥ R_1) = H(m ∥ R_2) where R_1 and R_2 are distinct elements of ⟨g⟩. If the bit-length of the hash output is l such that 2^l < r then, by the pigeonhole principle, there must be distinct R_1, R_2 ∈ ⟨g⟩ such that H(m ∥ R_1) = H(m ∥ R_2). Indeed, one expects many such pairs if 2^l is significantly smaller than r.
Even when 2^l is significantly larger than r then, by the birthday paradox, one expects there to be a collision of the form H(m ∥ R_1) = H(m ∥ R_2). However, if 2^l is larger than r² then the probability of such a collision is rather low.
As for security, the existence of two keys for which the same signature is valid on the same message is not considered to lead to any practical attack in any real-world application. In any case, it appears to be impossible to construct the actual signatures, given the hash collision H(m ∥ R_1) = H(m ∥ R_2), without solving at least two instances of the discrete logarithm problem.
22.1.12: Change s_2 to k − as_1 (mod r). All the security arguments are unchanged by this.
22.2.2: Write the verification equation as …
Given the public key h and a message with hash H(m) the adversary chooses random 1 ≤ s_1', s_2 < r and computes s̃_1 = (g^{H(m)} h^{-s_1'})^{s_2^{-1}} (mod p). Now, compute an integer s_1 using the Chinese remainder theorem so that

s_1 ≡ s̃_1 (mod p) and s_1 ≡ s_1' (mod r).

Note that, under our assumptions, F(s_1) = s_1' and that s_1^r ≡ 1 (mod p). Then h^{F(s_1)} s_1^{s_2} = g^{H(m)} (mod p) and the forgery is accepted.
22.2.12: First, precompute and store g_1 = g^{⌈√r⌉}. Then, for each signature (s_0, s_2) to be verified run Euclid's algorithm on inputs u_2 = F(s_0) s_2^{-1} (mod r) and r until the current remainder is approximately √r. Write v for the resulting coefficient of u_2 (in the notation of Section 2.3 this is v = s_i, which clashes horribly with the notation of this chapter). By part 6 of Lemma 2.3.3 it follows that |v|, |vu_2 (mod r)| ≤ √r. Also, write u_1 v ≡ w_0 + w_1 ⌈√r⌉ (mod r) with 0 ≤ w_0, w_1 < √r. Equation 22.6 can therefore be written as

g^{w_0} g_1^{w_1} h^{u_2 v} s_0^{-v} = 1.
22.2.18: For example, to check that g_2^{a²} is correct one can test whether e(g_1, g_2^{a²}) = e(g_1^a, g_2^a). The other elements are tested similarly. For the second part, e(g_1^{1/(m+a)}, g_2^a g_2^m) should equal z.
Chapter 23: Public Key Encryption Based on Discrete Logarithms
23.1.5: The proof proceeds almost identically to the previous case, except that when a
decryption query is made then only one call to the oracle is required. This variant is
studied in Section 10.4 of Cramer and Shoup [160].
23.1.7: Given a Hash-DH instance (g, g^a, g^b), if one can compute g^{ab} then one can compute kdf(g^{ab}). Given a DDH instance (g, g^a, g^b, g^c) one can compute K = kdf(g^c) and, using an oracle for Hash-DH, distinguish it from kdf(g^{ab}).
23.2.2: For the first statement note that for each 0 ≤ z_1 < r there is a unique choice for z_2. The second statement is straightforward. For the final statement write g_2 = g_1^w, h = g_1^v and u_2' = g_2^{k'} with 0 ≤ k' < r and k' ≠ k. The fact that (z_1, z_2) ∈ X_{g_1,g_2,h} imposes the linear equation z_1 + wz_2 ≡ v (mod r). To prove the result we need to show that one can simultaneously solve kz_1 + k'wz_2 ≡ x (mod r) for any 0 ≤ x < r. The result follows since the determinant of the matrix

( 1   w   )
( k   k'w )

is not zero modulo r.
23.2.7: The first has v ∉ G, the second does not satisfy equation (23.1), the third has message m = 1.
23.2.11: The adversary returns e·u_1^{-z_1} u_2^{-z_2} just as the Decrypt algorithm does.
23.2.12: Given a challenge ciphertext (u_1, u_2, e, v) compute u_1' = u_1 g_1, u_2' = u_2 g_2 and e' = eh (these are g_1^{k+1}, g_2^{k+1} and mh^{k+1} respectively). Then compute α' = H(u_1', u_2', e') and set v' = (u_1')^{x_1 + y_1 α'} (u_2')^{x_2 + y_2 α'}. Calling the decryption oracle on (u_1', u_2', e', v') gives m.
23.2.13: Let u_1 be a random element of F_p^* of order l. Set u_2 = u_1^a and v = u_1^b for random 1 ≤ a, b < l and choose any e ∈ F_p^*. Call the decryption oracle on (u_1, u_2, e, v). With probability 1/l the decryption oracle does not return ⊥, and indeed returns some message m. One therefore has

u_1^{-(z_1 + az_2)} = m e^{-1}.

Since it is easy to compute the discrete logarithm of m e^{-1} to the base u_1 when l is small, one obtains a linear equation in z_1 and z_2 modulo l. Repeating the attack and solving gives z_1 (mod l) and z_2 (mod l).
If p − 1 has distinct small prime factors l_1, ..., l_t so that Π_{i=1}^t l_i > r then, by repeating the above attack, one can determine the private key uniquely using the Chinese remainder theorem.
23.3.7: Just set c_2' = c_2 ⊕ s for some non-zero string s ∈ {0, 1}^l and query the decryption oracle on (c_1, c_2') to get m ⊕ s.
23.3.8: Given Q_id = H_1(id) set R = gQ_id. Invert the hash function to find an identity id' such that H_1(id') = R. Then request the private key for identity id' to receive R' = R^s = (gQ_id)^s. One can obtain Q_id^s as R'(g^s)^{-1}.
23.3.10: If one can solve CDH in G_T then compute z = e(Q, g), z_1 = e(Q, g^a) = z^a and z_2 = e(Q, g^b) = z^b. Then the solution to the CDH instance (z, z_1, z_2) is the required value. The case of CDH in G_2 is similar.
Chapter 24: The RSA and Rabin Cryptosystems
24.1.1: Repeatedly choose random primes p such that 2^{ℓ/2-1} < p < 2^{ℓ/2} and let q = ⌊2^ℓ/p⌋ + i (where i ∈ Z_{≥0} is small, possibly just always zero) until q is prime and the …
24.2.7: The advantage is that one speeds up the square roots modulo p, q and r since
the primes are smaller (see Example 24.1.4). The main disadvantage is that there are
now 8 square roots, in general, to choose from.
24.2.8: There are still only four square roots (two square roots modulo p, which
correspond via Hensel lifting to two square roots modulo pr , and similarly for q). Hence,
one can use exactly the same type of redundancy schemes as standard Rabin. One speeds
up computing square roots by using the Chinese remainder theorem and Hensel lifting as
in Example 24.1.6. Hence, there is a significant advantage of using Rabin in this way.
24.2.17: Choose random 1 < x < N, call the oracle on (x² (mod N), (x/N), 0) to get x'. If x' ≢ x (mod q) then one can split N. This situation occurs with probability 2/9. The idea of cubing is mentioned in Rabin [491].
24.2.23: Choose random x such that (x/N) = 1 and set y = A(x^4, x^2, x^2). Then gcd(x − y, N) splits N with probability 1/2. This argument is due to Shmuely [548]. For precise details see Biham, Boneh and Reingold [57].
24.2.24: Compute φ(N) = 2(N + 2) − M and then solve a quadratic equation to get p and q.
24.2.26: Use the same method as Exercise 24.2.23.
24.2.27: Since it is hard to choose random points in E(Z/N Z) one cannot apply the
method of Shmuely with the first oracle. For the second oracle one can choose P and
then fit the curve through it. Shmuely's method then applies.
24.3.1: Obviously, (m1 m2)^2 ≡ m1^2 m2^2 (mod N). But we need to ensure that decryption
of the product of the ciphertexts really does return the product of the messages. Since
there is no guarantee that m1 m2 (mod N ) has the correct bit pattern in the l least
significant bits, redundancy in the message is not suitable for homomorphic encryption.
The "extra bits" redundancy does not work either, since the least significant bit of m1 m2 (mod N) may not be the product (or any other easily computable function) of the
least significant bits of m1 and m2 .
Finally, the Williams redundancy scheme is also not compatible with homomorphic
encryption, since P (m1 )P (m2 ) (mod N ) is almost always not equal to P (m1 m2 ) (mod N ).
24.3.5: See Paillier [471].
24.3.7: One computes c^{p−1} (mod p^2) to get 1 + pq(p − 1)m ≡ 1 − p(qm) (mod p^2) and hence m (mod p). Similarly one computes m (mod q) and hence m (mod N). As with
standard RSA decryption using the CRT, we replace one modular exponentiation with
two modular exponentiations where the exponents and moduli are half the size. Hence,
the new method is about 3 times faster than the old one.
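A toy sketch of this CRT decryption for a Paillier-style scheme with g = 1 + N (the primes and randomness below are hypothetical):

```python
# Decrypt c = (1+N)^m * r^N (mod N^2) using c^(p-1) (mod p^2), as in 24.3.7.
p, q = 1009, 1013
N = p * q

def crt(ap, aq):
    return (ap + p * ((aq - ap) * pow(p, -1, q) % q)) % N

def encrypt(m, r):
    return pow(1 + N, m, N * N) * pow(r, N, N * N) % (N * N)

def decrypt_crt(c):
    kp = (pow(c, p - 1, p * p) - 1) // p   # = (p-1)*q*m = -q*m (mod p)
    mp = kp * pow(-q, -1, p) % p           # m (mod p)
    kq = (pow(c, q - 1, q * q) - 1) // q   # symmetric computation mod q^2
    mq = kq * pow(-p, -1, q) % q           # m (mod q)
    return crt(mp, mq)

m = 123456
assert decrypt_crt(encrypt(m, 5)) == m
```

The exponent p − 1 kills r^N modulo p^2 (since p(p − 1) divides N(p − 1)), which is why only the (1 + N)^m part survives.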
24.3.8: The security depends on the decisional problem: given y, is y ≡ h^x (mod N^2) for some integer 0 ≤ x < 2^k? This problem is a variant of composite residuosity, and also similar to the discrete logarithm problem in an interval; see Exercise 13.3.6 and Section 14.5 for algorithms to solve this problem in O(2^{k/2}) multiplications.
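A baby-step giant-step sketch for a discrete logarithm known to lie in [0, 2^k), which runs in about 2^{k/2} multiplications; the group and values are hypothetical toys:

```python
# BSGS for h = g^x (mod P) with 0 <= x < 2^k.
P = 10007                      # toy prime modulus
g = 5
k = 12
x = 3456                       # secret exponent, 0 <= x < 2^k
h = pow(g, x, P)

m = 1 << (k + 1) // 2          # giant-step size, about 2^(k/2)
baby = {}
t = 1
for j in range(m):             # baby steps: store g^j
    baby.setdefault(t, j)
    t = t * g % P

giant = pow(pow(g, m, P), -1, P)   # g^(-m)
t = h
found = None
for i in range(m + 1):         # giant steps: h * g^(-i*m)
    if t in baby:
        found = i * m + baby[t]
        break
    t = t * giant % P

assert found == x
```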
Suppose a user's public key consists of elements h_i = u_i^N (mod N^2) for 1 ≤ i ≤ l (where the u_i are derived from some 1 < u < N using exponents roughly N^{(i−1)/l}). To encrypt to the user one could compute c = (1 + N m) h_1^{a_1} · · · h_l^{a_l} (mod N^2) using a sliding window multi-exponentiation method, where 0 ≤ a_1, . . . , a_l < 2^k for some value k (one possibility would be 2^k ≈ N^{1/l}).
24.3.11: One encrypts 0 ≤ m < N^k as c = (1 + N)^m r^{N^k} (mod N^{k+1}).
24.4.11: Let m be the message to forge and try to find x, y ∈ N such that
(P + x) ≡ (P + m)(P + y) (mod N).
Then x and y satisfy x − y(P + m) ≡ P^2 − P + P m (mod N). In other words, we seek a small solution to x − Ay ≡ B (mod N) for fixed A, B and N. We have seen how to solve such a problem in Section 11.3.2 under the name Gallant-Lambert-Vanstone method.
24.4.12: Take P = A^{−1} B (mod N).
24.5.1: One checks a guess for d by testing whether y^d ≡ x (mod N) for some precomputed pair 1 < x < N and y = x^e (mod N). If one precomputes y^2 (mod N) then one can compute the next value y^{d+2} (mod N) using a single modular multiplication as y^d · y^2 (mod N). The total complexity is therefore O(d M(log(N))) bit operations.
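The incremental search can be sketched as follows (toy parameters, assumed hypothetical):

```python
# 24.5.1 as a loop: test odd guesses d = 1, 3, 5, ... for an RSA private
# exponent using one modular multiplication per step.
p, q, e = 1009, 1013, 5
N = p * q

x = 2
y = pow(x, e, N)
y2 = y * y % N                 # precomputed y^2
t = y                          # invariant: t = y^d for the current guess d
d = 1
while t != x:                  # does y^d = x (mod N)?
    t = t * y2 % N             # y^(d+2) from y^d: a single multiplication
    d += 2

assert pow(x, e * d, N) == x   # the found d inverts e on this pair
```

Note the loop finds the smallest odd d working for this particular pair (x, y), which suffices as a check of a guess.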
24.5.5: Write λ(N) = φ(N)/r where r = gcd(p − 1, q − 1) so that the equation ed = 1 + kλ(N) corresponds to the equation edr = r + kN − ku (with u ≈ √N). The Wiener method computes dr, as long as the condition dr · kr · u < N holds or, in other words, if d < N^{1/4}/(√3 r). One can determine r as gcd(dr, N − 1) and hence determine d.
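For the plain Wiener attack (the case r = 1), a continued-fraction sketch with hypothetical toy values:

```python
from math import isqrt

def convergents(n, d):
    # Yield the continued-fraction convergents (h, k) of n/d.
    h0, h1, k0, k1 = 0, 1, 1, 0
    while d:
        a, (n, d) = n // d, (d, n % d)
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield h1, k1

def wiener(e, N):
    # Recover a small private exponent d (roughly d < N^(1/4)/3) from (N, e).
    for k, dd in convergents(e, N):
        if k == 0 or (e * dd - 1) % k:
            continue
        phi = (e * dd - 1) // k
        s = N - phi + 1                  # = p + q if phi is correct
        disc = s * s - 4 * N
        if disc >= 0:
            t = isqrt(disc)
            if t * t == disc and (s + t) % 2 == 0:
                return dd
    return None

# toy example: primes and small d chosen here for illustration
p, q = 100003, 100019
N, phi = p * q, (p - 1) * (q - 1)
d = 101                                  # small decryption exponent
e = pow(d, -1, phi)                      # Python 3.8+ modular inverse
assert wiener(e, N) == d
```

The fraction k/d appears among the convergents of e/N whenever d is below the Wiener bound, which is what the loop exploits.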
24.5.6: One finds gcd(p − 1, q − 1) = 10 and d = 97.
24.5.7: If (e, k) satisfy ed = 1 + kφ(N) then so do (e′, k′) = (e + lφ(N), k + ld). Taking l large enough one can ensure that duk > N and so the attack does not apply.
24.5.8: Choose random 1 < x < N, set y = x^e (mod N) and, for each odd integer 1 < d_p in turn, compute gcd(x − y^{d_p} (mod N), N).
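A sketch of this search with hypothetical toy values; the gcd reveals a factor as soon as the candidate matches d_p modulo the order of x modulo p:

```python
from math import gcd

# Exhaustive search for a small CRT exponent d_p, splitting N via a gcd.
p, q = 1019, 4079              # toy primes
N = p * q
e = 17
x = 2
y = pow(x, e, N)

found = None
d = 1
while found is None:
    d += 2                     # odd candidates 3, 5, 7, ...
    g = gcd(x - pow(y, d, N), N)
    if 1 < g < N:
        found = g              # y^d = x (mod p) but not (mod q), or vice versa

assert found == p
```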
For the attack, suppose s1 and s2 are valid signatures for identity id on messages m1 and m2 such that gcd(H2(m1), H2(m2)) = 1. Let a, b ∈ Z be such that aH2(m1) + bH2(m2) = 1. Then
(s1^a s2^b)^e ≡ H1(id) (mod N).
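The extended-Euclid step in miniature, with toy numbers standing in for the hash values and the private key (all hypothetical):

```python
# From s1 = x^h1 and s2 = x^h2 with gcd(h1, h2) = 1, extended Euclid
# recovers x itself, since s1^a * s2^b = x^(a*h1 + b*h2) = x.
def egcd(a, b):
    if b == 0:
        return a, 1, 0
    g, u, v = egcd(b, a % b)
    return g, v, u - (a // b) * v

N = 3233                         # toy modulus (61 * 53)
x = 42                           # stand-in for the e-th root H1(id)^(1/e)
h1, h2 = 7, 15                   # stand-ins for H2(m1), H2(m2), coprime
s1, s2 = pow(x, h1, N), pow(x, h2, N)

g, a, b = egcd(h1, h2)
assert g == 1 and a * h1 + b * h2 == 1
recovered = pow(s1, a, N) * pow(s2, b, N) % N   # negative exponent: 3.8+
assert recovered == x
```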
25.1.1: Since λ(P) = O if and only if P = O it follows that ker(λ ∘ φ) = ker(φ). On the other hand, ker(φ ∘ λ) = λ^{−1}(ker(φ)). So if λ does not fix ker(φ) then the isogenies are not equivalent.
25.1.3: Just add composition with a power of Frobenius.
25.1.4: The existence of φ1 and φ2 follows from applying Theorem 9.6.18 to φ2 ∘ φ1 : E → E2.
25.1.9: The kernel of φ is the group G. Apply Vélu's formulae and write the function X as a rational function in x. Once the result for φ1(x) is proved, the result for φ2(x, y) follows from Theorem 9.7.5.
25.1.11: The calculation is essentially the same as a calculation in the proof of Theorem 9.7.5.
25.1.14: One performs d steps, each step being some arithmetic operations in F_{q^n}. The x-coordinates of points in G are all roots of a polynomial of degree (d − 1)/2 when d is odd, or roots of y F(x) where deg(F(x)) < d/2. The y-coordinates are defined over quadratic extensions.
It is not quite true that n < d in general. Indeed, F_{q^n} is a compositum of fields of degrees corresponding to the degrees of irreducible factors. Hence n can be bigger than d (e.g., d = 5 where the polynomial splits as quadratic times cubic). When d is prime then there are no such problems as the kernel subgroup is generated by a single x-coordinate and a quadratic extension for y.
25.1.17: Note that t(Q) = 2F_x(Q) − a1 F_y(Q) = 2(3x_Q^2 + 2a2 x_Q + a4 − a1 y_Q) + a1(2y_Q + a1 x_Q + a3), and re-arranging gives the result. Similarly, u(Q) = F_y(Q)^2 = (2y_Q + a1 x_Q + a3)^2 = 4(y_Q^2 + y_Q(a1 x_Q + a3)) + (a1 x_Q + a3)^2, and one can replace y_Q^2 + y_Q(a1 x_Q + a3) by x_Q^3 + a2 x_Q^2 + a4 x_Q + a6.
25.1.18: The first statement is using x_Q = x − (x − x_Q); the rest are equally easy. For example, the final statement follows from x_Q^3 = x^3 − 3x^2 x_Q + 3x x_Q^2 − (x − x_Q)^3 together with some of the earlier cases.
25.1.20: Combine Theorem 25.1.6 with Exercise 25.1.17. Then simplify the formulae
using Exercises 25.1.18 and 25.1.19. For details see pages 80-81 of [380].
25.2.2: One computes the powers of j(E) in F_q in O(ℓ M(log(q))) bit operations. In the polynomial Φ_ℓ(x, y) there are O(ℓ^2) terms to consider and the coefficients are of size O(ℓ log(ℓ)), and so reducing each coefficient to an element of F_q requires (the hardest case being when q is prime and large) O(ℓ log(ℓ) log(q)) or O(M(ℓ log(ℓ))) bit operations. We also need to multiply each coefficient by a suitable power of j(E). The total cost is therefore O(ℓ^2(ℓ log(ℓ) log(q) + M(log(q)))) bit operations. The resulting polynomial requires O(ℓ log(q)) bits. The claim about root finding is immediate from Exercise 2.12.5.
25.2.3: The first statement follows since j(E) = j(E′) and the F_q-rational ℓ-isogenies are given by the roots of Φ_ℓ(j(E), y). For the second statement, let λ : E → E′ be the isomorphism and consider the map End_{F_q}(E) → End_{F_q}(E′) given by φ ↦ λ ∘ φ ∘ λ^{−1}. One can check that this is a ring homomorphism and, since λ is an isomorphism, it is surjective.
more than q operations to find the chain. The advantage is that the isogeny itself can
be computed faster, but probably only by a constant factor.
Chapter 26: Pairings on Elliptic Curves
26.1.1: A function f such that div(f) = D1 is defined up to multiplication by an element of k*.
26.3.5: The fact that the divisors of the functions are correct is immediate. To show the functions are normalised at infinity it is necessary to show that the functions y and x are normalised at infinity. To see this note that t^{−3} = (y/x)^3 = y^3/x^3 = y(x^3 + a2 x^2 + a4 x + a6 − a1 xy − a3 y)/x^3 = y(1 + u) where u is zero at O_E. Hence, y is normalised at infinity. Similarly, t^{−2} = (y/x)^2 = (x^3 + a2 x^2 + a4 x + a6 − a1 xy − a3 y)/x^2 = x + u where u(O_E) = a2, and so x is normalised at infinity too. It follows that l(x, y) = y − λx + c and v(x, y) = x − c are normalised at infinity.
26.3.8: This follows since if div(f_{n,P}) = n(P) − n(O_E) then div(f_{n,P}^m) = mn(P) − mn(O_E). So take m = N/n.
26.3.10: Let Q1, Q2 ∈ E[r] be such that Q1 ≠ Q2. Suppose Q1 and Q2 were in the same class in E(F_{q^k})/rE(F_{q^k}). Then Q1 − Q2 = [r]R for some R ∈ E(F_{q^k}). It would follow that R has order r^2, but the conditions imply that no such group element exists.
26.3.12: If v(x) = (x − a) is a vertical line function over F_q then v(Q) = x_Q − a ∈ F_{q^{k/2}}. It follows that v(Q)^{(q^k − 1)/r} = 1.
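A toy numeric check of this "final exponentiation kills the half field" fact in the simplest case k = 2, where F_{q^{k/2}} = F_q (all values hypothetical):

```python
# For k = 2 the subfield is F_q itself, and r | q + 1 means the final
# exponent (q^2 - 1)/r is divisible by q - 1, so every element of F_q*
# is sent to 1.
q = 103            # toy prime; q + 1 = 104 = 8 * 13
r = 13             # pairing group order dividing q + 1
exp = (q * q - 1) // r

for a in range(1, q):              # every element of F_q*
    assert pow(a, exp, q) == 1     # since (q - 1) divides exp
```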
26.3.17: The first statement follows directly from the definition. To show part 2, first note that div(f_{s,x−s,Q}) = ([s]Q) − s(Q) + (s − 1)(O_E) = div(f_{s,Q}^{−1}). Now, s ≡ q^m ≡ T^m (mod r) and so, by Exercise 26.3.14, we have a power of the ate pairing. Part 3 follows from expanding div(f_{s,h(x),Q}) as a sum over the coefficients of h(x) = Σ_{i=0}^{d} h_i x^i and deducing the value a_{s,h(x)}(Q, P)^s (this is essentially the same argument as in equation (26.5)). The additive property follows from the fact that the divisor of a product is the sum of the divisors. The multiplicative property follows from the additive property and from part 3.
26.5.3: Given P, [a]P, [b]P choose a point Q such that e(P, Q) ≠ 1 and one can invert pairings with respect to that point Q. Compute z = e(P, Q)^{ab} as in Lemma 26.5.2, then call the pairing inversion oracle on (Q, z) to get [ab]P.
Bibliography
[1] M. Abdalla, M. Bellare, and P. Rogaway, DHIES: An encryption scheme based on
the Diffie-Hellman problem, Preprint, 2001.
[2] L. M. Adleman and J. DeMarrais, A subexponential algorithm for discrete logarithms over all finite fields, Math. Comp. 61 (1993), no. 203, 1–15.
[3] L. M. Adleman, K. L. Manders, and G. L. Miller, On taking roots in finite fields, Foundations of Computer Science (FOCS), IEEE, 1977, pp. 175–178.
[4] L. M. Adleman, J. DeMarrais, and M.-D. Huang, A subexponential algorithm for discrete logarithms over the rational subgroup of the Jacobians of large genus hyperelliptic curves over finite fields, ANTS I (L. M. Adleman and M.-D. Huang, eds.), LNCS, vol. 877, Springer, 1994, pp. 28–40.
[5] G. B. Agnew, R. C. Mullin, I. M. Onyszchuk, and S. A. Vanstone, An implementation for a fast public-key cryptosystem, J. Crypt. 3 (1991), no. 2, 63–79.
[6] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. 160 (2004), no. 2, 781–793.
[7] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, Closest point search in lattices, IEEE Trans. Inf. Theory 48 (2002), no. 8, 2201–2214.
[8] A. Akavia, Solving hidden number problem with one bit oracle and advice, CRYPTO 2009 (S. Halevi, ed.), LNCS, vol. 5677, Springer, 2009, pp. 337–354.
[9] W. Alexi, B. Chor, O. Goldreich, and C.-P. Schnorr, RSA and Rabin functions: Certain parts are as hard as the whole, SIAM J. Comput. 17 (1988), no. 2, 194–209.
[10] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. 139 (1994), no. 3, 703–722.
[11] A. Antipa, D. R. L. Brown, R. P. Gallant, R. J. Lambert, R. Struik, and S. A. Vanstone, Accelerated verification of ECDSA signatures, SAC 2005 (B. Preneel and S. E. Tavares, eds.), LNCS, vol. 3897, Springer, 2006, pp. 307–318.
[12] C. Arène, T. Lange, M. Naehrig, and C. Ritzenthaler, Faster computation of the Tate pairing, J. Number Theory 131 (2011), no. 5, 842–857.
[13] J. Arney and E. D. Bender, Random mappings with constraints on coalescence and number of origins, Pacific J. Math. 103 (1982), 269–294.
[14] E. Artin, Galois theory, 2nd ed., Notre Dame, 1959.
[22] E. Bach and J. Shallit, Algorithmic number theory, MIT Press, 1996.
[23] E. Bach and J. Sorenson, Sieve algorithms for perfect power testing, Algorithmica 9 (1993), 313–328.
[24] S. Bai and R. P. Brent, On the efficiency of Pollard's rho method for discrete logarithms, CATS 2008 (J. Harland and P. Manyem, eds.), Australian Computer Society, 2008, pp. 125–131.
[25] D. V. Bailey, L. Batina, D. J. Bernstein, P. Birkner, J. W. Bos, H.-C. Chen, C.-M. Cheng, G. van Damme, G. de Meulenaer, L. Julian Dominguez Perez, J. Fan, T. Güneysu, F. Gürkaynak, T. Kleinjung, T. Lange, N. Mentens, R. Niederhagen, C. Paar, F. Regazzoni, P. Schwabe, L. Uhsadel, A. Van Herrewege, and B.-Y. Yang, Breaking ECC2K-130, Cryptology ePrint Archive, Report 2009/541, 2009.
[26] R. Balasubramanian and N. Koblitz, The improbability that an elliptic curve has sub-exponential discrete log problem under the Menezes-Okamoto-Vanstone algorithm, J. Crypt. 11 (1998), no. 2, 141–145.
[27] W. D. Banks and I. E. Shparlinski, Sato-Tate, cyclicity, and divisibility statistics on average for elliptic curves of small height, Israel J. Math. 173 (2009), 253–277.
[31] A. Bauer, Vers une généralisation rigoureuse des méthodes de Coppersmith pour la recherche de petites racines de polynômes, Ph.D. thesis, Université de Versailles Saint-Quentin-en-Yvelines, 2008.
[32] M. Bellare, R. Canetti, and H. Krawczyk, A modular approach to the design and analysis of authentication and key exchange protocols, Symposium on the Theory of Computing (STOC), ACM, 1998, pp. 419–428.
[33] M. Bellare, J. A. Garay, and T. Rabin, Fast batch verification for modular exponentiation and digital signatures, EUROCRYPT 1998 (K. Nyberg, ed.), LNCS, vol. 1403, Springer, 1998, pp. 236–250.
[34] M. Bellare, S. Goldwasser, and D. Micciancio, Pseudo-random number generation within cryptographic algorithms: The DSS case, CRYPTO 1997 (B. S. Kaliski Jr., ed.), LNCS, vol. 1294, Springer, 1997, pp. 277–291.
[35] M. Bellare, C. Namprempre, and G. Neven, Security proofs for identity-based identification and signature schemes, J. Crypt. 22 (2009), no. 1, 1–61.
[36] M. Bellare and G. Neven, Multi-signatures in the plain public-key model and a general forking lemma, CCS 2006 (A. Juels, R. N. Wright, and S. De Capitani di Vimercati, eds.), ACM, 2006, pp. 390–399.
[37] M. Bellare, D. Pointcheval, and P. Rogaway, Authenticated key exchange secure against dictionary attacks, EUROCRYPT 2000 (B. Preneel, ed.), LNCS, vol. 1807, Springer, 2000, pp. 139–155.
[38] M. Bellare and P. Rogaway, Random oracles are practical: A paradigm for designing efficient protocols, CCS 1993, ACM, 1993, pp. 62–73.
[39]
[40]
[41]
, The exact security of digital signatures - how to sign with RSA and Rabin, EUROCRYPT 1996 (U. M. Maurer, ed.), LNCS, vol. 1070, Springer, 1996, pp. 399–416.
[42] K. Bentahar, The equivalence between the DHP and DLP for elliptic curves used in practical applications, revisited, IMA Cryptography and Coding (N. P. Smart, ed.), LNCS, vol. 3796, Springer, 2005, pp. 376–391.
[43]
[44] D. J. Bernstein, Faster square roots in annoying finite fields, Preprint, 2001.
[45]
[46]
[47]
, Proving tight security for Rabin-Williams signatures, EUROCRYPT 2008 (N. P. Smart, ed.), LNCS, vol. 4965, Springer, 2008, pp. 70–87.
[53]
[54]
, Type-II optimal polynomial bases, WAIFI 2010 (M. A. Hasan and T. Helleseth, eds.), LNCS, vol. 6087, Springer, 2010, pp. 41–61.
[55] D. J. Bernstein, T. Lange, and R. R. Farashahi, Binary Edwards curves, CHES 2008 (E. Oswald and P. Rohatgi, eds.), LNCS, vol. 5154, Springer, 2008, pp. 244–265.
[56] D. J. Bernstein, T. Lange, and P. Schwabe, On the correct use of the negation map in the Pollard rho method, PKC 2011 (D. Catalano, N. Fazio, R. Gennaro, and A. Nicolosi, eds.), LNCS, vol. 6571, Springer, 2011, pp. 128–146.
[57] E. Biham, D. Boneh, and O. Reingold, Breaking generalized Diffie-Hellman modulo a composite is no easier than factoring, Inf. Process. Lett. 70 (1999), no. 2, 83–87.
[58] G. Bisson and A. V. Sutherland, Computing the endomorphism ring of an ordinary elliptic curve over a finite field, J. Number Theory 131 (2011), no. 5, 815–831.
[59] S. R. Blackburn and S. Murphy, The number of partitions in Pollard rho, unpublished manuscript, 1998.
[60] S. R. Blackburn and E. Teske, Baby-step giant-step algorithms for non-uniform distributions, ANTS IV (W. Bosma, ed.), LNCS, vol. 1838, Springer, 2000, pp. 153–168.
[61] I. F. Blake, R. Fuji-Hara, R. C. Mullin, and S. A. Vanstone, Computing logarithms in finite fields of characteristic two, SIAM J. Algebraic and Discrete Methods 5 (1984), no. 2, 272–285.
[62] I. F. Blake and T. Garefalakis, On the complexity of the discrete logarithm and Diffie-Hellman problems, J. Complexity 20 (2004), no. 2-3, 148–170.
[63] I. F. Blake, T. Garefalakis, and I. E. Shparlinski, On the bit security of the Diffie-Hellman key, Appl. Algebra Eng. Commun. Comput. 16 (2006), no. 6, 397–404.
[64] I. F. Blake, G. Seroussi, and N. P. Smart, Elliptic curves in cryptography, Cambridge, 1999.
[65]
[66] D. Bleichenbacher, Generating ElGamal signatures without knowing the secret key, EUROCRYPT 1996 (U. M. Maurer, ed.), LNCS, vol. 1070, Springer, 1996, pp. 10–18.
[67]
[68]
[69] D. Bleichenbacher and A. May, New attacks on RSA with small secret CRT-exponents, PKC 2006 (M. Yung, Y. Dodis, A. Kiayias, and T. Malkin, eds.), LNCS, vol. 3958, Springer, 2006, pp. 1–13.
[70] D. Bleichenbacher and P. Q. Nguyen, Noisy polynomial interpolation and noisy Chinese remaindering, EUROCRYPT 2000 (B. Preneel, ed.), LNCS, vol. 1807, Springer, 2000, pp. 53–69.
[71] J. Blömer and A. May, Low secret exponent RSA revisited, Cryptography and Lattices (CaLC) (J. H. Silverman, ed.), LNCS, vol. 2146, Springer, 2001, pp. 4–19.
[72] J. Blömer and A. May, A tool kit for finding small roots of bivariate polynomials over the integers, EUROCRYPT 2005 (R. Cramer, ed.), LNCS, vol. 3494, Springer, 2005, pp. 251–267.
[73] M. Blum and S. Micali, How to generate cryptographically strong sequences of pseudo-random bits, SIAM J. Comput. 13 (1984), no. 4, 850–864.
[74] D. Boneh, Simplified OAEP for the RSA and Rabin functions, CRYPTO 2001 (J. Kilian, ed.), LNCS, vol. 2139, Springer, 2001, pp. 275–291.
[75]
[76] D. Boneh and X. Boyen, Short signatures without random oracles, EUROCRYPT 2004 (C. Cachin and J. Camenisch, eds.), LNCS, vol. 3027, Springer, 2004, pp. 56–73.
[77]
, Short signatures without random oracles and the SDH assumption in bilinear groups, J. Crypt. 21 (2008), no. 2, 149–177.
[78] D. Boneh and G. Durfee, Cryptanalysis of RSA with private key d less than N^0.292, IEEE Trans. Inf. Theory 46 (2000), no. 4, 1339–1349.
[79] D. Boneh, G. Durfee, and N. Howgrave-Graham, Factoring N = p^r q for large r, CRYPTO 1999 (M. J. Wiener, ed.), LNCS, vol. 1666, Springer, 1999, pp. 326–337.
[80] D. Boneh and M. K. Franklin, Identity based encryption from the Weil pairing, CRYPTO 2001 (J. Kilian, ed.), LNCS, vol. 2139, Springer, 2001, pp. 213–229.
[81]
[82] D. Boneh, A. Joux, and P. Nguyen, Why textbook ElGamal and RSA encryption are insecure, ASIACRYPT 2000 (T. Okamoto, ed.), LNCS, vol. 1976, Springer, 2000, pp. 30–43.
[83] D. Boneh and R. J. Lipton, Algorithms for black-box fields and their application to cryptography, CRYPTO 1996 (N. Koblitz, ed.), LNCS, vol. 1109, Springer, 1996, pp. 283–297.
[84] D. Boneh and I. E. Shparlinski, On the unpredictability of bits of the elliptic curve Diffie-Hellman scheme, CRYPTO 2001 (J. Kilian, ed.), LNCS, vol. 2139, Springer, 2001, pp. 201–212.
[85] D. Boneh and R. Venkatesan, Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes, CRYPTO 1996 (N. Koblitz, ed.), LNCS, vol. 1109, Springer, 1996, pp. 129–142.
[86]
, Rounding in lattices and its cryptographic applications, Symposium on Discrete Algorithms (SODA), ACM/SIAM, 1997, pp. 675–681.
[87]
[88] A. Borodin and I. Munro, The computational complexity of algebraic and numeric
problems, Elsevier, 1975.
[89] J. W. Bos, M. E. Kaihara, and T. Kleinjung, Pollard rho on elliptic curves, Preprint,
2009.
[90] J. W. Bos, M. E. Kaihara, and P. L. Montgomery, Pollard rho on the PlayStation 3, Handouts of SHARCS 2009, 2009, pp. 35–50.
[91] J. W. Bos, T. Kleinjung, and A. K. Lenstra, On the use of the negation map in the Pollard rho method, ANTS IX (G. Hanrot, F. Morain, and E. Thomé, eds.), LNCS, vol. 6197, Springer, 2010, pp. 66–82.
[92] W. Bosma and H. W. Lenstra Jr., Complete systems of two addition laws for elliptic curves, J. Number Theory 53 (1995), 229–240.
[93] A. Bostan, F. Morain, B. Salvy, and É. Schost, Fast algorithms for computing isogenies between elliptic curves, Math. Comp. 77 (2008), no. 263, 1755–1778.
[94] C. Boyd and A. Mathuria, Protocols for authentication and key establishment, Information Security and Cryptography, Springer, 2003.
[95] X. Boyen, The uber-assumption family, Pairing 2008 (S. D. Galbraith and K. G. Paterson, eds.), LNCS, vol. 5209, Springer, 2008, pp. 39–56.
[96] V. Boyko, M. Peinado, and R. Venkatesan, Speeding up discrete log and factoring based schemes via precomputations, EUROCRYPT 1998 (K. Nyberg, ed.), LNCS, vol. 1403, Springer, 1998, pp. 221–235.
[97] S. Brands, An efficient off-line electronic cash system based on the representation problem, Tech. report, CWI Amsterdam, 1993, CS-R9323.
[98] R. P. Brent, An improved Monte Carlo factorization algorithm, BIT (1980), 176–184.
[99] R. P. Brent and J. M. Pollard, Factorization of the eighth Fermat number, Math. Comp. 36 (1981), no. 154, 627–630.
[100] R. P. Brent and P. Zimmermann, Modern computer arithmetic, Cambridge, 2010.
[101]
, An O(M(n) log n) algorithm for the Jacobi symbol, ANTS IX (G. Hanrot, F. Morain, and E. Thomé, eds.), LNCS, vol. 6197, Springer, 2010, pp. 83–95.
[116] R. Canetti and H. Krawczyk, Analysis of key-exchange protocols and their use for building secure channels, EUROCRYPT 2001 (B. Pfitzmann, ed.), LNCS, vol. 2045, Springer, 2001, pp. 453–474.
[117] E. R. Canfield, P. Erdős, and C. Pomerance, On a problem of Oppenheim concerning factorisatio numerorum, J. Number Theory 17 (1983), no. 1, 1–28.
[118] D. G. Cantor, Computing in the Jacobian of a hyperelliptic curve, Math. Comp. 48 (1987), no. 177, 95–101.
[119]
[120] D. Cash, E. Kiltz, and V. Shoup, The twin Diffie-Hellman problem and applications, EUROCRYPT 2008 (N. P. Smart, ed.), LNCS, vol. 4965, Springer, 2008, pp. 127–145.
[121] J. W. S. Cassels, An introduction to the geometry of numbers, Springer, 1959.
[122]
[132] J. H. Cheon, J. Hong, and M. Kim, Speeding up the Pollard rho method on prime fields, ASIACRYPT 2008 (J. Pieprzyk, ed.), LNCS, vol. 5350, Springer, 2008, pp. 471–488.
[133] J. H. Cheon and H.-T. Kim, Analysis of low Hamming weight products, Discrete Applied Mathematics 156 (2008), no. 12, 2264–2269.
[134] M. A. Cherepnev, On the connection between the discrete logarithms and the Diffie-Hellman problem, Discr. Math. Appl. 6 (1996), no. 4, 341–349.
[135] H. Cohen, A course in computational algebraic number theory, GTM 138, Springer, 1993.
[136]
[137] P. Cohen, On the coefficients of the transformation polynomials for the elliptic modular function, Math. Proc. Cambridge Philos. Soc. 95 (1984), no. 3, 389–402.
[138] S. A. Cook, An overview of computational complexity, Commun. ACM 26 (1983), no. 6, 400–408.
[139] D. Coppersmith, Fast evaluation of logarithms in fields of characteristic 2, IEEE Trans. Inf. Theory 30 (1984), no. 4, 587–594.
[140]
, Small solutions to polynomial equations, and low exponent RSA vulnerabilities, J. Crypt. 10 (1997), no. 4, 233–260.
[141]
, Optimal security proofs for PSS and other signature schemes, EUROCRYPT 2002 (L. R. Knudsen, ed.), LNCS, vol. 2332, Springer, 2002, pp. 272–287.
[149]
, Finding small roots of bivariate integer polynomial equations: A direct approach, CRYPTO 2007 (A. Menezes, ed.), LNCS, vol. 4622, Springer, 2007, pp. 379–394.
, Universal hash proofs and a paradigm for adaptive chosen ciphertext secure public-key encryption, EUROCRYPT 2002 (L. R. Knudsen, ed.), LNCS, vol. 2332, Springer, 2002, pp. 45–64.
[160]
[173]
[174]
, An index calculus algorithm for plane curves of small degree, ANTS VII (F. Hess, S. Pauli, and M. E. Pohst, eds.), LNCS, vol. 4076, Springer, 2006, pp. 543–557.
[175]
[176]
[177] C. Diem and E. Thomé, Index calculus in class groups of non-hyperelliptic curves of genus three, J. Crypt. 21 (2008), no. 4, 593–611.
[178] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Trans. Inf. Theory 22 (1976), 644–654.
[179] V. S. Dimitrov, K. U. Jarvinen, M. J. Jacobson, W. F. Chan, and Z. Huang, Provably sublinear point multiplication on Koblitz curves and its hardware implementation, IEEE Trans. Computers 57 (2008), no. 11, 1469–1481.
[180] V. S. Dimitrov, G. A. Jullien, and W. C. Miller, Theory and applications of the double-base number system, IEEE Trans. Computers 48 (1999), no. 10, 1098–1106.
[181] S. A. DiPippo and E. W. Howe, Real polynomials with all roots on the unit circle and abelian varieties over finite fields, J. Number Theory 73 (1998), no. 2, 426–450.
[182] C. Doche, T. Icart, and D. R. Kohel, Efficient scalar multiplication by isogeny decompositions, PKC 2006 (M. Yung, Y. Dodis, A. Kiayias, and T. Malkin, eds.), LNCS, vol. 3958, Springer, 2006, pp. 191–206.
[183] A. Dujella, A variant of Wiener's attack on RSA, Computing 85 (2009), no. 1-2, 77–83.
[184] I. M. Duursma, Class numbers for some hyperelliptic curves, Arithmetic, Geometry and Coding Theory (R. Pellikaan, M. Perret, and S. G. Vladut, eds.), Walter de Gruyter, 1996, pp. 45–52.
[185] I. M. Duursma, P. Gaudry, and F. Morain, Speeding up the discrete log computation on curves with automorphisms, ASIACRYPT 1999 (K. Y. Lam, E. Okamoto, and C. Xing, eds.), LNCS, vol. 1716, Springer, 1999, pp. 103–121.
[186] I. M. Duursma and H.-S. Lee, Tate pairing implementation for hyperelliptic curves y^2 = x^p − x + d, ASIACRYPT 2003 (C.-S. Laih, ed.), LNCS, vol. 2894, Springer, 2003, pp. 111–123.
[187] P. N. J. Eagle, S. D. Galbraith, and J. Ong, Point compression for Koblitz elliptic curves, Advances in Mathematics of Communication 5 (2011), no. 1, 1–10.
[188] S. Edixhoven, Le couplage Weil: de la géométrie à l'arithmétique, Notes from a seminar in Rennes, 2002.
[189] H. M. Edwards, A normal form for elliptic curves, Bulletin of the AMS 44 (2007), 393–422.
[190] D. Eisenbud, Commutative algebra with a view toward algebraic geometry, GTM, vol. 150, Springer, 1999.
[191] T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, CRYPTO 1984 (G. R. Blakley and D. Chaum, eds.), LNCS, vol. 196, Springer, 1985, pp. 10–18.
[192] N. D. Elkies, Elliptic and modular curves over finite fields and related computational issues, Computational Perspectives on Number Theory (D. A. Buell and J. T. Teitelbaum, eds.), Studies in Advanced Mathematics, AMS, 1998, pp. 21–76.
[193] A. Enge, Computing modular polynomials in quasi-linear time, Math. Comp. 78 (2009), no. 267, 1809–1824.
[194] A. Enge and P. Gaudry, A general framework for subexponential discrete logarithm algorithms, Acta Arith. 102 (2002), 83–103.
[195]
, An L(1/3 + ε) algorithm for the discrete logarithm problem for low degree curves, EUROCRYPT 2007 (M. Naor, ed.), LNCS, vol. 4515, Springer, 2007, pp. 379–393.
[196] A. Enge, P. Gaudry, and E. Thomé, An L(1/3) discrete logarithm algorithm for low degree curves, J. Crypt. 24 (2011), no. 1, 24–41.
[197] A. Enge and A. Stein, Smooth ideals in hyperelliptic function fields, Math. Comp. 71 (2002), no. 239, 1219–1230.
[198] S. Erickson, M. J. Jacobson Jr., N. Shang, S. Shen, and A. Stein, Explicit formulas for real hyperelliptic curves of genus 2 in affine representation, WAIFI 2007 (C. Carlet and B. Sunar, eds.), LNCS, vol. 4547, Springer, 2007, pp. 202–218.
[199] H. M. Farkas and I. Kra, Riemann surfaces, GTM, vol. 71, Springer, 1980.
[200] U. Feige, A. Fiat, and A. Shamir, Zero-knowledge proofs of identity, J. Crypt. 1 (1988), no. 2, 77–94.
[201] L. De Feo, Fast algorithms for towers of finite fields and isogenies, Ph.D. thesis, École Polytechnique, 2010.
[202] R. Fischlin and C.-P. Schnorr, Stronger security proofs for RSA and Rabin bits, J. Crypt. 13 (2000), no. 2, 221–244.
[203] P. Flajolet and A. M. Odlyzko, Random mapping statistics, EUROCRYPT 1989 (J.-J. Quisquater and J. Vandewalle, eds.), LNCS, vol. 434, Springer, 1990, pp. 329–354.
[204] P. Flajolet and R. Sedgewick, Analytic combinatorics, Cambridge, 2009.
[223] S. D. Galbraith, X. Lin, and M. Scott, Endomorphisms for faster elliptic curve cryptography on a large class of curves, EUROCRYPT 2009 (A. Joux, ed.), LNCS, vol. 5479, Springer, 2009, pp. 518–535.
[224] S. D. Galbraith and J. F. McKee, The probability that the number of points on an elliptic curve over a finite field is prime, Journal of the Lond. Math. Soc. 62 (2000), no. 3, 671–684.
[225] S. D. Galbraith, J. M. Pollard, and R. S. Ruprai, Computing discrete logarithms in an interval, Cryptology ePrint Archive, Report 2010/617, 2010.
[226] S. D. Galbraith and R. S. Ruprai, An improvement to the Gaudry-Schost algorithm for multidimensional discrete logarithm problems, IMA Cryptography and Coding (M. G. Parker, ed.), LNCS, vol. 5921, Springer, 2009, pp. 368–382.
[227]
, Using equivalence classes to accelerate solving the discrete logarithm problem in a short interval, PKC 2010 (P. Q. Nguyen and D. Pointcheval, eds.), LNCS, vol. 6056, Springer, 2010, pp. 368–383.
[242]
[243]
[244]
, Index calculus for abelian varieties of small dimension and the elliptic curve discrete logarithm problem, Journal of Symbolic Computation 44 (2009), no. 12, 1690–1702.
[245] P. Gaudry, F. Hess, and N. P. Smart, Constructive and destructive facets of Weil descent on elliptic curves, J. Crypt. 15 (2002), no. 1, 19–46.
[246] P. Gaudry and D. Lubicz, The arithmetic of characteristic 2 Kummer surfaces and of elliptic Kummer lines, Finite Fields Appl. 15 (2009), no. 2, 246–260.
[247] P. Gaudry and É. Schost, Construction of secure random curves of genus 2 over prime fields, EUROCRYPT 2004 (C. Cachin and J. Camenisch, eds.), LNCS, vol. 3027, Springer, 2004, pp. 239–256.
[248]
[249] P. Gaudry, E. Thomé, N. Thériault, and C. Diem, A double large prime variation for small genus hyperelliptic index calculus, Math. Comp. 76 (2007), no. 257, 475–492.
[250] C. Gentry, Key recovery and message attacks on NTRU-composite, EUROCRYPT 2001 (B. Pfitzmann, ed.), LNCS, vol. 2045, Springer, 2001, pp. 182–194.
[251]
[252] C. Gentry, C. Peikert, and V. Vaikuntanathan, Trapdoors for hard lattices and new cryptographic constructions, Symposium on the Theory of Computing (STOC) (R. E. Ladner and C. Dwork, eds.), ACM, 2008, pp. 197–206.
[253] M. Girault, An identity-based identification scheme based on discrete logarithms modulo a composite number, EUROCRYPT 1990 (I. Damgård, ed.), LNCS, vol. 473, Springer, 1991, pp. 481–486.
[254] M. Girault, G. Poupard, and J. Stern, On the fly authentication and signature schemes based on groups of unknown order, J. Crypt. 19 (2006), no. 4, 463–487.
[255] O. Goldreich, S. Goldwasser, and S. Halevi, Public-key cryptosystems from lattice reduction problems, CRYPTO 1997 (B. S. Kaliski Jr., ed.), LNCS, vol. 1294, Springer, 1997, pp. 112–131.
[256] O. Goldreich, D. Ron, and M. Sudan, Chinese remaindering with errors, IEEE Trans. Inf. Theory 46 (2000), no. 4, 1330–1338.
[257] S. Goldwasser, S. Micali, and R. L. Rivest, A digital signature scheme secure against
adaptive chosen-message attacks, SIAM J. Comput. 17 (1988), no. 2, 281308.
[258] G. Gong and L. Harn, Public-key cryptosystems based on cubic finite field extensions,
IEEE Trans. Inf. Theory 45 (1999), no. 7, 26012605.
[259] M. I. Gonz
alez Vasco, M. N
aslund, and I. E. Shparlinski, New results on the hardness
of Diffie-Hellman bits, PKC 2004 (F. Bao, R. H. Deng, and J. Zhou, eds.), LNCS,
vol. 2947, Springer, 2004, pp. 159172.
[260] M. I. Gonz
alez Vasco and I. E. Shparlinski, On the security of Diffie-Hellman bits,
Cryptography and Computational Number Theory (H. Wang K. Y. Lam, I. E. Shparlinski and C. Xing, eds.), Progress in Computer Science and Applied Logic,
Birkh
auser, 2001, pp. 257268.
[261] D. M. Gordon, On the number of elliptic pseudoprimes, Math. Comp. 52 (1989),
no. 185, 231245.
[262] D. M. Gordon and K. S. McCurley, Massively parallel computation of discrete logarithms, CRYPTO 1992 (E. F. Brickell, ed.), LNCS, vol. 740, Springer, 1993, pp. 312
323.
[263] J. Gordon, Strong primes are easy to find, EUROCRYPT 1984 (T. Beth, N. Cot,
and I. Ingemarsson, eds.), LNCS, vol. 209, Springer, 1985, pp. 216223.
[264] R. Granger, F. Hess, R. Oyono, N. Theriault, and F. Vercauteren, Ate pairing on
hyperelliptic curves, EUROCRYPT 2007 (M. Naor, ed.), LNCS, vol. 4515, Springer,
2007, pp. 430447.
[265] R. Granger and F. Vercauteren, On the discrete logarithm problem on algebraic tori, CRYPTO 2005 (V. Shoup, ed.), LNCS, vol. 3621, Springer, 2005, pp. 66–85.
[266] A. Granville, Smooth numbers: Computational number theory and beyond, Algorithmic number theory (J. P. Buhler and P. Stevenhagen, eds.), MSRI Proceedings, vol. 44, Cambridge, 2008, pp. 267–323.
[267] B. H. Gross, Heights and the special values of L-series, Number theory, CMS Conf. Proc., vol. 7, AMS, 1987, pp. 115–187.
[268] M. Grötschel, L. Lovász, and A. Schrijver, Geometric algorithms and combinatorial optimization, Springer, 1993.
[269] J. Guajardo and C. Paar, Itoh-Tsujii inversion in standard basis and its application in cryptography and codes, Des. Codes Crypt. 25 (2002), no. 2, 207–216.
[270] L. C. Guillou and J.-J. Quisquater, A practical zero-knowledge protocol fitted to security microprocessor minimizing both transmission and memory, EUROCRYPT 1988 (C. G. Günther, ed.), LNCS, vol. 330, Springer, 1988, pp. 123–128.
[271] R. K. Guy, Unsolved problems in number theory, 2nd ed., Springer, 1994.
[272] J. L. Hafner and K. S. McCurley, Asymptotically fast triangularization of matrices over rings, SIAM J. Comput. 20 (1991), no. 6, 1068–1083.
[273] D. Hankerson, A. Menezes, and S. Vanstone, Guide to elliptic curve cryptography,
Springer, 2004.
[274] G. Hanrot and D. Stehlé, Improved analysis of Kannan's shortest lattice vector algorithm, CRYPTO 2007 (A. Menezes, ed.), LNCS, vol. 4622, Springer, 2007, pp. 170–186.
[275] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 5th ed.,
Oxford, 1980.
[276] R. Harley, Fast arithmetic on genus two curves, Preprint, 2000.
[277] R. Hartshorne, Algebraic geometry, GTM, vol. 52, Springer, 1977.
[278] J. Håstad and M. Näslund, The security of all RSA and discrete log bits, J. ACM 51 (2004), no. 2, 187–230.
[279] G. Havas, B. S. Majewski, and K. R. Matthews, Extended GCD and Hermite normal form algorithms via lattice basis reduction, Experimental Math. 7 (1998), no. 2, 125–136.
[280] B. Helfrich, Algorithms to construct Minkowski reduced and Hermite reduced lattice bases, Theor. Comput. Sci. 41 (1985), 125–139.
[281] F. Hess, A note on the Tate pairing of curves over finite fields, Arch. Math. 82 (2004), 28–32.
[282]
[283] F. Hess, N. Smart, and F. Vercauteren, The eta pairing revisited, IEEE Trans. Inf. Theory 52 (2006), no. 10, 4595–4602.
[284] N. J. Higham, Accuracy and stability of numerical algorithms, 2nd ed., SIAM, 2002.
[285] H. Hisil, K. K.-H. Wong, G. Carter, and E. Dawson, Jacobi quartic curves revisited, ACISP 2009 (C. Boyd and J. M. González Nieto, eds.), LNCS, vol. 5594, Springer, 2009, pp. 452–468.
[286] Y. Hitchcock, P. Montague, G. Carter, and E. Dawson, The efficiency of solving multiple discrete logarithm problems and the implications for the security of fixed elliptic curves, Int. J. Inf. Secur. 3 (2004), 86–98.
[287] J. Hoffstein, J. Pipher, and J. H. Silverman, NTRU: A ring-based public key cryptosystem, ANTS III (J. Buhler, ed.), LNCS, vol. 1423, Springer, 1998, pp. 267–288.
[288]
[289] J. Hoffstein and J. H. Silverman, Random small Hamming weight products with applications to cryptography, Discrete Applied Mathematics 130 (2003), no. 1, 37–49.
[290] D. Hofheinz and E. Kiltz, The group of signed quadratic residues and applications, CRYPTO 2009 (S. Halevi, ed.), LNCS, vol. 5677, Springer, 2009, pp. 637–653.
[291] S. Hohenberger and B. Waters, Short and stateless signatures from the RSA assumption, CRYPTO 2009 (S. Halevi, ed.), LNCS, vol. 5677, Springer, 2009, pp. 654–670.
[292] J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages and
computation, Addison-Wesley, 1979.
[293] J. Horwitz and R. Venkatesan, Random Cayley digraphs and the discrete logarithm, ANTS V (C. Fieker and D. R. Kohel, eds.), LNCS, vol. 2369, Springer, 2002, pp. 416–430.
[294] E. W. Howe, On the group orders of elliptic curves over finite fields, Compositio Mathematica 85 (1993), 229–247.
[295] N. Howgrave-Graham, Finding small roots of univariate modular equations revisited, IMA Cryptography and Coding (M. Darnell, ed.), LNCS, vol. 1355, Springer, 1997, pp. 131–142.
[296]
[309] ——, Expander graphs based on GRH with an application to elliptic curve cryptography, J. Number Theory 129 (2009), no. 6, 1491–1504.
[310] D. Jao and V. Soukharev, A subexponential algorithm for evaluating large degree isogenies, ANTS IX (G. Hanrot, F. Morain, and E. Thomé, eds.), LNCS, vol. 6197, Springer, 2010, pp. 219–233.
[311] D. Jao and K. Yoshida, Boneh-Boyen signatures and the strong Diffie-Hellman problem, Pairing 2009 (H. Shacham and B. Waters, eds.), LNCS, vol. 5671, Springer, 2009, pp. 1–16.
[312] D. Jetchev and R. Venkatesan, Bits security of the elliptic curve Diffie-Hellman secret keys, CRYPTO 2008 (D. Wagner, ed.), LNCS, vol. 5157, Springer, 2008, pp. 75–92.
[313] Z.-T. Jiang, W.-L. Xu, and Y.-M. Wang, Polynomial analysis of DH secret key and bit security, Wuhan University Journal of Natural Sciences 10 (2005), no. 1, 239–242.
[314] A. Joux, Algorithmic cryptanalysis, Chapman & Hall/CRC, 2009.
[315] A. Joux and R. Lercier, The function field sieve in the medium prime case, EUROCRYPT 2006 (S. Vaudenay, ed.), LNCS, vol. 4004, Springer, 2006, pp. 254–270.
[316] A. Joux, R. Lercier, N. P. Smart, and F. Vercauteren, The number field sieve in the medium prime case, CRYPTO 2006 (C. Dwork, ed.), LNCS, vol. 4117, Springer, 2006, pp. 326–344.
[317] M. Joye and G. Neven, Identity-based cryptography, Cryptology and Information
Security, vol. 2, IOS Press, 2008.
[318] M. Joye and S.-M. Yen, Optimal left-to-right binary signed-digit recoding, IEEE Trans. Computers 49 (2000), no. 7, 740–748.
[319] M. J. Jacobson Jr., N. Koblitz, J. H. Silverman, A. Stein, and E. Teske, Analysis of the Xedni calculus attack, Des. Codes Crypt. 20 (2000), no. 1, 41–64.
[320] M. J. Jacobson Jr. and A. J. van der Poorten, Computational aspects of NUCOMP, ANTS V (C. Fieker and D. R. Kohel, eds.), LNCS, vol. 2369, Springer, 2002, pp. 120–133.
[321] C. S. Jutla, On finding small solutions of modular multivariate polynomial equations, EUROCRYPT 1998 (K. Nyberg, ed.), LNCS, vol. 1403, Springer, 1998, pp. 158–170.
[322] M. Kaib and H. Ritter, Block reduction for arbitrary norms, Technical Report, Universität Frankfurt am Main, 1994.
[323] M. Kaib and C.-P. Schnorr, The generalized Gauss reduction algorithm, Journal of Algorithms 21 (1996), no. 3, 565–578.
[324] B. S. Kaliski Jr., Elliptic curves and cryptography: A pseudorandom bit generator
and other tools, Ph.D. thesis, MIT, 1988.
[325] W. van der Kallen, Complexity of the Havas, Majewski, Matthews LLL Hermite normal form algorithm, Journal of Symbolic Computation 30 (2000), no. 3, 329–337.
[326] R. Kannan, Improved algorithms for integer programming and related lattice problems, Symposium on the Theory of Computing (STOC), ACM, 1983, pp. 193–206.
[327]
[328] R. Kannan and A. Bachem, Polynomial algorithms for computing the Smith and Hermite normal forms of an integer matrix, SIAM J. Comput. 8 (1979), 499–507.
[329] M. Katagi, T. Akishita, I. Kitamura, and T. Takagi, Some improved algorithms for hyperelliptic curve cryptosystems using degenerate divisors, ICISC 2004 (C. Park and S. Chee, eds.), LNCS, vol. 3506, Springer, 2004, pp. 296–312.
[330] M. Katagi, I. Kitamura, T. Akishita, and T. Takagi, Novel efficient implementations of hyperelliptic curve cryptosystems using degenerate divisors, WISA 2004 (C.-H. Lim and M. Yung, eds.), LNCS, vol. 3325, Springer, 2004, pp. 345–359.
[331] J. Katz and Y. Lindell, Introduction to modern cryptography, Chapman &
Hall/CRC, 2008.
[332] E. Kiltz and G. Neven, Identity-based signatures, Identity-Based Cryptography (M. Joye and G. Neven, eds.), Cryptology and Information Security Series, vol. 2, IOS Press, 2008, pp. 31–44.
[333] J. H. Kim, R. Montenegro, Y. Peres, and P. Tetali, A birthday paradox for Markov chains, with an optimal bound for collision in the Pollard rho algorithm for discrete logarithm, ANTS VIII (A. J. van der Poorten and A. Stein, eds.), LNCS, vol. 5011, Springer, 2008, pp. 402–415.
[334] J. H. Kim, R. Montenegro, and P. Tetali, Near optimal bounds for collision in Pollard rho for discrete log, Foundations of Computer Science (FOCS), IEEE, 2007, pp. 215–223.
[335] S. Kim and J.-H. Cheon, A parameterized splitting system and its application to the discrete logarithm problem with low Hamming weight product exponents, PKC 2008 (R. Cramer, ed.), LNCS, vol. 4939, Springer, 2008, pp. 328–343.
[336] B. King, A point compression method for elliptic curves defined over GF(2^n), PKC 2004 (F. Bao, R. H. Deng, and J. Zhou, eds.), LNCS, vol. 2947, Springer, 2004, pp. 333–345.
[337] J. F. C. Kingman and S. J. Taylor, Introduction to measure theory and probability,
Cambridge, 1966.
[338] P. N. Klein, Finding the closest lattice vector when it's unusually close, Symposium on Discrete Algorithms (SODA), ACM/SIAM, 2000, pp. 937–941.
[339] E. W. Knudsen, Elliptic scalar multiplication using point halving, ASIACRYPT 1999 (K.-Y. Lam, E. Okamoto, and C. Xing, eds.), LNCS, vol. 1716, Springer, 1999, pp. 135–149.
[340] D. E. Knuth, The art of computer programming, Volume 2: Seminumerical algorithms, 3rd ed., Addison-Wesley, 1997.
[341] N. Koblitz, Elliptic curve cryptosystems, Math. Comp. 48 (1987), no. 177, 203–209.
[342]
[343]
[344] ——, CM curves with good cryptographic properties, CRYPTO 1991 (J. Feigenbaum, ed.), LNCS, vol. 576, Springer, 1992, pp. 279–287.
[345] ——, A course in number theory and cryptography, 2nd ed., GTM 114, Springer, 1994.
[346] Ç. K. Koç and T. Acar, Montgomery multiplication in GF(2^k), Des. Codes Crypt. 14 (1998), no. 1, 57–69.
[347] D. R. Kohel, Endomorphism rings of elliptic curves over finite fields, Ph.D. thesis,
University of California, Berkeley, 1996.
[348]
[349] D. R. Kohel and I. E. Shparlinski, On exponential sums and group generators for elliptic curves over finite fields, ANTS IV (W. Bosma, ed.), LNCS, vol. 1838, Springer, 2000, pp. 395–404.
[350] S. Kozaki, T. Kutsuma, and K. Matsuo, Remarks on Cheon's algorithms for pairing-related problems, Pairing 2007 (T. Takagi, T. Okamoto, E. Okamoto, and T. Okamoto, eds.), LNCS, vol. 4575, Springer, 2007, pp. 302–316.
[351] M. Kraitchik, Théorie des nombres, Vol. 1, Gauthier-Villars, Paris, 1922.
[352] F. Kuhn and R. Struik, Random walks revisited: Extensions of Pollard's rho algorithm for computing multiple discrete logarithms, SAC 2001 (S. Vaudenay and A. M. Youssef, eds.), LNCS, vol. 2259, Springer, 2001, pp. 212–229.
[353] R. M. Kuhn, Curves of genus 2 with split Jacobian, Trans. Amer. Math. Soc. 307 (1988), no. 1, 41–49.
[354] R. Kumar and D. Sivakumar, Complexity of SVP – a reader's digest, SIGACT News Complexity Theory Column 32 (2001), 13.
[355] N. Kunihiro and K. Koyama, Equivalence of counting the number of points on elliptic curve over the ring Z_n and factoring n, EUROCRYPT 1998 (K. Nyberg, ed.), LNCS, vol. 1403, Springer, 1998, pp. 47–58.
[356] K. Kurosawa and Y. Desmedt, A new paradigm of hybrid encryption scheme, CRYPTO 2004 (M. K. Franklin, ed.), LNCS, vol. 3152, Springer, 2004, pp. 426–442.
[357] J. C. Lagarias, Knapsack public key cryptosystems and diophantine approximation, CRYPTO 1983 (D. Chaum, ed.), Plenum Press, 1984, pp. 3–23.
[358] J. C. Lagarias, H. W. Lenstra Jr., and C.-P. Schnorr, Korkin-Zolotarev bases and successive minima of a lattice and its reciprocal lattice, Combinatorica 10 (1990), no. 4, 333–348.
[359] J. C. Lagarias and A. M. Odlyzko, Solving low-density subset sum problems, J. ACM 32 (1985), no. 1, 229–246.
[363]
[364]
[365] T. Lange, Koblitz curve cryptosystems, Finite Fields Appl. 11 (2005), no. 2, 200–229.
[366] E. Lee, H.-S. Lee, and C.-M. Park, Efficient and generalized pairing computation on abelian varieties, IEEE Trans. Information Theory 55 (2009), no. 4, 1793–1803.
[367] A. K. Lenstra, Factorization of polynomials, Computational methods in number theory (H. W. Lenstra Jr. and R. Tijdeman, eds.), Mathematical Centre Tracts 154, Mathematisch Centrum Amsterdam, 1984, pp. 169–198.
[368]
[369] A. K. Lenstra and H. W. Lenstra Jr., The development of the number field sieve,
LNM, vol. 1554, Springer, 1993.
[370] A. K. Lenstra, H. W. Lenstra Jr., and L. Lovász, Factoring polynomials with rational coefficients, Math. Ann. 261 (1982), 515–534.
[371] A. K. Lenstra and I. E. Shparlinski, Selective forgery of RSA signatures with fixed-pattern padding, PKC 2002 (D. Naccache and P. Paillier, eds.), LNCS, vol. 2274, Springer, 2002, pp. 228–236.
[372] A. K. Lenstra and E. R. Verheul, The XTR public key system, CRYPTO 2000 (M. Bellare, ed.), LNCS, vol. 1880, Springer, 2000, pp. 1–19.
[373]
[374] H. W. Lenstra Jr., Factoring integers with elliptic curves, Annals of Mathematics 126 (1987), no. 3, 649–673.
[375]
[376]
[380] R. Lercier and F. Morain, Algorithms for computing isogenies between elliptic curves, Computational Perspectives on Number Theory (D. A. Buell and J. T. Teitelbaum, eds.), Studies in Advanced Mathematics, vol. 7, AMS, 1998, pp. 77–96.
[381] R. Lercier and T. Sirvent, On Elkies subgroups of ℓ-torsion points in elliptic curves defined over a finite field, J. Théor. Nombres Bordeaux 20 (2008), no. 3, 783–797.
[382] G. Leurent and P. Q. Nguyen, How risky is the random oracle model?, CRYPTO 2009 (S. Halevi, ed.), LNCS, vol. 5677, Springer, 2009, pp. 445–464.
[383] K.-Z. Li and F. Oort, Moduli of supersingular abelian varieties, LNM, vol. 1680,
Springer, 1998.
[384] W.-C. Li, M. Näslund, and I. E. Shparlinski, Hidden number problem with the trace and bit security of XTR and LUC, CRYPTO 2002 (M. Yung, ed.), LNCS, vol. 2442, Springer, 2002, pp. 433–448.
[385] R. Lidl and H. Niederreiter, Introduction to finite fields and their applications, Cambridge, 1994.
[386]
[387] R. Lindner and C. Peikert, Better key sizes (and attacks) for LWE-based encryption, CT-RSA 2011 (A. Kiayias, ed.), LNCS, vol. 6558, Springer, 2011, pp. 1–23.
[388] J. H. van Lint, Introduction to coding theory, 3rd ed., GTM, vol. 86, Springer, 1999.
[389] P. Lockhart, On the discriminant of a hyperelliptic curve, Trans. Amer. Math. Soc. 342 (1994), no. 2, 729–752.
[390] D. L. Long and A. Wigderson, The discrete logarithm hides O(log n) bits, SIAM J. Comput. 17 (1988), no. 2, 363–372.
[391] D. Lorenzini, An invitation to arithmetic geometry, Graduate Studies in Mathematics, vol. 9, AMS, 1996.
[392] L. Lovász, An algorithmic theory of numbers, graphs and convexity, SIAM, 1986.
[393] L. Lovász and H. E. Scarf, The generalized basis reduction algorithm, Mathematics of Operations Research 17 (1992), no. 3, 751–764.
[394] R. Lovorn Bender and C. Pomerance, Rigorous discrete logarithm computations in finite fields via smooth polynomials, Computational Perspectives on Number Theory (D. A. Buell and J. T. Teitelbaum, eds.), Studies in Advanced Mathematics, vol. 7, AMS, 1998, pp. 221–232.
[395] M. Luby, Pseudorandomness and cryptographic applications, Princeton, 1996.
[396] H. Lüneburg, On a little but useful algorithm, AAECC-3, 1985 (J. Calmet, ed.), LNCS, vol. 229, Springer, 1986, pp. 296–301.
[397] S. Martín Molleví, P. Morillo, and J. L. Villar, Computing the order of points on an elliptic curve modulo N is as difficult as factoring N, Appl. Math. Lett. 14 (2001), no. 3, 341–346.
[398] C. Mauduit and A. Sárközy, On finite pseudorandom binary sequences I: Measure of pseudorandomness, the Legendre symbol, Acta Arith. 82 (1997), 365–377.
[399] U. M. Maurer, Towards the equivalence of breaking the Diffie-Hellman protocol and computing discrete logarithms, CRYPTO 1994 (Y. Desmedt, ed.), LNCS, vol. 839, Springer, 1994, pp. 271–281.
[400] ——, Fast generation of prime numbers and secure public-key cryptographic parameters, J. Crypt. 8 (1995), no. 3, 123–155.
[401]
[402] U. M. Maurer and S. Wolf, Diffie-Hellman oracles, CRYPTO 1996 (N. Koblitz, ed.), LNCS, vol. 1109, Springer, 1996, pp. 268–282.
[403]
[404] ——, Lower bounds on generic algorithms in groups, EUROCRYPT 1998 (K. Nyberg, ed.), LNCS, vol. 1403, Springer, 1998, pp. 72–84.
[405] ——, The relationship between breaking the Diffie-Hellman protocol and computing discrete logarithms, SIAM J. Comput. 28 (1999), no. 5, 1689–1721.
[406] ——, The Diffie-Hellman protocol, Des. Codes Crypt. 19 (2000), no. 2/3, 147–171.
[407] A. May, New RSA vulnerabilities using lattice reduction methods, Ph.D. thesis,
Paderborn, 2003.
[408]
[409] A. May and J. H. Silverman, Dimension reduction methods for convolution modular lattices, Cryptography and Lattices (CaLC) (J. H. Silverman, ed.), LNCS, Springer, 2001, pp. 110–125.
[410] J. F. McKee, Subtleties in the distribution of the numbers of points on elliptic curves over a finite prime field, J. London Math. Soc. 59 (1999), no. 2, 448–460.
[411] J. F. McKee and R. G. E. Pinch, Further attacks on server-aided RSA cryptosystems, unpublished manuscript, 1998.
[412] W. Meier and O. Staffelbach, Efficient multiplication on certain non-supersingular elliptic curves, CRYPTO 1992 (E. F. Brickell, ed.), LNCS, vol. 740, Springer, 1993, pp. 333–344.
[413] A. Menezes and S. A. Vanstone, The implementation of elliptic curve cryptosystems, AUSCRYPT 1990 (J. Seberry and J. Pieprzyk, eds.), LNCS, vol. 453, Springer, 1990, pp. 2–13.
[414] A. J. Menezes, T. Okamoto, and S. A. Vanstone, Reducing elliptic curve logarithms to logarithms in a finite field, IEEE Trans. Inf. Theory 39 (1993), no. 5, 1639–1646.
[415] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of applied cryptography, CRC Press, 1996.
[416] J.-F. Mestre, La méthode des graphes. Exemples et applications, Proceedings of the international conference on class numbers and fundamental units of algebraic number fields (Katata, 1986), Nagoya Univ., 1986, pp. 217–242.
[417] J.-F. Mestre, Construction de courbes de genre 2 à partir de leurs modules, Effective methods in algebraic geometry (T. Mora and C. Traverso, eds.), Progress in Mathematics, Birkhäuser, 1991, pp. 313–334.
[418] D. Micciancio, Improving lattice based cryptosystems using the Hermite normal form, Cryptography and Lattices (CaLC) (J. H. Silverman, ed.), LNCS, vol. 2146, Springer, 2001, pp. 126–145.
[419] D. Micciancio and S. Goldwasser, Complexity of lattice problems: A cryptographic
perspective, Kluwer, 2002.
[420] D. Micciancio and O. Regev, Lattice-based cryptography, Post Quantum Cryptography (D. J. Bernstein, J. Buchmann, and E. Dahmen, eds.), Springer, 2009, pp. 147–191.
[421] D. Micciancio and P. Voulgaris, Faster exponential time algorithms for the shortest vector problem, SODA (M. Charikar, ed.), SIAM, 2010, pp. 1468–1480.
[422] D. Micciancio and B. Warinschi, A linear space algorithm for computing the Hermite normal form, ISSAC, 2001, pp. 231–236.
[423] S. D. Miller and R. Venkatesan, Spectral analysis of Pollard rho collisions, ANTS VII (F. Hess, S. Pauli, and M. E. Pohst, eds.), LNCS, vol. 4076, Springer, 2006, pp. 573–581.
[424] V. S. Miller, Short programs for functions on curves, Unpublished manuscript, 1986.
[425]
[426] ——, The Weil pairing, and its efficient calculation, J. Crypt. 17 (2004), no. 4, 235–261.
[427] A. Miyaji, T. Ono, and H. Cohen, Efficient elliptic curve exponentiation, ICICS 1997 (Y. Han, T. Okamoto, and S. Qing, eds.), LNCS, vol. 1334, Springer, 1997, pp. 282–291.
[428] B. Möller, Algorithms for multi-exponentiation, SAC 2001 (S. Vaudenay and A. M. Youssef, eds.), LNCS, vol. 2259, Springer, 2001, pp. 165–180.
[429] ——, Improved techniques for fast exponentiation, ICISC 2002 (P.-J. Lee and C.-H. Lim, eds.), LNCS, vol. 2587, Springer, 2003, pp. 298–312.
[430] ——, Fractional windows revisited: Improved signed-digit representations for efficient exponentiation, ICISC 2004 (C. Park and S. Chee, eds.), LNCS, vol. 3506, Springer, 2005, pp. 137–153.
[431] R. Montenegro and P. Tetali, How long does it take to catch a wild kangaroo?, Symposium on Theory of Computing (STOC), 2009, pp. 553–559.
[432] P. L. Montgomery, Modular multiplication without trial division, Math. Comp. 44 (1985), no. 170, 519–521.
[433] ——, Speeding the Pollard and elliptic curve methods of factorization, Math. Comp. 48 (1987), no. 177, 243–264.
[434] F. Morain and J.-L. Nicolas, On Cornacchia's algorithm for solving the Diophantine equation u^2 + dv^2 = m, Preprint, 1990.
[435] F. Morain and J. Olivos, Speeding up the computations on an elliptic curve using addition-subtraction chains, Theoretical Informatics and Applications, vol. 24, 1990, pp. 531–543.
[436] C. J. Moreno, Algebraic curves over finite fields, Cambridge, 1991.
[437] W. H. Mow, Universal lattice decoding: principle and recent advances, Wireless Communications and Mobile Computing 3 (2003), no. 5, 553–569.
[438] J. A. Muir and D. R. Stinson, New minimal weight representations for left-to-right window methods, CT-RSA 2005 (A. Menezes, ed.), LNCS, vol. 3376, Springer, 2005, pp. 366–383.
[439]
[440] V. Müller, Fast multiplication on elliptic curves over small fields of characteristic two, J. Crypt. 11 (1998), no. 4, 219–234.
[441] D. Mumford, Abelian varieties, Oxford, 1970.
[442]
[443] M. R. Murty, Ramanujan graphs, J. Ramanujan Math. Soc. 18 (2003), no. 1, 1–20.
[444] R. Murty and I. E. Shparlinski, Group structure of elliptic curves over finite fields and applications, Topics in Geometry, Coding Theory and Cryptography (A. Garcia and H. Stichtenoth, eds.), Springer-Verlag, 2006, pp. 167–194.
[445] A. Muzereau, N. P. Smart, and F. Vercauteren, The equivalence between the DHP and DLP for elliptic curves used in practical applications, LMS J. Comput. Math. 7 (2004), 50–72.
[446] D. Naccache, D. M'Raïhi, S. Vaudenay, and D. Raphaeli, Can D.S.A. be improved? Complexity trade-offs with the digital signature standard, EUROCRYPT 1994 (A. De Santis, ed.), LNCS, vol. 950, Springer, 1995, pp. 77–85.
[447] D. Naccache and I. E. Shparlinski, Divisibility, smoothness and cryptographic applications, Algebraic Aspects of Digital Communications (T. Shaska and E. Hasimaj, eds.), NATO Science for Peace and Security Series, vol. 24, IOS Press, 2009, pp. 115–173.
[448] N. Courtois, M. Finiasz, and N. Sendrier, How to achieve a McEliece-based digital signature scheme, ASIACRYPT 2001 (C. Boyd, ed.), LNCS, vol. 2248, Springer, 2001, pp. 157–174.
[449] V. I. Nechaev, Complexity of a determinate algorithm for the discrete logarithm, Mathematical Notes 55 (1994), no. 2, 165–172.
[450] G. Neven, N. P. Smart, and B. Warinschi, Hash function requirements for Schnorr signatures, J. Math. Crypt. 3 (2009), no. 1, 69–87.
[451] P. Nguyen and D. Stehlé, Floating-point LLL revisited, EUROCRYPT 2005 (R. Cramer, ed.), LNCS, vol. 3494, Springer, 2005, pp. 215–233.
[452]
[453] P. Q. Nguyen, Public key cryptanalysis, Recent Trends in Cryptography (I. Luengo, ed.), AMS, 2009, pp. 67–119.
[454] P. Q. Nguyen and O. Regev, Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures, EUROCRYPT 2006 (S. Vaudenay, ed.), LNCS, vol. 4004, Springer, 2006, pp. 271–288.
[455]
[456] P. Q. Nguyen and I. E. Shparlinski, The insecurity of the digital signature algorithm with partially known nonces, J. Crypt. 15 (2002), no. 3, 151–176.
[457] ——, The insecurity of the elliptic curve digital signature algorithm with partially known nonces, Des. Codes Crypt. 30 (2003), no. 2, 201–217.
[458] P. Q. Nguyen and D. Stehlé, Low-dimensional lattice basis reduction revisited, ANTS VI (D. A. Buell, ed.), LNCS, vol. 3076, Springer, 2004, pp. 338–357.
[459] P. Q. Nguyen and J. Stern, Lattice reduction in cryptology: An update, ANTS IV (W. Bosma, ed.), LNCS, vol. 1838, Springer, 2000, pp. 85–112.
[460]
[461] P. Q. Nguyen and B. Vallée, The LLL algorithm: Survey and applications, Information Security and Cryptography, Springer, 2010.
[462] P. Q. Nguyen and T. Vidick, Sieve algorithms for the shortest vector problem are practical, J. Math. Crypt. 2 (2008), no. 2, 181–207.
[463] H. Niederreiter, A new efficient factorization algorithm for polynomials over small finite fields, Applicable Algebra in Engineering, Communication and Computing 4 (1993), no. 2, 81–87.
[464] G. Nivasch, Cycle detection using a stack, Inf. Process. Lett. 90 (2004), no. 3, 135–140.
[465] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An introduction to the theory of
numbers, 5th ed., Wiley, 1991.
[466] A. M. Odlyzko, Discrete logarithms in finite fields and their cryptographic significance, EUROCRYPT 1984 (T. Beth, N. Cot, and I. Ingemarsson, eds.), LNCS, vol. 209, Springer, 1985, pp. 224–314.
[467] ——, The rise and fall of knapsack cryptosystems, Cryptology and Computational Number Theory (C. Pomerance, ed.), Proc. Symp. Appl. Math., vol. 42, Am. Math. Soc., 1990, pp. 75–88.
[484] ——, Monte Carlo methods for index computation (mod p), Math. Comp. 32 (1978), no. 143, 918–924.
[485]
[486] C. Pomerance, A tale of two sieves, Notices of the Amer. Math. Soc. 43 (1996), 1473–1485.
[487] V. R. Pratt, Every prime has a succinct certificate, SIAM J. Comput. 4 (1974), no. 3, 214–220.
[488] X. Pujol and D. Stehlé, Rigorous and efficient short lattice vectors enumeration, ASIACRYPT 2008 (J. Pieprzyk, ed.), LNCS, vol. 5350, Springer, 2008, pp. 390–405.
[489] G. Qiao and K.-Y. Lam, RSA signature algorithm for microcontroller implementation, CARDIS 1998 (J.-J. Quisquater and B. Schneier, eds.), LNCS, vol. 1820, Springer, 2000, pp. 353–356.
[490] J. J. Quisquater and C. Couvreur, Fast decipherment algorithm for RSA public-key cryptosystem, Electronics Letters (1982), no. 21, 905–907.
[491] M. O. Rabin, Digitalized signatures and public-key functions as intractable as factorization, Tech. Report MIT/LCS/TR-212, MIT Laboratory for Computer Science, 1979.
[492] J.-F. Raymond and A. Stiglic, Security issues in the Diffie-Hellman key agreement
protocol, Preprint, 2000.
[493] O. Regev, The learning with errors problem (invited survey), 25th Annual IEEE Conference on Computational Complexity, IEEE, 2010, pp. 191–204.
[494] M. Reid, Undergraduate algebraic geometry, Cambridge, 1988.
[495]
, Graded rings and varieties in weighted projective space, Chapter of unnished book, 2002.
, Compression in finite fields and torus-based cryptography, SIAM J. Comput. 37 (2008), no. 5, 14011428.
[502] H.-G. Rück, A note on elliptic curves over finite fields, Math. Comp. 49 (1987), no. 179, 301–304.
[503] ——, On the discrete logarithm in the divisor class group of curves, Math. Comp. 68 (1999), no. 226, 805–806.
[505] A.-R. Sadeghi and M. Steiner, Assumptions related to discrete logarithms: Why subtleties make a real difference, EUROCRYPT 2001 (B. Pfitzmann, ed.), LNCS, vol. 2045, Springer, 2001, pp. 244–261.
[506] A. Sárközy and C. L. Stewart, On pseudorandomness in families of sequences derived from the Legendre symbol, Periodica Math. Hung. 54 (2007), no. 2, 163–173.
[507] T. Satoh, On generalization of Cheon's algorithm, Cryptology ePrint Archive, Report 2009/058, 2009.
[508] T. Satoh and K. Araki, Fermat quotients and the polynomial time discrete log algorithm for anomalous elliptic curves, Comment. Math. Univ. St. Paul. 47 (1998), no. 1, 81–92.
[509] J. Sattler and C.-P. Schnorr, Generating random walks in groups, Ann. Univ. Sci. Budapest. Sect. Comput. 6 (1985), 65–79.
[510] A. Schinzel and M. Skalba, On equations y^2 = x^n + k in a finite field, Bull. Polish Acad. Sci. Math. 52 (2004), no. 3, 223–226.
[511] O. Schirokauer, Using number fields to compute logarithms in finite fields, Math. Comp. 69 (2000), no. 231, 1267–1283.
[512] ——, The special function field sieve, SIAM J. Discrete Math. 16 (2002), no. 1, 81–98.
[513] ——, The impact of the number field sieve on the discrete logarithm problem in finite fields, Algorithmic Number Theory (J. Buhler and P. Stevenhagen, eds.), MSRI publications, vol. 44, Cambridge, 2008, pp. 397–420.
[514] ——, The number field sieve for integers of low weight, Math. Comp. 79 (2010), no. 269, 583–602.
[519]
[520] ——, Security of almost all discrete log bits, Electronic Colloquium on Computational Complexity (ECCC) 5 (1998), no. 33, 1–13.
[521] ——, Progress on LLL and lattice reduction, The LLL Algorithm (P. Q. Nguyen and B. Vallée, eds.), Springer, 2010, pp. 145–178.
[522] C.-P. Schnorr and M. Euchner, Lattice basis reduction: Improved practical algorithms and solving subset sum problems, Math. Program. 66 (1994), 181–199.
[523] C.-P. Schnorr and H. W. Lenstra Jr., A Monte Carlo factoring algorithm with linear storage, Math. Comp. 43 (1984), no. 167, 289–311.
[524] R. Schoof, Elliptic curves over finite fields and the computation of square roots mod p, Math. Comp. 44 (1985), no. 170, 483–494.
[525] ——, Nonsingular plane cubic curves over finite fields, J. Combin. Theory Ser. A 46 (1987), 183–211.
[526]
[535]
[536] G. Seroussi, Compact representation of elliptic curve points over F_{2^n}, Hewlett-Packard Labs technical report HPL-98-94, 1998.
[537] J.-P. Serre, Sur la topologie des variétés algébriques en caractéristique p, Symp. Int. Top. Alg., Mexico, 1958, pp. 24–53.
[538]
[542] A. Shamir, A polynomial-time algorithm for breaking the basic Merkle-Hellman cryptosystem, IEEE Trans. Inf. Theory 30 (1984), no. 5, 699–704.
[543]
[544] ——, On formal models for secure key exchange (version 4), Tech. report, IBM, November 15, 1999. Revision of Report RZ 3120.
[551] ——, OAEP reconsidered, CRYPTO 2001 (J. Kilian, ed.), LNCS, vol. 2139, Springer, 2001, pp. 239–259.
[552] ——, A computational introduction to number theory and algebra, Cambridge, 2005.
[553] I. E. Shparlinski, Computing Jacobi symbols modulo sparse integers and polynomials and some applications, J. Algorithms 36 (2000), 241–252.
[554]
[555]
[556] I. E. Shparlinski and A. Winterhof, A nonuniform algorithm for the hidden number problem in subgroups, PKC 2004 (F. Bao, R. H. Deng, and J. Zhou, eds.), LNCS, vol. 2947, Springer, 2004, pp. 416–424.
[557]
[561]
[562] J. H. Silverman and J. Suzuki, Elliptic curve discrete logarithms and the index calculus, ASIACRYPT 1998 (K. Ohta and D. Pei, eds.), LNCS, vol. 1514, Springer, 1998, pp. 110–125.
[563] J. H. Silverman and J. Tate, Rational points on elliptic curves, Springer, 1994.
[564] M. Sipser, Introduction to the theory of computation, Course Technology, 2005.
[565] M. Skalba, Points on elliptic curves over finite fields, Acta Arith. 117 (2005), no. 3, 293–301.
[566] N. P. Smart, The discrete logarithm problem on elliptic curves of trace one, J. Cryptology 12 (1999), no. 3, 193–196.
[567]
[568]
[569] B. A. Smith, Isogenies and the discrete logarithm problem in Jacobians of genus 3 hyperelliptic curves, J. Crypt. 22 (2009), no. 4, 505–529.
[570] P. J. Smith and M. J. J. Lennon, LUC: A new public key system, International Conference on Information Security (E. Graham Dougall, ed.), IFIP Transactions, vol. A-37, North-Holland, 1993, pp. 103–117.
[571] P. J. Smith and C. Skinner, A public-key cryptosystem and a digital signature system based on the Lucas function analogue to discrete logarithms, ASIACRYPT 1994 (J. Pieprzyk and R. Safavi-Naini, eds.), LNCS, vol. 917, Springer, 1994, pp. 357–364.
[572] J. A. Solinas, Efficient arithmetic on Koblitz curves, Des. Codes Crypt. 19 (2000), 195–249.
[573]
[576] M. Stam and A. K. Lenstra, Speeding up XTR, ASIACRYPT 2001 (C. Boyd, ed.), LNCS, vol. 2248, Springer, 2001, pp. 125–143.
[577] H. M. Stark, Class-numbers of complex quadratic fields, Modular Functions of One Variable I (W. Kuyk, ed.), LNM, vol. 320, Springer, 1972, pp. 153–174.
[578] D. Stehlé, Floating point LLL: Theoretical and practical aspects, The LLL Algorithm (P. Q. Nguyen and B. Vallée, eds.), Springer, 2010, pp. 179–213.
[579] D. Stehlé and P. Zimmermann, A binary recursive GCD algorithm, ANTS VI (D. A. Buell, ed.), LNCS, vol. 3076, Springer, 2004, pp. 411–425.
[580] P. Stevenhagen, The number field sieve, Algorithmic number theory (J. Buhler and P. Stevenhagen, eds.), MSRI publications, Cambridge, 2008, pp. 83–99.
[581] I. Stewart, Galois theory, 3rd ed., Chapman & Hall, 2003.
[582] I. Stewart and D. Tall, Algebraic number theory and Fermat's last theorem, 3rd ed., A K Peters, 2002.
[586] H. Stichtenoth and C. Xing, On the structure of the divisor class group of a class of curves over finite fields, Arch. Math. 65 (1995), 141–150.
[587] D. R. Stinson, Some baby-step giant-step algorithms for the low Hamming weight discrete logarithm problem, Math. Comp. 71 (2001), no. 237, 379–391.
[588] ——, Cryptography: Theory and practice, 3rd ed., Chapman & Hall/CRC, 2005.
[594]
[595] T. Takagi, Fast RSA-type cryptosystem modulo p^k q, CRYPTO 1998 (H. Krawczyk, ed.), LNCS, vol. 1462, Springer, 1998, pp. 318–326.
[596] J. Talbot and D. Welsh, Complexity and cryptography: An introduction, Cambridge,
2006.
[597] J. Tate, Endomorphisms of abelian varieties over finite fields, Invent. Math. 2
(1966), 134–144.
[598]
[599] E. Teske, A space efficient algorithm for group structure computation, Math. Comp.
67 (1998), no. 224, 1637–1663.
[600]
[601]
[602] E. Teske, Computing discrete logarithms with the parallelized kangaroo method, Discrete Applied Mathematics 130 (2003), 61–82.
[603] N. Thériault, Index calculus attack for hyperelliptic curves of small genus, ASIACRYPT 2003 (C.-S. Laih, ed.), LNCS, vol. 2894, Springer, 2003, pp. 75–92.
[604] E. Thomé, Algorithmes de calcul de logarithmes discrets dans les corps finis, Ph.D. thesis, École Polytechnique, 2003.
[605] W. Trappe and L. C. Washington, Introduction to cryptography with coding theory,
2nd ed., Pearson, 2005.
[606] M. A. Tsfasman, Group of points of an elliptic curve over a finite field, Theory of
numbers and its applications, Tbilisi, 1985, pp. 286–287.
[607] J. W. M. Turk, Fast arithmetic operations on numbers and polynomials, Computational methods in number theory, Part 1 (H. W. Lenstra Jr. and R. Tijdeman,
eds.), Mathematical Centre Tracts 154, Amsterdam, 1984.
[608] M. Ulas, Rational points on certain hyperelliptic curves over finite fields, Bull. Pol.
Acad. Sci. Math. 55 (2007), no. 2, 97–104.
[609] B. Vallée, Une approche géométrique de la réduction de réseaux en petite dimension, Ph.D. thesis, Université de Caen, 1986.
[610]
[611] S. Vaudenay, Hidden collisions on DSS, CRYPTO 1996 (N. Koblitz, ed.), LNCS,
vol. 1109, Springer, 1996, pp. 83–88.
[612]
[613] J. Vélu, Isogénies entre courbes elliptiques, C. R. Acad. Sc. Paris 273 (1971), 238–241.
[614] F. Vercauteren, Optimal pairings, IEEE Trans. Inf. Theory 56 (2010), no. 1, 455–461.
[615] E. R. Verheul, Certificates of recoverability with scalable recovery agent security, PKC 2000 (H. Imai and Y. Zheng, eds.), LNCS, vol. 1751, Springer, 2000, pp. 258–275.
[616] E. R. Verheul, Evidence that XTR is more secure than supersingular elliptic curve cryptosystems, J. Crypt. 17 (2004), no. 4, 277–296.
[617] E. R. Verheul and H. C. A. van Tilborg, Cryptanalysis of less short RSA secret
exponents, Applicable Algebra in Engineering, Communication and Computing 8
(1997), no. 5, 425–435.
[618] M.-F. Vignéras, Arithmétique des algèbres de quaternions, LNM, vol. 800, Springer,
1980.
[619] J. F. Voloch, A note on elliptic curves over finite fields, Bulletin de la Société Mathématique de France 116 (1988), no. 4, 455–458.
[620]
[621] D. Wagner, A generalized birthday problem, CRYPTO 2002 (M. Yung, ed.), LNCS,
vol. 2442, Springer, 2002, pp. 288–303.
[622] L. C. Washington, Elliptic curves: Number theory and cryptography, 2nd ed., CRC
Press, 2008.
[623] W. C. Waterhouse, Abelian varieties over finite fields, Ann. Sci. École Norm. Sup. 2 (1969), 521–560.
[624] A. Weng, Constructing hyperelliptic curves of genus 2 suitable for cryptography,
Math. Comp. 72 (2003), no. 241, 435–458.
[625] D. H. Wiedemann, Solving sparse linear equations over finite fields, IEEE Trans.
Inf. Theory 32 (1986), 54–62.
[626] M. J. Wiener, Cryptanalysis of short RSA secret exponents, IEEE Trans. Inf. Theory
36 (1990), no. 3, 553–558.
[627]
Author Index
Abdalla, M., 494, 497
Adleman, L. M., 28, 341, 343, 345
Agnew, G. B., 445
Agrawal, M., 263
Agrell, E., 394
Ajtai, M., 393
Akavia, A., 473
Akishita, T., 220
Alexi, W., 475, 513
Alon, N., 561
Ankeny, N. C., 51
Antipa, A., 486
Araki, K., 584
Arney, J., 297
Arène, C., 195
Atkin, A. O. L., 59, 188
Avanzi, R. M., 241
Babai, L., 275, 383, 388
Bach, E., 51, 52, 321, 560
Bachem, A., 55
Balasubramanian, R., 584
Banks, W. D., 170
Barreto, P. S. L. M., 579, 580, 587
Bauer, A., 529
Bellare, M., 78, 443, 479, 480, 485, 494, 497,
531, 532, 534, 537
Bellman, R., 243
Bender, E. A., 297
Bentahar, K., 46, 454
Berlekamp, R., 63
Bernstein, D. J., 59, 62, 195, 237, 263, 302,
517, 532, 534
Birkner, P., 195
Bisson, G., 567
Blackburn, S. R., 297
Blake, I. F., 241, 336, 338
Bleichenbacher, D., 410, 485, 490, 516, 526,
530
Blichfeldt, H. F., 361
Block, H., 270
Blum, M., 466, 514
Blömer, J., 406, 529
Boneh, D., 254, 409, 442, 454, 461, 469, 470,
473, 474, 487, 502, 504, 511, 523,
529, 538, 541, 640
Boppana, R. B., 561
Bos, J. W., 295, 302–304
Bosma, W., 167
Bostan, A., 556
Bourgain, J., 473
Boyen, X., 487
Boyko, V., 445
Brands, S., 278
Brauer, A., 237
Brent, R. P., 52, 292, 297, 320, 322
Brickell, E. F., 238, 427, 430
Brier, E., 525, 526
Brown, D. R. L., 452, 462, 464, 486, 487
Brumley, B. B., 250
Bröker, R., 189, 553, 570
Burgess, D. A., 51
Burmester, M., 437
Camion, P., 282
Canetti, R., 79, 443, 475
Canfield, E. R., 324, 331
Cantor, D. G., 64, 214, 217, 220, 229
Carter, G., 198, 295
Cash, D., 453, 497
Cassels, J. W. S., 202, 226, 228
Catalano, D., 522
Chan, W. F., 251
Chao, J., 252
Charlap, L. S., 137, 574
Charles, D. X., 563, 570
Chaum, D., 78, 525
Cheon, J.-H., 295, 462, 464
Cherepnev, M. A., 456
Chor, B., 475, 513
Clavier, C., 525, 526
Cobham, A., 606
Cocks, C., 28
Cohen, H., 187, 241
Cohen, P., 553
Coley, R., 574
Collins, T., 509
Conway, J. H., 622
Cook, S., 45
Coppersmith, D., 280, 337, 339, 397, 398,
401, 404, 475, 511, 524, 526
Cornelissen, G., 229
Coron, J.-S., 404, 445, 512, 517, 524–526,
531, 532
Coster, M. J., 431
Courtois, N., 422
Couveignes, J.-M., 557, 560
Couvreur, C., 509
Cox, D. A., 187, 191, 553, 559
Cramer, R., 438, 495, 498, 501, 541
Crandall, R. E., 320
Damgård, I. B., 51, 77, 522
Davenport, H., 234
Davidoff, G., 561
Davies, D., 78
Dawson, E., 198, 295
De Feo, L., 557
De Jonge, W., 525
de Rooij, P., 483
Deligne, P., 184
DeMarrais, J., 343, 345
den Boer, B., 454, 455
Denny, T. F., 337
Desmedt, Y., 437, 501, 523
Deuring, M., 186
Dewaghe, L., 551, 560
Diem, C., 346, 349, 350
Diffie, W., 28, 436
Dimitrov, V. S., 244, 251
DiPippo, S. A., 232
Dixon, J., 325
Doche, C., 245
Dujella, A., 50, 529
Durfee, G., 409, 529
Duursma, I. M., 234, 302, 580
Eagle, P. N. J., 260
Edwards, H. M., 195
Elgamal, T., 438, 484
Elkies, N. D., 188, 554
Ellis, J., 28
Enge, A., 344, 350, 553
Erdős, P., 324, 331
Erickson, S., 225
Eriksson, T., 394
Euchner, M., 381, 391
Farashahi, R. R., 195
Feige, U., 535
Fiat, A., 535
Finiasz, M., 422
Finke, U., 391
Fischlin, R., 475, 513
Flajolet, P., 287, 289
Flassenberg, R., 344
Floyd, 289
Flynn, E. V., 202, 226, 228
Fong, K., 62
Fontaine, C., 501
Fouquet, M., 565
Franklin, M. K., 254, 502, 504, 524
Freeman, D., 516, 587
Frey, G., 346, 573, 576, 577, 584
Friedlander, J., 475
Fuji-Hara, R., 336, 338
Fujisaki, E., 537
Fürer, M., 45
Galand, F., 501
Galbraith, S. D., 199, 221, 222, 252, 260,
308, 318, 530, 560, 567–569, 580,
586
Gallant, R. P., 243, 248, 251, 300, 302, 452,
462, 464, 486
Gama, N., 393, 423
Gao, S., 66
Garay, J. A., 485
Garefalakis, T., 576
von zur Gathen, J., 66, 254
Gaudry, P., 193, 202, 220, 229, 302, 315,
345, 346, 348, 350
Gauss, C. F., 365, 366
Gelfond, A. O., 273
Gennaro, R., 522
Gentry, C., 422, 423, 516, 538
Giesbrecht, M., 66
Girault, M., 491, 525
Goldreich, O., 79, 409, 417, 418, 475, 513,
516
Goldwasser, S., 33, 417, 418, 479, 488
Gong, G., 120
González Vasco, M. I., 473
Gordon, D. M., 341, 621
Gordon, J., 636
Goren, E. Z., 563
Granger, R., 120, 579
Granville, A., 411
Grieu, F., 526
Gross, B. H., 191, 563
Guillou, L. C., 535
Guy, R. K., 622
Hafner, J. L., 55, 343
Halevi, S., 79, 417, 418, 526
Hankerson, D., 62
Hanrot, G., 393
Harley, R., 219, 303, 346
Harn, L., 120
Harrison, M., 221, 222
Hasse, H., 234
Håstad, J., 398, 399, 468, 513, 522
Havas, G., 55
van Heijst, E., 78
Helfrich, B., 391
Hellman, M. E., 28, 270, 337, 423, 436
Heneghan, C., 530
Hess, F., 346, 560, 567, 569, 576, 577, 579–583, 586
Hilbert, D., 90
Hildebrand, A., 331
Hildebrand, M. V., 296
Hisil, H., 198
Hitchcock, Y., 295
Hoffstein, J., 423, 445
Hofheinz, D., 541
Hohenberger, S., 531
Holmes, M., 318
Honda, T., 232
Hong, J., 295
Hopkins, D., 509
Horwitz, J., 296
Howe, E. W., 199, 232
Howgrave-Graham, N. A., 398, 409, 413,
414, 423, 424, 472, 489, 522
Huang, M.-D., 343, 345
Huang, Z., 251
Hurwitz, A., 211
Icart, T., 245, 256
Igusa, J.-I., 210
Iijima, T., 252
Itoh, T., 62
Jacobson Jr., M. J., 220, 225
Jacobson, M. J., 251
Jager, T., 277
Lang, S., 186, 553, 559
Lange, T., 62, 195, 237, 302
Langford, S., 509
Lauter, K. E., 553, 563, 570
Lee, E., 581
Lee, H.-S., 580, 581
Lehmer, D. H., 265
Lehmer, D. N., 265
Lennon, M. J. J., 116
Lenstra Jr., H. W., 66, 167, 187, 199, 200,
250, 266, 293, 330, 332, 365, 375,
395, 445, 461
Lenstra, A. K., 119, 243, 251, 259, 302, 365,
375, 526
Lercier, R., 341, 343, 556, 557
Leurent, G., 78
Li, W.-C., 469, 474
Lichtenbaum, S., 576
Lin, X., 252
Lindell, Y., 31, 253
Lindner, R., 388, 416
Lipton, R. J., 454, 461
Lockhart, P., 210
Lovorn Bender, R., 337, 338
Lovász, L., 365, 375, 381, 412
Lubicz, D., 193, 202
Lucas, E., 114
Lynn, B., 579
Lyubashevsky, V., 424
Lüneburg, H., 66
López, J., 62
M'Raïhi, D., 445, 485
Majewski, B. S., 55
Martín Molleví, S., 520
Matsuo, K., 252, 463
Matthews, K. R., 55
Mauduit, C., 51
Maurer, U. M., 332, 454, 457
May, A., 406, 423, 512, 529, 530
McCurley, K. S., 55, 341, 343
McEliece, R., 417
McKee, J. F., 187, 199, 530
Meier, W., 247, 248
Menezes, A. J., 31, 62, 245, 573, 584
Merkle, R., 28, 77, 423
Mestre, J.-F., 188, 210, 562, 563
Micali, S., 33, 466, 488
Micciancio, D., 55, 393, 414, 419, 479
Miller, G. L., 512
Miller, S. D., 296, 562, 570
Miller, V. S., 258, 351, 575, 579, 620
Miller, W. C., 244
Minkowski, 362
Mireles, D. J., 221, 222
Misarsky, J.-F., 525
Miyaji, A., 241
Monico, C., 303
Montague, P., 295
Montenegro, R., 296, 307, 309
Montgomery, P. L., 52–54, 191, 266, 293,
303, 304
Morain, F., 60, 238, 302, 556, 557, 560, 565
Morillo, P., 520
Muir, J., 241
Mullin, R. C., 336, 338, 445
Mumford, D., 213, 214
Murphy, S., 297
Murty, M. R., 561
Murty, R., 199
Muzereau, A., 454, 461
Möller, B., 242, 243
Müller, V., 250
Naccache, D., 411, 485, 517, 524–526
Naehrig, M., 195, 587
Namprempre, C., 534
Nechaev, V. I., 270, 273, 275
Neven, G., 480, 482, 534, 573
Nguyen, P. Q., 78, 355, 365, 368, 381, 393,
410, 421–423, 431, 442, 472, 473,
489, 522, 523
Nicolas, J.-L., 60
Niederreiter, H., 63
Nivasch, G., 293
Näslund, M., 468, 469, 473, 474, 513
Odlyzko, A. M., 289, 337, 430, 431, 523
Oesterlé, J., 563
Ó hÉigeartaigh, C., 580
Okamoto, T., 522, 537, 573, 584, 638
Olivos, J., 238
O'Malley, S. W., 445
Ono, T., 241
Onyszchuk, I. M., 445
van Oorschot, P. C., 31, 293, 304, 308, 311,
318, 319
Orman, H. K., 445
Oyono, R., 579
Paillier, P., 482, 487, 519, 521, 532
Park, C.-M., 581
Patarin, J., 282, 524
Patel, S., 468
Paulus, S., 221, 222, 225, 344
Peikert, C., 388, 416, 422
Peinado, M., 445
Peres, Y., 296
Peters, C., 195
Pfitzmann, B., 78
Pila, J., 229, 332, 461
Pinch, R. G. E., 529, 530
Pipher, J., 423
Pizer, A. K., 561, 563
Pohst, M., 391
Pointcheval, D., 443, 479, 480, 482, 484,
487, 537
Pollard, J. M., 265, 267, 285, 297, 304, 305,
308, 309, 311, 314, 320, 332
Pomerance, C., 320, 324, 329–332, 337, 338,
461
van der Poorten, A. J., 220
Poupard, G., 491
Price, W. L., 78
Pujol, X., 393
Skinner, C., 116
Smart, N. P., 28, 241, 343, 346, 454, 461,
472, 482, 489, 560, 567, 569, 580,
584
Smith, B. A., 350
Smith, P. J., 116
Solinas, J. A., 241, 244, 246, 248
Soukharev, V., 570
Soundararajan, K., 337
Spatscheck, O., 445
Staffelbach, O., 247, 248
Stam, M., 119, 192, 244, 251
Stapleton, J., 295
Stark, H. M., 556
Stehle, D., 368, 380, 381, 393
Stein, A., 225, 344
Stein, J., 48
Stern, J., 355, 431, 479, 480, 482, 484, 491,
537
Stern, J. P., 526
Stewart, C., 51
Stichtenoth, H., 211, 230, 231, 234
Stinson, D. R., 28, 241, 280, 320
Stolarsky, K. B., 57
Stolbunov, A., 569
Storjohann, A., 55
Strassen, V., 45, 267
Straus, E. G., 243
Struik, R., 287, 295, 486
Sudan, M., 409
Suk, A. H., 529
Sundaram, G. S., 468
Sutherland, A. V., 54, 71, 272, 553, 567
Suyama, H., 193
Szemerédi, E., 275
Szymanski, T. G., 293
Sárközy, A., 51
Takagi, T., 220, 242, 510, 522
Tate, J., 232, 576
Tenenbaum, G., 331
Teske, E., 288, 293, 296, 311, 313, 587
Tetali, P., 296, 307, 309
Thomé, E., 339, 341, 346, 350
Thurber, E. G., 237
Thériault, N., 346, 579
Tibouchi, M., 524
van Tilborg, H. C. A., 529
Toom, A., 45
Tsujii, S., 62, 252
Tymen, C., 445
Uchiyama, S., 522, 638
Ulas, M., 256
Vaikuntanathan, V., 422
Valette, A., 561
Vallée, B., 365, 368
Vanstone, S. A., 31, 243, 245, 248, 251, 300,
302, 336, 338, 445, 486, 573, 584
Vardy, A., 394
Vaudenay, S., 28, 485, 487
Venkatesan, R., 296, 445, 469, 470, 473–475,
511, 562, 570
Vercauteren, F., 120, 343, 454, 461, 579–582, 586
Vergnaud, D., 482, 487
Verheul, E. R., 119, 259, 529, 588
Vidick, T., 393
Villar, J. L., 519, 520
Voloch, J. F., 186, 343
Voulgaris, P., 393
Vélu, J., 547
Wagner, D., 282, 393, 424
Wang, Y.-M., 469
Warinschi, B., 55, 482
Washington, L. C., 187
Waters, B., 531
Weber, D., 337
Weinmann, R.-P., 524
Weng, A., 210
Wiedemann, D. H., 55, 329, 336
Wiener, M. J., 293, 300, 304, 308, 311, 318,
319, 527, 528
Williams, H. C., 515, 519
Williamson, M. J., 436
Winterhof, A., 473
van de Woestijne, C. E., 255
Wolf, S., 332, 454
Wong, K. K.-H., 198
Xing, C., 234
Xu, W.-L., 469
Yao, A. C.-C., 293
Yen, S.-M., 241, 243, 485
Yoshida, K., 489
Zassenhaus, H., 64
Zeger, K., 394
Zierler, N., 68
Zimmermann, P., 46, 52
Zuccherato, R. J., 300
Subject Index
Abel-Jacobi map, 142, 227
Abelian variety, 226
absolutely simple, 226
adaptive chosen message attack, 34
adaptive chosen-ciphertext attack, 32
addition chain, 57
additive group, 85
additive rho walk, 288
adjacency matrix, 561
advantage, 42, 437, 467, 496
adversary against an identification protocol, 479
affine n-space over k, 87
affine algebraic set, 88
affine coordinate ring, 89
affine line, 87
affine plane, 87
affine variety, 96
affine Weierstrass equation, 127
AKS primality test, 263
algebraic, 593
algebraic closure, 593
algebraic group, 83
algebraic group quotient, 85
algebraic torus, 111
algebraically independent, 403
algorithm, 38
amplifying, 44
amplitude, 410
anomalous binary curves, 184
anomalous elliptic curves, 584
approximate CVP, 363
approximate SVP, 363
ascending chain, 597
ascending isogeny, 563
asymmetric cryptography, 28
ate pairing, 580
attack goals for public key encryption, 31
attack goals for signatures, 33
attack model, 31
automorphism, 170
Cayley graph, 561
CCA, CCA1, CCA2, 32
CDH, 436
c-expander, 561
chains of isogenies, 546
characteristic, 590
characteristic polynomial, 182, 595
characteristic polynomial of Frobenius, 183,
231
Cheon's variant of the DLP, 462
Chinese remainder theorem, 596
Chinese remaindering with errors problem,
409
chord-and-tangent rule, 140
chosen plaintext attack, 32
ciphertext, 29
circle group, 88
circulant matrix, 423
classic textbook Elgamal encryption, 438
clients, 297
closest vector problem (CVP), 363
CM method, 187
co-DDH problem, 586
coefficient explosion, 373
cofactor, 259
collapsing the cycle, 302
collision, 286, 320
Collision-resistance, 75
complement, 228
complete group law, 195, 196
complete system of addition laws, 167
Complex multiplication, 184, 186, 198, 558
complex multiplication method, 187
Complexity, 38
composite residuosity problem, 521
compositeness witness, 262
composition and reduction at infinity, 222
composition of functions, 589
compositum, 593
compression function, 77
compression map, 113, 118
computational problem, 38
COMPUTE-LAMBDA, 512
COMPUTE-PHI, 512
conditional probability, 601
conductor, 563, 600
conjugate, 112
connected graph, 558
conorm, 151, 153
constant function, 98
continuation, 266
continued fraction expansion, 48
convergents, 48
Coppersmith's theorem, 401
Cornacchia algorithm, 60
coupon collector, 602
covering attack, 346
covering group, 85, 116, 120
CPA, 32
Cramer-Shoup encryption scheme, 498
crater, 566
CRT list decoding problem, 409
CRT private exponents, 509
cryptographic hash family, 75
curve, 127
cycle, 289
cyclotomic polynomial, 109
cyclotomic subgroup, 111
data encapsulation mechanism, 495
DDH, 437
de-homogenisation, 94
decision closest vector problem (DCVP), 363
decision learning with errors, 415
decision problem, 38
decision shortest vector problem, 363
decision static Diffie-Hellman problem, 452
Decisional Diffie-Hellman problem (DDH), 436
decompression map, 113, 118
decrypt, 29
decryption algorithm, 31
decryption oracle, 465
Dedekind domain, 148
defined, 98
defined over k, 89, 91, 92, 135
degree, 134, 146, 172, 343, 593
DEM, 495
den Boer reduction, 455
dense, 97, 102
density, 240
derivation, 155
derivative, 591
descending isogeny, 563
Desmedt-Odlyzko attack, 523
determinant, 358, 599
deterministic algorithm, 38
deterministic pseudorandom walk, 287
DHIES, 494
diameter, 558
Dickman-de Bruijn function, 324
Diem's algorithm, 350
differentials, 158
Diffie-Hellman tuples, 437
digit set, 245
dimension, 105
Diophantine approximation, 48, 412
discrete, 357
discrete logarithm assumption, 435
discrete logarithm in an interval, 274
discrete logarithm problem, 38, 269
discrete valuation, 131
discriminant, 128, 600
d-isogeny, 172
distinguished point, 293
Distinguished points, 293
distortion map, 586, 588
distributed computing, 297
Distributed rho algorithm, 297
distribution, 601
division polynomials, 181
divisor, 134
divisor class, 138
divisor class group, 138
divisor of a differential, 159
divisor of a function, 136
divisor-norm map, 151
Dixon's random squares, 325
DL-LSB, 466
DLP, 38, 269
DLP in an interval, 274
DLWE, 415
dominant, 102
DSA, 486
DStatic-DH, 452
DStatic-DH oracle, 496
dual isogeny, 176, 177
dual lattice, 360
eavesdropper, 436
ECDSA, 486
ECIES, 494
ECM, 267
edge boundary, 561
effective, 134
effective affine divisor, 212
eigenvalues of a finite graph, 561
Eisenstein's criteria, 591
Elgamal encryption, 465
Elgamal public key signatures, 484
elliptic curve, 128
elliptic curve method, 267
embedding degree, 578
embedding technique, 390
encapsulates, 495
encoding, 276
encrypt, 28
encryption algorithm, 31
encryption scheme, 30
endomorphism ring, 172
entropy smoothing, 76
epact, 290
ephemeral keys, 436
equation for a curve, 127
equivalence class in AGQ, 85
equivalence classes in Pollard rho, 300
equivalence of functions, 98
equivalent, 171, 212, 248
equivalent computational problems, 43
equivalent isogenies, 546
e-th roots problem, 511
Euclidean norm, 598
Euler phi function, 590
Euler's criterion, 50
Euler-Mascheroni constant, 590
event, 601
existential forgery, 33
expander graph, 561
expectation, 602
expected exponential-time, 40
expected polynomial-time, 40
expected subexponential-time, 40
expected value, 287
explicit Chinese remainder theorem, 54
explicit representation, 454
exponent, 590
exponent representation, 83
exponential-time, 39
exponential-time reduction, 43
extended Euclidean algorithm, 47
extension, 148, 593
Extra bits for Rabin, 515
FACTOR, 512
factor base, 325, 326
family of groups, 467
FDH-RSA, 531
Feige-Fiat-Shamir protocol, 535
Fermat test, 262
Fiat-Shamir transform, 481
field of fractions, 597
final exponentiation, 579
finitely generated, 590, 593, 596
fixed base, 237, 243
fixed pattern padding, 406, 524
Fixed-CDH, 448
Fixed-Inverse-DH, 449
Fixed-Square-DH, 449
floating-point LLL, 380
floor, 564
Floyd's cycle finding, 289
forking lemma, 480
four-kangaroo algorithm, 309
free module, 591
Frobenius expansion, 245
Frobenius map, 146, 175, 231
full domain hash, 531
full rank lattice, 357
fully homomorphic, 501
function, 589
function field, 98
function field sieve, 341
fundamental domain, 422
fundamental parallelepiped, 599
identity-based cryptography, 491, 502
Igusa invariants, 210
imaginary hyperelliptic curve, 206
imaginary quadratic field, 600
implicit representation, 454
IND, 32
IND-CCA security, 33
independent events, 601
independent random variables, 602
independent torsion points, 258
index calculus, 334
indistinguishability, 32
indistinguishability adversary, 32
inert model of a hyperelliptic curve, 206
inert place, 206
inner product, 598
input size, 38
inseparable, 146
inseparable degree, 146
instance, 38
instance generator, 41
interleaving, 243
invalid parameter attacks, 441
invariant dierential, 178
inverse limit, 182
Inverse-DH, 448
irreducible, 96, 590, 591
isogenous, 172
isogeny, 172, 226, 231
isogeny class, 557
isogeny problem for elliptic curves, 567
isomorphic, 101
isomorphism of elliptic curves, 168
isomorphism of pointed curves, 168
iterated Merkle-Hellman, 426
i-th bit, 601
Itoh-Tsujii inversion algorithm, 62
Jacobi quartic model, 198
Jacobi symbol, 50
Jacobian matrix, 126
Jacobian variety, 140, 226
j-invariant, 169
joint sparse form, 244
k-algebraic set, 88
kangaroo method, 305
kangaroo, tame, 305
kangaroo, wild, 305
Karatsuba multiplication, 45, 509
KEM, 495
kernel, 172
kernel lattice, 362
kernel lattice modulo M , 362
key derivation function, 438
key encapsulation, 28
key encapsulation mechanism, 495
key only attack, 33
key transport, 28, 495
keyed hash function, 75
KeyGen, 31
known message attack, 34
Koblitz curves, 184
Korkine-Zolotarev reduced, 394
k-regular, 558
Kronecker substitution, 61
Kronecker symbol, 50, 559
Krull dimension, 105
Kruskal's principle, 308
Kummer surface, 202
L-polynomial, 230
l-sum problem, 282
ℓ2-norm, 598
ℓa-norm, 598
ladder methods, 116
Lagrange-Gauss reduced, 366
large prime variation, 330
Las Vegas algorithm, 40
lattice, 357
lattice basis, 357, 362
lattice dimension, 357
lattice membership, 362
lattice rank, 357
l-bit string, 601
learning with errors, 415
least signicant bit, 601
Legendre symbol, 50
length, 44, 238
Length of a Frobenius expansion, 245
linear change of variables, 93
linear congruential generator, 439, 479
linear map, 597
linearly equivalent, 138
Little O notation, 39
LLL algorithm, 375
LLL reduced, 370
local, 597
local properties of varieties, 123
local ring, 123
localisation, 123, 597
loop shortening, 580
Lovász condition, 370
low Hamming weight DLP, 279
low-exponent RSA, 509
LSB, 601
LUC, 114, 116, 474
lunchtime attack, 32, 523
LWE, 415
LWE distribution, 414
MAC, 77
map, 589
match, 286
Maurer's algorithm, 621
maximal ideal, 131, 596
McEliece cryptosystem, 417
McEliece encryption, 417
mean step size, 305
meet-in-the-middle attack, 319
Merkle-Damgård construction, 77
Mersenne prime, 468
message authentication code, 77
message digest, 29
messages, 436
Mestre's algorithm, 210
Miller function, 577
Miller-Rabin test, 262
Minkowski convex body theorem, 361
Minkowski theorem, 362
mixing time, 296
M(n), 46
mod, 589
model, 104
model for a curve, 127
modular curve, 553
modular exponentiation, 55
modular polynomial, 553
modular subset sum problem, 424
module, 590
monic, 591
Monte Carlo algorithm, 40
Montgomery model, 192
Montgomery multiplication, 52, 55
Montgomery reduction, 52
Montgomery representation, 52
morphism, 101
most signicant bit, 467, 469
MOV/FR attack, 584
MSB, 469
MTI/A0 protocol, 444
NAF, 238
naive Schnorr signatures, 481
nearly Ramanujan graph, 562
negligible, 41
Newton identities, 231
Newton iteration, 46
Newton root nding, 46
NFS, 332, 337
NIST primes, 53
Noetherian, 597
non-adjacent form, 238, 246
non-singular, 125, 126
non-uniform complexity, 39
norm, 111, 112, 153, 593, 595
norm map, 246
normal basis, 595
normalised isogeny, 550
noticeable, 41
n-torsion subgroup, 166
NTRU cryptosystem, 423
NTRU decryption failures, 423
NUCOMP, 220
Nullstellensatz, 133
number field sieve, 332, 337
OAEP, 537
Okamoto-Uchiyama scheme, 522
O(n), 38
Õ(n), 39
o(n), 39
one way encryption, 32
one-way function, 29, 424
one-way permutation, 29
optimal extension elds, 62
optimal normal basis, 62
optimal pairing, 581
oracle, 31, 42
oracle replay attack, 479
orbit, 85
order, 131, 159, 590, 600
ordinary, 189, 233
original rho walk, 288
orthogonal, 598
orthogonal complement, 599
orthogonal matrix, 598
orthogonal projection, 384, 599
orthogonality defect, 360
orthogonalized parallelepiped, 420
orthonormal, 598
output distribution, 41, 438
output size, 38
overwhelming, 42
OWE, 32
SUBJECT INDEX
polynomial-time equivalent, 43
polynomial-time reduction, 43
p-rank, 233
preimage-resistance, 75, 438
primality certificate, 264
primality test, 261
prime, 590
prime divisor, 214, 343
prime number theorem, 263
primitive, 109
primitive element theorem, 594
principal divisor, 136
principal ideal, 596
private key, 28
probable prime, 263
processors, 297
product discrete logarithm problem, 278
product tree, 69
projective algebraic set, 92
projective closure, 95
projective hyperelliptic equation, 207
projective line, 91
projective plane, 91
projective space, 91
projective variety, 96
pseudoprime, 261
pseudorandom, 288
PSS, 532
public key cryptography, 28
public key identification scheme, 477
pullback, 102, 103, 150
purely inseparable, 593
pushforward, 151
random self-reducible, 43
random variable, 602
randomised, 39
randomised algorithm, 39
randomised encryption, 30
randomised padding scheme, 507
randomness extraction, 257
rank, 591, 597
rational, 112
rational functions, 98
rational map, 100
Rational parameterisation, 117
rational points, 88
real hyperelliptic curve, 206
real or random security, 444
reduced, 218, 222
reduced divisor, 220
reduced Tate-Lichtenbaum pairing, 578
reducible, 96
reduction, 43
redundancy in message for Rabin, 514
regular, 98, 100
regulator, 225
relation, 325
reliable, 42
reliable oracle, 44
repeat, 286
representation problem, 278
residue degree, 148
Residue number arithmetic, 46
restriction, 148
resultant, 592
rewinding attack, 479
rho algorithm, discrete logarithms, 287
rho graph, 296
rho walks, 288
Riemann hypothesis for elliptic curves, 184
Riemann zeta function, 291
Riemann-Roch space, 154
ring class field, 186
ring of integers, 600
Rivest, R. L., 488
Robin Hood, 319
root of unity, 109
RSA, 507
RSA for paranoids, 510
RSA problem, 511
RSA-PRIVATE-KEY, 512
SAEP, 538
safe prime, 259, 264, 508
Sakurai, K., 522
Sato-Tate distribution, 199
Schnorr identification scheme, 478
Schnorr signature scheme, 481
Schönhage-Strassen multiplication, 45
second stage, 266
second-preimage-resistance, 75
security parameter, 30, 41
security properties, 31
selective forgery, 33
self-corrector, 44
Selfridge-Miller-Rabin test, 262
semantic security, 32
semi-reduced divisor, 212
semi-textbook Elgamal encryption, 438
separable, 146, 172, 593
separable degree, 146
separating element, 156
separating variable, 156
Serial computing, 297
server, 297
session key, 436
set of RSA moduli, 511
SETI, 297
short Weierstrass form, 128
shortest vector problem (SVP), 362
sieving, 329
signature forgery, 34
signature scheme, 33
simple, 226
simple zero, 131
simultaneous Diophantine approximation, 428
simultaneous Diophantine approximation problem, 412
simultaneous modular inversion, 53
simultaneous multiple exponentiation, 242
simultaneously hard bits, 468
singular, 125
singular point, 126
sliding window methods, 56
small private exponent RSA, 527
small public RSA exponent, 509
small subgroup attacks, 441
smooth, 125, 330, 344
smooth divisor, 343
Smooth integers, 265
smooth polynomial, 337
SNFS, 342
snowball algorithm, 71
Soft O notation, 39
Solinas, J. A., 247
Sophie-Germain prime, 259, 264, 508
sparse matrix, 55, 329
special q-descent, 339
special function field sieve, 342
special number field sieve, 332, 342
split an integer, 63
split Jacobian, 227
split model of a hyperelliptic curve, 206
split place, 206
splits, 261
splitting system, 280
SQRT-MOD-N, 517
square, 595
square-and-multiply, 55
Square-DH, 448
square-free, 63
SSL, 28
Standard continuation, 266
standard model, 78
Stark's algorithm, 556
static Diffie-Hellman key exchange, 438
static Diffie-Hellman problem, 452
Static-DH oracle, 465
statistical distance, 603
statistically close, 603
Stirling's approximation to the factorial, 601
Stolarsky conjecture, 57
strong Diffie-Hellman (Strong-DH), 496
strong forgery, 34
strong prime, 264, 508
strong prime test, 262
STRONG-RSA, 513
strongly B-smooth, 265
subexponential, 324
subexponential function, 324
subexponential-time, 39
subexponential-time reduction, 43
subgroup generated by g, 590
sublattice, 357
subvariety, 96
succeeds, 42
success probability, 42
successful adversary, 32, 437
successive minima, 360
summation polynomials, 347
superpolynomial-time, 39
supersingular, 185, 189, 233
support, 134
surface, 564
system parameters, 439, 477
tail, 289
Takagi-RSA, 510
target message forgery, 33
target-collision-resistant, 76
Tate isogeny theorem, 179, 557, 560
Tate module, 182
Tate's isogeny theorem, 232
Tate-Lichtenbaum pairing, 576
tau-adic expansions, 245
tensor product, 591
textbook Elgamal public key encryption, 438
textbook RSA, 28
three-kangaroo algorithm, 309
tight security reduction, 532
TLS, 28
Tonelli-Shanks algorithm, 58
Toom-Cook multiplication, 45
tori, 474
torsion-free module, 174
torus based cryptography, 109, 112
total break, 31, 33
total degree, 591
total variation, 603
trace, 85, 114, 183, 593, 595
trace based cryptography, 109
trace of Frobenius, 183
trace polynomial, 64
transcendence basis, 593
transcendence degree, 593
transcendental, 593
translation, 124
transpose, 597
trapdoor, 29
trapdoor one-way permutation, 29
trial division, 261
trivial twist, 171
tunable balancing of RSA, 510
twist, 171
twisted Edwards model, 195
UF, 33
UF-CMA, 34
Unified elliptic curve addition, 166
uniform complexity, 39
uniform distribution, 601
uniformizer, 129
uniformizing parameter, 129
unimodular matrix, 358, 600
unique factorisation domain, 590
unramified, 149, 173
unreliable, 42
unreliable oracle, 44
useless cycles, 302
valuation ring, 131
value, 99
value of a function, 99
variable base, 237, 243
Verschiebung, 177
vertex boundary, 561
volcano, 565
volume, 358
Vélu's formulae, 547
Wagner algorithm, 393
weak chosen-message attack, 488
Weierstrass equation, 127
weight, 240
Weight of a Frobenius expansion, 245
weighted projective hyperelliptic equation, 204
weighted projective space, 95, 204
weights, 424
Weil bounds, 231
Weil descent, 346
Weil pairing, 258, 574
Weil reciprocity, 573
Weil restriction of scalars, 106, 111
width-w non-adjacent form, 241
Wiener attack, 527
Williams, 515
Williams integer, 515, 532
window length, 56
window methods, 56
worst-case complexity, 39, 40
Xedni calculus, 351
XOR, 601
XTR, 119, 474
XTR cryptosystem, 120
XTR representation, 119
Zariski topology, 93
zero, 92, 99
zero isogeny, 172
zeta function, 230