Modular Polynomial Multiplication Using RSA/ECC Coprocessor: Keywords

This document discusses optimizing modular polynomial multiplication for lattice-based cryptography using RSA/ECC coprocessors, which are typically designed for integer operations. The authors build on previous work by enhancing the evaluation, radix conversion, and modular reductions, enabling efficient polynomial multiplication in constrained embedded devices. The study assesses the performance of their algorithms on various lattice-based schemes that are finalists in the NIST standardization process.
Modular Polynomial Multiplication Using RSA/ECC Coprocessor

Aurélien Greuet¹, Simon Montoya¹,², and Clémence Vermeersch¹

¹ IDEMIA, Cryptography & Security Labs, Courbevoie, France.
firstname.lastname@idemia.com
² LIX, INRIA, CNRS, École Polytechnique, Institut Polytechnique de Paris, France.
firstname.lastname@lix.polytechnique.fr

Keywords: Post-Quantum Lattice-based Cryptography · Modular Polynomial Multiplication · Embedded devices

Abstract. Modular polynomial multiplication is a core and costly operation
of ideal lattice-based schemes. In the context of embedded devices,
previous works transform the polynomial multiplication into an integer
one using Kronecker substitution. Thanks to this transformation,
existing coprocessors that handle large-integer operations can be
repurposed to speed up lattice-based cryptography. In a nutshell, the
Kronecker substitution transforms the polynomials into integers by evaluation,
multiplies them with an integer multiplication and gets back to a polynomial
result using a radix conversion. The previous work focused on optimizing
the integer multiplication using coprocessor instructions. In this
work, we pursue the seminal research by optimizing the evaluation, the radix
conversion and the modular reductions modulo q with today's RSA/ECC
coprocessors. In particular, we show that with an RSA/ECC coprocessor
that can compute addition/subtraction, (modular) multiplication, shift
and logical AND on integers, we can compute the whole modular
polynomial multiplication using coprocessor instructions. The efficiency of
our modular polynomial multiplication depends on the component
specification and on the cryptosystem parameter set. Hence, we assess our
algorithm on a chip for several lattice-based schemes that are finalists
of the NIST standardization. Moreover, we compare our modular
polynomial multiplication with other polynomial multiplication techniques.

1 Introduction
In the next few years, a quantum computer powerful enough to run Shor's
algorithm [17] could emerge. Such a computer could break all cryptography
based on the hardness of integer factorization and discrete logarithm, such as RSA
or Elliptic Curve Cryptography (ECC). Due to this potential threat, national
agencies started to study new proposals (e.g. [6]) and initiated the standardization of
quantum-safe algorithms [14,8]. The standardization process most closely followed by the
community is the one of the National Institute of Standards and Technology (NIST),
which was launched in 2016 [14]. This standardization aims to bring together an
important part of the community to determine future Key Encapsulation
Mechanisms (KEMs) and signature standards. In July 2020, the third round of this
standardization started with seven finalists remaining, including four KEMs and
three signatures. Among these seven finalists, five are based on lattice or
assimilated problems [15]. Hence, the future post-quantum standards
are very likely to include lattice-based schemes. Therefore,
optimizing and ensuring the practical security of these cryptosystems is an important
area of research.
Post-quantum cryptography will be deployed on embedded devices. On such
devices, the amount of RAM or the CPU frequency are very limited: less than 60
kB of RAM and less than 100 MHz. Therefore, implementing efficient cryptosys-
tems in these constrained environments is a real challenge. In order to speed-up
the cryptographic algorithms, these devices may embed additional hardware
coprocessors for symmetric and asymmetric cryptographic computations.
Moreover, these coprocessors can provide additional security features, such as hardware and
software security against faults and side-channel leakage. Most of the asymmetric
coprocessors currently deployed are designed for the ECC or RSA schemes and
not for lattice-based cryptosystems. However, the underlying arithmetic of these
cryptosystems can be tweaked with the purpose of using an arithmetic close to
the one used on RSA/ECC schemes. Therefore, re-purposing such asymmetric
coprocessors is interesting to optimize lattice-based schemes and to facilitate the
transition to the post-quantum world. Indeed, the easier the transition, the more
widely post-quantum cryptography will be used and deployed.

Motivations & previous works. Lattice-based cryptography is believed to be a
promising direction to provide efficient and secure post-quantum algorithms.
One of the main operations in these schemes is modular polynomial multiplication.
Research has been conducted on optimizing the polynomial
multiplication using specific software instructions or by designing
specific hardware. However, most polynomial multiplication optimizations
target the ARM Cortex-M4 or, less frequently, the ARM Cortex-M3 CPU
architecture. The ARM CPU is powerful and offers a large set of useful
assembly instructions. However, not all embedded systems have such a
powerful CPU, and many base their cryptographic efficiency on an additional
coprocessor.
Moreover, the transition period should rely on hybrid cryptography which
is the combination of a post-quantum algorithm and a classical one. Hence,
such cryptography is both secure against quantum attacks, thanks to the post-
quantum part, and secure against classical attacks, with at least the same se-
curity level as a pure classical crypto algorithm. Several governmental agen-
cies (NIST, ANSSI, BSI) recommend and will impose in a few years the use
of hybrid cryptography for long term security certification [2,6]. In this context,
re-purposing the current asymmetric coprocessors to optimize the modular poly-
nomial multiplication is of interest in terms of costs, ease of deployment and to
propose optimization for a wide range of components.

The seminal work of Albrecht et al. in [1] repurposes an RSA/ECC coprocessor
to optimize polynomial multiplication in the Kyber algorithm. To do so, they
use techniques introduced in [11], which transform polynomial multiplication into
an integer one using the Kronecker substitution [12]. Afterwards, another work
in [18] adapts this technique to the Saber algorithm.
The work of Bos et al. in [5] introduced Kronecker+, a generalization of
the Kronecker substitution used by Albrecht et al. in [1]. This generalization
allows trade-off between number of integer multiplications, size of the integers
and the number of polynomial evaluations. Depending on the component and
coprocessor specifications, Kronecker+ allows a faster polynomial multiplication
than Kronecker substitution.
In [10], the authors provide a variant of the Kronecker substitution and an
adaptation of the schoolbook multiplication to perform hardware polynomial
multiplication. Depending on the RSA/ECC coprocessor specifications, one of
these algorithms can outperform the classical Kronecker substitution.

Our contribution. This work aims to perform modular polynomial multiplication
in $R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$ using an RSA/ECC coprocessor, where $\delta \in \{-1, 1\}$.
These rings are the most used by the lattice-based finalists of the NIST
standardization.
Contemporary asymmetric coprocessors can perform integer operations
but not polynomial ones. As we have seen previously, most techniques that repurpose
current coprocessors to optimize polynomial multiplication on embedded
devices are based on the Kronecker substitution. In $R_{q,\delta}$ this substitution can
be summarized in four steps:
1. Convert the polynomials in $R_{q,\delta}$ to integers in N of bit size bitsize. When
polynomials have coefficients with a negative representation, this conversion
requires additional operations.
2. Modular integer multiplication mod $2^{\mathrm{bitsize}} + \delta$ of the obtained integers.
3. Convert the integer multiplication result back to a polynomial in $\mathbb{Z}[X]/(X^N + \delta)$.
As in Step 1, if the initial polynomials have coefficients with a negative
representation, this conversion requires additional operations.
4. Reduce the coefficients modulo q to obtain the result over $R_{q,\delta}$.
All the previous works repurpose the coprocessor only to optimize Step 2.
All the other steps are implemented in software without using the coprocessor.
In this work, for most of the previous steps, we describe algorithms that
make it possible to repurpose an existing coprocessor. Our work focuses on two main
contributions:
– Handle negative evaluation and radix conversion using an RSA/ECC
coprocessor (Steps 1 and 3).
– Perform modular reduction of the coefficients modulo q with an RSA/ECC
coprocessor (Step 4).
These improvements are possible only if the coprocessor can handle the following
integer operations: addition/subtraction, bitwise AND, logical shift, multiplication
and modular multiplication. Apart from the bitwise AND, these operations are handled by most
current asymmetric coprocessors. The bitwise AND is less common
on current RSA/ECC coprocessors. However, adding this operation to an
existing architecture is easier and cheaper than designing a new one for polynomial
multiplication.

Organization. In Section 2 we introduce the notations used in the rest of the
paper. In Section 3 we present how to perform a polynomial multiplication in
N[X] using the Kronecker substitution. Afterwards, in Section 4 we explain how
to use the coprocessor instructions to perform the Kronecker substitution
evaluation and radix conversion when the polynomials are in $R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$,
where $\delta \in \{-1, 1\}$. In Section 5 we describe modular reductions modulo q
using coprocessor instructions. Finally, in Section 6 we present the results of
practical implementations of our algorithms on several lattice-based finalists.

2 Background
RSA/ECC coprocessor. RSA/ECC coprocessors are designed to speed up
RSA or elliptic curve cryptosystems. To do so, these components provide a
range of integer operations. In this work, we assume that we have access to
a component which can perform, at least, addition/subtraction, bitwise AND,
logical shift, multiplication and modular multiplication operations.

2.1 Element representation


Integers representation. Let $a \in \mathbb{N}$ such that $0 \le a < 2^{\ell}$. In the following, we
say that a is represented over $\ell$ bits to mean that a is stored in a machine buffer
of $\ell$ bits.
Let $b \in \mathbb{Z}$ such that $-2^{\ell'-1} < b < 2^{\ell'-1}$. Let $\tilde{b}$ be the two's complement
representation of b over $\ell'$ bits, defined by:

$$\tilde{b} = (2^{\ell'} + b) \bmod 2^{\ell'} \in \mathbb{N}$$

In the following, we say that b is represented over $\ell'$ bits to mean that the two's
complement representation of b is stored in a machine buffer of $\ell'$ bits.
Let r be an $N\ell$-bit natural number. We denote by $r_i$ the i-th digit of r in
base $2^{\ell}$. In other words, $r = \sum_{i=0}^{N-1} r_i 2^{i\ell}$ with $0 \le r_i < 2^{\ell}$. We use the following
notation: $r = (r_0, r_1, \ldots, r_{N-1})_{\ell}$.
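
As an illustration of these two conventions, the following minimal Python sketch computes a two's complement representation and the base-$2^{\ell}$ digits of an integer. The function names `twos_complement` and `digits` are hypothetical and only serve this example; Python integers stand in for machine buffers.

```python
def twos_complement(b, bits):
    """Two's complement representation of b over `bits` bits (a natural number)."""
    # The text requires -2**(bits-1) < b < 2**(bits-1).
    assert -(1 << (bits - 1)) < b < (1 << (bits - 1))
    return (b + (1 << bits)) % (1 << bits)


def digits(r, ell, n):
    """Digits (r_0, ..., r_{n-1}) of r in base 2**ell, least significant first."""
    mask = (1 << ell) - 1
    return [(r >> (i * ell)) & mask for i in range(n)]
```

For instance, `twos_complement(-2, 4)` gives 0xE, the value used for the coefficient −2 in Example 2.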

Polynomial representation. Let $F(X) = f_0 + f_1 X + \ldots + f_{N-1} X^{N-1} \in \mathbb{Z}[X]$ be of
degree at most $N - 1$. Let $\tilde{f}_i$ be a two's complement representation of a coefficient
$f_i$.
Array representation. The usual machine representation of F(X) is an array
where the i-th item is $\tilde{f}_i$. To ease the reading, we denote in the following by $f_i$
or f[i] the coefficient associated with the i-th item. Moreover, unless otherwise
specified, a polynomial is represented as an array.
Packed integer representation. A packed integer representation of F(X) is
the concatenation of all the $\tilde{f}_i$ into a buffer:

$$f = \tilde{f}_{N-1} \,|\, \ldots \,|\, \tilde{f}_1 \,|\, \tilde{f}_0 \in \mathbb{N}$$

In this work, this representation is used to represent polynomials as natural
numbers. Afterwards, the polynomial arithmetic is carried out with operations
on this natural number.

2.2 Notations
Rings. Let q be an integer. Denote by $R_{q,\delta}$ the polynomial ring $\mathbb{Z}_q[X]/(X^N + \delta)$, where
$\delta \in \{-1, 1\}$. We represent an element $F(X) \in R_{q,\delta}$ as a polynomial of degree
at most $N - 1$ with coefficients in $\{0, \ldots, q - 1\}$. We also use a signed representation of the
elements of $R_{q,\delta}$, as polynomials of degree at most $N - 1$ with coefficients in
$\{-\frac{q}{2} - 1, \ldots, \frac{q}{2}\}$.

Integer operations. In the sequel, the algorithms are described using the following
notations. Their purpose is to clarify the size of the manipulated operands.

– Let add(a, b, bitlen) (resp. sub(a, b, bitlen)) be the addition (resp.
subtraction) between a and b. The values a and b are represented over bitlen bits.
– Let lshift(a, k, bitlen) (resp. rshift(a, k, bitlen)) be the left (resp. right)
shift a << k (resp. a >> k) over bitlen bits.
– Let and(a, b, bitlen) be the AND operation a & b over bitlen bits.
– Let mult(a, b, $\mathrm{bitlen}_a$, $\mathrm{bitlen}_b$) be the integer multiplication a × b where a
(resp. b) is represented on $\mathrm{bitlen}_a$ (resp. $\mathrm{bitlen}_b$) bits.
– Let modMult(a, b, $\mathrm{bitlen}_a$, $\mathrm{bitlen}_b$, p) be the integer modular multiplication
a × b mod p where a (resp. b) is represented on $\mathrm{bitlen}_a$ (resp. $\mathrm{bitlen}_b$)
bits.

Concatenation. Let $(\ell, k, N) \in \mathbb{N}^3$ with $\ell \le k$ and $m \in \mathbb{N}$ represented over $\ell$ bits.
In the following we denote by concat(m, k, N) the function that represents m
on k bits and concatenates this new representation N times. Formally:

$$\mathrm{concat}(m, k, N) = \sum_{j=0}^{N-1} m \, 2^{jk} \in \mathbb{N}$$

Example 1. Let m = 1, then concat(m, 8, 3) = 0x010101.
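
The definition of concat translates directly into a one-line sum. The sketch below is only an illustration on Python integers; the name `concat` follows the notation above, and the argument check is an addition of this example.

```python
def concat(m, k, n):
    """Represent m on k bits and repeat it n times: sum of m * 2**(j*k)."""
    assert 0 <= m < (1 << k)  # m must fit in one k-bit slot
    return sum(m << (j * k) for j in range(n))
```

With m = 1, k = 8 and N = 3 this reproduces the value of Example 1.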

Integer to polynomial. Let $(\ell, k, N) \in \mathbb{N}^3$ with $\ell > k$, and $F(X) = f_0 + \ldots + f_{N-1} X^{N-1} \in \mathbb{Z}[X]$.
For all i, let $\tilde{f}_i$ be the two's complement representation
of $f_i$ over k bits. We denote by:

$$f = \mathrm{polyToN}(F(X), k, \ell) = \sum_{i=0}^{N-1} \tilde{f}_i \, 2^{i\ell}, \quad f \in \mathbb{N}$$

Let $g = (g_0, g_1, \ldots, g_{N-1})_{\ell} \in \mathbb{N}$ be an $N\ell$-bit number. We denote by:

$$G(X) = \mathrm{NtoPoly}(g, \ell) = \sum_{i=0}^{N-1} g_i X^i$$

The obtained polynomial G(X) belongs to N[X] and its degree is at most N − 1.

Example 2. Let $F(X) = f_2 X^2 + f_1 X + f_0 = 2X^2 + 4X - 2$. Let $\tilde{f}_0$ = 0xE, $\tilde{f}_1$ = 0x4,
$\tilde{f}_2$ = 0x2 be the representations of the $f_i$ over 4 bits. Then
f = polyToN(F(X), 4, 8) = 0x02040E and NtoPoly(f, 8) = $2X^2 + 4X + 14$.
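
Both conversions can be sketched in a few lines on Python integers. The names `poly_to_n` and `n_to_poly` are hypothetical transcriptions of polyToN and NtoPoly; coefficients are given as a list with the constant term first, which is an assumption of this example.

```python
def poly_to_n(coeffs, k, ell):
    """Pack coefficients (two's complement over k bits) into ell-bit slots."""
    assert ell > k
    f = 0
    for i, c in enumerate(coeffs):
        f |= (c % (1 << k)) << (i * ell)  # c % 2**k is the k-bit two's complement
    return f


def n_to_poly(g, ell, n):
    """Read the n base-2**ell digits of g as polynomial coefficients."""
    mask = (1 << ell) - 1
    return [(g >> (i * ell)) & mask for i in range(n)]
```

Applied to the polynomial of Example 2, `poly_to_n([-2, 4, 2], 4, 8)` yields 0x02040E, and reading it back gives the coefficients 14, 4, 2.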

3 Multiplication in N[X] using Kronecker substitution

The Kronecker substitution was first introduced in [12]. We give here the main
steps of this substitution. The idea of this substitution is to transform a poly-
nomial multiplication to an integer one by evaluating the polynomials and get
back to the result using a radix conversion. In the context of embedded devices,
this transformation is of interest to perform polynomial multiplication by using
the RSA/ECC coprocessor. Indeed, such coprocessor handles multiplication on
integers. In this section we assume that our polynomials are defined over N[X].

3.1 Kronecker substitution

The Kronecker substitution multiplies two polynomials F (X) and G(X) using
an integer multiplication. This substitution can be summarized in three steps:

1. Evaluation of F(X) and G(X) at $2^{\ell}$. The value $\ell$ is chosen such that all the
coefficients after the polynomial multiplication are lower than $2^{\ell}$.
2. Integer multiplication $r = F(2^{\ell}) \, G(2^{\ell})$, $r \in \mathbb{N}$.
3. Get back to the polynomial $R(X) \in \mathbb{N}[X]$ using radix conversion on r.

Evaluation. The first step of the Kronecker substitution is the polynomial
evaluation at $2^{\ell}$. Since F(X) has coefficients in N represented over k bits:

$$\mathrm{Evaluation}_{\ge 0}(F(X), k, \ell) := F(2^{\ell}) = \mathrm{polyToN}(F(X), k, \ell) \qquad (1)$$

Example 3. Let $F(X) = 2X^2 + X + 3$, then $F(2^8)$ = 0x020103 = $\mathrm{Evaluation}_{\ge 0}(F(X), 2, 8)$.

Evaluation point. Let R(X) = F(X)G(X) where F(X), G(X) ∈ N[X] are of degree
at most N − 1. The evaluation point $2^{\ell}$ is chosen such that for all $i \le 2(N - 1)$:

$$r_i \le N \cdot \max_{j \in \{0,\ldots,N-1\}}(f_j) \cdot \max_{j \in \{0,\ldots,N-1\}}(g_j) < 2^{\ell}$$

Since all the coefficients are non-negative, this evaluation is only a
representation of all the $f_i$ over $\ell$ bits. Hence, in an implementation, the evaluation
does not require arithmetic operations.
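
The bound on the evaluation point can be turned into a small helper that returns the smallest admissible $\ell$. This is a sketch under the assumption that only the coefficient bound above matters; a real implementation may round $\ell$ up further for alignment reasons. The function name is hypothetical.

```python
def min_evaluation_exponent(max_f, max_g, n):
    """Smallest ell such that n * max_f * max_g < 2**ell (non-negative coefficients)."""
    bound = n * max_f * max_g
    # For bound >= 1, bound < 2**ell holds exactly when ell >= bound.bit_length().
    return max(bound.bit_length(), 1)
```

For polynomials with three coefficients bounded by 3 and 1, the bound is 9 and four bits per coefficient are enough.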

Radix Conversion. Radix conversion aims to transform an integer into a
polynomial. Let $f = (f_0, \ldots, f_{N-1})_{\ell} \in \mathbb{N}$, then:

$$F(X) = f_0 + \ldots + f_{N-1} X^{N-1} := \mathrm{Radix\ Conversion}_{\ge 0}(f) = \mathrm{NtoPoly}(f, \ell) \qquad (2)$$

Example 4. Let f = 0x020103, then $F(X) = 2X^2 + X + 3 = \mathrm{Radix\ Conversion}_{\ge 0}(f)$.

The radix conversion converts a packed integer representation to an array
one. Like the evaluation algorithm, in an implementation, the radix conversion
does not require arithmetic operations.

Example of Kronecker substitution.

Example 5. Let $F(X) = 2X^2 + X + 3$ and $G(X) = X^2 + 1$. Then,

$F(2^8)$ = 0x020103 = $\mathrm{Evaluation}_{\ge 0}(F(X), 2, 8)$
$G(2^8)$ = 0x010001 = $\mathrm{Evaluation}_{\ge 0}(G(X), 2, 8)$

Afterwards we multiply the evaluated polynomials: $r = F(2^8) G(2^8)$ = 0x0201050103.
Finally we obtain $R(X) = \mathrm{Radix\ Conversion}_{\ge 0}(r) = 2X^4 + X^3 + 5X^2 + X + 3$.
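
The three steps of the substitution over N[X] can be sketched as follows. This is an illustration on Python integers, not a coprocessor implementation; the function name is hypothetical and coefficients are listed with the constant term first.

```python
def kronecker_mul_nonneg(f, g, ell):
    """Multiply two polynomials with non-negative coefficients via one integer product."""
    n = len(f)
    assert len(g) == n
    # Step 1: evaluation at 2**ell (pure repacking, no arithmetic on coefficients).
    f_int = sum(c << (i * ell) for i, c in enumerate(f))
    g_int = sum(c << (i * ell) for i, c in enumerate(g))
    # Step 2: a single integer multiplication.
    r = f_int * g_int
    # Step 3: radix conversion, reading 2n - 1 digits in base 2**ell.
    mask = (1 << ell) - 1
    return [(r >> (i * ell)) & mask for i in range(2 * n - 1)]
```

With the inputs of Example 5 and $\ell = 8$, the result is the coefficient list 3, 1, 5, 1, 2, that is $2X^4 + X^3 + 5X^2 + X + 3$.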

4 Multiplication in Rq,δ using Kronecker substitution

In the previous section we performed polynomial multiplication as an integer one
with polynomials in N[X]. However, in lattice-based schemes some
polynomials, mainly the secret ones, have coefficients with a negative
representation close to 0. Moreover, the reduction modulo $X^N + 1$ can also bring
negative coefficients. Hence, in this section we focus on polynomial multiplication in
$R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$. In $R_{q,\delta}$, the polynomial multiplication using Kronecker
substitution is achieved as follows:

– Evaluation of polynomials considering negative coefficients.
– Integer multiplication modulo $2^{N\ell} + \delta$. The modular reduction ensures that
after radix conversion the polynomial result is reduced modulo $X^N + \delta$.
– Radix conversion to obtain a polynomial in $\mathbb{Z}[X]/(X^N + \delta)$.
– Reduction modulo q of the polynomial coefficients.

Previous works [1,10,5] already achieve the evaluation and the radix
conversion with negative coefficients. However, these algorithms operate on array
representations. In this section we describe a way to realize these algorithms
when the coefficients are in a packed integer representation. The main
advantage of this representation is that it allows the use of an existing coprocessor.

Negative representation. Our goal is to perform polynomial multiplication over
$R_{q,\delta}$. A way to avoid the negative coefficients is therefore to represent them with
a non-negative representation over $R_{q,\delta}$. However, since the negative coefficients are
close to 0, their non-negative representatives are close to q. This implies
that the evaluation point must be higher, and thus that the integer operations are done
on much larger integers; see [10] for more details on the impact on the evaluation
point. Thus, for the sake of efficiency we use, when possible, our algorithms with
the negative representation.

4.1 Evaluation with negative coefficients.



Let $F(X) = f_0 + f_1 X + \ldots + f_{N-1} X^{N-1} \in R_{q,\delta}$ and $\tilde{f}_i$ be the two's complement
representation over k bits of $f_i$. Our goal is to evaluate F(X) at $2^{\ell}$ where $\ell > k$.
For i = 0 to N − 1:
– If $f_i \ge 0$, then we only have to represent it on $\ell$ bits (as in Section 2).
– If $f_i < 0$, then we have to represent it with a two's complement over $\ell$ bits
and propagate a borrow to the next coefficient. To extend a two's complement
representation from k bits to $\ell$ bits, we compute:

$$\tilde{f}_i + (2^{\ell} - 2^k) = 2^k + f_i + (2^{\ell} - 2^k) = 2^{\ell} + f_i$$

Algorithm 1 computes the two's complement representation of the polynomial
evaluation when the coefficients are in Z. More precisely, this evaluation
is done using arithmetic operations on a packed integer representation. To do
so, we first represent the polynomial coefficients in a packed integer form,
as defined in Equation 1. Afterwards, we use arithmetic operations in order to
convert the two's complement representation from k to $\ell$ bits and to propagate
the required borrows.

Algorithm 1 Evaluation

Input: $F(X) \in R_{q,\delta}$, $k, \ell \in \mathbb{N}$ where $\ell > k$.
Output: $\tilde{f} \in \mathbb{N}$, the two's complement representation of $F(2^{\ell}) \bmod 2^{N\ell}$
1: mask ← concat(1, $\ell$, N) // Precomputed
2: $\tilde{f}$ ← polyToN(F(X), k, $\ell$)
3: neg ← rshift($\tilde{f}$, k − 1, $N\ell$)
4: neg ← and(neg, mask, $N\ell$) // Detect negative coefficients
5: tmp ← mult(neg, $2^{\ell} - 2^k$, $N\ell$, 32)
6: $\tilde{f}$ ← add($\tilde{f}$, tmp, $N\ell$) // Two's complement representation of each coeff over $\ell$ bits
7: neg ← lshift(neg, $\ell$, $N\ell$)
8: $\tilde{f}$ ← sub($\tilde{f}$, neg, $N\ell$) // Borrow propagation
9: return $\tilde{f}$

Remark 1. The value mask is always the same for a fixed scheme. Then, this
integer can be precomputed and stored in Non-Volatile Memory (NVM).

Remark 2. Evaluation (Algorithm 1) returns the two's complement
representation of $F(2^{\ell}) \bmod 2^{N\ell}$. This implies:
– If $F(2^{\ell}) \ge 0$, then the returned value is equal to $F(2^{\ell})$.
– Otherwise, the returned value is not equal to $F(2^{\ell})$. This case occurs when
the last (highest-degree) non-zero coefficient of F(X) is negative.

To obtain the expected result after the Kronecker substitution, the latter case
requires additional operations before the radix conversion. These additional
operations are described in Section 4.2, paragraph Two's complement
representation of the evaluated polynomial.

Example 6. Let $F(X) = 3X^2 - 2X + 2$, where all the coefficients are encoded
with a two's complement representation over k = 4 bits. Let N = 3 and $\ell$ = 8.
The expected result is $F(2^8)$ = 0x02FE02. This is obtained with
Evaluation(F(X), k, $\ell$):

1. mask ← concat(1, 8, 3) = 0x010101
2. $\tilde{f}$ ← polyToN(F(X), 4, 8) = 0x030E02
3. neg ← rshift($\tilde{f}$, 4 − 1, 3 × 8) = 0x0061C0
4. neg ← and(neg, mask, 3 × 8) = 0x000100
5. tmp ← mult(neg, $2^8 - 2^4$, 3 × 8, 32) = 0x00F000
6. $\tilde{f}$ ← add($\tilde{f}$, tmp, 3 × 8) = 0x03FE02
7. neg ← lshift(neg, 8, 3 × 8) = 0x010000
8. $F(2^8)$ ← sub($\tilde{f}$, neg, 3 × 8) = 0x02FE02
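
The steps of Algorithm 1 can be mirrored on Python integers, each line standing for one coprocessor operation on an $N\ell$-bit buffer. This is a minimal sketch: the helper names are hypothetical, the explicit masking with `full` models the fixed buffer width, and nothing here reflects the timing or interface of a real coprocessor.

```python
def concat(m, k, n):
    return sum(m << (j * k) for j in range(n))


def poly_to_n(coeffs, k, ell):
    return sum((c % (1 << k)) << (i * ell) for i, c in enumerate(coeffs))


def evaluation(coeffs, k, ell):
    """Two's complement representation of F(2**ell) mod 2**(N*ell)."""
    n = len(coeffs)
    full = (1 << (n * ell)) - 1                # models an N*ell-bit buffer
    mask = concat(1, ell, n)                   # line 1 (precomputed)
    f = poly_to_n(coeffs, k, ell)              # line 2
    neg = (f >> (k - 1)) & mask                # lines 3-4: sign bit of each coeff
    tmp = neg * ((1 << ell) - (1 << k))        # line 5
    f = (f + tmp) & full                       # line 6: sign-extend to ell bits
    neg = (neg << ell) & full                  # line 7
    return (f - neg) & full                    # line 8: borrow propagation
```

Calling `evaluation([2, -2, 3], 4, 8)` reproduces the value 0x02FE02 of Example 6.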

Evaluation point. Let R(X) = F(X)G(X) where $F(X), G(X) \in R_{q,\delta}$. The
evaluation point $2^{\ell}$ is chosen such that for all $i \le 2(N - 1)$:

$$|r_i| \le N \cdot \max_{j \in \{0,\ldots,N-1\}}(|f_j|) \cdot \max_{j \in \{0,\ldots,N-1\}}(|g_j|) < 2^{\ell-1}$$

4.2 Radix Conversion with negative coefficient representation.

As mentioned in [1,10], the radix conversion has to be adapted since some coef-
ficients have negative representations. Two issues arise with the negative coeffi-
cients:

1. The evaluation and the integer multiplication propagate borrow between the
polynomial coefficients.
2. The negative evaluation algorithm returns two’s complement representation
over N ℓ bits.

Borrow between the coefficients. The evaluation converts a polynomial to a
packed integer representation. In the remainder of the Kronecker substitution,
the obtained natural numbers are manipulated regardless of the original
polynomial structure. Therefore, borrows can be propagated between the coefficients.
However, in order to retrieve the expected polynomial result, the radix conversion
must compensate for the propagated borrows by propagating back carries.
Let $\tilde{r} = (\tilde{r}_0, \tilde{r}_1, \ldots, \tilde{r}_{N-1})_{\ell} \in \mathbb{N}$ be the integer that we want to convert to a
polynomial, where for all i, $\tilde{r}_i$ is a two's complement representation over $\ell$ bits
of an integer $-2^{\ell-1} < r_i < 2^{\ell-1}$. In order to propagate back the carries, we
transform the negative coefficients to non-negative ones by adding a multiple of
our modulus q: maxValue. More precisely, maxValue is the smallest multiple of q
such that for all i, $-\mathrm{maxValue} \le r_i < \mathrm{maxValue}$. Moreover, with the parameters
that we use in Section 6, we have $\mathrm{maxValue} < 2^{\ell-1}$. Then, by adding maxValue
we get:
– If $r_i < 0$, then $2^{\ell} \le \tilde{r}_i + \mathrm{maxValue} = 2^{\ell} + r_i + \mathrm{maxValue}$. Therefore a carry
is propagated to $\tilde{r}_{i+1}$.
– If $r_i \ge 0$, then $\tilde{r}_i + \mathrm{maxValue} = r_i + \mathrm{maxValue} < 2^{\ell}$.
After adding maxValue, the values $r_i$ are considered as natural numbers
represented over $\ell$ bits. Then, the expected polynomial is obtained by using the radix
conversion algorithm defined in Equation 2 on $\tilde{r}$.
This negative to non-negative conversion is possible because the polynomial
multiplication is done over $R_{q,\delta}$. Indeed, after reduction modulo q, the added
value maxValue is equal to 0.
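
The carry back-propagation can be observed on a small packed value. The sketch below adds maxValue to every $\ell$-bit slot with one addition over $N\ell$ bits; it assumes the input is a two's complement value modulo $2^{N\ell}$, and the toy modulus q = 17 is chosen only for this illustration.

```python
def add_max_value(r_packed, max_value, ell, n):
    """Add max_value to each ell-bit digit of r_packed (one N*ell-bit addition)."""
    packed_max = sum(max_value << (j * ell) for j in range(n))
    return (r_packed + packed_max) & ((1 << (n * ell)) - 1)


def digits(r, ell, n):
    mask = (1 << ell) - 1
    return [(r >> (i * ell)) & mask for i in range(n)]


# 0x02FE02 packs the coefficients (2, -2, 3) with a borrow taken from the top digit.
shifted = add_max_value(0x02FE02, 17, 8, 3)   # toy value: q = 17, maxValue = 17
coeffs_mod_q = [d % 17 for d in digits(shifted, 8, 3)]
```

Here the digits of `shifted` are 19, 15 and 20, that is each original coefficient plus 17, and reducing modulo 17 gives 2, 15 and 3.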

Two's complement representation of the evaluated polynomial. The second issue
is due to the two's complement representation of the evaluated polynomial.

Let $F(X) = f_0 + \ldots + f_{N-1} X^{N-1} \in R_{q,\delta}$ of degree N − 1 and $\ell \in \mathbb{N}$.
Then Algorithm 1 returns the integer f ← Evaluation(F(X), k, $\ell$), that is
the two's complement representation of $F(2^{\ell}) \bmod 2^{N\ell}$. Two cases are to be
distinguished:
– $f_{N-1} > 0$, then $f = F(2^{\ell}) \in \mathbb{N}$.
– $f_{N-1} < 0$, then $f = 2^{N\ell} + F(2^{\ell})$ is the two's complement of $F(2^{\ell})$ modulo $2^{N\ell}$.
Only the second case leads to a wrong result after the modular multiplication.
Indeed, let $g \in \mathbb{N}$ and $f = 2^{N\ell} + F(2^{\ell})$; we get:

$$r \bmod (2^{N\ell} + \delta) = f g \bmod (2^{N\ell} + \delta) = \left(2^{N\ell} g + F(2^{\ell}) g\right) \bmod (2^{N\ell} + \delta) \ne F(2^{\ell}) g \bmod (2^{N\ell} + \delta)$$

Then in this case, before the radix conversion we must add g to or subtract g from
r, depending on $\delta$:
– $\delta = 1$: $2^{N\ell} g \bmod (2^{N\ell} + 1) = -g \bmod (2^{N\ell} + 1)$, then

$$(r + g) \bmod (2^{N\ell} + 1) = F(2^{\ell}) g \bmod (2^{N\ell} + 1)$$

– $\delta = -1$: $2^{N\ell} g \bmod (2^{N\ell} - 1) = g \bmod (2^{N\ell} - 1)$, then

$$(r - g) \bmod (2^{N\ell} - 1) = F(2^{\ell}) g \bmod (2^{N\ell} - 1)$$
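
The correction can be checked numerically. The following sketch applies it to a product computed from the two's complement value f; the function name is hypothetical, and the result is reduced modulo $2^{N\ell} + \delta$, which is a choice of this example.

```python
def correct_sign(r, g, n, ell, delta):
    """Undo the extra 2**(N*ell) * g term when F(2**ell) is negative."""
    modulus = (1 << (n * ell)) + delta
    if delta == 1:
        return (r + g) % modulus   # 2**(N*ell) = -1 mod (2**(N*ell) + 1)
    return (r - g) % modulus       # 2**(N*ell) = +1 mod (2**(N*ell) - 1)


# Toy check with F(X) = 1 - X, G(X) = 3 + 2X, N = 2, ell = 8.
n, ell = 2, 8
f_true = 1 - (1 << ell)                 # F(2**8) = -255
f = f_true % (1 << (n * ell))           # what Algorithm 1 returns: 0xFF01
g = 3 + (2 << ell)                      # G(2**8)
```

For both values of $\delta$, correcting `(f * g) % modulus` gives the same residue as `(f_true * g) % modulus`.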

Previously, we supposed that at most one polynomial can have negative
coefficients. In the case of lattice-based schemes, this is always the case.

Algorithm 2 Radix Conversion

Input: r, g, maxValue ∈ N, and sign ∈ {0, 1}
Output: $R(X) \in \mathbb{N}[X]/(X^N + \delta)$
1: max ← concat(maxValue, $\ell$, N) // Can be precomputed
2: if sign eq 1 then
3:   if $\delta$ eq 1 then r ← add(r, g, $N\ell$) // To handle negative last coeff
4:   else r ← sub(r, g, $N\ell$)
5: else
6:   if $\delta$ eq 1 then dummy ← add(r, g, $N\ell$) // For isochrony
7:   else dummy ← sub(r, g, $N\ell$)
8: end if
9: r ← add(r, max, $N\ell$) // Add maxValue to each coefficient
10: R(X) ← $\mathrm{Radix\ Conversion}_{\ge 0}$(r)
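
A possible transcription of Algorithm 2 on Python integers is sketched below. Two points are assumptions of this sketch rather than statements of the algorithm: every addition is reduced modulo $2^{N\ell} + \delta$ so that wrap-around is handled exactly, and the dummy branch is replaced by computing the corrected value unconditionally and selecting it afterwards (Python itself gives no constant-time guarantee).

```python
def radix_conversion(r, g, max_value, sign, ell, n, delta):
    """Return the coefficients r_i + max_value of the product polynomial."""
    modulus = (1 << (n * ell)) + delta
    # Lines 2-8: the correction is always computed, then selected by `sign`.
    corrected = (r + delta * g) % modulus
    r = corrected if sign == 1 else r
    # Line 9: add maxValue to each coefficient (reduced mod 2**(N*ell) + delta here).
    packed_max = sum(max_value << (j * ell) for j in range(n))
    r = (r + packed_max) % modulus
    # Line 10: non-negative radix conversion.
    mask = (1 << ell) - 1
    return [(r >> (i * ell)) & mask for i in range(n)]
```

For F(X) = 1 − X and G(X) = 3 + 2X with N = 2, $\ell$ = 8, $\delta$ = 1 and maxValue = 17, the function returns 22 and 16, which reduce modulo 17 to the coefficients of 5 − X.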

4.3 Multiplication in Rq,δ using coprocessor


Sections 4.1 and 4.2 are used to obtain a polynomial multiplication
algorithm in $R_{q,\delta}$ using, mainly, a packed integer representation. More precisely,
except for the modular reductions modulo q, the operations are done using this
representation. All operations performed on the packed integer representation
can be achieved with a coprocessor as defined in Section 2.
The polynomial multiplication in $R_{q,\delta}$ is described in Algorithm 3.

Algorithm 3 Polynomial Multiplication in $R_{q,\delta}$

Input: $(F(X), G(X)) \in (R_{q,\delta}, R_{q,\delta})$ of degree N − 1. Let $k, \ell, q \in \mathbb{N}$ where $\ell > k$, and
maxValue defined as above.
Output: $R(X) = F(X)G(X) \in R_{q,\delta}$
1: f ← Evaluation(F(X), k, $\ell$)
2: $G(2^{\ell})$ ← $\mathrm{Evaluation}_{\ge 0}$(G(X), k, $\ell$)
3: r ← modMult(f, $G(2^{\ell})$, $N\ell$, $N\ell$, $2^{N\ell} + \delta$)
4: b ← sign(F[N − 1]) // if $F_{N-1} < 0$ then b = 1, otherwise b = 0
5: R(X) ← Radix Conversion(r, $G(2^{\ell})$, maxValue, b)
6: R(X) ← R(X) mod q // Any modular reduction
7: return R(X)
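
Putting the pieces together, Algorithm 3 can be sketched and compared against a schoolbook multiplication. Several choices below are assumptions of this sketch: additions are reduced modulo $2^{N\ell} + \delta$, the sign of $F(2^{\ell})$ is read from the top bit of f (which also covers a zero leading coefficient, as in Remark 2), and all function names and toy parameters are hypothetical.

```python
def concat(m, k, n):
    return sum(m << (j * k) for j in range(n))


def evaluation(coeffs, k, ell):
    """Algorithm 1: F(2**ell) mod 2**(N*ell) for signed coefficients."""
    n = len(coeffs)
    full = (1 << (n * ell)) - 1
    f = sum((c % (1 << k)) << (i * ell) for i, c in enumerate(coeffs))
    neg = (f >> (k - 1)) & concat(1, ell, n)
    f = (f + neg * ((1 << ell) - (1 << k))) & full
    return (f - (neg << ell)) & full


def poly_mul(f_coeffs, g_coeffs, k, ell, q, max_value, delta):
    """F has small signed coefficients, G has coefficients in [0, q)."""
    n = len(f_coeffs)
    modulus = (1 << (n * ell)) + delta
    f = evaluation(f_coeffs, k, ell)
    g = sum(c << (i * ell) for i, c in enumerate(g_coeffs))
    r = (f * g) % modulus                      # modMult
    if f >> (n * ell - 1):                     # F(2**ell) < 0
        r = (r + delta * g) % modulus
    r = (r + concat(max_value, ell, n)) % modulus
    mask = (1 << ell) - 1
    return [((r >> (i * ell)) & mask) % q for i in range(n)]


def schoolbook(f_coeffs, g_coeffs, q, delta):
    """Reference product in Z_q[X]/(X**N + delta), using X**N = -delta."""
    n = len(f_coeffs)
    out = [0] * n
    for i, a in enumerate(f_coeffs):
        for j, b in enumerate(g_coeffs):
            if i + j < n:
                out[i + j] += a * b
            else:
                out[i + j - n] -= delta * a * b
    return [c % q for c in out]
```

With the toy parameters q = 17, N = 4, k = 4, $\ell$ = 16 and maxValue = 136, `poly_mul` and `schoolbook` agree for both values of $\delta$.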

In the following section we determine how to perform modular reductions
modulo q using the packed integer representation.

5 Reducing coefficients modulo q


In Section 4.3, we perform polynomial multiplication in $R_{q,\delta}$. However, the
reduction modulo q is done after the radix conversion, on a polynomial
representation. In this section we show how to perform the reduction modulo q using the packed
integer representation. As mentioned previously, such a representation makes it possible to
repurpose an existing RSA/ECC coprocessor.
Let $r = (r_0, \ldots, r_{N-1})_{\ell} \in \mathbb{N}$. In our context, r is obtained after the first two
steps of the Kronecker substitution: polynomial evaluation and modular integer
multiplication. Moreover, we have added maxValue as in Section 4.2. Then,
for all i: $0 \le r_i < 2 \cdot \mathrm{maxValue}$.
In the following we call simultaneous reduction the fact of reducing all
the $r_i$ modulo q by performing operations on r.

5.1 Power-of-two modulus

Some lattice-based schemes, like Saber [9] and NTRU [7], use a power-of-two
modulus. In this context, the simultaneous reduction is easy and fast. Indeed,
it is achieved by the computation:

r & concat(q − 1, $\ell$, N)
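
For a power-of-two modulus, the whole reduction is one AND with a repeated mask, as the following sketch on Python integers shows. The function name and the toy modulus q = 8 are chosen only for this illustration.

```python
def simultaneous_reduce_pow2(r, q, ell, n):
    """Reduce every ell-bit digit of r modulo q, for q a power of two."""
    assert q & (q - 1) == 0                    # q must be a power of two
    mask = sum((q - 1) << (j * ell) for j in range(n))
    return r & mask


packed = 9 | (15 << 8) | (200 << 16)           # digits (9, 15, 200) with ell = 8
reduced = simultaneous_reduce_pow2(packed, 8, 8, 3)
```

The digits of `reduced` are 1, 7 and 0, that is 9, 15 and 200 modulo 8.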

5.2 Prime modulus

Kyber [4] is a lattice-based KEM which performs polynomial multiplication over
$R_{q,\delta}$, where q is a prime number. In this section we adapt the Barrett reduction [3]
to perform simultaneous reduction.

Barrett. The Barrett reduction is introduced in [3]. The main idea is to
precompute an approximation of a division and use it to perform modular
reduction. Let $\alpha, \beta \in \mathbb{Z}$ and $a \in \mathbb{N}$ be an integer to reduce modulo $q \in \mathbb{N}$ of
bit-length $k \in \mathbb{N}$. Barrett reduction precomputes $m = \lfloor 2^{k+\alpha} / q \rfloor$ and computes:

$$a' = a - \left[\left(\left(a \gg (k + \beta)\right) \cdot m\right) \gg (\alpha - \beta)\right] \cdot q$$

A special case is when $\alpha = \beta$, in which case the computation is

$$a' = a - \left[a \gg (k + \beta)\right] \cdot m \cdot q$$

In this case, only one shift and one multiplication are performed ($m \cdot q$ is
precomputed).
Depending on the parameters $(\alpha, \beta)$, $a' = (a \bmod q) + tq$ where $0 \le t < \lfloor a/q \rfloor$.
Further details on the Barrett algorithms are given in [13].
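
The scalar formula can be sketched as follows. The parameter choice $\alpha = k$, $\beta = -1$ used in the comments is the classical textbook setting and is an assumption of this example, not a value taken from the text; q = 3329 is the Kyber modulus.

```python
def barrett(a, q, alpha, beta):
    """a' = a - [((a >> (k + beta)) * m) >> (alpha - beta)] * q, with k = bitlen(q)."""
    k = q.bit_length()
    m = (1 << (k + alpha)) // q                # precomputed once per modulus
    quotient = ((a >> (k + beta)) * m) >> (alpha - beta)
    return a - quotient * q


# Classical choice alpha = k, beta = -1 for q = 3329 (k = 12):
# the estimated quotient is at most 2 below floor(a / q) for a < 2**(2k).
partially_reduced = barrett(65535, 3329, 12, -1)
```

The returned value is congruent to a modulo q and non-negative, but it may still exceed q, which is why a final conditional subtraction is needed.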

Simultaneous modular reduction. To adapt this reduction to simultaneous
reduction, we need to perform a logical AND after the shift operations. Indeed,
because of the shift operations, noise coming from coefficient i + 1 can overflow onto
coefficient i. Algorithm 4 describes the simultaneous Barrett reduction.

Algorithm 4 Simult. Barrett$_{\alpha,\beta}$

Input: $r = (r_0, \ldots, r_{N-1})_{\ell} \in \mathbb{N}$. Let $q \in \mathbb{N}$ of bit-length k and $m = \lfloor 2^{k+\alpha} / q \rfloor$.
Output: $r' = (r'_0, \ldots, r'_{N-1})_{\ell} \in \mathbb{N}$, where all $r_i$ are reduced with Barrett reduction
1: mask ← concat($2^{\ell-\alpha+\beta} - 1$, $\ell$, N) // Can be precomputed
2: mask′ ← concat($2^{\ell-k-\beta} - 1$, $\ell$, N) // Can be precomputed
3: tmp ← rshift(r, k + $\beta$, $N\ell$)
4: tmp ← and(tmp, mask′, $N\ell$)
5: tmp ← mult(tmp, m, $N\ell$, 32) // Mult between a word and a large integer
6: tmp ← rshift(tmp, $\alpha - \beta$, $N\ell$)
7: tmp ← and(tmp, mask, $N\ell$)
8: tmp ← mult(tmp, q, $N\ell$, 32) // Mult between a word and a large integer
9: r′ ← sub(r, tmp, $N\ell$)
10: return r′
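
Algorithm 4 can be sketched on Python integers as below. The sketch assumes that $\ell$ is large enough for every intermediate per-digit product to fit in its slot (here $\ell$ = 32 with digits below $2^{16}$), and it uses $\alpha = k$, $\beta = -1$ with q = 3329; these values and the helper names are assumptions of the example.

```python
def concat(m, k, n):
    return sum(m << (j * k) for j in range(n))


def pack_digits(values, ell):
    return sum(v << (i * ell) for i, v in enumerate(values))


def unpack_digits(r, ell, n):
    return [(r >> (i * ell)) & ((1 << ell) - 1) for i in range(n)]


def simult_barrett(r, q, alpha, beta, ell, n):
    """Barrett-reduce all n digits of r at once (digits stay congruent mod q)."""
    k = q.bit_length()
    m = (1 << (k + alpha)) // q
    mask = concat((1 << (ell - alpha + beta)) - 1, ell, n)      # line 1
    mask_p = concat((1 << (ell - k - beta)) - 1, ell, n)        # line 2
    tmp = (r >> (k + beta)) & mask_p       # lines 3-4: drop bits of digit i+1
    tmp = tmp * m                          # line 5
    tmp = (tmp >> (alpha - beta)) & mask   # lines 6-7
    tmp = tmp * q                          # line 8
    return r - tmp                         # line 9
```

For example, the digits 0, 3328, 6657 and 65535 are mapped to 0, 3328, 3328 and 2284, each congruent to its input modulo 3329.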

Final reduction. Using the simultaneous Barrett reduction (Algorithm 4), the returned
result $r' = (r'_0, \ldots, r'_{N-1})_{\ell} \in \mathbb{N}$ is such that, for all i, $r'_i = (r_i \bmod q) + t_i q$. With
the parameter sets that we use in Section 6, for all i, $t_i \in \{0, 1, 2\}$.
Let k and c be such that $q = 2^k - c$. Then $r'_i \ge 2q$ if and only if $r'_i + 2c$ has
its (k + 1)-th bit equal to one. This fact is used in Algorithm 5 to detect the
coefficients $\ge 2q$ and subtract q from them in a packed integer representation.
After using Algorithm 5, the $r''_i$ are bounded by 2q. The algorithm can then be
applied a second time, replacing 2c by c (line 1) and k + 1 by k (line 3), so
that q is subtracted from each $r''_i \ge q$. Afterwards, each $r''_i$ is necessarily
lower than q.

Algorithm 5 Simult. Conditional Subtraction

Input: r′ = (r′_0, ..., r′_{N−1})_ℓ with all 0 ≤ r′_i < 3q, where q = 2^k − c, ℓ, N ∈ N.
Output: r′′ = (r′′_0, ..., r′′_{N−1})_ℓ with all 0 ≤ r′′_i < 2q
1: (C, mask) ← (concat(2c, ℓ, N), concat(1, ℓ, N)) // Can be precomputed
2: tmp ← add(r′, C, N ℓ) // Raise the (k + 1)-th bit in each coeff ≥ 2q
3: tmp ← rshift(tmp, k + 1, N ℓ) // Move the (k + 1)-th bit to position 0 in each coeff
4: tmp ← and(tmp, mask, N ℓ) // Detect the coeff ≥ 2q
5: tmp ← mult(tmp, q, N ℓ, 32) // Mult between a word and a large integer
6: r′′ ← sub(r′, tmp, N ℓ) // Subtract q from each coeff ≥ 2q
7: return r′′
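A minimal sketch of Algorithm 5 and of its second pass, again with Python integers in place of the coprocessor. The `threshold` parameter is a convenience introduced here (2 for the first pass with 2c and k + 1, 1 for the adapted pass with c and k); the helper names are hypothetical.

```python
def concat(value, ell, n):
    return sum(value << (i * ell) for i in range(n))

def pack(coeffs, ell):
    return sum(c << (i * ell) for i, c in enumerate(coeffs))

def unpack(x, ell, n):
    slot_mask = (1 << ell) - 1
    return [(x >> (i * ell)) & slot_mask for i in range(n)]

def simult_cond_sub(r, q, ell, n, threshold=2):
    """Subtract q from every packed coefficient >= threshold * q (threshold is 1 or 2)."""
    k = q.bit_length()
    c = (1 << k) - q                       # q = 2^k - c
    shift = k + 1 if threshold == 2 else k
    tmp = r + concat(threshold * c, ell, n)   # lines 1-2: raise the detection bit
    tmp >>= shift                             # line 3: bring it to position 0 of each slot
    tmp &= concat(1, ell, n)                  # line 4: keep only that bit
    tmp *= q                                  # line 5: q in the flagged slots, 0 elsewhere
    return r - tmp                            # line 6

q, ell = 3329, 23
coeffs = [0, 3328, 3329, 6657, 6658, 9986]    # all below 3q
n = len(coeffs)
first = simult_cond_sub(pack(coeffs, ell), q, ell, n, threshold=2)
second = simult_cond_sub(first, q, ell, n, threshold=1)
print(unpack(first, ell, n))    # [0, 3328, 3329, 6657, 3329, 6657]
print(unpack(second, ell, n))   # [0, 3328, 0, 3328, 0, 3328]
```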

5.3 Modular polynomial multiplication using coprocessor

Algorithm 6 performs polynomial multiplication in R_{q,δ} using operations on
the packed integers representation. All operations performed on this
representation can be carried out by a coprocessor as defined in Section 2.

Algorithm 6 Modular Polynomial Multiplication

Input: (F(X), G(X)) ∈ (R_{q,δ}, R_{q,δ}) of degree N − 1. Let k, ℓ, q ∈ N where ℓ > k, and
maxValue defined as above.
Output: R(X) = F(X)G(X) ∈ R_{q,δ}
1: max ← concat(maxValue, ℓ, N) // Precomputed
2: (f, G(2^ℓ)) ← (Evaluation(F(X), ℓ), Evaluation_{≥0}(G(X), ℓ))
3: r ← modMult(f, G(2^ℓ), N ℓ, N ℓ, 2^{Nℓ} ± δ)
4: b ← sign(f[N − 1])
5: if b eq 1 then
6:     if δ eq 1 then r ← sub(r, G(2^ℓ), N ℓ) // To handle negative last coeff
7:     else r ← add(r, G(2^ℓ), N ℓ)
8: else
9:     if δ eq 1 then dummy ← sub(r, G(2^ℓ), N ℓ) // For isochrony
10:     else dummy ← add(r, G(2^ℓ), N ℓ)
11: end if
12: r ← add(r, max, N ℓ) // Negative to non negative representation for all r′_i
13: if q eq 2^k then
14:     mask′ ← concat(2^k − 1, ℓ, N)
15:     r ← and(r, mask′, N ℓ)
16: else
17:     r ← Simult. Barrett(r)
18:     r ← Simult. Cond. Sub.(r, ℓ, N) // Can be applied twice if some r_i ≥ 2q
19: end if
20: R(X) ← Radix Conversion(r)
21: return R(X)

The Modular Polynomial Multiplication Algorithm 6 works as follows:

1. Line 2: Polynomial evaluations defined in Equation 1 and Algorithm 1.
2. Line 3: Modular integer multiplication modulo 2^{Nℓ} + δ of the evaluated
polynomials.
3. Lines 4 to 11: Handle the two’s complement representation of the evaluated
polynomial; see Section 4.2.
4. Line 12: Convert the negative representation to a non-negative one; see
Section 4.2. This operation makes it possible to perform the simultaneous
reduction mod q and the radix conversion.
5. Lines 13 to 19: Perform simultaneous reduction mod q. This ensures that the
polynomial result has coefficients reduced mod q.
6. Line 20: Radix conversion defined in Equation 2 to obtain a polynomial
result.
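The overall flow can be illustrated with a simplified sketch for δ = 1. It is not the authors' implementation: Python's signed integers and its `%` operator replace the two's complement handling of lines 4 to 11, the reduction mod q is done per coefficient after unpacking instead of on the packed integer, and the slot width ℓ is derived from an assumed bound maxValue = σqN. All function names are hypothetical.

```python
def negacyclic_schoolbook(F, G, q):
    """Reference product in Z_q[X]/(X^N + 1), used only to check the sketch."""
    n = len(F)
    R = [0] * n
    for i, fi in enumerate(F):
        for j, gj in enumerate(G):
            if i + j < n:
                R[i + j] += fi * gj
            else:
                R[i + j - n] -= fi * gj          # X^N = -1
    return [x % q for x in R]

def mpm_sketch(F, G, q, sigma):
    """F has coefficients in [-sigma, sigma], G has coefficients in [0, q)."""
    n = len(F)
    max_value = sigma * q * n                    # assumed bound on |r_i|, multiple of q
    ell = (2 * max_value).bit_length()           # slot width: 0 <= r_i + max_value < 2^ell
    f = sum(c * (1 << (i * ell)) for i, c in enumerate(F))   # evaluation at 2^ell (signed)
    g = sum(c << (i * ell) for i, c in enumerate(G))
    modulus = (1 << (n * ell)) + 1               # 2^(N*ell) + delta with delta = 1
    offset = sum(max_value << (i * ell) for i in range(n))
    r = (f * g + offset) % modulus               # modular product, then non-negative slots
    slot_mask = (1 << ell) - 1
    # max_value is a multiple of q, so the offset vanishes in the reduction mod q.
    return [((r >> (i * ell)) & slot_mask) % q for i in range(n)]

q = 3329
F = [3, -3, 0, 1, -2, 2, -1, 3]
G = [3328, 0, 1, 1234, 3000, 17, 2500, 999]
print(mpm_sketch(F, G, q, sigma=3) == negacyclic_schoolbook(F, G, q))
```

For N = 256 and σ = 3 the same rule gives a slot width of 23 bits, which agrees with the value of ℓ reported for Kyber with the D_3 distribution.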

6 Applications and Results

In this section, after some preliminaries, we present the component on which we
performed our experiments and the results obtained by implementing the Modular
Polynomial Multiplication (MPM), cf. Algorithm 6, and another polynomial
multiplication that depends on the evaluated scheme. The evaluated lattice-based
algorithms are: Kyber, Dilithium, NTRU, and Saber.

6.1 Background

NTT. The NTT is an algorithm that performs fast polynomial multiplication in
R_{q,1} [16]. Given a and b ∈ R_{q,1}, a × b is computed as
NTT^{−1}(NTT(a) ◦ NTT(b)), where ◦ is the coefficient-wise multiplication.
Theoretically, the NTT has the best asymptotic complexity for multiplication
in R_{q,1}. However, in constrained environments (e.g. smart cards), devices may
have dedicated hardware to perform fast large-integer arithmetic. In this context,
the NTT can be outperformed by an algorithm relying on integer arithmetic, even
if its theoretical complexity is worse than that of the NTT.
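The identity a × b = NTT^{−1}(NTT(a) ◦ NTT(b)) can be illustrated on a toy ring. The sketch below uses a naive quadratic-time transform rather than a fast butterfly network, and toy parameters q = 17, N = 4 with ψ = 2 as a primitive 2N-th root of unity; none of this reflects the Kyber reference code, whose transform is structured differently. The modular inverse via `pow(x, -1, q)` needs Python 3.8 or later.

```python
def ntt_naive(a, psi, q):
    """Evaluate a at the odd powers of psi, i.e. at the roots of X^N + 1 mod q."""
    n = len(a)
    return [sum(a[i] * pow(psi, (2 * j + 1) * i, q) for i in range(n)) % q
            for j in range(n)]

def intt_naive(a_hat, psi, q):
    """Inverse of ntt_naive."""
    n = len(a_hat)
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    return [n_inv * sum(a_hat[j] * pow(psi_inv, (2 * j + 1) * i, q) for j in range(n)) % q
            for i in range(n)]

def ntt_multiply(a, b, psi, q):
    a_hat, b_hat = ntt_naive(a, psi, q), ntt_naive(b, psi, q)
    pointwise = [(x * y) % q for x, y in zip(a_hat, b_hat)]   # coefficient-wise product
    return intt_naive(pointwise, psi, q)

q, psi = 17, 2                      # 2^4 = 16 = -1 mod 17, so psi has order 8 = 2N
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(ntt_multiply(a, b, psi, q))   # product of a and b in Z_17[X]/(X^4 + 1)
```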

Subdivision. RSA/ECC coprocessors perform integer arithmetic on data held in
buffers whose size has a fixed limit. In our context, after polynomial evaluation,
the resulting integer is generally too large to fit in these buffers. In that case
we use multiple-precision arithmetic, which consists of dividing the manipulated
integers into several smaller ones and then performing operations on these
smaller integers.
In the case of integer multiplication we use two techniques to divide the
integer multiplication into smaller ones: Karatsuba and Schoolbook. Let
f = f_I + f_S 2^{Nℓ/2} and g = g_I + g_S 2^{Nℓ/2}, where f_I, f_S, g_I, g_S are
lower than 2^{Nℓ/2}.

Schoolbook: f g = f_I g_I + (f_I g_S + f_S g_I) 2^{Nℓ/2} + 2^{Nℓ} f_S g_S

Karatsuba: f g = f_I g_I + ((f_I + f_S)(g_I + g_S) − f_I g_I − f_S g_S) 2^{Nℓ/2} + 2^{Nℓ} f_S g_S


These techniques can be applied recursively in order to obtain a targeted
integer size. Later on when presenting the results, we specify in a column named
subdivision the multi-precision method that we use for the integer multiplication.
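Both subdivision formulas can be sketched for one level of splitting. Here `half_bits` plays the role of Nℓ/2, and the half-size products use Python's built-in multiplication as a stand-in for the coprocessor; the function names are illustrative.

```python
def split(x, half_bits):
    """Return (low, high) such that x = low + high * 2^half_bits."""
    return x & ((1 << half_bits) - 1), x >> half_bits

def schoolbook_mul(f, g, half_bits):
    f_lo, f_hi = split(f, half_bits)
    g_lo, g_hi = split(g, half_bits)
    middle = f_lo * g_hi + f_hi * g_lo                  # two half-size products
    return f_lo * g_lo + (middle << half_bits) + ((f_hi * g_hi) << (2 * half_bits))

def karatsuba_mul(f, g, half_bits):
    f_lo, f_hi = split(f, half_bits)
    g_lo, g_hi = split(g, half_bits)
    low, high = f_lo * g_lo, f_hi * g_hi
    middle = (f_lo + f_hi) * (g_lo + g_hi) - low - high  # one product instead of two
    return low + (middle << half_bits) + (high << (2 * half_bits))

f, g = 0x1234_5678_9ABC_DEF0, 0x0FED_CBA9_8765_4321
print(schoolbook_mul(f, g, 32) == f * g, karatsuba_mul(f, g, 32) == f * g)
```

Schoolbook needs four half-size multiplications per level, Karatsuba three, at the price of the additions f_I + f_S and g_I + g_S.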

Evaluation point. In our context, the Karatsuba subdivision requires increasing
the size of the evaluation point by 1 bit at each subdivision. This is due to the
computation (f_I + f_S)(g_I + g_S): it is performed on integers that are half as
long but whose values are twice as large.
In the following results, the evaluation point is chosen to take into account
the negative coefficients and the Karatsuba subdivisions.
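The extra bit can be observed directly on a packed integer. In this sketch (helper names and values are illustrative), four 8-bit coefficients are packed, split into two halves and added as Karatsuba does; with 8-bit slots the sums carry into neighbouring slots, while 9-bit slots keep them separated.

```python
def pack(coeffs, ell):
    return sum(c << (i * ell) for i, c in enumerate(coeffs))

def unpack(x, ell, n):
    slot_mask = (1 << ell) - 1
    return [(x >> (i * ell)) & slot_mask for i in range(n)]

def half_sum_slots(coeffs, ell):
    """Pack coeffs with ell-bit slots, add the low and high halves, unpack the sum."""
    half = len(coeffs) // 2
    f = pack(coeffs, ell)
    f_lo = f & ((1 << (half * ell)) - 1)
    f_hi = f >> (half * ell)
    return unpack(f_lo + f_hi, ell, half)

coeffs = [255, 255, 255, 255]       # every coefficient uses the full 8 bits
print(half_sum_slots(coeffs, 8))    # slots corrupted by carries
print(half_sum_slots(coeffs, 9))    # [510, 510]: one spare bit per slot is enough
```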

Polynomial distribution. The following polynomial multiplications are performed
between a polynomial G(X) ∈ R_{q,δ} and a polynomial F(X) ∈ R_{q,δ}. More
precisely, the coefficients of G(X) are sampled uniformly in {0, ..., q − 1} and
the coefficients of F(X) are sampled from a distribution D_σ. With a distribution
D_σ, the coefficients are represented in {−σ, ..., 0, ..., σ}.

Masked secret polynomial. Most of the time, the polynomial sampled from the
distribution D_σ is the secret polynomial. In some use cases, an embedded
implementation must be strongly secured against side-channel attacks. One way to
do this is to mask the secret data. To do so, we split the sensitive data into
shares x = x_1 + x_2 mod q, where x_1, x_2 belong to {0, ..., q − 1}, and then we
process the operations on each share separately. In our context, the value q is
much larger than the range of the secret distribution. Masking therefore means
manipulating larger secret data, which in turn increases the evaluation point. For
some assessments, in order to take this security requirement into account, we
suppose that the polynomial F(X) is defined over R_{q,δ} and that its coefficients
are sampled uniformly in {0, ..., q − 1}. In the following results, we denote this
case by the U_q distribution.
In the following results, we only specify the distribution of F(X).
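The share splitting described for the masked secret polynomial can be sketched as follows. This is an illustration of first-order additive masking only, with hypothetical function names; it says nothing about how the shares are generated or protected on the actual component.

```python
import secrets

def mask_coefficient(x, q):
    """Split x into two shares with x = x1 + x2 mod q, both in {0, ..., q - 1}."""
    x1 = secrets.randbelow(q)          # uniform random share
    x2 = (x - x1) % q
    return x1, x2

def mask_polynomial(F, q):
    shares = [mask_coefficient(c, q) for c in F]
    return [s[0] for s in shares], [s[1] for s in shares]

q = 3329
F = [-3, 0, 2, 3, -1]                  # small secret coefficients, as with D_3
F1, F2 = mask_polynomial(F, q)
print(all((a + b) % q == c % q for a, b, c in zip(F1, F2, F)))
```

Each share looks uniform in {0, ..., q − 1}, which is why the masked case is assessed with the U_q parameters rather than with those of the small distribution.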

Target. Assessments are done on a smart card component using a 32-bit
architecture. In the following we refer to this device as Component A. For
intellectual property reasons, neither the component name nor a detailed
description can be given. We therefore only give the main characteristics of
Component A:

– Standard 32-bit instructions (add, sub, shifts, bitwise and, xor, or, etc.).
– No CPU multiplication and division.
– A coprocessor which handles: logical AND, addition, subtraction, shifts,
modular integer multiplication and non-modular integer multiplication.

The following results take into account a complete modular reduction. Moreover,
like the previous works [1,18,5,10], we assume that the inputs are already
in the appropriate machine representation. This implies that the inputs are in:

– Polynomial representation for NTT, Karatsuba and schoolbook polynomial
multiplication.
– Packed integers representation for the MPM algorithm.

6.2 Results

Kyber. Kyber [4] is a lattice-based KEM finalist of the NIST standardization.
The polynomial ring defined in Kyber is R_{q,1} = Z_q[X]/(X^N + 1), where q = 3329
and N = 256. The polynomial multiplication used in the specification is the NTT
algorithm. In this context, we have implemented two polynomial multiplications:

– An NTT multiplication. It is adapted from the reference implementation
in order to use the hardware Montgomery multiplication. Tables of roots
of unity have been recomputed to handle the Montgomery arithmetic with
R = 2^32, the smallest value handled by the coprocessor, instead of R = 2^16. In
addition, the multiplication followed by a Montgomery reduction is replaced
by a call to the coprocessor Montgomery multiplication. Table 1 presents the
timings of the NTT implementation.

          NTT     Pointwise   NTT^{−1}
Cycles    98k     40k         106k
Table 1. Kyber NTT cycles on Component A
– The modular polynomial multiplication (MPM) described in Algorithm 6. For
this algorithm we consider two distributions for the polynomial F(X):
• D_3. In this case the reduction modulo q is done using Simult.
Barrett_{11,0}. In order to completely reduce the coefficients we perform 2
final subtractions using the technique described in Section 5.2.
• U_q. In this case the reduction modulo q is done using Simult.
Barrett_{10,10} followed by an application of Simult. Barrett_{13,−2}.
Afterwards, a final subtraction is performed using the technique described in
Section 5.2.
In Table 2, we describe the parameters used for the MPM algorithm. More precisely,
we give ℓ such that the evaluation point is 2^ℓ, the maximum value used to convert
negative coefficients to non-negative ones, the subdivision used and the resulting
cycle count.

Distribution   ℓ    maxValue   Subdivision            Cycles MPM
D_3            23   3qn        None                   50k
U_q            34   q^2 n      2 calls to Karatsuba   67k
Table 2. Parameters and cost of one multiplication in R_{q,1} for Kyber parameters
Comparison. The previous results give the cost of one execution of the MPM
algorithm and of each NTT routine. In order to compare the NTT and MPM algorithms,
it is not enough to compare the pointwise routine with the MPM algorithm: the
calls to the NTT and NTT^{−1} routines must also be taken into account. We must
therefore determine how many times each routine is called.
Table 3 gives the number of calls to the NTT, pointwise multiplication
and NTT^{−1} routines during the Key Generation, Encrypt and Decrypt routines. The
number of calls depends on Kyber’s security parameter k ∈ {2, 3, 4}.
Note that the number of pointwise multiplications matches the number of MPM calls.

            NTT    Pointwise/MPM   NTT^{−1}
Key Gen.    2k     k^2             0
Encrypt     k      k^2 + k         k + 1
Decrypt     k      k               1
Table 3. Number of calls to NTT routines in Kyber
In order to fairly compare the NTT and MPM algorithms we use:
– The official specification of Kyber for the NTT algorithm. The private and
public keys are stored in the NTT domain.
– A tweaked version of Kyber for the MPM algorithm. The private and public
keys are not stored in the NTT domain. Therefore, we do not need to apply
NTT^{−1} to perform the MPM algorithm.

The MPM algorithm is called with the U_q distribution parameters.


            Total cycles NTT   Total cycles MPM   Ratio (NTT/MPM)
k = 2
Key Gen.    552k               268k               2
Encrypt     754k               402k               1.9
Decrypt     382k               134k               2.9
k = 3
Key Gen.    948k               603k               1.6
Encrypt     1198k              804k               1.5
Decrypt     520k               201k               2.6
k = 4
Key Gen.    1424k              1072k              1.3
Encrypt     1722k              1340k              1.3
Decrypt     658k               268k               2.5
Table 4. Cycle count for all multiplications in Kyber for the U_q distribution
parameters
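The totals of Table 4 can be re-derived from the per-routine costs of Table 1, the U_q cost of Table 2 and the call counts of Table 3. The short sketch below does this arithmetic; the constant and function names are illustrative, and all costs are in thousands of cycles.

```python
NTT, POINTWISE, INTT = 98, 40, 106   # Table 1, in kilo-cycles
MPM_UQ = 67                          # Table 2, U_q row, in kilo-cycles

def call_counts(k):
    """Table 3: (NTT calls, pointwise or MPM calls, inverse NTT calls)."""
    return {
        "Key Gen.": (2 * k, k * k, 0),
        "Encrypt": (k, k * k + k, k + 1),
        "Decrypt": (k, k, 1),
    }

def totals(k):
    """Return {routine: (total NTT-based cost, total MPM cost)} in kilo-cycles."""
    result = {}
    for routine, (n_ntt, n_mul, n_intt) in call_counts(k).items():
        ntt_total = n_ntt * NTT + n_mul * POINTWISE + n_intt * INTT
        mpm_total = n_mul * MPM_UQ   # the tweaked scheme needs no transform
        result[routine] = (ntt_total, mpm_total)
    return result

for k in (2, 3, 4):
    for routine, (ntt_total, mpm_total) in totals(k).items():
        print(k, routine, ntt_total, mpm_total, round(ntt_total / mpm_total, 1))
```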

Saber & NTRU

Saber. Saber [9] is a lattice-based KEM finalist of the NIST standardization. The
polynomial ring used in Saber is R_{q,1} = Z_q[X]/(X^N + 1), where N = 256 and
q = 8192 = 2^13. In this work we consider two distributions for the polynomial
F(X):

– D_5. Other distributions are used in Saber. However, we only describe the
worst one for the MPM algorithm.
– U_q.

Since the modulus is a power of two, the reductions are achieved using a logical
AND with the appropriate mask.
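For a power-of-two modulus q = 2^k, the simultaneous reduction is a single AND with a mask that keeps the k low bits of every slot, as in lines 14 and 15 of Algorithm 6. A minimal sketch, with hypothetical helper names and the Saber values k = 13 and ℓ = 25 as an example:

```python
def concat(value, ell, n):
    return sum(value << (i * ell) for i in range(n))

def pack(coeffs, ell):
    return sum(c << (i * ell) for i, c in enumerate(coeffs))

def unpack(x, ell, n):
    slot_mask = (1 << ell) - 1
    return [(x >> (i * ell)) & slot_mask for i in range(n)]

def simult_reduce_pow2(r, k, ell, n):
    """Reduce every ell-bit slot of r modulo 2^k with one AND."""
    return r & concat((1 << k) - 1, ell, n)

k, ell = 13, 25
coeffs = [0, 8191, 8192, 20000, (1 << 25) - 1]
reduced = unpack(simult_reduce_pow2(pack(coeffs, ell), k, ell, len(coeffs)), ell, len(coeffs))
print(reduced == [c % (1 << k) for c in coeffs])
```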

NTRU. NTRU [7] is also a KEM finalist of the NIST competition. The polynomial
ring used in NTRU is R_{q,−1} = Z_q[X]/(X^N − 1). The modulus q and the value N
depend on the security parameters. In this work we only consider NTRU HPS
1 parameters, where N = 509 and q = 2048 = 2^11.
The value of N does not lend itself to subdivisions. To overcome
this issue, we work on polynomials with Ñ = 512 coefficients where the last
coefficients are equal to 0.
In this work, we consider only a U_q distribution. Since q is a power of two,
the modular reductions are performed with a logical AND.

Comparison. The Saber and NTRU MPM algorithms are compared with the polynomial
multiplication used in their reference implementations.

– Saber: A combination of 4-way Toom-Cook and Karatsuba algorithms.
– NTRU: A schoolbook multiplication.

The polynomial multiplications of the reference implementations are performed
with the 32-bit coprocessor multiplication. Table 5 gives the results obtained
on Component A.

Distribution   ℓ    maxValue   Subdivision            Cycles MPM   Cycles ref.
Saber
D_5            25   5qn        None                   47k          1405k
U_q            36   q^2 n      2 calls to Karatsuba   61k          1405k
NTRU
U_q            34   q^2 n      3 calls to Karatsuba   173k         17256k
Table 5. Parameters and cost of one multiplication in R_{q,δ} for Saber and NTRU
parameters

7 Conclusion

In this paper we continue the previous works that optimize lattice-based schemes
by re-purposing today’s RSA/ECC coprocessors. We propose an algorithm, called
MPM, which performs modular polynomial multiplication using coprocessor
instructions. More precisely, our work makes it possible to re-purpose an existing
coprocessor to handle the modular reductions and the negative coefficients during
the polynomial multiplication.
We then assess the MPM algorithm in practice for almost all NIST
lattice-based finalists. This assessment is done on a component which has few
CPU instructions and whose asymmetric cryptographic efficiency relies on its
RSA/ECC coprocessor. The MPM algorithm is compared to software polynomial
multiplications, such as NTT or Karatsuba. The small number of CPU instructions
limits the possible assembly optimizations for the software algorithms. Therefore,
on this component, our multiplication algorithm brings a significant speed-up.
This shows that re-purposing standard asymmetric coprocessors to speed up
lattice-based cryptography is of interest, especially in a context of hybrid
cryptography deployment.

References

1. Albrecht, M.R., Hanser, C., Hoeller, A., Pöppelmann, T., Virdia, F., Wallner, A.:
Implementing RLWE-based Schemes Using an RSA Co-Processor. IACR Transactions
on Cryptographic Hardware and Embedded Systems pp. 169–208 (2019)
2. ANSSI: Technical position paper - ANSSI views on the Post-Quantum
Cryptography transition, available at
https://www.ssi.gouv.fr/publication/anssi-views-on-the-post-quantum-cryptography-transition/
3. Barrett, P.: Implementing The Rivest Shamir And Adleman Public Key Encryption
On A Standard Digital Signal Processor. CRYPTO’ 86. CRYPTO 1986. Lecture
Notes in Computer Science, vol 263. Springer, Berlin, Heidelberg. pp. 1156–1158
(1986)
4. Bos, J., Ducas, L., Kiltz, E., Lepoint, T., Lyubashevsky, V., Schanck, J.M.,
Schwabe, P., Seiler, G., Stehlé, D.: Crystals – kyber: a cca-secure
module-lattice-based kem. Cryptology ePrint Archive, Report 2017/634 (2017)
5. Bos, J.W., Renes, J., van Vredendaal, C.: Post-quantum cryptography with
contemporary co-processors. USENIX (2021)
6. BSI: Migration zu Post-Quanten-Kryptografie - Handlungsempfehlungen des BSI
7. Chen, C., Danba, O., Hoffstein, J., Hülsing, A., Rijneveld, J., Schanck, J.M.,
Schwabe, P., Whyte, W., Zhang, Z.: NTRU (2020)
8. for Cryptography Research, C.A.: National cryptographic algorithm design
competition (2018)
9. D’Anvers, J.P., Karmakar, A., Roy, S.S., Vercauteren, F.: Saber: Module-lwr
based key exchange, cpa-secure encryption and cca-secure kem. Cryptology ePrint
Archive, Report 2018/230 (2018)
10. Greuet, A., Montoya, S., Renault, G.: Speeding-up ideal lattice-based key exchange
using a RSA/ECC coprocessor. IACR Cryptol. ePrint Arch. p. 1602 (2020)
11. Harvey, D.: Faster polynomial multiplication via multipoint kronecker substitution
(2007)
12. Kronecker, L.: Grundzüge einer arithmetischen theorie der algebraischen grössen.
(abdruck einer festschrift zu herrn e. e. kummers doctor-jubiläum, 10. september
1881.). Journal für die reine und angewandte Mathematik 92, 1–122 (1882)
13. Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied
Cryptography. CRC press (2018)
14. Moody, D.: Post-Quantum Cryptography NIST’s Plan for the Future (2016)
15. Moody, D., Alagic, G., Apon, D.C., Cooper, D.A., Dang, Q.H., Kelsey, J.M., Liu,
Y.K., Miller, C.A., Peralta, R.C., Perlner, R.A., Robinson, A.Y., Smith Tone, D.C.,
Alperin Sheriff, J.: Status Report on the Second Round of the NIST Post-Quantum
Cryptography Standardization Process. Tech. rep., National Institute of Standards
and Technology (Jul 2020)
16. Nussbaumer, H.J.: Number Theoretic Transforms, pp. 211–240. Springer Berlin
Heidelberg, Berlin, Heidelberg (1982)
17. Shor, P.W.: Polynomial-Time Algorithms for Prime Factorization and Discrete
Logarithms on a Quantum Computer. SIAM J. Comput. 26(5), 1484–1509 (Oct
1997)
18. Wang, B., Gu, X., Yang, Y.: Saber on ESP32. In: Conti, M., Zhou, J., Casalicchio,
E., Spognardi, A. (eds.) Applied Cryptography and Network Security. pp. 421–440.
Springer International Publishing, Cham (2020)
