Modular Polynomial Multiplication Using RSA/ECC coprocessor

Aurélien Greuet¹, Simon Montoya¹,², and Clémence Vermeersch¹

¹ IDEMIA, Cryptography & Security Labs, Courbevoie, France.
firstname.lastname@idemia.com
² LIX, INRIA, CNRS, École Polytechnique, Institut Polytechnique de Paris, France.
firstname.lastname@lix.polytechnique.fr

Keywords: Post-Quantum Lattice-based Cryptography · Modular Polynomial Multiplication · Embedded devices

Abstract. Modular polynomial multiplication is a core and costly operation of ideal lattice-based schemes. In the context of embedded devices, previous works transform the polynomial multiplication into an integer multiplication using Kronecker substitution. Thanks to this transformation, existing coprocessors that handle large-integer operations can be re-purposed to speed up lattice-based cryptography. In a nutshell, the Kronecker substitution transforms the polynomials into integers by evaluation, multiplies them with an integer multiplication, and gets back to a polynomial result using a radix conversion. Previous work focused on optimizing the integer multiplication using coprocessor instructions. In this work, we pursue this seminal research by optimizing the evaluation, the radix conversion and the modular reductions modulo q with today's RSA/ECC coprocessors. In particular, we show that with an RSA/ECC coprocessor that can compute addition/subtraction, (modular) multiplication, shift and logical AND on integers, we can compute the whole modular polynomial multiplication using coprocessor instructions. The efficiency of our modular polynomial multiplication depends on the component specification and on the cryptosystem parameter set. Hence, we assess our algorithm on a chip for several lattice-based schemes that are finalists of the NIST standardization. Moreover, we compare our modular polynomial multiplication with other polynomial multiplication techniques.

1 Introduction
In the next few years, a quantum computer powerful enough to run Shor’s al-
gorithm [17] could emerge. Such a computer can break the entire cryptography
based on the hardness of integer factorization and discrete logarithm like RSA
or Elliptic Curve Cryptography (ECC). Due to this potential threat, national
agencies started to study new proposals (e.g. [6]) and initiated standardization of
quantum-safe algorithms [14,8]. The standardization process most closely followed by the community is the one run by the National Institute of Standards and Technology (NIST), which was launched in 2016 [14]. This standardization aims to bring together an important part of the community to determine future Key Encapsulation Mechanisms (KEMs) and signature standards. In July 2020, the third round of this standardization started with seven finalists remaining, including four KEMs and three signatures. Among these seven finalists, five are based on lattice or assimilated problems [15]. Hence, the future post-quantum standards are very likely to include lattice-based schemes. Therefore, optimizing these cryptosystems and ensuring their practical security is an important area of research.
Post-quantum cryptography will be deployed on embedded devices. On such devices, the amount of RAM and the CPU frequency are very limited: less than 60 kB of RAM and less than 100 MHz. Therefore, implementing efficient cryptosystems in these constrained environments is a real challenge. In order to speed up the cryptographic algorithms, these devices may embed additional hardware coprocessors for symmetric and asymmetric cryptographic computations. Moreover, these coprocessors can provide additional security features, such as hardware and software protection against faults and side-channel leakage. Most of the asymmetric coprocessors currently deployed are designed for ECC or RSA schemes and not for lattice-based cryptosystems. However, the underlying arithmetic of these cryptosystems can be tweaked to get close to the arithmetic used in RSA/ECC schemes. Therefore, re-purposing such asymmetric coprocessors is interesting to optimize lattice-based schemes and to facilitate the transition to the post-quantum world. Indeed, the easier the transition, the more widely post-quantum cryptography will be used and deployed.

Motivations & previous works. Lattice-based cryptography is believed to be a promising direction to provide efficient and secure post-quantum algorithms. One of the main operations in these schemes is modular polynomial multiplication. Research has been conducted on optimizing the polynomial multiplication using specific software instructions or by designing specific hardware. However, most polynomial multiplication optimizations target the ARM Cortex-M4 or, less frequently, the ARM Cortex-M3 CPU architecture. These ARM CPUs are powerful and offer a large set of useful assembly instructions. However, not all embedded systems have such a powerful CPU; many base their cryptographic efficiency on an additional coprocessor.
Moreover, the transition period should rely on hybrid cryptography, which is the combination of a post-quantum algorithm and a classical one. Such cryptography is secure against quantum attacks, thanks to the post-quantum part, and secure against classical attacks, with at least the same security level as a purely classical algorithm. Several governmental agencies (NIST, ANSSI, BSI) recommend, and will impose in a few years, the use of hybrid cryptography for long-term security certification [2,6]. In this context, re-purposing the current asymmetric coprocessors to optimize the modular polynomial multiplication is of interest in terms of costs and ease of deployment, and it offers optimizations for a wide range of components.
The seminal work of Albrecht et al. [1] re-purposes an RSA/ECC coprocessor to optimize polynomial multiplication in Kyber. To do so, they use techniques introduced in [11], which transform the polynomial multiplication into an integer multiplication using the Kronecker substitution [12]. Afterwards, another work [18] adapts this technique to Saber.
The work of Bos et al. [5] introduced Kronecker+, a generalization of the Kronecker substitution used by Albrecht et al. in [1]. This generalization allows a trade-off between the number of integer multiplications, the size of the integers and the number of polynomial evaluations. Depending on the component and coprocessor specifications, Kronecker+ allows a faster polynomial multiplication than Kronecker substitution.
In [10], the authors provide a variant of the Kronecker substitution and an adaptation of the schoolbook multiplication to perform hardware polynomial multiplication. Depending on the RSA/ECC coprocessor specifications, one of these algorithms can outperform the classical Kronecker substitution.

Our contribution. This work aims to perform modular polynomial multiplication in $R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$ using an RSA/ECC coprocessor, where $\delta \in \{-1, 1\}$. These rings are the most used by the lattice-based finalists of the NIST standardization.
Contemporary asymmetric coprocessors can perform integer operations but not polynomial ones. As seen previously, most techniques that repurpose current coprocessors to optimize polynomial multiplication on embedded devices are based on the Kronecker substitution. In $R_{q,\delta}$ this substitution can be summarized in four steps:
1. Convert the polynomials in $R_{q,\delta}$ to integers in $\mathbb{N}$ of bit size bitsize. When polynomials have coefficients with a negative representation, this conversion requires additional operations.
2. Multiply the obtained integers modulo $2^{\text{bitsize}} + \delta$.
3. Convert the integer multiplication result back to a polynomial in $\mathbb{Z}[X]/(X^N + \delta)$. As in Step 1, if the initial polynomials have coefficients with a negative representation, this conversion requires additional operations.
4. Reduce the coefficients modulo $q$ to obtain a result in $R_{q,\delta}$.
All the previous works re-purpose the coprocessor only to optimize Step 2. All the other steps are implemented in software, without the coprocessor.
In this work, for most of the previous steps, we describe algorithms that allow existing coprocessors to be re-purposed. Our work focuses on two main contributions:
– Handle negative evaluation and radix conversion using an RSA/ECC coprocessor (Steps 1 and 3).
– Perform modular reduction of the coefficients modulo q with an RSA/ECC coprocessor (Step 4).
These improvements are possible only if the coprocessor can handle the following integer operations: addition/subtraction, bitwise AND, logical shift, multiplication and modular multiplication. Except for the bitwise AND, most current asymmetric coprocessors handle these operations. The bitwise AND is less common on current RSA/ECC coprocessors. However, adding this operation to an existing architecture is easier and cheaper than designing a new one for polynomial multiplication.

Organization. In Section 2 we introduce the notations used in the rest of the paper. In Section 3 we present how to perform a polynomial multiplication in $\mathbb{N}[X]$ using the Kronecker substitution. Afterwards, in Section 4 we explain how to use the coprocessor instructions to perform the Kronecker substitution evaluation and radix conversion when the polynomials are in $R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$, where $\delta \in \{-1, 1\}$. In Section 5 we describe modular reductions modulo $q$ using coprocessor instructions. Finally, in Section 6 we present the results of practical implementations of our algorithms on several lattice-based finalists.

2 Background
RSA/ECC coprocessor. RSA/ECC coprocessors are designed to speed up RSA or elliptic curve cryptosystems. To do so, these components provide a range of integer operations. In this work, we assume that we have access to a component which can perform, at least, addition/subtraction, bitwise AND, logical shift, multiplication and modular multiplication.

2.1 Element representation

Integers representation. Let $a \in \mathbb{N}$ such that $0 \leq a < 2^{\ell}$. In the following, we say that $a$ is represented over $\ell$ bits to mean that $a$ is stored in a machine buffer of $\ell$ bits.
Let $b \in \mathbb{Z}$ such that $-2^{\ell'-1} < b < 2^{\ell'-1}$. Let $\tilde{b}$ be the two's complement representation of $b$ over $\ell'$ bits, defined by:
$$\tilde{b} = 2^{\ell'} + b \bmod 2^{\ell'} \in \mathbb{N}$$
In the following, we say that $b$ is represented over $\ell'$ bits to mean that the two's complement representation of $b$ is stored in a machine buffer of $\ell'$ bits.
Let $r$ be an $N\ell$-bit natural number. We denote by $r_i$ the $i$-th digit of $r$ in base $2^{\ell}$. In other words, $r = \sum_{i=0}^{N-1} r_i 2^{i\ell}$ with $0 \leq r_i < 2^{\ell}$. We use the following notation: $r = (r_0, r_1, \ldots, r_{N-1})_{\ell}$.

Polynomial representation. Let $F(X) = f_0 + f_1 X + \ldots + f_{N-1} X^{N-1} \in \mathbb{Z}[X]$ be of degree at most $N-1$. Let $\tilde{f}_i$ be a two's complement representation of a coefficient $f_i$.
Array representation. The usual machine representation of $F(X)$ is an array where the $i$-th item is $\tilde{f}_i$. To ease the reading, we denote in the following by $f_i$ or $f[i]$ the coefficient associated with the $i$-th item. Moreover, unless otherwise specified, a polynomial is represented as an array.
Packed integer representation. A packed integer representation of $F(X)$ is the concatenation of all the $\tilde{f}_i$ into a buffer:
$$f = \tilde{f}_{N-1} \,|\, \ldots \,|\, \tilde{f}_1 \,|\, \tilde{f}_0 \in \mathbb{N}$$
In this work, this representation is used to represent a polynomial as a natural number. Afterwards, the polynomial arithmetic is carried out with operations on this natural number.

2.2 Notations
Rings. Let $q$ be an integer. Denote by $R_{q,\delta}$ the polynomial ring $\mathbb{Z}_q[X]/(X^N + \delta)$, where $\delta \in \{-1, 1\}$. We represent an element $F(X) \in R_{q,\delta}$ as a polynomial of degree at most $N-1$ with coefficients in $\{0, \ldots, q-1\}$. $R^{-}_{q,\delta}$ denotes the elements of $R_{q,\delta}$ represented by a polynomial of degree at most $N-1$ with coefficients in $\{-\frac{q}{2} - 1, \ldots, \frac{q}{2}\}$.

Integer operations. In the sequel, the algorithms are described using the following
notations. Their purpose is to clarify the size of the manipulated operands.

– Let add(a, b, bitlen) (resp. sub(a, b, bitlen)) be the addition (resp. subtraction) between a and b. The values a and b are represented over bitlen bits.
– Let lshift(a, k, bitlen) (resp. rshift(a, k, bitlen)) be the left (resp. right) shift a << k (resp. a >> k) over bitlen bits.
– Let and(a, b, bitlen) be the AND operation a & b over bitlen bits.
– Let mult(a, b, bitlen_a, bitlen_b) be the integer multiplication a × b where a (resp. b) is represented on bitlen_a (resp. bitlen_b) bits.
– Let modMult(a, b, bitlen_a, bitlen_b, p) be the integer modular multiplication a × b mod p where a (resp. b) is represented on bitlen_a (resp. bitlen_b) bits.

Concatenation. Let $(\ell, k, N) \in \mathbb{N}^3$ with $\ell \leq k$ and $m \in \mathbb{N}$ represented over $\ell$ bits. In the following we denote by concat(m, k, N) the function that represents $m$ on $k$ bits and concatenates this new representation $N$ times. Formally:
$$\text{concat}(m, k, N) = \sum_{j=0}^{N-1} m\, 2^{jk} \in \mathbb{N}$$

Example 1. Let m = 1 then concat(m, 8, 3) = 0x10101.
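
The concat function can be sketched in a few lines of Python, using arbitrary-precision integers in place of a coprocessor buffer. This is only an illustration: the function name mirrors the notation above, and the range check on m is an added assumption.

```python
def concat(m: int, k: int, n: int) -> int:
    """Repeat the value m in n consecutive k-bit slots: sum of m * 2**(j*k)."""
    if not 0 <= m < (1 << k):
        raise ValueError("m must fit in k bits")
    return sum(m << (j * k) for j in range(n))


print(hex(concat(1, 8, 3)))  # 0x10101, as in Example 1
```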

Integer to polynomial. Let $(\ell, k, N) \in \mathbb{N}^3$, $\ell > k$ and $F(X) = f_0 + \ldots + f_{N-1} X^{N-1} \in \mathbb{Z}[X]$. For all $i$, let $\tilde{f}_i$ be the two's complement representation of $f_i$ over $k$ bits. We denote by:
$$f = \text{polyToN}(F(X), k, \ell) = \sum_{i=0}^{N-1} \tilde{f}_i 2^{i\ell}, \quad f \in \mathbb{N}$$
Let $g = (g_0, g_1, \ldots, g_{N-1})_{\ell} \in \mathbb{N}$ be an $N\ell$-bit number. We denote by:
$$G(X) = \text{NtoPoly}(g, \ell) = \sum_{i=0}^{N-1} g_i X^i$$
The obtained polynomial $G(X)$ belongs to $\mathbb{N}[X]$ and its degree is at most $N-1$.

Example 2. Let $F(X) = f_2 X^2 + f_1 X + f_0 = 2X^2 + 4X - 2$. Let $\tilde{f}_0$ = 0xE, $\tilde{f}_1$ = 0x4, $\tilde{f}_2$ = 0x2 be the representations of the $f_i$ over 4 bits. Then,
f = polyToN(F(X), 4, 8) = 0x02040E and NtoPoly(f, 8) = $2X^2 + 4X + 14$
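
As an illustration, the two conversions can be written as follows in Python. The names poly_to_n and n_to_poly are hypothetical counterparts of polyToN and NtoPoly, polynomials are assumed to be lists of coefficients with the lowest degree first, and n_to_poly takes the number of digits as an extra argument.

```python
def poly_to_n(coeffs, k, l):
    """Pack signed coefficients (lowest degree first): each one is reduced to
    its k-bit two's complement form and stored in its own l-bit slot."""
    assert l > k
    return sum((c % (1 << k)) << (i * l) for i, c in enumerate(coeffs))


def n_to_poly(g, l, n):
    """Split g into n digits in base 2**l (lowest degree first)."""
    return [(g >> (i * l)) & ((1 << l) - 1) for i in range(n)]


f = poly_to_n([-2, 4, 2], 4, 8)   # F(X) = 2X^2 + 4X - 2
print(hex(f))                     # 0x2040e
print(n_to_poly(f, 8, 3))         # [14, 4, 2]
```

The coefficient -2 comes back as 14 because NtoPoly reads each slot as a natural number.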

3 Multiplication in N[X] using Kronecker substitution

The Kronecker substitution was first introduced in [12]. We give here the main
steps of this substitution. The idea of this substitution is to transform a poly-
nomial multiplication to an integer one by evaluating the polynomials and get
back to the result using a radix conversion. In the context of embedded devices,
this transformation is of interest to perform polynomial multiplication by using
the RSA/ECC coprocessor. Indeed, such coprocessor handles multiplication on
integers. In this section we assume that our polynomials are defined over N[X].

3.1 Kronecker substitution

The Kronecker substitution multiplies two polynomials $F(X)$ and $G(X)$ using an integer multiplication. This substitution can be summarized in three steps:

1. Evaluation of $F(X)$ and $G(X)$ at $2^{\ell}$. The value $\ell$ is chosen such that all the coefficients after the polynomial multiplication are lower than $2^{\ell}$.
2. Integer multiplication $r = F(2^{\ell})\, G(2^{\ell})$, $r \in \mathbb{N}$.
3. Get back to the polynomial $R(X) \in \mathbb{N}[X]$ using radix conversion on $r$.

Evaluation. The first step of the Kronecker substitution is the polynomial evaluation at $2^{\ell}$. Since $F(X)$ has coefficients in $\mathbb{N}$ represented over $k$ bits:
$$\text{Evaluation}_{\geq 0}(F(X), k, \ell) := F(2^{\ell}) = \text{polyToN}(F(X), k, \ell) \qquad (1)$$

Example 3. Let $F(X) = 2X^2 + X + 3$, then $F(2^8)$ = 0x020103 = $\text{Evaluation}_{\geq 0}(F(X), 2, 8)$.

Evaluation point. Let $R(X) = F(X)G(X)$ where $F(X), G(X) \in \mathbb{N}[X]$ are of degree at most $N-1$. The evaluation point $2^{\ell}$ is chosen such that for all $i \leq 2(N-1)$:
$$r_i \leq \max_{j \in \{0,\ldots,N-1\}}(f_j) \cdot \max_{j \in \{0,\ldots,N-1\}}(g_j) \cdot N < 2^{\ell}$$

Since all the coefficients are non-negative, this evaluation is only a representation of all the $f_i$ over $\ell$ bits. Hence, in an implementation, the evaluation does not require arithmetic operations.
Radix Conversion. Radix conversion aims to transform an integer into a polynomial. Let $f = (f_0, \ldots, f_{N-1})_{\ell} \in \mathbb{N}$, then:
$$F(X) = f_0 + \ldots + f_{N-1} X^{N-1} := \text{Radix Conversion}_{\geq 0}(f) = \text{NtoPoly}(f, \ell) \qquad (2)$$

Example 4. Let f = 0x020103, then $F(X) = 2X^2 + X + 3 = \text{Radix Conversion}_{\geq 0}(f)$.

The radix conversion converts a packed integer representation to an array one. Like the evaluation algorithm, in an implementation, the radix conversion does not require arithmetic operations.

Example of Kronecker substitution.

Example 5. Let $F(X) = 2X^2 + X + 3$ and $G(X) = X^2 + 1$. Then,
$F(2^8)$ = 0x020103 = $\text{Evaluation}_{\geq 0}(F(X), 2, 8)$
$G(2^8)$ = 0x010001 = $\text{Evaluation}_{\geq 0}(G(X), 2, 8)$
Afterwards we multiply the evaluated polynomials: $r = F(2^8)G(2^8)$ = 0x0201050103. Finally we obtain $R(X) = \text{Radix Conversion}_{\geq 0}(r) = 2X^4 + X^3 + 5X^2 + X + 3$.
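
The three steps of this section fit in a short Python sketch. The function name kronecker_mul is illustrative, coefficients are assumed to be non-negative and listed lowest degree first, and Python integers stand in for the coprocessor's large-integer multiplication.

```python
def kronecker_mul(f, g, l):
    """Multiply two polynomials with non-negative coefficients through a
    single integer multiplication (evaluation at 2**l, then radix conversion)."""
    n = max(len(f), len(g))
    if max(f) * max(g) * n >= 1 << l:
        raise ValueError("evaluation point 2**l is too small")
    fe = sum(c << (i * l) for i, c in enumerate(f))   # F(2**l)
    ge = sum(c << (i * l) for i, c in enumerate(g))   # G(2**l)
    r = fe * ge                                       # integer multiplication
    digits = len(f) + len(g) - 1
    return [(r >> (i * l)) & ((1 << l) - 1) for i in range(digits)]


# Example 5: (2X^2 + X + 3)(X^2 + 1) = 2X^4 + X^3 + 5X^2 + X + 3
print(kronecker_mul([3, 1, 2], [1, 0, 1], 8))  # [3, 1, 5, 1, 2]
```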

4 Multiplication in $R_{q,\delta}$ using Kronecker substitution

In the previous section we performed polynomial multiplication as an integer multiplication with polynomials in $\mathbb{N}[X]$. However, in lattice-based schemes some polynomials, mainly the secret ones, have coefficients with a negative representation close to 0. Moreover, the reduction modulo $X^N + 1$ can also bring negative coefficients. Hence, in this section we focus on polynomial multiplication in $R_{q,\delta} = \mathbb{Z}_q[X]/(X^N + \delta)$. In $R_{q,\delta}$, the polynomial multiplication using Kronecker substitution is achieved as follows:

– Evaluation of polynomials considering negative coefficients.
– Integer multiplication modulo $2^{N\ell} + \delta$. The modular reduction ensures that after radix conversion the polynomial result is reduced modulo $X^N + \delta$.
– Radix conversion to obtain a polynomial in $\mathbb{Z}[X]/(X^N + \delta)$.
– Reduction modulo $q$ of the polynomial coefficients.

Previous works [1,10,5] already achieve the evaluation and the radix conversion with negative coefficients. However, these algorithms operate on array representations. In this section we describe a way to realize these algorithms when the coefficients are in a packed integer representation. The main advantage of this representation is that it allows the use of existing coprocessors.
Negative representation. Our goal is to perform polynomial multiplication over $R_{q,\delta}$. A way to avoid negative coefficients is to use their non-negative representation over $R_{q,\delta}$. However, since the negative coefficients are close to 0, their non-negative representation is close to $q$. This requires a larger evaluation point, so the integer operations are done on much larger integers; see [10] for more details on the impact on the evaluation point. Thus, for the sake of efficiency we use, when possible, our algorithms with the negative representation.

4.1 Evaluation with negative coefficients.

Let $F(X) = f_0 + f_1 X + \ldots + f_{N-1} X^{N-1} \in R^{-}_{q,\delta}$ and $\tilde{f}_i$ be the two's complement representation over $k$ bits of $f_i$. Our goal is to evaluate $F(X)$ at $2^{\ell}$ where $\ell > k$. Then, for $i = 0$ to $N-1$:
– If $f_i \geq 0$, then we only have to represent it on $\ell$ bits (as in Section 2).
– If $f_i < 0$, then we have to represent it with a two's complement over $\ell$ bits and propagate a borrow to the next coefficient. To obtain a two's complement representation from $k$ bits to $\ell$ bits, we compute:
$$\tilde{f}_i + (2^{\ell} - 2^k) = 2^k + f_i + (2^{\ell} - 2^k) = 2^{\ell} + f_i$$

Algorithm 1 computes the two's complement representation of the polynomial evaluation when the coefficients are in $\mathbb{Z}$. More precisely, this evaluation is done using arithmetic operations on a packed integer representation. To do so, we first put the polynomial coefficients into a packed integer form, as defined in Equation 1. Afterwards, we use arithmetic operations to convert the two's complement representation from $k$ to $\ell$ bits and to propagate the required borrows.

Algorithm 1 Evaluation
Input: $F(X) \in R^{-}_{q,\delta}$, $k, \ell \in \mathbb{N}$ where $\ell > k$.
Output: $\tilde{f} \in \mathbb{N}$, the two's complement representation of $F(2^{\ell}) \bmod 2^{N\ell}$
1: mask ← concat(1, ℓ, N) // Precomputed
2: f̃ ← polyToN(F(X), k, ℓ)
3: neg ← rshift(f̃, k − 1, Nℓ)
4: neg ← and(neg, mask, Nℓ) // Detect negative coefficients
5: tmp ← mult(neg, 2^ℓ − 2^k, Nℓ, 32)
6: f̃ ← add(f̃, tmp, Nℓ) // Two's complement representation of each coeff over ℓ bits
7: neg ← lshift(neg, ℓ, Nℓ)
8: f̃ ← sub(f̃, neg, Nℓ) // Borrow propagation
9: return f̃

Remark 1. The value mask is always the same for a fixed scheme. Hence, this integer can be precomputed and stored in Non-Volatile Memory (NVM).
Remark 2. Evaluation (Algorithm 1) returns the two's complement representation of $F(2^{\ell}) \bmod 2^{N\ell}$. This implies:
– If $F(2^{\ell}) \geq 0$, then the returned value is equal to $F(2^{\ell})$.
– Otherwise, the returned value is not equal to $F(2^{\ell})$. This case occurs when the highest-degree non-zero coefficient of $F(X)$ is negative.

To obtain the expected result after the Kronecker substitution, the latter case requires additional operations before the radix conversion. These additional operations are described in Section 4.2, paragraph Two's complement representation of the evaluated polynomial.

Example 6. Let $F(X) = 3X^2 - 2X + 2$, where all the coefficients are encoded with a two's complement representation over $k = 4$ bits. Let $N = 3$ and $\ell = 8$. The expected result is $F(2^8)$ = 0x02FE02. This is obtained with Evaluation(F(X), k, ℓ):

1. mask ← concat(1, 8, 3) = 0x010101
2. f̃ ← polyToN(F(X), 4, 8) = 0x030E02
3. neg ← rshift(f̃, 4 − 1, 3 × 8) = 0x0061C0
4. neg ← and(neg, mask, 3 × 8) = 0x000100
5. tmp ← mult(neg, 2^8 − 2^4, 3 × 8, 32) = 0x00F000
6. f̃ ← add(f̃, tmp, 3 × 8) = 0x03FE02
7. neg ← lshift(neg, 8, 3 × 8) = 0x010000
8. $F(2^8)$ ← sub(f̃, neg, 3 × 8) = 0x02FE02
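
The steps of Algorithm 1 can be replayed with a minimal Python sketch. It assumes that Python integers masked to $N\ell$ bits stand in for the coprocessor buffer; the function and variable names are illustrative and do not come from a coprocessor API.

```python
def concat(m, k, n):
    """Repeat m in n consecutive k-bit slots."""
    return sum(m << (j * k) for j in range(n))


def evaluation(coeffs, k, l):
    """Algorithm 1 on Python integers. coeffs are signed, lowest degree first,
    each fitting in k-bit two's complement. Returns F(2**l) mod 2**(n*l)."""
    n = len(coeffs)
    full = (1 << (n * l)) - 1                       # emulates an N*l-bit buffer
    mask = concat(1, l, n)
    f = sum((c % (1 << k)) << (i * l) for i, c in enumerate(coeffs))  # polyToN
    neg = (f >> (k - 1)) & mask                     # sign bit of every coefficient
    f = (f + neg * ((1 << l) - (1 << k))) & full    # sign-extend from k to l bits
    f = (f - (neg << l)) & full                     # propagate the borrows
    return f


print(hex(evaluation([2, -2, 3], 4, 8)))  # 0x2fe02, as in Example 6
```

With a negative leading coefficient, for instance evaluation([1, -1], 4, 8), the result is 0xFF01, which is $F(2^8) + 2^{16}$ and not $F(2^8) = -255$; this is the case discussed in Remark 2.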

Evaluation point. Let $R(X) = F(X)G(X)$ where $F(X), G(X) \in R_{q,\delta}$. The evaluation point $2^{\ell}$ is chosen such that for all $i \leq 2(N-1)$:
$$r_i \leq \max_{j \in \{0,\ldots,N-1\}}(|f_j|) \cdot \max_{j \in \{0,\ldots,N-1\}}(|g_j|) \cdot N < 2^{\ell-1}$$

4.2 Radix Conversion with negative coefficient representation.

As mentioned in [1,10], the radix conversion has to be adapted since some coefficients have negative representations. Two issues arise with the negative coefficients:

1. The evaluation and the integer multiplication propagate borrows between the polynomial coefficients.
2. The negative evaluation algorithm returns a two's complement representation over $N\ell$ bits.

Borrow between the coefficients. The evaluation converts a polynomial to a packed integer representation. In the rest of the Kronecker substitution, the obtained natural numbers are manipulated regardless of the original polynomial structure. Therefore, borrows can be propagated between the coefficients. However, in order to retrieve the expected polynomial result, the radix conversion must compensate for the propagated borrows by propagating back carries.
Let $\tilde{r} = (\tilde{r}_0, \tilde{r}_1, \ldots, \tilde{r}_{N-1})_{\ell} \in \mathbb{N}$ be the integer that we want to convert to a polynomial, where for all $i$, $\tilde{r}_i$ is a two's complement representation over $\ell$ bits of an integer $-2^{\ell-1} < r_i < 2^{\ell-1}$. In order to propagate back the carries, we transform the negative coefficients into non-negative ones by adding a multiple of our modulus $q$, denoted maxValue. More precisely, maxValue is the smallest multiple of $q$ such that for all $i$, $-\text{maxValue} \leq r_i < \text{maxValue}$. Moreover, with the parameters that we use in Section 6, we have $\text{maxValue} < 2^{\ell-1}$. Then, by adding maxValue we get:
– If $r_i < 0$, then $2^{\ell} \leq \tilde{r}_i + \text{maxValue} = 2^{\ell} + r_i + \text{maxValue}$. Therefore a carry is propagated to $\tilde{r}_{i+1}$.
– If $r_i \geq 0$, then $\tilde{r}_i + \text{maxValue} = r_i + \text{maxValue} < 2^{\ell}$.
After adding maxValue, the values $r_i$ are considered as natural numbers represented over $\ell$ bits. Then, the expected polynomial is obtained by using the radix conversion algorithm defined in Equation 2 on $\tilde{r}$.
This negative to non-negative conversion is possible because the polynomial multiplication is done over $R_{q,\delta}$. Indeed, after reduction modulo $q$, the added value maxValue is equal to 0.
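
A small numeric illustration of this compensation, in Python, may help. The toy parameters ($q = 17$, $\ell = 8$) and the helper names are assumptions chosen for readability; they are not parameters from Section 6.

```python
def concat(m, k, n):
    """Repeat m in n consecutive k-bit slots."""
    return sum(m << (j * k) for j in range(n))


def signed_pack(coeffs, l):
    """sum(c_i * 2**(i*l)) mod 2**(n*l): each negative c_i borrows from the next slot."""
    n = len(coeffs)
    return sum(c << (i * l) for i, c in enumerate(coeffs)) % (1 << (n * l))


def recover_mod_q(r, l, n, q, max_value):
    """Add max_value to every slot, then read the l-bit digits and reduce them mod q."""
    r = (r + concat(max_value, l, n)) % (1 << (n * l))
    return [((r >> (i * l)) & ((1 << l) - 1)) % q for i in range(n)]


q, l = 17, 8
coeffs = [5, -3, 7, -1]
max_value = q                 # smallest multiple of q with -max_value <= c_i < max_value
r = signed_pack(coeffs, l)
print(recover_mod_q(r, l, len(coeffs), q, max_value))  # [5, 14, 7, 16]
```

The output matches the coefficients reduced modulo 17, since $-3 \equiv 14$ and $-1 \equiv 16$.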

Two's complement representation of the evaluated polynomial. The second issue is due to the two's complement representation of the evaluated polynomial.
Let $F(X) = f_0 + \ldots + f_{N-1} X^{N-1} \in R^{-}_{q,\delta}$ be of degree $N-1$ and $\ell \in \mathbb{N}$. Then Algorithm 1 returns the integer $f \leftarrow \text{Evaluation}(F(X), k, \ell)$, that is, the two's complement representation of $F(2^{\ell}) \bmod 2^{N\ell}$. Two cases are to be distinguished:
– $f_{N-1} > 0$, then $f = F(2^{\ell}) \in \mathbb{N}$.
– $f_{N-1} < 0$, then $f = 2^{N\ell} + F(2^{\ell})$ is the two's complement of $F(2^{\ell})$ modulo $2^{N\ell}$.
Only the second case leads to a wrong result after the modular multiplication. Indeed, let $g \in \mathbb{N}$ and $f = 2^{N\ell} + F(2^{\ell})$; we get:
$$r \bmod (2^{N\ell} + \delta) = f g \bmod (2^{N\ell} + \delta) = \left(2^{N\ell} g + F(2^{\ell})\, g\right) \bmod (2^{N\ell} + \delta) \neq F(2^{\ell})\, g \bmod (2^{N\ell} + \delta)$$

Then in this case, before the radix conversion we must add $g$ to $r$ or subtract $g$ from $r$, depending on $\delta$:
– $\delta = 1$: $2^{N\ell} g \bmod (2^{N\ell} + 1) = -g \bmod (2^{N\ell} + 1)$, then
$$(r + g) \bmod (2^{N\ell} + 1) = F(2^{\ell})\, g \bmod (2^{N\ell} + 1)$$
– $\delta = -1$: $2^{N\ell} g \bmod (2^{N\ell} - 1) = g \bmod (2^{N\ell} - 1)$, then
$$(r - g) \bmod (2^{N\ell} - 1) = F(2^{\ell})\, g \bmod (2^{N\ell} - 1)$$

Previously, we supposed that at most one polynomial can have negative coefficients. In lattice-based schemes, this is always the case.

Algorithm 2 Radix Conversion
Input: $r, g, \text{maxValue} \in \mathbb{N}$, and $\text{sign} \in \{0, 1\}$
Output: $R(X) \in \mathbb{N}[X]/(X^N + \delta)$
1: max ← concat(maxValue, ℓ, N) // Can be precomputed
2: if sign eq 1 then
3:   if δ eq 1 then r ← add(r, g, Nℓ) // To handle negative last coeff
4:   else r ← sub(r, g, Nℓ)
5: else
6:   if δ eq 1 then dummy ← add(r, g, Nℓ) // For isochrony
7:   else dummy ← sub(r, g, Nℓ)
8: end if
9: r ← add(r, max, Nℓ) // Add maxValue to each coefficient
10: R(X) ← $\text{Radix Conversion}_{\geq 0}(r)$

4.3 Multiplication in $R_{q,\delta}$ using coprocessor

Sections 4.1 and 4.2 yield a polynomial multiplication algorithm in $R_{q,\delta}$ that mainly uses a packed integer representation. More precisely, except for the modular reductions modulo $q$, the operations are done using this representation. All operations performed on the packed integer representation can be achieved with a coprocessor as defined in Section 2.
The polynomial multiplication in $R_{q,\delta}$ is described in Algorithm 3.

Algorithm 3 Polynomial Multiplication in $R_{q,\delta}$
Input: $(F(X), G(X)) \in (R^{-}_{q,\delta}, R_{q,\delta})$ of degree $N-1$. Let $k, \ell, q \in \mathbb{N}$ where $\ell > k$, and maxValue defined as above.
Output: $R(X) = F(X)G(X) \in R_{q,\delta}$
1: f ← Evaluation(F(X), k, ℓ)
2: G(2^ℓ) ← Evaluation≥0(G(X), k, ℓ)
3: r ← modMult(f, G(2^ℓ), Nℓ, Nℓ, 2^{Nℓ} + δ)
4: b ← sign(F[N − 1]) // if F_{N−1} < 0 then b = 1, otherwise b = 0.
5: R(X) ← Radix Conversion(r, G(2^ℓ), maxValue, b)
6: R(X) ← R(X) mod q // Any modular reduction
7: return R(X)
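
The whole flow of Algorithm 3 can be checked against a schoolbook multiplication with the following Python sketch. It rests on several assumptions that are not in the algorithm itself: Python integers replace the coprocessor, the evaluation is computed directly as $F(2^{\ell}) \bmod 2^{N\ell}$ (the value Algorithm 1 returns), the sign test is made on the evaluated integer so that a zero leading coefficient is also covered, and the addition of maxValue is done modulo $2^{N\ell} + \delta$. The toy parameters are chosen for illustration only.

```python
def concat(m, k, n):
    """Repeat m in n consecutive k-bit slots."""
    return sum(m << (j * k) for j in range(n))


def mul_rq(f, g, q, delta, l, max_value):
    """Multiply F (small signed coefficients) by G (coefficients in [0, q))
    in Z_q[X]/(X^N + delta). Lists hold coefficients, lowest degree first."""
    n = len(f)
    size = 1 << (n * l)
    modulus = size + delta
    signed = sum(c << (i * l) for i, c in enumerate(f))   # F(2**l), may be negative
    fe = signed % size                                    # value returned by Algorithm 1
    ge = sum(c << (i * l) for i, c in enumerate(g))       # G(2**l)
    r = (fe * ge) % modulus                               # modMult
    if signed < 0:                                        # fe = 2**(N*l) + F(2**l)
        r = (r + delta * ge) % modulus                    # +g if delta = 1, -g if delta = -1
    r = (r + concat(max_value, l, n)) % modulus           # make every digit non-negative
    return [((r >> (i * l)) & ((1 << l) - 1)) % q for i in range(n)]


def schoolbook(f, g, q, delta):
    """Reference multiplication, using X^N = -delta."""
    n = len(f)
    r = [0] * n
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < n:
                r[i + j] += a * b
            else:
                r[i + j - n] -= delta * a * b
    return [c % q for c in r]


q, l, max_value = 17, 9, 136          # 136 = 8 * 17 bounds every |r_i| here
f, g = [1, -2, 0, -1], [3, 16, 5, 11]
for delta in (1, -1):
    print(mul_rq(f, g, q, delta, l, max_value) == schoolbook(f, g, q, delta))  # True
```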

In the following section we determine how to perform modular reductions

modulo q using packed integers representation.

5 Reducing coefficients modulo q

In Section 4.3, we perform polynomial multiplication in $R_{q,\delta}$. However, the reduction modulo $q$ is done after the radix conversion, on a polynomial representation. In this section we show how to perform the reduction modulo $q$ using a packed integer representation. As mentioned previously, such a representation allows existing RSA/ECC coprocessors to be repurposed.
Let $r = (r_0, \ldots, r_{N-1})_{\ell} \in \mathbb{N}$. In our context, $r$ is obtained after the first two steps of the Kronecker substitution: polynomial evaluation and modular integer multiplication. Moreover, we have added maxValue as in Section 4.2. Then, for all $i$: $0 \leq r_i < 2\,\text{maxValue}$.
In the following, we call simultaneous reduction the reduction of all the $r_i \bmod q$ by performing operations on $r$.

5.1 Power-of-two modulus

Some lattice-based schemes, like Saber [9] and NTRU [7], use a power-of-two modulus. In this context, the simultaneous reduction is easy and fast. Indeed, the simultaneous reduction is achieved by the computation:

$$r \,\&\, \text{concat}(q - 1, \ell, N)$$
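
As a quick illustration in Python, with toy values ($q = 16$, $\ell = 8$, $N = 3$) and an illustrative function name:

```python
def concat(m, k, n):
    """Repeat m in n consecutive k-bit slots."""
    return sum(m << (j * k) for j in range(n))


def simultaneous_reduce_pow2(r, q, l, n):
    """Reduce every l-bit slot of r modulo q, where q is a power of two."""
    assert q & (q - 1) == 0 and q <= 1 << l
    return r & concat(q - 1, l, n)


r = 0x1F_2A_07                    # three 8-bit slots holding 0x07, 0x2A, 0x1F
print(hex(simultaneous_reduce_pow2(r, 16, 8, 3)))  # 0xf0a07
```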

5.2 Prime modulus

Kyber [4] is a lattice-based KEM which performs polynomial multiplication over $R_{q,\delta}$, where $q$ is a prime number. In this section we adapt the Barrett reduction [3] to perform simultaneous reduction.

Barrett. The Barrett reduction is introduced in [3]. The main idea is to precompute an approximation of a division and use it to perform modular reduction. Let $\alpha, \beta \in \mathbb{Z}$ and $a \in \mathbb{N}$ be an integer to reduce modulo $q \in \mathbb{N}$ of bit-length $k \in \mathbb{N}$. Barrett reduction precomputes $m = \lfloor 2^{k+\alpha} / q \rfloor$ and computes:
$$a' = a - \left[ \left( (a \gg (k + \beta)) \cdot m \right) \gg (\alpha - \beta) \right] q$$
A special case is when $\alpha = \beta$; the computation then becomes
$$a' = a - [a \gg (k + \beta)] \cdot m \cdot q$$
In this case, only one shift and one multiplication are performed ($m \cdot q$ is precomputed).
Depending on the parameters $(\alpha, \beta)$, $a' = a \bmod q + tq$ where $0 \leq t < \lfloor a/q \rfloor$. Further details on the Barrett algorithms are given in [13].
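
The scalar formula can be tried out with a few lines of Python. The parameters $\alpha = 13$, $\beta = -1$ and the input range $a < 2^{25}$ are assumptions picked for this illustration with the Kyber modulus $q = 3329$; they are not necessarily the parameters used in Section 6.

```python
def barrett_reduce(a, q, alpha, beta):
    """Return a - floor-estimate(a / q) * q, i.e. a mod q plus a small multiple of q."""
    k = q.bit_length()
    m = (1 << (k + alpha)) // q                   # precomputed once per modulus
    quotient = ((a >> (k + beta)) * m) >> (alpha - beta)
    return a - quotient * q


q = 3329                                          # Kyber modulus, bit-length 12
for a in (0, 3328, 3329, 123456, 2 * 3329 * 3329):
    r = barrett_reduce(a, q, alpha=13, beta=-1)
    assert r % q == a % q and 0 <= r < 3 * q      # congruent, at most 2q too large
```

With these choices the quotient estimate is never more than 2 below $\lfloor a/q \rfloor$, so the result stays below $3q$ and needs at most two conditional subtractions.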

Simultaneous modular reduction. To adapt this reduction to simultaneous reduction, we need to perform a logical AND after each shift operation. Indeed, because of the shift operations, noise coming from coefficient $i+1$ can overflow into coefficient $i$. Algorithm 4 describes the simultaneous Barrett reduction.

Algorithm 4 Simult. Barrett_{α,β}
Input: $r = (r_0, \ldots, r_{N-1})_{\ell} \in \mathbb{N}$. Let $q \in \mathbb{N}$ of bit-length $k$ and $m = \lfloor 2^{k+\alpha} / q \rfloor$.
Output: $r' = (r'_0, \ldots, r'_{N-1})_{\ell} \in \mathbb{N}$, where all $r_i$ are reduced with Barrett reduction
1: mask ← concat(2^{ℓ−α+β} − 1, ℓ, N) // Can be precomputed
2: mask′ ← concat(2^{ℓ−k−β} − 1, ℓ, N) // Can be precomputed
3: tmp ← rshift(r, k + β, Nℓ)
4: tmp ← and(tmp, mask′, Nℓ)
5: tmp ← mult(tmp, m, Nℓ, 32) // Mult between a word and a large integer
6: tmp ← rshift(tmp, α − β, Nℓ)
7: tmp ← and(tmp, mask, Nℓ)
8: tmp ← mult(tmp, q, Nℓ, 32) // Mult between a word and a large integer
9: r′ ← sub(r, tmp, Nℓ)
10: return r′
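
Algorithm 4 translates almost line by line into Python. The sketch below assumes $\ell = 32$, $q = 3329$, $\alpha = 13$, $\beta = -1$ and slot values below $2^{25}$, so that no intermediate product overflows its slot; these values and the helper names pack and unpack are illustrative choices, not taken from Section 6.

```python
def concat(m, k, n):
    """Repeat m in n consecutive k-bit slots."""
    return sum(m << (j * k) for j in range(n))


def simult_barrett(r, q, l, n, alpha, beta):
    """Barrett-reduce every l-bit slot of r with whole-integer operations."""
    k = q.bit_length()
    m = (1 << (k + alpha)) // q
    mask = concat((1 << (l - alpha + beta)) - 1, l, n)
    mask_p = concat((1 << (l - k - beta)) - 1, l, n)
    tmp = (r >> (k + beta)) & mask_p        # r_i >> (k + beta) in every slot
    tmp = tmp * m                           # slot-wise product, must stay below 2**l
    tmp = (tmp >> (alpha - beta)) & mask    # quotient estimate in every slot
    tmp = tmp * q
    return r - tmp                          # no borrow: estimate * q <= r_i


def pack(values, l):
    return sum(v << (i * l) for i, v in enumerate(values))


def unpack(r, l, n):
    return [(r >> (i * l)) & ((1 << l) - 1) for i in range(n)]


q, l = 3329, 32
values = [0, 3328, 123456, 2 * 3329 * 3329]
reduced = unpack(simult_barrett(pack(values, l), q, l, len(values), 13, -1), l, len(values))
print(all(x % q == v % q and 0 <= x < 3 * q for x, v in zip(reduced, values)))  # True
```

The two AND operations discard the bits that the right shifts pull in from the neighbouring slot, which is the role of mask and mask′ in the algorithm.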

Final reduction. Using the simultaneous Barrett reduction (Algorithm 4), the returned result $r' = (r'_0, \ldots, r'_{N-1})_{\ell} \in \mathbb{N}$ is such that, for all $i$, $r'_i = r_i \bmod q + t_i q$. With the parameter sets that we use in Section 6, for all $i$, $t_i \in \{0, 1, 2\}$.
Let $k$ and $c$ be such that $q = 2^k - c$. Then $r'_i \geq 2q$ if and only if $r'_i + 2c$ has its $(k+1)$-th bit equal to one. This fact is used in Algorithm 5 to detect the coefficients $\geq 2q$ and subtract $q$ from them in a packed integer representation.
After using Algorithm 5, the $r''_i$ are bounded by $2q$. The algorithm can then be applied a second time, replacing $2c$ by $c$ (line 1) and $k+1$ by $k$ (line 3). It follows that $q$ is subtracted from each $r''_i \geq q$. Afterwards, each $r''_i$ is necessarily lower than $q$.

Algorithm 5 Simult. Conditional Subtraction

Input: r′ = (r′_0, ..., r′_{N−1})_ℓ with all 0 ≤ r′_i < 3q, where q = 2^k − c, ℓ, N ∈ ℕ.
Output: r′′ = (r′′_0, ..., r′′_{N−1})_ℓ with all 0 ≤ r′′_i < 2q
1: (C, mask) ← (concat(2c, ℓ, N), concat(1, ℓ, N)) // Can be precomputed
2: tmp ← add(r′, C, N ℓ) // Raises the (k + 1)-th bit in each coeff ≥ 2q
3: tmp ← rshift(tmp, k + 1, N ℓ) // Move the (k + 1)-th bit to position 0 in each coeff
4: tmp ← and(tmp, mask, N ℓ) // Detect the coeff ≥ 2q
5: tmp ← mult(tmp, q, N ℓ, 32) // Mult between a word and a large integer
6: r′′ ← sub(r′, tmp, N ℓ) // Subtract q from each coeff ≥ 2q
7: return r′′
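
The conditional subtraction can be sketched in the same way. In the sketch
below, the function simult_cond_sub and its `multiple` argument are
hypothetical names: multiple = 2 follows Algorithm 5 as written, and
multiple = 1 is the adapted second pass (c instead of 2c, k instead of k + 1).
The parameters q = 3329, k = 12 and ℓ = 23 are assumed for the example.

```python
def concat(value, ell, n):
    """Pack n copies of `value` into consecutive ell-bit slots."""
    return sum(value << (ell * i) for i in range(n))

def pack(coeffs, ell):
    return sum(c << (ell * i) for i, c in enumerate(coeffs))

def unpack(r, ell, n):
    slot = (1 << ell) - 1
    return [(r >> (ell * i)) & slot for i in range(n)]

def simult_cond_sub(r, q, k, ell, n, multiple=2):
    """Subtract q from every slot that is >= multiple * q (multiple is 2 or 1)."""
    c = (1 << k) - q
    big_c = concat(multiple * c, ell, n)   # line 1
    mask = concat(1, ell, n)
    tmp = r + big_c                        # line 2: sets the detection bit
    tmp >>= k + multiple - 1               # line 3: move that bit to position 0
    tmp &= mask                            # line 4: keep one flag bit per slot
    tmp *= q                               # line 5: q where the flag is set, else 0
    return r - tmp                         # line 6

Q, K, ELL = 3329, 12, 23
values = [0, 3328, 3329, 6657, 6658, 9986]   # all below 3q
n = len(values)
first = simult_cond_sub(pack(values, ELL), Q, K, ELL, n, multiple=2)
second = simult_cond_sub(first, Q, K, ELL, n, multiple=1)
```

After the first pass every slot is below 2q, and after the second pass every
slot equals the input value modulo q.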

5.3 Modular polynomial multiplication using coprocessor

Algorithm 6 performs polynomial multiplication in R_{q,δ} using operations on
the packed integers representation. All operations performed on this
representation can be carried out by a coprocessor as defined in Section 2.

Algorithm 6 Modular Polynomial Multiplication

Input: (F(X), G(X)) ∈ (R^−_{q,δ}, R_{q,δ}) of degree N − 1. Let k, ℓ, q ∈ ℕ where ℓ > k, and
maxValue defined as above.
Output: R(X) = F(X)G(X) ∈ R_{q,δ}
1: max ← concat(maxValue, ℓ, N) // Precomputed
2: (f, G(2^ℓ)) ← (Evaluation(F(X), ℓ), Evaluation_{≥0}(G(X), ℓ))
3: r ← modMult(f, G(2^ℓ), N ℓ, N ℓ, 2^{Nℓ} ± δ)
4: b ← sign(f[N − 1])
5: if b eq 1 then
6:     if δ eq 1 then r ← sub(r, G(2^ℓ), N ℓ) // To handle negative last coeff
7:     else r ← add(r, G(2^ℓ), N ℓ)
8: else
9:     if δ eq 1 then dummy ← sub(r, G(2^ℓ), N ℓ) // For isochrony
10:    else dummy ← add(r, G(2^ℓ), N ℓ)
11: end if
12: r ← add(r, max, N ℓ) // Negative to non-negative representation for all r′_i
13: if q eq 2^k then
14:    mask′ ← concat(2^k − 1, ℓ, N)
15:    r ← and(r, mask′, N ℓ)
16: else
17:    r ← Simult. Barrett(r)
18:    r ← Simult. Cond. Sub.(r, ℓ, N) // Can be applied twice if some r_i ≥ 2q
19: end if
20: R(X) ← Radix Conversion(r)
21: return R(X)

The Modular Polynomial Multiplication Algorithm 6 works as follows:

1. Line 2: Polynomial evaluations defined in Equation 1 and Algorithm 1.

2. Line 3: Modular integer multiplication modulo 2^{Nℓ} + δ of the evaluated
polynomials.
3. Lines 4 to 11: Handle the two's complement representation of the evaluated
polynomial; see Section 4.2.
4. Line 12: Convert the negative representation to a non-negative one; see
Section 4.2. This operation makes it possible to perform the simultaneous
reduction mod q and the radix conversion.
5. Lines 13 to 19: Perform the simultaneous reduction mod q. This ensures that
the polynomial result has coefficients reduced mod q.
6. Line 20: Radix conversion defined in Equation 2 to obtain a polynomial
result.
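
The overall flow of these steps can be checked on small inputs with the
following sketch. It is not the authors' implementation: Python's signed
big integers replace the two's complement handling of lines 4 to 11, and the
reduction mod q is done slot by slot with `%` instead of the packed Barrett
and conditional subtraction. The names mpm_sketch and schoolbook_reference,
the choice maxValue = σ·q·N and the slot width ℓ are assumptions for the
example.

```python
import random

def mpm_sketch(F, G, q, sigma, delta):
    """Multiply F and G in Z_q[X]/(X^N + delta) through one integer product.

    F has coefficients in [-sigma, sigma], G has coefficients in [0, q).
    """
    n = len(F)
    max_value = sigma * q * n            # multiple of q that bounds |R_i|
    ell = (2 * max_value).bit_length()   # slot width: 2 * max_value < 2**ell
    modulus = (1 << (n * ell)) + delta
    # Evaluation at 2**ell (F(2**ell) may be a negative Python integer).
    f = sum(c << (ell * i) for i, c in enumerate(F))
    g = sum(c << (ell * i) for i, c in enumerate(G))
    # Adding max_value to every slot makes all slots non-negative.
    shift = sum(max_value << (ell * i) for i in range(n))
    r = (f * g + shift) % modulus
    # Radix conversion, then reduction of each slot mod q.
    slot = (1 << ell) - 1
    return [((r >> (ell * i)) & slot) % q for i in range(n)]

def schoolbook_reference(F, G, q, delta):
    """Direct product modulo X^N + delta, used only to check the sketch."""
    n = len(F)
    R = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                R[i + j] += F[i] * G[j]
            else:
                R[i + j - n] -= delta * F[i] * G[j]
    return [c % q for c in R]

random.seed(0)
q, n, sigma = 3329, 8, 3
F = [random.randint(-sigma, sigma) for _ in range(n)]
G = [random.randrange(q) for _ in range(n)]
```

Because maxValue is a multiple of q, adding it to every slot does not change
the result modulo q, which is why the final per-slot reduction gives the
coefficients of F(X)G(X) directly.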

6 Applications and Results

In this section, after some preliminaries, we present the component on which
we performed our experiments and the results obtained by implementing the
Modular Polynomial Multiplication (MPM), cf. Algorithm 6, together with another
polynomial multiplication that depends on the evaluated scheme. The evaluated
lattice-based algorithms are: Kyber, Dilithium, NTRU, and Saber.

6.1 Background

NTT. The NTT is an algorithm that performs fast polynomial multiplication in
R_{q,1} [16]. Given a and b ∈ R_{q,1}, a × b is computed as
NTT^{-1}(NTT(a) ◦ NTT(b)), where ◦ is the coefficient-wise multiplication.
Theoretically, the NTT has the best asymptotic complexity for multiplication
in R_{q,1}. However, in constrained environments (e.g. smart cards), devices
may have dedicated hardware to perform fast large-integer arithmetic. In this
context, the NTT can be outperformed by an algorithm relying on integer
arithmetic, even if its theoretical complexity is worse than that of the NTT.
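
To make the formula NTT^{-1}(NTT(a) ◦ NTT(b)) concrete, here is a minimal
sketch on toy parameters (q = 17, N = 4, ψ = 2 as a primitive 2N-th root of
unity, all chosen for the example). It uses a naive quadratic-time transform
and the usual pre- and post-multiplication by powers of ψ to obtain the
product modulo X^N + 1. It is not the NTT used in Kyber.

```python
def ntt(a, omega, q):
    """Naive O(N^2) number theoretic transform with N-th root of unity omega."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def negacyclic_mul(a, b, q, psi):
    """Compute a * b in Z_q[X]/(X^N + 1); psi is a primitive 2N-th root of unity."""
    n = len(a)
    omega = psi * psi % q
    # Twist by powers of psi so that the cyclic NTT yields a negacyclic product.
    a_t = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_t = [b[i] * pow(psi, i, q) % q for i in range(n)]
    # Coefficient-wise (pointwise) multiplication in the NTT domain.
    c_hat = [x * y % q for x, y in zip(ntt(a_t, omega, q), ntt(b_t, omega, q))]
    # Inverse transform: use omega^{-1} and scale by N^{-1}.
    inv_n = pow(n, -1, q)
    c_t = [inv_n * x % q for x in ntt(c_hat, pow(omega, -1, q), q)]
    # Undo the twist.
    inv_psi = pow(psi, -1, q)
    return [c_t[i] * pow(inv_psi, i, q) % q for i in range(n)]

product = negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8], 17, 2)
```

The modular inverses rely on pow with a negative exponent, available from
Python 3.8.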

Subdivision RSA/ECC coprocessors perform integer arithmetic on data held in
buffers whose size has a fixed limit. In our context, after polynomial
evaluation, the resulting integer is generally too large to fit in these
buffers. In that case we use multiple-precision arithmetic, which consists of
dividing the manipulated integers into several smaller ones and then
performing operations on these smaller integers.
In the case of integer multiplication we use two techniques to divide the
integer multiplication into smaller ones: Karatsuba and Schoolbook. Let
f = f_I + f_S 2^{Nℓ/2} and g = g_I + g_S 2^{Nℓ/2}, where f_I, f_S, g_I, g_S are
lower than 2^{Nℓ/2}.

Schoolbook: f g = f_I g_I + (f_I g_S + f_S g_I) 2^{Nℓ/2} + 2^{Nℓ} f_S g_S

Karatsuba: f g = f_I g_I + ((f_I + f_S)(g_I + g_S) − f_I g_I − f_S g_S) 2^{Nℓ/2} + 2^{Nℓ} f_S g_S

These techniques can be applied recursively in order to reach a targeted
integer size. Later on, when presenting the results, a column named
Subdivision specifies the multi-precision method used for the integer
multiplication.
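
The two identities can be checked with a few lines of Python. The function
names and the 64-bit test operands are assumptions for the example; half_bits
plays the role of Nℓ/2. The schoolbook split uses four half-size products,
the Karatsuba split uses three.

```python
import random

def split(x, half_bits):
    """Return (low, high) such that x = low + high * 2**half_bits."""
    return x & ((1 << half_bits) - 1), x >> half_bits

def schoolbook_split(f, g, half_bits):
    f_i, f_s = split(f, half_bits)
    g_i, g_s = split(g, half_bits)
    # Four half-size multiplications.
    return (f_i * g_i
            + ((f_i * g_s + f_s * g_i) << half_bits)
            + ((f_s * g_s) << (2 * half_bits)))

def karatsuba_split(f, g, half_bits):
    f_i, f_s = split(f, half_bits)
    g_i, g_s = split(g, half_bits)
    # Three half-size multiplications; the middle one works on sums that
    # can be one bit longer than the halves.
    low = f_i * g_i
    high = f_s * g_s
    middle = (f_i + f_s) * (g_i + g_s) - low - high
    return low + (middle << half_bits) + (high << (2 * half_bits))

random.seed(2)
f = random.getrandbits(64)
g = random.getrandbits(64)
```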

Evaluation point. In our context the Karatsuba subdivision requires
increasing the size of the evaluation point by 1 bit at each subdivision. This
is due to the computation (f_I + f_S)(g_I + g_S): it is performed on integers
that are half as long but whose values can be twice as large.
In the following results, the evaluation point is chosen to take into account
the negative coefficients and the Karatsuba subdivisions.
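
As a worked check, the sketch below applies one possible sizing rule: the
number of bits needed for maxValue, plus one bit when the coefficients of
F(X) are signed, plus one bit per Karatsuba subdivision. This rule is an
assumption, not a formula stated by the authors; it is shown because it
reproduces the values of ℓ reported in Tables 2 and 5. The function name
evaluation_point_bits is hypothetical.

```python
def evaluation_point_bits(max_value, signed, karatsuba_levels):
    """Assumed rule: ceil(log2(max_value)) + sign bit + 1 bit per Karatsuba level."""
    magnitude_bits = (max_value - 1).bit_length()   # ceil(log2(max_value))
    return magnitude_bits + (1 if signed else 0) + karatsuba_levels

Q_KYBER, Q_SABER, Q_NTRU = 3329, 8192, 2048

ROWS = {
    # maxValue, signedness and number of Karatsuba calls as listed in the tables.
    "Kyber D3": evaluation_point_bits(3 * Q_KYBER * 256, True, 0),
    "Kyber Uq": evaluation_point_bits(Q_KYBER ** 2 * 256, False, 2),
    "Saber D5": evaluation_point_bits(5 * Q_SABER * 256, True, 0),
    "Saber Uq": evaluation_point_bits(Q_SABER ** 2 * 256, False, 2),
    "NTRU Uq": evaluation_point_bits(Q_NTRU ** 2 * 509, False, 3),
}
```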

Polynomial distribution The following polynomial multiplications are
performed between a polynomial G(X) ∈ R_{q,δ} and F(X) ∈ R^−_{q,δ}. More
precisely, the coefficients of G(X) are sampled uniformly in {0, ..., q − 1}
and the coefficients of F(X) are sampled from a distribution D_σ. Using a
distribution D_σ, the coefficients are represented in {−σ, ..., 0, ..., σ}.

Masked secret polynomial. Most of the time the polynomial using the
distribution D_σ is the secret polynomial. In some use cases, an embedded
implementation must be strongly secured against side-channel attacks. One way
to do this is to mask the secret data: we split the sensitive data into shares
x = x_1 + x_2 mod q, where x_1, x_2 belong to {0, ..., q − 1}, and then we
process the operations on each share separately. In our context the value q is
much larger than the range of the secret distribution. The shares are
therefore larger than the original secret data, which increases the evaluation
point. For some assessments, in order to take this security requirement into
account, we suppose that the polynomial F(X) is defined over R_{q,δ} and that
its coefficients are sampled uniformly in {0, ..., q − 1}. In the following
results, we denote this case by the U_q distribution.
In the following results, we only specify the distribution of F(X).
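
The share splitting can be illustrated as follows. This is only a sketch of
first-order arithmetic masking: the function names are hypothetical, the
schoolbook product modulo X^N + 1 stands in for any multiplication in
R_{q,1}, and nothing here is hardened against side channels.

```python
import secrets

def share_polynomial(F, q):
    """Split each coefficient x into (x1, x2) with x = x1 + x2 mod q."""
    F1 = [secrets.randbelow(q) for _ in F]
    F2 = [(c - s) % q for c, s in zip(F, F1)]
    return F1, F2

def negacyclic_schoolbook(F, G, q):
    """Product modulo (X^N + 1, q), used as a stand-in multiplication."""
    n = len(F)
    R = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                R[i + j] += F[i] * G[j]
            else:
                R[i + j - n] -= F[i] * G[j]
    return [c % q for c in R]

q = 3329
F = [-3, 0, 2, 1]                 # small secret coefficients (D_3 style)
G = [17, 3328, 1234, 5]           # public coefficients in {0, ..., q - 1}
F1, F2 = share_polynomial(F, q)   # both shares look uniform in {0, ..., q - 1}

# Each share is multiplied separately; the results are recombined mod q.
masked = [(a + b) % q for a, b in zip(negacyclic_schoolbook(F1, G, q),
                                      negacyclic_schoolbook(F2, G, q))]
```

Each share takes values in the whole range {0, ..., q − 1} even though the
secret coefficients are small, which is why the U_q case needs a larger
evaluation point.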

Target Assessments are done on a smart card component using a 32-bit
architecture. In the following we refer to this device as Component A. For
intellectual property reasons, neither the component name nor a detailed
description can be given. We therefore only give the main characteristics of
Component A:

– Standard 32-bit instructions (add, sub, shifts, bitwise and, xor, or, etc.).
– No CPU multiplication and division.
– A coprocessor which handles: logical AND, addition, subtraction, shifts,
modular integer multiplication and non-modular integer multiplication.

The following results take into account a complete modular reduction.
Moreover, like the previous works [1,18,5,10], we assume that the inputs are
already in the appropriate machine representation. This implies that the
inputs are in:

– Polynomial representation for NTT, Karatsuba and schoolbook polynomial
multiplication.
– Packed integers representation for the MPM algorithm.

6.2 Results

Kyber Kyber [4] is a lattice-based KEM finalist of the NIST standardization
process. The polynomial ring defined in Kyber is R_{q,1} = ℤ_q[X]/(X^N + 1),
where q = 3329 and N = 256. The polynomial multiplication used in the
specification is the NTT algorithm. In this context, we have implemented two
polynomial multiplications:

– An NTT multiplication. It is adapted from the reference implementation in
order to use the hardware Montgomery multiplication. Tables of roots of unity
have been recomputed to handle the Montgomery arithmetic with R = 2^32, the
smallest value handled by the coprocessor, instead of R = 2^16. In addition,
the multiplication followed by a Montgomery reduction is replaced by a call
to the coprocessor Montgomery multiplication. Table 1 presents the timings of
the NTT implementation.

          NTT    Pointwise    NTT^{-1}

Cycles    98k    40k          106k
Table 1. Kyber NTT cycles on Component A
– The modular polynomial multiplication (MPM) described in Algorithm 6. For
this algorithm we consider two distributions for the polynomial F(X):
  • D_3. In this case the reduction modulo q is done using
    Simult. Barrett_{11,0}. In order to completely reduce the coefficients we
    perform 2 final subtractions using the technique described in Section 5.2.
  • U_q. In this case the reduction modulo q is done using
    Simult. Barrett_{10,10} followed by an application of
    Simult. Barrett_{13,−2}. Afterwards, a final subtraction is performed
    using the technique described in Section 5.2.
In Table 2, we describe the parameters used for the MPM algorithm. More
precisely, we give ℓ such that the evaluation point is 2^ℓ, the maximum value
used to convert negative coefficients to non-negative ones, the subdivision
used and the resulting cycle counts.

Distribution    ℓ     maxValue    Subdivision             Cycles MPM
D_3             23    3qn         None                    50k
U_q             34    q^2 n       2 calls to Karatsuba    67k
Table 2. Parameters and cost of one multiplication in R_{q,1} for Kyber parameters
Comparison. The previous results account for one execution of the MPM
algorithm and of each NTT routine. To compare the NTT and MPM algorithms, it
is not enough to compare the pointwise routine with the MPM algorithm: the
calls to the NTT and NTT^{-1} routines must also be taken into account. We
must therefore determine how many times each routine is called.
Table 3 gives the number of calls to NTT, pointwise multiplication and
NTT^{-1} during the Key Generation, Encrypt and Decrypt routines. The number
of calls depends on Kyber's security parameter k ∈ {2, 3, 4}. Note that the
number of pointwise multiplications matches the number of MPM calls.

            NTT    Pointwise/MPM    NTT^{-1}

Key Gen.    2k     k^2              0
Encrypt     k      k^2 + k          k + 1
Decrypt     k      k                1
Table 3. Number of calls to NTT routines in Kyber
In order to fairly compare the NTT and MPM algorithms we use:
– The official specification of Kyber for the NTT algorithm. The private and
public keys are stored in the NTT domain.
– A tweaked version of Kyber for the MPM algorithm. The private and public
keys are not stored in the NTT domain. Therefore, we do not need to apply
NTT^{-1} to perform the MPM algorithm.

The MPM algorithm is called with the Uq distribution parameters.

            Total cycles NTT    Total cycles MPM    Ratio (NTT/MPM)
k = 2
Key Gen.    552k                268k                2
Encrypt     754k                402k                1.9
Decrypt     382k                134k                2.9
k = 3
Key Gen.    948k                603k                1.6
Encrypt     1198k               804k                1.5
Decrypt     520k                201k                2.6
k = 4
Key Gen.    1424k               1072k               1.3
Encrypt     1722k               1340k               1.3
Decrypt     658k                268k                2.5
Table 4. Cycle count for all multiplications in Kyber for the U_q distribution
parameters
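
The totals of Table 4 can be recomputed from the per-routine costs of
Table 1, the U_q cost of Table 2 and the call counts of Table 3. The short
script below does this arithmetic; the names are chosen for the example and
all costs are in thousands of cycles.

```python
# Costs in kilo-cycles: Table 1 (NTT routines) and Table 2 (MPM, U_q case).
NTT_COST, POINTWISE_COST, INTT_COST = 98, 40, 106
MPM_COST = 67

def call_counts(k):
    """Table 3: (NTT calls, pointwise or MPM calls, inverse NTT calls)."""
    return {
        "Key Gen.": (2 * k, k * k, 0),
        "Encrypt": (k, k * k + k, k + 1),
        "Decrypt": (k, k, 1),
    }

def totals(k):
    """Return {routine: (total NTT kilo-cycles, total MPM kilo-cycles)}."""
    result = {}
    for routine, (ntt, pointwise, intt) in call_counts(k).items():
        ntt_total = ntt * NTT_COST + pointwise * POINTWISE_COST + intt * INTT_COST
        mpm_total = pointwise * MPM_COST   # one MPM call per pointwise product
        result[routine] = (ntt_total, mpm_total)
    return result

TABLE_4 = {k: totals(k) for k in (2, 3, 4)}
```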

Saber & NTRU

Saber. Saber [9] is a lattice-based KEM finalist of the NIST standardization
process. The polynomial ring used in Saber is R_{q,1} = ℤ_q[X]/(X^N + 1),
where N = 256 and q = 8192 = 2^13. In this work we consider two distributions
for the polynomial F(X):

– D_5. Other distributions are used in Saber. However we only describe the
worst one for the MPM algorithm.
– U_q.

Since the modulus is a power of two, the reductions are achieved using a logical
AND with the appropriate mask.
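
For a power-of-two modulus, lines 13 to 15 of Algorithm 6 amount to a single
AND on the packed integer. A minimal sketch, with helper names and sample
values assumed for the example (k = 13 and ℓ = 25 as in the Saber D_5 case):

```python
def concat(value, ell, n):
    """Pack n copies of `value` into consecutive ell-bit slots."""
    return sum(value << (ell * i) for i in range(n))

def pack(coeffs, ell):
    return sum(c << (ell * i) for i, c in enumerate(coeffs))

def unpack(r, ell, n):
    slot = (1 << ell) - 1
    return [(r >> (ell * i)) & slot for i in range(n)]

def reduce_pow2(r, k, ell, n):
    """Reduce every ell-bit slot of r modulo 2**k with one AND."""
    mask = concat((1 << k) - 1, ell, n)
    return r & mask

K, ELL = 13, 25
values = [8191, 8192, 8193, 123456]
reduced = unpack(reduce_pow2(pack(values, ELL), K, ELL, len(values)),
                 ELL, len(values))
```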

NTRU. NTRU [7] is also a KEM finalist of the NIST competition. The polynomial
ring used in NTRU is R_{q,−1} = ℤ_q[X]/(X^N − 1). The modulus q and the value
N depend on the security parameters. In this work we only consider the NTRU
HPS 1 parameters, where N = 509 and q = 2048 = 2^11.
This value of N does not lend itself to subdivisions. To overcome this issue,
we work on polynomials with Ñ = 512 coefficients where the last coefficients
are equal to 0.
In this work, we consider only a U_q distribution. Since q is a power of two,
the modular reductions are performed with a logical AND.

Comparison. The Saber and NTRU MPM algorithms are compared with the
polynomial multiplication used in their reference implementations.

– Saber: A combination of 4-way Toom-Cook and Karatsuba algorithms.

– NTRU: A schoolbook multiplication.

The polynomial multiplications of the reference implementations are performed
with the 32-bit coprocessor multiplication. Table 5 gives the results
obtained on Component A.

Distribution    ℓ     maxValue    Subdivision             Cycles MPM    Cycles ref.

Saber
D_5             25    5qn         None                    47k           1405k
U_q             36    q^2 n       2 calls to Karatsuba    61k           1405k
NTRU
U_q             34    q^2 n       3 calls to Karatsuba    173k          17256k
Table 5. Parameters and cost of one multiplication in R_{q,δ} for Saber and NTRU
parameters

7 Conclusion

In this paper we continue the previous works that optimize lattice-based
schemes by re-purposing today's RSA/ECC coprocessors. We propose an
algorithm, called MPM, which performs modular polynomial multiplication using
coprocessor instructions. More precisely, our work makes it possible to
repurpose an existing coprocessor to handle the modular reductions and the
negative coefficients during the polynomial multiplication.
We then assess the MPM algorithm in practice for almost all NIST
lattice-based finalists. This assessment is done on a component which has few
CPU instructions and whose asymmetric cryptographic efficiency relies on its
RSA/ECC coprocessor. The MPM algorithm is compared to software polynomial
multiplications such as NTT or Karatsuba. The small number of CPU
instructions limits the assembly optimizations available to the software
algorithms. Therefore, on this component, our multiplication algorithm brings
a significant speed-up: a factor of 1.3 to 2.9 for Kyber (Table 4) and more
than 20 for Saber and NTRU (Table 5).
This shows that re-purposing standard asymmetric coprocessors to speed up
lattice-based cryptography is of interest, especially in a context of hybrid
cryptography deployment.

References

1. Albrecht, M.R., Hanser, C., Hoeller, A., Pöppelmann, T., Virdia, F., Wallner, A.:
Implementing RLWE-based Schemes Using an RSA Co-Processor. IACR Transactions
on Cryptographic Hardware and Embedded Systems pp. 169–208 (2019)
2. ANSSI: Technical position paper - ANSSI views on the Post-Quantum
Cryptography transition, available at
https://www.ssi.gouv.fr/publication/anssi-views-on-the-post-quantum-cryptography-transition/
3. Barrett, P.: Implementing The Rivest Shamir And Adleman Public Key Encryption
On A Standard Digital Signal Processor. CRYPTO' 86. CRYPTO 1986. Lecture
Notes in Computer Science, vol 263. Springer, Berlin, Heidelberg. pp. 1156–1158
(1986)
4. Bos, J., Ducas, L., Kiltz, E., Lepoint, T., Lyubashevsky, V., Schanck, J.M.,
Schwabe, P., Seiler, G., Stehlé, D.: CRYSTALS – Kyber: a CCA-secure
module-lattice-based KEM. Cryptology ePrint Archive, Report 2017/634 (2017)
5. Bos, J.W., Renes, J., van Vredendaal, C.: Post-quantum cryptography with
contemporary co-processors. USENIX (2021)
6. BSI: Migration zu Post-Quanten-Kryptografie - Handlungsempfehlungen des BSI
7. Chen, C., Danba, O., Hoffstein, J., Hülsing, A., Rijneveld, J., Schanck, J.M.,
Schwabe, P., Whyte, W., Zhang, Z.: NTRU (2020)
8. for Cryptography Research, C.A.: National cryptographic algorithm design
competition (2018)
9. D'Anvers, J.P., Karmakar, A., Roy, S.S., Vercauteren, F.: Saber: Module-LWR
based key exchange, CPA-secure encryption and CCA-secure KEM. Cryptology ePrint
Archive, Report 2018/230 (2018)
10. Greuet, A., Montoya, S., Renault, G.: Speeding-up ideal lattice-based key exchange
using a RSA/ECC coprocessor. IACR Cryptol. ePrint Arch. p. 1602 (2020)
11. Harvey, D.: Faster polynomial multiplication via multipoint Kronecker substitution
(2007)
12. Kronecker, L.: Grundzüge einer arithmetischen Theorie der algebraischen Grössen.
(Abdruck einer Festschrift zu Herrn E. E. Kummers Doctor-Jubiläum, 10. September
1881.). Journal für die reine und angewandte Mathematik 92, 1–122 (1882)
13. Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied
Cryptography. CRC press (2018)
14. Moody, D.: Post-Quantum Cryptography NIST's Plan for the Future (2016)
15. Moody, D., Alagic, G., Apon, D.C., Cooper, D.A., Dang, Q.H., Kelsey, J.M., Liu,
Y.K., Miller, C.A., Peralta, R.C., Perlner, R.A., Robinson, A.Y., Smith Tone, D.C.,
Alperin Sheriff, J.: Status Report on the Second Round of the NIST Post-Quantum
Cryptography Standardization Process. Tech. rep., National Institute of Standards
and Technology (Jul 2020)
16. Nussbaumer, H.J.: Number Theoretic Transforms, pp. 211–240. Springer Berlin
Heidelberg, Berlin, Heidelberg (1982)
17. Shor, P.W.: Polynomial-Time Algorithms for Prime Factorization and Discrete
Logarithms on a Quantum Computer. SIAM J. Comput. 26(5), 1484–1509 (Oct
1997)
18. Wang, B., Gu, X., Yang, Y.: Saber on ESP32. In: Conti, M., Zhou, J., Casalicchio,
E., Spognardi, A. (eds.) Applied Cryptography and Network Security. pp. 421–440.
Springer International Publishing, Cham (2020)

No ratings yet
Efficient Number Theoretic Transform Architecture For CRYSTALS-Kyber
5 pages
Post Quantum Blockchain Security For The Internet of Things Survey and Research Directions
No ratings yet
Post Quantum Blockchain Security For The Internet of Things Survey and Research Directions
28 pages
