Modular Polynomial Multiplication Using RSA/ECC Coprocessor: Keywords
RSA/ECC coprocessor
1 Introduction
In the next few years, a quantum computer powerful enough to run Shor's
algorithm [17] could emerge. Such a computer would break all cryptography
based on the hardness of integer factorization and discrete logarithms, such as
RSA and Elliptic Curve Cryptography (ECC). Because of this potential threat,
national agencies have started to study new proposals (e.g. [6]) and have
initiated the standardization of quantum-safe algorithms [14,8]. The
standardization effort most closely followed by the community is that of the
National Institute of Standards and Technology (NIST),
which was launched in 2016 [14]. This standardization aims to bring together an
2 A. Greuet, S. Montoya, C. Vermeersch
and modular multiplication. Apart from the logical AND, most current
asymmetric coprocessors handle these operations. The logical AND is less common
on current RSA/ECC coprocessors. However, adding this operation to an
existing architecture is easier and cheaper than designing a new one for
polynomial multiplication.
2 Background
RSA/ECC coprocessor. RSA/ECC coprocessors are designed to speed up
RSA or elliptic curve cryptosystems. To do so, these components provide a
range of integer operations. In this work, we assume that we have access to
a component which can perform, at least, addition/subtraction, bitwise AND,
logical shift, multiplication and modular multiplication.
In the following, we say that $b$ is represented over $\ell'$ bits to mean that
the two's complement representation of $b$ is stored in a machine buffer of
$\ell'$ bits.
Let $r$ be an $N\ell$-bit natural number. We denote by $r_i$ the $i$-th digit of
$r$ in base $2^\ell$. In other words, $r = \sum_{i=0}^{N-1} r_i 2^{i\ell}$ with
$0 \leq r_i < 2^\ell$. We use the following notation:
$r = (r_0, r_1, \dots, r_{N-1})_\ell$.
2.2 Notations
Rings. Let $q$ be an integer. Denote by $R_{q,\delta}$ the polynomial ring
$\mathbb{Z}_q[X]/(X^N + \delta)$, where $\delta \in \{-1, 1\}$. We represent an
element $F(X) \in R_{q,\delta}$ as a polynomial of degree at most $N - 1$ with
coefficients in $\{0, \dots, q - 1\}$. $R^-_{q,\delta}$ denotes the elements of
$R_{q,\delta}$ represented by a polynomial of degree at most $N - 1$ with
coefficients in $\{-\frac{q}{2} - 1, \dots, \frac{q}{2}\}$.
Integer operations. In the sequel, the algorithms are described using the following
notations. Their purpose is to clarify the size of the manipulated operands.
$$G(X) = \mathrm{NtoPoly}(g, \ell) = \sum_{i=0}^{N-1} g_i X^i$$
The obtained polynomial G(X) belongs to N[X] and its degree is at most N − 1.
The Kronecker substitution was first introduced in [12]. We give here its main
steps. The idea is to transform a polynomial multiplication into an integer
multiplication by evaluating the polynomials, and then to recover the result
using a radix conversion. In the context of embedded devices,
this transformation is of interest because it allows polynomial multiplication
to be performed on the RSA/ECC coprocessor, which handles multiplication on
integers. In this section we assume that our polynomials are defined over N[X].
The Kronecker substitution multiplies two polynomials F (X) and G(X) using
an integer multiplication. This substitution can be summarized in three steps:
1. Evaluation of $F(X)$ and $G(X)$ at $2^\ell$. The value $\ell$ is chosen such
that all the coefficients after the polynomial multiplication are lower than
$2^\ell$.
2. Integer multiplication $r = F(2^\ell) G(2^\ell)$, $r \in \mathbb{N}$.
3. Get back to polynomial $R(X) \in \mathbb{N}[X]$ using radix conversion on $r$.
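The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation evaluated in this paper: the helper names `poly_to_n`, `n_to_poly` and `kronecker_multiply` are hypothetical, Python's arbitrary-precision integers stand in for the coprocessor, and the choice of ℓ from a simple coefficient bound is one possible way to satisfy the condition of step 1.

```python
def poly_to_n(coeffs, ell):
    """Pack non-negative coefficients into one integer, ell bits per coefficient."""
    return sum(c << (i * ell) for i, c in enumerate(coeffs))


def n_to_poly(r, ell, count):
    """Radix conversion: read back `count` digits of r in base 2**ell."""
    mask = (1 << ell) - 1
    return [(r >> (i * ell)) & mask for i in range(count)]


def kronecker_multiply(f, g):
    """Multiply two polynomials with non-negative coefficients via one integer product."""
    n = max(len(f), len(g))
    # Every product coefficient is at most n * max(f) * max(g) (assumed bound).
    bound = n * max(f) * max(g)
    ell = max(1, bound.bit_length())          # guarantees bound < 2**ell
    r = poly_to_n(f, ell) * poly_to_n(g, ell)  # step 2: single integer multiplication
    return n_to_poly(r, ell, len(f) + len(g) - 1)


print(kronecker_multiply([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```

Because no coefficient of the product reaches 2^ℓ, no carry crosses a digit boundary, so the digits of r are exactly the coefficients of F(X)G(X).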
Evaluation. The first step of the Kronecker substitution is the polynomial
evaluation at $2^\ell$. Since $F(X)$ has coefficients in $\mathbb{N}$
represented over $k$ bits:
$$\mathrm{Evaluation}_{\geq 0}(F(X), k, \ell) := F(2^\ell) = \mathrm{polyToN}(F(X), k, \ell) \qquad (1)$$
Evaluation point. Let $R(X) = F(X)G(X)$ where $F(X), G(X) \in \mathbb{N}[X]$ are
of degree at most $N - 1$. The evaluation point $2^\ell$ is chosen such that for
all $i \leq 2(N - 1)$:
Since all the coefficients are non-negative, this evaluation is only a
representation of all the $f_i$ over $\ell$ bits. Hence, in an implementation,
the evaluation does not require arithmetic operations.
Previous works [1,10,5] already perform the evaluation and the radix
conversion with negative coefficients. However, these algorithms operate on
array representations. In this section we describe a way to realize these
algorithms when the coefficients are in a packed integer representation. The
main advantage of this representation is that it allows the use of existing
coprocessors.
Algorithm 1 Evaluation
Input: $F(X) \in R^-_{q,\delta}$, $k, \ell \in \mathbb{N}$ where $\ell > k$.
Output: $\tilde{f} \in \mathbb{N}$, the two's complement representation of $F(2^\ell) \bmod 2^{N\ell}$
1: mask ← concat(1, ℓ, N) // Precomputed
2: f̃ ← polyToN(F(X), k, ℓ)
3: neg ← rshift(f̃, k − 1, Nℓ)
4: neg ← and(neg, mask, Nℓ) // Detect negative coefficients
5: tmp ← mult(neg, 2^ℓ − 2^k, Nℓ, 32)
6: f̃ ← add(f̃, tmp, Nℓ) // Two's complement representation of each coeff over ℓ bits
7: neg ← lshift(neg, ℓ, Nℓ)
8: f̃ ← sub(f̃, neg, Nℓ) // Borrow propagation
9: return f̃
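A minimal Python sketch of Algorithm 1 follows, with plain integer arithmetic standing in for the coprocessor calls. It rests on assumptions that are not spelled out in this excerpt: `concat(v, ell, n)` is read as the integer whose n digits in base 2^ℓ all equal v, `poly_to_n` is read as packing the k-bit two's complement form of each coefficient into an ℓ-bit slot, and every operation is taken modulo 2^(Nℓ). The function names are hypothetical.

```python
def concat(value, ell, n):
    """Assumed meaning of concat: n copies of `value`, one per ell-bit slot."""
    return sum(value << (i * ell) for i in range(n))


def poly_to_n(coeffs, k, ell):
    """Assumed meaning of polyToN: k-bit two's complement coefficient per ell-bit slot."""
    return sum((c % (1 << k)) << (i * ell) for i, c in enumerate(coeffs))


def evaluate_signed(coeffs, k, ell):
    """Two's complement representation of F(2**ell) mod 2**(N*ell), following Algorithm 1."""
    n = len(coeffs)
    modulus = 1 << (n * ell)
    mask = concat(1, ell, n)                             # step 1 (precomputed)
    f = poly_to_n(coeffs, k, ell)                        # step 2
    neg = (f >> (k - 1)) & mask                          # steps 3-4: one sign bit per slot
    f = (f + neg * ((1 << ell) - (1 << k))) % modulus    # steps 5-6: extend sign to ell bits
    f = (f - (neg << ell)) % modulus                     # steps 7-8: borrow propagation
    return f


coeffs = [3, -2, 1, -4]          # signed coefficients that fit in k = 4 bits
expected = sum(c << (i * 8) for i, c in enumerate(coeffs)) % (1 << 32)
print(evaluate_signed(coeffs, k=4, ell=8) == expected)   # True
```

The comparison at the end checks the packed result against a direct evaluation of the polynomial at 2^ℓ reduced modulo 2^(Nℓ).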
Remark 1. The value mask is always the same for a fixed scheme. Then, this
integer can be precomputed and stored in Non-Volatile Memory (NVM).
To obtain the expected result after the Kronecker Substitution, the last case
requires additional operations before the radix conversion. These additional op-
erations are described in Section 4.2 paragraph Two’s complement represen-
tation of the evaluated polynomial.
Evaluation point. Let $R(X) = F(X)G(X)$ where $F(X), G(X) \in R_{q,\delta}$. The
evaluation point $2^\ell$ is chosen such that for all $i \leq 2(N - 1)$:
As mentioned in [1,10], the radix conversion has to be adapted since some
coefficients have negative representations. Two issues arise with the negative
coefficients:
1. The evaluation and the integer multiplication propagate borrows between the
polynomial coefficients.
2. The negative evaluation algorithm returns a two's complement representation
over $N\ell$ bits.
Hence, in order to retrieve the expected polynomial result, the radix conversion
must compensate for the propagated borrows by propagating carries back.
Let $\tilde{r} = (\tilde{r}_0, \tilde{r}_1, \dots, \tilde{r}_{N-1})_\ell \in \mathbb{N}$
be the integer that we want to convert to a polynomial, where for all $i$,
$\tilde{r}_i$ is a two's complement representation over $\ell$ bits
of an integer $-2^{\ell-1} < r_i < 2^{\ell-1}$. In order to propagate the
carries back, we transform the negative coefficients into non-negative ones by
adding a multiple of our modulus $q$, denoted maxValue. More precisely, maxValue
is the smallest multiple of $q$ such that for all $i$,
$-\mathrm{maxValue} \leq r_i < \mathrm{maxValue}$. Moreover, with the parameters
that we use in Section 6, we have $\mathrm{maxValue} < 2^{\ell-1}$. Then, by
adding maxValue we get:
– If $r_i < 0$, then $2^\ell \leq \tilde{r}_i + \mathrm{maxValue} = 2^\ell + r_i + \mathrm{maxValue}$.
Therefore a carry is propagated to $\tilde{r}_{i+1}$.
– If $r_i \geq 0$, then $\tilde{r}_i + \mathrm{maxValue} = r_i + \mathrm{maxValue} < 2^\ell$.
After adding maxValue, the values $r_i$ are considered as natural numbers
represented over $\ell$ bits. Then, the expected polynomial is obtained by using
the radix conversion algorithm defined in Equation 2 on $\tilde{r}$.
This negative to non-negative conversion is possible because the polynomial
multiplication is done over $R_{q,\delta}$. Indeed, after reduction modulo $q$,
the added value maxValue is equal to 0.
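The effect of adding maxValue can be checked on a small example. The Python sketch below is illustrative only: the function names are hypothetical, `concat` is assumed to build the integer with maxValue in every ℓ-bit slot, and the final reduction modulo q is done coefficient by coefficient for readability rather than on the packed integer.

```python
def concat(value, ell, n):
    """Assumed helper: n copies of `value`, one per ell-bit slot."""
    return sum(value << (i * ell) for i in range(n))


def signed_radix_conversion(r_tilde, ell, n, q, max_value):
    """Recover the coefficients mod q from a packed two's complement integer."""
    modulus = 1 << (n * ell)
    # Adding maxValue to every slot turns each signed digit into a value in
    # [0, 2*maxValue), which cancels the borrows left by negative coefficients.
    shifted = (r_tilde + concat(max_value, ell, n)) % modulus
    slot_mask = (1 << ell) - 1
    digits = [(shifted >> (i * ell)) & slot_mask for i in range(n)]
    # maxValue is a multiple of q, so it vanishes after reduction modulo q.
    return [d % q for d in digits]


signed = [5, -7, 0, -1]                      # the r_i, with |r_i| < maxValue
packed = sum(c << (i * 8) for i, c in enumerate(signed)) % (1 << 32)
print(signed_radix_conversion(packed, ell=8, n=4, q=16, max_value=16))
# [5, 9, 0, 15]
```

Here q = 16 and maxValue = 16 < 2^(ℓ-1) = 128, so the conditions stated above hold and each recovered value equals r_i mod q.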
Then in this case, before the radix conversion we must add or subtract $g$ to
$r$, depending on $\delta$:
– $\delta = 1$: $2^{N\ell} g \bmod (2^{N\ell} + 1) = -g \bmod (2^{N\ell} + 1)$, then
$$r + g \bmod (2^{N\ell} + 1) = F(2^\ell)\, g \bmod (2^{N\ell} + 1)$$
– $\delta = -1$: $2^{N\ell} g \bmod (2^{N\ell} - 1) = g \bmod (2^{N\ell} - 1)$, then
$$r - g \bmod (2^{N\ell} - 1) = F(2^\ell)\, g \bmod (2^{N\ell} - 1)$$
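The two identities above can be verified numerically. The sketch below assumes, since the definitions are not part of this excerpt, that g denotes G(2^ℓ), that r is the product of g with the two's complement representation of F(2^ℓ) over Nℓ bits, and that a negative F(2^ℓ) is recognised by the top bit of that representation. All function names are hypothetical.

```python
def evaluate(coeffs, ell):
    """Exact (possibly negative) value of the polynomial at 2**ell."""
    return sum(c << (i * ell) for i, c in enumerate(coeffs))


def corrected_product(f, g, ell, delta):
    """F(2**ell) * G(2**ell) mod (2**(N*ell) + delta), from the two's complement of F."""
    n = len(f)
    width = n * ell
    modulus = (1 << width) + delta
    f_tilde = evaluate(f, ell) % (1 << width)   # two's complement over N*ell bits
    g_val = evaluate(g, ell)                    # assumed: G has non-negative coefficients
    r = (f_tilde * g_val) % modulus
    if f_tilde >> (width - 1):                  # top bit set: F(2**ell) was negative
        # f_tilde = F(2**ell) + 2**width, and 2**width = -delta modulo the modulus,
        # so r = F*g - delta*g; adding delta*g restores the expected product.
        r = (r + delta * g_val) % modulus
    return r


f, g, ell = [3, -2, 1, -4], [1, 2, 3, 4], 8
for delta in (1, -1):
    modulus = (1 << (len(f) * ell)) + delta
    expected = (evaluate(f, ell) * evaluate(g, ell)) % modulus
    print(delta, corrected_product(f, g, ell, delta) == expected)
```

Both iterations print True, matching the two cases of the list above.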
Previously, we supposed that at most one polynomial can have negative
coefficients. For lattice-based schemes, this assumption always holds.
tion. In this section we show how to perform reduction modulo $q$ using the
packed integer representation. As mentioned previously, such a representation
allows existing RSA/ECC coprocessors to be repurposed.
Let $r = (r_0, \dots, r_{N-1})_\ell \in \mathbb{N}$. In our context, $r$ is
obtained after the first two steps of the Kronecker substitution: polynomial
evaluation and modular integer multiplication. Moreover, we have added maxValue
as in Section 4.2. Then, for all $i$, each $r_i$ satisfies
$0 \leq r_i < 2 \cdot \mathrm{maxValue}$.
In the following, we call simultaneous reduction the reduction of all
the $r_i$ modulo $q$ by performing operations on $r$.
Some lattice-based schemes, like Saber [9] and NTRU [7], use a power-of-two
modulus. In this context, the simultaneous reduction is easy and fast. Indeed,
the simultaneous reduction is achieved by the computation:
$$r \mathbin{\&} \mathrm{concat}(q - 1, \ell, N)$$
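As a small illustration of this masking step, the Python sketch below reduces three packed digits modulo q = 2^13 with a single AND. The names are hypothetical, and `concat(v, ell, n)` is assumed to denote the integer whose n digits in base 2^ℓ all equal v.

```python
def concat(value, ell, n):
    """Assumed helper: n copies of `value`, one per ell-bit slot."""
    return sum(value << (i * ell) for i in range(n))


def simultaneous_reduce_pow2(r, q, ell, n):
    """Reduce every ell-bit digit of r modulo a power-of-two q with one AND."""
    return r & concat(q - 1, ell, n)


q, ell, n = 8192, 16, 3
digits = [8195, 5, 16383]
r = sum(d << (i * ell) for i, d in enumerate(digits))
reduced = simultaneous_reduce_pow2(r, q, ell, n)
print([(reduced >> (i * ell)) & ((1 << ell) - 1) for i in range(n)])
# [3, 5, 8191]
```

The mask keeps the low log2(q) bits of every digit, which is exactly reduction modulo q when q is a power of two.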
Barrett. The Barrett reduction was introduced in [3]. The main idea is to
precompute an approximation of a division and use it to perform modular
reduction. Let $\alpha, \beta \in \mathbb{Z}$ and let $a \in \mathbb{N}$ be an
integer to reduce modulo $q \in \mathbb{N}$ of bit-length $k \in \mathbb{N}$.
Barrett reduction precomputes $m = \lfloor 2^{k+\alpha} / q \rfloor$ and computes:
$$a' = a - \left[ \left( (a \gg (k + \beta)) \cdot m \right) \gg (\alpha - \beta) \right] q$$
A special case is when $\alpha = \beta$, in which the computation becomes
$$a' = a - \left[ a \gg (k + \beta) \right] \cdot m \cdot q$$
In this case, only one shift and one multiplication are performed ($m \cdot q$
is precomputed).
Depending on the parameters $(\alpha, \beta)$, $a' = a \bmod q + tq$ where
$0 \leq t < \lfloor a/q \rfloor$.
Further details on the Barrett algorithms are given in [13].
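The general formula can be sketched in Python on a single integer. This is an illustration only, with a hypothetical function name; it assumes α ≥ β and k + β ≥ 0 so that both shift amounts are non-negative, and it returns the partially reduced value a' rather than a fully reduced one. The parameters used in the example (q = 3329, α = 12, β = -1) are arbitrary and are not taken from this paper.

```python
def barrett_partial_reduce(a, q, alpha, beta):
    """Return a' = a - q_hat * q, which is congruent to a mod q and non-negative."""
    k = q.bit_length()
    if alpha < beta or k + beta < 0:
        raise ValueError("this sketch needs alpha >= beta and k + beta >= 0")
    m = (1 << (k + alpha)) // q                          # precomputed once per modulus
    q_hat = ((a >> (k + beta)) * m) >> (alpha - beta)    # underestimate of a // q
    return a - q_hat * q


q = 3329
a = (1 << 24) - 1
a_prime = barrett_partial_reduce(a, q, alpha=12, beta=-1)
print(a_prime, a_prime % q == a % q)   # 5713 True
```

Since the quotient estimate never exceeds a/q, the result stays non-negative; a few conditional subtractions of q would complete the reduction.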
6.1 Background
Masked secret polynomial. Most of the time, the polynomial sampled from the
distribution $D_\sigma$ is the secret polynomial. In some use cases, an embedded
implementation must be strongly secured against side-channel attacks. One way to
do this is to mask the secret data. To do so, we split the sensitive data into
shares $x = x_1 + x_2 \bmod q$, where $x_1, x_2$ belong to $\{0, \dots, q - 1\}$,
and then we process the operations on each share separately. In our context the
value $q$ is much larger than the range of the secret distribution. Masking
therefore means manipulating larger secret data, which increases the evaluation
point. For some assessments, in order to take this security requirement into
account, we suppose that the polynomial $F(X)$ is defined over $R_{q,\delta}$
and that its coefficients are sampled uniformly in $\{0, \dots, q - 1\}$. In the
following results, we denote this case by the $U_q$ distribution.
In the following results, we only specify the distribution of F (X).
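The share-wise processing described above can be illustrated as follows. This Python sketch is hypothetical in every name it introduces, uses a schoolbook product in Z_q[X]/(X^N + δ) as a stand-in for the MPM algorithm, and draws shares from a seeded generator only so that the example is reproducible; it is not a side-channel-secure implementation.

```python
import random


def split_shares(x, q, rng):
    """Split each coefficient as x = x1 + x2 mod q, with x1 uniform in {0, ..., q-1}."""
    x1 = [rng.randrange(q) for _ in x]
    x2 = [(c - s) % q for c, s in zip(x, x1)]
    return x1, x2


def polymul_mod(f, g, q, delta):
    """Schoolbook product in Z_q[X]/(X^N + delta), used here as a stand-in multiplier."""
    n = len(f)
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < n:
                out[i + j] = (out[i + j] + fi * gj) % q
            else:
                # X^N is congruent to -delta in the quotient ring.
                out[i + j - n] = (out[i + j - n] - delta * fi * gj) % q
    return out


q, delta = 8192, 1
secret = [1, q - 1, 0, 2]          # small secret coefficients, stored mod q
public = [4000, 17, 8191, 123]
s1, s2 = split_shares(secret, q, random.Random(0))
r1 = polymul_mod(s1, public, q, delta)
r2 = polymul_mod(s2, public, q, delta)
recombined = [(a + b) % q for a, b in zip(r1, r2)]
print(recombined == polymul_mod(secret, public, q, delta))   # True
```

Each share has coefficients spread over the whole range {0, ..., q-1}, which is why the masked case is modelled by the U_q distribution even when the secret itself is small.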
Target. Assessments are done on a smart card component using a 32-bit
architecture. In the following we refer to this device as Component A. For
intellectual property reasons, the component name and a detailed description
cannot be given. We therefore only give the main characteristics of Component
A:
– Standard 32-bit instructions (add, sub, shifts, bitwise and, xor, or, etc.).
– No CPU multiplication or division.
– A coprocessor which handles logical AND, addition, subtraction, shifts,
modular integer multiplication and non-modular integer multiplication.
The following results take into account a complete modular reduction. Moreover,
as in previous works [1,18,5,10], we assume that the inputs are already
in the appropriate machine representation. This implies that the inputs are in:
6.2 Results
Saber. Saber [9] is a lattice-based KEM finalist of the NIST standardization. The
polynomial ring used in Saber is $R_{q,1} = \mathbb{Z}_q[X]/(X^N + 1)$, where
$N = 256$ and $q = 8192 = 2^{13}$. In this work we consider two distributions
for the polynomial $F(X)$:
Since the modulus is a power of two, the reductions are achieved using a logical
AND with the appropriate mask.
NTRU. NTRU [7] is also a KEM finalist of the NIST competition. The polynomial
ring used in NTRU is $R_{q,-1} = \mathbb{Z}_q[X]/(X^N - 1)$. The modulus $q$ and
the value $N$ depend on the security parameters. In this work we only consider
NTRU HPS 1 parameters, where $N = 509$ and $q = 2048 = 2^{11}$.
The value of $N$ does not lend itself to easy subdivision. To overcome
this issue, we work on polynomials with $\tilde{N} = 512$ coefficients where the
last coefficients are equal to 0.
In this work, we consider only a $U_q$ distribution. Since $q$ is a power of
two, the modular reductions are performed with a logical AND.
Comparison. The Saber and NTRU MPM algorithms are compared with the poly-
nomial multiplication used in their reference implementations.
7 Conclusion
In this paper we build on previous works that optimize lattice-based schemes
by re-purposing today's RSA/ECC coprocessors. We propose an algorithm, called
MPM, which performs modular polynomial multiplication using coprocessor
instructions. More precisely, our work allows existing coprocessors to be
repurposed to handle the modular reductions and the negative coefficients
during the polynomial multiplication.
We then assess the MPM algorithm in practice for almost all NIST
lattice-based finalists. This assessment is done on a component which has few
CPU instructions and which bases its asymmetric cryptographic efficiency on its
RSA/ECC coprocessor. The MPM algorithm is compared to software polynomial
multiplications, such as NTT or Karatsuba. The small number of CPU instructions
limits the assembly optimizations available to the software algorithms.
Therefore, on this component, our multiplication algorithm brings a significant
speed-up.
This attests that re-purposing standard asymmetric coprocessors to speed up
lattice-based cryptography is of interest, especially in the context of hybrid
cryptography deployment.
References
1. Albrecht, M.R., Hanser, C., Hoeller, A., Pöppelmann, T., Virdia, F., Wallner, A.:
Implementing RLWE-based Schemes Using an RSA Co-Processor. IACR Transac-
tions on Cryptographic Hardware and Embedded Systems pp. 169–208 (2019)
2. ANSSI: Technical position paper - ANSSI views on the Post-Quantum
Cryptography transition, available at
https://www.ssi.gouv.fr/publication/anssi-views-on-the-post-quantum-cryptography-transition/
3. Barrett, P.: Implementing The Rivest Shamir And Adleman Public Key Encryption
On A Standard Digital Signal Processor. CRYPTO’ 86. CRYPTO 1986. Lecture
Notes in Computer Science, vol 263. Springer, Berlin, Heidelberg. pp. 1156–1158
(1986)
4. Bos, J., Ducas, L., Kiltz, E., Lepoint, T., Lyubashevsky, V., Schanck, J.M.,
Schwabe, P., Seiler, G., Stehlé, D.: Crystals – kyber: a cca-secure module-lattice-
based kem. Cryptology ePrint Archive, Report 2017/634 (2017)
5. Bos, J.W., Renes, J., van Vredendaal, C.: Post-quantum cryptography with con-
temporary co-processors. USENIX (2021)
6. BSI: Migration zu Post-Quanten-Kryptografie - Handlungsempfehlungen des BSI
7. Chen, C., Danba, O., Hoffstein, J., Hülsing, A., Rijneveld, J., Schanck, J.M.,
Schwabe, P., Whyte, W., Zhang, Z.: NTRU (2020)
8. for Cryptography Research, C.A.: National cryptographic algorithm design com-
petition (2018)
9. D’Anvers, J.P., Karmakar, A., Roy, S.S., Vercauteren, F.: Saber: Module-lwr
based key exchange, cpa-secure encryption and cca-secure kem. Cryptology ePrint
Archive, Report 2018/230 (2018)
10. Greuet, A., Montoya, S., Renault, G.: Speeding-up ideal lattice-based key exchange
using a RSA/ECC coprocessor. IACR Cryptol. ePrint Arch. p. 1602 (2020)
11. Harvey, D.: Faster polynomial multiplication via multipoint kronecker substitution
(2007)
12. Kronecker, L.: Grundzüge einer arithmetischen theorie der algebraischen grössen.
(abdruck einer festschrift zu herrn e. e. kummers doctor-jubiläum, 10. september
1881.). Journal für die reine und angewandte Mathematik 92, 1–122 (1882)
13. Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryp-
tography. CRC press (2018)
14. Moody, D.: Post-Quantum Cryptography NIST’s Plan for the Future (2016)
15. Moody, D., Alagic, G., Apon, D.C., Cooper, D.A., Dang, Q.H., Kelsey, J.M., Liu,
Y.K., Miller, C.A., Peralta, R.C., Perlner, R.A., Robinson, A.Y., Smith Tone, D.C.,
Alperin Sheriff, J.: Status Report on the Second Round of the NIST Post-Quantum
Cryptography Standardization Process. Tech. rep., National Institute of Standards
and Technology (Jul 2020)
16. Nussbaumer, H.J.: Number Theoretic Transforms, pp. 211–240. Springer Berlin
Heidelberg, Berlin, Heidelberg (1982)
17. Shor, P.W.: Polynomial-Time Algorithms for Prime Factorization and Discrete
Logarithms on a Quantum Computer. SIAM J. Comput. 26(5), 1484–1509 (Oct
1997)
18. Wang, B., Gu, X., Yang, Y.: Saber on ESP32. In: Conti, M., Zhou, J., Casalicchio,
E., Spognardi, A. (eds.) Applied Cryptography and Network Security. pp. 421–440.
Springer International Publishing, Cham (2020)