Master Thesis Computer Science: Radboud University SAP Netherlands
Computer Science
In the shared responsibility model of the cloud, all parties should be equipped
with enough tools to protect data actively. While users might be partially re-
sponsible, they often do not have full control over their data. Data breaches
in the cloud are amongst the highest ranked threats and arguably can do
the most damage. Encryption techniques can ensure data confidentiality,
but only for data at rest and data in transit. If we want to protect the data
during processing, as a user, we need more advanced techniques. In this
thesis, we will look into how secure computation can help protect data in
the cloud and more specifically what homomorphic encryption can achieve
and how it impacts performance and security.
Contents
1 Introduction
2 Preliminaries
2.1 Terminology
2.2 Mathematical structures
2.3 Encryption
2.4 Lattice-based cryptography
2.4.1 Hard lattice problems
2.5 Miscellaneous
4 Secure computation
4.1 Secure multi-party computation
4.2 Secure hardware
4.3 Homomorphic encryption
4.4 Combination with the cloud
5 Cloud security
5.1 HE in the cloud
6 Homomorphic encryption
6.1 Partial homomorphic encryption
6.2 Fully homomorphic encryption
6.3 Somewhat homomorphic encryption
6.4 Performance
6.5 BFV scheme
7 Security of HE
7.1 Secure architecture
7.1.1 CIA model
7.1.2 Verifiability
7.2 Secure schemes
7.2.1 Security reductions
8 Use cases
8.1 Recommendation systems
8.2 General scenario
8.3 General framework
8.4 Potential usecases
9 Proof of concept
9.1 Design
9.1.1 Minimum viable product
9.1.2 Open-source libraries
9.1.3 SAP Cloud Platform
9.2 Implementation
9.2.1 Frontend
9.2.2 Backend
10 Related work
11 Conclusions
11.1 Future work
A
A.1 On data banks
B
B.1 File structure
B.2 Proof of concept
Chapter 1
Introduction
In the last decade, the transition to the cloud has accelerated since the
tech giants Amazon, Google and Microsoft emerged as cloud giants.
Millions of users, companies and individuals alike, make use of the computing
power and storage provided through the internet. Massive data centres
around the globe provide services ranging from web-based email to entire
IT infrastructures. In order to offer such a variety of services at a competitive
price, a massive scale of operation is required.
Not only the massive scale but also the optimal usage of the cloud infras-
tructure helps in reducing cost. It follows the philosophy that a large group
of users would more efficiently use the computing power than a single user
can, thus increasing throughput and reducing idle-time. This also means
that users can scale their resources effectively, and pay only for what they
use.
Companies such as Netflix can benefit from this so-called unlimited scaling
through a modular and flexible architecture. They can scale up to several
thousands of servers on a rainy day and scale down when it is nice and sunny
outside. While Netflix might have a lot of users during the evenings,
Microsoft Teams sees a surge in the number of meeting minutes during the
day. At the end of March, an increase from 900 million minutes to almost 3
billion meeting minutes was measured. Since the processing and hosting of
a meeting happens in the cloud, they were able to scale and handle a record
of 4.1 billion meeting minutes in a single day [1].
The core business in the case of Netflix is to provide the customers with
series and movies. This requires storage of movies and video encoding before
sending it to customers. For Microsoft Teams, the core business is to create
a virtual meeting space. This requires storage and hosting of potentially
confidential chat messages or voice calls. Although these are just two ex-
amples of micro-service applications operating in the cloud, we can already
see their alternating usage, the value of scalability and more importantly a
difference in the level of trust.
In the near future, cloud-dependent applications and platforms will,
maybe unwillingly, become more and more important in our personal and
professional lives. The cloud giants, also called hyperscalers, will thus have
to gain more and more trust in their security infrastructure. Even though
a lot of money and expertise is invested in the security of the cloud,
incidents and mistakes still happen on both the cloud and the user side.
Such incidents become unacceptable once highly regulated industries such as
telecom, electric power or healthcare join the cloud landscape. Data breaches will be
more severe since these industries are often associated with highly sensitive
data, e.g., financial and health records. This will also make them a more
valuable target for attackers.
Therefore, it is important to reflect on the amount of trust we put in
the cloud providers, not only for highly regulated industries but also in
general. We should also look into technical advancements, such as secure
computation, which might enable and support developers and users of cloud
applications to process data securely without losing the advantages of the
cloud.
In this thesis, we will look into secure computation with a focus on its
feasibility in cloud applications and infrastructures. To correctly assess the
current cloud landscape, we discuss the concept of the cloud, the benefits and
the obstacles in terms of architecture and security and the main threats to
the cloud. We explain the main three techniques used in secure computation
of which homomorphic encryption is highlighted for further research. We
describe the recent advancements in the field of homomorphic encryption
and their performance and security guarantees. In the last chapters we
describe a general framework and scenario, derived from several promising
use cases. To demonstrate what an implementation of such a general use case
could look like, we created a proof of concept on the SAP Cloud Platform
using the SAP full-stack Web IDE and the Microsoft SEAL library.
Chapter 2
Preliminaries
This chapter describes the prior knowledge necessary to follow this
thesis.
2.1 Terminology
• Encode, an operation that transforms human readable data into data
that is ready for encryption.
Encoded data does not provide any confidentiality and can be decoded by
everybody; encrypted data can only be decrypted by those who hold the key.
2.2 Mathematical structures
Definition 2.2.1. (Group) A group is a set with the following properties:
f(x) · f(y) = f(x ∗ y)
Where · is the operation from the first structure and ∗ of the second.
For this thesis we are focused on the ring structure of the plaintext and
preserving its structure in the ciphertext, e.g., the addition of ciphertexts
should result in the addition of plaintexts.
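As a small illustration of a structure-preserving map, the following sketch checks that reduction modulo a fixed integer preserves both ring operations of the integers. The map f and the modulus N = 7 are illustrative assumptions chosen for this example and are not taken from the thesis.

```python
# Illustrative assumption: f maps the ring of integers Z to the ring Z_N.
N = 7

def f(x):
    """Map an integer to its residue class modulo N."""
    return x % N

# Check the homomorphism property for addition and multiplication
# on a small range of integers (including negative ones).
for x in range(-20, 21):
    for y in range(-20, 21):
        # Adding (multiplying) first and mapping afterwards gives the same
        # result as mapping first and adding (multiplying) in Z_N.
        assert f(x + y) == (f(x) + f(y)) % N
        assert f(x * y) == (f(x) * f(y)) % N
```

A homomorphic encryption scheme aims for the same behaviour, with encryption in the role of f and the ciphertext operations in the role of arithmetic modulo N.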
2.3 Encryption
When encrypting data, you need a key. There are two ways of using keys:
symmetric and asymmetric. In symmetric cryptography we use the same
key for encryption and decryption. In asymmetric cryptography we use two
different keys: a secret (private) key and a public key. The main reason for
asymmetric cryptography is to solve the issues regarding key management
and establishing shared keys. For every pair of people communicating we
need a shared symmetric key. However, in asymmetric crypto we only need
a public and secret key per user. To establish a shared symmetric key, a
hybrid approach is often used. This uses asymmetric cryptography to es-
tablish a symmetric shared key. This symmetric key can then be used to
encrypt large amounts of data efficiently. One of the most famous symmet-
ric encryption schemes is Rijndael, which became the Advanced Encryption
Standard. This standard is now used in almost all cloud setups to protect
data at rest.
In asymmetric cryptography the public key should be authenticated and
publicly available; this allows anyone to encrypt a plaintext for a specific
user. The secret key belongs only to the user and can be used to decrypt the
ciphertext. Public-key encryption schemes are based on hard computational
problems and trapdoor functions that can only be inverted with the secret
key.
A computational hardness assumption is the hypothesis that a particular
problem cannot be solved efficiently, i.e., that no efficient algorithm for
it exists. The notion comes from computational complexity theory and is
particularly useful in cryptography to argue whether or not a scheme is secure.
The most famous public-key encryption scheme is RSA, introduced by Rivest,
Shamir and Adleman. The security of RSA is related to the factoring
problem: the challenge of factoring a large number into its prime
components.
• KeyGen(1^λ): choose two large primes p and q and let N = p · q and
φ(N) = (p − 1) · (q − 1). Choose e such that gcd(e, φ(N)) = 1 and find d
such that e · d ≡ 1 mod φ(N). Output pk = (N, e), sk = d.
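To make the key generation concrete, the following minimal Python sketch runs textbook RSA with deliberately tiny primes. The encryption and decryption functions (c = m^e mod N and m = c^d mod N) are the standard textbook definitions and are included here as an assumption, since only KeyGen is given above. The function names and parameter values are illustrative and far too small to be secure.

```python
from math import gcd

def keygen(p, q, e=17):
    """Textbook RSA key generation for given primes p and q (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(N)")
    d = pow(e, -1, phi)  # modular inverse of e, requires Python 3.8+
    return (n, e), d

def encrypt(pk, m):
    """Encrypt an integer message 0 <= m < N with the public key."""
    n, e = pk
    return pow(m, e, n)

def decrypt(pk, sk, c):
    """Decrypt a ciphertext with the secret exponent d."""
    n, _ = pk
    return pow(c, sk, n)

pk, sk = keygen(61, 53)
message = 65
ciphertext = encrypt(pk, message)
assert decrypt(pk, sk, ciphertext) == message

# Textbook RSA is multiplicatively homomorphic:
# the product of two ciphertexts decrypts to the product of the plaintexts mod N.
a, b = 7, 11
product_ct = (encrypt(pk, a) * encrypt(pk, b)) % pk[0]
assert decrypt(pk, sk, product_ct) == (a * b) % pk[0]
```

The last assertion shows why unpadded RSA is often cited as an early example of a scheme with a homomorphic property, although only for multiplication.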
2.4 Lattice-based cryptography
With the introduction of Shor’s algorithm [5], the previous hard problems,
such as factoring, are no longer hard for quantum computers. Since then,
several new public-key encryption schemes have been proposed that are
quantum resistant. This sparked a new area of cryptography called
post-quantum cryptography. In the search for a quantum-secure encryption
scheme, lattice-based cryptography has been one of the contenders from the
start. As the name suggests, it is based on lattices.
• Search SVP: given a lattice basis B, find v ∈ L(B) such that
||v|| = λ_1(L(B)), the minimum distance of the lattice.
For a two dimensional lattice, we can visualize this as seen in Figure 2.3.
Closest vector problem
The CVP [7] is closely related to the SVP: given a lattice and a target
point, find the lattice point closest to the target. This differs from the
SVP, which has the origin as target point (and excludes the origin itself as
a solution). For a two dimensional lattice, we can visualize this as seen in
Figure 2.4.
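For intuition, both problems can be solved by exhaustive search in two dimensions. The sketch below is an illustration only: the basis, the target point and the coefficient bound are assumed values, and the search only considers integer combinations with small coefficients. It is this kind of enumeration that becomes infeasible as the dimension grows.

```python
from itertools import product

def lattice_points(basis, bound):
    """Yield all integer combinations i*b1 + j*b2 with |i|, |j| <= bound."""
    (b1x, b1y), (b2x, b2y) = basis
    for i, j in product(range(-bound, bound + 1), repeat=2):
        yield (i * b1x + j * b2x, i * b1y + j * b2y)

def squared_distance(p, t):
    """Squared Euclidean distance between two points in the plane."""
    return (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2

def shortest_vector(basis, bound=10):
    """Search SVP by brute force: a shortest non-zero lattice vector."""
    candidates = (p for p in lattice_points(basis, bound) if p != (0, 0))
    return min(candidates, key=lambda p: squared_distance(p, (0, 0)))

def closest_vector(basis, target, bound=10):
    """CVP by brute force: a lattice point closest to the target."""
    return min(lattice_points(basis, bound),
               key=lambda p: squared_distance(p, target))

basis = ((2, 0), (1, 2))      # assumed example basis
v = shortest_vector(basis)
w = closest_vector(basis, (2.4, -0.6))
print("shortest non-zero vector:", v)
print("closest lattice point to (2.4, -0.6):", w)
```

The two functions differ only in the target point and in whether the origin is allowed, which mirrors the relationship between the two problems.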
• Let A_{s,X} be the probability distribution with samples (a, ⟨a, s⟩ + e),
where a is a vector in Z_q^n, e ∈ Z_q is drawn from X and s ∈ Z_q^n is fixed;
In this thesis we will give a high-level interpretation of the basic BFV
scheme, which is based on the ring-LWE hard problem. This means that the
previous samples are taken from a ring, e.g., Z[x]/(x^n + 1).
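The shape of a plain LWE sample (a, ⟨a, s⟩ + e) can be sketched in a few lines. This is a toy illustration under stated assumptions: the modulus, dimension and noise bound are arbitrary small values, and a bounded uniform error stands in for the error distribution X, which is not specified here. It is not the ring variant used by BFV.

```python
import random

def lwe_sample(s, q, noise_bound):
    """Return one sample (a, b) with b = <a, s> + e mod q.

    Assumption: e is drawn uniformly from [-noise_bound, noise_bound]
    as a stand-in for the error distribution X.
    """
    a = [random.randrange(q) for _ in range(len(s))]
    e = random.randint(-noise_bound, noise_bound)
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b

q, n, noise_bound = 97, 4, 2                 # toy parameters, not secure
s = [random.randrange(q) for _ in range(n)]  # the fixed secret vector

a, b = lwe_sample(s, q, noise_bound)

# Someone who knows s can strip off <a, s> and is left with the small error.
residue = (b - sum(ai * si for ai, si in zip(a, s))) % q
centered = residue if residue <= q // 2 else residue - q
assert abs(centered) <= noise_bound
```

Without knowledge of s, the value b is masked by the unknown inner product, which is what the hardness assumption builds on.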
2.5 Miscellaneous
Next we will describe some additional concepts [9].
Negligible functions
Since an attacker can always randomly guess correctly (or gain another small
advantage) there is a small chance that a scheme will be broken. This can
be modeled by allowing a small probability of adversarial success, also called
a negligible probability. In other words, the advantage that an adversary
has is negligible.
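The usual formalisation is given here as background; the statement and notation below are the standard ones and are not taken from the text above. With λ as the security parameter, a function is negligible if it eventually becomes smaller than the inverse of every polynomial:

```latex
\mu : \mathbb{N} \to \mathbb{R} \text{ is negligible} \iff
\forall c > 0 \;\; \exists \lambda_c \;\; \forall \lambda > \lambda_c : \;
|\mu(\lambda)| < \lambda^{-c}
```

For example, 2^{-λ} is negligible, whereas 1/λ^2 is not, since it is not smaller than λ^{-3} for any λ > 1.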
Indistinguishability
In the area of computational complexity and cryptography, two distribu-
tions are computationally indistinguishable if no efficient algorithm can tell
the difference between them except with negligible probability. When two
encryptions are indistinguishable we speak of semantic security.
Privilege escalation
A type of threat that allows an adversary who has access to a user account
to increase their privileges. This means that the adversary can elevate the
amount of control they have over certain systems or files.
Figure 2.3: Shortest Vector Problem
Figure 2.4: Closest Vector Problem
Chapter 3
In this chapter we will look into the concept of the cloud and explain the
different models in which the technology can be used. Towards the
end we will explain one way in which SAP uses the cloud.
3.1 Origin
The cloud is, as boring as it might sound, nothing more than a blanket term.
The cloud is easy to say and remember but gives little detail about what it is
and how it can be used. In essence, the cloud describes the shift from static
infrastructures to scalable and flexible infrastructures. In order to achieve
flexibility of the infrastructure, most of it will be accessible from everywhere,
location and device independent. To increase accessibility some parts of the
infrastructure will be available and controllable over the internet, in other
words it appears to be “in the clouds”. In fact, the internet is just a means
of transporting data. The actual data is located in ordinary servers just as
in an on-premise system. The technique and type of infrastructure are thus
not described by this term. This is where the more specific terms, cloud
computing and cloud storage, come into play.
Cloud storage refers to data accessible through the internet which is
stored on a server. Cloud computing describes the processing of this data.
Processing can be seen as a computation on certain numbers, changing data
or removing data entirely. These changes require processing power, which
can be provided by servers. The results can again be retrieved over the
internet. In essence the server can do everything for the user, however it
can also have partial functionality. That is why there are several terms to
describe the infrastructure specific details. These are called the delivery and
deployment models.
will explain the exact definition of the standardized terms during the fol-
lowing sections. First we will describe the several roles that exist in the
cloud landscape:
• The cloud provider is the owner of the cloud computing and storage
resources;
• The service provider is the owner of the application that uses these
resources;
• The customer can refer to both the customer of the cloud resources
and the customer of the cloud service.
A company can have multiple roles, e.g., Microsoft owns the Azure cloud
resources that its own service, Microsoft Teams, operates on. This
makes it both the cloud and the service provider.
Public cloud
Most people interact with the public cloud on a daily basis, sometimes
without even noticing it. Some websites providing online video content, such
as YouTube, Netflix or Twitch, are hosted in the public cloud. In this
deployment model the infrastructure is located on external servers owned by
the cloud providers, e.g., AWS or Microsoft Azure, and the resources are
provided over the internet. The model is characterized by its on-demand
nature, resource pooling and rapid elasticity which makes it approachable
and affordable.
For Netflix this would mean that they are the customer of the cloud re-
sources, provided by AWS. The customer of Netflix would thus be the user
of the cloud service.
Private cloud
The second deployment model is called the private cloud. The specific cloud
storage and computing resources are allocated to a specific service provider.
Since this is a more tailored approach to providing resources, the customer
has more control than in the public cloud, both in terms of infrastructure
and security architecture. The resources can be managed by the service
provider or the cloud provider and can be located on premise or off premise.
For Netflix this could mean that, e.g., their billing processes can be located
in a private cloud.
Community cloud
The community cloud is an extension of the private cloud that allows access
by multiple organizations or subdivisions of an organization often with a
similar goal. This combines the benefit of sharing an infrastructure, without
the massive scale at which this happens in the public cloud. It can be
located either on premise or off premise. This could be useful when multiple
companies have to access the same application or service and want to share
the application server, e.g., when having a time difference in the peak-usage.
Hybrid cloud
Any combination of a private, public or community cloud, sometimes also
combined with existing on-premise infrastructure. An example of an appli-
cation of a hybrid cloud is to separate business critical services from publicly
accessible services.
Software as a service
This is the most extensive service. It will include everything that is required
to run the application and also the application itself, e.g., Office 365. It is
one of the easiest to use services and abstracts all the backend information
away from the user. The user will not notice anything of what happens
behind the scenes and can access the application and data from all devices.
Platform as a service
This service provides the platform to host and manage applications. The
platform might also offer several building blocks to create applications
or a web-based integrated development environment. The cloud often
encourages micro-service architectures that can be structured and scaled
using containers. These can be viewed as lightweight and often more
efficient virtual machines.
Infrastructure as a service
In the previous service the low-level details about the OS, storage
configuration and network components are taken care of by the cloud
provider. In this model it is possible to specify exactly how you want the
infrastructure to behave. Options can include server and storage
configuration, network components and sometimes also the physical data
center location. While the service providers are able to control these
details, the cloud provider will still provide the resources. In this
delivery model the customer has the most control, but is also required to do
a lot of the setup themselves.
them to both a cloud or on-premise infrastructure. It supports the developer
through application development services and capabilities with features such
as big data infrastructures or machine learning.
The applications that are being created can be deployed on either the
Neo or Cloud Foundry environment. The Neo environment is fully operated
by SAP on several servers across the world. These are specifically created for
customers and run exclusively on SAP infrastructure; the Neo environment can
therefore be categorized as a private cloud.
Cockpit
The cockpit is the launchpad that controls the PaaS, including Global Ac-
counts and high level services [13]. Each global account can have several
subaccounts, linked to a specific provider, region and environment. Within
each subaccount you can create development spaces in which you deploy your
application and launch service instances. These can be custom applications
or pre-configured SaaS. The deployment often happens in a container in-
frastructure. Each container has access to a portion of the storage, memory
and processing power. This micro-service architecture or containerized way
of working allows for easy deployment of additional services when necessary.
We will use the SAP Cloud Platform for the proof of concept described
in Chapter 9. It provides us with a Web IDE, used to develop the application,
and the service marketplace, used to quickly add building blocks such as a
database.
Chapter 4
Secure computation
In this chapter we will look into the concept of secure computation and the
methods that exist to achieve it. Towards the end we choose which
technique we will research further.
• privacy of output, refers to the output of the function, e.g., the result
of a recommendation system;
There are several reasons why you cannot share the input, output or
function, e.g., competitive advantage, regulations or privacy concerns.
However, sometimes we still want to get meaningful results from an outsourced
or joint computation.
An example is the right to a secret ballot in, e.g., electronic voting [14].
In this case we want to guarantee privacy of the input since releasing who
voted for or against in a referendum directly violates the right to a secret
ballot. The function is not private and can be the summation of all the votes
or a direct output of the winner. Since everybody can know the output, i.e.,
the winner, we also do not require the output to be private. If we can come
up with a protocol or techniques with which we only reveal the winner of
the referendum and not the individual votes, we can guarantee the privacy
of the input. This is one example of a problem that secure computation can
solve.
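One way to see that such a tally is possible without a single party learning individual votes is additive secret sharing. The sketch below is an illustrative assumption and not the protocol of [14]: each voter splits a 0/1 vote into random shares for a hypothetical set of tallying parties, and only the sum of all shares reveals the result. It assumes honest parties that follow the protocol and does not check that votes are valid.

```python
import secrets

MODULUS = 2**61 - 1  # assumed share modulus, much larger than the number of voters

def share(vote, n_parties):
    """Split a vote into n_parties random shares that sum to the vote mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((vote - sum(shares)) % MODULUS)
    return shares

def tally(all_shares):
    """Each party adds up the shares it received; the partial sums are then combined."""
    partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]
    return sum(partial_sums) % MODULUS

votes = [1, 0, 1, 1, 0]                      # 1 = in favour, 0 = against
all_shares = [share(v, n_parties=3) for v in votes]

# A single party only sees one column of random-looking shares,
# yet the combined result equals the number of votes in favour.
assert tally(all_shares) == sum(votes)
```

Any proper subset of the shares of a vote is uniformly random, so the input stays private as long as the tallying parties do not all collude.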
In a cloud environment we want to outsource a computation and use
the strength of the cloud's computing resources. However, it is not always
guaranteed that the cloud provider can keep the input, output or function
private. This is why we look into secure computation techniques and reflect
on whether they can be a useful extension to the cloud. There are several
subfields that focus on the different properties previously described and also
use different methods and techniques to achieve them:
In his paper called “Protocols for secure computations” [18], Andrew
Yao introduced the concept of secure computation, later known as secure
multiparty computation. His introduction proposes the problem in the form
of a conversation between two millionaires who want to find out who is the
richer. An obvious solution would be for both to state how wealthy they are
and compare, however, they do not want to reveal exactly how wealthy they
are. This is called Yao’s Millionaires’ Problem.
The protocol described has to satisfy specific security and privacy
constraints, such as input privacy and security against an eavesdropper or a
dishonest participant. The protocol describes an algorithm that the
participants have to follow, which, if followed correctly, guarantees
security and privacy.
The solution he proposed in his paper involves an active protocol in which
both participants partially compute the result and combine their results
using a technique called oblivious transfer [19]. Oblivious transfer is a
primitive that is often used in these protocols. It allows a participant to
choose between two values while the other participant remains oblivious to
which value has been chosen. Yao later generalized this problem from a
secure comparison to the secure evaluation of an arbitrary function using
garbled circuits [20].
The millionaires’ problem has two participants, thus we speak of a secure
two-party computation. This solution was later extended by Goldreich,
Micali and Wigderson [21] to n parties, thus becoming a secure multi-party
computation.
The most famous example of an actual application that uses multi-party
computation is a fully automated secure auction [22]. It was the first
application of MPC that was considered efficient. It allowed the auction to
happen automatically without a single trusted party bearing full
responsibility.
To conclude this section we give a short recap of the general characteristics
of secure multi-party computation:
• interactive protocol;
4.2 Secure hardware
The second technique is focused on a lower level and allows users to define
secure enclaves in hardware. These secure regions remain confidential when
the platform is under attack by malicious software. The applications are
put into the enclaves with a special instruction set; an example is Intel
Software Guard Extensions (SGX) [23]. Since the computation inside these
secure regions still takes place on plaintext, the computation itself should
be just as fast. However, loading into secure memory and context switching
still introduce performance overhead [24].
Although all hardware should be secure, not all hardware is called secure
hardware. The addition of the word secure often means that it will protect
against software level attacks. An obvious requirement for secure hardware is
also that the hardware itself should be secure. Intel SGX has an unclear
trust model with regard to side-channel [25] and fault-injection attacks
[26], as demonstrated by several attacks [27]. Side-channel attacks target
the implementation instead of the algorithm. An implementation can leak
information through, e.g., timing, power consumption or other side channels
that can be abused. Fault-injection attacks try to purposefully
induce faults to force execution or output errors that can be abused to
break the hardware.
An example of secure hardware that both protects against software-level
attacks and has good hardware security is the secure cryptoprocessor, also
called a hardware security module (HSM) [28]. It often manages keys and
performs encryption and decryption functions. Although it is not able to
perform arbitrary computation, it can be useful in combination with either
secure multiparty computation or homomorphic encryption. To conclude this
section we give a short recap of the general characteristics of secure hardware:
4.3 Homomorphic encryption
Homomorphic encryption (HE) is a type of encryption that has homomorphic
properties and can be used to achieve secure computation. HE is an umbrella
term for all encryption schemes that contain a homomorphism. This
homomorphism allows an operation to be evaluated on the ciphertext which,
through the properties of the scheme, applies the corresponding operation to
the underlying plaintext. One can imagine that if we can calculate directly
on, e.g., encrypted numbers, we do not have to reveal them during
calculation. Since the result is once again a valid ciphertext, we also do
not reveal the output.
This means that we satisfy both the privacy of input and output property
at once while also correctly evaluating a computation. The challenging part
is that we have to build this computation from the basic operations, e.g.,
addition and multiplication. To visualize this, we can use the following
example.
Example 4.3.1. The function that we want to evaluate has three input
values and can be described as follows.
f(x, y, z) = (x + y) · z.
To ensure input privacy we encrypt (E) all values separately and use the
homomorphic property to calculate the result.
(E(x) + E(y)) · E(z) = E((x + y) · z)
After decryption (D) we get the same result as if we used the function on
the “normal” inputs.
D(E((x + y) · z)) = (x + y) · z
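The example can be run with a toy scheme. The sketch below is an illustration under stated assumptions and is not the BFV scheme discussed later: it uses a simple symmetric construction over the integers, in the style of the scheme by van Dijk et al., that encrypts single bits, so that + corresponds to XOR and · to AND on the plaintexts. The key size, noise size and function names are arbitrary illustrative choices and offer no real security.

```python
import random

def keygen(key_bits=64):
    """Secret key: a random odd integer p with its top bit set."""
    return random.getrandbits(key_bits) | (1 << (key_bits - 1)) | 1

def encrypt(p, bit, noise_bits=8, mask_bits=128):
    """Hide the bit in the parity of a small noise term, masked by a multiple of p."""
    q = random.getrandbits(mask_bits)
    r = random.getrandbits(noise_bits)
    return p * q + 2 * r + bit

def decrypt(p, c):
    """Remove the multiple of p, then read the parity of the remaining noise."""
    return (c % p) % 2

def f(x, y, z):
    """The function from Example 4.3.1."""
    return (x + y) * z

p = keygen()
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            cx, cy, cz = encrypt(p, x), encrypt(p, y), encrypt(p, z)
            # The same function is evaluated directly on the ciphertexts.
            c_result = f(cx, cy, cz)
            # Plaintexts are bits, so the result is compared modulo 2.
            assert decrypt(p, c_result) == f(x, y, z) % 2
```

Decryption only stays correct while the accumulated noise remains smaller than p. Every addition and especially every multiplication increases the noise, which is why such a scheme can only evaluate functions of limited depth.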
4.4 Combination with the cloud
When we try to implement secure computation in real life, there are many
choices that can be made to optimize performance for different architec-
tures and different application scenarios. Sometimes a combination of these
techniques should be used. The architecture we analyse in this thesis is the
cloud architecture and more specifically the addition of secure computation
techniques to the SAP Cloud Platform. Therefore, we will look at the key
characteristics of the techniques once again and see if they align with the
goals of the cloud.
In general, we can see that HE can provide the extra privacy for users in
a cloud environment in which the cloud only has to evaluate a function on
encrypted data. It also allows the cloud to use its cloud computing resources
efficiently. Therefore, the rest of the thesis will be structured around HE
as the secure computation technique that will be further researched and
discussed.
Chapter 5
Cloud security
In this chapter we look into the main threats to the cloud. This helps us
identify in which parts of the cloud landscape secure computation can have
added benefit.
• Spoofing identity;
• Tampering with data;
• Repudiation;
• Information disclosure;
• Denial of service;
• Elevation of privilege.
While the CSA prefers to use the STRIDE framework, we will use the terms
confidentiality, integrity and availability. We will mainly reflect on infor-
mation disclosure, which relates to confidentiality. Integrity is related to
tampering with data and availability is related to denial of service. We will
briefly describe the Egregious Eleven threats:
1. Data Breaches are the main threat to confidentiality of data in the
cloud. Data disclosure can be the consequence of many of the threats
we will mention later and is often the most valuable for an adversary.
Personal health information, financial information, personally identi-
fiable information, trade secrets and intellectual property are all very
valuable, e.g., on black markets, to governments or competitors. It
also impacts the victim's brand value and that of the cloud provider.
2. Misconfiguration and Inadequate Change Control can also lead
to data disclosure. It is most commonly caused by an incorrect setup of a
(public) system. While the system itself may be perfectly secure, default
credentials or disabled security controls can still lead to unwanted
access to data, either due to negligence or lack of knowledge.
3. Lack of Cloud Security Architecture and Strategy is often re-
lated to the differences in the way public clouds and on-premise infras-
tructures need to be protected. This can lead to incidents that were
improbable in an on-premise infrastructure.
4. Insufficient Identity, Credential, Access and Key Manage-
ment can result in direct access to plaintext data. In a normal setup
we often have key management located at the cloud provider. Insuffi-
cient control can thus lead to decryption at the cloud provider. In this
scenario encryption (at rest) will lose its value. An obvious solution
is to shift the key management towards on-premise HSMs or other
key management techniques. In essence, this gives control back to the
user, in a way that on-premise infrastructures can. This also means
that the users now have to protect their keys themselves, which could
be a good or bad thing. A downside to this solution is that the cloud
cannot perform any calculations on the data.
5. Account Hijacking can be the consequence of the previous threat.
It can also be caused by phishing, exploits or stolen credentials. The
account with the highest value is often the account with the highest
privileges. These privileges can lead to sensitive data or control over
the applications. While the first one affects confidentiality and/or
integrity, the last one can impact availability if the attacker wants to
disrupt the applications.
6. Insider Threat is always a hard threat to solve since the insider
has legitimate access. When there is an insider threat at the cloud
provider side, depending on the amount of privilege, this attacker can
access confidential data located in customer applications or storage.
This data can be encrypted at rest, however, insider threats often
have legitimate access to the decrypted data. If there is a separation
between on-premise and the cloud, through own key management, this
can be partially avoided. When there is an insider threat at the on-
premise location, this is nearly impossible to avoid.
7. Insecure Interfaces and APIs are a risk because interfaces and APIs are
the most public part of an application; they will be attacked a lot and
should be secure. Insecure APIs can give access to data that is used in
applications. This is often plaintext data, since it has to be used in
applications.
8. Weak Control Plane refers to the fact that the service providers
and users do not have enough control over the security architecture.
For example, they want to include their own key management and use
their own encryption solution but the cloud provider does not support
this. This limits the control of the service provider and prevents them
from actively securing their applications. This also makes them more
dependent on the cloud provider for security.
11. Abuse and Nefarious Use of Cloud Services refers to the fact
that the adversary can also use the cloud. This does not impact con-
fidentiality of data directly, however, it can be beneficial for an ad-
versary to be located in the same physical infrastructure. Information
extraction or privilege escalation attacks can be the consequence of
bad virtualization at the cloud provider [31].
will discard all the value encryption at rest will provide us. Shifting the key
management from the cloud provider to the user can thus be seen as a good
development. This would separate the keys and the data. A downside to
this solution is that we cannot compute on encrypted data directly in the
cloud. HE can solve a part of this problem by allowing direct computation
on encrypted data.
Another downside of storing the keys in the cloud is that cloud accounts
have legitimate access to the data. This removes encryption at rest for
those accounts that have the correct privileges and allows them to access
the plaintext data and applications. Therefore, encryption at rest, without
separating keys and data, does not protect against Account Hijacking (5)
and Insider Threat (6).
Threats can also occur inside cloud services themselves. Insecure Inter-
faces and APIs (7) allow an adversary to extract plaintext data that resides
inside the services. Inside these services the encryption at rest is removed
in order to use the data. With HE we can create secure cloud services that
operate on ciphertext data. This means that insecure APIs only disclose
encrypted data instead of plaintext data.
Most of the threats concern outsider threats; however, insider threats
also occur when an adversary resides in the same cloud as the target. Abuse
and nefarious use of cloud services (11) can thus also be a threat to the
data of cloud users. While it is good to highlight the good points of HE, we
should not forget the fact that, just like all cryptography, it can also be used
by the adversary. The fact that the cloud provider does not know what data
is stored on their servers, or the fact that they do not know exactly which
computation they are running, might cause a problem for certain scenarios.
Although most of these threats are not directly caused by insecure ap-
plications or the application layer, we can see that implementing HE at this
layer can indirectly protect the other layers. We also see that ten out of
eleven threats can result in data tampering. Integrity is another crucial
aspect of applications in the cloud and something HE cannot provide any
guarantees about. The report also reflects the recent trend in which threats
shift from the cloud provider's responsibility to the user's
responsibility. This suggests that the security of the cloud
infrastructure is improving on the cloud provider's side.
To conclude this section we want to discuss a new threat in the list. For
this thesis one of the more interesting threats is the Weak Control Plane
(8). Perhaps the name is not that descriptive, but it reflects perfectly
the need for additional means to protect data as a user, for example in
the form of secure computation in the cloud. Secure computation is a
technique that a user might want to implement in their applications. This means
cloud/service providers should support the option to use secure computation
since it empowers users with techniques to actively secure their data from
the design. It gives users the extra security they might want for sensitive
applications and helps them to securely transition into the cloud.
Although the report does not give us an exhaustive list with all the
security threats and issues, it does give a good perception of the key issues.
This gives us more tangible points with which to show why HE can
add value in the cloud and thus, we hope, improve cloud
security.
Chapter 6
Homomorphic encryption
When looking at Figure 1 (See appendix A) we can exactly see the analogy
with the cloud. The paper also states the concept of time-sharing, which
we would nowadays call cloud computing. Even the need for sharing the
resources stayed the same. In 1970 it was too expensive to inefficiently use
a computer, while in 2020 it is still expensive to inefficiently use computers.
In both time frames the concept of privacy during computation is something
we want to ensure, whether we call it a data bank or the cloud, it should
not matter.
scheme should be a secure encryption scheme, but not every secure encryp-
tion scheme is homomorphic. A traditional encryption scheme specifies the
key generation, encryption and decryption algorithms. Homomorphic
encryption refers to a normal encryption scheme that has been extended with
an evaluate algorithm. This evaluate algorithm allows operations on the
ciphertext that carry over to the underlying plaintext.
Since 1978 several schemes have been published which are used in ev-
eryday applications, e.g., RSA and ElGamal. However, these schemes are
not particularly famous because of their homomorphic properties. These
properties were often seen as a weakness rather than a strength since they
introduce malleability. In Timeline 1 we can see a summary of the evolution of this
research field. There are several ways these schemes are classified in the
literature based on their “level” of homomorphic properties. In general we
can divide them into three categories:
Timeline 1: History of the Privacy Homomorphism
1978 RSA [32]
1978 Rivest et al. introduce the privacy homomorphism [16]
1984 Goldwasser & Micali [33]
1985 ElGamal [4]
1999 Paillier [34]
2005 Boneh, Goh & Nissim [35]
2009 Gentry [36]
2011 Brakerski & Vaikuntanathan [37]
2012 Brakerski, Gentry & Vaikuntanathan [38]
2012 Brakerski, Fan & Vercauteren [39][40]
2013 Gentry, Sahai & Waters [41]
2016 Cheon, Kim, Kim & Song [42]
202? Practical homomorphic encryption [?]
6.1 Partial homomorphic encryption
Just before their paper on data banks, Rivest and Adleman together with
Shamir released their RSA encryption scheme [32]. It is no coincidence that
the scheme they describe has a partial privacy homomorphism. We can
describe this with the following example.
Example 6.1.1. (Multiplicatively Homomorphic RSA)
See Definition 2.1.1 for the textbook RSA encryption scheme. If we take
two messages, $m_1$ and $m_2$, we get the following two ciphertexts:
$c_1 = m_1^e \bmod N$, $c_2 = m_2^e \bmod N$.
If we multiply the two ciphertexts, we get: $c_3 = m_1^e \cdot m_2^e \bmod N$,
which results in: $c_3 = (m_1 \cdot m_2)^e \bmod N$.
Decryption will give us the new message:
$m_3 = c_3^d = ((m_1 \cdot m_2)^e)^d = m_1 \cdot m_2 \bmod N$.
In this example we can see that by multiplying the ciphertext $c_1$ with
$c_2$, we implicitly multiply the two messages $m_1$ and $m_2$.
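The multiplicative property of textbook RSA can be checked numerically. The following is a minimal sketch; the tiny primes, the exponent and the helper names encrypt and decrypt are illustrative assumptions and are far too small to be secure.

```python
# Toy textbook RSA (no padding); parameters are illustrative assumptions.
p, q = 61, 53
N = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (requires Python 3.8+)

def encrypt(m):
    return pow(m, e, N)

def decrypt(c):
    return pow(c, d, N)

m1, m2 = 7, 12
c1, c2 = encrypt(m1), encrypt(m2)

# The evaluator multiplies ciphertexts without knowing m1, m2 or d.
c3 = (c1 * c2) % N
m3 = decrypt(c3)             # equals (m1 * m2) mod N
```

The result is only the true product as long as m1 · m2 is smaller than N; otherwise it is the product reduced modulo N.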
Definition 6.1.1. (Multiplicative property) An encryption scheme has the
multiplicative property if we can combine two ciphertexts with the group
operator and on decryption we will receive the multiplication of the mes-
sages.
The ElGamal cryptosystem was introduced in 1985 [4]. This cryptosystem
also has the multiplicative property.
Example 6.1.2. (Multiplicatively Homomorphic ElGamal)
See Definition 2.2.2 for the ElGamal encryption scheme.
$c_1 = (g^{r_1}, m_1 \cdot h^{r_1})$
$c_2 = (g^{r_2}, m_2 \cdot h^{r_2})$
$c_3 = c_1 \cdot c_2 = (g^{r_1} \cdot g^{r_2}, m_1 \cdot h^{r_1} \cdot m_2 \cdot h^{r_2})$
$c_3 = (g^{r_1 + r_2}, (m_1 \cdot m_2) \cdot h^{r_1 + r_2})$
$m_3 = m_1 \cdot m_2$
We can use the product rule of exponentiation to transform the multi-
plicative property into an additive property.
Example 6.1.3. (Additively Homomorphic ElGamal)
See Definition 2.2.2 for the ElGamal encryption scheme.
$c_1 = (g^{r_1}, g^{m_1} \cdot h^{r_1})$
$c_2 = (g^{r_2}, g^{m_2} \cdot h^{r_2})$
$c_3 = c_1 \cdot c_2 = (g^{r_1} \cdot g^{r_2}, g^{m_1} \cdot h^{r_1} \cdot g^{m_2} \cdot h^{r_2})$
$c_3 = (g^{r_1 + r_2}, (g^{m_1 + m_2}) \cdot h^{r_1 + r_2})$
$t_1 = g^{m_1 + m_2}$
Solve the discrete log to get $m_3 = m_1 + m_2$.
Note that this only works for a short message $m_3$, otherwise it will be hard
to compute the discrete log, as it should be.
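The additive variant can be sketched in a few lines. The group parameters (a small safe prime p = 2q + 1 with generator g = 4 of the order-q subgroup), the bound max_m and all function names are assumptions chosen for illustration; the sizes are not secure.

```python
import secrets

p = 1019                     # toy safe prime, p = 2 * 509 + 1
q = 509                      # prime order of the subgroup generated by g
g = 4                        # a square modulo p, hence of order q

x = secrets.randbelow(q - 1) + 1      # private key in [1, q - 1]
h = pow(g, x, p)                      # public key

def encrypt(m):
    r = secrets.randbelow(q - 1) + 1  # fresh randomness for every encryption
    return (pow(g, r, p), (pow(g, m, p) * pow(h, r, p)) % p)

def combine(ca, cb):
    # Component-wise product of ciphertexts adds the exponents m1 + m2.
    return ((ca[0] * cb[0]) % p, (ca[1] * cb[1]) % p)

def decrypt(c, max_m=100):
    s = pow(c[0], x, p)                # shared secret h^(r1 + r2)
    t = (c[1] * pow(s, -1, p)) % p     # t = g^(m1 + m2)
    for m in range(max_m + 1):         # brute-force discrete log, small m only
        if pow(g, m, p) == t:
            return m
    raise ValueError("message outside the searchable range")

m1, m2 = 5, 9
m3 = decrypt(combine(encrypt(m1), encrypt(m2)))
```

The brute-force loop in decrypt makes the restriction to short messages explicit: the search cost grows with the size of the message space.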
Definition 6.1.2. (Additive property) An encryption scheme has the ad-
ditive property if we can combine two ciphertexts with the group operator
and on decryption we will receive the addition of the messages.
In 2005 Boneh et al. [35] released the first homomorphic encryption
scheme that allows both addition and multiplication. However, the number
of multiplications is limited to one. Before and after this multiplication, we
can perform any number of additions.
This scheme is close to a privacy homomorphism but the true definition
of fully homomorphic encryption refers to an unlimited amount of computa-
tions of both addition and multiplication on encrypted data. RSA, ElGamal
and many others that only contain a partial property are thus classified as
partial homomorphic encryption.
X Y Output
0 0 1
1 0 0
0 1 0
1 1 1
can operate on a higher level. The gates translate to the arithmetic oper-
ations of elements in a field. Since we work on a higher level this is easier
to comprehend and implement in applications directly. To recap briefly,
boolean and arithmetic circuits are used to represent a computation. This
is a good way to represent functions on encrypted data because we can only
use specific operators or gates. However, the main question remains: how
can we achieve addition and multiplication on encrypted data?
Breakthrough
In 2009, Gentry published a paper in which he explained how we can achieve
addition and multiplication on encrypted data through lattices [36]. This
provides the building blocks for the previously mentioned circuits. However,
due to the way the plaintexts are encrypted, a noise term is introduced in
the ciphertext. When performing calculations on the ciphertexts this noise
increases. During decryption this noise is removed and the plaintext can be
recovered. If this noise grows too large, the decryption will fail. This means
that the encryption scheme introduced by Gentry does support both addi-
tion and multiplication but not an unlimited amount of times. The intuition
behind his solution to this problem is to define a circuit that represents the
decryption algorithm. The homomorphic version of the decryption
algorithm takes an encrypted version of the private key and the encrypted
input, and its output is also encrypted. This allows us to refresh the
ciphertext and thus reduce the noise, without having to decrypt it to
plaintext first. This solution is called bootstrapping.
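The effect of noise growth can be illustrated with a toy symmetric scheme over the integers. This is not Gentry's lattice-based construction; it is a deliberately simplified stand-in, and the key size, the noise range and the function names are assumptions made for this sketch.

```python
import random

P = 2**61 - 1   # secret key: a large odd integer (toy choice, not a secure size)

def encrypt(bit, rng):
    """Encrypt one bit as (multiple of P) + (small even noise) + bit."""
    q = rng.randrange(2**20, 2**21)   # random multiplier hiding the noise
    r = rng.randrange(2**7, 2**8)     # fresh noise term
    return P * q + 2 * r + bit

def noise(c):
    """Noise term of a ciphertext; only computable with the secret key."""
    return c % P

def decrypt(c):
    """Correct only while the accumulated noise stays below P."""
    return noise(c) % 2

rng = random.Random(2020)             # fixed seed so the run is repeatable
a, b = encrypt(1, rng), encrypt(1, rng)
xor_bit = decrypt(a + b)              # addition of ciphertexts = XOR of bits
and_bit = decrypt(a * b)              # multiplication of ciphertexts = AND of bits

# Multiply fresh encryptions of 1 and record how many bits the noise occupies.
product = encrypt(1, rng)
noise_bits = [noise(product).bit_length()]
for _ in range(5):
    product = product * encrypt(1, rng)
    noise_bits.append(noise(product).bit_length())
still_correct = decrypt(product)
```

In this sketch every multiplication adds at least eight bits of noise, so a few more multiplications would push the noise past the 61 bits of P, at which point decryption is no longer reliable. Bootstrapping addresses exactly this limit.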
6.3 Somewhat homomorphic encryption
Since the bootstrapping procedure brings a lot of performance overhead we
often do not want to use it or we need to increase its efficiency. Another
downside is that it requires an encryption of the private key, which could
have unwanted implications; this issue is referred to as circular security [44].
Not using bootstrapping limits the expressivity of the computations,
i.e., the depth of the circuits but provides us with faster evaluation. To
increase the expressivity again we want to reduce the noise growth. The
more computations we can perform before the noise becomes too large the
better.
Somewhat homomorphic encryption and, more specifically, leveled homomorphic
encryption are schemes that can handle functions of a limited
depth and are thus not fully homomorphic. The recent improvements in the field
of SHE focus on reducing the noise growth, reducing the size of
the parameters and increasing the speed of the evaluation. Other improvements
can be made with regard to the simplicity of the security reductions,
i.e., basing them on well-studied hardness assumptions and removing additional
assumptions.
Both techniques will result in more practical systems because of the reduced
sizes of parameters and ciphertext.
Subsequent improvements are made by extending the LWE problem to a ring
variant [40]. The Brakerski, Gentry and Vaikuntanathan scheme (BGV) and
the Brakerski, Fan and Vercauteren scheme (BFV) are considered the most
promising schemes for practical performance. The security of both schemes
is based on the hardness of the RLWE problem. This ring variant allows for
a lot of different optimizations:
• Single Instruction Multiple Data (SIMD) [46], perform the same in-
struction on all these data points;
• Residue number system variants [47], used to split big integers in
smaller parts for calculation and combine them later through, e.g.,
the Chinese Remainder Theorem.
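The residue number system idea from the last item can be sketched independently of any HE scheme. The moduli and helper names below are assumptions for illustration; real libraries use much larger, carefully chosen moduli.

```python
from math import prod

MODULI = (251, 257, 263)     # pairwise coprime (here: distinct primes)
M = prod(MODULI)             # values are represented uniquely modulo M

def to_rns(x):
    """Split an integer into small residues, one per modulus."""
    return tuple(x % m for m in MODULI)

def mul_rns(a, b):
    """Multiply component-wise; each lane only handles small numbers."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    """Recombine the residues with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the inverse mod m
    return total % M

x, y = 1234, 5678
z = from_rns(mul_rns(to_rns(x), to_rns(y)))   # equals x * y, since x * y < M
```

Each lane can be processed independently, which is what makes this representation attractive for parallel hardware.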
6.4 Performance
As seen in the previous section, the performance is impacted by a lot of
different factors. From a high-level perspective we can see two things. Firstly,
the encryption and decryption of the data are pure performance overhead,
since a plaintext computation does not need these additional steps.
Secondly, the homomorphic evaluation should be compared to the plaintext
equivalent.
For example, instead of multiplying two integers directly, we now multi-
ply polynomials in which the integers are encoded and encrypted. However,
this ciphertext polynomial can include several integers using the batching
technique. Therefore, it would not be efficient to perform just one integer
multiplication. Thus comparing an integer calculation directly to its homo-
morphic equivalent is hard to do. This is even more complicated for more
advanced functions and especially if they contain a lot of multiplications.
More multiplications directly impact the parameters and thus also the speed
of the evaluation.
In order to achieve maximum performance we also need to represent
the function as an efficient circuit, optimize the parameters for this specific
circuit, optimize the security parameters and also know all the
optimizations of the underlying scheme. We conclude it is not possible to compare
a plaintext computation to an HE computation without knowing all these
aspects. Optimizing in all these areas is also something that has to be
researched further and perhaps automated to a certain extent by encryption
libraries. Although we cannot directly compare the functions, and several
schemes and papers define their speed differently, we can roughly describe
the improvements by looking at their measurements. Since the evaluation
speed is dominated by the multiplication speed and bootstrapping speed,
we will now show several measurements on those operations and show the
improvement.
In the first implementations of Gentry’s scheme the time of a bootstrapping
operation ranged from 30 seconds to 30 minutes depending on the param-
eters. The parameters are used to ensure the hardness of the underlying
problem and thus the security level and to allow several different sizes of
data to be encrypted. A small setting can be seen as a dimension of 2048,
this makes public-key sizes around 70 megabytes and the time to run one
bootstrapping operation around 30 seconds. For larger dimensions such as
32768 this could increase to public-key sizes of several gigabytes and a boot-
strapping operation in around 30 minutes [48].
In a paper from 2016, bootstrapping times below 0.1 seconds were
achieved. This is not directly comparable to the Gentry setting since it uses a
different scheme and parameters. However, it does indicate that a lot of
progress has been made on reducing the bootstrapping time [49].
This short section shows that although it is hard to compare the dif-
ferent calculations directly, we still see some great improvements in terms
of speed of the homomorphic components. However, this comparison will
always be in favor of plaintext computation since performance overhead
is unavoidable, yet the gap is closing.
While the improvements to the algorithms are very promising, we can
also improve on hardware implementations, e.g., GPU, ASIC and FPGA
implementations and multi-threading. In addition to this, the amount
of available computational power will increase with time, making this difference less
noticeable.
When looking at the applications from this performance perspective we
have to consider whether or not the performance overhead is noticeable in
a particular calculation, and whether the privacy/confidentiality
of the data or the speed of the computation is more important.
S + e), A]. This public key is an instance of the RLWE problem and
thus both parts of the public key should be indistinguishable and this
means it is not possible to recover S.
Chapter 7
Security of HE
change encrypted data reliably, often result in malformed data and decryption
failure. However, one cannot assume that it is not possible to change
the data reliably, especially when using homomorphic encryption, since it
gives us this exact feature by design. This is why we need to have additional
security measures to guarantee integrity in protocols and applications that
rely on HE or encryption in general.
7.1.2 Verifiability
An extra property that we would like to achieve, which is related to
integrity, is called verifiability. The goal of this property is to give guarantees
as to which function has been calculated. We can achieve this through
Zero-Knowledge Verifiable Computation [50]. This means that the party
that computes the function has to provide the user with some sort of proof
of computation. This proof has to be verified by the user, and thus provides
the guarantee that the result corresponds to the evaluation of the function.
Definition 7.2.2 (Known-plaintext Attack). The adversary can observe
pairs of corresponding plaintexts and ciphertexts.
Each attack model reflects a different level of adversarial power.
Depending on the real-life application in which encryption is used, we can
determine which model fits best. For the cloud we can make some assumptions
about the strength of the adversary.
Ciphertext-only attacks are very likely since the data is located at the
cloud provider. This means it has access to encrypted data, which it can
observe. It might also be possible that an adversary has a pair of ciphertext
and plaintext, e.g., when results are published. These are the two passive
models a secure HE scheme has to protect against.
The other two are active models, which means the adversary can choose
which plaintext or ciphertext will be revealed. These attacks are far more
invasive, but not impossible in real-life. Since it is likely that we will be
encrypting the same plaintext multiple times, we require the encryption to
be different each time. This property is called semantic security, i.e., two
encryptions of the same message should be indistinguishable. Therefore,
CPA security should be guaranteed by any secure HE scheme. In almost all
HE schemes this is guaranteed through randomized encryption.
CCA attacks are harder to prevent since HE is designed to be malleable
[51]. If we want to create CCA secure encryption, the homomorphic prop-
erties will often be lost and this defeats the purpose of HE. Although this
cannot be resolved strictly by the scheme itself, certain schemes require
evaluation keys. This prevents arbitrary attackers from performing
calculations, and thus provides some form of non-malleability. However, this does
not solve the problem fundamentally.
7.2.1 Security reductions
As seen in the previous chapter, the BFV scheme is based on the RLWE
hardness assumption. We can tell something about the hardness of certain
problems by trying to make a reduction to other well-studied hardness as-
sumptions. A security reduction proves that the new assumption is at least
as hard to break as the problem we reduce it to. This means that if we
can break RLWE, we can use it to break more classical lattice assumptions,
e.g., BDD, SVP or CVP [52]. Since these older lattice assumptions are well-
studied and the most efficient attacks against these problems are not fast
enough, we can assume that attacks against the new assumption are also
not fast enough.
To ensure that these problems are actually hard, the parameters have to
be set correctly. Selecting the correct parameters for the encryption schemes
should be supported such that we can reliably base our security on the
underlying hardness assumption. With wrong parameters the underlying
problem is no longer hard and recovering the plaintext without the key could become feasible.
Chapter 8
Use cases
tering uses input from multiple users to predict the preferences of a single
user. Content-based filtering uses a description of items, in this case movies
and series, and a list of user preferences. While collaborative filtering can
predict what you would like to watch based on your history and the his-
tory of others, content-based filtering directly compares your preferences to
a list with similar items. Since collaborative filtering uses input from mul-
tiple sources the use of HE in this specific scenario will be hard, therefore
we consider content-based filtering. This requires the creation of features
describing the movies and features describing the user. The next step is to
find similarities between both features. These similarities can be used to
create future recommendations. For Netflix there is also an incentive to keep
this secret, since other companies that want to create a recommendation system
for their streaming platform could otherwise replicate it. This is also a reason why the
user is not allowed to compute this locally. The calculations used in this
filtering, e.g., weighted sum, can be computed using HE. The weighted sum
is used to calculate the weighted mean. This directly translates to a ranking
with which the user can see which movies are similar to the ones that have
been scored. The calculation uses the following input:
• $S$ is a similarity matrix, e.g., position (1,2) shows the similarity
between movie 1 and movie 2, ranging from 1 to 10;
• $\vec{p}'$ is the vector containing a 1 if the movie has been rated and 0
otherwise.
It outputs the recommendation vector, $\vec{r}$. There are two ways to compute
the recommendation.
1. On encrypted $\vec{p}$ and encrypted $\vec{p}'$:
(a) $\vec{t} = S \cdot \vec{p}$
(b) $\vec{u} = S \cdot \vec{p}'$
(c) $\vec{r} = \vec{t} / \vec{u}$, this will be calculated entrywise.
(d) Divide previous two results pairwise.
While the first solution requires the vector to contain zero values, the
second one only requires an encryption of the rated items. This would be
more efficient considering that the total number of movies greatly surpasses
the number of movies rated, which would increase the size of $\vec{p}$ unnecessarily.
Therefore, we give an example of the second calculation. Note that in the
second case $\vec{p}'$ is not confidential. The service provider learns which movies
have been rated but not the actual rating. This would in theory provide less
privacy; however, in practice the service provider can potentially also learn
the watched movies from side channels or metadata.
Example 8.1.1. (Recommendation calculation 2) [54]
For this example we will use the following input:
$$S = \begin{pmatrix} 10 & 7 & 9 & 6 \\ 7 & 10 & 10 & 5 \\ 9 & 10 & 10 & 4 \\ 6 & 5 & 4 & 10 \end{pmatrix}, \quad \vec{p} = (2, 3).$$
The calculation requires the following steps:
1. The user will encrypt $\vec{p}$.
3. The service provider will send this encrypted result back to the user,
along with the sum of the similarity scores, (10+7, 7+10, 9+10, 6+5) =
(17, 17, 19, 11).
4. The user can now compute the weighted mean after decryption:
$\vec{r} = (41/17, 44/17, 48/19, 27/11) \approx (2.41, 2.59, 2.53, 2.45)$.
After removing the already watched movies from the list, the conclusion is
that the user should watch movie 3 since it has a higher score than movie 4.
Note that we want to perform the division at the user side since we are
working over the integers in this example. For modular arithmetic, division
could be implemented using the modular multiplicative inverse. However,
in this example we want the real number as a result.
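The arithmetic of this example can be reproduced in plaintext. The sketch below leaves out the encryption layer entirely and only checks the numbers; in the HE setting the weighted sum t would be computed by the service provider on the encrypted ratings. The variable names are assumptions for illustration.

```python
from fractions import Fraction

S = [
    [10, 7, 9, 6],
    [7, 10, 10, 5],
    [9, 10, 10, 4],
    [6, 5, 4, 10],
]
rated = [0, 1]               # zero-based indices of the rated movies 1 and 2
p = [2, 3]                   # the ratings given to those movies

n = len(S)
# Weighted sum over the rated movies (computed on ciphertexts in the HE setting).
t = [sum(S[i][j] * p[k] for k, j in enumerate(rated)) for i in range(n)]
# Sum of the similarity scores of the rated movies (public information).
u = [sum(S[i][j] for j in rated) for i in range(n)]
# Entrywise division, done by the user after decryption.
r = [Fraction(ti, ui) for ti, ui in zip(t, u)]

unseen = [i for i in range(n) if i not in rated]
best = max(unseen, key=lambda i: r[i]) + 1    # one-based movie number
```

Using Fraction keeps the division exact, which mirrors the remark that the division is performed over the rationals at the user side rather than in modular arithmetic.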
service provider, making this performance overhead less noticeable for the
user.
8.3 General framework
The described use case, together with the examples from the introduction
show some key characteristics. In general we can see the following properties
in a use case:
• remote computation;
new research. The research institute can choose to provide this algo-
rithm in the form of a cloud service. Directly using plaintext patient
data in their algorithm in the cloud can lead to privacy risks. When
implementing this cloud service using HE, we can run the algorithm
on encrypted patient data and return an encrypted result back to the
hospital. This also makes it possible to protect the confidentiality of
the function. The usage of this algorithm also depends on the number
of patients that have to be diagnosed, which can be seen as a varying
demand.
Chapter 9
Proof of concept
This chapter will explain the reasoning behind the design, architecture and
technical choices. It will include several technical details that are needed to
understand the proof of concept. Several suggestions are mentioned that can
be used to further improve upon the concept.
9.1 Design
When developing a proof of concept it is important to keep in mind that we
want to show that a concept is feasible. For HE it is not possible to show
the input and output plaintext of a computation on the cloud platform,
since the entire concept is based on local storage of the private keys and
user-control over these keys. The cloud is not allowed to see the plaintext.
Therefore we decided to show the ciphertexts and their manipulations and
explain this as best as possible. The design has three main segments:
the data, the calculation and the result. The data tab allows the user to
upload a ciphertext and see the description and context of this ciphertext.
The calculation tab allows the user to create a calculation, in which they
select the ciphertexts and the operator. The calculations that are created
can be selected and executed. The results tab allows the user to see the
description and context of the calculation they have performed and supports
downloading the result.
The decryption of the ciphertext cannot be shown in the cloud due to
the nature of the application. However, this is part of the total demo and
also is necessary to show the correctness of the computation performed in the
cloud. The only way in which the homomorphic evaluation can be shown in
the cloud demo is to disable the encryption and only show the homomorphic
addition and multiplication functionality on plaintext. However, this does
not represent the concept as well as local encryption and decryption.
9.1.1 Minimum viable product
In essence every application that is based on HE follows the same structure.
To start we have to encode and encrypt our data in such a way that the
evaluating party can process the data accordingly. Therefore, the first step
is always: encrypt the data. The second step is to retrieve the data for
evaluation. After evaluation we get a new ciphertext which is an encryption
of the result. This ciphertext has to be stored. The final step involves the
decryption of the data and hence retrieving the correct result. For the PoC
these three steps are structured in the following way.
1. Locally encode and encrypt the data and upload the ciphertexts.
3. Download encrypted result and locally decrypt and decode the result.
This is the minimum functionality that the PoC has to support. However,
in more advanced applications based on HE, the encode, encrypt and
upload functionality should be mainly automated to enable processing of larger
datasets through cloud connectors. The evaluation functionality should be
extended to combine multiple ciphertexts and allow for more advanced com-
putation. The third step can also be combined with other techniques such
that only the people that are entitled to it get the correct result without
manual downloads and decryption.
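The encrypt, evaluate and decrypt flow described in this section can be sketched end to end with a toy additively homomorphic scheme. The sketch uses Paillier encryption as a stand-in, not the scheme or library used in the PoC; the tiny primes and the function names encrypt, evaluate_add and decrypt are assumptions for illustration only.

```python
import secrets
from math import gcd

# Local key generation (the private values lam and mu never leave the user).
p, q = 1009, 1013                     # toy primes, far too small to be secure
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # valid for the choice g = n + 1

def encrypt(m):
    """Local step: encrypt m before uploading the ciphertext."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def evaluate_add(c1, c2):
    """Cloud step: add the plaintexts by multiplying the ciphertexts."""
    return (c1 * c2) % n2

def decrypt(c):
    """Local step: decrypt the downloaded result."""
    ell = (pow(c, lam, n2) - 1) // n
    return (ell * mu) % n

uploaded = [encrypt(20), encrypt(22)]           # ciphertexts sent to the cloud
encrypted_result = evaluate_add(*uploaded)      # computed without any key
result = decrypt(encrypted_result)
```

Only encrypt and decrypt touch key material, which reflects the separation between the local steps and the evaluation in the cloud.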
• HElib, which has been under development with help from IBM since
2009. The library supports the BGV and the CKKS scheme. In both
schemes a number of optimizations have been implemented. HElib also
supports “assembly language for HE” which means that it is possible
to manipulate data on the lowest level [58].
These libraries are available as C++ codebases. The SAP Cloud Platform
does support C++, however for Node.js there exists a convenient buildpack.
This makes it easy to deploy a Node.js backend on the cloud platform.
Therefore we chose the Node.js implementation of the Microsoft SEAL li-
brary. This library has good documentation and is fit for the purpose of
the proof of concept. The node-seal npm package brings SEAL to JavaScript
through WebAssembly, called by wrappers that invoke the C++ code. It
can also be easily installed with the Node package manager (npm). According
to benchmarks from the node-seal library, the web implementation will
be 6 times slower for addition and 4 times slower for multiplication [59].
Although this implementation will be slower than the C++ version, the PoC
is not purely focused on showing the speed. A modular architecture will
allow us to switch out the backend for the C++ equivalent at any time.
Standardization
In addition to the development of several open-source libraries, there is also
a standardization effort by several developers from the academic world in
combination with major companies such as Google, Microsoft and IBM [60].
In these standards the security, API and applications of homomorphic en-
cryption are defined and discussed. The goal is to create a knowledge base
for researchers and companies alike. This can be compared to the Crypto-
graphic Standards and Guidelines provided by NIST. We think this is an
important part in the secure and efficient usage of homomorphic encryption
in applications.
Microsoft SEAL
The choice for SEAL has been made based on several key aspects. The first
aspect is the clear documentation and the ease of use. This is something
that they are actively working on to improve and this keeps making it easier
for programmers without much of a cryptographic background. In short, the
library supports two schemes, the Brakerski Fan Vercauteren (BFV)
and the Cheon Kim Kim Song (CKKS) scheme. The BFV scheme is mainly
used when the data can be represented by integers, while the CKKS scheme
supports real numbers (floating points). In addition to the schemes themselves there
are also several batching and encoding techniques that will be relevant for
this proof of concept. The final consideration is the speed of the algorithm.
SEAL has implemented most of the optimizations that were introduced in
the last couple of years which will result in higher performance and speed
in the cloud.
SAP Fiori
There are several ways to create and structure the frontend. SAP Fiori
Cloud gives developers the tools and guidelines to create a consistent user
experience. This methodology allows the creation of uniform applications
across the cloud. The client UI technology with which this is made possible
is called SAPUI5, an HTML5-based development toolkit. To help clarify
the PoC we want a consistent and clear design, which we think is best achieved
with SAPUI5.
MTA model
As discussed earlier we want a modular architecture such that we can swap
several modules, e.g., the Node.js backend for a C++ backend. The cloud
platform provides us with a multi-target application (MTA) model. In this model we
can link several modules together, sharing services such as the SAP HANA Cloud
database or the UAA (User Account and Authentication) service. This means
the frontend and backend can make use of the same authentication provider
and we can easily swap modules.
9.2 Implementation
In Appendix B.1, we show the file structure and in Appendix B.2 we show
the user interface. The application code can be split into two parts, the
frontend and the backend. We used the pre-generated model-view-controller
structure in the frontend [61]. For the backend we used a Node.js webserver,
combined with a SAP HANA Cloud database [62].
9.2.1 Frontend
The model holds and manages the application data. It is separated from the
view, which only presents the data through the UI. The UI is specified by a
home view and three fragments. In the fragments we specify all the elements,
such as the predefined table module or the wizard screen. Each button in a
fragment is bound to an event. The controller modifies the view and model
when called through these events. The controller also contains the functions
that communicate with the webserver. We used Ajax to send data and call the
services of the backend [63].
There are also some general settings for the routing within the webapp;
these can be found in the approuter. The approuter is the single point of
entry and uses the xsuaa authentication service. This means you need to
have a user account on the cloud platform to reach the application. From
this point we can also redirect to the service backend.
9.2.2 Backend
The backend contains a database and several different services, both linked
to the frontend using the multi-target application structure and
configuration files. All the services require an authenticated user and a
CSRF token. To upload the files, xsjs is used. The xsOdata service provides
the tables in the frontend with the correct values to display. The final
service the backend provides is the HE computation. It is provided through
several functions supported by the Node.js server, which have been split in
two parts:
To communicate with the database, we use calls to stored procedures.
Stored procedures are pre-defined queries that can be called by the
backend. Because the backend can only invoke these fixed procedures, we
avoid malformed insert statements and ad hoc SQL queries, which in the
worst case can lead to SQL injection. There are three procedures:
• dataUploader, used to insert ciphertext data after uploading;
• calculationUploader, used to insert a calculation specified by the user;
• resultUploader, used to insert the result of the computation.
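The protective effect of fixed, parameterized statements can be sketched as follows. This is a minimal illustration that uses SQLite from the Python standard library as a stand-in for SAP HANA Cloud; SQLite has no stored procedures, so the hypothetical function `data_uploader` only mimics the role of the dataUploader procedure, and the table layout is an assumption. The auto-incrementing key plays the role that sequences play in the actual database.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout; the auto-incremented id stands in for a sequence.
    conn.execute(
        "CREATE TABLE data ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ciphertext TEXT NOT NULL)"
    )

def data_uploader(conn: sqlite3.Connection, ciphertext: str) -> int:
    """Fixed insert statement: the caller supplies only a value, never SQL."""
    cur = conn.execute("INSERT INTO data (ciphertext) VALUES (?)", (ciphertext,))
    conn.commit()
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
create_schema(conn)

data_uploader(conn, "BASE64CIPHERTEXT")
# A hostile input is stored as ordinary data instead of being executed.
hostile = "x'); DROP TABLE data; --"
data_uploader(conn, hostile)

rows = conn.execute("SELECT id, ciphertext FROM data ORDER BY id").fetchall()
```

Since the statement text is fixed and the input is bound as a parameter, the hostile string ends up as a second row and the table remains intact.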
Database
The database used is the SAP HANA Cloud database, which we consume as a
database service. It provides real-time data access and analysis. However,
we only use it to store and retrieve our ciphertexts, calculations and results.
In specifying the database tables there is a clear distinction between
design-time and run-time. At design-time we specified all the tables,
procedures and synonyms. After deployment we can use the run-time container
to store and retrieve our data. To generate the three unique IDs we use
sequences, which automatically increment the ID for each new table entry.
In the database we have three tables:
• Project-H.database.data::Data;
• Project-H.database.data::Calculations;
• Project-H.database.data::Results.
To link design and run-time we used the following synonyms:
• Project-H.database.data::localData;
• Project-H.database.data::localCalculations;
• Project-H.database.data::localResults.
Client-side code
On the client side we also use a Node.js server with the Microsoft SEAL
library. It is used to encrypt and decrypt the data on the client side. It
also lets us run some additional tests before we move the code into the
cloud. The private key, which is needed for decryption, remains offline at
the client and should be stored securely. The context of the encryption,
together with the ciphertext, can be written to a file. This file can be
uploaded in the application. The result can be downloaded and contains the
same context and the new ciphertext with the result. This can be used to
decrypt and retrieve the result.
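One way to bundle the context and the ciphertext in a single upload file is sketched below. The file layout used in the proof of concept is not specified here, so the JSON structure, the field names and the functions `pack_bundle` and `unpack_bundle` are assumptions for illustration only; the ciphertext is treated as opaque bytes. Note that no key material is written to the file.

```python
import base64
import json

def pack_bundle(context: dict, ciphertext: bytes) -> str:
    """Serialize encryption context and ciphertext into one JSON document."""
    return json.dumps({
        "context": context,
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
    })

def unpack_bundle(text: str):
    """Recover the context and the raw ciphertext bytes from the document."""
    doc = json.loads(text)
    return doc["context"], base64.b64decode(doc["ciphertext"])

# Hypothetical context values; the secret key is deliberately not included.
context = {"scheme": "BFV", "poly_modulus_degree": 4096}
ciphertext = bytes([0, 17, 255, 42])  # placeholder for serialized ciphertext

bundle = pack_bundle(context, ciphertext)
restored_context, restored_ciphertext = unpack_bundle(bundle)
```

The downloaded result could follow the same layout, with the same context and a new ciphertext, so that the client can decrypt it with the key that never left its machine.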
Chapter 10
Related work
This chapter will take a look at other existing implementations and recent
events in the area of secure (cloud) computation.
• ShareMind, a database and application server that can process en-
crypted data. One of the techniques they use to accomplish this is
HE. It can provide end-to-end data protection and accountability. A
more concrete example is an implementation made using the Share-
Mind framework to prevent satellite collisions [67]. They use encrypted
trajectory data to securely compare and calculate whether or not a
collision will happen. This will improve orbital safety without sharing
confidential satellite information, e.g., in military satellites.
Chapter 11
Conclusions
The use case we described not only illustrates why Netflix wants to use
the cloud, but also that they perform computations on sensitive data inside
the cloud. As an example we showed that their recommendation system can
be made privacy-preserving by replacing the specific calculations on sensitive
data with calculations on encrypted data, i.e., the calculation of the weighted
sum. This would protect user preferences from the service provider, the
cloud provider and some of the described threats to the cloud itself. This
example also allowed us to derive some key characteristics. We concluded
that it makes most sense to use HE when outsourcing a computation on
highly sensitive data to a partially or fully untrusted environment, where
it replaces plaintext computation rather than traditional cryptography.
This computation often involves mathematical and statistical functions but
can also be more complex as seen in, e.g., machine learning.
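A weighted sum over encrypted values can be sketched with the additively homomorphic Paillier scheme [34]. This is a toy illustration and not the SEAL-based proof of concept: the primes are tiny and fixed, so the code is insecure, and the ratings and weights are made-up values. The server-side function `weighted_sum` only sees ciphertexts and public weights.

```python
import math
import secrets

def keygen(p: int = 1789, q: int = 1861):
    """Toy key generation with tiny fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we use the generator g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def weighted_sum(n: int, ciphertexts, weights) -> int:
    """Computes Enc(sum(w_i * x_i)) from Enc(x_i) without any secret key."""
    n2 = n * n
    acc = 1  # 1 is a valid encryption of 0
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

n, priv = keygen()
ratings = [5, 3, 4]   # hypothetical sensitive user preferences
weights = [2, 1, 3]   # hypothetical public model weights
encrypted = [encrypt(n, x) for x in ratings]
result = decrypt(n, priv, weighted_sum(n, encrypted, weights))
```

Multiplying ciphertexts adds the underlying plaintexts, and raising a ciphertext to a public weight multiplies its plaintext by that weight, which is all a weighted sum requires.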
This means that the use cases are very specific and also depend on the
performance of the different HE schemes, which varies between the different
types. We will give a short summary of the different HE classifications:
is applied to a pair of ciphertexts. All the numbers packed inside the ci-
phertexts are combined into one new value, by evaluating them pairwise.
The new resulting ciphertext includes the encrypted results of this pairwise
evaluation, without revealing either the input or the output.
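The pairwise evaluation on packed values can be pictured with the following plaintext simulation. It shows only the slot-wise semantics of batching and performs no encryption; the modulus `T`, the slot count and the function names are assumptions chosen for this sketch.

```python
T = 65537  # hypothetical plaintext modulus

def pack(values, slots: int = 4):
    """Place the values in a fixed number of slots, padding with zeros."""
    if len(values) > slots:
        raise ValueError("too many values for the available slots")
    return [v % T for v in values] + [0] * (slots - len(values))

def add_packed(a, b):
    """Slot-wise addition: slot i of the result depends only on slot i of a and b."""
    return [(x + y) % T for x, y in zip(a, b)]

def mul_packed(a, b):
    """Slot-wise multiplication, again evaluated pair by pair."""
    return [(x * y) % T for x, y in zip(a, b)]

a = pack([1, 2, 3])
b = pack([10, 20, 30])
sums = add_packed(a, b)
products = mul_packed(a, b)
```

In a batched HE scheme the same slot-wise operation is applied to two ciphertexts at once, so a single homomorphic operation processes every pair of slots.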
Although the proof of concept can perform calculations, it also shows
some of the weaknesses in HE. We conclude that the confidentiality of the
data can indeed be guaranteed by secure HE schemes. However, other
desired properties, such as integrity, availability and verifiability, have to
be guaranteed through additional measures. Verifiability is especially
interesting for HE since it can give us guarantees about the function that
has been evaluated. When looking at HE as a primitive we can also see that
CCA security conflicts with its malleability. This changes our initial
assumption that we are implementing “just” an encryption scheme into the
conclusion that HE only works within a sound security architecture that
includes additional controls and key management and that takes the
malleability inherent to HE into account.
The security requirements and limitations in combination with the perfor-
mance overhead show us why many people see homomorphic encryption as a
step backwards. However, recent developments show us that if secure com-
putation can be implemented correctly and efficiently, we can unlock value
from data that was previously inaccessible, i.e., in cases in which we need to
achieve confidentiality of input, output or function. This makes us conclude
that it is a good privacy-enhancing technology and that it certainly has a
future in many applications and protocols that want to include privacy by
design and truly ensure privacy end-to-end in a cloud environment.
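The malleability mentioned above can be made concrete with unpadded ("textbook") RSA [32], which is multiplicatively homomorphic. This is a toy sketch with classic small parameters, not one of the schemes used in the proof of concept; it shows that a party without the private key can change the plaintext in a predictable way, which is exactly what an integrity control has to detect.

```python
# Classic toy RSA parameters; far too small to be secure.
P, Q = 61, 53
N = P * Q                           # public modulus, 3233
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the owner

def encrypt(m: int) -> int:
    return pow(m, E, N)

def decrypt(c: int) -> int:
    return pow(c, D, N)

original = 42
ciphertext = encrypt(original)

# An attacker who only knows the public key doubles the hidden value
# by multiplying with an encryption of 2.
tampered = (ciphertext * encrypt(2)) % N
recovered = decrypt(tampered)
```

The same property that enables computation on ciphertexts also allows undetected modification, so integrity has to be provided by additional measures outside the HE scheme.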
• In this work we often considered the case in which one party has the
encryption/decryption key. In a decentralized and real-life setting this
may not always be the case. Multi-Key Homomorphic Encryption
schemes allow multiple participants to decrypt. Other options are
Threshold FHE schemes or proxy re-encryption techniques.
• To optimize the performance of implementations with HE, we need to
carefully choose the parameters and have thorough knowledge of
the underlying scheme. We also have to translate our higher-level
functions into HE circuits (binary or arithmetic). How can we
best achieve this?
• In the proof of concept, one could have a closer look at the application
security, look for ways to include integrity and verifiability and increase
stability and performance (by swapping to a C++ backend).
Appendix A
Appendix B
B.2 Proof of concept
Bibliography
[8] O. Regev, “The learning with errors problem (invited survey),” in Pro-
ceedings of the 2010 IEEE 25th Annual Conference on Computational
Complexity, CCC ’10, (USA), p. 191–204, IEEE Computer Society,
2010.
[10] P. Mell and T. Grance, “The NIST definition of cloud computing,”
September 2011.
[22] I. Damgård and T. Toft, “Trading sugar beet quotas - secure multiparty
computation in practice,” ERCIM News, vol. 2008, 01 2008.
Software PROtection, SPRO ’16, (New York, NY, USA), p. 1, Associ-
ation for Computing Machinery, 2016.
[24] D. Harnik, “Impressions of Intel SGX performance.”
https://medium.com/@danny_harnik/impressions-of-intel-sgx-performance-22442093595a,
Dec. 2017.
[25] F.-X. Standaert, Introduction to Side-Channel Attacks, pp. 27–42. 12
2010.
[26] A. Barenghi, L. Breveglieri, I. Koren, and D. Naccache, “Fault injection
attacks on cryptographic devices: Theory, practice, and countermea-
sures,” Proceedings of the IEEE, vol. 100, pp. 3056–3076, Nov 2012.
[27] J. V. Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens,
M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx, “Foreshadow:
Extracting the keys to the Intel SGX kingdom with transient out-of-order
execution,” in 27th USENIX Security Symposium (USENIX Security 18),
(Baltimore, MD), p. 991–1008, USENIX Association, Aug. 2018.
[28] S. Mavrovouniotis and M. Ganley, “Hardware security modules,” Secure
Smart Embedded Devices, Platforms and Applications, pp. 383–405, 06
2013.
[29] Cloud Security Alliance, “Top threats to cloud computing, the egregious
11.” https://cloudsecurityalliance.org/, June 2019.
[30] A. Shostack, “The threats to our products.”
https://www.microsoft.com/security/blog/2009/08/27/the-threats-to-our-products/.
[31] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, you, get
off of my cloud: Exploring information leakage in third-party compute
clouds,” in Proceedings of the 16th ACM Conference on Computer and
Communications Security, CCS ’09, (New York, NY, USA), p. 199–212,
Association for Computing Machinery, 2009.
[32] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining dig-
ital signatures and public-key cryptosystems,” Commun. ACM, vol. 21,
p. 120–126, Feb. 1978.
[33] S. Goldwasser and S. Micali, “Probabilistic encryption,” Journal of
Computer and System Sciences, vol. 28, no. 2, pp. 270 – 299, 1984.
[34] P. Paillier, “Public-key cryptosystems based on composite degree
residuosity classes,” in Proceedings of the 17th International Confer-
ence on Theory and Application of Cryptographic Techniques, EURO-
CRYPT’99, (Berlin, Heidelberg), p. 223–238, Springer-Verlag, 1999.
[35] D. Boneh, E. Goh, and K. Nissim, “Evaluating 2-DNF formulas on
ciphertexts,” Proceedings of the 2nd Conference on Theory of Cryptography,
pp. 325–342, 01 2005.
[43] J. von zur Gathen and G. Seroussi, “Boolean circuits versus arithmetic
circuits,” Information and Computation, vol. 91, no. 1, pp. 142 – 154,
1991.
[45] Z. Brakerski, Cryptographic Methods for the Clouds. PhD thesis, the
Weizmann Institute of Science, Rehovot, Israel, 2011.
[46] N. Smart and F. Vercauteren, “Fully homomorphic SIMD operations,”
IACR Cryptology ePrint Archive, vol. 2011, p. 133, 01 2011.
[47] S. Halevi, Y. Polyakov, and V. Shoup, “An improved RNS variant of the
BFV homomorphic encryption scheme,” in Topics in Cryptology – CT-RSA
2019 - The Cryptographers’ Track at the RSA Conference 2019,
Proceedings (M. Matsui, ed.), Jan. 2019.
[51] M. Chenal and Q. Tang, “On key recovery attacks against existing
somewhat homomorphic encryption schemes.” Cryptology ePrint
Archive, Report 2014/535, 2014. https://eprint.iacr.org/2014/535.
[57] “PALISADE Lattice Cryptography Library (release 1.9.2).”
https://palisade-crypto.org/, Apr. 2020.