Secure Cloud EHR With Semantic Access Control, Searchable Encryption and Attribute Revocation
Abstract: To ensure a secure cloud-based Electronic Health Record (EHR) system, we need to encrypt data and impose field-level access control to prevent malicious usage. Since the attributes of users change with time, the encryption policies adopted may also vary. For large EHR systems, it is often necessary to search through the encrypted data in real time and perform client-side computations without decrypting all patient records. This paper describes our novel cloud-based EHR system that uses Attribute-Based Encryption (ABE) combined with Semantic Web technologies to facilitate differential access to an EHR, thereby ensuring that only users with valid attributes can access a particular field of the EHR. The system also includes searchable encryption using a keyword index and a search trapdoor, which allows querying EHR fields without decrypting the entire patient record. Attribute revocation is managed efficiently in our EHR by delegating the revision of the secret key and ciphertext to the Cloud Service Provider (CSP). Our methodology incorporates advanced security features that eliminate malicious use of EHR data and contributes significantly towards ensuring secure digital health systems on the Cloud.

Index Terms: Attribute Revocation, Searchable Encryption, Electronic Health Record, Knowledge Graph (Ontology), Cloud Computing, Cloud Security

I. INTRODUCTION

Healthcare organizations are increasingly adopting cloud-based technologies to maintain their digital records efficiently. A cloud-based Electronic Health Record (EHR) service allows centralizing patient data and using the advantages of elasticity and scalability of a cloud infrastructure [5], [34], [35], [40]. The cloud also offers a highly supportive environment for efficiently handling the load [2]. Moreover, cloud services are usually more economical than the alternatives when organizations build a technology infrastructure to deploy their services. The cloud has become much more popular due to the pay-as-you-go model, under which customers pay only for the resources they actually use.

Although the Cloud offers many advantages, it also continues to pose unique risks to healthcare organizations in terms of data privacy and security. In recognition of these security risks, all healthcare organizations must comply with the Health Insurance Portability and Accountability Act (HIPAA) [14], [50] and the Health Information Technology for Economic and Clinical Health (HITECH) Act [49] privacy guidelines set by regulatory authorities. Failing to comply with these acts can be financially devastating to an organization. Noncompliance carries a range of penalties depending on the degree of misconduct, which can result in hefty fines. Violations can also lead to an arrest. As a result, an EHR solution must abide by all applicable laws and regulations and also allow an easy and seamless exchange of patient information.

A. Motivation

EHR system with semantic access control. Recently, Walid et al. [55] proposed a cloud-based EHR system that offers a semantically rich, policy-driven mechanism that employs Attribute-Based Access Control (ABAC) [22] to evaluate users' access to the system.

The architecture of their system was created using Semantic Web Technologies [7]. By referencing the HIPAA knowledge graph (ontology) developed in [23], they created a HIPAA-consistent knowledge graph. This way, the system systematically addressed the aforementioned compliance problem.

In particular, the system used a knowledge graph to derive user attributes and the EHR fields based on the type of request. The knowledge graph is queried using SPARQL, and it contains complete details of individuals in the organization and their associated unique attributes. These unique attributes control access to the different fields of an EHR. Thus, each individual has distinct access to a patient EHR.

The system also allowed the client to search through encrypted data based on keyword queries, without needing to download and decrypt all encrypted data from the cloud.

Disadvantages of the work [55]. Although their system has many good features, it still has a few disadvantages. In this work, we want to improve the system by resolving those disadvantages.

First of all, the system uses two different encryption schemes, one for searchability and the other for data encryption. Having multiple schemes within a large system often becomes cumbersome, as more keys and policies may have to be maintained. A system would be better if it used a single encryption scheme.

Second, although the system provides the search feature, the search time is quite slow. In the presence of Big data, it usually
requires a lot of time and analysis to search through the data to locate specific patients with certain diseases or conditions. We want to speed up the search time.

Most importantly, the system does not provide revocation. It is essential that an attribute-based EHR system have an attribute revocation feature, since the organization policies and the user attributes keep changing with time. For example, a physician might have been promoted, so their attributes would change. Likewise, an employee might have moved to another department or left the organization, so their unique attributes must be revoked from the system. Often the organization policies also vary with time, which requires some attributes to be revoked.

B. Our Work

In this paper, we improve the EHR system in [55] by resolving the issues mentioned above. Below, we overview our system.

Underlying ABE encryption. We use the revocable, searchable ABE scheme introduced by Wang et al. [59]. The scheme provides both searchability and data encryption, which fits our purpose. The scheme also allows us to outsource the computation to the cloud, which we describe next.

Outsourcing the computation to cloud. In general, our system requires fewer client-side computations, as most functions are safely delegated to the cloud service provider (CSP). This is achieved by splitting each user's secret into two keys and having one key uploaded to the CSP while the other key is kept secret by the user.

For example, since each user has a dedicated private key stored within the CSP, partial decryption is delegated to the CSP. The output from the partial decryption still hides the plaintext from the CSP and outsiders. However, given the partial decryption from the CSP, the user can recover the plaintext record with much less effort.

Faster search time. Owing to the presence of Big Data, searching through encrypted data requires utmost attention. The search time can also be improved, thanks to the outsourcing framework.

In particular, to search for a keyword in the encrypted index for the EHR database, the user first creates a token connected with a keyword query. For privacy, the token hides the keyword from the CSP.

Once the CSP gets the search token, the CSP uses the token to run the search algorithm over the ciphertexts to see which ones contain the privately linked keyword(s). When the index matches the keyword and the user's attributes satisfy the ciphertext access control policies, the cloud service retrieves the search results. The encrypted version of the message containing the keyword is sent back to the requester.

Our technique enables keyword search with substantially reduced network latency and client-side computing costs compared to prior work [55].

Attribute revocation. The knowledge graph records all users and patients in our framework and tracks all attribute changes in the system along with the associated EHR fields. The knowledge graph thus serves to revoke unwanted user attributes and helps protect patient privacy.

The outsourcing framework also benefits the system when it performs revocation. In particular, when a user attribute is revoked, the revocation takes place only at the CSP level. This works because a user secret is split into two keys, and revoking only one key still disables the user secret as a whole. The secret key held by the user remains unchanged during the entire revocation process, which greatly simplifies the revocation process.

Edge computing. The term edge computing [51] refers to the need to analyze data locally before sending it to the cloud, and we have followed this principle in our system. Inside the organizational periphery, which we refer to as the edge in our system, we enforce an access control mechanism on the records. All users are checked only within the organization's borders, preserving their anonymity. Within the organizational boundary, we have implemented a robust encryption approach that protects the data against privacy risks before it is moved to the cloud. As a result, the organizational boundary acts as a strong protective barrier for the data.

Threat model. Cloud users, while storing their data on the cloud, usually place the CSP in one of two threat models: the Honest-But-Curious (HBC) adversary model, where CSPs run the programs and algorithms correctly but might look at the information passed between entities; or the malicious adversary model, where providers behave in an arbitrary manner that may be hostile to the cloud customer [42]. We have considered the HBC threat model for our approach, since cloud users trust the functionality of their applications running on the cloud but may not fully trust the CSP, which stores their dataset in distant cloud data centers. First of all, clouds may be exposed to rogue employees who fail to adhere to data protection standards. Secondly, cloud applications may be subject to external cyberattacks, and cloud users may not be aware of the possible repercussions on their data security when such intrusions occur. The users worry that the CSP might attempt to decrypt the data by analyzing it thoroughly or by monitoring data traffic between users. We require our framework to withstand a compromised user attempting to decrypt a ciphertext with her decryption key and gain knowledge she is not entitled to. We also require our system to be resilient against a coalition of users trying to decrypt a ciphertext that no single member of the coalition can decrypt with her own decryption key.

C. Organization

The remainder of the paper is structured as follows: we discuss related work in Section II, preliminaries in Section III, system architecture in Section IV, system implementation in Section V, and conclusions in Section VI.
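
The key-splitting idea from the overview in Section I-B (one key share kept by the user, one uploaded to the CSP) can be illustrated with a toy example. The sketch below is not the ABE scheme of [59]. It swaps in textbook ElGamal with an additively split exponent, and the modulus, base, function names, and sample record are all assumptions chosen for illustration. It shows only that the CSP's partial decryption leaves the plaintext masked until the user applies their own share. It does not reproduce the efficiency gain or the attribute policies of the real scheme.

```python
import secrets

# Toy parameters (assumption): the Mersenne prime 2**127 - 1 and base 3.
# Far too small and unvetted for real use; illustration only.
P = 2**127 - 1
G = 3

def keygen():
    """Return (public key, user share, cloud share) of a split ElGamal key."""
    x = secrets.randbelow(P - 2) + 1        # full secret exponent
    x_user = secrets.randbelow(P - 2) + 1   # share kept by the user
    x_cloud = (x - x_user) % (P - 1)        # share uploaded to the CSP
    return pow(G, x, P), x_user, x_cloud

def encrypt(pk, m):
    """Encrypt an integer message with 0 < m < P."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def decrypt_cloud(x_cloud, ct):
    """CSP side: strip the cloud share; the result is still masked."""
    c1, c2 = ct
    return c1, (c2 * pow(pow(c1, x_cloud, P), -1, P)) % P

def decrypt_user(x_user, partial):
    """User side: remove the remaining mask with the user share."""
    c1, masked = partial
    return (masked * pow(pow(c1, x_user, P), -1, P)) % P

# Hypothetical EHR field value, encoded as an integer smaller than P.
record = int.from_bytes(b"BP 120/80", "big")
pk, x_user, x_cloud = keygen()
ct = encrypt(pk, record)
partial = decrypt_cloud(x_cloud, ct)      # computed by the CSP
recovered = decrypt_user(x_user, partial) # completed by the user
print(recovered.to_bytes(9, "big"))
```

In this toy setting, replacing or deleting `x_cloud` at the CSP is enough to make the user's share useless, which mirrors why revocation can be handled at the CSP level while the user's key stays unchanged.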
II. RELATED WORK

A. Electronic Health Record System

Digital health record systems are commonly employed to enhance hospital services, improve treatment efficacy, and reduce premiums [16], [25]. An EHR records a patient's vital statistics, diagnoses, medications, immunization history, laboratory and radiology reports, doctor notes, and other medical facts along with the patient's details. An EHR system provides several benefits such as accurate documentation, disease tracking, data sharing, statistical analysis, and so forth. However, security and privacy concerns have hampered the spread of EHR systems, and they have received increasing attention in recent years [35]–[37], [48]. Narayan et al. [45] recommended using ABE to protect the privacy of EHR data from outside threats as well as from the CSP. Fatos et al. [60] originally presented a multi-user fuzzy keyword search method that supported fine-grained permission restriction over encrypted data.

Unfortunately, most proposed solutions fall short in providing controlled access, encryption, searchable encryption, and attribute revocation together. Furthermore, most of the available applications are licensed, making them costly to use. In this context, our research aims to develop an open-source, low-cost EHR management system that can provide advanced levels of data privacy and protection.

B. Regulatory Policies

Patient data in the United States is protected under several statutes, the most notable being HIPAA. Electronic protected health information (ePHI) [11] is the name given to the information about one's health that is protected by these rules. The Health Information Technology for Economic and Clinical Health Act (HITECH) allows sharing ePHI while still requiring HIPAA privacy and protection laws to be applied more strictly and thoroughly [1]. These rules, on the other hand, make no mention of encryption principles or algorithms. Furthermore, data encryption in data access control and transfer is defined as addressable rather than mandatory. This left room for different interpretations and became a source of debate regarding sharing ePHI. Cloud-based EHR services in the United States are required to comply with these regulatory standards and ensure enhanced data protection combined with the seamless user experience that cloud services offer. This also requires implementing strict access control mechanisms to ensure that unauthorized access to the EHR by any user is prohibited.

C. Semantic Web Technologies

We have used Semantic Web technologies to develop our system's knowledge graph and the reasoning component of our system. These enable us to build the schema using W3C standardized languages that support our design requirements, including interoperability, sound semantics, Web integration, and the availability of tools and system components. Semantic Web tools enable data to be annotated with machine-understandable meta-data, allowing the automation of their retrieval and their usage in the correct contexts. Semantic Web technologies include languages such as Resource Description Framework (RDF) [26] and Web Ontology Language (OWL) [43] for defining ontologies and describing meta-data using these ontologies, as well as tools for reasoning over these descriptions.

Our most fundamental requirement is for a representation that supports interoperability at both the syntactic and semantic levels. OWL has well-defined semantics grounded in first-order logic and model theory, allowing programs to draw inferences with the assurance that the subsequent interpretation is sound. An important advantage of OWL over many other knowledge-representation systems is that it has well-defined subset profiles guaranteeing sound and complete reasoning at various levels of reasoning complexity, and that it is designed to work with popular implementation technologies, such as OWL QL for databases and OWL RL for rule-based systems. A second design requirement is for a language that is designed to integrate well with the Web and Cloud, which is becoming the dominant technology for today's digital health systems. These technologies can be used to provide common semantics of service information and policies, enabling all agents who understand essential Semantic Web technologies to communicate and use each other's data and services effectively. OWL is built on basic Web standards and protocols and is evolving to remain compatible with them. It is possible to embed RDF and OWL knowledge in HTML pages, and several search engines (including Google) will find and process some embedded RDF.

D. Attribute-Based Encryption

ABE [17], introduced by Sahai and Waters, is one way to ensure data security. In ABE [17], the data is encrypted using a set of attributes, and the private key is defined using a different set of attributes. The ciphertext can be decrypted only if the two sets of attributes overlap in at least a threshold number of attributes. ABE has been applied to EHR system security in [3], [6], and [45]. Because the original scheme lacked expressiveness, ABE has been further divided into ciphertext-policy ABE (CP-ABE) [8] and key-policy ABE (KP-ABE) [4]. In CP-ABE [8], the secret key is associated with an attribute set, and the ciphertext is associated with an access policy. In most cases, the policy is defined as a Boolean formula over a specific set of attributes. A secret key can decrypt a ciphertext if its attribute set satisfies the access policy. In the KP-ABE scheme the roles are reversed: the key carries the policy and the ciphertext carries the attribute set.

CP-ABE [8] is considered more effective for access control in the cloud because each individual ciphertext defines a policy that explicitly specifies the attributes that data users must hold in order to decrypt it. Joshi et al. [24] developed a semantically enriched attribute-based access control (ABAC) model for data access that leverages CP-ABE [8]. Their model evaluates access categories based on user attributes and EHR fields. The national hub controls both EHR secure entry and distribution.
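
The two access conditions described in Section II-D can be sketched without any cryptography: the threshold overlap rule of the original ABE, and the Boolean-formula policy of CP-ABE. The code below is a minimal illustration under stated assumptions. The tuple-based policy encoding, the function names, and the attribute names are all hypothetical, and a real CP-ABE scheme enforces this check cryptographically rather than with a plain function call.

```python
def threshold_match(ct_attrs, key_attrs, d):
    """Threshold rule: the two attribute sets must share at least d attributes."""
    return len(set(ct_attrs) & set(key_attrs)) >= d

def satisfies(policy, attrs):
    """Evaluate a Boolean policy tree over a user's attribute set.

    A policy is either an attribute name (str) or a tuple
    ("AND" | "OR", subpolicy, subpolicy, ...).
    """
    if isinstance(policy, str):
        return policy in attrs
    op, *children = policy
    results = (satisfies(child, attrs) for child in children)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical policy attached to a hypothetical "diagnosis" field.
diagnosis_policy = ("AND", "physician", ("OR", "cardiology", "emergency"))

print(satisfies(diagnosis_policy, {"physician", "cardiology"}))
print(satisfies(diagnosis_policy, {"nurse", "cardiology"}))
print(threshold_match({"a", "b", "c"}, {"b", "c", "d"}, d=2))
```

In CP-ABE terms, `satisfies` plays the role of the check f(x) = 1, where f is the policy embedded in the ciphertext and x is the attribute set embedded in the key.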
E. Attribute-Based Encryption With Attribute Revocation

Since a user's attributes can vary significantly over time, attribute revocation is crucial in ABE frameworks. Pirretti et al. [46] were the first to implement attribute revocation, which they accomplished by a timed rekeying process. Each attribute had an expiration time in the system, so authority centers had to reissue revised keys regularly. To revoke an attribute in the scheme, the authority center had to cease releasing and updating the current version of the attribute. Bethencourt et al. [8] later extended this work with a single expiration time connected with the user's private key. Boldyreva et al. [9] proposed a revocable KP-ABE scheme that improved on their previous revocable IBE. Wang et al. [57], [56] presented two explicitly revocable CP-ABE frameworks based on bilinear and multilinear maps, respectively.

Several ABE systems with immediate attribute revocation have been proposed recently. Yu et al. [62] and Ibraimi et al. [21] proposed CP-ABE schemes that employ a semi-trusted proxy server to execute immediate attribute revocation. Their approach shifted the authority's responsibilities to the proxy server, significantly reducing the authority's burden. They were, nevertheless, unable to obtain fine-grained access control. Furthermore, as the number of users increases rapidly, the proxy server's update work grows sharply. Li et al. [31] devised an effective CP-ABE scheme with user revocation at a lower computing expense. Several other schemes can be seen in [66], [47], [63], and [39].

Computational efficiency is another consideration in the latest ABE schemes. Outsourced decryption can help decrease the user's computing load. Green et al. [18] first proposed an effective ABE scheme that facilitates outsourced decryption. The bulk of the decryption work is done by the CSP using the users' key. Zhou et al. [65] suggested an optimized data management system centered on mobile devices, in which portions of the encryption and decryption processes were safely delegated to the CSP without sensitive data leakage. Li et al. [30] proposed an ABE scheme including complete verification for outsourced decryption, which addresses the problem of ensuring the accuracy of outsourced decryption for unauthorized individuals. The scheme implemented in our system compares favorably with those in [20], [61], [54], and [41].

F. Keyword Search Over Encrypted Data

Fast and efficient searchability is required for any EHR system, particularly given the movement toward evidence-based healthcare, because doctors have a limited time in which to make judgments. Dawes et al. [13] mentioned that time constraints are the most significant factor impeding the use of computer systems in medical practice. In another study, by Holden et al. [19], physicians indicated that response time is one of the obstacles to EHR system adoption. Searchable encryption (SE) thus remains one of the most important features in EHR systems.

SE is an encryption technique that allows users to search for keywords in ciphertext without revealing the keywords. Song et al. [52] first devised a practical SE scheme based on symmetric cryptography, establishing a significant standard for keyword search on encrypted data. Boneh et al. [10] later extended SE research to public-key cryptography. Following that, numerous SE schemes were developed to improve search performance, security, and search functionality [12], [15], [27], [32], [53]. Attribute-based keyword search, which combines ABE and SE properties, has attracted considerable attention recently; see [28], [29], [33], [44], [58], [64], and [38].

III. PRELIMINARIES

Let $\lambda$ be the security parameter.

A. Revocable, Searchable ABE

In this section, we describe the revocable, searchable attribute-based encryption scheme.

Syntax. Let $X$ be the attribute universe. A revocable, searchable ABE consists of the following algorithms:

• $\mathsf{Setup}(1^\lambda, X) \to (\mathsf{mpk}, \mathsf{msk}, \mathsf{msvk})$. The setup algorithm gets as input the security parameter $\lambda$ and the attribute universe $X$. It outputs the public parameter $\mathsf{mpk}$, the master secret key $\mathsf{msk}$, and the master secret version key $\mathsf{msvk}$. The master secret version key will be updated when users or attributes are revoked through algorithm $\mathsf{Update\text{-}msvk}$ described below.

• $\mathsf{KeyGen}(\mathsf{msk}, \mathsf{msvk}, x) \to (\mathsf{sk}^1_x, \mathsf{sk}^2_x)$. The key generation algorithm gets as input $\mathsf{msk}$, $\mathsf{msvk}$ and a set of attributes $x$. It outputs a pair of secret keys $(\mathsf{sk}^1_x, \mathsf{sk}^2_x)$. The first key $\mathsf{sk}^1_x$ will be sent to the user, and the second key $\mathsf{sk}^2_x$ will be stored on the cloud server.

• $\mathsf{Enc}(\mathsf{mpk}, f, m) \to \mathsf{ct}_f$. The encryption algorithm gets as input $\mathsf{mpk}$, a Boolean formula $f$ over $X$, and a message $m$. It outputs a ciphertext $\mathsf{ct}_f$.

• $\mathsf{EncInd}(\mathsf{mpk}, W) \to I_W$. The encrypted index algorithm gets as input $\mathsf{mpk}$ and a set of keywords $W$. It outputs an encrypted index $I_W$ for $W$.

• $\mathsf{Token}(\mathsf{sk}^1_x, w) \to t_w$. The token generation algorithm gets as input the user secret key $\mathsf{sk}^1_x$ and a query keyword $w$. It outputs a token $t_w$.

• $\mathsf{Test}(\mathsf{sk}^2_x, I_W, t_w) \to 0/1$. The test algorithm gets as input the cloud secret key $\mathsf{sk}^2_x$, the encrypted index $I_W$ and the user-generated token $t_w$. If the embedded keyword in $t_w$ is contained in $I_W$, it outputs 1 (true); otherwise it outputs 0 (false). Note that this algorithm can be performed by the cloud, which holds the key $\mathsf{sk}^2_x$, when it receives the token $t_w$ from the user; the encrypted index $I_W$ is typically stored on the cloud server.

• $\mathsf{Decrypt\text{-}cloud}(\mathsf{sk}^2_x, \mathsf{ct}_f) \to \mathsf{pd}$. This algorithm gets as input the cloud secret key $\mathsf{sk}^2_x$ and the ciphertext $\mathsf{ct}_f$. If $f(x) = 1$, it outputs the partial decryption $\mathsf{pd}$; otherwise, it outputs an error.

• $\mathsf{Decrypt\text{-}user}(\mathsf{sk}^1_x, \mathsf{pd}) \to m$. Given the partial decryption, the user with $\mathsf{sk}^1_x$ will recover the message $m$.

• $\mathsf{Update\text{-}msvk}(\mathsf{msvk}, x) \to \Delta_x$. This algorithm is run by the central authority to update the attribute $x$ when a
user with attribute $x$ is revoked. The algorithm updates the master secret version key for the attribute $x$, and also outputs $\Delta_x$ to be used for updating the master public key, the cloud secret key that is associated with attribute $x$, and ciphertexts associated with attribute $x$.

• $\mathsf{Update\text{-}mpk}(\mathsf{mpk}, \Delta_x)$. This algorithm updates the master public key $\mathsf{mpk}$ using $\Delta_x$.

• $\mathsf{Update\text{-}cloudkey}(\mathsf{sk}^2_x, \Delta_x)$. This algorithm updates the cloud secret key $\mathsf{sk}^2_x$ using $\Delta_x$.

• $\mathsf{Update\text{-}ct}(\mathsf{ct}, \Delta_x)$. This algorithm updates ciphertext $\mathsf{ct}$ using $\Delta_x$.

Revocation Security. For a stateful adversary $A$ and security parameter $\lambda$, we define an experiment $\mathsf{Expt}^{\mathsf{revoke}}_A(\lambda)$ as follows:

$$
\begin{array}{l}
\mathsf{Expt}^{\mathsf{revoke}}_A(\lambda): \\
\quad f^* \leftarrow A(1^\lambda); \\
\quad (\mathsf{mpk}, \mathsf{msk}, \mathsf{msvk}) \leftarrow \mathsf{Setup}(1^\lambda, X); \\
\quad (m_0, m_1) \leftarrow A^{\mathsf{KeyGen}(\mathsf{msk}, \mathsf{msvk}, \cdot),\ \mathsf{Update\text{-}msvk}(\mathsf{msvk}, \cdot)}(\mathsf{mpk}); \\
\quad b \leftarrow_R \{0, 1\}; \\
\quad \mathsf{ct}_{f^*} \leftarrow \mathsf{Enc}(\mathsf{mpk}, f^*, m_b); \\
\quad b' \leftarrow A(\mathsf{ct}_{f^*}); \\
\quad \text{If } b = b' \text{ output } 1; \text{ otherwise output } 0.
\end{array}
$$

In the above, all queries $x$ that $A$ makes to oracle $\mathsf{KeyGen}(\mathsf{msk}, \mathsf{msvk}, \cdot)$ should satisfy $f^*(x) \neq 1$. In addition, the messages $m_0$ and $m_1$ should have the same length.

A revocable, searchable ABE is said to be revocation secure if, for every polynomial-time adversary $A$, the quantity $|\Pr[\mathsf{Expt}^{\mathsf{revoke}}_A(\lambda) = 1] - 1/2|$ is negligible in $\lambda$.

Keyword-search Security. For a stateful adversary $A$ and security parameter $\lambda$, we define an experiment $\mathsf{Expt}^{\mathsf{keyword}}_A(\lambda)$ as follows:

$$
\begin{array}{l}
\mathsf{Expt}^{\mathsf{keyword}}_A(\lambda): \\
\quad (\mathsf{mpk}, \mathsf{msk}, \mathsf{msvk}) \leftarrow \mathsf{Setup}(1^\lambda, X); \\
\quad x \leftarrow A(\mathsf{mpk}); \\
\quad (\mathsf{sk}^1_x, \mathsf{sk}^2_x) \leftarrow \mathsf{KeyGen}(\mathsf{msk}, \mathsf{msvk}, x); \\
\quad (W_0, W_1) \leftarrow A^{\mathsf{Token}(\mathsf{sk}^1_x, \cdot)}(\mathsf{mpk}); \\
\quad b \leftarrow_R \{0, 1\}; \\
\quad I_{W_b} \leftarrow \mathsf{EncInd}(\mathsf{mpk}, W_b); \\
\quad b' \leftarrow A^{\mathsf{Token}(\mathsf{sk}^1_x, \cdot)}(I_{W_b}); \\
\quad \text{If } b = b' \text{ output } 1; \text{ otherwise output } 0.
\end{array}
$$

In the above, all queries $w$ to $\mathsf{Token}(\mathsf{sk}^1_x, \cdot)$ should satisfy $w \notin \{W_0, W_1\}$.

A revocable, searchable ABE is said to be keyword-search secure if, for every polynomial-time adversary $A$, the quantity $|\Pr[\mathsf{Expt}^{\mathsf{keyword}}_A(\lambda) = 1] - 1/2|$ is negligible in $\lambda$.

The scheme we use. In this paper, we use the scheme in [59], which satisfies both revocation security and keyword-search security.

IV. SYSTEM ARCHITECTURE

The entire framework is based on the principles of edge computing [51]. It is divided into two sections. The first is the organizational boundary, comprising the Authentication Module and the Data Processing Module, as shown in Figure 1. Since the organization controls these two modules, they are treated as trusted entities. All users are authenticated inside the organizational perimeter, preserving their anonymity. The other section concerns an untrusted CSP. Before data is uploaded to the cloud, a rigorous encryption approach is applied within the organizational border to protect the data from privacy risks. An attacker may also compromise the CSP. In our system, we assume that a compromised CSP will behave in an honest-but-curious manner [42].

Our framework has a diverse set of users, authorities, and data owners from various medical fields. A single CSP stores the EHRs, the encrypted index file, and the users' secondary secret keys. The Authentication Module performs a thorough check on any request to the framework. Each user is granted access rights based on attributes as determined by the organization's policies. Patients have read access to all fields of their EHR.

Use cases. Our framework has multiple use cases: users may read, write, revoke an attribute, or search through encrypted EHRs. A user first asks for access to the EHR system. The Authentication Module reviews the request by checking the user attributes in the user graph against the ABAC rules defined according to the organization's policy. If the attributes satisfy the organization's rules, access is granted.

Whenever a user modifies an EHR, the framework uses the Data Processing Module to encrypt the updated details of the accessed fields. The Attribute Control Center in this module supplies the user attributes during the process. The Key Production Unit provides encryption keys for re-encryption. The EHR ontology hosted by the CSP is then updated with the ciphertexts. A similar operation is performed during a read request.

During the search process, the user enters the search keyword as a request. The Key Production Unit provides the keys used for searching. Using the search keyword and the secret keys, the Token Origination Unit creates a trapdoor. The trapdoor is then sent to the CSP, where it is compared to the encrypted indexes. The search operation retrieves encrypted EHRs if there is a match. The user may then choose any particular EHR to decrypt.

Attribute revocation is entirely handled in the Data Processing Module. The user submits the revoked attributes to the Attribute Center, which stores them and supplies them to the Cryptography Unit. The Key Production Unit provides the master key. The ciphertext and the secondary secret key held by the CSP are then updated to account for the changes.

In the following sections, we will go through each sub-module in detail.

A. Authentication Module

Any login request passes through a comprehensive check in this module. The key policy behind the module is ABAC. There are also several units within the module with critical functions. The user's login information is first checked against the database. If the check passes, the sub-modules begin to perform their functions.
Fig. 1. System Architecture
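
The search use case in Section IV (the user builds a trapdoor, the CSP compares it with the encrypted indexes, and matching encrypted EHRs are returned) can be sketched with a deliberately simplified stand-in. The code below is not the scheme of [59]: it replaces EncInd, Token, and Test with keyed HMAC tags under a single shared symmetric key, whereas the real scheme builds the index from the public key and splits the secret between user and cloud. All function names, record identifiers, and keywords are hypothetical.

```python
import hashlib
import hmac
import secrets

def enc_index(index_key, keywords):
    """Toy EncInd: one keyed tag per keyword, so the index hides the words."""
    return {hmac.new(index_key, w.encode(), hashlib.sha256).digest()
            for w in keywords}

def make_token(index_key, keyword):
    """Toy Token: a trapdoor that does not contain the keyword in the clear."""
    return hmac.new(index_key, keyword.encode(), hashlib.sha256).digest()

def match_token(index, token):
    """Toy Test, run by the CSP: does the token match any tag in the index?"""
    return any(hmac.compare_digest(token, tag) for tag in index)

def search(store, token):
    """CSP side: return the IDs of all records whose index matches the token."""
    return sorted(rid for rid, idx in store.items() if match_token(idx, token))

# Assumption: the key is generated inside the organizational boundary.
index_key = secrets.token_bytes(32)

# What the CSP stores: record ID -> encrypted keyword index (ciphertexts omitted).
cloud_store = {
    "ehr-001": enc_index(index_key, {"diabetes", "hypertension"}),
    "ehr-002": enc_index(index_key, {"asthma"}),
}

print(search(cloud_store, make_token(index_key, "asthma")))
```

The CSP only ever sees tags and tokens, never the keywords themselves, which is the property the trapdoor is meant to provide. Unlike the real scheme, this toy version is deterministic, so the CSP can tell when the same keyword is searched twice.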