
International Journal of Recent Technology and Engineering (IJRTE)

ISSN: 2277-3878, Volume-7, Issue-5C, February 2019

An Overview of Data Management in Cloud Computing
K. Yogitha Lakshmi, S. Dhanalakshmi, B. G. Obula Reddy

Abstract: Cloud computing is familiar to most of us. It is not a brand-new technology but rather an emerging one, in which most of the industry is trying not only to store its crucial data redundantly but also to obtain service management. In that scenario, the first concern is managing data in the most efficient way possible. Here we showcase two cloud data management technologies, Cloud BigTable and Cloud Datastore, each of which has its own working environment. It is important to choose the right technology for the right kind of work.
Keywords: Cloud Storage, Data Management, Virtualization, Google File Systems, Data Store

1. INTRODUCTION
Cloud computing refers to the delivery of computing services such as networking, servers, databases, storage, virtualization, software and business analytics over the Internet as a utility, just like telephone/mobile services. It offers product innovation and flexible resources for businesses, such as pay-per-use services from the cloud. The advantages of cloud computing are:
Flexible resources: On-demand services let the user quickly scale resources up or down.
Metered service: The user is liable to pay only for what is used.
Self service: The user can access all the IT resources without any assistance.

A. Deployment Models of Cloud Computing
The deployment models of cloud computing are public cloud, private cloud, hybrid cloud and community cloud, each offering different services. Fig.1 shows the cloud architecture in its various models and services.
Public Cloud: This infrastructure is used by public cloud users, for whom some of the services will be unavailable. The resources are provided and organized by a cloud service provider, an academic institution or another organization. The cloud server exists on the premises of the cloud provider. E.g., Google App Engine, Windows Azure, etc.
Private Cloud: This infrastructure is for the exclusive use of a single organization with various services. It can be managed by the organization, a third party or sometimes both, and it exists on or off the premises. E.g., VMware, Red Hat, etc.
Community Cloud: It can be managed, operated and owned by one or more organizations in the community, a third party or a combination of both. It may exist on or off the premises. E.g., Salesforce Community Cloud, etc.
Hybrid Cloud: It is a combination of two or more types of infrastructure services that work under proprietary rules and standards. E.g., VMware vCloud, etc.

Fig.1. Cloud Architecture

2. THE ROLE OF VIRTUALIZATION IN CLOUD ENVIRONMENT
Virtualization is a multi-tenant infrastructure that is located at a remote site and can perform the functions of multiple systems in one physical system by means of high-speed internet. In a cloud environment it comes under IaaS (Infrastructure as a Service), where the cloud consumer gets ready-to-access, cloud-based virtual storage and also some built-in services. The pricing of these services depends on the data storage used (number of GB per hour), the network infrastructure used per hour, etc. Fig.2 represents the component stack of virtualization in its hypervisor and hardware parts.
The virtualization component stack consists of the hardware, operating system, middleware and application layers. The operating system layer is split into two parts:
a) Hypervisor, also called the virtual machine manager, which allows the user to run multiple operating systems on a single piece of hardware.
b) Guest OS, an operating system running within the virtual machine.
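The usage-based pricing mentioned in this section (storage billed per GB per hour, network use billed per hour) can be illustrated with a small calculation. This is only a sketch: the rates, the function name metered_cost and its parameters are hypothetical and are not taken from any provider's price list.

```python
def metered_cost(storage_gb, storage_hours, network_hours,
                 rate_per_gb_hour=0.0001, rate_per_network_hour=0.01):
    """Return the pay-per-use charge for one billing period.

    All rates are made-up illustrative values, not real provider prices.
    """
    storage_charge = storage_gb * storage_hours * rate_per_gb_hour
    network_charge = network_hours * rate_per_network_hour
    return storage_charge + network_charge


# Example: 500 GB kept for a 720-hour month, with 100 hours of network use.
monthly = metered_cost(storage_gb=500, storage_hours=720, network_hours=100)
print(f"Charge for the period: {monthly:.2f}")
```

Because the charge is a product of quantity and time, halving either the stored volume or the retention period halves the storage part of the bill, which is the practical meaning of a metered service.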
Revised Manuscript Received on February 22, 2019.
K. Yogitha Lakshmi, Assistant Professor, Department of IT, Malla Reddy Engineering College (A), Telangana, India
S. Dhanalakshmi, Professor, Department of CSE, Malla Reddy Engineering College (A), Telangana, India
B. G. Obula Reddy, Professor, Department of CSE, Malla Reddy Engineering College (A), Telangana, India

Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: E10160275C19/19©BEIESP
International Conference on Advances in Signal Processing, Power, Embedded, Soft Computing, Communication and
Control Systems (ICSPECS-2019) | 11th & 12th January 2019 | GPREC, Kurnool, A.P. India

Fig.2. Virtualization Component Stack

The three goals of a virtual machine are:
Equivalence: To possess unbiased hardware performance across the execution of all VMs (virtual machines).
Resource Control: The virtual machine manager should be in complete control of any virtualized resources.
Efficiency: The VM's instructions should be executed directly on the CPU rather than involving the hypervisor.

3. TYPES OF VIRTUALIZATION
There are mainly three types of virtualization: server virtualization, client virtualization and storage virtualization. Fig.3 shows the different categories of virtualization.
Server Virtualization: It is the most common type of virtualization in cloud computing. It gives optimum usage of a server by running multiple applications on multiple operating systems at the same time on a single server, with a hypervisor controlling the CPU, memory and other components without the need for source code.

Fig.3. Types of Virtualization

Client Virtualization: In client virtualization, the administrator can manage and control the operations of client machines such as personal devices. There are three types of client virtualization. The first is the remote level, where consumers can access a cloud server located remotely, anywhere and at any time, across a network. The second is the local level, which runs on a local server for the purpose of security. The third is application-level virtualization, which allows applications to run in an isolated or private environment that is accessible after authentication.
Storage Virtualization: In storage virtualization, a single storage device manages multiple network storage resources. It provides efficient storage management in large IT sectors and reduces downtime. The three types of storage virtualization are DAS (Direct Attached Storage), SAN (Storage Area Network) and NAS (Network Attached Storage). DAS is the primary way of storing data, where the storage drives are directly connected to the server. NAS follows a shared storage method that connects through the network and is used for file sharing, device sharing and scheduled/ad-hoc backup of the server. SAN is a technique of storing data in a device that is shared among different servers over a high-speed network.

4. DATA MANAGEMENT IN CLOUD COMPUTING
In cloud computing, data management is itself a big challenge: large quantities of data must be processed for data storage, parallel execution, analytical processing and online query execution, all while ensuring consistency and durability under peak loads. Some of the cloud-based analytical data management systems are BigTable, HBase, HyperTable, Hive and HadoopDB. PNUTS and Cassandra are web-based data management systems. This paper discusses the working of GFS, BigTable and Cloud Datastore.

A. GFS (Google File System)
GFS is designed to manage large files across distributed networks of servers connected by high-speed internet. It provides atomicity during read/write operations on individual files and supports simultaneous read, write and update operations by multiple client programs.

Fig.4. GFS Architecture

A single master controls the namespace. A large file is cut into chunks, or blocks, of 64 MB each. These chunks (GFS) or blocks (HDFS) are stored on servers called chunk servers (DataNodes in HDFS). The main function of these servers is to hold three replicas of each chunk on different physical racks and network segments.
Read Operations in GFS: a) The client program requests metadata by sending the full path and offset of a file to the MASTER (the NAMENODE in HDFS). b) The MASTER replies with the metadata of one of the replica chunks where the data is found.
Write/Append Operations in GFS: To initiate a write/append operation, the process is the same as for a read operation, along with some extra steps. a) The client sends its data to be
appended to all its chunk servers. b) The chunk servers acknowledge the receipt of this data. c) Among all the chunk server replicas, the MASTER chooses a PRIMARY chunk server, which is responsible for appending the client data to its secondary chunk servers.
Fault Tolerance in GFS: The MASTER stays synchronized with its replicas by exchanging regular heartbeat messages. In case of failure, the chunk server metadata is sent to the MASTER, which chooses a new primary chunk server.

5. MANAGING DATA IN CLOUD ENVIRONMENT
Data in cloud environments can be managed with the different storage techniques offered on Google platforms. This section describes the features of Google Cloud BigTable and Google Cloud Datastore.

A. Google Cloud BigTable
BigTable is a distributed storage system, developed by Google Inc. to manage its internet search and web service functions, that stores large amounts of data (on the order of petabytes) in a NoSQL column-oriented data store. It works on powerful database servers, which give the benefits of scalability, easy administration and cluster elasticity without any downtime.
BigTable is used to store and query the following types of data:
Time Series Data
Marketing Data
Financial Data
Internet of Things Data
Graph Data
Fig.5 represents the BigTable storage model in terms of rows and columns. Each column stores an arbitrary value as a name-value pair within a column family. The initial number of column families is fixed at the time of table creation, while labels within column families can be created at any point in time. Each BigTable cell can contain multiple versions of data, in decreasing order of timestamp.

Fig.5. BigTable Storage Model

Each table in BigTable is divided into different row ranges called tablets. These tablets are maintained by a server called a tablet server, which stores each column family for an allocated row range inside a distributed file called an SSTable. BigTable maintains its metadata table in a single metadata server, which is used to locate user tablets in response to read/write operations. The metadata table itself is divided into a number of tablets to handle its large amount of data effectively, and a root table points to the other metadata tablets. BigTable supports large parallel read and insert operations simultaneously on the same table.

B. Google Cloud Datastore
Google Cloud Datastore is a NoSQL document database developed for high scalability, high performance and support for application development. Its most appreciated feature is that it provides high performance to its subscribers even under heavy incoming data traffic. It maintains ACID properties and also gives high availability.
Cloud Datastore is used for applications like:
Product Catalog, where it provides real-time inventory.
User Profiles, where the retailer can view the preferences of the user based on past interests.
Bank Transactions, where the ACID properties guarantee the transaction of transferred funds.
All the data in Datastore is stored in one BigTable called the Entity Table. It distributes data horizontally across its disks, which is called sharding, and key values are sorted lexicographically.
It can handle multiple queries at a time from various users with the help of multiple index tables. For every data set there are entity sets from which the user gets the results back. For example, a query has a defined set of results, say 100 entities; because of this, some queries are not supported in Cloud Datastore. Unlike a traditional RDBMS, Cloud Datastore does not enforce a schema; it is a schemaless database. Cloud Datastore does not support join operations, and it cannot filter data from a table on multiple keyed properties or by the result of a subquery. Cloud Datastore is not well suited to data analysis, but it provides assurance for transactional data.

6. CONCLUSION
In a cloud computing environment, without the virtualization technique it would not be possible to share a single hardware device among users. It is the basic service underlying any development in cloud computing. Data management in cloud computing shows the rapid growth of deployment on remote servers for the purpose of storage and cloud services. Cloud BigTable is mainly used for non-transactional data, where it does not give any redundancy for the data. It can be used for data analytics, where results are obtained by querying historical data. Cloud Datastore is built on BigTable, but the two are completely different from each other: Datastore supports the ACID properties of a transaction and is used for transactional data. Its features are similar to SQL, but it cannot perform some operations.
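The GFS read path described in Section 4.A (the client sends a file path and offset to the master, which answers with the location of a chunk replica) can be sketched as follows. This is a toy in-memory model, not Google's implementation: the Master class, its method names and the server names are hypothetical, round-robin placement stands in for rack-aware placement, and only the 64 MB chunk size and the three replicas come from the text.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as stated in the text
REPLICAS = 3                   # each chunk is kept on three chunk servers


class Master:
    """Toy master: holds the namespace and chunk locations only (no file data)."""

    def __init__(self, chunk_servers):
        self.chunk_servers = chunk_servers
        self.chunks = {}  # (path, chunk_index) -> list of chunk server names

    def create(self, path, size):
        """Register a file and place each of its chunks on REPLICAS servers."""
        n_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division
        for index in range(n_chunks):
            # Hypothetical round-robin placement; real placement is rack-aware.
            self.chunks[(path, index)] = [
                self.chunk_servers[(index + r) % len(self.chunk_servers)]
                for r in range(REPLICAS)
            ]
        return n_chunks

    def lookup(self, path, offset):
        """Read step: map (path, byte offset) to a chunk index and its replicas."""
        index = offset // CHUNK_SIZE
        return index, self.chunks[(path, index)]


master = Master(["cs-a", "cs-b", "cs-c", "cs-d"])
n_chunks = master.create("/logs/web.log", size=200 * 1024 * 1024)  # 200 MB file
index, replicas = master.lookup("/logs/web.log", offset=130 * 1024 * 1024)
print(n_chunks, index, replicas)
```

In this model the master answers with metadata only; the client would then contact one of the returned chunk servers for the actual bytes, which keeps the single master off the data path.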

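The BigTable storage model described in Section 5.A (values grouped into column families, multiple timestamped versions per cell, and row ranges split into tablets) can be sketched with plain Python dictionaries. This is an illustrative model only: ToyTable, its methods and the sample keys are hypothetical and do not correspond to the Cloud BigTable client API.

```python
import bisect


class ToyTable:
    """In-memory sketch of a BigTable-style table (not the real client API)."""

    def __init__(self, column_families):
        self.column_families = set(column_families)  # fixed at table creation
        self.rows = {}  # row_key -> {(family, qualifier): [(timestamp, value), ...]}

    def put(self, row_key, family, qualifier, value, timestamp):
        """Add a new version of a cell; qualifiers may be created at any time."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row_key, {}).setdefault((family, qualifier), [])
        cell.append((timestamp, value))
        cell.sort(key=lambda version: version[0], reverse=True)  # newest first

    def get(self, row_key, family, qualifier):
        """Return all versions of a cell in decreasing timestamp order."""
        return self.rows.get(row_key, {}).get((family, qualifier), [])

    def tablet_for(self, row_key, split_keys):
        """Return the index of the row range (tablet) that holds row_key.

        split_keys is a sorted list of boundary keys, compared lexicographically.
        """
        return bisect.bisect_right(split_keys, row_key)


table = ToyTable(column_families=["metrics"])
table.put("sensor#001", "metrics", "temp", 21.5, timestamp=100)
table.put("sensor#001", "metrics", "temp", 22.0, timestamp=200)
versions = table.get("sensor#001", "metrics", "temp")
tablet = table.tablet_for("sensor#500", split_keys=["sensor#300", "sensor#600"])
print(versions, tablet)
```

Keeping the newest version first means a reader that wants only the current value can take the first entry of the list, while historical queries can walk the rest.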