Session 1: Introduction to Big Data


SESSION #1: INTRODUCTION

BIG DATA
OUTLINE
• Fundamentals of Big Data
• Big Data Types
• Big Data Technology Components
• Virtualization and How It Supports Distributed Computing
• The Cloud and Big Data
Fundamentals of Big Data
BIG Data Word Cloud
Waves of Data Management

Wave 1: Creating manageable data structures
Wave 2: Web and content management
Wave 3: Managing big data
Defining Big Data
Big Data Management Architecture
Big Data Performance Considerations
MapReduce: MapReduce was designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode.

Big Table: Big Table is a sparse, distributed, persistent multidimensional sorted map. It is intended to store huge volumes of data across commodity servers.

Hadoop: Hadoop is designed to parallelize data processing across computing nodes to speed computations and hide latency. Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware.
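The MapReduce pattern described above can be sketched in plain Python. This is a toy, single-machine illustration of the map and reduce phases (a real Hadoop job distributes both phases across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data is big", "hadoop processes big data"]
print(reduce_phase(map_phase(docs)))
# -> {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```

The key property is that the map calls are independent of each other, which is what lets Hadoop run them on separate commodity nodes in parallel.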
Hadoop at a Glance
Traditional Data Analytics and Its Evolution
• Reporting and visualization
• Analytical data warehouses and data marts
• Big data analytics
• Big data applications
Big Data Types
Data Sources of Big Data
Structured Big Data
Computer / Machine Generated:
• Sensor data
• Web log data
• Point-of-sale data
• Financial data

Human Generated:
• Input data
• Click-stream data
• Gaming-related data
Unstructured Big Data
Computer / Machine Generated:
• Satellite images
• Scientific data
• Photographs and video
• Radar or sonar data

Human Generated:
• Text internal to your company
• Social media data
• Mobile data
• Website content


ECM/CMS in Big Data Management
Real-Time & Non-Real-Time Requirements in Big Data
Big Data Technology
Components
Infrastructure Component & Ecosystem Requirements for Big Data
Big Data Stack Architecture
Layer 0: Redundant Physical Infrastructure
• Performance: How responsive do you need the system to be? Performance, also called latency, is often measured end to end, based on a single transaction or query request. Very fast (high-performance, low-latency) infrastructures tend to be very expensive.
• Availability: Do you need a 100 percent uptime guarantee of service? How long can your business wait in the case of a service interruption or failure? Highly available infrastructures are also very expensive.
• Scalability: How big does your infrastructure need to be? How much disk space is needed today and in the future? How much computing power do you need? Typically, you need to decide what you need and then add a little more scale for unexpected challenges.
• Flexibility: How quickly can you add more resources to the infrastructure? How quickly can your infrastructure recover from failures? The most flexible infrastructures can be costly, but you can control the costs with cloud services, where you only pay for what you actually use.
• Cost: What can you afford? Because the infrastructure is a set of components, you might be able to buy the "best" networking and decide to save money on storage (or vice versa). You need to establish requirements for each of these areas in the context of an overall budget and then make trade-offs where necessary.
Layer 1: Security Infrastructure
Data access: User access to raw or computed big data has about the same level of technical requirements as non-big-data implementations. The data should be available only to those who have a legitimate business need for examining or interacting with it. Most core data storage platforms have rigorous security schemes and are often augmented with a federated identity capability, providing appropriate access across the many layers of the architecture.

Application access: Application access to data is also relatively straightforward from a technical perspective. Most application programming interfaces (APIs) offer protection from unauthorized usage or access. This level of protection is probably adequate for most big data implementations.

Data encryption: Data encryption is the most challenging aspect of security in a big data environment. In traditional environments, encrypting and decrypting data really stresses the systems' resources. With the volume, velocity, and varieties associated with big data, this problem is exacerbated. The simplest (brute-force) approach is to provide more and faster computational capability. However, this comes with a steep price tag, especially when you have to accommodate resiliency requirements. A more temperate approach is to identify the data elements requiring this level of security and to encrypt only the necessary items.

Threat detection: The inclusion of mobile devices and social networks exponentially increases both the amount of data and the opportunities for security threats. It is therefore important that organizations take a multiperimeter approach to security.
Layer 2: Operational Databases
✓ Atomicity: A transaction is “all or nothing” when it is atomic. If any part of the transaction or
the underlying system fails, the entire transaction fails.

✓ Consistency: Only transactions with valid data will be performed on the database. If the data
is corrupt or improper, the transaction will not complete and the data will not be written to the
database.
✓ Isolation: Multiple, simultaneous transactions will not interfere with each other. All valid
transactions will execute until completed and in the order they were submitted for processing.

✓ Durability: After the data from the transaction is written to the database, it stays there
“forever.”
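The ACID properties above can be observed with any transactional database. A minimal sketch using Python's built-in sqlite3, showing atomicity and consistency: a transfer that would violate a constraint is rolled back as a whole, so neither of its updates survives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  name TEXT PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # one atomic transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
        # This update would drive alice's balance negative, violating the
        # CHECK constraint (consistency), so the WHOLE transfer fails.
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # atomicity: bob's +200 was rolled back along with the failure

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # -> {'alice': 100, 'bob': 50}
```

Bob's credit was already applied when alice's debit failed, yet the final balances are unchanged: the transaction was "all or nothing".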
Layer 3: Organizing Data Services and Tools
• A distributed file system: Necessary to accommodate the decomposition of data streams and to provide scale and storage capacity
• Serialization services: Necessary for persistent data storage and multilanguage remote procedure calls (RPCs)
• Coordination services: Necessary for building distributed applications
• Extract, transform, and load (ETL) tools: Necessary for the loading and conversion of structured and unstructured data
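The extract-transform-load flow named above reduces to three small steps. A minimal sketch, using CSV text as the source and an in-memory list as a stand-in for the warehouse target (the field names and the fixed exchange rate are demo assumptions):

```python
import csv
import io

raw = """user,amount,currency
ana,10,USD
budi,250000,IDR
"""

IDR_PER_USD = 15000  # assumption: fixed demo rate, not a real quote

def extract(text):
    # Extract: parse the raw source into rows.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: normalize every amount to USD so downstream
    # analytics compare like with like.
    out = []
    for r in rows:
        amount = float(r["amount"])
        if r["currency"] == "IDR":
            amount /= IDR_PER_USD
        out.append({"user": r["user"], "amount_usd": round(amount, 2)})
    return out

def load(rows, target):
    # Load: in a real pipeline this would be a bulk insert into a warehouse.
    target.extend(rows)

warehouse = []
load(transform(extract(raw)), warehouse)
print(warehouse)
# -> [{'user': 'ana', 'amount_usd': 10.0}, {'user': 'budi', 'amount_usd': 16.67}]
```

Real ETL tools add scheduling, error handling, and connectors, but the extract/transform/load decomposition is the same.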
Layer 4: Analytical Data Warehouses

[Diagram: big data analytic applications, big data analytic tools, reporting & dashboards, analytics and advanced analytics, big data applications]
Virtualization and How It
Supports Distributed Computing
Distributed Computing
• Distributed computing concerns hardware and software systems with more than one processing or storage element, concurrent processes, or multiple programs running under tight control.
• In distributed computing, a program is split into parts that run simultaneously on many computers communicating over a network.
• Distributed computing is a form of parallel computing, but "parallel computing" is most commonly used to describe parts of a program running simultaneously on multiple processors in the same computer.
• Both types of processing (distributed and parallel computing) require dividing a program into parts that can run concurrently, but distributed programs must often also cope with heterogeneous environments, network links of varying latency, and unpredictable failures of the network or of the computers.
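The "split a program into parts that run concurrently" idea can be sketched with Python's multiprocessing on a single machine. This illustrates the parallel-computing case from the text; a distributed system would ship the chunks to workers over a network instead of to local processes:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker processes its own slice of the data independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(100_000))
    n_workers = 4
    # Split the input into one chunk per worker (striding keeps sizes even).
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)  # parts run concurrently
    total = sum(partials)  # combine the partial results
    assert total == sum(x * x for x in data)
    print(total)
```

The combine step is trivial here because the partial sums are independent; the heterogeneity and failure handling the text mentions are exactly what a distributed version would have to add.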
How Distributed Computing Works

www.cloudcomputingchina.com
Virtualization Basics
Benefits of Virtualization
✓ Virtualization of physical resources (such as servers, storage, and networks) enables substantial improvement in the utilization of these resources.
✓ Virtualization enables improved control over the usage and performance of your IT resources.
✓ Virtualization can provide a level of automation and standardization to optimize your computing environment.
✓ Virtualization provides a foundation for cloud computing.
Virtualization for Big Data Environments

Partitioning: In virtualization, many applications and operating systems are supported in a single physical system by partitioning (separating) the available resources.

Isolation: Each virtual machine is isolated from its host physical system and other virtualized machines. Because of this isolation, if one virtual instance crashes, the other virtual machines and the host system aren't affected. In addition, data isn't shared between one virtual instance and another.

Encapsulation: A virtual machine can be represented (and even stored) as a single file, so you can identify it easily based on the services it provides. For example, the file containing the encapsulated process could be a complete business service. This encapsulated virtual machine could be presented to an application as a complete entity. Thus, encapsulation could protect each application so that it doesn't interfere with another application.
SERVER VIRTUALIZATION
APPLICATION VIRTUALIZATION
SYSTEM VIRTUALIZATION
System Virtualization:
1. The administrator builds the disk image from existing system disks.
2. The system disk images can then be stored on the server.
3. When users turn on their computers, the server streams the system disk images.
4. As the streaming begins, the users' diskless computers boot up as if the system disks were physically attached.
APPLICATION VIRTUALIZATION
Application Virtualization from Servers:
1. The software pack is made by the pack builder.
2. The administrator uploads software packs to the server.
3. When a user launches the application, the server streams it to the user in real time.
4. As the streaming process begins, the application is virtualized as if it were installed on the local machine.
APPLICATION VIRTUALIZATION
Application Virtualization from a Portable Device:
1. Store the software pack on the device storage.
2. When the user launches the application, the application gets streamed.
3. As the streaming process begins, the user can use the software as if it were locally installed.
STORAGE VIRTUALIZATION
Flexible Virtual Storage Virtualization:
1. Existing small disks can be aggregated into a massive network-attached storage (NAS).
2. The aggregated disk is controlled and monitored by the storage server in real time.
3. The storage can be dynamically attached to any device on the network.
4. The disk is now virtualized, and so can be used as if it were physically attached to the device.
DESKTOP VIRTUALIZATION
Desktop Virtualization with Virtual Machines:
1. Virtual machines (VMs) are built on the server by the administrator.
2. The virtual machines are managed and monitored on the server.
3. When users access the VMs, the desktop environments of the VMs are provided.
4. The VM desktops on the server are virtualized, and the users get separate, network-isolated computing environments.
NETWORK VIRTUALIZATION
CHALLENGES IN VIRTUALIZATION
CLOUD COMPUTING
The National Institute of Standards and Technology (NIST), part of the US Department of Commerce, has published standard recommendations on various aspects of cloud computing to serve as a reference.
The 5 Key Characteristics of Cloud Computing

• The cloud computing paradigm is still evolving and remains a subject of debate among academics, IT vendors, and government/business.
• According to NIST, there are 5 criteria a system must meet to be counted in the cloud family: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
Cloud Computing Architecture

The physical hardware layer is virtualized to provide a flexible platform and to improve resource utilization. The key to the new enterprise data center is combining the virtualization layer and the management layer so that the data center can be managed efficiently and services can be deployed and configured quickly.
What Does Cloud Computing Mean for Service Providers?

• Provision services quickly
• Reduce server scale
• Increase resource utilization
• Improve management efficiency
• Lower maintenance costs
• Locate infrastructure in areas with low building and electricity costs
• Provide business continuity services
• Improve operational management efficiency
• Improve service levels
• A more complex architecture
• Changed business models and trust levels
What Does Cloud Computing Mean for Users?
• Reduced or lower client workload
• Lower total cost of ownership (TCO)
• Separation of infrastructure-maintenance duties from domain-specific application development
• Separation of application code from physical resources
• No need to buy assets for one-off or infrequent computing jobs
• Grow resources on demand
• Make applications highly available
• Deploy applications quickly
• Pay for what you use (pay per use)
The 5 Key Characteristics of Cloud Computing: On-Demand Self-Service

• Users can order and manage services without human interaction with the service provider, for example via a web portal and management interface.
• Provisioning of the services and the related resources happens automatically at the provider.
The 5 Key Characteristics of Cloud Computing: Broad Network Access

• The available services are reachable over broadband networks, in particular adequately accessible over the internet, whether using a thin client, a thick client, or other media such as a smartphone.
The 5 Key Characteristics of Cloud Computing: Resource Pooling

• The cloud provider delivers services through resources pooled at one or more data center locations, consisting of many servers, using a multi-tenant mechanism.
• The multi-tenant mechanism allows these computing resources to be shared by many users, where the resources, whether physical or virtual, can be allocated dynamically to user/customer needs on demand.
• Customers therefore do not need to know how or from where their requests for computing resources are fulfilled; what matters is that every request is met. These computing resources include storage, memory, processors, network bandwidth, and virtual machines.
The 5 Key Characteristics of Cloud Computing: Rapid Elasticity

• The computing capacity provided can be provisioned elastically and quickly, whether as additions to or reductions of the required capacity.
• To customers, this makes the available capacity appear unlimited, able to be "bought" at any time in any quantity.
5 Karakteristik Utama Cloud Computing
• Sumberdaya cloud yang On Demand
tersedia harus dapat diatur Self Service
dan dioptimasi
penggunaannya, dengan suatu
sistem pengukuran yang dapat
Measured Broad
mengukur penggunaan dari
setiap sumberdaya komputasi
services
5 Karakteristik network
access
yang digunakan
(penyimpanan, memory,
Cloud
processor, lebar pita, aktivitas Computing
user, dan lainnya).
• Dengan demikian, jumlah
sumberdaya yang digunakan Rapid Resources
dapat secara transparan elasticity Pooling
diukur yang akan menjadi
dasar bagi user untuk
membayar biaya penggunaan
layanan.
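Measured service reduces to metering usage and multiplying by unit prices. A toy sketch of pay-per-use billing; the resource names and rates are hypothetical (real providers publish their own rate cards):

```python
# Hypothetical unit prices for the demo; not any real provider's rates.
RATES = {
    "storage_gb_month": 0.02,  # per GB-month stored
    "cpu_hours": 0.05,         # per CPU-hour consumed
    "network_gb": 0.09,        # per GB transferred out
}

def monthly_bill(usage: dict) -> float:
    # Pay-per-use: the metered quantity of each resource times its unit price.
    return round(sum(RATES[res] * qty for res, qty in usage.items()), 2)

usage = {"storage_gb_month": 500, "cpu_hours": 720, "network_gb": 100}
print(monthly_bill(usage))  # 500*0.02 + 720*0.05 + 100*0.09 -> 55.0
```

The transparency the slide describes is exactly this: both sides can recompute the bill from the metered quantities.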
The 4 Cloud Computing Infrastructure Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Cloud

Private Cloud
• The cloud service infrastructure is operated solely for a particular organization/company
• Its customers are usually large-scale organizations
• The infrastructure can be managed by the organization itself or by a third party
• The location can be on-site or off-site
The 4 Implementations of the Cloud
• Private Cloud
• Private Clouds are normal data centers within an enterprise with all four attributes of the Cloud: elasticity, self-service, pay-by-use, and programmability
• By setting up a Private Cloud, enterprises can consolidate their IT infrastructure
• They will need fewer IT staff to manage the data center
• Power bills are reduced because of lower electricity consumption and smaller cooling-equipment needs
Private Cloud
Community Cloud
• In this model, a cloud infrastructure is shared by several organizations that have common interests, for example in mission, required security level, and so on.
• A community cloud is thus a "limited extension" of the private cloud. And, as with the private cloud, the cloud infrastructure can be managed by one of those organizations or by a third party.
The 4 Implementations of the Cloud
• Community Cloud
• A Community Cloud is implemented when a set of businesses have a similar requirement and share the same context.
• For example, the Federal government in the US may decide to set up a government-specific Community Cloud that can be leveraged by all the states
• Through this, individual local bodies like state governments are freed from investing in, maintaining, and managing their local data centers
• So, a Community Cloud is a sort of Private Cloud that goes beyond just one organization
Community Cloud
Public Cloud
• A type of cloud service provided for the general public or for groups of companies
• The service is provided by a company that sells cloud services
The 4 Implementations of the Cloud
• Public Cloud
• It needs a huge investment, and only well-established companies with deep pockets like Microsoft, Amazon, and Google can afford to set one up.
• A Public Cloud is implemented on thousands of servers running across hundreds of data centers deployed across tens of locations around the world
• Customers can choose a location for their applications to be deployed
Public Cloud
Hybrid Cloud
• A composition of two or more cloud infrastructures (private, community, or public).
• Although the member clouds remain independent entities, they are connected by a technology/mechanism that enables data and application portability between those clouds, for example an inter-cloud load-balancing mechanism, so that resource allocation can be kept at an optimal level.
• According to NIST, the definition and boundaries of cloud computing are still settling into shape and standards, so in the end the market will determine which models survive.
• Nevertheless, everyone agrees that cloud computing will be the future of the computing world. Even the prestigious research firm Gartner Group has stated that cloud computing is a development that no stakeholder in the IT world can afford to ignore.
The 4 Implementations of the Cloud
• Hybrid Cloud
• A combination of Private Cloud and Public Cloud
• Security plays a critical role in connecting the Private Cloud to the Public Cloud
• Amazon Web Services has recently announced Virtual Private Cloud (VPC), which securely bridges a Private Cloud and Amazon Web Services
• Microsoft's recent Windows AppFabric brings the concept of Hybrid Cloud to Microsoft's future customers
Hybrid Cloud
Benefits of Cloud Computing

• Cloud computing removes the silos of the traditional data center
• Cloud architecture has the scalability, flexibility, and transparency that allow new IT services to be provisioned quickly and cost-effectively, using service level agreements (SLAs) that cover IT requirements and policy, meet demands for high utilization, respond dynamically to change, and meet high security and performance levels
• Cloud computing benefits the enterprise through:
  • Reduced cost
  • Flexibility
  • Improved automation
  • Sustainability
  • Focus on core competency
Providers in the Big Data Cloud Market -
Amazon
✓ Amazon Elastic MapReduce: Targeted for processing huge volumes of data. Elastic MapReduce utilizes a hosted Hadoop framework running on EC2 and Amazon Simple Storage Service (Amazon S3). Users can now run HBase (a distributed, column-oriented data store).
✓ Amazon DynamoDB: A fully managed "not only SQL" (NoSQL) database service. DynamoDB is a fault-tolerant, highly available data storage service offering self-provisioning, transparent scalability, and simple administration. It is implemented on SSDs (solid state disks) for greater reliability and high performance.
✓ Amazon Simple Storage Service (S3): A web-scale service designed to store any amount of data. The strength of its design center is performance and scalability, so it is not as feature laden as other data stores. Data is stored in "buckets" and you can select one or more global regions for physical storage to address latency or regulatory needs.
✓ Amazon High Performance Computing: Tuned for specialized tasks, this service provides low-latency, high-performance computing clusters. Most often used by scientists and academics, HPC is entering the mainstream because of the offerings of Amazon and other HPC providers. Amazon HPC clusters are purpose built for specific workloads and can be reconfigured easily for new tasks.
✓ Amazon RedShift: Available in limited preview, RedShift is a petabyte-scale data warehousing service built on a scalable MPP architecture. Managed by Amazon, it offers a secure, reliable alternative to in-house data warehouses and is compatible with several popular business intelligence tools.
Providers in the Big Data Cloud Market -
Google big data services
Google Compute Engine: A cloud-based capability for virtual machine computing, Google Compute Engine offers a secure, flexible computing environment from energy-efficient data centers. Google also offers workload management solutions from several technology partners who have optimized their products for Google Compute Engine.
Google Big Query: Allows you to run SQL-like queries at high speed against large data sets of potentially billions of rows. Although it is good for querying data, the data cannot be modified once loaded. Consider Google Big Query a sort of Online Analytical Processing (OLAP) system for big data. It is good for ad hoc reporting or exploratory analysis.
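The kind of SQL-like, append-only aggregation described above can be simulated locally with Python's sqlite3. This is not the Big Query API, just the same shape of query on a toy table (Big Query runs it over billions of rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, ts INTEGER)")
# Append-only event data: rows are added, never updated.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("u1", "click", 1), ("u1", "buy", 2),
    ("u2", "click", 3), ("u2", "click", 4),
])

# Ad hoc, exploratory aggregation: scan the events and group them.
rows = conn.execute("""
    SELECT action, COUNT(*) AS n
    FROM events
    GROUP BY action
    ORDER BY n DESC
""").fetchall()
print(rows)  # -> [('click', 3), ('buy', 1)]
```

This is the OLAP pattern the text names: read-heavy scans and group-bys over immutable data, rather than transactional updates.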
Google Prediction API: A cloud-based machine learning tool for vast amounts of data, Prediction is capable of identifying patterns in data and then remembering them. It can learn more about a pattern each time it is used. The patterns can be analyzed for a variety of purposes, including fraud detection, churn analysis, and customer sentiment.
Where to be careful when using cloud
services

• Data integrity: You need to make sure that your provider has the right controls in place to ensure that the integrity of your data is maintained.
• Compliance: Make sure that your provider can comply with any compliance issues particular to your company or industry.
• Costs: Little costs can add up. Be careful to read the fine print of any contract, and make sure that you know what you want to do in the cloud.
• Data transport: Be sure to figure out how you get your data into the cloud in the first place. For example, some providers will let you mail it to them on media. Others insist on uploading it over the network. This can get expensive, so be careful.
• Performance: Because you're interested in getting performance from your service provider, make sure that explicit definitions of service-level agreements exist for availability, support, and performance. For example, your provider may tell you that you will be able to access your data 99.999 percent of the time; however, read the contract. Does this uptime include scheduled maintenance?
• Data access: What controls are in place to make sure that you and only you can access your data? In other words, what forms of secure access control are in place? This might include identity management, where the primary goal is protecting personal identity information so that access to computer resources, applications, data, and services is controlled properly.
• Location: Where will your data be located? In some companies and countries, regulatory issues prevent data from being stored or processed on machines in a different country.
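The 99.999 percent figure in the performance caution is easy to sanity-check: an SLA percentage converts directly into the downtime the provider is allowed per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def allowed_downtime_minutes(sla_percent: float) -> float:
    # The fraction of the year the service may be down under the SLA.
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)

for sla in (99.0, 99.9, 99.99, 99.999):
    print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} min/year of downtime")
```

"Five nines" works out to roughly 5.3 minutes of downtime per year, which is why the question of whether scheduled maintenance counts against the SLA matters so much.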
