DATA WAREHOUSE - Meeting 01


DATA WAREHOUSE

Dr. Dian Puspita Hapsari, S.Kom., M.Kom.


What is a database?
• A database is a collection of data or information.
• Databases are typically accessed electronically and are used to support Online Transaction Processing (OLTP).
• Database Management Systems (DBMS) store data in the database and enable users and applications to interact with the data.
• The term “database” is commonly used to refer to both the database itself and the DBMS.
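
To make OLTP concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in DBMS. The `accounts` table, its columns, and the `transfer()` helper are hypothetical, chosen only to show the typical OLTP pattern: small, frequent reads and writes wrapped in a transaction so that either every change is saved or none is.

```python
import sqlite3

# Hypothetical OLTP example: an in-memory "accounts" table (not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 500), (2, "Bob", 100)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move money between two accounts atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


print(transfer(conn, 1, 2, 200))   # True
print(transfer(conn, 2, 1, 1000))  # False: Bob only has 300
print(conn.execute("SELECT owner, balance FROM accounts ORDER BY id").fetchall())
# [('Alice', 300), ('Bob', 300)]
```

Because the second transfer would leave a negative balance, SQLite raises `sqlite3.IntegrityError` and the `with conn:` block rolls back, so both balances stay at 300.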

Many types of data can be stored in databases:
• Patient medical records
• Items in an online store
• Financial records
• Articles and blog entries
• Sports scores and statistics
• Online gaming information
• Student grades and scores
• IoT device readings
• Mobile application information

Applications of Databases
Sector | Usage
Banking | Customer information, account-related activities, payments, deposits, loans, credit cards, etc.
Airlines | Reservations and schedule information.
Universities | Student information, course registrations, colleges, and results.
Telecommunication | Call records, monthly bills, balance maintenance, etc.
Finance | Information related to stocks, sales, and purchases of stocks and bonds.
Sales & Production | Customer, product, and sales details.
Manufacturing | Supply chain data management, tracking the production of items, and inventory status.
HR Management | Details of employees' salaries, deductions, paycheck generation, etc.

Relational databases:
Oracle, MySQL, Microsoft SQL Server, and PostgreSQL

Document databases:
MongoDB and CouchDB

Key-value databases:
Redis and DynamoDB

Wide-column stores:
Cassandra and HBase

Graph databases:
Neo4j and Amazon Neptune

OLAP - Online Analytical Processing
Data warehouses and data lakes are meant to support Online Analytical Processing (OLAP).
OLAP systems typically collect data from a variety of sources. The data is then used to power a range of analytical use cases, from business intelligence and reporting (e.g., quarterly sales reports by store) to forecasting (e.g., predicting home sales for the next six months based on historical trends).
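
By contrast with OLTP, an OLAP workload scans many rows and aggregates them. The sketch below, again using Python's `sqlite3`, computes the "quarterly sales reports by store" example from the paragraph above with a single GROUP BY query. The `sales` table, the store names, and all figures are hypothetical.

```python
import sqlite3

# Hypothetical sales data; in a real warehouse it would be loaded from many sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Jakarta", "2024-01-15", 120.0),
        ("Jakarta", "2024-02-03", 80.0),
        ("Jakarta", "2024-04-20", 200.0),
        ("Bandung", "2024-03-30", 50.0),
        ("Bandung", "2024-05-11", 75.0),
    ],
)

# OLAP-style query: aggregate all rows by store, year, and quarter.
# Quarter = (month + 2) / 3 using integer division (months 1-3 -> 1, 4-6 -> 2, ...).
QUARTERLY_SALES = """
SELECT store,
       strftime('%Y', sale_date) AS year,
       (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
       SUM(amount) AS total_sales
FROM sales
GROUP BY store, year, quarter
ORDER BY store, year, quarter
"""

report = conn.execute(QUARTERLY_SALES).fetchall()
for row in report:
    print(row)
```

Each output row is one (store, year, quarter) group, which is the shape a BI report or dashboard would display.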

What is a data warehouse?
• A data warehouse is a system that stores highly structured information from various sources.
• Data warehouses typically store current and historical data from one or more systems.
• The goal of using a data warehouse is to combine disparate data sources in order to analyze the data, look for insights, and create business intelligence (BI) in the form of reports and dashboards.
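
Combining disparate data sources is commonly done with an ETL (extract, transform, load) process. The following is a minimal sketch under assumed inputs: two hypothetical source systems (a point-of-sale CSV export and an online-store JSON feed, inlined as strings) whose differently named fields are mapped onto one common structure and loaded into a warehouse table called `fact_sales`, which is also a made-up name. SQLite stands in for the warehouse; production warehouses provide their own loading tools.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical source systems with different formats and field names.
POS_CSV = "store_id,date,total_idr\nJKT01,2024-01-15,120000\nBDG01,2024-01-16,50000\n"
WEB_JSON = '[{"orderDate": "2024-01-15", "amount": 75000}]'


def extract():
    """Extract: read the raw records from each source."""
    pos_rows = list(csv.DictReader(io.StringIO(POS_CSV)))
    web_rows = json.loads(WEB_JSON)
    return pos_rows, web_rows


def transform(pos_rows, web_rows):
    """Transform: map both sources onto one (channel, date, amount) structure."""
    unified = [("store:" + r["store_id"], r["date"], int(r["total_idr"])) for r in pos_rows]
    unified += [("online", r["orderDate"], int(r["amount"])) for r in web_rows]
    return unified


def load(rows):
    """Load: insert the cleaned rows into a table with a fixed schema."""
    dw = sqlite3.connect(":memory:")
    dw.execute(
        "CREATE TABLE fact_sales ("
        "channel TEXT NOT NULL, sale_date TEXT NOT NULL, amount_idr INTEGER NOT NULL)"
    )
    dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    dw.commit()
    return dw


dw = load(transform(*extract()))
daily = dw.execute(
    "SELECT sale_date, SUM(amount_idr) FROM fact_sales GROUP BY sale_date ORDER BY sale_date"
).fetchall()
print(daily)  # [('2024-01-15', 195000), ('2024-01-16', 50000)]
```

Once both sources share one schema, a single query can analyze sales across channels, which is the point of integrating them in a warehouse.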

Data warehouse examples
➢Amazon Redshift.
➢Google BigQuery.
➢IBM Db2 Warehouse.
➢Microsoft Azure Synapse.
➢Oracle Autonomous Data Warehouse.
➢Snowflake.
➢Teradata Vantage.

Applications of Data Warehousing
Sector | Usage
Airline | Airline system management operations such as crew assignment, route analysis, and frequent flyer program discount schemes for passengers.
Banking | Managing the resources available on the desk effectively.
Healthcare | Strategizing and predicting outcomes and creating patient treatment reports. Combined with advanced machine learning and big data, data warehouse systems can help predict ailments.
Insurance | Analyzing data patterns and customer trends, and tracking market movements quickly.
Retail chain | Tracking items, identifying customers' buying patterns and promotions, and determining pricing policy.
Telecommunication | Product promotions, sales decisions, and distribution decisions.

What is a data lake?
A data lake is a repository of data from disparate sources that is stored in its original, raw format.

Like data warehouses, data lakes store large amounts of current and historical data.

What sets data lakes apart is their ability to store data in a variety of formats, including JSON, BSON, CSV, TSV, Avro, ORC, and Parquet.

Typically, the primary purpose of a data lake is to analyze the data to gain insights.
However, organizations sometimes use data lakes simply for their cheap storage, with the idea that the data may be used for analytics in the future.

With modern tools and technologies, a data lake can also form the storage layer of a database.

Tools like Starburst, Presto, Dremio, and Atlas Data Lake can give a database-like view into the data stored in your data lake.
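
The idea behind such a database-like view, applying structure only when the data is read (schema on read), can be imitated with the Python standard library. In this sketch the "lake" is just a hypothetical dictionary of raw file contents in two formats (JSON Lines and CSV), and the hypothetical `read_events()` function parses them at query time and projects every record onto the same `(user, action, price)` shape. Engines like Presto or Athena do this over object storage at far larger scale; nothing here reflects their actual APIs.

```python
import csv
import io
import json

# Hypothetical "lake": raw files kept exactly as they arrived, in different formats.
RAW_FILES = {
    "events/2024-01-15.jsonl": (
        '{"user": "u1", "action": "view"}\n'
        '{"user": "u2", "action": "buy", "price": 10.5}\n'
    ),
    "events/2024-01-16.csv": "user,action,price\nu3,buy,4.5\n",
}


def read_events(files):
    """Schema on read: parse each raw file only when queried and project
    every record onto the same (user, action, price) tuple."""
    for path, text in files.items():
        if path.endswith(".jsonl"):
            records = (json.loads(line) for line in text.splitlines() if line.strip())
        elif path.endswith(".csv"):
            records = csv.DictReader(io.StringIO(text))
        else:
            continue  # formats this sketch does not understand are skipped
        for rec in records:
            price = rec.get("price")  # missing fields become None
            yield rec["user"], rec["action"], float(price) if price not in (None, "") else None


# A query over the raw data: total revenue from "buy" events.
revenue = sum(price for _, action, price in read_events(RAW_FILES) if action == "buy")
print(revenue)  # 15.0
```

The raw files were never converted or validated on ingest; the structure exists only inside `read_events()`, so a different query could read the same files with a different schema.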

Technologies that provide flexible and scalable storage for building data lakes:
• AWS S3
• Azure Data Lake Storage Gen2
• Google Cloud Storage

Technologies that enable organizing and querying data in data lakes:
• MongoDB Atlas Data Lake.
• AWS Athena.
• Presto.
• Starburst.
• Databricks SQL Analytics.

Feature | Database | Data Lake | Data Warehouse
Workloads | Operational and transactional | Analytical | Analytical
Data Type | Structured or semi-structured | Structured, semi-structured, and/or unstructured | Structured and/or semi-structured
Schema Flexibility | Rigid or flexible schema, depending on database type | No schema definition required for ingest (schema on read) | Pre-defined and fixed schema definition for ingest (schema on write and read)
Data Freshness | Real time | May not be up-to-date, depending on the frequency of ETL processes | May not be up-to-date, depending on the frequency of ETL processes
Users | Application developers | Business analysts, application developers, and data scientists | Business analysts and data scientists
Pros | Fast queries for storing and updating data | Easy data storage simplifies ingesting raw data. A schema is applied afterwards to make working with the data easy for business analysts. Storage and compute are separate | The fixed schema makes working with the data easy for business analysts
Cons | May have limited analytics capabilities | Requires effort to organize and prepare data for use | Difficult to design and evolve the schema. Scaling compute may require unnecessary scaling of storage, because storage and compute are tightly coupled
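
The "Schema Flexibility" row can be illustrated with a small contrast. In the sketch below, which uses hypothetical `orders` records and SQLite as a stand-in warehouse, the warehouse-style table validates each record when it is written (schema on write), while the lake-style store keeps every raw record and leaves validation to whoever reads it (schema on read).

```python
import json
import sqlite3

# Hypothetical incoming records, two of them malformed.
incoming = [
    {"order_id": 1, "amount": 120},
    {"order_id": 2},                   # missing "amount"
    {"order_id": 3, "amount": "n/a"},  # wrong type
]

# Warehouse-style ingest (schema on write): constraints are checked at load time.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "amount INTEGER NOT NULL CHECK (typeof(amount) = 'integer'))"
)
rejected = []
for rec in incoming:
    try:
        with dw:  # each insert is its own transaction; a failure rolls it back
            dw.execute("INSERT INTO orders VALUES (?, ?)", (rec["order_id"], rec.get("amount")))
    except sqlite3.IntegrityError:
        rejected.append(rec["order_id"])

# Lake-style ingest (schema on read): every record is stored raw, nothing is rejected.
lake = [json.dumps(rec) for rec in incoming]

print(rejected)   # [2, 3]
print(len(lake))  # 3
```

The trade-off matches the table: the warehouse keeps only clean rows but needs its schema designed up front, while the lake accepts everything and pushes the cleanup effort to later.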

Disadvantages of Database
• The cost of the hardware and software needed to implement a database system is high, which can increase your organization's budget.
• Many DBMSs are complex systems, so users need training to use them.
• A DBMS is not designed for sophisticated analytical calculations.
• There may be compatibility issues with systems that are already in place.
• Data owners may lose control over their data, raising security, ownership, and privacy issues.

Disadvantages of Data Warehouse
• Adding new data sources takes time and is associated with high cost.
• Sometimes problems associated with the data warehouse may go undetected for many years.
• Data warehouses are high-maintenance systems. Extracting, loading, and cleaning data can be time-consuming.
• The data warehouse may look simple, but it is actually too complicated for average users. You need to provide training to end users, who may otherwise end up not using the data warehouse and its data mining capabilities.
