Unit II: Big Data Architecture


Drivers for Big Data:

Big Data has quickly become one of the most sought-after
topics in the industry.
The main business drivers behind the rising demand for Big Data
Analytics are:
1. The digitization of society
2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of the Internet of Things (IoT)
Example: Companies that put Big Data at the core of their
strategy, such as Apple, Amazon, Facebook and Netflix, became
very successful in the early 21st century.

Big Data Architecture :


Big data architecture is designed to handle the ingestion,
processing, and analysis of data that is too large or complex for
traditional database systems.
A big data architecture typically includes the following components:
Data sources: All big data solutions start with one or more
data sources.
Examples:
o Application data stores, such as relational databases.
o Static files produced by applications, such as web server log
files.
o Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is stored in
a distributed file store that can hold high volumes of large files
in various formats. This kind of store is often called a data lake.
Examples: Azure Data Lake Store or blob containers in Azure
Storage.
Batch processing: Because the data sets are so large, a big data
solution often must process data files using long-running
batch jobs to filter, aggregate, and prepare the data for analysis.
Real-time message ingestion: If a solution includes real-time
sources, the architecture must include a way to capture and
store real-time messages for stream processing.
Stream processing: After capturing real-time messages, the
solution must process them by filtering, aggregating, and
preparing the data for analysis. The processed stream data is
then written to an output sink. We can use open-source Apache
streaming technologies like Storm and Spark Streaming for this.
Analytical data store: Many big data solutions prepare data for
analysis and then serve the processed data in a structured
format that can be queried using analytical tools. Example:
Azure Synapse Analytics provides a managed service for large-
scale, cloud-based data warehousing.
Analysis and reporting: The goal of most big data solutions is
to provide insights into the data through analysis and reporting.
To empower users to analyze the data, the architecture may
include a data modelling layer. Analysis and reporting can also
take the form of interactive data exploration by data scientists
or data analysts.
Orchestration: Most big data solutions consist of repeated
data processing operations that transform source data, move
data between multiple sources and sinks, load the processed
data into an analytical data store, or push the results straight to
a report. To automate these workflows, we can use an
orchestration technology such as Azure Data Factory.
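
The batch processing and orchestration components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Azure Data Factory works: the task names (extract, filter_ok, aggregate, load), the sample log records, and the in-memory store dictionary are all hypothetical, and the standard-library graphlib module (Python 3.9 or later) stands in for a real orchestrator.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical web-server log records standing in for a large source file.
RAW_LOGS = [
    {"path": "/home", "status": 200},
    {"path": "/cart", "status": 500},
    {"path": "/home", "status": 200},
    {"path": "/cart", "status": 200},
    {"path": "/pay", "status": 500},
]

store = {}  # in-memory stand-in for the data lake and analytical data store


def extract():
    # Copy the source records into the "raw" area of the store.
    store["raw"] = list(RAW_LOGS)


def filter_ok():
    # Filter step: keep only successful requests.
    store["ok"] = [r for r in store["raw"] if r["status"] == 200]


def aggregate():
    # Aggregate step: count successful hits per path.
    store["hits"] = Counter(r["path"] for r in store["ok"])


def load():
    # Load step: publish the result in a query-friendly shape.
    store["report"] = dict(store["hits"])


TASKS = {
    "extract": extract,
    "filter_ok": filter_ok,
    "aggregate": aggregate,
    "load": load,
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "filter_ok": {"extract"},
    "aggregate": {"filter_ok"},
    "load": {"aggregate"},
}


def run_pipeline():
    """Run every task in dependency order and return that order."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name]()
    return order


if __name__ == "__main__":
    print(run_pipeline())
    print(store["report"])
```

Declaring the dependencies as data, separate from the task bodies, is the main idea an orchestrator adds: steps can be reordered, retried, or scheduled without touching the processing code.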
Big Data Technology Components :

1. Ingestion :
The ingestion layer is the very first step: pulling in raw data.
The data comes from internal sources, relational databases, non-
relational databases, social media, emails, phone calls etc.
There are two kinds of ingestion:
Batch, in which large groups of data are gathered and delivered
together.
Streaming, in which data arrives as a continuous flow. This is
necessary for real-time data analytics.
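
The difference between the two kinds of ingestion can be illustrated with a short Python sketch. The function names batch_ingest and stream_ingest and the sensor readings are hypothetical, and a plain Python list stands in for a real source; a production system would read from files, queues, or devices instead.

```python
from itertools import islice


def batch_ingest(records, batch_size):
    """Batch ingestion: yield lists of up to batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_ingest(records):
    """Streaming ingestion: yield each record with a running mean."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        # The result is available as soon as the record arrives.
        yield value, total / count


if __name__ == "__main__":
    readings = [20.0, 22.0, 21.0, 23.0, 24.0]  # hypothetical sensor readings
    for batch in batch_ingest(readings, batch_size=2):
        print("batch:", batch)
    for value, running_mean in stream_ingest(readings):
        print("event:", value, "running mean:", running_mean)
```

In the batch case nothing is delivered until a whole group has been gathered, while the streaming case updates its result after every single record, which is why streaming suits real-time analytics.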
2. Storage :
Storage is where the converted data is kept, in a data lake or
data warehouse, until it is eventually processed.
The data lake/warehouse is one of the most essential components
of a big data ecosystem.
It should contain only complete, relevant data so that the
insights are as valuable as possible.
It must be efficient, with as little redundancy as possible, to
allow for quicker processing.
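
One way to picture the "complete, low-redundancy" requirement is a cleaning step that runs before records are written to storage. The sketch below is an assumption for illustration only: the function name clean_for_storage, the field names, and the rule "a record is a duplicate if its required fields match an earlier record" are all hypothetical choices, not a standard.

```python
def clean_for_storage(records, required_fields):
    """Drop incomplete records and records that repeat an earlier key."""
    seen = set()
    cleaned = []
    for record in records:
        # Incomplete: a required field is missing or None.
        if any(record.get(field) is None for field in required_fields):
            continue
        # Redundant: the same required-field values were already kept.
        key = tuple(record[field] for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.org"},
        {"id": 1, "email": "a@example.org"},  # duplicate
        {"id": 2, "email": None},             # incomplete
        {"id": 3, "email": "c@example.org"},
    ]
    print(clean_for_storage(rows, ["id", "email"]))
```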
3. Analysis :
In the analysis layer, data gets passed through several tools,
shaping it into actionable insights.
There are four types of analytics on big data:
o Diagnostic: Explains why a problem is happening.
o Descriptive: Describes the current state of a business
through historical data.
o Predictive: Projects future results based on historical
data.
o Prescriptive: Takes predictive analytics a step further by
projecting best future efforts.
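
The four types of analytics can be contrasted on one tiny data set. This is a toy sketch under stated assumptions: the function names, the sales figures, the straight-line trend model, and the 10% safety margin are all hypothetical, chosen only to show how each type answers a different question.

```python
from statistics import mean


def describe(sales):
    """Descriptive: summarize what happened."""
    return {"mean": mean(sales), "min": min(sales), "max": max(sales)}


def diagnose(sales):
    """Diagnostic: find the period with the largest drop from the one before.

    Returns (period index, change); a starting point for asking why.
    """
    changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
    worst = min(range(len(changes)), key=changes.__getitem__)
    return worst + 1, changes[worst]


def predict_next(sales):
    """Predictive: fit a least-squares line and extend it one period."""
    n = len(sales)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(sales)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return y_bar + slope * (n - x_bar)


def prescribe(forecast, safety_margin=0.1):
    """Prescriptive: recommend a stock level above the forecast."""
    return forecast * (1 + safety_margin)


if __name__ == "__main__":
    monthly_sales = [10, 12, 9, 13, 15]  # hypothetical units sold per month
    print("descriptive:", describe(monthly_sales))
    print("diagnostic:", diagnose(monthly_sales))
    forecast = predict_next(monthly_sales)
    print("predictive:", forecast)
    print("prescriptive:", prescribe(forecast))
```

Each function consumes the output of the kind of analysis before it only loosely; the point is that prescriptive analytics builds on a prediction, which in turn builds on historical data.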
4. Consumption :
The final big data component presents the information in a
format digestible to the end user.
This can take the form of tables, advanced visualizations, or
even single numbers if requested.
The most important thing in this layer is making sure the intent
and meaning of the output are understandable.
