
Big Data Analytics – Unit 1 (Chapter – 1: Introduction)

Big Data Analytics


Unit-1 (Chapter - 1 : Introduction)

Definition of Big Data


According to Gartner, the definition of Big Data is:
“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.”

Big Data refers to data sets so large and complex that traditional data processing software
applications cannot handle them. These data sets have to be processed and analysed to uncover
valuable information that can benefit businesses and organizations.

History of Big Data


1. Growth of Digital Storage & Databases (1960s - 1980s)

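1965: Gordon Moore observed that the number of transistors on a chip was doubling at a regular
interval (Moore’s Law), a trend that drove exponential growth in computing power.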


1970: Edgar F. Codd introduced the relational database model, leading to SQL-based
databases.
1980s: Data Warehousing concepts emerged to store and analyze structured data.

2. Internet and the Rise of Big Data (1990s - Early 2000s)

1991: The World Wide Web was created, dramatically increasing digital data generation.
1997: NASA researchers used the term "Big Data" to describe the problem of handling datasets too
large for a computer's memory, one of the term's earliest recorded uses.
2000s: Companies like Google, Amazon, and Facebook began collecting vast amounts of user
data, requiring more advanced processing systems.

3. Big Data Revolution (2010s - Present)

2010s: Apache Hadoop and Apache Spark became widely adopted for distributed computing.
2012: Gartner published a formal definition of Big Data built on the "3 Vs" (Volume, Velocity,
Variety), a framework first described by analyst Doug Laney in 2001.
2015-Present: AI and Machine Learning integrated with Big Data analytics, leading to
breakthroughs in healthcare, finance, and marketing.

4. The Future of Big Data (Beyond 2020s)

Edge computing and real-time analytics are gaining prominence.


Quantum computing may revolutionize data processing.
Ethical AI and data privacy laws (e.g., GDPR, CCPA) are shaping how Big Data is managed.

Prof. Prasad Patil,


Department of Computer Applications,
KLE Tech University, Belagavi.

Characteristics of Big Data


Gartner listed the 3 V's of Big Data: Volume, Velocity, and Variety. Later descriptions extend this
list; the following are 8 commonly cited characteristics of big data.

1. Volume (Massive Amount of Data)

Definition: Big Data involves enormous amounts of data, often measured in terabytes, petabytes,
or even exabytes.

Example: Social media platforms generate hundreds of terabytes of data daily (e.g., Facebook
reported processing 500+ terabytes of data per day in 2012).
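A quick calculation shows what such a volume means as a continuous data rate. The Python sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes) and data arriving evenly over 24 hours, whereas real traffic is bursty and peaks well above the average.

```python
# Convert a daily data volume into an average per-second ingest rate.
# Assumptions: decimal units and a perfectly even flow over the whole day.
TB = 10**12
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def avg_rate_gb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in GB/s for a given volume in TB/day."""
    return tb_per_day * TB / SECONDS_PER_DAY / GB


rate = avg_rate_gb_per_s(500)
print(f"500 TB/day is about {rate:.1f} GB/s on average")
```

With the 500 TB/day figure this works out to roughly 5.8 GB of new data every second, before any peaks are considered.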

2. Velocity (Speed of Data Generation & Processing)

Definition: Big Data is generated and processed at high speed, requiring real-time or
near-real-time analysis.

Example: Stock market transactions, IoT sensor data, and online streaming services need instant
processing to be useful.
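Because high-velocity data arrives continuously, it is often processed incrementally instead of being stored first and analysed later. A minimal sketch in plain Python (no streaming framework assumed; the function name moving_average and the sample prices are illustrative) keeps a sliding window over a stream of stock prices and updates the average as each new value arrives.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the last `window` values each time a new value arrives."""
    buf = deque(maxlen=window)  # the oldest value is evicted automatically when full
    total = 0.0
    for value in stream:
        if len(buf) == window:
            total -= buf[0]  # remove the value that is about to be evicted
        buf.append(value)
        total += value
        yield total / len(buf)


prices = [100.0, 101.0, 102.0, 99.0, 98.0]  # hypothetical stock ticks
for avg in moving_average(prices, window=3):
    print(round(avg, 2))
```

Keeping a running total means each update costs the same regardless of the window size, which is what makes this kind of computation practical on a fast stream.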

3. Variety (Different Types & Sources of Data)

Definition: Big Data comes in different formats—structured, semi-structured, and unstructured.

Examples:

Structured Data (SQL databases, spreadsheets)

Semi-structured Data (JSON, XML, log files)

Unstructured Data (videos, images, social media posts, emails)

4. Veracity (Data Quality & Accuracy)

Definition: Big Data can be incomplete, inconsistent, or misleading, requiring data cleansing and
validation.

Example: Social media sentiment analysis requires removing fake news, spam, and irrelevant
posts to ensure meaningful insights.
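A toy version of this cleansing step is sketched below. It assumes simple rules (drop empty posts, drop posts matching a hypothetical list of spam phrases, drop duplicates that differ only in case or spacing); production systems typically rely on trained spam and fake-news classifiers rather than fixed keyword lists.

```python
import re

# Hypothetical spam phrases, for illustration only.
SPAM_PATTERNS = [r"buy now", r"free followers", r"click here"]
SPAM_RE = re.compile("|".join(SPAM_PATTERNS), re.IGNORECASE)


def clean_posts(posts: list[str]) -> list[str]:
    """Drop empty posts, obvious spam, and duplicates (ignoring case and extra spaces)."""
    seen: set[str] = set()
    cleaned = []
    for post in posts:
        text = " ".join(post.split())  # collapse runs of whitespace
        if not text or SPAM_RE.search(text):
            continue
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = [
    "Loving the new phone!",
    "loving the   new phone!",
    "CLICK HERE for free followers",
    "   ",
    "Battery life is poor.",
]
print(clean_posts(raw))
```

Only the first genuine post and the battery complaint survive; the near-duplicate, the spam post, and the empty post are removed before any sentiment analysis runs.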

5. Value

Definition: The ultimate goal of Big Data is to extract valuable insights for decision-making.

Example:

Businesses use customer data to personalize recommendations (e.g., Amazon, Netflix).

Healthcare uses Big Data to predict disease outbreaks and improve patient care.


6. Variability: The meaning and context of data change over time (e.g., trending hashtags on Twitter).

7. Visualization: Making complex Big Data easier to understand through graphs, charts, and
dashboards.

8. Volatility: Data relevance changes quickly, requiring fast processing and storage strategies.

Types of Big Data


There are 3 types of big data:
a) Structured data,
b) Unstructured data, and
c) Semi-structured data.

a) Structured data

Structured data is one of the types of big data. By structured data, we mean data that can be
processed, stored, and retrieved in a fixed format. It refers to highly organized information that
can be readily and seamlessly stored in, and accessed from, a database using simple queries or
search algorithms. For instance, the employee table in a company database is structured, because
the employee details, their job positions, their salaries, etc., are stored in an organized manner.

Example:

• Customer details in a database (Name, Age, Email, Phone Number)

• Banking transactions

• Employee records
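Because structured data follows a fixed schema, it can be stored and retrieved with straightforward queries. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the employee table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical employee table (fixed columns and types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " position TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [("Asha", "Engineer", 85000), ("Ravi", "Analyst", 62000), ("Meera", "Engineer", 91000)],
)

# Every row has the same columns, so a short query retrieves exactly what is needed.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE position = ? ORDER BY salary DESC",
    ("Engineer",),
).fetchall()
print(rows)
conn.close()
```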

b) Unstructured data

Unstructured data refers to data that lacks any specific form or structure. This makes it
difficult and time-consuming to process and analyze. The free-text body of an email is an example
of unstructured data.

Example:

• Images, videos, and audio files

• Social media posts (Tweets, Instagram captions)

• Handwritten notes
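To analyze unstructured data, some structure usually has to be extracted from it first. As a small illustration (the posts are made up, and a regular expression is only one simple extraction technique), the sketch below pulls hashtags out of free-text social media posts and counts them.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")  # '#' followed by letters, digits, or underscores

posts = [  # hypothetical free-text posts
    "Great match today! #cricket #IPL",
    "Rain delay again... #IPL",
    "New recipe tried tonight",
]


def hashtag_counts(texts: list[str]) -> Counter:
    """Extract hashtags from free text and count them, ignoring case."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


print(hashtag_counts(posts).most_common())
```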


c) Semi-structured data

Semi-structured data is the third type of big data. It sits between the two formats described
above: it is not stored in a fixed schema such as a relational database table, yet it contains
tags or markers (e.g., JSON keys, XML elements) that separate and label the individual elements
within the data.

Example:

• JSON and XML files

• Emails (subject and body text)

• Web logs
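Semi-structured records carry their own tags, so they can be parsed without a predefined table, but different records may have different fields. The sketch below parses hypothetical JSON customer records with Python's json module and projects them onto a fixed set of columns, filling absent fields with None.

```python
import json

# Hypothetical records: every value is labelled by a key, but the keys vary per record.
raw = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "city": "Belagavi"},
  {"id": 3, "name": "Meera", "email": "meera@example.com", "tags": ["vip"]}
]"""

records = json.loads(raw)

# Project the records onto a fixed set of columns; missing keys become None.
COLUMNS = ("id", "name", "email")
table = [tuple(rec.get(col) for col in COLUMNS) for rec in records]
for row in table:
    print(row)
```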

Differences Between Structured, Unstructured, and Semi-Structured Data


Importance of Big data


OR
Why is Big Data Important?

The importance of big data does not revolve around how much data a company has but how a
company utilizes the collected data. Every company uses data in its own way; the more efficiently a
company uses its data, the more potential it has to grow. The company can take data from any source
and analyze it to find answers which will enable:

1. Cost Savings: Big Data tools such as Hadoop and cloud-based analytics can bring cost advantages
when large amounts of data must be stored, and they also help identify more efficient ways of doing
business.

2. Time Reductions: Fast tools such as in-memory analytics, along with distributed frameworks like
Hadoop, help businesses take in new sources of data and analyze them quickly, so they can make quick
decisions based on what they learn.

3. Understand the market conditions: By analyzing big data, you can get a better understanding of
current market conditions. For example, by analyzing customers’ purchasing behaviour, a company
can find out which products sell the most and produce products according to this trend. This helps
it get ahead of its competitors.

4. Control online reputation: Big data tools can perform sentiment analysis, so you can get feedback
about who is saying what about your company. If you want to monitor and improve the online presence
of your business, big data tools can help.

5. Using Big Data Analytics to Boost Customer Acquisition and Retention

The customer is the most important asset any business depends on. No business can claim success
without first establishing a solid customer base. However, even with a customer base, a business
cannot afford to disregard the high competition it faces. If a business is slow to learn what
customers are looking for, it can easily end up offering products that do not meet their needs. In
the end, this leads to a loss of clientele and hurts overall business success. Big data allows
businesses to observe customer-related patterns and trends, and understanding customer behaviour is
important for building loyalty.

6. Using Big Data Analytics to Solve Advertisers’ Problems & Offer Marketing Insights

Big data analytics can help change all business operations. This includes the ability to match
customer expectations, change the company’s product line and, of course, ensure that marketing
campaigns are powerful.

7. Big Data Analytics as a Driver of Innovations and Product Development

Another huge advantage of big data is the ability to help companies innovate and redevelop their
products.


Challenges of Big Data Analytics:


Following are the challenges of Big Data Analytics
1. Need For Synchronization Across Disparate Data Sources
As data sets are becoming bigger and more diverse, there is a big challenge to incorporate them into
an analytical platform. If this is overlooked, it will create gaps and lead to wrong messages & insights.

2. Acute Shortage of Professionals Who Understand Big Data Analysis


Analysing data is what makes the huge volume of data produced every minute useful. With the
exponential rise of data, demand for big data scientists and Big Data analysts has grown sharply.
Because the job of a data scientist is multidisciplinary, organizations need to hire people with
varied skills, and there is a sharp shortage of such professionals compared with the massive amount
of data being produced.

3. Getting Meaningful Insights Through the Use of Big Data Analytics


Business organizations need to gain important insights from Big Data analytics, and they must also
ensure that only the relevant departments have access to this information. Bridging the gap between
raw data and useful, correctly shared insights is a big challenge for companies.

4. Getting Voluminous Data into The Big Data Platform


It is hardly surprising that data is growing with every passing day. This means business
organizations need to handle a large amount of data on a daily basis. The amount and variety of data
available today can overwhelm any data engineer, which is why it is vital to make data easily and
conveniently accessible to brand owners and managers.

5. Uncertainty Of Data Management Landscape


With the rise of Big Data, new technologies and companies are being developed every day. A big
challenge for companies is to find out which technology suits them best without introducing new
problems and potential risks.

6. Data Storage and Quality


Business organizations are growing at a rapid pace, and as companies grow, so does the amount of data
they produce. Storing this massive amount of data is becoming a real challenge for everyone. Data
lakes, which hold structured and unstructured data in its native format, and data warehouses are
commonly used to gather and store large quantities of data. The real problem arises when a data lake
or warehouse tries to combine unstructured and inconsistent data from diverse sources and encounters
errors. Missing data, inconsistent data, logic conflicts, and duplicate data all result in data
quality challenges.

7. Security And Privacy of Data


Once business enterprises discover how to use Big Data, it brings them a wide range of possibilities
and opportunities. However, it also brings potential risks to the privacy and security of the data.
The Big Data tools used for analysis and storage draw on data from disparate sources, which increases
the risk of the data being exposed and makes it vulnerable. Thus, the rise in the volume of data
increases privacy and security concerns.


Data Analytics:
Data analytics is the process of examining raw data to identify trends, draw conclusions, and
extract meaningful information.

Data analytics helps to manage qualitative and quantitative data to enable discovery, simplify
organization, support governance, and generate insights for a business.

Types of Data Analytics


Data analytics can be broadly categorized into the following 4 types:

1. Descriptive Analytics: This type of analysis focuses on understanding and describing historical
data. It answers the question, "What has happened?" Depending on the scenario, some data analysts
use descriptive analytics as a summary to support investigations and analysis from other types of
analytics. It is often considered a "best practice" because it puts the results of the other types
of analytics in the context of historical data.
Example: Analyzing sales data from the past quarter to see the performance of different
products.
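Descriptive analytics is mostly aggregation and summarization. A minimal sketch, using hypothetical quarterly sales records, totals revenue per product and reports each product's share of the quarter.

```python
from collections import defaultdict

# Hypothetical past-quarter sales records: (month, product, revenue)
sales = [
    ("Jan", "Laptop", 120000), ("Jan", "Phone", 80000),
    ("Feb", "Laptop", 95000), ("Feb", "Phone", 88000),
    ("Mar", "Laptop", 110000), ("Mar", "Phone", 91000),
]

# Total revenue per product over the quarter.
totals: dict[str, float] = defaultdict(float)
for _month, product, revenue in sales:
    totals[product] += revenue

quarter_total = sum(totals.values())
for product, revenue in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * revenue / quarter_total
    print(f"{product:<8} {revenue:>10,.0f}  ({share:.1f}% of quarter)")
```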

2. Diagnostic Analytics: Diagnostic Analytics helps in understanding why something happened by
analyzing data in greater depth. It answers, "Why did it happen?" To explain a problem, we look for
dependencies and patterns in the historical data related to that problem. Common techniques used
for Diagnostic Analytics are data discovery, data mining, and correlations.
Example: If sales dropped in a particular month, diagnostic analytics might reveal that this
was due to stock shortages or a marketing campaign's failure.
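One simple diagnostic technique is to drill down into the month in question and compare each candidate factor with its usual level. The sketch below does this for hypothetical monthly figures (sales, stock-out days, advertising spend); the factor that deviates most from the other months is a lead worth investigating, not proof of the cause.

```python
from statistics import mean

# Hypothetical monthly figures; sales dropped in "Apr".
months = {
    "Jan": {"sales": 500, "stockout_days": 1, "ad_spend": 40},
    "Feb": {"sales": 520, "stockout_days": 0, "ad_spend": 42},
    "Mar": {"sales": 510, "stockout_days": 2, "ad_spend": 41},
    "Apr": {"sales": 380, "stockout_days": 9, "ad_spend": 40},
}


def drill_down(data, bad_month, factors):
    """Relative change of each factor in bad_month versus its average in the other months.

    Assumes every factor has a non-zero average in the other months.
    """
    others = [m for m in data if m != bad_month]
    report = {}
    for factor in factors:
        baseline = mean(data[m][factor] for m in others)
        report[factor] = (data[bad_month][factor] - baseline) / baseline
    return report


report = drill_down(months, "Apr", ["sales", "stockout_days", "ad_spend"])
for factor, change in sorted(report.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{factor:<14} {change:+.0%} vs. the other months")
```

Here the number of stock-out days is about eight times its usual level while advertising spend is almost unchanged, which points towards stock shortages rather than the marketing campaign.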

3. Predictive Analytics: Predictive analytics uses historical data to predict future trends and
behaviours. It answers, "What could happen?" Predictive analytics estimates the probable outcome of
an event or the likelihood of a situation occurring. Techniques used are machine learning,
statistical modelling, and forecasting.
Example: Predicting customer churn based on past behaviour patterns.
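As a minimal sketch of churn prediction, the code below uses k-nearest neighbours, chosen here only because it is easy to read; logistic regression, decision trees, and other models are common alternatives. The two behavioural features and the labelled history are hypothetical, and the features are left unscaled for brevity.

```python
import math
from collections import Counter

# Hypothetical history: (months since last purchase, support tickets) -> churned?
history = [
    ((1, 0), False), ((2, 1), False), ((1, 2), False), ((3, 0), False),
    ((8, 4), True), ((10, 2), True), ((7, 5), True), ((9, 3), True),
]


def predict_churn(customer, k=3):
    """Majority vote among the k most similar past customers (Euclidean distance)."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], customer))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_churn((2, 1)))  # a recent, low-complaint customer
print(predict_churn((8, 3)))  # a long-inactive customer with several tickets
```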

4. Prescriptive Analytics: Prescriptive Analytics recommends actions that can help achieve desired
outcomes. It answers, "What should we do?" Prescriptive analytics goes beyond predicting future
outcomes: it suggests actions that take advantage of the predictions and shows the decision maker
the implications of each decision option. It anticipates what will happen, when it will happen,
and why it will happen, and it can suggest how to take advantage of a future opportunity or
mitigate a future risk. Techniques used are optimization algorithms, simulation, and decision
analysis.
Example: Suggesting the best pricing strategy for products to maximize profit.
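Prescriptive analytics pairs a model of what will happen with a search for the best action. In the sketch below, the demand curve and unit cost are hypothetical assumptions; the code evaluates candidate prices and recommends the one with the highest predicted profit. Real pricing systems would estimate demand from data and often use proper optimization solvers instead of a grid search.

```python
UNIT_COST = 15.0  # hypothetical cost to make one unit


def demand(price: float) -> float:
    """Assumed linear demand: 1000 units at price 0, falling by 20 units per unit of price."""
    return max(0.0, 1000 - 20 * price)


def profit(price: float) -> float:
    """Predicted profit at a given price under the assumed demand curve."""
    return (price - UNIT_COST) * demand(price)


# Evaluate candidate prices from 15.00 to 50.00 in steps of 0.50 and pick the best.
candidates = [15 + 0.5 * i for i in range(71)]
best_price = max(candidates, key=profit)
print(f"Recommended price: {best_price:.2f}, expected profit: {profit(best_price):,.0f}")
```

Under these assumptions the profit (p - 15)(1000 - 20p) peaks at p = 32.5, so the search recommends a price of 32.50 with an expected profit of 6,125.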


Importance of Data Analytics


Data analytics is crucial in today’s data-driven world, helping businesses, researchers, and individuals
make informed decisions. Here’s why it is important:

1. Better Decision-Making: Data-driven insights help organizations make informed decisions
rather than relying on intuition.
Example: A retail company can use sales data to determine which products to stock more.
2. Improving Efficiency and Productivity: It identifies inefficiencies in processes and suggests
improvements.
Example: Manufacturing companies use analytics to reduce production downtime and
optimize supply chains.
3. Enhancing Customer Experience: It helps businesses understand customer behaviour,
preferences, and feedback.
Example: E-commerce platforms use recommendation algorithms to personalize product
suggestions.
4. Predicting Future Trends: Predictive analytics allows businesses to anticipate market trends
and customer demands.
Example: Financial institutions use predictive modelling to detect fraudulent transactions.
5. Optimizing Business Strategies: It helps businesses create effective marketing strategies and
improve operational efficiency.
Example: Companies analyze social media engagement to refine their advertising strategies.
6. Cost Reduction: Identifies cost-saving opportunities by analyzing financial data.
Example: Airlines use analytics to optimize fuel consumption and reduce operational costs.
7. Risk Management: It helps in identifying potential risks and mitigating them before they
become serious.
Example: Insurance companies assess risk profiles using analytics to determine policy pricing.
8. Enhancing Research and Innovation: In academia and industry, analytics aids in scientific
discoveries and innovative solutions.
Example: In healthcare, data analytics is used to discover new drug formulations and
treatments.
9. Competitive Advantage: Companies leveraging analytics gain an edge over competitors by
understanding market trends and customer needs.
Example: Tech companies such as Google and Amazon use advanced data analytics to strengthen their
market position.
10. Personalized Learning and Education: In education, analytics helps track student
performance and design personalized learning plans.
Example: Adaptive learning platforms use analytics to adjust content based on a student’s
progress.

Data analytics is transforming industries by providing valuable insights, optimizing processes, and
driving innovation. Whether in business, healthcare, education, or research, leveraging data analytics
is key to staying ahead in a competitive and evolving world.


Life Cycle of Data Analytics


The Data Analytics Life Cycle consists of a structured approach to analyzing data and deriving insights.
It includes the following key phases:

1. Data Discovery (Problem Identification)


• Objective: Define the problem and understand the goals of the analysis.
• Activities:
o Identify business or research problems.
o Gather domain knowledge.
o Understand available data sources.
Example: A company wants to analyze customer purchasing behaviour to improve sales.

2. Data Collection (Data Acquisition)


• Objective: Gather relevant data from various sources.
• Activities:
o Collect data from databases, APIs, sensors, surveys, or external sources.
o Ensure data is relevant and reliable.
Example: Gathering customer purchase history from an e-commerce database.

3. Data Preparation (Cleaning & Transformation)


• Objective: Clean and preprocess raw data for analysis.
• Activities:
o Remove duplicates, handle missing values, and correct inconsistencies.
o Convert data into a structured format.
o Perform feature engineering (creating new variables from existing data).
Example: Filling in missing customer age data with an average value.
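The cleaning steps above can be sketched in plain Python. The customer rows are hypothetical; duplicates are identified by customer id (an assumption), and the missing age is filled with the rounded mean of the known ages, as in the example.

```python
from statistics import mean

# Hypothetical raw customer rows; None marks a missing age.
raw = [
    {"id": 1, "name": "Asha", "age": 34},
    {"id": 2, "name": "Ravi", "age": None},
    {"id": 1, "name": "Asha", "age": 34},  # duplicate of the first row
    {"id": 3, "name": "Meera", "age": 28},
]


def prepare(rows):
    """Remove duplicate customers and fill missing ages with the mean age."""
    # 1. Drop duplicates by customer id, keeping the first occurrence.
    unique, seen = [], set()
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))  # copy, so the raw rows stay unchanged
    # 2. Impute missing ages with the rounded mean of the known ages.
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = round(mean(known))
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


clean = prepare(raw)
for row in clean:
    print(row)
```

Here the duplicate row for id 1 is dropped and Ravi's missing age becomes 31, the mean of 34 and 28.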


4. Model Planning

• Select the variables (features) most relevant to solving the problem.


• Choose appropriate algorithms (e.g., regression, clustering, classification) based on the
problem type.
• Develop a roadmap for the modelling phase, outlining how the data will be used.
• Create an initial hypothesis about how the model will behave with the chosen data.
• Use exploratory data analysis (EDA) techniques like correlation matrices and scatter plots to
understand relationships in the data.
Model Planning is a crucial phase in the Data Analytics Life Cycle in which we determine the
approach, techniques, and tools for analyzing data, so that the right model is selected to solve
the problem efficiently. It lays the foundation for successful data analysis and machine learning
implementation: a well-planned approach helps ensure efficient data processing, accurate
predictions, and valuable insights for decision-making.
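A correlation check is one common EDA step when deciding which variables to carry into modelling. The sketch below computes the Pearson correlation of each hypothetical feature with a target variable (monthly spend) and ranks the features by strength. Correlation only captures linear relationships, so a low value is a hint, not proof, that a feature is unhelpful.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical customer data; the target is monthly spend.
target = [20, 41, 59, 80, 100]
features = {
    "visits": [2, 4, 6, 8, 10],
    "discount": [1, 0, 1, 1, 0],
    "age": [30, 25, 40, 35, 28],
}

# Rank features by the strength (absolute value) of their correlation with the target.
ranked = sorted(
    ((name, pearson(values, target)) for name, values in features.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:<9} r = {r:+.2f}")
```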

5. Model Building

• Build models using the selected algorithms and the prepared dataset.
• Train models by feeding them the training dataset to allow them to learn patterns.
• Use a held-out test dataset to validate model performance and detect overfitting.
• Iterate and refine the model by adjusting parameters for better accuracy.
Model Building is the phase in the Data Analytics Life Cycle where the selected model is developed,
trained, and evaluated using data. This step involves implementing the chosen algorithm, tuning it
for better performance, and preparing it for deployment. It is the core of data analytics, where
prepared data is turned into a predictive system; a well-trained and optimized model ensures
accuracy, efficiency, and business impact.
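The train, test, and evaluate loop can be illustrated with the simplest possible model. The sketch below fits a straight line by ordinary least squares to synthetic data (spend roughly equal to 10 times visits plus random noise, an assumption made for illustration), holds out 25% of the points as a test set, and reports the mean absolute error (MAE) on the training and test data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    a, b = model
    return sum(abs(a + b * x - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: spend is roughly 10 * visits plus noise (illustrative assumption).
rng = random.Random(42)  # fixed seed so the example is reproducible
data = [(v, 10 * v + rng.uniform(-3, 3)) for v in range(1, 21)]
rng.shuffle(data)
train, test = data[:15], data[15:]  # 75% for training, 25% held out for testing

model = fit_line(*zip(*train))  # zip(*train) splits the pairs into (xs, ys)
train_mae = mean_abs_error(model, *zip(*train))
test_mae = mean_abs_error(model, *zip(*test))
print(f"intercept={model[0]:.2f} slope={model[1]:.2f}")
print(f"train MAE={train_mae:.2f} test MAE={test_mae:.2f}")
```

If the error on the test set were much larger than on the training set, that would be a sign of overfitting and a cue to revisit the model or its parameters.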

6. Communication of Results

• Visualize results through charts, graphs, dashboards, and other visuals to make insights
understandable.
• Present key findings to stakeholders in a concise, actionable format.
• Summarize the insights derived from the data and explain how they align with business
goals.
• Provide recommendations on actions based on the data analysis.
• Prepare a comprehensive report that includes all aspects of the data analysis process,
findings, and conclusions.

7. Operationalize

• Deploy the model into production environments, integrating it into business processes.
• Automate tasks like decision-making or predictions based on the model’s insights.
• Continuously monitor model performance to ensure it remains accurate and relevant as new
data becomes available.
• Update or retrain the model as needed to accommodate changing data trends.
• Document the deployment and maintenance processes to ensure long-term usability.
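Continuous monitoring can be as simple as comparing the live error with the error measured before deployment. The class below is a hypothetical sketch: it keeps a window of recent absolute errors and flags the model for retraining when the live mean absolute error exceeds a chosen multiple of the baseline. The window size and tolerance are arbitrary settings for illustration.

```python
from collections import deque


class ModelMonitor:
    """Hypothetical helper: flags a deployed model for retraining when live error drifts up."""

    def __init__(self, baseline_mae: float, window: int = 50, tolerance: float = 1.5):
        self.baseline_mae = baseline_mae    # error measured on the test set before deployment
        self.errors = deque(maxlen=window)  # keeps only the most recent absolute errors
        self.tolerance = tolerance          # allowed ratio of live MAE to baseline MAE

    def record(self, predicted: float, actual: float) -> None:
        """Store the error once the true outcome for a prediction becomes known."""
        self.errors.append(abs(predicted - actual))

    def live_mae(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self) -> bool:
        # Wait for a full window so that a few unlucky points do not trigger retraining.
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.live_mae() > self.tolerance * self.baseline_mae


# Usage with made-up numbers: baseline MAE 2.0, so the alarm threshold is 3.0.
monitor = ModelMonitor(baseline_mae=2.0, window=5)
for predicted, actual in [(100, 101), (90, 92), (80, 79), (70, 71), (60, 62)]:
    monitor.record(predicted, actual)
before = monitor.needs_retraining()  # live MAE 1.4, below the threshold

for predicted, actual in [(100, 106), (90, 95), (80, 86), (70, 75), (60, 66)]:
    monitor.record(predicted, actual)
after = monitor.needs_retraining()  # live MAE 5.6, above the threshold
print(before, after)
```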
