Unit 1 (Chapter 1) - Introduction
Big Data refers to data sets that are so large, complex, and fast-growing that traditional data processing software applications cannot handle them. These datasets have to be processed and analysed to uncover valuable information that can benefit businesses and organizations.
1991: The World Wide Web was created, dramatically increasing digital data generation.
1997: NASA coined the term "Big Data" in reference to handling large datasets.
2000s: Companies like Google, Amazon, and Facebook began collecting vast amounts of user
data, requiring more advanced processing systems.
2010: Apache Hadoop and Spark became popular for distributed computing.
2012: The "3 Vs of Big Data" (Volume, Velocity, Variety) were formally defined by Gartner.
2015-Present: AI and Machine Learning integrated with Big Data analytics, leading to
breakthroughs in healthcare, finance, and marketing.
Characteristics of Big Data:
1. Volume
Definition: Big Data involves enormous amounts of data, often measured in terabytes, petabytes, or even exabytes.
Example: Social media platforms generate hundreds of terabytes of data daily (e.g., Facebook processes 500+ terabytes of data per day).
2. Velocity
Definition: Big Data is generated and processed at high speed, requiring real-time or near-real-time analysis.
Example: Stock market transactions, IoT sensor data, and online streaming services need instant processing to be useful.
3. Variety
Definition: Big Data comes in many formats, including structured, unstructured, and semi-structured data from diverse sources.
Examples: Database records, emails, images, videos, web logs, and sensor data.
4. Veracity
Definition: Big Data can be incomplete, inconsistent, or misleading, requiring data cleansing and validation.
Example: Social media sentiment analysis requires removing fake news, spam, and irrelevant posts to ensure meaningful insights (see the filtering sketch after this list).
5. Value
Definition: The ultimate goal of Big Data is to extract valuable insights for decision-making.
Example:
Healthcare uses Big Data to predict disease outbreaks and improve patient care.
7. Visualization: Making complex Big Data easier to understand through graphs, charts, and
dashboards.
8. Volatility: Data relevance changes quickly, requiring fast processing and storage strategies.
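As a concrete illustration of Veracity, the following is a minimal Python sketch of filtering out spam-like and empty posts before sentiment analysis. The post structure and the spam keyword list are illustrative assumptions, not part of the source material.

# Minimal sketch: filtering noisy social media posts before sentiment analysis.
SPAM_KEYWORDS = {"click here", "free money", "subscribe now"}

posts = [
    {"user": "alice", "text": "The new phone's battery life is excellent."},
    {"user": "bot42", "text": "FREE MONEY!!! click here"},
    {"user": "carol", "text": ""},  # empty post carries no signal
]

def is_valid(post):
    """Keep posts that are non-empty and contain no spam phrases."""
    text = post["text"].strip().lower()
    if not text:
        return False
    return not any(keyword in text for keyword in SPAM_KEYWORDS)

clean_posts = [p for p in posts if is_valid(p)]
print(f"Kept {len(clean_posts)} of {len(posts)} posts for analysis")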
Types of Big Data:
a) Structured data
Structured data is one of the types of big data. By structured data, we mean data that can be processed, stored, and retrieved in a fixed format. It refers to highly organized information that can be readily and seamlessly stored in and accessed from a database by simple search algorithms. For instance, the employee table in a company database is structured: the employee details, their job positions, their salaries, etc., are present in an organized manner.
Example:
• Banking transactions
• Employee records
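To make this concrete, here is a minimal Python sketch (using the standard sqlite3 module) of structured data: an employee table with a fixed schema that can be queried directly. The table and column names are illustrative assumptions.

import sqlite3

# Minimal sketch of structured data: a fixed schema that supports direct queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees (name, position, salary) VALUES (?, ?, ?)",
    [("Asha", "Analyst", 55000.0), ("Ravi", "Engineer", 72000.0)],
)

# Because the format is fixed, retrieval is a simple, well-defined query.
for row in conn.execute("SELECT name, salary FROM employees WHERE salary > 60000"):
    print(row)
conn.close()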
b) Unstructured data
Unstructured data refers to data that lacks any specific form or structure, which makes it difficult and time-consuming to process and analyze. Email is a common example of unstructured data.
Example:
• Handwritten notes
c) Semi-structured data
Semi-structured data is the third type of big data. It contains both of the formats mentioned above, that is, structured and unstructured data. To be precise, it refers to data that has not been classified under a particular repository (database), yet contains vital information or tags that segregate individual elements within the data.
Example:
• Web logs
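To make the contrast concrete, the following minimal Python sketch parses a semi-structured web log entry: the record does not live in a database table, but its tags still allow individual elements to be extracted. The log fields shown are illustrative assumptions.

import json

# Minimal sketch: a semi-structured web log entry. There is no fixed table schema,
# but the JSON tags still segregate individual elements so they can be extracted.
log_line = '{"timestamp": "2024-05-01T12:00:00Z", "ip": "203.0.113.7", "path": "/products", "status": 200}'

record = json.loads(log_line)
print(record["path"], record["status"])

# By contrast, unstructured data (e.g., a scanned handwritten note) has no such tags,
# so individual fields cannot be extracted without extra processing.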
Importance of Big Data:
The importance of big data does not revolve around how much data a company has but how a
company utilizes the collected data. Every company uses data in its own way; the more efficiently a
company uses its data, the more potential it has to grow. The company can take data from any source
and analyze it to find answers which will enable:
1. Cost Savings: Big Data tools like Hadoop and cloud-based analytics can bring cost advantages to businesses when large amounts of data need to be stored, and these tools also help identify more efficient ways of doing business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics helps businesses identify new sources of data, analyze data immediately, and make quick decisions based on what they learn.
3. Understand the market conditions: By analyzing big data you can get a better understanding of
current market conditions. For example, by analyzing customers’ purchasing behaviours, a company
can find out the products that are sold the most and produce products according to this trend. By
this, it can get ahead of its competitors.
4. Control online reputation: Big data tools can perform sentiment analysis, giving you feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this.
The customer is the most important asset any business depends on. There is no single business that
can claim success without first having to establish a solid customer base. However, even with a
customer base, a business cannot afford to disregard the high competition it faces. If a business is
slow to learn what customers are looking for, then it is very easy to begin offering poor quality
products. In the end, loss of clientele will result, and this creates an adverse overall effect on business
success. The use of big data allows businesses to observe various customer-related patterns and trends. Observing customer behaviour is important for building loyalty.
Big data analytics can help transform all business operations. This includes the ability to match customer expectations, change the company's product line, and ensure that marketing campaigns are effective.
Another huge advantage of big data is the ability to help companies innovate and redevelop their
products.
Data Analytics:
Data analytics is the process of examining raw data to identify trends, draw conclusions, and extract meaningful information.
Data analytics helps to manage qualitative and quantitative data to enable discovery, simplify
organization, support governance, and generate insights for a business.
3. Predictive Analytics: Predictive analytics uses historical data to predict future trends and behaviours. It answers, "What could happen?". It uses data to determine the probable outcome of an event or the likelihood of a situation occurring. Techniques used are machine learning, statistical modelling, and forecasting.
Example: Predicting customer churn based on past behaviour patterns.
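A minimal Python sketch of this churn example, using scikit-learn's logistic regression, is shown below. The behavioural features and the tiny dataset are illustrative assumptions; a real model would be trained on historical customer records.

# Minimal sketch of predictive analytics: predicting customer churn from past behaviour.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each row: [monthly_spend, support_tickets, months_as_customer]; label 1 = churned.
X = [[20, 5, 3], [80, 0, 36], [15, 7, 2], [95, 1, 48], [30, 4, 6], [70, 0, 24]]
y = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Probability that a new customer with these behaviour features will churn.
print(model.predict_proba([[25, 6, 4]])[0][1])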
4. Prescriptive Analytics: Prescriptive analytics recommends actions that can help achieve desired outcomes. It answers, "What should we do?". Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that benefit from the predictions and showing the decision maker the implications of each decision option. It not only anticipates what will happen and when it will happen, but also why it will happen. Further, it can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk, and it illustrates the implications of each option. Techniques used are optimization algorithms, simulation, and decision analysis.
Example: Suggesting the best pricing strategy for products to maximize profit.
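A minimal Python sketch of this pricing example is shown below. The linear demand curve and unit cost are stand-in assumptions; in practice the demand model would be estimated from historical sales data (the predictive step) before the optimization runs.

# Minimal sketch of prescriptive analytics: choosing a price that maximizes profit.
UNIT_COST = 4.0

def expected_demand(price):
    """Assumed linear demand curve: higher prices sell fewer units."""
    return max(0.0, 1000 - 80 * price)

def expected_profit(price):
    return (price - UNIT_COST) * expected_demand(price)

# Evaluate a grid of candidate prices and recommend the most profitable one.
candidate_prices = [p / 10 for p in range(40, 126)]  # 4.00 to 12.50 in 0.10 steps
best_price = max(candidate_prices, key=expected_profit)
print(f"Recommended price: {best_price:.2f}, expected profit: {expected_profit(best_price):.0f}")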
Data analytics is transforming industries by providing valuable insights, optimizing processes, and
driving innovation. Whether in business, healthcare, education, or research, leveraging data analytics
is key to staying ahead in a competitive and evolving world.
Data Analytics Life Cycle:
4. Model Planning
5. Model Building
• Build models using the selected algorithms and the prepared dataset.
• Train models by feeding them the training dataset to allow them to learn patterns.
• Use a test dataset to validate model performance and avoid overfitting.
• Iterate and refine the model by adjusting parameters for better accuracy.
Model Building is the phase in the Data Analytics Life Cycle where the selected model is developed,
trained, and evaluated using data. This step involves implementing the chosen algorithm, tuning it for
better performance, and preparing it for deployment.
Model Building is the core of data analytics, where raw data is transformed into a powerful predictive
system. A well-trained and optimized model ensures accuracy, efficiency, and business impact.
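The following minimal Python sketch (using scikit-learn, an assumption rather than a tool named in the text) illustrates this phase: a model is trained on a training split, validated on a held-out test split, and a parameter is iterated on to improve accuracy.

# Minimal sketch of the Model Building phase: train, validate, and iterate on a parameter.
# The data is synthetic; a real project would use the prepared dataset from earlier phases.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Try several parameter settings and keep the one that validates best on held-out data,
# which guards against overfitting to the training set alone.
best_depth, best_score = None, 0.0
for depth in (2, 4, 8, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_depth, best_score = depth, score

print(f"Best max_depth: {best_depth}, test accuracy: {best_score:.2f}")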
6. Communication of Results
• Visualize results through charts, graphs, dashboards, and other visuals to make insights understandable (a minimal charting sketch follows this list).
• Present key findings to stakeholders in a concise, actionable format.
• Summarize the insights derived from the data and explain how they align with business
goals.
• Provide recommendations on actions based on the data analysis.
• Prepare a comprehensive report that includes all aspects of the data analysis process,
findings, and conclusions.
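As a minimal illustration of the visualization step, the following Python sketch (using matplotlib, an assumption rather than a tool named in the text) turns model output into a simple bar chart that can be included in a stakeholder report. The segment names and figures are made up.

# Minimal sketch: summarizing model output as a chart for stakeholders.
import matplotlib.pyplot as plt

segments = ["New", "Returning", "At-risk"]
predicted_churn_rate = [0.12, 0.05, 0.38]

fig, ax = plt.subplots()
ax.bar(segments, predicted_churn_rate)
ax.set_ylabel("Predicted churn rate")
ax.set_title("Churn risk by customer segment")

# Save the figure so it can be dropped into a report or dashboard.
fig.savefig("churn_by_segment.png", dpi=150)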
7. Operationalize
• Deploy the model into production environments, integrating it into business processes (a minimal deployment sketch follows this list).
• Automate tasks like decision-making or predictions based on the model’s insights.
• Continuously monitor model performance to ensure it remains accurate and relevant as new
data becomes available.
• Update or retrain the model as needed to accommodate changing data trends.
• Document the deployment and maintenance processes to ensure long-term usability.
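The following minimal Python sketch (using scikit-learn and joblib, assumptions rather than tools named in the text) illustrates operationalizing a model: persisting it, reloading it in a production process, and monitoring accuracy on fresh labelled data so retraining can be triggered when performance drifts. Thresholds and data are illustrative.

# Minimal sketch of operationalizing a model: persist, reload, predict, and monitor.
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train and persist the model (normally done at the end of Model Building).
model = LogisticRegression()
model.fit([[0, 1], [1, 1], [2, 0], [3, 0]], [0, 0, 1, 1])
joblib.dump(model, "churn_model.joblib")

# In production: reload the model and score new records.
deployed = joblib.load("churn_model.joblib")
new_records = [[0, 1], [3, 0]]
new_labels = [0, 1]  # ground truth collected later
predictions = deployed.predict(new_records)

# Simple monitoring rule: flag the model for retraining if accuracy drops too far.
ACCURACY_THRESHOLD = 0.8
accuracy = accuracy_score(new_labels, predictions)
if accuracy < ACCURACY_THRESHOLD:
    print(f"Accuracy {accuracy:.2f} below threshold; retrain the model.")
else:
    print(f"Accuracy {accuracy:.2f} is acceptable.")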