
Unit-I Introduction to Big Data [10L] Max Marks 14

Data Storage and Analysis:


What is big data?
Big data refers to extremely large data sets, either structured or not, that
professionals analyze to discover trends, patterns, or behaviors. It's unique in
that it has what professionals describe as the three Vs—volume, velocity, and
variety—in such large amounts that traditional data management systems
struggle to store or analyze the data successfully. Therefore, scalable
architecture must be available to manage, store, and analyze big data sets.
Big data storage is a scalable architecture that allows businesses to collect,
manage, and analyze immense sets of data in real-time. The design of big data
storage solutions is specifically tailored to address the speed, volume, and
complexity of the data sets. Some examples of big data storage options are:
• Data lakes are centralized storage solutions that process and secure data
in its native format without size limitations. They can enable different
forms of smart analytics, such as machine learning and visualizations.
• Data warehouses aggregate data sets from different sources into a single
storage unit for robust analysis, supporting data mining, artificial
intelligence (AI), and more. Unlike a data lake, data warehouses have a
three-tier structure for storing data.
• Data pipelines gather raw data and transport it into repositories, such as
lakes or warehouses.
Data lakes, warehouses, and pipelines exist within several different storage
options, including:
• Cloud-based storage, where a business outsources the storage of its data to a vendor that operates a cloud storage system.
• Colocation storage, where a business rents space in a third-party facility to house its servers rather than keeping them on-site.
• On-premise storage, where a business manages its network and servers on-site, including the hardware, such as servers, that houses the data at the organization's premises.
Characteristics of Big Data:

Big Data consists of volumes of data too large to be stored or processed by traditional data storage and processing systems. Many multinational companies rely on it to process data and run their business; global data flows are estimated to exceed 150 exabytes per day before replication.
The five V's of Big Data describe its characteristics.
5 V's of Big Data
o Volume
o Veracity
o Variety
o Value
o Velocity

Volume
The name Big Data itself refers to enormous size. Big Data is the vast volume of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions, and many more.
Facebook, for example, generates approximately a billion messages, records roughly 4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day. Big data technologies are built to handle data at this scale.
Variety
Big Data can be structured, unstructured, or semi-structured, collected from many different sources. In the past, data was collected mainly from databases and spreadsheets; today it arrives in a wide array of forms, such as PDFs, emails, audio, social media posts, photos, and videos.

The data is categorized as below:


a. Structured data: Data that conforms to a fixed schema with all the required columns, typically in tabular form. Structured data is stored in relational database management systems.
b. Semi-structured data: Data whose schema is not rigidly defined, e.g., JSON, XML, CSV, TSV, and email. It carries some organizational markers, such as tags or key-value pairs, but is not stored as plain relational tables.
c. Unstructured data: Files with no predefined structure, such as log files, audio files, and image files. Many organizations have large amounts of such data available but struggle to derive value from it because it is raw.
d. Quasi-structured data: Textual data with inconsistent formats that can be brought into a usable shape only with effort, time, and suitable tools.
Example: Web server logs, i.e., log files created and maintained by a server that contain a list of its activities.
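To make this concrete, here is a minimal Python sketch (using an assumed, common Apache-style log layout rather than any format given above) that turns one quasi-structured web server log line into structured, named fields with a regular expression:

import re

# A hypothetical Apache-style access log line (assumed format, for illustration only).
log_line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Regular expression that extracts the host, timestamp, request, status code, and size.
pattern = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)'
)

match = pattern.match(log_line)
if match:
    record = match.groupdict()   # now a dictionary of structured fields
    print(record["host"], record["status"], record["size"])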
Veracity
Veracity refers to how reliable and trustworthy the data is. Because data arrives from many sources of varying quality, it must be filtered and translated before it can be trusted; veracity is therefore about being able to handle and manage data quality efficiently, which is essential for business development.
For example, Facebook posts with hashtags can be noisy and inconsistent, so their quality must be assessed before analysis.
Value
Value is an essential characteristic of big data. It is not the sheer amount of data we process or store that matters, but the valuable, reliable data that we store, process, and analyze to produce insight.
Velocity
Velocity plays an important role compared to the other characteristics. Velocity refers to the speed at which data is created, often in real time. It covers the rate at which incoming data sets arrive, their rate of change, and bursts of activity. A primary aspect of Big Data is providing the demanded data rapidly.
Big data velocity deals with the speed at which data flows in from sources such as application logs, business processes, networks, social media sites, sensors, and mobile devices.
Big Data Analytics:
Big data analytics describes the process of uncovering trends, patterns, and
correlations in large amounts of raw data to help make data-informed
decisions. These processes use familiar statistical analysis techniques—like
clustering and regression—and apply them to more extensive datasets with the
help of newer tools. Big data has been a buzzword since the early 2000s, when
software and hardware capabilities made it possible for organizations to handle
large amounts of unstructured data. Since then, new technologies—from
Amazon to smartphones—have contributed even more to the substantial
amounts of data available to organizations. With the explosion of data, early
innovation projects like Hadoop, Spark, and NoSQL databases were created for
the storage and processing of big data. This field continues to evolve as data
engineers look for ways to integrate the vast amounts of complex information
created by sensors, networks, transactions, smart devices, web usage, and
more.
1. Collect Data
Data collection looks different for every organization. With today’s technology,
organizations can gather both structured and unstructured data from a variety
of sources — from cloud storage to mobile applications to in-store IoT sensors
and beyond. Some data will be stored in data warehouses where business
intelligence tools and solutions can access it easily. Raw or unstructured data
that is too diverse or complex for a warehouse may be assigned metadata and
stored in a data lake.
2. Process Data
Once data is collected and stored, it must be organized properly to get accurate
results on analytical queries, especially when it’s large and unstructured.
Available data is growing exponentially, making data processing a challenge for
organizations. One processing option is batch processing, which looks at large
data blocks over time. Batch processing is useful when there is a longer
turnaround time between collecting and analyzing data. Stream
processing looks at small batches of data at once, shortening the delay time
between collection and analysis for quicker decision-making. Stream
processing is more complex and often more expensive.
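To illustrate the difference, the following small Python sketch (a toy illustration with made-up numbers, not a production engine) processes the same events once as a single batch and then as a stream of small micro-batches:

from typing import Iterable, List

events: List[int] = [3, 7, 2, 9, 4, 1, 8, 6]   # made-up event values

def batch_process(records: List[int]) -> int:
    # Batch processing: wait for the whole data block, then analyze it once.
    return sum(records)

def stream_process(records: Iterable[int], window: int = 3) -> List[int]:
    # Stream processing: analyze small batches as they arrive for quicker results.
    window_sums, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) == window:            # a micro-batch is full
            window_sums.append(sum(buffer))  # emit a result without waiting for all data
            buffer.clear()
    if buffer:                               # flush any remaining records
        window_sums.append(sum(buffer))
    return window_sums

print(batch_process(events))    # one answer after everything is collected: 40
print(stream_process(events))   # partial answers as data flows in: [12, 14, 14]

The batch version gives one result after a longer turnaround; the stream version trades extra complexity for earlier, incremental results.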
3. Clean Data
Data big or small requires scrubbing to improve data quality and get stronger
results; all data must be formatted correctly, and any duplicative or
irrelevant data must be eliminated or accounted for. Dirty data can obscure and
mislead, creating flawed insights.
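A minimal pandas sketch of this scrubbing step, using a small made-up customer table (the column names and values are assumptions for illustration): it normalizes text, converts dates to a consistent type, removes duplicate rows, and drops records with missing key fields.

import pandas as pd

# Made-up raw records with inconsistent text, a duplicate row, and a missing value.
raw = pd.DataFrame({
    "customer": ["  Alice ", "BOB", "BOB", "carol", None],
    "signup":   ["2023-01-05", "2023-01-05", "2023-01-05", "2023-02-10", "2023-03-01"],
    "spend":    [120.0, 85.5, 85.5, 42.0, 10.0],
})

clean = (
    raw.dropna(subset=["customer"])                                          # drop records missing key fields
       .assign(customer=lambda df: df["customer"].str.strip().str.title(),   # normalize text formatting
               signup=lambda df: pd.to_datetime(df["signup"]))               # consistent date type
       .drop_duplicates()                                                    # remove duplicative rows
)
print(clean)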
4. Analyze Data
Getting big data into a usable state takes time. Once it’s ready, advanced
analytics processes can turn big data into big insights. Some of these big data
analysis methods include:
• Data mining sorts through large datasets to identify patterns and relationships by identifying anomalies and creating data clusters (see the sketch after this list).
• Predictive analytics uses an organization’s historical data to make
predictions about the future, identifying upcoming risks and
opportunities.
• Deep learning imitates human learning patterns by using artificial
intelligence and machine learning to layer algorithms and find patterns
in the most complex and abstract data.
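The data mining method above mentions creating data clusters; as a toy illustration (made-up values, with scikit-learn assumed to be available), the following sketch groups similar customer records into two clusters:

import numpy as np
from sklearn.cluster import KMeans

# Made-up customer records: (annual spend, visits per month).
X = np.array([
    [100, 2], [120, 3], [110, 2],      # low-spend, infrequent visitors
    [900, 20], [950, 22], [880, 19],   # high-spend, frequent visitors
])

# Group the records into two clusters based on similarity.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster assignment for each record
print(model.cluster_centers_)  # the "typical" record at the centre of each cluster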
Use Cases of Big Data
1. Social Media and Entertainment: You have probably noticed streaming services such as Netflix recommending shows and movies based on your previous searches and what you have watched. This is done using Big Data. Netflix and other streaming services create a custom profile for each user, storing data such as search history, watch history, the genres watched most, the time of day the user prefers to watch, and daily streaming time, then analyze it to generate recommendations. This provides a better streaming experience for users.
2. Shopping: Websites like Amazon, Flipkart, etc., also use Big Data to
recommend products based on your previous purchases, search history,
and interests. It is done to maximize their profits and provide a better
shopping experience to their customers.
3. Education: Big Data helps in analyzing and monitoring the behavior and
activities of students, like the time they need to answer a question, the
number of questions skipped, and the difficulty level of the questions
that are skipped, and thus helps students to analyze their overall
preparation, weak topics, strong topics, etc.
4. Healthcare: Healthcare providers use Big Data to track and analyze the health and fitness of patients, the number of visits, the number of appointments a patient has skipped, etc. Mass outbreaks of diseases can be predicted by analyzing this data with suitable algorithms.
5. Transportation: Traffic can be controlled by collecting and analyzing data from sensors and cameras installed on roads and highways. Accident-prone areas can be detected with the help of Big Data analysis, so that the required measures can be taken to avoid accidents.
Evolution of Big Data
o The earliest record to track and analyze data was not decades back but
thousands of years back when accounting was first introduced in
Mesopotamia.
o In the 20th century, IBM developed one of the first large-scale data projects: punch card systems that tracked the information of millions of Americans.
o With the emergence of the World Wide Web and supercomputers in the
1990s, the creation of data on a large scale started to grow at an
exponential rate. It was in the early 1990s when the term 'Big Data' was
first used.
o The two main challenges regarding 'Big Data' were storing and
processing such a huge volume of data.
o In 2005, the open-source framework Hadoop, which stores and processes large data sets, was created by Doug Cutting and Mike Cafarella and later developed extensively at Yahoo.
o The storage solution in Hadoop was named HDFS (Hadoop Distributed
File System), and the processing solution was named MapReduce.
o In 2008, Cloudera became the first company to provide a commercial Hadoop distribution.
o In 2013, the Creators of Apache Spark founded a company, Databricks,
which offers a platform for Big Data and Machine Learning solutions.
o Over the past few years, top Cloud providers such as Microsoft, Google,
and Amazon also started to provide Big Data solutions. These Cloud
providers made it much easier for users and companies to work on Big
Data.
Importance of Big Data
1. A better understanding of market conditions.
2. Time and cost saving.
3. Solving advertisers' problems.
4. Offering better market insights.
5. Boosting customer acquisition and retention.
Applications of Big Data
Big Data finds applications in various sectors, such as-
1. Banking and Security
2. Social Media and Entertainment
3. E-commerce websites
4. HealthCare
5. Education
6. Transportation
Big Data Analytics
Big Data Analytics uses modern tools and techniques to extract valuable
insights, trends, hidden patterns, and relations with the help of large sets of
data, which can be structured, semi-structured, or unstructured. It helps
in better decision-making and optimizes business operations.
Let's consider the example of YouTube, which has over 2.6 billion monthly
active users. It generates a huge amount of data every day. With the help of
this data, it recommends videos based on what you have watched previously,
your likes, shares, etc. What enables this is the tools and frameworks resulting
from Big Data Analytics.
The importance of big data analytics
Big data analytics is important because it helps companies leverage their data
to identify opportunities for improvement and optimisation. Across different
business segments, increasing efficiency leads to overall more intelligent
operations, higher profits, and satisfied customers. Big data analytics helps
companies reduce costs and develop better, customer-centric products and
services.
Data analytics helps provide insights that improve the way our society
functions. In health care, big data analytics not only keeps track of and analyses individual records but also plays a critical role in measuring outcomes on a global scale. During the COVID-19 pandemic, big data informed health ministries within each nation's government on how to proceed with vaccinations and helped devise solutions for mitigating future pandemic outbreaks.
Benefits of big data analytics
Incorporating big data analytics into a business or organisation has several
advantages. These include:
• Cost reduction: Big data can reduce costs in storing all business data in
one place. Tracking analytics also helps companies find ways to work
more efficiently to cut costs wherever possible.
• Product development: Developing and marketing new products,
services, or brands is much easier when based on data collected from
customers’ needs and wants. Big data analytics also helps businesses
understand product viability and to keep up with trends.
• Strategic business decisions: The ability to constantly analyse data helps
businesses make better and faster decisions, such as cost and supply
chain optimisation.
• Customer experience: Data-driven algorithms help marketing efforts
(targeted ads, for example) and increase customer satisfaction by
delivering an enhanced customer experience.
• Risk management: Businesses can identify risks by analysing data
patterns and developing solutions for managing those risks.

Big data in the real world
Big data analytics helps companies and governments make sense of data and make better, informed decisions.
• Entertainment: Providing a personalised recommendation of movies and
music according to a customer’s preferences has been transformative for the
entertainment industry (think Spotify and Netflix).
• Education: Big data helps schools and educational technology companies
develop new curriculums while improving existing plans based on needs and
demands.
• Health care: Monitoring patients’ medical histories helps doctors detect and
prevent diseases.
• Government: Big data can be used to collect data from CCTV and traffic
cameras, satellites, body cameras and sensors, emails, calls, and more, to help
manage the public sector.
• Marketing: Customer information and preferences can be used to create
targeted advertising campaigns with a high return on investment (ROI).
• Banking: Data analytics can help track and monitor illegal money laundering.

Types of big data analytics (+ examples)


Four main types of big data analytics support and inform different business
decisions.
1. Descriptive analytics
Descriptive analytics summarises historical data into a form that can be easily read and interpreted. This helps create reports and visualise information that can detail company profits and sales.
Example: During the pandemic, a leading pharmaceutical company conducted data analysis on its offices and research labs. Descriptive analytics helped it identify and consolidate under-utilised spaces and departments, saving the company millions of pounds.
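Much of descriptive analytics is summarisation; a minimal pandas sketch with made-up sales figures (the column names and numbers are assumptions) shows how raw records become a readable report:

import pandas as pd

# Made-up monthly sales records for illustration.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "month":   ["Jan", "Feb", "Jan", "Feb", "Mar"],
    "revenue": [12000, 13500, 9800, 11200, 10400],
})

# Summarise the raw data into a simple, readable report per region.
report = sales.groupby("region")["revenue"].agg(["count", "sum", "mean"])
print(report)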
2. Diagnostics analytics
Diagnostics analytics helps companies understand why a problem occurred. Big
data technologies and tools allow users to mine and recover data that helps
dissect an issue and prevent it from happening in the future.
Example: An online retailer's sales decreased even though customers continued to add items to their shopping carts. Diagnostics analytics revealed that the payment page had not been working correctly for a few weeks.
3. Predictive analytics
Predictive analytics looks at past and present data to make predictions. With
artificial intelligence (AI), machine learning, and data mining, users can analyse
the data to predict market trends.
Example: In the manufacturing sector, companies can use algorithms based on
historical data to predict if or when a piece of equipment will malfunction or
break down.
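A hedged sketch of that manufacturing example using scikit-learn: a logistic regression trained on made-up historical sensor readings (the feature names and values are assumptions) to predict whether a machine is likely to fail.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up historical data: [temperature, vibration] readings and whether the machine later failed.
X_history = np.array([
    [60, 0.2], [65, 0.3], [62, 0.25], [70, 0.4],   # normal operation
    [95, 1.1], [98, 1.3], [92, 1.0], [99, 1.4],    # readings that preceded failures
])
y_failed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X_history, y_failed)

# Predict the failure risk for today's readings from two machines.
today = np.array([[63, 0.3], [96, 1.2]])
print(model.predict(today))          # predicted class: 0 = healthy, 1 = likely to fail
print(model.predict_proba(today))    # probabilities behind each prediction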
4. Prescriptive analytics
Prescriptive analytics recommends actions to solve a problem, relying on AI and machine learning to gather data and use it for risk management.
Example: Within the energy sector, utility companies, gas producers, and
pipeline owners identify factors that affect the price of oil and gas to hedge
risks.
Big data analytics tools
Harnessing all of that data requires tools. Thankfully, technology has advanced
so that many intuitive software systems are available for data analysts to use.
• Hadoop: An open-source framework that stores and processes big data
sets. Hadoop can handle and analyse structured and unstructured data.
• Spark: An open-source cluster computing framework for real-time processing and data analysis (see the word-count sketch after this list).
• Data integration software: Programs that allow big data to be
streamlined across different platforms, such as MongoDB, Apache Hadoop, and Amazon EMR.
• Stream analytics tools: Systems that filter, aggregate, and analyse data
that might be stored in different platforms and formats, such as Kafka.
• Distributed storage: Databases that can split data across multiple servers
and can identify lost or corrupt data, such as Cassandra.
• Predictive analytics hardware and software: Systems that process large
amounts of complex data, using machine learning and algorithms to
predict future outcomes, such as fraud detection, marketing, and risk
assessments.
• Data mining tools: Programs that allow users to search within structured
and unstructured big data.
• NoSQL databases: Non-relational data management systems ideal for
dealing with raw and unstructured data.
• Data warehouses: Storage for large amounts of data collected from
many different sources, typically using predefined schemas.
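To show what using one of these tools looks like in practice, here is a minimal PySpark word-count sketch (assuming the pyspark package is installed and Spark runs locally; the input lines are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session.
spark = SparkSession.builder.appName("word-count").master("local[*]").getOrCreate()

# Made-up input lines standing in for a large distributed dataset.
lines = spark.createDataFrame(
    [("big data needs big tools",), ("spark processes big data in parallel",)],
    ["line"],
)

# Split each line into words, then count how often each word appears.
words = lines.select(F.explode(F.split(F.col("line"), " ")).alias("word"))
counts = words.groupBy("word").count().orderBy(F.desc("count"))
counts.show()

spark.stop()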
Analytics architecture design:-
Requirement for new analytical architecture:-
Analytics architecture refers to the systems, protocols, and technology used to
collect, store, and analyze data. The concept is an umbrella term for a variety of
technical layers that allow organizations to more effectively collect, organize,
and parse the multiple data streams they utilize.
When building analytics architecture, organizations need to consider both the
hardware — how data will be physically stored — as well as the software that
will be used to manage and process it.
Analytics architecture also focuses on multiple layers, starting with data
warehouse architecture, which defines how users in an organization can access
and interact with data. Storage is a key aspect of creating a reliable analytics
process, as it establishes how your data is organized, who can access it,
and how quickly it can be referenced.
Structures like data marts, data lakes, and more standard warehouses are all
popular foundations for modern analytics architecture. On the user side,
creating easier processes for access means including tools like natural language
processing and ad-hoc analytics capabilities to reduce the need for specialized
workers and wasted resources. When seen as a whole, analytics architecture is
a key aspect of business intelligence.
How can I Use Analytics Architecture?
No matter what kind of organization you have, data analytics is becoming a
central part of business operations. The fast-rising amount of data your
multiple touch points collect means that using a simple spreadsheet is quickly
becoming unfeasible.
Analytics architecture helps you not just store your data but plan the optimal
flow for data from capture to analysis. Understanding these steps can give you
a better idea of your hardware and logistics needs and clue you in on the best
tools to use.
One important use for analytics architecture in your organization is the design
and construction of your preferred data storage and access mechanism. Many
companies prefer a more structured approach, using traditional data
warehouses or data mart models to keep data more organized and easily
sorted for access later.
Others prefer to keep data in a single storage structure such as a data lake,
which comes with its own benefits but makes data slightly less accessible and
organized. Regardless, your analytics platform architecture will largely define
how your organization interacts with data, as well as how you gain insights
from it.
Data Science projects need a workspace built for experimenting with data, with flexible and agile data architectures. Many organizations still possess data warehouses that give excellent support for traditional reporting and simplified data analysis activities, but problems arise when more robust analysis is needed.
Fig 1 illustrates a typical data architecture and the various challenges it presents to data scientists and other users who are trying to implement advanced analysis. This section examines the data flow to the Data Scientist and how this individual fits into the process of getting data to analyze on projects.

Fig 1. Typical analytic architecture


For data sources to be loaded into the data warehouse, the data must be well understood, normalized with suitable data type definitions, and in a structured format. Although this kind of centralization enables security, backup, and failover of highly critical data, it also means that data typically must go through significant preprocessing and checkpoints before it can enter this sort of controlled environment, which does not lend itself to data exploration and iterative analytics.
As a result of this level of control on the EDW, additional local systems may
emerge in the form of departmental warehouses and local data marts that
business users create to accommodate their need for flexible analysis. These
local data marts may not have the same constraints for security and structure
as the main EDW and allow users to do some level of more in-depth
analysis. However, these one-off systems reside in isolation, often are not
synchronized or integrated with other data stores, and may not be backed up.
Once in the data warehouse, data is read by additional applications across the
enterprise for BI and reporting purposes. These are high-priority operational
processes getting critical data feeds from the data warehouses and
repositories.
At the end of this workflow, analysts get data provisioned for their downstream analytics. Because users generally are not allowed to run custom or intensive analytics on production databases, analysts create data extracts from the EDW to analyze data offline in R or other local analytical tools. Many times these tools are limited to in-memory analytics on desktops analyzing samples of data, rather than the entire population of a dataset. Because these analyses are
based on data extracts, they reside in a separate location, and the results of the
analysis — and any insights on the quality of the data or anomalies — rarely
are fed back into the main data repository.
Because new data sources slowly accumulate in the EDW due to the rigorous
validation and data structuring process, data is slow to move into the EDW, and
the data schema is slow to change.
Departmental data warehouses may have been originally designed for a
specific purpose and set of business needs, but over time evolved to house
more and more data, some of which may be forced into existing schemas to
enable BI and the creation of OLAP cubes for analysis and reporting. Although
the EDW achieves the objective of reporting and sometimes the creation of
dashboards, EDWs generally limit the ability of analysts to iterate on the data in
a separate nonproduction environment where they can conduct in-depth
analytics or perform analysis on unstructured data. The typical data
architectures just described are designed for storing and processing mission-
critical data, supporting enterprise applications, and enabling corporate
reporting activities. Although reports and dashboards are still important for
organizations, most traditional data architectures inhibit data exploration and
more sophisticated analysis. Moreover, traditional data architectures have
several additional implications for data scientists.
● High-value data is hard to reach and leverage, and predictive analytics and
data mining activities are last in line for data. Because the EDWs are designed
for central data management and reporting, those wanting data for analysis are
generally prioritized after operational processes.
● Data moves in batches from EDW to local analytical tools. This workflow
means that data scientists are limited to performing in-memory analytics (such
as with R, SAS, SPSS, or Excel), which will restrict the size of the datasets they
can use. As such, analysis may be subject to constraints of sampling, which can
skew model accuracy.
● Data Science projects will remain isolated and ad hoc, rather than centrally
managed. The implication of this isolation is that the organization can never
harness the power of advanced analytics in a scalable way, and Data Science
projects will exist as nonstandard initiatives, which are frequently not aligned
with corporate business goals or strategy.
All these symptoms of the traditional
data architecture result in a slow “time-to-insight” and lower business impact
than could be achieved if the data were more readily accessible and supported
by an environment that promoted advanced analytics. As stated earlier, one
solution to this problem is to introduce analytic sandboxes to enable data
scientists to perform advanced analytics in a controlled and sanctioned way.
Meanwhile, the current Data Warehousing solutions continue offering
reporting and BI services to support management and mission-critical
operations.

Need of big data frameworks

Why a Big Data Framework?


Frameworks provide structure. The core objective of the Big Data Framework is
to provide a structure for enterprise organisations that aim to benefit from the
potential of Big Data. In order to achieve long-term success, Big Data is more
than just the combination of skilled people and technology – it requires
structure and capabilities.
The Big Data Framework was developed because – although the benefits and
business cases of Big Data are apparent – many organizations struggle to
embed a successful Big Data practice in their organization. The structure
provided by the Big Data Framework provides an approach for organizations
that takes into account all organizational capabilities of a successful Big Data
practice. All the way from the definition of a Big Data strategy, to the technical
tools and capabilities an organization should have.
The main benefits of applying a Big Data framework include:
1. The Big Data Framework provides a structure for organisations that want
to start with Big Data or aim to develop their Big Data capabilities
further.
2. The Big Data Framework includes all organisational aspects that should
be taken into account in a Big Data organization.
3. The Big Data Framework is vendor independent. It can be applied to any
organization regardless of choice of technology, specialisation or tools.
4. The Big Data Framework provides a common reference model that can
be used across departmental functions or country boundaries.
5. The Big Data Framework identifies core and measurable capabilities in
each of its six domains so that the organization can develop over time.
Big Data is a people business. Even with the most advanced computers and
processors in the world, organisations will not be successful without the
appropriate knowledge and skills. The Big Data Framework therefore aims to
increase the knowledge of everyone who is interested in Big Data. The modular
approach and accompanying certification scheme aims to develop knowledge
about Big Data in a similar structured fashion.
The Big Data framework provides a holistic structure toward Big Data. It looks
at the various components that enterprises should consider while setting up
their Big Data organization. Every element of the framework is of equal
importance and organisations can only develop further if they provide equal
attention and effort to all elements of the Big Data framework.
The Structure of the Big Data Framework
The Big Data framework is a structured approach that consists of six core
capabilities that organisations need to take into consideration when setting up
their Big Data organization. The Big Data Framework is depicted in the figure
below:

The Big Data Framework consists of the following six main elements:
1. Big Data Strategy
Data has become a strategic asset for most organisations. The capability to analyse large data sets and discern patterns in the data can provide organisations with a competitive advantage. Netflix, for example, looks at user behaviour in deciding what movies or series to produce. Alibaba, the Chinese sourcing platform, became one of the global giants by identifying which suppliers to lend money to and recommend on its platform. Big Data has become Big Business.
In order to achieve tangible results from investments in Big Data, enterprise
organisations need a sound Big Data strategy. How can return on investments
be realised, and where to focus effort in Big Data analysis and analytics? The
possibilities to analyse are literally endless and organisations can easily get lost
in the zettabytes of data. A sound and structured Big Data strategy is the first
step to Big Data success.
2. Big Data Architecture
In order to work with massive data sets, organisations should have the
capabilities to store and process large quantities of data. In order to achieve
this, the enterprise should have the underlying IT infrastructure to facilitate Big
Data. Enterprises should therefore have a comprehensive Big Data
architecture to facilitate Big Data analysis. How should enterprises design and
set up their architecture to facilitate Big Data? And what are the requirements
from a storage and processing perspective?
The Big Data Architecture element of the Big Data Framework considers the
technical capabilities of Big Data environments. It discusses the various roles
that are present within a Big Data Architecture and looks at the best practices
for design. In line with the vendor-independent structure of the Framework,
this section will consider the Big Data reference architecture of the National
Institute of Standards and Technology (NIST).
3. Big Data Algorithms
A fundamental capability of working with data is to have a thorough
understanding of statistics and algorithms. Big Data professionals therefore
need to have a solid background in statistics and algorithms to deduce insights
from data. Algorithms are unambiguous specifications of how to solve a class
of problems. Algorithms can perform calculations, data
processing and automated reasoning tasks. By applying algorithms to large
volumes of data, valuable knowledge and insights can be obtained.
The Big Data algorithms element of the framework focuses on the (technical)
capabilities of everyone who aspires to work with Big Data. It aims to build a
solid foundation that includes basic statistical operations and provides an
introduction to different classes of algorithms.
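As a taste of the basic statistical operations this element refers to, here is a short Python sketch using only the standard library on a made-up sample:

import statistics

# A made-up sample of daily transaction counts; the last value is an outlier.
sample = [120, 135, 128, 150, 142, 138, 600]

print(statistics.mean(sample))     # average value, pulled upward by the outlier
print(statistics.median(sample))   # middle value, less sensitive to the outlier
print(statistics.stdev(sample))    # sample standard deviation (spread of the data)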
4. Big Data Processes
In order to make Big Data successful in an enterprise organization, it is necessary
to consider more than just the skills and technology. Processes can help
enterprises to focus their direction. Processes bring structure, measurable
steps and can be effectively managed on a day-to-day basis. Additionally,
processes embed Big Data expertise within the organization by following
similar procedures and steps, embedding it as ‘a practice’ of the organization.
Analysis becomes less dependent on individuals, thereby greatly enhancing the chances of capturing value in the long term.
5. Big Data Functions
Big Data functions are concerned with the organisational aspects of managing
Big Data in enterprises. This element of the Big Data framework addresses how
organisations can structure themselves to set up Big Data roles and discusses
roles and responsibilities in Big Data organisations. Organisational culture,
organisational structures and job roles have a large impact on the success of
Big Data initiatives. We will therefore review some ‘best practices’ in setting up
enterprise Big Data functions.
In the Big Data Functions section of the Big Data Framework, the non-technical
aspects of Big Data are covered. You will learn how to set up a Big Data Center
of Excellence (BDCoE). Additionally, it also addresses critical success factors for
starting Big Data projects in the organization.
6. Artificial Intelligence
The last element of the Big Data Framework addresses Artificial Intelligence
(AI). One of the major areas of interest in the world today, AI provides a whole
world of potential. In this part of the framework, we address the relation
between Big Data and Artificial Intelligence and outline key characteristics of
AI.
Many organisations are keen to start Artificial Intelligence projects, but most
are unsure where to start their journey. The Big Data Framework takes a
functional view of AI in the context of bringing business benefits to enterprise
organisations. The last section of the framework therefore showcases how AI
follows as a logical next step for organisations that have built up the other
capabilities of the Big Data Framework. The last element of the Big Data
Framework has been depicted as a lifecycle on purpose: Artificial Intelligence can continuously learn from the Big Data in the organization in order to provide long-lasting value.
Challenges of Big Data Analytics:-
Data is a very valuable asset in the world today. The economics of data is based
on the idea that data value can be extracted through analytics. Though Big data
and analytics are still in their initial growth stage, their importance cannot be
undervalued. As big data starts to expand and grow, the Importance of big data
analytics will continue to grow in everyday personal and business lives. In
addition, the size and volume of data are increasing daily, making it important
to address big data daily. Here we will discuss the Challenges of Big Data
Analytics.
According to surveys, many companies are opening up to using big data
analytics in their daily functioning. With the rising popularity of Big data
analytics, it is obvious that investing in this medium will secure the future
growth of companies and brands.
The key to data value creation is Big Data Analytics, so it is important to focus
on that aspect of analytics. Many companies use different methods to employ
Big Data analytics, and there is no magic solution to successfully implementing
this. While data is important, even more important is the process through
which companies can gain insights with their help. Gaining insights from data is
the goal of big data analytics, so investing in a system that can deliver those
insights is extremely crucial and important. Therefore, successful
implementation of big data analytics requires a combination of skills, people,
and processes that can work in perfect synchronization with each other.
Some of the major challenges that big data analytics programs are facing today
include the following:
1. Uncertainty of Data Management Landscape: Because big data is
continuously expanding, new companies and technologies are developed
every day. A big challenge for companies is to find out which technology works best for them without introducing new risks and problems.
2. The Big Data Talent Gap: While Big Data is growing, very few experts are
available. This is because Big Data is a complex field, and people who understand the field's complexity and intricate nature are few and far between. Another major challenge is the talent gap that exists in the industry.
3. Getting data into the big data platform: Data is increasing every single
day. This means that companies have to tackle a limitless amount of data
on a regular basis. The scale and variety of data available today can
overwhelm any data practitioner, which is why it is important to make
data accessibility simple and convenient for brand managers and owners.
4. Need for synchronization across data sources: As data sets become
more diverse, they must be incorporated into an analytical platform. It
can create gaps and lead to wrong insights and messages if ignored.
5. Getting important insights through the use of Big data analytics: It is
important that companies gain proper insights from big data analytics,
and it is important that the correct department has access to this
information. A major challenge in big data analytics is bridging this gap in
an effective fashion.
The following sections look at these challenges more closely and discuss how companies can tackle them effectively, for example through the implementation of Hadoop infrastructure and by building Hadoop ecosystem skills such as HBase, Hive, Pig, and Mahout.
• Challenge 1
The challenge of rising uncertainty in data management: In a world of big
data, the more data you have, the easier it is to gain insights from them.
However, in big data there are a number of disruptive technologies in the world today, and choosing from among them can be a tough task. That is why big data
systems need to support both the operational and, to a great extent, analytical
processing needs of a company. These approaches are generally lumped into
the NoSQL framework category, which differs from the conventional relational
database management system.
• Challenge 2
The gap in experts in big data analytics: An industry completely depends on
the resources it has access to, whether human or material. Some tools for big
data analytics range from traditional relational database tools with alternative
data layouts designed to increase access speed while decreasing the storage
footprint, in-memory analytics, NoSQL data management frameworks, and the
broad Hadoop ecosystem. With so many systems and frameworks, there is a
growing and immediate need for application developers who have knowledge
of all these systems. Despite the fact that these technologies are developing at
a rapid pace, there is a lack of people who possess the required technical skill.
• Challenge 3
The challenge of getting data into the big data platform: Every company is
different and has different amounts of data to deal with. While some
companies are completely data-driven, others might be less so. That is why it is
important to understand these distinctions before finally implementing the
right data plan. Also, not all companies understand the full implication of big
data analytics. Assuming that every company is knowledgeable about the
benefits and growth strategy of business data analytics would seriously impact
the success of this initiative. That is why it is important that business
development analytics are implemented with the knowledge of the company.
• Challenge 4
The challenge of the need for synchronization across data sources: Once data
is integrated into a big data platform, data copies are migrated from different sources at different rates, and schedules can sometimes fall out of sync across the entire system. It is important that data stays synchronized; otherwise, this can impact the entire process. With so many conventional data marts and data warehouses, and so many sequences of data extractions, transformations, and migrations, there is always a risk of data being unsynchronized.
• Challenge 5
The challenge of getting important insights through Big Data analytics: Data is valuable only as long as companies can gain insights from it. By augmenting existing data storage and providing access to end users, big data analytics needs to be comprehensive and insightful. The data tools must help companies not just access the required information but also eliminate the need for custom coding. As data grows in size, it is important that companies understand this need and process the data effectively; ensuring the proper handling of data over time is a critical factor in the success of any company.
