Big Data Class 27Feb
1960s ‘Automatic Data Compression’ acts as a complete automatic and fast three-part compressor
that can be used for any kind of information in order to reduce the slow external storage
requirements and increase the rate of transmission from a computer system.
1970s In Japan, the Ministry of Posts and Telecommunications initiated a project to study
information flow in order to track the volume of information circulating in the country.
1980s A research project was started by the Hungarian Central Statistics office to account for the
country’s information industry. It measured the volume of information in bits.
1990s Digital storage systems became more economical than paper storage. Challenges related to
the amount of data and the presence of obsolete data became apparent.
2000 onwards Various methods were introduced to streamline information. Techniques for controlling the
Volume, Velocity and Variety of data emerged, introducing three-dimensional (3V) data management.
Structuring Big Data:
• Structuring data means arranging the available data in a manner
that makes it
- easy to study
- easy to analyze
- easy to derive conclusions from
Why is structuring required?
In daily life, we come across questions like:
How do I use the vast amount of data and information I come
across to my advantage?
Which news articles should I read of the thousands I come across?
How do I choose a book of the millions available on my favourite sites
or stores?
How do I keep myself updated about new events, sports, inventions
and discoveries taking place across the globe?
Solutions to such questions can be found in information processing
systems.
• Structuring data helps in understanding user behaviors, requirements
and preferences to make personalized recommendations for every
individual.
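As a small illustration of the point above, here is a sketch in Python of recommending a topic from a user's structured activity data. The user names and reading histories are made-up assumptions, not real data.

```python
# Minimal sketch: a personalized recommendation from structured user data.
from collections import Counter

# Topics of the articles each (hypothetical) user has read.
user_reads = {
    "user1": ["sports", "sports", "tech", "sports"],
    "user2": ["finance", "tech", "tech"],
}

def recommend_topic(user):
    """Recommend the topic the user reads most often."""
    return Counter(user_reads[user]).most_common(1)[0][0]

print(recommend_topic("user1"))  # sports
print(recommend_topic("user2"))  # tech
```

Real recommendation systems use far richer signals (ratings, co-occurrence, collaborative filtering), but the principle is the same: structured behavioral data makes user preferences queryable.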
[Figure: Big Data shown at the intersection of Distributed Systems, Data Science, Parallel Processing and Artificial Intelligence]
• On the basis of the data received from the sources mentioned in the
table, Big Data comprises:
- Structured data
- Unstructured data
- Semi-Structured data
• Unstructured data is larger in volume than structured and
semi-structured data; approximately 70% to 80% of all data is in
unstructured form.
[Figure: Big Data comprising unstructured, semi-structured and structured data]
Structured Data:
• Structured data can be defined as the data that has a defined repeating pattern.
• This pattern makes it easier for any program to sort, read and process the data.
• Structured data:
- Is organized data in a predefined format
- Is stored in tabular form
- Is the data that resides in fixed fields within a record or file
- Is formatted data that has entities and their attributes mapped
- Is used to query and report against predetermined data types
Some sources of structured data include:
• Relational databases (in the form of tables)
• Flat files in the form of records (like comma-separated values (CSV)
and tab-separated files)
• Multidimensional databases (majorly used in data warehouse
technology)
• Legacy databases.
• Example: a table of customer data with fixed fields:

  Customer ID | Name | Product ID | City | State
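The fixed-field nature of structured data can be shown with a short Python sketch. The customer rows below are made-up illustrative data, following the example fields above.

```python
# Minimal sketch: structured data as fixed fields in CSV records.
import csv
import io

csv_text = """CustomerID,Name,ProductID,City,State
101,Asha,P-20,Mumbai,Maharashtra
102,Ravi,P-31,Pune,Maharashtra
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Because every record has the same predefined fields, programs can
# sort, filter and query the data directly.
pune_customers = [r["Name"] for r in rows if r["City"] == "Pune"]
print(pune_customers)  # ['Ravi']
```

This repeating pattern is exactly what makes structured data "easy to sort, read and process": any record can be addressed by field name.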
Elements of Big Data
• According to Gartner, data is growing at the rate of 59% per year.
This growth can be depicted in terms of the following four V's:
- Volume
- Velocity
- Variety
- Veracity
Volume:
• Volume is the amount of data generated by organizations or
individuals.
• Today, the volume of data in most organizations is approaching
exabytes. Some experts predict the volume of data to reach zettabytes
in the coming years.
• For example, according to IBM, over 2.7 zettabytes of data is present
in the digital world today.
• Every minute, over 571 new websites are being created.
• The internet alone generates a huge amount of data. The following
figures give an idea of the internet traffic:
- The internet has around 14.3 trillion live web pages; 48 billion web
pages are indexed by Google, and 14 billion web pages are indexed by
Microsoft Bing.
- Internet has around 672 exabytes of accessible data.
- Total data stored on the Internet is over 1 yottabyte.
• The exact size of the internet will never be known.
Velocity
• Velocity describes the rate at which data is generated, captured and
shared.
• Enterprises can capitalize on data only if it is captured and shared in
real time.
• Information processing systems such as CRM and ERP face problems
associated with data, which keeps accumulating but cannot be processed
quickly.
• These systems can only process data in batches every few hours;
however, even this time lag causes the data to lose its importance, as
new data is constantly being generated.
• For example, eBay analyzes around 5 million transactions per day in
real time to detect and prevent fraud arising from the use of PayPal.
• The sources of high velocity data include the following:
• IT devices, including routers, switches, firewalls, etc., constantly
generate valuable data.
• Social media, including Facebook posts, tweets and other social media
activities, creates huge amounts of data.
• Portable devices, including mobile phones, PDAs, etc., also generate
data at a high speed.
Data Analytics Project Life Cycle:
• The data analytics project life cycle typically involves several phases,
each with its own set of tasks, goals, and deliverables.
• An analysis process contains all or some of the following phases:
1. Business Understanding:
- The first phase involves identifying and understanding the
business objectives. It deals with problems to be solved and
decisions to be made.
- The main goal is to enhance business profitability.
- Once the business objectives are determined, the analysts
evaluate the situation, and identify the data mining goals.
- According to the defined goals, a project plan is created
between the analytics team and the IT or development team.
2. Data Collection:
- The process of collecting data is an important task in executing
a project plan accurately.
- In this phase, data from different data sources is collected first
and then described in terms of its application and the needs of the
project.
- This process is also called Data Exploration. Exploration of data
is required to ensure the quality of the collected data.
3. Data Preparation:
- From the data thus collected, unnecessary or unwanted data is
to be removed in this phase.
- In other words, the data must be prepared for the purpose of
analysis.
4. Data Modeling:
- In this phase, a model is created by using a data modeling
technique.
- The data model is used to analyze the relationships between
different selected objects in the data.
- Test cases are created to assess the applicability of the model,
and data is structured according to the model.
5. Data Evaluation:
- The results obtained from different test cases are evaluated and
reviewed for errors.
- After validating the results, analysis reports are created for
determining the next plan of action.
6. Deployment:
- In this phase, the plan is finalized for deployment.
- The deployed plan is constantly checked for errors and
maintenance.
- This process is also termed reviewing the project.
[Figure: tasks within each phase of the data analytics project life cycle]
- Business Understanding: assess situation, determine data mining goals, produce project plan
- Data Collection: describe data, explore data, verify data quality
- Data Preparation: clean data, construct data, integrate data, format data
- Data Modeling: generate test design, build model, assess model
- Data Evaluation: review process, determine next steps
- Deployment: plan monitoring & maintenance, produce final report, review project
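The six phases above can be sketched as a chain of plain Python functions. All function bodies here are illustrative placeholders (the sample records and the "model" are made-up assumptions), intended only to show how each phase's output feeds the next.

```python
# Minimal sketch of the six-phase analytics life cycle as plain functions.

def business_understanding(objective):
    # Phase 1: identify objectives and data mining goals.
    return {"objective": objective, "goals": ["reduce churn"]}

def data_collection(plan):
    # Phase 2: in practice this would pull from databases, files or APIs.
    return [{"customer": "A", "orders": 5}, {"customer": "B", "orders": None}]

def data_preparation(raw):
    # Phase 3: remove unwanted records (here, rows with missing values).
    return [row for row in raw if row["orders"] is not None]

def data_modeling(clean):
    # Phase 4: stand-in "model" -- average number of orders per customer.
    return sum(row["orders"] for row in clean) / len(clean)

def data_evaluation(model_output):
    # Phase 5: review results and validate them.
    return {"avg_orders": model_output, "valid": model_output >= 0}

def deployment(report):
    # Phase 6: finalize the plan for deployment.
    return f"deployed: {report}"

plan = business_understanding("improve profitability")
raw = data_collection(plan)
clean = data_preparation(raw)
model = data_modeling(clean)
report = data_evaluation(model)
print(deployment(report))
```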
Problems and Challenges in understanding Data
Analytics
• Understanding data analytics can present several challenges and
problems, both for individuals and organizations. Here are some
common issues:
1. Complexity of Data: Understanding the intricacies of different data
types, formats, and structures can be challenging.
2. Data Quality Issues: Poor data quality, including inaccuracies,
inconsistencies, missing values, and duplications, can impact the
accuracy and reliability of analytics results. Cleaning and preprocessing
data to ensure quality can be time-consuming and resource-intensive.
3. Lack of Data Governance: Inadequate data governance practices,
including unclear data ownership, access controls, and data
management policies, can lead to data security breaches, privacy
concerns, and regulatory compliance issues.
4. Limited Skills and Expertise: Data analytics requires a diverse skill set
encompassing statistics, mathematics, programming, data visualization,
and domain-specific knowledge. A shortage of skilled professionals with
expertise in these areas can hinder effective data analysis efforts.
5. Technology and Infrastructure Constraints: Inadequate technology
infrastructure, including outdated tools, insufficient computing
resources, and incompatible systems, can impede data analytics
initiatives and limit the scalability and performance of analytics
solutions.
6. Integration Challenges: Integrating data from disparate sources and
systems to create a unified view for analysis can be challenging.
7. Interpretation and Communication: Analyzing data is only part of
the process; effectively interpreting the results and communicating
insights to stakeholders is equally important.
8. Ethical and Bias Concerns: Data analytics can raise ethical concerns
related to privacy, fairness, and bias.
9. Cost and Return on Investment (ROI): Implementing data analytics
initiatives can be costly, requiring investment in technology, talent, and
infrastructure.
10. Cultural Resistance and Organizational Change: Resistance to
change and cultural barriers within organizations can hinder the
adoption of data-driven decision-making processes.
Web Page Categorization:
• Web page categorization refers to the process of assigning a specific
category or classification to a webpage based on its content, purpose,
or theme.
• This categorization is often performed by algorithms or human
reviewers to help organize and index the vast amount of information
available on the internet.
• There are several approaches to web page categorization:
1. Keyword Analysis: This involves analyzing the keywords present in
the webpage's content or metadata to determine its category.
Keywords related to specific topics or themes can indicate the
nature of the page.
2. Natural Language Processing (NLP): Advanced NLP techniques can
be used to analyze the textual content of web pages and categorize
them based on semantic meaning. This involves techniques such as text
classification, topic modeling, and sentiment analysis.
3. Machine Learning: Machine learning algorithms can be trained on
labeled datasets to automatically categorize web pages. These
algorithms learn patterns and features from the content of web pages
and use them to predict the most appropriate category.
4. Website Metadata: Information such as meta tags, titles, and
descriptions embedded in the HTML of web pages can provide valuable
clues about their category.
5. Link Analysis: Web pages often link to other pages within similar
topics or categories. Analyzing the links pointing to a page or
originating from it can provide insights into its category.
6. User Behavior: Analyzing user interactions with web pages, such as
click-through rates, time spent on page, and bounce rates, can also help
determine their category.
7. Hybrid Approaches: Combining multiple methods such as keyword
analysis, NLP, and machine learning algorithms can improve the
accuracy of web page categorization.
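The keyword-analysis approach (item 1 above) can be sketched in a few lines of Python. The category keyword lists here are illustrative assumptions, not a real taxonomy.

```python
# Minimal sketch of keyword-based web page categorization.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score", "tournament"},
    "technology": {"software", "algorithm", "hardware", "startup"},
    "finance": {"stock", "market", "investment", "bank"},
}

def categorize(text):
    """Return the category whose keywords appear most often in the text."""
    words = text.lower().split()
    scores = {
        category: sum(words.count(kw) for kw in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to "unknown" when no keyword matches at all.
    return best if scores[best] > 0 else "unknown"

print(categorize("the team won the match with a record score"))  # sports
```

Production systems would replace the hand-written keyword sets with a trained text classifier (the machine-learning approach in item 3), but the scoring-and-argmax structure is the same.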
• Web page categorization is used for various purposes, including
search engine optimization (SEO), content filtering, targeted
advertising, and organizing web directories.
• It helps users find relevant information more efficiently and enables
businesses to deliver more targeted content and advertisements to
their audience.
Computing the frequency of Stock Market Change:
• To compute the frequency of stock market change, you would
typically follow these steps:
1. Define a Time Interval: Determine the time interval over which you
want to compute the frequency of stock market change. This could
be daily, weekly, monthly, etc.
2. Collect Stock Market Data: Gather historical stock market data for
the selected time interval. This data typically includes the opening
and closing prices of the stock for each time period, although you
can use other metrics such as high, low, and volume depending on
your analysis requirements.
3. Calculate Changes: For each time interval, calculate the change in
stock price. This can be done by subtracting the opening price from
the closing price or by calculating the percentage change.
4. Count Changes: Count the number of times the stock price changed
within your chosen time interval. You can set a threshold for what
constitutes a significant change, such as a certain percentage or
absolute value.
5. Compute Frequency: Divide the total number of changes by the total
number of time intervals to obtain the frequency of stock market
change.
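The steps above can be sketched in Python. The price series and the 1% significance threshold below are made-up illustrative assumptions.

```python
# Minimal sketch: frequency of significant stock price changes.
daily_close = [100.0, 101.5, 101.4, 99.8, 100.2, 103.0, 102.9, 101.0]
threshold = 0.01  # flag moves of 1% or more as "significant" (step 4)

changes = 0
for prev, curr in zip(daily_close, daily_close[1:]):
    pct_change = abs(curr - prev) / prev   # step 3: percentage change
    if pct_change >= threshold:
        changes += 1

intervals = len(daily_close) - 1
frequency = changes / intervals            # step 5
print(f"{changes} significant changes over {intervals} intervals "
      f"(frequency = {frequency:.2f})")
```

With this series, 4 of the 7 daily moves exceed 1%, giving a frequency of about 0.57.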
Use of Big Data in Social Networking:
• A human being lives in a social environment and gains knowledge and
experience through communication.
• Today, communication is not restricted to meeting in person.
• The affordable and handy use of mobile phones and the internet has
made communication and sharing data of all kinds possible across the
globe.
• Some popular social networking sites are Twitter, Facebook and
LinkedIn.
• Let's first understand the meaning of social network data.
• Social network data refers to the data generated from people
socializing on social media.
• On a social networking site, you will find different people constantly
adding and updating comments, statuses, preferences, etc.
• All these activities generate large amounts of data.
• Analyzing and mining such large volumes of data reveals business
trends with respect to the wants, preferences, likes and dislikes of
a wide audience.
• This data can be segregated on the basis of different age groups,
locations and genders for the purpose of analysis.
• Based on the information extracted, organizations design products
and services specific to people's needs.
Every minute of the day:
- YouTube users upload 72 hours of new video
- Apple users download nearly 50,000 apps
- Email users send 200 million messages
- Amazon generates over $80,000 in online sales
- Google receives over 2,000,000 search queries
- Twitter users send over 300,000 tweets
- Facebook users share 2.5 million pieces of content
Common types of online retail fraud include:
- Credit card fraud: In an online shopping transaction, the online retailer cannot
see the person using the card and therefore, the valid owner of the card
cannot be verified. In spite of security checks, such as address verification or
the card security code, fraudsters manage to exploit loopholes in the system.
- Exchange or return policy fraud: An online retailer usually has a policy allowing
the exchange and return of goods and sometimes, people take advantage of this
policy. Such fraud can be averted by charging a restocking fee on returned
goods and obtaining the customer's signature on delivery of the product.
• Personal information fraud: In this type of fraud, people obtain the
login information of a customer, log in to the customer's
account, purchase a product online and then change the delivery
address to a different location.
• All these frauds can be prevented only by studying the customer’s
ordering patterns and keeping track of out-of-line orders.
• Other aspects should also be taken into consideration such as any
change in the shipping address, rush orders, sudden huge orders and
suspicious billing addresses.
• By taking such precautions, the frequency of the occurrence of
such frauds can be reduced to a certain extent, but they cannot be
completely eliminated.
Preventing Fraud Using Big Data Analytics
• One of the ways to prevent financial frauds is to study the customer’s
ordering pattern and other related data. This method works only
when the data to be analyzed is small in size.
• In order to deal with huge amounts of data and gain meaningful
business insights, organizations need to apply Big Data Analytics.
• Analyzing Big Data allows organizations to:
- Keep track of and process huge volumes of data.
- Differentiate between real and fraudulent entries.
- Identify new methods of fraud and add them to the list of fraud-
prevention checks.
- Verify whether a product has actually been delivered to the valid
recipient.
- Determine the location of the customer and the time when the
product was actually delivered.
- Check the listings of popular retail sites, such as eBay, to find
whether the product is up for sale somewhere else.
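The idea of studying a customer's ordering pattern to spot out-of-line orders can be sketched in Python. The order history, the z-score threshold of 3, and the shipping-address check are illustrative assumptions, not a production fraud rule.

```python
# Minimal sketch: flagging "out-of-line" orders against a customer's
# historical ordering pattern.
import statistics

def flag_suspicious(history, new_order, z_threshold=3.0):
    """Flag an order whose amount deviates strongly from past orders,
    or whose shipping address differs from the one on record."""
    amounts = [o["amount"] for o in history]
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts) or 1.0  # avoid division by zero
    z = abs(new_order["amount"] - mean) / stdev
    address_changed = new_order["ship_to"] != history[-1]["ship_to"]
    return z > z_threshold or address_changed

history = [
    {"amount": 40.0, "ship_to": "Pune"},
    {"amount": 55.0, "ship_to": "Pune"},
    {"amount": 45.0, "ship_to": "Pune"},
]
# A sudden huge order is out of line with the pattern:
print(flag_suspicious(history, {"amount": 900.0, "ship_to": "Pune"}))   # True
# A typical order with the usual address is not:
print(flag_suspicious(history, {"amount": 50.0, "ship_to": "Pune"}))    # False
```

At Big Data scale, the same kind of per-customer check would run over streaming transactions rather than an in-memory list, which is why organizations turn to Big Data analytics platforms for it.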
Inventory Control
Regulatory Compliance