
ADBMS & Big Data(22MCS23)

Advanced DBMS and Big data


Module 1
Understanding Big data
What is big data?

Big data primarily refers to data sets that are too large or complex to be dealt with
by traditional data-processing application software. Data with many entries offer
greater statistical power, while data with higher complexity may lead to a higher
false discovery rate.

Difference Between Big Data and Data Mining


1. Data Mining: It is one of the methods in the pipeline of Big Data.
   Big Data: It is a technique to collect, maintain, and process huge information; it explains the relationships in the data.

2. Data Mining: It is a part of Knowledge Discovery in Data. It is a close-up view of the data.
   Big Data: It is about extracting the vital and valuable information from a huge amount of data. It is a technique for tracking and discovering trends in complex data sets. It is a large, overall view of the data.

3. Data Mining: The goal is the same as that of Big Data, as it is one of the tools of Big Data.
   Big Data: The goal is to make data more vital and usable, i.e., to extract only the important information from the huge data within existing traditional aspects.

4. Data Mining: It is manual as well as automated in nature.
   Big Data: It is only automated, as processing huge data manually is difficult.

5. Data Mining: It focuses on only one form of data, i.e., structured.
   Big Data: It focuses on and works with all forms of data, i.e., structured, unstructured, or semi-structured.

6. Data Mining: It is used to create certain business insights. Data mining is a manager of the mine.
   Big Data: It is mainly used for business purposes and customer satisfaction. Big Data is the mine.

7. Data Mining: It is a subset of Big Data, i.e., one of its tools.
   Big Data: It is a superset of Data Mining.

8. Data Mining: It is a tool to dig up the vital information from large data. The data can be large as well as small.
   Big Data: It is more involved with the processes of handling voluminous data. The data can only be large.

Difference Between Big Data and Data Warehouse

1. Big Data: Data in an enormous form on which technologies can be applied.
   Data Warehouse: A collection of historical data from different operations in an enterprise.

2. Big Data: A technology to store and manage large amounts of data.
   Data Warehouse: An architecture used to organize the data.

3. Big Data: Takes structured, unstructured, or semi-structured data as input.
   Data Warehouse: Takes only structured data as input.

4. Big Data: Does its processing by using a distributed file system.
   Data Warehouse: Does not use a distributed file system for processing.

5. Big Data: Does not use SQL queries to fetch data from the database.
   Data Warehouse: Uses SQL queries to fetch data from relational databases.

6. Big Data: Apache Hadoop can be used to handle enormous amounts of data.
   Data Warehouse: Cannot be used to handle enormous amounts of data.

7. Big Data: When new data is added, the changes in data are stored in the form of a file, which is represented by a table.
   Data Warehouse: When new data is added, the changes in data do not directly impact the data warehouse.

8. Big Data: Does not require management techniques as efficient as those of a data warehouse.
   Data Warehouse: Requires more efficient management techniques, as the data is collected from different departments of the enterprise.

Why Big data


Importance of Big data

Big data is the next generation of data warehousing and business analytics, and is poised to deliver top-line revenue cost-effectively for enterprises. The greatest part about this phenomenon is the rapid pace of innovation and change.

There are three standard answers to the question of why this is happening now:

1. Computing perfect storm. Big Data analytics are the natural result of four
major global trends: Moore's Law (which basically says that technology always
gets cheaper), mobile computing (that smartphone or mobile tablet in your hand),
social networking (Facebook, Foursquare, Pinterest, etc.), and cloud computing
(you don't even have to own hardware or software anymore; you can rent or lease
someone else's).

2. Data perfect storm. Volumes of transactional data have been around for
decades for most big firms, but the floodgates have now opened with more
volume, velocity, and variety (the three Vs) of data arriving in unprecedented
ways. This perfect storm of the three Vs makes it extremely complex and
cumbersome to cope with using current data management and analytics
technology and practices.

3. Convergence perfect storm. Another perfect storm is happening, too.
Traditional data management and analytics software and hardware technologies,
open-source technology, and commodity hardware are merging to create new
alternatives for IT and business executives to address Big Data analytics.

People are able to store more data now than ever before. We have reached a
tipping point where organizations no longer have to decide which half of the
data to keep or how much history to retain. It is now economically feasible to
keep all of your history and all of your variables, and to go back later, when
you have a new question, and start looking for an answer.

Big data in many sectors today ranges from a few dozen terabytes to multiple
petabytes (thousands of terabytes). The real challenge is identifying or
developing the most cost-effective and reliable methods for extracting value
from all the terabytes and petabytes of data now available. That is where Big
Data analytics becomes necessary.

Comparing traditional analytics to Big Data analytics, the differences in speed, scale, and complexity are tremendous.

Companies use big data in their systems to improve operational efficiency, provide
better customer service, create personalized marketing campaigns and take other
actions that can increase revenue and profits. Businesses that use big data
effectively hold a potential competitive advantage over those that don't because
they're able to make faster and more informed business decisions.

For example, big data provides valuable insights into customers that companies
can use to refine their marketing, advertising and promotions to increase customer
engagement and conversion rates. Both historical and real-time data can be
analyzed to assess the evolving preferences of consumers or corporate buyers,
enabling businesses to become more responsive to customer wants and needs.

Medical researchers use big data to identify disease signs and risk factors.
Doctors use it to help diagnose illnesses and medical conditions in patients. In
addition, a combination of data from electronic health records, social media sites,
the web and other sources gives healthcare organizations and government agencies
up-to-date information on infectious disease threats and outbreaks.

There are multiple benefits organizations can get by using big data.

Here are some more examples of how organizations in various industries use big
data:

 Big data helps oil and gas companies identify potential drilling locations and
monitor pipeline operations. Likewise, utilities use it to track electrical grids.

 Financial services firms use big data systems for risk management and real-time
analysis of market data.
 Manufacturers and transportation companies rely on big data to manage their
supply chains and optimize delivery routes.
 Government agencies use big data for emergency response, crime prevention
and smart city initiatives.

Types of big data

The information contained in big data repositories can be classified into
different types. Around 2.5 quintillion bytes of data are generated every day by
users, and predictions by Statista suggest that by the end of 2021, 74 zettabytes
(74 trillion GB) of data would be generated on the internet. Managing such a
vast and perennial outpouring of data is increasingly difficult. So, to manage
such huge, complex data, Big Data was introduced; it is concerned with
extracting meaningful information from large and complex data sets that cannot
be handled or analyzed by traditional methods.

All data cannot be stored in the same way. The methods for data storage
can be accurately evaluated only after the type of data has been identified. A
cloud service, like Microsoft Azure, is a one-stop destination for storing all
kinds of data: blobs, queues, files, tables, disks, and application data.
However, even within the cloud, there are special services to deal with specific
sub-categories of data.
For example, Azure cloud services like Azure SQL and Azure Cosmos DB help
in handling and managing these varied kinds of data.
Application data is the data that is created, read, updated, deleted, or
processed by applications. This data could be generated via web apps, Android
apps, iOS apps, or any other applications. Due to the wide diversity in the
kinds of data being used, determining the storage approach is a little nuanced.
Big data can be classified into 3 types:

Structured Data

 Structured data can be crudely defined as the data that resides in a fixed field
within a record.
 It is the type of data most familiar from everyday life, for example: birthdays
and addresses.
 A certain schema binds it, so all the data has the same set of properties.
Structured data is also called relational data. It is split into multiple tables to
enhance the integrity of the data by creating a single record to depict an entity.
Relationships are enforced by the application of table constraints.
 The business value of structured data lies in how well an organization can
utilize its existing systems and processes for analysis purposes.

Sources of structured data


Structured Query Language (SQL) is used to bring the data together.
Structured data is easy to enter, query, and analyze, because all of the data
follows the same format. However, forcing a consistent structure also means that
any alteration of that structure is difficult, as each record has to be updated to
adhere to the new schema. Examples of structured data include numbers, dates,
strings, etc. The business data of an e-commerce website can be considered
structured data.
Name    Class   Section   Roll No   Grade
Geek1   11      A         1         A
Geek2   11      A         2         B
Geek3   11      A         3         A
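
As a sketch of how such a table is used by a program, here is the same data
expressed with Python's built-in sqlite3 module; the table and column names
simply mirror the example above and are illustrative.

import sqlite3

# A fixed schema: every record has the same set of properties.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        name    TEXT PRIMARY KEY,
        class   INTEGER NOT NULL,
        section TEXT NOT NULL,
        roll_no INTEGER NOT NULL,
        grade   TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [("Geek1", 11, "A", 1, "A"),
     ("Geek2", 11, "A", 2, "B"),
     ("Geek3", 11, "A", 3, "A")],
)

# Because the structure is fixed, querying is straightforward.
for (name,) in conn.execute("SELECT name FROM students WHERE grade = 'A'"):
    print(name)   # prints Geek1, then Geek3

Because every row adheres to the same schema, SQL can filter, join, and
aggregate the rows directly; this is exactly the property that makes structured
data easy to enter, query, and analyze.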

Cons of Structured Data


1. Structured data can only be leveraged in cases of predefined functionalities.
This means that structured data has limited flexibility and is suitable for
certain specific use cases only.
2. Structured data is stored in a data warehouse with rigid constraints and a
definite schema. Any change in requirements would mean updating all of that
structured data to meet the new needs. This is a massive drawback in terms of
resource and time management.

Semi-Structured Data

 Semi-structured data is not bound by any rigid schema for data storage and
handling. The data is not in the relational format and is not neatly organized
into rows and columns like that in a spreadsheet. However, there are some
features like key-value pairs that help in discerning the different entities from
each other.
 Since semi-structured data doesn’t need a structured query language, it is
commonly called NoSQL data.
 A data serialization language is used to exchange semi-structured data across
systems that may even have varied underlying infrastructure.

 Semi-structured content is often used to store metadata about a business
process, but it can also include files containing machine instructions for
computer programs.
 This type of information typically comes from external sources such as social
media platforms or other web-based data feeds.

Data is created in plain text so that different text-editing tools can be used to draw
valuable insights. Due to a simple format, data serialization readers can be
implemented on hardware with limited processing resources and bandwidth.
Data Serialization Languages
Software developers use serialization languages to write in-memory data into
files that can be transmitted, stored, and parsed. The sender and the receiver
don't need to know anything about the other system; as long as the same
serialization language is used, the data can be understood by both systems
comfortably. There are three predominantly used serialization languages.
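
Before looking at those three languages, here is a minimal sketch of the
round-trip idea in Python, using its standard json module; the record's fields
reuse the programmer-details example that follows.

import json

# Sender side: serialize an in-memory object to text for storage or transit.
record = {"firstName": "Jane", "lastName": "Doe"}
wire_text = json.dumps(record)

# Receiver side: any system that can parse the same serialization language
# reconstructs the data, regardless of its underlying infrastructure.
received = json.loads(wire_text)
assert received == record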

1. XML – XML stands for eXtensible Markup Language. It is a text-based
markup language designed to store and transport data. XML parsers can be found
in almost all popular development platforms. It is human- and machine-readable,
it has definite standards for schema, transformation, and display, and it is
self-descriptive. Below is an example of a programmer's details in XML.
 XML

<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>

<!-- The 2ndFav and 3rdFav coding platforms are imaginative
     because GeeksforGeeks is the best! -->

XML expresses the data using tags (text within angular brackets) to shape the
data (for example, FirstName) and attributes (for example, Type) to describe
features of the data. However, XML is a verbose and voluminous language, so
other formats have gained more popularity.
2. JSON – JSON (JavaScript Object Notation) is a lightweight, open-standard
file format for data interchange. JSON is easy to use and uses human- and
machine-readable text to store and transmit data objects.
 JSON

{
  "firstName": "Jane",
  "lastName": "Doe",
  "codingPlatforms": [
    { "type": "Fav", "value": "Geeksforgeeks" },
    { "type": "2ndFav", "value": "Code4Eva!" },
    { "type": "3rdFav", "value": "CodeisLife" }
  ]
}

This format isn't as formal as XML; it's more of a key/value-pair model than a
formal data depiction. JavaScript has inbuilt support for JSON. Although JSON
is very popular amongst web developers, non-technical personnel find it tedious
to work with due to its heavy dependence on JavaScript and structural
characters (braces, commas, etc.).
3. YAML – YAML is a user-friendly data serialization language. Recursively, it
stands for YAML Ain't Markup Language. It has been adopted by technical and
non-technical handlers all across the globe owing to its simplicity. The data
structure is defined by line separation and indentation, which reduces the
dependency on structural characters. YAML is easy to comprehend, and its
popularity is a result of its human and machine readability.

YAML example
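Below is a sketch of the same programmer's details from the XML and JSON
examples above, rewritten in YAML; note how line separation and indentation
replace the braces and commas.

firstName: Jane
lastName: Doe
codingPlatforms:
  - type: Fav
    value: GeeksforGeeks
  - type: 2ndFav
    value: Code4Eva!
  - type: 3rdFav
    value: CodeisLife
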
A product catalog organized by tags is an example of semi-structured data.

Unstructured Data

 Unstructured data is the kind of data that doesn’t adhere to any definite
schema or set of rules. Its arrangement is unplanned and haphazard.
 Photos, videos, text documents, and log files can be generally considered
unstructured data. Even though the metadata accompanying an image or a
video may be semi-structured, the actual data being dealt with is unstructured.
 Additionally, unstructured data is also known as "dark data" because it cannot
be analyzed without the proper software tools.

Characteristics or dimensions of big data

1. Volume

The volume of your data is how much of it there is, measured in units ranging
from gigabytes to zettabytes (ZB) and yottabytes (YB). Industry trends predict a
significant increase in data volume over the next few years. Earlier, there were
issues with storing and processing this enormous volume of data, but nowadays
data gathered from all these sources is organized using distributed systems such
as Hadoop. Understanding the usefulness of the data requires knowledge of its
magnitude. Additionally, one may use volume to identify whether a data set is
big data or not.

2. Velocity

Velocity describes how quickly data is processed. Any significant data operation
has to operate at a high rate. This phenomenon is made up of the linkage of
incoming data sets, activity bursts, and the pace of change. Sensors, social
media platforms, and application logs all continuously generate enormous volumes
of data. There is no use in spending time or effort on the data if its flow is
not constant.

3. Variety

The many types of big data are referred to as variety. As it impacts performance, it
is one of the main problems the big data sector is now dealing with. It’s crucial to
organize your data so that you can manage its diversity effectively. Variety is the
wide range of information you collect from numerous sources.

4. Veracity

Veracity refers to the correctness of your data. Poor veracity can severely harm
the accuracy of your findings, making it one of the most crucial big data
qualities. It specifies the level of data reliability. Because most of the data
you encounter is unstructured, it is vital to filter out the information that is
not essential and use the remaining data for processing.

5. Value

Value is the advantage that the data provides to your company. Does it reflect
the objectives of your company? Does it aid in the growth of your company? It is
one of the most crucial fundamentals of big data. Data scientists first transform
raw data into knowledge: once the data has been cleaned, the best data is
extracted from the collection, and analysis and pattern recognition are performed
on it. The results of this process can be used to determine the value of the data.

Convergence of key trends

Refer text book

Grid Computing

The High-Performance Computing (HPC) and Grid Computing communities have
been doing large-scale data processing for years, using Application Program
Interfaces (APIs) such as the Message Passing Interface (MPI). The approach in
HPC is to distribute the work across a cluster of machines, which access a shared
file system hosted by a Storage Area Network (SAN). This works well for
predominantly compute-intensive jobs, but it becomes a problem when nodes need
to access larger data volumes, since the network bandwidth is the bottleneck and
compute nodes become idle.

MapReduce tries to collocate the data with the compute node, so data access is fast
because it is local. This feature, known as data locality, is at the heart of
MapReduce and is the reason for its good performance. Recognizing that network
bandwidth is the most precious resource in a data center environment (it is easy to
saturate network links by copying data around), MapReduce implementations
conserve it by explicitly modelling network topology. MapReduce operates only at
the higher level: the programmer thinks in terms of functions of key and value
pairs, and the data flow is implicit.

Coordinating the processes in a large-scale distributed computation is a
challenge. The hardest aspect is gracefully handling partial failure, when you
don't know whether or not a remote process has failed, while still making
progress with the overall computation. MapReduce spares the programmer from
having to think about failure, since the implementation detects failed map or
reduce tasks and reschedules replacements on machines that are healthy.
MapReduce is able to do this because it is a shared-nothing architecture, meaning
that tasks have no dependence on one another. So, from the programmer's point of
view, the order in which the tasks run doesn't matter. By contrast, MPI programs
have to explicitly manage their own checkpointing and recovery, which gives more
control to the programmer but makes the programs more difficult to write.

MapReduce might sound like a restrictive programming model: we are limited to
key and value types that are related in specified ways, and mappers and reducers
run with very limited coordination between one another (the mappers simply pass
keys and values to reducers). Nevertheless, a large range of algorithms can be
expressed in MapReduce, from image analysis, to graph-based problems, to machine
learning algorithms; a minimal word-count sketch is given below.
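
The sketch below uses plain Python to mimic the shape of a word-count
computation; it is not the Hadoop API, and a real framework would distribute the
map, shuffle, and reduce phases across a cluster.

from collections import defaultdict

def mapper(line):
    # Map phase: emit a (key, value) pair for every word.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: combine all values that share the same key.
    return word, sum(counts)

documents = ["big data needs big clusters",
             "data locality makes data access fast"]

# Shuffle phase: group intermediate values by key (in a real framework,
# this grouping happens between the map and reduce phases).
groups = defaultdict(list)
for line in documents:
    for word, count in mapper(line):
        groups[word].append(count)

results = dict(reducer(word, counts) for word, counts in groups.items())
print(results["data"])   # prints 3

Because each (key, list-of-values) group is reduced independently, the tasks
share nothing and can be rescheduled freely on healthy machines, which is what
makes MapReduce's failure handling possible.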

Volunteer Computing

SETI, the Search for Extra-Terrestrial Intelligence, runs a project called
SETI@home in which volunteers donate CPU time from their otherwise idle
computers to analyze radio telescope data for signs of intelligent life outside
Earth. SETI@home is the most well-known of many volunteer computing projects;
others include the Great Internet Mersenne Prime Search (to search for large
prime numbers) and Folding@home (to understand protein folding and how it
relates to disease). Volunteer computing projects work by breaking the problem
they are trying to solve into chunks called work units, which are sent to
computers around the world to be analyzed. For example, a SETI@home work unit is
about 0.35 MB of radio telescope data, and takes hours or days to analyze on a
typical home computer. When the analysis is completed, the results are sent back
to the server, and the client gets another work unit. As a precaution to combat
cheating, each work unit is sent to three different machines, and at least two
results need to agree before the result is accepted (see the sketch below).
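
This two-out-of-three agreement rule might look like the following hypothetical
Python check; the function name and quorum parameter are illustrative, and real
projects (built on frameworks such as BOINC) use far more elaborate validation.

from collections import Counter

def accept_result(results, quorum=2):
    # Accept a work unit's result only if at least `quorum` of the
    # independently computed results agree with one another.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

print(accept_result(["signal", "signal", "noise"]))   # 'signal' is accepted
print(accept_result(["a", "b", "c"]))                 # None: no agreement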

Although SETI@home may be similar to MapReduce (breaking a problem into
independent pieces to be worked on in parallel), there are some significant
differences. The SETI@home problem is very CPU-intensive, which makes it
suitable for running on hundreds of thousands of computers across the world,
because the time to transfer the work unit is dwarfed by the time to run the
computation on it. Volunteers are donating CPU cycles, not bandwidth.

MapReduce is designed to run jobs that last minutes or hours on trusted,
dedicated hardware running in a single data center with very high
aggregate-bandwidth interconnects. By contrast, SETI@home runs a perpetual
computation on untrusted machines on the Internet with highly variable
connection speeds and no data locality.

Unstructured Data
Unstructured data tends to grow exponentially, unlike structured data, which
tends to grow in a more linear fashion. Unstructured data is basically
information that either does not have a predefined data model and/or does not
fit well into a relational database. Unstructured information is typically text
heavy, but may contain data such as the unstructured social data that
organizations use to monitor their own systems. Although there is worth in Big
Data analytics, there are still some business and technology hurdles to clear.

From a business perspective, you'll need to learn how to:

■ Use Big Data analytics to drive value for your enterprise that aligns with
your core competencies and creates a competitive advantage for your enterprise
■ Capitalize on new technology capabilities and leverage your existing
technology assets
■ Enable the appropriate organizational change to move towards fact-based
decisions, adoption of new technologies, and uniting people from multiple
disciplines into a single multidisciplinary team
■ Deliver faster and superior results by embracing and capitalizing on the
ever-increasing rate of change that is occurring in the global marketplace

Big Data analytics uses a wide variety of advanced analytics, as listed below:
SQL Analytics: Count, Mean, OLAP
Descriptive Analytics: Univariate distribution, Central tendency, Dispersion
Data Mining: Association rules, Clustering, Feature extraction, Text analytics
Predictive Analytics: Classification, Regression, Forecasting, Spatial, Machine learning
Simulation: Monte Carlo, Agent-based modeling, Discrete event modeling
Optimization: Linear optimization, Non-linear optimization

There are potential Big Data business models for enterprises seeking to exploit
Big Data analytics. Big Data analytics certainly represents an enormous
opportunity for businesses to exploit their data assets and realize substantial
bottom-line results for their enterprise. The keys to success for organizations
seeking to take advantage of this opportunity are:
■ Leverage all your current data and enrich it with new data sources
■ Enforce data quality policies and leverage today's best technology and
people to support the policies
■ Relentlessly seek opportunities to imbue your enterprise with fact-based
decision making
■ Embed your analytic insights throughout your organization
