Big Data Analytics (Unit-II)
Big data application: Extracts insights like hidden patterns, market trends, and
customer preferences
Data ingestion: Moves data, especially unstructured data, to a system where it can be
stored and analyzed
Computer data storage: Centralizes and consolidates data from various sources for
analytical purposes
Data warehouse: A centralized storage container that consolidates company data
Data analytics: Helps organizations gain insights, optimize operations, and predict
future outcomes
ETL tools: Prepare a new data source to be stored
Automated generation of insights: Provides an easier and faster way to obtain
important findings
Business Analytics: Uses data to enable data-driven decisions
Data lakes: Store large amounts of raw data
2. Data Source Layer.
Ans = Data Sources Layer
Organizations generate a huge amount of data on a daily basis. The basic function of the data
sources layer is to absorb and integrate the data coming from various sources, at varying
velocity and in different formats. Before this data is considered for the Big Data stack, we have to
differentiate between the noise and the relevant information.
The data source layer in big data is capable of processing large amounts of data from
different sources in batch and real-time. These sources include:
Data warehouses
RDBMS
SaaS apps
Internet of Things sensors
The data available for analysis can vary in origin and format. The format may be
structured, unstructured, or semi-structured. The speed of data arrival and delivery will vary
according to the source. The data collection mode may be direct or through data providers, in
batch mode or in real-time.
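As a rough illustration of how the collection step has to cope with differing formats, the
following Python sketch reads a structured CSV export alongside a semi-structured JSON dump and
simply counts what was collected. The file names orders.csv and sensor.json are hypothetical
examples, not part of any specific platform.

import csv
import json

def read_structured(csv_path):
    # Structured, tabular records, e.g. a batch export from an RDBMS.
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def read_semi_structured(json_path):
    # Semi-structured records, e.g. events pushed by a SaaS app or an IoT gateway;
    # the keys may vary from record to record.
    with open(json_path) as f:
        return json.load(f)

if __name__ == "__main__":
    rows = read_structured("orders.csv")          # hypothetical batch source
    events = read_semi_structured("sensor.json")  # hypothetical near-real-time dump
    print(f"Collected {len(rows)} structured and {len(events)} semi-structured records")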
3. Ingestion Layer.
Ans = Ingestion Layer : The role of the ingestion layer is to absorb the huge inflow of data
and sort it out in different categories. This layer separates noise from relevant information. It
can handle huge volume, high velocity, and a variety of data. The ingestion layer validates,
cleanses, transforms, reduces, and integrates the unstructured data into the Big Data stack for
further processing.
The data ingestion layer is the first layer in the big data architecture. It is responsible for
collecting data from various sources, such as IoT devices, data lakes, databases, and SaaS
applications.
The data ingestion layer prioritizes and categorizes the data. It also provides:
Encryption
Support for protocols such as Secure Sockets Layer (SSL) and HTTP over SSL (HTTPS)
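To make the validate-cleanse-categorize idea concrete, here is a minimal Python sketch of a
single ingestion step. The required fields, the cleansing rules, and the categories are
illustrative assumptions rather than the behaviour of any particular ingestion tool.

def ingest(record):
    # Validation: drop records that are missing required fields (treat them as noise).
    required = ("id", "timestamp", "payload")
    if not all(field in record and record[field] not in (None, "") for field in required):
        return None
    # Cleansing: normalize simple formatting issues in the payload.
    record["payload"] = str(record["payload"]).strip().lower()
    # Categorization: tag the record so later layers can route it.
    record["category"] = "sensor" if "temp" in record["payload"] else "other"
    return record

batch = [
    {"id": 1, "timestamp": "2024-01-01T00:00:00", "payload": " Temp=21C "},
    {"id": 2, "timestamp": "", "payload": "broken"},  # rejected as noise
]
clean = [r for r in (ingest(rec) for rec in batch) if r is not None]
print(clean)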
4. Storage Layer
Ans = Storage Layer
Hadoop is an open source framework used to store large volumes of data in a distributed
manner across multiple machines. The Hadoop storage layer supports fault tolerance and
parallelization, which enable high-speed distributed processing algorithms to execute over
large-scale data. There are two major components of Hadoop: a scalable Hadoop Distributed
File System (HDFS) that can support petabytes of data, and a MapReduce engine that
computes results in batches.
HDFS is a file system that is used to store huge volumes of data across a large number of
commodity machines in a cluster. The data can be in terabytes or petabytes. HDFS stores data
in the form of blocks of files and follows the write-once-read-many model for accessing data
from these blocks of files. The files stored in HDFS are operated upon by many complex
programs, as per the requirement.
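The classic example of a MapReduce batch job is word count. The sketch below simulates the
map, shuffle, and reduce phases in plain Python over an in-memory list of lines; an actual
Hadoop job would run the same mapper and reducer logic in parallel over HDFS blocks.

from collections import defaultdict

# Map phase: emit (word, 1) pairs from each line of input.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

# Reduce phase: sum the counts emitted for each word.
def reducer(word, counts):
    return word, sum(counts)

def map_reduce(lines):
    groups = defaultdict(list)
    for line in lines:                 # in Hadoop, map tasks run in parallel on HDFS blocks
        for key, value in mapper(line):
            groups[key].append(value)  # the framework's shuffle/sort step groups by key
    return dict(reducer(k, v) for k, v in groups.items())

print(map_reduce(["big data big insights", "big clusters"]))
# {'big': 3, 'data': 1, 'insights': 1, 'clusters': 1}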
Big data storage requirements can also be addressed by a concept known as Not Only SQL
(NoSQL) databases. Some examples of NoSQL databases include HBase, MongoDB,
AllegroGraph, and InfiniteGraph.
5. RDMS and Big Data.
Ans = Storing Data in Databases and Data Warehouses:
RDBMS and Big Data
An RDBMS uses a relational model where all the data is stored using preset schemas. These
schemas are linked using the values in specific columns of each table. The data is structured,
which means that for data to be stored or transacted it needs to adhere to the ACID
standards, namely:
Atomicity-Ensures full completion of a database operation.
Consistency-Ensures that data abides by the schema (table) standards, such as correct data
type entry, constraints, and keys.
Isolation-Refers to the encapsulation of information. Makes only necessary information
visible.
Durability-Ensures that transactions stay valid even after a power failure or errors.
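As a small illustration of atomicity and consistency, the following Python sketch uses the
standard-library sqlite3 module; the accounts table and the failing transfer are invented for
the example. Because the overdraft violates the CHECK constraint, neither UPDATE takes effect,
which is the all-or-nothing behaviour that atomicity promises.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # one transaction: both updates commit together or not at all (atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    # The CHECK constraint (consistency) rejects the overdraft, so the whole transfer rolls back.
    print("transfer rejected, balances unchanged")

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 50)]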
In traditional database systems, every time data is accessed or modified, it needs to be
moved (indexed) to a central location for processing. Therein lies a major limitation of
hardware upgrades: you can upgrade your hardware to improve performance; however,
depending on the hardware platform, there is a limit on the number of processors and the
amount of system memory that can be used to concurrently perform database operations.
Besides the processing-power constraint, network latency can also occur during data transfer
to the central node.
6. Issues with relational model.
1 – Maintenance Problem
The maintenance of the relational database becomes difficult over time due to the increase in
the data. Developers and programmers have to spend a lot of time maintaining the database.
2 – Cost
The relational database system is costly to set up and maintain. The initial cost of the
software alone can be quite high for smaller businesses, and it increases further when you
factor in hiring a professional technician who also needs expertise with that specific kind of
system.
3 – Physical Storage
A relational database is comprised of rows and columns, which can require a lot of physical
storage because each table and its indexes are stored separately. The physical storage
requirements grow along with the increase in data.
4 – Lack of Scalability
While using a relational database over multiple servers, its structure changes and becomes
difficult to handle, especially when the quantity of data is large. Due to this, the data does
not scale well across different physical storage servers, and performance suffers, i.e.,
reduced data availability and longer load times. As the database becomes larger or more
distributed across a greater number of servers, latency and availability issues affect overall
performance.
5 – Complexity in Structure
Relational databases can only store data in tabular form which makes it difficult to represent
complex relationships between objects. This is an issue because many applications require
more than one table to store all the necessary data required by their application logic.
Ans = A relational database is a collection of information that organizes data points with
defined relationships for easy access. In the relational database model, the data structures --
including data tables, indexes and views -- remain separate from the physical storage
structures, enabling database administrators to edit the physical data storage without affecting
the logical data structure.
In the enterprise, relational databases are used to organize data and identify relationships
between key data points. They make it easy to sort and find information, which helps
organizations make business decisions more efficiently and minimize costs. They work well
with structured data.
The data tables used in a relational database store information about related objects. Each row
holds a record with a unique identifier -- known as a key -- and each column contains the
attributes of the data. Each record assigns a value to each feature, making relationships
between data points easy to identify.
The standard user and application program interface (API) of a relational database is the
Structured Query Language. SQL code statements are used both for interactive queries for
information from a relational database and for gathering data for reports. Defined data
integrity rules must be followed to ensure the relational database is accurate and accessible.
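As a hedged example of how SQL relates tables through their key columns, the sketch below
builds a tiny in-memory SQLite database in Python and runs a join with an aggregate; the
customers and orders tables and their contents are made up for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# SQL uses the key columns to relate the two tables and aggregate per customer.
query = """
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY total_spent DESC
"""
for name, total in conn.execute(query):
    print(name, total)
# Asha 340.0
# Ravi 40.0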
The main advantages of relational databases include the following:
1. Categorizing data. Database administrators can easily categorize and store data in a
relational database that can then be queried and filtered to extract information for reports.
Relational databases are also easy to extend and aren't reliant on physical organization. After
the original database creation, a new data category can be added without having to modify the
existing applications.
2. Accuracy. Data is stored just once, eliminating data duplication in storage procedures.
3. Ease of use. Complex queries are easy for users to carry out with SQL, the main query
language used with relational databases.
4. Security. Direct access to data in tables within an RDBMS can be limited to specific users.
The ETL (extract, transform, load) process moves data from source systems into a data
warehouse. The first step, extraction, involves pulling data from various sources. These sources can be
anything from databases, cloud data storage, data lakes, to big data platforms. SQL
(Structured Query Language) is often used in this step to query and retrieve data from these
sources, including disparate sources like Amazon Redshift and Google BigQuery.
Once the data is extracted, it undergoes the transformation process. This step involves
cleaning, validating, and converting the data into a consistent format that can be used in the
data warehouse. This might involve tasks such as removing duplicates, validating data for
consistency and accuracy, and converting data types to match the data warehouse schema.
The final step is loading the data into the data warehouse. This involves writing the
transformed data into the data warehouse's storage system. Depending on the requirements,
this could be a full load, where all the data is written into the warehouse, or an incremental
load, where only new or updated data is written.
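The sketch below walks through the same three steps on toy data, using an in-memory SQLite
database to stand in for the warehouse; the fact_orders table, the source rows, and the
validation rules are assumptions made for the example.

import sqlite3

def extract(rows):
    # Extract: in practice this would query the source systems; here the rows are given.
    return rows

def transform(rows):
    # Transform: remove duplicates, validate, and coerce types to match the warehouse schema.
    seen, clean = set(), []
    for row in rows:
        key = row["order_id"]
        if key in seen or row["amount"] in (None, ""):   # dedupe and validate
            continue
        seen.add(key)
        clean.append((key, float(row["amount"])))        # convert data types
    return clean

def load(conn, rows):
    # Load: write the transformed rows into the warehouse table (a full load in this sketch).
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")                  # stands in for the data warehouse
source = [{"order_id": 1, "amount": "19.99"},
          {"order_id": 1, "amount": "19.99"},            # duplicate, dropped
          {"order_id": 2, "amount": ""}]                 # invalid, dropped
load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())   # [(1, 19.99)]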
This process has evolved with the advent of cloud data warehouses and big data, leading to
new techniques and tools for data integration. For instance, the ingestion of data into
platforms like Amazon Redshift and Google BigQuery has become more streamlined and
efficient.
10.Data Visualization.
Ans = Data visualization is the fourth layer and is responsible for creating visualizations
of the data that humans can easily understand. This layer is important for making the data
accessible.
The data visualization layer in a big data architecture is where the success of a project is
measured, because it allows users to perceive the value of the data. Tools such as Microsoft
Power BI are commonly used in this layer. The visualization layer works together with the
other layers of the stack:
Ingestion layer: loads data from data sources into the data platform
Analytics layer: produces the business insight that the visualization layer presents
Management layer: separates noise from relevant information in a huge data set
Some common visualization types include charts, graphs, dashboards, and heat maps.
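As a simple illustration of this layer, the following Python sketch uses matplotlib to turn a
small aggregated result into a bar chart; in practice a BI tool such as Power BI would play
this role, and the regions and sales figures shown are invented.

import matplotlib.pyplot as plt

# Hypothetical aggregated output from the analytics layer.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 143, 80]

plt.bar(regions, sales)                     # a simple bar chart of the derived metric
plt.title("Sales by region")
plt.xlabel("Region")
plt.ylabel("Sales (units)")
plt.savefig("sales_by_region.png")          # the exported image is what end users see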
Ans = Big data security is a collection of measures and tools that protect data and analytics
methods from attacks, theft, and other malicious activities. Big data security is made up of
three layers: incoming, stored, and outgoing data.
Big data security tools and measures include encryption, user access control, intrusion
detection and prevention, centralized key management, and auditing.
Ans = Big data virtualization is a process that creates virtual structures for big data
systems. It enables organizations to use all the data they collect to achieve various goals and
objectives.
Big data virtualization offers a modernized approach to data integration. It serves as a logical
data layer that combines all enterprise data to produce real-time information for business
users.
Big data virtualization guarantees that data is adequately connected with other systems so that
organizations may harness big data for analytics and operations.
Big data virtualization minimizes persistent data stores and associated costs. It integrates data
from multiple sources of different types into a holistic, logical view without moving it
physically.
The Hadoop infrastructure layer takes care of the hardware and network requirements. It can
provide a virtualized cloud environment or a distributed grid of commodity servers over a fast
gigabit network. Following are the main components of a Hadoop infrastructure:
N commodity servers (8-core CPUs, 24 GB RAM, 4 to 12 TB of disk, Gigabit Ethernet)
Two-level network (20 to 40 nodes per rack)
14. Platform Management Layer in big data.
Ans = The management system in big data focuses on data access and data mining. The
management system is made up of six modules:
Interface acquisition, Program scheduling, Data aggregation, Platform alerting,
Marketing analysis, Visualization.
The platform management layer includes an edge application service platform for virtualized
resource management, which allocates resources in the network to different services and
provides the operation and management of edge services.
Ans = NoSQL databases, which stand for "not only SQL," are a popular alternative to
traditional relational databases. They are designed to handle large amounts of unstructured or
semi-structured data, and are often used for big data and real-time web applications.
However, like any technology, NoSQL databases come with their own set of challenges.
Challenges of NoSQL:
1) Data modeling and schema design: One of the biggest challenges with NoSQL databases
is data modeling and schema design. Unlike relational databases, which have a well-defined
schema and a fixed set of tables, NoSQL databases often do not have a fixed schema. This
can make it difficult to model and organize data in a way that is efficient and easy to query.
Additionally, the lack of a fixed schema can make it difficult to ensure data consistency and
integrity.
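The sketch below illustrates this flexibility (and its downside) with the pymongo driver,
assuming a MongoDB instance is reachable at localhost:27017; the shop database, products
collection, and documents are hypothetical.

from pymongo import MongoClient  # assumes the pymongo driver and a local MongoDB instance

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]        # database and collection names are made up

# Documents in the same collection may have different fields: flexible, but the
# application itself must now enforce whatever structure it relies on.
products.insert_many([
    {"sku": "A1", "name": "kettle", "price": 25},
    {"sku": "B2", "name": "ebook", "price": 9, "download_url": "https://example.com/b2"},
])

for doc in products.find({"price": {"$lt": 20}}):
    print(doc.get("name"), doc.get("download_url"))   # fields must be read defensively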
2) Scalability: NoSQL databases are often used for big data and real-time web applications,
which means that they need to be able to scale horizontally. However, scaling a NoSQL
database can be complex and requires careful planning. You may need to consider issues
such as sharding, partitioning, and replication, as well as the impact of these decisions on
query performance and data consistency.
3) Data security: Ensuring the security of sensitive data is a critical concern for any
organization. NoSQL databases, however, may not have the same level of built-in security
features as relational databases. This means that additional measures may need to be put in
place to secure data at rest and in transit, such as encryption and authentication.
Virtualization offers several benefits for big data platforms:
Improves efficiency
Allows for fewer physical servers in a data center
Helps platforms scale to handle large volumes of data
Improves application processing performance
Allows you to run different operating systems on the same hardware
Big data is a collection of structured, unstructured, and semi-structured data that continues to
grow exponentially. It's characterized by: Volume, Variety, Velocity, Variability.
Virtualization is not strictly required for big data analysis, but software frameworks are
more efficient in a virtualized environment. For example, MapReduce algorithms generally
perform better in a virtualized environment.
Ans = Big data monitoring tracks metrics like: Response times, Resource utilization, Error
rates, Transaction performance.
Monitoring can alert users to issues or anomalies so they can take action.
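A minimal sketch of such a check is shown below: it computes an average response time and an
error rate over a few recent requests and raises alerts when example thresholds are exceeded.
The metric names and thresholds are assumptions for illustration only.

# Toy monitoring check over recent request metrics; the thresholds are arbitrary examples.
requests = [
    {"response_ms": 120, "error": False},
    {"response_ms": 950, "error": False},
    {"response_ms": 300, "error": True},
]

avg_response = sum(r["response_ms"] for r in requests) / len(requests)
error_rate = sum(r["error"] for r in requests) / len(requests)

alerts = []
if avg_response > 500:
    alerts.append(f"high average response time: {avg_response:.0f} ms")
if error_rate > 0.05:
    alerts.append(f"error rate above threshold: {error_rate:.0%}")

for alert in alerts:
    print("ALERT:", alert)   # in a real platform this would notify an operator or dashboard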
The security and governance layer of big data architecture includes: Access control,
Encryption, Network security, Usage monitoring, Auditing mechanisms.
The security layer also tracks the operations of other layers.
According to the CAP theorem, a distributed data store can provide at most two of the
following three guarantees at the same time:
Consistency –
Consistency means that the nodes will have the same copies of a replicated data
item visible for various transactions. A guarantee that every node in a distributed
cluster returns the same, most recent and a successful write. Consistency refers
to every client having the same view of the data. There are various types of
consistency models. Consistency in CAP refers to sequential consistency, a very
strong form of consistency.
Availability –
Availability means that each read or write request for a data item will either be
processed successfully or will receive a message that the operation cannot be
completed. Every non-failing node returns a response for all the read and write
requests in a reasonable amount of time. The key word here is “every”. In simple
terms, every node (on either side of a network partition) must be able to respond
in a reasonable amount of time.
Partition Tolerance –
Partition tolerance means that the system can continue operating even if the
network connecting the nodes has a fault that results in two or more partitions,
where the nodes in each partition can only communicate among each other. That
means, the system continues to function and upholds its consistency guarantees
in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions
once the partition heals.
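To see the trade-off concretely, the toy Python model below simulates a write to one
replicated value during a partition in which only one of three replicas is reachable: a
CP-style system rejects the write to stay consistent, while an AP-style system accepts it and
lets the replicas diverge until the partition heals. This is a thought experiment, not how any
real database is implemented.

# Toy model of one replicated key during a network partition.
class Replica:
    def __init__(self):
        self.value = 0

def write(replicas, reachable, value, mode):
    # Attempt a write when only the `reachable` replicas can be contacted.
    if mode == "CP" and len(reachable) <= len(replicas) // 2:
        return "rejected (no majority: stay consistent, give up availability)"
    for r in reachable:                      # AP mode accepts the write anyway
        r.value = value
    return "accepted"

a, b, c = Replica(), Replica(), Replica()
# Partition: the client can only reach replica `a`.
print("CP:", write([a, b, c], [a], 42, mode="CP"))    # rejected
print("AP:", write([a, b, c], [a], 42, mode="AP"))    # accepted
print("values now:", a.value, b.value, c.value)       # 42 0 0 -> replicas diverge until healed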
(Figure: database systems classified by which CAP properties they prioritize at a given time.)
20. List some major functions of the big data architecture model.
Ans = A big data architecture is a system that manages, stores, processes, and analyzes
large amounts of data. It's designed to handle data that's too large or complex for traditional
database systems.
Big data architectures typically involve one or more of the following types of workload:
1. Batch processing of big data sources at rest.
2. Real-time processing of big data in motion.
3. Interactive exploration of big data.
4. Predictive analytics and machine learning.