
Q.1 Attempt the following (any 5): (2*5=10M)


i. Star Schema: It is a data modeling technique in which a central "fact"
table is connected to multiple "dimension" tables, forming a star-like
structure. Snowflake Schema: It is an extension of the star schema
where dimension tables are further normalized into multiple levels,
creating a snowflake-like structure.

ii. Table Inheritance: It is a concept in database design where one
table inherits attributes and relationships from another table, forming
a hierarchy. For example, a "Customer" table may inherit attributes
from a "User" table, which in turn inherits from a "Person" table,
creating a hierarchy of tables with increasing specialization.

iii. MOLAP (Multidimensional Online Analytical Processing): It is a
type of database technology that stores data in a multidimensional
cube format. It allows for efficient analysis of large volumes of data
and enables complex calculations, aggregations, and slicing/dicing
operations for analytical purposes.

iv. Dendrograms require: 1) Distance or similarity measures between
objects, 2) A method for merging or splitting clusters, 3) A stopping
criterion to determine when to stop the clustering process, and 4) A
visualization method to represent the hierarchical structure of
clusters.

v. Advantages of parallel databases include: 1) Increased performance
and scalability through parallel processing, 2) High availability and
fault tolerance, 3) Efficient utilization of hardware resources, and
4) Support for complex analytical queries involving large datasets.

vi. Uses of data warehouses: 1) Business intelligence and
decision-making, 2) Trend analysis and forecasting, 3) Customer
relationship management (CRM) and market segmentation, 4) Performance
measurement and monitoring, and 5) Data mining and advanced analytics.

vii. Data mining refers to the process of discovering patterns,
relationships, and insights from large datasets. It involves extracting
valuable information, knowledge, or patterns that were previously
unknown, and using them to make informed decisions or predictions.
Data mining techniques include classification, clustering, regression,
and association rule mining.

Q.2 Answer the following (any 3): (4*3=12M)


i. Parallel query evaluation refers to the execution of database queries
by dividing the workload among multiple processors or nodes. It aims to
improve query performance by processing data in parallel, reducing
response times. Parallelism can be achieved through techniques such as
parallel query execution plans, parallel data processing, and parallel
data distribution.
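
To make the idea of partitioned parallelism concrete, here is a minimal
Python sketch (not tied to any real database engine; the data and
partition count are invented): each worker process aggregates its own
partition and the partial results are merged, which mirrors how a
parallel DBMS evaluates an aggregate query.

```python
# Minimal sketch of partitioned parallel aggregation (hypothetical data,
# not a real DBMS): each worker aggregates one partition, and the
# partial results are combined, mirroring a parallel SUM query plan.
from multiprocessing import Pool

rows = list(range(1_000_000))          # stand-in for a table column

def partial_sum(partition):
    # Each worker scans only its own partition of the data.
    return sum(partition)

def split(data, n_parts):
    # Simple horizontal partitioning of the input rows.
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    partitions = split(rows, n_parts=4)
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, partitions)   # parallel scan
    total = sum(partials)                              # final merge step
    print(total)
```
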
ii. Data warehousing is needed to address the limitations of traditional
transactional databases when it comes to analyzing large volumes of
data. It provides a centralized repository for storing, integrating, and
managing data from various sources. Data warehouses enable efficient
data retrieval, complex analysis, and reporting, supporting decision-
making processes and business intelligence initiatives.

iii. Online Analytical Processing (OLAP) is a technology used for data
analysis and reporting. It allows users to perform multidimensional
analysis on large volumes of data, providing a flexible and intuitive
way to explore information. OLAP systems provide features like
drill-down, roll-up, slicing, and dicing to analyze data from different
perspectives, enabling users to gain insights and make informed
decisions.

iv. Different distance formulas are used to calculate the similarity or
dissimilarity between objects or data points in various data mining and
clustering algorithms. Some commonly used distance formulas include
Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski
distance, and Hamming distance. Each formula has its own
characteristics and is suitable for different types of data and
analysis scenarios.
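
For reference, the short sketch below evaluates each of the distances
listed above on a pair of made-up vectors using SciPy's
scipy.spatial.distance module (assuming SciPy is available):

```python
# Hypothetical vectors used only to illustrate the distance formulas.
from scipy.spatial import distance

u = [1, 2, 3]
v = [4, 0, 3]

print(distance.euclidean(u, v))        # sqrt((1-4)^2 + (2-0)^2 + 0^2)
print(distance.cityblock(u, v))        # Manhattan: |1-4| + |2-0| + 0
print(distance.chebyshev(u, v))        # largest coordinate difference
print(distance.minkowski(u, v, p=3))   # generalizes Euclidean/Manhattan

# Hamming distance is defined on symbols/bits; SciPy returns the
# *fraction* of positions that differ, so multiply by the length to
# get the usual count of mismatches.
a = [1, 0, 1, 1]
b = [1, 1, 0, 1]
print(distance.hamming(a, b) * len(a)) # 2 mismatching positions
```
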

v. Implementing Object-Relational Database Management Systems (ORDBMS)
introduces new challenges, such as: 1) Data modeling complexities due
to the integration of object-oriented features, 2) Performance
optimization for object-oriented queries and complex relationships,
3) Incorporating object-oriented programming paradigms into the
database system, 4) Managing the evolution and maintenance of object
schemas, and 5) Ensuring compatibility with existing relational
database systems and standards. Overcoming these challenges requires
careful planning, design, and implementation strategies.

Q.3 Answer the following (any 2) (2*5=10M)


i. Parallel databases and distributed databases are both designed to
handle large amounts of data, but they differ in their approach:
- Parallel Databases: In a parallel database, a single database is divided
across multiple processors or nodes, and each node processes a portion
of the data simultaneously. The nodes communicate and coordinate
their actions to execute queries in parallel, improving performance.
- Distributed Databases: In a distributed database, data is physically
distributed across multiple nodes or sites, which can be geographically
dispersed. Each node maintains its own subset of the data and operates
autonomously. Queries may be executed locally or distributed across
multiple nodes for processing.

ii. The drill down feature in OLAP allows users to navigate from
summarized data to more detailed levels of information. For example,
consider an OLAP cube representing sales data. Initially, the user may
view the total sales for a specific region. By drilling down, the user can
access more detailed information, such as sales by city, and further drill
down to sales by store or individual products.
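
To make this drill-down path concrete, here is a small pandas sketch on
invented sales rows (the column names and figures are purely
illustrative); each successive groupby adds a finer level of the
region -> city -> store hierarchy, which is what drilling down does:

```python
import pandas as pd

# Hypothetical sales fact rows with a region -> city -> store hierarchy.
sales = pd.DataFrame({
    "region": ["West", "West", "West", "East"],
    "city":   ["Pune", "Pune", "Mumbai", "Patna"],
    "store":  ["S1", "S2", "S3", "S4"],
    "amount": [100, 150, 200, 120],
})

# Summarized view: total sales per region.
print(sales.groupby("region")["amount"].sum())

# Drill down one level: sales per region and city.
print(sales.groupby(["region", "city"])["amount"].sum())

# Drill down further: sales per individual store.
print(sales.groupby(["region", "city", "store"])["amount"].sum())
```
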

iii. Complex data types in SQL refer to the ability to store structured or
semi-structured data within a relational database. Examples include:
- Arrays: A collection of values of the same type.
- Structs: A composite data type that groups multiple related fields
together.
- JSON: Support for storing and querying JSON (JavaScript Object
Notation) data.
- XML: Support for storing and manipulating XML (eXtensible Markup
Language) data.
- Spatial data types: Allows for storing and querying geographic or
geometric data.
- User-defined types: The ability to define custom data types based on
specific requirements.

iv. To divide the given dataset into two clusters using the k-means
algorithm, an initial step is to randomly assign two cluster centers, let's
say C1 and C2. Then, each data point is assigned to the cluster whose
center it is closest to. The mean of the data points in each cluster is
computed, and the cluster centers are updated accordingly. This process
is iteratively repeated until the cluster assignments stabilize. The
resulting clusters for the given dataset may vary depending on the
initialization and the distance metric used.
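
Since the original data points are not reproduced here, the following
sketch runs the steps just described (assign each point to the nearest
center, recompute the means, repeat) on a small hypothetical 2-D
dataset with k = 2:

```python
import numpy as np

# Hypothetical 2-D points; replace with the actual dataset from the question.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

def kmeans(X, k=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Randomly pick k data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 2. Assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each center as the mean of its assigned points.
        #    (Empty clusters are not handled in this sketch.)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop when the centers (and hence assignments) stabilize.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans(X, k=2)
print(labels)    # cluster index per point
print(centers)   # final cluster means
```
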

Q.4 Answer the following (any 2) (2*6=12M)


i. Architecture of a Data Warehouse:

A data warehouse typically follows a three-tier architecture,
consisting of the following components:
1. Source Systems: These are the systems that generate and store
operational data, such as transactional databases, CRM systems, or
external data sources. Data is extracted from these source systems and
transformed before being loaded into the data warehouse.

2. Data Warehouse: The data warehouse serves as the central repository
for consolidated and integrated data. It consists of multiple
components:

- Staging Area: It is the initial landing zone where data is extracted
from source systems. Data is stored temporarily in its raw form before
undergoing transformation.

- Data Integration Layer: This layer performs the extraction,
transformation, and loading (ETL) process. Data is cleansed,
standardized, and transformed to match the data warehouse schema.

- Data Storage: This component stores the transformed and structured
data. It typically uses a relational database management system
(RDBMS) or a columnar database to support efficient querying and
analysis.

- Metadata Repository: It stores the metadata, which provides
information about the data in the data warehouse, including its
structure, source, and transformation rules.
3. Business Intelligence (BI) Layer: This layer provides tools and
interfaces for data analysis, reporting, and visualization. Users can
access and interact with the data warehouse through various BI tools,
such as dashboards, ad hoc query tools, or OLAP tools.

Here is a simplified diagram representing the architecture of a data
warehouse (data flows upward from the source systems to the BI layer):

```
+------------------------+
| Business Intelligence  |
|       (BI) Layer       |
+------------------------+
            ^
            |
+------------------------+
|      Data Storage      |
|     and Processing     |
+------------------------+
            ^
            |
+--------------+------------------+---------------------+
|   Staging    |      Data        |      Metadata       |
|     Area     |   Integration    |     Repository      |
+--------------+------------------+---------------------+
            ^
            |
+------------------------+
|     Source Systems     |
+------------------------+
```

ii. Agglomerative Clustering:

Agglomerative clustering is a hierarchical clustering technique that
starts with each data point as an individual cluster and then
iteratively merges the closest clusters until a stopping criterion is
met. It follows these steps:

1. Initialization: Each data point is initially considered as a
separate cluster.

2. Distance Calculation: The distance between each pair of clusters is
computed based on a chosen distance metric (e.g., Euclidean distance).

3. Merge: The two closest clusters are merged into a single cluster.
The distance between clusters is updated using linkage methods like
single linkage, complete linkage, or average linkage.

4. Repeat: Steps 2 and 3 are repeated until a stopping criterion is
met. This could be a predetermined number of clusters or a specific
level of similarity.

5. Create Dendrogram: A dendrogram is a visual representation of the
clustering process, showing the hierarchy of merged clusters.

Agglomerative clustering is a bottom-up approach, where small clusters
are progressively merged into larger ones. The result is a hierarchical
tree-like structure that can be cut at different levels to obtain
clusters of desired sizes.
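
A minimal SciPy sketch of these steps on made-up 2-D points, using
average linkage (the data and parameter choices are illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Hypothetical 2-D points; each row starts out as its own cluster.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0],
              [5.2, 5.1], [9.0, 1.0], [9.1, 0.8]])

# Steps 1-4: pairwise distances are computed and the closest clusters
# are merged repeatedly; 'average' linkage defines the cluster distance.
Z = linkage(X, method="average", metric="euclidean")

# Cut the hierarchy to obtain, say, three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# Step 5: the merge history Z can be drawn as a dendrogram
# (requires matplotlib).
# import matplotlib.pyplot as plt
# dendrogram(Z); plt.show()
```
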

iii. Bayesian Classification Technique:

Bayesian classification is a statistical technique used for supervised
classification tasks. It utilizes Bayes' theorem to estimate the
probability of a data point belonging to a particular class based on
its observed features.

The technique involves the following steps:


1. Training: During the training phase, the algorithm learns the
underlying probability distribution of each class and the relationships
between features. It calculates the prior probabilities of each class and
the likelihood of each feature given the class.

2. Prediction: In the prediction phase, the algorithm applies Bayes'
theorem to estimate the posterior probability of each class given the
observed features. It selects the class with the highest posterior
probability as the predicted class for the data point.

A common simplification is to assume that the features are
conditionally independent given the class; the Naive Bayes classifier
is the popular variant that makes this assumption, greatly simplifying
the calculations. It is widely used for text classification, spam
filtering, and other applications where probabilistic reasoning is
required.
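
As a toy illustration, the sketch below trains scikit-learn's
GaussianNB on invented feature vectors and then predicts the class with
the highest posterior probability; the data and labels exist only for
demonstration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical training data: two numeric features per sample and a
# binary class label (0 = "normal", 1 = "spam", say).
X_train = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.2],
                    [4.0, 0.5], [4.2, 0.4], [3.9, 0.6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Training: the model estimates the prior P(class) and the per-class
# likelihood of each feature (modeled here as a Gaussian).
model = GaussianNB()
model.fit(X_train, y_train)

# Prediction: Bayes' theorem gives posterior ~ prior * likelihood;
# the class with the highest posterior is returned.
X_new = np.array([[1.1, 2.0], [4.1, 0.5]])
print(model.predict(X_new))          # predicted class per sample
print(model.predict_proba(X_new))    # posterior probabilities
```
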

iv. Structured Types and Inheritance in SQL:

Structured types and inheritance are features in SQL that allow for the
definition of custom data types and relationships between them:

1. Structured Types: SQL supports the creation of user-defined
structured types, also known as composite types or object types. These
types can consist of multiple attributes, each with its own data type.
For example, a "Person" structured type may have attributes like
"name," "age," and "address." Structured types provide a way to
encapsulate related attributes into a single unit.

2. Inheritance: SQL also supports the concept of inheritance between
structured types. Inheritance allows one structured type to inherit
attributes and behavior from another type, forming a hierarchy. For
example, a "Student" structured type can inherit attributes from the
"Person" type, adding additional attributes specific to students.
Inheritance allows for code reuse, data organization, and polymorphism
within the database.

These features provide flexibility and extensibility in data modeling,
allowing developers to define custom data structures and relationships
based on specific requirements.

Q.5 Answer the following (any 2) (2*8=16M)

i. Here is an example of a distance matrix for a dataset with five data
points (A, B, C, D, E) to be clustered using the divisive clustering
technique:

    A  B  C  D  E
A   0  5  3  4  6
B   5  0  2  3  7
C   3  2  0  6  4
D   4  3  6  0  5
E   6  7  4  5  0

In this example, the distance matrix represents the dissimilarities or
distances between each pair of data points. The values in the matrix
are the distances between the respective data points, calculated using
a chosen distance metric (e.g., Euclidean distance, Manhattan distance,
etc.).

For instance, the distance between data point A and data point B is 5,
between A and C is 3, between A and D is 4, and so on.

This distance matrix is the input to a divisive clustering algorithm
(such as DIANA), which starts with all five points in a single cluster
and recursively splits the most heterogeneous cluster until each point
forms its own cluster or a stopping criterion is met.
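
A minimal sketch of the first divisive (DIANA-style) split on this
matrix: the splinter group is seeded with the point that has the
largest average dissimilarity to the rest, and further points are moved
over only while doing so brings them closer, on average, to the
splinter group than to the points left behind.

```python
import numpy as np

# The 5x5 distance matrix from the answer above (order: A, B, C, D, E).
labels = ["A", "B", "C", "D", "E"]
D = np.array([[0, 5, 3, 4, 6],
              [5, 0, 2, 3, 7],
              [3, 2, 0, 6, 4],
              [4, 3, 6, 0, 5],
              [6, 7, 4, 5, 0]], dtype=float)

def avg_dist(i, group):
    """Average distance from point i to the points in group (excluding i)."""
    others = [j for j in group if j != i]
    return D[i, others].mean() if others else 0.0

# Start with everything in one cluster and an empty splinter group.
cluster = list(range(len(labels)))
# Seed the splinter with the point farthest (on average) from the rest.
seed = max(cluster, key=lambda i: avg_dist(i, cluster))
splinter = [seed]
cluster.remove(seed)

# Move a point over while it is, on average, closer to the splinter
# group than to the points it would leave behind.
moved = True
while moved and len(cluster) > 1:
    gains = {i: avg_dist(i, cluster) - avg_dist(i, splinter) for i in cluster}
    best = max(gains, key=gains.get)
    moved = gains[best] > 0
    if moved:
        splinter.append(best)
        cluster.remove(best)

print("split:", [labels[i] for i in cluster], "vs", [labels[i] for i in splinter])
# With this matrix the first split is {A, B, C, D} vs {E}.
```
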

ii. Here is an example dataset consisting of four data points
(A, B, C, D):

A: (2, 3)
B: (5, 1)
C: (4, 6)
D: (7, 2)
To create a distance matrix using the Euclidean formula, we calculate
the Euclidean distance between each pair of data points.

The Euclidean distance formula between two points (x1, y1) and (x2, y2)
is:

Distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)

Using this formula, we can calculate the distance between each pair of
points:

Distance between A and B:
sqrt((5 - 2)^2 + (1 - 3)^2) = sqrt(9 + 4) = sqrt(13) ≈ 3.61

Distance between A and C:
sqrt((4 - 2)^2 + (6 - 3)^2) = sqrt(4 + 9) = sqrt(13) ≈ 3.61

Distance between A and D:
sqrt((7 - 2)^2 + (2 - 3)^2) = sqrt(25 + 1) = sqrt(26) ≈ 5.10

Distance between B and C:
sqrt((4 - 5)^2 + (6 - 1)^2) = sqrt(1 + 25) = sqrt(26) ≈ 5.10

Distance between B and D:
sqrt((7 - 5)^2 + (2 - 1)^2) = sqrt(4 + 1) = sqrt(5) ≈ 2.24

Distance between C and D:
sqrt((7 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = sqrt(25) = 5.00

Now we can arrange these distances in a distance matrix:

       A      B      C      D
A   0.00   3.61   3.61   5.10
B   3.61   0.00   5.10   2.24
C   3.61   5.10   0.00   5.00
D   5.10   2.24   5.00   0.00

This distance matrix represents the Euclidean distances between each
pair of data points. It can be used as input for various data mining
or clustering algorithms that rely on distance-based computations.
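
For completeness, a short SciPy sketch that reproduces this matrix from
the four points: pdist computes the pairwise Euclidean distances and
squareform arranges them as the symmetric matrix shown above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The four data points from the example above.
points = np.array([[2, 3],   # A
                   [5, 1],   # B
                   [4, 6],   # C
                   [7, 2]])  # D

# pdist returns the condensed pairwise Euclidean distances;
# squareform expands them into the full symmetric distance matrix.
dist_matrix = squareform(pdist(points, metric="euclidean"))
print(np.round(dist_matrix, 2))
# Expected (rounded): A-B 3.61, A-C 3.61, A-D 5.10,
#                     B-C 5.10, B-D 2.24, C-D 5.00
```
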
iii. The k-medoid algorithm is a clustering algorithm that aims to
partition a dataset into k clusters, where each cluster is represented by
a medoid, which is the most centrally located data point within the
cluster. It is a variation of the k-means algorithm that uses actual data
points as cluster centers instead of the mean.

The k-medoid algorithm follows these steps:

1. Initialization: Randomly select k data points from the dataset as
the initial medoids.

2. Assignment: Assign each data point to the nearest medoid based on a
chosen distance metric, such as Euclidean distance or Manhattan
distance.

3. Medoid Update: For each cluster, calculate the total dissimilarity
or distance between each data point in the cluster and all other data
points in that cluster. Select the data point with the lowest total
dissimilarity as the new medoid for that cluster.

4. Iteration: Repeat steps 2 and 3 until the medoids no longer change
or a stopping criterion is met. The stopping criterion could be a
predetermined number of iterations or a threshold for the change in
medoids.
The k-medoid algorithm is less sensitive to outliers compared to k-
means since medoids are actual data points from the dataset. However,
it can be computationally expensive, especially for large datasets, as it
requires calculating the dissimilarity between all pairs of data points.

Example:
Let's consider a dataset of customers with their annual income and
spending score. We want to cluster them into two groups using the k-
medoid algorithm.

Dataset:
Customer 1: (Income = $50,000, Spending Score = 60)
Customer 2: (Income = $30,000, Spending Score = 40)
Customer 3: (Income = $70,000, Spending Score = 90)
Customer 4: (Income = $80,000, Spending Score = 70)
Customer 5: (Income = $20,000, Spending Score = 20)

1. Initialization: Randomly select two data points as initial medoids:
Customer 1 and Customer 5.

2. Assignment: Calculate the distance between each data point and the
two medoids and assign it to the nearest one. With the raw values, the
income differences dominate the Euclidean distance, so:

Cluster 1 (medoid Customer 1): {Customer 1, Customer 3, Customer 4}
Cluster 2 (medoid Customer 5): {Customer 2, Customer 5}

3. Medoid Update: Within each cluster, compute each point's total
dissimilarity to the other members and select the point with the
lowest total as the new medoid.

Cluster 1: Customer 3 (new medoid)
Cluster 2: Customer 2 and Customer 5 are equally central, so the
medoid can remain Customer 5.

4. Iteration: Repeat steps 2 and 3 until the medoids no longer change.

Reassigning the points to the medoids Customer 3 and Customer 5 leaves
every cluster unchanged, so the algorithm converges and the final
clustering result is:

Cluster 1: {Customer 1, Customer 3, Customer 4}
Cluster 2: {Customer 2, Customer 5}

Each medoid represents its cluster, and the remaining customers are
assigned to the nearest medoid based on distance.
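
The walk-through above can be checked with a small NumPy sketch of the
k-medoid loop (a plain assignment/update cycle written for this
example, not a library implementation; the starting medoids are fixed
to Customer 1 and Customer 5 to mirror the example):

```python
import numpy as np

# (income, spending score) for Customers 1-5 from the example above.
X = np.array([[50_000, 60],
              [30_000, 40],
              [70_000, 90],
              [80_000, 70],
              [20_000, 20]], dtype=float)

# Pairwise Euclidean distance matrix between all customers.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

medoids = [0, 4]                       # start with Customer 1 and Customer 5
for _ in range(100):
    # Assignment step: each point joins the cluster of its nearest medoid.
    labels = dist[:, medoids].argmin(axis=1)
    # Update step: within each cluster, the point with the smallest total
    # distance to the other members becomes the new medoid.
    new_medoids = []
    for k in range(len(medoids)):
        members = np.where(labels == k)[0]
        costs = dist[np.ix_(members, members)].sum(axis=1)
        new_medoids.append(members[costs.argmin()])
    if new_medoids == medoids:         # converged: medoids stopped changing
        break
    medoids = new_medoids

# Note: Customers 2 and 5 are equally central within their cluster, so
# either one may be reported as that cluster's medoid.
for k, m in enumerate(medoids):
    cluster = [f"Customer {i + 1}" for i in np.where(labels == k)[0]]
    print(f"Cluster {k + 1} (medoid Customer {m + 1}): {cluster}")
```
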

iv. KDD (Knowledge Discovery in Databases) is a process used to
extract useful knowledge or insights from large volumes of data. It
involves several iterative steps:

1. Problem Definition: Clearly define the goal and objectives of the
knowledge discovery process, along with the specific data mining tasks
to be performed.

2. Data Preparation: Gather and preprocess the data required for
analysis. This involves data selection, cleaning, integration, and
transformation to ensure data quality and compatibility.

3. Data Mining: Apply various data mining techniques, such as
clustering, classification, regression, association rules, or anomaly
detection, to extract patterns, relationships, and insights from the
prepared data.

4. Pattern Evaluation: Evaluate and interpret the patterns and models
generated by the data mining algorithms. Assess their validity,
usefulness, and reliability in addressing the defined problem.

5. Knowledge Presentation: Present the discovered knowledge and
insights in a meaningful and understandable format to stakeholders.
This may involve visualization, reports, dashboards, or interactive
tools for decision-making.

6. Knowledge Utilization: Apply the discovered knowledge and insights
to make informed decisions, develop strategies, or improve processes in
the relevant domain.

The KDD process is iterative, and feedback from stakeholders and
domain experts is crucial at each step. It involves a combination of
domain knowledge, statistical analysis, machine learning, and data
visualization techniques to uncover hidden patterns and gain valuable
insights from data.
