Unit 5 Notes DWM

Market Basket Analysis is a technique used to analyze customer purchasing habits by identifying associations between items bought together, which helps retailers optimize marketing strategies and shelf placement. The Apriori algorithm is utilized to find frequent itemsets through an iterative process of joining and pruning itemsets based on minimum support thresholds. Cluster Analysis groups data into clusters based on similarity, allowing for insights into data patterns across various applications such as marketing and biology.

Uploaded by Gajanan Markad

Unit 5: Mining Frequent Patterns and Cluster Analysis

Q.1] Describe Market Basket Analysis


Market Basket Analysis:
 A typical example of frequent itemset mining is market basket analysis. This
process analyzes customer buying habits by finding associations between the
different items that customers place in their “shopping baskets”.
 The discovery of these associations can help retailers to develop marketing
strategies by gaining insight into which items are frequently purchased together
by customers.
 Example: If customers buy milk, how likely are they to also buy bread (and
what kind of bread) on the same trip to the supermarket?
 This information can lead to increased sales by helping retailers do selective
marketing and plan their shelf space.
Q.] Explain Market Basket Analysis with example.
Market Basket Analysis:
Market Basket Analysis is a modelling technique based upon the theory that if
you buy a certain group of items, you are more (or less) likely to buy another
group of items.
Ex: (Computer → Antivirus)
Market Basket Analysis is one of the key techniques used by large retailers to
uncover associations between items.
It works by looking for combinations of items that occur together frequently in
transactions, i.e., it allows retailers to identify relationships between the items
that people buy.
Market basket analysis can be used in deciding the location and promotion of
goods inside a store.
Market Basket Analysis creates If-Then scenario rules, for example, if item A is
purchased then item B is likely to be purchased.
Example: How is it used?
As a first step, market basket analysis can be used in deciding the location and
promotion of goods inside a store.
If, it has been observed, purchasers of Barbie dolls have been more likely to buy
candy, then high-margin candy can be placed near to the Barbie doll display.
Customers who would have bought candy with their Barbie dolls had they
thought of it will now be suitably tempted.
But this is only the first level of analysis. Differential market basket analysis
can find interesting results and can also eliminate the problem of a potentially
high volume of trivial results.
In differential analysis, we compare results between different stores, between
customers in different demographic groups, between different days of the week,
different seasons of the year, and so on.
If we observe that a rule holds in one store, but not in any other (or does not
hold in one store, but holds in all others), then we know that there is something
interesting about that store.
Investigating such differences may yield useful insights which will improve
company sales.
Market Basket Analysis is used for:
1. Analysis of credit card purchases.
2. Analysis of telephone calling patterns.
3. Identification of fraudulent medical insurance claims.
4. Analysis of telecom service purchases.
Q.] Explain Apriori algorithm with examples.
Apriori Algorithm:
The Apriori algorithm is used to find frequent itemsets because it exploits prior
knowledge of frequent-itemset properties. It applies an iterative, level-wise
search in which frequent k-itemsets are used to find (k+1)-itemsets.
This algorithm uses two steps, “join” and “prune” (prune means delete), to
reduce the search space.
Apriori says:
• An itemset x is not frequent if its support P(x) is less than the minimum
support threshold; by the Apriori property, every superset of an infrequent
itemset is also infrequent.
The steps followed in the Apriori Algorithm of data mining are:
1. Join Step: This step generates candidate (K+1)-itemsets from the frequent
K-itemsets by joining that set with itself.
2. Prune Step: This step scans the database to obtain the support count of each
candidate. If a candidate itemset does not meet the minimum support, it is
marked infrequent and removed. This step is performed to reduce the size of the
candidate itemsets.
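The two steps can be sketched as follows (a minimal illustration with itemsets represented as frozensets; the function names and the sample data in the demo are our own, and the join here simply unions pairs of itemsets, which over-generates candidates for k > 2 compared with the textbook join but is corrected by the prune step):

```python
def apriori_join(prev_frequent, k):
    """Join step: form candidate k-itemsets by unioning pairs of
    frequent (k-1)-itemsets whose union has exactly k items."""
    return {a | b for a in prev_frequent for b in prev_frequent
            if len(a | b) == k}

def apriori_prune(candidates, transactions, min_count):
    """Prune step: count each candidate's occurrences in the database
    and keep only those meeting the minimum support count."""
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_count}

# Tiny demo on a hypothetical 4-transaction database.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L1 = {frozenset([1]), frozenset([2]), frozenset([3]), frozenset([5])}
C2 = apriori_join(L1, 2)          # 6 candidate pairs
L2 = apriori_prune(C2, D, 2)      # keeps {1,3}, {2,3}, {2,5}, {3,5}
```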
Example Apriori Method: (consider any other relevant example)
Consider the given database D and minimum support 50%. Apply the Apriori
algorithm and find frequent itemsets with confidence greater than 70%

Solution:
Calculate min_supp=0.5*4=2 (support count is 2)
(0.5: given minimum support in problem, 4: total transactions in database D)
Step 1: Generate candidate list C1 from D
C1=

Step 2: Scan D for count of each candidate and find the support.
C1=

Step 3: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L1=

Step 4: Generate candidate list C2 from L1 (k-itemsets converted to k+1


itemsets)
C2=
Step 5: Scan D for count of each candidate and find the support.
C2=

Step 6: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L2=

Step 7: Generate candidate list C3 from L2


(k-itemsets converted to k+1 itemsets)
C3=

Step 8: Scan D for count of each candidate and find the support.
C3=

Step 9: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L3=

Here, {2,3,5} is the largest frequent itemset found using the Apriori method.
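The intermediate C/L tables above were images and are not reproduced here. Assuming, for illustration only, the widely used four-transaction database T1={1,3,4}, T2={2,3,5}, T3={1,2,3,5}, T4={2,5} (a guess consistent with the stated answer, not taken from the original tables), the whole level-wise process can be run end to end:

```python
# Hypothetical database consistent with the stated answer {2,3,5}.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 2  # 50% of 4 transactions

def apriori(transactions, min_count):
    """Level-wise Apriori: alternate join and prune until no candidate
    survives, accumulating every frequent itemset found."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets, counted directly from the database.
    Lk = {frozenset([i]) for i in items
          if sum(1 for t in transactions if i in t) >= min_count}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: (k-1)-itemsets -> candidate k-itemsets.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop candidates below the minimum support count.
        Lk = {c for c in Ck
              if sum(1 for t in transactions if c <= t) >= min_count}
        frequent |= Lk
        k += 1
    return frequent

result = apriori(D, min_count)
print(max(result, key=len))  # frozenset({2, 3, 5})
```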


Q.] Given the following data, apply the Apriori algorithm. Min support = 50
in Database D.

Q.] Define Frequent Item set.


Q.] Explain frequent item sets mining methods. OR Explain finding frequent
item sets using candidate generation
Frequent Itemset Mining:
Finding frequent patterns, associations, correlations, or causal structures among
sets of items or objects in transaction databases, relational databases, and other
information repositories.
Several algorithms have been used for generating frequent itemsets and
association rules, such as the Apriori algorithm and the FP-Growth algorithm.
The Apriori algorithm finds interesting associations among large sets of data
items and was one of the earliest algorithms proposed for the association rule
mining problem.
Q.] Explain steps for generating association rule from frequent item sets.
1. Generating itemsets from a list of items:
First step in generation of association rules is to get all the frequent itemsets on
which binary partitions can be performed to get the antecedent and the
consequent. For example, if there are 6 items {Bread, Butter, Egg, Milk,
Notebook, Toothbrush} across all the transactions combined, itemsets will look
like {Bread}, {Butter}, {Bread, Notebook}, {Milk, Toothbrush}, {Milk, Egg,
Butter}, etc. The size of an itemset can vary from one to the total number of
items that we have. From these, we seek only the frequent itemsets, not all of
them, so as to limit the total number of itemsets generated.
2. Generating all possible rules from the frequent itemsets:
Once the frequent itemsets are generated, identifying rules out of them is
comparatively less taxing. Rules are formed by binary partition of each itemset.
If {Bread,Egg,Milk,Butter} is the frequent itemset, candidate rules will look
like:
(Egg, Milk, Butter → Bread), (Bread, Milk, Butter → Egg), (Bread, Egg →
Milk, Butter), (Egg, Milk → Bread, Butter), (Butter → Bread, Egg, Milk)
From a list of all possible candidate rules, we aim to identify rules that fall
above a minimum confidence level (minconf). Just like the anti-monotone
property of support, confidence of rules generated from the same itemset also
follows an anti-monotone property. It is anti-monotone with respect to the
number of elements in consequent.
This means that confidence of (A,B,C→ D) ≥ (B,C → A,D) ≥ (C → A,B,D). To
remind, confidence for {X → Y} = support of {X,Y}/support of {X}
The support of all the rules generated from the same itemset is the same; the
difference occurs only in the denominator of the confidence calculation. As the
number of items in X decreases, support{X} increases (as follows from the
anti-monotone property of support), and hence the confidence value decreases.
An intuitive explanation for the above is as follows. Consider F1 and F2:
F1 = fraction of transactions having (butter) that also have (egg, milk, bread).
There will be many transactions having butter, and all three of egg, milk and
bread will appear in only a small fraction of those.
F2 = fraction of transactions having (milk, butter, bread) that also have (egg).
There will be only a handful of transactions having all three of milk, butter and
bread (as compared to having just butter), and there will be a high chance of
having egg in those.
So, it will be observed that F1 < F2. Using this property of confidence, pruning
is done in a similar way as was done while looking for frequent itemsets. It is
illustrated in the figure below.

We start with a frequent itemset {a,b,c,d} and form rules with just one item in
the consequent. Remove the rules failing to satisfy the minconf condition. Then
form rules using combinations of consequents from the remaining ones. Keep
repeating until only one item is left in the antecedent. This process has to be
done for all frequent itemsets.
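The rule-generation step can be sketched as a brute-force enumeration of the binary partitions of one frequent itemset (the transactions and the 0.7 threshold here are illustrative assumptions of ours; the anti-monotone consequent pruning described above is omitted for brevity):

```python
from itertools import combinations

# Hypothetical transactions for illustration.
transactions = [
    {"bread", "egg", "milk", "butter"},
    {"bread", "egg", "milk"},
    {"bread", "butter"},
    {"egg", "milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from(itemset, minconf):
    """Enumerate every binary partition A -> B of itemset and keep the
    rules with confidence = support(itemset) / support(A) >= minconf."""
    itemset = frozenset(itemset)
    kept = []
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= minconf:
                kept.append((set(antecedent), set(consequent), conf))
    return kept

# Both egg -> milk and milk -> egg survive with confidence 1.00 here.
for a, b, c in rules_from({"egg", "milk"}, minconf=0.7):
    print(a, "->", b, f"confidence={c:.2f}")
```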
Q.] Describe association rules in mining frequent patterns and cluster analysis.
Q.] Explain Apriori algorithms for frequent itemset using candidate generation.
Q.] Describe Cluster Analysis
Cluster Analysis:
Clustering is a data mining technique used to place data elements into related
groups without advance knowledge.
Clustering is the process of grouping a set of data objects into multiple groups
or clusters so that objects within a cluster have high similarity, but are very
dissimilar to objects in other clusters.
Dissimilarities and similarities are assessed based on the attribute values
describing the objects and often involve distance measures.
Cluster analysis or simply clustering is the process of partitioning a set of data
objects (or observations) into subsets.
Each subset is a cluster, such that objects in a cluster are similar to one another,
yet dissimilar to objects in other clusters. The set of clusters resulting from a
cluster analysis can be referred to as a clustering.

Requirements of Cluster Analysis:


 Scalability: Need highly scalable clustering algorithms to deal with large
databases.
 Ability to deal with different kinds of attributes: Algorithms should be
capable of being applied to any kind of data, such as interval-based (numerical),
categorical, and binary data.
 Discovery of clusters with arbitrary shape: The clustering algorithm should be
capable of detecting clusters of arbitrary shape. It should not be bounded to
distance measures that tend to find only spherical clusters of small size.
 High dimensionality: The clustering algorithm should be able to handle not
only low-dimensional data but also high-dimensional spaces.
 Ability to deal with noisy data: Databases contain noisy, missing or erroneous
data. Some algorithms are sensitive to such data and may lead to poor quality
clusters.
 Interpretability: The clustering results should be interpretable,
comprehensible, and usable.
Example: K-means (any relevant example like this)
k-means algorithm to create 3 clusters for given set of values:
{2,3,6,8,9,12,15,18,22}

Answer:
Set of values: 2,3,6,8,9,12,15,18,22
1. Break the given set of values randomly into 3 clusters and calculate the mean
value of each.
K1: 2,8,15 mean=8.3
K2: 3,9,18 mean=10
K3: 6,12,22 mean=13.3
2. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 2,3,6,8,9 mean=5.6
K2: (empty) mean=0
K3: 12,15,18,22 mean=16.75
3. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 3,6,8,9 mean=6.5
K2: 2 mean=2
K3: 12,15,18,22 mean=16.75
4. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75
5. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75

6. The means of all three clusters remain the same, so the clusters are stable and
the algorithm stops.


Final 3 clusters are {6,8,9}, {2,3}, {12,15,18,22}
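The reassignment steps of this trace can be reproduced with a small one-step helper (a sketch; the function name is ours, and an empty cluster is given mean 0, matching the convention used in step 2 above):

```python
def kmeans_step(values, clusters):
    """One k-means iteration: compute each cluster's mean, then reassign
    every value to the cluster whose mean is nearest.
    An empty cluster is treated as having mean 0."""
    means = [sum(c) / len(c) if c else 0 for c in clusters]
    new = [[] for _ in clusters]
    for v in values:
        nearest = min(range(len(means)), key=lambda i: abs(v - means[i]))
        new[nearest].append(v)
    return new

values = [2, 3, 6, 8, 9, 12, 15, 18, 22]
k = [[2, 8, 15], [3, 9, 18], [6, 12, 22]]  # step 1: random split
k = kmeans_step(values, k)  # step 2 -> [[2, 3, 6, 8, 9], [], [12, 15, 18, 22]]
k = kmeans_step(values, k)  # step 3 -> [[3, 6, 8, 9], [2], [12, 15, 18, 22]]
k = kmeans_step(values, k)  # step 4 -> [[6, 8, 9], [2, 3], [12, 15, 18, 22]]
```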

Applications of Clustering:
1. Marketing
2. Biology
3. Libraries
4. Insurance
5. City-planning
6. WWW
