Unit 5 Notes DWM

Market Basket Analysis is a technique used to analyze customer purchasing habits by identifying associations between items bought together, which helps retailers optimize marketing strategies and shelf placement. The Apriori algorithm is utilized to find frequent itemsets through an iterative process of joining and pruning itemsets based on minimum support thresholds. Cluster Analysis groups data into clusters based on similarity, allowing for insights into data patterns across various applications such as marketing and biology.

Uploaded by Gajanan Markad

Unit 5: Mining Frequent Patterns and Cluster Analysis

Q.1] Describe Market Basket Analysis


Market Basket Analysis:
 A typical example of frequent itemset mining is market basket analysis. This
process analyzes customer buying habits by finding associations between the
different items that customers place in their “shopping baskets”.
 The discovery of these associations can help retailers to develop marketing
strategies by gaining insight into which items are frequently purchased together
by customers.
 Example: If customers buy milk, how likely are they to also buy bread (and
what kind of bread) on the same trip to the supermarket?
 This information can lead to increased sales by helping retailers do selective
marketing and plan their shelf space.
Q.] Explain Market Basket Analysis with example.
Market Basket Analysis:
Market Basket Analysis is a modelling technique based upon the theory that if
you buy a certain group of items, you are more (or less) likely to buy another
group of items.
Ex: (Computer → Antivirus)
Market Basket Analysis is one of the key techniques used by large retailers to
uncover associations between items.
It works by looking for combinations of items that occur together frequently in
transactions, i.e., it allows retailers to identify relationships between the items
that people buy.
Market basket analysis can be used in deciding the location and promotion of
goods inside a store.
Market Basket Analysis creates If-Then scenario rules, for example, if item A is
purchased then item B is likely to be purchased.
Example: How is it used?
As a first step, market basket analysis can be used in deciding the location and
promotion of goods inside a store.
If, it has been observed, purchasers of Barbie dolls have been more likely to buy
candy, then high-margin candy can be placed near to the Barbie doll display.
Customers who would have bought candy with their Barbie dolls had they
thought of it will now be suitably tempted.
But this is only the first level of analysis. Differential market basket analysis
can find interesting results and can also eliminate the problem of a potentially
high volume of trivial results.
In differential analysis, we compare results between different stores, between
customers in different demographic groups, between different days of the week,
different seasons of the year, and so on.
If we observe that a rule holds in one store, but not in any other (or does not
hold in one store, but holds in all others), then we know that there is something
interesting about that store.
Investigating such differences may yield useful insights which will improve
company sales.
Market Basket Analysis is used for:
1. Analysis of credit card purchases.
2. Analysis of telephone calling patterns.
3. Identification of fraudulent medical insurance claims.
4. Analysis of telecom service purchases.
Q.] Explain Apriori algorithm with examples.
Apriori Algorithm:
The Apriori algorithm is used to find frequent itemsets because it exploits prior
knowledge of frequent-itemset properties. It applies an iterative, level-wise
search in which frequent k-itemsets are used to find (k+1)-itemsets.
This algorithm uses two steps, “join” and “prune” (prune means delete), to
reduce the search space.
Apriori says:
• An itemset x is not frequent if its support P(x) is less than the minimum
support threshold; by the Apriori property, every superset of an infrequent
itemset is also infrequent.
The steps followed in the Apriori Algorithm of data mining are:
1. Join Step: This step generates candidate (K+1)-itemsets from the frequent
K-itemsets by joining that set with itself.
2. Prune Step: This step scans the database to obtain the support count of each
candidate. If a candidate itemset does not meet the minimum support, it is
marked infrequent and removed. This step is performed to reduce the size of the
candidate itemsets.
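The two steps can be sketched as follows (a minimal illustration with itemsets represented as frozensets; the function names and the sample data in the demo are our own, and the join here simply unions pairs of itemsets, which over-generates candidates for k > 2 compared with the textbook join but is corrected by the prune step):

```python
def apriori_join(prev_frequent, k):
    """Join step: form candidate k-itemsets by unioning pairs of
    frequent (k-1)-itemsets whose union has exactly k items."""
    return {a | b for a in prev_frequent for b in prev_frequent
            if len(a | b) == k}

def apriori_prune(candidates, transactions, min_count):
    """Prune step: count each candidate's occurrences in the database
    and keep only those meeting the minimum support count."""
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_count}

# Tiny demo on a hypothetical 4-transaction database.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L1 = {frozenset([1]), frozenset([2]), frozenset([3]), frozenset([5])}
C2 = apriori_join(L1, 2)          # 6 candidate pairs
L2 = apriori_prune(C2, D, 2)      # keeps {1,3}, {2,3}, {2,5}, {3,5}
```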
Example Apriori Method: (consider any other relevant example)
Consider the given database D and minimum support 50%. Apply the Apriori
algorithm and find frequent itemsets with confidence greater than 70%

Solution:
Calculate min_supp=0.5*4=2 (support count is 2)
(0.5: given minimum support in problem, 4: total transactions in database D)
Step 1: Generate candidate list C1 from D
C1=

Step 2: Scan D for count of each candidate and find the support.
C1=

Step 3: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L1=

Step 4: Generate candidate list C2 from L1 (k-itemsets converted to k+1


itemsets)
C2=
Step 5: Scan D for count of each candidate and find the support.
C2=

Step 6: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L2=

Step 7: Generate candidate list C3 from L2


(k-itemsets converted to k+1 itemsets)
C3=

Step 8: Scan D for count of each candidate and find the support.
C3=

Step 9: Compare candidate support count with min_supp (i.e. 2)


(prune or remove the itemset which have support count less than min_supp i.e.
2)
L3=

Here, {2,3,5} is the largest frequent itemset found using the Apriori method.
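The intermediate C/L tables above were images and are not reproduced here. Assuming, for illustration only, the widely used four-transaction database T1={1,3,4}, T2={2,3,5}, T3={1,2,3,5}, T4={2,5} (a guess consistent with the stated answer, not taken from the original tables), the whole level-wise process can be run end to end:

```python
# Hypothetical database consistent with the stated answer {2,3,5}.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 2  # 50% of 4 transactions

def apriori(transactions, min_count):
    """Level-wise Apriori: alternate join and prune until no candidate
    survives, accumulating every frequent itemset found."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets, counted directly from the database.
    Lk = {frozenset([i]) for i in items
          if sum(1 for t in transactions if i in t) >= min_count}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: (k-1)-itemsets -> candidate k-itemsets.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop candidates below the minimum support count.
        Lk = {c for c in Ck
              if sum(1 for t in transactions if c <= t) >= min_count}
        frequent |= Lk
        k += 1
    return frequent

result = apriori(D, min_count)
print(max(result, key=len))  # frozenset({2, 3, 5})
```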


Q.] Given the following data, apply the Apriori algorithm. Min support = 50
in Database D.

Q.] Define Frequent Item set.


Q.] Explain frequent item sets mining methods. OR Explain finding frequent
item sets using candidate generation
Frequent Itemset Mining:
Finding frequent patterns, associations, correlations, or causal structures among
sets of items or objects in transaction databases, relational databases, and other
information repositories.
Several algorithms have been used for generating frequent itemsets and
association rules, such as the Apriori algorithm and the FP-Growth algorithm.
The Apriori algorithm finds interesting associations among large sets of data
items and was one of the earliest algorithms proposed for the association rule
mining problem.
Q.] Explain steps for generating association rule from frequent item sets.
1. Generating itemsets from a list of items:
First step in generation of association rules is to get all the frequent itemsets on
which binary partitions can be performed to get the antecedent and the
consequent. For example, if there are 6 items {Bread, Butter, Egg, Milk,
Notebook, Toothbrush} across all the transactions combined, itemsets will look
like {Bread}, {Butter}, {Bread, Notebook}, {Milk, Toothbrush}, {Milk, Egg,
Butter}, etc. The size of an itemset can vary from one to the total number of
items that we have. From these, we seek only the frequent itemsets, not all of
them, so as to limit the total number of itemsets generated.
2. Generating all possible rules from the frequent itemsets:
Once the frequent itemsets are generated, identifying rules out of them is
comparatively less taxing. Rules are formed by binary partition of each itemset.
If {Bread,Egg,Milk,Butter} is the frequent itemset, candidate rules will look
like:
(Egg, Milk, Butter → Bread), (Bread, Milk, Butter → Egg), (Bread, Egg →
Milk, Butter), (Egg, Milk → Bread, Butter), (Butter → Bread, Egg, Milk)
From a list of all possible candidate rules, we aim to identify rules that fall
above a minimum confidence level (minconf). Just like the anti-monotone
property of support, confidence of rules generated from the same itemset also
follows an anti-monotone property. It is anti-monotone with respect to the
number of elements in consequent.
This means that confidence of (A,B,C→ D) ≥ (B,C → A,D) ≥ (C → A,B,D). To
remind, confidence for {X → Y} = support of {X,Y}/support of {X}
The support of all the rules generated from the same itemset is the same; the
difference occurs only in the denominator of the confidence calculation. As the
number of items in X decreases, support{X} increases (as follows from the
anti-monotone property of support), and hence the confidence value decreases.
An intuitive explanation for the above is as follows. Consider F1 and F2:
F1 = fraction of transactions having (butter) that also have (egg, milk, bread).
There will be many transactions having butter, and all three of egg, milk and
bread will appear in only a small fraction of those.
F2 = fraction of transactions having (milk, butter, bread) that also have (egg).
There will be only a handful of transactions having all three of milk, butter and
bread (as compared to having just butter), and there will be a high chance of
having egg in those.
So, it will be observed that F1 < F2. Using this property of confidence, pruning
is done in a similar way as was done while looking for frequent itemsets. It is
illustrated in the figure below.

We start with a frequent itemset {a,b,c,d} and form rules with just one item in
the consequent. Remove the rules failing to satisfy the minconf condition. Then
form rules using combinations of consequents from the remaining ones. Keep
repeating until only one item is left in the antecedent. This process has to be
done for all frequent itemsets.
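The rule-generation step can be sketched as a brute-force enumeration of the binary partitions of one frequent itemset (the transactions and the 0.7 threshold here are illustrative assumptions of ours; the anti-monotone consequent pruning described above is omitted for brevity):

```python
from itertools import combinations

# Hypothetical transactions for illustration.
transactions = [
    {"bread", "egg", "milk", "butter"},
    {"bread", "egg", "milk"},
    {"bread", "butter"},
    {"egg", "milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from(itemset, minconf):
    """Enumerate every binary partition A -> B of itemset and keep the
    rules with confidence = support(itemset) / support(A) >= minconf."""
    itemset = frozenset(itemset)
    kept = []
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= minconf:
                kept.append((set(antecedent), set(consequent), conf))
    return kept

# Both egg -> milk and milk -> egg survive with confidence 1.00 here.
for a, b, c in rules_from({"egg", "milk"}, minconf=0.7):
    print(a, "->", b, f"confidence={c:.2f}")
```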
Q.] Describe association rules in mining frequent patterns and cluster analysis.
Q.] Explain Apriori algorithms for frequent itemset using candidate generation.
Q.] Describe Cluster Analysis
Cluster Analysis:
Clustering is a data mining technique used to place data elements into related
groups without advance knowledge.
Clustering is the process of grouping a set of data objects into multiple groups
or clusters so that objects within a cluster have high similarity, but are very
dissimilar to objects in other clusters.
Dissimilarities and similarities are assessed based on the attribute values
describing the objects and often involve distance measures.
Cluster analysis or simply clustering is the process of partitioning a set of data
objects (or observations) into subsets.
Each subset is a cluster, such that objects in a cluster are similar to one another,
yet dissimilar to objects in other clusters. The set of clusters resulting from a
cluster analysis can be referred to as a clustering.

Requirements of Cluster Analysis:


 Scalability: Need highly scalable clustering algorithms to deal with large
databases.
 Ability to deal with different kinds of attributes: Algorithms should be
capable of being applied to any kind of data, such as interval-based (numerical),
categorical, and binary data.
 Discovery of clusters with arbitrary shape: The clustering algorithm should be
capable of detecting clusters of arbitrary shape. It should not be bounded to
distance measures that tend to find only spherical clusters of small size.
 High dimensionality: The clustering algorithm should be able to handle not
only low-dimensional data but also high-dimensional spaces.
 Ability to deal with noisy data: Databases contain noisy, missing or erroneous
data. Some algorithms are sensitive to such data and may lead to poor quality
clusters.
 Interpretability: The clustering results should be interpretable,
comprehensible, and usable.
Example: K-means (any relevant example like this)
k-means algorithm to create 3 clusters for given set of values:
{2,3,6,8,9,12,15,18,22}

Answer:
Set of values: 2,3,6,8,9,12,15,18,22
1. Break the given set of values randomly into 3 clusters and calculate the mean
value of each.
K1: 2,8,15 mean=8.3
K2: 3,9,18 mean=10
K3: 6,12,22 mean=13.3
2. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 2,3,6,8,9 mean=5.6
K2: (empty) mean=0
K3: 12,15,18,22 mean=16.75
3. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 3,6,8,9 mean=6.5
K2: 2 mean=2
K3: 12,15,18,22 mean=16.75
4. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75
5. Reassign the values to clusters as per the mean calculated and calculate the
mean again.
K1: 6,8,9 mean=7.67
K2: 2,3 mean=2.5
K3: 12,15,18,22 mean=16.75

6. The means of all three clusters remain the same, so the clusters are stable and
the algorithm stops.


Final 3 clusters are {6,8,9}, {2,3}, {12,15,18,22}
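The reassignment steps of this trace can be reproduced with a small one-step helper (a sketch; the function name is ours, and an empty cluster is given mean 0, matching the convention used in step 2 above):

```python
def kmeans_step(values, clusters):
    """One k-means iteration: compute each cluster's mean, then reassign
    every value to the cluster whose mean is nearest.
    An empty cluster is treated as having mean 0."""
    means = [sum(c) / len(c) if c else 0 for c in clusters]
    new = [[] for _ in clusters]
    for v in values:
        nearest = min(range(len(means)), key=lambda i: abs(v - means[i]))
        new[nearest].append(v)
    return new

values = [2, 3, 6, 8, 9, 12, 15, 18, 22]
k = [[2, 8, 15], [3, 9, 18], [6, 12, 22]]  # step 1: random split
k = kmeans_step(values, k)  # step 2 -> [[2, 3, 6, 8, 9], [], [12, 15, 18, 22]]
k = kmeans_step(values, k)  # step 3 -> [[3, 6, 8, 9], [2], [12, 15, 18, 22]]
k = kmeans_step(values, k)  # step 4 -> [[6, 8, 9], [2, 3], [12, 15, 18, 22]]
```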

Applications of Clustering:
1. Marketing
2. Biology
3. Libraries
4. Insurance
5. City-planning
6. WWW
