
Customer Analytics at Bigbasket – Product Recommendations
Association Rule Mining
• Given a set of transactions, find rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction

Market-basket transactions:

  TID   Items
  1     Bread, Milk
  2     Bread, Diaper, Beer, Eggs
  3     Milk, Diaper, Beer, Coke
  4     Bread, Milk, Diaper, Beer
  5     Bread, Milk, Diaper, Coke

Examples of association rules:
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
Definition: Frequent Itemset
• Itemset
  • A collection of one or more items
  • Example: {Milk, Bread, Diaper}
• k-itemset
  • An itemset that contains k items
• Support count (σ)
  • Frequency of occurrence of an itemset
  • E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  • Fraction of transactions that contain an itemset
  • E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  • An itemset whose support is greater than or equal to a minsup threshold

(Counts refer to the market-basket transactions above.)
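These definitions translate directly into a few lines of code. Below is a minimal Python sketch (an illustration, not from the original deck) that computes the support count and support of {Milk, Bread, Diaper} over the five example transactions.

# Minimal sketch: support count and support of an itemset
# over the example market-basket transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

itemset = {"Milk", "Bread", "Diaper"}
print(support_count(itemset, transactions))  # 2
print(support(itemset, transactions))        # 0.4  (= 2/5)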
Definition: Association Rule
• Association Rule
  • An implication expression of the form X → Y, where X and Y are itemsets
  • Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
  • Support (s)
    • Fraction of transactions that contain both X and Y
  • Confidence (c)
    • Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}
  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
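Both metrics are ratios of support counts. A minimal Python sketch (again an illustration, not the deck's own code) for the rule {Milk, Diaper} → {Beer}:

# Minimal sketch: support and confidence of the rule {Milk, Diaper} -> {Beer}.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
support = support_count(X | Y) / len(transactions)      # 2/5 = 0.4
confidence = support_count(X | Y) / support_count(X)    # 2/3 ≈ 0.67
print(support, confidence)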
Association Rule Mining Task
• Given a set of transactions T, the goal of association rule mining is to
find all rules having
• support ≥ minsup threshold
• confidence ≥ minconf threshold
• Brute-force approach:
• List all possible association rules
• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
⇒ Computationally prohibitive!
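To see why brute force blows up: with d distinct items there are 3^d − 2^(d+1) + 1 possible rules, since each item can go to the antecedent, the consequent, or neither, minus the cases where either side is empty. The short sketch below (an illustration, not from the original deck) assumes d = 6 as in the example basket and checks the closed form by direct enumeration.

# Illustration only: count all candidate rules X -> Y (X, Y non-empty, disjoint)
# over d items by brute-force enumeration, and compare with the closed form.
from itertools import combinations

items = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]  # d = 6
d = len(items)

count = 0
for k in range(1, d + 1):                 # items actually used by the rule
    for used in combinations(items, k):
        for j in range(1, k):             # split them into antecedent / consequent
            count += len(list(combinations(used, j)))

print(count)                              # 602
print(3**d - 2**(d + 1) + 1)              # 602, closed form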
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset, where each rule is a binary
partitioning of a frequent itemset
Frequent Itemset Generation: Apriori
• Apriori principle:
• If an itemset is frequent, then all of its subsets must also be frequent
• Apriori principle holds due to the following property of the support measure:

  ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

• Support of an itemset never exceeds the support of its subsets
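As a quick sanity check on the example transactions (not part of the original deck), the support of a superset is never larger than the support of any of its subsets:

# Quick check of the Apriori property on the example transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

X = {"Milk", "Bread"}
Y = {"Milk", "Bread", "Diaper"}          # X is a subset of Y
assert support(X) >= support(Y)          # 3/5 >= 2/5
print(support(X), support(Y))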
Frequent Itemset Generation: Apriori

[Itemset lattice diagram over items A–E: the empty set (null) at the top, then the 1-itemsets A … E, the 2-itemsets AB … DE, and so on down to ABCDE. Once an itemset (e.g. AB) is found to be infrequent, all of its supersets are pruned from the search.]
Frequent Itemset Generation: Apriori
• A level-wise, candidate-generation-and-test approach (Agrawal & Srikant 1994)

Database D (min_sup = 2):
  TID   Items
  10    a, c, d
  20    b, c, e
  30    a, b, c, e
  40    b, e

Scan D, count 1-candidates:    a: 2, b: 3, c: 3, d: 1, e: 3
Frequent 1-itemsets:           a: 2, b: 3, c: 3, e: 3
Generate 2-candidates:         ab, ac, ae, bc, be, ce
Scan D, count 2-candidates:    ab: 1, ac: 2, ae: 1, bc: 2, be: 3, ce: 2
Frequent 2-itemsets:           ac: 2, bc: 2, be: 3, ce: 2
Generate 3-candidates:         bce
Scan D, count 3-candidates:    bce: 2
Frequent 3-itemsets:           bce: 2
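The level-wise procedure above is straightforward to express in code. The following is a minimal Python sketch of Apriori-style frequent-itemset generation (an illustration on the same toy database with min_sup = 2, not the slides' own implementation).

# Minimal sketch of level-wise (Apriori-style) frequent itemset generation.
from itertools import combinations

database = {10: {"a", "c", "d"}, 20: {"b", "c", "e"},
            30: {"a", "b", "c", "e"}, 40: {"b", "e"}}
min_sup = 2

def count(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in database.values() if itemset <= t)

# Level 1: frequent single items.
items = sorted({i for t in database.values() for i in t})
frequent = [{frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}]

k = 1
while frequent[-1]:
    # Candidate generation: join frequent k-itemsets into (k+1)-itemsets,
    # then prune candidates that have an infrequent k-subset (Apriori principle).
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[-1] for s in combinations(c, k))}
    # Candidate test: scan the database and keep those meeting min_sup.
    frequent.append({c for c in candidates if count(c) >= min_sup})
    k += 1

for level, sets in enumerate(frequent, start=1):
    for s in sorted(sets, key=sorted):
        print(level, sorted(s), count(s))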
Apriori: Example 1

Transactions (Minimum Support = 3):
  TID   Items
  1     Bread, Milk
  2     Bread, Diaper, Beer, Eggs
  3     Milk, Diaper, Beer, Coke
  4     Bread, Milk, Diaper, Beer
  5     Bread, Milk, Diaper, Coke

Items (1-itemsets):
  Item     Count
  Bread    4
  Coke     2
  Milk     4
  Beer     3
  Diaper   4
  Eggs     1

Pairs (2-itemsets):
  Itemset            Count
  {Bread, Milk}      3
  {Bread, Beer}      2
  {Bread, Diaper}    3
  {Milk, Beer}       2
  {Milk, Diaper}     3
  {Beer, Diaper}     3
  (No need to generate candidates involving Coke or Eggs.)

Triplets (3-itemsets):
  Itemset                  Count
  {Bread, Milk, Diaper}    2
Rule Generation
• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, candidate rules:
  ABC → D, ABD → C, ACD → B, BCD → A,
  A → BCD, B → ACD, C → ABD, D → ABC,
  AB → CD, AC → BD, AD → BC, BC → AD,
  BD → AC, CD → AB
• If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L)
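A minimal Python sketch of this step (an illustration, not the slides' code): take one frequent itemset, form every binary partition, and keep the rules whose confidence meets a threshold. It reuses the earlier Bread/Milk transactions and takes {Bread, Milk, Diaper} as the frequent itemset, with an assumed minconf of 0.6.

# Minimal sketch: generate candidate rules f -> (L - f) from one frequent
# itemset L and keep those meeting a minimum confidence threshold.
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

L = frozenset({"Bread", "Milk", "Diaper"})   # the frequent itemset
min_conf = 0.6

for size in range(1, len(L)):                # every non-empty proper subset f of L
    for f in combinations(L, size):
        f = frozenset(f)
        conf = support_count(L) / support_count(f)
        if conf >= min_conf:
            print(sorted(f), "->", sorted(L - f), round(conf, 2))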
Association Rules: Applications
• Supermarket shelf management
  • Goal: identify items that are bought together by sufficiently many customers
  • Approach: process point-of-sale (POS) data to find dependencies among items
  • Example:
    • If a customer buys diapers and milk, then they are very likely to also buy beer
    • So stock six-packs of beer next to the diapers
Cosine Similarity
Item-item similarity: cosine similarity
  S1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
  S2 = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
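The slide gives only the two example vectors. As a reminder (an addition, not from the original deck), cosine similarity is the dot product divided by the product of the vector norms, cos(S1, S2) = (S1 · S2) / (‖S1‖ ‖S2‖). A minimal Python sketch for the two vectors above:

# Minimal sketch: cosine similarity between the two example vectors.
import math

S1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
S2 = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

dot = sum(a * b for a, b in zip(S1, S2))     # 4 overlapping 1s
norm1 = math.sqrt(sum(a * a for a in S1))    # sqrt(6)
norm2 = math.sqrt(sum(b * b for b in S2))    # sqrt(8)

print(dot / (norm1 * norm2))                 # ≈ 0.577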
