Association Rule
Association Rule
Jing Gao
SUNY Buffalo
1
Association Rule Mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction
Market-Basket transactions
Example of Association Rules
TID Items
{Diaper} {Beer},
1 Bread, Milk {Milk, Bread} {Eggs,Coke},
2 Bread, Diaper, Beer, Eggs {Beer, Bread} {Milk},
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer Implication means co-occurrence,
5 Bread, Milk, Diaper, Coke not causality!
Definition: Frequent Itemset
• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset TID Items
• An itemset that contains k items 1 Bread, Milk
• Support count () 2 Bread, Diaper, Beer, Eggs
– Frequency of occurrence of an itemset 3 Milk, Diaper, Beer, Coke
– E.g. ({Milk, Bread,Diaper}) = 2 4 Bread, Milk, Diaper, Beer
• Support 5 Bread, Milk, Diaper, Coke
– Fraction of transactions that contain an
itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than
or equal to a minsup threshold
Definition: Association Rule
Association Rule
TID Items
– An implication expression of the form X
1 Bread, Milk
Y, where X and Y are itemsets
2 Bread, Diaper, Beer, Eggs
– Example:
3 Milk, Diaper, Beer, Coke
{Milk, Diaper} {Beer}
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Rule Evaluation Metrics
– Support (s)
Fraction of transactions that contain both Example:
X and Y {Milk, Diaper} Beer
– Confidence (c)
(Milk , Diaper, Beer) 2
Measures how often items in Y
appear in transactions that
s 0.4
|T| 5
contain X
(Milk, Diaper, Beer) 2
c 0.67
(Milk , Diaper ) 3
Association Rule Mining Task
• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
Mining Association Rules
Example of Rules:
TID Items
1 Bread, Milk {Milk,Diaper} {Beer} (s=0.4, c=0.67)
2 Bread, Diaper, Beer, Eggs {Milk,Beer} {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} {Milk} (s=0.4, c=0.67)
3 Milk, Diaper, Beer, Coke
{Beer} {Milk,Diaper} (s=0.4, c=0.67)
4 Bread, Milk, Diaper, Beer
{Diaper} {Milk,Beer} (s=0.4, c=0.5)
5 Bread, Milk, Diaper, Coke {Milk} {Diaper,Beer} (s=0.4, c=0.5)
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support minsup
2. Rule Generation
– Generate high confidence rules from each frequent
itemset, where each rule is a binary partitioning of a
frequent itemset
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
X ,Y : ( X Y ) s( X ) s(Y )
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of support
Illustrating Apriori Principle
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
Found to be
Infrequent
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
Pruned
ABCDE
supersets
The Apriori Algorithm—An Example
Supmin = 2 Itemset sup
Itemset sup
Database TDB {A} 2
Tid Items
L1 {A} 2
C1 {B} 3
{B} 3
10 A, C, D {C} 3
1st scan {C} 3
20 B, C, E {D} 1
{E} 3
30 A, B, C, E {E} 3
40 B, E
C2 Itemset sup C2 Itemset
{A, B} 1
L2 Itemset sup 2nd scan {A, B}
{A, C} 2
{A, C} 2 {A, C}
{A, E} 1
{B, C} 2
{B, C} 2 {A, E}
{B, E} 3
{B, E} 3 {B, C}
{C, E} 2
{C, E} 2 {B, E}
{C, E}
• Potential Issues
– What if attribute has many possible values
• Example: attribute country has more than 200 possible
values
• Many of the attribute values may have very low support
– Potential solution: Aggregate the low-support attribute values
• Different methods:
– Discretization-based
– Statistics-based
Question
17