Association Rule Mining Spring 2022
Market-Basket transactions
Example of Association Rules
{Diaper} → {Cereal}
{Milk, Bread} → {Eggs, Coke}
{Cereal, Bread} → {Milk}
Definition: Frequent Itemset
🞂 Itemset
🞂 A collection of one or more items
🞂 E.g. {Milk, Bread, Diaper}
🞂 k-itemset
🞂 An itemset that contains k items
🞂 Support count (σ)
🞂 Frequency of occurrence of an itemset
🞂 E.g. σ({Milk, Bread, Diaper}) = 2
🞂 Support
🞂 Fraction of transactions that contain an itemset
🞂 E.g. s({Milk, Bread, Diaper}) = 2/5
🞂 Frequent Itemset
🞂 An itemset whose support is greater than or equal to a min support threshold
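As a quick sketch of these definitions in code, the snippet below computes the support count σ(X) and the support s(X) over a small hypothetical transaction list; the five baskets are assumptions chosen so that σ({Milk, Bread, Diaper}) = 2 and s = 2/5, matching the example above.

```python
# Hypothetical market-basket transactions (assumed for illustration).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Cereal", "Eggs"},
    {"Milk", "Diaper", "Cereal", "Coke"},
    {"Bread", "Milk", "Diaper", "Cereal"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, db):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, db) / len(db)

X = {"Milk", "Bread", "Diaper"}
print(support_count(X, transactions), support(X, transactions))  # 2 and 0.4, i.e. 2/5
```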
Definitions
• Association Rule
– An implication X → Y, where X and Y are itemsets
• For the rule A ⇒ C:
– support = support({A} ∪ {C}) = 50%
– confidence = support({A} ∪ {C}) / support({A}) = 66.6%
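A minimal sketch of these rule-level measures in code; the four-transaction database below is an assumption chosen so that the rule A ⇒ C comes out at support 50% and confidence 66.6%, as in the example above (it is not the slide's original table).

```python
def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def rule_measures(lhs, rhs, db):
    """Support and confidence of the rule lhs -> rhs."""
    s_rule = support(lhs | rhs, db)           # support(X -> Y) = support(X ∪ Y)
    return s_rule, s_rule / support(lhs, db)  # confidence = support(X ∪ Y) / support(X)

# Hypothetical database in which A ⇒ C has support 50% and confidence 66.6%.
db = [{"A", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}]
s, c = rule_measures({"A"}, {"C"}, db)
print(round(s, 2), round(c, 3))  # 0.5 0.667
```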
Apriori Algorithm
🞂 Apriori property: any subset of a frequent itemset must be frequent
🞂 if {cereal, diaper, nuts} is frequent, so is {cereal, diaper}
🞂 Every transaction having {cereal, diaper, nuts} also contains {cereal, diaper}
[Figure: itemset lattice over items A, B, C, D, with 1-itemsets A, B, C, D and 2-itemsets AB, AC, AD, BC, BD, CD.]
Apriori Algorithm
🞂 Method:
🞂 generate length (k+1) candidate itemsets from length-k frequent itemsets, and
🞂 test the candidates against the DB
🞂 Performance studies show its efficiency and scalability
Illustration of the Apriori principle
[Figure: itemset lattice in which an itemset found to be infrequent has all of its supersets marked infrequent and pruned.]
The Apriori algorithm
Ck = candidate itemsets of size k
Lk = frequent itemsets of size k
Level-wise approach:
1. k = 1, C1 = all items
2. While Ck is not empty:
3.   Frequent itemset generation: scan the database to find which itemsets in Ck are frequent and put them into Lk
4.   Candidate generation: use Lk to generate a collection of candidate itemsets Ck+1 of size k+1
5.   k = k + 1
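A compact, illustrative Python implementation of this level-wise loop; function and variable names are my own, so treat it as a sketch rather than the course's reference code. It is run on the small four-transaction database used in the worked example that follows.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise Apriori: returns {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    Ck = [frozenset([i]) for i in items]      # k = 1: C1 = all items
    frequent, k = {}, 1
    while Ck:
        # Frequent itemset generation: scan the database and count every candidate.
        counts = {c: 0 for c in Ck}
        for t in transactions:
            for c in Ck:
                if c <= t:
                    counts[c] += 1
        Lk = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(Lk)
        # Candidate generation: join Lk with itself, prune by the Apriori property.
        Ck = set()
        for a, b in combinations(Lk, 2):
            cand = a | b
            if len(cand) == k + 1 and all(
                frozenset(sub) in Lk for sub in combinations(cand, k)
            ):
                Ck.add(cand)
        Ck = list(Ck)
        k += 1
    return frequent

# Toy database from the worked example below (Tids 10, 20, 30, 40).
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(apriori(db, minsup_count=2))
```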
The Apriori Algorithm—An Example
Database TDB (min support count = 2):
  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E

C1 after the 1st scan (candidate 1-itemsets with support counts):
  {A}: 2   {B}: 3   {C}: 3   {D}: 1   {E}: 3
L1 (frequent 1-itemsets):
  {A}: 2   {B}: 3   {C}: 3   {E}: 3

C2 (candidates generated from L1):
  {A, B}   {A, C}   {A, E}   {B, C}   {B, E}   {C, E}
C2 after the 2nd scan (with support counts):
  {A, B}: 1   {A, C}: 2   {A, E}: 1   {B, C}: 2   {B, E}: 3   {C, E}: 2
L2 (frequent 2-itemsets):
  {A, C}: 2   {B, C}: 2   {B, E}: 3   {C, E}: 2

C3 (candidates generated from L2):
  {B, C, E}
L3 after the 3rd scan (frequent 3-itemsets):
  {B, C, E}: 2
The Apriori Algorithm—An Example
Database TDB (Tid: Items): 10: A, C, D; 20: B, C, E; 30: A, B, C, E; 40: B, E
L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3
L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2
L3: {B, C, E}: 2
2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a partitioning of the frequent itemset into a Left-Hand Side (LHS) and a Right-Hand Side (RHS)
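A sketch of this rule-generation step in Python, assuming the frequent itemsets and their support counts are already available; the helper name and the confidence threshold are illustrative.

```python
from itertools import combinations

def generate_rules(freq, min_conf):
    """freq: {frozenset: support count}. Yields (lhs, rhs, confidence) for confident rules."""
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = count / freq[lhs]   # conf(X -> Y) = sigma(X ∪ Y) / sigma(X)
                if conf >= min_conf:
                    yield lhs, rhs, conf

# Frequent itemsets (with support counts) from the Apriori example above.
freq = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
for lhs, rhs, conf in generate_rules(freq, min_conf=0.8):
    print(set(lhs), "->", set(rhs), round(conf, 2))
```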
The DIC (Dynamic Itemset Counting) algorithm
1. The empty itemset is marked with a solid box. All the 1-itemsets are marked with dashed circles. All other itemsets are unmarked.
2. Read M transactions. For each transaction, increment the respective counters for the itemsets marked with dashes.
3. If a dashed circle has a count that exceeds the support threshold, turn it into a dashed square. If any immediate superset of it now has all of its subsets marked as solid or dashed squares, add a new counter for that superset and mark it with a dashed circle.
4. If a dashed itemset has been counted through all the transactions, make it solid and stop counting it.
5. If we are at the end of the transaction file, rewind to the beginning.
6. If any dashed itemsets remain, go to step 2.
Fig 3. Start of DIC algorithm
Fig 4. After M transactions
Fig 5. After 2M transactions
Fig 6. After one pass
DIC- Data structure
🞂 like the hash tree used in Apriori, with a little extra information
🞂 Every node stores
🞂 the last item in the itemset
🞂 counter, marker, its state
🞂 its branches if it is an interior node
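A minimal sketch of such a node in Python; the field names follow the bullets above, but the exact layout in the original DIC hash-tree structure may differ.

```python
from dataclasses import dataclass, field

@dataclass
class DICNode:
    """One node of a DIC counting trie (hash-tree-like structure); fields follow the slide."""
    last_item: str                       # last item of the itemset this node represents
    counter: int = 0                     # transactions counted so far that contain the itemset
    marker: int = 0                      # transactions seen since counting started (interpretation assumed)
    state: str = "dashed_circle"         # dashed/solid circle/square, as in the DIC steps above
    branches: dict = field(default_factory=dict)  # children keyed by item (interior nodes only)
```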
DIC- Implication rules
🞂 conviction
🞂 more useful and intuitive measure
🞂 unlike confidence,
🞂 normalized based on both the antecedent and the consequent
🞂 unlike interest,
🞂 directional
🞂 actual implication as opposed to co-occurrence
🞂 support : P(A, B)
🞂 confidence : P(B|A) = P(A, B)/P(A)
🞂 interest : P(A, B) / (P(A)P(B))
🞂 conviction : P(A)P(¬B)/P(A, ¬B)
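As an illustration, all four measures can be computed directly from transaction data. The helper and its toy baskets below are assumptions for this sketch, not part of the DIC paper.

```python
def measures(A, B, db):
    """support, confidence, interest (lift) and conviction of the rule A -> B."""
    n = len(db)
    p_a  = sum(1 for t in db if A <= t) / n
    p_b  = sum(1 for t in db if B <= t) / n
    p_ab = sum(1 for t in db if A | B <= t) / n
    p_a_not_b = p_a - p_ab                      # P(A, ¬B)
    conviction = float("inf") if p_a_not_b == 0 else p_a * (1 - p_b) / p_a_not_b
    return p_ab, p_ab / p_a, p_ab / (p_a * p_b), conviction

# Hypothetical baskets; conviction > 1 means the rule is violated less often
# than it would be if A and B were independent.
db = [{"cereal", "diaper"}, {"cereal", "diaper"}, {"cereal"}, {"diaper"}, {"eggs"}]
print(measures({"cereal"}, {"diaper"}, db))  # support 0.4, confidence ~0.67, interest ~1.11, conviction 1.2
```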
Reducing Number of Candidates
🞂 Apriori principle:
🞂 If an itemset is frequent, then all of its subsets must also be
frequent
🞂 M. Zaki et al. New algorithms for fast discovery of association rules. in KDD’97
The Partitioning Algorithm
Frequent Pattern Growth (FP)
🞂 First algorithm that allows frequent pattern mining without generating candidate sets
🞂 Grow long patterns from short ones using local frequent items
🞂 "ab" is a frequent pattern
🞂 Get all transactions having "ab": DB|ab
🞂 "d" is a local frequent item in DB|ab → "abd" is a frequent pattern
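A tiny sketch of this pattern-growth idea: project the database on a frequent pattern and collect the locally frequent items. The database and threshold below are hypothetical.

```python
def projected_db(db, pattern):
    """DB|pattern: the transactions that contain every item of pattern."""
    return [t for t in db if pattern <= t]

def local_frequent_items(db, pattern, minsup_count):
    """Items (outside pattern) that are frequent within DB|pattern."""
    counts = {}
    for t in projected_db(db, pattern):
        for item in t - pattern:
            counts[item] = counts.get(item, 0) + 1
    return {i for i, n in counts.items() if n >= minsup_count}

db = [{"a", "b", "d"}, {"a", "b", "c", "d"}, {"a", "b"}, {"b", "c"}]
# "ab" is frequent; "d" is locally frequent in DB|ab, so "abd" is frequent too.
print(local_frequent_items(db, {"a", "b"}, minsup_count=2))  # {'d'}
```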
Construct FP-tree from a Transaction Database
FP-Growth
FP-Tree size
⮚ The FP-tree usually has a smaller size than the uncompressed data
⮚ Typically many transactions share items (and hence prefixes)
⮚ Best-case scenario: all transactions contain the same set of items
⮚ 1 path in the FP-tree
⮚ Worst-case scenario: every transaction has a unique set of items (no items in common)
⮚ Size of the FP-tree is at least as large as the original data
⮚ Storage requirements for the FP-tree are higher: it must also store the pointers between the nodes and the counters
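A compact sketch of FP-tree construction under these assumptions; the node layout and helper names are illustrative, and real FP-growth implementations keep fuller node-link/header-table machinery than what is hinted at here.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}                     # item -> FPNode

def build_fp_tree(db, minsup_count):
    """Insert each transaction, items reordered by global frequency, into a shared prefix tree."""
    freq = Counter(item for t in db for item in t)
    freq = {i: n for i, n in freq.items() if n >= minsup_count}
    root, header = FPNode(None), {}            # header: item -> list of nodes (node links)
    for t in db:
        # Keep only globally frequent items, sorted by descending support so prefixes are shared.
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

# Hypothetical transactions; each item's total count can be read back off its node links.
db = [{"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"}, {"a", "b", "c"}]
root, header = build_fp_tree(db, minsup_count=2)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```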
Example 2: FP-Tree Construction
Example 2: Conditional Pattern Base
Example 2
Let minSup = 2
Extract all frequent itemsets containing e.
Closed Patterns and Max-Patterns
🞂 Example:
🞂 {a1, …, a10} contains C(10,1) + C(10,2) + … + C(10,10) = 2^10 − 1 sub-patterns!
🞂 {a1, …, a100} contains C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27 × 10^30 sub-patterns!
(C(n, k) denotes the binomial coefficient "n choose k".)
Maximal Frequent Itemset
🞂 An itemset is maximal frequent if it is frequent and none of its immediate supersets is frequent
Closed Itemset
🞂 An itemset is closed if none of its immediate supersets has
the same support as the itemset
Maximal vs Closed Itemsets
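A small sketch that classifies known frequent itemsets as closed and/or maximal from their support counts. The helper is hypothetical; checking only immediate supersets is enough because, by the Apriori property, any larger frequent superset implies a frequent immediate superset.

```python
def classify(freq):
    """freq: {frozenset: support count} of all frequent itemsets.
    Returns {itemset: (is_closed, is_maximal)}."""
    result = {}
    for itemset, sup in freq.items():
        supersets = [s for s in freq if len(s) == len(itemset) + 1 and itemset < s]
        is_closed  = all(freq[s] != sup for s in supersets)   # no immediate superset with equal support
        is_maximal = not supersets                            # no frequent immediate superset at all
        result[itemset] = (is_closed, is_maximal)
    return result

# Frequent itemsets (support counts) from the Apriori example.
freq = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
for itemset, (closed, maximal) in classify(freq).items():
    print(sorted(itemset), "closed" if closed else "", "maximal" if maximal else "")
```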
MaxMiner: Mining Max-Patterns
🞂 Max-Miner algorithm
🞂 Efficiently extracts only the maximal frequent itemsets
🞂 Roughly linear in the number of maximal frequent itemsets
🞂 Uses "look ahead", not a pure bottom-up search
🞂 By identifying a long frequent itemset early on, it can prune all of that itemset's subsets from consideration
Complete set-enumeration tree over four items
[Figure: set-enumeration tree rooted at {}, with children 1, 2, 3, 4, expanding through the 2- and 3-itemsets (e.g. 1,2,3  1,3,4  2,3,4) down to 1,2,3,4.]
🞂 Candidate group g
🞂 head, h(g)
🞂 represents the itemset enumerated by the node
🞂 tail, t(g)
🞂 an ordered set that contains all items not in h(g) that can potentially appear in any sub-node
🞂 e.g. for the node enumerating itemset {1}: h(g) = {1}, t(g) = {2, 3, 4}
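A small sketch of how head/tail candidate groups enumerate this tree; the recursive helper below is illustrative and does not include Max-Miner's look-ahead pruning.

```python
def expand(head, tail):
    """Recursively enumerate the set-enumeration tree of candidate groups (head, tail)."""
    print(head, "tail:", tail)
    for i, item in enumerate(tail):
        # Child node: move one tail item into the head; the remaining tail items stay ordered.
        expand(head + [item], tail[i + 1:])

expand([], [1, 2, 3, 4])   # visits all 16 subsets of {1, 2, 3, 4} exactly once
# e.g. the node with head [1] has tail [2, 3, 4], matching h(g) = {1}, t(g) = {2, 3, 4} above
```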
Max-Miner
[Figure: the set-enumeration tree over items 1, 2, 3, 4 as explored by Max-Miner.]
CHARM: Mining by Exploring Vertical Data Format
🞂 Closed
🞂 CHARM, CLOSET+, COFI-CLOSED, Leap
🞂 Maximal
🞂 MaxMiner, MAFIA, GENMAX, COFI-MAX, Leap
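As a brief illustration of the vertical (tidset) data format that CHARM-style algorithms exploit: the support of an itemset is the size of the intersection of its items' tidsets. The toy database below reuses the Apriori example, and the helper names are assumptions.

```python
# Horizontal database: Tid -> items (the toy database from the Apriori example).
horizontal = {10: {"A", "C", "D"}, 20: {"B", "C", "E"}, 30: {"A", "B", "C", "E"}, 40: {"B", "E"}}

# Vertical format: item -> tidset.
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

def tidset(itemset):
    """Tidset of an itemset = intersection of its items' tidsets; its size is the support count."""
    result = None
    for item in itemset:
        result = vertical[item] if result is None else result & vertical[item]
    return result

print(tidset({"B", "E"}))            # {20, 30, 40} -> support count 3
print(len(tidset({"B", "C", "E"})))  # 2
```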
Performance Evaluation of Algorithms
🞂 The FP-growth method was usually better than the best implementation of the Apriori algorithm
Questions?
Feel free to ask anything! I'm more than happy to provide the answers.