DM Lect7
Association Rules
Lec 7
Mohammed
Taiz University
Outlines
• Basic Concepts
• Frequent Itemset Mining Methods
What Is Frequent Pattern Analysis?
• Frequent pattern
• a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and
association rule mining
• Finding frequent patterns plays an essential role in mining associations, correlations, and many
other interesting relationships among data.
• Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click
stream) analysis, and DNA sequence analysis
Market Basket Analysis
Basic Concepts: Transactional Data
• Market basket example:
• Basket 1:{}
• Basket 2:{}
• Basket 3:{}
• ….
• Basket n:{}
• Definitions:
• An item: an article in a basket, or an attribute-value pair.
• A transaction: items purchased in a basket.
• A transactional dataset: A set of transactions.
Basic Concepts: Frequent Patterns
• Itemset
• A set of one or more items
• e.g., {A, B, D} is an itemset.
• k-itemset X = {x1, …, xk} is an itemset with k items.
• (absolute) support, or support count, of X
• Frequency (number of occurrences) of an itemset X
• e.g., sup(A) = 3

Transaction-id  Items bought
10              A, B, D
20              A, C, D
30              A, D, E
40              B, E, F
50              B, C, D, E, F

• Freq. Pat. (min support = 3): {A:3, B:3, D:4, E:3, AD:3}
• Association rules:
• A ⇒ D (60%, 100%)
• 60% of all transactions show that A and D are purchased together
• 100% of the customers who purchased A also bought D
• D ⇒ A (60%, 75%)
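The support counts above can be reproduced with a short Python sketch (the helper name `support_count` is illustrative, not from the slides):

```python
# The five baskets from the table above, keyed by transaction id
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support_count(itemset, transactions):
    # Absolute support: how many transactions contain the itemset
    return sum(1 for t in transactions.values() if itemset <= t)

print(support_count({"A"}, transactions))       # 3
print(support_count({"A", "D"}, transactions))  # 3
print(support_count({"D"}, transactions))       # 4
```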
Basic Concepts: Association Rules
• An association rule X ⇒ Y describes a relationship between two
disjoint itemsets X and Y.
• It presents the pattern that when X occurs, Y also occurs.
• A rule is measured by its support, sup(X ∪ Y)/N, and its
confidence, sup(X ∪ Y)/sup(X); it is strong if both exceed
user-set thresholds.
• A strong rule is not necessarily interesting (a proper choice of
thresholds is necessary).
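The support and confidence figures of the earlier rules A ⇒ D (60%, 100%) and D ⇒ A (60%, 75%) can be checked with a sketch (function names are illustrative):

```python
# Transactions from the market-basket example
transactions = [
    {"A", "B", "D"}, {"A", "C", "D"}, {"A", "D", "E"},
    {"B", "E", "F"}, {"B", "C", "D", "E", "F"},
]

def sup(X):
    return sum(1 for t in transactions if X <= t)

def rule_metrics(X, Y):
    # support(X => Y) = sup(X u Y) / N; confidence = sup(X u Y) / sup(X)
    both = sup(X | Y)
    return both / len(transactions), both / sup(X)

print(rule_metrics({"A"}, {"D"}))  # (0.6, 1.0): A => D (60%, 100%)
print(rule_metrics({"D"}, {"A"}))  # (0.6, 0.75): D => A (60%, 75%)
```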
Association rule mining
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent
itemset
• Problem parameters:
• N = |T|: number of transactions
• d = |I|: number of (distinct) items
• w: max width of a transaction
• Number of possible itemsets? 2^d, exponential in the number of items
• Solution
• Mine closed patterns and max-patterns instead
Closed Frequent Itemset
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X
with the same support as X.
• An itemset is closed if none of its immediate supersets has the same support
as the itemset.
• Closed pattern is a lossless compression of frequent patterns.
• Reducing the number of patterns and rules.
TID  Items
1    {A, B}
2    {B, C, D}
3    {A, B, C, D}
4    {A, B, D}
5    {A, B, C, D}

Itemset   Support      Itemset        Support
{A}       4            {A, B, C}      2
{B}       5            {A, B, D}      3
{C}       3            {A, C, D}      2
{D}       4            {B, C, D}      3
{A, B}    4            {A, B, C, D}   2
{A, C}    2
{A, D}    3
{B, C}    3
{B, D}    4
{C, D}    3
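A brute-force closedness check over the transactions in the table can be sketched as follows (helper names are illustrative; minsup = 2 is assumed):

```python
from itertools import combinations

# Transactions from the TID table above
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "B", "D"}, {"A", "B", "C", "D"},
]
items = sorted(set().union(*transactions))

def sup(X):
    return sum(1 for t in transactions if X <= t)

def is_closed(X):
    # Closed: no immediate superset has the same support
    return all(sup(X | {i}) < sup(X) for i in items if i not in X)

minsup = 2
closed = [frozenset(c) for k in range(1, len(items) + 1)
          for c in combinations(items, k)
          if sup(set(c)) >= minsup and is_closed(set(c))]
# 6 closed itemsets: {B}, {A,B}, {B,D}, {A,B,D}, {B,C,D}, {A,B,C,D}
print(closed)
```

For example, {A} (support 4) is not closed because its superset {A, B} also has support 4, so {A} can be recovered losslessly from the closed pattern {A, B}.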
Maximal Frequent Itemset
• An itemset X is a max-pattern if X is frequent and there exists no frequent
super-pattern Y ⊃ X.
• An itemset is maximal frequent if none of its immediate supersets is frequent.
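A maximality check on the same five transactions can be sketched; minsup = 3 is assumed here for illustration (with minsup = 2 every itemset in the table is frequent, so only {A, B, C, D} would be maximal):

```python
from itertools import combinations

# Transactions from the TID table above
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "B", "D"}, {"A", "B", "C", "D"},
]
items = sorted(set().union(*transactions))

def sup(X):
    return sum(1 for t in transactions if X <= t)

def is_maximal(X, minsup):
    # Maximal frequent: frequent, and no immediate superset is frequent
    return sup(X) >= minsup and all(
        sup(X | {i}) < minsup for i in items if i not in X)

maximal = [frozenset(c) for k in range(1, len(items) + 1)
           for c in combinations(items, k) if is_maximal(set(c), 3)]
print(maximal)  # {A, B, D} and {B, C, D}
```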
Maximal vs Closed Itemsets
Frequent Itemset Mining Methods
• Scalable mining methods: Three major approaches
• Apriori (Agrawal & Srikant@VLDB’94)
• Freq. pattern growth (FPgrowth—Han, Pei & Yin
@SIGMOD’00)
• Vertical data format approach (Charm—Zaki & Hsiao
@SDM’02)
Apriori: A Candidate Generation-and-Test
Approach
• Apriori pruning principle: If there is any itemset which is
infrequent, its superset should not be generated/tested!
(Agrawal & Srikant @VLDB’94, Mannila, et al. @ KDD’ 94)
• Method:
• Initially, scan DB once to get frequent 1-itemset
• Generate length (k+1) candidate itemsets from length k
frequent itemsets
• Test the candidates against DB
• Terminate when no frequent or candidate set can be
generated
The Apriori Algorithm—An Example
Supmin = 2

Database TDB:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan → C1:
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1:
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (generated from L1):
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with counts:
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2:
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2
L1 = {frequent 1-itemsets};
for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
end
return ∪k Lk;
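The pseudocode above can be turned into a runnable sketch (names and structure are illustrative, not the original slide code):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise Apriori; returns {frozenset(itemset): support_count}."""
    transactions = [frozenset(t) for t in transactions]

    def sup(X):
        return sum(1 for t in transactions if X <= t)

    items = frozenset().union(*transactions)
    # L1: frequent 1-itemsets
    Lk = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
    result = {X: sup(X) for X in Lk}
    k = 1
    while Lk:
        # Self-join: union pairs of frequent k-itemsets into (k+1)-candidates
        cands = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k))}
        Lk = {c for c in cands if sup(c) >= minsup}
        result.update({X: sup(X) for X in Lk})
        k += 1
    return result

# The TDB example above, minsup = 2
freq = apriori([{"A", "C", "D"}, {"B", "C", "E"},
                {"A", "B", "C", "E"}, {"B", "E"}], 2)
print(len(freq))  # 9 frequent itemsets: L1 (4) + L2 (4) + L3 ({B, C, E})
```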
Important Details of Apriori
• How to generate candidates?
• Step 1: self-joining Lk
• Step 2: pruning
• How to count supports of candidates?
• Example of Candidate-generation
• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
• abcd from abc and abd
• acde from acd and ace
• Pruning:
• acde is removed because ade is not in L3
• C4 = {abcd}
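The self-join and prune steps on this L3 can be sketched as (variable names are illustrative):

```python
from itertools import combinations

# L3 as sorted tuples of items
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
k = 3

# Self-join: merge k-itemsets that agree on their first k-1 items
joined = [p[:k - 1] + (p[-1], q[-1]) for p in L3 for q in L3
          if p[:k - 1] == q[:k - 1] and p[-1] < q[-1]]

# Prune: drop any candidate with an infrequent k-subset
L3set = set(L3)
C4 = [c for c in joined if all(s in L3set for s in combinations(c, k))]

print(joined)  # abcd (from abc, abd) and acde (from acd, ace)
print(C4)      # only abcd survives: ade is not in L3, so acde is pruned
```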
Improving the Efficiency of Apriori
• Bottlenecks of the Apriori approach:
• Candidate generation and test:
• Often generate a huge number of candidates.
• It is costly to repeatedly scan the whole database.
• Is it interesting?
•
= 0.89
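The 0.89 above matches the lift of the classic play-basketball ⇒ eat-cereal example from Han et al.; assuming those textbook numbers (5000 students, 3000 play basketball, 3750 eat cereal, 2000 do both), the computation is:

```python
# Assumed textbook numbers (Han et al.): 5000 students, 3000 play
# basketball, 3750 eat cereal, 2000 do both.
n, n_b, n_c, n_both = 5000, 3000, 3750, 2000

support = n_both / n          # 0.40
confidence = n_both / n_b     # ~0.667: the rule looks "strong"
lift = (n_both / n) / ((n_b / n) * (n_c / n))
print(round(lift, 2))  # 0.89: lift < 1, the items are negatively
                       # correlated, so the strong rule is misleading
```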
ANY QUESTIONS