APRIORI Algorithm: Professor Anita Wasilewska Book Slides
A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets. Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
Use the frequent itemsets to generate association rules.
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
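A runnable Python sketch of this loop follows. The function names (apriori, gen_candidates) and the set-based representation are illustrative choices, not part of the slides; this is a minimal sketch of the level-wise search, not an optimized implementation.

```python
from itertools import combinations

def gen_candidates(prev_frequent, k):
    # Size-k candidates whose every (k-1)-subset is frequent;
    # this subsumes the textbook join + prune steps.
    items = sorted(set().union(*prev_frequent))
    return {frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k - 1))}

def apriori(transactions, min_sup):
    # L1 = frequent 1-itemsets, from one scan of the database.
    counts = {}
    for t in transactions:
        for i in t:
            key = frozenset([i])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent, k = dict(Lk), 1
    while Lk:                                 # iterate until Lk is empty
        k += 1
        Ck = gen_candidates(set(Lk), k)
        counts = {c: 0 for c in Ck}
        for t in transactions:                # one scan per level
            t = frozenset(t)
            for c in Ck:
                if c <= t:                    # candidate contained in t
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(Lk)
    return frequent                           # union of all Lk, with counts
```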
Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e., min_sup = 2/9 = 22%) and the minimum confidence required is 70%. We first have to find the frequent itemsets using the Apriori algorithm; then association rules will be generated using min. support and min. confidence.
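The transaction table itself is not reproduced in these notes. For concreteness, the nine transactions below are the classic Han & Kamber example, whose support counts match the C1/L1 tables that follow; this D is reused by the later sketches.

```python
from collections import Counter

# Assumed 9-transaction database (classic Han & Kamber example).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
counts = Counter(i for t in D for i in t)
print(sorted(counts.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```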
Scan D for the count of each candidate. The candidate 1-itemsets, C1, and their support counts:

Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

L1: every candidate in C1 satisfies min_sup = 2, so L1 = C1.
In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support.
C2 (candidate 2-itemsets, generated from L1 join L1):
{I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}

Scan D for the count of each candidate. C2 with support counts:

Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I4}   1
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
{I3,I4}   0
{I3,I5}   1
{I4,I5}   0

L2 (candidates satisfying min_sup = 2):

Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
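A minimal sketch of this scan-and-count step, reusing D from the snippet above. Counting every 2-itemset occurring in each transaction is equivalent to counting C2 here, because every single item in this example is frequent, so C2 is exactly the set of all pairs.

```python
from collections import Counter
from itertools import combinations

pair_counts = Counter()
for t in D:                                   # one scan of the database
    for pair in combinations(sorted(t), 2):   # every 2-itemset in t
        pair_counts[frozenset(pair)] += 1

L2 = {s: c for s, c in pair_counts.items() if c >= 2}
# pair_counts[frozenset({"I1", "I2"})] == 4, as in the C2 table above,
# and L2 keeps the six itemsets listed above.
```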
C3 (after the join and prune steps explained below, and a scan of D for counts):

Itemset       Sup. count
{I1,I2,I3}    2
{I1,I2,I5}    2

L3 (candidates satisfying min_sup = 2):

Itemset       Sup. count
{I1,I2,I3}    2
{I1,I2,I5}    2
The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property. In order to find C3, we compute L2 join L2: C3 = L2 join L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}. The join step is now complete, and the prune step is used to reduce the size of C3; pruning helps avoid heavy computation due to a large Ck. For example, {I2,I3,I5} is pruned because its 2-item subset {I3,I5} is not in L2; after pruning, C3 = {{I1,I2,I3}, {I1,I2,I5}}.
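A small sketch of the join and prune steps for this level. join_and_prune is an illustrative name, and the union-based join below is a simplification of the textbook lexicographic join; the prune step yields the same C3 either way.

```python
from itertools import combinations

def join_and_prune(prev_frequent, k):
    # Join: unions of two frequent (k-1)-itemsets that form a k-itemset.
    joined = {a | b for a in prev_frequent for b in prev_frequent
              if len(a | b) == k}
    # Prune: drop candidates with an infrequent (k-1)-subset.
    return {c for c in joined
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k - 1))}

L2 = {frozenset(s) for s in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
C3 = join_and_prune(L2, 3)
# Leaves {I1,I2,I3} and {I1,I2,I5}; {I2,I3,I5}, for instance, is pruned
# because its subset {I3,I5} is not in L2.
```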
Back To Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
Let's take l = {I1,I2,I5}. All its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
R1: I1 ∧ I2 → I5
Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%. R1 is rejected (50% < 70% min. confidence).
R2: I1 ∧ I5 → I2
Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%. R2 is selected.
R3: I2 ∧ I5 → I1
Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%. R3 is selected.
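A sketch of this confidence check in Python. rules_from_itemset is an illustrative name, and the support counts are those computed in the example above.

```python
from itertools import combinations

def rules_from_itemset(itemset, support, min_conf=0.7):
    # Emit A -> (itemset - A) for every nonempty proper subset A,
    # keeping rules with confidence sc(itemset)/sc(A) >= min_conf.
    items = frozenset(itemset)
    rules = []
    for r in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, r)):
            conf = support[items] / support[antecedent]
            if conf >= min_conf:
                rules.append((set(antecedent), set(items - antecedent), conf))
    return rules

# Support counts from the example (only those needed for l = {I1,I2,I5}).
sc = {frozenset(s): c for s, c in [
    (("I1",), 6), (("I2",), 7), (("I5",), 2),
    (("I1", "I2"), 4), (("I1", "I5"), 2), (("I2", "I5"), 2),
    (("I1", "I2", "I5"), 2),
]}
for a, b, conf in rules_from_itemset(("I1", "I2", "I5"), sc):
    print(a, "->", b, f"{conf:.0%}")
# Selects I1 ^ I5 -> I2, I2 ^ I5 -> I1, and I5 -> I1 ^ I2 (each 100%).
```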
The FP-tree structure is highly condensed, but complete for frequent pattern mining, and avoids costly database scans. FP-growth develops an efficient, FP-tree-based frequent pattern mining method: a divide-and-conquer methodology that decomposes mining tasks into smaller ones and avoids candidate generation by testing only conditional sub-databases.
Consider the same database D of 9 transactions from the previous example, and suppose the minimum support count required is again 2 (i.e., min_sup = 2/9 = 22%). The first scan of the database is the same as in Apriori: it derives the set of frequent 1-itemsets and their support counts. The set of frequent items is then sorted in descending order of support count; the resulting set is denoted L = {I2:7, I1:6, I3:6, I4:2, I5:2}.
Conditional pattern bases and conditional FP-trees (the rows for I3 and I1; the row for I5 is derived step by step below):

Item   Conditional pattern base           Conditional FP-tree
I3     {(I2 I1: 2), (I2: 2), (I1: 2)}     <I2: 4, I1: 2>, <I1: 2>
I1     {(I2: 4)}                          <I2: 4>
Mining the FP-tree by creating conditional (sub-)pattern bases. Following the steps above, let us start from I5. I5 is involved in 2 branches, namely {I2 I1 I5: 1} and {I2 I1 I3 I5: 1}. Therefore, considering I5 as suffix, its 2 corresponding prefix paths are {I2 I1: 1} and {I2 I1 I3: 1}, which form its conditional pattern base.
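A compact Python sketch of FP-tree construction and conditional-pattern-base extraction, reusing the 9-transaction D assumed earlier; FPNode and the function names are illustrative.

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_sup):
    counts = Counter(i for t in transactions for i in t)   # first scan
    root, header = FPNode(None, None), defaultdict(list)
    for t in transactions:                                  # second scan
        # Keep frequent items, ordered by descending support count.
        items = sorted((i for i in t if counts[i] >= min_sup),
                       key=lambda i: (-counts[i], i))
        node = root
        for i in items:
            if i not in node.children:
                node.children[i] = FPNode(i, node)
                header[i].append(node.children[i])          # node-link
            node = node.children[i]
            node.count += 1
    return root, header

def conditional_pattern_base(item, header):
    # Prefix path of each occurrence of `item`, with that node's count.
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
    return base

root, header = build_fp_tree(D, min_sup=2)
print(conditional_pattern_base("I5", header))
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)] -- as derived above
```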
FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection.
Reasoning
No candidate generation, no candidate tests.
Uses a compact data structure.
Eliminates repeated database scans.
The basic operations are counting and FP-tree building.