The Concept of Maximal Frequent Itemsets
NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15
Outline
Introduction
Max-Miner
MAFIA
GenMax
Conclusion
Introduction(1/2)
Interesting datasets with long patterns
Questionnaire results
Transactions database
Contain many frequently occurring items
A wide average record length
Introduction(2/2)
Maximal Frequent Itemsets
An itemset is a maximal frequent itemset if it is frequent and has no superset that is frequent.
e.g.
Items: a, b, c, d, e
Frequent itemset: {a, b, c}
{a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
Maximal frequent itemset: {a, b, c}
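The definition above can be checked by brute force on a toy database. The sketch below is only an illustration: the transactions, the `min_support` threshold of 3, and the function names are assumptions made for this example, and the exhaustive enumeration is practical only for a handful of items.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent.append(cand)
    return frequent

def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [x for x in frequent if not any(x < y for y in frequent)]

# Hypothetical toy data: {a, b, c} occurs in 3 transactions, d and e are rarer.
transactions = [
    frozenset("abc"),
    frozenset("abcd"),
    frozenset("abce"),
    frozenset("de"),
]
mfi = maximal_itemsets(frequent_itemsets(transactions, min_support=3))
```

With this data every subset of {a, b, c} is frequent, but only {a, b, c} itself survives the maximality filter.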
Max-Miner(1/4)
Efficiently mining long patterns from databases
R. J. Bayardo
ACM SIGMOD '98
Max-Miner
Abandons a bottom-up traversal
Attempts to look-ahead
Identify a long frequent itemset, then prune all its subsets.
Max-Miner(2/4)
Set-enumeration tree
Breadth-first search
Max-Miner(3/4)
Candidate group
Head: h(g)
Itemset enumerated by the node.
Tail: t(g)
An ordered set containing the items not in h(g) that can appear in a sub-node.
e.g. node {1}
h(g): {1}
t(g): {2, 3, 4}
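A candidate group and its expansion in the set-enumeration tree can be sketched as follows. The class name `CandidateGroup` and the helper `children` are hypothetical names chosen for this illustration, and the sketch assumes the simplest item ordering (ascending item number) with no pruning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateGroup:
    head: tuple  # h(g): the itemset enumerated by the node
    tail: tuple  # t(g): ordered items that may still extend the head

def children(g):
    """Expand g: tail item i yields head + (i,), with the items after i as the new tail."""
    return [
        CandidateGroup(g.head + (item,), g.tail[pos + 1:])
        for pos, item in enumerate(g.tail)
    ]

# Root of the tree over items 1..4; its first child is the node {1} from the slide.
root = CandidateGroup(head=(), tail=(1, 2, 3, 4))
node1 = children(root)[0]
sub_nodes = children(node1)
```

Here `node1` has head (1,) and tail (2, 3, 4), and its sub-nodes enumerate {1, 2}, {1, 3}, and {1, 4}.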
Max-Miner(4/4)
Support counting
Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g).
If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
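The two pruning rules can be applied to a single candidate group roughly as shown below. This is a minimal sketch under stated assumptions, not Max-Miner's actual implementation: `support` and `process_group` are hypothetical helpers, supports are counted by a naive scan, and the transactions are made up for the example.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def process_group(head, tail, transactions, min_support):
    """Apply the look-ahead and tail-pruning rules to one candidate group.

    Returns (long_itemset, pruned_tail). If h(g) ∪ t(g) is frequent, it is
    returned and no sub-node needs expanding. Otherwise long_itemset is None
    and the tail keeps only items i for which h(g) ∪ {i} is frequent.
    """
    head = frozenset(head)
    whole = head | frozenset(tail)
    if support(whole, transactions) >= min_support:
        return whole, []
    pruned = [i for i in tail if support(head | {i}, transactions) >= min_support]
    return None, pruned

# Hypothetical transactions over items 1..4.
transactions = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({1, 3}),
    frozenset({2, 4}),
]
first = process_group({1}, [2, 3, 4], transactions, min_support=2)
second = process_group({1}, first[1], transactions, min_support=2)
```

In the first call {1, 2, 3, 4} is infrequent and item 4 is dropped from the tail because {1, 4} is infrequent. In the second call the look-ahead succeeds on {1, 2, 3}.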
MAFIA(1/4)
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
D. Burdick, M. Calimlim, and J. Gehrke
ICDE '01
MAFIA
Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms
MAFIA(2/4)
MAFIA(3/4)
HUTMFI
Check whether head ∪ tail (HUT) of the node is already covered by an itemset in the MFI
If so, stop searching and return
PEP
newNode = C ∪ {i}
Check whether newNode.support == C.support
If so, move i from C.tail to C.head
FHUT
newNode = C ∪ {i}
Record whether i is the leftmost child in the tail
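As an illustration of the PEP step, the sketch below moves tail items into the head when extending the node does not change its support. It is an assumption-laden example rather than MAFIA's implementation: supports are derived from plain Python sets of transaction ids (`tidsets`), and the function name `pep` and the data are hypothetical.

```python
def pep(head, tail, tidsets):
    """Parent Equivalence Pruning on one node.

    tidsets maps each item to the set of transaction ids that contain it.
    A tail item i is moved to the head when C ∪ {i} has the same support as C.
    """
    # Transactions containing every head item (all known ids if the head is empty).
    head_tids = set.union(*tidsets.values())
    for item in head:
        head_tids &= tidsets[item]
    new_head, new_tail = list(head), []
    for item in tail:
        if len(head_tids & tidsets[item]) == len(head_tids):
            new_head.append(item)  # same support: i occurs wherever the head occurs
        else:
            new_tail.append(item)
    return new_head, new_tail

# Hypothetical tidsets: c appears in every transaction that contains b.
tidsets = {
    "a": {0, 1, 2, 3},
    "b": {0, 1, 2},
    "c": {0, 1, 2},
    "d": {1, 3},
}
result = pep(["b"], ["c", "d"], tidsets)
```

Item c is absorbed into the head, so the subtree that would have considered {b} without c is never explored, while d stays in the tail.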
MAFIA(4/4)
GenMax(1/2)
Efficiently Mining Maximal Frequent Itemsets
Karam Gouda and Mohammed J. Zaki
ICDM '01
GenMax
A backtrack-search-based algorithm for mining maximal frequent itemsets.
GenMax(2/2)
Superset checking techniques
Do superset check only for I_{l+1} ∪ P_{l+1}
Using check_status flag
Local maximal frequent itemsets
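A backtrack search with a superset check on the current itemset plus its possible extensions can be sketched as below. This is a simplified illustration, not GenMax itself: it counts support by scanning the transactions, keeps one global list of maximal itemsets, and omits the check_status flag and local maximal frequent itemsets listed above. All function names and the toy data are assumptions.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def backtrack(head, candidates, transactions, min_support, mfi):
    """Depth-first extension of head by candidates; maximal itemsets go into mfi."""
    for pos, item in enumerate(candidates):
        new_head = head | {item}
        # Possible extensions: later items that remain frequent with new_head.
        new_cands = [
            j for j in candidates[pos + 1:]
            if support(new_head | {j}, transactions) >= min_support
        ]
        # Superset check: skip the branch if new_head plus its extensions is covered.
        if any(new_head | set(new_cands) <= m for m in mfi):
            continue
        if not new_cands:
            mfi.append(frozenset(new_head))
        else:
            backtrack(new_head, new_cands, transactions, min_support, mfi)

def mine_maximal(transactions, min_support):
    """Return the maximal frequent itemsets found by the backtrack search."""
    items = sorted(set().union(*transactions))
    frequent_items = [
        i for i in items
        if support(frozenset([i]), transactions) >= min_support
    ]
    mfi = []
    backtrack(frozenset(), frequent_items, transactions, min_support, mfi)
    return mfi

# Hypothetical toy data.
transactions = [
    frozenset("abc"),
    frozenset("abcd"),
    frozenset("abce"),
    frozenset("de"),
]
mfi = mine_maximal(transactions, min_support=2)
```

On this data the search finds {a, b, c} first and then skips the branches starting at {a, c}, {b}, and {c}, because each is already covered by {a, b, c}.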
Conclusion(1/4)
Type I:
Normal MFI distribution with maximal patterns that are not too long.
Type II:
Left-skewed distribution with longer patterns.
Type III:
Exponential decay distribution with short maximal patterns.
Type      Database      # of items  Average length  # of records  Maximal pattern length
Type I    Chess                 76              37         3196   23 (20%)
Type I    Pumsb               7117              74        49046   27 (40%)
Type II   Connect              130              43        67557   31 (2.5%)
Type II   Pumsb*              7117              50        49046   43 (2.5%)
Type III  T10I4D100K          1000              10      100,000   13 (0.01%)
Type III  T40I10D100K         1000              40      100,000   25 (0.1%)
Conclusion(2/4)
Conclusion(3/4)
Conclusion(4/4)