The Concept of Maximal Frequent Itemsets

NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15

Outline

Introduction
Max-Miner
MAFIA
GenMax
Conclusion

Introduction(1/2)
Interesting datasets with long patterns
Questionnaire results
Transaction databases
Contain many frequently occurring items
Long average record length

Apriori-like algorithms are inadequate
They enumerate every single frequent itemset
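Why enumeration breaks down: every subset of a frequent itemset is also frequent (downward closure), so a single maximal pattern of length k already implies 2^k - 1 frequent itemsets that an Apriori-like miner has to count. A small Python sketch; the function name frequent_subsets is illustrative only:

```python
from itertools import combinations


def frequent_subsets(pattern):
    """Return every non-empty subset of `pattern`.

    By downward closure, all of them are frequent whenever `pattern` is.
    """
    items = sorted(pattern)
    return [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


print(len(frequent_subsets("abcde")))  # 31 = 2**5 - 1

# Closed form for longer patterns (far too many to list explicitly).
for k in (10, 23, 43):
    print(k, 2**k - 1)
```

For k = 23 the count is already 8,388,607, which is why mining only the maximal itemsets pays off on datasets with long patterns.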
Introduction(2/2)
Maximal Frequent Itemsets
A frequent itemset is maximal if it has no superset that is frequent.
e.g.
Items: a, b, c, d, e
Frequent itemset: {a, b, c}
{a, b, c, d}, {a, b, c, e}, and {a, b, c, d, e} are not frequent itemsets.
Maximal frequent itemset: {a, b, c}
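The definition translates directly into a filter over a list of frequent itemsets. A minimal Python sketch; the helper name maximal_itemsets and the input (the slide's {a, b, c} together with all of its subsets, and nothing containing d or e) are assumptions for illustration:

```python
from itertools import combinations


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent]
    return {s for s in frequent if not any(s < other for other in frequent)}


# Assumed input: {a, b, c} and all of its non-empty subsets are frequent.
frequent = [set(c) for k in (1, 2, 3) for c in combinations("abc", k)]
print(maximal_itemsets(frequent))
```

Only {a, b, c} survives, because every other itemset in the list is a proper subset of it.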
Max-Miner(1/4)
Efficiently Mining Long Patterns from Databases
R. J. Bayardo
ACM SIGMOD '98

Max-Miner
Abandons a purely bottom-up traversal
Attempts to "look ahead"
Identifies a long frequent itemset early, then prunes all of its subsets
Max-Miner(2/4)
Set-enumeration tree
Breadth-first search
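Max-Miner searches the set-enumeration tree breadth-first. The Python sketch below (the function name is hypothetical) lists the nodes of the tree over the ordered items {1, 2, 3, 4} level by level; each child extends its parent's itemset with one item that comes later in the fixed ordering, so every itemset appears exactly once. Max-Miner itself does not materialize the whole tree, since its pruning cuts most of it away.

```python
from collections import deque


def set_enumeration_bfs(items):
    """Yield every non-empty itemset in breadth-first order of the
    set-enumeration tree (the empty root is not yielded)."""
    ordered = sorted(items)
    queue = deque((i,) for i in ordered)  # first level: single items
    while queue:
        node = queue.popleft()
        yield node
        last = ordered.index(node[-1])
        # Children: append each item that follows the node's last item.
        for item in ordered[last + 1:]:
            queue.append(node + (item,))


print(list(set_enumeration_bfs([1, 2, 3, 4])))
```

All 1-itemsets come first, then all 2-itemsets, and so on up to (1, 2, 3, 4).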

Max-Miner(3/4)
Candidate group
Head: h(g)
The itemset enumerated by the node.

Tail: t(g)
An ordered set containing all items not in h(g) that can appear in any sub-node.

e.g. Node {1}
h(g): {1}
t(g): {2, 3, 4}
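A candidate group can be held as a (head, tail) pair; expanding it creates one sub-node per tail item, and each sub-node's tail keeps only the items after that one. A Python sketch; the CandidateGroup class and its children method are illustrative names, not taken from the paper:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidateGroup:
    head: Tuple[int, ...]  # h(g): itemset enumerated by the node
    tail: Tuple[int, ...]  # t(g): ordered items that may appear in sub-nodes

    def children(self) -> List["CandidateGroup"]:
        """Extend h(g) by each tail item; a child's tail keeps only later items."""
        return [CandidateGroup(self.head + (item,), self.tail[k + 1:])
                for k, item in enumerate(self.tail)]


g = CandidateGroup(head=(1,), tail=(2, 3, 4))  # node {1} from the slide
for child in g.children():
    print(child)
```

Node {1} yields {1, 2} with tail {3, 4}, {1, 3} with tail {4}, and {1, 4} with an empty tail.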

Max-Miner(4/4)
Support counting
Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g)
If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal, so the sub-nodes need not be explored.
If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent, so i can be removed from t(g).
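Putting candidate groups and the two pruning rules together gives a small Max-Miner-style search. This Python sketch is a simplification under stated assumptions: items keep a fixed lexicographic order (the paper's item-reordering heuristic and lower-bounding are omitted), supports are counted by scanning an in-memory list of transactions, and the names support and max_miner_sketch are hypothetical.

```python
from itertools import chain


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def max_miner_sketch(transactions, minsup):
    items = sorted(set(chain.from_iterable(transactions)))
    frequent_items = [i for i in items
                      if support(frozenset([i]), transactions) >= minsup]
    # First level of candidate groups: h(g) = {i}, t(g) = frequent items after i.
    level = [(frozenset([i]), frequent_items[k + 1:])
             for k, i in enumerate(frequent_items)]
    found = []
    while level:  # breadth-first, one level of the tree at a time
        next_level = []
        for head, tail in level:
            whole = head | frozenset(tail)  # h(g) ∪ t(g)
            if any(whole <= m for m in found):
                continue  # every sub-node is a subset of a known frequent itemset
            if support(whole, transactions) >= minsup:
                found.append(whole)  # rule 1: sub-nodes are frequent but not maximal
                continue
            # Rule 2: drop tail items i for which h(g) ∪ {i} is infrequent.
            tail = [i for i in tail if support(head | {i}, transactions) >= minsup]
            if not tail:
                found.append(head)  # leaf: no frequent extension among tail items
                continue
            next_level.extend((head | {i}, tail[k + 1:]) for k, i in enumerate(tail))
        level = next_level
    # Drop itemsets that turned out to have a frequent proper superset.
    return {m for m in found if not any(m < other for other in found)}


transactions = [frozenset(t) for t in ("abc", "abcd", "abce", "de", "de")]
print(max_miner_sketch(transactions, minsup=2))
```

On this toy data with minsup = 2 the result is {a, b, c} and {d, e}; the look-ahead finds {d, e} directly from the group with head {d} and tail {e}.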
MAFIA(1/4)
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
D. Burdick, M. Calimlim, and J. Gehrke
ICDE '01

MAFIA
Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms
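In contrast to Max-Miner's breadth-first order, MAFIA descends the tree depth-first. The Python sketch below (function and parameter names are hypothetical) yields nodes of the set-enumeration tree in depth-first order and, as the simplest kind of pruning, skips the whole subtree of any node rejected by the keep predicate. For frequency this is sound because every superset of an infrequent itemset is infrequent.

```python
def set_enumeration_dfs(items, keep=lambda node: True, head=()):
    """Yield itemsets of the set-enumeration tree in depth-first order.

    If keep(node) is False, neither the node nor its subtree is visited.
    """
    ordered = sorted(items)
    start = ordered.index(head[-1]) + 1 if head else 0
    for item in ordered[start:]:
        node = head + (item,)
        if not keep(node):
            continue  # prune the whole subtree below `node`
        yield node
        yield from set_enumeration_dfs(ordered, keep, node)


print(list(set_enumeration_dfs([1, 2, 3])))
# [(1,), (1, 2), (1, 2, 3), (1, 3), (2,), (2, 3), (3,)]
```

Passing, for example, keep=lambda n: not {1, 2} <= set(n) removes (1, 2) and (1, 2, 3) in one step.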
MAFIA(2/4)

MAFIA(3/4)
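HUTMFI
Check whether Head ∪ Tail (HUT) is in the MFI, i.e., is a subset of an already-found maximal itemset
If so, stop searching and return

PEP
newNode = C ∪ {i}
Check whether newNode.support == C.support
If so, move i from C.tail to C.head

FHUT
newNode = C ∪ {i}
IsHUT = whether i is the leftmost item in the tail
If IsHUT and all extensions are frequent, stop searching and go back up the subtree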
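The three checks fit into one recursive depth-first search. The Python sketch below is an illustrative simplification, not the authors' implementation: supports are counted by scanning an in-memory transaction list (an assumption made for brevity), items keep a fixed lexicographic order, and the names support and mafia_sketch are hypothetical. For FHUT, each recursive call reports whether its head ∪ tail is known to be frequent; for the leftmost child, when no infrequent extension was trimmed, that set equals the parent's head ∪ tail, so the parent can stop.

```python
def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mafia_sketch(transactions, minsup):
    mfi = []

    def search(head, tail):
        """Explore node C = (head, tail); True only if head ∪ tail is known frequent."""
        if any(head | frozenset(tail) <= m for m in mfi):
            return True  # HUTMFI: the subtree lies inside a known maximal itemset
        head_sup = support(head, transactions)
        trimmed, all_frequent = [], True
        for i in tail:
            sup_i = support(head | {i}, transactions)
            if sup_i == head_sup:
                head = head | {i}  # PEP: every transaction with head also has i
            elif sup_i >= minsup:
                trimmed.append(i)
            else:
                all_frequent = False  # infrequent extension is trimmed from the tail
        if not trimmed:  # leaf
            if head and not any(head <= m for m in mfi):
                mfi.append(head)
            return all_frequent
        for k, i in enumerate(trimmed):
            child_hut_frequent = search(head | {i}, trimmed[k + 1:])
            if k == 0 and child_hut_frequent and all_frequent:
                return True  # FHUT: leftmost child's HUT equals this node's HUT
        return False

    items = sorted({i for t in transactions for i in t})
    search(frozenset(), items)
    # Defensive final filter: keep only itemsets without a proper superset in mfi.
    return {m for m in mfi if not any(m < other for other in mfi)}


transactions = [frozenset(t) for t in ("abc", "abcd", "abce", "de", "de")]
print(mafia_sketch(transactions, minsup=2))
```

On this toy data PEP moves b and c into the head {a} at once, and FHUT fires at node {d}; the result is again {a, b, c} and {d, e}.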

MAFIA(4/4)

GenMax(1/2)
Efficiently Mining Maximal Frequent Itemsets
Karam Gouda and Mohammed J. Zaki
ICDM '01

GenMax
A backtrack-search-based algorithm for mining maximal frequent itemsets
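A bare-bones backtracking search for maximal frequent itemsets can be written over vertical tidsets: the current itemset I is extended with each item of its combine set P, and the next combine set keeps only the later items whose tidset intersection is still frequent. The Python sketch below is a simplification, not GenMax itself: it uses plain tidsets rather than diffsets, does no combine-set reordering, and checks I ∪ P against the maximal itemsets found so far; all names are hypothetical.

```python
def backtrack_mfi(transactions, minsup):
    """Backtracking search for maximal frequent itemsets over vertical tidsets."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    mfi = []

    def extend(itemset, combine):
        # combine: list of (item, tidset of itemset ∪ {item}); all are frequent.
        union = itemset | {item for item, _ in combine}
        if any(union <= m for m in mfi):
            return  # I ∪ P is covered by a known maximal itemset: backtrack
        if not combine:
            if itemset:
                mfi.append(itemset)  # leaf: no frequent extension remains
            return
        for k, (item, tids) in enumerate(combine):
            new_combine = []
            for other, other_tids in combine[k + 1:]:
                common = tids & other_tids
                if len(common) >= minsup:
                    new_combine.append((other, common))
            extend(itemset | {item}, new_combine)

    frequent = sorted(((i, tids) for i, tids in tidsets.items() if len(tids) >= minsup),
                      key=lambda pair: pair[0])
    extend(frozenset(), frequent)
    # Defensive final filter: keep only itemsets without a proper superset in mfi.
    return {m for m in mfi if not any(m < other for other in mfi)}


transactions = [frozenset(t) for t in ("abc", "abcd", "abce", "de", "de")]
print(backtrack_mfi(transactions, minsup=2))
```

The support of each candidate is simply the size of its intersected tidset, so no further database scans are needed once the vertical layout is built.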
GenMax(2/2)
Superset checking techniques
Do the superset check only for I_{l+1} ∪ P_{l+1}
Using the check_status flag
Local maximal frequent itemsets

Reordering the combine set


Diffsets propagation
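Diffsets store, for each itemset, the transactions it loses relative to its prefix instead of the transactions it covers. For a prefix P and items X, Y the relations are d(PXY) = d(PY) - d(PX) and σ(PXY) = σ(PX) - |d(PXY)|, where σ is support; at the first level d(PX) = t(P) - t(X), with t the tidset. The Python sketch below (toy data and variable names are assumptions) propagates diffsets from prefix {a} to prefix {a, b} and checks the supports against direct tidset intersection.

```python
def tidsets_of(transactions):
    """Vertical layout: item -> set of transaction ids containing it."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    return tids


transactions = [set("abc"), set("abcd"), set("abce"), set("de"), set("de")]
t = tidsets_of(transactions)

# First level, prefix {a}: d(aX) = t(a) - t(X), sigma(aX) = sigma(a) - |d(aX)|.
d_a = {x: t["a"] - t[x] for x in "bcde"}
sup_a = {x: len(t["a"]) - len(d_a[x]) for x in "bcde"}

# Next level, prefix {a, b}: d(abY) = d(aY) - d(ab), sigma(abY) = sigma(ab) - |d(abY)|.
d_ab = {y: d_a[y] - d_a["b"] for y in "cde"}
sup_ab = {y: sup_a["b"] - len(d_ab[y]) for y in "cde"}

# The propagated supports agree with direct tidset intersection.
for y in "cde":
    assert sup_ab[y] == len(t["a"] & t["b"] & t[y])
print(sup_ab)  # {'c': 3, 'd': 1, 'e': 1}
```

On dense data the diffsets are usually much smaller than the tidsets they replace, which is the motivation for propagating them during the search.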

Conclusion(1/4)
Type I:
Normal MFI distribution with maximal patterns that are not too long

Type II:
Left-skewed distribution with longer patterns

Type III:
Exponential-decay distribution with short maximal patterns

Type      Database      # of items  Average length  # of records  Maximal pattern length
Type I    Chess                 76              37         3,196  23 (20%)
Type I    Pumsb              7,117              74        49,046  27 (40%)
Type II   Connect              130              43        67,557  31 (2.5%)
Type II   Pumsb*             7,117              50        49,046  43 (2.5%)
Type III  T10I4D100K         1,000              10       100,000  13 (0.01%)
Type III  T40I10D100K        1,000              40       100,000  25 (0.1%)

Conclusion(2/4)

Conclusion(3/4)

Conclusion(4/4)
