
Frequent Closed Pattern Mining Algorithm Based on COFI-Tree

Jihai Xiao¹, Xiaohong Cui¹, and Junjie Chen²

¹ College of Textile Engineering and Art, Taiyuan University of Technology,
030600 JinZhong, China
² College of Computer Science and Technology, Taiyuan University of Technology,
030024 TaiYuan, China
{xiaojihai,tycuixiaohong}@126.com, chenjj@tyut.edu.cn

Abstract. This paper proposes a frequent closed itemset mining algorithm
based on the FP-tree and the COFI-tree. The algorithm adopts a relatively small
independent tree called the COFI-tree. With the COFI-tree, conditional FP-trees
do not need to be constructed recursively, and only one COFI-tree is held in
memory at a time, so the new mining algorithm reduces memory usage. Experiments
show that the new approach outperforms similar state-of-the-art algorithms in
execution time when mining extremely large datasets.

Keywords: Frequent Closed Itemsets, FP-Tree, COFI-Tree.

1 Introduction
Frequent pattern mining plays an important role in data mining. Many algorithms
for mining frequent closed itemsets already exist, such as CLOSET [1]. In our
study we found that this algorithm has a drawback: it builds conditional
FP-trees recursively, and therefore requires more memory and CPU resources. This
paper proposes an efficient frequent closed pattern mining algorithm based on
the FP-tree and the COFI-tree.

2 Related Work

Many algorithms address the problem of mining association rules [2], [6].
One of the most important is the Apriori algorithm [6], which is also the
foundation of most other well-known algorithms. However, when mining extremely
large datasets, the Apriori algorithm still suffers from two main problems:
repeated I/O scanning and high computational cost. Another innovative approach
to discovering frequent patterns, FP-Growth, was presented by Han et al. in [2].
It creates a compact tree structure, the FP-tree, that represents the frequent
patterns, reducing the number of database scans and the cost of candidate
itemset generation. The authors of the FP-tree algorithm have shown that it is
faster than Apriori. However, it needs to construct conditional FP-trees
recursively. This massive creation of conditional trees keeps the FP-tree
algorithm from scaling to large datasets beyond a few million transactions.
The COFI-tree [3] (Co-Occurrence Frequent Item Tree) algorithm that we
present in this paper is based on the core idea of the FP-Growth algorithm
proposed by Han et al. in [2].

H. Deng et al. (Eds.): AICI 2011, CCIS 237, pp. 175–182, 2011.
© Springer-Verlag Berlin Heidelberg 2011

3 Construction of FP-Tree
An FP-tree [2] is constructed by scanning the database twice. The first scan
derives the list of frequent items; the second scan constructs the FP-tree.

Definition 1 (FP-tree). A frequent-pattern tree is a tree structure defined as
follows.
1. It consists of one root labeled as “null”, a set of item-prefix subtrees as the children
of the root, and a frequent-item-header table.
2. Each node in the item-prefix subtree consists of three fields: item-name, count, and
node-link, where item-name registers which item this node represents, count registers
the number of transactions represented by the portion of the path reaching this node,
and node-link links to the next node in the FP-tree carrying the same item-name, or
null if there is none.
3. Each entry in the frequent-item-header table consists of two fields: (1) item-name
and (2) head of node-link (a pointer to the first node in the FP-tree carrying
the item-name).
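
The fields named in Definition 1 can be sketched as two small record types. This is only an illustrative sketch, not the authors' implementation: the class names `FPNode` and `HeaderEntry`, the helper `nodes_of`, and the `parent` and `children` fields are assumptions added for navigation, and the tiny tree at the bottom is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass(eq=False)
class FPNode:
    """One node of an item-prefix subtree (item-name, count, node-link)."""
    item_name: Optional[str]              # None stands in for the "null" root label
    count: int = 0                        # transactions sharing the path down to this node
    # parent and children are assumed helper fields, not part of Definition 1
    parent: Optional["FPNode"] = field(default=None, repr=False)
    node_link: Optional["FPNode"] = None  # next node carrying the same item-name
    children: Dict[str, "FPNode"] = field(default_factory=dict)


@dataclass(eq=False)
class HeaderEntry:
    """One entry of the frequent-item-header table."""
    item_name: str
    head: Optional[FPNode] = None         # first node in the tree carrying item_name


def nodes_of(entry: HeaderEntry) -> Iterator[FPNode]:
    """Walk the node-link chain that starts at a header entry."""
    node = entry.head
    while node is not None:
        yield node
        node = node.node_link


# A hand-built, made-up tree: root -> a(1) and root -> b(4) -> a(2).
root = FPNode(item_name=None)
a1 = FPNode("a", count=1, parent=root)
b1 = FPNode("b", count=4, parent=root)
a2 = FPNode("a", count=2, parent=b1)
root.children = {"a": a1, "b": b1}
b1.children = {"a": a2}
a1.node_link = a2                         # both "a" nodes are chained together
entry = HeaderEntry("a", head=a1)
total_a = sum(n.count for n in nodes_of(entry))
```

Following the node-link chain from a header entry visits every node for that item, so summing the counts along the chain gives the item's support in the tree.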

Algorithm 1 (Construction of FP-tree)
Input: A transaction database DB and a minimum support threshold ξ.
Output: FP-tree.
Method: The FP-tree is constructed as follows.
1. Scan the transaction database DB once. Collect F, the set of frequent items, and the
support of each frequent item. Sort F in support-descending order as L, the list of
frequent items.
2. Create the root of an FP-tree, T, and label it as “null”. For each transaction in DB,
do the following.
Select the frequent items in the transaction and sort them according to the order of L.
Let the sorted frequent-item list in the transaction be [p | P], where p is the first
element and P is the remaining list. Call insert-tree([p | P], T).
The function insert-tree([p | P], T) is performed as follows.
If T has a child N such that N.item-name = p.item-name, then increment N’s count
by 1; otherwise create a new node N, with its count initialized to 1, its parent link
linked to T, and its node-link linked to the nodes with the same item-name via the
node-link structure. If P is nonempty, call insert-tree(P, N) recursively.
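
The two scans of Algorithm 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the names `Node` and `build_fp_tree` are hypothetical, the threshold ξ is treated as an absolute count, ties in support are broken alphabetically, insert-tree is written as a loop instead of a recursion, and the sample database `db` is invented for illustration.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item = item          # item-name (None for the "null" root)
        self.count = 0
        self.parent = parent      # parent link
        self.children = {}        # item-name -> child Node
        self.node_link = None     # next node with the same item-name


def build_fp_tree(db, min_support):
    """Two-scan FP-tree construction; min_support is an absolute count (assumption)."""
    # Scan 1: count supports, keep the frequent items, sort them into the list L.
    support = Counter(item for t in db for item in set(t))
    L = sorted((i for i, c in support.items() if c >= min_support),
               key=lambda i: (-support[i], i))  # alphabetical tie-break: an assumption
    rank = {item: r for r, item in enumerate(L)}

    root = Node(None, None)
    header = {item: None for item in L}  # item-name -> head of node-link chain
    tail = {}                            # last node per item, for O(1) chain appends

    # Scan 2: insert the sorted frequent items of every transaction.
    for t in db:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for p in items:                  # iterative form of insert-tree([p | P], T)
            child = node.children.get(p)
            if child is None:            # no matching child: create it and link it
                child = Node(p, node)
                node.children[p] = child
                if header[p] is None:
                    header[p] = child
                else:
                    tail[p].node_link = child
                tail[p] = child
            child.count += 1             # a new node starts at 0, so it ends at 1
            node = child
    return root, header, L


# Made-up transactions, used only to exercise the sketch.
db = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"], ["a", "b", "c", "e"]]
root, header, L = build_fp_tree(db, min_support=2)
```

With this sample database the infrequent items d and e are dropped, and the transactions sharing the prefix a, b collapse onto a single path, which is where the compactness of the FP-tree comes from.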
To make the procedure easier to follow, consider an example.
