An Efficient Algorithm For Mining
Association rules that satisfy both a user-specified minimum confidence threshold and a user-specified minimum support threshold are referred to as strong association rules.

Association rule mining is a two-step process:
1. Find all frequent itemsets: each of these itemsets occurs at least as frequently as a predetermined minimum support count.
2. Generate strong association rules from the frequent itemsets: these rules must satisfy both minimum support and minimum confidence.

The prune step: Ck is a superset of Lk; that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck. A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk. Ck, however, can be huge, and so this could involve heavy computation. To reduce the size of Ck, the Apriori property is used as follows. Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset. Hence, if any (k-1)-subset of a candidate k-itemset is not in Lk-1, the candidate cannot be frequent either and so can be removed from Ck. This subset testing can be done quickly by maintaining a hash tree of all frequent itemsets.
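The join and prune steps that produce Ck from Lk-1 can be sketched in a few lines. This is a minimal illustration, assuming itemsets are represented as Python frozensets of sortable items; the function name `apriori_gen` is hypothetical, and a plain Python set stands in for the hash tree mentioned above.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Return candidate k-itemsets Ck built from the frequent (k-1)-itemsets Lk-1.

    Itemsets are frozensets of sortable items. A Python set replaces the
    hash tree described in the text (an implementation assumption).
    """
    prev = set(prev_frequent)
    ordered = sorted(tuple(sorted(s)) for s in prev)
    candidates = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: combine two (k-1)-itemsets that agree on their first k-2 items.
            if a[:k - 2] != b[:k - 2]:
                continue
            cand = frozenset(a) | frozenset(b)
            # Prune step (Apriori property): every (k-1)-subset must be in Lk-1.
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates
```

For example, with L2 = {ab, ac, bc, bd}, the join step produces abc and bcd, but bcd is pruned because its subset cd is not in L2, so C3 = {abc}. Only the surviving candidates then need to be counted in the database scan.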
In the same way, find the frequent itemsets for all other items. The FP-growth method transforms the problem of finding long frequent patterns into looking for shorter ones recursively and then concatenating the suffix. It uses the least frequent items as suffixes, offering good selectivity. The method substantially reduces the search costs.

Frequent closed itemset: An itemset X is a closed itemset if there exists no itemset X′ such that (1) X′ is a proper superset of X and (2) every transaction containing X also contains X′. A closed itemset X is frequent if its support passes the given support threshold.
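The closedness condition can be checked directly. It suffices to test single items: if some proper superset X′ satisfies condition (2), every item of X′ minus X occurs in all transactions containing X, and conversely any such item i yields X′ = X ∪ {i}. The sketch below assumes transactions are Python sets; the five-transaction database is hypothetical and chosen only for illustration, and the helper names are likewise hypothetical.

```python
def support(itemset, tdb):
    """Number of transactions in tdb that contain every item of itemset."""
    return sum(1 for t in tdb if itemset <= t)


def is_closed(itemset, tdb):
    """X is closed iff no item outside X occurs in every transaction containing X."""
    containing = [t for t in tdb if itemset <= t]
    all_items = set().union(*tdb)
    return not any(all(i in t for t in containing) for i in all_items - itemset)


def is_frequent_closed(itemset, tdb, min_sup):
    """Closed and with support at or above the threshold."""
    return support(itemset, tdb) >= min_sup and is_closed(itemset, tdb)


# Hypothetical example database (an assumption for illustration only).
TDB = [set("acdef"), set("abe"), set("cef"), set("acdf"), set("cef")]
```

On this database, {c} is not closed because every transaction containing c also contains f, whereas {c, f} is a frequent closed itemset with support 4 when min_sup = 2.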
Finding the complete set of frequent closed itemsets efficiently from a large database is called the frequent closed itemset mining problem.
PROBLEM DEFINITION:
An itemset X is contained in a transaction <tid, Y> if X ⊆ Y. Given a transaction database TDB, the support of an itemset X, denoted sup(X), is the number of transactions in TDB that contain X. An association rule R: X ⇒ Y is an implication between two itemsets X and Y, where X, Y ⊂ I (the set of all items) and X ∩ Y = ∅. The support of the rule, denoted sup(X ⇒ Y), is defined as sup(X ∪ Y). The confidence of the rule, denoted conf(X ⇒ Y), is defined as sup(X ∪ Y)/sup(X).
The requirement of mining the complete set of association rules leads to two problems:

For the transaction database in table1 with min_sup = 2, the frequent closed itemsets are mined by the divide-and-conquer method as follows.

1) Find frequent items. Scan TDB to find the set of frequent items and derive a global frequent item list, called f_list: f_list = {c:4, e:4, f:4, a:3, d:2}, where the items are sorted in support-descending order and any infrequent item, such as b, is omitted.
2) Divide the search space. All the frequent closed itemsets can be divided into 5 non-overlapping subsets based on the f_list: (1) the ones containing item d, (2) the ones containing item a but no d, (3) the ones containing item f but neither a nor d, (4) the ones containing e but none of f, a, or d, and (5) the one containing only c. Once all subsets are found, the complete set of frequent closed itemsets is obtained.

In the same way, find the frequent closed itemsets for a, f, e, and c.

4) The set of frequent closed itemsets found is {acdf:2, a:3, ae:2, cf:4, cef:3, e:4}.
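The f_list construction and the partition of the search space can be sketched as follows. Table1 is not reproduced in this section, so the database below is a hypothetical one whose item supports match the f_list in step 1. Within each partition the sketch simply enumerates candidates in the item's conditional database and keeps their closures; this brute-force search is a stand-in chosen for brevity, not the recursive conditional-database method described in the text. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical database (an assumption): its item supports reproduce the f_list.
TDB = [set("acdef"), set("abe"), set("cef"), set("acdf"), set("cef")]


def build_f_list(tdb, min_sup):
    """Step 1: (item, support) pairs, support descending, ties broken alphabetically."""
    counts = Counter(item for t in tdb for item in t)
    frequent = [(item, n) for item, n in counts.items() if n >= min_sup]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def closure(itemset, tdb):
    """Intersection of all transactions containing itemset, and its support."""
    rows = [t for t in tdb if itemset <= t]
    if not rows:
        return frozenset(), 0
    return frozenset(set.intersection(*rows)), len(rows)


def mine_partition(item, order, tdb, min_sup):
    """Closed itemsets containing item and no item that follows it in order.

    Brute force inside the partition (an assumption, not the paper's method).
    """
    pos = order.index(item)
    earlier, later = set(order[:pos]), set(order[pos + 1:])
    # Conditional database: transactions containing item, projected onto earlier items.
    cond_db = [t & earlier for t in tdb if item in t]
    cond_items = sorted(set().union(*cond_db))
    found = {}
    for size in range(len(cond_items) + 1):
        for extra in combinations(cond_items, size):
            closed, sup = closure({item, *extra}, tdb)
            # Keep the closure only if it belongs to this partition.
            if sup >= min_sup and not (closed & later):
                found[closed] = sup
    return found


def mine_closed(tdb, min_sup):
    """Mine each partition, least frequent item first, and merge the results."""
    order = [item for item, _ in build_f_list(tdb, min_sup)]
    result = {}
    for item in reversed(order):
        result.update(mine_partition(item, order, tdb, min_sup))
    return result
```

With min_sup = 2, `build_f_list` returns c:4, e:4, f:4, a:3, d:2, and `mine_closed` returns the six closed itemsets listed in step 4, with the d-partition contributing acdf:2 and the c-partition contributing nothing, since the closure of c is cf.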
Optimization 1: Compress transactional and conditional databases using FP-tree structures. An FP-tree compresses a database for frequent itemset mining, and conditional databases can be derived from it efficiently.
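A small FP-tree builder shows how the compression and the derivation of conditional databases fit together. Each transaction, with infrequent items removed and the rest sorted in f_list order, is inserted as a path from the root, so transactions that share a prefix share nodes. Following each item's node list back to the root yields that item's conditional database. This is a minimal sketch under stated assumptions: the per-item list `header` stands in for FP-tree node-links, the names are hypothetical, and the example database is the same illustrative one used for the f_list above rather than table1 itself.

```python
from collections import Counter, defaultdict


class FPNode:
    """One FP-tree node: an item, a count, a parent link, and children by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_sup):
    counts = Counter(i for t in transactions for i in t)
    # f_list: frequent items, support descending, ties broken alphabetically.
    f_list = sorted((i for i, n in counts.items() if n >= min_sup),
                    key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every node carrying that item
    for t in transactions:
        node = root
        # Drop infrequent items and insert the rest in f_list order.
        for item in sorted((i for i in t if i in rank), key=lambda i: rank[i]):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, f_list


def conditional_db(header, item):
    """Prefix paths leading to item, each weighted by that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:  # empty prefixes add no items to the conditional database
            base.append((tuple(reversed(path)), node.count))
    return base


TDB = [set("acdef"), set("abe"), set("cef"), set("acdf"), set("cef")]
root, header, f_list = build_fp_tree(TDB, 2)
```

On this database the five transactions collapse into a tree whose root has only two children (c with count 4 and e with count 1), and the conditional database of a is {cef, e, cf}, each with count 1.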
PERFORMANCE STUDY