Mtech Project Seminar1

Overview of this Project
Mining frequent patterns with association rule mining is a fundamentally important task in the process of knowledge discovery in large databases.
The main focus of this project report is the generation of frequent patterns, which is the most important task in association rule mining.
This is done by analyzing implementations of the well-known association rule mining algorithms Apriori, Dynamic Itemset Counting (DIC), and FP-Growth.
The experimental system is developed in Java under the Windows XP operating system. The run-time behavior of these algorithms is analyzed and compared using the Mushroom dataset.
Outline
• Introduction
• Association Rule Mining to Frequent Patterns
• Implementation
• Conclusions
• Future Enhancements
• Bibliography
Introduction to Frequent Patterns
Generating frequent patterns is the most important task in association rule mining.
The well-known association-rule-based algorithms for mining frequent patterns are:
• Apriori
• Dynamic Itemset Counting (DIC)
• FP-Growth
Association Rule Mining
• Association rule mining is one of the fundamental data mining techniques.
• An association is a rule that implies certain relationships among a set of objects, for example that they occur together or that one implies the other.
• The goal of association rule mining is to find interesting association relationships among a large set of data items.
• Each rule is assigned two factors: support and confidence.
• Association rule mining is generally performed in two steps:
  - Find all frequent itemsets. The foundation of association rule algorithms is the fact that any subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Frequent itemsets are found iteratively, with cardinality from 1 to k (k-itemsets).
  - Use the frequent itemsets to generate strong rules that satisfy the minimum confidence.
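As an illustration of the two factors, the following Java sketch uses the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X). The class name RuleMetrics and the toy transactions are assumptions made for this example; they are not taken from the project code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RuleMetrics {
    /** Fraction of transactions that contain every item of the itemset. */
    static double support(List<Set<String>> db, Set<String> itemset) {
        long hits = db.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / db.size();
    }

    /** confidence(X -> Y) = support(X ∪ Y) / support(X). */
    static double confidence(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> both = new HashSet<>(x);
        both.addAll(y);
        return support(db, both) / support(db, x);
    }

    public static void main(String[] args) {
        // Hypothetical toy data set of four transactions.
        List<Set<String>> db = List.of(
            Set.of("A", "B", "C"),
            Set.of("A", "B"),
            Set.of("A", "C"),
            Set.of("B", "C"));
        // {A, B} occurs in 2 of the 4 transactions.
        System.out.println(support(db, Set.of("A", "B")));
        // Of the 3 transactions containing A, 2 also contain B.
        System.out.println(confidence(db, Set.of("A"), Set.of("B")));
    }
}
```

A rule is called strong when its confidence reaches the chosen minimum confidence, so the second step of the procedure amounts to filtering candidate rules with a check of this kind.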
FP Array
• FP Array techniques greatly reduce the need to traverse FP-trees.
• FP Array techniques achieve significantly better performance than FP-tree-based algorithms.
• The FP Array is used in new algorithms for finding maximal and closed frequent itemsets.
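The terms maximal and closed can be made concrete with a short Java sketch. It does not implement the FP Array technique; it only applies the usual definitions to an already mined collection: a frequent itemset is closed if no proper superset has the same support count, and maximal if no proper superset is frequent. The class name ClosedMaximal and the sample counts are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ClosedMaximal {
    static boolean properSuperset(Set<String> big, Set<String> small) {
        return big.size() > small.size() && big.containsAll(small);
    }

    /** Closed: no proper superset in the map has the same support count. */
    static Set<Set<String>> closed(Map<Set<String>, Integer> frequent) {
        Set<Set<String>> out = new HashSet<>();
        for (Map.Entry<Set<String>, Integer> e : frequent.entrySet()) {
            boolean isClosed = true;
            for (Map.Entry<Set<String>, Integer> f : frequent.entrySet()) {
                if (properSuperset(f.getKey(), e.getKey())
                        && f.getValue().equals(e.getValue())) {
                    isClosed = false;
                    break;
                }
            }
            if (isClosed) out.add(e.getKey());
        }
        return out;
    }

    /** Maximal: no proper superset appears in the map of frequent itemsets. */
    static Set<Set<String>> maximal(Map<Set<String>, Integer> frequent) {
        Set<Set<String>> out = new HashSet<>();
        for (Set<String> s : frequent.keySet()) {
            boolean isMaximal = true;
            for (Set<String> other : frequent.keySet()) {
                if (properSuperset(other, s)) { isMaximal = false; break; }
            }
            if (isMaximal) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical frequent itemsets with their support counts.
        Map<Set<String>, Integer> frequent = Map.of(
            Set.of("A"), 3,
            Set.of("B"), 2,
            Set.of("A", "B"), 2);
        System.out.println(closed(frequent));   // {A} and {A, B}; {B} has the same count as {A, B}
        System.out.println(maximal(frequent));  // only {A, B}
    }
}
```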
FP Array Applications
• It generates the frequent patterns from existing datasets.
• It applies a minimum support to the given input data.
• It reports the time cost of searching for the frequent itemsets.
• It displays the number of records, by row and by column, from the datasets.
Rule to Mine Frequent Items
Frequent itemset mining algorithms are classified according to the following aspects:
• The type of frequent itemset discovered
• Whether candidates are used
• The representation of the transactions
• The itemset representation used in the algorithm
• The number of disk accesses
• The length of the maximal frequent pattern
The implemented algorithms differ as follows:
Property                      Apriori   DIC   FP-Growth
With candidate generation     yes       yes   no
Without candidate generation  no        no    yes
BFS                           yes       yes   no
DFS                           no        no    yes
FP-tree                       no        no    yes
Stages in Knowledge Discovery in Frequent Databases
• Selection: selecting and segmenting the data that are relevant to given criteria.
• Preprocessing: the data cleaning stage, where unnecessary information is removed.
• Transformation: the data is made usable and navigable.
• Data mining: extraction of patterns from the data.
• Interpretation and evaluation: the patterns found in the data mining stage are converted into knowledge to support decision-making.
• Data visualization: examining large volumes of data to detect patterns visually.
Discoveries in Frequent Databases
Apriori Algorithm
The Apriori algorithm is one of the best-known association rule algorithms. Apriori uses a bottom-up search.
• The Apriori algorithm works as follows:
  - In the first step, Apriori generates the candidate 1-itemsets (C1). The itemset counts are then compared with the minimum support value to find the set L1 of frequent 1-itemsets.
  - In the second step, the algorithm uses L1 to construct the set C2 of candidate 2-itemsets. The process continues level by level and finishes when there are no more candidates.
• In each phase, all the transactions in the data set are scanned.
• Finally, all frequent itemsets are returned.
• Disadvantage:
  - Multiple database scans.
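The level-wise procedure above can be sketched in Java as follows. This is a minimal illustration, not the project's implementation: the class name AprioriSketch, the in-memory list of transactions, and the use of an absolute count (minCount) as the minimum support are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AprioriSketch {
    /** Returns every itemset contained in at least minCount transactions, with its count. */
    static Map<Set<String>, Integer> mine(List<Set<String>> db, int minCount) {
        Map<Set<String>, Integer> frequent = new HashMap<>();
        // C1: one candidate per distinct item.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : db) {
            for (String item : t) candidates.add(Set.of(item));
        }
        while (!candidates.isEmpty()) {
            // One full scan of the data set per phase.
            Map<Set<String>, Integer> level = new HashMap<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : db) {
                    if (t.containsAll(c)) count++;
                }
                if (count >= minCount) level.put(c, count);
            }
            frequent.putAll(level);
            candidates = nextCandidates(level.keySet());
        }
        return frequent;
    }

    /** Joins frequent k-itemsets into (k+1)-candidates and prunes by the subset property. */
    static Set<Set<String>> nextCandidates(Set<Set<String>> lk) {
        Set<Set<String>> next = new HashSet<>();
        for (Set<String> a : lk) {
            for (Set<String> b : lk) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() != a.size() + 1) continue;
                // Keep the candidate only if every k-subset is frequent.
                boolean allSubsetsFrequent = true;
                for (String drop : union) {
                    Set<String> sub = new TreeSet<>(union);
                    sub.remove(drop);
                    if (!lk.contains(sub)) { allSubsetsFrequent = false; break; }
                }
                if (allSubsetsFrequent) next.add(union);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Hypothetical toy data set.
        List<Set<String>> db = List.of(
            Set.of("A", "B", "C"), Set.of("A", "B"), Set.of("A", "C"), Set.of("B", "C"));
        System.out.println(mine(db, 2));
    }
}
```

The inner loop over db inside the while loop is the repeated database scan listed as the disadvantage: one pass per itemset length.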
DIC Algorithm
The DIC (Dynamic Itemset Counting) algorithm, which uses fewer database scans, presents a new approach to finding large itemsets.
• The aim of the DIC algorithm is to improve performance and eliminate repeated database scans.
• The DIC algorithm divides the database into partitions (intervals of M transactions) and uses a dynamic counting strategy. It defines stop points for itemset counting: at any such point during the database scan, it can stop counting some itemsets and start counting others.
• Four symbols indicate the different states of itemsets: solid box (confirmed frequent), solid circle (confirmed infrequent), dashed box (suspected frequent, still being counted), and dashed circle (suspected infrequent, still being counted).
• The algorithm is described as follows:
Step 1: Mark the empty itemset with a solid box and all the 1-itemsets with a dashed circle.
Step 2: After reading one interval of M transactions from the database, do the following:
  - Check each itemset in a dashed circle. If its count exceeds the support threshold, change it from a dashed circle to a dashed box.
  - Check each immediate superset of such an itemset. If all the subsets of the superset are in a solid box or dashed box, add the superset as a dashed circle.
  - Check each itemset in a dashed circle or dashed box. If it has been counted over all the transactions, change it into a solid circle if it is in a circle, or into a solid box if it is in a box.
Step 3: When the end of the transactions is reached, go back to the beginning and repeat Step 2 until no itemset remains in a dashed circle or dashed box.
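The state transitions in the steps above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the project's DIC code: the class name DicSketch and its helper names are hypothetical, the database is an in-memory list, the threshold is an absolute count that an itemset must reach (minCount), and reading simply wraps around at the end of the list, so each itemset is counted over exactly one full cycle of transactions from the point where it was added.

```java
import java.util.*;

final class DicSketch {
    enum State { DASHED_CIRCLE, DASHED_BOX, SOLID_CIRCLE, SOLID_BOX }

    static final class Entry {
        State state = State.DASHED_CIRCLE;
        int count; // transactions counted so far that contain the itemset
        int seen;  // transactions examined since counting of this itemset began
    }

    static boolean dashed(Entry e) {
        return e.state == State.DASHED_CIRCLE || e.state == State.DASHED_BOX;
    }

    static boolean boxed(Entry e) {
        return e.state == State.DASHED_BOX || e.state == State.SOLID_BOX;
    }

    /** Returns the itemsets that end up in a solid box. */
    static Set<Set<String>> mine(List<Set<String>> db, int minCount, int m) {
        if (m < 1) throw new IllegalArgumentException("m must be at least 1");
        Set<String> items = new TreeSet<>();
        for (Set<String> t : db) items.addAll(t);
        // Step 1: the empty itemset is implicitly a solid box; 1-itemsets start dashed.
        Map<Set<String>, Entry> lattice = new HashMap<>();
        for (String i : items) lattice.put(Set.of(i), new Entry());

        int pos = 0;
        while (lattice.values().stream().anyMatch(DicSketch::dashed)) {
            // Step 2: read one interval of m transactions (wrapping covers step 3).
            for (int k = 0; k < m; k++, pos = (pos + 1) % db.size()) {
                Set<String> t = db.get(pos);
                for (Map.Entry<Set<String>, Entry> e : lattice.entrySet()) {
                    Entry en = e.getValue();
                    if (!dashed(en) || en.seen == db.size()) continue;
                    en.seen++;
                    if (t.containsAll(e.getKey())) en.count++;
                }
            }
            // Dashed circle -> dashed box once the threshold is reached.
            for (Entry en : lattice.values()) {
                if (en.state == State.DASHED_CIRCLE && en.count >= minCount) {
                    en.state = State.DASHED_BOX;
                }
            }
            // Add immediate supersets whose subsets are all in a box.
            Map<Set<String>, Entry> added = new HashMap<>();
            for (Map.Entry<Set<String>, Entry> e : lattice.entrySet()) {
                if (!boxed(e.getValue())) continue;
                for (String i : items) {
                    Set<String> sup = new TreeSet<>(e.getKey());
                    if (!sup.add(i) || lattice.containsKey(sup)) continue;
                    if (allSubsetsBoxed(sup, lattice)) added.put(sup, new Entry());
                }
            }
            lattice.putAll(added);
            // Dashed -> solid once an itemset has been counted over all transactions.
            for (Entry en : lattice.values()) {
                if (dashed(en) && en.seen == db.size()) {
                    en.state = en.state == State.DASHED_BOX ? State.SOLID_BOX : State.SOLID_CIRCLE;
                }
            }
        }
        Set<Set<String>> result = new HashSet<>();
        for (Map.Entry<Set<String>, Entry> e : lattice.entrySet()) {
            if (e.getValue().state == State.SOLID_BOX) result.add(e.getKey());
        }
        return result;
    }

    static boolean allSubsetsBoxed(Set<String> sup, Map<Set<String>, Entry> lattice) {
        for (String drop : sup) {
            Set<String> sub = new TreeSet<>(sup);
            sub.remove(drop);
            Entry en = lattice.get(sub);
            if (en == null || !boxed(en)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("A", "B", "C"), Set.of("A", "B"), Set.of("A", "C"), Set.of("B", "C"));
        System.out.println(mine(db, 2, 2)); // interval length M = 2
    }
}
```

Because every itemset is counted over each transaction exactly once before it becomes solid, the sketch returns the same frequent itemsets as a level-wise search; the difference lies in when counting of each itemset starts.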
FP-Growth
• FP-Growth is an algorithm for generating frequent itemsets for association rules. The algorithm compresses a large database into a compact frequent-pattern tree (FP-tree) structure.
• The FP-tree structure stores all the necessary information about the frequent itemsets in a database.
• A frequent-pattern tree (FP-tree for short) is defined as follows:
1. The root is labeled "null" and has a set of item-prefix subtrees as its children.
2. Each node consists of three fields: item-name (the item the node represents), count (the number of transactions that share that node), and node-link (the next node in the FP-tree carrying the same item-name).
3. The frequent-item header table has two fields: item-name and head of node-link (which points to the first node in the FP-tree carrying that item).
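The three-part definition above maps fairly directly onto a small Java data structure. The sketch below is illustrative only: the names FpTree, Node, insert, and supportOf are assumptions, the children map and the tail map are implementation helpers that the definition does not mention, and transactions are assumed to arrive already filtered and ordered.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FpTree {
    static final class Node {
        final String itemName;  // null for the root
        int count;              // number of transactions sharing this node
        Node nodeLink;          // next node in the tree carrying the same item
        final Map<String, Node> children = new HashMap<>(); // helper for insertion

        Node(String itemName) { this.itemName = itemName; }
    }

    final Node root = new Node(null);
    /** Header table: item-name -> head of node-link. */
    final Map<String, Node> header = new LinkedHashMap<>();
    /** Last node linked for each item, so new nodes can be appended to the chain. */
    private final Map<String, Node> tail = new HashMap<>();

    /** Inserts one transaction; shared prefixes reuse existing nodes. */
    void insert(List<String> orderedItems) {
        Node cur = root;
        for (String item : orderedItems) {
            Node child = cur.children.get(item);
            if (child == null) {
                child = new Node(item);
                cur.children.put(item, child);
                link(child);
            }
            child.count++;
            cur = child;
        }
    }

    private void link(Node n) {
        Node last = tail.put(n.itemName, n);
        if (last == null) header.put(n.itemName, n); // first node for this item
        else last.nodeLink = n;
    }

    /** Sums the counts along the node-link chain of an item. */
    int supportOf(String item) {
        int total = 0;
        for (Node n = header.get(item); n != null; n = n.nodeLink) total += n.count;
        return total;
    }

    public static void main(String[] args) {
        FpTree tree = new FpTree();
        tree.insert(List.of("f", "c", "a"));
        tree.insert(List.of("f", "c"));
        tree.insert(List.of("c", "b"));
        // "c" appears under "f" (count 2) and directly under the root (count 1).
        System.out.println(tree.supportOf("c"));
    }
}
```

Following the node-link chain from the header table reaches every node of an item without traversing the whole tree, which is what the mining phase relies on.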
Use case Diagram for the proposed system
[Figure: use case diagram. Labels: User, Data Set File, Apriori, Dynamic Itemset Counting, FP-Growth]
Identifying Classes from the above Use Cases
• Architectural design
The division of software into subsystems and components, together with the process of deciding how these will be connected and how they will interact, including determining the interfaces.
[Figure: architecture diagram. Labels: GUI for selecting the file, support and algorithm; Apriori; Dynamic Itemset Counting; FP-Growth; Matrix Based Association]
• User interface design
The user interface is designed to display and obtain the needed information in an accessible, efficient manner. The user interface can employ one or more windows. Each window should serve a clear, specific purpose.
Step 1: Select the file name
Step 2: Display the contents of the file in the text area
Step 3: Enter a valid support
Step 4: Select the algorithm
Step 5: Display the frequent patterns for Apriori
Step 6: If the selected algorithm is DIC, enter the step length
Step 7: Display the frequent patterns for DIC
Step 8: Display the frequent patterns for FP-Growth
Step 9: Display the frequent patterns for MBA (Matrix Based Association)
RESULTS
The FPMiner tool is implemented in Java, and all the experiments are performed on a 1.7 GHz PC with 256 MB of memory. The operating system is Windows XP.
Experiment 1:
Execution times for different support values and different algorithms are tabulated as follows:
Support   Execution time of AprioriT   Execution time of DIC   Execution time of FP-Growth
50        187ms                        226754ms                94ms
60        110ms                        184297ms                74ms
70        78ms                         161265ms                46ms
80        47ms                         106953ms                32ms
90        32ms                         74984ms                 31ms
Experiment 2:
The number of frequent itemsets generated using different algorithms:
Algorithm   Support   Frequent itemsets generated
Apriori     50        153
Apriori     60        51
Apriori     70        31
Apriori     80        23
Apriori     90        9
MBA         50        153
MBA         60        51
MBA         70        31
MBA         80        23
MBA         90        9
CONCLUSION
• Frequent pattern mining is used for finding frequent itemsets among the items in a given data set.
• The results show that:
  - Apriori does not run as efficiently as the FP-tree approach.
  - Apriori runs slowly on this data set because its transactions are dense.
  - DIC (Dynamic Itemset Counting) is much slower than every other algorithm on the real data set.
  - MBA performs better than DIC but shows no marked advantage over the other two algorithms on the Mushroom dataset.
FUTURE ENHANCEMENT
• There are still many interesting research issues related to extensions of frequent pattern mining, such as mining structured patterns through further development of these approaches, mining approximate or fault-tolerant patterns in noisy environments, and frequent-pattern-based clustering and classification.
FP Array Techniques
• The FP Array technique greatly reduces the need to traverse FP-trees.
• The FP Array technique achieves significantly better performance than FP-tree-based algorithms.
• The FP Array is used in new algorithms for finding all maximal and closed frequent itemsets.
• The FP-tree is a compact data structure based on the following properties:
  - Frequent pattern mining performs one scan of the database to determine the set of frequent items.
  - The method stores each item in a compact structure, so more than two database scans are unnecessary.
  - Each frequent item is located in the FP-tree, and each node holds an item and the count of that item.
  - The items of each transaction have to be sorted in descending order of frequency.
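The first scan and the frequency ordering described in these properties can be sketched in Java as follows. The class name FrequencyOrdering, the absolute threshold minCount, and the alphabetical tie-break between items of equal frequency are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FrequencyOrdering {
    /** First scan: count how many transactions contain each item. */
    static Map<String, Integer> countItems(List<Set<String>> db) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> t : db) {
            for (String item : t) counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    /** Second scan, per transaction: drop infrequent items, sort the rest by descending count. */
    static List<String> order(Set<String> t, Map<String, Integer> counts, int minCount) {
        List<String> kept = new ArrayList<>();
        for (String item : t) {
            if (counts.getOrDefault(item, 0) >= minCount) kept.add(item);
        }
        kept.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending frequency
            return byCount != 0 ? byCount : a.compareTo(b);              // tie-break (assumption)
        });
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical toy data set: b, c, f occur 3 times, a twice, d once.
        List<Set<String>> db = List.of(
            Set.of("f", "a", "c", "d"),
            Set.of("a", "b", "c", "f"),
            Set.of("b", "f"),
            Set.of("b", "c"));
        Map<String, Integer> counts = countItems(db);
        // d is dropped (count 1 < 2); c and f tie at 3 and are ordered alphabetically.
        System.out.println(order(db.get(0), counts, 2)); // [c, f, a]
    }
}
```

Ordering every transaction the same way makes transactions with common frequent items share a prefix, which is what keeps the tree compact.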
