Apriori Algorithm
Algorithm in a nutshell
1. Set a minimum support and a minimum confidence.
2. Take all the itemsets in the transactions whose support is at least the
minimum support.
3. From these itemsets, take all the rules whose confidence is at least the
minimum confidence.
4. Sort the rules by decreasing lift.
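For reference, the three measures used in these steps are commonly defined as below. The symbols are notation assumed here rather than taken from the original: X and Y are disjoint itemsets, T ranges over the transactions, and N is the number of transactions. The walkthrough that follows uses the support count, which is the numerator of the support fraction.

```latex
\[
\operatorname{support}(X) = \frac{\lvert \{\, T : X \subseteq T \,\} \rvert}{N}
\]
\[
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
\[
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{support}(Y)}
\]
```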
Consider the transaction dataset of a store, where each transaction contains the list of
items purchased by a customer. Our goal is to find the frequent itemsets, that is, the sets
of items that customers often purchase together, and to generate association rules from them.
We assume a minimum support count of 2 and a minimum confidence of 50%.
Step 1: Create a table containing the support count of each item present in the
transaction database.
Compare each item's support count with the minimum support count we have set. If an
item's support count is less than the minimum support count, remove that item.
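A minimal sketch of this counting and filtering step is shown here. The five transactions are hypothetical and chosen only for illustration (they are not the table from the original example), and the names `transactions`, `item_counts` and `frequent_items` are assumptions.

```python
from collections import Counter

# Hypothetical transactions (not the original table); one set of items each.
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I3"},
    {"I1", "I2", "I5"},
    {"I2", "I3", "I4"},
    {"I1", "I3"},
]
MIN_SUPPORT_COUNT = 2  # same threshold as in the text

# Support count of an item = number of transactions that contain it.
# Each transaction is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that reach the minimum support count.
frequent_items = {
    item: n for item, n in item_counts.items() if n >= MIN_SUPPORT_COUNT
}

print(sorted(frequent_items.items()))
```

With this made-up data, I4 appears in only one transaction and is dropped, which mirrors the situation described in the next step.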
Step 2: Find all the supersets with 2 items built from the items that remain after the last step.
Check whether every subset of each itemset is frequent, and remove the itemsets that have
an infrequent subset. (For example, the subsets of { I2, I4 } are { I2 } and { I4 }, but since
I4 was not found to be frequent in the previous step, we do not consider { I2, I4 }.)
Since I4 was discarded in the previous step, we do not take any superset that contains
I4.
Now remove all the itemsets whose support count is less than the minimum support
count. The itemsets that remain form the final dataset for this step.
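The pair-building step can be sketched as follows, again on hypothetical transactions rather than the original table. The list `frequent_items` is assumed to hold the survivors of Step 1 for this made-up data, and `support_count` is a helper introduced only for this illustration.

```python
from itertools import combinations

# Hypothetical transactions (not the original table).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I3"},
    {"I1", "I2", "I5"},
    {"I2", "I3", "I4"},
    {"I1", "I3"},
]
MIN_SUPPORT_COUNT = 2
frequent_items = ["I1", "I2", "I3", "I5"]  # assumed result of Step 1 (I4 dropped)


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Candidates: every pair of frequent items, so no candidate contains I4.
candidates = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Keep the pairs that reach the minimum support count.
frequent_pairs = {}
for c in candidates:
    n = support_count(c, transactions)
    if n >= MIN_SUPPORT_COUNT:
        frequent_pairs[c] = n

for pair, n in frequent_pairs.items():
    print(sorted(pair), n)
```

Building candidates only from frequent items is what keeps the number of support counts small: any pair containing an infrequent item cannot be frequent itself.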
Step 3: Find the supersets with 3 items that can be built from the itemsets in the last dataset.
Check whether every subset of each itemset is frequent, and remove the itemsets that have
an infrequent subset.
In this case, if we select { I1, I2, I3 }, all of its 2-item subsets, that is,
{ I1, I2 }, { I2, I3 } and { I1, I3 }, must be frequent. But { I1, I3 } is not in our dataset,
so { I1, I2, I3 } is discarded. The same is true for { I1, I3, I5 } and { I2, I3, I5 }.
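The join and subset check described above might look like the sketch below. It runs on hypothetical data, and `frequent_pairs` is assumed to be the Step 2 result for that data, so the surviving itemsets differ from those in the original example.

```python
from itertools import combinations

# Hypothetical transactions (not the original table).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I3"},
    {"I1", "I2", "I5"},
    {"I2", "I3", "I4"},
    {"I1", "I3"},
]
MIN_SUPPORT_COUNT = 2
# Assumed frequent 2-item itemsets for this data (result of Step 2).
frequent_pairs = {
    frozenset({"I1", "I2"}),
    frozenset({"I1", "I5"}),
    frozenset({"I2", "I3"}),
    frozenset({"I2", "I5"}),
}

# Join: unite two frequent pairs whenever the union has exactly 3 items.
candidates = {
    a | b for a in frequent_pairs for b in frequent_pairs if len(a | b) == 3
}

# Prune: keep a candidate only if all of its 2-item subsets are frequent.
pruned = {
    c for c in candidates
    if all(frozenset(s) in frequent_pairs for s in combinations(c, 2))
}

# Count support only for the candidates that survived the prune.
frequent_triples = {}
for c in pruned:
    n = sum(1 for t in transactions if c <= t)
    if n >= MIN_SUPPORT_COUNT:
        frequent_triples[c] = n

for triple, n in frequent_triples.items():
    print(sorted(triple), n)
```

Here { I1, I2, I3 } is pruned because { I1, I3 } is not a frequent pair, which is the same argument the text makes, and it is discarded without scanning the transactions at all.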
Step 4: Now that we have discovered all the frequent itemsets, we generate the strong
association rules. For that, we have to calculate the confidence of each rule.
Since all these association rules have a confidence ≥ 50%, all of them can be considered
strong association rules.
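As an illustration of rule generation, the sketch below splits one frequent itemset into every antecedent and consequent pair and keeps the rules that reach the minimum confidence. The transactions and the chosen itemset are hypothetical, so the confidence values are not those of the original example.

```python
from itertools import combinations

# Hypothetical transactions (not the original table).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I3"},
    {"I1", "I2", "I5"},
    {"I2", "I3", "I4"},
    {"I1", "I3"},
]
MIN_CONFIDENCE = 0.5


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


itemset = frozenset({"I1", "I2", "I5"})  # frequent in this made-up data

rules = []
for size in range(1, len(itemset)):
    for items in combinations(sorted(itemset), size):
        antecedent = frozenset(items)
        consequent = itemset - antecedent
        # confidence(A -> C) = count(A and C together) / count(A)
        confidence = support_count(itemset) / support_count(antecedent)
        if confidence >= MIN_CONFIDENCE:
            rules.append((antecedent, consequent, confidence))

for a, c, conf in rules:
    print(sorted(a), "->", sorted(c), f"confidence = {conf:.2f}")
```

For this data all six possible rules pass the 50% threshold, with the weakest being I2 -> { I1, I5 } at exactly 0.50.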
Step 5: Calculate the lift of all the strong association rules.
It means that there is a 25% chance that customers who buy I3 will also buy I2.
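A small sketch of the lift calculation and the final sort is given below. It uses hypothetical transactions and three hand-picked rules, so the numbers do not correspond to the original example; `support` and `lift` are helper names assumed for this illustration.

```python
# Hypothetical transactions (not the original table).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I3"},
    {"I1", "I2", "I5"},
    {"I2", "I3", "I4"},
    {"I1", "I3"},
]
N = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / N


def lift(antecedent, consequent):
    """Confidence of the rule divided by the support of its consequent."""
    confidence = support(antecedent | consequent) / support(antecedent)
    return confidence / support(consequent)


# A few rules over frequent itemsets of this made-up data.
rules = [
    (frozenset({"I3"}), frozenset({"I2"})),
    (frozenset({"I1"}), frozenset({"I5"})),
    (frozenset({"I5"}), frozenset({"I1", "I2"})),
]

# Sort the rules by decreasing lift.
ranked = sorted(rules, key=lambda r: lift(*r), reverse=True)

for a, c in ranked:
    print(sorted(a), "->", sorted(c), f"lift = {lift(a, c):.2f}")
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent, a lift of 1 indicates no association, and a lift below 1 indicates that they occur together less often than expected.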