bowbowbow/Apriori

Apriori Assignment

C++ Implementation of Apriori Algorithm

To run the implementation

  1. Keep project files in one folder.

  2. Compile with the make command.

To compile without the makefile, run the following command.

g++ -std=c++11 apriori.cpp -o apriori.exe

(Note that the -std=c++11 option must be passed to g++.)

  3. Run with the following command.

apriori.exe [min support (%)] [input file name] [output file name]

Example: apriori.exe 5 input.txt output.txt

Summary of the algorithm

The Apriori algorithm finds frequent itemsets using an iterative, level-wise approach based on candidate generation.

Input: A database of transactions and the minimum support count threshold

Output: frequent itemsets in the database

The algorithm solves the problem with a two-step approach.

Step1. Frequent Itemset Generation

Generate all itemsets whose support is greater than or equal to the minimum support.

A brute-force search is expensive: k items yield 2^k - 1 candidate itemsets and 3^k - 2^(k+1) + 1 possible rules.

The key idea the Apriori algorithm uses to reduce this work is that every subset of a frequent itemset must also be frequent.

Therefore, if an itemset is infrequent, none of its supersets needs to be generated or tested.

The process therefore proceeds as follows.

  1. Generate candidate itemsets of length k+1 from the frequent itemsets of length k. (This step is called joining.)

  2. Delete a newly generated (k+1)-candidate if any itemset obtained by removing one of its elements is not among the frequent k-itemsets. (This step is called pruning.)

  3. Calculate the support of the remaining candidates and remove those with support less than the minimum support.

  4. Repeat from step 1 until no candidates are left.
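The four steps above can be sketched in C++11 roughly as follows. This is a minimal illustration, not the code in apriori.cpp: the names `Itemset`, `joinStep`, `pruneStep`, and `apriori` are hypothetical, the threshold is taken as an absolute count rather than a percentage, and every transaction is assumed to be sorted in ascending order with no duplicate items.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Itemset = std::vector<int>;      // items kept sorted ascending
using Transaction = std::vector<int>;  // assumed sorted ascending, no duplicates

// Joining: merge two frequent k-itemsets that share their first k-1 items.
std::set<Itemset> joinStep(const std::set<Itemset>& frequent) {
    std::set<Itemset> candidates;
    for (auto a = frequent.begin(); a != frequent.end(); ++a) {
        for (auto b = std::next(a); b != frequent.end(); ++b) {
            if (std::equal(a->begin(), a->end() - 1, b->begin())) {
                Itemset c = *a;
                c.push_back(b->back());  // a precedes b in the set, so c stays sorted
                candidates.insert(c);
            }
        }
    }
    return candidates;
}

// Pruning: keep a candidate only if every subset with one item removed is frequent.
std::set<Itemset> pruneStep(const std::set<Itemset>& candidates,
                            const std::set<Itemset>& frequent) {
    std::set<Itemset> kept;
    for (const Itemset& c : candidates) {
        bool allFrequent = true;
        for (std::size_t skip = 0; skip < c.size() && allFrequent; ++skip) {
            Itemset subset;
            for (std::size_t i = 0; i < c.size(); ++i)
                if (i != skip) subset.push_back(c[i]);
            allFrequent = frequent.count(subset) > 0;
        }
        if (allFrequent) kept.insert(c);
    }
    return kept;
}

// Level-wise loop: returns every frequent itemset with its support count.
std::map<Itemset, int> apriori(const std::vector<Transaction>& db, int minCount) {
    std::map<Itemset, int> result;
    std::map<Itemset, int> singles;
    for (const Transaction& t : db)
        for (int item : t) ++singles[Itemset{item}];

    std::set<Itemset> frequent;
    for (const auto& kv : singles) {
        if (kv.second >= minCount) {
            frequent.insert(kv.first);
            result[kv.first] = kv.second;
        }
    }
    while (!frequent.empty()) {
        std::set<Itemset> candidates = pruneStep(joinStep(frequent), frequent);
        std::set<Itemset> next;
        for (const Itemset& c : candidates) {
            int count = 0;
            for (const Transaction& t : db)
                if (std::includes(t.begin(), t.end(), c.begin(), c.end())) ++count;
            if (count >= minCount) {  // step 3: drop candidates below the threshold
                next.insert(c);
                result[c] = count;
            }
        }
        frequent = std::move(next);   // step 4: stop once no candidates survive
    }
    return result;
}
```

Counting support by scanning every transaction for every candidate is the simplest option; a real implementation may prefer a faster structure, but the join, prune, and count order stays the same.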

Step2. Association Rule Generation

Generate high confidence rules from each frequent itemset, where each rule is a binary partition of a frequent itemset.
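One way to enumerate the binary partitions of a frequent itemset is with a bitmask over its items, as in the sketch below. The `Rule` struct, the `generateRules` function, and the `minConfidence` parameter are hypothetical names chosen for illustration (the program's command line takes only a minimum support). The sketch assumes `supportCount` already holds the count of the itemset and of each of its non-empty subsets, and that the itemset has fewer than 32 items.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using Itemset = std::vector<int>;

struct Rule {
    Itemset lhs;        // [item_set]
    Itemset rhs;        // [associative_item_set]
    double confidence;  // support(lhs and rhs) / support(lhs), as a fraction
};

// Split `itemset` into every (lhs, rhs) pair where both sides are non-empty.
std::vector<Rule> generateRules(const Itemset& itemset,
                                const std::map<Itemset, int>& supportCount,
                                double minConfidence) {
    std::vector<Rule> rules;
    const std::size_t n = itemset.size();
    const int both = supportCount.at(itemset);
    // Masks 1 .. 2^n - 2 select each non-empty proper subset as the left side.
    for (unsigned mask = 1; mask + 1 < (1u << n); ++mask) {
        Rule r;
        for (std::size_t i = 0; i < n; ++i) {
            if ((mask >> i) & 1u) r.lhs.push_back(itemset[i]);
            else r.rhs.push_back(itemset[i]);
        }
        r.confidence = static_cast<double>(both) / supportCount.at(r.lhs);
        if (r.confidence >= minConfidence) rules.push_back(r);
    }
    return rules;
}
```

Because every subset of a frequent itemset is itself frequent, the lookup of the left side's count succeeds whenever `supportCount` comes from a complete frequent itemset pass.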

Other notes on implementation and testing

This implementation uses C++11 features, so the -std=c++11 option must be passed to g++.

Since the test code uploaded to Hanyang Portal is for Windows, I added a checker class for use on Unix.

About input file

Input file format

[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\n

  • Each row is a transaction
  • [item_id] is a numerical value
  • There is no duplication of items in each transaction
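A file in this format could be read along the following lines. This is a sketch rather than the parser in apriori.cpp: `readTransactions` is a hypothetical name, and sorting each transaction is an added assumption that makes later subset checks simpler.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one transaction per line; item ids are separated by tabs.
std::vector<std::vector<int>> readTransactions(std::istream& in) {
    std::vector<std::vector<int>> db;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<int> transaction;
        int item;
        while (fields >> item) transaction.push_back(item);  // >> skips tabs and spaces
        if (transaction.empty()) continue;                   // ignore blank lines
        std::sort(transaction.begin(), transaction.end());
        db.push_back(transaction);
    }
    return db;
}
```

Passing a `std::ifstream` opened on the input file name works because the function accepts any `std::istream`.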

About output file

Output file format

[item_set]\t[associative_item_set]\t[support(%)]\t[confidence(%)]\n

[item_set]\t[associative_item_set]\t[support(%)]\t[confidence(%)]\n

  • Support: probability that a transaction contains both [item_set] and [associative_item_set]
  • Confidence: conditional probability that a transaction having [item_set] also contains [associative_item_set]
  • The value of support and confidence should be rounded to two decimal places.
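The sketch below shows how one output line might be computed and formatted from raw counts. The function names are hypothetical, and writing an item set as `{1,2}` is an assumption, since the format above does not say how item sets are spelled out.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Assumed notation for an item set: {1,2,3}
std::string formatItemset(const std::vector<int>& items) {
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out << ',';
        out << items[i];
    }
    out << '}';
    return out.str();
}

// bothCount: transactions containing lhs and rhs together
// lhsCount:  transactions containing lhs
std::string formatRule(const std::vector<int>& lhs, const std::vector<int>& rhs,
                       int bothCount, int lhsCount, int totalTransactions) {
    const double support = 100.0 * bothCount / totalTransactions;  // percent
    const double confidence = 100.0 * bothCount / lhsCount;        // percent
    std::ostringstream out;
    out << std::fixed << std::setprecision(2)   // two decimal places
        << formatItemset(lhs) << '\t' << formatItemset(rhs) << '\t'
        << support << '\t' << confidence << '\n';
    return out.str();
}
```

For example, if 3 of 5 transactions contain both sides and 4 contain the left side, the support is 60.00 and the confidence is 75.00.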
