
Candidate generation and pruning

Candidate generation and pruning are techniques used in data science,
particularly in association rule mining and frequent itemset mining.
• Candidate Generation: In this step, potential itemsets that may be
frequent are generated. For example, in market basket analysis (a
common application), if we have a transaction dataset where each
transaction contains items purchased by a customer, candidate
itemsets are combinations of items that might occur together
frequently. Let's say we have transactions:
• Transaction 1: {apple, banana, orange}
• Transaction 2: {apple, banana, mango}
• Transaction 3: {banana, orange, mango}
• Candidate 2-itemsets might include {apple, banana}, {apple, orange},
{banana, orange}, and {banana, mango}.
• Pruning: Pruning involves eliminating candidate itemsets that cannot be
frequent based on a minimum support threshold. Support is the
frequency of occurrence of an itemset in the dataset. If an itemset's
support is below a certain threshold, it is pruned, as it cannot be
frequent.
• For example, if the minimum support threshold is set to 2 (meaning an
itemset must appear in at least 2 transactions to be considered
frequent), then {apple, banana}, {banana, orange}, and {banana, mango} are
kept, since each appears in 2 transactions, while {apple, orange} is
pruned since it appears only once. A short code sketch of both steps follows.
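To make the two steps concrete, here is a minimal Python sketch of one Apriori-style pass over the three transactions above: it generates candidate 2-itemsets by pairing the frequent single items, then prunes candidates whose support falls below the threshold. The names (support, frequent_pairs, and so on) are illustrative, not from any particular library.

from itertools import combinations

# The three example transactions
transactions = [
    {"apple", "banana", "orange"},
    {"apple", "banana", "mango"},
    {"banana", "orange", "mango"},
]
min_support = 2  # an itemset must appear in at least 2 transactions

def support(itemset, transactions):
    # Count the transactions that contain every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

# Frequent 1-itemsets: single items meeting the support threshold
items = sorted({item for t in transactions for item in t})
frequent_items = [i for i in items if support({i}, transactions) >= min_support]

# Candidate generation: all pairs of frequent single items
candidates = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Pruning: drop candidates whose support is below the threshold
frequent_pairs = [c for c in candidates if support(c, transactions) >= min_support]
print(sorted(map(sorted, frequent_pairs)))
# [['apple', 'banana'], ['banana', 'mango'], ['banana', 'orange']]

This single pass covers only 2-itemsets; the full Apriori algorithm repeats candidate generation and pruning for 3-itemsets, 4-itemsets, and so on, until no new frequent itemsets are found.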
Rule generation in the Apriori algorithm

In the Apriori algorithm, rule generation is the process of deriving
association rules from frequent itemsets discovered during the
candidate generation and pruning phases.
• Here's how rule generation works with an example:
• Let's consider a dataset of transactions in a grocery store:
• Transaction 1: {bread, milk}
• Transaction 2: {bread, butter, cheese}
• Transaction 3: {bread, milk, butter}
• Transaction 4: {bread, butter}
• Transaction 5: {milk, cheese}
• Finding frequent itemsets: First, we apply the Apriori algorithm to find frequent
itemsets that meet a minimum support threshold. Let's assume the minimum support
threshold is set to 2 transactions.
• Frequent 1-itemsets: {bread}, {milk}, {butter}, {cheese}
• Frequent 2-itemsets: {bread, milk} and {bread, butter} ({milk, butter} appears only in transaction 3, so it is not frequent)
• Rule generation: Once we have the frequent itemsets, we generate association rules
from them. An association rule has the form "If {X} then {Y}", where X and Y are sets of
items.
• For each frequent itemset, we generate association rules by considering every
non-empty proper subset of the itemset as the antecedent (X) and the remaining
items as the consequent (Y).
• For example:
• From {bread, milk}, we can generate two rules: {bread} -> {milk} and {milk} -> {bread}.
• From {bread, butter}, we can generate two rules: {bread} -> {butter} and {butter} -> {bread}.
• Pruning weak rules: After generating all possible rules, we can
prune weak or uninteresting rules based on metrics like confidence and lift.
Confidence measures the proportion of transactions that contain {Y}
among the transactions that contain {X}, i.e.
confidence(X -> Y) = support(X and Y together) / support(X). Lift measures how much
more often {X} and {Y} occur together than we would expect if they
were statistically independent.
• For example, the rule {bread} -> {milk} has confidence 2/4 = 0.5:
bread appears in four transactions, but only two of them also contain
milk. With a minimum confidence threshold of 0.7, this rule would be
pruned, while {butter} -> {bread} (confidence 3/3 = 1.0) would be kept.
A code sketch follows.
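To illustrate, here is a minimal Python sketch of this rule-generation step over the five transactions above, assuming a minimum confidence threshold of 0.7 (a value chosen for illustration; the text does not fix one). For each frequent 2-itemset it tries each item as the antecedent and keeps or prunes the resulting rule by its confidence.

# The five example transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "cheese"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "cheese"},
]
min_confidence = 0.7  # assumed threshold for this sketch

def support(itemset):
    # Count the transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

# The frequent 2-itemsets found above
frequent_pairs = [frozenset({"bread", "milk"}), frozenset({"bread", "butter"})]

# Each item of a frequent pair serves once as the antecedent X;
# the remaining item is the consequent Y.
for itemset in frequent_pairs:
    for x in itemset:
        antecedent, consequent = frozenset({x}), itemset - {x}
        confidence = support(itemset) / support(antecedent)
        verdict = "keep" if confidence >= min_confidence else "prune"
        print(f"{set(antecedent)} -> {set(consequent)}: "
              f"confidence {confidence:.2f} ({verdict})")

Running this keeps {bread} -> {butter} (0.75) and {butter} -> {bread} (1.00), and prunes {bread} -> {milk} (0.50) and {milk} -> {bread} (0.67).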
Brute force method
• In data science, the brute force method refers to a straightforward and exhaustive
approach to solving a problem by considering all possible solutions without employing
any optimization or heuristics.
• Here's how the brute force method typically works:
• Enumerate all possibilities: The algorithm considers all possible combinations or
permutations of the problem space without any specific strategy to reduce the search
space.
• Evaluate each possibility: For each combination or permutation generated, the
algorithm evaluates its validity or optimality according to the problem's criteria or
constraints.
• Select the best solution: After evaluating all possibilities, the algorithm selects the
solution that meets the desired criteria or optimizes the objective function, if
applicable.
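As a concrete instance of these three steps, here is a minimal brute-force sketch in Python that mines frequent itemsets from the grocery transactions of the previous section: it enumerates every non-empty itemset, evaluates the support of each one, and selects those meeting the threshold, with no pruning of the search space. The dataset reuse and the variable names are illustrative.

from itertools import combinations

# The grocery transactions from the rule-generation example
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "cheese"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "cheese"},
]
min_support = 2

items = sorted({item for t in transactions for item in t})

# Enumerate all 2^n - 1 non-empty itemsets and evaluate each one;
# no pruning, so the work grows exponentially with the number of items.
frequent = []
for size in range(1, len(items) + 1):
    for itemset in combinations(items, size):
        count = sum(1 for t in transactions if set(itemset) <= t)
        if count >= min_support:
            frequent.append((itemset, count))

for itemset, count in frequent:
    print(itemset, count)
# ('bread',) 4, ('butter',) 3, ('cheese',) 2, ('milk',) 3,
# ('bread', 'butter') 3, ('bread', 'milk') 2

Unlike Apriori, which prunes hopeless candidates early, this version counts support for all 15 possible itemsets over these 4 items.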
The brute force method is often used when:

• The problem space is small enough that considering all possibilities is feasible.
• There are no efficient algorithms or heuristics available to solve the problem
more optimally.
• The problem demands an exact solution without approximation or
compromise.
• However, brute force methods can be impractical or even infeasible for large
problem spaces due to their exponential time complexity. In such cases,
more sophisticated algorithms employing optimization techniques or
heuristics are preferred.
