
Comprehensive Notes on Decision Trees (for Examination)

Introduction to Decision Trees

• A supervised learning algorithm used for both classification and regression.


• Splits data into branches based on feature values until a decision/leaf node is reached.

Interpretation of Decision Trees

• Root Node: Feature giving maximum purity gain


• Internal Nodes: Decision points
• Leaf Nodes: Final predictions

Building Decision Trees

• Choose best feature at each step using impurity measures (Entropy, Gini, etc.)
• Recursively split until stopping condition is met
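For example, a minimal sklearn sketch of this fit-and-predict loop (the toy data and parameter values are illustrative):

from sklearn.tree import DecisionTreeClassifier

# Toy data: two features (age, income), two classes; values are made up
X = [[25, 40000], [30, 60000], [45, 80000], [35, 30000], [50, 90000], [23, 20000]]
y = [0, 1, 1, 0, 1, 0]

# fit() performs the recursive splitting; max_depth is one stopping condition
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)

print(tree.predict([[28, 55000]]))  # walks the tree down to a leaf prediction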

⚖ Tree Models vs Linear Models

• Trees: Handle non-linearity, no need for feature scaling


• Linear Models: Work best with linearly separable data

Memory Tip: "Trees branch smartly, lines draw plainly"

Decision Trees for Regression

• Predicts mean value of target in each region


• Uses Mean Squared Error (MSE) or Mean Absolute Error (MAE) to measure impurity

Formula: MSE = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²

Example:

• Leaf 1: [200, 210, 190] → Predicted y = 200


• Leaf 2: [400, 390, 410] → Predicted y = 400
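A small sketch reproducing those leaf means with sklearn's DecisionTreeRegressor (the single feature and its values are illustrative):

from sklearn.tree import DecisionTreeRegressor

# One illustrative feature that cleanly separates the two regions
X = [[1], [2], [3], [10], [11], [12]]
y = [200, 210, 190, 400, 390, 410]

reg = DecisionTreeRegressor(max_depth=1)  # one split -> two leaves
reg.fit(X, y)

print(reg.predict([[2], [11]]))  # [200. 400.]: the mean of each leaf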

Regression Tree Building Process

1. Calculate the MSE of the target variable.
2. Split the data on each attribute and compute the MSE of each resulting node.
3. Subtract the weighted resulting MSE from the original MSE → MSE Reduction.
4. Choose the attribute with the highest MSE reduction.
5. Repeat recursively until the MSE is low and each node is homogeneous.
6. Final prediction at a leaf = average of its target values (see the sketch below).
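A NumPy sketch of steps 1–4, scoring the split into the two example leaves above:

import numpy as np

def mse(y):
    # MSE of a node: mean squared deviation from the node's mean
    y = np.asarray(y, dtype=float)
    return np.mean((y - y.mean()) ** 2)

parent = [200, 210, 190, 400, 390, 410]
left, right = [200, 210, 190], [400, 390, 410]

# Weighted MSE of the children after the split
n, n_l, n_r = len(parent), len(left), len(right)
post_mse = (n_l / n) * mse(left) + (n_r / n) * mse(right)

print(mse(parent) - post_mse)  # MSE reduction; pick the split that maximizes this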

⚖ Impurity Measures for Classification

1. Classification Error:

E = 1 − max(pᵢ)

2. Gini Index:

G = ∑ᵢ₌₁ᵏ pᵢ(1 − pᵢ)

3. Entropy:

D = −∑ᵢ₌₁ᵏ pᵢ log₂(pᵢ)

Memory Tip: "Entropy = Uncertainty, Gini = Diversity, Error = Simplicity"
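As a quick sketch, the three measures written as Python functions of a class-probability vector p:

import numpy as np

def classification_error(p):          # E = 1 - max(p_i)
    return 1 - np.max(p)

def gini(p):                          # G = sum p_i (1 - p_i)
    p = np.asarray(p)
    return np.sum(p * (1 - p))

def entropy(p):                       # D = -sum p_i log2(p_i); zero probabilities skipped
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = [0.5, 0.5]  # a perfectly mixed two-class node
print(classification_error(p), gini(p), entropy(p))  # 0.5 0.5 1.0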

Information Gain (Entropy-Based)

Gain = D − D_A

• D: Entropy before split


• D_A: Weighted avg. entropy after split

Example:

• Parent: 2 of class A, 2 of class B → D = 1.0


• Split: Left = [A, A], Right = [B, B] → D_A = 0
• Gain = 1 - 0 = 1.0 (Perfect Split)

Splitting Based on Feature Type

1. Nominal categorical feature (k categories): 2^(k−1) − 1 possible binary splits

2. Ordinal or continuous feature (n values): sort the values, then try the n − 1 split points between consecutive values

Goal: Maximize homogeneity (minimize impurity)

📊 Weighted Post-Split Impurity:

Post-Impurity = (n_L / n) · D_L + (n_R / n) · D_R

ΔImpurity = D − Post-Impurity

Choose split with maximum gain (i.e., largest ∆ Impurity).
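A short sketch computing ΔImpurity for the perfect split from the Information Gain example:

import numpy as np

def entropy(labels):
    # Entropy of a node computed from its raw class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = ["A", "A", "B", "B"]
left, right = ["A", "A"], ["B", "B"]

n, n_l, n_r = len(parent), len(left), len(right)
post_impurity = (n_l / n) * entropy(left) + (n_r / n) * entropy(right)
print(entropy(parent) - post_impurity)  # 1.0, the maximum possible gain here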

Disadvantages of Decision Trees

• Overfitting: Trees grow too deep and memorize the training data
• Instability: Small changes in the data → large changes in the tree
• Bias: Favors features with many levels
• Poor linear fit: Cannot model smooth linear trends

Memory Tip: "D.O.S.E.: Deep trees, Outliers, Sensitive, Easy to overfit"

✂ Tree Truncation vs Tree Pruning

• Truncation (pre-pruning): stops tree growth early; risk of underfitting
• Pruning (post-pruning): cuts weak branches after full growth; better generalization
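A sketch contrasting the two in sklearn (the iris dataset and the specific parameter values are stand-ins):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Truncation (pre-pruning): stop early by capping depth during growth
pre = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Pruning (post-pruning): grow fully, then cut weak branches via cost-complexity pruning
post = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)

print(pre.score(X_test, y_test), post.score(X_test, y_test))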

Hyperparameter Tuning for Trees

• max_depth: Limits tree depth, prevents overfitting
• min_samples_split: Minimum samples required to split a node
• min_samples_leaf: Minimum samples required at a leaf
• max_features: Maximum features considered per split
• criterion: Impurity function (Gini, Entropy, MSE)
• ccp_alpha: Cost-complexity pruning strength

Tip: Use Grid Search or Randomized Search + Cross-Validation
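The grid route appears in the GridSearchCV section below; as a sketch of the randomized route (the parameter ranges here are illustrative):

from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

param_dist = {
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 2, 5, 10],
    'ccp_alpha': [0.0, 0.001, 0.01],
}

# Samples n_iter random combinations instead of trying every one
search = RandomizedSearchCV(DecisionTreeClassifier(), param_dist,
                            n_iter=10, cv=5, random_state=42)
# search.fit(X, y), then inspect search.best_params_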

🔍 Feature Importance

• Importance = Total impurity reduction caused by the feature across all splits

Example:

• Income: 0.40
• Age: 0.04
• City: 0.01

Memory Tip: "More impurity it kills, more important it feels."
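In sklearn this quantity is exposed directly as feature_importances_; a minimal sketch (the tiny dataset is made up, with Income deliberately the most predictive column):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up data: only Income cleanly separates the classes
X = pd.DataFrame({
    'Income': [20, 80, 75, 30, 90, 25],
    'Age':    [25, 40, 28, 45, 35, 30],
    'City':   [0, 1, 0, 1, 1, 0],
})
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Normalized total impurity reduction contributed by each feature
importances = pd.Series(tree.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))  # Income should dominate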

Log Base 2 Calculation (log₂)


Formula: log₂(x) = log₁₀(x) / log₁₀(2), where log₁₀(2) ≈ 0.3010
Or using natural log: log₂(x) = ln(x) / ln(2), where ln(2) ≈ 0.6931

Steps on calculator:

1. Use log(x) or ln(x)


2. Divide by log(2) or ln(2)

🔍 Entropy with Equal Classes (3 Classes Example)

• Class Distribution: [A, B, C] → each with 1/3

Entropy = −3 × (1/3 × log₂(1/3)) = log₂(3) ≈ 1.585

Memory Tip: Maximum entropy = log2 (k) where k = number of equally likely classes.
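A quick sketch verifying both the change-of-base formula and the three-class entropy:

import math

# Change of base: log2(x) = log10(x)/log10(2) = ln(x)/ln(2)
x = 3
print(math.log10(x) / math.log10(2))  # ≈ 1.585
print(math.log(x) / math.log(2))      # same value via natural log

# Entropy of three equally likely classes: -3 * (1/3) * log2(1/3) = log2(3)
p = 1 / 3
print(-3 * p * math.log2(p))          # ≈ 1.585 = log2(3)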

⭐ Cross-Validation & K-Fold Cross-Validation

Cross-Validation

• Technique to evaluate model performance more reliably than a single train-test split
• Helps avoid overfitting by testing on multiple data subsets

♻ K-Fold Cross-Validation

1. Split the dataset into K equal parts (folds)
2. For each fold: use it as the test set and the remaining K − 1 folds as the training set
3. Train and evaluate the model on that split
4. Compute the average score across the K runs

Example (K=5):

• Run model 5 times, each time a different fold is test set

Memory Tip: "K parts, K turns as test. Judge by average."
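A minimal sketch of 5-fold cross-validation in sklearn (the iris dataset is just a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Five runs: each fold takes one turn as the test set
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)         # one score per fold
print(scores.mean())  # judge by the average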

GridSearchCV (Grid Search with Cross-Validation)

• Used to find the best hyperparameters for a model


• Tries all combinations of given parameters using cross-validation

Steps:

1. Define parameter grid:

param_grid = {
'max_depth': [3, 5, 10],
'min_samples_split': [2, 5, 10]
}

2. Apply GridSearchCV:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)

3. Inspect the best parameters:

grid_search.best_params_
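After fitting, grid_search.best_score_ reports the best mean cross-validated score, and grid_search.best_estimator_ is the tree refitted with the winning parameters.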

Memory Tip: "Grid = Try All, CV = Test All"
