Decision Tree Notes (1)
• Choose best feature at each step using impurity measures (Entropy, Gini, etc.)
• Recursively split until a stopping condition is met
Formula: MSE = (1/n) Σ_{i=1}^{n} (y_i − ȳ)²
Example:
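A minimal sketch (NumPy, hypothetical target values) of the node MSE used as the regression splitting criterion:

import numpy as np

# Hypothetical target values reaching one node of a regression tree
y = np.array([10.0, 12.0, 9.0, 15.0, 14.0])

# Node impurity = mean squared deviation from the node mean (the variance of y within the node)
node_mse = np.mean((y - y.mean()) ** 2)
print(node_mse)  # 5.2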
Regression Tree Building Process
1. Classification Error:
E = 1 − max(p_i)
2. Gini Index:
G = Σ_{i=1}^{k} p_i (1 − p_i)
3. Entropy:
D = − Σ_{i=1}^{k} p_i log₂(p_i)
Gain = D − D_A, where D_A is the weighted impurity of the child nodes after splitting on attribute A
Example:
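A short sketch (hypothetical class proportions) computing the three impurity measures above for a single node:

import numpy as np

def classification_error(p):
    return 1 - np.max(p)

def gini(p):
    return np.sum(p * (1 - p))

def entropy(p):
    p = p[p > 0]              # skip zero proportions to avoid log2(0)
    return -np.sum(p * np.log2(p))

# Hypothetical node with class proportions 0.8 / 0.2
p = np.array([0.8, 0.2])
print(classification_error(p))  # ≈ 0.20
print(gini(p))                  # ≈ 0.32
print(entropy(p))               # ≈ 0.72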
3. Sort the feature's values to generate candidate split thresholds
Post-Impurity = (n_L / n) · D_L + (n_R / n) · D_R
ΔImpurity = D − Post-Impurity
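A sketch (hypothetical data, entropy as the impurity D) of sorting a numeric feature and choosing the threshold with the largest ΔImpurity:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical numeric feature and binary labels
x = np.array([23, 45, 31, 52, 60, 28])
y = np.array([0, 1, 0, 1, 1, 0])

order = np.argsort(x)                 # sort values of the feature
x_sorted, y_sorted = x[order], y[order]

D = entropy(y)                        # impurity before splitting
best_gain, best_threshold = 0.0, None

# Candidate thresholds lie halfway between consecutive sorted values
for i in range(1, len(x_sorted)):
    t = (x_sorted[i - 1] + x_sorted[i]) / 2
    left, right = y_sorted[:i], y_sorted[i:]
    post = (len(left) / len(y)) * entropy(left) + (len(right) / len(y)) * entropy(right)
    gain = D - post                   # ΔImpurity = D − Post-Impurity
    if gain > best_gain:
        best_gain, best_threshold = gain, t

print(best_threshold, best_gain)      # 38.0, 1.0 for this toy data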
[Table: Disadvantage / Explanation]
[Table: Hyperparameter / Effect]
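A brief sketch (assuming scikit-learn's DecisionTreeClassifier; the values are illustrative) of the hyperparameters that most directly control tree complexity, with their typical effects noted in the comments:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,            # caps tree depth; smaller values give simpler trees and less overfitting
    min_samples_split=10,   # a node needs at least this many samples before it can be split
    min_samples_leaf=5,     # every leaf must keep at least this many samples
    max_features=None,      # how many features are considered at each split
    random_state=0,         # makes tie-breaking reproducible
)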
🔍 Feature Importance
• Importance = Total impurity reduction caused by the feature across all splits
Example:
Feature   Importance
Income    0.40
Age       0.04
City      0.01
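A minimal sketch (hypothetical data reusing the three feature names above; the printed scores will not match the example numbers) of reading impurity-based importances from a fitted scikit-learn tree:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already label-encoded training data
X = pd.DataFrame({
    "Income": [30, 80, 55, 120, 45, 95],
    "Age":    [25, 40, 35, 50, 30, 45],
    "City":   [0, 1, 0, 1, 1, 0],
})
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# feature_importances_ = total impurity reduction contributed by each feature, normalized to sum to 1
for name, score in zip(X.columns, tree.feature_importances_):
    print(name, round(score, 2))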
Steps on calculator: compute log₂(x) as log(x) / log(2) (or ln(x) / ln(2)).
Memory Tip: Maximum entropy = log₂(k), where k = number of equally likely classes.
⭐ Cross-Validation & K-Fold Cross-Validation
Cross-Validation
• Technique to evaluate model performance more reliably than a single train-test split
• Helps detect overfitting by evaluating the model on multiple held-out subsets
♻ K-Fold Cross-Validation
Example (K=5):
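A short sketch of 5-fold cross-validation, assuming scikit-learn and its bundled Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# K=5: the data is split into 5 folds; each fold serves as the test set exactly once
scores = cross_val_score(tree, X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of generalization accuracy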
Steps:
1. Define the parameter grid:
param_grid = {
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5, 10]
}
2. Apply GridSearchCV (see the sketch below):
3. Best params:
grid_search.best_params_
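A runnable sketch putting the steps together, assuming scikit-learn and its bundled Iris dataset; the grid matches the param_grid above:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Grid = try all parameter combinations, CV = score each one with K-fold cross-validation
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)  # best hyperparameter combination
print(grid_search.best_score_)   # its mean cross-validated accuracy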
Memory Tip: "Grid = Try All, CV = Test All"