Machine Learning required topics
Module 2: Introduction to Machine Learning ........................................ 13
What is Machine Learning?............................................................................................................... 13
What is not Machine Learning? .................................................................................................... 13
Descriptive vs Prescriptive vs Predictive Analytics ........................................................................ 13
Supervised vs unsupervised Learning ........................................................................................... 13
Introduction to Data Pre-processing ................................................................................................. 13
What is Data Pre-processing? ....................................................................................................... 14
Importance of data quality in machine learning ....................................................................... 14
Overview of the data pre-processing pipeline .......................................................................... 14
Impact of poor data quality on model performance ................................................................ 14
Steps in Data Pre-processing ......................................................................................................... 14
Data Cleaning ............................................................................................................................ 14
Data Transformation ................................................................................................................. 14
Data Integration ........................................................................................................................ 14
Data Reduction .......................................................................................................................... 14
Data Discretization and Binning ................................................................................................ 14
Handling Missing Data ...................................................................................................................... 14
Causes of missing data (MCAR, MAR, MNAR)............................................................................... 14
Techniques to handle missing data ............................................................................................... 14
Deletion (listwise, pairwise) ...................................................................................................... 14
Imputation (mean, median, mode, forward/backward fill, KNN) ............................................. 14
Using algorithms that handle missing data inherently ............................................................. 14
Handling Outliers .......................................................................................................................... 14
Detection of outliers using statistical methods (Z-score, IQR) .................................................. 14
Treating outliers: Removal, transformation, and binning techniques....................................... 14
Application of domain knowledge for outlier handling ............................................................ 14
Handling Duplicate Data................................................................................................................ 14
Identifying and removing duplicates ......................................................................................... 14
Dealing with data inconsistencies ............................................................................................. 14
Feature Engineering .......................................................................................................................... 14
Importance of feature engineering in improving model performance ......................................... 14
Types of features (categorical, continuous, ordinal, etc.) ............................................................. 14
Encoding Categorical Variables ......................................................................................................... 14
Label encoding .......................................................................................................................... 14
One-hot encoding ..................................................................................................................... 14
Ordinal encoding ....................................................................................................................... 14
Target encoding and its application .......................................................................................... 14
Feature Scaling and Normalization ............................................................................................... 14
Why scaling is important for ML algorithms ............................................................................. 14
Standardization (Z-score normalization) ................................................................................... 14
Min-Max scaling ........................................................................................................................ 14
Robust Scaler for handling outliers ........................................................................................... 14
Feature Transformation................................................................................................................. 14
Logarithmic, square root, and polynomial transformations ..................................................... 14
Binning continuous variables .................................................................................................... 14
Transformations to correct skewness ....................................................................................... 14
Dimensionality Reduction ................................................................................................................. 14
Curse of dimensionality ................................................................................................................ 15
When to apply dimensionality reduction...................................................................................... 15
Principal Component Analysis (PCA) ............................................................................................. 15
Concept of PCA ......................................................................................................................... 15
How PCA works: Eigenvalues and Eigenvectors ........................................................................ 15
Implementing PCA for dimensionality reduction in Python ..................................................... 15
Other Dimensionality Reduction Techniques ................................................................................ 15
Linear Discriminant Analysis (LDA)............................................................................................ 15
t-SNE (t-distributed Stochastic Neighbor Embedding) .............................................................. 15
UMAP (Uniform Manifold Approximation and Projection) ....................................................... 15
Handling Imbalanced Data ................................................................................................................ 15
Techniques to Handle Imbalanced Data........................................................................................ 15
Resampling methods:................................................................................................................ 15
Undersampling .......................................................................................................................... 15
Oversampling (SMOTE, ADASYN) .............................................................................................. 15
Cost-sensitive learning .............................................................................................................. 15
Feature Selection .............................................................................................................................. 15
Importance of selecting the right features ................................................................................... 15
Reducing overfitting and improving model interpretability.......................................................... 15
Techniques for Feature Selection .................................................................................................. 15
Filter Methods: .......................................................................................................................... 15
Wrapper Methods: .................................................................................................................... 15
Embedded Methods: ................................................................................................................ 15
Feature importance using Tree-based models (Random Forest, Gradient Boosting) ............... 15
Model Evaluation .............................................................................................................................. 15
Overview of Model Evaluation ...................................................................................................... 15
Importance of evaluating machine learning models ................................................................ 15
Key challenges in model evaluation .......................................................................................... 15
Common metrics used for classification and regression .......................................................... 15
Train-Test Split and Cross-Validation ............................................................................................. 15
Introduction to train-test split................................................................................................... 15
Purpose of cross-validation (k-fold, stratified k-fold, leave-one-out)........................................ 15
Overfitting vs. underfitting: How they impact model evaluation ............................................. 15
Bias-Variance Tradeoff .................................................................................................................. 15
Definition and explanation of bias and variance....................................................................... 15
Impact of bias-variance tradeoff on model performance ......................................................... 15
Strategies for balancing bias and variance ................................................................................ 15
Metrics for Classification Models .................................................................................................. 15
Confusion Matrix........................................................................................................................... 16
Understanding true positives, false positives, true negatives, and false negatives .................. 16
How confusion matrix helps in model evaluation..................................................................... 16
Classification Metrics .................................................................................................................... 16
Accuracy: When to use and limitations .................................................................................... 16
Precision and Recall: Importance in imbalanced datasets ....................................................... 16
F1 Score: Balancing precision and recall ................................................................................... 16
Specificity and Sensitivity: Understanding the context of their use ........................................ 16
ROC (Receiver Operating Characteristic) Curve and AUC (Area Under the Curve): How to interpret the ROC curve ............................................................ 16
Precision-Recall Curve: Use cases for PR curve over ROC ........................................................ 16
Evaluating Multiclass Classification Models .................................................................................. 16
One-vs-Rest (OvR) and One-vs-One (OvO) strategies ............................................................... 16
Macro vs. micro averaging for multiclass metrics ..................................................................... 16
Metrics for Regression Models ..................................................................................................... 17
Error Metrics in Regression ........................................................................................................... 17
Mean Absolute Error (MAE): When to use ............................................................................... 17
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE): Impact of large errors on the model .................................................................. 17
R-squared (Coefficient of Determination): How well the model explains the variability ......... 17
Adjusted R-squared: When and why to use .............................................................................. 17
Additional Metrics for Regression ................................................................................................. 17
Mean Absolute Percentage Error (MAPE) ................................................................................. 17
Explained Variance Score .......................................................................................................... 17
Visualizing Model Performance in Regression .............................................................................. 17
Residual plots and their importance ......................................................................................... 17
Interpreting the goodness of fit through residual distribution ................................................. 17
Cross-Validation and Resampling Techniques ............................................................................... 17
Train-Test Split ............................................................................................................................... 17
Overview of train-test split and its limitations .......................................................................... 17
K-Fold Cross-Validation ................................................................................................................. 17
Concept and working of k-fold cross-validation ........................................................................ 17
Stratified k-fold cross-validation for classification..................................................................... 17
Leave-One-Out Cross-Validation (LOOCV) .................................................................................... 17
Definition and use cases ........................................................................................................... 17
Computational complexity and when to avoid LOOCV ............................................................. 17
Other Cross-Validation Techniques ............................................................................................... 17
Shuffle split cross-validation ..................................................................................................... 17
Time-series cross-validation (when dealing with temporal data) ............................................. 17
Model Complexity and Regularization .......................................................................................... 17
How model complexity affects generalization .......................................................................... 17
L1 (Lasso) and L2 (Ridge) regularization in linear models ......................................................... 17
ElasticNet regularization for combining L1 and L2 .................................................................... 17
Grid Search and Random Search for Hyperparameter Tuning ...................................................... 17
Grid search for exhaustive hyperparameter tuning .................................................................. 17
Random search for efficient hyperparameter tuning ............................................................... 17
Evaluating models with cross-validated hyperparameters ....................................................... 17
Model Selection Criteria ............................................................................................................... 17
Selection based on performance metrics ................................................................................. 17
Tradeoff between bias and variance ......................................................................................... 17
Comparing models using cross-validation scores ..................................................................... 17
How to choose between simpler and complex models ............................................................ 17
Module 3: Supervised Learning .................................................. 18
Introduction to Supervised Learning ................................................................................................ 18
Linear Regression .............................................................................................................................. 18
Understanding regression vs. classification tasks ..................................................................... 18
Simple Linear Regression vs. Multiple Linear Regression ......................................................... 18
Mathematical Representation ...................................................................................................... 18
Assumptions of Linear Regression ................................................................................................ 18
Consequences of Violating Assumptions ...................................................................................... 18
Performance Metrics .................................................................................................................... 18
R-squared: Coefficient of determination .................................................................................. 18
Adjusted R-squared: Adjusting for the number of predictors in the model ............................. 18
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).......................................... 18
Mean Absolute Error (MAE) ...................................................................................................... 18
Residual Analysis ........................................................................................................................... 18
Plotting residuals to check assumptions (linearity, homoscedasticity, normality) ................... 18
Identifying patterns in residuals ................................................................................................ 18
Decision Trees ................................................................................................................................... 18
Basic Structure of Decision Trees .................................................................................................. 18
Root node, internal nodes, leaf nodes ...................................................................................... 18
Splitting criteria and decision boundaries................................................................................. 18
Visual representation of Decision Trees .................................................................................... 18
Splitting Criteria............................................................................................................................. 18
Impurity Measures for Classification Trees ............................................................................... 18
Splitting Criteria for Regression Trees ....................................................................................... 18
Handling Continuous and Categorical Variables ....................................................................... 18
Types of Decision Tree................................................................................................................... 18
CART .......................................................................................................................................... 18
Stopping Criteria for Tree Construction ........................................................................................ 18
Maximum depth........................................................................................................................ 18
Minimum samples per leaf node .............................................................................................. 18
Minimum samples per split ...................................................................................................... 18
Pruning Techniques ....................................................................................................................... 18
Pre-pruning (early stopping) ..................................................................................................... 18
Post-pruning (cost-complexity pruning) .................................................................................... 19
Balancing tree depth and overfitting ........................................................................................ 19
Advantages and Disadvantages of Decision Trees ........................................................................ 19
Interpretability and transparency ............................................................................................. 19
Overfitting in deep trees ........................................................................................................... 19
Handling non-linear data .......................................................................................................... 19
Logistic Regression ............................................................................................................................ 19
Mathematics behind Logistic Regression ...................................................................................... 19
Sigmoid function and decision boundary ................................................................................. 19
Logit function and odds ratio .................................................................................................... 19
Assumptions of Logistic Regression .............................................................................................. 19
Linearity of independent variables and log-odds ..................................................................... 19
Independence of observations ................................................................................................. 19
L1 (Lasso) and L2 (Ridge) regularization ........................................................................................ 19
Avoiding overfitting with regularization ........................................................................................ 19
Support Vector Machine (SVM) ........................................................................................................ 19
Concept of hyperplane and decision boundary ............................................................................ 19
Support vectors and margin .......................................................................................................... 19
Linear SVM for Classification ......................................................................................................... 19
Maximal margin classifier ......................................................................................................... 19
The role of support vectors in determining the hyperplane ..................................................... 19
Non-linear SVM and Kernel Trick .................................................................................................. 19
Why linear boundaries may not always work ........................................................................... 19
Kernel Functions........................................................................................................................ 19
Choosing the right kernel for the problem ............................................................................... 19
Hyperparameter Tuning in SVM .................................................................................................... 19
Tuning the cost parameter (C)................................................................................................... 19
Gamma parameter in RBF kernel .............................................................................................. 19
k-Nearest Neighbors (k-NN) .............................................................................................................. 19
Instance-based learning and lazy learning .................................................................................... 19
Majority voting in classification .................................................................................................... 19
Distance metrics: Euclidean distance, Manhattan distance.......................................................... 19
Choosing the value of k ................................................................................................................. 19
Effect of k on Model Performance ................................................................................................ 19
Bias-variance tradeoff in k-NN .................................................................................................. 19
Impact of large and small k-values ............................................................................................ 19
Distance-based Weighting ............................................................................................................ 19
Importance of distance in voting .............................................................................................. 19
Weighting neighbors by distance .............................................................................................. 19
Curse of Dimensionality ................................................................................................................ 20
Effect of high-dimensional spaces on k-NN performance ......................................................... 20
Ensemble Learning ............................................................................................................................ 20
Overview of Ensemble Learning ................................................................................................... 20
Definition and concept of ensemble methods.......................................................................... 20
Why use ensemble methods? (Improving accuracy, reducing overfitting, and variance) ........ 20
Types of ensemble techniques: Bagging, Boosting, Stacking.................................................... 20
Benefits of Ensemble Learning...................................................................................................... 20
Model generalization and robustness ....................................................................................... 20
Overcoming bias-variance tradeoff ........................................................................................... 20
Handling complex decision boundaries .................................................................................... 20
Bagging (Bootstrap Aggregating) .................................................................................................. 20
Concept of bootstrapping and aggregation .............................................................................. 20
How bagging reduces variance ................................................................................................. 20
Weak vs. strong learners in ensemble learning ........................................................................ 20
Random Forests ............................................................................................................................ 20
Introduction to Random Forests as a bagging method ............................................................. 20
How Random Forests work: random sampling of features, building uncorrelated trees ......... 20
Hyperparameters in Random Forest (n_estimators, max_depth, min_samples_split, etc.) .... 20
Feature importance and out-of-bag (OOB) error ...................................................................... 20
Boosting ............................................................................................................................................ 20
Understanding Boosting ................................................................................................................ 20
Sequential learning: correcting errors from previous models .................................................. 20
Concept of weak learners and how boosting converts them into strong learners ................... 20
Differences between boosting and bagging .............................................................................. 20
AdaBoost (Adaptive Boosting) ...................................................................................................... 20
Core idea of re-weighting misclassified instances .................................................................... 20
How weak learners are combined to form a strong learner ..................................................... 20
Gradient Boosting Machines (GBM) ............................................................................................. 20
Concept of gradient descent in boosting .................................................................................. 20
Boosting decision trees sequentially ......................................................................................... 20
Understanding the residual error minimization process .......................................................... 20
XGBoost ......................................................................................................................................... 20
Improvements over traditional Gradient Boosting ................................................................... 20
Regularization in XGBoost (L1/L2 regularization) ...................................................................... 20
Speed and performance optimizations in XGBoost (parallelism, tree-pruning) ....................... 20
LightGBM and CatBoost ................................................................................................................ 20
Introduction to LightGBM (Leaf-wise tree growth, speed optimizations, handling large datasets) .................................................................... 20
Introduction to CatBoost (handling categorical features effectively) ....................................... 20
Differences and advantages over XGBoost ............................................................................... 20
Blending and Voting (1 hour) ............................................................................................................ 20
Introduction to Blending ............................................................................................................... 21
Concept of simple blending techniques in ensemble learning ................................................. 21
Blending different models based on their weighted contributions .......................................... 21
Voting Classifiers ........................................................................................................................... 21
Hard voting (majority voting) vs. soft voting (weighted probability voting) ............................. 21
Practical implementation using VotingClassifier in Scikit-learn ................................................ 21
Combining multiple classifiers such as Logistic Regression, k-NN, and SVM in a voting ensemble ................................................................... 21
Evaluation of Ensemble Models ........................................................................................................ 21
Metrics for Evaluating Ensemble Models...................................................................................... 21
Accuracy, Precision, Recall, F1-score, ROC-AUC, Log-loss ......................................................... 21
Evaluating models using cross-validation.................................................................................. 21
Avoiding overfitting in ensemble models.................................................................................. 21
Comparing Single Learners vs. Ensemble Models......................................................................... 21
Why ensemble methods perform better than individual models ............................................ 21
Limitations and challenges of ensemble methods.................................................................... 21
Hyperparameter Tuning in Ensemble Methods ............................................................................ 21
Importance of Hyperparameter Tuning .................................................................................... 21
Tuning for Random Forest ......................................................................................................... 21
Tuning for Boosting Algorithms ................................................................................................. 21
Grid Search and Random Search............................................................................................... 21
Bayesian optimization for hyperparameter tuning ................................................................... 21
Unsupervised Learning ..................................................................................................................... 21
When to use unsupervised learning techniques........................................................................... 21
Types of Unsupervised Learning ................................................................................................... 21
Clustering: Finding hidden patterns or groupings in data ......................................................... 21
Dimensionality reduction: Reducing the complexity of data .................................................... 21
Association learning: Finding relationships between variables ................................................ 21
Types of Clustering ........................................................................................................................ 21
Hard clustering vs. soft clustering ............................................................................................. 21
Partitional clustering vs. hierarchical clustering........................................................................ 21
Centroid-based, density-based, and distribution-based clustering .......................................... 21
K-Means Clustering ........................................................................................................................... 21
Understanding the K-Means Algorithm ........................................................................................ 21
Concept of centroids and clusters............................................................................................. 21
Objective of K-Means: Minimizing within-cluster variance (inertia) ........................................ 21
Steps of K-Means algorithm: Initialization, assignment, and update steps .............................. 21
Convergence of K-Means and how clusters are formed ........................................................... 21
Choosing the Right Number of Clusters (K)................................................................................... 22
Importance of selecting the correct number of clusters .......................................................... 22
Elbow method to identify the optimal number of clusters....................................................... 22
Silhouette score and its use in evaluating cluster quality ......................................................... 22
Advantages and Limitations of K-Means ....................................................................................... 22
Fast and efficient for large datasets .......................................................................................... 22
Assumes spherical cluster shapes ............................................................................................. 22
Sensitivity to initialization and outliers ..................................................................................... 22
Module 1: Basics of Statistics and Probability for Machine Learning
What are Statistics?
Population data refers to the complete data set, whereas sample data refers to the part of the population that is used for analysis. Sampling is done to make analysis easier.
When using sample data, the variance formula changes slightly: with n samples we divide by n − 1 instead of n, giving the sample variance s² = ∑(x − x̄)² / (n − 1).
Sample / Population Standard Deviation: a measure that shows how much variation from the mean exists.
    Standard Deviation = √Variance; for a sample, s = √( ∑(x − x̄)² / (n − 1) )
Sample / Population Variance: the measure of the spread of the data around its central value.
    Variance (population) = ∑(x − x̄)² / n;   Variance (sample) = ∑(x − x̄)² / (n − 1)
Class Interval (CI): the range of values assigned to a group of data points.
    Class Interval = Upper Limit − Lower Limit
Frequency (f): the number of times a particular value appears in the data set.
Range (R): the difference between the largest and smallest values in the data set.
    Range = Largest Data Value − Smallest Data Value
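
As a quick illustration of the formulas above, here is a minimal Python/NumPy sketch; the data values are made up for illustration:

# Sample vs. population variance and standard deviation with NumPy.
import numpy as np

x = np.array([4, 8, 6, 5, 3, 7])

pop_var = np.var(x)              # population variance: divide by n (ddof=0)
sample_var = np.var(x, ddof=1)   # sample variance: divide by n - 1
pop_std = np.std(x)              # population standard deviation
sample_std = np.std(x, ddof=1)   # sample standard deviation

print(pop_var, sample_var, pop_std, sample_std)
print(x.max() - x.min())         # range = largest value - smallest value
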
Descriptive vs. Inferential Statistics
Role of statistics in Machine Learning
Probability Theory
Random variables, Events, Sample Space
Conditional probability, Bayes’ Theorem, and its importance in ML
Probability Distributions
Discrete distributions: Binomial, Poisson
Continuous distributions: Uniform, Normal distribution
Application in ML algorithms like Naive Bayes, Logistic Regression
Correlation and Regression
Pearson’s correlation coefficient
Spearman’s Rank Correlation
Correlation vs. Causation
Application in feature selection
Statistical Inference
Hypothesis Testing
Null and Alternative Hypotheses
Type I & II errors
p-value and its significance
Z-tests, T-tests, and Chi-square tests
Application in A/B testing and model evaluation
Confidence Intervals
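
Since the topics above list Z-tests, T-tests, and Chi-square tests, here is a minimal two-sample t-test sketch with SciPy; the sample values and the 0.05 significance level are illustrative assumptions:

# Two-sample t-test: H0 says the two group means are equal.
from scipy import stats

group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]   # made-up A/B test metrics
group_b = [12.8, 13.1, 12.9, 13.4, 12.7, 13.0]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# With a 0.05 significance level, a small p-value leads us to reject H0.
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
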
Exploratory Data Analysis
Measures of Central Tendency
Mean, Median, Mode
Application in data pre-processing
Measures of Dispersion
Variance, Standard Deviation, Range, Interquartile Range (IQR)
Importance in understanding data spread
Data Visualization
Histograms, Boxplots, Scatter plots
Identifying patterns, trends, and outliers
Module 2: Introduction to Machine Learning
What is Machine Learning?
What is not Machine Learning?
Descriptive vs Prescriptive vs Predictive Analytics
Supervised vs. Unsupervised Learning
Introduction to Data Pre-processing
What is Data Pre-processing?
Importance of data quality in machine learning
Overview of the data pre-processing pipeline
Impact of poor data quality on model performance
Steps in Data Pre-processing
Data Cleaning
Data Transformation
Data Integration
Data Reduction
Data Discretization and Binning
Handling Missing Data
Causes of missing data (MCAR, MAR, MNAR)
Techniques to handle missing data
Deletion (listwise, pairwise)
Imputation (mean, median, mode, forward/backward fill, KNN)
Using algorithms that handle missing data inherently
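
A minimal sketch of the deletion and imputation techniques listed above, using pandas and scikit-learn; the toy DataFrame and its columns are hypothetical:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "income": [50_000, 62_000, np.nan, 58_000]})

# Listwise deletion: drop any row containing a missing value.
dropped = df.dropna()

# Mean imputation (median / most_frequent work the same way via `strategy`).
mean_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)

# KNN imputation: fill a missing value from the k nearest complete rows.
knn_imputed = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

# Forward / backward fill, typically for ordered (e.g. time-series) data.
ffilled = df.ffill()
bfilled = df.bfill()
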
Handling Outliers
Detection of outliers using statistical methods (Z-score, IQR)
Treating outliers: Removal, transformation, and binning techniques
Application of domain knowledge for outlier handling
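
A minimal sketch of Z-score and IQR outlier detection with a simple capping treatment; the data and the cutoff values are illustrative choices:

import numpy as np

x = np.array([10, 12, 11, 13, 12, 95, 11, 10])  # 95 is an obvious outlier

# Z-score rule: flag points far from the mean (common cutoffs are 2.5 or 3).
z_scores = (x - x.mean()) / x.std()
z_outliers = x[np.abs(z_scores) > 2.5]

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]

# Treatment options: remove the rows, cap (winsorize) to the IQR bounds,
# or apply a transformation such as log before modelling.
capped = np.clip(x, q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(z_outliers, iqr_outliers, capped)
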
Handling Duplicate Data
Identifying and removing duplicates
Dealing with data inconsistencies
Feature Engineering
Importance of feature engineering in improving model performance
Types of features (categorical, continuous, ordinal, etc.)
Encoding Categorical Variables
Label encoding
One-hot encoding
Ordinal encoding
Target encoding and its application
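
A minimal sketch of label, one-hot, and ordinal encoding with pandas and scikit-learn; the toy columns and the category order are hypothetical:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Pune"],
                   "size": ["S", "M", "L", "M"]})

# Label encoding: an arbitrary integer per category (intended for target labels).
label_encoded = LabelEncoder().fit_transform(df["city"])

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Ordinal encoding: integers that respect an explicit category order.
ordinal = OrdinalEncoder(categories=[["S", "M", "L"]]).fit_transform(df[["size"]])

print(label_encoded, one_hot, ordinal, sep="\n")
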
Feature Scaling and Normalization
Why scaling is important for ML algorithms
Standardization (Z-score normalization)
Min-Max scaling
Robust Scaler for handling outliers
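
A minimal sketch comparing the three scalers named above; the feature matrix is made up, with one deliberately extreme value:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 10_000.0]])

# Standardization: zero mean, unit variance (Z-score normalization).
X_std = StandardScaler().fit_transform(X)

# Min-Max scaling: rescale each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# Robust scaling: uses the median and IQR, so it is less affected by the
# extreme value 10,000 in the second column.
X_robust = RobustScaler().fit_transform(X)
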
Feature Transformation
Logarithmic, square root, and polynomial transformations
Binning continuous variables
Transformations to correct skewness
Dimensionality Reduction
Curse of dimensionality
When to apply dimensionality reduction
Principal Component Analysis (PCA)
Concept of PCA
How PCA works: Eigenvalues and Eigenvectors
Implementing PCA for dimensionality reduction in Python
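
A minimal sketch of PCA in Python with scikit-learn; the iris dataset and the choice of two components are illustrative assumptions only:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Each ratio is the share of total variance captured by that component
# (derived from the eigenvalues of the covariance matrix).
print(pca.explained_variance_ratio_)
print(X_reduced.shape)
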
Other Dimensionality Reduction Techniques
Linear Discriminant Analysis (LDA)
t-SNE (t-distributed Stochastic Neighbor Embedding)
UMAP (Uniform Manifold Approximation and Projection)
Handling Imbalanced Data
Techniques to Handle Imbalanced Data
Resampling methods:
Undersampling
Oversampling (SMOTE, ADASYN)
Cost-sensitive learning
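
A minimal resampling sketch; SMOTE and ADASYN come from the third-party imbalanced-learn package (assumed installed as `imblearn`), and the synthetic dataset is only for illustration:

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between
# existing minority-class neighbours.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))
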
Feature Selection
Importance of selecting the right features
Reducing overfitting and improving model interpretability
Techniques for Feature Selection
Filter Methods:
Correlation Matrix, Chi-square test, Mutual Information
Wrapper Methods:
Recursive Feature Elimination (RFE)
Embedded Methods:
Lasso and Ridge Regularization
Feature importance using Tree-based models (Random Forest, Gradient Boosting)
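
A minimal sketch of one filter, one wrapper, and one embedded method; the dataset, k = 10, and the choice of estimators are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter method: rank features by mutual information with the target.
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper method: Recursive Feature Elimination around an estimator.
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)
print("RFE kept:", rfe.support_.sum(), "features")

# Embedded method: impurity-based importances from a tree ensemble.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Top importance:", rf.feature_importances_.max())
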
Model Evaluation
Overview of Model Evaluation
Importance of evaluating machine learning models
Key challenges in model evaluation
Common metrics used for classification and regression
Train-Test Split and Cross-Validation
Introduction to train-test split
Purpose of cross-validation (k-fold, stratified k-fold, leave-one-out)
Overfitting vs. underfitting: How they impact model evaluation
Bias-Variance Tradeoff
Definition and explanation of bias and variance
Impact of bias-variance tradeoff on model performance
Strategies for balancing bias and variance
Metrics for Classification Models
Confusion Matrix
Understanding true positives, false positives, true negatives, and false negatives
How confusion matrix helps in model evaluation
Classification Metrics
Accuracy: When to use and limitations
Precision and Recall: Importance in imbalanced datasets
F1 Score: Balancing precision and recall
Specificity and Sensitivity: Understanding the context of their use
ROC (Receiver Operating Characteristic) Curve and AUC (Area Under the Curve): How to interpret the ROC curve
Precision-Recall Curve: Use cases for PR curve over ROC
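
A minimal sketch computing the classification metrics above on a simple hold-out split; the dataset and model choice are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]   # scores needed for ROC-AUC

# Confusion matrix layout: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_test, y_pred))
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
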
Evaluating Multiclass Classification Models
One-vs-Rest (OvR) and One-vs-One (OvO) strategies
Macro vs. micro averaging for multiclass metrics
Metrics for Regression Models
Error Metrics in Regression
Mean Absolute Error (MAE): When to use
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE): Impact of large errors on the model
R-squared (Coefficient of Determination): How well the model explains the variability
Adjusted R-squared: When and why to use
Additional Metrics for Regression
Mean Absolute Percentage Error (MAPE)
Explained Variance Score
Visualizing Model Performance in Regression
Residual plots and their importance
Interpreting the goodness of fit through residual distribution
Cross-Validation and Resampling Techniques
Train-Test Split
Overview of train-test split and its limitations
K-Fold Cross-Validation
Concept and working of k-fold cross-validation
Stratified k-fold cross-validation for classification
Leave-One-Out Cross-Validation (LOOCV)
Definition and use cases
Computational complexity and when to avoid LOOCV
Other Cross-Validation Techniques
Shuffle split cross-validation
Time-series cross-validation (when dealing with temporal data)
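
A minimal sketch contrasting a single hold-out split with k-fold and stratified k-fold cross-validation; the dataset and fold counts are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, StratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Single hold-out split: fast, but the score depends on one particular split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print("hold-out  :", model.fit(X_tr, y_tr).score(X_te, y_te))

# 5-fold CV: every sample is used for validation exactly once.
print("k-fold    :", cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean())

# Stratified k-fold preserves the class ratio in every fold (for classification).
print("stratified:", cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0)).mean())
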
Model Complexity and Regularization
How model complexity affects generalization
L1 (Lasso) and L2 (Ridge) regularization in linear models
ElasticNet regularization for combining L1 and L2
Grid Search and Random Search for Hyperparameter Tuning
Grid search for exhaustive hyperparameter tuning
Random search for efficient hyperparameter tuning
Evaluating models with cross-validated hyperparameters
Model Selection Criteria
Selection based on performance metrics
Tradeoff between bias and variance
Comparing models using cross-validation scores
How to choose between simpler and complex models
Module 3: Supervised Learning
Introduction to Supervised Learning
Linear Regression
Understanding regression vs. classification tasks
Simple Linear Regression vs. Multiple Linear Regression
Mathematical Representation
Assumptions of Linear Regression
Consequences of Violating Assumptions
Performance Metrics
R-squared: Coefficient of determination
Adjusted R-squared: Adjusting for the number of predictors in the model
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
Residual Analysis
Plotting residuals to check assumptions (linearity, homoscedasticity, normality)
Identifying patterns in residuals
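
A minimal linear-regression sketch covering the metrics and residual check above; the synthetic data (y = 3x + 5 + noise) is an assumption for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)   # y = 3x + 5 + noise

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

print("coef, intercept:", model.coef_[0], model.intercept_)
print("R^2 :", r2_score(y, y_pred))
print("MAE :", mean_absolute_error(y, y_pred))
print("RMSE:", mean_squared_error(y, y_pred) ** 0.5)

# Residual analysis: residuals should look like random noise around zero
# (no pattern) if the linearity and homoscedasticity assumptions hold.
residuals = y - y_pred
print("residual mean (should be close to 0):", residuals.mean())
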
Decision Trees
Basic Structure of Decision Trees
Root node, internal nodes, leaf nodes
Splitting criteria and decision boundaries
Visual representation of Decision Trees
Splitting Criteria
Impurity Measures for Classification Trees
Gini Index
Entropy and Information Gain (ID3 Algorithm)
Comparison of Gini vs. Entropy
Splitting Criteria for Regression Trees
Mean Squared Error (MSE)
Reduction in variance
Handling Continuous and Categorical Variables
Discretization of continuous features
Handling categorical features in Decision Trees
Types of Decision Tree
CART
Stopping Criteria for Tree Construction
Maximum depth
Minimum samples per leaf node
Minimum samples per split
Pruning Techniques
Pre-pruning (early stopping)
Post-pruning (cost-complexity pruning)
Balancing tree depth and overfitting
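
A minimal decision-tree sketch showing the splitting criterion, the stopping criteria, and cost-complexity pruning; the parameter values are illustrative, not recommendations:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",        # or "entropy" for information gain
    max_depth=3,             # stopping criterion: maximum depth
    min_samples_split=4,     # stopping criterion: minimum samples to split
    min_samples_leaf=2,      # stopping criterion: minimum samples per leaf
    ccp_alpha=0.01,          # post-pruning: cost-complexity pruning strength
    random_state=0,
).fit(X_tr, y_tr)

print("train accuracy:", tree.score(X_tr, y_tr))
print("test accuracy :", tree.score(X_te, y_te))
print("tree depth    :", tree.get_depth())
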
Advantages and Disadvantages of Decision Trees
Interpretability and transparency
Overfitting in deep trees
Handling non-linear data
Logistic Regression
Mathematics behind Logistic Regression
Sigmoid function and decision boundary
Logit function and odds ratio
Assumptions of Logistic Regression
Linearity of independent variables and log-odds
Independence of observations
L1 (Lasso) and L2 (Ridge) regularization
Avoiding overfitting with regularization
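
A minimal logistic-regression sketch with L2 and L1 penalties; the C values and solver choice are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# L2 (Ridge) penalty is the default; smaller C means stronger regularization.
l2_model = make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l2", C=1.0, max_iter=1000)).fit(X, y)

# L1 (Lasso) penalty drives some coefficients exactly to zero (needs a
# solver that supports it, e.g. liblinear or saga).
l1_model = make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l1", C=0.1, solver="liblinear")).fit(X, y)

print("L1 zeroed coefficients:", (l1_model[-1].coef_ == 0).sum())
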
Support Vector Machine (SVM)
Concept of hyperplane and decision boundary
Support vectors and margin
Linear SVM for Classification
Maximal margin classifier
The role of support vectors in determining the hyperplane
Non-linear SVM and Kernel Trick
Why linear boundaries may not always work
Kernel Functions
Linear, Polynomial, RBF kernels
Choosing the right kernel for the problem
Hyperparameter Tuning in SVM
Tuning the cost parameter (C)
Gamma parameter in RBF kernel
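
A minimal SVM sketch contrasting a linear kernel with an RBF kernel; the C and gamma values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear SVM: maximal-margin hyperplane; C trades margin width against errors.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)).fit(X_tr, y_tr)

# Non-linear SVM via the kernel trick; gamma controls the reach of each point.
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma=0.01)).fit(X_tr, y_tr)

print("linear:", linear_svm.score(X_te, y_te))
print("rbf   :", rbf_svm.score(X_te, y_te))
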
k-Nearest Neighbors (k-NN)
Instance-based learning and lazy learning
Majority voting in classification
Distance metrics: Euclidean distance, Manhattan distance
Choosing the value of k
Effect of k on Model Performance
Bias-variance tradeoff in k-NN
Impact of large and small k-values
Distance-based Weighting
Importance of distance in voting
Weighting neighbors by distance
Curse of Dimensionality
Effect of high-dimensional spaces on k-NN performance
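
A minimal k-NN sketch showing the effect of k, the distance metric, and distance-based weighting; the dataset and k values are illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

for k in (1, 5, 15):
    # Small k: low bias, high variance; large k: smoother boundary, higher bias.
    knn = make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=k,
                             weights="distance",   # closer neighbours vote more
                             metric="euclidean"),  # "manhattan" is also common
    )
    print(k, cross_val_score(knn, X, y, cv=5).mean())
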
Ensemble Learning
Overview of Ensemble Learning
Definition and concept of ensemble methods
Why use ensemble methods? (Improving accuracy, reducing overfitting, and variance)
Types of ensemble techniques: Bagging, Boosting, Stacking
Benefits of Ensemble Learning
Model generalization and robustness
Overcoming bias-variance tradeoff
Handling complex decision boundaries
Bagging (Bootstrap Aggregating)
Concept of bootstrapping and aggregation
How bagging reduces variance
Weak vs. strong learners in ensemble learning
Random Forests
Introduction to Random Forests as a bagging method
How Random Forests work: random sampling of features, building uncorrelated trees
Hyperparameters in Random Forest (n_estimators, max_depth, min_samples_split, etc.)
Feature importance and out-of-bag (OOB) error
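
A minimal random-forest sketch with the hyperparameters, OOB score, and feature importances mentioned above; the parameter values are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=200,        # number of bootstrapped trees
    max_depth=None,          # let trees grow; averaging reduces the variance
    min_samples_split=2,
    max_features="sqrt",     # random feature subset per split -> decorrelated trees
    oob_score=True,          # evaluate on out-of-bag samples
    random_state=0,
).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("Most important feature index:", rf.feature_importances_.argmax())
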
Boosting
Understanding Boosting
Sequential learning: correcting errors from previous models
Concept of weak learners and how boosting converts them into strong learners
Differences between boosting and bagging
AdaBoost (Adaptive Boosting)
Core idea of re-weighting misclassified instances
How weak learners are combined to form a strong learner
Gradient Boosting Machines (GBM)
Concept of gradient descent in boosting
Boosting decision trees sequentially
Understanding the residual error minimization process
XGBoost
Improvements over traditional Gradient Boosting
Regularization in XGBoost (L1/L2 regularization)
Speed and performance optimizations in XGBoost (parallelism, tree-pruning)
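
A minimal XGBoost sketch, assuming the third-party xgboost package is installed; the regularization and learning-rate values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes `pip install xgboost`

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,   # shrinks each tree's contribution
    max_depth=4,
    subsample=0.8,        # row subsampling per tree
    reg_alpha=0.1,        # L1 regularization on leaf weights
    reg_lambda=1.0,       # L2 regularization on leaf weights
).fit(X_tr, y_tr)

print("test accuracy:", model.score(X_te, y_te))
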
LightGBM and CatBoost
Introduction to LightGBM (Leaf-wise tree growth, speed optimizations, handling large datasets)
Introduction to CatBoost (handling categorical features effectively)
Differences and advantages over XGBoost
Blending and Voting (1 hour)
Introduction to Blending
Concept of simple blending techniques in ensemble learning
Blending different models based on their weighted contributions
Voting Classifiers
Hard voting (majority voting) vs. soft voting (weighted probability voting)
Practical implementation using VotingClassifier in Scikit-learn
Combining multiple classifiers such as Logistic Regression, k-NN, and SVM in a voting ensemble
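
A minimal sketch of a soft-voting ensemble built from Logistic Regression, k-NN, and SVM with scikit-learn's VotingClassifier; the dataset and estimator settings are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

voter = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("svm", SVC(probability=True)),   # soft voting needs predict_proba
        ],
        voting="soft",   # "hard" = majority vote on predicted labels
    ),
)
print("5-fold accuracy:", cross_val_score(voter, X, y, cv=5).mean())
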
Evaluation of Ensemble Models
Metrics for Evaluating Ensemble Models
Accuracy, Precision, Recall, F1-score, ROC-AUC, Log-loss
Evaluating models using cross-validation
Avoiding overfitting in ensemble models
Comparing Single Learners vs. Ensemble Models
Why ensemble methods perform better than individual models
Limitations and challenges of ensemble methods
Hyperparameter Tuning in Ensemble Methods
Importance of Hyperparameter Tuning
Impact of hyperparameters on the performance of ensemble models
Tuning for Random Forest
n_estimators, max_depth, min_samples_split, max_features, bootstrap, etc.
Tuning for Boosting Algorithms
learning_rate, n_estimators, max_depth, min_child_weight, gamma, and subsample in XGBoost/LightGBM
Grid Search and Random Search
Using GridSearchCV and RandomizedSearchCV in Scikit-learn
Bayesian optimization for hyperparameter tuning
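
A minimal sketch of GridSearchCV and RandomizedSearchCV on a random forest; the parameter grid is a small illustrative assumption:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

# Grid search: exhaustively tries every combination with cross-validation.
grid = GridSearchCV(rf, param_grid, cv=5, scoring="accuracy").fit(X, y)
print("grid best  :", grid.best_params_, grid.best_score_)

# Random search: samples a fixed number of combinations; cheaper on big grids.
rand = RandomizedSearchCV(rf, param_grid, n_iter=4, cv=5, random_state=0).fit(X, y)
print("random best:", rand.best_params_, rand.best_score_)
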
Unsupervised Learning
When to use unsupervised learning techniques
Types of Unsupervised Learning
Clustering: Finding hidden patterns or groupings in data
Dimensionality reduction: Reducing the complexity of data
Association learning: Finding relationships between variables
Types of Clustering
Hard clustering vs. soft clustering
Partitional clustering vs. hierarchical clustering
Centroid-based, density-based, and distribution-based clustering
K-Means Clustering
Understanding the K-Means Algorithm
Concept of centroids and clusters
Objective of K-Means: Minimizing within-cluster variance (inertia)
Steps of K-Means algorithm: Initialization, assignment, and update steps
Convergence of K-Means and how clusters are formed
Choosing the Right Number of Clusters (K)
Importance of selecting the correct number of clusters
Elbow method to identify the optimal number of clusters
Silhouette score and its use in evaluating cluster quality
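
A minimal K-Means sketch that sweeps K and reports inertia (for the elbow method) and the silhouette score; the synthetic blob data is an assumption for illustration:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # Inertia = within-cluster sum of squared distances; it always drops as K
    # grows, so look for the "elbow". Silhouette peaks for well-separated clusters.
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
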
Advantages and Limitations of K-Means
Fast and efficient for large datasets
Assumes spherical cluster shapes
Sensitivity to initialization and outliers