Unit 4 Part 2
Step-02: Compute the mean vector (µ): for each variable, the sum of its values divided by the total number of observations.
Step-03: Standardize the Dataset - Subtract the mean from the given data.
The aim is to standardize the range of the continuous initial variables so that each of them contributes equally to the analysis. If there are large differences between the ranges of the initial variables, those with larger ranges will dominate those with smaller ranges.
Step-04: Calculate the covariance matrix.
The aim is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see whether there is any relationship between them. Sometimes variables are highly correlated in such a way that they contain redundant information, so to identify these correlations we compute the covariance matrix. The covariance matrix is simply a table that summarizes the correlations between all possible pairs of variables.
Step-05: Identify the Principal Components - Calculate the eigenvectors and eigenvalues of the covariance matrix.
Principal components are new variables that are constructed as linear combinations, or mixtures, of the initial variables. These combinations are made in such a way that the new variables (i.e., the principal components) are uncorrelated and most of the information within the initial variables is squeezed, or compressed, into the first components. So the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
Organizing information in principal components this way allows you to reduce dimensionality without losing much information, by discarding the components with low information and treating the remaining components as your new variables.
Step-07: Derive the New Data Set / Recast the Data along the Principal Component Axes
The aim is to use the feature vector formed from the selected eigenvectors of the covariance matrix to reorient the data from the original axes to the axes represented by the principal components (hence the name Principal Component Analysis). This is done by multiplying the transpose of the feature vector (the matrix whose columns are the chosen eigenvectors) by the transpose of the mean-adjusted data set. Equivalently, each pattern is transformed onto the eigenvectors as:
Transformed Pattern = (Eigenvector Matrix)ᵀ × (Pattern − Mean Vector)
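A minimal NumPy sketch of Steps 02 through 07, assuming the data is arranged with one observation per row and two components are kept; the array X and the other names are illustrative, not part of the notes:

```python
import numpy as np

# Toy data: 5 observations of 3 variables (rows = observations); values are illustrative.
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

mu = X.mean(axis=0)                       # Step-02: mean vector
X_centred = X - mu                        # Step-03: subtract the mean
cov = np.cov(X_centred, rowvar=False)     # Step-04: covariance matrix of the variables
eig_vals, eig_vecs = np.linalg.eigh(cov)  # Step-05: eigenvalues/eigenvectors

# Sort components by decreasing eigenvalue and keep the top k as the feature vector.
order = np.argsort(eig_vals)[::-1]
k = 2
feature_vector = eig_vecs[:, order[:k]]   # columns are the chosen eigenvectors

# Step-07: recast the data along the principal component axes,
# i.e. (feature_vector)^T x (X - mu)^T, written row-wise below.
transformed = X_centred @ feature_vector
print(transformed)                        # 5 observations expressed in 2 principal components
```

numpy.linalg.eigh is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the re-sorting before selecting components.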
B. Model Fitting:
1. Fit regression models for each subset, such as simple linear regression or multiple regression.
2. Use appropriate techniques to fit the models (e.g., least squares method).
C. Model Evaluation:
1. Use criteria like R², adjusted R², or cross-validation error.
2. Compare models to select the best subset.
D. Validation:
1. Validate the selected subset on new, unseen data to ensure generalizability.
2. Avoid overfitting by using techniques like cross-validation.
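The steps above (B through D) can be sketched with scikit-learn: fit an ordinary least squares model on every candidate subset of predictors and score each by cross-validated R². The synthetic data and variable names below are purely illustrative:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # 4 candidate predictors (synthetic)
y = 3 * X[:, 0] + 2 * X[:, 3] + rng.normal(size=100)

# B. Fit an ordinary least squares model for each subset of predictors,
# C. score each subset by 5-fold cross-validated R2, and keep the best one.
best_score, best_subset = -np.inf, None
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(LinearRegression(), X[:, list(subset)], y,
                                cv=5, scoring="r2").mean()
        if score > best_score:
            best_score, best_subset = score, subset

# D. The chosen subset should still be validated on genuinely unseen data before use.
print(best_subset, round(best_score, 3))
```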
Numerical Example:
Dataset: Consider variables A, B, C, and D.
Association Probabilities:
1. P(A,B)=0.85 (strong association)
2. P(A,C)=0.40 (weak association)
3. P(A,D)=0.92 (strong association)
4. P(B,C)=0.65 (moderate association)
5. P(B,D)=0.90 (strong association)
6. P(C,D)=0.20 (very weak association)
Step 1: Compute Association Probabilities
Utilize appropriate statistical measures to calculate probabilities, such as correlation
coefficients or mutual information scores.
Step 2: Identify Strong Associations
Set a threshold (e.g., 0.70) for significant associations.
Strong associations: A−B, A−D, B−D.
Step 3: Select Relevant Variables
Variables A, B, and D are selected due to their strong associations.
Variable C is discarded due to its weak associations.
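A small sketch of Steps 1 through 3, assuming the pairwise association scores have already been computed and stored in a dictionary (here, the values from the example above):

```python
# Pairwise association scores from the example above.
assoc = {("A", "B"): 0.85, ("A", "C"): 0.40, ("A", "D"): 0.92,
         ("B", "C"): 0.65, ("B", "D"): 0.90, ("C", "D"): 0.20}

threshold = 0.70  # Step 2: cut-off for a "strong" association
strong_pairs = [pair for pair, p in assoc.items() if p >= threshold]

# Step 3: keep every variable that appears in at least one strong pair.
selected = sorted({v for pair in strong_pairs for v in pair})
print(strong_pairs)  # [('A', 'B'), ('A', 'D'), ('B', 'D')]
print(selected)      # ['A', 'B', 'D']
```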
Cross-Validation Techniques:
K-Fold Cross-Validation: Divides data into K folds and uses each fold as a testing set
while using the remaining K-1 folds as training data, repeating the process K times.
Stratified K-Fold Cross-Validation: Ensures that each fold is representative of all
strata of the data, i.e., each fold maintains the same class distribution as the original data.
Leave-One-Out Cross-Validation (LOOCV): Uses a single observation as the
validation set and the rest of the data as the training set.
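These three schemes correspond directly to scikit-learn splitters; a minimal sketch (the iris dataset and logistic-regression model are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# K-Fold: K splits, each fold serves once as the test set.
kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Stratified K-Fold: each fold preserves the class distribution of the full data.
strat_scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# LOOCV: one observation per test set, repeated n times (expensive for large n).
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

print(kfold_scores.mean(), strat_scores.mean(), loo_scores.mean())
```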
Model Selection:
Grid Search: Technique to find the best performing combination of hyperparameters
for a model by systematically testing a range of hyperparameters.
Random Search: Randomly selects combinations of hyperparameters for evaluation;
useful when computational resources are limited. A short sketch of both search strategies follows this list.
Model Ensembling: Combining predictions from multiple models to improve overall
performance.
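A hedged sketch of grid search and random search with scikit-learn; the SVC model and the parameter ranges are arbitrary examples:

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: exhaustively tries every combination in the grid.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)

# Random search: samples a fixed number of combinations, cheaper when the grid is large.
rand = RandomizedSearchCV(SVC(), {"C": uniform(0.1, 10), "gamma": uniform(0.01, 1)},
                          n_iter=10, cv=5, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```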
Prediction Measures
Prediction measures refer to the metrics and techniques used to assess the accuracy and
reliability of predictions made by a model. These measures are essential for evaluating the
performance of predictive models and understanding how well they generalize to new, unseen
data. Here are some common prediction measures used in machine learning:
Mean Absolute Error (MAE):
MAE = (1/n) Σ |yᵢ − ŷᵢ|
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
Interpretation:
A smaller MAE indicates a better fit of the model to the data. MAE is measured in the same
units as the data, making it easy to understand in the context of the problem.
Numerical Example:
Let's consider a simple dataset of actual and predicted values for house prices:
Actual Prices: [200,300,400,500,600] (in thousands)
Predicted Prices: [220,280,380,480,590] (in thousands)
In this example, the MAE is calculated as 18. It means, on average, the predictions differ from
the actual values by $18,000. This indicates that the predictive model's average error in
estimating house prices is $18,000.
MAE is a useful metric for evaluating the accuracy of regression models. It gives a clear
understanding of how well the predictions match the actual data points and is often used in
various real-world applications where understanding prediction accuracy is crucial.
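The MAE above can be reproduced in a few lines of NumPy using the house-price figures from the example (in thousands):

```python
import numpy as np

actual = np.array([200, 300, 400, 500, 600])     # actual prices, in thousands
predicted = np.array([220, 280, 380, 480, 590])  # predicted prices, in thousands

mae = np.abs(actual - predicted).mean()
print(mae)  # 18.0  -> an average error of $18,000
```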
Mean Squared Error (MSE):
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
Interpretation:
A smaller MSE indicates a better fit of the model to the data. MSE is measured in the square
of the units of the data, making it sensitive to outliers and large errors.
Numerical Example:
Let's consider a simple dataset of actual and predicted values for house prices:
Actual Prices: [200,300,400,500,600] (in thousands)
Predicted Prices: [220,280,380,480,590] (in thousands)
In this example, the MSE is calculated as 340. Since the prices are expressed in thousands of dollars, this means that, on average, the squared difference between the predicted and actual house prices is 340 (in thousands of dollars squared). This metric gives higher penalties to larger errors, making it suitable for applications where accurately predicting outliers is crucial.
MSE is a valuable metric for evaluating the accuracy of regression models, especially when
you want to emphasize larger errors in the predictions. However, MSE is sensitive to outliers,
so it's essential to consider the data characteristics and the problem context when choosing
evaluation metrics.
Root Mean Squared Error (RMSE):
RMSE = √((1/n) Σ (yᵢ − ŷᵢ)²)
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
Numerical Example:
Using the same house-price data as above:
RMSE = √((1/5)(400 + 400 + 400 + 400 + 100))
RMSE = √340
RMSE ≈ 18.44
In this example, the RMSE is calculated as approximately 18.44. It means, on average, the
difference between the predicted house prices and the actual prices is approximately $18,440.
This metric provides a clear understanding of the average prediction error in the same unit as
the house prices.
RMSE is a valuable metric for evaluating the accuracy of regression models. It's particularly
useful when you want to understand the average magnitude of errors, especially in applications
where the prediction errors need to be interpretable in the same unit as the target variable.
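Both MSE and RMSE for the same house-price example can be checked with a short NumPy sketch:

```python
import numpy as np

actual = np.array([200, 300, 400, 500, 600])     # actual prices, in thousands
predicted = np.array([220, 280, 380, 480, 590])  # predicted prices, in thousands

mse = ((actual - predicted) ** 2).mean()
rmse = np.sqrt(mse)
print(mse, round(rmse, 2))  # 340.0 18.44
```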
Mean Absolute Percentage Error (MAPE):
MAPE = (1/n) Σ |(yᵢ − ŷᵢ) / yᵢ| × 100%
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
MAPE ≈ 7.4%
In this example, the MAPE is calculated as approximately 7.4%. It means, on average, the
predictions differ from the actual sales figures by about 7.4% of the actual values. This metric
provides a clear understanding of the average percentage error in predictions.
MAPE is a useful metric for evaluating the accuracy of forecasting models, especially in
business and economics, where understanding prediction errors as percentages is essential. It
helps analysts and decision-makers assess the reliability of their forecasts in practical
applications.
Mean Squared Percentage Error (MSPE):
MSPE = (1/n) Σ ((yᵢ − ŷᵢ) / yᵢ × 100%)²
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
Interpretation:
A lower MSPE indicates a better fit of the forecasting model to the data, as it represents smaller
squared percentage errors.
Numerical Example:
Let's consider a dataset of actual and predicted sales figures for a product over a period of time:
Actual Sales: [150,200,180,250,300] (in units)
Predicted Sales: [140,220,190,240,280] (in units)
Using the MSPE formula, we can calculate MSPE as follows:
MSPE = (1/5)((6.67%)² + (−10%)² + (−5.56%)² + (4%)² + (6.67%)²)
MSPE ≈ (1/5)(0.44% + 1% + 0.31% + 0.16% + 0.44%)
MSPE ≈ 2.35% / 5
MSPE ≈ 0.47%
In this example, the MSPE is calculated as approximately 0.47%. It means, on average, the
squared percentage difference between the predicted and actual sales figures is 0.47%. A
smaller MSPE indicates a better fit of the forecasting model.
MSPE is a useful metric for evaluating the accuracy of forecasting models, especially when
you want to understand the prediction errors in terms of percentages. It helps data scientists
and analysts assess the reliability of their forecasts in practical applications.
Root Mean Squared Percentage Error (RMSPE):
RMSPE = √((1/n) Σ ((yᵢ − ŷᵢ) / yᵢ × 100%)²)
Where:
n = the number of data points, yᵢ = the actual value, and ŷᵢ = the predicted value.
Numerical Example:
Let's consider a dataset of actual and predicted sales figures for a product over a period of
time:
Actual Sales: [150,200,180,250,300] (in units)
Predicted Sales: [140,220,190,240,280] (in units)
Using the RMSPE formula, we can calculate RMSPE as follows:
RMSPE = √((1/5)(((150−140)/150 × 100%)² + ((200−220)/200 × 100%)² + ((180−190)/180 × 100%)² + ((250−240)/250 × 100%)² + ((300−280)/300 × 100%)²))
RMSPE = √((1/5)((6.67%)² + (−10%)² + (−5.56%)² + (4%)² + (6.67%)²))
RMSPE ≈ √((1/5)(0.44% + 1% + 0.31% + 0.16% + 0.44%))
RMSPE ≈ √(2.35% / 5)
RMSPE ≈ √(0.47%) = √0.0047
RMSPE ≈ 0.0687 ≈ 6.87%
RMSPE is a valuable metric for evaluating the accuracy of time series forecasting models,
especially when you want to understand prediction errors in terms of percentages. It provides
a more interpretable measure of prediction accuracy, allowing data scientists and analysts to
assess the reliability of their forecasts.
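Running the same sales data through a short NumPy sketch confirms the MSPE and RMSPE values above (printed in decimal form, where 0.0047 corresponds to 0.47% and 0.0687 to 6.87%):

```python
import numpy as np

actual = np.array([150, 200, 180, 250, 300])
predicted = np.array([140, 220, 190, 240, 280])

pct_err = (actual - predicted) / actual   # fractional (percentage) errors
mspe = (pct_err ** 2).mean()
rmspe = np.sqrt(mspe)
print(round(mspe, 4), round(rmspe, 4))    # 0.0047 0.0687
```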
Prediction measures are crucial for understanding a model's accuracy and reliability. Evaluating them is a fundamental step in the machine learning workflow and aids in making data-driven decisions in various domains.
Avoiding Overtraining
Overtraining, also known as overfitting, occurs when a machine learning model performs
exceptionally well on the training data but fails to generalize to new, unseen data. This happens
because the model has essentially memorized the training data, including its noise and outliers,
instead of learning the underlying patterns. To ensure the model's effectiveness in real-world
scenarios, it's crucial to avoid overtraining. Here are some effective techniques:
1. Increase Training Data:
Providing more diverse and abundant training data can help the model generalize better.
With a larger dataset, the model is exposed to a wider range of patterns and variations
in the data.
2. Feature Selection:
Careful selection of relevant features can significantly impact the model's performance.
Irrelevant or redundant features can introduce noise and confuse the learning process.
Use techniques like feature importance analysis to identify and select the most
informative features.
3. Cross-Validation:
Instead of relying on a single train-test split, use techniques like k-fold cross-validation.
This divides the data into multiple folds and trains the model on different subsets,
ensuring that the model's performance is evaluated across various parts of the data.
4. Regularization:
Regularization techniques, such as L1 (Lasso) and L2 (Ridge) regularization, add
penalty terms to the model's loss function, discouraging overly complex models. These
penalties prevent the model from assigning too much importance to any particular
feature, thus mitigating overfitting.
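A minimal scikit-learn sketch of L2 (Ridge) and L1 (Lasso) regularization on synthetic data; the alpha values, which control the penalty strength, are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)  # only 2 of 10 features matter

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives irrelevant coefficients exactly to zero

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))      # most coefficients become 0.0
```

Note how Lasso sets the coefficients of the irrelevant features to exactly zero, effectively performing feature selection as well.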
5. Early Stopping:
During the training process, monitor the model's performance on a validation dataset.
If the performance starts degrading on the validation set while improving on the training
set, stop the training early. This prevents the model from memorizing the noise in the
training data.
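One way to implement early stopping is a manual training loop that watches validation error; a hedged sketch using SGDRegressor and synthetic data:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + rng.normal(size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_val, patience, wait = np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_train, y_train)                  # one pass over the training data
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val:
        best_val, wait = val_mse, 0                      # validation error still improving
    else:
        wait += 1
        if wait >= patience:                             # no improvement for `patience` epochs
            break                                        # stop before the model memorizes noise
```

Many estimators also expose this behaviour directly, for example the early_stopping option of SGDRegressor and MLPRegressor.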
6. Ensemble Methods:
Ensemble methods like Random Forests and Gradient Boosting combine predictions
from multiple models. These methods often lead to more robust and generalized
models, as they mitigate the biases and errors of individual models.
7. Neural Network Techniques:
In neural networks, techniques like dropout layers and batch normalization help in
preventing overfitting. Dropout layers randomly deactivate neurons during training,
while batch normalization normalizes input batches to stabilize learning.
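A minimal Keras sketch showing where dropout and batch normalization layers typically sit in a small network; the layer sizes, dropout rate, and input width are arbitrary:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                  # 20 input features (arbitrary width)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),         # normalizes each batch to stabilize learning
    tf.keras.layers.Dropout(0.3),                 # randomly deactivates 30% of units during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```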
8. Hyperparameter Tuning:
Carefully tune hyperparameters, such as learning rate and regularization strength, using
techniques like grid search or random search. Optimal hyperparameters ensure the
model's balance between complexity and generalization.
Avoiding overtraining is critical to ensuring that machine learning models generalize well to
new data. By employing a combination of techniques like increasing training data, feature
selection, cross-validation, regularization, early stopping, ensemble methods, and careful
hyperparameter tuning, data scientists can build models that are both accurate and robust in
real-world applications.