
Feature Engineering: Short Study

Seminar Report by
Anant Dashpute
SC20M144

Indian Institute of Space Science and Technology,


Department of Mathematics

1 Introduction

Feature engineering is the process of using domain knowledge to extract features from raw data. The
motivation is to use these extra features to improve the quality of results from a machine learning
process, compared with supplying only the raw data to the machine learning process.

1.1 The Importance of Features

The quality of the features in your data set has a major impact on the quality of the insights a model can produce. To quantify this, feature importance refers to techniques that assign a score to each input feature for a given model; the scores simply represent the "importance" of each feature. A higher score means that the specific feature has a larger effect on the model being used to predict the target variable.
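
One common way to obtain such scores is the impurity-based importance of a tree ensemble. Below is a minimal sketch, assuming scikit-learn is available; the dataset and the random-forest model are chosen only for illustration.

# Score features with a random forest and list the most important ones.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Higher scores mean the feature has a larger effect on the predictions.
scores = sorted(zip(X.columns, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in scores[:5]:
    print(f"{name}: {score:.3f}")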

Data Alone Is Not Enough. The functions we want to learn in the real world are not drawn uniformly from the set of all mathematically possible functions. Raw data may also arrive in an inconvenient format, contain duplicates, or hide information that has not been extracted yet. Learning further relies on assumptions such as similar entries in the data set having similar classes, limited dependences, or limited complexity, and these do not always hold in practice. That is why feeding in raw data alone is usually not enough for machine learning to do very well.
If raw data is given directly to a model, you may see bad results. The most important problem to look out for is poor generalization, which arises as follows:

– Large bias, which mostly happens with simple models such as linear predictors (under-fitting)
– Large variance, which mostly happens with flexible models such as decision trees (over-fitting)

1.2 How Feature Engineering Works

Feature engineering is the process of selecting, manipulating, creating, and transforming raw data into features that can be used in supervised learning. In mathematical terms, it is the act of converting raw observations into desired features using statistical or machine learning approaches. Feature engineering is a technique for creating new variables that are not in the training set, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy.

– Feature scaling and transformation:
Feature transformation is the process of modifying your data while keeping the information. These modifications make the data easier for machine learning algorithms to work with, which delivers better results.
– Feature extraction:
Feature extraction is the process of extracting features from a data set to identify useful information without distorting the original relationships.
– Exploratory data analysis:
Exploratory data analysis (EDA) is a powerful and simple tool that can be used to improve your understanding of your data by exploring its properties. The technique is often applied when the goal is to create new hypotheses or find patterns in the data (a minimal sketch follows this list).
– Feature creation:
Creating features involves constructing new variables that will be most helpful for our model. This can mean adding or removing some features.
– Feature selection:
Feature selection offers a simple yet effective way to overcome the challenge of high-dimensional data by eliminating redundant and irrelevant features. Removing the irrelevant data improves learning accuracy, reduces the computation time, and facilitates a better understanding of the learning model or the data.
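
As referenced above, here is a minimal exploratory data analysis sketch with pandas; the toy data frame and its column names are only placeholders for a real dataset.

import numpy as np
import pandas as pd

# A small toy frame standing in for a real dataset.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29],
    "income": [30_000, 48_000, 52_000, np.nan, 41_000],
    "city":   ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})

print(df.shape)                           # number of rows and columns
print(df.dtypes)                          # data type of each column
print(df.describe())                      # summary statistics of numeric columns
print(df.isna().sum())                    # missing values per column
print(df.select_dtypes("number").corr())  # correlations between numeric features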

1.3 Why Learn Feature Engineering?

Before the “deep learning era”, a computer vision engineer who faced a classification problem had to manually pick the type of features to extract from the image (e.g., a histogram of oriented gradients (HOG)) and then train a classifier. Obviously, an engineer might have chosen features that were not the best for the problem, which is why we do feature selection, itself a part of feature engineering. Once back-propagation was introduced, deep learning became able to learn the best features automatically, so in general we do not require manual feature extraction in deep learning. But feature engineering is more than that. Even though a deep learning network can extract features, if the problem is difficult the network can struggle to find the right representation or features.
Remember that not all problems need a deep learning approach. If raw data is given as input, the following issues may cause problems:

1. Too little data
2. Imbalanced data
3. Human errors (e.g., null values)
4. Computational power needed
5. Complexity of the problem

2 Feature Transformation

Feature transformation is the process of modifying your data while keeping the information. These modifications make the data easier for machine learning algorithms to work with, which delivers better results.

Simple functions are often used to make the data more like some standard distribution, to better satisfy the assumptions of a particular algorithm:

1. log(x)
2. log(x + 1)
3. sqrt(x)
4. sqrt(x + 1)
5. etc.

2.1 Logarithm transformation

1. It helps to handle skewed data; after the transformation, the distribution becomes closer to normal.
2. In most cases the order of magnitude of the data changes within the range of the data.
3. It also decreases the effect of outliers, due to the normalization of magnitude differences, and the model becomes more robust.
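
A minimal sketch of the log(x + 1) transformation with NumPy and pandas; the "income" column and its values are hypothetical.

import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [20_000, 25_000, 30_000, 45_000, 60_000, 1_500_000]})

# log(x + 1) keeps zero values valid and compresses large magnitudes,
# pulling the outlier much closer to the rest of the distribution.
df["income_log"] = np.log1p(df["income"])
print(df)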

2.2 Binning transformation

Binning can be applied to both categorical and numerical data. The main motivation of binning is to make the model more robust and prevent over-fitting; however, it has a cost in performance. The trade-off between performance and over-fitting is the key point of the binning process.

1. Numerical binning: binning might be redundant because of its effect on model performance.
2. Categorical binning: labels with low frequencies are likely to affect the robustness of statistical models negatively. Thus, assigning a general category to these less frequent values helps to keep the model robust.
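
A minimal binning sketch with pandas: fixed-width bins for a numerical column and a catch-all label for rare categories. The column names, bin edges, and frequency threshold are hypothetical.

import pandas as pd

df = pd.DataFrame({
    "age":     [12, 25, 37, 41, 58, 73],
    "country": ["India", "India", "USA", "India", "Fiji", "Nauru"],
})

# Numerical binning: map a continuous value onto a small set of labels.
df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 40, 65, 120],
                       labels=["child", "young", "middle", "senior"])

# Categorical binning: merge labels whose frequency is below a threshold.
counts = df["country"].value_counts()
rare = counts[counts < 2].index
df["country_bin"] = df["country"].where(~df["country"].isin(rare), "Other")
print(df)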

2.3 Feature Scaling

In most cases, the numerical features of a dataset do not share a common range and differ from each other. Algorithms such as linear regression, logistic regression, and neural networks that use gradient descent as an optimization technique require the data to be scaled. Scaling is also important for algorithms that work on distances, such as k-NN or k-Means, because the difference in the ranges of the features causes different step sizes for each feature, which makes training the model take extra time.

Different types of Scaling

1. Normalization
(a) Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution.
(b) This can be useful for algorithms that do not assume any distribution of the data, such as k-Nearest Neighbors and neural networks.
(c) Because the standard deviation is decreased, the effect of outliers increases. So it is recommended to handle outliers before normalization.

2. Standardization
(a) Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian distribution. However, this does not have to be necessarily true.
(b) Unlike normalization, standardization does not have a bounding range. So, even if you have outliers in your data, they will not be affected by standardization.
(c) If the standard deviations of the features differ, their scaled ranges will also differ from each other. This reduces the effect of outliers in the features.

3. MaxAbsScaler: it takes the absolute maximum value of each column and divides each value in the column by that maximum.

4. RobustScaler: it scales the data by the interquartile range (IQR).

5. QuantileTransformer: using the CDF, it converts the variable's distribution to a normal distribution.

6. Unit vector scaler (Normalizer): each sample is re-scaled independently of the other samples so that its norm (l1, l2, or max) equals one. Values are mapped into [-1, 1] when there are negative values in the data.
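
A minimal sketch comparing the scalers listed above, using their scikit-learn implementations; the toy matrix is only for illustration.

import numpy as np
from sklearn.preprocessing import (MinMaxScaler, StandardScaler, MaxAbsScaler,
                                   RobustScaler, QuantileTransformer, Normalizer)

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 4000.0]])

print(MinMaxScaler().fit_transform(X))         # normalization to [0, 1]
print(StandardScaler().fit_transform(X))       # zero mean, unit variance
print(MaxAbsScaler().fit_transform(X))         # divide by the column-wise max |value|
print(RobustScaler().fit_transform(X))         # center by median, scale by IQR
print(QuantileTransformer(n_quantiles=3,       # map CDF to a normal distribution
                          output_distribution="normal").fit_transform(X))
print(Normalizer(norm="l2").fit_transform(X))  # rescale each row to unit l2 norm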

3 Feature preprocessing

Feature preprocessing is one of the most important steps in developing a machine learning model. It
consists of the creation of features as well as the cleaning of the data.
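
A minimal cleaning sketch with pandas, covering duplicates and null values; the columns and the imputation choices are hypothetical.

import pandas as pd

df = pd.DataFrame({
    "age":    [25.0, None, 37.0, 37.0],
    "city":   ["Pune", "Delhi", None, None],
    "salary": [50_000, 60_000, 75_000, 75_000],
})

df = df.drop_duplicates()                         # remove exact duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # impute numeric nulls with the median
df["city"] = df["city"].fillna("Unknown")         # impute categorical nulls with a constant
print(df)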

4 Feature Creation

Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set. It yields better results than applying machine learning directly to the raw data.
Feature engineering, also known as feature creation, is the process of constructing new features from existing data to train a machine learning model. Typically, feature creation is a drawn-out manual process, relying on domain knowledge, intuition, and data manipulation. Well-conceived new features can sometimes capture the important information in a dataset much more effectively than the original features.
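
A minimal feature-creation sketch: deriving new variables from existing columns with pandas. The column names and the derived features are hypothetical examples of domain-motivated features.

import pandas as pd

df = pd.DataFrame({
    "purchase_date": pd.to_datetime(["2021-01-03", "2021-06-15", "2021-12-24"]),
    "total_price":   [300.0, 1200.0, 90.0],
    "quantity":      [3, 4, 1],
})

# Ratios and date parts often carry more signal than the raw columns.
df["unit_price"] = df["total_price"] / df["quantity"]
df["month"] = df["purchase_date"].dt.month
df["is_weekend"] = df["purchase_date"].dt.dayofweek >= 5
print(df)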

5 Feature Selection

The main goal of feature selection is to reduce the dimensionality of the data without creating new features. Other motivations are the removal of:

– Redundant features: highly correlated features that contain duplicate information
– Irrelevant features: features that contain no information useful for discriminating the outcome
– Noisy features: features that contain some unnecessary information

The most common approaches are the following (a minimal code sketch follows this list):

1. Filter Approach
(a) Score each feature individually for its ability to discriminate outcome.
(b) Rank features by score.
(c) Select top k ranked features.

Common scoring metrics for individual features are as follows:
(a) t-test or ANOVA (continuous features)
(b) Chi-square test (categorical features)
(c) Gini index or information gain

2. Embedded Approach: Feature selection occurs naturally as part of the machine learning
algorithm
(a) L1-regularized linear regression
(b) Optimised Penalty method

3. Wrapper Approaches
The main goal is to find the best subset of features before the machine learning algorithm is run. The most common search strategies are greedy:
(a) Random selection
(b) Forward selection
(c) Backward elimination

Scoring uses some chosen machine learning algorithm. Each feature subset is scored by training the model using only that subset, then assessing accuracy in the usual way (e.g., cross-validation).

4. Exhaustive search: if the feature count is not too large, then we can check all possible subsets of size k. This is essentially the same as random selection, but done exhaustively.
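
As noted above, here is a minimal sketch of the filter and embedded approaches using scikit-learn; the dataset is only for illustration, and the embedded step uses an L1-penalized logistic regression because the toy task is classification.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Filter approach: score each feature individually (ANOVA F-test), keep the top k.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
print("Filter-selected:", list(X.columns[selector.get_support()]))

# Embedded approach: the L1 penalty drives the weights of irrelevant
# features to zero while the model is being trained.
Xs = StandardScaler().fit_transform(X)
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xs, y)
print("Embedded-selected:", list(X.columns[(l1_model.coef_ != 0).ravel()]))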
