Feature Engineering: Short Study: Indian Institute of Space Science and Technology, Department of Mathematics
Seminar Report by
Anant Dashpute
SC20M144
1 Introduction
Feature engineering is the process of using domain knowledge to extract features from raw data. The
motivation is to use these extra features to improve the quality of results from a machine learning
process, compared with supplying only the raw data to the machine learning process.
The quality of the features in your data set has a major impact on the quality of the insights. To quantify this, feature importance refers to techniques that calculate a score for every input feature of a given model — the scores simply represent the “importance” of each feature. A higher score means that the specific feature has a larger effect on the model that is being used to predict a certain variable.
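As a small illustration of such scores, one common option (an assumption here, not something prescribed by this report) is the impurity-based importances of a tree ensemble in scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    # Fit a random forest and read one importance score per input feature;
    # a higher score means a larger effect on the model's predictions.
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    for name, score in zip(load_iris().feature_names, model.feature_importances_):
        print(f"{name}: {score:.3f}")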
Data Alone Is Not Enough. The functions we want to learn in the real world are not drawn uniformly from the set of all mathematically possible functions. Sometimes the problem lies in the data format, in duplicate records, or in hidden information that is not exposed in the raw data. We also usually assume that similar entries in the data set have similar classes, limited dependences, or limited complexity, but this does not always hold in practice; that is why feeding raw data alone is not enough to do very well with machine learning.
If raw data is given to a model, you may see bad results. The most important problems to look out for are under-fitting and over-fitting, which arise from:
– Large bias, which mostly happens with restrictive models such as linear predictors (under-fitting)
– Large variance, which mostly happens with flexible models such as decision trees (over-fitting)
Feature engineering is the process of selecting, manipulating, creating, and transforming raw data into features that can be used in supervised learning. In mathematical terms, it is the act of converting raw observations into desired features
using statistical or machine learning approaches. Feature engineering is a technique to create new variables that are not in the training set, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy.
Before the “deep learning era”, a computer vision engineer who faced a classification problem had to manually pick the type of image features (e.g., histogram of oriented gradients (HOG)) and then train a classifier. Obviously, an engineer might have chosen features that were not the best for the problem, so feature selection, which is also a part of feature engineering, was performed. Once back propagation was introduced, deep learning was able to learn the best features automatically, and in general we do not require hand-crafted feature extraction in deep learning. But feature engineering is more than that: even though a deep learning network can extract features, if the problem is difficult, the network can struggle to find the right representation / features.
Remember that not all problems need a deep learning approach. If raw data is given as input, the following issues may cause problems:
1. Less Data
2. Imbalanced data
3. Human errors (Null Values)
4. Need for computational power
5. Complexity of the problem
2 Feature Transformation
Feature transformation is the process of modifying your data while keeping the information. These modifications make the data easier for machine learning algorithms to understand, which delivers better results.
Simple functions. These are often used to make the data more like some standard distribution, to better satisfy the assumptions of a particular algorithm:
1. log(x)
2. log(x+1)
3. sqrt(x)
4. sqrt(x+1)
5. etc.
These transformations have several benefits:
1. They help to handle skewed data; after transformation, the distribution becomes closer to normal.
2. In most cases, the order of magnitude of the data changes within the range of the data.
3. They also decrease the effect of outliers, due to the normalization of magnitude differences, and the model becomes more robust.
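As a minimal sketch, the snippet below applies a log(x+1) transform to a skewed, hypothetical income column (the column name and values are assumptions for illustration):

    import numpy as np
    import pandas as pd

    # Hypothetical skewed feature.
    df = pd.DataFrame({"income": [1000, 1200, 1500, 2000, 50000, 120000]})

    # log(x + 1) stays defined at zero, compresses large magnitudes, and
    # brings the distribution closer to normal, reducing outlier impact.
    df["income_log"] = np.log1p(df["income"])
    print(df)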
Binning can be applied to both categorical and numerical data. The main motivation of binning is to make the model more robust and prevent over-fitting; however, it has a cost in performance. The trade-off between performance and over-fitting is the key point of the binning process.
1. Numerical binning: for numerical columns, binning can be redundant for some algorithms because of its cost to model performance.
2. Categorical binning: labels with low frequencies probably affect the robustness of statistical models negatively. Thus, assigning a general category to these less frequent values helps to keep the model robust.
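A short sketch of both kinds of binning with pandas, on made-up columns (names and thresholds are assumptions):

    import pandas as pd

    df = pd.DataFrame({
        "age": [4, 17, 25, 38, 52, 70],
        "city": ["Mumbai", "Delhi", "Mumbai", "Ponda", "Delhi", "Panaji"],
    })

    # Numerical binning: map a continuous value to labelled ranges.
    df["age_bin"] = pd.cut(df["age"], bins=[0, 18, 40, 65, 120],
                           labels=["child", "young", "adult", "senior"])

    # Categorical binning: group labels with low frequency into a general
    # "Other" category to keep the model robust.
    counts = df["city"].value_counts()
    rare = counts[counts < 2].index
    df["city_bin"] = df["city"].where(~df["city"].isin(rare), "Other")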
In most cases, the numerical features of the data set do not have a certain range, and they differ from each other. Algorithms such as linear regression, logistic regression, and neural networks that use gradient descent as an optimization technique require the data to be scaled. Scaling is also important for distance-based algorithms such as k-NN or k-Means, since the difference in ranges of features causes different step sizes for each feature, which makes training of the model take extra time.
1. Normalization
(a) Normalization is good to use when you know that the distribution of your data does not
follow a Gaussian distribution.
(b) This can be useful in algorithms that do not assume any distribution of the data like K-Nearest
Neighbors and Neural Networks.
(c) Due to the decreased standard deviations, the effect of the outliers increases. So, before normalization, it is recommended to handle the outliers.
2. Standardization
(a) Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian
distribution. However, this does not have to be necessarily true.
(b) Unlike normalization, standardization does not have a bounding range. So, even if you have
outliers in your data, they will not be affected by standardization.
(c) If the standard deviation of the features is different, their ranges also differ from each other. This reduces the effect of the outliers in the features.
3. MaxAbsScaler: it takes the absolute maximum value of each column and divides each value in the column by that maximum value.
4. QuantileTransformer: using the CDF, it converts the variable distribution to a normal distribution.
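A minimal sketch of these scalers using scikit-learn, on a small hypothetical feature matrix (the values and parameter choices are illustrative assumptions):

    import numpy as np
    from sklearn.preprocessing import (MinMaxScaler, StandardScaler,
                                       MaxAbsScaler, QuantileTransformer)

    # Two features with very different ranges.
    X = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0],
                  [4.0, 80000.0]])

    X_norm = MinMaxScaler().fit_transform(X)    # normalization to [0, 1]
    X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance
    X_max = MaxAbsScaler().fit_transform(X)     # divide by column-wise |max|
    # Map each feature through its empirical CDF to a normal distribution.
    X_qt = QuantileTransformer(output_distribution="normal",
                               n_quantiles=4).fit_transform(X)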
3 Feature preprocessing
Feature preprocessing is one of the most important steps in developing a machine learning model. It
consists of the creation of features as well as the cleaning of the data.
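As a small sketch of the cleaning side (for instance the null values mentioned in the introduction), assuming a pandas DataFrame with a numeric age column:

    import pandas as pd

    df = pd.DataFrame({"age": [22, None, 35, None, 41],
                       "label": [0, 1, 0, 1, 1]})

    # Fill missing numeric values with the column median (one common choice);
    # rows could instead be dropped with df.dropna() if data is plentiful.
    df["age"] = df["age"].fillna(df["age"].median())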
4 Feature Creation
Feature extraction refers to the process of transforming raw data into numerical features that can be
processed while preserving the information in the original
data set. It yields better results than applying machine learning directly to the raw data.
Feature engineering, also known as feature creation, is the process of constructing new features
from existing data to train a machine learning model. Typically, feature creation is a drawn-out manual
process, relying on domain knowledge, intuition, and data manipulation. Well-conceived new features
can sometimes capture the important information in a dataset much more effectively than the original
features.
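A brief, hypothetical illustration of constructing new features from existing columns with pandas (all column names are assumptions, not taken from any real data set):

    import pandas as pd

    df = pd.DataFrame({
        "total_spend": [1200.0, 300.0, 4500.0],
        "num_orders": [10, 3, 15],
        "signup_date": pd.to_datetime(["2020-01-05", "2021-06-20", "2019-11-30"]),
    })

    # New features built from domain intuition about the raw columns.
    df["spend_per_order"] = df["total_spend"] / df["num_orders"]
    df["account_age_days"] = (pd.Timestamp("2021-12-31") - df["signup_date"]).dt.days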
5 Feature Selection
The main goal of feature selection is to reduce the dimensionality of the data without creating new features. Other motivations are the removal of:
– Redundant features: highly correlated features contain duplicate information.
– Irrelevant features: these contain no information useful for discriminating the outcome.
– Noisy features: these contain unnecessary information.
The most common approaches are the following:
1. Filter Approach
(a) Score each feature individually for its ability to discriminate outcome.
(b) Rank features by score.
(c) Select top k ranked features.
2. Embedded Approach: Feature selection occurs naturally as part of the machine learning
algorithm
(a) L1-regularized linear regression
(b) Optimised Penalty method
3. Wrapper Approaches
The main goal is to find the best subset of features before the machine learning algorithm is run. The most common search strategies are greedy:
(a) Random selection
(b) Forward selection
(c) Backward elimination
Scoring uses some chosen machine learning algorithm: each feature subset is scored by training the model using only that subset, then assessing accuracy in the usual way (e.g., cross-validation).
4. Exhaustive search: if the feature count is not too large, we can check all possible subsets of size k. This is essentially the same as random selection but done exhaustively.
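A minimal sketch of the filter, embedded, and wrapper approaches with scikit-learn on a synthetic data set (the data and parameter choices are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import (SelectKBest, f_classif,
                                           SequentialFeatureSelector)
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               random_state=0)

    # Filter approach: score each feature individually (ANOVA F-test), keep top k.
    X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

    # Embedded approach: an L1-regularized model drives irrelevant coefficients to zero.
    l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

    # Wrapper approach: greedy forward selection, each candidate subset scored
    # by cross-validated accuracy of the wrapped model.
    forward = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                        n_features_to_select=5,
                                        direction="forward", cv=5).fit(X, y)
    X_wrapper = forward.transform(X)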