
ML - Unit - 1 (24-25)


Paavai Engineering College Department of MCA

UNIT I

MACHINE LEARNING FUNDAMENTALS


CONTENTS
TECHNICAL TERMS
1.1 MACHINE LEARNING LANDSCAPE:
1.1.1 INTRODUCTION
1.2 TYPES OF MACHINE LEARNING SYSTEMS
1.2.1 Supervised Learning Systems
1.2.2 UnSupervised Learning Systems
1.2.3 Semi-Supervised Learning Systems
1.2.4 Reinforcement Learning Systems
1.2.5 MAIN CHALLENGES OF MACHINE LEARNING
1.3 TESTING AND VALIDATING
1.4 END TO END MACHINE LEARNING PROJECT:
1.4.1 Working With Real Data
1.4.2 Discover and Generalize the Data to Gain Insights
1.4.3 Prepare the Data for Machine Learning Algorithms
1.4.4 Select and Train a Model
1.4.5 Fine – Tune the Model
TECHNICAL TERMS
S.NO  TERM            LITERAL MEANING   TECHNICAL MEANING
1     LOGICAL         IMAGINATION       VIRTUAL
2     MANIPULATE      IMPLEMENT         EXECUTE
3     COGNITIVE       THINKING          MENTAL PROCESS
4     CONSTRUCTIVISM  COLLABORATION     PRE-EXISTING KNOWLEDGE
5     NOISE           SOUND             ERROR
6     PREDICTION      GUESS             FORECAST
7     REGULARISED     REGULARISED       CONSISTENT
8     VARIANCE        DIFFERENCE        QUALITY
9     BIAS            INFLUENCE         PREJUDICE
10    CONVERGENCE     COMBINING         CONSOLIDATION

INTRODUCTION
1.0 Machine Learning

1. Machine learning is a growing technology which enables computers to learn automatically from past data.
2. Machine learning uses various algorithms for building mathematical models and making predictions using
historical data or information.
3. Currently, it is being used for various tasks such as image recognition, speech recognition, email
filtering, Facebook auto-tagging, recommender system, and many more.

1.0.1 What is Machine Learning



1. Machine learning is a subset of artificial intelligence that is mainly concerned with developing
algorithms which allow a computer to learn from data and past experience on its own.
2. The term machine learning was first introduced by Arthur Samuel in 1959.

1.0.2 Definition

Machine learning enables a machine to automatically learn from data, improve performance from experiences,
and predict things without being explicitly programmed.

1.0.3 How does Machine Learning work

1. A Machine Learning system learns from historical data, builds the prediction models, and whenever it
receives new data, predicts the output for it.
2. The accuracy of the predicted output depends largely on the amount and quality of the training data.
3. A larger, representative dataset generally helps to build a better model, which predicts the output more accurately.
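
The learn-then-predict cycle described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available; the study-hours data and the variable names are hypothetical and chosen only so that the pattern is easy to see.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: hours studied -> exam score.
# The scores follow score = 10 * hours + 40 exactly.
hours = np.array([[1], [2], [3], [4], [5]])
scores = np.array([50, 60, 70, 80, 90])

# Step 1: the system learns from historical data and builds a model.
model = LinearRegression()
model.fit(hours, scores)

# Step 2: when new data arrives, the model predicts the output for it.
new_hours = np.array([[6]])
predicted = model.predict(new_hours)
print(predicted)  # approximately [100.]
```

Because the toy data is perfectly linear, the prediction follows the same line; with real, noisy data the prediction would only be an estimate.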

1.0.4 Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is closely related to data mining, as both deal with large amounts of data.
Machine learning is somewhat like a smart assistant that lives in computers and apps. It is a way for machines to
learn from experience, much like humans do, but using data instead of personal experience.

MACHINE LEARNING Vs HUMAN LEARNING

BASIC PRINCIPLES OF MACHINE LEARNING


1. Learning from Data
2. Identifying Patterns
3. Making Predictions or Decisions
4. Improving over time
5. Generalization

1. LEARNING FROM DATA


Wearable technology and health data
Wearable devices like fitness trackers, smartwatches, and health monitors continuously collect a vast
amount of data, including heart rate, steps taken, sleep patterns, and even blood oxygen levels.
Machine learning algorithms analyze this data to identify patterns and trends. For example, an algorithm might
learn to recognize the correlation between certain activities and changes in heart rate or sleep quality.


Example - Health Monitoring Devices



2. IDENTIFYING PATTERNS
2.1 Language Translation Services
• Learning from Examples: Just like how you learn a language by listening and practicing, machine
learning algorithms learn from lots of examples. They look at texts in two languages to understand
how words and sentences are translated from one language to another.
• Finding Patterns:
The algorithms find patterns in these texts. For example, they learn how common phrases in
English are said in Spanish, and they remember these patterns.
• Translating New Sentences:
When you ask the algorithm to translate something new, it uses the patterns it learned to translate
the new sentences. This is how Google Translate or other translation apps work. They use these
patterns to guess the best translation.

3. MAKING PREDICTIONS OR DECISIONS


3.1 Fraud Detection
Imagine you're the manager of a bank's security system, designed to spot unusual activities (fraudulent
transactions) among regular customer transactions.
3.2 Training with Examples:
First, the security system is given lots of information about normal banking transactions and examples of past
frauds. It learns patterns - like typical withdrawal amounts, usual locations for transactions, and normal times for
activity.
3.3 Detecting Unusual Activities:
As customers use their bank accounts, the security system checks each transaction. It's looking for things that don't
fit the usual patterns it learned, like very large withdrawals, transactions in unusual locations, or several
transactions in a very short time.
3.4 Adjusting to New Tactics:
Sometimes, fraudsters change their methods. The security system might initially miss these new tactics, but it's
designed to learn from these events.
It constantly updates its understanding of what fraud looks like, improving its detection methods.
Thus, machine learning can improve the bank's security system and detect fraud by:
• Training on historical data to recognize normal and fraudulent patterns.
• Using this knowledge to identify possible fraud in ongoing transactions.
• Continuously learning and adapting to new fraudulent behaviors.
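
As a rough sketch of "learning the usual pattern and flagging what does not fit", the snippet below learns the mean and spread of normal transaction amounts and flags amounts that lie far outside them. This is a deliberately simple z-score rule, not how a production fraud system works; the amounts, the threshold of 3 standard deviations, and the function name `is_suspicious` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical amounts of past, normal transactions.
normal_amounts = np.array([20.0, 35.0, 50.0, 42.0, 27.0, 31.0, 45.0, 38.0])

# "Training": learn what a typical amount looks like.
mean = normal_amounts.mean()
std = normal_amounts.std()

def is_suspicious(amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from the mean."""
    return bool(abs(amount - mean) / std > threshold)

# "Detection": check each new transaction against the learned pattern.
new_transactions = [40.0, 33.0, 900.0]
flags = [is_suspicious(a) for a in new_transactions]
print(flags)  # [False, False, True]
```

Re-computing `mean` and `std` as new confirmed-normal transactions arrive would be one simple way to mimic the continuous adaptation described above.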

4. IMPROVING OVER TIME


4.1 Content Moderation on Social Platforms
4.1.1 Initial Scenario:
• Starting Point: Imagine a social media platform, "SocialNet," introduces a machine learning system
for content moderation. Its primary goal is to filter out posts that contain hate speech.
• Early Training: The system is initially trained on a dataset of posts, some of which are labeled as hate
speech by human moderators.
• Early Implementation:
First Attempts: In the beginning, the system is good at catching obvious cases of hate speech, like posts
with clearly offensive language. But it struggles with more subtle cases, like coded language or sarcasm.
• Human Involvement: Human moderators review the system's decisions. When the system makes a
mistake (say, flagging a sarcastic comment as hate speech), the human moderator corrects it.
4.1.2 Learning and Adapting:
• Feedback Loop: Each correction by a human moderator is fed back into the system. It learns from
these corrections, gradually understanding the nuances of language, context, and even cultural
references.
• Evolving Content: As users on SocialNet start using new slang or references, the system initially might
misinterpret some of this content. However, with continuous updates from new examples, it begins to
recognize these evolving trends.
4.1.3 Enhanced Moderation:
• Greater Accuracy: Over time, the system becomes more adept. It starts to correctly identify subtle hate
speech while reducing false positives (like mistakenly flagged sarcastic comments).
• Reduced Human Load: As the system improves, the reliance on human moderators for routine content
checks decreases, allowing them to focus on more complex moderation tasks.
5. GENERALIZATION
5.1 Self Driving Car
• Training Phase:
1. Learning to Drive: Imagine a self-driving car is being trained using machine learning. During its
training, it's exposed to various driving scenarios - city streets, highways, different weather conditions,
daytime and nighttime driving, etc.
2. Data Collection: It collects data from sensors, cameras, and GPS, learning how to recognize road signs,
traffic lights, pedestrians, and other vehicles. It also learns how to respond to these - when to stop, how
to navigate turns, and how to maintain a safe distance from other objects.

3. Generalization:
New City, New Roads: Now, suppose this self-driving car is placed in a completely new city that it has
never 'seen' during its training.
4. Applying Learned Skills: The car's machine learning system uses the general driving rules and patterns
it learned during training to navigate the new environment. It recognizes stop signs (even if they look
slightly different), understands traffic light signals, and knows how to avoid pedestrians and other
vehicles, despite never encountering these specific roads and conditions before.
5. Adaptability: If the car can successfully navigate this new environment, it shows good generalization.
It means the car's machine learning model didn't just memorize specific roads and scenarios but learned
general driving rules applicable to various situations.
CONCLUSION: Machine learning, with its diverse applications, is transforming industries and everyday
life. Its ability to learn from data, recognize complex patterns, and make informed predictions or decisions
is what makes it a revolutionary tool. Whether it's enhancing personal health, breaking language barriers,
securing financial transactions, moderating online content, automating driving, or optimizing
manufacturing processes, ML's impact is profound and growing.
Types of Machine Learning

Machine learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experiences, and make predictions.

Machine learning contains a set of algorithms that work on a huge amount of data.

Data is fed to these algorithms to train them, and on the basis of training, they build the model & perform
a specific task.

These ML algorithms help to solve different business problems like Regression, Classification, Forecasting,
Clustering, and Associations, etc.

Based on the methods and way of learning, machine learning is divided into mainly four types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning

As its name suggests, Supervised machine learning is based on supervision.

It means in the supervised learning technique, we train the machines using the "labelled" dataset, and based
on the training, the machine predicts the output.

Here, labelled data means that the inputs are already mapped to their correct outputs.

More precisely, we first train the machine with inputs and their corresponding outputs, and then
we ask the machine to predict the output for a test dataset.

For example, suppose we have an input dataset of cat and dog images. First, we train the machine
to understand the images, using features such as the shape and size of the tail of a cat and a dog,
the shape of the eyes, colour, height (dogs are taller, cats are smaller), etc.

After completion of training, we input the picture of a cat and ask the machine to identify the object and
predict the output.

Now, the machine is well trained, so it will check all the features of the object, such as height, shape, colour,
eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category.

This is the process of how the machine identifies the objects in Supervised Learning.

The main goal of the supervised learning technique is to map the input variable(x) with the output
variable(y).

Some real-world applications of supervised learning are Risk Assessment, Fraud Detection, Spam
filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems, in which the output variable is
categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. The classification algorithms predict the
categories present in the dataset.

Some real-world examples of classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
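
A classification problem of the kind described above can be sketched with logistic regression, one of the algorithms listed. This is a minimal illustration assuming scikit-learn; the two email features (number of links, number of spam-like words) and all the values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled emails: [number of links, number of spam-like words].
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],    # ordinary emails
              [5, 4], [6, 5], [7, 3], [8, 6]])   # spam emails
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # 0 = not spam, 1 = spam

# Train on the labelled inputs and outputs.
classifier = LogisticRegression()
classifier.fit(X, y)

# Predict the category of two unseen emails.
new_emails = np.array([[0, 1], [7, 5]])
predictions = classifier.predict(new_emails)
print(predictions)  # [0 1]
```

The output is a category (spam or not spam) rather than a number on a continuous scale, which is what distinguishes classification from regression.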

b) Regression

Regression algorithms are used to solve regression problems, in which the output variable is continuous and
depends on the input variables. The relationship may be linear, as in linear regression, or non-linear.

These are used to predict continuous output variables, such as market trends, weather prediction, etc. Some popular
Regression algorithms are given below:

o Simple Linear Regression Algorithm


o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
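
To make two of the listed algorithms concrete, the sketch below fits ordinary linear regression and Lasso regression to the same synthetic data. It assumes scikit-learn and NumPy; the data is generated so that only the first input matters, and the regularization strength `alpha=0.5` is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: the output depends on the first input only (y is about 3 * x1);
# the second input is irrelevant noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

linear = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)   # alpha chosen only for illustration

print("Linear coefficients:", linear.coef_)
print("Lasso coefficients: ", lasso.coef_)
```

On data like this, the linear model should recover a coefficient close to 3 for the first input, while Lasso shrinks both coefficients and tends to push the irrelevant one to zero.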

Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms may not be able to solve very complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation:
Supervised Learning algorithms are used in image segmentation. In this process, image classification is performed
on different image data with pre-defined labels.
o Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. It is done by using medical images
and past data labelled with disease conditions. With such a process, the machine can identify a disease
for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying fraud transactions, fraud
customers, etc. It is done by using historic data to identify the patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These algorithms classify an
email as spam or not spam. The spam emails are sent to the spam folder.
o Speech Recognition - Supervised learning algorithms are also used in speech recognition. The algorithm is trained
with voice data, and various identifications can be done using the same, such as voice-activated passwords, voice
commands, etc.

2. Unsupervised Machine Learning

Unsupervised learning is different from the Supervised learning technique; as its name suggests, there is no need for
supervision.

It means, in unsupervised machine learning, the machine is trained using the unlabeled dataset, and the machine
predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor labelled, and the model
acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according
to the similarities, patterns, and differences.

Machines are instructed to find the hidden patterns from the input dataset.

Let's take an example to understand it more precisely; suppose there is a basket of fruit images, and we input it into
the machine learning model.

The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the
objects.

So, now the machine will discover its patterns and differences, such as colour difference, shape difference, and predict
the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data.

It is a way to group the objects into a cluster such that the objects with the most similarities remain in one group and
have fewer or no similarities with the objects of other groups.

An example of the clustering algorithm is grouping the customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below:

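o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm

Two related unsupervised techniques, which reduce the dimensionality of the data rather than cluster it, are often
listed alongside these:

o Principal Component Analysis
o Independent Component Analysis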
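
The customer-grouping example mentioned above can be sketched with K-Means. This assumes scikit-learn; the customer features (monthly spend, number of visits) and their values are hypothetical, and the number of clusters is fixed at two for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [monthly spend, number of visits]. No labels are given.
customers = np.array([
    [200, 2], [220, 3], [250, 2],        # occasional shoppers
    [900, 12], [950, 15], [1000, 14],    # frequent shoppers
])

# Ask K-Means to find two groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)

print(labels)                   # one cluster id per customer
print(kmeans.cluster_centers_)  # the centre of each discovered group
```

The cluster ids themselves (0 or 1) are arbitrary; what matters is that the first three customers share one id and the last three share the other.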

2) Association
Association rule learning is an unsupervised learning technique, which finds interesting relations among variables
within a large dataset.

The main aim of this learning algorithm is to find the dependency of one data item on another data item and map
those variables accordingly so that it can generate maximum profit.

This algorithm is mainly applied in Market Basket analysis, Web usage mining, continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth algorithm.
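
The dependency between items that association rule learning looks for is usually measured with support and confidence. The sketch below computes both by hand for one rule in a tiny market-basket dataset; it is plain Python rather than a full Apriori implementation, and the transactions and helper names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often `consequent` appears in transactions that contain `antecedent`."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})          # 3 of 5 transactions
rule_confidence = confidence({"bread"}, {"milk"})  # 3 of the 4 bread transactions
print(rule_support, rule_confidence)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds instead of checking a single rule like this.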

Advantages:
o These algorithms can be used for more complicated tasks than supervised ones because they work on
unlabeled datasets.
o Unsupervised algorithms are preferable for various tasks because an unlabeled dataset is easier to obtain than
a labelled one.

Disadvantages:

o The output of an unsupervised algorithm can be less accurate because the dataset is not labelled and the algorithms
are not trained with the exact output in advance.
o Working with unsupervised learning is more difficult because it works with an unlabelled dataset that is not mapped
to an output.

Applications of Unsupervised Learning

o Network Analysis: Unsupervised learning is used in document network analysis of text data, for example to identify
plagiarism and copyright issues in scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for building
recommendation applications for different web applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which can identify
unusual data points within the dataset. It is used to discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract particular information
from the database. For example, extracting information of each user located at a particular location.

3. Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning.

It is the middle ground between supervised learning (with labelled training data) and unsupervised learning
(with no labelled training data), and it uses a combination of labelled and unlabeled data during the
training period.

Although semi-supervised learning operates on data that includes a few labels, the data is mostly unlabeled.

Labels are costly to obtain, so in practice an organization may have only a few of them. Semi-supervised learning
differs from both supervised and unsupervised learning, which are defined by the presence and absence of labels.

To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the concept of Semi-
supervised learning is introduced.

The main aim of semi-supervised learning is to effectively use all the available data, rather than only labelled data
like in supervised learning.

Initially, similar data is clustered using an unsupervised learning algorithm, and the clusters then help to turn the
unlabeled data into labelled data.

This is done because labelled data is comparatively more expensive to acquire than unlabeled data.
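
The cluster-then-label idea described above can be sketched as follows. This is one simple way to do it, assuming scikit-learn's K-Means for the clustering step; the one-dimensional data, the use of -1 to mark unlabeled points, and the majority-vote rule are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Eight points, only two of which carry a label; -1 marks "unlabeled".
X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]])
y = np.array([0, -1, -1, -1, 1, -1, -1, -1])

# Step 1: cluster all the data, ignoring the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: give every point in a cluster the majority label of its labelled members.
y_filled = y.copy()
for c in np.unique(clusters):
    members = clusters == c
    known = y[members & (y != -1)]
    if known.size:  # skip clusters that contain no labelled point
        y_filled[members] = np.bincount(known).argmax()

print(y_filled)  # [0 0 0 0 1 1 1 1]
```

The filled-in labels could then be used to train an ordinary supervised model on all eight points.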

As an example, supervised learning is like a student who is under the supervision of an instructor at home and at college.

If that student analyzes the same concept alone, without any help from the instructor, it comes under
unsupervised learning.

Under semi-supervised learning, the student revises the concept alone after first analyzing it under the guidance
of an instructor at college.

Advantages:
o It is simple and easy to understand the algorithm.
o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.

Disadvantages:

o Iteration results may not be stable.


o We cannot apply these algorithms to network-level data.
o Accuracy is low.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a software component)
automatically explores its surroundings by trial and error: taking actions, learning from experience, and
improving its performance.

The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement
learning agent is to maximize its rewards.

In reinforcement learning, there is no labelled data like supervised learning, and agents learn from their experiences
only.

The reinforcement learning process is similar to a human being; for example, a child learns various things by
experiences in his day-to-day life.

An example of reinforcement learning is to play a game, where the Game is the environment, moves of an agent at
each step define states, and the goal of the agent is to get a high score.

Agent receives feedback in terms of punishment and rewards.

Due to its way of working, reinforcement learning is employed in different fields such as game theory, operations
research, information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using Markov Decision Process(MDP). In MDP, the agent
constantly interacts with the environment and performs actions; at each action, the environment responds and
generates a new state.
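
The agent, environment, state, action, and reward loop of an MDP can be sketched with tabular Q-learning, one common RL algorithm (the text above does not name a specific one). The corridor environment, the reward of 1 at the goal, and the values of `alpha` and `gamma` are all hypothetical choices for illustration.

```python
import numpy as np

# A tiny corridor of 5 states; the agent starts at state 0 and the goal is state 4.
n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
goal = n_states - 1
alpha, gamma = 0.5, 0.9             # learning rate and discount factor (assumed values)
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions)) # estimated value of each action in each state

def step(state, action):
    """Environment response: the new state and the reward for reaching it."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, goal)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != goal:
        action = int(rng.integers(n_actions))        # explore by trial and error
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted future value.
        target = reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

policy = Q.argmax(axis=1)  # best known action in each state
print(policy[:goal])       # [1 1 1 1], i.e. always move right toward the goal
```

Here the agent acts randomly while learning; Q-learning can still recover the reward-maximizing policy because its update always looks at the best next action.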

Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning specifies increasing the tendency that the
required behaviour would occur again by adding something. It enhances the strength of the behaviour of the agent
and positively impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly opposite to the positive RL.
It increases the tendency that the specific behaviour would occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning



o Video Games:
RL algorithms are very popular in gaming applications, where they are used to reach super-human performance.
Well-known game-playing systems built with RL are AlphaGo and AlphaGo Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how to use RL to automatically
learn to allocate and schedule computer resources for waiting jobs in order to minimize average job slowdown.
o Robotics:
RL is widely used in robotics applications. Robots are used in the industrial and manufacturing area, and
these robots are made more powerful with reinforcement learning. Many industries have a
vision of building intelligent robots using AI and machine learning technology.
o Text Mining:
Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement
learning by the company Salesforce.

Advantages

o It helps in solving complex real-world problems which are difficult to be solved by general techniques.
o The learning model of RL is similar to the learning of human beings, so it can produce very accurate results on suitable problems.
o Helps in achieving long term results.

Disadvantages

o RL algorithms are not preferred for simple problems.


o RL algorithms require huge data and computations.
o Too much reinforcement learning can lead to an overload of states which can weaken the results.

Main Challenges of Machine Learning


7 Major Challenges Faced By Machine Learning Professionals
There are a lot of challenges that machine learning professionals face to inculcate ML skills and create an
application from scratch. What are these challenges?
1. Poor Quality of Data
Data plays a significant role in the machine learning process.
One of the significant issues that machine learning professionals face is the absence of good quality data.
Unclean and noisy data can make the whole process extremely exhausting.
Therefore, we need to ensure that the process of data preprocessing which includes removing outliers, filtering
missing values, and removing unwanted features, is done with the utmost level of perfection.
2. Underfitting of Training Data
Underfitting occurs when the model is unable to establish an accurate relationship between input and output variables.
It signifies that the model is too simple to capture the relationship in the data. To overcome this issue:
• Increase the training time of the model
• Enhance the complexity of the model
• Add more features to the data
• Reduce the regularization parameters
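
A small experiment can illustrate underfitting and the "enhance the complexity of the model" remedy. The sketch assumes scikit-learn; the quadratic data is synthetic, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a curved (quadratic) relationship: y = x^2.
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2

# A straight line is too simple for this data, so it underfits.
simple = LinearRegression().fit(x, y)

# Adding a squared feature makes the model complex enough.
richer = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

simple_r2 = simple.score(x, y)   # close to 0: the line explains almost nothing
richer_r2 = richer.score(x, y)   # close to 1: the curve is captured
print(simple_r2, richer_r2)
```

The score used here is R², where 1 means a perfect fit; the underfitting model scores poorly even on the data it was trained on.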

3. Overfitting of Training Data

Overfitting occurs when a machine learning model fits its training data too closely, including its noise and
biases, so that it performs poorly on new data.
Unfortunately, this is one of the significant issues faced by machine learning professionals.
It commonly arises when the algorithm is trained with noisy or biased data, which affects its overall performance.
Consider a model trained to differentiate between a cat, a rabbit, a dog, and a tiger.
The training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits.
Then there is a considerable probability that it will identify the cat as a rabbit.
In this example, we had a vast amount of data, but it was biased toward rabbits; hence the prediction was negatively affected.
We can tackle this issue by:
• Analyzing the data carefully before training
• Using data augmentation techniques
• Removing outliers from the training set
• Selecting a model with fewer features
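
The gap between training and testing performance that signals overfitting can be observed with a decision tree. This sketch assumes scikit-learn; the dataset is synthetic with deliberately noisy labels (`flip_y=0.2`), and the depth limit of 3 is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data: about 20% of labels are assigned at random.
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# An unrestricted tree can memorize the training set, noise included.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A depth-limited tree is a simpler model that cannot memorize as much.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train, deep_test = deep.score(X_train, y_train), deep.score(X_test, y_test)
shallow_train, shallow_test = shallow.score(X_train, y_train), shallow.score(X_test, y_test)

print("deep tree:    train", deep_train, "test", deep_test)
print("shallow tree: train", shallow_train, "test", shallow_test)
```

The unrestricted tree reaches perfect training accuracy but does noticeably worse on the test set; the simpler tree usually shows a smaller gap between the two, although the exact numbers depend on the data.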
4. Machine Learning is a Complex Process

The machine learning industry is young and is continuously changing.


Rapid trial-and-error experiments are being carried out.
The process is transforming, and hence there are high chances of error which makes the learning complex.
It includes analyzing the data, removing data bias, training data, applying complex mathematical calculations, and
a lot more.
Hence it is a really complicated process which is another big challenge for Machine learning professionals.

5. Lack of Training Data

The most important task you need to do in the machine learning process is to train the model on data to achieve an
accurate output.
Too little training data will produce inaccurate or overly biased predictions.
Consider a machine learning algorithm similar to training a child.
One day you decided to explain to a child how to distinguish between an apple and a watermelon. You will take an
apple and a watermelon and show him the difference between both based on their color, shape, and taste.
In this way, soon, he will attain perfection in differentiating between the two.
But on the other hand, a machine-learning algorithm needs a lot of data to distinguish.
For complex problems, it may even require millions of training examples.
Therefore we need to ensure that Machine learning algorithms are trained with sufficient amounts of data.
6. Slow Implementation
This is one of the common issues faced by machine learning professionals.
The machine learning models are highly efficient in providing accurate results, but it takes a tremendous amount
of time.
Slow programs, data overload, and excessive requirements usually take a lot of time to provide accurate results.
Further, it requires constant monitoring and maintenance to deliver the best output.

7. Imperfections in the Algorithm When Data Grows

So you have found quality data, trained a model on it well, and the predictions are precise and accurate.
Yay, you have learned how to create a machine learning algorithm!! But wait, there is a twist; the model may
become useless in the future as data grows.
The best model of the present may become inaccurate in the coming Future and require further rearrangement.
So you need regular monitoring and maintenance to keep the algorithm working.
This is one of the most exhausting issues faced by machine learning professionals.
Conclusion: Machine learning is all set to bring a big bang transformation in technology.
It is one of the most rapidly growing technologies used in medical diagnosis, speech recognition, robotic training,
product recommendations, video surveillance, and this list goes on.
This continuously evolving domain offers immense job satisfaction, excellent opportunities, global exposure, and
exorbitant salary.
It is a high risk and a high return technology.
Before starting your machine learning journey, ensure that you carefully examine the challenges mentioned above.
To learn this fantastic technology, you need to plan carefully, stay patient, and maximize your efforts.
Once you win this battle, you can conquer the Future of work and land your dream job!

Training vs Testing vs Validation Sets


This section looks at the training, testing, and validation sets.
The fundamental purpose of splitting the dataset is to assess how effectively the trained model will generalize
to new data. This split can be achieved by using the train_test_split function of scikit-learn.
Training Set
This is the actual dataset from which a model trains, i.e. the model sees and learns from this data to predict the outcome
or to make the right decisions. Most of the training data is collected from several resources and then preprocessed and
organized so that the model performs properly. The type of training data largely determines the ability of the
model to generalize, i.e. the better the quality and diversity of the training data, the better the performance of the
model. This data is more than 60% of the total data available for the project.
Example:
Python3

# Import NumPy and scikit-learn's splitting helper
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy input x: the integers 0-15 reshaped into a matrix of shape 8x2
x = np.arange(16).reshape((8, 2))

# Dummy target variable y: the integers 0-7
y = range(8)

# Split the dataset in 80-20 fashion, i.e.
# the training set is 80% of the total data
# and the testing set is 20% of the total data
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    train_size=0.8,
                                                    random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)

Output:
Training set x: [[ 0 1]
[14 15]
[ 4 5]
[ 8 9]
[ 6 7]
[12 13]]
Training set y: [0, 7, 2, 4, 3, 6]
Explanation:
• First, a dummy matrix of shape 8×2 is created with the NumPy library to represent the input x, along with a list of
the integers 0 to 7 representing the target variable y.
• To split the dataset into training and testing data, the train_test_split function of the sklearn library is used.
• The input data x and the target variable y are passed as parameters to the function, which divides the dataset into 2
parts according to the size given in train_size, i.e. with train_size=0.8 the training set will be 80% of the given
dataset and the testing set will be 20%.
• By default, train_test_split shuffles the data randomly before splitting. Setting random_state to a fixed integer makes
that shuffle reproducible, so the same split is produced on every run.
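The role of random_state can be checked directly. The following is a minimal sketch on the same dummy data; the variable names (same_split, y_test_ordered) are chosen here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(16).reshape((8, 2))
y = list(range(8))

# Two calls with the same random_state produce the same shuffled split.
_, _, _, y_test_a = train_test_split(x, y, train_size=0.8, random_state=42)
_, _, _, y_test_b = train_test_split(x, y, train_size=0.8, random_state=42)
same_split = y_test_a == y_test_b

# shuffle=False disables shuffling: the last 20% of the rows become the test set.
_, _, _, y_test_ordered = train_test_split(x, y, train_size=0.8, shuffle=False)

print("Same split with the same seed:", same_split)
print("Test targets without shuffling:", y_test_ordered)
```

Fixing the seed is mainly useful for making experiments repeatable; it does not make the split any less random.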
Testing Set
This dataset is independent of the training set but has a similar probability distribution of classes. It is used as a
benchmark to evaluate the model, and only after the training of the model is complete. The testing set is usually a
well-organized dataset covering the kinds of scenarios that the model is likely to face when used in the real world.
The validation and testing sets are often combined and used as a single testing set, which is not considered good
practice. If the accuracy of the model on the training data is substantially greater than on the testing data, the
model is said to be overfitting. The testing set is approximately 20-25% of the total data available for the project.
Example:
Python3

# Import NumPy and scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: x holds the integers 0-15 reshaped into an 8x2 matrix
x = np.arange(16).reshape((8, 2))

# y is a list of the integers 0-7, representing the target variable
y = list(range(8))

# Split 80-20: training set is 80% of the data, testing set is 20%
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.2,
                                                    random_state=42)

# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)


Output:
Testing set x: [[ 2 3] [10 11]]
Testing set y: [1, 5]
Explanation:
• To show how the train_test_split function works, a dummy matrix of shape 8×2 is first created with the NumPy library
to represent the input x, along with a list of the integers 0 to 7 representing the target variable y.
• To split the dataset into training and testing data, the input data x and the target variable y are passed as
parameters to the function, which divides the dataset into 2 parts according to the size given in test_size, i.e. with
test_size=0.2 the testing set will be 20% of the given dataset and the training set will be 80%.
• By default, train_test_split shuffles the data randomly before splitting. Setting random_state to a fixed integer makes
that shuffle reproducible, so the same split is produced on every run.
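The overfitting check described for the testing set (training accuracy well above testing accuracy) can be sketched as follows. The data is synthetic and the choice of an unconstrained decision tree is an assumption made here only because it overfits easily; it is not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# An unconstrained decision tree can memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Training accuracy: {train_acc:.2f}")
print(f"Testing accuracy:  {test_acc:.2f}")
print(f"Gap:               {train_acc - test_acc:.2f}")
```

A large gap between the two scores is the signal to simplify the model, regularize it, or collect more training data.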
Validation Set
The validation set is used to fine-tune the hyperparameters of the model and is considered a part of the training of the
model. The model only sees this data for evaluation and does not learn from it, which provides an objective, unbiased
evaluation of the model while it is being tuned. The validation set can also be used for early stopping, i.e.
interrupting the training of the model when the loss on the validation set becomes greater than the loss on the
training set, which helps to control overfitting. The validation set is approximately 10-15% of the total data
available for the project, but this can change depending on the number of hyperparameters, i.e. if the model has many
hyperparameters, a larger validation set will give better results. When the accuracy of the model on the validation
data is close to (or greater than) its accuracy on the training data, the model is said to have generalized well.
Example:
Python3

# Import NumPy and scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: x holds the integers 0-23 reshaped into an 8x3 matrix
x = np.arange(24).reshape((8, 3))

# y is a list of the integers 0-7, representing the target variable
y = list(range(8))

# Split 80-20: training set is 80% of the data;
# the combined testing and validation set is the remaining 20%
x_train, x_Combine, y_train, y_Combine = train_test_split(x, y,
                                                          train_size=0.8,
                                                          random_state=42)

# Split the combined set 50-50:
# testing set is 50% and validation set is 50% of the combined set
x_val, x_test, y_val, y_test = train_test_split(x_Combine,
                                                y_Combine,
                                                test_size=0.5,
                                                random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)
print(" ")

# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)
print(" ")

# Validation set
print("Validation set x: ", x_val)
print("Validation set y: ", y_val)


Output:
Training set x: [[ 0 1 2]
[21 22 23]
[ 6 7 8]
[12 13 14]
[ 9 10 11]
[18 19 20]]
Training set y: [0, 7, 2, 4, 3, 6]

Testing set x: [[15 16 17]]
Testing set y: [5]

Validation set x: [[3 4 5]]
Validation set y: [1]

Explanation:
To obtain the validation set, a dummy matrix of shape 8×3 is created with the NumPy library to represent the input x,
along with a list of the integers 0 to 7 representing the target variable y.
Dividing the dataset into 3 parts takes two steps. First, the input data x and the target variable y are passed to
train_test_split, which divides the dataset into 2 parts according to the size given in train_size (this gives the
training set), i.e. with train_size=0.8 the training set will be 80% of the given dataset and the other set will be 20%.
This leaves a combined validation and testing set holding 20% of the original dataset. The combined set is then passed
to train_test_split again, which divides it into 2 parts according to the size given in test_size, i.e. with
test_size=0.5 the testing set and the validation set will each be 50% of the combined dataset (10% of the original
dataset each).
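The two-step procedure can be wrapped in a small helper. This is a sketch only: train_val_test_split is a hypothetical function name (scikit-learn does not provide it), and the 60/20/20 proportions are an assumed default.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def train_val_test_split(x, y, val_size=0.2, test_size=0.2, random_state=42):
    """Split x, y into training, validation, and testing sets (hypothetical helper)."""
    # Step 1: split off the held-out part (validation + testing).
    holdout = val_size + test_size
    x_train, x_hold, y_train, y_hold = train_test_split(
        x, y, test_size=holdout, random_state=random_state)
    # Step 2: divide the held-out part between validation and testing.
    x_val, x_test, y_val, y_test = train_test_split(
        x_hold, y_hold, test_size=test_size / holdout, random_state=random_state)
    return x_train, x_val, x_test, y_train, y_val, y_test

x = np.arange(200).reshape((100, 2))
y = np.arange(100)
x_train, x_val, x_test, y_train, y_val, y_test = train_val_test_split(x, y)
print(len(x_train), len(x_val), len(x_test))
```

Each sample ends up in exactly one of the three sets, which is what keeps the validation and testing scores independent of the training data.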

End to End Machine Learning Project Pipeline

Introduction
Pre-Requisites

Basic understanding of the Linear Regression algorithm. If you are not familiar with the algorithm,
review it before going on to the later part of this section, so that you have a basic understanding
of all the concepts which we will cover.

Step 1: Import Necessary Dependencies


In this step, we import the necessary libraries:
For Linear Algebra: NumPy
For Data Preprocessing and CSV File I/O: Pandas
For Model Building and Evaluation: Scikit-Learn
For Data Visualization: Matplotlib and Seaborn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic; omit this line in a plain Python script
%matplotlib inline
Step 2: Study the Data

Here we work on the E-commerce Customers dataset (CSV file). It contains customer information
such as Email, Address, and Avatar color. It also has the following numerical columns:
Avg. Session Length: Average length of in-store style advice sessions.
Time on App: Average time spent by the customer on the App, in minutes.
Time on Website: Average time spent by the customer on the Website, in minutes.
Length of Membership: Number of years the customer has been a member.

Step 3: Read and Load the Dataset

In this step, we read and load the dataset using some basic pandas functions:

To load the CSV file: pd.read_csv()
To print some initial rows of the dataset: df.head()
Statistical details for numerical columns: df.describe()
Basic information about the dataset: df.info()

Load the Dataset

df = pd.read_csv('Ecommerce Customers.csv')

Print Some Initial Rows of the Dataset

df.head()

Output:

Statistical Details for Numerical Columns

df.describe()

Output:

Basic Information About the Dataset

df.info()

Output:


Step 4: Exploratory Data Analysis(EDA)


In this step, we explore the data and look for insights by visualizing it with the following Seaborn library
functions:

Joint plot:
Time on Website vs Yearly Amount Spent
Time on App vs Yearly Amount Spent
Time on App vs Length of Membership

Pair plot: for the complete dataset

lmplot: Length of Membership vs Yearly Amount Spent

Use seaborn to create a joint plot to compare the Time on Website and Yearly Amount Spent columns.

sns.jointplot(x='Time on Website',y='Yearly Amount Spent',data=df)
Output:

Do the same but with the Time on App column instead

sns.jointplot(x='Time on App',y='Yearly Amount Spent',data=df)

Output:

Use a joint plot to create a 2D hex bin plot comparing Time on App and Length of Membership

sns.jointplot(x='Time on App',y='Length of Membership',kind="hex",data=df)
Output:

Let’s explore these types of relationships across the entire data set. Use a pair plot:

sns.pairplot(df)

Based on this plot what looks to be the most correlated feature with the Yearly Amount Spent?

Length of Membership
Create a linear model plot (using seaborn’s lmplot) of Yearly Amount Spent vs. Length of Membership

sns.lmplot(x='Length of Membership',y='Yearly Amount Spent',data=df)

Step 5: Splitting of Data into Training and Testing Data

Now that we have explored the data a bit, it is time to split our initial data into training
and testing subsets. Here we set a variable X, i.e. the independent columns, to the numerical features of
the customers, and a variable y, i.e. the dependent column, to the “Yearly Amount Spent” column.
Separate Dependent and Independent Variables

X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']

Use model_selection.train_test_split from sklearn to split the data into training and testing sets.
Set test_size=0.20 and random_state=105

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=105)

Step 6: Training the Model using Linear Regression

Now we can train our model on the training data using Linear Regression.
Import LinearRegression from sklearn.linear_model

from sklearn.linear_model import LinearRegression

Create an instance of a LinearRegression() model named lr_model.

lr_model = LinearRegression()

Train/fit lr_model on the training data

lr_model.fit(X_train,y_train)
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Print out the coefficients of the model

lr_model.coef_
Output:
array([25.98154972, 38.59015875, 0.19040528, 61.27909654])
Step 7: Predictions on Test Data

Now that we have trained our model, let’s evaluate its performance by making predictions on the
unseen data.
Use lr_model.predict() to predict on the X_test set of the data

predictions = lr_model.predict(X_test)
Create a scatterplot of the real test values versus the predicted values

plt.scatter(y_test,predictions)

plt.xlabel('Y Test')

plt.ylabel('Predicted Y')

Output:

Step 8: Evaluating the Model

To evaluate our model's performance, we calculate error metrics based on the residuals (the differences between
the true and the predicted values); the explained variance score (R2) is another common measure.
Determine the metrics such as Mean Absolute Error, Mean Squared Error, and the Root Mean Squared Error.

from sklearn import metrics

print('MAE :', metrics.mean_absolute_error(y_test, predictions))
print('MSE :', metrics.mean_squared_error(y_test, predictions))
print('RMSE :', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
Output:

MAE : 7.2281486534308295
MSE : 79.8130516509743
RMSE : 8.933815066978626
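To make the metrics concrete, they can be computed by hand with NumPy. The sketch below uses four made-up values (not taken from the e-commerce dataset) and also computes R2, assuming its usual definition as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

# Small made-up example values (not from the e-commerce dataset).
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred           # residuals: [1, 0, -2, -1]
mae = np.mean(np.abs(errors))      # (1 + 0 + 2 + 1) / 4
mse = np.mean(errors ** 2)         # (1 + 0 + 4 + 1) / 4
rmse = np.sqrt(mse)                # square root of the MSE

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae}, MSE={mse}, RMSE={rmse:.4f}, R2={r2:.4f}")
```

Because the errors are squared before averaging, MSE and RMSE penalize the single error of 2 more heavily than MAE does.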


Step 9: Explore the Residuals
Judging by the metrics calculated in the above steps, we should have a good model with a
good fit. Now, let’s quickly explore the residuals to make sure that everything was okay with our
dataset and finalize our model.
To check this, plot a histogram of the residuals and make sure it looks normally distributed.

Use seaborn histplot (distplot in older seaborn versions, where it is now deprecated), or just plt.hist()

sns.histplot(y_test - predictions, bins=50, kde=True)

Output:

Step 10: Model Evaluation


Now it is time to wrap up our model, i.e. to interpret the coefficients of the
model to get a better idea of what it has learned.
Recreate the dataframe below

coefficients = pd.DataFrame(lr_model.coef_, X.columns)
coefficients.columns = ['Coefficient']
coefficients

Output:

How can you interpret these coefficients?

Keeping all other features constant, a one-unit increase in Avg. Session Length is associated with an increase
of 25.98 total dollars spent.
Keeping all other features constant, a one-unit increase in Time on App is associated with an increase
of 38.59 total dollars spent.
Keeping all other features constant, a one-unit increase in Time on Website is associated with an increase
of 0.19 total dollars spent.
Keeping all other features constant, a one-unit increase in Length of Membership is associated with an
increase of 61.28 total dollars spent.
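The "one-unit increase" reading of a coefficient can be verified on a fitted model. The following is a sketch on synthetic data with a known linear relationship; the data and the variable names (base, bumped, change) are assumptions for illustration and are unrelated to the e-commerce dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship: y = 3*x1 + 0.5*x2 + 10
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 10.0

model = LinearRegression().fit(X, y)

# Raise the first feature by one unit while holding the second constant.
base = np.array([[4.0, 2.0]])
bumped = np.array([[5.0, 2.0]])
change = model.predict(bumped)[0] - model.predict(base)[0]

print("Coefficients:", model.coef_)
print("Change in prediction:", change)
```

The change in the prediction equals the coefficient of the feature that was increased, which is exactly what the interpretation statements express.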
Conclusion
Mastering the end-to-end machine learning project pipeline in data science is a transformative skill. It
empowers you to navigate the intricate journey from data preprocessing to model deployment confidently
and clearly. As you’ve delved into the details of this article, you’ve taken a significant step toward
becoming a proficient data scientist.

Steps in an ML Project

1 Look at the big picture

2 Get the data

3 Explore and visualize the data to gain insights

4 Prepare the data for machine learning algorithms

5 Select a model and train it

6 Fine-tune your model

7 Present your solution

8 Launch, monitor, and maintain your system

Getting Real Data


1. Look At The Big Picture


Frame the Problem
The goal is to predict the median housing price from the other metrics in the data, such as number of
bedrooms, location, and income in the area.

• The prediction will be used to make investment decisions.


• See the data pipeline below

System Design
• Supervised learning: the data is labeled
• Regression: the model will predict a value
• Batch learning: no additional data will be added later
Types of Regression
• Multiple regression: uses multiple features to predict a value
• Univariate regression: predicts a single value
• Multivariate regression: predicts multiple values
Select a Performance Measure
Root Mean Square Error (RMSE)
• Square root of the average squared prediction error over all items of data
• The most commonly used measure for regression tasks
• Corresponds to the Euclidean norm, or l2
Mean Absolute Error (MAE)
• Average of the absolute prediction errors
• Preferred if data has many outliers
• Corresponds to the Manhattan norm, or l1
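For reference, the two measures can be written out as formulas. The notation is an assumption introduced here: m is the number of items of data, x^(i) and y^(i) are the features and label of item i, and h is the model's prediction function.

```latex
\mathrm{RMSE}(\mathbf{X}, h) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(\mathbf{x}^{(i)}) - y^{(i)}\right)^{2}}
\qquad
\mathrm{MAE}(\mathbf{X}, h) = \frac{1}{m}\sum_{i=1}^{m}\left|h(\mathbf{x}^{(i)}) - y^{(i)}\right|
```

Squaring gives large errors more weight in the RMSE, which is why the MAE is the more robust choice when the data has many outliers.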

Check the Assumptions
We're assuming the price will be used as a numerical value
If the next stage just uses categories, like "cheap", "medium", or "expensive" we should be using
classification instead of regression.
2 Get The Data

Load Data from GitHub

head() Shows First Five Rows


value_counts()

• ocean_proximity is not numeric

describe() Shows Statistics
Histograms

• Show distribution of numerical attributes
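The inspection calls named above (head(), value_counts(), describe()) can be tried on a small stand-in table. The DataFrame below is made up for illustration; only the column names follow the housing data discussed in this section.

```python
import pandas as pd

# Tiny made-up stand-in for the housing data (values are illustrative only).
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 2.1],
    "median_house_value": [452600, 358500, 341300, 342200, 269700, 120000],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "INLAND", "NEAR OCEAN"],
})

first_rows = housing.head()                                  # first five rows
category_counts = housing["ocean_proximity"].value_counts()  # rows per category
stats = housing.describe()                                   # numerical columns only

print(first_rows)
print(category_counts)
print(stats)
```

Note that describe() skips ocean_proximity because it is not numeric, which is why value_counts() is used for that column.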

Median Income
It's not in dollars. It has been scaled and capped at 15 max and 0.5 min. Numbers represent roughly tens of thousands
of dollars. Preprocessed attributes are common in ML, so this should be OK.

Other Capped Values

• Housing median age and median house value were capped

• Median house value is our target, which we want to predict

• It being capped limits the value of our model

• If we want to predict beyond $500,000, there are two options:

• Collect proper labels for the capped districts

• Remove those districts from the training and test sets


Scale and Skewing
• These attributes have very different scales
• We'll fix them with feature scaling.
• Many histograms are skewed right.
• They extend more to the right than the left
• We'll transform them to fix that
Test Sets
Take 20% of the data and set it aside.
There are two ways to choose the test set:
• Randomly
• Stratified sampling
Random Sampling
• Fine for large data sets
• But may introduce sampling bias on small data sets
• Consider a sample from a population that is 51% female
• A purely random sample might contain only 48% females, or as many as 54%

Stratified Sampling
• Take the important feature and gather it into categories (strata)
• Then sample the correct number from each category
• Training and test sets now match the overall proportions
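Stratified sampling is available through the stratify parameter of train_test_split. The sketch below uses a made-up population with the 51% share mentioned above; the array names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up population: 51% labelled "F" and 49% labelled "M".
gender = np.array(["F"] * 510 + ["M"] * 490)
X = np.arange(1000).reshape(-1, 1)

# stratify=gender makes both subsets keep the 51/49 proportions.
X_train, X_test, g_train, g_test = train_test_split(
    X, gender, test_size=0.2, stratify=gender, random_state=42)

test_share = np.mean(g_test == "F")
train_share = np.mean(g_train == "F")
print("Female share in test set: ", test_share)
print("Female share in train set:", train_share)
```

Without stratify, the same call would still give a valid split, but the shares in the two subsets could drift away from 51%.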

3 Explore And Visualize The Data To Gain Insights


Visualizing Geographical Data
Scatterplot misses detail as dots cover other dots.

Transparency
• Alpha = 0.2 shows more detail in high-density areas


Add Price with Color
• Areas near the ocean and with higher population density have higher prices

Correlations
Strongest correlations with median_house_value: median_income, total_rooms,
housing_median_age, latitude

ScatterMatrix: Strongest relationship is median_income

median_income
• Correlation is strong
• Clusters of points at $500,000, $450,000, and $350,000

Correlation Assumes a Line

Experiment with Attribute Combinations: bedrooms_ratio has a high correlation
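Combining attributes and checking the correlation with the target can be sketched as follows. The data is synthetic, and the definition bedrooms_ratio = total_bedrooms / total_rooms is an assumption made here, since the notes do not spell it out. In this toy data the ratio is negatively correlated with the price, so "high" should be read in terms of absolute value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in data (not the real housing dataset).
total_rooms = rng.integers(500, 5000, size=n).astype(float)
ratio = rng.uniform(0.1, 0.4, size=n)
housing = pd.DataFrame({
    "total_rooms": total_rooms,
    "total_bedrooms": total_rooms * ratio,
})
# In this toy data the price depends on the ratio, not on the raw counts.
housing["median_house_value"] = (
    400_000 - 800_000 * ratio + rng.normal(0, 10_000, n))

# Combine two raw attributes into one (assumed definition of bedrooms_ratio).
housing["bedrooms_ratio"] = housing["total_bedrooms"] / housing["total_rooms"]

corr = housing.corr()["median_house_value"].sort_values(ascending=False)
print(corr)
```

The combined attribute correlates far more strongly with the target than either raw count does, which is the motivation for experimenting with such combinations.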


4 Prepare The Data For Machine Learning Algorithms
Clean the Data
• Some data is missing the total_bedrooms value.
• Three ways to fix this:
• Get rid of the corresponding districts.
• Get rid of the whole attribute.
• Set the missing values to some value (zero, the mean, the median, etc.). This is called imputation.
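The three options can be sketched with pandas and scikit-learn. The four-row table is made up for illustration, and the median strategy is just one of the imputation values listed above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],   # one missing value
})

# Option 1: get rid of the districts (rows) with a missing value.
dropped_rows = housing.dropna(subset=["total_bedrooms"])
# Option 2: get rid of the whole attribute (column).
dropped_column = housing.drop(columns=["total_bedrooms"])
# Option 3: imputation, here with the median of the column.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(housing), columns=housing.columns)

print(imputed)
```

Imputation keeps every district and every attribute, at the cost of introducing an estimated value where the true one is unknown.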
Handling Text and Categorical Attributes
• ocean_proximity has only a few values
• Replacing them with numbers will make it easier for ML to handle the data
• But doing so falsely implies that some values are closer to others

One-Hot Vectors
• A better way to represent such data: one binary attribute per category, with exactly one attribute set to 1 for each row
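A minimal sketch of one-hot encoding with scikit-learn's OneHotEncoder follows. The four category values are made up for illustration; the encoder returns a sparse matrix by default, which is converted to a dense array here for printing.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

housing_cat = pd.DataFrame(
    {"ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"]})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() makes it dense.
one_hot = encoder.fit_transform(housing_cat).toarray()

print(encoder.categories_[0])   # one column per category
print(one_hot)                  # exactly one 1 in every row
```

Because every category gets its own column, no category is numerically "closer" to another, which avoids the false ordering introduced by plain integer codes.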
Feature Scaling and Transformation
• Number of rooms ranges from 6 to 39,320
• Median incomes range from 0 to 15
• Without scaling, models will weight number of rooms far more highly than income
• To prevent this, scale data in one of two ways:
Min-max scaling
• Every value ranges from 0 to 1
• Or -1 to 1 for neural nets
Standardization
• Subtract the mean, then divide by the standard deviation
• Does not limit the range strictly
• Less affected by outliers
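Both scaling methods are available in scikit-learn. The sketch below applies them to four made-up room counts spanning the range quoted above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up room counts with one very large value.
rooms = np.array([[6.0], [1200.0], [2500.0], [39320.0]])

# Min-max scaling: (value - min) / (max - min), so values fall in [0, 1].
min_max = MinMaxScaler(feature_range=(0, 1)).fit_transform(rooms)
# Standardization: (value - mean) / standard deviation.
standardized = StandardScaler().fit_transform(rooms)

print("Min-max scaled:", min_max.ravel())
print("Standardized:  ", standardized.ravel())
```

With min-max scaling the single large value squeezes the other three close to 0, which illustrates why standardization is described as less affected by outliers.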
Heavy Tails
• Values far from the mean are not exponentially rare
• Take the square root or log to get closer to a Gaussian
• Do this before scaling
• Another solution is bucketizing: grouping values into ranges
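The two remedies can be sketched on a synthetic heavy-tailed feature. The log-normal sample and the choice of four quartile buckets are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic heavy-tailed feature (log-normal), e.g. standing in for population.
population = pd.Series(rng.lognormal(mean=7.0, sigma=1.0, size=1000))

# Log transform: pulls in the long right tail.
log_population = np.log(population)

# Bucketizing: replace each value with the index of its range (here, quartiles).
buckets = pd.qcut(population, q=4, labels=False)

print("Skewness before:", population.skew())
print("Skewness after: ", log_population.skew())
print(buckets.value_counts().sort_index())
```

After the log transform the skewness drops close to zero, so a subsequent scaling step no longer crushes most values into a narrow range.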

5 Select A Model And Train It
LinearRegression
• The first prediction is off by more than $200,000!
• The root mean squared error is over $68,000
• The median_housing_values range from $120,000 to $265,000
• Pretty bad predictions.
DecisionTreeRegressor
• A more powerful model capable of finding complex nonlinear relationships
• A training error far below the error on new data suggests overfitting
Better Evaluation Using Cross-Validation
• Splits the training set into 10 subsets called folds
• Trains the model 10 times, each time on 9 folds
• Evaluates each trained model on the remaining fold
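10-fold cross-validation can be run with scikit-learn's cross_val_score. The sketch uses synthetic regression data in place of the prepared housing data, which is an assumption made so that the example is self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the prepared housing data.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

model = DecisionTreeRegressor(random_state=0)
# scikit-learn scorers are "higher is better", so the RMSE is returned negated.
scores = cross_val_score(model, X, y,
                         scoring="neg_root_mean_squared_error", cv=10)
rmse_per_fold = -scores

print("RMSE per fold:", rmse_per_fold.round(1))
print("Mean RMSE:", rmse_per_fold.mean().round(1))
print("Std of RMSE:", rmse_per_fold.std().round(1))
```

Besides the mean error, the spread across the 10 folds gives an idea of how precise the estimate is, which a single validation split cannot provide.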
RandomForestRegressor
• Results are somewhat better: the error is $47,000
• But on the training set, the error is $17,000
• Still a lot of overfitting.
6 Fine-Tune Your Model
Grid Search
• Scikit-Learn's GridSearchCV class
• Tell it which hyperparameters you want to try, and what values to try.
• It will use cross-validation to evaluate them.
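A minimal grid search sketch follows. The data is synthetic, and the hyperparameter names and values in param_grid are illustrative choices for a random forest, not values recommended by the notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hyperparameter values to try (illustrative choices, 2 x 2 = 4 combinations).
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}

grid_search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                           cv=3, scoring="neg_root_mean_squared_error")
grid_search.fit(X, y)

print("Best hyperparameters:", grid_search.best_params_)
print("Best cross-validated RMSE:", -grid_search.best_score_)
```

Every combination is trained once per fold, so the cost grows quickly with the number of values listed for each hyperparameter.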
Randomized Search
• Evaluates a fixed number of random hyperparameter values.
• Useful when the hyperparameter search space is large
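Randomized search is provided by scikit-learn's RandomizedSearchCV class. In this sketch the data is synthetic and the sampling ranges are illustrative assumptions; n_iter fixes how many random combinations are evaluated.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Distributions to sample from instead of fixed lists of values.
param_distributions = {
    "n_estimators": randint(5, 50),   # integers from 5 to 49
    "max_features": randint(1, 6),    # integers from 1 to 5
}

random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0), param_distributions,
    n_iter=5, cv=3, scoring="neg_root_mean_squared_error", random_state=0)
random_search.fit(X, y)

print("Best hyperparameters:", random_search.best_params_)
```

The cost is controlled by n_iter rather than by the size of the search space, which is what makes this approach practical for large spaces.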
Ensemble Methods
• Combines several models together.

8 Launch, Monitor, And Maintain Your System
• Deploy your trained model as needed
• Perhaps as a Web app

Performance Monitoring
• A component may break, causing performance to drop. Or it may drop gradually, due to model rot.
• The parameters go out of date.
• One measure of performance is downstream metrics.
• Number of recommended products sold per day.
• Or send human raters sample pictures of products the model classified, to verify them
It can be a lot of work to set up good performance monitoring.
Automatic Updating and Retraining
• Collect fresh data and label it
• Write a script to train the model and fine-tune the hyperparameters periodically
• Write another script to evaluate both the new model and the previous model on the updated test set.
• Evaluate input data quality
• Keep backups of every model
• Be ready to roll back.
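The evaluation script that compares the new model with the previous one could look like the sketch below. Everything here is an assumption for illustration: choose_model is a hypothetical helper, the data is synthetic, and two linear models trained on different amounts of data stand in for the previous and the retrained model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def choose_model(previous_model, new_model, X_test, y_test):
    """Return the model with the lower RMSE on the updated test set (hypothetical helper)."""
    rmse_prev = mean_squared_error(y_test, previous_model.predict(X_test)) ** 0.5
    rmse_new = mean_squared_error(y_test, new_model.predict(X_test)) ** 0.5
    # Keep the previous model (roll back) unless the new one is at least as good.
    if rmse_new <= rmse_prev:
        return new_model, rmse_new
    return previous_model, rmse_prev

X, y = make_regression(n_samples=400, n_features=3, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

previous_model = LinearRegression().fit(X_train[:20], y_train[:20])  # old, small data
new_model = LinearRegression().fit(X_train, y_train)                 # fresh data

best_model, best_rmse = choose_model(previous_model, new_model, X_test, y_test)
print("Selected:", "new" if best_model is new_model else "previous",
      "| RMSE:", round(best_rmse, 2))
```

Keeping the previous model object available is what makes the rollback possible; in practice this means storing a backup of every deployed model.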


QUESTION BANK
PART A

1. Define Machine Learning with diagram.


2. How does Machine Learning work?
3. Draw the diagram of Machine Learning Vs Human Learning.
4. List the Features of Machine Learning.
5. Justify the basic principles of Machine Learning.

6. Write the Real Time Example of Learning from Data.

7. How is Machine Learning used for Fraud Detection?

8. State the Self-Driving Car principles steps in Machine Learning.

9. Write the type names of Machine Learning.

10. Draw the Diagram of Machine Learning Types with Neat Presentation.

11. Which type is based on Supervision, and what does it mean?

12. List the categories of Supervised Machine Learning with their algorithms.

13. Which algorithm has a linear relationship between Input and Output variables?

14. Write the Applications of Supervised Learning Systems in Machine Learning.

15. What is the main aim of the Unsupervised Learning Algorithm in Machine Learning?

16. List some of the popular Clustering Algorithms of Unsupervised Learning systems.

17. Write the Advantages and Disadvantages of Unsupervised Learning Systems in Machine Learning.

18. Write the Applications of Unsupervised Learning Systems in Machine Learning Algorithms.

19. Which Machine Learning Algorithm lies between Supervised and Unsupervised Learning Algorithms?

20. Write the pros and cons of Semisupervised learning algorithm in Machine Learning.

21. Which algorithm works on a Feed back process in Machine Learning Algorithm?

22. Write the categories of Reinforcement Algorithm in Machine Learning Algorithm.

23. What are the Real world use cases of Reinforcement Learning Systems?

24. Write the Major Challenges of Machine Learning.

25. What is the main purpose of splitting the dataset into Training and Testing Data sets?

26. Write a program to create the Testing Data set.

27. Write the steps for an End-to-End Machine Learning Project.

28. Write the code to Read and Load the Dataset.

29. Justify the Exploratory Data Analysis (EDA) in Machine Learning.

30. Which Regression is used to train the model in Machine Learning Systems?

31. What is an end-to-end pipeline in Machine Learning ?

32. Write the names of Popular Open Data Repositories in Machine Learning.

33. Draw the Look at the Big Picture of ML pipeline for Real Estate Investments.

34. Which measurement is preferred if data has many outliers in Machine Learning Systems?

35. Write the Other capped values in the ML end-to-end project.

36. Which step is used to check whether the parameters go out of date?

37. Which step collects Fresh data and labels it?

38. What is meant by Random Forest Regression in Machine Learning Algorithm?

39. Which method combines several models together in ML?

40. What are the two ways of Feature Scaling and Transformation in the ML end-to-end project?
PART B

1. Illustrate the Machine Learning Works, Features and Basic Principles of Machine
Learning.
2. Explain the Types of Machine Learning for the following with their categories
a) Supervised Learning Algorithm.
b) Semi-Supervised Learning Algorithm.
c) Un-Supervised Learning Algorithm.
d) Reinforcement Learning Algorithm.


3. Discuss the Real-world use cases of the following algorithms


a) Supervised Learning Algorithm.
b) Semi-Supervised Learning Algorithm.
c) Un-Supervised Learning Algorithm.
d) Reinforcement Learning Algorithm.

4. Discuss the Advantages and Disadvantages of the following algorithms
a) Supervised Learning Algorithm.
b) Semi-Supervised Learning Algorithm.
c) Un-Supervised Learning Algorithm.
d) Reinforcement Learning Algorithm.
5. Elucidate the Main Challenges of ML Algorithms and write the steps to overcome those issues in ML.
6. Describe Training, Testing, and Validation with their coding and Output.
7. Explain the first three steps of the ML Project Pipeline with their statistical Details.
8. Explain Exploratory Data Analysis (EDA) with diagrams in a Machine Learning Project.
9. Elucidate Explore the Residuals and Model Evaluation in Machine Learning Algorithm.
10. Illustrate Get the Real Data and Look at the Big Picture in ML.
