Early Stopping in Practice
B. Chen
Photo by Samuel Bourke on Unsplash
https://towardsdatascience.com/a-practical-introduction-to-early-stopping-in-machine-learning-550ac88bc8fd 1/14
01.02.2023, 17:17 Early Stopping in Practice: an example with Keras and TensorFlow 2.0 | by B. Chen | Towards Data Science
In this article, we will focus on adding and customizing Early Stopping in our
machine learning model and look at an example of how we do this in practice with
Keras and TensorFlow 2.0.
As the epochs go by, the algorithm learns and its error on the training set naturally
goes down, and so does its error on the validation set. However, after a while, the
validation error stops decreasing and starts to go back up. This indicates
that the model has started to overfit the training data. With Early Stopping, you
stop training as soon as the validation error reaches its minimum.
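To make the idea concrete, here is a minimal sketch in plain Python. The validation errors are invented numbers, and best_epoch is a hypothetical helper written for this illustration; it is not part of Keras.

```python
# Illustrative only: these per-epoch validation errors are made up.
def best_epoch(val_errors):
    """Return the 1-based epoch with the lowest validation error."""
    best_index = min(range(len(val_errors)), key=lambda i: val_errors[i])
    return best_index + 1


val_errors = [0.90, 0.62, 0.48, 0.41, 0.39, 0.42, 0.47, 0.55]
stop_at = best_epoch(val_errors)
print(f"Validation error is lowest at epoch {stop_at}")
```

During real training the minimum is only known in hindsight, so a practical rule has to decide how long to wait before concluding that the error will not improve again.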
In the rest of this article, we are going to add and customize Early Stopping in our
machine learning model.
Source code
This is a step-by-step tutorial and all instructions are in this article. For source code,
please check out my GitHub machine learning repo.
Dataset preparation
This tutorial uses the Anderson Iris flower (iris) dataset for demonstration. The
dataset contains a set of 150 records under five attributes: sepal length, sepal width,
petal length, petal width, and class (known as target from sklearn datasets).
First, let’s import the libraries and load the iris dataset from the scikit-learn library. You
can also download it from the UCI Iris dataset.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
To make the data easier to explore, let’s load it into a pandas DataFrame.
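One way to do this is sketched below. The column name label matches the name used in the text; filling it with the species names rather than the integer targets is an assumption made for readability.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# The four feature columns take their names from the dataset itself.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assumption: store the species name (not the integer code) in "label".
df["label"] = iris.target_names[iris.target]

print(df.head())
```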
Notice that the label column is categorical, so we need to convert it to a one-hot
encoding. Otherwise, our machine learning algorithm won’t be able to take it in
directly.
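As a small illustration of what one-hot encoding does, the sketch below uses pandas get_dummies on a three-row stand-in DataFrame. The rows are invented for the example; only the label column name comes from the text.

```python
import pandas as pd

# Tiny stand-in for the iris DataFrame (values invented for illustration).
df = pd.DataFrame({
    "petal length (cm)": [1.4, 4.7, 6.0],
    "label": ["setosa", "versicolor", "virginica"],
})

# Replace the categorical column with one 0/1 column per class.
encoded = pd.get_dummies(df, columns=["label"], dtype=float)
print(encoded)
```

Each row now has exactly one label_* column set to 1.0, which is the form a softmax output layer with a categorical cross-entropy loss expects.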
Next, let’s create X and y. Keras and TensorFlow 2.0 work most directly with NumPy arrays,
so we will convert the DataFrame back to NumPy arrays.
Finally, let’s split the dataset into a training set (80%) and a test set (20%) using
train_test_split() from the sklearn library.
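These two steps could look roughly as follows. The label_ column prefix, the float32 dtype, and random_state=42 are assumptions chosen for this sketch, not values taken from the original code.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["label"] = iris.target_names[iris.target]
df = pd.get_dummies(df, columns=["label"], dtype="float32")

# Features are the four measurements; targets are the one-hot label columns.
label_cols = [c for c in df.columns if c.startswith("label_")]
X = df.drop(columns=label_cols).to_numpy(dtype="float32")
y = df[label_cols].to_numpy(dtype="float32")

# Hold out 20% of the records as a test set (the seed is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
print(X_train.shape, X_test.shape)
```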
The first layer (also known as the input layer) takes the input_shape argument to set the
input size to (4,).
The input layer has 64 units and is followed by 3 dense layers, each with 128 units.
Then there are 3 further dense layers, each with 64 units. All these layers use the
ReLU activation function.
The output Dense layer has 3 units and uses the softmax activation function.
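A model matching this description could be written as below. This is a sketch reconstructed from the layer list above; the function name create_model is an assumption, and recent Keras versions may warn that an explicit Input layer is preferred over input_shape.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


def create_model():
    """Build the fully connected network described in the text."""
    return Sequential([
        Dense(64, activation="relu", input_shape=(4,)),  # 4 iris features
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(3, activation="softmax"),  # one probability per class
    ])


model = create_model()
model.summary()
```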
For simplicity, we use accuracy as the evaluation metric to evaluate the model
during training and testing.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
After that, we can call model.fit() to fit our model to the training data.
history = model.fit(
    X_train,
    y_train,
    epochs=200,
    validation_split=0.25,
    batch_size=40,
    verbose=2
)
It is preferable to create a small function for plotting metrics. Let’s go ahead and
create a function plot_metric().
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

def plot_metric(history, metric):
    # Per-epoch values recorded by model.fit() for training and validation
    train_metrics = history.history[metric]
    val_metrics = history.history['val_' + metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics)
    plt.plot(epochs, val_metrics)
    plt.title('Training and validation ' + metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(['train_' + metric, 'val_' + metric])
    plt.show()
From the above graph, we can see that the model has overfitted the training data:
its performance on the training set is better than on the validation set.
First, let’s import the EarlyStopping callback and create an early stopping object
early_stopping.
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping()
history = model.fit(
    X_train,
    y_train,
    epochs=200,
    validation_split=0.25,
    batch_size=40,
    verbose=2,
    callbacks=[early_stopping]
)
You can see that early_stopping is passed in a list to the callbacks argument. It is
a list because in practice we might pass several callbacks that perform
different tasks, for example debugging and learning rate scheduling.
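As a hedged illustration of passing more than one callback, the sketch below pairs EarlyStopping with Keras's LearningRateScheduler. The schedule function halve_every_10_epochs is hypothetical and chosen only for the example.

```python
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler


def halve_every_10_epochs(epoch, lr):
    """Hypothetical schedule: halve the learning rate every 10 epochs."""
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr


callbacks = [
    EarlyStopping(),
    LearningRateScheduler(halve_every_10_epochs),
]

# The list would then be handed to training as model.fit(..., callbacks=callbacks).
print([type(cb).__name__ for cb in callbacks])
```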
Note: your output may differ because of the random weight initialization.
Training is terminated at Epoch 6 because the val_loss value increased, which
is exactly what the default settings monitor='val_loss' and patience=0 specify.
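If you want to confirm these defaults yourself, the callback object exposes them as attributes. A minimal check, assuming the tensorflow.keras import path:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping()

# Inspect the settings used when no arguments are given.
print(early_stopping.monitor)   # name of the quantity being watched
print(early_stopping.patience)  # epochs to wait after the last improvement
```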
It’s often more convenient to look at a plot, so let’s run plot_metric(history, 'loss') to
get a clear picture. In the graph below, validation loss is shown in orange and it’s
clear that the validation error increases at Epoch 6.
Apart from the options monitor and patience we mentioned earlier, the other 2
options, min_delta and mode, are likely to be used quite often.
mode='auto': Should be one of 'auto', 'min' or 'max'. In 'min' mode, training will
stop when the quantity monitored has stopped decreasing; in 'max' mode it will
stop when the quantity monitored has stopped increasing; in 'auto' mode, the
direction is automatically inferred from the name of the monitored quantity.
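The interaction between patience, min_delta and mode can be sketched in plain Python. This is a simplified illustration of the rule, not Keras's actual implementation; stopping_epoch is a hypothetical helper and the accuracy values are invented.

```python
def stopping_epoch(values, patience=0, min_delta=0.0, mode="min"):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified illustration of the early stopping rule (not Keras code).
    """
    if mode not in ("min", "max"):
        raise ValueError("this sketch supports only 'min' and 'max'")

    def improved(current, best):
        # A change must exceed min_delta to count as an improvement.
        if mode == "min":
            return current < best - min_delta
        return current > best + min_delta

    best = values[0]
    wait = 0  # epochs since the last improvement
    for epoch, current in enumerate(values[1:], start=2):
        if improved(current, best):
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented validation accuracies: the gains after epoch 3 are below min_delta.
val_accuracy = [0.60, 0.72, 0.80, 0.8004, 0.79, 0.80, 0.8003]
print(stopping_epoch(val_accuracy, patience=3, min_delta=0.001, mode="max"))
```

With these numbers the last real improvement is at epoch 3, so three epochs without improvement end training at epoch 6.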
custom_early_stopping = EarlyStopping(
    monitor='val_accuracy',
    patience=8,
    min_delta=0.001,
    mode='max'
)
history = model.fit(
    X_train,
    y_train,
    epochs=200,
    validation_split=0.25,
    batch_size=40,
    verbose=2,
    callbacks=[custom_early_stopping]
)
This time, training is terminated at Epoch 9 because there have been 8 epochs with no
improvement in validation accuracy (a change smaller than 0.001 does not count as an
improvement). For a clear picture, let’s plot the accuracy by
running plot_metric(history, 'accuracy'). In the graph below, validation accuracy
is shown in orange and it’s clear that validation accuracy has not
improved.
That’s it
Thanks for reading.
Stay tuned if you are interested in the practical aspect of machine learning.
References
[1] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems