01_mnist.ipynb (4) - JupyterLab
1.1 Objectives
Understand how deep learning can solve problems traditional programming methods cannot
Learn about the MNIST handwritten digits dataset
Use torchvision to load the MNIST dataset and prepare it for training
Create a simple neural network to perform image classification
Train the neural network using the prepped MNIST dataset
Observe the performance of the trained neural network
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Visualization tools
import torchvision
import torchvision.transforms.v2 as transforms
import torchvision.transforms.functional as F
import matplotlib.pyplot as plt
In PyTorch, we can use our GPU in our operations by setting the device to cuda. The function torch.cuda.is_available()
confirms whether PyTorch can recognize the GPU.
100.28.217.36/lab/lab/tree/01_mnist.ipynb 1/23
3/13/25, 9:51 AM 01_mnist
Out[2]: True
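The device selection described above can be sketched as follows. This is a minimal sketch: it assumes the variable is named device (the name used later in this notebook) and falls back to the CPU when no GPU is recognized.

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(torch.cuda.is_available())
print(device)
```

Code written against device rather than a hard-coded "cuda" keeps running on machines without a GPU.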
Image classification, which asks a program to correctly classify an image it has never seen before into its correct class, is near
impossible to solve with traditional programming techniques. How could a programmer possibly define the rules and conditions to
correctly classify a huge variety of images, especially taking into account images that they have never seen?
When working with images for deep learning, we need both the images themselves, usually denoted as X, and the correct labels
for these images, usually denoted as Y. Furthermore, we need X and Y values for training the model, and then a separate
set of X and Y values for validating the performance of the model after it has been trained.
We can imagine these X and Y pairs as a set of flash cards. A student can train with one set of flashcards, and to validate the
student learned the correct concepts, a teacher might quiz the student with a different set of flash cards.
The process of preparing data for analysis is called Data Engineering. To learn more about the differences between training data
and validation data (as well as test data), check out this article by Jason Brownlee.
We will also use the TorchVision library. Among its many helpful features are modules containing helper methods
for many common datasets, including MNIST.
We will begin by loading both the train and valid datasets for MNIST.
We stated above that the MNIST dataset contained 70,000 grayscale images of handwritten digits. By executing the following cells,
we can see that TorchVision has partitioned 60,000 of these PIL Images for training, and 10,000 for validation (after training).
In [4]: train_set
In [5]: valid_set
Note: The Split for valid_set is stated as Test, but we will be using the data for validation in our hands-on exercises. To
learn more about the difference between Train, Valid, and Test datasets, please view this article by Kili.
In [7]: x_0
Out[7]:
In [8]: type(x_0)
Out[8]: PIL.Image.Image
In [9]: y_0
Out[9]: 5
In [10]: type(y_0)
Out[10]: int
1.3 Tensors
If a vector is a 1-dimensional array, and a matrix is a 2-dimensional array, a tensor is an n-dimensional array representing any
number of dimensions. Most modern neural network frameworks are powerful tensor processing tools.
One example of a 3-dimensional tensor could be pixels on a computer screen. The different dimensions would be width, height,
and color channel. Video games use matrix mathematics to calculate pixel values in a similar way to how neural networks calculate
tensors. This is why GPUs are effective tensor processing machines.
Let's convert our images into tensors so we can later process them with a neural network. TorchVision has a useful function to
convert PIL Images into tensors with the ToTensor class:
PyTorch tensors have a number of useful properties and methods. We can verify the data type:
In [12]: x_0_tensor.dtype
Out[12]: torch.float32
We can verify the minimum and maximum values. PIL Images have a potential integer range of [0, 255], but the ToTensor class
converts it to a float range of [0.0, 1.0].
In [13]: x_0_tensor.min()
Out[13]: tensor(0.)
In [14]: x_0_tensor.max()
Out[14]: tensor(1.)
We can also view the size of each dimension. PyTorch uses a C x H x W convention, which means the first dimension is color
channel, the second is height, and the third is width.
Since these images are grayscale, there is only 1 color channel. The images are square, 28 pixels tall and 28 pixels wide:
In [15]: x_0_tensor.size()
In [16]: x_0_tensor
In [17]: x_0_tensor.device
Out[17]: device(type='cpu')
The .cuda method will fail if a GPU is not recognized by PyTorch. In order to make our code flexible, we can send our tensor to
the device we identified at the start of this notebook. This way, our code will run much faster if a GPU is available, but the code
will not break if there is no available GPU.
In [19]: x_0_tensor.to(device).device
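A small sketch of the device-agnostic pattern, using a zero-filled tensor as a stand-in for x_0_tensor. Note that to(device) returns a tensor on the target device instead of changing the original tensor in place.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for x_0_tensor: new tensors are created on the CPU by default.
x = torch.zeros(1, 28, 28)
x_on_device = x.to(device)

print(x.device)
print(x_on_device.device)
```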
Sometimes, it can be hard to interpret so many numbers. Thankfully, TorchVision can convert C x H x W tensors back into a PIL
image with the to_pil_image function.
1.4.1 Transforms
The Compose class combines a list of transforms. We will learn more about transforms in a later notebook, but have copied the
trans definition below as an introduction.
Before, we only applied trans to one value. There are multiple ways we can apply our list of transforms to a dataset. One such
way is to set it to a dataset's transform variable.
1.4.2 DataLoaders
If our dataset is a deck of flash cards, a DataLoader defines how we pull cards from the deck to train an AI model. We could show
our model the entire dataset at once, but this takes a lot of computational resources, and research shows that using smaller
batches of data is more efficient for model training.
For example, if our batch_size is 32, we will train our model by shuffling the deck and drawing 32 cards. We do not need to
shuffle for validation as the model is not learning, but we will still use a batch_size to prevent memory errors.
The batch size is something the model developer decides, and the best value will depend on the problem being solved. Research
shows 32 or 64 is sufficient for many machine learning problems and is the default in some machine learning frameworks, so we will
use 32 here.
In [23]: batch_size = 32
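A sketch of the two loaders, using small synthetic TensorDataset objects in place of the MNIST splits (the random tensors and the dataset size of 100 are assumptions for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 32

# Synthetic stand-ins for the MNIST splits: 100 random "images" with labels 0-9.
train_set = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
valid_set = TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))

# Shuffle only the training data; validation order does not affect the result.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size)

x, y = next(iter(train_loader))
print(x.shape, y.shape)
print(len(train_loader))  # number of batches per pass over the data
```

With 100 samples and a batch size of 32, the loader yields three full batches and a final batch of 4.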
More information about these layers is available in this blog post by Sarita.
In [24]: layers = []
layers
Out[24]: []
In [26]: nn.Flatten()(test_matrix)
Nothing happened? That's because neural networks expect to receive a batch of data. Currently, the Flatten layer sees three vectors
as opposed to one 2D matrix. To fix this, we can "batch" our data by adding an extra pair of brackets. Since test_matrix is now a
tensor, we can do that with the indexing shorthand test_matrix[None, :]. None adds a new dimension, while : selects all the data in the tensor.
In [28]: nn.Flatten()(batch_test_matrix)
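The behavior described above can be reproduced with a small example. The 3 x 3 values in test_matrix are assumed for illustration.

```python
import torch
import torch.nn as nn

# Assumed example values: a 3 x 3 matrix.
test_matrix = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Flatten keeps dimension 0 (the batch), so a bare 2D matrix is left unchanged:
# it is treated as a batch of three length-3 vectors.
unbatched = nn.Flatten()(test_matrix)
print(unbatched.shape)

# Indexing with None adds a leading batch dimension: shape (1, 3, 3).
batch_test_matrix = test_matrix[None, :]

# Now the single 3 x 3 sample is flattened into one vector of 9 values.
flattened = nn.Flatten()(batch_test_matrix)
print(flattened.shape)
```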
Now that we've gotten the hang of the Flatten layer, let's add it to our list of layers.
In [30]: layers = [
             nn.Flatten()
         ]
         layers
In order to create these weights, PyTorch needs to know the size of our inputs and how many neurons we want to create. Since
we've flattened our images, the size of our inputs is the number of channels, number of pixels vertically, and number of pixels
horizontally multiplied together.
In [31]: input_size = 1 * 28 * 28
Choosing the correct number of neurons is what puts the "science" in "data science" as it is a matter of capturing the statistical
complexity of the dataset. For now, we will use 512 neurons. Try playing around with this value later to see how it affects training
and to start developing a sense for what this number means.
We will learn more about activation functions later, but for now, we will use the relu activation function, which, in short, will help our
network learn to make more sophisticated guesses about data than if it were required to make guesses based on some
strictly linear function.
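To make these two pieces concrete, the sketch below passes one flattened, random "image" through a 512-neuron Linear layer and shows what ReLU does to a few hand-picked values. The inputs are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

input_size = 1 * 28 * 28  # channels * height * width = 784

# A Linear layer maps each 784-value input to 512 neuron outputs.
linear = nn.Linear(input_size, 512)
hidden = linear(torch.rand(1, input_size))
print(hidden.shape)

# ReLU replaces negative values with zero and leaves the rest unchanged.
relu = nn.ReLU()
values = torch.tensor([-2.0, -0.5, 0.0, 1.5])
activated = relu(values)
print(activated)
```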
In [32]: layers = [
             nn.Flatten(),
             nn.Linear(input_size, 512),  # Input
             nn.ReLU(),  # Activation for input
         ]
         layers
In [33]: layers = [
             nn.Flatten(),
             nn.Linear(input_size, 512),  # Input
             nn.ReLU(),  # Activation for input
             nn.Linear(512, 512),  # Hidden
             nn.ReLU()  # Activation for hidden
         ]
         layers
We will not assign the relu function to the output layer. Instead, we will apply a loss function covered in the next section.
In [34]: n_classes = 10
         layers = [
             nn.Flatten(),
             nn.Linear(input_size, 512),  # Input
             nn.ReLU(),  # Activation for input
             nn.Linear(512, 512),  # Hidden
             nn.ReLU(),  # Activation for hidden
             nn.Linear(512, n_classes)  # Output
         ]
         layers
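One way to turn the list of layers into a single model is nn.Sequential, which calls each layer in order. The sketch below builds the model and checks the output shape on a random batch. The random input is an assumption standing in for a batch of MNIST images.

```python
import torch
import torch.nn as nn

input_size = 1 * 28 * 28
n_classes = 10

layers = [
    nn.Flatten(),
    nn.Linear(input_size, 512),  # Input
    nn.ReLU(),  # Activation for input
    nn.Linear(512, 512),  # Hidden
    nn.ReLU(),  # Activation for hidden
    nn.Linear(512, n_classes),  # Output
]

# Unpack the list so each layer becomes one step of the model.
model = nn.Sequential(*layers)

# A random batch of 32 single-channel 28 x 28 "images".
batch = torch.rand(32, 1, 28, 28)
output = model(batch)
print(output.shape)  # one row of 10 class scores per image
```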
Out[35]: Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=512, bias=True)
(2): ReLU()
(3): Linear(in_features=512, out_features=512, bias=True)
(4): ReLU()
(5): Linear(in_features=512, out_features=10, bias=True)
)
Much like tensors, when the model is first initialized, it will be processed on a CPU. To have it process with a GPU, we can use
to(device).
In [36]: model.to(device)
Out[36]: Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=512, bias=True)
(2): ReLU()
(3): Linear(in_features=512, out_features=512, bias=True)
(4): ReLU()
(5): Linear(in_features=512, out_features=10, bias=True)
)
To check which device a model is on, we can check which device the model parameters are on. Check out this Stack Overflow post
for more information.
In [37]: next(model.parameters()).device
PyTorch 2.0 introduced the ability to compile a model for faster performance. Learn more about it here.
"Training a model with data" is often also called "fitting a model to data." The second phrase highlights that the shape of the model
changes over time to more accurately understand the data that it is being given.
Next, we select an optimizer for our model. If the loss_function provides a grade, the optimizer tells the model how to learn
from this grade to do better next time.
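As a sketch of how a loss function and an optimizer work together on one batch, the example below uses nn.CrossEntropyLoss and the Adam optimizer. Both choices are assumptions made for illustration, as are the small stand-in model and the random data.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

# Small stand-in model: flatten a 28 x 28 image and map it to 10 class scores.
model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 28 * 28, 10))

# Assumed choices: cross-entropy as the "grade", Adam as the learning rule.
loss_function = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters())

# One random batch of 8 images with random labels.
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))

optimizer.zero_grad()                    # clear gradients from any previous step
batch_loss = loss_function(model(x), y)  # grade the current predictions
batch_loss.backward()                    # compute gradients of the loss
optimizer.step()                         # adjust the weights using those gradients

print(batch_loss.item())
```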
To calculate accuracy, we compare the number of correct classifications to the total number of
predictions made. Since we're showing data to the model in batches, accuracy can be accumulated batch by batch.
First, the total number of predictions is the same as the size of our dataset. Let's assign the size of each dataset to N, where n is
synonymous with the batch size.
Next, we'll make a function to calculate the accuracy for each batch. The result is a fraction of the total accuracy, so we can add the
accuracy of each batch together to get the total.
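One possible implementation of such a function is sketched below. The signature get_batch_accuracy(output, y, N) follows the way the function is called in the training loop. The body and the small example tensors are assumptions.

```python
import torch


def get_batch_accuracy(output, y, N):
    """Return this batch's share of the accuracy over a dataset of size N."""
    # The predicted class is the index of the highest score in each row.
    pred = output.argmax(dim=1, keepdim=True)
    # Count predictions that match the labels.
    correct = pred.eq(y.view_as(pred)).sum().item()
    return correct / N


# Four predictions over two classes; the third one is wrong.
output = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
y = torch.tensor([1, 0, 0, 0])

# With a dataset of 8 samples, 3 correct answers contribute 3/8 to the total.
batch_accuracy = get_batch_accuracy(output, y, N=8)
print(batch_accuracy)
```

Dividing by N rather than by the batch size is what allows the per-batch values to be summed into the accuracy for the whole dataset.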
def train():
    # Running totals for the epoch
    loss = 0
    accuracy = 0

    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        output = model(x)
        optimizer.zero_grad()
        batch_loss = loss_function(output, y)
        batch_loss.backward()
        optimizer.step()

        loss += batch_loss.item()
        accuracy += get_batch_accuracy(output, y, train_N)
    print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))
Similarly, this is the code for validating the model with data it did not train on. Can you spot some differences with the train
function?
model.eval()
with torch.no_grad():
    for x, y in valid_loader:
        x, y = x.to(device), y.to(device)
        output = model(x)
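A complete validation function might look like the sketch below. The function name validate, the accumulation of loss and accuracy, the nn.CrossEntropyLoss loss, the small stand-in model, and the synthetic data are all assumptions. Only the loop structure comes from the fragment above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-ins for the validation data and the trained model.
valid_set = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
valid_loader = DataLoader(valid_set, batch_size=32)
valid_N = len(valid_loader.dataset)
model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 28 * 28, 10)).to(device)
loss_function = nn.CrossEntropyLoss()


def get_batch_accuracy(output, y, N):
    pred = output.argmax(dim=1, keepdim=True)
    correct = pred.eq(y.view_as(pred)).sum().item()
    return correct / N


def validate():
    loss = 0
    accuracy = 0

    model.eval()           # switch layers such as dropout to evaluation mode
    with torch.no_grad():  # no gradients are needed when the model is not learning
        for x, y in valid_loader:
            x, y = x.to(device), y.to(device)
            output = model(x)

            loss += loss_function(output, y).item()
            accuracy += get_batch_accuracy(output, y, valid_N)
    print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))
    return loss, accuracy


valid_loss, valid_accuracy = validate()
```

Unlike training, validation never calls the optimizer or backward(), so the model's weights stay unchanged.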
An epoch is one complete pass through the entire dataset. Let's train and validate the model for 5 epochs to see how it learns.
In [45]: epochs = 5
Epoch: 0
Train - Loss: 382.8538 Accuracy: 0.9385
Valid - Loss: 37.6474 Accuracy: 0.9603
Epoch: 1
Train - Loss: 158.9386 Accuracy: 0.9740
Valid - Loss: 25.4440 Accuracy: 0.9748
Epoch: 2
Train - Loss: 109.2504 Accuracy: 0.9816
Valid - Loss: 26.8595 Accuracy: 0.9749
Epoch: 3
Train - Loss: 86.6604 Accuracy: 0.9853
Valid - Loss: 24.1881 Accuracy: 0.9769
Epoch: 4
Train - Loss: 63.4470 Accuracy: 0.9888
Valid - Loss: 27.2187 Accuracy: 0.9765
We're already close to 100%! Let's see if it's true by testing it on our original sample. We can use our model like a function:
There should be ten numbers, each corresponding to a different output neuron. Thanks to how the data is structured, the index of
each number matches the corresponding handwritten number. The 0th index is a prediction for a handwritten 0, the 1st index is a
prediction for a handwritten 1, and so on.
We can use the argmax function to find the index of the highest value.
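A short sketch of argmax on a model output. The ten scores below are made-up values for illustration, not output from the trained model.

```python
import torch

# Hypothetical scores for one image, one value per digit class 0-9.
prediction = torch.tensor(
    [[-3.1, -8.0, -1.2, 2.4, -9.5, 11.7, -4.0, -6.3, -0.8, -2.2]]
)

# argmax over dim=1 returns the index of the largest score in each row.
predicted_class = prediction.argmax(dim=1, keepdim=True)
print(predicted_class)  # tensor([[5]])
```

Here the largest score sits at index 5, so the predicted digit is 5.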
In [48]: y_0
Out[48]: 5
1.7 Summary
The model did quite well! After five epochs, training accuracy reached nearly 99% and validation accuracy just under 98%. We now have a model that
can be used to accurately classify images of handwritten digits.
The next step would be to use this model to classify new not-yet-seen handwritten images. This is called inference. We'll explore the
process of inference in a later exercise.
It's worth taking a moment to appreciate what we've done here. Historically, the expert systems that were built to do this kind of
task were extremely complicated, and people spent their careers building them (check out the references on the official MNIST
page and the years milestones were reached).
MNIST is not only useful for its historical influence on Computer Vision, but it's also a great benchmark and debugging tool. Having
trouble getting a fancy new machine learning architecture working? Check it against MNIST. If it can't learn on this dataset, chances
are it won't learn on more complicated images and datasets.
1.7.2 Next
In this section you learned how to build and train a simple neural network for image classification. In the next section, you will be
asked to build your own neural network and perform data preparation to solve a different image classification problem.