An End-to-End Tutorial: Running a Convolutional Neural Network on an MCU with uTensor
My attempt to showcase a CNN on uTensor at COSCUP Taiwan in 2018 did not succeed. The problem was that I accidentally fed the same image to every input, so the predictions were as good as random guesses. On CIFAR10, that yields about 10% accuracy, despite the fact that the classifier was actually working!
That’s the backstory of CNN on uTensor.
In this post, I’m going to guide you through building a CNN with uTensor. We will use utensor-cli to seamlessly convert a simple CNN trained in TensorFlow into an equivalent uTensor implementation, then compile and run it on an MCU (an STM32F767, to be specific).
If you are new to uTensor, you may want to read the great MLP tutorial by one of the uTensor core developers before moving on.
OK, let’s get started!
Get the Demo Code
First things first, clone the repo:
% git clone https://github.com/uTensor/simple_cnn_tutorial.git
% cd simple_cnn_tutorial
Next, set up the environment by running the following commands:
# setup a python virtual environment and activate it
% python2.7 -m virtualenv .venv
% source .venv/bin/activate

# install mbed-cli and set up the libraries
% pip install mbed-cli
% mbed deploy

# install uTensor cli *after* setting up the libraries
% pip install utensor_cgen==0.3.3.dev2
After the installation, you are good to go as long as there are no errors.
In the repo, there are a few python scripts you may want to explore:
- model.py is the script where the CNN graph is built.
- train.py is the training script. You can run python train.py --help to see all available hyper-parameters if you want to train a model yourself. In this tutorial, we’ll use a pre-trained protobuf file, cifar10_cnn.pb.
- cifar10_cnn.pb is the pre-trained model we’ll use in this tutorial.
- prepare_test_data.py is a python script that generates img_data.h for you, the data we’ll run the test with on the MCU.
If you want to see the results right away, attach your development board and run make compile. It should compile the project and flash the image to your board. Then follow the instructions in the “Getting the Output” section of the MLP tutorial mentioned above.
From TensorFlow to uTensor
With the virtual environment activated, utensor-cli is available on your path. We will first identify the output nodes and then generate the C++ files from the model.
To inspect the graph and find the output node: utensor-cli show <model.pb>
This command shows all the operations/nodes in the given model file. For example, running utensor-cli show cifar10_cnn.pb --oneline prints a one-line summary of every graph node in cifar10_cnn.pb to the terminal.
When converting a protobuf file into uTensor C++ files, you are required to provide a list of output nodes with the --output-nodes option. The show command is a lightweight, useful way to explore a model and find those names, compared with heavier tools such as TensorBoard.
In cifar10_cnn.pb, fully_connect_2/logits is what we are looking for: it’s the output node of this model.
Then, use the convert command to convert the file into C++ files:
% utensor-cli convert cifar10_cnn.pb --output-nodes \
fully_connect_2/logits
You’ll see the conversion logs printed on the console.
At this point, you should find the uTensor implementation of your model in the models directory, generated by utensor-cli.
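For reference, here is a minimal sketch of what the generated interface roughly looks like. The helper name get_cifar10_cnn_ctx and the Context class come from the generated code; the include paths, header guard, and exact input signature below are assumptions, so check the files under models/ for the real interface.

```cpp
// models/cifar10_cnn.hpp -- illustrative sketch only; the generated header
// may use different include paths and a different input signature.
#ifndef MODELS_CIFAR10_CNN_HPP
#define MODELS_CIFAR10_CNN_HPP

#include "context.hpp"  // uTensor Context (assumed include path)
#include "tensor.hpp"   // uTensor Tensor  (assumed include path)

// Wires the whole CNN graph into `ctx`, taking the input image tensor.
void get_cifar10_cnn_ctx(Context& ctx, Tensor* input_0);

#endif  // MODELS_CIFAR10_CNN_HPP
```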
Finally, generate a batch of testing images by running:
% python prepare_test_data.py # generate testing image batch
It simply downloads the CIFAR10 dataset and randomly samples a batch of images to test with. You can see the pixel values (normalized to [0, 1]) in the generated img_data.h.
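To give you an idea of what to expect, here is a hypothetical, heavily truncated stand-in for that header. The real file defines a full batch with the actual pixel values, and its names and layout are decided by prepare_test_data.py, so treat everything below as an illustration only.

```cpp
// img_data.h -- hypothetical sketch; the generated header will differ.
constexpr unsigned int kNumImages = 1;            // the real batch is larger
constexpr unsigned int kImgSize   = 32 * 32 * 3;  // CIFAR10: 32x32 RGB, flattened

// Pixel values normalized to [0, 1], one flattened image per row.
const float img_data[kNumImages][kImgSize] = {{ 0.2313f /* ...3072 values... */ }};

// Ground-truth CIFAR10 label (0-9) for each image in the batch.
const int label_data[kNumImages] = { 3 };  // e.g. class 3 ("cat")
```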
Behold, CNN on MCU
In main.cpp, you can see how to use the cli-generated model files. A brief summary:
- Create a Context object.
- Build the neural network with the helper function generated by the cli, get_cifar10_cnn_ctx in this tutorial.
- Get the tensor you want to use for inference.
- Evaluate the neural network.
Note that uTensor clears tensors whose reference count has dropped to 0 after evaluation, so grab the output tensor before evaluating the network; fetching it afterwards will give you an error or a dangling pointer. The sketch below walks through these steps.
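Here is a minimal sketch of that flow, assuming the legacy uTensor API used in the MLP tutorial (Context, WrappedRamTensor, ctx.get, ctx.eval, Tensor::read) plus the hypothetical img_data.h names from above. The exact tensor name, input signature, and include paths come from your generated files and the repo’s main.cpp, so adjust accordingly.

```cpp
#include <stdio.h>
#include "models/cifar10_cnn.hpp"  // generated by utensor-cli
#include "tensor.hpp"              // WrappedRamTensor, S_TENSOR (assumed include path)
#include "img_data.h"              // hypothetical test batch from prepare_test_data.py

int main(void) {
  // 1. Create a Context object.
  Context ctx;

  // 2. Wrap one test image (3072 normalized floats) in a tensor and build
  //    the graph with the cli-generated helper; the input signature here
  //    is an assumption -- check the generated header.
  Tensor* input_img =
      new WrappedRamTensor<float>({1, 32, 32, 3}, (float*) img_data[0]);
  get_cifar10_cnn_ctx(ctx, input_img);

  // 3. Grab a reference to the output tensor *before* evaluation; tensors
  //    with zero reference count are cleared once evaluation finishes.
  S_TENSOR logits = ctx.get("fully_connect_2/logits:0");

  // 4. Run the network.
  ctx.eval();

  // Argmax over the 10 CIFAR10 logits gives the predicted class.
  int pred = 0;
  float best = *(logits->read<float>(0, 0));
  for (int i = 1; i < 10; ++i) {
    float v = *(logits->read<float>(i, 0));
    if (v > best) { best = v; pred = i; }
  }
  printf("predicted: %d, expected: %d\r\n", pred, label_data[0]);
  return 0;
}
```

Looping the same pattern over the whole batch and counting how many argmax predictions match the labels is what produces the accuracy number you’ll see over serial.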
OK, enough code. Here comes the fun part: we are ready to compile and run a CNN on the MCU.
First, attach your development board to your computer:
Next, run:
% mbed compile -m auto -t GCC_ARM --profile \
uTensor/build_profile/release.json -f
With this command, mbed will:
- automatically detect the model of your development board, thanks to the -m auto flag,
- build the binary with the given build profile, release.json in this tutorial,
- and flash the binary image to your board, since the -f flag is enabled.
If everything goes well, the project will compile and the binary will be flashed to your board. It may take a while to finish, just be patient ; )
To see the serial output of my NUCLEO-F767ZI, I use CoolTerm (you can download it from here). Set up CoolTerm as described in the MLP tutorial, press the reset button, and you should see the accuracy printed over serial.
The accuracy may vary depending on the random image batch you generate with prepare_test_data.py. If you have bad luck with this batch, just try another one : P
Congrats! Our CNN is alive on the MCU.
What’s Next
The uTensor team is very active and is working to bring data science to the realm of edge computing.
In this tutorial, we provided a showcase of how you can build an end-to-end application with uTensor and another neural network framework such as TensorFlow.
If you are a big fan of PyTorch and are wondering whether PyTorch is on the uTensor roadmap, don’t worry. I’m a fan of PyTorch too, and our team is working on integrating PyTorch with uTensor just like we did with TensorFlow. We’re making progress, and you are welcome to join in.
You can find the code at https://github.com/uTensor; there are two repos you should go through first:
- uTensor: the C++ runtime
- utensor_cgen: uTensor CLI written in python with ❤
Finally, a special thanks to everyone who offered helpful advice and helped me debug this tutorial.