The Embedded Rust Book
Welcome to The Embedded Rust Book: An introductory book about using the Rust
Programming Language on "Bare Metal" embedded systems, such as Microcontrollers.
Scope
The goals of this book are:
Get developers up to speed with embedded Rust development. i.e. How to set up a
development environment.
Share current best practices about using Rust for embedded development. i.e. How to
best use Rust language features to write more correct embedded software.
Serve as a cookbook in some cases. e.g. How do I mix C and Rust in a single project?
This book tries to be as general as possible but to make things easier for both the readers
and the writers it uses the ARM Cortex-M architecture in all its examples. However, the
book doesn't assume that the reader is familiar with this particular architecture and
explains details particular to this architecture where required.
This book assumes that you are comfortable using the Rust Programming Language, and
that you have written, run, and debugged Rust applications on a desktop environment.
You should also be familiar with the idioms of the 2018 edition as this book targets
Rust 2018.
Other Resources
If you are unfamiliar with anything mentioned above or if you want more information about
a specific topic mentioned in this book you might find some of these resources helpful.
Topic                            Resource                              Description
Rust, Embedded                   Comprehensive Rust 🦀: Bare Metal     Teaching material for a 1-day class
                                                                       on bare-metal Rust development
Interrupts                       Interrupt                             -
Memory-mapped IO/Peripherals     Memory-mapped I/O                     -
SPI, UART, RS232, USB, I2C, TTL  Stack Exchange about SPI, UART,       -
                                 and other interfaces
Translations
This book has been translated by generous volunteers. If you would like your translation
listed here, please open a PR to add it.
Japanese (repository)
Chinese (repository)
We suggest purchasing the STM32F3DISCOVERY development board for the purpose of
following the examples in this book.
If you have trouble following the instructions in this book, or find that some section of the
book is unclear or hard to follow, then that's a bug and it should be reported in the issue
tracker of this book.
Pull requests fixing typos and adding new content are very welcome!
Re-using this material
This book is distributed under the following licenses:
The code samples and free-standing Cargo projects contained within this book are
licensed under the terms of both the MIT License and the Apache License v2.0.
The written prose, pictures and diagrams contained within this book are licensed
under the terms of the Creative Commons CC-BY-SA v4.0 license.
TL;DR: If you want to use our text or images in your work, you need to:
Give the appropriate credit (i.e. mention this book on your slide, and provide a link to
the relevant page)
Provide a link to the CC-BY-SA v4.0 licence
Indicate if you have changed the material in any way, and make any changes to our
material available under the same licence
The STM32F3DISCOVERY board ships with an STM32F303VCT6 microcontroller, which provides,
among other features:

48 KiB of RAM.
General purpose Input Output (GPIO) and other types of pins accessible through
the two rows of headers alongside the board.
A USB interface accessible through the USB port labeled "USB USER".
An accelerometer as part of the LSM303DLHC chip.
For a more detailed list of features and further specifications of the board take a look at the
STMicroelectronics website.
A word of caution: be careful if you want to apply external signals to the board. The
microcontroller STM32F303VCT6 pins take a nominal voltage of 3.3 volts. For further
information consult the 6.2 Absolute maximum ratings section of the manual.
A no_std Rust Environment
The term Embedded Programming is used for a wide range of different classes of
programming, ranging from programming 8-bit MCUs (like the ST72325xx) with just a few
KB of RAM and ROM, up to systems like the Raspberry Pi (Model B 3+), which has a 32/64-bit
4-core Cortex-A53 running at 1.4 GHz and 1 GB of RAM. Different restrictions/limitations will
apply when writing code depending on what kind of target and use case you have.
Hosted Environments
These kinds of environments are close to a normal PC environment. What this means is
that you are provided with a system interface, e.g. POSIX, that provides you with primitives
to interact with various systems, such as file systems, networking, memory management,
threads, etc. Standard libraries in turn usually depend on these primitives to implement
their functionality. You may also have some sort of sysroot and restrictions on RAM/ROM
usage, and perhaps some special HW or I/Os. Overall it feels like coding on a special-
purpose PC environment.
As mentioned before using libstd requires some sort of system integration, but this is not
only because libstd is just providing a common way of accessing OS abstractions, it also
provides a runtime. This runtime, among other things, takes care of setting up stack
overflow protection, processing command line arguments, and spawning the main thread
before a program's main function is invoked. This runtime also won't be available in a
no_std environment.
Summary
#![no_std] is a crate-level attribute that indicates that the crate will link to the core-crate
instead of the std-crate. The libcore crate in turn is a platform-agnostic subset of the std
crate which makes no assumptions about the system the program will run on. As such, it
provides APIs for language primitives like floats, strings and slices, as well as APIs that
expose processor features like atomic operations and SIMD instructions. However it lacks
APIs for anything that involves platform integration. Because of these properties no_std
and libcore code can be used for any kind of bootstrapping (stage 0) code like bootloaders,
firmware or kernels.
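As a small illustration, the following function uses only core APIs (an atomic counter), so it
compiles in a #![no_std] crate with no OS support at all; the names here are our own:

#![no_std]

use core::sync::atomic::{AtomicUsize, Ordering};

static EVENT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Counts an event and returns the new total; atomics map directly to
/// processor instructions, so no runtime or OS is needed.
pub fn record_event() -> usize {
    EVENT_COUNT.fetch_add(1, Ordering::Relaxed) + 1
}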
Overview
* Only if you use the alloc crate and use a suitable allocator like alloc-cortex-m.
** Only if you use the collections crate and configure a global default allocator.
*** HashMap and HashSet are not available due to a lack of a secure random number
generator.
See Also
RFC-1184
Tooling
Dealing with microcontrollers involves using several different tools as we'll be dealing with
an architecture different than your laptop's and we'll have to run and debug programs on
a remote device.
We'll use all the tools listed below. Any recent version should work when a minimum
version is not specified, but we have listed the versions we have tested.
Rust 1.31, 1.31-beta, or a newer toolchain PLUS ARM Cortex-M compilation support.
cargo-binutils ~0.1.4
qemu-system-arm . Tested versions: 3.0.0
OpenOCD >=0.8. Tested versions: v0.9.0 and v0.10.0
GDB with ARM support. Version 7.12 or newer highly recommended. Tested versions:
7.10, 7.11, 7.12 and 8.1
cargo-generate or git . These tools are optional but will make it easier to follow
along with the book.
The text below explains why we are using these tools. Installation instructions can be found
on the next page.
cargo-generate OR git
Bare metal programs are non-standard ( no_std ) Rust programs that require some
adjustments to the linking process in order to get the memory layout of the program right.
This requires some additional files (like linker scripts) and settings (like linker flags). We
have packaged those for you in a template such that you only need to fill in the missing
information (such as the project name and the characteristics of your target hardware).
Our template is compatible with cargo-generate : a Cargo subcommand for creating new
Cargo projects from templates. You can also download the template using git , curl ,
wget , or your web browser.
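For example, instantiating our template with cargo-generate looks like this (the same
command appears again in the Getting Started section):

$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart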
cargo-binutils
cargo-binutils is a collection of Cargo subcommands that make it easy to use the LLVM
tools that are shipped with the Rust toolchain. These tools include the LLVM versions of
objdump , nm and size and are used for inspecting binaries.
The advantage of using these tools over GNU binutils is that (a) installing the LLVM tools is
the same one-command installation ( rustup component add llvm-tools-preview )
regardless of your OS and (b) tools like objdump support all the architectures that rustc
supports -- from ARM to x86_64 -- because they both share the same LLVM backend.
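For example, these two invocations wrap llvm-size and llvm-objdump (a sketch; arguments
after -- are passed to the underlying LLVM tool):

$ cargo size --bin app -- -A
$ cargo objdump --bin app -- --disassemble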
qemu-system-arm
QEMU is an emulator. In this case we use the variant that can fully emulate ARM systems.
We use QEMU to run embedded programs on the host. Thanks to this you can follow some
parts of this book even if you don't have any hardware with you!
GDB
A debugger is a very important component of embedded development as you may not
always have the luxury to log stuff to the host console. In some cases, you may not even
have LEDs to blink on your hardware!
In general, LLDB works as well as GDB when it comes to debugging but we haven't found an
LLDB counterpart to GDB's load command, which uploads the program to the target
hardware, so currently we recommend that you use GDB.
OpenOCD
GDB isn't able to communicate directly with the ST-Link debugging hardware on your
STM32F3DISCOVERY development board. It needs a translator and the Open On-Chip
Debugger, OpenOCD, is that translator. OpenOCD is a program that runs on your
laptop/PC and translates between GDB's TCP/IP based remote debug protocol and ST-
Link's USB based protocol.
OpenOCD also performs other important work as part of its translation for the debugging
of the ARM Cortex-M based microcontroller on your STM32F3DISCOVERY development
board:
It knows how to interact with the memory mapped registers used by the ARM
CoreSight debug peripheral. It is these CoreSight registers that allow for:
Breakpoint/Watchpoint manipulation
Reading and writing of the CPU registers
Detecting when the CPU has been halted for a debug event
Continuing CPU execution after a debug event has been encountered
etc.
It also knows how to erase and write to the microcontroller's FLASH.
Installing the tools
This page contains OS-agnostic installation instructions for a few of the tools:
Rust Toolchain
NOTE Make sure you have a compiler version equal to or newer than 1.31 . rustc -V
should return a date newer than the one shown below.
$ rustc -V
rustc 1.31.1 (b6c32da9b 2018-12-18)
For bandwidth and disk usage concerns the default installation only supports native
compilation. To add cross compilation support for the ARM Cortex-M architectures choose
one of the following compilation targets. For the STM32F3DISCOVERY board used for the
examples in this book, use the thumbv7em-none-eabihf target.
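For example:

$ rustup target add thumbv7em-none-eabihf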
cargo-binutils
WINDOWS: make sure the C++ Build Tools for Visual Studio 2019 are installed as a
prerequisite:
https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16
cargo-generate
Note: on some Linux distros (e.g. Ubuntu) you may need to install the packages libssl-dev
and pkg-config prior to installing cargo-generate.
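A typical installation sequence on such a distro (a sketch):

$ sudo apt install libssl-dev pkg-config
$ cargo install cargo-generate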
OS-Specific Instructions
Linux
Windows
macOS
Linux
Here are the installation commands for a few Linux distributions.
Packages
Ubuntu 18.04 or newer / Debian stretch or newer
NOTE gdb-multiarch is the GDB command you'll use to debug your ARM Cortex-M
programs
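On Ubuntu 18.04 or newer the packages can be installed along these lines (the package
names are an assumption; check your distro):

$ sudo apt install gdb-multiarch openocd qemu-system-arm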
NOTE arm-none-eabi-gdb is the GDB command you'll use to debug your ARM Cortex-
M programs
Fedora 27 or newer
Arch Linux
NOTE arm-none-eabi-gdb is the GDB command you'll use to debug ARM Cortex-M
programs
udev rules
This rule lets you use OpenOCD with the Discovery board without root privilege.
Create the file /etc/udev/rules.d/70-st-link.rules with the contents shown below.
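A sketch of suitable contents; the ST-LINK/V2.1 ID (0483:374b) matches the lsusb output
shown below, while the 0483:3748 entry for the older ST-LINK/V2 is an assumption:

# STM32F3DISCOVERY rev A/B - ST-LINK/V2
ATTRS{idVendor}=="0483", ATTRS{idProduct}=="3748", TAG+="uaccess"

# STM32F3DISCOVERY rev C+ - ST-LINK/V2-1
ATTRS{idVendor}=="0483", ATTRS{idProduct}=="374b", TAG+="uaccess"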
If you had the board plugged to your laptop, unplug it and then plug it again.
lsusb
(..)
Bus 001 Device 018: ID 0483:374b STMicroelectronics ST-LINK/V2.1
(..)
Take note of the bus and device numbers. Use those numbers to create a path like
/dev/bus/usb/<bus>/<device> . Then use this path like so:
ls -l /dev/bus/usb/001/018
crw-------+ 1 root root (..)
getfacl /dev/bus/usb/001/018 | grep user
user::rw-
user:you:rw-

The user:you:rw- entry shows that your (non-root) user has read/write access to the
device, which means the udev rule took effect.
macOS
All the tools can be installed using Homebrew:
$ # GDB
$ brew install armmbed/formulae/arm-none-eabi-gcc
$ # OpenOCD
$ brew install openocd
$ # QEMU
$ brew install qemu
NOTE If OpenOCD crashes you may need to install the latest (HEAD) version using
brew install --HEAD openocd .
Alternatively, the tools can be installed using MacPorts:
$ # GDB
$ sudo port install arm-none-eabi-gcc
$ # OpenOCD
$ sudo port install openocd
$ # QEMU
$ sudo port install qemu
arm-none-eabi-gdb
ARM provides .exe installers for Windows. Grab one from here, and follow the
instructions. Just before the installation process finishes tick/select the "Add path to
environment variable" option. Then verify that the tools are in your %PATH% :
$ arm-none-eabi-gdb -v
GNU gdb (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 8.1.0.20180315-git
(..)
OpenOCD
There's no official binary release of OpenOCD for Windows but if you're not in the mood to
compile it yourself, the xPack project provides a binary distribution, here. Follow the
provided installation instructions. Then update your %PATH% environment variable to
include the path where the binaries were installed.
( C:\Users\USERNAME\AppData\Roaming\xPacks\@xpack-dev-tools\openocd\0.10.0-
13.1\.content\bin\ , if you've been using the easy install)
$ openocd -v
Open On-Chip Debugger 0.10.0
(..)
QEMU
Grab QEMU from the official website.
Connect your laptop / PC to the discovery board using a Mini-USB cable. The discovery
board has two USB connectors; use the one labeled "USB ST-LINK" that sits on the center of
the edge of the board.
Also check that the ST-LINK header is populated. See the picture below; the ST-LINK header
is highlighted.
NOTE: Old versions of openocd, including the 0.10.0 release from 2017, do not contain
the new (and preferable) interface/stlink.cfg file; instead you may need to use
interface/stlink-v2.cfg or interface/stlink-v2-1.cfg .
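Then run OpenOCD against the ST-LINK interface file and the STM32F3 target file (a
sketch, assuming a recent OpenOCD that ships interface/stlink.cfg):

$ openocd -f interface/stlink.cfg -f target/stm32f3x.cfg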
You should get the following output and the program should block the console:
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use
'transport select <transport>'.
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
Info : The selected transport took over low-level target control. The results might
differ compared to plain JTAG/SWD
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v27 API v2 SWIM v15 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 2.919881
Info : stm32f3x.cpu: hardware has 6 breakpoints, 4 watchpoints
The contents may not match exactly but you should get the last line about breakpoints and
watchpoints. If you got it then terminate the OpenOCD process and move to the next
section.
If you didn't get the "breakpoints" line then try one of the following commands.
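These are the variants that use the older interface files mentioned in the note above:

$ openocd -f interface/stlink-v2.cfg -f target/stm32f3x.cfg
$ openocd -f interface/stlink-v2-1.cfg -f target/stm32f3x.cfg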
If one of those commands works it means you got an old hardware revision of the
discovery board. That won't be a problem but commit that fact to memory as you'll need to
configure things a bit differently later on. You can move to the next section.
If none of the commands work as a normal user then try to run them with root permission
(e.g. sudo openocd .. ). If the commands do work with root permission then check that the
udev rules have been correctly set.
If you have reached this point and OpenOCD is not working please open an issue and we'll
help you out!
Getting Started
In this section we'll walk you through the process of writing, building, flashing and
debugging embedded programs. You will be able to try most of the examples without any
special hardware as we will show you the basics using QEMU, a popular open-source
hardware emulator. The only section where hardware is required is, naturally enough, the
Hardware section, where we use OpenOCD to program an STM32F3DISCOVERY.
QEMU
We'll start writing a program for the LM3S6965, a Cortex-M3 microcontroller. We have
chosen this as our initial target because it can be emulated using QEMU so you don't need
to fiddle with hardware in this section and we can focus on the tooling and the
development process.
IMPORTANT We'll use the name "app" for the project name in this tutorial. Whenever you
see the word "app" you should replace it with the name you selected for your project. Or,
you could also name your project "app" and avoid the substitutions.
Using cargo-generate
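Run cargo-generate against the quickstart template and give the project a name (the
output below mirrors the run shown in the Hardware chapter):

$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart
 Project Name: app
 Creating project called `app`...
 Done! New project created /tmp/app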
cd app
Using git
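Clone the template repository, then fill in the placeholders shown below (the clone URL
mirrors the one used with cargo-generate above):

$ git clone https://github.com/rust-embedded/cortex-m-quickstart app
$ cd app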
[package]
authors = ["{{authors}}"] # "{{authors}}" -> "John Smith"
edition = "2018"
name = "{{project-name}}" # "{{project-name}}" -> "app"
version = "0.1.0"
# ..
[[bin]]
name = "{{project-name}}" # "{{project-name}}" -> "app"
test = false
bench = false
Using neither
Grab the latest snapshot of the cortex-m-quickstart template and extract it.
Or you can browse to cortex-m-quickstart , click the green "Clone or download" button
and then click "Download ZIP".
Then fill in the placeholders in the Cargo.toml file as done in the second part of the "Using
git " version.
Program Overview
For convenience here are the most important parts of the source code in src/main.rs :
#![no_std]
#![no_main]
use panic_halt as _;
use cortex_m_rt::entry;
#[entry]
fn main() -> ! {
loop {
// your code goes here
}
}
This program is a bit different from a standard Rust program so let's take a closer look.
#![no_std] indicates that this program will not link to the standard crate, std . Instead it
will link to its subset: the core crate.
#![no_main] indicates that this program won't use the standard main interface that most
Rust programs use. The main (no pun intended) reason to go with no_main is that using
the main interface in no_std context requires nightly.
use panic_halt as _; . This crate provides a panic_handler that defines the panicking
behavior of the program. We will cover this in more detail in the Panicking chapter of the
book.
#[entry] is an attribute provided by the cortex-m-rt crate that's used to mark the entry
point of the program. As we are not using the standard main interface we need another
way to indicate the entry point of the program and that'd be #[entry] .
fn main() -> ! . Our program will be the only process running on the target hardware so
we don't want it to end! We use a divergent function (the -> ! bit in the function
signature) to ensure at compile time that'll be the case.
Cross compiling
The next step is to cross compile the program for the Cortex-M3 architecture. That's as
simple as running cargo build --target $TRIPLE if you know what the compilation target
( $TRIPLE ) should be. Luckily, the .cargo/config.toml in the template has the answer:
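A sketch of the relevant part of that file:

[build]
# Pick ONE of these compilation targets
# target = "thumbv6m-none-eabi"    # Cortex-M0 and Cortex-M0+
target = "thumbv7m-none-eabi"      # Cortex-M3
# target = "thumbv7em-none-eabi"   # Cortex-M4 and Cortex-M7 (no FPU)
# target = "thumbv7em-none-eabihf" # Cortex-M4F and Cortex-M7F (with FPU)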
To cross compile for the Cortex-M3 architecture we have to use thumbv7m-none-eabi . That
target is not automatically installed when installing the Rust toolchain, so now would be a
good time to add it, if you haven't done so already:
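$ rustup target add thumbv7m-none-eabi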
Since the thumbv7m-none-eabi compilation target has been set as the default in your
.cargo/config.toml file, the two commands below do the same:
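That is, with the default in place:

$ cargo build --target thumbv7m-none-eabi
$ cargo build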
Inspecting
Now we have a non-native ELF binary in target/thumbv7m-none-eabi/debug/app . We can
inspect it using cargo-binutils .
With cargo-readobj we can print the ELF headers to confirm that this is an ARM binary.
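A sketch of the invocation (everything after -- is passed to the underlying llvm-readobj):

$ cargo readobj --bin app -- --file-headers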
cargo-size can print the size of the linker sections of the binary.
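A typical invocation looks like this (a sketch; the output below came from a command of
this shape):

$ cargo size --bin app --release -- -A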
app :
section size addr
.vector_table 1024 0x0
.text 92 0x400
.rodata 0 0x45c
.data 0 0x20000000
.bss 0 0x20000000
.debug_str 2958 0x0
.debug_loc 19 0x0
.debug_abbrev 567 0x0
.debug_info 4929 0x0
.debug_ranges 40 0x0
.debug_macinfo 1 0x0
.debug_pubnames 2035 0x0
.debug_pubtypes 1892 0x0
.ARM.attributes 46 0x0
.debug_frame 100 0x0
.debug_line 867 0x0
Total 14570
IMPORTANT: ELF files contain metadata like debug information so their size on disk does
not accurately reflect the space the program will occupy when flashed on a device. Always
use cargo-size to check how big a binary really is.
NOTE if the above command complains about Unknown command line argument see
the following bug report: https://github.com/rust-embedded/book/issues/269
NOTE this output can differ on your system. New versions of rustc, LLVM and libraries
can generate different assembly. We truncated some of the instructions to keep the
snippet small.
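The disassembly below came from an invocation along these lines (the exact flags are a
sketch):

$ cargo objdump --bin app --release -- --disassemble --no-show-raw-insn --print-imm-hex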
app: file format ELF32-arm-little
Reset:
406: bl #0x24e
40a: movw r0, #0x0
< .. truncated any more instructions .. >
DefaultHandler_:
656: b #-0x4 <DefaultHandler_>
UsageFault:
657: strb r7, [r4, #0x3]
DefaultPreInit:
658: bx lr
__pre_init:
659: strb r7, [r0, #0x1]
__nop:
65a: bx lr
HardFaultTrampoline:
65c: mrs r0, msp
660: b #-0x2 <HardFault_>
HardFault_:
662: b #-0x4 <HardFault_>
HardFault:
663: <unknown>
Running
Next, let's see how to run an embedded program on QEMU! This time we'll use the hello
example which actually does something.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::{debug, hprintln};
#[entry]
fn main() -> ! {
hprintln!("Hello, world!").unwrap();
// exit QEMU
// NOTE do not run this on hardware; it can corrupt OpenOCD state
debug::exit(debug::EXIT_SUCCESS);
loop {}
}
This program uses something called semihosting to print text to the host console. When
using real hardware this requires a debug session but when using QEMU this Just Works.
qemu-system-arm \
-cpu cortex-m3 \
-machine lm3s6965evb \
-nographic \
-semihosting-config enable=on,target=native \
-kernel target/thumbv7m-none-eabi/debug/examples/hello
Hello, world!
The command should successfully exit (exit code = 0) after printing the text. On *nix you
can check that with the following command:
echo $?
0
-cpu cortex-m3 . This tells QEMU to emulate a Cortex-M3 CPU. Specifying the CPU
model lets us catch some miscompilation errors: for example, running a program
compiled for the Cortex-M4F, which has a hardware FPU, will make QEMU error during
its execution.
-kernel $file . This tells QEMU which binary to load and run on the emulated
machine.
Typing out that long QEMU command is too much work! We can set a custom runner to
simplify the process. .cargo/config.toml has a commented out runner that invokes
QEMU; let's uncomment it:
[target.thumbv7m-none-eabi]
# uncomment this to make `cargo run` execute programs on QEMU
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
This runner only applies to the thumbv7m-none-eabi target, which is our default compilation
target. Now cargo run will compile the program and run it on QEMU:
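A sketch of the result, using the (..) elision convention from earlier:

$ cargo run
     Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb (..)`
Hello, world!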
Debugging an embedded device involves remote debugging as the program that we want to
debug won't be running on the machine that's running the debugger program (GDB or
LLDB).
Remote debugging involves a client and a server. In a QEMU setup, the client will be a GDB
(or LLDB) process and the server will be the QEMU process that's also running the
embedded program.
qemu-system-arm \
-cpu cortex-m3 \
-machine lm3s6965evb \
-nographic \
-semihosting-config enable=on,target=native \
-gdb tcp::3333 \
-S \
-kernel target/thumbv7m-none-eabi/debug/examples/hello
This command won't print anything to the console and will block the terminal. We have
passed two extra flags this time:
-gdb tcp::3333 . This tells QEMU to wait for a GDB connection on TCP port 3333.
-S . This tells QEMU to freeze the machine at startup. Without this the program would
have reached the end of main before we had a chance to launch the debugger!
Next we launch GDB in another terminal and tell it to load the debug symbols of the
example:
gdb-multiarch -q target/thumbv7m-none-eabi/debug/examples/hello
NOTE: you might need another version of gdb instead of gdb-multiarch depending on
which one you installed in the installation chapter. This could also be arm-none-eabi-gdb or
just gdb .
Then within the GDB shell we connect to QEMU, which is waiting for a connection on TCP
port 3333.
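The command is the same one used later with OpenOCD (the output shown is a sketch):

(gdb) target remote :3333
Remote debugging using :3333
Reset () at (..)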
You'll see that the process is halted and that the program counter is pointing to a function
named Reset . That is the reset handler: what Cortex-M cores execute upon booting.
core::num::bignum::Big32x40::mul_small () at src/libcore/num/bignum.rs:254
src/libcore/num/bignum.rs: No such file or directory.
That's a known glitch. You can safely ignore those warnings; you're most likely at
Reset().
This reset handler will eventually call our main function. Let's skip all the way there using a
breakpoint and the continue command. To set the breakpoint, let's first take a look where
we would like to break in our code, with the list command.
list main
This will show the source code, from the file examples/hello.rs.
6 use panic_halt as _;
7
8 use cortex_m_rt::entry;
9 use cortex_m_semihosting::{debug, hprintln};
10
11 #[entry]
12 fn main() -> ! {
13 hprintln!("Hello, world!").unwrap();
14
15 // exit QEMU
We would like to add a breakpoint just before the "Hello, world!", which is on line 13. We do
that with the break command:
break 13
We can now instruct gdb to run up to our main function, with the continue command:
continue
Continuing.
We are now close to the code that prints "Hello, world!". Let's move forward using the next
command.
next
16 debug::exit(debug::EXIT_SUCCESS);
At this point you should see "Hello, world!" printed on the terminal that's running qemu-
system-arm .
$ qemu-system-arm (..)
Hello, world!
Run next once more to step over the debug::exit call, then end the GDB session:

next
quit
Hardware
By now you should be somewhat familiar with the tooling and the development process. In
this section we'll switch to real hardware; the process will remain largely the same. Let's
dive in.
Before we begin you need to identify some characteristics of the target device, as these
will be used to configure the project:

Does the ARM core include an FPU? Cortex-M4F and Cortex-M7F cores do.
How much Flash memory and RAM does the target device have? e.g. 256 KiB of Flash
and 32 KiB of RAM.
Where are Flash memory and RAM mapped in the address space? e.g. RAM is
commonly located at address 0x2000_0000 .
You can find this information in the data sheet or the reference manual of your device.
In this section we'll be using our reference hardware, the STM32F3DISCOVERY. This board
contains an STM32F303VCT6 microcontroller. This microcontroller has:
256 KiB of Flash located at address 0x0800_0000.
40 KiB of RAM located at address 0x2000_0000. (There's another RAM region but for
simplicity we'll ignore it.)
Configuring
We'll start from scratch with a fresh template instance. Refer to the previous section on
QEMU for a refresher on how to do this without cargo-generate .
$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart
Project Name: app
Creating project called `app`...
Done! New project created /tmp/app
$ cd app
NOTE: As you may remember from the previous chapter, we have to install all targets
and this is a new one. So don't forget to run the installation process rustup target
add thumbv7em-none-eabihf for this target.
The second step is to enter the memory region information into the memory.x file.
$ cat memory.x
/* Linker script for the STM32F303VCT6 */
MEMORY
{
/* NOTE 1 K = 1 KiBi = 1024 bytes */
FLASH : ORIGIN = 0x08000000, LENGTH = 256K
RAM : ORIGIN = 0x20000000, LENGTH = 40K
}
NOTE: If you for some reason changed the memory.x file after you had made the first
build of a specific build target, then do cargo clean before cargo build , because
cargo build may not track updates of memory.x .
We'll start with the hello example again, but first we have to make a small change.
// exit QEMU
// NOTE do not run this on hardware; it can corrupt OpenOCD state
// debug::exit(debug::EXIT_SUCCESS);
loop {}
}
You can now cross compile programs using cargo build and inspect the binaries using
cargo-binutils as you did before. The cortex-m-rt crate handles all the magic required to
get your chip running, as helpfully, pretty much all Cortex-M CPUs boot in the same
fashion.
Debugging
Debugging will look a bit different. In fact, the first steps can look different depending on
the target device. In this section we'll show the steps required to debug a program running
on the STM32F3DISCOVERY. This is meant to serve as a reference; for device specific
information about debugging check out the Debugonomicon.
As before we'll do remote debugging and the client will be a GDB process. This time,
however, the server will be OpenOCD.
As you did during the verification section, connect the discovery board to your laptop / PC
and check that the ST-LINK header is populated.
On a terminal run openocd to connect to the ST-LINK on the discovery board. Run this
command from the root of the template; openocd will pick up the openocd.cfg file which
indicates which interface file and target file to use.
cat openocd.cfg
# Sample OpenOCD configuration for the STM32F3DISCOVERY development board
# Depending on the hardware revision you got you'll have to pick ONE of these
# interfaces. At any time only one interface should be commented out.
source [find interface/stlink.cfg]
# source [find interface/stlink-v2.cfg]

source [find target/stm32f3x.cfg]
NOTE If you found out that you have an older revision of the discovery board during
the verify section then you should modify the openocd.cfg file at this point to use
interface/stlink-v2.cfg .
$ openocd
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use
'transport select <transport>'.
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
Info : The selected transport took over low-level target control. The results might
differ compared to plain JTAG/SWD
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v27 API v2 SWIM v15 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 2.913879
Info : stm32f3x.cpu: hardware has 6 breakpoints, 4 watchpoints
On another terminal run GDB, also from the root of the template.
gdb-multiarch -q target/thumbv7em-none-eabihf/debug/examples/hello
NOTE: like before you might need another version of gdb instead of gdb-multiarch
depending on which one you installed in the installation chapter. This could also be arm-
none-eabi-gdb or just gdb .
Next connect GDB to OpenOCD, which is waiting for a TCP connection on port 3333.
(gdb) target remote :3333
Remote debugging using :3333
0x00000000 in ?? ()
Now proceed to flash (load) the program onto the microcontroller using the load
command.
(gdb) load
Loading section .vector_table, size 0x400 lma 0x8000000
Loading section .text, size 0x1518 lma 0x8000400
Loading section .rodata, size 0x414 lma 0x8001918
Start address 0x08000400, load size 7468
Transfer rate: 13 KB/sec, 2489 bytes/write.
The program is now loaded. This program uses semihosting so before we do any
semihosting call we have to tell OpenOCD to enable semihosting. You can send commands
to OpenOCD using the monitor command.
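The OpenOCD command for this is arm semihosting enable , issued from GDB like so:

(gdb) monitor arm semihosting enable
semihosting is enabled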
You can see all the OpenOCD commands by invoking the monitor help command.
Like before we can skip all the way to main using a breakpoint and the continue command.
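Set the breakpoint first (the reported address will vary from build to build):

(gdb) break main
Breakpoint 1 at (..): file examples/hello.rs, line 13.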
(gdb) continue
Continuing.
NOTE If GDB blocks the terminal instead of hitting the breakpoint after you issue the
continue command above, you might want to double check that the memory region
information in the memory.x file is correctly set up for your device (both the starts and
lengths).
After advancing the program with next you should see "Hello, world!" printed on the
OpenOCD console, among other stuff.
$ openocd
(..)
Info : halted: PC: 0x08000502
Hello, world!
Info : halted: PC: 0x080004ac
Info : halted: PC: 0x080004ae
Info : halted: PC: 0x080004b0
Info : halted: PC: 0x080004b4
Info : halted: PC: 0x080004b8
Info : halted: PC: 0x080004bc
The message is only displayed once as the program is about to enter the infinite loop
defined in line 19: loop {}
(gdb) quit
A debugging session is active.
Quit anyway? (y or n)
Debugging now requires a few more steps so we have packed all those steps into a single
GDB script named openocd.gdb . The file was created during the cargo generate step, and
should work without any modifications. Let's have a peek:
cat openocd.gdb
target extended-remote :3333
load
Alternatively, you can turn <gdb> -x openocd.gdb into a custom runner to make cargo run
build a program and start a GDB session. This runner is included in .cargo/config.toml
but it's commented out.
[target.thumbv7em-none-eabihf]
# uncomment this to make `cargo run` start a GDB session
# runner = "gdb-multiarch -x openocd.gdb"
You may well find that the code you need to access the peripherals in your micro-controller
has already been written, at one of the following levels:
Micro-architecture Crate - This sort of crate handles any useful routines common to
the processor core your microcontroller is using, as well as any peripherals that are
common to all micro-controllers that use that particular type of processor core. For
example the cortex-m crate gives you functions to enable and disable interrupts,
which are the same for all Cortex-M based micro-controllers. It also gives you access
to the 'SysTick' peripheral included with all Cortex-M based micro-controllers.
Peripheral Access Crate (PAC) - This sort of crate is a thin wrapper over the various
memory-mapped registers defined for the particular part-number of micro-
controller you are using. For example, tm4c123x for the Texas Instruments Tiva-C
TM4C123 series, or stm32f30x for the ST-Micro STM32F30x series. Here, you'll be
interacting with the registers directly, following each peripheral's operating
instructions given in your micro-controller's Technical Reference Manual.
HAL Crate - These crates offer a more user-friendly API for your particular processor,
often by implementing some common traits defined in embedded-hal. For example,
this crate might offer a Serial struct, with a constructor that takes an appropriate
set of GPIO pins and a baud rate, and offers some sort of write_byte function for
sending data. See the chapter on Portability for more information on embedded-hal.
Board Crate - These crates go one step further than a HAL Crate by pre-configuring
various peripherals and GPIO pins to suit the specific developer kit or board you are
using, such as stm32f3-discovery for the STM32F3DISCOVERY board.
Board Crate
A board crate is the perfect starting point if you're new to embedded Rust. It nicely
abstracts the HW details that might be overwhelming when you start studying this subject,
and makes standard tasks easy, like turning an LED on or off. The functionality it exposes
varies a lot between boards. Since this book aims to stay hardware agnostic, board
crates won't be covered by this book.
But if you're working on a system that doesn't yet have a dedicated board crate, or you need
functionality not provided by existing crates, read on as we start from the bottom, with the
micro-architecture crates.
Micro-architecture crate
Let's look at the SysTick peripheral that's common to all Cortex-M based micro-controllers.
We can find a pretty low-level API in the cortex-m crate, and we can use it like this:
#![no_std]
#![no_main]
use cortex_m::peripheral::{syst, Peripherals};
use cortex_m_rt::entry;
use panic_halt as _;
#[entry]
fn main() -> ! {
let peripherals = Peripherals::take().unwrap();
let mut systick = peripherals.SYST;
systick.set_clock_source(syst::SystClkSource::Core);
systick.set_reload(1_000);
systick.clear_current();
systick.enable_counter();
while !systick.has_wrapped() {
// Loop
}
loop {}
}
The functions on the SYST struct map pretty closely to the functionality defined by the
ARM Technical Reference Manual for this peripheral. There's nothing in this API about
'delaying for X milliseconds' - we have to crudely implement that ourselves using a while
loop. Note that we can't access our SYST struct until we have called Peripherals::take() -
this is a special routine that guarantees that there is only one SYST structure in our entire
program. For more on that, see the Peripherals section.
#![no_std]
#![no_main]

use panic_halt as _;
use cortex_m_rt::entry;
use tm4c123x;

#[entry]
fn main() -> ! {
    let _cp = cortex_m::Peripherals::take().unwrap();
    let p = tm4c123x::Peripherals::take().unwrap();

    // configure PWM0 (register and field names come from the
    // svd2rust-generated tm4c123x PAC)
    let pwm = p.PWM0;
    pwm.ctl.write(|w| w.globalsync0().clear_bit());
    // `load` and `compa` take raw values, so these writes are `unsafe`
    pwm._2_load.write(|w| unsafe { w.load().bits(263) });
    pwm._2_cmpa.write(|w| unsafe { w.compa().bits(64) });

    loop {}
}
We've accessed the PWM0 peripheral in exactly the same way as we accessed the SYST
peripheral earlier, except we called tm4c123x::Peripherals::take() . As this crate was auto-
generated using svd2rust, the access functions for our register fields take a closure, rather
than a numeric argument. While this looks like a lot of code, the Rust compiler can use it to
perform a bunch of checks for us, but then generate machine-code which is pretty close to
hand-written assembler! Where the auto-generated code isn't able to determine that all
possible arguments to a particular accessor function are valid (for example, if the SVD
defines the register as 32-bit but doesn't say if some of those 32-bit values have a special
meaning), then the function is marked as unsafe . We can see this in the example above
when setting the load and compa sub-fields using the bits() function.
Reading
The read() function returns an object which gives read-only access to the various sub-
fields within this register, as defined by the manufacturer's SVD file for this chip. You can
find all the functions available on the special R return type for this particular register, in
this particular peripheral, on this particular chip, in the tm4c123x documentation.
if pwm.ctl.read().globalsync0().is_set() {
// Do a thing
}
Writing
The write() function takes a closure with a single argument. Typically we call this w . This
argument then gives read-write access to the various sub-fields within this register, as
defined by the manufacturer's SVD file for this chip. Again, you can find all the functions
available on the 'w' for this particular register, in this particular peripheral, on this
particular chip, in the tm4c123x documentation. Note that all of the sub-fields that we do
not set will be set to a default value for us - any existing content in the register will be lost.
pwm.ctl.write(|w| w.globalsync0().clear_bit());
Modifying
If we wish to change only one particular sub-field in this register and leave the other sub-
fields unchanged, we can use the modify function. This function takes a closure with two
arguments - one for reading and one for writing. Typically we call these r and w
respectively. The r argument can be used to inspect the current contents of the register,
and the w argument can be used to modify the register contents.
pwm.ctl.modify(|r, w| w.globalsync0().clear_bit());
The modify function really shows the power of closures here. In C, we'd have to read into
some temporary value, modify the correct bits and then write the value back. This means
there's considerable scope for error:
uint32_t temp = pwm0.ctl.read();
temp |= PWM0_CTL_GLOBALSYNC0;
pwm0.ctl.write(temp);
uint32_t temp2 = pwm0.enable.read();
temp2 |= PWM0_ENABLE_PWM4EN;
pwm0.enable.write(temp); // Uh oh! Wrong variable!
#![no_std]
#![no_main]

use panic_halt as _;

use core::fmt::Write;
use cortex_m_rt::entry;
use tm4c123x_hal as hal;
use tm4c123x_hal::prelude::*;
use tm4c123x_hal::serial::{NewlineMode, Serial};
use tm4c123x_hal::sysctl;

#[entry]
fn main() -> ! {
    let p = hal::Peripherals::take().unwrap();
    let cp = hal::CorePeripherals::take().unwrap();

    // clock, pin and UART configuration elided in this excerpt; a `Serial`
    // instance called `uart` is built here with `Serial::uart0(..)` from
    // the pieces above

    loop {
        writeln!(uart, "Hello, World!\r\n").unwrap();
    }
}
Semihosting
Semihosting is a mechanism that lets embedded devices do I/O on the host and is mainly
used to log messages to the host console. Semihosting requires a debug session and
pretty much nothing else (no extra wires!) so it's super convenient to use. The downside is
that it's super slow: each write operation can take several milliseconds depending on the
hardware debugger (e.g. ST-Link) you use.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::hprintln;
#[entry]
fn main() -> ! {
hprintln!("Hello, world!").unwrap();
loop {}
}
If you run this program on hardware you'll see the "Hello, world!" message within the
OpenOCD logs.
$ openocd
(..)
Hello, world!
(..)
QEMU understands semihosting operations so the above program will also work with
qemu-system-arm without having to start a debug session. Note that you'll need to pass the
-semihosting-config flag to QEMU to enable semihosting support; these flags are already
included in the .cargo/config.toml file of the template.
$ # this program will block the terminal
$ cargo run
Running `qemu-system-arm (..)
Hello, world!
There's also an exit semihosting operation that can be used to terminate the QEMU
process. Important: do not use debug::exit on hardware; this function can corrupt your
OpenOCD session and you will not be able to debug more programs until you restart it.
#![no_main]
#![no_std]
use panic_halt as _;
use cortex_m_rt::entry;
use cortex_m_semihosting::debug;
#[entry]
fn main() -> ! {
let roses = "blue";
if roses == "red" {
debug::exit(debug::EXIT_SUCCESS);
} else {
debug::exit(debug::EXIT_FAILURE);
}
loop {}
}
$ cargo run
Running `qemu-system-arm (..)
$ echo $?
1
One last tip: you can set the panicking behavior to exit(EXIT_FAILURE) . This will let you
write no_std run-pass tests that you can run on QEMU.
For convenience, the panic-semihosting crate has an "exit" feature that when enabled
invokes exit(EXIT_FAILURE) after logging the panic message to the host stderr.
#![no_main]
#![no_std]
use cortex_m_rt::entry;
use cortex_m_semihosting::debug;
#[entry]
fn main() -> ! {
let roses = "blue";
assert_eq!(roses, "red");
loop {}
}
$ cargo run
Running `qemu-system-arm (..)
panicked at 'assertion failed: `(left == right)`
left: `"blue"`,
right: `"red"`', examples/hello.rs:15:5
$ echo $?
1
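To enable that feature you would declare the dependency in Cargo.toml along these lines
(a sketch):

[dependencies]
panic-semihosting = { version = "VERSION", features = ["exit"] }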
where VERSION is the version desired. For more information on dependency features
check the specifying dependencies section of the Cargo book.
Panicking
Panicking is a core part of the Rust language. Built-in operations like indexing are runtime
checked for memory safety. When out of bounds indexing is attempted this results in a
panic.
In the standard library panicking has a defined behavior: it unwinds the stack of the
panicking thread, unless the user opted for aborting the program on panics.
In programs without the standard library, however, the panicking behavior is left undefined. A
behavior can be chosen by declaring a #[panic_handler] function. This function must
appear exactly once in the dependency graph of a program, and must have the following
signature: fn(&PanicInfo) -> ! , where PanicInfo is a struct containing information about
the location of the panic.
Given that embedded systems range from user facing to safety critical (cannot crash)
there's no one size fits all panicking behavior but there are plenty of commonly used
behaviors. These common behaviors have been packaged into crates that define the
#[panic_handler] function. Some examples include:

panic-abort. A panic causes the abort instruction to be executed.
panic-halt. A panic causes the program, or the current thread, to halt by entering an
infinite loop.
panic-semihosting. A panic causes the panic message to be logged to the host using
semihosting.
You may be able to find even more crates searching for the panic-handler keyword on
crates.io.
A program can pick one of these behaviors simply by linking to the corresponding crate.
The fact that the panicking behavior is expressed in the source of an application as a single
line of code is not only useful as documentation but can also be used to change the
panicking behavior according to the compilation profile. For example:
#![no_main]
#![no_std]

// dev profile: easier to debug panics; can put a breakpoint on `rust_begin_unwind`
#[cfg(debug_assertions)]
use panic_halt as _;

// release profile: minimize the binary size of the application
#[cfg(not(debug_assertions))]
use panic_abort as _;

// ..
In this example the crate links to the panic-halt crate when built with the dev profile
( cargo build ), but links to the panic-abort crate when built with the release profile ( cargo
build --release ).
The use panic_abort as _; form of the use statement is used to ensure the
panic_abort panic handler is included in our final executable while making it clear to
the compiler that we won't explicitly use anything from the crate. Without the as _
rename, the compiler would warn that we have an unused import. Sometimes you
might see extern crate panic_abort instead, which is an older style used before the
2018 edition of Rust, and should now only be used for "sysroot" crates (those
distributed with Rust itself) such as proc_macro , alloc , std , and test .
An example
Here's an example that tries to index an array beyond its length. The operation results in a
panic.
#![no_main]
#![no_std]
use panic_semihosting as _;
use cortex_m_rt::entry;
#[entry]
fn main() -> ! {
let xs = [0, 1, 2];
let i = xs.len() + 1;
let _y = xs[i]; // out of bounds access
loop {}
}
This example chose the panic-semihosting behavior which prints the panic message to the
host console using semihosting.
$ cargo run
Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb (..)
panicked at 'index out of bounds: the len is 3 but the index is 4',
src/main.rs:12:13
You can try changing the behavior to panic-halt and confirm that no message is printed
in that case.
Exceptions
Exceptions and interrupts are a hardware mechanism by which the processor handles
asynchronous events and fatal errors (e.g. executing an invalid instruction). Exceptions
imply preemption and involve exception handlers, subroutines executed in response to the
signal that triggered the event.
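On Cortex-M, exception handlers are declared with the exception attribute from the
cortex-m-rt crate. A minimal sketch, mirroring the handler shown later in this section:

use cortex_m_rt::exception;

#[exception]
fn SysTick() {
    // ..
}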
Other than the exception attribute, exception handlers look like plain functions, but there's
one more difference: exception handlers can not be called by software. Following the
previous example, the statement SysTick(); would result in a compilation error.
This behavior is pretty much intended and it's required to provide a feature: static mut
variables declared inside exception handlers are safe to use.
#[exception]
fn SysTick() {
static mut COUNT: u32 = 0;
// `COUNT` has transformed to type `&mut u32` and it's safe to use
*COUNT += 1;
}
As you may know, using static mut variables in a function makes it non-reentrant. It's
undefined behavior to call a non-reentrant function, directly or indirectly, from more than
one exception / interrupt handler or from main and one or more exception / interrupt
handlers.
Safe Rust must never result in undefined behavior, so non-reentrant functions must be
marked as unsafe . Yet I just told you that exception handlers can safely use static mut
variables. How is this possible? This is possible because exception handlers can not be
called by software, thus reentrancy is not possible.
Note that the exception attribute transforms definitions of static variables inside the
function by wrapping them into unsafe blocks and providing us with new appropriate
variables of type &mut of the same name. Thus we can dereference the reference via
* to access the values of the variables without needing to wrap them in an unsafe
block.
A complete example
Here's an example that uses the system timer to raise a SysTick exception roughly every
second. The SysTick exception handler keeps track of how many times it has been called
in the COUNT variable and then prints the value of COUNT to the host console using
semihosting.
NOTE: You can run this example on any Cortex-M device; you can also run it on QEMU
#![deny(unsafe_code)]
#![no_main]
#![no_std]
use panic_halt as _;
use core::fmt::Write;
use cortex_m::peripheral::syst::SystClkSource;
use cortex_m_rt::{entry, exception};
use cortex_m_semihosting::{
debug,
hio::{self, HStdout},
};
#[entry]
fn main() -> ! {
    let p = cortex_m::Peripherals::take().unwrap();

    let mut syst = p.SYST;

    // configures the system timer to trigger a SysTick exception every second
    syst.set_clock_source(SystClkSource::Core);
    syst.set_reload(12_000_000); // period = 1s on a 12 MHz core clock
    syst.enable_counter();
    syst.enable_interrupt();

    loop {}
}
#[exception]
fn SysTick() {
    static mut COUNT: u32 = 0;
    static mut STDOUT: Option<HStdout> = None;

    *COUNT += 1;

    // Lazy initialization
    if STDOUT.is_none() {
        *STDOUT = hio::hstdout().ok();
    }

    if let Some(hstdout) = STDOUT.as_mut() {
        write!(hstdout, "{}", *COUNT).ok();
    }

    // IMPORTANT omit this `if` block if running on real hardware
    if *COUNT == 9 {
        // This will terminate the QEMU process
        debug::exit(debug::EXIT_SUCCESS);
    }
}
If you run this on the Discovery board you'll see the output on the OpenOCD console. Also,
the program will not stop when the count reaches 9.
When an exception is not assigned a specific handler it is serviced by a catch-all default
handler, which out of the box is just an infinite loop:

fn DefaultHandler() {
    loop {}
}
This function is provided by the cortex-m-rt crate and marked as #[no_mangle] so you
can put a breakpoint on "DefaultHandler" and catch unhandled exceptions.

You can override DefaultHandler using the exception attribute:

#[exception]
fn DefaultHandler(irqn: i16) {
// custom default handler
}
The irqn argument indicates which exception is being serviced. A negative value indicates
that a Cortex-M exception is being serviced; zero or a positive value indicates that a
device specific exception, AKA interrupt, is being serviced.
NOTE: This program won't work, i.e. it won't crash, on QEMU because qemu-system-
arm -machine lm3s6965evb doesn't check memory loads and will happily return 0 on
reads to invalid memory.
#![no_main]
#![no_std]

use panic_halt as _;

use core::fmt::Write;
use core::ptr;

use cortex_m_rt::{entry, exception, ExceptionFrame};
use cortex_m_semihosting::hio;
#[entry]
fn main() -> ! {
// read a nonexistent memory location
unsafe {
ptr::read_volatile(0x3FFF_FFFE as *const u32);
}
loop {}
}
#[exception]
fn HardFault(ef: &ExceptionFrame) -> ! {
if let Ok(mut hstdout) = hio::hstdout() {
writeln!(hstdout, "{:#?}", ef).ok();
}
loop {}
}
The HardFault handler prints the ExceptionFrame value. If you run this you'll see
something like this on the OpenOCD console.
$ openocd
(..)
ExceptionFrame {
r0: 0x3ffffffe,
r1: 0x00f00000,
r2: 0x20000000,
r3: 0x00000000,
r12: 0x00000000,
lr: 0x080008f7,
pc: 0x0800094a,
xpsr: 0x61000000
}
The pc value is the value of the Program Counter at the time of the exception and it points
to the instruction that triggered the exception.
You can look up the value of the program counter 0x0800094a in the disassembly. You'll see
that a load operation ( ldr r0, [r0] ) caused the exception. The r0 field of
ExceptionFrame will tell you that the value of register r0 was 0x3fff_fffe at that time.
Interrupts
Interrupts differ from exceptions in a variety of ways but their operation and use is largely
similar and they are also handled by the same interrupt controller. Whereas exceptions are
defined by the Cortex-M architecture, interrupts are always vendor (and often even chip)
specific implementations, both in naming and functionality.
Interrupts do allow for a lot of flexibility which needs to be accounted for when attempting
to use them in an advanced way. We will not cover those uses in this book, however it is a
good idea to keep the following in mind:
Interrupt handlers look like plain functions (except for the lack of arguments) similar to
exception handlers. However they can not be called directly by other parts of the firmware
due to the special calling conventions. It is however possible to generate interrupt requests
in software to trigger a diversion to the interrupt handler.
Similar to exception handlers it is also possible to declare static mut variables inside the
interrupt handlers for safe state keeping.
#[interrupt]
fn TIM2() {
    static mut COUNT: u32 = 0;

    // `COUNT` has type `&mut u32` and it's safe to use
    *COUNT += 1;
}
For a more detailed description about the mechanisms demonstrated here please refer to
the exceptions section.
IO
These peripherals are useful because they allow a developer to offload processing to them,
avoiding having to handle everything in software. Similar to how a desktop developer
would offload graphics processing to a video card, embedded developers can offload some
tasks to peripherals allowing the CPU to spend its time doing something else important, or
doing nothing in order to save power.
If you look at the main circuit board in an old-fashioned home computer from the 1970s or
1980s (and actually, the desktop PCs of yesterday are not so far removed from the
embedded systems of today) you would expect to see:
A processor
A RAM chip
A ROM chip
An I/O controller
The RAM chip, ROM chip and I/O controller (the peripheral in this system) would be joined
to the processor through a series of parallel traces known as a 'bus'. This bus carries
address information, which selects which device on the bus the processor wishes to
communicate with, and a data bus which carries the actual data. In our embedded
microcontrollers, the same principles apply - it's just that everything is packed on to a
single piece of silicon.
However, unlike graphics cards, which typically have a Software API like Vulkan, Metal, or
OpenGL, peripherals are exposed to our Microcontroller with a hardware interface, which
is mapped to a chunk of the memory.
Although 32-bit microcontrollers have a real and linear address space from 0x0000_0000
to 0xFFFF_FFFF , they generally only use a few hundred kilobytes of that range for actual
memory. This leaves a significant amount of address space remaining. In earlier chapters,
we were talking about RAM being located at address 0x2000_0000 . If our RAM was 64 KiB
long (i.e. with a maximum address of 0xFFFF) then addresses 0x2000_0000 to 0x2000_FFFF
would correspond to our RAM. When we write to a variable which lives at address
0x2000_1234 , what happens internally is that some logic detects the upper portion of the
address (0x2000 in this example) and then activates the RAM so that it can act upon the
lower portion of the address (0x1234 in this case). On a Cortex-M we also have our Flash
ROM mapped in at address 0x0000_0000 up to, say, address 0x0007_FFFF (if we have a 512
KiB Flash ROM). Rather than ignore all remaining space between these two regions,
Microcontroller designers instead mapped the interface for peripherals in certain memory
locations. This ends up looking something like this:
This interface is how interactions with the hardware are made, no matter what language is
used, whether that language is Assembly, C, or Rust.
A First Attempt
The Registers
Let's look at the 'SysTick' peripheral - a simple timer which comes with every Cortex-M
processor core. Typically you'll be looking these up in the chip manufacturer's data sheet
or Technical Reference Manual, but since this example is common to all ARM Cortex-M cores,
let's look in the ARM reference manual. We see there are four registers:

Offset  Name        Description                   Width
0x00    SYST_CSR    Control and Status Register   32 bits
0x04    SYST_RVR    Reload Value Register         32 bits
0x08    SYST_CVR    Current Value Register        32 bits
0x0C    SYST_CALIB  Calibration Value Register    32 bits
The C Approach
In Rust, we can represent a collection of registers in exactly the same way as we do in C -
with a struct .
#[repr(C)]
struct SysTick {
pub csr: u32,
pub rvr: u32,
pub cvr: u32,
pub calib: u32,
}
The qualifier #[repr(C)] tells the Rust compiler to lay this structure out like a C compiler
would. That's very important, as Rust allows structure fields to be re-ordered, while C does
not. You can imagine the debugging we'd have to do if these fields were silently re-
arranged by the compiler! With this qualifier in place, we have our four 32-bit fields which
correspond to the table above. But of course, this struct is of no use by itself - we need a
variable.
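The most direct variable is a raw pointer cast from the peripheral's fixed address
(0xE000_E010 is SysTick's base address, the same one used further below):

let systick = 0xE000_E010 as *mut SysTick;
let time = unsafe { (*systick).cvr };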
Now, the problem is that compilers are clever. If you make two writes to the same piece of
RAM, one after the other, the compiler can notice this and just skip the first write entirely.
In C, we can mark variables as volatile to ensure that every read or write occurs as
intended. In Rust, we instead mark the accesses as volatile, not the variable.
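A sketch of what that looks like with core::ptr, reusing the struct and address from above:

use core::ptr;

let systick = 0xE000_E010 as *mut SysTick;
unsafe {
    // a volatile write cannot be elided or reordered by the compiler
    ptr::write_volatile(&mut (*systick).rvr, 1_000);
    // a volatile read always performs the actual load
    let _time = ptr::read_volatile(&(*systick).cvr);
}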
So, we've fixed one of our four problems, but now we have even more unsafe code!
Fortunately, there's a third party crate which can help - volatile_register .
use volatile_register::{RW, RO};

#[repr(C)]
struct SysTick {
    pub csr: RW<u32>,
    pub rvr: RW<u32>,
    pub cvr: RW<u32>,
    pub calib: RO<u32>,
}
Now, the volatile accesses are performed automatically through the read and write
methods. It's still unsafe to perform writes, but to be fair, hardware is a bunch of mutable
state and there's no way for the compiler to know whether these writes are actually safe,
so this is a good default position.
The Rusty Wrapper
We need to wrap this struct up into a higher-layer API that is safe for our users to call. As
the driver author, we manually verify the unsafe code is correct, and then present a safe
API for our users so they don't have to worry about it (provided they trust us to get it
right!).
use volatile_register::{RW, RO};

pub struct SystemTimer {
    p: &'static mut RegisterBlock,
}

#[repr(C)]
struct RegisterBlock {
    pub csr: RW<u32>,
    pub rvr: RW<u32>,
    pub cvr: RW<u32>,
    pub calib: RO<u32>,
}

impl SystemTimer {
    pub fn new() -> SystemTimer {
        SystemTimer {
            p: unsafe { &mut *(0xE000_E010 as *mut RegisterBlock) },
        }
    }

    pub fn get_time(&self) -> u32 {
        self.p.cvr.read()
    }

    pub fn set_reload(&mut self, reload_value: u32) {
        unsafe { self.p.rvr.write(reload_value) }
    }
}
Now, the problem with this approach is that the following code is perfectly acceptable to
the compiler:
fn thread1() {
let mut st = SystemTimer::new();
st.set_reload(2000);
}
fn thread2() {
let mut st = SystemTimer::new();
st.set_reload(1000);
}
The &mut self argument to the set_reload function checks that there are no other
references to that particular SystemTimer struct, but it doesn't stop the user creating a
second SystemTimer which points to the exact same peripheral! Code written in this
fashion will work if the author is diligent enough to spot all of these 'duplicate' driver
instances, but once the code is spread out over multiple modules, drivers, developers, and
days, it gets easier and easier to make these kinds of mistakes.
Mutable Global State
Unfortunately, hardware is basically nothing but mutable global state, which can feel very
frightening for a Rust developer. Hardware exists independently from the structures of the
code we write, and can be modified at any time by the real world.
We can still work with mutable global state safely, but for the Borrow Checker to help us we
need to have exactly one instance of each peripheral, so Rust can handle this correctly.
Luckily, in the hardware there is only one instance of any given peripheral, but how can we
expose that in the structure of our code?
Singletons
We could make all our peripherals public statics, like this:

static mut THE_SERIAL_PORT: SerialPort = SerialPort;

fn main() {
    let _ = unsafe {
        THE_SERIAL_PORT.read_speed();
    };
}
But this has a few problems. It is a mutable global variable, and in Rust, these are always
unsafe to interact with. These variables are also visible across your whole program, which
means the borrow checker is unable to help you track references and ownership of these
variables.
use core::mem::replace;

struct Peripherals {
    serial: Option<SerialPort>,
}
impl Peripherals {
fn take_serial(&mut self) -> SerialPort {
let p = replace(&mut self.serial, None);
p.unwrap()
}
}
static mut PERIPHERALS: Peripherals = Peripherals {
serial: Some(SerialPort),
};
This structure allows us to obtain a single instance of our peripheral. If we try to call
take_serial() more than once, our code will panic!
fn main() {
let serial_1 = unsafe { PERIPHERALS.take_serial() };
// This panics!
// let serial_2 = unsafe { PERIPHERALS.take_serial() };
}
Although interacting with this structure is unsafe , once we have the SerialPort it
contained, we no longer need to use unsafe , or the PERIPHERALS structure at all.
This has a small runtime overhead because we must wrap the SerialPort structure in an option, and we need to call take_serial() once; however, this small up-front cost allows us to leverage the borrow checker throughout the rest of our program.
Although we could write this structure ourselves, the cortex_m crate provides the singleton! macro, which performs the same pattern for us:

use cortex_m::singleton;

fn main() {
    // OK if `main` is executed only once
    let x: &'static mut bool =
        singleton!(: bool = false).unwrap();
}

See the cortex_m docs for details.
Additionally, if you use cortex-m-rtic , the entire process of defining and obtaining these peripherals is abstracted for you, and you are instead handed a Peripherals structure that contains a non- Option<T> version of all of the items you define.
// cortex-m-rtic v0.5.x
#[rtic::app(device = lm3s6965, peripherals = true)]
const APP: () = {
    #[init]
    fn init(cx: init::Context) {
        static mut X: u32 = 0;

        // Cortex-M peripherals
        let core: cortex_m::Peripherals = cx.core;

        // Device specific peripherals
        let device: lm3s6965::Peripherals = cx.device;
    }
};
But why?
How do these singletons make a noticeable difference in how our Rust code works?
use core::ptr;

impl SerialPort {
    const SER_PORT_SPEED_REG: *mut u32 = 0x4000_1000 as _;

    fn read_speed(
        &self // <------ This is really, really important
    ) -> u32 {
        unsafe {
            ptr::read_volatile(Self::SER_PORT_SPEED_REG)
        }
    }
}
There are two important factors at play here:

Because we are using a singleton, there is only one way or place to obtain a SerialPort structure
To call the read_speed() method, we must have ownership or a reference to a SerialPort structure

These two factors put together mean that it is only possible to access the hardware if we have appropriately satisfied the borrow checker, meaning that at no point do we have multiple mutable references to the same hardware!
fn main() {
    // missing reference to `self`! Won't work.
    // SerialPort::read_speed();

    let serial_1 = unsafe { PERIPHERALS.take_serial() };
    // you can only read what you have access to
    let _ = serial_1.read_speed();
}

Additionally, because some references are mutable and some are immutable, it becomes possible to see whether a function or method could potentially modify the state of the hardware. For example, this is allowed to change hardware settings:

fn setup_spi_port(
    spi: &mut SpiPort,
    cs_pin: &mut GpioPin
) -> Result<()> {
    // ...
}

This isn't:

fn read_button(gpio: &GpioPin) -> bool {
    // ...
}
This allows us to enforce whether code should or should not make changes to hardware at
compile time, rather than at runtime. As a note, this generally only works across one
application, but for bare metal systems, our software will be compiled into a single
application, so this is not usually a restriction.
Static Guarantees
Rust's type system prevents data races at compile time (see Send and Sync traits). The
type system can also be used to check other properties at compile time; reducing the need
for runtime checks in some cases.
When applied to embedded programs these static checks can be used, for example, to
enforce that configuration of I/O interfaces is done properly. For instance, one can design
an API where it is only possible to initialize a serial interface by first configuring the pins
that will be used by the interface.
One can also statically check that operations, like setting a pin low, can only be performed
on correctly configured peripherals. For example, trying to change the output state of a pin
configured in floating input mode would raise a compile error.
And, as seen in the previous chapter, the concept of ownership can be applied to
peripherals to ensure that only certain parts of a program can modify a peripheral. This
access control makes software easier to reason about compared to the alternative of
treating peripherals as global mutable state.
Typestate Programming
The concept of typestates describes the encoding of information about the current state of
an object into the type of that object. Although this can sound a little arcane, if you have
used the Builder Pattern in Rust, you have already started using Typestate Programming!
mod foo_module {
    #[derive(Debug)]
    pub struct Foo { inner: u32 }

    pub struct FooBuilder { a: u32, b: u32 }

    impl FooBuilder {
        pub fn new(starter: u32) -> Self {
            Self { a: starter, b: starter }
        }

        pub fn double_a(self) -> Self {
            Self { a: self.a * 2, b: self.b }
        }

        pub fn into_foo(self) -> Foo {
            Foo { inner: self.a + self.b }
        }
    }
}

fn main() {
    let x = foo_module::FooBuilder::new(10)
        .double_a()
        .into_foo();
    println!("{:#?}", x);
}
In this example, there is no direct way to create a Foo object. We must create a
FooBuilder , and properly initialize it before we can obtain the Foo object we want.
Strong Types
Because Rust has a Strong Type System, there is no easy way to magically create an
instance of Foo , or to turn a FooBuilder into a Foo without calling the into_foo()
method. Additionally, calling the into_foo() method consumes the original FooBuilder
structure, meaning it can not be reused without the creation of a new instance.
This allows us to represent the states of our system as types, and to include the necessary
actions for state transitions into the methods that exchange one type for another. By
creating a FooBuilder , and exchanging it for a Foo object, we have walked through the
steps of a basic state machine.
Peripherals as State Machines
The peripherals of a microcontroller can be thought of as a set of state machines. For example, the configuration of a simplified GPIO pin could be represented as the following tree of states:
Disabled
Enabled
    Configured as Output
        Output: High
        Output: Low
    Configured as Input
        Input: High Resistance
        Input: Pulled Low
        Input: Pulled High
If the peripheral starts in the Disabled mode, to move to the Input: High Resistance
mode, we must perform the following steps:
1. Disabled
2. Enabled
3. Configured as Input
4. Input: High Resistance
If we wanted to move from Input: High Resistance to Input: Pulled Low , we must perform the following steps:
1. Input: High Resistance
2. Input: Pulled Low

Similarly, if we want to move a GPIO pin from Input: Pulled Low to Output: High , we must perform the following steps:
1. Input: Pulled Low
2. Configured as Output
3. Output: High

Hardware Representation
Hardware Representation
Typically the states listed above are set by writing values to given registers mapped to a
GPIO peripheral. Let's define an imaginary GPIO Configuration Register to illustrate this:
| Name         | Bit Number(s) | Value | Meaning   | Notes                                    |
|--------------|---------------|-------|-----------|------------------------------------------|
| enable       | 0             | 0     | disabled  | Disables the GPIO                        |
|              |               | 1     | enabled   | Enables the GPIO                         |
| direction    | 1             | 0     | input     | Sets the direction to Input              |
|              |               | 1     | output    | Sets the direction to Output             |
| input_mode   | 2..3          | 00    | hi-z      | Sets the input as high resistance        |
|              |               | 01    | pull-low  | Input pin is pulled low                  |
|              |               | 10    | pull-high | Input pin is pulled high                 |
|              |               | 11    | n/a       | Invalid state. Do not set                |
| output_mode  | 4             | 0     | set-low   | Output pin is driven low                 |
|              |               | 1     | set-high  | Output pin is driven high                |
| input_status | 5             | x     | in-val    | 0 if input is < 1.5v, 1 if input >= 1.5v |
We could represent this register in Rust, in the style of an svd2rust-generated API, like so:

/// GPIO interface
struct GpioConfig {
    /// GPIO Configuration structure generated by svd2rust
    periph: GPIO_CONFIG,
}

impl GpioConfig {
    pub fn set_enable(&mut self, is_enabled: bool) {
        self.periph.modify(|_r, w| {
            w.enable().set_bit(is_enabled)
        });
    }

    // ... set_direction, set_input_mode, set_output_mode, and
    // get_input_status written in the same unchecked style ...
}
However, this would allow us to modify certain registers that do not make sense. For
example, what happens if we set the output_mode field when our GPIO is configured as an
input?
In general, use of this structure would allow us to reach states not defined by our state
machine above: e.g. an output that is pulled low, or an input that is set high. For some
hardware, this may not matter. On other hardware, it could cause unexpected or
undefined behavior!
Although this interface is convenient to write, it doesn't enforce the design contracts set
out by our hardware implementation.
Design Contracts
In our last chapter, we wrote an interface that didn't enforce design contracts. Let's take
another look at our imaginary GPIO configuration register:
| Name         | Bit Number(s) | Value | Meaning   | Notes                                    |
|--------------|---------------|-------|-----------|------------------------------------------|
| enable       | 0             | 0     | disabled  | Disables the GPIO                        |
|              |               | 1     | enabled   | Enables the GPIO                         |
| direction    | 1             | 0     | input     | Sets the direction to Input              |
|              |               | 1     | output    | Sets the direction to Output             |
| input_mode   | 2..3          | 00    | hi-z      | Sets the input as high resistance        |
|              |               | 01    | pull-low  | Input pin is pulled low                  |
|              |               | 10    | pull-high | Input pin is pulled high                 |
|              |               | 11    | n/a       | Invalid state. Do not set                |
| output_mode  | 4             | 0     | set-low   | Output pin is driven low                 |
|              |               | 1     | set-high  | Output pin is driven high                |
| input_status | 5             | x     | in-val    | 0 if input is < 1.5v, 1 if input >= 1.5v |
If we instead checked the state before making use of the underlying hardware, enforcing
our design contracts at runtime, we might write code that looks like this instead:
/// GPIO interface
struct GpioConfig {
    /// GPIO Configuration structure generated by svd2rust
    periph: GPIO_CONFIG,
}

impl GpioConfig {
    pub fn set_enable(&mut self, is_enabled: bool) {
        self.periph.modify(|_r, w| {
            w.enable().set_bit(is_enabled)
        });
    }

    pub fn set_direction(&mut self, is_output: bool) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set direction
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.direction().set_bit(is_output)
        });

        Ok(())
    }

    pub fn set_input_mode(&mut self, variant: InputModeVariant) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set input mode
            return Err(());
        }

        if self.periph.read().direction().bit_is_set() {
            // Direction must be input
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.input_mode().variant(variant)
        });

        Ok(())
    }

    pub fn set_output_status(&mut self, is_high: bool) -> Result<(), ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to set output status
            return Err(());
        }

        if self.periph.read().direction().bit_is_clear() {
            // Direction must be output
            return Err(());
        }

        self.periph.modify(|_r, w| {
            w.output_mode().set_bit(is_high)
        });

        Ok(())
    }

    pub fn get_input_status(&self) -> Result<bool, ()> {
        if self.periph.read().enable().bit_is_clear() {
            // Must be enabled to get status
            return Err(());
        }

        if self.periph.read().direction().bit_is_set() {
            // Direction must be input
            return Err(());
        }

        Ok(self.periph.read().input_status().bit_is_set())
    }
}
Because we need to enforce the restrictions on the hardware, we end up doing a lot of
runtime checking which wastes time and resources, and this code will be much less
pleasant for the developer to use.
Type States
But what if instead, we used Rust's type system to enforce the state transition rules? Take
this example:
/// GPIO interface
struct GpioConfig<ENABLED, DIRECTION, MODE> {
/// GPIO Configuration structure generated by svd2rust
periph: GPIO_CONFIG,
enabled: ENABLED,
direction: DIRECTION,
mode: MODE,
}
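The state parameters themselves can be simple marker types, and each legal transition becomes a method that consumes one configuration and returns the next. What follows is a condensed sketch under that assumption; the register field names mirror the imaginary configuration register above, and most transitions are elided:

// Marker types for the type states (a real API defines one for
// every state in the tree above).
struct Enabled;
struct Disabled;
struct Input;
struct Output;
struct HighZ;
struct DontCare;

impl<EN, DIR, MODE> GpioConfig<EN, DIR, MODE> {
    pub fn into_enabled_input(self) -> GpioConfig<Enabled, Input, HighZ> {
        self.periph.modify(|_r, w| {
            w.enable().enabled()
                .direction().input()
                .input_mode().high_z()
        });
        GpioConfig {
            periph: self.periph,
            enabled: Enabled,
            direction: Input,
            mode: HighZ,
        }
    }

    // into_enabled_output, into_input_pull_down, and so on are
    // written in the same style, each consuming `self`.
}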
Now let's see what the code using this would look like:
/*
 * Example 1: Unconfigured to High-Z input
 */
let pin: GpioConfig<Disabled, _, _> = get_gpio();

// Can't do this, pin isn't enabled!
// pin.into_input_pull_down();

// Now turn the pin from unconfigured to a high-z input
let input_pin = pin.into_enabled_input();

// Read from the pin
let pin_state = input_pin.bit_is_set();

/*
 * Example 2: High-Z input to Pulled Low input
 */
let pulled_low = input_pin.into_input_pull_down();
let pin_state = pulled_low.bit_is_set();

/*
 * Example 3: Pulled Low input to Output, set high
 */
let output_pin = pulled_low.into_enabled_output();
output_pin.set_bit(true);
This is definitely a convenient way to store the state of the pin, but why do it this way? Why is this better than storing the state as an enum inside of our GpioConfig structure?

Because the design constraints are enforced entirely at compile time, there is no runtime cost: no state field to store or check. Also, because these states are enforced by the type system, there is no longer room for errors by consumers of this interface. If they try to perform an illegal state transition, the code will not compile!
Zero Cost Abstractions
Type states are also an excellent example of Zero Cost Abstractions - the ability to move
certain behaviors to compile time execution or analysis. These type states contain no
actual data, and are instead used as markers. Since they contain no data, they have no
actual representation in memory at runtime:
use core::mem::size_of;
let _ = size_of::<Enabled>(); // == 0
let _ = size_of::<Input>(); // == 0
let _ = size_of::<PulledHigh>(); // == 0
let _ = size_of::<GpioConfig<Enabled, Input, PulledHigh>>(); // == 0
Zero Sized Types

struct Enabled;

Structures defined like this are called Zero Sized Types, as they contain no actual data. Although these types act "real" at compile time - you can copy them, move them, take references to them, and so on - the optimizer will completely strip them away.
Consider a transition method like into_enabled_input from the examples above. The GpioConfig it returns never exists at runtime. Calling this function will generally boil down to a single assembly instruction - storing a constant register value to a register location. This means that the type state interface we've developed is a zero cost abstraction - it uses no more CPU, RAM, or code space tracking the state of GpioConfig , and renders to the same machine code as a direct register access.
Nesting
In general, these abstractions may be nested as deeply as you would like. As long as all
components used are zero sized types, the whole structure will not exist at runtime.
For complex or deeply nested structures, it may be tedious to define all possible
combinations of state. In these cases, macros may be used to generate all
implementations.
Portability
In embedded environments portability is a very important topic: every vendor, and even each family from a single manufacturer, offers different peripherals and capabilities, and similarly the ways to interact with those peripherals will vary.
A common way to equalize such differences is via a layer called a Hardware Abstraction Layer, or HAL.
Hardware abstractions are sets of routines in software that emulate some platform-
specific details, giving programs direct access to the hardware resources.
Embedded systems are a bit special in this regard since we typically do not have operating
systems and user installable software but firmware images which are compiled as a whole
as well as a number of other constraints. So while the traditional approach as defined by
Wikipedia could potentially work it is likely not the most productive approach to ensure
portability.
What is embedded-hal?
In a nutshell it is a set of traits which define implementation contracts between HAL
implementations, drivers and applications (or firmwares). Those contracts include both
capabilities (i.e. if a trait is implemented for a certain type, the HAL implementation
provides a certain capability) and methods (i.e. if you can construct a type implementing a
trait it is guaranteed that you have the methods specified in the trait available).
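For example, a driver for an external device can be written against the traits alone, making it usable with any HAL implementation providing them. Below is a minimal sketch in the style of the embedded-hal 0.2 blocking traits; the device, its register, and the command byte are hypothetical:

use embedded_hal::blocking::spi::Transfer;
use embedded_hal::digital::v2::OutputPin;

/// A driver for a hypothetical SPI sensor.
pub struct MySensor<SPI, CS> {
    spi: SPI,
    cs: CS,
}

impl<SPI, CS, E> MySensor<SPI, CS>
where
    SPI: Transfer<u8, Error = E>,
    CS: OutputPin,
{
    pub fn new(spi: SPI, cs: CS) -> Self {
        Self { spi, cs }
    }

    /// Read a hypothetical WHO_AM_I register.
    pub fn who_am_i(&mut self) -> Result<u8, E> {
        let mut buf = [0x80, 0x00]; // read command + dummy byte
        self.cs.set_low().ok();
        let res = self.spi.transfer(&mut buf);
        self.cs.set_high().ok();
        res.map(|b| b[1])
    }
}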
The main reason for having the embedded-hal traits, and crates implementing and using them, is to keep complexity in check. If you consider that an application might have to implement the use of the peripherals in the hardware, as well as the application itself and potentially drivers for additional hardware components, then it should be easy to see that the re-usability is very limited. Expressed mathematically, if M is the number of peripheral HAL implementations and N the number of drivers, then if we were to reinvent the wheel for every application we would end up with M*N implementations, while using the API provided by the embedded-hal traits makes the implementation complexity approach M+N. For example, 10 HAL implementations and 10 drivers mean 100 bespoke ports versus 20 components. Of course, there are additional benefits to be had, such as less trial-and-error thanks to well-defined and ready-to-use APIs.
HAL implementation

A HAL implementation provides the interfacing between the hardware and the users of the HAL traits. Typical implementations consist of three parts:

One or more hardware specific types
Functions to create and initialize such a type, often providing various configuration options (speed, operation mode, used pins, etc.)
One or more trait impls of embedded-hal traits for that type
Driver
A driver has to be initialized with an instance of a type that implements a certain trait of the embedded-hal (which is ensured via a trait bound), and provides its own type instance with a custom set of methods allowing interaction with the driven device.
Application
The application binds the various parts together and ensures that the desired functionality
is achieved. When porting between different systems, this is the part which requires the
most adaptation efforts, since the application needs to correctly initialize the real
hardware via the HAL implementation and the initialisation of different hardware differs,
sometimes drastically so. Also the user choice often plays a big role, since components can
be physically connected to different terminals, hardware buses sometimes need external
hardware to match the configuration or there are different trade-offs to be made in the
use of internal peripherals (e.g. multiple timers with different capabilities are available or
peripherals conflict with others).
Concurrency
Concurrency happens whenever different parts of your program might execute at different times or out of order. In an embedded context, this includes:

interrupt handlers, which run whenever the associated interrupt happens,
various forms of multithreading, where your microprocessor regularly swaps between parts of your program,
and in some systems, multiple-core microprocessors, where each core can be independently running a different part of your program.
Since many embedded programs need to deal with interrupts, concurrency will usually
come up sooner or later, and it's also where many subtle and difficult bugs can occur.
Luckily, Rust provides a number of abstractions and safety guarantees to help us write
correct code.
No Concurrency
The simplest concurrency for an embedded program is no concurrency: your software
consists of a single main loop which just keeps running, and there are no interrupts at all.
Sometimes this is perfectly suited to the problem at hand! Typically your loop will read
some inputs, perform some processing, and write some outputs.
#[entry]
fn main() {
let peripherals = setup_peripherals();
loop {
let inputs = read_inputs(&peripherals);
let outputs = process(inputs);
write_outputs(&peripherals, outputs);
}
}
Since there's no concurrency, there's no need to worry about sharing data between parts
of your program or synchronising access to peripherals. If you can get away with such a
simple approach this can be a great solution.
Global Mutable Data

However, most embedded programs eventually need to share data, for example between an interrupt handler and the main loop, and the traditional way to do so is with a global mutable variable: a static mut. In Rust, such static mut variables are always unsafe to read or write, because without taking special care, you might trigger a race condition, where your access to the variable is interrupted halfway through by an interrupt which also accesses that variable.
For an example of how this behaviour can cause subtle errors in your code, consider an
embedded program which counts rising edges of some input signal in each one-second
period (a frequency counter):
static mut COUNTER: u32 = 0;

#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
// DANGER - Not actually safe! Could cause data races.
unsafe { COUNTER += 1 };
}
last_state = state;
}
}
#[interrupt]
fn timer() {
unsafe { COUNTER = 0; }
}
Each second, the timer interrupt sets the counter back to 0. Meanwhile, the main loop continually measures the signal, and increments the counter when it sees a change
from low to high. We've had to use unsafe to access COUNTER , as it's static mut , and that
means we're promising the compiler we won't cause any undefined behaviour. Can you
spot the race condition? The increment on COUNTER is not guaranteed to be atomic — in
fact, on most embedded platforms, it will be split into a load, then the increment, then a
store. If the interrupt fired after the load but before the store, the reset back to 0 would be
ignored after the interrupt returns — and we would count twice as many transitions for
that period.
Critical Sections
So, what can we do about data races? A simple approach is to use critical sections, a context
where interrupts are disabled. By wrapping the access to COUNTER in main in a critical
section, we can be sure the timer interrupt will not fire until we're finished incrementing
COUNTER :
static mut COUNTER: u32 = 0;

#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
// New critical section ensures synchronised access to COUNTER
cortex_m::interrupt::free(|_| {
unsafe { COUNTER += 1 };
});
}
last_state = state;
}
}
#[interrupt]
fn timer() {
unsafe { COUNTER = 0; }
}
In this example, we use cortex_m::interrupt::free , but other platforms will have similar mechanisms for executing code in a critical section. This is the same as disabling interrupts, running some code, and then re-enabling interrupts.

Note we didn't need to put a critical section inside the timer interrupt, for two reasons:

Writing 0 to COUNTER can't be affected by a race, since we don't read it
It will never be interrupted by the main thread anyway
If COUNTER was being shared by multiple interrupt handlers that might preempt each other,
then each one might require a critical section as well.
This solves our immediate problem, but we're still left writing a lot of unsafe code which we
need to carefully reason about, and we might be using critical sections needlessly. Since
each critical section temporarily pauses interrupt processing, there is an associated cost of
some extra code size and higher interrupt latency and jitter (interrupts may take longer to
be processed, and the time until they are processed will be more variable). Whether this is
a problem depends on your system, but in general, we'd like to avoid it.
It's worth noting that while a critical section guarantees no interrupts will fire, it does not
provide an exclusivity guarantee on multi-core systems! The other core could be happily
accessing the same memory as your core, even without interrupts. You will need stronger
synchronisation primitives if you are using multiple cores.
Atomic Access
On some platforms, special atomic instructions are available, which provide guarantees
about read-modify-write operations. Specifically for Cortex-M: thumbv6 (Cortex-M0, Cortex-
M0+) only provide atomic load and store instructions, while thumbv7 (Cortex-M3 and
above) provide full Compare and Swap (CAS) instructions. These CAS instructions give an
alternative to the heavy-handed disabling of all interrupts: we can attempt the increment, it
will succeed most of the time, but if it was interrupted it will automatically retry the entire
increment operation. These atomic operations are safe even across multiple cores.
use core::sync::atomic::{AtomicUsize, Ordering};

static COUNTER: AtomicUsize = AtomicUsize::new(0);

#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
// Use `fetch_add` to atomically add 1 to COUNTER
COUNTER.fetch_add(1, Ordering::Relaxed);
}
last_state = state;
}
}
#[interrupt]
fn timer() {
// Use `store` to write 0 directly to COUNTER
COUNTER.store(0, Ordering::Relaxed)
}
This time COUNTER is a safe static variable. Thanks to the AtomicUsize type COUNTER can
be safely modified from both the interrupt handler and the main thread without disabling
interrupts. When possible, this is a better solution — but it may not be supported on your
platform.
A note on Ordering : this affects how the compiler and hardware may reorder instructions, and also has consequences on cache visibility. Assuming that the target is a single-core platform, Relaxed is sufficient and the most efficient choice in this particular case. Stricter
ordering will cause the compiler to emit memory barriers around the atomic operations;
depending on what you're using atomics for you may or may not need this! The precise
details of the atomic model are complicated and best described elsewhere.
We can abstract our counter into a safe interface which can be safely used anywhere else
in our code. For this example, we'll use the critical-section counter, but you could do
something very similar with atomics.
use core::cell::UnsafeCell;
use cortex_m::interrupt;

// Wrapping the counter in UnsafeCell gives us interior mutability:
// COUNTER can be an ordinary `static` (not `static mut`) yet still
// be mutated through the methods below.
struct CSCounter(UnsafeCell<u32>);

impl CSCounter {
    pub fn reset(&self, _cs: &interrupt::CriticalSection) {
        // By requiring a CriticalSection be passed in, we know we must
        // be operating inside a CriticalSection, and so can confidently
        // use this unsafe block (required to call UnsafeCell::get).
        unsafe { *self.0.get() = 0 };
    }

    pub fn increment(&self, _cs: &interrupt::CriticalSection) {
        unsafe { *self.0.get() += 1 };
    }
}

// Required to allow a static CSCounter; see the discussion of Sync below.
unsafe impl Sync for CSCounter {}

static COUNTER: CSCounter = CSCounter(UnsafeCell::new(0));
#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
// No unsafe here!
interrupt::free(|cs| COUNTER.increment(cs));
}
last_state = state;
}
}
#[interrupt]
fn timer() {
    // We do need to enter a critical section here just to obtain a valid
    // cs token, even though we know no other interrupt could pre-empt
    // this one.
    interrupt::free(|cs| COUNTER.reset(cs));
}
This design requires that the application pass a CriticalSection token in: these tokens are
only safely generated by interrupt::free , so by requiring one be passed in, we ensure we
are operating inside a critical section, without having to actually do the lock ourselves. This
guarantee is provided statically by the compiler: there won't be any runtime overhead
associated with cs . If we had multiple counters, they could all be given the same cs ,
without requiring multiple nested critical sections.
This also brings up an important topic for concurrency in Rust: the Send and Sync traits.
To summarise the Rust book, a type is Send when it can safely be moved to another thread,
while it is Sync when it can be safely shared between multiple threads. In an embedded
context, we consider interrupts to be executing in a separate thread to the application
code, so variables accessed by both an interrupt and the main code must be Sync.
For most types in Rust, both of these traits are automatically derived for you by the
compiler. However, because CSCounter contains an UnsafeCell , it is not Sync, and
therefore we could not make a static CSCounter : static variables must be Sync, since
they can be accessed by multiple threads.
To tell the compiler we have taken care that the CSCounter is in fact safe to share between
threads, we implement the Sync trait explicitly. As with the previous use of critical sections,
this is only safe on single-core platforms: with multiple cores, you would need to go to
greater lengths to ensure safety.
Mutexes
We've created a useful abstraction specific to our counter problem, but there are many
common abstractions used for concurrency.
One such synchronisation primitive is a mutex, short for mutual exclusion. These constructs
ensure exclusive access to a variable, such as our counter. A thread can attempt to lock (or
acquire) the mutex, and either succeeds immediately, or blocks waiting for the lock to be
acquired, or returns an error that the mutex could not be locked. While that thread holds
the lock, it is granted access to the protected data. When the thread is done, it unlocks (or
releases) the mutex, allowing another thread to lock it. In Rust, we would usually implement
the unlock using the Drop trait to ensure it is always released when the mutex goes out of
scope.
Using a mutex with interrupt handlers can be tricky: it is not normally acceptable for the
interrupt handler to block, and it would be especially disastrous for it to block waiting for
the main thread to release a lock, since we would then deadlock (the main thread will never
release the lock because execution stays in the interrupt handler). Deadlocking is not
considered unsafe: it is possible even in safe Rust.
To avoid this behaviour entirely, we could implement a mutex which requires a critical
section to lock, just like our counter example. So long as the critical section must last as
long as the lock, we can be sure we have exclusive access to the wrapped variable without
even needing to track the lock/unlock state of the mutex.
This is in fact done for us in the cortex_m crate! We could have written our counter using it:
use core::cell::Cell;
use cortex_m::interrupt::{self, Mutex};

static COUNTER: Mutex<Cell<u32>> = Mutex::new(Cell::new(0));
#[entry]
fn main() -> ! {
set_timer_1hz();
let mut last_state = false;
loop {
let state = read_signal_level();
if state && !last_state {
interrupt::free(|cs|
COUNTER.borrow(cs).set(COUNTER.borrow(cs).get() + 1));
}
last_state = state;
}
}
#[interrupt]
fn timer() {
// We still need to enter a critical section here to satisfy the Mutex.
interrupt::free(|cs| COUNTER.borrow(cs).set(0));
}
We're now using Cell , which along with its sibling RefCell is used to provide safe interior
mutability. We've already seen UnsafeCell which is the bottom layer of interior mutability
in Rust: it allows you to obtain multiple mutable references to its value, but only with
unsafe code. A Cell is like an UnsafeCell but it provides a safe interface: it only permits
taking a copy of the current value or replacing it, not taking a reference, and since it is not
Sync, it cannot be shared between threads. These constraints mean it's safe to use, but we
couldn't use it directly in a static variable as a static must be Sync.
So why does the example above work? The Mutex<T> implements Sync for any T which is
Send — such as a Cell . It can do this safely because it only gives access to its contents
during a critical section. We're therefore able to get a safe counter with no unsafe code at
all!
This is great for simple types like the u32 of our counter, but what about more complex
types which are not Copy? An extremely common example in an embedded context is a
peripheral struct, which generally is not Copy. For that, we can turn to RefCell .
Sharing Peripherals
Device crates generated using svd2rust and similar abstractions provide safe access to
peripherals by enforcing that only one instance of the peripheral struct can exist at a time.
This ensures safety, but makes it difficult to access a peripheral from both the main thread
and an interrupt handler.
To safely share peripheral access, we can use the Mutex we saw before. We'll also need to
use RefCell , which uses a runtime check to ensure only one reference to a peripheral is
given out at a time. This has more overhead than the plain Cell , but since we are giving
out references rather than copies, we must be sure only one exists at a time.
Finally, we'll also have to account for somehow moving the peripheral into the shared
variable after it has been initialised in the main code. To do this we can use the Option
type, initialised to None and later set to the instance of the peripheral.
use core::cell::RefCell;
use cortex_m::interrupt::{self, Mutex};
use stm32f4::stm32f405;

static MY_GPIO: Mutex<RefCell<Option<stm32f405::GPIOA>>> =
    Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    // Obtain the peripheral singletons and configure them.
    // This example is from an svd2rust-generated crate, but
    // most embedded device crates will be similar.
    let dp = stm32f405::Peripherals::take().unwrap();
    let gpioa = &dp.GPIOA;

    // ... configure the pins, move the GPIOA into MY_GPIO (see below),
    // enable the timer interrupt, and loop reading and setting pins ...
    loop {}
}
#[interrupt]
fn timer() {
// This time in the interrupt we'll just clear PA0.
interrupt::free(|cs| {
// We can use `unwrap()` because we know the interrupt wasn't enabled
// until after MY_GPIO was set; otherwise we should handle the potential
// for a None value.
let gpioa = MY_GPIO.borrow(cs).borrow();
gpioa.as_ref().unwrap().odr.modify(|_, w| w.odr1().clear_bit());
});
}
That's quite a lot to take in, so let's break down the important lines.
Our shared variable is now a Mutex around a RefCell which contains an Option . The
Mutex ensures we only have access during a critical section, and therefore makes the
variable Sync, even though a plain RefCell would not be Sync. The RefCell gives us
interior mutability with references, which we'll need to use our GPIOA . The Option lets us
initialise this variable to something empty, and only later actually move the variable in. We
cannot access the peripheral singleton statically, only at runtime, so this is required.
interrupt::free(|cs| MY_GPIO.borrow(cs).replace(Some(dp.GPIOA)));
Inside a critical section we can call borrow() on the mutex, which gives us a reference to
the RefCell . We then call replace() to move our new value into the RefCell .
interrupt::free(|cs| {
let gpioa = MY_GPIO.borrow(cs).borrow();
gpioa.as_ref().unwrap().odr.modify(|_, w| w.odr1().set_bit());
});
Finally, we use MY_GPIO in a safe and concurrent fashion. The critical section prevents the
interrupt firing as usual, and lets us borrow the mutex. The RefCell then gives us an
&Option<GPIOA> , and tracks how long it remains borrowed - once that reference goes out
of scope, the RefCell will be updated to indicate it is no longer borrowed.
Since we can't move the GPIOA out of the &Option , we need to convert it to an
&Option<&GPIOA> with as_ref() , which we can finally unwrap() to obtain the &GPIOA which
lets us modify the peripheral.
The same pattern also works for sharing a HAL-level driver, such as a configured timer, between main and an interrupt handler:

use core::ops::DerefMut;
use cortex_m::asm::wfi;

// `Timer` and the `.hz()` extension trait come from a HAL crate
// such as stm32f4xx-hal (assumed here).
static G_TIM: Mutex<RefCell<Option<Timer<stm32f405::TIM2>>>> =
    Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    let mut cp = cm::Peripherals::take().unwrap();
    let dp = stm32f405::Peripherals::take().unwrap();

    // Some sort of timer configuration function.
    // Assume it configures the TIM2 timer and its NVIC interrupt,
    // returning the started timer.
    let tim = configure_timer_interrupt(&mut cp, dp);

    interrupt::free(|cs| {
        G_TIM.borrow(cs).replace(Some(tim));
    });

    loop {
        wfi();
    }
}

#[interrupt]
fn timer() {
    interrupt::free(|cs| {
        if let Some(ref mut tim) = G_TIM.borrow(cs).borrow_mut().deref_mut() {
            tim.start(1.hz());
        }
    });
}
Whew! This is safe, but it is also a little unwieldy. Is there anything else we can do?
RTIC
One alternative is the RTIC framework, short for Real Time Interrupt-driven Concurrency. It
enforces static priorities and tracks accesses to static mut variables ("resources") to
statically ensure that shared resources are always accessed safely, without requiring the
overhead of always entering critical sections and using reference counting (as in RefCell ).
This has a number of advantages such as guaranteeing no deadlocks and giving extremely
low time and memory overhead.
The framework also includes other features like message passing, which reduces the need
for explicit shared state, and the ability to schedule tasks to run at a given time, which can
be used to implement periodic tasks. Check out the documentation for more information!
Real Time Operating Systems
Another common model for embedded concurrency is the real-time operating system
(RTOS). While currently less well explored in Rust, they are widely used in traditional
embedded development. Open source examples include FreeRTOS and ChibiOS. These
RTOSs provide support for running multiple application threads which the CPU swaps
between, either when the threads yield control (called cooperative multitasking) or based
on a regular timer or interrupts (preemptive multitasking). The RTOS typically provides mutexes and other synchronisation primitives, and often interoperates with hardware features such as DMA engines.
At the time of writing, there are not many Rust RTOS examples to point to, but it's an
interesting area so watch this space!
Multiple Cores
It is becoming more common to have two or more cores in embedded processors, which
adds an extra layer of complexity to concurrency. All the examples using a critical section
(including the cortex_m::interrupt::Mutex ) assume the only other execution thread is the
interrupt thread, but on a multi-core system that's no longer true. Instead, we'll need
synchronisation primitives designed for multiple cores (also called SMP, for symmetric
multi-processing).
These typically use the atomic instructions we saw earlier, since the processing system will
ensure that atomicity is maintained over all cores.
Covering these topics in detail is currently beyond the scope of this book, but the general
patterns are the same as for the single-core case.
Collections
Eventually you'll want to use dynamic data structures (AKA collections) in your program.
std provides a set of common collections: Vec , String , HashMap , etc. All the collections
implemented in std use a global dynamic memory allocator (AKA the heap).
As core is, by definition, free of memory allocations these implementations are not
available there, but they can be found in the alloc crate that's shipped with the compiler.
If you need collections, a heap allocated implementation is not your only option. You can
also use fixed capacity collections; one such implementation can be found in the heapless
crate.
Using alloc
The alloc crate is shipped with the standard Rust distribution, so you can use it directly without declaring it as a dependency in your Cargo.toml file:

#![feature(alloc)]

extern crate alloc;

use alloc::vec::Vec;
To be able to use any collection you'll first need to use the global_allocator attribute to declare the global allocator your program will use. The allocator you select must implement the GlobalAlloc trait.
For completeness and to keep this section as self-contained as possible we'll implement a
simple bump pointer allocator and use that as the global allocator. However, we strongly
suggest you use a battle tested allocator from crates.io in your program instead of this
allocator.
// Bump pointer allocator implementation
use cortex_m::interrupt;
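The allocator itself can be quite short. The following is a condensed sketch of such a bump pointer allocator; the heap region addresses are placeholders and must not overlap memory used by anything else in the program:

use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr;

// Bump pointer allocator for *single*-core systems.
struct BumpPointerAlloc {
    head: UnsafeCell<usize>,
    end: usize,
}

// Safety: single core, and `alloc` runs inside a critical section.
unsafe impl Sync for BumpPointerAlloc {}

unsafe impl GlobalAlloc for BumpPointerAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // The critical section makes the allocator usable from interrupts.
        interrupt::free(|_| {
            let head = self.head.get();
            let align = layout.align();
            // Move the head up to the next alignment boundary.
            let start = (*head + align - 1) & !(align - 1);
            if start + layout.size() > self.end {
                // A null pointer signals an Out Of Memory condition.
                ptr::null_mut()
            } else {
                *head = start + layout.size();
                start as *mut u8
            }
        })
    }

    unsafe fn dealloc(&self, _: *mut u8, _: Layout) {
        // This allocator never deallocates memory.
    }
}

// Placeholder heap region: `[0x2000_0100, 0x2000_0200)`.
#[global_allocator]
static HEAP: BumpPointerAlloc = BumpPointerAlloc {
    head: UnsafeCell::new(0x2000_0100),
    end: 0x2000_0200,
};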
Apart from selecting a global allocator the user will also have to define how Out Of Memory
(OOM) errors are handled using the unstable alloc_error_handler attribute.
#![feature(alloc_error_handler)]

use core::alloc::Layout;
use cortex_m::asm;

#[alloc_error_handler]
fn on_oom(_layout: Layout) -> ! {
    asm::bkpt();

    loop {}
}
Once all that is in place, the user can finally use the collections in alloc .
#[entry]
fn main() -> ! {
    let mut xs = Vec::new();

    xs.push(42);
    assert_eq!(xs.pop(), Some(42));

    loop {
        // ..
    }
}
If you have used the collections in the std crate then these will be familiar, as they are the exact same implementations.
Using heapless
heapless requires no setup as its collections don't depend on a global memory allocator.
Just use its collections and proceed to instantiate them:
use heapless::Vec; // fixed capacity `Vec`
use heapless::consts::U8; // type-level integer used to specify capacity

#[entry]
fn main() -> ! {
    let mut xs: Vec<_, U8> = Vec::new();

    xs.push(42).unwrap();
    assert_eq!(xs.pop(), Some(42));

    loop {}
}
You'll note two differences between these collections and the ones in alloc .
First, you have to declare upfront the capacity of the collection. heapless collections never
reallocate and have fixed capacities; this capacity is part of the type signature of the
collection. In this case, we have declared that xs has a capacity of 8 elements; that is, the vector can hold at most 8 elements. This is indicated by the U8 (see typenum ) in the type signature.
Second, the push method, and many other methods, return a Result . Since the heapless
collections have fixed capacity all operations that insert elements into the collection can
potentially fail. The API reflects this problem by returning a Result indicating whether the
operation succeeded or not. In contrast, alloc collections will reallocate themselves on
the heap to increase their capacity.
As of version v0.4.x all heapless collections store all their elements inline. This means that
an operation like let x = heapless::Vec::new(); will allocate the collection on the stack,
but it's also possible to allocate the collection on a static variable, or even on the heap
( Box<Vec<_, _>> ).
Trade-offs
Keep these in mind when choosing between heap allocated, relocatable collections and
fixed capacity collections.
With heap allocations Out Of Memory is always a possibility and can occur in any place
where a collection may need to grow: for example, all alloc::Vec.push invocations can
potentially generate an OOM condition. Thus some operations can implicitly fail. Some alloc collections expose try_reserve methods that let you check for potential OOM conditions when growing the collection, but you need to be proactive about using them.
If you exclusively use heapless collections and you don't use a memory allocator for anything else then an OOM condition is impossible. Instead, you'll have to deal with collections running out of capacity on a case-by-case basis; that is, you'll have to deal with all the Result s returned by methods like Vec.push .
OOM failures can be harder to debug than, say, unwrap -ing on all Result s returned by heapless::Vec.push , because the observed location of failure may not match the location of the cause of the problem. For example, even vec.reserve(1) can trigger an
OOM if the allocator is nearly exhausted because some other collection was leaking
memory (memory leaks are possible in safe Rust).
Memory usage
Reasoning about memory usage of heap allocated collections is hard because the capacity
of long lived collections can change at runtime. Some operations may implicitly reallocate
the collection increasing its memory usage, and some collections expose methods like
shrink_to_fit that can potentially reduce the memory used by the collection -- ultimately,
it's up to the allocator to decide whether to actually shrink the memory allocation or not.
Additionally, the allocator may have to deal with memory fragmentation which can increase
the apparent memory usage.
On the other hand if you exclusively use fixed capacity collections, store most of them in
static variables and set a maximum size for the call stack then the linker will detect if you
try to use more memory than what's physically available.
Furthermore, fixed capacity collections allocated on the stack will be reported by the -Z emit-stack-sizes flag, which means that tools that analyze stack usage (like stack-sizes ) will include them in their analysis.
However, fixed capacity collections cannot be shrunk, which can result in lower load factors (the ratio between the size of the collection and its capacity) than what relocatable collections can achieve.
Worst-case execution time (WCET)

If you are building time sensitive or hard real time applications then you care, maybe a lot, about the worst case execution time (WCET) of the different parts of your program.
The alloc collections can reallocate so the WCET of operations that may grow the
collection will also include the time it takes to reallocate the collection, which itself depends
on the runtime capacity of the collection. This makes it hard to determine the WCET of, for
example, the alloc::Vec.push operation as it depends on both the allocator being used
and its runtime capacity.
On the other hand fixed capacity collections never reallocate so all operations have a
predictable execution time. For example, heapless::Vec.push executes in constant time.
Ease of use
alloc requires setting up a global allocator whereas heapless does not. However,
heapless requires you to pick the capacity of each collection that you instantiate.
The alloc API will be familiar to virtually every Rust developer. The heapless API tries to
closely mimic the alloc API but it will never be exactly the same due to its explicit error
handling -- some developers may feel the explicit error handling is excessive or too
cumbersome.
Design Patterns
This chapter aims to collect various useful design patterns for embedded Rust.
HAL Design Patterns
This is a set of common and recommended patterns for writing hardware abstraction
layers (HALs) for microcontrollers in Rust. These patterns are intended to be used in
addition to the existing Rust API Guidelines when writing HALs for microcontrollers.
Checklist
Naming
Interoperability
Predictability
GPIO
HAL Design Patterns Checklist
Naming (crate aligns with Rust naming conventions)
The crate is named appropriately (C-CRATE-NAME)
Interoperability (crate interacts nicely with other library functionality)
Wrapper types provide a destructor method (C-FREE)
HALs reexport their register access crate (C-REEXPORT-PAC)
Types implement the embedded-hal traits (C-HAL-TRAITS)
Predictability (crate enables legible code that acts how it looks)
Constructors are used instead of extension traits (C-CTOR)
GPIO Interfaces (GPIO Interfaces follow a common pattern)
Pin types are zero-sized by default (C-ZST-PIN)
Pin types provide methods to erase pin and port (C-ERASED-PIN)
Pin state should be encoded as type parameters (C-PIN-STATE)
Interoperability

Wrapper types should provide a free method that consumes the wrapper and returns back the raw peripheral it was created from (C-FREE).
The method should shut down and reset the peripheral if necessary. Calling new with the
raw peripheral returned by free should not fail due to an unexpected state of the
peripheral.
If the HAL type requires other non- Copy objects to be constructed (for example I/O pins),
any such object should be released and returned by free as well. free should return a
tuple in that case.
For example:
pub struct Timer(TIMER0);

impl Timer {
    pub fn new(periph: TIMER0) -> Self {
        Self(periph)
    }

    pub fn free(self) -> TIMER0 {
        self.0
    }
}
A PAC should be reexported under the name pac , regardless of the actual name of the
crate, as the name of the HAL should already make it clear what PAC is being accessed.
Types implement the embedded-hal traits (C-HAL-TRAITS)
Types provided by the HAL should implement all applicable traits provided by the
embedded-hal crate.
Each GPIO Interface or Port should implement a split method returning a struct with
every pin.
Example:
pub struct PortAPins {
    pub pa0: PA0,
    pub pa1: PA1,
    // ...
}

impl PortA {
    pub fn split(self) -> PortAPins {
        PortAPins {
            pa0: PA0,
            pa1: PA1,
            // ...
        }
    }
}
Example:
/// Port A, pin 0.
pub struct PA0;
impl PA0 {
pub fn erase_pin(self) -> PA {
PA { pin: 0 }
}
}
impl PA {
pub fn erase_port(self) -> Pin {
Pin {
port: Port::A,
pin: self.pin,
}
}
}
enum Port {
A,
B,
C,
D,
}
Additional, chip-specific state (eg. drive strength) may also be encoded in this way, using
additional type parameters.
Methods for changing the pin state should be provided as into_input and into_output
methods.
Additionally, with_{input,output}_state methods should be provided that temporarily
reconfigure a pin in a different state without moving it.
The following methods should be provided for every pin type; that is, both erased and non-erased pin types should provide the same API.
Pin state should be bounded by sealed traits. Users of the HAL should have no need to add
their own state. The traits can provide HAL-specific methods required to implement the pin
state API.
Example:
mod sealed {
pub trait Sealed {}
}
Preprocessor
In embedded C it is very common to use the preprocessor for a variety of purposes, such as:

Compile-time selection of code blocks with #ifdef
Compile-time array sizes and computations
Macros to simplify common patterns (to avoid function call overhead)
In Rust there is no preprocessor, and so many of these use cases are addressed
differently. In the rest of this section we cover various alternatives to using the
preprocessor.
Compile-Time Code Selection

The closest match to #ifdef ... #endif in Rust are Cargo features. These are a little more
formal than the C preprocessor: all possible features are explicitly listed per crate, and can
only be either on or off. Features are turned on when you list a crate as a dependency, and
are additive: if any crate in your dependency tree enables a feature for another crate, that
feature will be enabled for all users of that crate.
For example, you might have a crate which provides a library of signal processing
primitives. Each one might take some extra time to compile or declare some large table of
constants which you'd like to avoid. You could declare a Cargo feature for each component
in your Cargo.toml :
[features]
FIR = []
IIR = []
Then, in your code:

#[cfg(feature="FIR")]
pub mod fir;
#[cfg(feature="IIR")]
pub mod iir;
You can similarly include code blocks only if a feature is not enabled, or if any combination
of features are or are not enabled.
Additionally, Rust provides a number of automatically-set conditions you can use, such as
target_arch to select different code based on architecture. For full details of the
conditional compilation support, refer to the conditional compilation chapter of the Rust
reference.
The conditional compilation will only apply to the next statement or block. If a block can not
be used in the current scope then the cfg attribute will need to be used multiple times. It's
worth noting that most of the time it is better to simply include all the code and allow the
compiler to remove dead code when optimising: it's simpler for you and your users, and in
general the compiler will do a good job of removing unused code.
Compile-Time Sizes and Computation

Rust supports const fn : functions which are guaranteed to be evaluable at compile time, and which can therefore be used where constants are required, such as in the size of arrays. These are new to stable Rust as of 1.31, so documentation is still sparse. The functionality available to const fn is also very limited at the time of writing; in future Rust releases it is expected to expand on what is permitted in a const fn .
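As a small illustration (the function and buffer are hypothetical), a const fn can stand in for a C preprocessor constant when sizing an array:

// Evaluated at compile time, like a preprocessor constant, but
// type-checked by the compiler.
const fn double_buffer_size(samples: usize) -> usize {
    samples * 2
}

static BUFFER: [u8; double_buffer_size(16)] = [0; double_buffer_size(16)];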
Macros
Rust provides an extremely powerful macro system. While the C preprocessor operates
almost directly on the text of your source code, the Rust macro system operates at a
higher level. There are two varieties of Rust macro: macros by example and procedural
macros. The former are simpler and most common; they look like function calls and can
expand to a complete expression, statement, item, or pattern. Procedural macros are
more complex but permit extremely powerful additions to the Rust language: they can
transform arbitrary Rust syntax into new Rust syntax.
In general, where you might have used a C preprocessor macro, you probably want to see
if a macro-by-example can do the job instead. They can be defined in your crate and easily
used by your own crate or exported for other users. Be aware that since they must expand
to complete expressions, statements, items, or patterns, some use cases of C
preprocessor macros will not work, for example a macro that expands to part of a variable
name or an incomplete set of items in a list.
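For instance, a C register-write macro might become a macro-by-example that expands to a complete expression. This is a hedged sketch; the register address is a placeholder:

// Expands to a complete (unsafe) expression performing a volatile
// write, much like a C `#define WRITE_REG(addr, val)` macro.
macro_rules! write_reg {
    ($addr:expr, $value:expr) => {
        unsafe { core::ptr::write_volatile($addr as *mut u32, $value) }
    };
}

fn main() {
    write_reg!(0x4000_1000, 0xDEAD_BEEFu32);
}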
As with Cargo features, it is worth considering if you even need the macro. In many cases a
regular function is easier to understand and will be inlined to the same code as a macro.
The #[inline] and #[inline(always)] attributes give you further control over this
process, although care should be taken here as well — the compiler will automatically
inline functions from the same crate where appropriate, so forcing it to do so
inappropriately might actually lead to decreased performance.
Explaining the entire Rust macro system is out of scope for this tips page, so you are
encouraged to consult the Rust documentation for full details.
Build System
Most Rust crates are built using Cargo (although it is not required). This takes care of many
difficult problems with traditional build systems. However, you may wish to customise the
build process. Cargo provides build.rs scripts for this purpose. They are Rust scripts
which can interact with the Cargo build system as required.
Common use cases include the following (a sketch of the linker-script case appears after this list):

provide build-time information, for example statically embedding the build date or Git commit hash into your executable
generate linker scripts at build time depending on selected features or other logic
change the Cargo build configuration
add extra static libraries to link against
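A minimal sketch of the linker-script case, following the common cortex-m convention of a memory.x script (the file names are illustrative):

// build.rs
use std::env;
use std::fs;
use std::path::PathBuf;

fn main() {
    // Re-run this script if the linker script changes.
    println!("cargo:rerun-if-changed=memory.x");

    // Copy the linker script into the output directory and add that
    // directory to the linker search path.
    let out = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::copy("memory.x", out.join("memory.x")).unwrap();
    println!("cargo:rustc-link-search={}", out.display());
}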
At present there is no support for post-build scripts, which you might traditionally have
used for tasks like automatic generation of binaries from the build objects or printing build
information.
Cross-Compiling
Using Cargo for your build system also simplifies cross-compiling. In most cases it suffices
to tell Cargo --target thumbv6m-none-eabi and find a suitable executable in
target/thumbv6m-none-eabi/debug/myapp .
For platforms not natively supported by Rust, you will need to build libcore for that target
yourself. On such platforms, Xargo can be used as a stand-in for Cargo which automatically
builds libcore for you.
Iterators vs Array Access

In C you are probably used to accessing arrays by index:

int16_t arr[16];
int i;
for(i=0; i<sizeof(arr)/sizeof(arr[0]); i++) {
process(arr[i]);
}
In Rust this is an anti-pattern: indexed access can be slower (as it needs to be bounds
checked) and may prevent various compiler optimisations. This is an important distinction
and worth repeating: Rust will check for out-of-bounds access on manual array indexing to
guarantee memory safety, while C will happily index outside the array.
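The idiomatic Rust version iterates over the array directly; process is assumed to be defined elsewhere, as in the C version:

let arr = [0i16; 16];
for element in arr.iter() {
    process(*element);
}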
Iterators provide a powerful array of functionality you would have to implement manually
in C, such as chaining, zipping, enumerating, finding the min or max, summing, and more.
Iterator methods can also be chained, giving very readable data processing code.
See the Iterators in the Book and Iterator documentation for more details.
References vs Pointers
In Rust, pointers (called raw pointers) exist but are only used in specific circumstances, as
dereferencing them is always considered unsafe -- Rust cannot provide its usual
guarantees about what might be behind the pointer.
In most cases, we instead use references, indicated by the & symbol, or mutable references,
indicated by &mut . References behave similarly to pointers, in that they can be
dereferenced to access the underlying values, but they are a key part of Rust's ownership
system: Rust will strictly enforce that you may only have one mutable reference or multiple
non-mutable references to the same value at any given time.
In practice this means you have to be more careful about whether you need mutable
access to data: where in C the default is mutable and you must be explicit about const , in
Rust the opposite is true.
One situation where you might still use raw pointers is interacting directly with hardware
(for example, writing a pointer to a buffer into a DMA peripheral register), and they are also
used under the hood for all peripheral access crates to allow you to read and write
memory-mapped registers.
Volatile Access
In C, individual variables may be marked volatile , indicating to the compiler that the value
in the variable may change between accesses. Volatile variables are commonly used in an
embedded context for memory-mapped registers.
volatile bool signalled = false;

void ISR() {
// Signal that the interrupt has occurred
signalled = true;
}
void driver() {
while(true) {
// Sleep until signalled
while(!signalled) { WFI(); }
// Reset signalled indicator
signalled = false;
// Perform some task that was waiting for the interrupt
run_task();
}
}
The Rust equivalent uses volatile access methods instead of a volatile variable:

static mut SIGNALLED: bool = false;

#[interrupt]
fn ISR() {
// Signal that the interrupt has occurred
// (In real code, you should consider a higher level primitive,
// such as an atomic type).
unsafe { core::ptr::write_volatile(&mut SIGNALLED, true) };
}
fn driver() {
loop {
// Sleep until signalled
while unsafe { !core::ptr::read_volatile(&SIGNALLED) } {}
// Reset signalled indicator
unsafe { core::ptr::write_volatile(&mut SIGNALLED, false) };
// Perform some task that was waiting for the interrupt
run_task();
}
}
A few things are worth noting in the code sample above:

We can pass &mut SIGNALLED into the function requiring *mut T , since &mut T
automatically converts to a *mut T (and the same for *const T )
We need unsafe blocks for the read_volatile / write_volatile methods, since they
are unsafe functions. It is the programmer's responsibility to ensure safe use: see
the methods' documentation for further details.
It is rare to require these functions directly in your code, as they will usually be taken care
of for you by higher-level libraries. For memory mapped peripherals, the peripheral access
crates will implement volatile access automatically, while for concurrency primitives there
are better abstractions available (see the Concurrency chapter).
Packed and Aligned Types

In embedded C it is common to tell the compiler that a variable must have a certain alignment, or that a struct must be packed rather than aligned, usually to meet specific hardware or protocol requirements. In Rust this is controlled by the repr attribute on a struct or union. The default representation provides no guarantees of layout, so it should not be used for code that interoperates with hardware or C: the compiler may re-order struct members or insert padding, and the behaviour may change with future versions of Rust. For example, with the default representation the fields of the struct below may land at any addresses:
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
}
To guarantee a C-compatible, declaration-ordered layout, use repr(C) :

#[repr(C)]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
}
To remove all padding between fields, use repr(packed) :

#[repr(packed)]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
// References must always be aligned, so to check the addresses of the
// struct's fields, we use `std::ptr::addr_of!()` to get a raw pointer
// instead of just printing `&v.x`.
let px = std::ptr::addr_of!(v.x);
let py = std::ptr::addr_of!(v.y);
let pz = std::ptr::addr_of!(v.z);
println!("{:p} {:p} {:p}", px, py, pz);
}
Finally, to specify a specific alignment, use repr(align(n)) , where n is the number of bytes
to align to (and must be a power of two):
#[repr(C)]
#[repr(align(4096))]
struct Foo {
x: u16,
y: u8,
z: u16,
}
fn main() {
let v = Foo { x: 0, y: 0, z: 0 };
let u = Foo { x: 0, y: 0, z: 0 };
println!("{:p} {:p} {:p}", &v.x, &v.y, &v.z);
println!("{:p} {:p} {:p}", &u.x, &u.y, &u.z);
}
Note we can combine repr(C) with repr(align(n)) to obtain an aligned and C-compatible
layout. It is not permissible to combine repr(align(n)) with repr(packed) , since
repr(packed) sets the alignment to 1 . It is also not permissible for a repr(packed) type to
contain a repr(align(n)) type.
For further details on type layouts, refer to the type layout chapter of the Rust Reference.
Other Resources
In this book:
A little C with your Rust
A little Rust with your C
The Rust Embedded FAQs
Rust Pointers for C Programmers
I used to use pointers - now what?
Interoperability
Interoperability between Rust and C code is always dependent on transforming data
between the two languages. For this purpose, there is a dedicated module in the stdlib
called std::ffi .
std::ffi provides type definitions for C primitive types, such as char , int , and long . It
also provides some utility for converting more complex types such as strings, mapping
both &str and String to C types that are easier and safer to handle.
| Rust type | Intermediate | C type       |
|-----------|--------------|--------------|
| String    | CString      | char *       |
| &str      | CStr         | const char * |
| ()        | c_void       | void         |
A value of a C primitive type can be used as one of the corresponding Rust type and vice
versa, since the former is simply a type alias of the latter. For example, the following code
compiles on platforms where unsigned int is 32-bit long.
fn foo(num: u32) {
let c_num: c_uint = num;
let r_num: u32 = c_num;
}
We are collecting examples and use cases for this on our issue tracker in issue #61.
Interoperability with RTOSs
Integrating Rust with an RTOS such as FreeRTOS or ChibiOS is still a work in progress;
especially calling RTOS functions from Rust can be tricky.
We are collecting examples and use cases for this on our issue tracker in issue #62.
A little C with your Rust
Using C or C++ inside of a Rust project consists of two major parts:

Wrapping the exposed C API for use with Rust
Building your C or C++ code to be integrated with the Rust code

As C++ does not have a stable ABI for the Rust compiler to target, it is recommended to use the C ABI when combining Rust with C or C++.
First, we will cover manually translating these definitions from C/C++ to Rust.
Typically, libraries written in C or C++ will provide a header file defining all types and
functions used in public interfaces. An example file may look like this:
/* File: cool.h */
typedef struct CoolStruct {
    int x;
    int y;
} CoolStruct;

void cool_function(int i, char c, CoolStruct* cs);

This interface would be translated to Rust like so:

/* File: cool_bindings.rs */
#[repr(C)]
pub struct CoolStruct {
    pub x: cty::c_int,
    pub y: cty::c_int,
}

extern "C" {
    pub fn cool_function(
        i: cty::c_int,
        c: cty::c_char,
        cs: *mut CoolStruct
    );
}
Let's take a look at this definition one piece at a time, to explain each of the parts.
#[repr(C)]
pub struct CoolStruct { ... }
By default, Rust does not guarantee order, padding, or the size of data included in a
struct . In order to guarantee compatibility with C code, we include the #[repr(C)]
attribute, which instructs the Rust compiler to always use the same rules C does for
organizing data within a struct.
pub x: cty::c_int,
pub y: cty::c_int,
Due to the flexibility of how C or C++ defines an int or char , it is recommended to use
primitive data types defined in cty , which will map types from C to types in Rust.
extern "C" {
    pub fn cool_function( ... );
}

This statement defines the signature of a function that uses the C ABI, called cool_function . By defining the signature without defining the body of the function, the definition of this function will need to be provided elsewhere, or linked into the final library or binary from a static library.
i: cty::c_int,
c: cty::c_char,
cs: *mut CoolStruct
Similar to our datatype above, we define the datatypes of the function arguments using C-
compatible definitions. We also retain the same argument names, for clarity.
We have one new type here, *mut CoolStruct . As C does not have a concept of Rust's
references, which would look like this: &mut CoolStruct , we instead have a raw pointer. As
dereferencing this pointer is unsafe , and the pointer may in fact be a null pointer, care
must be taken to ensure the guarantees typical of Rust when interacting with C or C++
code.
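One common pattern, sketched below, is to wrap the unsafe call in a safe Rust function that takes a &mut reference, since a reference is guaranteed to be non-null and valid. The wrapper name is ours, not part of the C library, and the sketch assumes the cool_function and CoolStruct declarations from above:

pub fn cool_function_safe(i: i32, c: u8, cs: &mut CoolStruct) {
    // A &mut reference coerces to a valid, non-null *mut pointer,
    // so the C side's expectations are upheld by construction.
    unsafe {
        cool_function(i as cty::c_int, c as cty::c_char, cs as *mut CoolStruct);
    }
}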
Rather than manually generating these interfaces, which may be tedious and error prone,
there is a tool called bindgen which will perform these conversions automatically. For
instructions on using bindgen, refer to the bindgen user's manual; the typical process
consists of the following:
1. Gather all C or C++ headers defining interfaces or datatypes you would like to use with
Rust.
2. Write a bindings.h file, which #include "..." 's each of the files you gathered in step
one.
3. Feed this bindings.h file, along with any compilation flags used to compile your code,
into bindgen . Tip: use Builder.ctypes_prefix("cty") / --ctypes-prefix=cty and
Builder.use_core() / --use-core to make the generated code #![no_std]
compatible.
4. bindgen will write the generated Rust code to standard output. This output may be
piped to a file in your project, such as bindings.rs , which you can then use in your
Rust project to interact with C/C++ code compiled and linked as an external library.
Tip: don't forget to use the cty crate if your types in the generated bindings are
prefixed with cty . (A build.rs sketch of this workflow follows below.)
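As an alternative to the command line, the same conversion can be driven from a build.rs script using bindgen's library interface. A minimal sketch; the header path and output name are illustrative, not fixed by bindgen:

// build.rs
use std::{env, path::PathBuf};

fn main() {
    let bindings = bindgen::Builder::default()
        .header("bindings.h")
        // Make the output #![no_std]-friendly, as suggested above.
        .ctypes_prefix("cty")
        .use_core()
        .generate()
        .expect("unable to generate bindings");

    // Write the generated bindings into Cargo's output directory.
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("unable to write bindings");
}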
Building your C/C++ code

For embedded projects, this most commonly means compiling the C/C++ code to a static
archive (such as cool-library.a ), which can then be combined with your Rust code at the
final linking step.
If the library you would like to use is already distributed as a static archive, it is not
necessary to rebuild your code. Just convert the provided interface header file as described
above, and include the static archive at compile/link time.
If your code exists as a source project, it will be necessary to compile your C/C++ code to a
static library, either by triggering your existing build system (such as make , CMake , etc.), or
by porting the necessary compilation steps to the cc crate. For both of these approaches,
it is necessary to use a build.rs script.
Rust build.rs build scripts
A build.rs script is a file written in Rust syntax that is executed on your compilation
machine AFTER the dependencies of your project have been built, but BEFORE your project
is built.
The full reference may be found here. build.rs scripts are useful for generating code
(such as via bindgen), calling out to external build systems such as Make , or directly
compiling C/C++ through use of the cc crate.
For projects with complex external builds or build systems, it may be easiest to use
std::process::Command to "shell out" to your other build system by traversing relative
paths, calling a fixed command (such as make library ), and then copying the resulting
static library to the proper location in the target build directory.
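A minimal sketch of that approach; the make target, directory layout, and library name below are placeholders for your project's actual layout:

// build.rs
use std::{env, fs, process::Command};

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();

    // Run the external build; `make library` is a stand-in target name.
    let status = Command::new("make")
        .arg("library")
        .current_dir("vendor/cool-library")
        .status()
        .expect("failed to spawn make");
    assert!(status.success(), "external build failed");

    // Copy the resulting static archive into Cargo's output directory
    // and tell rustc to link against it.
    fs::copy(
        "vendor/cool-library/libcool.a",
        format!("{}/libcool.a", out_dir),
    )
    .expect("failed to copy static archive");
    println!("cargo:rustc-link-search=native={}", out_dir);
    println!("cargo:rustc-link-lib=static=cool");
}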
While your crate may be targeting a no_std embedded platform, your build.rs executes
only on machines compiling your crate. This means you may use any Rust crates which will
run on your compilation host.
For projects with limited dependencies or complexity, or for projects where it is difficult to
modify the build system to produce a static library (rather than a final binary or
executable), it may be easier to instead utilize the cc crate, which provides an idiomatic
Rust interface to the compiler provided by the host.
fn main() {
    // Compile src/foo.c with the host-provided C compiler and produce a
    // static archive (libfoo.a) that Cargo links into the final binary.
    cc::Build::new()
        .file("src/foo.c")
        .compile("foo");
}
The build.rs is placed at the root of the package. Then cargo build will compile and
execute it before the build of the package. A static archive named libfoo.a is generated
and placed in the target directory.
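On the Rust side, functions from the compiled archive are then declared and called through the C ABI. A sketch, assuming src/foo.c defines a function int foo_add(int a, int b) (a name we made up for illustration) and that the cty crate from earlier is available:

extern "C" {
    // Hypothetical function assumed to be defined in src/foo.c.
    fn foo_add(a: cty::c_int, b: cty::c_int) -> cty::c_int;
}

fn add_via_c(a: i32, b: i32) -> i32 {
    // Calling into C is unsafe; we rely on the C implementation's contract.
    unsafe { foo_add(a, b) }
}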
A little Rust with your C
Using Rust code inside a C or C++ project mostly consists of two parts:

Creating a C-friendly API in Rust
Embedding your Rust project into an external build system
Apart from cargo and meson , most build systems don't have native Rust support. So
you're most likely best off just using cargo for compiling your crate and any dependencies.
Setting up a project
Create a new cargo project as usual.
There are flags to tell cargo to emit a systems library instead of its regular Rust target.
This also allows you to set a different output name for your library, if you want it to differ
from the rest of your crate.
[lib]
name = "your_crate"
crate-type = ["cdylib"] # Creates dynamic lib
# crate-type = ["staticlib"] # Creates static lib
Building a C API
Because C++ has no stable ABI for the Rust compiler to target, we use C for any
interoperability between different languages. This is no exception when using Rust inside
of C and C++ code.
#[no_mangle]
The Rust compiler mangles symbol names differently than native code linkers expect. As
such, any function that Rust exports to be used outside of Rust needs to be told not to be
mangled by the compiler.
extern "C"
By default, any function you write in Rust will use the Rust ABI (which is also not stabilized).
Instead, when building outwards facing FFI APIs we need to tell the compiler to use the
system ABI.
Depending on your platform, you might want to target a specific ABI version; these are
documented here.
Putting these parts together, you get a function that looks roughly like this.
#[no_mangle]
pub extern "C" fn rust_function() {

}
Just as when using C code in your Rust project, you now need to transform data from and
to a form that the rest of the application will understand.
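For instance, a C caller will typically hand over raw pointers rather than Rust references. A minimal sketch of an exported function that converts a C string into a Rust view before using it (the function name is illustrative):

use core::ffi::{c_char, CStr};

#[no_mangle]
pub extern "C" fn string_length(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // Safety: the caller must pass a valid, NUL-terminated C string.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}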
cargo will create a my_lib.so / my_lib.dll or my_lib.a file, depending on your platform
and settings. This library can simply be linked by your build system.
However, calling a Rust function from C requires a header file to declare the function
signatures.
Every function in your Rust FFI API needs to have a corresponding header function. For
example,

#[no_mangle]
pub extern "C" fn rust_function() {}

would then become

void rust_function();

in the header file, and so on for every exported function.
There is a tool to automate this process, called cbindgen, which analyses your Rust code
and then generates headers for your C and C++ projects from it.
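A minimal sketch of running cbindgen from a build.rs script; the output path is arbitrary:

// build.rs
fn main() {
    let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();

    // Scan the crate's public FFI items and emit a matching C header.
    cbindgen::Builder::new()
        .with_crate(crate_dir)
        .generate()
        .expect("unable to generate C header")
        .write_to_file("include/my-rust-project.h");
}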
At this point, using the Rust functions from C is as simple as including the header and
calling them!
#include "my-rust-project.h"
rust_function();
Unsorted topics
Optimizations: the speed size tradeoff
Everyone wants their program to be super fast and super small but it's usually not possible
to have both characteristics. This section discusses the different optimization levels that
rustc provides and how they affect the execution time and binary size of a program.
No optimizations
This is the default. When you call cargo build you use the development (AKA dev ) profile.
This profile is optimized for debugging so it enables debug information and does not
enable any optimizations, i.e. it uses -C opt-level = 0 .
At least for bare metal development, debuginfo is zero cost in the sense that it won't
occupy space in Flash / ROM so we actually recommend that you enable debuginfo in the
release profile -- it is disabled by default. That will let you use breakpoints when debugging
release builds.
[profile.release]
# symbols are nice and they don't increase the size on Flash
debug = true
No optimizations is great for debugging because stepping through the code feels like you
are executing the program statement by statement, plus you can print stack variables
and function arguments in GDB. When the code is optimized, trying to print variables
results in $0 = <value optimized out> being printed.
The biggest downside of the dev profile is that the resulting binary will be huge and slow.
The size is usually more of a problem because unoptimized binaries can occupy dozens of
KiB of Flash, which your target device may not have -- the result: your unoptimized binary
doesn't fit in your device!
Optimizing dependencies
There's a Cargo feature named profile-overrides that lets you override the optimization
level of dependencies. You can use that feature to optimize all dependencies for size while
keeping the top crate unoptimized and debugger friendly.
Beware that generic code can sometimes be optimized alongside the crate where it is
instantiated, rather than the crate where it is defined. If you create an instance of a generic
struct in your application and find that it pulls in code with a large footprint, it may be that
increasing the optimization level of the relevant dependencies has no effect.
Here's an example:
# Cargo.toml
[package]
name = "app"
# ..
[profile.dev.package."*"] # +
opt-level = "z" # +
That's a 6 KiB reduction in Flash usage without any loss in the debuggability of the top
crate. If you step into a dependency then you'll start seeing those <value optimized out>
messages again but it's usually the case that you want to debug the top crate and not the
dependencies. And if you do need to debug a dependency then you can use the profile-
overrides feature to exclude a particular dependency from being optimized. See example
below:
# ..

# don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt] # +
opt-level = 0 # +

# but do optimize all the other dependencies
[profile.dev.package."*"]
opt-level = "z"
Optimize for speed

Both opt-level = 2 and 3 optimize for speed at the expense of binary size, but level 3
does more vectorization and inlining than level 2 . In particular, you'll see that at opt-
level equal to or greater than 2 LLVM will unroll loops. Loop unrolling has a rather high
cost in terms of Flash / ROM (e.g. from 26 to 194 bytes for an array-zeroing loop) but can
also halve the execution time given the right conditions (e.g. the number of iterations is
big enough).

Currently there's no way to disable loop unrolling in opt-level = 2 and 3 , so if you can't
afford its cost you should optimize your program for size.
Optimize for size

If you want your release binaries to be optimized for size then change the
profile.release.opt-level setting in Cargo.toml as shown below.
[profile.release]
# or "z"
opt-level = "s"
These two optimization levels greatly reduce LLVM's inline threshold, a metric used to
decide whether to inline a function or not. One of Rust's principles is zero cost
abstractions; these abstractions tend to use a lot of newtypes and small functions to hold
invariants (e.g. functions that borrow an inner value like deref , as_ref ), so a low inline
threshold can make LLVM miss optimization opportunities (e.g. eliminating dead branches,
inlining calls to closures).
When optimizing for size you may want to try increasing the inline threshold to see if that
has any effect on the binary size. The recommended way to change the inline threshold is
to append the -C inline-threshold flag to the other rustflags in .cargo/config.toml .
# .cargo/config.toml
# this assumes that you are using the cortex-m-quickstart template
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = [
# ..
"-C", "inline-threshold=123", # +
]
What value to use? As of 1.29.0 these are the inline thresholds that the different
optimization levels use:

opt-level = 3 uses 275
opt-level = 2 uses 225
opt-level = "s" uses 75
opt-level = "z" uses 25

You should try 225 and 275 when optimizing for size.
Performing math functionality with #[no_std]
If you want to perform math-related functionality like calculating the square root or the
exponential of a number and you have the full standard library available, your code might
look like this:
fn main() {
    let float: f32 = 4.82832;
    let floored_float = float.floor();
    let sqrt_of_float = floored_float.sqrt();
    let exp_of_float = floored_float.exp();
}
Without standard library support, these functions are not available. An external crate like
libm can be used instead. The example code would then look like this:
#![no_main]
#![no_std]

use panic_halt as _;

use cortex_m_rt::entry;
use cortex_m_semihosting::hprintln;
use libm::{exp, floorf, sin, sqrtf};

#[entry]
fn main() -> ! {
    let float: f32 = 4.82832;
    let floored_float = floorf(float);
    let sqrt_of_float = sqrtf(floored_float);
    // `sin` and `exp` from libm operate on f64, so widen the f32 first.
    let sin_of_float = sin(floored_float.into());
    let exp_of_float = exp(floored_float.into());

    // On older cortex-m-semihosting versions hprintln! returns a Result.
    hprintln!("floor: {}, sqrt: {}, sin: {}, exp: {}",
        floored_float, sqrt_of_float, sin_of_float, exp_of_float);

    loop {}
}
If you need to perform more complex operations like DSP signal processing or advanced
linear algebra on your MCU, there are crates in the embedded Rust ecosystem that might
help you.
Appendix A: Glossary

The embedded ecosystem is full of different protocols, hardware components, and
vendor-specific things that use their own terms and abbreviations. This glossary attempts
to list them with pointers for understanding them better.

BSP
A Board Support Crate provides a high level interface configured for a specific board. It
usually depends on a HAL crate. There is a more detailed description on the memory-
mapped registers page, or for a broader overview see this video.
FPU
Floating-Point Unit. A "math processor" that runs operations on floating-point numbers.

HAL
Hardware Abstraction Layer. A crate that provides a convenient, higher-level interface to a
microcontroller's peripherals. It is usually implemented on top of a PAC.

I2C
Inter-Integrated Circuit. A two-wire serial bus commonly used to communicate with
sensors and other peripheral chips.

PAC
Peripheral Access Crate. A crate that provides low-level, register-level access to a
microcontroller's peripherals. It is usually generated from the vendor's SVD file.
SVD
System View Description is an XML file format used to describe the programmer's view of a
microcontroller device. You can read more about it on the ARM CMSIS documentation site.
UART
Universal Asynchronous Receiver-Transmitter.

USART
Universal Synchronous and Asynchronous Receiver-Transmitter.