ES Slides
1 - Introduction
© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Organization
1-2
Organization
WWW: https://www.tec.ee.ethz.ch/education/lectures/embedded-systems.html
Lecture: Lothar Thiele, thiele@ethz.ch; Michele Magno <michele.magno@pbl.ee.ethz.ch>
Coordination: Seonyeong Heo (ETZ D97.7) <seoheo@ethz.ch>
References:
– P. Marwedel: Embedded System Design, Springer, ISBN 978-3-319-85812-8 / 978-3-030-60909-2, 2018/2021.
– G.C. Buttazzo: Hard Real-Time Computing Systems, Springer, ISBN 978-1-4614-0676-1, 2011.
– E.A. Lee and S.A. Seshia: Introduction to Embedded Systems, A Cyber-Physical Systems Approach, Second Edition, MIT Press, ISBN 978-0-262-53381-2, 2017.
Sources: The slides contain ideas and material of J. Rabaey, K. Keutzer, M. Wolf, P. Marwedel, P. Koopman, E. Lee, P. Dutta, S. Seshia, and from the books cited above.
1-3
Organization Summary
Lectures are held on Mondays from 14:15 to 16:00 in ETF C1 until further notice.
Live streaming and slides are available via the web page of the lecture. In
addition, you find audio and video recordings of most of the slides as well as
recordings of this year's and last year's live streams on the web page of the
lecture.
Exercises take place on Wednesdays and Fridays from 16:15 to 17:00 via Zoom.
On Wednesdays the lecture material is summarized, hints on how to approach
the solution are given, and a sample question is solved. On Fridays, the correct
solutions are discussed.
Laboratories take place on Wednesdays and Fridays from 16:15 to 18:00 (at the
latest). On Wednesdays the session starts with a short introduction via Zoom,
and then questions can be asked via Zoom. Fridays are reserved for questions
via Zoom.
1-4
Further Material via the Web Page
1-5
When and where?
1-6
What will you learn?
Theoretical foundations and principles of the analysis and design of embedded
systems.
Practical aspects of embedded system design, mainly software design.
1-7
Please read carefully!!
https://www.tec.ee.ethz.ch/education/lectures/embedded-systems.html
1-8
What you got already…
1-9
Be careful and please do not …
1 - 10
You have to return the board at the end!
1 - 11
Embedded Systems - Impact
1 - 12
Embedded Systems
Embedded systems (ES) = information processing systems
embedded into a larger product
Examples:
© Edward Lee
1 - 15
Embedded System
[Figure: the embedded system in the CYBER WORLD communicates with, observes, and influences physical/biological/social processes (nature) in the PHYSICAL WORLD.]
Use feedback to influence the dynamics of the physical
world by taking smart decisions in the cyber world
1 - 19
Reactivity & Timing
Embedded systems are often reactive:
Reactive systems must react to stimuli from the system environment:
1 - 20
Predictability & Dependability
1 - 22
Comparison
Embedded Systems                                 | General Purpose Computing
Few applications that are known at design-time.  | Broad class of applications.
Not programmable by end user.                    | Programmable by end user.
1 - 23
Lecture Overview
1 - 24
Components and Requirements by Example
1 - 25
1 - 26
1 - 27
1 - 28
Components and Requirements by Example
- Hardware System Architecture -
1 - 29
High-Level Block Diagram View
low power CPU higher performance CPU
• enabling power to the rest of the system • sensor reading and motor control
• battery charging and voltage • flight control
measurement • telemetry (including the battery voltage)
• wireless radio (boot and operate) • additional user development
• detect and check expansion boards • USB connection
UART:
• communication protocol (Universal
Asynchronous Receiver/Transmitter)
• exchange of data packets to and from
interfaces (wireless, USB)
1 - 30
High-Level Block Diagram View

EEPROM:
• electrically erasable programmable read-only memory
• used for firmware (part of data and software that usually is not changed, configuration data)
• cannot be easily overwritten in comparison to Flash

Flash memory:
• non-volatile random-access memory for program and data

Acronyms (sensor board):
• Wkup: Wakeup signal
• GPIO: General-purpose input/output signal
• SPI: Serial Peripheral Interface Bus
• I2C: Inter-Integrated Circuit (Bus)
• PWM: Pulse-width modulated signal
• VCC: power supply
1 - 31
1 - 32
High-Level Physical View
1 - 33
High-Level Physical View
1 - 34
Low-Level Schematic Diagram View
LEDs
(1 page out of 3)
1 - 35
Low-Level Schematic Diagram View
1 - 37
High-Level Software View
real-time tasks for motor control (gathering sensor values and pilot commands,
sensor fusion, automatic control, driving motors using PWM (pulse width
modulation), …), but also …
1 - 39
High-Level Software View
Block diagram of the stabilization system:
1 - 41
What can you do to increase performance?
1 - 42
From Computer Engineering
1 - 43
From Computer Engineering
iPhone Processor A12
• 2 high-performance processor cores
• 4 less performant processor cores
• accelerator for neural networks
• graphics processor
• caches
1 - 46
What can you do to decrease power consumption?
1 - 47
Embedded Multicore Example
Trends:
Specialize multicore processors towards real-time processing and low power
consumption (parallelism can decrease energy consumption)
Target domains:
1 - 48
Why does higher parallelism help in reducing power?
1 - 49
System-on-Chip
Samsung Galaxy S6
– Exynos 7420 System on a Chip (SoC)
– 8 ARM Cortex processing cores (4 x A57, 4 x A53)
– 30 nanometer: transistor gate width
[Die photo: Exynos 5422]
1 - 50
How to manage extreme workload variability?
1 - 51
System-on-Chip
Samsung Galaxy S6
– Exynos 7420 System on a Chip (SoC)
– 8 ARM Cortex processing cores (4 x A57, 4 x A53)
– 30 nanometer: transistor gate width
[Die photo: Exynos 5422]
1 - 52
From Computer Engineering
iPhone Processor A12
• 2 high-performance processor cores
• 4 less performant processor cores
• accelerator for neural networks
• graphics processor
• caches
1 - 53
Components and Requirements by Example
- Systems -
1 - 55
Zero Power Systems and Sensors
Streaming information to
and from the physical world:
• “Smart Dust”
• Sensor Networks
• Cyber-Physical Systems
• Internet-of-Things (IoT)
1 - 56
Zero Power Systems and Sensors
Low power and energy constraints (portable or unattended devices) are increasingly important,
as well as temperature constraints (overheating).
There is increasing interest in energy harvesting to achieve long term autonomous operation.
1 - 58
Embedded Systems
2. Software Development
© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …
2-2
Remember: Computer Engineering I
Compilation of a C program to machine language program:
– assembly program: textual representation of instructions
– machine program: binary representation of instructions and data
2-3
Embedded Software Development
[Block diagram: the software developer edits source code on a host PC; the compiler (see previous slide) produces binary code; a simulator and a debugger support development; the binary code, together with an operating system, is flashed to the target, which consists of a microprocessor, RAM, Flash, an FPGA, sensors and actuators.]
Software Development (ES-Lab)
Software development is nowadays usually done with the support of an IDE
(Integrated Development Environment):
– edit and build the code
– debug and validate
2-6
Software Development (ES-Lab)

[Toolchain figure, annotated:]
– assembly code
– relocatable object file
– object libraries that are referenced in the code
– object libraries that contain the operating system (if any)
– target configuration file: specifies the connection to the target (e.g. USB) and the target device
– linker command file: tells the linker how to allocate memory and to stitch the object files and libraries together
– linker report: created by the linker, describing where the program and data sections are located in memory
2-7
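To make the linker command file mentioned above concrete, here is a hedged sketch in TI-linker style. The memory sizes match the 256 kB flash / 64 kB SRAM of the MSP432 discussed later, but this is an illustrative file, not the one actually shipped with the lab projects:

/* MEMORY: tells the linker which physical memories exist (sizes are assumptions) */
MEMORY
{
    FLASH (RX) : origin = 0x00000000, length = 0x00040000   /* 256 kB flash */
    SRAM  (RWX): origin = 0x20000000, length = 0x00010000   /* 64 kB SRAM   */
}

/* SECTIONS: tells the linker where to place program and data sections */
SECTIONS
{
    .text  : > FLASH    /* program code       */
    .const : > FLASH    /* read-only data     */
    .data  : > SRAM     /* initialized data   */
    .bss   : > SRAM     /* uninitialized data */
    .stack : > SRAM     /* system stack       */
}

The linker report then lists, for each of these sections, the addresses actually assigned within FLASH and SRAM.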
Much more in the ES-PreLab …
The Pre-lab is intended for students with missing background in software
development in C and working with an integrated development environment.
2 - 13
Embedded Systems
3. Hardware Software Interface
© Lothar Thiele
Computer Engineering and Networks Laboratory
Do you Remember ?
3-2
Where we are …
3-3
3-4
High-Level Physical View
3-5
High-Level Physical View
3-6
What you will learn …
Hardware-Software Interfaces in Embedded Systems
Storage
SRAM / DRAM / Flash
Memory Map
Input and Output
UART Protocol
Memory Mapped Device Access
SPI Protocol
Interrupts
Clocks and Timers
Clocks
Watchdog Timer
System Tick
Timer and PWM
3-7
Storage
3-8
Remember … ?
3-9
MSP432P401R (ES-Lab)
3 - 10
Storage
SRAM / DRAM / Flash
3 - 11
Static Random Access Memory (SRAM)
Single bit is stored in a bi-stable circuit
Static Random Access Memory is used for
caches
register file within the processor core
small but fast memories
Read:
1. Pre-charge all bit-lines to average voltage
2. decode address (n+m bits)
3. select row of cells using n single-bit word lines (WL)
4. selected bit-cells drive all bit-lines BL (2m pairs)
5. sense difference between bit-line pairs and read out
Write:
select row and overwrite bit-lines using strong signals
3 - 12
Dynamic Random Access Memory (DRAM)
Single bit is stored as a charge in a capacitor
Bit cell loses charge when read, bit cell drains
over time
Slower access than with SRAM due to small
storage capacity in comparison to capacity of
bit-line.
Higher density than SRAM (1 vs. 6 transistors
per bit)
3 - 14
DRAM – Typical Access Process
[Figure steps: 3. Column Access; 4. Data Transfer and Bus Transmission]
3 - 15
Flash Memory
Electrically modifiable, non‐volatile storage
Principle of operation:
Transistor with a second “floating” gate
Floating gate can trap electrons
This results in a detectable change in
threshold voltage
3 - 16
NAND and NOR Flash Memory
3 - 17
Example: Reading out NAND Flash
Selected word-line (WL): target voltage (Vtarget)
Unselected word-lines: Vread is high enough to ensure a low resistance in all
transistors in these rows
3 - 18
Storage
Memory Map
3 - 19
Example: Memory Map in MSP432 (ES-Lab)
Available memory:
The processor used in the lab (MSP432P401R) has built in 256kB flash memory,
64kB SRAM and 32kB ROM (Read Only Memory).
Address space:
The processor uses 32 bit addresses. Therefore, the addressable memory space is
4 GByte (= 2^32 Byte), as each memory location corresponds to 1 Byte.
The address space is used to address the memories (reading and writing), to
address the peripheral units, and to have access to debug and trace information
(memory mapped microarchitecture).
The address space is partitioned into zones, each one with a dedicated use. The
following is a simplified description to introduce the basic concepts.
3 - 20
Example: Memory Map in MSP432 (ES-Lab)
Memory map:
(addresses are shown in hexadecimal representation of a 32 bit binary number; each hex digit corresponds to 4 bit)

Example: size of the zone with base address 0x2000 0000 and top address 0x3FFF FFFF:
  0x3FFF FFFF = 0011 1111 …. 1111
  0x2000 0000 = 0010 0000 …. 0000 (base address)
  difference  = 0001 1111 …. 1111
  → 2^29 different addresses, capacity = 2^29 Byte = 512 MByte
3 - 22
Example: Memory Map in MSP432 (ES-Lab)
Memory map and schematic of the LaunchPad as used in the lab.
(addresses are shown in hexadecimal representation of a 32 bit binary number; each hex digit corresponds to 4 bit)
3 - 27
Device Communication
Very often, a processor needs to exchange information with other processors or
devices. To satisfy various needs, there exist many different communication
protocols, such as
UART (Universal Asynchronous Receiver-Transmitter)
SPI (Serial Peripheral Interface Bus)
I2C (Inter-Integrated Circuit)
USB (Universal Serial Bus)
3 - 28
Remember?

low power CPU:
• enabling power to the rest of the system
• battery charging and voltage measurement
• wireless radio (boot and operate)
• detect and check expansion boards

higher performance CPU:
• sensor reading and motor control
• flight control
• telemetry (including the battery voltage)
• additional user development
• USB connection

UART (Universal Asynchronous Receiver/Transmitter):
• communication protocol
• exchange of data packets to and from interfaces (wireless, USB)
3 - 29
Input and Output
UART Protocol
3 - 30
UART
Serial communication of bits via a single signal, i.e., UART provides parallel-to-serial and serial-to-parallel conversion.
Sender and receiver need to agree on the transmission rate.
Transmission of a serial packet starts with a start bit, followed by 6-9 data bits and an optional parity bit (for detecting single bit errors), and is finalized using 1-2 stop bits.
There exist many variations of this simple scheme.
3 - 31
UART
The receiver runs an internal clock whose frequency is an exact multiple of the
expected bit rate.
When a Start bit is detected, a counter begins to count clock cycles, e.g. 8 cycles,
until the midpoint of the anticipated Start bit is reached.
The clock counter then counts a further 16 cycles, to the middle of the first Data
bit, and so on until the Stop bit.
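As a hedged illustration of how the corresponding clock dividers are derived (the scheme follows TI's eUSCI oversampling mode as far as documented; the concrete numbers assume a 3 MHz clock and 4800 baud, and the helper name is made up):

#include <stdint.h>

typedef struct { uint16_t brdiv; uint8_t brf; } baud_cfg_t;

/* split the total divider N = f_clock / baud into an integral part (BRDIV)
   and a fractional part in 16ths (UCxBRF), for 16x oversampling */
baud_cfg_t uart_baud(uint32_t clk_hz, uint32_t baud) {
    uint32_t n = clk_hz / baud;              /* e.g. 3000000 / 4800 = 625 */
    baud_cfg_t c;
    c.brdiv = (uint16_t)(n / 16);            /* 625 / 16 = 39 (integral part) */
    c.brf   = (uint8_t)(n - 16u * c.brdiv);  /* 625 - 624 = 1 (fractional part * 16) */
    return c;
}

These are exactly the values (39, 1) that appear in the uartConfig structure shown later in this chapter.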
3 - 32
UART with MSP432 (ES-Lab)
host PC
3 - 33
UART with MSP432 (Lab)
3 - 34
Input and Output
Memory Mapped Device Access
3 - 35
Memory-Mapped Device Access
3 - 37
Software Interface
Part of a C program that prints a character to a UART terminal on the host PC:

...
// the data structure uartConfig contains the configuration of the UART
static const eUSCI_UART_Config uartConfig =
{
    EUSCI_A_UART_CLOCKSOURCE_SMCLK,                  // SMCLK clock source
    39,                                              // BRDIV = 39, integral part
    1,                                               // UCxBRF = 1, fractional part * 16
    0,                                               // UCxBRS = 0
    EUSCI_A_UART_NO_PARITY,                          // no parity
    EUSCI_A_UART_LSB_FIRST,                          // LSB first
    EUSCI_A_UART_ONE_STOP_BIT,                       // one stop bit
    EUSCI_A_UART_MODE,                               // UART mode
    EUSCI_A_UART_OVERSAMPLING_BAUDRATE_GENERATION    // oversampling mode
};

GPIO_setAsPeripheralModuleFunctionInputPin(GPIO_PORT_P1,
    GPIO_PIN2 | GPIO_PIN3, GPIO_PRIMARY_MODULE_FUNCTION);  // configure CPU signals
UART_initModule(EUSCI_A0_BASE, &uartConfig);  // use uartConfig to write to the eUSCI_A0 configuration registers
UART_enableModule(EUSCI_A0_BASE);             // enable (start) UART module A0
UART_transmitData(EUSCI_A0_BASE, 'a');        // write character 'a' to UART
...
3 - 40
SPI (Serial Peripheral Interface Bus)
Typically used to communicate across short distances
Characteristics:
4-wire synchronized (clocked) communications bus
supports single master and multiple slaves
always full-duplex: Communicates in both directions simultaneously
multiple Mbps transmission speeds can be achieved
transfer data in 4 to 16 bit serial packets
Bus wiring:
MOSI (Master Out Slave In) – carries data out of master to slave
MISO (Master In Slave Out) – carries data out of slave to master
Both MOSI and MISO are active during every transmission
SS (or CS) – signal to select each slave chip
System clock SCLK – produced by master to synchronize transfers
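To make the wiring concrete, here is a minimal bit-banged sketch of one full-duplex SPI transfer in mode 0 (all pin names and the GPIO helpers are hypothetical stand-ins; a real design would use the eUSCI SPI peripheral instead):

#include <stdint.h>

enum { PIN_SCLK, PIN_MOSI, PIN_MISO, PIN_SS };

/* stub GPIO helpers; replace with real port accesses */
static int pins[4];
static void gpio_write(int pin, int level) { pins[pin] = level; }
static int  gpio_read(int pin) { return pins[pin]; }

/* shifts one byte out on MOSI while shifting one byte in from MISO (MSB first);
   both directions are active in every transfer: SPI is always full-duplex */
static uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; bit--) {
        gpio_write(PIN_MOSI, (out >> bit) & 1);  /* master drives MOSI while SCLK is low */
        gpio_write(PIN_SCLK, 1);                 /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | (gpio_read(PIN_MISO) & 1));
        gpio_write(PIN_SCLK, 0);                 /* falling edge: shift next bit */
    }
    return in;
}

A transaction selects the slave first: pull SS low, call spi_transfer() once per byte, then pull SS high again.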
3 - 41
SPI (Serial Peripheral Interface Bus)
More detailed circuit diagram:
details vary between
different vendors and
implementations
Timing diagram:
3 - 43
Interrupts
3 - 44
Interrupts
A hardware interrupt is an electronic alerting signal sent to the CPU from another
component, either from an internal peripheral or from an external device.
3 - 45
Interrupts
[Figure: interrupt sources of the MSP432.]
3 - 46
Processing of an Interrupt (MSP432 ES-Lab)
[Figure: Timer_A0 sets its interrupt flag in the IFG register.]
3 - 49
Processing of an Interrupt
[Figure: the flag in the IFG register selects the corresponding entry in the interrupt vector table.]
3 - 50
Processing of an Interrupt
[Figure: IFG register.]
3 - 51
Processing of an Interrupt
Detailed interrupt processing flow:

int main(void)
{
    ...
    GPIO_setAsOutputPin(GPIO_PORT_P1, GPIO_PIN0);
    GPIO_setAsInputPinWithPullUpResistor(GPIO_PORT_P1, GPIO_PIN1);
    GPIO_clearInterruptFlag(GPIO_PORT_P1, GPIO_PIN1);  // clear interrupt flag and enable
    GPIO_enableInterrupt(GPIO_PORT_P1, GPIO_PIN1);     //   interrupt in the periphery
    Interrupt_enableInterrupt(INT_PORT1);              // enable interrupts in the controller (NVIC)
    Interrupt_enableMaster();
    while (1) PCM_gotoLPM3();                          // enter low power mode LPM3
}
3 - 53
Example: Interrupt Processing
Port 1, pin 1 (which has a switch connected to it) is configured as an input with interrupts enabled,
and port 1, pin 0 (which has an LED connected) is configured as an output.
When the switch is pressed, the LED output is toggled.

while (1)
{
    // continuously get the signal at pin 1 and detect a falling edge
    new = GPIO_getInputPinValue(GPIO_PORT_P1, GPIO_PIN1);
    if (!new & old)
    {
        GPIO_toggleOutputOnPin(GPIO_PORT_P1, GPIO_PIN0);
    }
    old = new;
}
3 - 55
Polling vs. Interrupt
What are advantages and disadvantages?
We compare polling and interrupt based on the utilization of the CPU by using a
simplified timing model.
Definitions:
utilization u: average percentage, the processor is busy
computation c: processing time of handling the event
overhead h: time overhead for handling the interrupt
period P: polling period
interarrival time T: minimal time between two events
deadline D: maximal time between event arrival and finishing event processing with D ≤ T.
[Timing diagrams: for polling, the handler of length c runs once per polling period P and must finish within D; for interrupts, each event incurs overhead h = h1 + h2 in addition to the computation c and must finish within D; events arrive at least T apart.]
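Under this simplified model, the utilizations can be written down directly (a sketch reconstructed from the definitions above, not taken verbatim from the lecture):

$$u_{\text{polling}} = \frac{c}{P}, \qquad u_{\text{interrupt}} = \frac{h+c}{T}.$$

For polling, the worst-case delay between an event arrival and the end of its processing is $P + c$, so the deadline requires $P + c \le D$; the utilization is therefore minimized by the largest admissible period, $P = D - c$. Comparing the two, interrupts are preferable when $(h+c)/T < c/(D-c)$, i.e., when events are rare relative to the deadline; polling wins when events arrive frequently.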
3 - 56
Polling vs. Interrupts
For the following considerations, we suppose that the interarrival time between
events is T. This makes the results a bit easier to understand.
3 - 57
Polling vs. Interrupts
Design problem: D and T are given by application requirements. h and c are given by
the implementation. When to use interrupt and when polling when considering the
resulting system utilization? What is the best value for the polling period P?
3 - 59
Clocks and Timers
Clocks
3 - 60
Clocks
Microcontrollers usually have many different clock sources that differ in
– frequency (relates to precision)
– energy consumption
– stability, e.g., crystal-controlled clock vs. digitally controlled oscillator
As an example, the MSP432 (ES-Lab) has the following clock sources:

           frequency   precision              current   comment
LFXTCLK    32 kHz      0.0001% … 0.005% /°C   150 nA    external crystal
HFXTCLK    48 MHz      0.0001% … 0.005% /°C   550 µA    external crystal
DCOCLK     3 MHz       0.025% /°C             N/A       internal
VLOCLK     9.4 kHz     0.1% /°C               50 nA     internal
REFOCLK    32 kHz      0.012% /°C             0.6 µA    internal
MODCLK     25 MHz      0.02% /°C              50 µA     internal
SYSOSC     5 MHz       0.03% /°C              30 µA     internal
3 - 61
Clocks and Timers MSP432 (ES-Lab)
3 - 62
Clocks and Timers MSP432 (ES-Lab)
3 - 63
Clocks
From these basic clocks, several internally available clock signals are derived.
They can be used for clocking peripheral units, the CPU, memory, and the various
timers.
3 - 65
Watchdog Timer
Watchdog timers provide system fail-safety:
– If their counter ever rolls over (back to zero), they reset the processor. The goal
is to prevent the system from being inactive (deadlocked) due to some
unexpected fault.
– To prevent the system from continuously resetting itself, the counter must be
reset at appropriate intervals.

CPU Watchdog Timer (WDT_A):
WDT_A_holdTimer();   // pause counting up
WDT_A_clearTimer();  // reset counter to 0
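A minimal usage sketch (the application work is a hypothetical stub; the watchdog calls are the driverlib functions shown above, and the driverlib header is assumed to be included):

static void do_control_step(void) { /* hypothetical application work; must finish
                                       well within the watchdog period */ }

int main(void) {
    /* deliberately do NOT hold the watchdog: it stays armed */
    while (1) {
        do_control_step();
        WDT_A_clearTimer();   /* service ("kick") the watchdog before it rolls over */
    }
}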
3 - 67
SysTick MSP432 (ES-Lab)
SysTick is a simple decrementing 24 bit counter that is part of the NVIC
(Nested Vectored Interrupt Controller). Its clock source is MCLK, and it
reloads to period-1 after reaching 0.
It is a very simple timer, mainly used for periodic interrupts or measuring time.

int main(void) {
    ...
    GPIO_setAsOutputPin(GPIO_PORT_P1, GPIO_PIN0);
    SysTick_enableModule();
    SysTick_setPeriod(1500000);   // if MCLK has a frequency of 3 MHz,
    SysTick_enableInterrupt();    //   an interrupt is generated every 0.5 s
    Interrupt_enableMaster();
    while (1) PCM_gotoLPM0();     // go to low power mode LP0 after executing the ISR
}

void SysTick_Handler(void) {
    MAP_GPIO_toggleOutputOnPin(GPIO_PORT_P1, GPIO_PIN0);
}
3 - 68
SysTick MSP432 (ES-Lab)
Example for measuring the execution time of some parts of a program:

int main(void) {
    int32_t start, end, duration;
    ...
    SysTick_enableModule();
    SysTick_setPeriod(0x01000000);   // if MCLK has a frequency of 3 MHz, the counter
    SysTick_disableInterrupt();      //   rolls over every ~5.6 s, as 2^24 / (3*10^6) = 5.59
    start = SysTick_getValue();
    ...                              // code to be measured
    end = SysTick_getValue();        // SysTick counts down, so the
    duration = start - end;          //   elapsed ticks are start - end
}
3 - 69
3 - 70
Clocks and Timers
Timer and PWM
3 - 71
Timer
Usually, embedded microprocessors have several elaborate timers that allow to
– capture the current time or time differences, triggered by hardware or software events,
– generate interrupts when a certain time is reached (stop watch, timeout),
– generate interrupts when counters overflow,
– generate periodic interrupts, for example in order to periodically execute tasks,
– generate specific output signals, for example PWM (pulse width modulation).

[Figure: a clock input advances the counter register through 0x0001, 0x0002, …, 0xFFFD, 0xFFFE, 0xFFFF; an interrupt is raised on overflow / roll-over.]

Capture register:
• the value of the counter register is stored in the capture register at the time of the capture event (input signals, software)
• the value can be read by software
• at the time of the capture, further actions can be triggered (interrupt, signal)

Compare register:
• the value of the compare register can be set by software
• as soon as the values of the counter and compare register are equal, compare actions can be taken, such as interrupt, signaling peripherals, changing pin values, resetting the counter register
3 - 73
Timer
Pulse Width Modulation (PWM) can be used to change the average power of a
signal.
The use case could be to change the speed of a motor or to modulate the light
intensity of an LED.
[Figure: the counter register counts from 0x0000 up to 0xFFFF; one compare register is used to define the period; the output signal toggles accordingly.]
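As a hedged sketch of how this looks with the MSP432 driverlib Timer_A PWM interface (the struct layout follows the driverlib API as far as documented; the period and duty-cycle values are arbitrary):

const Timer_A_PWMConfig pwmConfig = {
    TIMER_A_CLOCKSOURCE_SMCLK,              // assume SMCLK = 3 MHz
    TIMER_A_CLOCKSOURCE_DIVIDER_1,
    30000,                                  // period: 30000 ticks = 10 ms at 3 MHz
    TIMER_A_CAPTURECOMPARE_REGISTER_1,      // compare register 1 drives the output pin
    TIMER_A_OUTPUTMODE_RESET_SET,           // output set at period start, reset at compare match
    7500                                    // duty cycle: 7500 / 30000 = 25% average power
};
...
Timer_A_generatePWM(TIMER_A0_BASE, &pwmConfig);  // counter runs, pin toggles autonomously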
3 - 74
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts.
[Figure: selectable clock sources for the timer: TXCLK (external), ACLK, SMCLK, inverted TXCLK.]
3 - 75
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts.
[Figure: in continuous mode, the counter counts from 0x0000 to 0xFFFF and raises an interrupt on roll-over.]
3 - 76
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts,
but with configurable periods.
int main(void) {
    ...
    const Timer_A_ContinuousModeConfig continuousModeConfig = {
        TIMER_A_CLOCKSOURCE_ACLK,            // clock source is ACLK (32.768 kHz)
        TIMER_A_CLOCKSOURCE_DIVIDER_1,       // divider is 1 (count frequency 32.768 kHz)
        TIMER_A_TAIE_INTERRUPT_DISABLE,      // no interrupt on roll-over
        TIMER_A_DO_CLEAR };
    ...
    // configure continuous mode of timer instance A0
    Timer_A_configureContinuousMode(TIMER_A0_BASE, &continuousModeConfig);
    // start counter A0 in continuous mode; so far, only the counter is running, nothing happens
    Timer_A_startCounter(TIMER_A0_BASE, TIMER_A_CONTINUOUS_MODE);
    ...
    while (1) PCM_gotoLPM0();
}
3 - 77
Timer Example MSP432 (ES-Lab)
Example:
For a periodic interrupt, we need to add a compare register and an ISR.
The following code should be added as a definition:

#define PERIOD 32768

// The register TA0IV contains the interrupt flags for the capture/compare registers;
// after being read, the highest priority interrupt (smallest register number) is
// cleared automatically.
void TA0_N_IRQHandler(void) {
    switch (TA0IV) {
        case 0x0002:                      // flag for compare register CCR1
            TA0CCR1 = TA0CCR1 + PERIOD;   // TA0CCR1 contains the compare value of compare register 1
            ...                           // do something every PERIOD
        // other cases in the switch statement may be used to handle
        // other capture and compare registers
        default: break;
    }
}
3 - 79
Timer Example MSP432 (ES-Lab)
Example: This principle can be used to generate several periodic interrupts with
one timer.
[Figure: the counter repeatedly ramps from 0x0000 to 0xFFFF; the compare values TA0CCR1 and TA0CCR2 are each advanced by their own period after every match, producing two independent periodic interrupt streams.]
Embedded Systems
4. Programming Paradigms
© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …
4-2
Reactive Systems and Timing
4-3
Timing Guarantees
Hard real-time systems can often be found in safety-critical applications. They
need to provide the result of a computation within a fixed time bound.
Typical application domains:
avionics, automotive, train systems, automatic control including robotics,
manufacturing, media content production
4-4
Simple Real-Time Control System
[Block diagram: sensor inputs pass through A/D converters, the control-law computation produces outputs via D/A conversion.]
Real-Time Systems
In many cyber-physical systems (CPSs), correct timing is a matter of correctness, not
performance: an answer arriving too late is considered to be an error.
4-6
Real-Time Systems
4-7
Real-Time Systems
4-8
Real-Time Systems
4-9
Real-Time Systems
4 - 10
Real-Time Systems
4 - 11
Real-Time Systems
Embedded controllers are often expected to finish the processing of data and
events reliably within defined time bounds. Such a processing may involve
sequences of computations and communications.
Essential for the analysis and design of a real-time system: Upper bounds on the
execution times of all tasks are statically known. This also includes the
communication of information via a wired or wireless connection.
Analogously, one can define the lower bound on the execution time, the Best-Case
Execution Time (BCET).
4 - 12
Distribution of Execution Times
[Figure: distribution of execution times, bounded below by the Best Case Execution Time (BCET) and above by the Worst Case Execution Time (WCET); execution time measurements are unsafe since they may miss the worst case; a safe upper bound lies at or above the WCET.]
4 - 13
Modern Hardware Features
Modern processors increase the average performance (execution of tasks) by
using caches, pipelines, branch prediction, and speculation techniques, for
example.
These features make the computation of the WCET very difficult: The
execution times of single instructions vary widely.
The microarchitecture has a large time-varying internal state that is changed by
the execution of instructions and that influences the execution times of
instructions.
Best case: everything goes smoothly (no cache miss, operands ready, needed
resources free, branch correctly predicted).
Worst case: everything goes wrong (all loads miss the cache, resources needed
are occupied, operands are not ready).
The span between the best case and worst case may be several hundred cycles.
4 - 14
Methods to Determine the Execution Time of a Task
[Figure: methods to determine the execution time of a task, located between the Best-Case and Worst-Case execution times.]
4 - 16
Determine the WCET
Complexity of determining the WCET of tasks:
In the general case, it is even undecidable whether a finite bound exists.
For restricted classes of programs it is possible, in principle. Computing accurate
bounds is simple for "old" architectures, but very complex for new architectures with
pipelines, caches, interrupts, and virtual memory, for example.
4 - 17
Different Programming Paradigms
4 - 18
Why Multiple Tasks on one Embedded Device?
The concept of concurrent tasks reflects our intuition about the functionality of
embedded systems.
4 - 19
Example: Engine Control
Typical tasks:
– spark control
– crankshaft sensing
– fuel/air mixture
– oxygen sensor
– Kalman filter: control algorithm
[Figure: engine controller]
4 - 20
Overview
There are many structured ways of programming an embedded system.
In this lecture, only the main principles will be covered:
time triggered approaches
periodic
cyclic executive
generic time-triggered scheduler
4 - 21
Time-Triggered Systems
Pure time-triggered model:
no interrupts are allowed, except by timers
the schedule of tasks is computed off-line and therefore, complex sophisticated
algorithms can be used
the scheduling at run-time is fixed and therefore, it is deterministic
the interaction with environment happens through polling
[Figure: a timer raises interrupts at the CPU (set timer / interrupt); the CPU polls the interfaces to sensor/actuator.]
4 - 22
Simple Periodic TT Scheduler
A timer interrupts regularly with period P.
All tasks have same period P.
[Timeline: T1 T2 T3 | T1 T2 T3 | T1 T2 T3 …, with each group starting at t(0), t(0)+P, t(0)+2P.]
Properties:
– later tasks, for example T2 and T3, have unpredictable starting times
– the communication between tasks or the use of common resources is safe, as
there is a static ordering of tasks, for example T2 starts after finishing T1
– as a necessary precondition, the sum of WCETs of all tasks within a period is
bounded by the period P: WCET(T(0)) + WCET(T(1)) + … + WCET(T(m-1)) ≤ P
4 - 23
Simple Periodic Time-Triggered Scheduler
(usually done offline:) determine the table of tasks (k, T(k)), for k = 0, 1, …, m-1;
for example k = 0 … 4 → T(k) = T1 … T5, with m = 5.

main:
    i = 0; set the timer to expire at initial phase t(0);
    while (true) sleep();          // set CPU to low power mode;
                                   // processing starts again after interrupt

Timer Interrupt:
    i = i + 1;
    set the timer to expire at i*P + t(0);
    for (k = 0, …, m-1) { execute task T(k); }   // for example using a function pointer in C;
    return;                                      // a task (= function) returns after finishing
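A compact C sketch of this table-driven scheduler (task bodies and the timer hook are hypothetical placeholders):

#include <stdint.h>

#define M 5
typedef void (*task_fn)(void);

/* hypothetical task functions */
static void T1(void) { /* ... */ }
static void T2(void) { /* ... */ }
static void T3(void) { /* ... */ }
static void T4(void) { /* ... */ }
static void T5(void) { /* ... */ }

static const task_fn table[M] = { T1, T2, T3, T4, T5 };  /* static order fixed offline */

/* assumed to be invoked by the timer every period P */
void timer_isr(void) {
    for (int k = 0; k < M; k++)
        table[k]();   /* each task runs to completion; T(k+1) starts after T(k) finishes */
}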
4 - 24
Time-Triggered Cyclic Executive Scheduler
Suppose now, that tasks may have different periods.
To accommodate this situation, the period P is partitioned into frames of length f.
[Timeline: task sequence T1 T3 T2 T1 T4 T2 T1 T2 T1 T1 T2 over t = 0 … 20, partitioned into frames of length f within the period P.]
4 - 25
Time-Triggered Cyclic Executive Scheduling
Examples for periodic tasks: sensory data acquisition, control loops, action
planning and system monitoring.
When a control application consists of several concurrent periodic tasks with
individual timing constraints, the schedule has to guarantee that each periodic
instance is regularly activated at its proper rate and is completed within its
deadline.
Definitions:
Γ : denotes the set of all periodic tasks
τi : denotes a periodic task
τi,j : denotes the jth instance of task i
ri,j , di,j : denote the release time and absolute deadline of the jth instance of task i
Φi : phase of task i (release time of its first instance)
Di : relative deadline of task i
4 - 26
Time-Triggered Cyclic Executive Scheduling
Example of a single periodic task τi:
[Figure: timeline with phase Φi, period Ti, relative deadline Di, computation time Ci, and release times ri,1, ri,2.]
A set of periodic tasks: task instances should execute within these intervals.
4 - 27
Time-Triggered Cyclic Executive Scheduling
The following hypotheses are assumed on the tasks:
– The instances of a periodic task are regularly activated at a constant rate. The
interval Ti between two consecutive activations is called period. The release times
satisfy
  $r_{i,j} = \Phi_i + (j-1)\,T_i$
– All instances have the same worst case execution time Ci. The worst case
execution time is also denoted as WCET(i).
– All instances of a periodic task have the same relative deadline Di. Therefore, the
absolute deadlines satisfy
  $d_{i,j} = \Phi_i + (j-1)\,T_i + D_i$
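As a quick worked instance of these formulas (numbers chosen arbitrarily): for a task with phase $\Phi_i = 2$, period $T_i = 5$, and relative deadline $D_i = 4$, the third instance has

$$r_{i,3} = 2 + (3-1)\cdot 5 = 12, \qquad d_{i,3} = 12 + 4 = 16.$$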
4 - 28
Time-Triggered Cyclic Executive Scheduling
Example with 4 tasks:
[Figure: the timing requirements of the four tasks (top) and a feasible schedule (bottom), over t = 0 … 36.]
4 - 30
Sketch of Proof for Last Condition
[Figure: frames of length f, with the starting time and the latest finishing time of a task instance marked.]
Example: Cyclic Executive Scheduling
Conditions:
task   Ti   Di   Ci
τ1     4    4    1.0
τ2     5    5    1.8
τ3     20   20   1.0
τ4     20   20   2.0
possible solution: f = 2

[Timeline: resulting schedule over t = 0 … 20, with frame length f and period P marked.]
4 - 32
Time-Triggered Cyclic Executive Scheduling
Checking for correctness of the schedule (where the frame index denotes the
number of the frame in which instance j of a task executes):
– Is P a common multiple of all periods Ti?
– Is P a multiple of f?
– Is the frame sufficiently long?
– Determine offsets such that instances of tasks start after their release time.
4 - 33
Generic Time-Triggered Scheduler
In an entirely time-triggered system, the temporal control structure of all tasks is
established a priori by off-line support-tools.
This temporal control structure is encoded in a Task-Descriptor List (TDL) that
contains the cyclic schedule for all activities of the node.
This schedule considers the required precedence and mutual exclusion
relationships among the tasks such that an explicit coordination of the tasks by
the operating system at run time is not necessary.
The dispatcher is activated by a
synchronized clock tick. It looks at the
TDL, and then performs the action
that has been planned for this
instant [Kopetz].
4 - 34
Simplified Time-Triggered Scheduler
4 - 35
Summary Time-Triggered Scheduler
Properties:
deterministic schedule; conceptually simple (static table); relatively easy to
validate, test and certify
no problems in using shared resources
Extensions:
allow interrupts → be careful with shared resources and the WCET of tasks!!
allow preemptable background tasks
check for task overruns (execution time longer than WCET) using a watchdog timer
4 - 36
Event Triggered Systems
The schedule of tasks is determined by the occurrence of external or internal events:
dynamic and adaptive: there are possible problems with respect to timing, the use
of shared resources and buffer over- or underflow
guarantees can be given either off-line (if bounds on the behavior of the
environment are known) or during run-time
[Figure: interrupts from the timer (set timer / interrupt) and from the interfaces to sensor/actuator trigger the CPU.]
4 - 37
Non-Preemptive Event-Triggered Scheduling
Principle:
To each event, there is associated a corresponding task that will be executed.
Events are emitted by (a) external interrupts or (b) by tasks themselves.
All events are collected in a single queue; depending on the queuing discipline, an
event is chosen for execution, i.e., the corresponding task is executed.
Tasks cannot be preempted.
Extensions:
A background task can run if the event queue is empty. It will be preempted by
any event processing.
Timed events are ready for execution only after a time interval elapsed. This
enables periodic instantiations, for example.
4 - 38
Non-Preemptive Event-Triggered Scheduling
main:
    set the CPU to low power mode;
    while (true) {
        // processing continues after interrupt
        if (event queue is empty) {
            sleep();
        } else {
            extract event from event queue;
            execute task corresponding to event;   // for example using a function pointer in C;
        }                                          // the task returns after finishing
    }

Interrupt:
    put event into event queue;
    return;

[Figure: interrupts are handled by ISRs (interrupt service routines) that put events into the event queue; tasks themselves can also emit events into the queue.]
4 - 39
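A minimal C sketch of such a non-preemptive dispatcher with a fixed-size ring buffer as event queue (all names are hypothetical; a real system would additionally disable interrupts around the queue accesses and enter a low power mode in place of the sleep comment):

#include <stdint.h>
#include <stdbool.h>

#define QSIZE 16
typedef void (*task_fn)(void);

static task_fn queue[QSIZE];           /* ring buffer of pending events */
static volatile uint8_t head, tail;

static bool queue_empty(void) { return head == tail; }

void put_event(task_fn t) {            /* called from ISRs or from tasks */
    queue[tail] = t;
    tail = (uint8_t)((tail + 1) % QSIZE);   /* overflow check omitted for brevity */
}

void dispatcher(void) {
    for (;;) {
        if (queue_empty()) {
            /* sleep(): enter low power mode, wake on interrupt */
        } else {
            task_fn t = queue[head];
            head = (uint8_t)((head + 1) % QSIZE);
            t();                       /* run the task to completion: non-preemptive */
        }
    }
}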
Non-Preemptive Event-Triggered Scheduling
Properties:
– communication between tasks does not lead to simultaneous access to shared
resources, but interrupts may cause problems as they preempt running tasks
– buffer overflow may happen if too many events are generated by the environment
or by tasks
– tasks with a long running time prevent other tasks from running and may cause
buffer overflow, as no events are being processed during this time
Remedy: partition tasks with a long execution time into smaller ones, but then the
local context must be stored.
[Figure: a task with a long execution time is partitioned into subtask 1 and subtask 2; the context is saved to global memory after subtask 1 and restored before subtask 2.]
4 - 40
Preemptive Event-Triggered Scheduling – Stack Policy
This case is similar to the non-preemptive case, but tasks can be preempted by
others; this partly resolves the problem of tasks with a long execution time.
If the order of preemption is restricted, we can use the usual stack-based context
mechanism of function calls. The context of a function contains the necessary
state, such as local variables and saved registers.

main(){
    …
    f1();
    …
}
f1(){
    …
    f2();
    …
}

[Figure: the contexts of main(), f1() and f2() are stacked in main memory at increasing addresses.]
4 - 41
Preemptive Event-Triggered Scheduling – Stack Policy
[Figure: task T1 is preempted by task T2, which is preempted by task T3; the tasks finish in reverse order of preemption.]
Tasks must finish in LIFO (last in, first out) order of their instantiation:
– this restricts the flexibility of the approach
– it is not useful if tasks wait an unknown time for external events, i.e., if they are
blocked
Shared resources (communication between tasks!) must be protected, for
example by disabling interrupts or by the use of semaphores.
4 - 42
Preemptive Event-Triggered Scheduling – Stack Policy
main:
    while (true) {
        if (event queue is empty) {
            set CPU to low power mode;
            sleep();
            // processing continues after interrupt
        } else {
            select event from event queue;
            execute selected task;              // for example using a function pointer in C;
            remove selected event from queue;   // the task returns after finishing
        }
    }

InsertEvent:                                    // may be called by interrupt service routines (ISRs) or tasks
    put new event into event queue;
    select event from event queue;
    if (selected task ≠ running task) {
        execute selected task;
        remove selected event from queue;
    }
    return;

Interrupt:
    InsertEvent(…);
    return;
4 - 43
Thread
A thread is a unique execution of a program.
Several copies of such a “program” may run simultaneously or at different times.
Threads share the same processor and its peripherals.
A thread has its own local state. This state consists mainly of:
register values;
memory stack (local variables);
program counter;
4 - 44
Threads and Memory Organization
The activation record (also denoted as the thread context) contains the
thread-local state, which includes registers and local data structures.

Context switch:
– the current CPU context goes out (is saved to the thread's activation record in memory)
– the new CPU context goes in (is restored from its activation record)
[Figure: activation records of thread 1 and thread 2 in memory; the PC and registers move between the CPU and the records.]
4 - 45
Co-operative Multitasking
Each thread allows a context switch to another thread at a call to the
cswitch() function.
This function is part of the underlying runtime system (operating system).
A scheduler within this runtime system chooses which thread will run next.
Advantages:
predictable, where context switches can occur
less errors with use of shared resources if the switch locations are chosen carefully
Problems:
programming errors can keep other threads out as a thread may never give up
CPU
real-time behavior may be at risk if a thread runs too long before the next context
switch is allowed
4 - 46
Example: Co-operative Multitasking
Thread 1:
    if (x > 2)
        sub1(y);
    else
        sub2(y);
    cswitch();
    proca(a, b, c);

Thread 2:
    procdata(r, s, t);
    cswitch();
    if (val1 == 3)
        abc(val2);
    rst(val3);

Scheduler:
    save_state(current);
    p = choose_process();
    load_and_go(p);
4 - 47
Preemptive Multitasking
Most general form of multitasking:
The scheduler in the runtime system (operating system) controls when contexts
switches take place.
The scheduler also determines what thread runs next.
4 - 48
Embedded Systems
4a. Timing Anomalies
© Lothar Thiele
Computer Engineering and Networks Laboratory
4a - 2
4a - 3
4a - 4
4a - 5
4a - 6
4a - 7
4a - 8
4a - 9
4a - 10
4a - 11
4a - 12
Embedded Systems
5. Operating Systems
© Lothar Thiele
Computer Engineering and Networks Laboratory
Embedded Operating Systems
5-2
Where we are …
5-3
Embedded Operating System (OS)
Why an operating system (OS) at all?
The same reasons why we need one for a traditional computer. But not every
device needs all services.
5-4
Embedded Operating System
Why is a desktop OS not suited?
The monolithic kernel of a desktop OS offers too many features that take up space
in memory and consume time.
Monolithic kernels are often not modular, fault-tolerant, or configurable.
A desktop OS requires too much memory and is often too resource-hungry in
terms of computation time.
It is not designed for mission-critical applications.
The timing uncertainty may be too large for some applications.
5-5
Embedded Operating Systems
Essential characteristics of an embedded OS: Configurability
No single operating system will fit all needs, but often no overhead for
unused functions/data is tolerated. Therefore, configurability is needed.
For example, there are many embedded systems without external memory, a
keyboard, a screen or a mouse.
Configurability examples:
Remove unused functions/libraries (for example by the linker).
Use conditional compilation (using #if and #ifdef commands in C, for example).
But deriving a consistent configuration is a potential problem when a large
number of variants is derived from one operating system. There is the danger of
missing relevant components.
5-6
Example: Configuration of VxWorks
© Windriver
http://www.windriver.com/products/development_tools/ide/tornado2/tornado_2_ds.pdf
5-7
Real-time Operating Systems
A real-time operating system is an operating system that supports the
construction of real-time systems.
Key requirements:
1. The timing behavior of the OS must be predictable.
For all services of the OS, an upper bound on the execution time is necessary. For
example, for every service upper bounds on blocking times need to be available,
i.e. for times during which interrupts are disabled. Moreover, almost all
processor activities should be controlled by a real-time scheduler.
2. OS must manage the timing and scheduling
OS has to be aware of deadlines and should have mechanisms to take them
into account in the scheduling
OS must provide precise time services with a high resolution
5-8
Embedded Operating Systems
Features and Architecture
5-9
Embedded Operating System
Device drivers are typically handled directly by tasks instead of drivers that are
managed by the operating system:
This architecture improves timing predictability as access to devices is also handled by
the scheduler
If several tasks use the same external device and the associated driver, then the access
must be carefully managed (shared critical resource, ensure fairness of access)
[Figure: driver placement in an embedded OS vs. a standard OS.]
5 - 10
Embedded Operating Systems
Every task can perform an interrupt:
For standard OS, this would be serious source of unreliability. But embedded
programs are typically programmed in a controlled environment.
It is possible to let interrupts directly start or stop tasks (by storing the tasks start
address in the interrupt table). This approach is more efficient and predictable
than going through the operating system’s interfaces and services.
5 - 11
Main Functionality of RTOS-Kernels
Task management:
Execution of quasi-parallel tasks on a processor using processes or threads (lightweight
process) by
maintaining process states, process queuing,
allowing for preemptive tasks (fast context switching) and quick interrupt handling
CPU scheduling (guaranteeing deadlines, minimizing process waiting times, fairness in
granting resources such as computing power)
Inter-task communication (buffering)
Support of real-time clocks
Task synchronization (critical sections, semaphores, monitors, mutual exclusion)
In classical operating systems, synchronization and mutual exclusion is performed via
semaphores and monitors.
In real-time OS, special semaphores and a deep integration of them into scheduling is
necessary (for example priority inheritance protocols as described in a later chapter).
5 - 12
Task States
Minimal set of task states:
[State diagram: instantiate → ready; dispatch: ready → running; preemption: running → ready; wait: running → blocked; signal: blocked → ready; delete: running → terminated.]
Task states
Running:
A task enters this state when it starts executing on the processor. There is at
most one task with this state in the system.
Ready:
State of those tasks that are ready to execute but cannot be run because the
processor is assigned to another task, i.e. another task has the state “running”.
Blocked:
A task enters the blocked state when it executes a synchronization primitive to
wait for an event, e.g. a wait primitive on a semaphore or timer. In this case,
the task is inserted in a queue associated with this semaphore. The task at the
head is resumed when the semaphore is unlocked by an event.
5 - 14
Multiple Threads within a Process
5 - 15
Threads
A thread is the smallest sequence of programmed instructions that can be
managed independently by a scheduler; e.g., a thread is a basic unit of CPU
utilization.
Multiple threads can exist within the same process and share resources such
as memory, while different processes do not share these resources:
Typically shared by threads: memory.
Typically owned by threads: registers, stack.
5 - 17
Context Switch: Processes or Threads
[Figure: context switch between process or thread P0 and P1, mediated by the operating system; the state is saved to and restored from the process control block or thread control block.]
5 - 18
Embedded Operating Systems
Classes of Operating Systems
5 - 19
Class 1: Fast and Efficient Kernels
Fast and efficient kernels
For hard real-time systems, these kernels are questionable, because they are
designed to be fast, rather than to be predictable in every respect.
Examples include
FreeRTOS, QNX, eCOS, RT-LINUX, VxWORKS, LynxOS.
5 - 20
Class 2: Extensions to Standard OSs
Real-time extensions to standard OS:
Attempt to exploit existing and comfortable mainstream operating systems.
A real-time kernel runs all real-time tasks.
The standard OS is executed as one task.
5 - 22
Example: RT Linux
RT-tasks cannot use standard OS calls.
Commercially available from fsmlabs and
WindRiver (www.fsmlabs.com)
5 - 23
Class 3: Research Systems
Research systems try to avoid limitations of existing real-time and embedded
operating systems.
Examples include L4, seL4, NICTA, ERIKA, SHARK
5 - 25
Example: FreeRTOS (ES-Lab)
FreeRTOS (http://www.freertos.org/) is a typical embedded operating system. It is
available for many hardware platforms, open source and widely used in industry. It
is used in the ES-Lab.
5 - 26
Example: FreeRTOS (ES-Lab)
Typical directory structure (excerpts):
5 - 28
Example FreeRTOS – Task Management
Tasks are implemented as threads.
The functionality of a thread is implemented in form of a function:
Prototype:
Task functions are not allowed to return! They can be “killed” by a specific call to a
FreeRTOS function, but usually run forever in an infinite loop.
Task functions can instantiate other tasks. Each created task is a separate
execution instance, with its own stack.
Example:
void vTask1( void *pvParameters ) {
    volatile uint32_t ul;   /* volatile to ensure ul is not optimized away */
    for( ;; ) {
        ...                 /* do something repeatedly */
        for( ul = 0; ul < 10000; ul++ ) { }   /* delay by busy loop */
    }
}
5 - 29
Example FreeRTOS – Task Management
Thread instantiation via xTaskCreate(): one parameter is a pointer to the function
that implements the task.
Priorities are changed via vTaskPrioritySet(): parameters are the handle of the task
whose priority is being modified and the new priority (0 is the lowest priority).
A task can delete itself or any other task. Deleted tasks no longer exist and cannot
enter the "running" state again.
vTaskDelete() takes the handle of the task to be deleted; if it is NULL, the caller is deleted.
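A minimal instantiation sketch using the standard FreeRTOS API (stack depth and priority values are arbitrary; vTask1 is the task function from above):

#include "FreeRTOS.h"
#include "task.h"

void vTask1( void *pvParameters );    /* task function as defined above */

int main( void ) {
    TaskHandle_t xHandle = NULL;
    /* function, name, stack depth in words, parameter, priority, returned handle */
    xTaskCreate( vTask1, "Task 1", 1000, NULL, 1, &xHandle );
    vTaskStartScheduler();            /* hand control to the FreeRTOS scheduler */
    for( ;; );                        /* only reached if the scheduler fails to start */
    /* elsewhere: vTaskDelete( xHandle ) deletes the task,
       vTaskDelete( NULL ) deletes the caller */
}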
5 - 31
Embedded Operating Systems
FreeRTOS Timers
5 - 32
Example FreeRTOS – Timers
The operating system also provides interfaces to timers of the processor.
As an example, we use the FreeRTOS timer interface to replace the busy loop by
a delay. In this case, the task is put into the “blocked” state instead of
continuously running.
5 - 33
Example FreeRTOS – Timers
Problem: The task does not execute strictly periodically:
[Timeline: the task is moved to the run state, executes "something", then waits 250 ms; only when it is in the ready state again does the next execution start, so the period drifts.]
The parameters to vTaskDelayUntil() specify the exact tick count value at which
the calling task should be moved from the “blocked” state into the “ready” state.
Therefore, the task is put into the “ready” state periodically.
void vTask1( void *pvParameters ) {
    /* xLastWakeTime needs to be initialized with the current tick count;
       this is the only time the variable is written to explicitly; afterwards
       it is updated automatically within vTaskDelayUntil() */
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for( ;; ) {
        ...   /* do something repeatedly */
        /* first parameter: automatically updated when the task is unblocked;
           second parameter: time to the next unblocking */
        vTaskDelayUntil( &xLastWakeTime, pdMS_TO_TICKS( 250 ) );
    }
}
5 - 34
Embedded Operating Systems
FreeRTOS Task States
5 - 35
Example FreeRTOS – Task States
What are the task states in FreeRTOS and the corresponding transitions?
[State diagram; one state is marked "not much used".]
A task that is waiting for an event is said to be
in the “Blocked” state, which is a sub-state of
the “Not Running” state.
Tasks can enter the “Blocked” state to wait for
two different types of event:
Temporal (time-related) events—the event
being either a delay period expiring, or an
absolute time being reached.
Synchronization events—where the events
originate from another task or interrupt. For
example, queues, semaphores, and mutexes, can
be used to create synchronization events.
5 - 36
Example FreeRTOS – Task States
Example 1: Two threads with equal priority.
void vTask1( void *pvParameters ) {
    volatile uint32_t ul;
    for( ;; ) {
        ...   /* do something repeatedly */
        for( ul = 0; ul < 10000; ul++ ) { }
    }
}

void vTask2( void *pvParameters ) {
    volatile uint32_t u2;
    for( ;; ) {
        ...   /* do something repeatedly */
        for( u2 = 0; u2 < 10000; u2++ ) { }
    }
}
5 - 37
Example FreeRTOS – Task States
Example 2: Two threads with delay timer.
void vTask1( void *pvParameters ) {
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for( ;; ) {
        ...   /* do something repeatedly */
        vTaskDelayUntil( &xLastWakeTime, pdMS_TO_TICKS( 250 ) );
    }
}

int main( void ) {
    xTaskCreate( vTask1, "Task 1", 1000, NULL, 1, NULL );
    xTaskCreate( vTask2, "Task 2", 1000, NULL, 2, NULL );
    vTaskStartScheduler();
    for( ;; );
}
5 - 39
Example FreeRTOS – Interrupts
How are tasks (threads) and hardware interrupts scheduled jointly?
Although written in software, an interrupt service routine (ISR) is a hardware
feature because the hardware controls which interrupt service routine will run,
and when it will run.
Tasks will only run when there are no ISRs running, so the lowest priority interrupt
will interrupt the highest priority task, and there is no way for a task to preempt
an ISR. In other words, ISRs always have a higher priority than any task.
Usual pattern:
ISRs are usually very short. They find out the reason for the interrupt, clear the
interrupt flag and determine what to do in order to handle the interrupt.
Then, they unblock a regular task (thread) that performs the necessary processing
related to the interrupt.
For blocking and unblocking, usually semaphores are used.
5 - 40
Example FreeRTOS – Interrupts
Blocking and unblocking is typically implemented via semaphores.
5 - 41
Example FreeRTOS – Interrupts
5 - 42
Embedded Systems
6. Aperiodic and Periodic Scheduling
© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …
6-2
Basic Terms and Models
6-3
Basic Terms
Real-time systems
Hard: A real-time task is said to be hard, if missing its deadline may cause
catastrophic consequences on the environment under control. Examples are
sensory data acquisition, detection of critical conditions, actuator servoing.
Soft: A real-time task is called soft, if meeting its deadline is desirable for
performance reasons, but missing its deadline does not cause serious damage to
the environment and does not jeopardize correct system behavior. Examples are
command interpreter of the user interface, displaying messages on the screen.
6-4
Schedule
Given a set of tasks $J = \{J_1, J_2, \dots\}$:
A schedule is an assignment of tasks to the processor, such that each task is
executed until completion.
A schedule can be defined as an integer step function $\sigma : \mathbb{R} \to \mathbb{N}$,
where $\sigma(t)$ denotes the task which is executed at time $t$. If
$\sigma(t) = 0$, then the processor is called idle.
If $\sigma(t)$ changes its value at some time, then the processor performs a context
switch.
Each interval in which $\sigma(t)$ is constant is called a time slice.
A preemptive schedule is a schedule in which the running task can be arbitrarily
suspended at any time, to assign the CPU to another task according to a
predefined scheduling policy.
6-5
Schedule and Timing
A schedule is said to be feasible, if all tasks can be completed according to a set
of specified constraints.
A set of tasks is said to be schedulable, if there exists at least one algorithm that
can produce a feasible schedule.
Arrival time ai or release time ri is the time at which a task becomes ready for
execution.
Computation time Ci is the time necessary for the processor to execute the
task without interruption.
Deadline di is the time by which a task should be completed.
Start time si is the time at which a task starts its execution.
Finishing time fi is the time at which a task finishes its execution.
6-6
Schedule and Timing
6-7
Schedule and Timing
Periodic task τi: an infinite sequence of identical activities, called instances or jobs,
that are regularly activated at a constant rate with period Ti. The activation
time of the first instance is called phase Φi.
[Figure: instance 1, instance 2, …, with the relative deadline of each instance marked.]
6-8
Example for Real-Time Model
[Timeline: task J1 and task J2 over t = 0 … 25, with release times r1, r2 and deadlines d1, d2 marked.]
Computation times: C1 = 9, C2 = 12
Start times: s1 = 0, s2 = 6
Finishing times: f1 = 18, f2 = 28
Lateness: L1 = -4, L2 = 1
Tardiness: E1 = 0, E2 = 1
Laxity: X1 = 13, X2 = 11
6-9
Precedence Constraints
Precedence relations between tasks can be described through an acyclic directed
graph G, where tasks are represented by nodes and precedence relations by
arrows. G induces a partial order on the task set.
[Figure: example precedence graph including tasks J4 and J5.]
6 - 10
Precedence Constraints
Example for concurrent activation:
– image acquisition: acq1, acq2
– low-level image processing: edge1, edge2
– feature/contour extraction: shape
– pixel disparities: disp
– object size: H
– object recognition: rec
6 - 11
Classification of Scheduling Algorithms
With preemptive algorithms, the running task can be interrupted at any time to
assign the processor to another active task, according to a predefined
scheduling policy.
With a non-preemptive algorithm, a task, once started, is executed by the
processor until completion.
Static algorithms are those in which scheduling decisions are based on fixed
parameters, assigned to tasks before their activation.
Dynamic algorithms are those in which scheduling decisions are based on
dynamic parameters that may change during system execution.
6 - 12
Classification of Scheduling Algorithms
An algorithm is said to be optimal if it minimizes some given cost function defined
over the task set.
An algorithm is said to be heuristic if it tends toward the optimal schedule but
does not guarantee finding it.
Acceptance Test: The runtime system decides whenever a task is added to the
system, whether it can schedule the whole task set without deadline violations.
6 - 13
Metrics to Compare Schedules
Average response time: $\bar{t}_r = \frac{1}{n}\sum_{i=1}^{n}(f_i - r_i)$

Total completion time: $t_c = \max_i f_i - \min_i r_i$

Weighted sum of response times: $t_w = \frac{\sum_{i=1}^{n} w_i (f_i - r_i)}{\sum_{i=1}^{n} w_i}$

Maximum lateness: $L_{max} = \max_i (f_i - d_i)$

Number of late tasks: $N_{late} = \sum_{i=1}^{n} miss(f_i)$, where $miss(f_i) = 0$ if $f_i \le d_i$, and $1$ otherwise.
6 - 14
Metrics Example
[Timeline: task J1 and task J2 with release times r1, r2 and deadlines d1, d2, as in the earlier example.]
6 - 15
Metrics and Scheduling Example
In schedule (a), the maximum lateness is minimized, but all tasks miss their deadlines.
In schedule (b), the maximal lateness is larger, but only one task misses its deadline.
6 - 16
Real-Time Scheduling of Aperiodic Tasks
6 - 17
Overview Aperiodic Task Scheduling
Scheduling of aperiodic tasks with real-time constraints:
Table with some known algorithms:
6 - 18
Earliest Deadline Due (EDD)
Jackson’s rule: Given a set of n tasks. Processing in order of non-decreasing
deadlines is optimal with respect to minimizing the maximum lateness.
6 - 19
Earliest Deadline Due (EDD)
Example 1:
6 - 20
Earliest Deadline Due (EDD)
Jackson’s rule: Given a set of n tasks. Processing in order of non-decreasing
deadlines is optimal with respect to minimizing the maximum lateness.
Proof concept:
6 - 21
Earliest Deadline Due (EDD)
Example 2:
6 - 22
Earliest Deadline First (EDF)
Horn’s rule: Given a set of n independent tasks with arbitrary arrival times, any
algorithm that at any instant executes a task with the earliest absolute deadline
among the ready tasks is optimal with respect to minimizing the maximum
lateness.
6 - 23
Earliest Deadline First (EDF)
Example:
6 - 24
Earliest Deadline First (EDF)
Horn’s rule: Given a set of n independent tasks with arbitrary arrival times, any
algorithm that at any instant executes the task with the earliest absolute deadline
among the ready tasks is optimal with respect to minimizing the maximum
lateness.
Concept of proof:
For each time interval [t, t+1) it is verified whether the actual running task is the one with the earliest absolute deadline. If this is not the case, the task with the earliest absolute deadline is executed in this interval instead. This operation cannot increase the maximum lateness.
6 - 25
Earliest Deadline First (EDF)
[Figure: interchange argument: the time slice of the running task is exchanged with the slice of the task having the earliest absolute deadline; the situation after the interchange is no worse]
6 - 26
Earliest Deadline First (EDF)
Acceptance test (c_k(t): remaining worst-case execution time of task k):
worst case finishing time of task i: f_i = t + Σ_{k=1..i} c_k(t)
EDF guarantee condition: ∀ i = 1, …, n : t + Σ_{k=1..i} c_k(t) ≤ d_i
Algorithm: EDF_guarantee (J, J_new)
{ J' = J ∪ {J_new};              /* ordered by non-decreasing deadline */
  t = current_time();
  f_0 = t;
  for (each J_i ∈ J') {
    f_i = f_{i-1} + c_i(t);      /* c_i(t): remaining WCET of J_i */
    if (f_i > d_i) return(INFEASIBLE);
  }
  return(FEASIBLE);
}
6 - 27
Earliest Deadline First (EDF*)
The problem of scheduling a set of n tasks with precedence constraints
(concurrent activation) can be solved in polynomial time complexity if tasks are
preemptable.
The EDF* algorithm determines a feasible schedule for tasks with precedence constraints if one exists.
6 - 28
EDF*
6 - 29
EDF*
6 - 30
Earliest Deadline First (EDF*)
Modification of deadlines:
A task must finish its execution within its deadline.
A task must not finish its execution later than the maximum start time of its successor.
If task b depends on task a (J_a → J_b): d_a* = min( d_a , d_b* − C_b )
6 - 31
Earliest Deadline First (EDF*)
Modification of release times:
A task must not start execution earlier than its release time.
A task must not start execution earlier than the minimum finishing time of its predecessor.
If task b depends on task a (J_a → J_b): r_b* = max( r_b , r_a* + C_a )
6 - 32
Earliest Deadline First (EDF*)
Algorithm for modification of release times:
1. For any initial node of the precedence graph set r_i* = r_i.
2. Select a task j such that its release time has not been modified but the release times of all immediate predecessors i have been modified. If no such task exists, exit.
3. Set r_j* = max( r_j , max{ r_i* + C_i : J_i → J_j } ).
4. Return to step 2.
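A small C sketch of this algorithm; the adjacency-matrix representation and array names are illustrative assumptions:

#include <stdbool.h>

#define N 8   /* number of tasks (example size) */

/* pred[j][i] != 0 means J_i -> J_j (i is an immediate predecessor of j). */
void modify_release_times(int n, const int pred[N][N],
                          const double r[N], const double C[N],
                          double r_star[N])
{
    bool done[N] = { false };
    for (;;) {
        int progress = 0;
        for (int j = 0; j < n; j++) {
            if (done[j]) continue;
            bool ready = true;                       /* all preds modified? */
            for (int i = 0; i < n; i++)
                if (pred[j][i] && !done[i]) { ready = false; break; }
            if (!ready) continue;
            r_star[j] = r[j];                        /* initial nodes: r_j* = r_j */
            for (int i = 0; i < n; i++)              /* max(r_j, r_i* + C_i) */
                if (pred[j][i] && r_star[i] + C[i] > r_star[j])
                    r_star[j] = r_star[i] + C[i];
            done[j] = true; progress = 1;
        }
        if (!progress) break;                        /* step 2: exit */
    }
}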
6 - 33
Earliest Deadline First (EDF*)
Proof concept:
Show that if there exists a feasible schedule for the modified task set under EDF
then the original task set is also schedulable. To this end, show that the original
task set meets the timing constraints as well. This can be done by using r_i* ≥ r_i,
d_i* ≤ d_i; we only made the constraints stricter.
Show that if there exists a schedule for the original task set, then also for the
modified one. We can show the following: If there exists no schedule for the
modified task set, then there is none for the original task set. This can be done by
showing that no feasible schedule was excluded by changing the deadlines and
release times.
In addition, show that the precedence relations in the original task set are not
violated. In particular, show that
a task cannot start before its predecessor and
a task cannot preempt its predecessor.
6 - 34
Real-Time Scheduling of Periodic Tasks
6 - 35
Overview
Table of some known preemptive scheduling algorithms for periodic tasks:
6 - 36
Model of Periodic Tasks
Examples: sensory data acquisition, low-level actuation, control loops, action
planning and system monitoring.
When an application consists of several concurrent periodic tasks with individual
timing constraints, the OS has to guarantee that each periodic instance is
regularly activated at its proper rate and is completed within its deadline.
Definitions:
: denotes a set of periodic tasks
i : denotes a periodic task
i, j : denotes the jth instance of task i
ri , j , si , j , f i , j , di , j : denote the release time, start time, finishing time, absolute
deadline of the jth instance of task i
i : denotes the phase of task i (release time of its first instance)
Di : denotes the relative deadline of task i
Ti : denotes the period of task i
6 - 37
Model of Periodic Tasks
The following hypotheses are assumed on the tasks:
The instances of a periodic task are regularly activated at a constant rate. The
interval Ti between two consecutive activations is called period. The release times
satisfy
r_{i,j} = Φ_i + (j − 1) · T_i
Often, the relative deadline equals the period, D_i = T_i (implicit deadline), and therefore
d_{i,j} = Φ_i + j · T_i
6 - 38
Model of Periodic Tasks
The following hypotheses are assumed on the tasks (continued):
All periodic tasks are independent; that is, there are no precedence relations and
no resource constraints.
No task can suspend itself, for example on I/O operations.
All tasks are released as soon as they arrive.
All overheads in the OS kernel are assumed to be zero.
Example:
[Timeline of a periodic task τ_i: period T_i, phase Φ_i, relative deadline D_i; release times r_{i,1}, r_{i,2}, start time s_{i,3} and finishing time f_{i,3} of instance τ_{i,3}, computation time C_i]
6 - 39
Rate Monotonic Scheduling (RM)
Assumptions:
Task priorities are assigned to tasks before execution and do not change over time
(static priority assignment).
RM is intrinsically preemptive: the currently executing job is preempted by a job of
a task with higher priority.
Deadlines equal the periods: D_i = T_i.
6 - 40
Periodic Tasks
Example: 2 tasks, deadlines = periods, utilization = 97%
6 - 41
Rate Monotonic Scheduling (RM)
Optimality: RM is optimal among all fixed-priority assignments in the sense that
no other fixed-priority algorithm can schedule a task set that cannot be
scheduled by RM.
The proof is done by considering several cases that may occur, but the main
ideas are as follows:
A critical instant for any task occurs whenever the task is released
simultaneously with all higher priority tasks. The tasks' schedulability can easily
be checked at their critical instants. If all tasks are feasible at their critical
instants, then the task set is schedulable in any other condition.
Show that, given two periodic tasks, if the schedule is feasible by an arbitrary
priority assignment, then it is also feasible by RM.
Extend the result to a set of n periodic tasks.
6 - 42
Proof of Critical Instance
Definition: A critical instant of a task is the time at which the release of a job
will produce the largest response time.
Lemma: For any task, the critical instant occurs if a job is simultaneously
released with all higher priority jobs.
[Figure: a job of τ2 released together with jobs of τ1; response time C2 + 2C1]
6 - 43
Proof of Critical Instance
Delay may increase if τ1 starts earlier:
[Figure: response time grows to C2 + 3C1]
Repeating the argument for all higher priority tasks of some task τ2: the worst case response time of a job occurs when it is released simultaneously with all higher-priority jobs.
6 - 44
Proof of RM Optimality (2 Tasks)
We have two tasks τ1, τ2 with periods T1 < T2.
Define F = ⌊T2/T1⌋: the number of periods of τ1 fully contained in T2.
6 - 45
Proof of RM Optimality (2 Tasks)
Case B: Assume RM is used, i.e., prio(τ1) is highest:
[Figure: within [0, T2], τ1 executes F full periods of length T1 plus a final interval of length T2 − F·T1]
The schedule is feasible if
F·C1 + C2 + min(T2 − F·T1, C1) ≤ T2 and C1 ≤ T1   (B)
Given tasks τ1 and τ2 with T1 < T2, if the schedule is feasible by an arbitrary fixed priority assignment, then it is also feasible by RM.
6 - 46
Admittance Test
6 - 51
Rate Monotonic Scheduling (RM)
Schedulability analysis: A set of n periodic tasks is schedulable with RM if
Σ_{i=1..n} C_i / T_i ≤ n · (2^{1/n} − 1)
The term U = Σ_{i=1..n} C_i / T_i denotes the processor utilization.
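A direct C sketch of this sufficient test (the function name is an assumption):

#include <math.h>
#include <stdbool.h>

/* Sufficient RM schedulability test (Liu/Layland utilization bound).
   Returns true if the bound holds; false means "no statement", since
   the condition is sufficient but not necessary. */
bool rm_schedulable(int n, const double C[], const double T[])
{
    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += C[i] / T[i];                        /* processor utilization */
    return U <= n * (pow(2.0, 1.0 / n) - 1.0);
}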
6 - 52
Proof of Utilization Bound (2 Tasks)
We have two tasks τ1, τ2 with periods T1 < T2.
Define F = ⌊T2/T1⌋: number of periods of τ1 fully contained in T2.
Proof Concept: Compute upper bound on utilization U such that the task set is
still schedulable:
assign priorities according to RM;
compute the upper bound U_up by increasing the computation time C2 to just
meet the deadline of τ2; we will determine this limit of C2 using the results
of the RM optimality proof;
minimize upper bound with respect to other task parameters in order to
find the utilization below which the system is definitely schedulable.
6 - 53
Proof of Utilization Bound (2 Tasks)
As before:
[Figure: RM schedule of τ1 and τ2 within [0, T2]]
Schedulable if F·C1 + C2 + min(T2 − F·T1, C1) ≤ T2 and C1 ≤ T1
Utilization: U = C1/T1 + C2/T2
6 - 54
Proof of Utilization Bound (2 Tasks)
6 - 55
Proof of Utilization Bound (2 Tasks)
Minimize the utilization bound w.r.t. C1:
If C1 < T2 − F·T1, then U decreases with increasing C1.
If T2 − F·T1 < C1, then U decreases with decreasing C1.
Therefore, the minimum U is obtained with C1 = T2 − F·T1.
We now need to minimize w.r.t. G = T2/T1, where F = ⌊T2/T1⌋ and T1 < T2. As F is an integer, we first suppose that it is independent of G = T2/T1. Then we obtain
6 - 56
Proof of Utilization Bound (2 Tasks)
Minimizing U with respect to G yields
It can easily be checked that all other integer values for F lead to a larger upper bound on the utilization.
6 - 57
Deadline Monotonic Scheduling (DM)
Assumptions are as in rate monotonic scheduling, but deadlines may be smaller
than the period, i.e.
C_i ≤ D_i ≤ T_i
Algorithm: Each task is assigned a priority. Tasks with smaller relative deadlines will
have higher priorities. Jobs with higher priority interrupt jobs with lower priority.
Schedulability test (sufficient): Σ_{i=1..n} C_i / D_i ≤ n · (2^{1/n} − 1)
This condition is sufficient but not necessary (in general).
6 - 58
Deadline Monotonic Scheduling (DM) - Example
U = Σ_{i=1..n} C_i / T_i = 0.874
Σ_{i=1..n} C_i / D_i = 1.08 > n · (2^{1/n} − 1) = 0.757
[Gantt chart: DM schedule of tasks τ1–τ4; all deadlines are met although the sufficient test fails]
6 - 59
Deadline Monotonic Scheduling (DM)
There is also a necessary and sufficient schedulability test which is computationally
more involved. It is based on the following observations:
The worst-case processor demand occurs when all tasks are released
simultaneously; that is, at their critical instants.
For each task i, the sum of its processing time and the interference imposed
by higher priority tasks must be less than or equal to Di .
A measure of the worst case interference for task i can be computed as the sum of the processing times of all higher priority tasks released before some time t, where tasks are ordered such that m < n ⇔ D_m ≤ D_n:
I_i = Σ_{j=1..i−1} ⌈ t / T_j ⌉ · C_j
6 - 60
Deadline Monotonic Scheduling (DM)
The longest response time Ri of a job of a periodic task i is computed, at the
critical instant, as the sum of its computation time and the interference due to
preemption by higher priority tasks:
R_i = C_i + I_i
Hence, the schedulability test needs to compute the smallest Ri that satisfies
R_i = C_i + Σ_{j=1..i−1} ⌈ R_i / T_j ⌉ · C_j
6 - 61
Deadline Monotonic Scheduling (DM)
The longest response times Ri of the periodic tasks i can be computed iteratively
by the following algorithm:
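The algorithm itself is shown on the slide as a fixed-point iteration; a C sketch of the standard computation (assuming tasks are sorted by increasing relative deadline, so tasks 0..i−1 have higher priority) could look like this:

#include <math.h>

/* Iterate R = C_i + sum_j ceil(R/T_j)*C_j until a fixed point is reached.
   Terminates if the higher-priority tasks do not saturate the processor;
   task i is schedulable if the returned R is <= D_i. */
double response_time(int i, const double C[], const double T[])
{
    double R = C[i], R_prev;
    do {
        R_prev = R;
        R = C[i];
        for (int j = 0; j < i; j++)
            R += ceil(R_prev / T[j]) * C[j];   /* interference I_i */
    } while (R != R_prev);                     /* fixed point reached */
    return R;
}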
6 - 62
DM Example
Example:
Task 1: C1 1; T1 4; D1 3
Task 2: C2 1; T2 5; D2 4
Task 3: C3 2; T3 6; D3 5
Task 4: C4 1; T4 11; D4 10
Algorithm for the schedulability test for task 4:
Step 0: R4 1
Step 1: R4 5
Step 2: R4 6
Step 3: R4 7
Step 4: R4 9
Step 5: R4 10
6 - 63
DM Example
U = Σ_{i=1..n} C_i / T_i = 0.874
Σ_{i=1..n} C_i / D_i = 1.08 > n · (2^{1/n} − 1) = 0.757
[Gantt chart: DM schedule of tasks τ1–τ4; all deadlines are met although the sufficient test fails]
6 - 64
EDF Scheduling (earliest deadline first)
Assumptions:
dynamic priority assignment
intrinsically preemptive
Optimality: No other algorithm can schedule a set of periodic tasks that cannot be scheduled by EDF.
The proof is simple and follows that of the aperiodic case.
6 - 65
Periodic Tasks
Example: 2 tasks, deadlines = periods, utilization = 97%
6 - 66
EDF Scheduling
A necessary and sufficient schedulability test for D_i = T_i:
A set of periodic tasks is schedulable with EDF if and only if U = Σ_{i=1..n} C_i / T_i ≤ 1.
The term U denotes the average processor utilization.
6 - 67
EDF Scheduling
If the utilization satisfies U > 1, then there is no valid schedule: the total demand of computation time in the interval T = T1 · T2 · … · Tn is
Σ_{i=1..n} (T / T_i) · C_i = U · T > T
We will prove the test by contradiction: assume that a deadline is missed at some time t2. Then we will show that the utilization was larger than 1.
6 - 68
6 - 69
EDF Scheduling
If the deadline was missed at t2 then define t1 as a time before t2 such that (a) the processor is
continuously busy in [t1, t2 ] and (b) the processor only executes tasks that have their arrival
time AND their deadline in [t1, t2 ].
Why does such a time t1 exist? We find such a t1 by starting at t2 and going backwards in time,
always ensuring that the processor only executed tasks that have their deadline before or at t2 :
Because of EDF, the processor will be busy shortly before t2 and it executes on the task that has
deadline at t2.
Suppose that we reach a time such that, shortly before it, the processor either works on a task with deadline after t2 or is idle; then we have found t1: we know that there is no execution of a task with deadline after t2 within [t1, t2].
In principle, it could still be that a task that arrived before t1 is executing in [t1, t2]:
If the processor is idle before t1, then this is clearly not possible due to EDF (the processor is not idle if there is a ready task).
If the processor is not idle before t1, this is not possible either: due to EDF, the processor always works on the task with the closest deadline, and therefore, once it starts a task with deadline after t2, all tasks with deadlines before t2 are finished.
6 - 70
6 - 71
EDF Scheduling
Within the interval [t1, t2] the total computation time demanded by the periodic tasks is bounded by
C_p(t1, t2) = Σ_{i=1..n} ⌊ (t2 − t1) / T_i ⌋ · C_i ≤ Σ_{i=1..n} ((t2 − t1) / T_i) · C_i = (t2 − t1) · U
Since the deadline at t2 is missed, we must have
t2 − t1 < C_p(t1, t2) ≤ (t2 − t1) · U ⇒ U > 1
6 - 72
Periodic Task Scheduling
Example: 2 tasks, deadlines = periods, utilization = 97%
6 - 73
Real-Time Scheduling of Mixed Task Sets
6 - 74
Problem of Mixed Task Sets
In many applications, there are aperiodic as well as periodic tasks.
Periodic tasks: time-driven, execute critical control activities with hard timing
constraints aimed at guaranteeing regular activation rates.
Aperiodic tasks: event-driven, may have hard, soft, non-real-time requirements
depending on the specific application.
Sporadic tasks: Offline guarantee of event-driven aperiodic tasks with critical
timing constraints can be done only by making proper assumptions on the
environment; that is by assuming a maximum arrival rate for each critical event.
Aperiodic tasks characterized by a minimum interarrival time are called
sporadic.
6 - 75
Background Scheduling
Background scheduling is a simple solution for RM and EDF:
Processing of aperiodic tasks in the background, i.e. execute if there are no
pending periodic requests.
Periodic tasks are not affected.
Response times of aperiodic tasks may be prohibitively long, and there is no possibility to assign a higher priority to them.
Example:
6 - 76
Background Scheduling
Example (rate monotonic periodic schedule):
6 - 77
Rate-Monotonic Polling Server
Idea: Introduce an artificial periodic task whose purpose is to service aperiodic
requests as soon as possible (therefore, “server”).
Function of polling server (PS)
At regular intervals equal to Ts , a PS task is instantiated. When it has the highest
current priority, it serves any pending aperiodic requests within the limit of its
capacity Cs .
If no aperiodic requests are pending, PS suspends itself until the beginning of the
next period and the time originally allocated for aperiodic service is not preserved
for aperiodic execution.
Its priority (period!) can be chosen to match the response time requirement for
the aperiodic tasks.
Disadvantage: If an aperiodic request arrives just after the server has suspended itself, it must wait until the beginning of the next polling period.
6 - 78
Rate-Monotonic Polling Server
Example:
Schedulability (the server is treated as an additional periodic task):
C_s / T_s + Σ_{i=1..n} C_i / T_i ≤ (n + 1) · (2^{1/(n+1)} − 1)
6 - 80
Rate-Monotonic Polling Server
Guarantee the response time of aperiodic requests:
Assumption: An aperiodic task is finished before a new aperiodic request
arrives.
Computation time C_a, deadline D_a
Sufficient schedulability test:
(1 + ⌈ C_a / C_s ⌉) · T_s ≤ D_a
If the server task has the highest priority, there is a necessary test as well.
6 - 81
EDF – Total Bandwidth Server
Total Bandwidth Server:
When the k-th aperiodic request arrives at time t = r_k, it receives a deadline
d_k = max( r_k , d_{k−1} ) + C_k / U_s
where C_k is the execution time of the request and U_s is the server utilization
factor (that is, its bandwidth). By definition, d_0 = 0.
Once a deadline is assigned, the request is inserted into the ready queue of
the system as any other periodic instance.
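A one-function C sketch of this rule (keeping d_{k−1} via a pointer is an implementation choice):

/* Total Bandwidth Server: assign a deadline to the k-th aperiodic request.
   r_k: arrival time, C_k: execution time, Us: server bandwidth,
   *d_prev: deadline of the previous request (initialize to 0 for d_0). */
double tbs_deadline(double r_k, double C_k, double Us, double *d_prev)
{
    double d_k = (r_k > *d_prev ? r_k : *d_prev) + C_k / Us;
    *d_prev = d_k;     /* remember d_{k-1} for the next request */
    return d_k;        /* insert the request into the EDF ready queue */
}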
6 - 82
6 - 83
EDF – Total Bandwidth Server
Example: U_p = 0.75, U_s = 0.25, U_p + U_s = 1
6 - 84
EDF – Total Bandwidth Server
Schedulability test:
Given a set of n periodic tasks with processor utilization Up and a total bandwidth
server with utilization Us, the whole set is schedulable by EDF if and only if
U_p + U_s ≤ 1
Proof:
In each interval of time [t1, t2], if C_ape is the total execution time demanded by aperiodic requests arrived at t1 or later and served with deadlines less than or equal to t2, then
C_ape ≤ (t2 − t1) · U_s
6 - 85
EDF – Total Bandwidth Server
If this has been proven, the proof of the schedulability test follows closely that of the
periodic case.
Proof of lemma:
C_ape = Σ_{k=k1..k2} C_k
      = U_s · Σ_{k=k1..k2} ( d_k − max(r_k, d_{k−1}) )
      ≤ U_s · ( d_{k2} − max(r_{k1}, d_{k1−1}) )
      ≤ U_s · (t2 − t1)
6 - 86
Embedded Systems
Lothar Thiele
Network Processor: Programmable Processor Optimized to Perform Packet Processing
[Figure: packet processor with input ports; real-time flows (voice processing) and best effort flows (BE) F2, F3, …, Fn feed a CPU scheduler]
Swiss Federal Institute of Technology, Computer Engineering and Networks Laboratory
6a - 5
CPU Scheduling
First schedule RT, then BE (background scheduling): overly pessimistic.
[Figure: best effort flows F2, F3, …, Fn served by WFQ; RT flows scheduled by deadline]
© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …
7-2
Resource Sharing
7-3
Resource Sharing
Examples of shared resources: data structures, variables, main memory area,
file, set of registers, I/O unit, … .
Many shared resources do not allow simultaneous accesses but require mutual
exclusion. These resources are called exclusive resources. In this case, no two
threads are allowed to operate on the resource at the same time.
There are several methods available to protect exclusive resources, for example
disabling interrupts and preemption, or
using concepts like semaphores and mutexes that put threads into the blocked state if necessary.
7-4
Protecting Exclusive Resources using Semaphores
Each exclusive resource R_i must be protected by a different semaphore S_i. Each critical section operating on a resource must begin with a wait(S_i) primitive and end with a signal(S_i) primitive.
All tasks blocked on the same resource are kept in a queue associated with the semaphore. When a running task executes a wait on a locked semaphore, it enters a blocked state, until another task executes a signal primitive that unlocks the semaphore.
7-5
Example FreeRTOS (ES-Lab)
To ensure that data consistency is maintained at all times, access to a resource that is shared between tasks, or between tasks and interrupts, must be managed using a ‘mutual exclusion’ technique.
Critical sections of this kind must be kept very short; otherwise they will adversely affect interrupt response times.
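One basic technique in FreeRTOS is a critical section that briefly disables interrupts; a minimal sketch (the shared counter is just an illustration):

#include "FreeRTOS.h"
#include "task.h"

static volatile uint32_t ulSharedCounter;   /* resource shared between tasks */

void vIncrementShared(void)
{
    taskENTER_CRITICAL();    /* disable interrupts/preemption */
    ulSharedCounter++;       /* keep this section very short */
    taskEXIT_CRITICAL();
}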
7-6
Example FreeRTOS (ES-Lab)
Another possibility is to use mutual exclusion: In FreeRTOS, a mutex is a special type of
semaphore that is used to control access to a resource that is shared between two or
more tasks. A semaphore that is used for mutual exclusion must always be returned:
For a task to access the resource legitimately, it must first successfully ‘take’
the token (be the token holder). When the token holder has finished with the
resource, it must ‘give’ the token back.
Only when the token has been returned can another task successfully take the
token, and then safely access the same shared resource.
7-7
Example FreeRTOS (ES-Lab)
7-8
Example FreeRTOS (ES-Lab)
Example: create a mutex semaphore and take it with an infinite timeout. The block time portMAX_DELAY is a defined constant for an infinite timeout; with a finite value, the function would return if the mutex was not available for the specified time.
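A minimal sketch of this take/give pattern with the standard FreeRTOS mutex API (the surrounding function structure is illustrative):

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t xMutex;

void vInit(void)
{
    xMutex = xSemaphoreCreateMutex();           /* create the mutex ('token') */
}

void vAccessSharedResource(void)
{
    /* portMAX_DELAY: block indefinitely until the token is available */
    if (xSemaphoreTake(xMutex, portMAX_DELAY) == pdTRUE) {
        /* ... access the shared resource ... */
        xSemaphoreGive(xMutex);                 /* always give the token back */
    }
}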
7 - 10
Priority Inversion (1)
Unavoidable blocking:
7 - 11
Priority Inversion (2)
Priority Inversion:
[But97, S.184]
7 - 12
Solutions to Priority Inversion
Disallow preemption during the execution of all critical sections. Simple approach,
but it creates unnecessary blocking as unrelated tasks may be blocked.
7 - 13
Resource Access Protocols
Basic idea: Modify the priority of those tasks that cause blocking. When a task Ji
blocks one or more higher priority tasks, it temporarily assumes a higher priority.
Specific Methods:
Priority Inheritance Protocol (PIP), for static priorities
Priority Ceiling Protocol (PCP), for static priorities
Stack Resource Policy (SRP),
for static and dynamic priorities
others …
7 - 14
Priority Inheritance Protocol (PIP)
Assumptions:
n tasks which cooperate through m shared resources; fixed priorities, all
critical sections on a resource begin with a wait(Si) and end with a
signal(Si) operation.
Basic idea:
When a task Ji blocks one or more higher priority tasks, it temporarily assumes
(inherits) the highest priority of the blocked tasks.
Terms:
We distinguish a fixed nominal priority Pi and an active priority pi larger or
equal to Pi. Jobs J1, …Jn are ordered with respect to nominal priority where J1
has highest priority. Jobs do not suspend themselves.
7 - 15
Priority Inheritance Protocol (PIP)
Algorithm:
Jobs are scheduled based on their active priorities. Jobs with the same priority are
executed in a FCFS discipline.
When a job Ji tries to enter a critical section and the resource is blocked by a lower
priority job, the job Ji is blocked. Otherwise it enters the critical section.
When a job Ji is blocked, it transmits its active priority to the job Jk that holds the
semaphore. Jk resumes and executes the rest of its critical section with a priority
pk=pi (it inherits the priority of the highest priority of the jobs blocked by it).
When Jk exits a critical section, it unlocks the semaphore and the highest priority
job blocked on that semaphore is awakened. If no other jobs are blocked by Jk,
then pk is set to Pk, otherwise it is set to the highest priority of the jobs blocked by
Jk.
Priority inheritance is transitive, i.e. if J1 is blocked by J2 and J2 is blocked by J3, then J3 inherits the priority of J1 via J2.
7 - 16
Priority Inheritance Protocol (PIP)
Example:
Direct Blocking: a higher-priority job tries to acquire a resource held by a lower-priority job.
Push-through Blocking: a medium-priority job is blocked by a lower-priority job that has inherited a higher priority from a job it directly blocks.
[But97, S. 189]
7 - 18
Priority Inheritance Protocol (PIP)
Example of transitive priority inheritance:
J1 blocked by J2, J2 blocked by J3.
J3 inherits priority from J1 via J2.
[But97, S. 190]
7 - 19
Priority Inheritance Protocol (PIP)
Still a Problem: Deadlock
…. but there are other protocols like the Priority Ceiling Protocol …
[But97, S. 200]
7 - 20
The MARS Pathfinder Problem (1)
“But a few days into the mission, not long after Pathfinder started gathering
meteorological data, the spacecraft began experiencing total system resets, each
resulting in losses of data.
7 - 21
The MARS Pathfinder Problem (2)
“VxWorks provides preemptive priority scheduling of threads. Tasks on the
Pathfinder spacecraft were executed as threads with priorities that were assigned
in the usual manner reflecting the relative urgency of these tasks.”
“A bus management task ran frequently with high priority to move certain kinds of data in and out of the information bus. Access to the bus was synchronized with mutual exclusion locks (mutexes).”
7 - 22
The MARS Pathfinder Problem (3)
The meteorological data gathering task ran as an infrequent, low priority thread.
When publishing its data, it would acquire a mutex, do writes to the bus, and release
the mutex.
The spacecraft also contained a communications task that ran with medium priority.
7 - 23
The MARS Pathfinder Problem (4)
“Most of the time this combination worked fine.
However, very infrequently it was possible for an interrupt to occur that caused the
(medium priority) communications task to be scheduled during the short interval
while the (high priority) information bus thread was blocked waiting for the (low
priority) meteorological data thread. In this case, the long-running communications
task, having higher priority than the meteorological task, would prevent it from
running, consequently preventing the blocked information bus task from running.
After some time had passed, a watchdog timer would go off, notice that the data
bus task had not been executed for some time, conclude that something had gone
drastically wrong, and initiate a total system reset. This scenario is a classic case of
priority inversion.”
7 - 24
Priority Inversion on Mars
Priority inheritance also solved the Mars Pathfinder problem: the VxWorks
operating system used in the pathfinder implements a flag for the calls to mutex
primitives. This flag allows priority inheritance to be set to “on”. When the
software was shipped, it was set to “off”.
7 - 25
Timing Anomalies
7 - 26
Timing Anomaly
Suppose a real-time system works correctly with a given processor architecture.
Now you replace the processor with a faster one.
Are real-time constraints still satisfied?
Unfortunately, this is not true in general. Monotonicity does not hold in general,
i.e., making a part of the system operate faster does not lead to a faster system
execution. In other words, many software and systems architectures are fragile.
There are usually many timing anomalies in a system, starting from the
microarchitecture (caches, pipelines, speculation) via single processor scheduling
to multiprocessor scheduling.
7 - 27
Single Processor with Critical Sections
Example: Replacing the
processor with one
that is twice as fast
leads to a deadline
miss.
7 - 28
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture
7 - 29
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture
slower on a
4-processor
architecture!
7 - 30
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture
slower if all
computation
times are
reduced by 1!
7 - 31
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture
slower if
some
precedences
are removed!
7 - 32
Communication and Synchronization
7 - 33
Communication Between Tasks
Problem: the use of shared memory for implementing communication between
tasks may cause priority inversion and blocking.
7 - 34
Communication Mechanisms
Synchronous communication:
Whenever two tasks want to communicate they must be synchronized for a
message transfer to take place (rendez-vous).
They have to wait for each other, i.e. both must be ready for the data exchange at the same time.
Problem:
In case of dynamic real-time systems, estimating the maximum blocking time
for a process rendez-vous is difficult.
Communication always needs synchronization. Therefore, the timing of the
communication partners is closely linked.
7 - 35
Communication Mechanisms
Asynchronous communication:
Tasks do not necessarily have to wait for each other.
The sender just deposits its message into a channel and continues its execution; similarly, the receiver can directly access the message if at least one message has been deposited into the channel.
More suited for real-time systems than synchronous communication.
Mailbox: Shared memory buffer, FIFO-queue, basic operations are send and
receive, usually has a fixed capacity.
Problem: Blocking behavior if the channel is full or empty; alternative approach is
provided by cyclical asynchronous buffers or double buffering.
sender receiver
mailbox
7 - 36
Example: FreeRTOS (ES-Lab)
7 - 37
Example: FreeRTOS (ES-Lab)
Creating a queue (xQueueCreate): returns a handle to the created queue; parameters are the maximum number of items that the queue being created can hold at any one time and the size in bytes of each data item.
Sending to a queue (xQueueSend): takes a pointer to the data to be copied into the queue and the maximum amount of time the task should remain in the Blocked state to wait for space to become available on the queue; returns pdPASS if the item was successfully added to the queue.
Example: FreeRTOS (ES-Lab)
Receiving an item from a queue (xQueueReceive): takes a pointer to the memory into which the received data will be copied and the maximum amount of time the task should remain in the Blocked state to wait for data to become available on the queue; returns pdPASS if data was successfully read from the queue.
Example:
Two sending tasks with equal priority 1 and one receiving task with priority 2.
FreeRTOS schedules tasks with equal priority in a round-robin manner: A blocked
or preempted task is put to the end of the ready queue for its priority. The same
holds for the currently running task at the expiration of the time slice.
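A minimal sketch of this setup (stack sizes, task names, and the queue length of 5 are illustrative assumptions):

#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

static QueueHandle_t xQueue;

static void vSenderTask(void *pvParameters)   /* two instances, priority 1 */
{
    int32_t lValue = (int32_t)(intptr_t)pvParameters;
    for (;;)
        xQueueSend(xQueue, &lValue, portMAX_DELAY);   /* blocks if queue full */
}

static void vReceiverTask(void *pvParameters)  /* priority 2 */
{
    int32_t lRx;
    for (;;)
        if (xQueueReceive(xQueue, &lRx, portMAX_DELAY) == pdPASS) {
            /* ... process lRx ... */
        }
}

/* setup, e.g. in main():
   xQueue = xQueueCreate(5, sizeof(int32_t));
   xTaskCreate(vSenderTask,   "TX1", 1000, (void *)100, 1, NULL);
   xTaskCreate(vSenderTask,   "TX2", 1000, (void *)200, 1, NULL);
   xTaskCreate(vReceiverTask, "RX",  1000, NULL,        2, NULL);
   vTaskStartScheduler();                                        */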
7 - 39
Example: FreeRTOS (ES-Lab)
Example cont.:
sender 1
queue receiver
sender 2
7 - 40
Communication Mechanisms
Cyclical Asynchronous Buffers (CAB):
Non-blocking communication between tasks.
A reader gets the most recent message put into the CAB. A message is not
consumed (that is, extracted) by a receiving process but is maintained until
overwritten by a new message.
As a consequence, once the first message has been put in a CAB, a task can never
be blocked during a receive operation. Similarly, since a new message overwrites
the old one, a sender can never be blocked.
Several readers can simultaneously read a single message from the CAB.
7 - 41
Embedded Systems
8. Hardware Components
© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …
8-2
Do you Remember ?
8-3
8-4
High-Level Physical View
8-5
High-Level Physical View
8-6
Implementation Alternatives
General-purpose processors
Programmable hardware
• FPGA (field-programmable gate arrays)
8-7
Energy Efficiency
8-8
Topics
General Purpose Processors
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
System-on-Chip
8-9
General-Purpose Processors
High performance
Highly optimized circuits and technology
Use of parallelism
superscalar: dynamic scheduling of instructions
super-pipelining: instruction pipelining, branch prediction, speculation
complex memory hierarchy
Not suited for real-time applications
Execution times are highly unpredictable because of intensive resource sharing
and dynamic decisions
Properties
Good average performance for large application mix
High power consumption
8 - 10
General-Purpose Processors
Multicore Processors
Potential of providing higher execution performance by exploiting parallelism
8 - 11
Multicore Examples
48 cores
4 cores
8 - 12
Multicore Examples
8 - 13
Implementation Alternatives
General-purpose processors
Programmable hardware
• FPGA (field-programmable gate arrays)
8 - 14
Topics
General Purpose Processors
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
Heterogeneous Architectures
8 - 15
System Specialization
The main difference between general-purpose, highest-volume microprocessors and embedded systems is specialization.
8 - 16
Embedded Multicore Example
Recent development:
Specialize multicore processors towards real-time processing and low power
consumption
Target domains:
8 - 17
Example: Code-size Efficiency
RISC (Reduced Instruction Set Computer) machines are designed for run-time efficiency, not for code-size efficiency.
Compression techniques, key idea: [Figure: compressed instructions in memory pass through a (de)compressor before execution]
8 - 18
Example: Multimedia-Instructions
• Multimedia instructions exploit that many registers, adders etc. are
quite wide (32/64 bit), whereas most multimedia data types are
narrow (e.g. 8 bit per color, 16 bit per audio sample per channel).
• Idea: Several values can be stored per register and added in parallel.
+
4 additions per instruction; carry
disabled at word boundaries.
8 - 19
Example: Heterogeneous Processor Registers
Example (ADSP 210x):
[Figure: heterogeneous register files; program memory P and data memory D feed input registers AX, AY (ALU path) and MX, MY (multiplier path); feedback registers AF, MF; result registers AR, MR; ALU (+, −, …) and multiplier (*); an address generation unit (AGU) with address registers A0, A1, A2, …]
8 - 20
Example: Multiple Memory Banks
[Figure: the same ADSP 210x data path, emphasizing the separate memory banks P (program) and D (data) that can be accessed in parallel]
8 - 21
Example: Address Generation Units
Example (ADSP 210x):
• Data memory can only be fetched with an address contained in register file A, but its update can be done in parallel with an operation in the main data path (takes effectively 0 time).
• Register file A contains several
precomputed addresses A[i].
• There is another register file M that
contains modification values M[j].
• Possible updates:
M[j] := ‘immediate’
A[i] := A[i] ± M[j]
A[i] := A[i] ± 1
A[i] := A[i] ± ‘immediate’
A[i] := ‘immediate’
8 - 22
Topics
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
Heterogeneous Architectures
8 - 23
Microcontroller
Control-dominant applications
supports process scheduling
and synchronization
preemption (interrupt),
context switch
short latency times
timers
A/D converter
interrupt controller
8 - 25
Topics
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
Heterogeneous Architectures
8 - 26
Data Dominated Systems
Streaming oriented systems with mostly periodic behavior
Underlying model of computation is often a signal flow graph or data flow graph:
[Signal flow graph: B → f1 → B → f2 → B → f3 → B, where B denotes a buffer]
8 - 27
Digital Signal Processor
optimized for data-flow applications
suited for simple control flow
parallel hardware units (VLIW)
specialized instruction set
high data throughput
zero-overhead loops
specialized memory
8 - 29
Explicit Parallelism Instruction Computers (EPIC)
The TMS320C62xx VLIW Processor as an example of EPIC:
[Fetch packet of seven 32-bit instructions A–G; a parallel bit per instruction (here 0, 1, 1, 0, 1, 1, 0) marks which instructions issue together]
Cycle 1: A
Cycle 2: B, C, D
Cycle 3: E, F, G
8 - 30
Example Infineon
Processor core for car mirrors
Infineon
8 - 31
Example NXP Trimedia VLIW
VLIW
MIPS
8 - 32
Topics
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
System-on-Chip
8 - 33
FPGA – Basic Structure
Logic Units
I/O Units
Connections
8 - 34
Floor-plan of VIRTEX II FPGAs
8 - 35
Virtex Logic Cell
8 - 36
Example Virtex-6
Combination of flexibility (CLBs), integration, and performance (heterogeneity of hard-IP blocks)
clock distribution
logic (CLB)
interfaces
(PCI, high speed)
memory (RAM)
8 - 37
XILINX Virtex UltraScale
8 - 39
Application Specific Circuits (ASICS)
Custom-designed circuits are necessary
if ultimate speed or
energy efficiency is the goal and
large numbers can be sold.
Approach suffers from
long design times,
lack of flexibility
(changing standards) and
high costs (e.g. mask costs of several million dollars).
8 - 40
Topics
System Specialization
Application Specific Instruction Sets
Micro Controller
Digital Signal Processors and VLIW
Programmable Hardware
ASICs
Heterogeneous Architectures
8 - 41
Example: Heterogeneous Architecture
8 - 42
Example: Heterogeneous Architecture
Hexagon DSP Snapdragon 835
(Galaxy S8)
8 - 43
Example: ARM big.LITTLE Architecture
8 - 44
Embedded Systems
9. Power and Energy
© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Overview
9-2
General Remarks
9-3
Power and Energy Consumption
Statements that have been true for a decade or longer:
„Power is considered as the most important constraint in embedded
systems.” [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
•“Power demands are increasing rapidly, yet battery capacity cannot
keep up.” [in Diztel et al.: Power-Aware Architecting for data-dominated applications, 2007, Springer]
9-4
Some Trends
9-5
Implementation Alternatives
General-purpose processors
Programmable hardware
9-6
Energy Efficiency
It is necessary to
optimize HW and SW.
Use heterogeneous
architectures in order to
adapt to required performance
and to class of application.
Apply specialization techniques.
•© Hugo De Man,
IMEC, Philips, 2007
9-7
Power and Energy
9-8
Power and Energy
[Figures: power P over time t for different operating modes]
9 - 13
Power Consumption of a CMOS Gate
subthreshold (ISUB), junction (IJUNC) and
gate-oxide (IGATE) leakage
9 - 14
Power Consumption of a CMOS Processors
Main sources:
Dynamic power consumption
charging and discharging capacitors
Short circuit power consumption:
short circuit path between supply rails
during switching
Leakage and static power
gate-oxide/subthreshold/junction
leakage
becomes one of the major factors
due to shrinking feature sizes in [J. Xue, T. Li, Y. Deng, Z. Yu, Full-chip leakage analysis for 65 nm CMOS
technology and beyond, Integration VLSI J. 43 (4) (2010) 353–364]
semiconductor technology
9 - 15
Reducing Static Power - Power Supply Gating
Power gating is one of the most effective ways of minimizing static power consumption
(leakage)
Cut-off power supply to inactive units/components
9 - 16
Dynamic Voltage Scaling (DVS)
Average power consumption of CMOS circuits (ignoring leakage):
P ~ α · C_L · V_dd² · f
Delay of CMOS circuits:
τ ~ V_dd / (V_dd − V_T)²
V_dd: supply voltage, α: switching activity, C_L: load capacity, f: clock frequency, V_T: threshold voltage
9 - 18
Techniques to Reduce Dynamic Power
9 - 19
Parallelism
9 - 20
Pipelining
[Figure: a pipelined data path can be operated at V_dd/2 and f_max/2 per stage instead of V_dd and f_max]
9 - 21
VLIW (Very Long Instruction Word) Architectures
Large degree of parallelism
many parallel computational units, (deeply) pipelined
Simple hardware architecture
explicit parallelism (parallel instruction set)
parallelization is done offline (compiler)
[Figure: all 4 instructions of one very long instruction word are executed in parallel]
9 - 22
Example: Qualcomm Hexagon
•Hexagon DSP •Snapdragon 835
(Galaxy S8)
9 - 23
Dynamic Voltage and Frequency Scaling -
Optimization
9 - 24
Dynamic Voltage and Frequency Scaling (DVFS)
[Figure: energy per cycle vs. supply voltage; reducing the voltage reduces the energy per task]
9 - 26
Example: Dynamic Voltage and Frequency Scaling
9 - 27
Example: DVFS – Complete Task as Early as Possible
9 - 28
Example: DVFS – Use Two Voltages
9 - 29
Example: DVFS – Use One Voltage
E_c = 10⁹ cycles × 25 × 10⁻⁹ J/cycle = 25 J
9 - 30
DVFS: Optimal Strategy
Execute a task in fixed time T with variable voltage V_dd(t):
[Figure: power levels P(x) and P(y) at operating voltages x and y; gate delay and the invariant amount of work to be done]
9 - 31
DVFS: Optimal Strategy
•Vdd •P(y) Execute task in fixed time T
•y •P(z)
•z with variable voltage Vdd(t):
•P(x)
•x •gate delay:
•invariant:
•z ∙ T = a ∙ T∙ x + (1-a) ∙ T ∙ y •invariant:
z = a ∙ x + (1-a) ∙ y
case A: execute at voltage x for T ∙ a time units and at
voltage y for (1-a) ∙ T time units;
energy consumption: T ∙ ( P(x) ∙ a + P(y) ∙ (1-a) )
•P(x)
•average
•P(z)
9 - 34
DVFS: Real-Time Offline Scheduling on One Processor
Let us model a set of independent tasks as follows:
We suppose that a task vi ϵ V
requires ci computation time at normalized processor frequency 1
arrives at time ai
has (absolute) deadline constraint di
How do we schedule these tasks such that all these tasks can be finished no
later than their deadlines and the energy consumption is minimized?
YDS Algorithm from “A Scheduling Model for Reduced CPU Energy”, Frances Yao, Alan Demers, and Scott Shenker, FOCS 1995.
Define the intensity G([z, z']) in some time interval [z, z'] as the average accumulated execution time of all tasks that have arrival and deadline in [z, z'], relative to the length of the interval z' − z:
G([z, z']) = ( Σ c_i over all tasks v_i with z ≤ a_i and d_i ≤ z' ) / (z' − z)
[Task list excerpt, (a_i, d_i, c_i): (10,14,6), (11,17,2), (12,17,2)]
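A direct C sketch of this definition (array names are assumptions):

/* Intensity G([z, z']) of an interval for the YDS algorithm.
   a[i], d[i], c[i]: arrival, deadline, execution time of task i. */
double intensity(int n, const double a[], const double d[],
                 const double c[], double z, double zp)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if (a[i] >= z && d[i] <= zp)   /* arrival AND deadline in [z, z'] */
            sum += c[i];
    return sum / (zp - z);
}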
9 - 36
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.
[Figure: tasks v1–v7 with parameters (a_i, d_i, c_i): v1 = (3,6,5), v2 = (2,6,3), v3 = (0,8,2), v4 = (6,14,6), v5 = (10,14,6), v6 = (11,17,2), v7 = (12,17,2); time axis 0–16]
9 - 37
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.
[Figure: the interval [2, 6], containing v1 = (3,6,5) and v2 = (2,6,3), has the highest intensity G([2, 6]) = (5 + 3)/4 = 2]
9 - 38
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.
[Figure: within the critical interval [2, 6], v2 and v1 are executed in EDF order at frequency 2; remaining tasks: v3 = (0,8,2), v4 = (6,14,6), v5 = (10,14,6), v6 = (11,17,2), v7 = (12,17,2)]
9 - 39
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 2: Adjust the arrival times and deadlines by excluding the possibility to execute at the previous
critical intervals.
[Figure: the critical interval [2, 6] is removed from the time axis; remaining tasks (a_i, d_i, c_i): (0,8,2), (6,14,6), (10,14,6), (11,17,2), (12,17,2)]
9 - 40
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 2: Adjust the arrival times and deadlines by excluding the possibility to execute at the previous
critical intervals.
Adjusted arrival times and deadlines (a_i, d_i, c_i):
(0,8,2) → (0,4,2)
(6,14,6) → (2,10,6)
(10,14,6) → (6,10,6)
(11,17,2) → (7,13,2)
(12,17,2) → (8,13,2)
9 - 41
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
[Figure: revised task set (0,4,2), (2,10,6), (6,10,6), (7,13,2), (8,13,2) on the shortened time axis]
9 - 42
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
[Figure: the next critical interval of the revised task set is identified]
9 - 43
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
[Figure: v4 and v5 are executed within the critical interval of the revised task set]
9 - 44
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
Step 4: Put pieces together
[Figure: final schedule on the time axis 0–16, execution order v3, v2, v1, v4, v5, v6, v7; the remaining pieces of v3 appear as (0,2,2)]
v1 v2 v3 v4 v5 v6 v7
frequency: 2 2 1 1.5 1.5 4/3 4/3
9 - 45
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: at time 0, only v3 = (0,8,2) has arrived and is executed at frequency 2/8]
9 - 46
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: at time 2, v2 = (2,6,3) arrives; the schedule is recomputed]
9 - 47
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: at time 3, v1 = (3,6,5) arrives; the schedule is recomputed]
9 - 48
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: at time 6, v4 = (6,14,6) arrives; the schedule is recomputed]
9 - 49
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: at time 10, v5 = (10,14,6) arrives; the schedule is recomputed]
Continuously update to the best schedule for all arrived tasks:
Time 0: task v3 is executed at 2/8
Time 2: task v2 arrives
G([2,6]) = ¾, G([2,8]) = 4.5/6 = ¾ => execute v3, v2 at ¾
Time 3: task v1 arrives
G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
Time 6: task v4 arrives
ai,di,ci
G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16
Time 10: task v5 arrives
G([10,14]) = 39/16 => execute v4 and v5 at 39/16
9 - 50
YDS Optimal DVFS Algorithm for Online Scheduling
[Figure: complete online schedule including v6 = (11,17,2) and v7 = (12,17,2)]
Continuously update to the best schedule for all arrived tasks:
Time 0: task v3 is executed at 2/8 11,17,2
Time 2: task v2 arrives
G([2,6]) = ¾, G([2,8]) = 4.5/6 = ¾ => execute v3, v2 at ¾
Time 3: task v1 arrives
G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
Time 6: task v4 arrives
ai,di,ci
G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16
Time 10: task v5 arrives
G([10,14]) = 39/16 => execute v4 and v5 at 39/16
Time 11 and Time 12
The arrival of v6 and v7 does not change the critical interval
Time 14:
G([14,17]) = 4/3 => execute v6 and v7 at 4/3
9 - 51
Remarks on the YDS Algorithm
Offline
The algorithm guarantees the minimal energy consumption while satisfying the
timing constraints
The time complexity is O(N³), where N is the number of tasks in V
Finding the critical interval can be done in O(N²)
The number of iterations is at most N
Exercise:
For periodic real-time tasks with deadline=period, running at constant speed with
100% utilization under EDF has minimum energy consumption while satisfying the
timing constraints.
Online
Compared to the optimal offline solution, the online schedule uses at most 27 times the minimal energy consumption.
9 - 52
Dynamic Power Management
9 - 53
Dynamic Power Management (DPM)
• Dynamic power management tries to assign optimal
power saving states during program execution
• DPM requires hardware and software support
power states
Tsd: shutdown delay Twu: wakeup delay
Tw: waiting time
Entering an inactive state is beneficial only if the waiting time is longer than the break-even time.
Assumptions for the calculation:
No performance penalty is tolerated.
An ideal power manager that
has the full knowledge of the future
workload trace. On the previous slide,
we supposed that the power manager
has no knowledge about the future.
9 - 56
Break-Even Time
busy waiting busy
state transition application states
run sleep run
power states
Time constraint:
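The exact expressions are given on the slide; as an illustration, a C sketch under one common model: staying active for the waiting time Tw consumes P_run · Tw, while sleeping consumes the transition energies E_sd + E_wu plus P_sleep for the remaining time. The symbol names here are assumptions, not necessarily the slide's notation:

/* Break-even time: smallest waiting time for which entering the sleep
   state saves energy, under the simple model described above. */
double break_even_time(double P_run, double P_sleep,
                       double E_sd, double E_wu,
                       double Tsd, double Twu)
{
    /* Solve P_run*T = E_sd + E_wu + P_sleep*(T - Tsd - Twu) for T. */
    double Tbe  = (E_sd + E_wu - P_sleep * (Tsd + Twu))
                  / (P_run - P_sleep);
    double Tmin = Tsd + Twu;    /* must at least fit the transitions */
    return Tbe > Tmin ? Tbe : Tmin;
}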
9 - 57
Power Modes in MSP432 (Lab)
9 - 60
Battery-Operated Systems and Energy Harvesting
9 - 61
Embedded Systems in the Extreme - Permasense
9 - 62
Embedded Systems
© Lothar Thiele
Computer Engineering and Networks Laboratory
64
Reasons for Battery-Operated Devices and Harvesting
Battery operation:
no continuous power source available
mobility
Energy harvesting:
prolong lifetime of battery-operated devices
infinite lifetime using rechargeable batteries
autonomous operation
Voltage
Stabilization
9 - 67
Typical Power Circuitry – Maximum Power Point Tracking
U/I curves of a typical solar cell:
red: current for different light intensities
blue: power for different light intensities
grey: maximal power
Tracking determines the optimal impedance seen by the solar panel.
Simple tracking algorithm (assume constant illumination): in each iteration k, compare the new operating point with the previous one and perturb the voltage accordingly, setting V(k+1) = V(k) + Δ or V(k+1) = V(k) − Δ.
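A C sketch of the standard perturb-and-observe step that such a tracker implements (function and variable names are assumptions):

/* Perturb-and-observe MPPT: if the last voltage change increased the
   output power, keep perturbing in the same direction, else reverse. */
double mppt_step(double V, double I, double *V_prev, double *P_prev,
                 double delta)
{
    double P   = V * I;                                    /* output power */
    double dir = ((P > *P_prev) == (V > *V_prev)) ? 1.0 : -1.0;
    *V_prev = V;
    *P_prev = P;
    return V + dir * delta;       /* next operating voltage V(k+1) */
}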
9 - 68
Maximal Power Point Tracking
9 - 69
Maximal Power Point Tracking
9 - 70
Maximal Power Point Tracking
9 - 71
Maximal Power Point Tracking
9 - 72
Maximal Power Point Tracking
9 - 73
Typical Challenge in (Solar) Harvesting Systems
Challenges:
What is the optimal maximum capacity of the battery?
What is the optimal area of the solar cell?
How can we control the application such that a continuous system operation is
possible, even under a varying input energy (summer, winter, clouds)?
Example of a solar energy trace:
9 - 74
Example: Application Control
Scenario:
[Figure: energy flows from the energy source into the energy storage and on to the consumer; an energy estimator provides the information flow to a controller that steers the consumer]
The controller can adapt the service of the consumer device, for example the sampling rate for its sensors or the transmission rate of information. As a result, the power consumption changes proportionally.
Precondition for correctness of application control: Never run out of energy.
Example for optimality criterion: Maximize the lowest service of (or
equivalently, the lowest energy flow to) the consumer.
9 - 75
Application Control
Formal Model (discrete time t):
[Figure: the energy source delivers p(t) into an energy storage with capacity B and state b(t); the controller draws u(t) for the consumer]
9 - 78
Application Control
Theorem: Given a use function u*(t) such that the system never enters a failure state. If u*(t) is optimal with respect to maximizing the minimal used energy among all use functions and maximizes the utility U(t, T), then the following relations hold for all times:
[slide formulas: the value of u*(t) changes only at instants where the battery is empty or full]
Sketch of a proof: First, let us show that a consequence of the above theorem is
true (just reverting the relations):
In other words, as long as the battery is neither full nor empty, the optimal use
function does not change.
9 - 79
Application Control
Proof sketch cont.:
9 - 80
Application Control
Proof sketch cont.:
suppose we change
the use function
locally from being
constant such that
the overall battery
state does not change
or equivalently
9 - 82
Application Control
Proof sketch cont.: Now we show that for all
or equivalently
feasible, but
better choice of
use function with
9 - 83
Application Control
9 - 84
Application Control
How can we efficiently compute an optimal use function?
There are several options available as we just need to solve a convex optimization
problem.
A simple but inefficient possibility is to convert the problem into a linear program.
At first suppose that the utility is simply
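A sketch of such a linear program in LaTeX form, assuming the linear battery dynamics b(t+1) ≤ b(t) + p(t) − u(t) (the inequality lets surplus energy be discarded, which linearizes the overflow clipping) and a simple additive utility:

\begin{align*}
\max \; & U_{\min} \\
\text{s.t.}\; & u(t) \ge U_{\min} && 0 \le t < T \\
& b(t+1) \le b(t) + p(t) - u(t) && 0 \le t < T \\
& 0 \le b(t) \le B && 0 \le t \le T
\end{align*}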
9 - 85
[Example: profiles of input energy p(t), use function u(t), and battery state b(t) over the horizon t = 0 … 6 = T]
9 - 86
[Example: two alternative use functions u(t) with the resulting battery states b(t) over t = 0 … 6 = T]
9 - 87
Application Control
But what happens if the estimation of the future incoming energy is not correct?
If it would be correct, then we would just compute the whole future application
control now and would not change anything anymore.
This will not work as errors will accumulate and we will end up with many
infeasible situations, i.e., the battery is completely empty and we are forced to
stop the application.
Possibility: Finite horizon control
At time t, we compute the optimal control (see previous slides) using the currently available battery state b(t) and predictions of the future incoming energy over the horizon.
From the computed optimal use function we just take the first use value u(t) in order to control the application.
At the next time step, we take as initial battery state the actual state; therefore, we take mispredictions into account. For the estimated future energy, we also take the new estimations.
9 - 88
Application Control
Finite horizon control:
9 - 89
Application Control using Finite Horizon
[Plot: estimated vs. actual input energy; an energy breakdown still occurs due to misprediction]
9 - 90
Application Control using Finite Horizon
[Plots: a more pessimistic prediction avoids the breakdown; a simplified optimization using a look-up table (not covered)]
9 - 91
Remember: What you got some time ago …
10 - 1
What we told you: Be careful and please do not …
10 - 2
Return the boards at the
embedded systems exam!
10 - 3
Embedded Systems
10. Architecture Synthesis
© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Overview
10 - 5
Implementation Alternatives
General-purpose processors
Programmable hardware
10 - 6
Architecture Synthesis
Determine a hardware architecture that efficiently executes a given algorithm.
10 - 9
Specification
Formal specification of the desired functionality and the structure (architecture)
of an embedded systems is a necessary step for using computer aided design
methods.
There exist many different formalisms and models of computation, see also the
models used for real-time software and general specification models for the
whole system.
Now, we will introduce some relevant models for architecture level (hardware)
synthesis.
10 - 10
Task Graph or Dependence Graph (DG)
Sequence constraint
Nodes are assumed to be a „program“ described in some programming language, e.g. C or Java, or just a single operation.
10 - 11
Dependence Graph
A dependence graph describes order relations for the execution of single
operations or tasks. Nodes correspond to tasks or operations, edges correspond
to relations („executed after“).
10 - 13
Example of a Dependence Graph
10 - 14
Marked Graph (MG)
A marked graph G = (V, A, del) consists of
nodes (actors) v ∈ V
edges a = (v_i, v_j) ∈ A, A ⊆ V × V
a number del(a) of initial tokens (the marking) on each edge a
[Figure: actors connected by edges carrying tokens]
10 - 15
10 - 16
Marked Graph
The tokens on the edges correspond to data that are stored in FIFO queues.
A node (actor) is called activated if on every input edge there is at least one
token.
A node (actor) can fire if it is activated.
The firing of a node vi (actor operates on the first tokens in the input queues)
removes from each input edge a token and adds a token to each output edge.
The output token correspond to the processed data.
Marked graphs are mainly used for modeling regular computations, for example
signal flow graphs.
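A minimal C sketch of these firing rules (the adjacency-matrix representation and the graph size are illustrative assumptions):

#define NV 4   /* number of actors (example size) */

/* tok[i][j]: tokens on edge (v_i, v_j); edge[i][j] != 0 iff the edge exists. */
int activated(int v, const int edge[NV][NV], const int tok[NV][NV])
{
    for (int i = 0; i < NV; i++)
        if (edge[i][v] && tok[i][v] < 1)
            return 0;                     /* an input edge lacks a token */
    return 1;
}

void fire(int v, const int edge[NV][NV], int tok[NV][NV])
{
    for (int i = 0; i < NV; i++)
        if (edge[i][v]) tok[i][v]--;      /* consume one token per input */
    for (int j = 0; j < NV; j++)
        if (edge[v][j]) tok[v][j]++;      /* produce one token per output */
}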
10 - 17
Marked Graph
Example (model of a digital filter with infinite impulse response IIR)
Filter equation:
y (l ) a u (l ) b y (l 1) c y (l 2) d y (l 3)
nodes 3-5:
a d c b w
2 3 4 5 6 7 x
x+w y
output y
y
1 9 8
input u fork node 2: x=0
10 - 18
Implementation of Marked Graphs
There are different possibilities to implement marked graphs in hardware or
software directly. Only the most simple possibilities are shown here.
Hardware implementation as a synchronous digital circuit:
Actors are implemented as combinatorial circuits.
Edges correspond to synchronously clocked shift registers (FIFOs).
clock
10 - 19
Implementation of Marked Graphs
Hardware implementation as a self-timed asynchronous circuit:
Actors and FIFO registers are implemented as independent units.
The coordination and synchronization of firings is implemented using a handshake
protocol.
Delay insensitive direct implementation of the semantics of marked graphs.
ack ack
rdy rdy
actor
ack ack
10 - 20
Implementation of Marked Graphs
Software implementation with static scheduling:
At first, a feasible sequence of actor firings is determined which ends in the
starting state (initial distribution of tokens).
This sequence is implemented directly in software.
Example digital filter:
feasible sequence: (1, 2, 3, 9, 4, 8, 5, 6, 7)
program: while(true) {
t1 = read(u);
t2 = a*t1;
t3 = t2+d*t9;
t9 = t8;
t4 = t3+c*t9;
t8 = t6;
t5 = t4+b*t8;
t6 = t5;
write(y, t6);}
10 - 21
Implementation of Marked Graphs
Software implementation with dynamic scheduling:
10 - 22
Models for Architecture Synthesis
A sequence graph is a dependence graph with a single start node
(no incoming edges) and a single end node (no outgoing edges).
VS denotes the operations of the algorithm and ES denotes the dependence relations.
Cost function
10 - 23
Models for Architecture Synthesis - Example
Example sequence graph:
Algorithm (differential equation):
10 - 24
Models for Architecture Synthesis - Example
Corresponding sequence graph:
[Sequence graph: start node nop v0; multiplications v1, v2, v3, v6, v7, v8 (×); ALU operations v4, v5 (−), v9, v10 (+), v11 (<); end node nop v12]
10 - 25
Models for Architecture Synthesis - Example
Corresponding resource graph with one instance of a multiplier (cost c(r1) = 8) and one instance of an ALU (cost c(r2) = 3):
[Bipartite resource graph: operations V_S connected by mapping edges E_R to resource types V_T]
Allocation and Binding
10 - 27
Models for Architecture Synthesis - Example
Corresponding resource graph with 4 instances of a multiplier (cost c(r1) = 8, α(r1) = 4) and two instances of an ALU (cost c(r2) = 3, α(r2) = 2):
[Bipartite resource graph: operations V_S connected by mapping edges E_R to resource types V_T]
10 - 28
Models for Architecture Synthesis - Example
Example binding (α(r1) = 4, α(r2) = 2):
10 - 29
Scheduling
10 - 30
10 - 31
Models for Architecture Synthesis - Example
Example: latency L = τ(v12) − τ(v0) = 7
τ(v0) = 1
τ(v1) = τ(v10) = 1
τ(v2) = τ(v11) = 2
τ(v3) = 3
τ(v6) = τ(v4) = 4
τ(v7) = 5
τ(v8) = τ(v5) = 6
τ(v9) = 7
τ(v12) = 8
10 - 32
Multiobjective Optimization
10 - 33
Multiobjective Optimization
Architecture Synthesis is an optimization problem with more than one objective:
Latency of the algorithm that is implemented
Hardware cost (memory, communication, computing units, control)
Power and energy consumption
10 - 34
Multiobjective Optimization
Let us suppose we would like to select a typewriting device. Criteria are
mobility (related to weight)
comfort (related to keyboard size and performance)
10 - 35
Multiobjective Optimization
[Plot: writing comfort vs. weight (0.1 to 20 kg, logarithmic); Pareto-optimal solutions and dominated solutions; arrows indicate the direction of 'better']
10 - 36
Pareto-Dominance
[Figure: in objective space, the solutions that dominate solution k and the solutions dominated by solution k form two opposite cones]
10 - 37
Pareto-optimal Set
A solution is named Pareto-optimal, if it is not Pareto-dominated by any other
solution in X.
The set of all Pareto-optimal solutions is denoted as the Pareto-optimal set and
its image in objective space as the Pareto-optimal front.
[Figure: objective space (f1, f2) with the Pareto-optimal front highlighted.]
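The definition translates directly into code; a minimal sketch for two objectives that are both to be minimized (the data points are made up):

#include <stdbool.h>
#include <stdio.h>

#define N 5
#define M 2   /* number of objectives, all to be minimized */

/* x dominates y iff x is nowhere worse and somewhere strictly better */
static bool dominates(const double x[M], const double y[M])
{
    bool strictly = false;
    for (int j = 0; j < M; j++) {
        if (x[j] > y[j]) return false;
        if (x[j] < y[j]) strictly = true;
    }
    return strictly;
}

int main(void)
{
    /* illustrative objective vectors, e.g. (weight, 1/comfort) */
    double f[N][M] = { {0.1, 9}, {2, 5}, {4, 6}, {10, 2}, {20, 1} };

    for (int i = 0; i < N; i++) {
        bool optimal = true;
        for (int k = 0; k < N; k++)
            if (k != i && dominates(f[k], f[i])) optimal = false;
        if (optimal) printf("solution %d is Pareto-optimal\n", i);
    }
    return 0;
}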
10 - 39
Synthesis Algorithms
Classification
unlimited resources:
no constraints in terms of the available resources are defined.
limited resources:
constraints are given in terms of the number and type of available resources.
10 - 40
Synthesis/Scheduling Without Resource Constraints
The corresponding scheduling method can be used
as a preparatory step for the general synthesis problem
to determine bounds on feasible schedules in the general case
if there is a dedicated resource for each operation.
10 - 41
ASAP Algorithm
ASAP = As Soon As Possible
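The principle (every operation starts as soon as all its predecessors have finished) can be sketched as follows; the small example graph and the array-based representation are illustrative assumptions:

#include <stdio.h>

#define N 5

int main(void)
{
    int w[N]       = {0, 1, 1, 1, 0};            /* execution times w(vi) */
    int npred[N]   = {0, 1, 1, 2, 1};
    int pred[N][N] = { {0}, {0}, {0}, {1, 2}, {3} };
    int tau[N];

    /* nodes assumed numbered in topological order */
    for (int i = 0; i < N; i++) {
        tau[i] = 1;                              /* earliest possible start */
        for (int k = 0; k < npred[i]; k++) {
            int j = pred[i][k];
            if (tau[j] + w[j] > tau[i])          /* start after all preds finish */
                tau[i] = tau[j] + w[j];
        }
        printf("tau(v%d) = %d\n", i, tau[i]);
    }
    return 0;
}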
10 - 42
The ASAP Algorithm - Example
Example:
w(vi) = 1
10 - 43
ALAP Algorithm
ALAP = As Late As Possible
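ALAP is the backward counterpart of ASAP: operations are processed in reverse topological order and start as late as possible under a latency bound Lmax. A minimal sketch with the same illustrative graph representation as above:

#include <stdio.h>

#define N 5

int main(void)
{
    int w[N]       = {0, 1, 1, 1, 0};            /* execution times w(vi) */
    int nsucc[N]   = {2, 1, 1, 1, 0};
    int succ[N][N] = { {1, 2}, {3}, {3}, {4}, {0} };
    int Lmax = 4;
    int tau[N];

    for (int i = N - 1; i >= 0; i--) {           /* reverse topological order */
        tau[i] = Lmax + 1 - w[i];                /* latest start if unconstrained */
        for (int k = 0; k < nsucc[i]; k++) {
            int j = succ[i][k];
            if (tau[j] - w[i] < tau[i])          /* finish before each successor starts */
                tau[i] = tau[j] - w[i];
        }
        printf("tau(v%d) = %d\n", i, tau[i]);
    }
    return 0;
}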
10 - 44
ALAP Algorithm - Example
Example:
Lmax = 7
w(vi) = 1
[Figure: the sequence graph redrawn with every operation at its latest possible start time for Lmax = 7: multiplications v1, v2, v3, v6, v7, v8, subtractions v4, v5, additions v9, v10, comparison v11 (<), framed by the nop nodes v0 and v12.]
10 - 45
Scheduling with Timing Constraints
There are different classes of timing constraints, for example:
deadline (latest finishing time of an operation)
release time (earliest starting time of an operation)
10 - 46
10 - 47
Scheduling with Timing Constraints
We will model all timing constraints as relative constraints; deadlines and
release times are then defined relative to the start node v0.
Minimum, maximum and equality constraints can be converted into each other:
Minimum constraint: τ(vj) ≥ τ(vi) + lij (vj starts at least lij time units after vi)
Maximum constraint: τ(vj) ≤ τ(vi) + lij (vj starts at most lij time units after vi)
Equality constraint: τ(vj) = τ(vi) + lij (simultaneously a minimum and a maximum constraint)
10 - 48
Weighted Constraint Graph
Timing constraints can be represented in the form of a weighted constraint graph:
10 - 49
Weighted Constraint Graph
In order to represent a feasible schedule, we add one edge (vi, vj) for each
precedence constraint, with weight w(vi): vj may start only after vi has finished,
i.e. τ(vj) ≥ τ(vi) + w(vi). A feasible schedule can then be obtained from longest
paths in this graph (a sketch follows below).
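One way to compute such a schedule is a single-source longest-path computation over the constraint graph, Bellman-Ford style; a positive cycle then signals inconsistent constraints. A minimal sketch with a hypothetical graph (the edge list is made up):

#include <stdio.h>

#define NV 5
#define NE 6
#define NEG_INF (-1000000)

typedef struct { int from, to, weight; } edge_t;

int main(void)
{
    /* precedence edges carry weight w(vi); the maximum constraint is the
     * negative-weight back edge {3, 1, -4} */
    edge_t e[NE] = { {0,1,0}, {1,2,2}, {2,3,1}, {1,3,2}, {3,4,2}, {3,1,-4} };
    int tau[NV] = {0, NEG_INF, NEG_INF, NEG_INF, NEG_INF};  /* source v0 */

    for (int iter = 0; iter < NV - 1; iter++)    /* relax all edges |V|-1 times */
        for (int k = 0; k < NE; k++)
            if (tau[e[k].from] > NEG_INF &&
                tau[e[k].from] + e[k].weight > tau[e[k].to])
                tau[e[k].to] = tau[e[k].from] + e[k].weight;

    for (int k = 0; k < NE; k++)                 /* still improvable: positive cycle */
        if (tau[e[k].from] > NEG_INF &&
            tau[e[k].from] + e[k].weight > tau[e[k].to]) {
            printf("timing constraints are inconsistent\n");
            return 1;
        }

    for (int i = 0; i < NV; i++)
        printf("tau(v%d) = %d\n", i, tau[i]);
    return 0;
}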
10 - 50
Weighted Constraint Graph - Example
Example: w(v1) = w(v3) = 2, w(v2) = w(v4) = 1
[Figure: weighted constraint graph for v1 … v4 with forward precedence edges weighted by execution times, plus a minimum time constraint and a maximum time constraint (values 4 and 3 in the figure).]
10 - 51
Architecture Synthesis with Resource Constraints
10 - 52
Scheduling With Resource Constraints
10 - 53
List Scheduling
List scheduling is one of the most widely used algorithms for scheduling under
resource constraints.
Principles:
Each operation is assigned a priority that denotes its urgency of being
scheduled. This priority is static, i.e. it is determined before list scheduling starts.
The algorithm schedules one time step after the other.
Uk denotes the set of operations that (a) are mapped onto resource type vk and (b)
whose predecessors have all finished.
Tk denotes the set of currently running operations mapped onto resource type vk.
A minimal software sketch is shown below.
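A minimal sketch of the algorithm (the example graph, priorities and allocation are made up; real implementations keep explicit ready lists instead of rescanning all operations):

#include <stdio.h>
#include <stdbool.h>

#define N 6          /* operations */
enum { MUL, ALU };   /* resource types */

int main(void)
{
    int type[N]    = {MUL, MUL, MUL, ALU, ALU, ALU};
    int w[N]       = {2, 2, 2, 1, 1, 1};        /* execution times */
    int prio[N]    = {3, 2, 1, 3, 2, 1};        /* static priorities */
    int npred[N]   = {0, 0, 0, 1, 1, 2};
    int pred[N][2] = { {0,0}, {0,0}, {0,0}, {0,0}, {1,0}, {3,4} };
    int alloc[2]   = {1, 1};                     /* available instances */
    int start[N], finish[N];
    int started = 0;

    for (int i = 0; i < N; i++) { start[i] = -1; finish[i] = -1; }

    for (int t = 1; started < N; t++) {          /* one time step after the other */
        for (int r = 0; r < 2; r++) {
            int busy = 0;
            for (int i = 0; i < N; i++)          /* T_k: still running on r */
                if (type[i] == r && start[i] >= 0 && finish[i] > t) busy++;
            while (busy < alloc[r]) {            /* start ready ops by priority */
                int best = -1;
                for (int i = 0; i < N; i++) {    /* U_k: ready, not yet scheduled */
                    if (type[i] != r || start[i] >= 0) continue;
                    bool ready = true;
                    for (int k = 0; k < npred[i]; k++)
                        if (start[pred[i][k]] < 0 || finish[pred[i][k]] > t)
                            ready = false;
                    if (ready && (best < 0 || prio[i] > prio[best])) best = i;
                }
                if (best < 0) break;
                start[best] = t; finish[best] = t + w[best];
                busy++; started++;
            }
        }
    }
    for (int i = 0; i < N; i++)
        printf("v%d: start %d, finish %d\n", i, start[i], finish[i]);
    return 0;
}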
10 - 54
List Scheduling
[Algorithm listing: list scheduling; for each time step and each resource type, start the highest-priority ready operations while free instances remain.]
10 - 55
List Scheduling - Example
Example:
10 - 56
List Scheduling - Example
Solution via list scheduling:
In the example, the solution is
independent of the chosen priority
function.
10 - 57
List Scheduling
Solution via an optimal method:
Latency is smaller than with
list scheduling.
An example of an optimal
algorithm is the transformation
into an integer linear program as
described next.
10 - 58
Integer Linear Programming
Principle:
Synthesis Problem
→ transformation into ILP
→ optimization of ILP
→ Solution of ILP
→ back interpretation
→ Solution of Synthesis Problem
10 - 59
Integer Linear Program
Yields the optimal solution to synthesis problems, as it is based on an exact
mathematical description of the problem.
10 - 60
10 - 61
Integer Linear Program
Many variants exist, depending on the available information, constraints and
objectives, e.g. minimize latency, minimize resources, or minimize memory. Only
one example formulation is given here!
10 - 62
Integer Linear Program
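The formulation itself appeared as a figure on the slides. A reconstruction of a standard time-indexed formulation that matches the explanations on the following pages (the exact notation on the original slide may differ):

minimize τ(vn)  (starting time of the end node vn)
subject to
(1) xi,t ∈ {0, 1}  for all vi ∈ VS and all t
(2) Σt xi,t = 1  for all vi ∈ VS
(3) Σt t · xi,t = τ(vi)  for all vi ∈ VS
(4) τ(vj) − τ(vi) ≥ w(vi)  for all (vi, vj) ∈ ES
(5) Σ{i : β(vi) = vk} Σ{p = t−w(vi)+1, …, t} xi,p ≤ α(vk)  for all vk ∈ VT and all t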
10 - 63
10 - 64
10 - 65
Integer Linear Program
Explanations:
(1) declares the variables x to be binary.
(2) makes sure that for each operation vi exactly one variable xi,t has the value 1; all others are 0.
(3) determines the relation between the variables x and the starting times τ of the operations.
In particular, if xi,t = 1 then operation vi starts at time t, i.e. τ(vi) = t.
(4) guarantees that all precedence constraints are satisfied.
(5) makes sure that the resource constraints are not violated: for every resource
type vk ∈ VT and every time instant t it is guaranteed that the number of active
operations does not exceed the number of available resource instances.
10 - 66
Integer Linear Program
Explanations:
(5) The first sum selects all operations that are mapped onto resource type vk. The
second sum covers all starting times p for which operation vi would still occupy
resource type vk at time t, i.e. t − w(vi) < p ≤ t:
10 - 67
Architecture Synthesis for Iterative Algorithms and
Marked Graphs
10 - 68
Remember … : Marked Graph
Example (model of a digital filter with infinite impulse response IIR)
Filter equation:
y(l) = a · u(l) + b · y(l−1) + c · y(l−2) + d · y(l−3)
[Figure: marked graph of the IIR filter: node 1 reads input u, node 2 is a fork, nodes 3-5 each compute x + w (adding the weighted terms d, c, b), node 7 writes output y, and nodes 8 and 9 model the delays, carrying the initial tokens.]
10 - 69
Iterative Algorithms
Iterative algorithms consist of a set of indexed equations that are evaluated for
all values of an index variable l:
xi[l] = Fi(…, xj[l − dji], …)  for all l
Here, the xi denote a set of indexed variables, the Fi denote arbitrary functions, and
the dji are constant index displacements.
Examples of well known representations are signal flow graphs (as used in signal
and image processing and automatic control), marked graphs and special forms
of loops.
10 - 70
Iterative Algorithms
Several representations of the same iterative algorithm:
One indexed equation with constant index dependencies:
y[l] = a · u[l] + b · y[l−1] + c · y[l−2] + d · y[l−3]  for all l
10 - 71
Iterative Algorithms
Extended sequence graph GS = (VS, ES, d): to each edge (vi, vj) ∈ ES there is associated
the index displacement dij. An edge (vi, vj) ∈ ES denotes that the variable
corresponding to vj depends on the variable corresponding to vi with displacement dij.
[Figure: two drawings of the extended sequence graph of the filter: a chain u → x1 → x2 → x3 → y with displacement 0 on the forward edges, and feedback edges carrying the displacements 1, 2 and 3.]
10 - 72
Iterative Algorithms
Equivalent signal flow graph:
[Figure: signal flow graph of the filter: input u, coefficient multipliers a, d, c, b, a chain of three z−1 delay elements, output y.]
10 - 73
Iterative Algorithms
An iteration is the set of all operations necessary to compute all variables xi[l]
for a fixed index l.
The iteration interval P is the time distance between two successive iterations of
an iterative algorithm. 1/P denotes the throughput of the implementation.
The latency L is the maximal time distance between the starting and the
finishing times of operations belonging to one iteration.
10 - 74
Iterative Algorithms
Implementation principles
A simple possibility: the edges with dij > 0 are removed from the extended
sequence graph, and the resulting simple sequence graph is implemented using
standard methods.
[Figure: schedule with execution times w(vi) = (0, 1, 2, 2, 2) and no pipelining: one iteration has latency L = 7, and one physical iteration also takes P = 7.]
10 - 75
Iterative Algorithms
Implementation principles
Using functional pipelining, successive iterations overlap and a higher throughput
(1/P) is obtained.
[Figure: schedule with 4 resources and functional pipelining: the physical iteration interval drops to P = 2 while the latency remains L = 7.]
10 - 76
Iterative Algorithms
Solving the synthesis problem using integer linear programming:
Starting point is the ILP formulation given for simple sequence graphs.
ASAP and ALAP scheduling for the upper and lower bounds hi and li use only the
edges with dij = 0 (i.e. dependencies across iterations are removed).
10 - 77
Integer Linear Program
10 - 78
Iterative Algorithms
Eqn. (4) is replaced by:
τ(vj) − τ(vi) ≥ w(vi) − dij · P  for all (vi, vj) ∈ ES
Proof of correctness: operation vj in iteration l consumes the result that vi
produced in iteration l − dij. Since that iteration started dij · P time units
earlier, vi finishes before vj starts exactly if the inequality above holds.
[Figure: timing diagram of two overlapping iterations illustrating the displacement dij · P.]
10 - 79
Iterative Algorithms
Eqn. (5) is replaced by a resource constraint evaluated modulo the iteration
interval: with functional pipelining, an operation occupies its resource type in
every iteration, i.e. at all time instants that are congruent to its execution
interval modulo P.
Therefore, we obtain resource constraints that need to be checked for
t = 0, …, P − 1 only.
10 - 80
Dynamic Voltage Scaling
By transforming the DVS problem into an integer linear program, we can minimize
the energy consumption in the case of dynamic voltage scaling.
10 - 81
Dynamic Voltage Scaling
10 - 82
Dynamic Voltage Scaling
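The formulation appeared as a figure; a reconstruction consistent with the explanations on the next page (assumed notation: ei,k and wi,k are the energy and execution time of operation vi at voltage k ∈ K, yi,k the selection variables, di the deadlines):

minimize Σvi Σk∈K ei,k · yi,k
subject to
(1) yi,k ∈ {0, 1}  for all vi ∈ VS and all k ∈ K
(2) Σk∈K yi,k = 1  for all vi ∈ VS
(3) τ(vj) − τ(vi) ≥ Σk∈K wi,k · yi,k  for all (vi, vj) ∈ ES
(4) τ(vi) + Σk∈K wi,k · yi,k ≤ di  for all operations vi with deadline di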
10 - 83
Dynamic Voltage Scaling
Explanations:
The objective function simply sums up the individual energies of all operations.
Eqn. (1) makes the decision variables yik binary.
Eqn. (2) guarantees that exactly one implementation (voltage) k ∈ K is
chosen for each operation vi.
Eqn. (3) implements the precedence constraints, where the actual
execution time is selected from the set of all available ones.
Eqn. (4) guarantees the deadlines.
10 - 84
Chapter 8
Not covered this semester.
Not covered in the exam.
If interested: Read
10 - 85
Remember: What you got some time ago …
10 - 86
What we told you: Be careful and please do not …
10 - 87
Return the boards at the
embedded systems exam!
10 - 88