
Embedded Systems

1 - Introduction

© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Organization


1-2
Organization
WWW: https://www.tec.ee.ethz.ch/education/lectures/embedded-systems.html
Lecture: Lothar Thiele, thiele@ethz.ch; Michele Magno <michele.magno@pbl.ee.ethz.ch>
Coordination: Seonyeong Heo (ETZ D97.7) <seoheo@ethz.ch>
References:
• P. Marwedel: Embedded System Design, Springer, ISBN 978-3-319-85812-8 / 978-3-030-60909-2, 2018/2021.
• G.C. Buttazzo: Hard Real-Time Computing Systems, Springer, ISBN 978-1-4614-0676-1, 2011.
• Edward A. Lee and Sanjit A. Seshia: Introduction to Embedded Systems, A Cyber-Physical Systems Approach, Second Edition, MIT Press, ISBN 978-0-262-53381-2, 2017.

Sources: The slides contain ideas and material of J. Rabaey, K. Keutzer, M. Wolf, P. Marwedel, P. Koopman, E. Lee, P. Dutta, S. Seshia, and from the above cited books.

1-3
Organization Summary
 Lectures are held on Mondays from 14:15 to 16:00 in ETF C1 until further notice. Live streaming and slides are available via the web page of the lecture. In addition, you find audio and video recordings of most of the slides as well as recordings of this year's and last year's live streams on the web page of the lecture.
 Exercises take place on Wednesdays and Fridays from 16:15 to 17:00 via Zoom.
On Wednesdays the lecture material is summarized, hints on how to approach
the solution are given and a sample question is solved. On Fridays, the correct
solutions are discussed.
 Laboratories take place on Wednesdays and Fridays from 16:15 to 18:00 (the
latest). On Wednesdays the session starts with a short introduction via Zoom
and then questions can be asked via Zoom. Fridays are reserved for questions
via Zoom.

1-4
Further Material via the Web Page

1-5
When and where?

1-6
What will you learn?
 Theoretical foundations and principles of the analysis and design of embedded
systems.
 Practical aspects of embedded system design, mainly software design.

The course has three components:


 Lecture: Communicate principles and practical aspects of embedded systems.
 Exercise: Use paper and pencil to deepen your understanding of analysis and
design principles .
 Laboratory (ES-Lab): Introduction into practical aspects of embedded systems
design. Use of state-of-the-art hardware and design tools.

1-7
Please read carefully!
 https://www.tec.ee.ethz.ch/education/lectures/embedded-systems.html

1-8
What you got already…

1-9
Be careful and please do not …

1 - 10
You have to return the board at the end!

1 - 11
Embedded Systems - Impact

1 - 12
Embedded Systems
Embedded systems (ES) = information processing systems
embedded into a larger product

•Examples:

• Often, the main reason for buying is not information processing


1 - 13
© www.braingrid.org
© www.openpr.com
1 - 14
Many Names – Similar Meanings

© Edward Lee
1 - 15
Embedded System

Hardware and software in the CYBER WORLD: computation, communication, reasoning, deciding, big data.
The cyber world observes and influences the PHYSICAL WORLD: physical, biological and social processes (nature).

Use feedback to influence the dynamics of the physical world by taking smart decisions in the cyber world.
1 - 19
Reactivity & Timing
Embedded systems are often reactive:
 Reactive systems must react to stimuli from the system environment:
„A reactive system is one which is in continual interaction with its environment and executes at a pace determined by that environment“ [Bergé, 1995]

Embedded systems often must meet real-time constraints:


 For hard real-time systems, right answers arriving too late are wrong. All other
time-constraints are called soft. A guaranteed system response has to be explained
without statistical arguments.

„A real-time constraint is called hard, if not meeting that constraint could


result in a catastrophe“ [Kopetz, 1997].

1 - 20
Predictability & Dependability

CPS = cyber-physical system

“It is essential to predict how a CPS is going to behave under any circumstances […] before it is deployed.” [Maj14]

“CPS must operate dependably, safely, securely, efficiently and in real-time.” [Raj10]

[Maj14] R. Majumdar & B. Brandenburg (2014). Foundations of Cyber-Physical Systems.
[Raj10] R. Rajkumar et al. (2010). Cyber-Physical Systems: The Next Computing Revolution.
1 - 21
Efficiency & Specialization
 Embedded systems must be efficient:
 Energy efficient
 Code-size and data memory efficient
 Run-time efficient
 Weight efficient
 Cost efficient

Embedded Systems are often specialized towards a certain


application or application domain:
 Knowledge about the expected behavior and the system environment at design
time is exploited to minimize resource usage and to maximize predictability and
reliability.

1 - 22
Comparison
Embedded Systems:
• Few applications that are known at design-time.
• Not programmable by the end user.
• Fixed run-time requirements (additional computing power often not useful).
• Typical criteria: cost, power consumption, size and weight, dependability, worst-case speed.

General Purpose Computing:
• Broad class of applications.
• Programmable by the end user.
• Faster is better.
• Typical criteria: cost, power consumption, average speed.
1 - 23
Lecture Overview

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

1 - 24
Components and Requirements by Example

1 - 25
1 - 26
1 - 27
1 - 28
Components and Requirements by Example
- Hardware System Architecture -

1 - 29
High-Level Block Diagram View
Low power CPU:
• enabling power to the rest of the system
• battery charging and voltage measurement
• wireless radio (boot and operate)
• detect and check expansion boards

Higher performance CPU:
• sensor reading and motor control
• flight control
• telemetry (including the battery voltage)
• additional user development
• USB connection

UART:
• communication protocol (Universal
Asynchronous Receiver/Transmitter)
• exchange of data packets to and from
interfaces (wireless, USB)

1 - 30
High-Level Block Diagram View

Acronyms (sensor board):
• Wkup: Wakeup signal
• GPIO: General-purpose input/output signal
• SPI: Serial Peripheral Interface Bus
• I2C: Inter-Integrated Circuit (Bus)
• PWM: Pulse-width modulated signal
• VCC: power supply

EEPROM:
• electrically erasable programmable read-only memory
• used for firmware (the part of data and software that usually is not changed, configuration data)
• cannot be easily overwritten, in comparison to Flash

Flash memory:
• non-volatile random-access memory for program and data
1 - 31
1 - 32
High-Level Physical View

1 - 33
High-Level Physical View

1 - 34
Low-Level Schematic Diagram View

LEDs

(1 page out of 3)
1 - 35
Low-Level Schematic Diagram View

(1 page out of 3) Motors


1 - 36
High-Level Software View
 The software is built on top of a real-time operating system “FreeRTOS”.
 We will use the same operating system in the ES-Lab … .

1 - 37
High-Level Software View

The software architecture supports

 real-time tasks for motor control (gathering sensor values and pilot commands,
sensor fusion, automatic control, driving motors using PWM (pulse width
modulation, … ) but also

 non-real-time tasks (maintenance and test, handling external events, pilot


commands, … ).

1 - 39
High-Level Software View
Block diagram of the stabilization system:

Pipeline: sensor reading and analog-digital conversion on the sensor component → transfer to the processor → cleaning and preprocessing → information extraction from the sensor data → automatic control → actuation.
1 - 40
Components and Requirements by Example
- Processing Elements -

1 - 41
What can you do to increase performance?

1 - 42
From Computer Engineering

1 - 43
From Computer Engineering
iPhone Processor A12
• 2 high-performance processor cores
• 4 less performant processor cores
• acceleration for neural networks
• graphics processor
• caches
1 - 46
What can you do to decrease power consumption?

1 - 47
Embedded Multicore Example
Trends:
 Specialize multicore processors towards real-time processing and low power
consumption (parallelism can decrease energy consumption)
 Target domains:

1 - 48
Why does higher parallelism help in reducing power?

1 - 49
System-on-Chip
Samsung Galaxy S6
– Exynos 7420 System on a Chip (SoC)
– 8 ARM Cortex processing cores
•Exynos 5422
(4 x A57, 4 x A53)
– 30 nanometer: transistor gate width

1 - 50
How to manage extreme workload variability?

1 - 51
System-on-Chip
Samsung Galaxy S6
– Exynos 7420 System on a Chip (SoC)
– 8 ARM Cortex processing cores
•Exynos 5422
(4 x A57, 4 x A53)
– 30 nanometer: transistor gate width

1 - 52
From Computer Engineering
iPhone Processor A12
• 2 high-performance processor cores
• 4 less performant processor cores
• acceleration for neural networks
• graphics processor
• caches

1 - 53
Components and Requirements by Example
- Systems -

1 - 55
Zero Power Systems and Sensors

Streaming information to
and from the physical world:

• “Smart Dust”
• Sensor Networks
• Cyber-Physical Systems
• Internet-of-Things (IoT)

1 - 56
Zero Power Systems and Sensors

IEEE Journal of Solid-State Circuits, Jan 2013, 229-243.
IEEE Journal of Solid-State Circuits, April 2017, 961-971.
1 - 57
Trends …
 Embedded systems are communicating with each other, with servers or with the cloud.
Communication is increasingly wireless.

 Higher degree of integration on a single chip or integrated components:


 Memory + processor + I/O-units + (wireless) communication.
 Use of networks-on-chip for communication between units.
 Use of homogeneous or heterogeneous multiprocessor systems on a chip (MPSoC).
 Use of integrated microsystems that contain energy harvesting, energy storage, sensing,
processing and communication (“zero power systems”).
 The complexity and amount of software is increasing.

 Low power and energy constraints (portable or unattended devices) are increasingly important,
as well as temperature constraints (overheating).
 There is increasing interest in energy harvesting to achieve long term autonomous operation.

1 - 58
Embedded Systems
2. Software Development

© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

2-2
Remember: Computer Engineering I
Compilation of a C program to machine language program:

textual representation
of instructions

binary representation
of instructions and data

2-3
Embedded Software Development
Software Developer

Tool flow: on the HOST, the developer edits the source code; the compiler produces binary code (as on the previous slide); the binary can be executed in a software simulator or downloaded via the debugger into the flash memory of the EMBEDDED SYSTEM, which consists of a microprocessor with RAM, flash memory, possibly an FPGA, an operating system, and sensors and actuators.
2 - 4


Software Development with MSP432 (ES-Lab)

host PC

2-5
Software Development (ES-Lab)
Software development is nowadays usually done with the support of an IDE (Integrated Development Environment):
 edit and build the code
 debug and validate

2-6
Software Development (ES-Lab)
Build flow:
• The source code file in C is translated by the compiler into assembly code and further into a relocatable object file.
• Object libraries that are referenced in the code and object libraries that contain the operating system (if any) are added.
• A linker command file tells the linker how to allocate memory and how to stitch the object files and libraries together.
• The linker produces the executable output file that is loaded into flash memory on the processor, and a map report describing where the program and data sections are located in memory.
• A target configuration file specifies the connection to the target (e.g. USB) and the target device.
2 - 7
Much more in the ES-PreLab …
 The Pre-lab is intended for students with missing background in software
development in C and working with an integrated development environment.

2 - 13
Embedded Systems
3. Hardware Software Interface

© Lothar Thiele
Computer Engineering and Networks Laboratory
Do you Remember ?

3-2
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

3-3
3-4
High-Level Physical View

3-5
High-Level Physical View

3-6
What you will learn …
Hardware-Software Interfaces in Embedded Systems
 Storage
 SRAM / DRAM / Flash
 Memory Map
 Input and Output
 UART Protocol
 Memory Mapped Device Access
 SPI Protocol
 Interrupts
 Clocks and Timers
 Clocks
 Watchdog Timer
 System Tick
 Timer and PWM
3-7
Storage

3-8
Remember … ?

3-9
MSP432P401R (ES-Lab)

3 - 10
Storage
SRAM / DRAM / Flash

3 - 11
Static Random Access Memory (SRAM)
 Single bit is stored in a bi-stable circuit
 Static Random Access Memory is used for
 caches
 register file within the processor core
 small but fast memories
 Read:
1. Pre-charge all bit-lines to average voltage
2. decode address (n+m bits)
3. select row of cells using n single-bit word lines (WL)
4. selected bit-cells drive all bit-lines BL (2^m pairs)
5. sense difference between bit-line pairs and read out
 Write:
 select row and overwrite bit-lines using strong signals
3 - 12
Dynamic Random Access Memory (DRAM)
Single bit is stored as a charge in a capacitor
 Bit cell loses charge when read, bit cell drains
over time
 Slower access than with SRAM due to small
storage capacity in comparison to capacity of
bit-line.
 Higher density than SRAM (1 vs. 6 transistors
per bit)

DRAMs require periodic refresh of charge


 Performed by the memory controller
 Refresh interval is tens of ms
 DRAM is unavailable during refresh
(RAS/CAS = row/column address select)
3 - 13
DRAM – Typical Access Process
1. Bus Transmission 2. Precharge and Row Access

3 - 14
DRAM – Typical Access Process
3. Column Access 4. Data Transfer and Bus Transmission

3 - 15
Flash Memory
Electrically modifiable, non‐volatile storage
Principle of operation:
 Transistor with a second “floating” gate
 Floating gate can trap electrons
 This results in a detectable change in
threshold voltage

3 - 16
NAND and NOR Flash Memory

Fast random access

3 - 17
Example: Reading out NAND Flash
Selected word-line (WL) : Target voltage (Vtarget)
Unselected word-lines : Vread is high enough to have a low resistance in all
transistors in this row

3 - 18
Storage
Memory Map

3 - 19
Example: Memory Map in MSP432 (ES-Lab)
Available memory:
 The processor used in the lab (MSP432P401R) has built in 256kB flash memory,
64kB SRAM and 32kB ROM (Read Only Memory).

Address space:
 The processor uses 32 bit addresses. Therefore, the addressable memory space is 4 GByte (= 2^32 Byte), as each memory location corresponds to 1 Byte.
 The address space is used to address the memories (reading and writing), to
address the peripheral units, and to have access to debug and trace information
(memory mapped microarchitecture).
 The address space is partitioned into zones, each one with a dedicated use. The
following is a simplified description to introduce the basic concepts.

3 - 20
Example: Memory Map in MSP432 (ES-Lab)
Memory map:

hexadecimal
representation
of a 32 bit
binary number;
each digit
corresponds
to 4 bit

0011 1111 …. 1111
0010 0000 …. 0000
difference = 0001 1111 …. 1111 → 2^29 different addresses → capacity = 2^29 Byte = 512 MByte
3 - 21
Example: Memory Map in MSP432 (ES-Lab)
Memory map:

hexadecimal
representation
of a 32 bit
binary number;
each digit
corresponds

to 4 bit


0011 1111 …. 1111
0010 0000 …. 0000 (base address)
difference = 0001 1111 …. 1111 → 2^29 different addresses from the base address → capacity = 2^29 Byte = 512 MByte
3 - 22
Example: Memory Map in MSP432 (ES-Lab)
Memory map:

Schematic of the LaunchPad as used in the lab: LED1 is connected to Port 1, Pin 0.
How do we toggle LED1 in a C program?
3 - 23
Example: Memory Map in MSP432 (ES-Lab)
Memory map: Many necessary elements are missing in the sketch below, in particular the configuration of the port (input or output, pull-up or pull-down resistors for input, drive strength for output). See lab session.

…
//declare p1out as a pointer to an 8 bit integer
volatile uint8_t* p1out;

//P1OUT should point to Port 1 where LED1 is connected
p1out = (uint8_t*) 0x40004C02;

//Toggle Bit 0 (signal to which LED1 is connected)
*p1out = *p1out ^ 0x01;   // ^ : XOR
3 - 24
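A minimal sketch of the missing port configuration via the memory map: on the MSP432, the port 1 direction register P1DIR sits two bytes above P1OUT (assumed here to be at address 0x40004C04; check this against the device data sheet before relying on it).

//declare pointers to the Port 1 direction and output registers
volatile uint8_t* p1dir = (uint8_t*) 0x40004C04;   // P1DIR (assumed address, see data sheet)
volatile uint8_t* p1out = (uint8_t*) 0x40004C02;   // P1OUT

*p1dir = *p1dir | 0x01;   // configure Port 1, Pin 0 (LED1) as an output
*p1out = *p1out ^ 0x01;   // toggle LED1

In the ES-Lab, the same effect is achieved with the DriverLib calls GPIO_setAsOutputPin() and GPIO_toggleOutputOnPin() shown later in this chapter.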
Example: Memory Map in MSP432 (ES-Lab)
Memory map:

• 0x3FFFF address difference = 4 · 2^16 different addresses → 256 kByte maximal data capacity for the Flash Main Memory region.
• Used for program, data and non-volatile configuration.
3 - 25
Example: Memory Map in MSP432 (ES-Lab)
Memory map:

• 0xFFFF address difference = 2^16 different addresses → 64 kByte maximal data capacity for the SRAM region.
• Used for program and data.
3 - 26
Input and Output

3 - 27
Device Communication
Very often, a processor needs to exchange information with other processors or devices. To satisfy various needs, there exist many different communication protocols, such as
 UART (Universal Asynchronous Receiver-Transmitter)
 SPI (Serial Peripheral Interface Bus)
 I2C (Inter-Integrated Circuit)
 USB (Universal Serial Bus)

 As the principles are similar, we will just explain a representative of an


asynchronous protocol (UART, no shared clock signal between sender and
receiver) and one of a synchronous protocol (SPI , shared clock signal).

3 - 28
Remember?
Low power CPU:
• enabling power to the rest of the system
• battery charging and voltage measurement
• wireless radio (boot and operate)
• detect and check expansion boards

Higher performance CPU:
• sensor reading and motor control
• flight control
• telemetry (including the battery voltage)
• additional user development
• USB connection

UART:
• communication protocol (Universal
Asynchronous Receiver/Transmitter)
• exchange of data packets to and from
interfaces (wireless, USB)

3 - 29
Input and Output
UART Protocol

3 - 30
UART
 Serial communication of bits via a single signal, i.e. UART provides parallel-to-serial and serial-to-parallel conversion.
 Sender and receiver need to agree on the transmission rate.
 Transmission of a serial packet starts with a start bit, followed by 6-9 data bits, an optional parity bit (for detecting single bit errors), and is finalized using 1-2 stop bits.
 There exist many variations of this simple scheme.
3 - 31
UART
 The receiver runs an internal clock whose frequency is an exact multiple of the
expected bit rate.
 When a Start bit is detected, a counter begins to count clock cycles e.g. 8 cycles
until the midpoint of the anticipated Start bit is reached.
 The clock counter counts a
further 16 cycles, to the
middle of the first Data bit,
and so on until the Stop bit.

3 - 32
UART with MSP432 (ES-Lab)

host PC

3 - 33
UART with MSP432 (Lab)

3 - 34
Input and Output
Memory Mapped Device Access

3 - 35
Memory-Mapped Device Access

• Configuration of transmitter and receiver must match; otherwise, they cannot communicate.
• Examples of configuration parameters:
  • transmission rate (baud rate, i.e., symbols/s; in our case bit/s)
  • LSB or MSB first
  • number of bits per packet
  • parity bit
  • number of stop bits
  • interrupt-based communication
  • clock source
• Buffers hold the received bits and the bits that should be transmitted.


3 - 36
Transmission Rate
Clock subsampling:
• The clock subsampling block is complex, as one tries to match a large set of transmission rates with a fixed input frequency.

Clock source:
• SMCLK in the lab setup = 3 MHz
• Quartz frequency = 48 MHz; it is divided by 16 before being connected to SMCLK.

Example:
• transmission rate 4800 bit/s
• 16 clock periods per bit (see 3-26)
• subsampling factor = 3*10^6 / (4.8*10^3 * 16) = 39.0625

The subsampled clock drives the parallel-to-serial conversion of the data to be transmitted into the serial output.
3 - 37
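As a small illustration of the subsampling computation above, the following sketch derives the integral and fractional divider parts from the input clock and the desired transmission rate; the names compute_divider, brclk and baudrate are illustrative and not part of any device library.

#include <stdint.h>

// Sketch: compute the baud-rate divider for oversampling mode (16 samples per bit).
void compute_divider(uint32_t brclk, uint32_t baudrate, uint16_t *brdiv, uint8_t *brf)
{
    float factor = (float) brclk / (16.0f * baudrate);    // e.g. 3e6 / (16 * 4800) = 39.0625
    *brdiv = (uint16_t) factor;                            // integral part, e.g. 39
    *brf = (uint8_t) ((factor - *brdiv) * 16.0f + 0.5f);   // fractional part * 16, e.g. 1
}

For SMCLK = 3 MHz and 4800 bit/s this yields BRDIV = 39 and UCxBRF = 1, the values used in the UART configuration on the next slide (the real eUSCI module additionally uses a modulation lookup table, which is omitted here).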
Software Interface
Part of a C program that prints a character to a UART terminal on the host PC:
...
// the data structure uartConfig contains the configuration of the UART
static const eUSCI_UART_Config uartConfig =
{
    EUSCI_A_UART_CLOCKSOURCE_SMCLK,                // SMCLK clock source
    39,                                            // BRDIV = 39, integral part
    1,                                             // UCxBRF = 1, fractional part * 16
    0,                                             // UCxBRS = 0
    EUSCI_A_UART_NO_PARITY,                        // no parity
    EUSCI_A_UART_LSB_FIRST,                        // LSB first
    EUSCI_A_UART_ONE_STOP_BIT,                     // one stop bit
    EUSCI_A_UART_MODE,                             // UART mode
    EUSCI_A_UART_OVERSAMPLING_BAUDRATE_GENERATION  // oversampling mode
};

// configure the CPU signals
GPIO_setAsPeripheralModuleFunctionInputPin(GPIO_PORT_P1,
    GPIO_PIN2 | GPIO_PIN3, GPIO_PRIMARY_MODULE_FUNCTION);
// uartConfig is used to write to the eUSCI_A0 configuration registers;
// EUSCI_A0_BASE is the base address of A0 (0x40001000), where A0 is the instance of the UART peripheral
UART_initModule(EUSCI_A0_BASE, &uartConfig);       // configuring UART module A0
UART_enableModule(EUSCI_A0_BASE);                  // enable (start) UART module A0

UART_transmitData(EUSCI_A0_BASE, 'a');             // write character 'a' to UART
...
3 - 38
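A small usage sketch building on the configuration above: sending a whole string by repeatedly calling UART_transmitData(). The helper name uart_print is hypothetical and not part of the DriverLib.

// Usage sketch: send a whole C string over the UART configured above.
void uart_print(const char *s)
{
    while (*s != '\0') {
        UART_transmitData(EUSCI_A0_BASE, *s);  // transmit one character via module A0
        s++;
    }
}

// after the initialization shown above:
// uart_print("hello from the MSP432\r\n");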
Software Interface
Replacing UART_transmitData(EUSCI_A0_BASE,'a') by a direct access to the registers:
...
// declare pointers to the UART configuration registers
volatile uint16_t* uca0ifg   = (uint16_t*) 0x4000101C;
volatile uint16_t* uca0txbuf = (uint16_t*) 0x4000100E;
...
// Initialization of UART as before
...
// wait until the transmit buffer is empty: the expression !((*uca0ifg >> 1) & 0x0001)
// shifts the IFG register one bit to the right and is '1' if bit UCTXIFG = 0 (buffer not empty)
while (!((*uca0ifg >> 1) & 0x0001));
*uca0txbuf = (char) 'g';   // write character 'g' to the transmit buffer
...
3 - 39
Input and Output
SPI Protocol

3 - 40
SPI (Serial Peripheral Interface Bus)
 Typically communicate across short distances
 Characteristics:
 4-wire synchronized (clocked) communications bus
 supports single master and multiple slaves
 always full-duplex: Communicates in both directions simultaneously
 multiple Mbps transmission speeds can be achieved
 transfer data in 4 to 16 bit serial packets
 Bus wiring:
 MOSI (Master Out Slave In) – carries data out of master to slave
 MISO (Master In Slave Out) – carries data out of slave to master
 Both MOSI and MISO are active during every transmission
 SS (or CS) – signal to select each slave chip
 System clock SCLK – produced by master to synchronize transfers
3 - 41
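To make the wiring and clocking above concrete, here is a bit-banged sketch of a single SPI mode-0 transfer by the master (MSB first). The GPIO helpers set_sclk, set_mosi, get_miso, set_cs and delay are hypothetical placeholders; a real MSP432 design would use the eUSCI module in SPI mode instead.

#include <stdint.h>

// Hypothetical GPIO helpers (placeholders, not a real API):
extern void set_sclk(int level);
extern void set_mosi(int level);
extern int  get_miso(void);
extern void set_cs(int level);
extern void delay(void);

// One full-duplex 8-bit transfer in SPI mode 0, bit-banged by the master.
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    set_cs(0);                        // select the slave (chip select is active low)
    for (int i = 7; i >= 0; i--) {
        set_mosi((out >> i) & 1);     // put the next output bit on MOSI while SCLK is low
        delay();
        set_sclk(1);                  // rising clock edge
        in = (in << 1) | get_miso();  // read MISO in the middle of the bit
        delay();
        set_sclk(0);                  // falling clock edge: prepare the next bit
    }
    set_cs(1);                        // deselect the slave
    return in;                        // a byte is received while a byte is sent (full duplex)
}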
SPI (Serial Peripheral Interface Bus)
More detailed circuit diagram:
 details vary between
different vendors and
implementations

Timing diagram:
• the system clock SCLK is produced by the master
• the data output (MOSI or MISO) is written at one clock edge
• the data input is read in the middle of a bit
3 - 42
SPI (Serial Peripheral Interface Bus)
Two examples of bus configurations:

• Master and multiple independent slaves
• Master and multiple daisy-chained slaves
http://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/SPI_three_slaves.svg/350px-SPI_three_slaves.svg.png
http://www.maxim-ic.com/appnotes.cfm/an_pk/3947

3 - 43
Interrupts

3 - 44
Interrupts
A hardware interrupt is an electronic alerting signal sent to the CPU from another
component, either from an internal peripheral or from an external device.

MSP 432 [ES-Lab]

The Nested Vector


Interrupt Controller
(NVIC) handles the
processing of
interrupts

3 - 45
Interrupts

...

MSP432

3 - 46
Processing of an Interrupt (MSP432 ES-Lab)
Peripheral units (Timer_A0, I/O Port P1, …, eUSCI_A0) send their interrupt requests to the Nested Vector Interrupt Controller (NVIC), which handles the interrupts and forwards them to the CPU.

The nested vector interrupt controller (NVIC)
 enables and disables interrupts
 allows to individually and globally mask interrupts (disable the reaction to an interrupt)
 registers interrupt service routines (ISR) and sets the priority of interrupts

Interrupt priorities are relevant if
 several interrupts happen at the same time
 the programmer does not mask interrupts in an interrupt service routine (ISR), and therefore, preemption of an ISR by another ISR may happen (interrupt nesting)
3 - 47
Processing of an Interrupt

IFG register:
• Most peripherals can generate interrupts to provide status and information.
• Interrupts can also be generated from GPIO pins.
• When an interrupt signal is received, a corresponding bit is set in an IFG register.
• There is such an IFG register for each interrupt source.
• As some interrupt sources are only active for a short duration, the CPU registers the interrupt signal internally.
3 - 48
Processing of an Interrupt

IFG register

3. CPU/NVIC acknowledges interrupt by:


• current instruction completes
• saves return-to location on stack
• mask interrupts globally
• determines source of interrupt
• calls interrupt service routine (ISR)

3 - 49
Processing of an Interrupt

IFG register

interrupt
vector
table

3. CPU/NVIC acknowledges interrupt by: pointer to ISR


• current instruction completes
• saves return-to location on stack
• mask interrupts globally
• determines source of interrupt
• calls interrupt service routine (ISR)

3 - 50
Processing of an Interrupt

IFG register

3. CPU/NVIC acknowledges interrupt by: 4. Interrupt Service Routine (ISR):


• current instruction completes • save context of system
• saves return-to location on stack • run your interrupt’s code
• mask interrupts globally • restore context of system
• determines source of interrupt • (automatically) un-mask interrupts and
• calls interrupt service routine (ISR) • continue where it left off

3 - 51
Processing of an Interrupt
Detailed interrupt processing flow:
• get the interrupt status (IFG) of the selected pin
• clear the interrupt status on the selected pin
• enable the interrupt in the peripheral unit
• enable the interrupt in the interrupt controller: Interrupt_enableInterrupt();
• globally allow / disallow the processor to react to interrupts:
  Interrupt_enableMaster(); Interrupt_disableMaster();
3 - 52
Example: Interrupt Processing
 Port 1, pin 1 (which has a switch connected to it) is configured as an input with interrupts enabled
and port 1, pin 0 (which has an LED connected) is configured as an output.
 When the switch is pressed, the LED output is toggled.

int main(void)
{
    ...
    GPIO_setAsOutputPin(GPIO_PORT_P1, GPIO_PIN0);
    GPIO_setAsInputPinWithPullUpResistor(GPIO_PORT_P1, GPIO_PIN1);
    GPIO_clearInterruptFlag(GPIO_PORT_P1, GPIO_PIN1);   // clear the interrupt flag and enable
    GPIO_enableInterrupt(GPIO_PORT_P1, GPIO_PIN1);      // the interrupt in the periphery
    Interrupt_enableInterrupt(INT_PORT1);                // enable interrupts in the controller (NVIC)
    Interrupt_enableMaster();
    while (1) PCM_gotoLPM3();                            // enter low power mode LPM3
}
3 - 53
Example: Interrupt Processing
 Port 1, pin 1 (which has a switch connected to it) is configured as an input with interrupts enabled
and port 1, pin 0 (which has an LED connected) is configured as an output.
 When the switch is pressed, the LED output is toggled.

void PORT1_IRQHandler(void)   // predefined name of the ISR attached to Port 1
{
    uint32_t status;
    // get the status (flags) of the interrupt-enabled pins of port 1
    status = GPIO_getEnabledInterruptStatus(GPIO_PORT_P1);
    // clear all current flags of the interrupt-enabled pins of port 1
    GPIO_clearInterruptFlag(GPIO_PORT_P1, status);
    if (status & GPIO_PIN1)   // check whether pin 1 was flagged
    {
        GPIO_toggleOutputOnPin(GPIO_PORT_P1, GPIO_PIN0);
    }
}
3 - 54
Polling vs. Interrupt
Similar functionality with polling:

int main(void)
{
    uint8_t new, old;
    ...
    GPIO_setAsOutputPin(GPIO_PORT_P1, GPIO_PIN0);
    GPIO_setAsInputPinWithPullUpResistor(GPIO_PORT_P1, GPIO_PIN1);
    old = GPIO_getInputPinValue(GPIO_PORT_P1, GPIO_PIN1);

    while (1)
    {
        // continuously get the signal at pin 1 and detect a falling edge
        new = GPIO_getInputPinValue(GPIO_PORT_P1, GPIO_PIN1);
        if (!new & old)
        {
            GPIO_toggleOutputOnPin(GPIO_PORT_P1, GPIO_PIN0);
        }
        old = new;
    }
}
3 - 55
Polling vs. Interrupt
What are advantages and disadvantages?
 We compare polling and interrupt based on the utilization of the CPU by using a
simplified timing model.
 Definitions:
 utilization u: average percentage, the processor is busy
 computation c: processing time of handling the event
 overhead h: time overhead for handling the interrupt
 period P: polling period
 interarrival time T: minimal time between two events
 deadline D: maximal time between event arrival and finishing event processing with D ≤ T.
Timing model (see figure): with polling, the processing time c is spent once per polling period P; with interrupts, each event causes the overhead h = h1 + h2 in addition to the processing time c; events arrive with an interarrival time of at least T, and processing must finish within the deadline D after the event arrival.
3 - 56
Polling vs. Interrupts
For the following considerations, we suppose that the interarrival time between
events is T. This makes the results a bit easier to understand.

Some relations for interrupt-based event processing :


 The average utilization is ui = (h + c) / T .
 As we need at least h+c time to finish the processing of an event, we find the
following constraint: h+c ≤ D ≤ T .

Some relations for polling-based event processing:


 The average utilization is up = c / P .
 We need at least time P+c to process an event that arrives shortly after a polling
took place. The polling period P should be larger than c. Therefore, we find the
following constraints: 2c ≤ c+P ≤ D ≤ T

3 - 57
Polling vs. Interrupts
Design problem: D and T are given by application requirements. h and c are given by
the implementation. When to use interrupt and when polling when considering the
resulting system utilization? What is the best value for the polling period P?

Case 1: If D < c + min(c, h) then event processing is not possible.


Case 2: If 2c ≤ D < h+c then only polling is possible. The maximal period P = D-c leads
to the optimal utilization up = c / (D-c) .
Case 3: If h+c ≤ D < 2c then only interrupt is possible with utilization ui = (h + c) / T .
Case 4: If c + max(c, h) ≤ D then both are possible with up = c / (D-c) or ui = (h + c) / T .

Interrupt gets better in comparison to polling, if the deadline D for processing


interrupts gets smaller in comparison to the interarrival time T, if the overhead h gets
smaller in comparison to the computation time c, or if the interarrival time of events
is only lower bounded by T (as in this case polling executes unnecessarily).
3 - 58
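The case analysis above can be written down directly; the sketch below computes the utilizations and prints which mechanism is feasible. All values are assumed to be given in the same time unit; the function name is illustrative.

#include <stdio.h>

static double minv(double a, double b) { return a < b ? a : b; }

// Sketch of the case analysis: c = computation time, h = interrupt overhead,
// D = deadline, T = (minimal) interarrival time, all in the same time unit.
void compare_polling_interrupt(double c, double h, double D, double T)
{
    if (D < c + minv(c, h)) {                         // case 1
        printf("event processing is not possible\n");
    } else if (2*c <= D && D < h + c) {               // case 2: only polling possible
        printf("polling only, u_p = %f (with P = D - c)\n", c / (D - c));
    } else if (h + c <= D && D < 2*c) {               // case 3: only interrupt possible
        printf("interrupt only, u_i = %f\n", (h + c) / T);
    } else {                                          // case 4: both are possible
        printf("polling:   u_p = %f\n", c / (D - c));
        printf("interrupt: u_i = %f\n", (h + c) / T);
    }
}

// example: c = 1 ms, h = 0.2 ms, D = 4 ms, T = 100 ms  ->  u_p = 0.33, u_i = 0.012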
Clocks and Timers

3 - 59
Clocks and Timers
Clocks

3 - 60
Clocks
Microcontrollers usually have many different clock sources that differ in
 frequency (relates to precision)
 energy consumption
 stability, e.g., crystal-controlled clock vs. digitally controlled oscillator
As an example, the MSP432 (ES-Lab) has the following clock sources
(columns: frequency, precision, current, comment):
LFXTCLK:  32 kHz,   0.0001% … 0.005% / °C,  150 nA,  external crystal
HFXTCLK:  48 MHz,   0.0001% … 0.005% / °C,  550 µA,  external crystal
DCOCLK:   3 MHz,    0.025% / °C,            N/A,     internal
VLOCLK:   9.4 kHz,  0.1% / °C,              50 nA,   internal
REFOCLK:  32 kHz,   0.012% / °C,            0.6 µA,  internal
MODCLK:   25 MHz,   0.02% / °C,             50 µA,   internal
SYSOSC:   5 MHz,    0.03% / °C,             30 µA,   internal
3 - 61
Clocks and Timers MSP432 (ES-Lab)

3 - 62
Clocks and Timers MSP432 (ES-Lab)

3 - 63
Clocks
From these basic clocks, several internally available clock signals are derived.
They can be used for clocking peripheral units, the CPU, memory, and the various
timers.

Example MSP432 (ES-Lab):


 only some of the
clock generators are
shown (LFXT, HFXT,
DCO)
 dividers and clock
sources for the
internally available
clock signals can be
set by software
3 - 64
Clocks and Timers
Watchdog Timer

3 - 65
Watchdog Timer
Watchdog timers provide system fail-safety:
 If their counter ever rolls over (back to zero), they reset the processor. The goal here is to prevent your system from being inactive (deadlock) due to some unexpected fault.
 To prevent your system from continuously resetting itself, the counter should be reset at appropriate intervals.

Watchdog Timer (WDT_A): an up counter driven by a clock input (e.g., SMCLK or ACLK); on overflow, it resets the CPU.
 WDT_A_holdTimer();    // pause counting up
 WDT_A_clearTimer();   // reset counter to 0
 If the count completes without a restart, the CPU is reset.
3 - 66
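A minimal usage sketch of the fail-safety idea above: the watchdog is cleared once per iteration of the main loop, so a hanging application misses the clear and is reset. Only the two DriverLib calls named above are used; do_work() is a hypothetical, bounded application step, and the initial watchdog configuration (clock source, interval) is omitted.

extern void do_work(void);          // hypothetical bounded application step

int main(void)
{
    WDT_A_holdTimer();              // pause the watchdog during initialization
    // ... initialize peripherals and (re)start the watchdog; configuration omitted ...

    while (1) {
        do_work();                  // one iteration of the application
        WDT_A_clearTimer();         // reset the watchdog counter to 0 ("kick the dog")
    }
}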
Clocks and Timers
System Tick

3 - 67
SysTick MSP432 (ES-Lab)
 SysTick is a simple decrementing 24 bit counter that is part of the NVIC controller (Nested Vector Interrupt Controller). Its clock source is MCLK and it reloads to period-1 after reaching 0.
 It is a very simple timer, mainly used for periodic interrupts or for measuring time.

int main(void) {
    ...
    GPIO_setAsOutputPin(GPIO_PORT_P1, GPIO_PIN0);

    SysTick_enableModule();
    SysTick_setPeriod(1500000);       // if MCLK has a frequency of 3 MHz,
    SysTick_enableInterrupt();        // an interrupt is generated every 0.5 s
    Interrupt_enableMaster();

    while (1) PCM_gotoLPM0();         // go to low power mode LP0 after executing the ISR
}

void SysTick_Handler(void) {
    MAP_GPIO_toggleOutputOnPin(GPIO_PORT_P1, GPIO_PIN0);
}
3 - 68
SysTick MSP432 (ES-Lab)
Example for measuring the execution time of some parts of a program:

int main(void) {
    int32_t start, end, duration;
    ...
    SysTick_enableModule();
    SysTick_setPeriod(0x01000000);    // if MCLK has a frequency of 3 MHz, the counter rolls
                                      // over every ~5.6 seconds, as 2^24 / (3 * 10^6) = 5.59 s
    SysTick_disableInterrupt();

    start = SysTick_getValue();
    ...                               // part of the program whose duration is measured
    end = SysTick_getValue();
    duration = ((start - end) & 0x00FFFFFF) / 3;
    ...
}

The resolution of the duration is one microsecond; the measured duration must not be longer than ~6 seconds; note the use of modular arithmetic if end > start; the overhead for calling SysTick_getValue() is not accounted for.
3 - 69
3 - 70
Clocks and Timers
Timer and PWM

3 - 71
Timer
Usually, embedded microprocessors have several elaborate timers that allow to
 capture the current time or time differences, triggered by hardware or software
events,
 generate interrupts when a certain time is reached (stop watch, timeout),
 generate interrupts when counters overflow,
 generate periodic interrupts, for example in order to periodically execute tasks,
 generate specific output signals, for example PWM (pulse width modulation).
Counter principle: each pulse of the clock input increments the counter register (example: a 16 bit counter register counting 0x0000, 0x0001, 0x0002, …, 0xFFFD, 0xFFFE, 0xFFFF); on overflow / roll-over an interrupt can be generated.
3 - 72
Timer
Typically, the mentioned functions are realized via capture and compare registers:

Capture:
• the value of the counter register is stored in the capture register at the time of the capture event (input signals, software)
• the value can be read by software
• at the time of the capture, further actions can be triggered (interrupt, signal)

Compare:
• the value of the compare register can be set by software
• as soon as the values of the counter and compare register are equal, compare actions can be taken, such as raising an interrupt, signaling peripherals, changing pin values, or resetting the counter register
3 - 73
Timer
 Pulse Width Modulation (PWM) can be used to change the average power of a signal.
 The use case could be to change the speed of a motor or to modulate the light intensity of an LED.
 The counter counts between 0x0000 and 0xFFFF; one compare register is used to define the period of the output signal, and another compare register is used to change the duty cycle of the signal.
3 - 74
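A sketch of the PWM idea above using Timer_A, assuming the DriverLib helper Timer_A_generatePWM() and its Timer_A_PWMConfig structure; the chosen output pin (P2.4) and the 100 Hz period are assumptions and must be checked against the pin mapping and clock setup of the concrete board.

// Sketch: ~100 Hz PWM with 50 % duty cycle on an assumed Timer_A0 output pin.
const Timer_A_PWMConfig pwmConfig = {
    TIMER_A_CLOCKSOURCE_SMCLK,          // clock source SMCLK (3 MHz in the lab setup)
    TIMER_A_CLOCKSOURCE_DIVIDER_1,      // no divider
    30000,                              // period: 30000 cycles = 10 ms
    TIMER_A_CAPTURECOMPARE_REGISTER_1,  // compare register that drives the output signal
    TIMER_A_OUTPUTMODE_RESET_SET,       // reset/set output mode (see device user's guide)
    15000                               // duty cycle: 15000 / 30000 = 50 %
};

GPIO_setAsPeripheralModuleFunctionOutputPin(GPIO_PORT_P2, GPIO_PIN4,
    GPIO_PRIMARY_MODULE_FUNCTION);      // route the timer output to the pin (assumed mapping)
Timer_A_generatePWM(TIMER_A0_BASE, &pwmConfig);

Writing a new value into the duty-cycle compare register at run time changes the average power of the output signal, e.g. the motor speed or the LED brightness.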
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts.

Clock sources: TXCLK (external), ACLK, SMCLK, inverted TXCLK. The timer has 7 configurable compare or capture registers.
3 - 75
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts.

The counter repeatedly counts up from 0x0000 to 0xFFFF and rolls over; an interrupt can be raised on roll-over.
3 - 76
Timer Example MSP432 (ES-Lab)
Example: Configure Timer in “continuous mode”. Goal: generate periodic interrupts,
but with configurable periods.
int main(void) {
    ...
    const Timer_A_ContinuousModeConfig continuousModeConfig = {
        TIMER_A_CLOCKSOURCE_ACLK,          // clock source is ACLK (32.768 kHz)
        TIMER_A_CLOCKSOURCE_DIVIDER_1,     // divider is 1 (count frequency 32.768 kHz)
        TIMER_A_TAIE_INTERRUPT_DISABLE,    // so far, no interrupt on roll-over
        TIMER_A_DO_CLEAR};
    ...
    // configure continuous mode of timer instance A0
    Timer_A_configureContinuousMode(TIMER_A0_BASE, &continuousModeConfig);
    // start counter A0 in continuous mode
    Timer_A_startCounter(TIMER_A0_BASE, TIMER_A_CONTINUOUS_MODE);
    ...
    while(1) PCM_gotoLPM0();
}

Only the counter is running; nothing else happens yet.
3 - 77
Timer Example MSP432 (ES-Lab)
Example:
 For a periodic interrupt, we need to add a compare register and an ISR.
 The following code should be added as a definition:
#define PERIOD 32768

 The following code should be added to main():

const Timer_A_CompareModeConfig compareModeConfig = {
    TIMER_A_CAPTURECOMPARE_REGISTER_1,
    TIMER_A_CAPTURECOMPARE_INTERRUPT_ENABLE,
    0,
    PERIOD};                     // a first interrupt is generated after about one second,
                                 // as the counter frequency is 32.768 kHz
...
Timer_A_initCompare(TIMER_A0_BASE, &compareModeConfig);
Timer_A_enableCaptureCompareInterrupt(TIMER_A0_BASE, TIMER_A_CAPTURECOMPARE_REGISTER_1);
Interrupt_enableInterrupt(INT_TA0_N);
Interrupt_enableMaster();
...
3 - 78
Timer Example MSP432 (ES-Lab)
Example:
 For a periodic interrupt, we need to add a compare register and an ISR.
 The following Interrupt Service Routine (ISR) should be added. It is called if one of the capture/compare registers CCR1 … CCR6 raises an interrupt.

void TA0_N_IRQHandler(void) {
    switch(TA0IV) {                      // the register TA0IV contains the interrupt flags for the
                                         // capture/compare registers; after being read, the highest
                                         // priority interrupt (smallest register number) is cleared
                                         // automatically
        case 0x0002:                     // flag for register CCR1
            TA0CCR1 = TA0CCR1 + PERIOD;  // TA0CCR1 contains the compare value of compare register 1
            ...                          // do something every PERIOD
        default: break;                  // other cases in the switch statement may be used to handle
    }                                    // other capture and compare registers
}
3 - 79
Timer Example MSP432 (ES-Lab)
Example: This principle can be used to generate several periodic interrupts with one timer, by repeatedly advancing the compare registers (e.g. TA0CCR1 and TA0CCR2) each by its own period.
3 - 80
Embedded Systems
4. Programming Paradigms

© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

4-2
Reactive Systems and Timing

4-3
Timing Guarantees
 Hard real-time systems can be often found in safety-critical applications. They
need to provide the result of a computation within a fixed time bound.
 Typical application domains:
 avionics, automotive, train systems, automatic control including robotics,
manufacturing, media content production

Examples: wing vibration of an airplane, sensing every 5 ms; side airbag in a car, reaction after an event in < 10 ms.

4-4
Simple Real-Time Control System

Sensors sample the environment, A/D converters digitize the inputs, the control-law computation processes them, and a D/A converter drives the actuator that acts on the environment.
4-5
Real-Time Systems
In many cyber-physical systems (CPSs), correct timing is a matter of correctness, not performance: an answer arriving too late is considered to be an error.

4-6
Real-Time Systems

4-7
Real-Time Systems

4-8
Real-Time Systems

4-9
Real-Time Systems

4 - 10
Real-Time Systems

start time deadline

4 - 11
Real-Time Systems
 Embedded controllers are often expected to finish the processing of data and
events reliably within defined time bounds. Such a processing may involve
sequences of computations and communications.

 Essential for the analysis and design of a real-time system: Upper bounds on the
execution times of all tasks are statically known. This also includes the
communication of information via a wired or wireless connection.

 This value is commonly called the Worst-Case Execution Time (WCET).

 Analogously, one can define the lower bound on the execution time, the Best-Case
Execution Time (BCET).

4 - 12
Distribution of Execution Times
The distribution of execution times of a task lies between the Best Case Execution Time (BCET) and the Worst Case Execution Time (WCET). Execution time measurements are unsafe, as they may miss the worst case; a computed upper bound lies at or above the WCET.

4 - 13
Modern Hardware Features
 Modern processors increase the average performance (execution of tasks) by
using caches, pipelines, branch prediction, and speculation techniques, for
example.
 These features make the computation of the WCET very difficult: The
execution times of single instructions vary widely.
 The microarchitecture has a large time-varying internal state that is changed by
the execution of instructions and that influences the execution times of
instructions.
 Best case - everything goes smoothly: no cache miss, operands ready, needed resources free, branch correctly predicted.
 Worst case - everything goes wrong: all loads miss the cache, resources needed
are occupied, operands are not ready.
 The span between the best case and worst case may be several hundred cycles.

4 - 14
Methods to Determine the Execution Time of a Task
Methods to determine the execution time of a task: measurement on the real system, simulation (with a correct model), and worst-case analysis. The obtained values lie between the best-case and the worst-case execution time.
4 - 15
(Most of) Industry’s Best Practice
 Measurements: determine execution times directly by observing the execution
or a simulation on a set of inputs.
 Does not guarantee an upper bound to all executions unless the reaction to all
initial system states and all possible inputs is measured.
 Exhaustive execution in general not possible: Too large space of (input domain) x
(set of initial execution states).
 Simulation suffers from the same restrictions.

 Compute upper bounds along the structure of the program:


 Programs are hierarchically structured: Instructions are “nested” inside
statements.
 Therefore, one may compute the upper execution time bound for a statement
from the upper bounds of its constituents, for example of single instructions.
 But: The execution times of individual instructions vary widely!

4 - 16
Determine the WCET
Complexity of determining the WCET of tasks:
 In the general case, it is even undecidable whether a finite bound exists.
 For restricted classes of programs it is possible, in principle. Computing accurate
bounds is simple for „old“ architectures, but very complex for new architectures with
pipelines, caches, interrupts, and virtual memory, for example.

Analytic (formal) approaches exist for hardware and software.


 In case of software, it requires the analysis of the program flow and the analysis of the
hardware (microarchitecture). Both are combined in a complex analysis flow, see for
example www.absint.de and the lecture “Hardware/Software Codesign”.
 For the rest of the lecture, we assume that reliable bounds on the WCET are available,
for example by means of exhaustive measurements or simulations, or by analytic
formal analysis.

4 - 17
Different Programming Paradigms

4 - 18
Why Multiple Tasks on one Embedded Device?
 The concept of concurrent tasks reflects our intuition about the functionality of
embedded systems.

 Tasks help us manage the complexity of concurrent activities as happening in the


system environment:
 Input data arrive from various sensors and input devices.
 These input streams may have different data rates like in multimedia processing,
systems with multiple sensors, automatic control of robots

 The system may also receive asynchronous (sporadic) input events.


 These input event may arrive from user interfaces, from sensors, or from
communication interfaces, for example.

4 - 19
Example: Engine Control
Typical tasks of the engine controller:
 spark control
 crankshaft sensing
 fuel/air mixture
 oxygen sensor
 Kalman filter – control algorithm

4 - 20
Overview
 There are many structured ways of programming an embedded system.
 In this lecture, only the main principles will be covered:
 time triggered approaches
 periodic
 cyclic executive
 generic time-triggered scheduler

 event triggered approaches


 non-preemptive
 preemptive – stack policy
 preemptive – cooperative scheduling
 preemptive - multitasking

4 - 21
Time-Triggered Systems
Pure time-triggered model:
 no interrupts are allowed, except by timers
 the schedule of tasks is computed off-line and therefore, complex sophisticated
algorithms can be used
 the scheduling at run-time is fixed and therefore, it is deterministic
 the interaction with environment happens through polling

Figure: a timer raises interrupts to the CPU; the interfaces to sensors/actuators are accessed by the CPU only via polling; the CPU sets the timer.
4 - 22
Simple Periodic TT Scheduler
 A timer interrupts regularly with period P.
 All tasks have same period P.

Schedule: T1 T2 T3 | T1 T2 T3 | T1 T2 T3 | …, starting at t(0) and repeating with period P.

 Properties:
 later tasks, for example T2 and T3, have unpredictable starting times
 the communication between tasks or the use of common resources is safe, as there is a static ordering of tasks, for example T2 starts after finishing T1
 as a necessary precondition, the sum of WCETs of all tasks within a period is bounded by the period P: Σk WCET(T(k)) ≤ P
4 - 23
Simple Periodic Time-Triggered Scheduler
main:                                       // usually done offline:
  determine table of tasks (k, T(k)), for k = 0,1,…,m-1;
  i = 0; set the timer to expire at the initial phase t(0);
  while (true) sleep();                     // set the CPU to low power mode;
                                            // processing starts again after the interrupt

Timer Interrupt:
  i = i+1;
  set the timer to expire at i*P + t(0);
  for (k = 0,…,m-1) { execute task T(k); }  // for example using a function pointer in C;
  return;                                   // the task (= function) returns after finishing

Example table (m = 5): k = 0 … 4 with T(k) = T1, T2, T3, T4, T5.
4 - 24
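A C sketch of the pseudocode above, with the task table realized as an array of function pointers. timer_set_expiration() and enter_low_power_mode() are hypothetical hardware-dependent helpers (they could be built, for example, on SysTick or Timer_A as shown in Chapter 3).

#include <stdint.h>

#define M  5                        // number of tasks
#define P  1000                     // period in timer ticks
#define T0 100                      // initial phase t(0) in timer ticks

void task1(void); void task2(void); void task3(void); void task4(void); void task5(void);
extern void timer_set_expiration(uint32_t absolute_time);   // hypothetical helper
extern void enter_low_power_mode(void);                     // hypothetical helper

static void (* const task_table[M])(void) = { task1, task2, task3, task4, task5 };
static volatile uint32_t i = 0;     // number of elapsed periods

int main(void)
{
    timer_set_expiration(T0);       // first timer interrupt at the initial phase t(0)
    while (1) enter_low_power_mode();
}

void timer_interrupt_handler(void)  // invoked by the timer interrupt
{
    i = i + 1;
    timer_set_expiration(i * P + T0);     // program the next period
    for (uint32_t k = 0; k < M; k++) {
        task_table[k]();                  // execute T1 ... T5 in their static order
    }
}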
Time-Triggered Cyclic Executive Scheduler
 Suppose now, that tasks may have different periods.
 To accommodate this situation, the period P is partitioned into frames of length f.

Example schedule (see figure): tasks T1 … T4 are placed into frames of length f on the time axis 0 … 20; the whole schedule repeats with period P.

 We have a problem to determine a feasible schedule, if there are tasks with a


long execution time.
 long tasks could be partitioned into a sequence of short sub-tasks
 but this is tedious and error-prone process, as the local state of the task must be
extracted and stored globally

4 - 25
Time-Triggered Cyclic Executive Scheduling
 Examples for periodic tasks: sensory data acquisition, control loops, action
planning and system monitoring.
 When a control application consists of several concurrent periodic tasks with
individual timing constraints, the schedule has to guarantee that each periodic
instance is regularly activated at its proper rate and is completed within its
deadline.
 Definitions:
 Γ : the set of all periodic tasks
 τi : a periodic task
 τi,j : the jth instance of task τi
 ri,j , di,j : the release time and absolute deadline of the jth instance of task τi
 Φi : phase of task τi (release time of its first instance)
 Di : relative deadline of task τi
4 - 26
Time-Triggered Cyclic Executive Scheduling
 Example of a single periodic task τi : it has period Ti, phase Φi, relative deadline Di, release times ri,1, ri,2, … and worst case execution time Ci.
 For a set of periodic tasks Γ : task instances should execute within the intervals between their release times and deadlines.
4 - 27
Time-Triggered Cyclic Executive Scheduling
 The following hypotheses are assumed on the tasks:
 The instances of a periodic task are regularly activated at a constant rate. The interval Ti between two consecutive activations is called period. The release times satisfy
   ri,j = Φi + (j − 1) Ti
 All instances have the same worst case execution time Ci. The worst case execution time is also denoted as WCET(i).
 All instances of a periodic task have the same relative deadline Di. Therefore, the absolute deadlines satisfy
   di,j = Φi + (j − 1) Ti + Di
4 - 28
Time-Triggered Cyclic Executive Scheduling
Example with 4 tasks: the requirement specifies release times and deadlines of the task instances; the concrete schedule (time axis 0 … 36) is not given as part of the requirement.
4 - 29


Time-Triggered Cyclic Executive Scheduling
Some conditions for period P and frame length f:
 A task executes at most once within a frame: f ≤ Ti for every task τi (Ti : period of the task).
 P is a multiple of f.
 Period P is the least common multiple of all periods Ti.
 Tasks start and complete within a single frame: f ≥ Ci for every task τi (Ci : worst case execution time of the task).
 Between release time and deadline of every task there is at least one full frame:
   2f − gcd(Ti, f) ≤ Di   (Di : relative deadline of the task)
4 - 30
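The frame conditions above can be checked mechanically. The sketch below assumes integer time units and that P has already been chosen as the least common multiple of the periods; the function and array names are illustrative.

#include <stdbool.h>
#include <stdint.h>

static uint32_t gcd(uint32_t a, uint32_t b) {
    while (b != 0) { uint32_t t = a % b; a = b; b = t; }
    return a;
}

// Check the conditions for a candidate frame length f, given period P and
// n tasks with periods T[i], relative deadlines D[i] and WCETs C[i].
bool frame_length_ok(uint32_t f, uint32_t P, uint32_t n,
                     const uint32_t T[], const uint32_t D[], const uint32_t C[])
{
    if (P % f != 0) return false;                      // P must be a multiple of f
    for (uint32_t i = 0; i < n; i++) {
        if (f > T[i]) return false;                    // at most one execution per frame
        if (f < C[i]) return false;                    // task completes within one frame
        if (2*f - gcd(T[i], f) > D[i]) return false;   // a full frame between release and deadline
    }
    return true;
}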
Sketch of Proof for Last Condition

Figure: release times and deadlines of tasks are shown relative to the frame boundaries. In the worst case, a task is released just after a frame of length f has started; its starting time is then the beginning of the next frame, and its latest finishing time is the end of that frame, which must not lie after the deadline.

4 - 31
Example: Cyclic Executive Scheduling
Conditions: P = lcm of all periods = 20.

Task parameters (Ti, Di, Ci):
τ1: 4, 4, 1.0
τ2: 5, 5, 1.8
τ3: 20, 20, 1.0
τ4: 20, 20, 2.0
Possible solution: f = 2.

Feasible solution (f = 2): the schedule over one period P = 20 is partitioned into frames of length f = 2 (time axis 0 … 20).
4 - 32
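A quick check of the frame conditions for this task set: P = lcm(4, 5, 20, 20) = 20 is a multiple of f = 2; f = 2 ≤ min Ti = 4 and f = 2 ≥ max Ci = 2.0; and 2f − gcd(Ti, f) gives 4 − 2 = 2 ≤ 4 for τ1, 4 − 1 = 3 ≤ 5 for τ2, and 4 − 2 = 2 ≤ 20 for τ3 and τ4. Hence f = 2 satisfies all conditions.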
Time-Triggered Cyclic Executive Scheduling
Checking for correctness of a schedule (the schedule determines, for each instance j of a task τi, the frame in which it executes):
 Is P a common multiple of all periods Ti ?
 Is P a multiple of f ?
 Is the frame sufficiently long, i.e., is the sum of the execution times of all instances assigned to one frame at most f ?
 Determine offsets Φi such that instances of tasks start after their release time.
 Are deadlines respected, i.e., does every instance complete before its absolute deadline?
4 - 33
Generic Time-Triggered Scheduler
 In an entirely time-triggered system, the temporal control structure of all tasks is
established a priori by off-line support-tools.
 This temporal control structure is encoded in a Task-Descriptor List (TDL) that
contains the cyclic schedule for all activities of the node.
 This schedule considers the required precedence and mutual exclusion
relationships among the tasks such that an explicit coordination of the tasks by
the operating system at run time is not necessary.
 The dispatcher is activated by a
synchronized clock tick. It looks at the
TDL, and then performs the action
that has been planned for this
instant [Kopetz].

4 - 34
Simplified Time-Triggered Scheduler

main:                                            // usually done offline:
  determine static schedule (t(k), T(k)), for k = 0,1,…,n-1;
  determine the period of the schedule P;
  set i = k = 0 initially; set the timer to expire at t(0);
  while (true) sleep();                          // set the CPU to low power mode;
                                                 // processing continues after the interrupt

Timer Interrupt:
  k_old := k;
  i := i+1; k := i mod n;
  set the timer to expire at i/n * P + t(k);     // integer division
  execute task T(k_old);                         // for example using a function pointer in C;
  return;                                        // the task returns after finishing

Example table (n = 5, P = 16): (t(k), T(k)) = (0, T1), (3, T2), (7, T1), (8, T3), (12, T2).
4 - 35
Summary Time-Triggered Scheduler
Properties:
 deterministic schedule; conceptually simple (static table); relatively easy to
validate, test and certify
 no problems in using shared resources

 external communication only via polling


 inflexible as no adaptation to the environment
 serious problems if there are long tasks

Extensions:
 allow interrupts → be careful with shared resources and the WCET of tasks!!
 allow preemptable background tasks
 check for task overruns (execution time longer than WCET) using a watchdog timer
4 - 36
Event Triggered Systems
The schedule of tasks is determined by the occurrence of external or internal events:
 dynamic and adaptive: there are possible problems with respect to timing, the use
of shared resources and buffer over- or underflow
 guarantees can be given either off-line (if bounds on the behavior of the
environment are known) or during run-time

Figure: a timer raises interrupts to the CPU; the interfaces to sensors/actuators interact with the CPU via interrupts or polling; the CPU sets the timer.
4 - 37
Non-Preemptive Event-Triggered Scheduling
Principle:
 To each event, there is associated a corresponding task that will be executed.
 Events are emitted by (a) external interrupts or (b) by tasks themselves.
 All events are collected in a single queue; depending on the queuing discipline, an
event is chosen for execution, i.e., the corresponding task is executed.
 Tasks can not be preempted.

Extensions:
 A background task can run if the event queue is empty. It will be preempted by
any event processing.
 Timed events are ready for execution only after a time interval elapsed. This
enables periodic instantiations, for example.

4 - 38
Non-Preemptive Event-Triggered Scheduling
main:
  set the CPU to low power mode;
  while (true) {
    if (event queue is empty) {
      sleep();                                // continue processing after an interrupt
    } else {
      extract event from event queue;
      execute task corresponding to event;    // for example using a function pointer in C;
    }                                         // the task returns after finishing
  }

Interrupt:                                    // interrupt service routine (ISR)
  put event into event queue;
  return;

Figure: interrupts put events into the event queue via their ISRs; tasks may also emit events; the main loop extracts an event from the queue and dispatches the corresponding task.
4 - 39
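A minimal C sketch of this pattern, assuming a single-producer ring buffer as the event queue; irq_disable(), irq_enable(), sleep_until_interrupt() and the task_for_event[] table are hypothetical names, not part of any particular runtime.

#include <stdint.h>

#define QUEUE_LEN 16

typedef void (*task_fn_t)(void);
extern task_fn_t task_for_event[];                  /* maps event id -> task (illustrative) */
extern void irq_disable(void), irq_enable(void), sleep_until_interrupt(void);

static volatile uint8_t  queue[QUEUE_LEN];
static volatile uint32_t head = 0, tail = 0;        /* head: extract, tail: insert */

/* called from an ISR: put event into the event queue */
void put_event(uint8_t ev)
{
    if ((tail + 1) % QUEUE_LEN != head) {           /* drop the event on overflow  */
        queue[tail] = ev;
        tail = (tail + 1) % QUEUE_LEN;
    }
}

int main(void)
{
    for (;;) {
        irq_disable();                              /* check the queue atomically  */
        if (head == tail) {
            irq_enable();
            sleep_until_interrupt();                /* low power until next event  */
        } else {
            uint8_t ev = queue[head];
            head = (head + 1) % QUEUE_LEN;
            irq_enable();
            task_for_event[ev]();                   /* dispatch corresponding task */
        }
    }
}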
Non-Preemptive Event-Triggered Scheduling
Properties:
 communication between tasks does not lead to a simultaneous access to shared
resources, but interrupts may cause problems as they preempt running tasks
 buffer overflow may happen if too many events are generated by the environment or
by tasks
 tasks with a long running time prevent other tasks from running and may cause
buffer overflow as no events are being processed during this time
 partition tasks into smaller ones
 but the local context must be stored

[Figure: a task with a long execution time is partitioned into subtask 1 and subtask 2; between the subtasks the context is saved to and restored from global memory.]

4 - 40
Preemptive Event-Triggered Scheduling – Stack Policy
 This case is similar to non-preemptive case, but tasks can be preempted by
others; this resolves partly the problem of tasks with a long execution time.
 If the order of preemption is restricted, we can use the usual stack-based context
mechanism of function calls. The context of a function contains the necessary state
such as local variables and saved registers.

main() {
    …
    f1();
    …
}
f1() {
    …
    f2();
    …
}

[Figure: main memory holds the contexts of main(), f1() and f2() stacked at increasing addresses.]

4 - 41
Preemptive Event-Triggered Scheduling – Stack Policy
[Figure: tasks T1, T2 and T3 preempt each other in a nested fashion over time t; a preempted task resumes only after the preempting task has finished.]

 Tasks must finish in LIFO (last in first out) order of their instantiation.
 this restricts flexibility of the approach
 it is not useful, if tasks wait some unknown time for external events, i.e., they are
blocked
 Shared resources (communication between tasks!) must be protected, for
example by disabling interrupts or by the use of semaphores.

4 - 42
Preemptive Event-Triggered Scheduling – Stack Policy
main:
    while (true) {
        if (event queue is empty) {
            set CPU to low power mode;
            sleep();                             processing continues after interrupt
        } else {
            select event from event queue;
            execute selected task;               for example using a function pointer
            remove selected event from queue;    in C; the task returns after finishing.
        }
    }

InsertEvent:                                     may be called by interrupt service
    put new event into event queue;              routines (ISR) or tasks
    select event from event queue;
    if (selected task ≠ running task) {
        execute selected task;
        remove selected event from queue;
    }
    return;

Interrupt:
    InsertEvent(…);
    return;

4 - 43
Thread
 A thread is a unique execution of a program.
 Several copies of such a “program” may run simultaneously or at different times.
 Threads share the same processor and its peripherals.

 A thread has its own local state. This state consists mainly of:
 register values;
 memory stack (local variables);
 program counter;

 Several threads may have a shared state consisting of global variables.

4 - 44
Threads and Memory Organization
 Activation record (also denoted as the thread context) contains the
thread local state which includes registers and local data structures.

 Context switch:
  current CPU context goes out
  new CPU context goes in

[Figure: the activation records of thread 1, thread 2, … reside in memory; the CPU holds the PC and registers of the currently running thread.]

4 - 45
Co-operative Multitasking
 Each thread allows a context switch to another thread at a call to the
cswitch() function.
 This function is part of the underlying runtime system (operating system).
 A scheduler within this runtime system chooses which thread will run next.

 Advantages:
 predictable, where context switches can occur
 less errors with use of shared resources if the switch locations are chosen carefully

 Problems:
 programming errors can keep other threads out as a thread may never give up
CPU
 real-time behavior may be at risk if a thread runs too long before the next context
switch is allowed
4 - 46
Example: Co-operative Multitasking
Thread 1:
    if (x > 2)
        sub1(y);
    else
        sub2(y);
    cswitch();
    proca(a,b,c);

Thread 2:
    procdata(r,s,t);
    cswitch();
    if (val1 == 3)
        abc(val2);
    rst(val3);

Scheduler:
    save_state(current);
    p = choose_process();
    load_and_go(p);
4 - 47
Preemptive Multitasking
 Most general form of multitasking:
 The scheduler in the runtime system (operating system) controls when contexts
switches take place.
 The scheduler also determines what thread runs next.

 State diagram corresponding to each single thread:

[Figure: states run, ready and blocked; transitions: activate thread → ready,
dispatch: ready → run, preemption: run → ready, wait: run → blocked,
signal: blocked → ready, terminate thread: run → end.]

 Run: A thread enters this state as it starts executing on the processor.
 Ready: State of threads that are ready to execute but cannot be executed
because the processor is assigned to another thread.
 Blocked: A task enters this state when it waits for an event.

4 - 48
Embedded Systems
4a. Timing Anomalies

© Lothar Thiele
Computer Engineering and Networks Laboratory
Embedded Systems
5. Operating Systems

© Lothar Thiele
Computer Engineering and Networks Laboratory
Embedded Operating Systems

5-2
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

5-3
Embedded Operating System (OS)
 Why an operating system (OS) at all?
 Same reasons why we need one for a traditional computer.
 Not every device needs all services.

 In embedded systems we find a large variety of requirements and environments:


 Critical applications with high functionality (medical applications, space shuttle,
process automation, …).
 Critical applications with small functionality (ABS, pace maker, …).
 Not very critical applications with broad range of functionality (smart phone, …).

5-4
Embedded Operating System
 Why is a desktop OS not suited?
 The monolithic kernel of a desktop OS offers too many features that take space in
memory and consume time.
 Monolithic kernels are often not modular, fault-tolerant, configurable.
 It requires too much memory space and is often too resource-hungry in terms of
computation time.
 Not designed for mission-critical applications.
 The timing uncertainty may be too large for some applications.

5-5
Embedded Operating Systems
Essential characteristics of an embedded OS: Configurability
 No single operating system will fit all needs, but often no overhead for
unused functions/data is tolerated. Therefore, configurability is needed.
 For example, there are many embedded systems without external memory, a
keyboard, a screen or a mouse.

Configurability examples:
 Remove unused functions/libraries (for example by the linker).
 Use conditional compilation (using #if and #ifdef commands in C, for example).
 But deriving a consistent configuration is a potential problem of systems with a
large number of derived operating systems. There is the danger of missing
relevant components.

5-6
Example: Configuration of VxWorks

© Windriver

http://www.windriver.com/products/development_tools/ide/tornado2/tornado_2_ds.pdf
5-7
Real-time Operating Systems
A real-time operating system is an operating system that supports the
construction of real-time systems.
Key requirements:
1. The timing behavior of the OS must be predictable.
For all services of the OS, an upper bound on the execution time is necessary. For
example, for every service upper bounds on blocking times need to be available,
i.e. for times during which interrupts are disabled. Moreover, almost all
processor activities should be controlled by a real-time scheduler.
2. OS must manage the timing and scheduling
 OS has to be aware of deadlines and should have mechanism to take them
into account in the scheduling
 OS must provide precise time services with a high resolution

5-8
Embedded Operating Systems
Features and Architecture

5-9
Embedded Operating System
Device drivers are typically handled directly by tasks instead of drivers that are
managed by the operating system:
 This architecture improves timing predictability as access to devices is also handled by
the scheduler
 If several tasks use the same external device and the associated driver, then the access
must be carefully managed (shared critical resource, ensure fairness of access)

Embedded OS Standard OS

5 - 10
Embedded Operating Systems
Every task can perform an interrupt:
 For a standard OS, this would be a serious source of unreliability. But embedded
programs are typically programmed in a controlled environment.
 It is possible to let interrupts directly start or stop tasks (by storing the tasks start
address in the interrupt table). This approach is more efficient and predictable
than going through the operating system’s interfaces and services.

Protection mechanisms are not always necessary in embedded operating systems:


 Embedded systems are typically designed for a single purpose, untested programs
are rarely loaded, software can be considered to be reliable.
 However, protection mechanisms may be needed for safety and security reasons.

5 - 11
Main Functionality of RTOS-Kernels
Task management:
 Execution of quasi-parallel tasks on a processor using processes or threads (lightweight
process) by
 maintaining process states, process queuing,
 allowing for preemptive tasks (fast context switching) and quick interrupt handling
 CPU scheduling (guaranteeing deadlines, minimizing process waiting times, fairness in
granting resources such as computing power)
 Inter-task communication (buffering)
 Support of real-time clocks
 Task synchronization (critical sections, semaphores, monitors, mutual exclusion)
 In classical operating systems, synchronization and mutual exclusion is performed via
semaphores and monitors.
 In real-time OS, special semaphores and a deep integration of them into scheduling is
necessary (for example priority inheritance protocols as described in a later chapter).
5 - 12
Task States
Minimal Set of Task States:

[Figure: minimal task state diagram with states running, ready and blocked;
transitions: instantiate → ready, dispatch: ready → running,
preemption: running → ready, wait: running → blocked,
signal: blocked → ready, delete: running → end.]

5 - 13
Task states
Running:
 A task enters this state when it starts executing on the processor. There is at
most one task with this state in the system.

Ready:
 State of those tasks that are ready to execute but cannot be run because the
processor is assigned to another task, i.e. another task has the state “running”.

Blocked:
 A task enters the blocked state when it executes a synchronization primitive to
wait for an event, e.g. a wait primitive on a semaphore or timer. In this case,
the task is inserted in a queue associated with this semaphore. The task at the
head is resumed when the semaphore is unlocked by an event.
5 - 14
Multiple Threads within a Process

5 - 15
Threads
A thread is the smallest sequence of programmed instructions that can be
managed independently by a scheduler; i.e., a thread is a basic unit of CPU
utilization.

 Multiple threads can exist within the same process and share resources such
as memory, while different processes do not share these resources:
 Typically shared by threads: memory.
 Typically owned by threads: registers, stack.

 Thread advantages and characteristics:


 Faster to switch between threads; switching between user-level threads requires
no major intervention by the operating system.
 Typically, an application will have a separate thread for each distinct activity.
 Thread Control Block (TCB) stores information needed to manage and schedule a
thread
5 - 16
Threads
 The operating system maintains for each thread a data structure (TCB – thread control block)
that contains its current status such as program counter, priority, state, scheduling information,
thread name.
 The TCBs are administered in linked lists:

[Figure: TCBs administered in linked lists, for example a timer queue.]

5 - 17
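To make the TCB concrete, a hypothetical data structure could look as follows in C; the field set and list layout are purely illustrative and do not reproduce the TCB of any particular operating system.

#include <stdint.h>

typedef enum { READY, RUNNING, BLOCKED } thread_state_t;

typedef struct tcb {
    uint32_t      *stack_ptr;     /* saved stack pointer (context) of the thread  */
    uint32_t       pc;            /* saved program counter                        */
    uint8_t        priority;      /* scheduling priority                          */
    thread_state_t state;         /* READY, RUNNING or BLOCKED                    */
    uint32_t       wakeup_tick;   /* used while the thread sits in a timer queue  */
    const char    *name;          /* thread name (for debugging)                  */
    struct tcb    *next;          /* link for a ready queue or the timer queue    */
} tcb_t;

/* TCBs are administered in linked lists, e.g. one ready queue per priority
 * level and one timer queue ordered by wakeup_tick. */
tcb_t *ready_queue[8];
tcb_t *timer_queue;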
Context Switch: Processes or Threads
[Figure: context switch between process/thread P0 and P1: the operating system saves the state of P0 into its process or thread control block and restores the state of P1 from its control block.]

5 - 18
Embedded Operating Systems
Classes of Operating Systems

5 - 19
Class 1: Fast and Efficient Kernels
Fast and efficient kernels

For hard real-time systems, these kernels are questionable, because they are
designed to be fast, rather than to be predictable in every respect.

Examples include
FreeRTOS, QNX, eCOS, RT-LINUX, VxWORKS, LynxOS.

5 - 20
Class 2: Extensions to Standard OSs
Real-time extensions to standard OS:
 Attempt to exploit existing and comfortable main stream operating systems.
 A real-time kernel runs all real-time tasks.
 The standard-OS is executed as one task.

+ Crash of standard-OS does not affect RT-tasks.

– RT-tasks cannot use standard-OS services;
  less comfortable than expected.

(revival of the concept: hypervisor)
5 - 21
Example: Posix 1.b RT-extensions to Linux
The standard scheduler of a general purpose operating system can be replaced by
a scheduler that exhibits (soft) real-time properties.

[Figure: RT-Task, RT-Task, Init, Bash and Mozilla run on top of a POSIX 1.b scheduler
inside the Linux kernel (with drivers, I/O and interrupts) on the hardware.]

Special calls for real-time as well as standard operating system calls are available.
This simplifies programming, but no guarantees for meeting deadlines are provided.

5 - 22
Example: RT Linux
 RT-tasks cannot use standard OS calls.
 Commercially available from fsmlabs and
WindRiver (www.fsmlabs.com)

5 - 23
Class 3: Research Systems
Research systems try to avoid limitations of existing real-time and embedded
operating systems.
 Examples include L4, seL4, NICTA, ERIKA, SHARK

Typical Research questions:


 low overhead memory protection,
 temporal protection of computing resources
 RTOS for on-chip multiprocessors
 quality of service (QoS) control (besides real-time constraints)
 formally verified kernel properties

List of current real-time operating systems:


http://en.wikipedia.org/wiki/Comparison_of_real-time_operating_systems
5 - 24
Embedded Operating Systems
FreeRTOS in the Embedded Systems Lab (ES-Lab)

5 - 25
Example: FreeRTOS (ES-Lab)
FreeRTOS (http://www.freertos.org/) is a typical embedded operating system. It is
available for many hardware platforms, open source and widely used in industry. It
is used in the ES-Lab.

 FreeRTOS is a real-time kernel (or real-time scheduler).


 Applications are organized as a collection of independent
threads of execution.
 Characteristics: Pre-emptive or co-operative operation,
queues, binary semaphores, counting semaphores,
mutexes (mutual exclusion), software timers,
stack overflow checking, trace recording, … .

5 - 26
Example: FreeRTOS (ES-Lab)
Typical directory structure (excerpts):

 tasks.c – functions that implement the handling of tasks (threads)
 list.c – implementation of the linked list data type
 queue.c – implementation of queue and semaphore services
 timers.c – software timer functionality
 portable/ – directory containing all port specific source files

 FreeRTOS is configured by a header file called FreeRTOSConfig.h that


determines almost all configurations (co-operative scheduling vs. preemptive,
time-slicing, heap size, mutex, semaphores, priority levels, timers, …)
5 - 27
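For illustration, a few of the configuration macros typically found in FreeRTOSConfig.h; the values shown are examples only, not the ES-Lab settings.

/* excerpt of a FreeRTOSConfig.h; values are examples only */
#define configUSE_PREEMPTION            1       /* 0: co-operative scheduling        */
#define configUSE_TIME_SLICING          1       /* round-robin among equal priority  */
#define configTICK_RATE_HZ              1000    /* one tick every millisecond        */
#define configMAX_PRIORITIES            5       /* priority levels 0..4              */
#define configMINIMAL_STACK_SIZE        128     /* stack size of the idle task       */
#define configTOTAL_HEAP_SIZE           10240   /* heap managed by the kernel        */
#define configUSE_MUTEXES               1
#define configUSE_COUNTING_SEMAPHORES   1
#define configUSE_TIMERS                1       /* software timers                   */
#define configCHECK_FOR_STACK_OVERFLOW  2       /* stack overflow checking           */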
Embedded Operating Systems
FreeRTOS Task Management

5 - 28
Example FreeRTOS – Task Management
Tasks are implemented as threads.
 The functionality of a thread is implemented in form of a function:
 Prototype:   void vSomeTaskFunction( void *pvParameters );
   (some name of the task function; pointer to the task arguments)

 Task functions are not allowed to return! They can be “killed” by a specific call to a
FreeRTOS function, but usually run forever in an infinite loop.
 Task functions can instantiate other tasks. Each created task is a separate
execution instance, with its own stack.
 Example:
void vTask1( void *pvParameters ) {
    volatile uint32_t ul; /* volatile to ensure ul is implemented. */
    for( ;; ) {
        ... /* do something repeatedly */
        for( ul = 0; ul < 10000; ul++ ) { /* delay by busy loop */ }
    }
}
5 - 29
Example FreeRTOS – Task Management
 Thread instantiation with xTaskCreate(); its parameters and return value:
  a pointer to the function that implements the task
  a descriptive name for the task
  the stack depth: each task has its own unique stack that is allocated by the
   kernel to the task when the task is created; the usStackDepth value determines
   the size of the stack (in words)
  pvParameters: task functions accept a parameter of type pointer to void; the
   value assigned to pvParameters is the value passed into the task
  the priority at which the task will execute; priority 0 is the lowest priority
  pxCreatedTask: can be used to pass out a handle to the task being created
  the return value is pdPASS or pdFAIL depending on the success of the thread creation
5 - 30
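For reference, the prototype of xTaskCreate() and an example call; the exact parameter types can differ slightly between FreeRTOS versions, so treat this as a sketch.

BaseType_t xTaskCreate( TaskFunction_t pvTaskCode,      /* pointer to the task function    */
                        const char * const pcName,      /* descriptive name                */
                        uint16_t usStackDepth,          /* stack size in words             */
                        void *pvParameters,             /* argument passed to the task     */
                        UBaseType_t uxPriority,         /* priority, 0 is lowest           */
                        TaskHandle_t *pxCreatedTask );  /* out: handle of the created task */

/* example: create vTask1 with a 1000-word stack, no argument, priority 1, no handle */
if( xTaskCreate( vTask1, "Task 1", 1000, NULL, 1, NULL ) != pdPASS ) {
    /* task could not be created (e.g. not enough heap memory) */
}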
Example FreeRTOS – Task Management
Examples for changing properties of tasks:
 Changing the priority of a task. In case of preemptive scheduling policy, the ready
task with the highest priority is automatically assigned to the “running” state.

(parameters: handle of the task whose priority is being modified; the new priority, where 0 is the lowest priority)

 A task can delete itself or any other task. Deleted tasks no longer exist and cannot
enter the “running” state again.

(parameter: handle of the task that will be deleted; if NULL, then the caller will be deleted)

5 - 31
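The corresponding FreeRTOS calls are vTaskPrioritySet() and vTaskDelete(); a short usage sketch, where xHandle is assumed to have been filled in by xTaskCreate().

TaskHandle_t xHandle = NULL;

xTaskCreate( vTask1, "Task 1", 1000, NULL, 1, &xHandle );  /* pass out the handle        */

vTaskPrioritySet( xHandle, 3 );   /* raise the priority of Task 1 to 3                    */
vTaskDelete( xHandle );           /* delete Task 1                                        */
vTaskDelete( NULL );              /* a task can delete itself by passing NULL             */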
Embedded Operating Systems
FreeRTOS Timers

5 - 32
Example FreeRTOS – Timers
 The operating system also provides interfaces to timers of the processor.
 As an example, we use the FreeRTOS timer interface to replace the busy loop by
a delay. In this case, the task is put into the “blocked” state instead of
continuously running.

time is measured in “tick” units that are defined in the


configuration of FreeRTOS (FreeRTOSConfig.h). The
function pdMS_TO_TICKS() converts ms to “ticks”.

void vTask1( void *pvParameters ) {
    for( ;; ) {
        ... /* do something repeatedly */
        vTaskDelay(pdMS_TO_TICKS(250)); /* delay by 250 ms */
    }
}

5 - 33
Example FreeRTOS – Timers
 Problem: The task does not execute strictly periodically:
[Figure: timeline – the task is moved to the run state, executes “something”, then waits 250 ms before it is ready again; hence consecutive releases drift apart.]

 The parameters to vTaskDelayUntil() specify the exact tick count value at which
the calling task should be moved from the “blocked” state into the “ready” state.
Therefore, the task is put into the “ready” state periodically.
void vTask1( void *pvParameters ) {
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for( ;; ) {
        ... /* do something repeatedly */
        vTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(250));
    }
}

The xLastWakeTime variable needs to be initialized with the current tick count.
Note that this is the only time the variable is written to explicitly. After this,
xLastWakeTime is automatically updated within vTaskDelayUntil() when the task is
unblocked; the second parameter specifies the time to the next unblocking.
5 - 34
Embedded Operating Systems
FreeRTOS Task States

5 - 35
Example FreeRTOS – Task States
What are the task states in FreeRTOS and the corresponding transitions?

[Figure: FreeRTOS state diagram – Running, and Not Running with the sub-states
Ready, Blocked and Suspended (the latter not much used).]
 A task that is waiting for an event is said to be
in the “Blocked” state, which is a sub-state of
the “Not Running” state.
 Tasks can enter the “Blocked” state to wait for
two different types of event:
 Temporal (time-related) events—the event
being either a delay period expiring, or an
absolute time being reached.
 Synchronization events—where the events
originate from another task or interrupt. For
example, queues, semaphores, and mutexes, can
be used to create synchronization events.

5 - 36
Example FreeRTOS – Task States
Example 1: Two threads with equal priority.
void vTask1( void *pvParameters ) {
    volatile uint32_t ul;
    for( ;; ) {
        ... /* do something repeatedly */
        for( ul = 0; ul < 10000; ul++ ) { }
    }
}

void vTask2( void *pvParameters ) {
    volatile uint32_t u2;
    for( ;; ) {
        ... /* do something repeatedly */
        for( u2 = 0; u2 < 10000; u2++ ) { }
    }
}

int main( void ) {
    xTaskCreate(vTask1, "Task 1", 1000, NULL, 1, NULL);
    xTaskCreate(vTask2, "Task 2", 1000, NULL, 1, NULL);
    vTaskStartScheduler();
    for( ;; );
}

Both tasks have priority 1. In this case, FreeRTOS uses time slicing, i.e., every
task is put into the “running” state in turn.

5 - 37
Example FreeRTOS – Task States
Example 2: Two threads with delay timer.
void vTask1( void *pvParameters ) {
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for( ;; ) {
        ... /* do something repeatedly */
        vTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(250));
    }
}

void vTask2( void *pvParameters ) {
    TickType_t xLastWakeTime = xTaskGetTickCount();
    for( ;; ) {
        ... /* do something repeatedly */
        vTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(250));
    }
}

int main( void ) {
    xTaskCreate(vTask1, "Task 1", 1000, NULL, 1, NULL);
    xTaskCreate(vTask2, "Task 2", 1000, NULL, 2, NULL);
    vTaskStartScheduler();
    for( ;; );
}

If no user-defined task is in the running state, FreeRTOS chooses a built-in idle
task with priority 0. One can associate a function to this task, e.g., in order to
go to a low power processor state.
5 - 38
Embedded Operating Systems
FreeRTOS Interrupts

5 - 39
Example FreeRTOS – Interrupts
How are tasks (threads) and hardware interrupts scheduled jointly?
 Although written in software, an interrupt service routine (ISR) is a hardware
feature because the hardware controls which interrupt service routine will run,
and when it will run.
 Tasks will only run when there are no ISRs running, so the lowest priority interrupt
will interrupt the highest priority task, and there is no way for a task to pre-empt
an ISR. In other words, ISRs always have a higher priority than any task.

 Usual pattern:
 ISRs are usually very short. They find out the reason for the interrupt, clear the
interrupt flag and determine what to do in order to handle the interrupt.
 Then, they unblock a regular task (thread) that performs the necessary processing
related to the interrupt.
 For blocking and unblocking, usually semaphores are used.

5 - 40
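A sketch of this pattern with a binary semaphore (created with xSemaphoreCreateBinary()); the ISR name and the processing done in the handler task are illustrative.

static SemaphoreHandle_t xISRSemaphore;     /* created with xSemaphoreCreateBinary()   */

void vHandlerTask( void *pvParameters ) {   /* regular task, typically high priority   */
    for( ;; ) {
        xSemaphoreTake( xISRSemaphore, portMAX_DELAY );  /* block until the ISR signals */
        /* ... perform the processing related to the interrupt ... */
    }
}

void vExampleISR( void ) {                  /* short ISR, registered for the IRQ       */
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    /* ... clear the interrupt flag in the peripheral ... */
    xSemaphoreGiveFromISR( xISRSemaphore, &xHigherPriorityTaskWoken );
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );  /* switch to the handler task
                                                        immediately if it has a higher
                                                        priority than the preempted one */
}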
Example FreeRTOS – Interrupts

[Figure: an ISR unblocks a handler task; blocking and unblocking is typically
implemented via semaphores.]

5 - 41
Example FreeRTOS – Interrupts

5 - 42
Embedded Systems
6. Aperiodic and Periodic Scheduling

© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

6-2
Basic Terms and Models

6-3
Basic Terms
Real-time systems
 Hard: A real-time task is said to be hard, if missing its deadline may cause
catastrophic consequences on the environment under control. Examples are
sensory data acquisition, detection of critical conditions, actuator servoing.

 Soft: A real-time task is called soft, if meeting its deadline is desirable for
performance reasons, but missing its deadline does not cause serious damage to
the environment and does not jeopardize correct system behavior. Examples are
command interpreter of the user interface, displaying messages on the screen.

6-4
Schedule
Given a set of tasks J = {J1, J2, …}:
 A schedule is an assignment of tasks to the processor, such that each task is
executed until completion.
 A schedule can be defined as an integer step function σ: R → N, where σ(t)
denotes the task which is executed at time t. If σ(t) = 0, then the processor is
called idle.
 If σ(t) changes its value at some time, then the processor performs a context
switch.
 Each interval in which σ(t) is constant is called a time slice.
 A preemptive schedule is a schedule in which the running task can be arbitrarily
suspended at any time, to assign the CPU to another task according to a
predefined scheduling policy.

6-5
Schedule and Timing
 A schedule is said to be feasible, if all task can be completed according to a set
of specified constraints.
 A set of tasks is said to be schedulable, if there exists at least one algorithm that
can produce a feasible schedule.
 Arrival time ai or release time ri is the time at which a task becomes ready for
execution.
 Computation time Ci is the time necessary to the processor for executing the
task without interruption.
 Deadline di is the time at which a task should be completed.
 Start time si is the time at which a task starts its execution.
 Finishing time fi is the time at which a task finishes its execution.

6-6
Schedule and Timing

 Using the above definitions, we have di ≥ ri + Ci


 Lateness Li = fi − di represents the delay of a task completion with respect to
its deadline; note that if a task completes before the deadline, its lateness is
negative.
 Tardiness or exceeding time Ei = max(0, Li) is the time a task stays active after
its deadline.
 Laxity or slack time Xi = di − ai − Ci is the maximum time a task can be delayed
on its activation to complete within its deadline.

6-7
Schedule and Timing
 Periodic task τi: infinite sequence of identical activities, called instances or jobs,
that are regularly activated at a constant rate with period Ti. The activation
time of the first instance is called phase Φi.

[Figure: timeline of a periodic task showing the initial phase Φi, the period Ti,
the arrival time of instance k, the relative deadline Di and the absolute deadline
of period k, for instance 1 and instance 2.]
6-8
Example for Real-Time Model
[Figure: timeline 0…25 with release times r1, r2 and deadlines d1, d2 of
tasks J1 and J2.]

Computation times: C1 = 9, C2 = 12
Start times: s1 = 0, s2 = 6
Finishing times: f1 = 18, f2 = 28
Lateness: L1 = -4, L2 = 1
Tardiness: E1 = 0, E2 = 1
Laxity: X1 = 13, X2 = 11
6-9
Precedence Constraints
 Precedence relations between tasks can be described through an acyclic directed
graph G where tasks are represented by nodes and precedence relations by
arrows. G induces a partial order on the task set.

 There are different interpretations possible:


 All successors of a task are activated (concurrent task execution). We will use this
interpretation in the lecture.
 One successor of a task is activated:
non-deterministic choice.

[Figure: example precedence graph with task J1 at the top, J2 and J3 as its
successors, and J4 and J5 below.]

6 - 10
Precedence Constraints
Example for concurrent activation:
 Image acquisition acq1 acq 2
 Low level image processing edge1 edge 2
 Feature/contour extraction shape
 Pixel disparities disp
 Object size H
 Object recognition rec

6 - 11
Classification of Scheduling Algorithms
 With preemptive algorithms, the running task can be interrupted at any time to
assign the processor to another active task, according to a predefined
scheduling policy.
 With a non-preemptive algorithm, a task, once started, is executed by the
processor until completion.

 Static algorithms are those in which scheduling decisions are based on fixed
parameters, assigned to tasks before their activation.
 Dynamic algorithms are those in which scheduling decisions are based on
dynamic parameters that may change during system execution.

6 - 12
Classification of Scheduling Algorithms
 An algorithm is said optimal if it minimizes some given cost function defined
over the task set.
 An algorithm is said to be heuristic if it tends toward but does not guarantee to
find the optimal schedule.
 Acceptance Test: The runtime system decides whenever a task is added to the
system, whether it can schedule the whole task set without deadline violations.

Example for the „domino


effect“, if an acceptance test
wrongly accepted a new task.

6 - 13
Metrics to Compare Schedules
 Average response time:            tr = (1/n) · Σ_{i=1..n} (fi − ri)

 Total completion time:            tc = max_i(fi) − min_i(ri)

 Weighted sum of response times:   tw = Σ_{i=1..n} wi·(fi − ri) / Σ_{i=1..n} wi

 Maximum lateness:                 Lmax = max_i (fi − di)

 Number of late tasks:             Nlate = Σ_{i=1..n} miss(fi)
                                   with miss(fi) = 0 if fi ≤ di, and 1 otherwise

6 - 14
Metrics Example
[Figure: timeline 0…25 with release times r1, r2 and deadlines d1, d2 of
tasks J1 and J2 (same schedule as before).]

Average response time: tr = (18 + 24)/2 = 21
Total completion time: tc = 28 − 0 = 28
Weighted sum of response times (w1 = 2, w2 = 1): tw = (2·18 + 1·24)/3 = 20
Number of late tasks: Nlate = 1
Maximum lateness: Lmax = 1

6 - 15
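The metrics of this example can be reproduced with a few lines of C. The arrays hold ri, fi, di and wi of the two tasks; the deadlines 22 and 27 follow from the stated finishing times and lateness values, and the release times from the stated response times.

#include <stdio.h>

int main(void) {
    const int n = 2;
    double r[] = { 0, 4 }, f[] = { 18, 28 }, d[] = { 22, 27 }, w[] = { 2, 1 };
    double tr = 0, tw = 0, wsum = 0, Lmax = -1e9, fmax = -1e9, rmin = 1e9;
    int nlate = 0;

    for (int i = 0; i < n; i++) {
        tr += (f[i] - r[i]) / n;                    /* average response time    */
        tw += w[i] * (f[i] - r[i]); wsum += w[i];   /* weighted response times  */
        if (f[i] - d[i] > Lmax) Lmax = f[i] - d[i]; /* maximum lateness         */
        if (f[i] > fmax) fmax = f[i];
        if (r[i] < rmin) rmin = r[i];
        if (f[i] > d[i]) nlate++;                   /* number of late tasks     */
    }
    printf("tr=%.1f tc=%.1f tw=%.1f Lmax=%.1f Nlate=%d\n",
           tr, fmax - rmin, tw / wsum, Lmax, nlate);
    return 0;
}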
Metrics and Scheduling Example
In schedule (a), the maximum lateness is minimized, but all tasks miss their deadlines.
In schedule (b), the maximal lateness is larger, but only one task misses its deadline.

6 - 16
Real-Time Scheduling of Aperiodic Tasks

6 - 17
Overview Aperiodic Task Scheduling
Scheduling of aperiodic tasks with real-time constraints:
 Table with some known algorithms:

                    Equal arrival times,         Arbitrary arrival times,
                    non-preemptive               preemptive

Independent tasks   EDD (Jackson)                EDF (Horn)

Dependent tasks     LDF (Lawler)                 EDF* (Chetto)

6 - 18
Earliest Deadline Due (EDD)
Jackson’s rule: Given a set of n tasks. Processing in order of non-decreasing
deadlines is optimal with respect to minimizing the maximum lateness.

6 - 19
Earliest Deadline Due (EDD)
Example 1:

6 - 20
Earliest Deadline Due (EDD)
Jackson’s rule: Given a set of n tasks. Processing in order of non-decreasing
deadlines is optimal with respect to minimizing the maximum lateness.

Proof concept:

6 - 21
Earliest Deadline Due (EDD)
Example 2:

6 - 22
Earliest Deadline First (EDF)
Horn’s rule: Given a set of n independent tasks with arbitrary arrival times, any
algorithm that at any instant executes a task with the earliest absolute deadline
among the ready tasks is optimal with respect to minimizing the maximum
lateness.

6 - 23
Earliest Deadline First (EDF)
Example:

6 - 24
Earliest Deadline First (EDF)
Horn’s rule: Given a set of n independent tasks with arbitrary arrival times, any
algorithm that at any instant executes the task with the earliest absolute deadline
among the ready tasks is optimal with respect to minimizing the maximum
lateness.

Concept of proof:
For each time interval [t, t+1) it is verified whether the actually running task is
the one with the earliest absolute deadline. If this is not the case, the task with the
earliest absolute deadline is executed in this interval instead. This operation cannot
increase the maximum lateness.

6 - 25
Earliest Deadline First (EDF)
[Figure: for each time slice it is checked which task is executing and which task
has the earliest deadline; if they differ, the corresponding slices are interchanged;
the situation after the interchange does not increase the maximum lateness.]

6 - 26
Earliest Deadline First (EDF)
Acceptance test (ck(t) denotes the remaining worst-case execution time of task k;
tasks are ordered by non-decreasing absolute deadline):
 worst case finishing time of task i:   fi = t + Σ_{k=1..i} ck(t)

 EDF guarantee condition:   ∀ i = 1, …, n :   t + Σ_{k=1..i} ck(t) ≤ di

 algorithm:
Algorithm: EDF_guarantee (J, Jnew)
{ J' = J ∪ {Jnew};            /* ordered by deadline */
  t = current_time();
  f0 = t;
  for (each Ji ∈ J') {
      fi = fi-1 + ci(t);
      if (fi > di) return(INFEASIBLE);
  }
  return(FEASIBLE);
}

6 - 27
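The guarantee algorithm translates directly into C. In this sketch, c[] holds the remaining worst-case execution times and d[] the absolute deadlines of the active tasks plus the new one, already ordered by non-decreasing deadline; the function and array names are illustrative.

#include <stdbool.h>

/* EDF acceptance test: tasks 0..n-1 ordered by non-decreasing absolute deadline.
 * c[i]: remaining worst-case execution time, d[i]: absolute deadline, t: current time. */
bool edf_guarantee(int n, const double c[], const double d[], double t)
{
    double f = t;                 /* f0 = t                       */
    for (int i = 0; i < n; i++) {
        f += c[i];                /* fi = f(i-1) + ci(t)          */
        if (f > d[i])
            return false;         /* INFEASIBLE: deadline missed  */
    }
    return true;                  /* FEASIBLE                     */
}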
Earliest Deadline First (EDF*)
 The problem of scheduling a set of n tasks with precedence constraints
(concurrent activation) can be solved in polynomial time complexity if tasks are
preemptable.

 The EDF* algorithm determines a feasible schedule in the case of tasks with
precedence constraints if there exists one.

 By the modification it is guaranteed that if there exists a valid schedule at all


then
 a task starts execution not earlier than its release time and not earlier than the
finishing times of its predecessors (a task cannot preempt any predecessor)
 all tasks finish their execution within their deadlines

6 - 28
EDF*

6 - 29
EDF*

6 - 30
Earliest Deadline First (EDF*)
Modification of deadlines:
 Task must finish its execution within its deadline.
 Task must not finish its execution later than the maximum start time of its
successor.
task Jb depends on task Ja:   Ja → Jb

 Solution:   di* = min( di , min{ dj* − Cj : Ji → Jj } )
6 - 31
Earliest Deadline First (EDF*)
Modification of release times:
 Task must start the execution not earlier than its release time.
 Task must not start the execution earlier than the minimum finishing time of its
predecessor.

task Jb depends on task Ja:   Ja → Jb

 Solution:   rj* = max( rj , max{ ri* + Ci : Ji → Jj } )

6 - 32
Earliest Deadline First (EDF*)
Algorithm for modification of release times:
1. For any initial node of the precedence graph set ri* = ri
2. Select a task j such that its release time has not been modified but the release times of
all immediate predecessors i have been modified. If no such task exists, exit.
3. Set rj* = max( rj , max{ ri* + Ci : Ji → Jj } )
4. Return to step 2

Algorithm for modification of deadlines:
1. For any terminal node of the precedence graph set di* = di
2. Select a task i such that its deadline has not been modified but the deadlines of all
immediate successors j have been modified. If no such task exists, exit.
3. Set di* = min( di , min{ dj* − Cj : Ji → Jj } )
4. Return to step 2

6 - 33
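A sketch of both modification passes in C, assuming the precedence DAG is given by a matrix pred[i][j] (non-zero if Ji → Jj) and that topo[] holds a topological order of the tasks; computing that order is omitted, and the fixed matrix width (at most 16 tasks) is only for brevity.

/* tasks 0..n-1; topo[] is a topological order of the precedence DAG.
 * pred[i][j] != 0 means Ji -> Jj (Jj depends on Ji). Assumes n <= 16. */
void edf_star_modify(int n, const int topo[], const int pred[][16],
                     const double C[], double r[], double d[])
{
    /* release times: forward pass, rj* = max( rj, max{ ri* + Ci : Ji -> Jj } ) */
    for (int a = 0; a < n; a++) {
        int j = topo[a];
        for (int i = 0; i < n; i++)
            if (pred[i][j] && r[i] + C[i] > r[j])
                r[j] = r[i] + C[i];
    }
    /* deadlines: backward pass, di* = min( di, min{ dj* - Cj : Ji -> Jj } ) */
    for (int a = n - 1; a >= 0; a--) {
        int i = topo[a];
        for (int j = 0; j < n; j++)
            if (pred[i][j] && d[j] - C[j] < d[i])
                d[i] = d[j] - C[j];
    }
}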
Earliest Deadline First (EDF*)
Proof concept:
 Show that if there exists a feasible schedule for the modified task set under EDF
then the original task set is also schedulable. To this end, show that the original
task set meets the timing constraints also. This can be done by using ri* ≥ ri,
di* ≤ di; we only made the constraints stricter.
 Show that if there exists a schedule for the original task set, then also for the
modified one. We can show the following: If there exists no schedule for the
modified task set, then there is none for the original task set. This can be done by
showing that no feasible schedule was excluded by changing the deadlines and
release times.
 In addition, show that the precedence relations in the original task set are not
violated. In particular, show that
 a task cannot start before its predecessor and
 a task cannot preempt its predecessor.

6 - 34
Real-Time Scheduling of Periodic Tasks

6 - 35
Overview
Table of some known preemptive scheduling algorithms for periodic tasks:

                     Deadline equals period      Deadline smaller than period

static priority      RM (rate-monotonic)         DM (deadline-monotonic)

dynamic priority     EDF                         EDF*

6 - 36
Model of Periodic Tasks
 Examples: sensory data acquisition, low-level actuation, control loops, action
planning and system monitoring.
 When an application consists of several concurrent periodic tasks with individual
timing constraints, the OS has to guarantee that each periodic instance is
regularly activated at its proper rate and is completed within its deadline.

 Definitions:
Γ : denotes a set of periodic tasks
τi : denotes a periodic task
τi,j : denotes the jth instance of task i
ri,j, si,j, fi,j, di,j : denote the release time, start time, finishing time, absolute
deadline of the jth instance of task i
Φi : denotes the phase of task i (release time of its first instance)
Di : denotes the relative deadline of task i
Ti : denotes the period of task i

6 - 37
Model of Periodic Tasks
 The following hypotheses are assumed on the tasks:
 The instances of a periodic task are regularly activated at a constant rate. The
interval Ti between two consecutive activations is called period. The release times
satisfy
      ri,j = Φi + (j − 1)·Ti

 All instances have the same worst case execution time Ci


 All instances of a periodic task have the same relative deadline Di. Therefore, the
absolute deadlines satisfy
      di,j = Φi + (j − 1)·Ti + Di

 Often, the relative deadline equals the period, Di = Ti (implicit deadline), and
therefore
      di,j = Φi + j·Ti
6 - 38
Model of Periodic Tasks
 The following hypotheses are assumed on the tasks (continued):
 All periodic tasks are independent; that is, there are no precedence relations and
no resource constraints.
 No task can suspend itself, for example on I/O operations.
 All tasks are released as soon as they arrive.
 All overheads in the OS kernel are assumed to be zero.
 Example:
[Figure: timeline of a periodic task τi with phase Φi, period Ti, relative deadline Di
and computation time Ci, showing the release times ri,1, ri,2 and, for instance τi,3,
the start time si,3 and finishing time fi,3.]

6 - 39
Rate Monotonic Scheduling (RM)
 Assumptions:
 Task priorities are assigned to tasks before execution and do not change over time
(static priority assignment).
 RM is intrinsically preemptive: the currently executing job is preempted by a job of
a task with higher priority.
 Deadlines equal the periods: Di = Ti.

Rate-Monotonic Scheduling Algorithm: Each task is assigned a priority. Tasks with


higher request rates (that is with shorter periods) will have higher priorities. Jobs of
tasks with higher priority interrupt jobs of tasks with lower priority.

6 - 40
Periodic Tasks
Example: 2 tasks, deadlines = periods, utilization = 97%

6 - 41
Rate Monotonic Scheduling (RM)
Optimality: RM is optimal among all fixed-priority assignments in the sense that
no other fixed-priority algorithm can schedule a task set that cannot be
scheduled by RM.

 The proof is done by considering several cases that may occur, but the main
ideas are as follows:
 A critical instant for any task occurs whenever the task is released
simultaneously with all higher priority tasks. The tasks' schedulability can easily
be checked at their critical instants. If all tasks are feasible at their critical
instant, then the task set is schedulable in any other condition.
 Show that, given two periodic tasks, if the schedule is feasible by an arbitrary
priority assignment, then it is also feasible by RM.
 Extend the result to a set of n periodic tasks.

6 - 42
Proof of Critical Instance
Definition: A critical instant of a task is the time at which the release of a job
will produce the largest response time.

Lemma: For any task, the critical instant occurs if a job is simultaneously
released with all higher priority jobs.

Proof sketch: Start with 2 tasks τ1 and τ2.

Response time of a job of τ2 is delayed by jobs of τ1 of higher priority:

[Figure: the job of τ2 is preempted by two jobs of τ1; its response time is C2 + 2·C1.]

6 - 43
Proof of Critical Instance
Delay may increase if τ1 starts earlier:

[Figure: with an earlier release of τ1, the job of τ2 is preempted by three jobs of τ1;
its response time grows to C2 + 3·C1.]

Maximum delay is achieved if τ2 and τ1 start simultaneously.

Repeating the argument for all higher priority tasks of some task τ2:
The worst case response time of a job occurs when it is released simultaneously
with all higher-priority jobs.

6 - 44
Proof of RM Optimality (2 Tasks)
We have two tasks τ1, τ2 with periods T1 < T2.
Define F = ⌊T2/T1⌋: the number of periods of τ1 fully contained in T2.

Consider two cases A and B:

Case A: Assume RM is not used ⇒ prio(τ2) is highest:

[Figure: τ2 executes first for C2, then τ1 for C1, both within the first period T1 of τ1.]

Schedule is feasible if   C1 + C2 ≤ T1   and   C2 ≤ T2      (A)

6 - 45
Proof of RM Optimality (2 Tasks)
Case B: Assume RM is used ⇒ prio(τ1) is highest:

[Figure: τ1 executes C1 at the beginning of each of its F periods inside T2; τ2 executes
in the remaining time; the last interval of length T2 − F·T1 may contain one further
(possibly partial) execution of τ1.]

The schedule is feasible if   F·C1 + C2 + min(T2 − F·T1, C1) ≤ T2   and   C1 ≤ T1      (B)

We need to show that (A) ⇒ (B):
C1 + C2 ≤ T1  ⇒  C1 ≤ T1
C1 + C2 ≤ T1  ⇒  F·C1 + C2 ≤ F·C1 + F·C2 ≤ F·T1  ⇒
F·C1 + C2 + min(T2 − F·T1, C1) ≤ F·T1 + min(T2 − F·T1, C1) ≤ min(T2, C1 + F·T1) ≤ T2

Given tasks τ1 and τ2 with T1 < T2, then if the schedule is feasible by an
arbitrary fixed priority assignment, it is also feasible by RM.
6 - 46
Admittance Test

6 - 51
Rate Monotonic Scheduling (RM)
Schedulability analysis: A set of periodic tasks is schedulable with RM if

      Σ_{i=1..n} Ci/Ti ≤ n·(2^{1/n} − 1)

This condition is sufficient but not necessary.

The term U = Σ_{i=1..n} Ci/Ti denotes the processor utilization factor U, which is
the fraction of processor time spent in the execution of the task set.

6 - 52
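The sufficient test can be coded in a few lines of C; pow() from math.h computes the bound n·(2^{1/n} − 1). The function name and array layout are illustrative.

#include <math.h>
#include <stdbool.h>

/* sufficient RM test: returns true if sum(Ci/Ti) <= n*(2^(1/n)-1) */
bool rm_sufficient(int n, const double C[], const double T[])
{
    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += C[i] / T[i];                       /* processor utilization factor */
    return U <= n * (pow(2.0, 1.0 / n) - 1.0);  /* Liu/Layland bound            */
}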
Proof of Utilization Bound (2 Tasks)
We have two tasks τ1, τ2 with periods T1 < T2.
Define F = ⌊T2/T1⌋: the number of periods of τ1 fully contained in T2.

Proof Concept: Compute upper bound on utilization U such that the task set is
still schedulable:
 assign priorities according to RM;
 compute upper bound Uup by increasing the computation time C2 to just
meet the deadline of τ2; we will determine this limit of C2 using the results
of the RM optimality proof.
 minimize upper bound with respect to other task parameters in order to
find the utilization below which the system is definitely schedulable.
6 - 53
Proof of Utilization Bound (2 Tasks)
As before:

[Figure: RM schedule of τ1 and τ2 within T2, with F full periods of τ1 and a
remaining interval of length T2 − F·T1.]

Schedulable if   F·C1 + C2 + min(T2 − F·T1, C1) ≤ T2   and   C1 ≤ T1

Utilization:

6 - 54
Proof of Utilization Bound (2 Tasks)

6 - 55
Proof of Utilization Bound (2 Tasks)
Minimize utilization bound w.r.t C1:
 If C1 ≤ T2 − F·T1 then U decreases with increasing C1
 If T2 − F·T1 ≤ C1 then U decreases with decreasing C1
 Therefore, the minimum U is obtained with C1 = T2 − F·T1:

We now need to minimize w.r.t. G = T2/T1 where F = ⌊T2/T1⌋ and T1 < T2. As F is
an integer, we first suppose that it is independent of G = T2/T1. Then we obtain

6 - 56
Proof of Utilization Bound (2 Tasks)
Minimizing U with respect to G yields G = T2/T1 = √2.

If we set F = 1, then we obtain the bound U = 2·(√2 − 1) ≈ 0.83.

It can easily be checked that all other integer values for F lead to a larger upper
bound on the utilization.

6 - 57
Deadline Monotonic Scheduling (DM)
 Assumptions are as in rate monotonic scheduling, but deadlines may be smaller
than the period, i.e.
      Ci ≤ Di ≤ Ti

Algorithm: Each task is assigned a priority. Tasks with smaller relative deadlines will
have higher priorities. Jobs with higher priority interrupt jobs with lower priority.

 Schedulability Analysis: A set of periodic tasks is schedulable with DM if

      Σ_{i=1..n} Ci/Di ≤ n·(2^{1/n} − 1)

This condition is sufficient but not necessary (in general).
6 - 58
Deadline Monotonic Scheduling (DM) - Example
      U = 0.874        Σ_{i=1..n} Ci/Di = 1.08  >  n·(2^{1/n} − 1) = 0.757

[Figure: Gantt chart of tasks τ1 … τ4 over the first 10 time units.]
6 - 59
Deadline Monotonic Scheduling (DM)
There is also a necessary and sufficient schedulability test which is computationally
more involved. It is based on the following observations:
 The worst-case processor demand occurs when all tasks are released
simultaneously; that is, at their critical instances.
 For each task i, the sum of its processing time and the interference imposed
by higher priority tasks must be less than or equal to Di .
 A measure of the worst case interference for task i can be computed as the
sum of the processing times of all higher priority tasks released before some
time t, where tasks are ordered according to m < n ⇒ Dm ≤ Dn:

      Ii = Σ_{j=1..i−1} ⌈t/Tj⌉ · Cj

6 - 60
Deadline Monotonic Scheduling (DM)
 The longest response time Ri of a job of a periodic task τi is computed, at the
critical instant, as the sum of its computation time and the interference due to
preemption by higher priority tasks:
      Ri = Ci + Ii

 Hence, the schedulability test needs to compute the smallest Ri that satisfies

      Ri = Ci + Σ_{j=1..i−1} ⌈Ri/Tj⌉ · Cj

for all tasks i. Then, Ri ≤ Di must hold for all tasks i.


 It can be shown that this condition is necessary and sufficient.

6 - 61
Deadline Monotonic Scheduling (DM)
The longest response times Ri of the periodic tasks i can be computed iteratively
by the following algorithm:

Algorithm: DM_guarantee (Γ)


{ for (each τi) {
      I = 0;
      do {
          R = I + Ci;
          if (R > Di) return(UNSCHEDULABLE);
          I = Σ_{j=1..i−1} ⌈R/Tj⌉ · Cj;
      } while (I + Ci > R);
  }
  return(SCHEDULABLE);
}

6 - 62
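The same iteration in C; tasks must be indexed by decreasing priority, i.e. ordered by non-decreasing relative deadline, and ceil() from math.h computes ⌈R/Tj⌉. Function and array names are illustrative.

#include <math.h>
#include <stdbool.h>

/* exact DM test: tasks 0..n-1 ordered by non-decreasing relative deadline D[] */
bool dm_guarantee(int n, const double C[], const double T[], const double D[])
{
    for (int i = 0; i < n; i++) {
        double I = 0.0, R;
        do {
            R = I + C[i];                           /* response time candidate            */
            if (R > D[i])
                return false;                       /* UNSCHEDULABLE                      */
            I = 0.0;                                /* interference by higher priorities  */
            for (int j = 0; j < i; j++)
                I += ceil(R / T[j]) * C[j];
        } while (I + C[i] > R);                     /* iterate until a fixed point        */
    }
    return true;                                    /* SCHEDULABLE                        */
}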
DM Example
Example:
 Task 1: C1 = 1; T1 = 4; D1 = 3
 Task 2: C2 = 1; T2 = 5; D2 = 4
 Task 3: C3 = 2; T3 = 6; D3 = 5
 Task 4: C4 = 1; T4 = 11; D4 = 10
 Algorithm for the schedulability test for task 4:
 Step 0: R4 = 1
 Step 1: R4 = 5
 Step 2: R4 = 6
 Step 3: R4 = 7
 Step 4: R4 = 9
 Step 5: R4 = 10

6 - 63
DM Example
      U = 0.874        Σ_{i=1..n} Ci/Di = 1.08  >  n·(2^{1/n} − 1) = 0.757

[Figure: Gantt chart of tasks τ1 … τ4 over the first 10 time units; all deadlines
are met, although the sufficient utilization test fails.]

6 - 64
EDF Scheduling (earliest deadline first)
 Assumptions:
 dynamic priority assignment
 intrinsically preemptive

 Algorithm: The currently executing task is preempted whenever another


periodic instance with earlier deadline becomes active.
      di,j = Φi + (j − 1)·Ti + Di

 Optimality: No other algorithm can schedule a set of periodic tasks that cannot
be scheduled by EDF.
 The proof is simple and follows that of the aperiodic case.

6 - 65
Periodic Tasks
Example: 2 tasks, deadlines = periods, utilization = 97%

6 - 66
EDF Scheduling
A necessary and sufficient schedulability test for Di = Ti:

A set of periodic tasks is schedulable with EDF if and only if

      Σ_{i=1..n} Ci/Ti = U ≤ 1

The term U = Σ_{i=1..n} Ci/Ti denotes the average processor utilization.

6 - 67
EDF Scheduling
 If the utilization satisfies U > 1, then there is no valid schedule: The total
demand of computation time in the interval T = T1·T2·…·Tn is

      Σ_{i=1..n} (Ci/Ti)·T = U·T > T

and therefore, it exceeds the available processor time in this interval.

 If the utilization satisfies U ≤ 1, then there is a valid schedule.

We will prove this fact by contradiction: Assume that a deadline is missed at some
time t2. Then we will show that the utilization was larger than 1.

6 - 68
6 - 69
EDF Scheduling
 If the deadline was missed at t2 then define t1 as a time before t2 such that (a) the processor is
continuously busy in [t1, t2 ] and (b) the processor only executes tasks that have their arrival
time AND their deadline in [t1, t2 ].
 Why does such a time t1 exist? We find such a t1 by starting at t2 and going backwards in time,
always ensuring that the processor only executed tasks that have their deadline before or at t2 :
 Because of EDF, the processor will be busy shortly before t2 and it executes on the task that has
deadline at t2.
 Suppose that we reach a time such that shortly before the processor works on a task with deadline
after t2 or the processor is idle, then we found t1: We know that there is no execution on a task with
deadline after t2 .
 But it could be in principle, that a task that arrived before t1 is executing in [t1, t2 ].
 If the processor is idle before t1, then this is clearly not possible due to EDF (the processor is not idle, if
there is a ready task).
 If the processor is not idle before t1, this is not possible as well. Due to EDF, the processor will always
work on the task with the closest deadline and therefore, once starting with a task with deadline after t2
all task with deadlines before t2 are finished.
6 - 70
6 - 71
EDF Scheduling
 Within the interval [t1, t2] the total computation time demanded by the periodic
tasks is bounded by

      Cp(t1, t2) = Σ_{i=1..n} ⌊(t2 − t1)/Ti⌋ · Ci ≤ Σ_{i=1..n} ((t2 − t1)/Ti) · Ci = (t2 − t1)·U

      (⌊(t2 − t1)/Ti⌋ is the number of complete periods of task i in the interval)

 Since the deadline at time t2 is missed, we must have:

      t2 − t1 < Cp(t1, t2) ≤ (t2 − t1)·U   ⇒   U > 1
6 - 72
Periodic Task Scheduling
Example: 2 tasks, deadlines = periods, utilization = 97%

6 - 73
Real-Time Scheduling of Mixed Task Sets

6 - 74
Problem of Mixed Task Sets
In many applications, there are aperiodic as well as periodic tasks.

 Periodic tasks: time-driven, execute critical control activities with hard timing
constraints aimed at guaranteeing regular activation rates.
 Aperiodic tasks: event-driven, may have hard, soft, non-real-time requirements
depending on the specific application.
 Sporadic tasks: Offline guarantee of event-driven aperiodic tasks with critical
timing constraints can be done only by making proper assumptions on the
environment; that is by assuming a maximum arrival rate for each critical event.
Aperiodic tasks characterized by a minimum interarrival time are called
sporadic.

6 - 75
Background Scheduling
Background scheduling is a simple solution for RM and EDF:
 Processing of aperiodic tasks in the background, i.e. execute if there are no
pending periodic requests.
 Periodic tasks are not affected.
 Response of aperiodic tasks may be prohibitively long and there is no possibility to
assign a higher priority to them.
 Example:

6 - 76
Background Scheduling
Example (rate monotonic periodic schedule):

6 - 77
Rate-Monotonic Polling Server
 Idea: Introduce an artificial periodic task whose purpose is to service aperiodic
requests as soon as possible (therefore, “server”).
 Function of polling server (PS)
 At regular intervals equal to Ts , a PS task is instantiated. When it has the highest
current priority, it serves any pending aperiodic requests within the limit of its
capacity Cs .
 If no aperiodic requests are pending, PS suspends itself until the beginning of the
next period and the time originally allocated for aperiodic service is not preserved
for aperiodic execution.
 Its priority (period!) can be chosen to match the response time requirement for
the aperiodic tasks.
 Disadvantage: If an aperiodic request arrives just after the server has
suspended, it must wait until the beginning of the next polling period.

6 - 78
Rate-Monotonic Polling Server
Example:

[Figure: schedule with a polling server; when the server has the current highest
priority it checks the queue of aperiodic tasks; if no request is pending, the
remaining budget is lost.]


6 - 79
Rate-Monotonic Polling Server
Schedulability analysis of periodic tasks:
 The interference by a server task is the same as the one introduced by an
equivalent periodic task in rate-monotonic fixed-priority scheduling.
 A set of periodic tasks and a server task can be executed within their deadlines if

      Cs/Ts + Σ_{i=1..n} Ci/Ti ≤ (n + 1)·(2^{1/(n+1)} − 1)

 Again, this test is sufficient but not necessary.

6 - 80
Rate-Monotonic Polling Server
Guarantee the response time of aperiodic requests:
 Assumption: An aperiodic task is finished before a new aperiodic request
arrives.
 Computation time Ca, deadline Da
 Sufficient schedulability test:

      (1 + ⌈Ca/Cs⌉)·Ts ≤ Da

(If the server task has the highest priority, there is a necessary test also.)

The “1 +” accounts for the aperiodic task arriving shortly after the activation
of the server task; ⌈Ca/Cs⌉ is the maximal number of necessary server periods.

6 - 81
EDF – Total Bandwidth Server
Total Bandwidth Server:
 When the kth aperiodic request arrives at time t = rk, it receives a deadline

      dk = max(rk, dk−1) + Ck/Us

where Ck is the execution time of the request and Us is the server utilization
factor (that is, its bandwidth). By definition, d0 = 0.

 Once a deadline is assigned, the request is inserted into the ready queue of
the system as any other periodic instance.

6 - 82
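A minimal C sketch of the deadline assignment; tbs_deadline() would be called once per arriving aperiodic request, and the server state is kept in a static variable only for brevity.

/* Total Bandwidth Server: assign a deadline to the k-th aperiodic request.
 * rk: arrival time, Ck: execution time, Us: server utilization (bandwidth). */
double tbs_deadline(double rk, double Ck, double Us)
{
    static double d_prev = 0.0;                 /* d0 = 0                        */
    double dk = (rk > d_prev ? rk : d_prev)     /* max(rk, d_{k-1})              */
                + Ck / Us;
    d_prev = dk;                                /* remember for the next request */
    return dk;                                  /* the request is then inserted
                                                   into the EDF ready queue with
                                                   deadline dk                   */
}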
6 - 83
EDF – Total Bandwidth Server
Example:
Up = 0.75,  Us = 0.25,  Up + Us = 1

6 - 84
EDF – Total Bandwidth Server
Schedulability test:
Given a set of n periodic tasks with processor utilization Up and a total bandwidth
server with utilization Us, the whole set is schedulable by EDF if and only if

      Up + Us ≤ 1

Proof:
 In each interval of time [t1, t2], if Cape is the total execution time demanded by
aperiodic requests arrived at t1 or later and served with deadlines less than or equal
to t2, then

      Cape ≤ (t2 − t1)·Us

6 - 85
EDF – Total Bandwidth Server
If this has been proven, the proof of the schedulability test follows closely that of the
periodic case.
Proof of lemma:

      Cape = Σ_{k=k1..k2} Ck
           = Us · Σ_{k=k1..k2} (dk − max(rk, dk−1))
           ≤ Us · (dk2 − max(rk1, dk1−1))
           ≤ Us · (t2 − t1)

6 - 86
Embedded Systems

6a. Example Network Processor

Lothar Thiele

6a - 1
Software-Based NP

Network Processor:
Programmable Processor Optimized to
Perform Packet Processing

How to Schedule the CPU cycles meaningfully?


 Differentiating the level of service given to different flows
 Each flow being processed by a different processing function

6a - 2
Our Model – Simple NP
[Figure: real-time flows (RT) and best effort flows (BE) enter a packet processor.]

Real-time flows have deadlines which must be met

Best effort flows may have several QoS classes and


should be served to achieve maximum throughput

6a - 3
Task Model
Packet processing
functions may be
represented by directed
acyclic graphs

End-to-end deadlines for


RT packets
security

voice processing

6a - 4
Architecture
[Figure: real-time flows and best effort flows arrive at the input ports and are
handled by packet processing functions F1, F2, F3, …, Fn; a CPU scheduler assigns
the CPU to these functions.]
6a - 5
CPU Scheduling
First Schedule RT, then BE (background scheduling)
 Overly pessimistic

Use EDF Total Bandwidth Server


 EDF for Real-Time tasks
 Use the remaining bandwidth to server Best Effort Traffic
 WFQ (weighted fair queuing) to determine which best effort
flow to serve; not discussed here …

6a - 6
CPU Scheduling
[Figure: as before, real-time flows are processed by the packet processing functions
F1 … Fn directly, while best effort flows first pass through WFQ before being served.]

6a - 7
CPU Scheduling
As discussed, the basis is the TBS deadline assignment

      dk = max(rk, dk−1) + Ck/Us

where Ck is the computation demand of the best effort packet, dk its deadline,
rk its arrival time, and Us the bandwidth left over by the utilization of the
real-time flows.

But: utilization depends on time (packet streams) !


 Just taking upper bound is too pessimistic
 Solution with time dependent utilization is (much) more
complex – BUT IT HELPS …

6a - 8
CPU Scheduling
Before:
[Figure: deadline of a best effort packet relative to the RT flows.]

6a - 9
CPU Scheduling
After:
[Figure: deadline of a best effort packet relative to the RT flows.]

6a - 10
Embedded Systems
7. Shared Resources

© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

7-2
Resource Sharing

7-3
Resource Sharing
 Examples of shared resources: data structures, variables, main memory area,
file, set of registers, I/O unit, … .
 Many shared resources do not allow simultaneous accesses but require mutual
exclusion. These resources are called exclusive resources. In this case, no two
threads are allowed to operate on the resource at the same time.

 There are several methods available to protect exclusive resources, for example
 disabling interrupts and preemption or
 using concepts like semaphores
and mutex that put threads into the
blocked state if necessary.

7-4
Protecting Exclusive Resources using Semaphores
 Each exclusive resource Ri
must be protected by a different
semaphore Si . Each critical
section operating on a resource
must begin with a wait(Si)
primitive and end with a
signal(Si) primitive.

 All tasks blocked on the same resource are kept in a queue associated with the
semaphore. When a running task executes a wait on a locked semaphore, it
enters a blocked state, until another tasks executes a signal primitive that
unlocks the semaphore.
7-5
Example FreeRTOS (ES-Lab)
To ensure that data consistency is maintained at all times, access to a resource that is
shared between tasks, or between tasks and interrupts, must be managed using a
‘mutual exclusion’ technique.

One possibility is to disable all interrupts:


...
taskENTER_CRITICAL();
... /* access to some exclusive resource */
taskEXIT_CRITICAL();
...

This kind of critical sections must be kept very short, otherwise they will adversely
affect interrupt response times.

7-6
Example FreeRTOS (ES-Lab)
Another possibility is to use mutual exclusion: In FreeRTOS, a mutex is a special type of
semaphore that is used to control access to a resource that is shared between two or
more tasks. A semaphore that is used for mutual exclusion must always be returned:

 When used in a mutual exclusion scenario, the mutex can be thought of as a


token that is associated with the resource being shared.

 For a task to access the resource legitimately, it must first successfully ‘take’
the token (be the token holder). When the token holder has finished with the
resource, it must ‘give’ the token back.

 Only when the token has been returned can another task successfully take the
token, and then safely access the same shared resource.

7-7
Example FreeRTOS (ES-Lab)

7-8
Example FreeRTOS (ES-Lab)
Example: create a mutex semaphore. portMAX_DELAY is a defined constant for an
infinite timeout; otherwise, xSemaphoreTake() would return if the mutex was not
available within the specified time.

SemaphoreHandle_t xMutex;

int main( void ) {
    xMutex = xSemaphoreCreateMutex();
    if( xMutex != NULL ) {
        xTaskCreate( vTask1, "Task1", 1000, NULL, 1, NULL );
        xTaskCreate( vTask2, "Task2", 1000, NULL, 2, NULL );
        vTaskStartScheduler();
    }
    for( ;; );
}

void vTask1( void *pvParameters ) {
    for( ;; ) {
        ...
        xSemaphoreTake( xMutex, portMAX_DELAY );
        ... /* access to exclusive resource */
        xSemaphoreGive( xMutex );
        ...
    }
}

void vTask2( void *pvParameters ) {
    for( ;; ) {
        ...
        xSemaphoreTake( xMutex, portMAX_DELAY );
        ... /* access to exclusive resource */
        xSemaphoreGive( xMutex );
        ...
    }
}
7-9
Resource Sharing
Priority Inversion

7 - 10
Priority Inversion (1)
Unavoidable blocking:

7 - 11
Priority Inversion (2)
Priority Inversion:

can last arbitrarily long

[But97, S.184]

7 - 12
Solutions to Priority Inversion
Disallow preemption during the execution of all critical sections. Simple approach,
but it creates unnecessary blocking as unrelated tasks may be blocked.

7 - 13
Resource Access Protocols
Basic idea: Modify the priority of those tasks that cause blocking. When a task Ji
blocks one or more higher priority tasks, it temporarily assumes a higher priority.

Specific Methods:
 Priority Inheritance Protocol (PIP), for static priorities
 Priority Ceiling Protocol (PCP), for static priorities
 Stack Resource Policy (SRP),
for static and dynamic priorities
 others …

7 - 14
Priority Inheritance Protocol (PIP)
Assumptions:
n tasks which cooperate through m shared resources; fixed priorities, all
critical sections on a resource begin with a wait(Si) and end with a
signal(Si) operation.

Basic idea:
When a task Ji blocks one or more higher priority tasks, it temporarily assumes
(inherits) the highest priority of the blocked tasks.

Terms:
We distinguish a fixed nominal priority Pi and an active priority pi larger or
equal to Pi. Jobs J1, …Jn are ordered with respect to nominal priority where J1
has highest priority. Jobs do not suspend themselves.
7 - 15
Priority Inheritance Protocol (PIP)
Algorithm:
 Jobs are scheduled based on their active priorities. Jobs with the same priority are
executed in a FCFS discipline.
 When a job Ji tries to enter a critical section and the resource is blocked by a lower
priority job, the job Ji is blocked. Otherwise it enters the critical section.
 When a job Ji is blocked, it transmits its active priority to the job Jk that holds the
semaphore. Jk resumes and executes the rest of its critical section with a priority
pk=pi (it inherits the priority of the highest priority of the jobs blocked by it).
 When Jk exits a critical section, it unlocks the semaphore and the highest priority
job blocked on that semaphore is awakened. If no other jobs are blocked by Jk,
then pk is set to Pk, otherwise it is set to the highest priority of the jobs blocked by
Jk.
 Priority inheritance is transitive, i.e., if J1 is blocked by J2 and J2 is blocked by J3, then
J3 inherits the priority of J1 via J2.
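A minimal, compilable C sketch of the inheritance rule above for a single resource
(the data structures and function names are illustrative, not a FreeRTOS or VxWorks API;
waking the highest-priority blocked job and transitive inheritance are omitted). As in the
lecture's numbering, a smaller number means a higher priority:

#include <stdio.h>

typedef struct { int nominal; int active; const char *name; } Task;
typedef struct { Task *holder; } Mutex;

/* wait(S): try to enter the critical section protected by m */
static void pip_wait(Mutex *m, Task *t) {
    if (m->holder == NULL) {
        m->holder = t;                      /* resource free: enter the critical section */
    } else if (t->active < m->holder->active) {
        m->holder->active = t->active;      /* t blocks, holder inherits t's priority     */
        printf("%s inherits priority %d from %s\n",
               m->holder->name, t->active, t->name);
    }
}

/* signal(S): leave the critical section */
static void pip_signal(Mutex *m) {
    m->holder->active = m->holder->nominal; /* back to nominal priority (no jobs blocked) */
    m->holder = NULL;                       /* highest-priority blocked job would be woken */
}

int main(void) {
    Task j1 = {1, 1, "J1"}, j3 = {3, 3, "J3"};
    Mutex s = { NULL };
    pip_wait(&s, &j3);    /* low-priority J3 enters the critical section         */
    pip_wait(&s, &j1);    /* high-priority J1 blocks, J3 inherits priority 1     */
    pip_signal(&s);       /* J3 leaves and drops back to its nominal priority 3  */
    printf("J3 active priority after signal: %d\n", j3.active);
    return 0;
}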

7 - 16
Priority Inheritance Protocol (PIP)
Example:

Direct blocking: a higher-priority job tries to acquire a resource held by a lower-priority job.
Push-through blocking: a medium-priority job is blocked by a lower-priority job that has
inherited a higher priority from a job it directly blocks.


7 - 17
Priority Inheritance Protocol (PIP)
Example with nested critical sections:
priority does not change

a a a

[But97, S. 189]

7 - 18
Priority Inheritance Protocol (PIP)
Example of transitive priority inheritance:
J1 blocked by J2, J2 blocked by J3.
J3 inherits priority from J1 via J2.

[But97, S. 190]

7 - 19
Priority Inheritance Protocol (PIP)
Still a Problem: Deadlock
…. but there are other protocols like the Priority Ceiling Protocol …

[But97, S. 200]

7 - 20
The MARS Pathfinder Problem (1)
“But a few days into the mission, not long after Pathfinder started gathering
meteorological data, the spacecraft began experiencing total system resets, each
resulting in losses of data.

7 - 21
The MARS Pathfinder Problem (2)
“VxWorks provides preemptive priority scheduling of threads. Tasks on the
Pathfinder spacecraft were executed as threads with priorities that were assigned
in the usual manner reflecting the relative urgency of these tasks.”

“Pathfinder contained an "information bus", which you can think of as a shared


memory area used for passing information between different components of the
spacecraft.”

 A bus management task ran frequently with high priority to move certain kinds of
data in and out of the information bus. Access to the bus was synchronized with
mutual exclusion locks (mutexes).”

7 - 22
The MARS Pathfinder Problem (3)
 The meteorological data gathering task ran as an infrequent, low priority thread.
When publishing its data, it would acquire a mutex, do writes to the bus, and release
the mutex.
 The spacecraft also contained a communications task that ran with medium priority.

High priority: retrieval of data from shared memory


Medium priority: communications task
Low priority: thread collecting meteorological data

7 - 23
The MARS Pathfinder Problem (4)
“Most of the time this combination worked fine.

However, very infrequently it was possible for an interrupt to occur that caused the
(medium priority) communications task to be scheduled during the short interval
while the (high priority) information bus thread was blocked waiting for the (low
priority) meteorological data thread. In this case, the long-running communications
task, having higher priority than the meteorological task, would prevent it from
running, consequently preventing the blocked information bus task from running.

After some time had passed, a watchdog timer would go off, notice that the data
bus task had not been executed for some time, conclude that something had gone
drastically wrong, and initiate a total system reset. This scenario is a classic case of
priority inversion.”
7 - 24
Priority Inversion on Mars
Priority inheritance also solved the Mars Pathfinder problem: the VxWorks
operating system used in the Pathfinder implements a flag for the calls to mutex
primitives. This flag allows priority inheritance to be set to “on”. When the
software was shipped, it was set to “off”.

The problem on Mars was corrected


by using the debugging facilities of
VxWorks to change the flag to “on”,
while the Pathfinder was already on
the Mars [Jones, 1997].

7 - 25
Timing Anomalies

7 - 26
Timing Anomaly
Suppose, a real-time system works correctly with a given processor architecture.
Now, you replace the processor with a faster one.
Are real-time constraints still satisfied?

Unfortunately, this is not true in general. Monotonicity does not hold in general,
i.e., making a part of the system operate faster does not lead to a faster system
execution. In other words, many software and systems architectures are fragile.

There are usually many timing anomalies in a system, starting from the
microarchitecture (caches, pipelines, speculation) via single processor scheduling
to multiprocessor scheduling.

7 - 27
Single Processor with Critical Sections
Example: Replacing the
processor with one
that is twice as fast
leads to a deadline
miss.

7 - 28
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture

7 - 29
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture

slower on a
4-processor
architecture!

7 - 30
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture

slower if all
computation
times are
reduced by 1!

7 - 31
Multiprocessor Example (Richard’s Anomalies)
Example: 9 tasks with precedence constraints and the shown execution times. Scheduling
is preemptive fixed priority, where lower numbered tasks have higher priority than higher
numbers. Assignment of tasks to processors is greedy.
optimal
schedule on a
3-processor
architecture

slower if
some
precedences
are removed!

7 - 32
Communication and Synchronization

7 - 33
Communication Between Tasks
Problem: the use of shared memory for implementing communication between
tasks may cause priority inversion and blocking.

Therefore, either the implementation of the shared medium is “thread safe” or


the data exchange must be protected by critical sections.

7 - 34
Communication Mechanisms
Synchronous communication:
 Whenever two tasks want to communicate they must be synchronized for a
message transfer to take place (rendez-vous).
 They have to wait for each other, i.e. both must be at the same time ready to do
the data exchange.

 Problem:
 In case of dynamic real-time systems, estimating the maximum blocking time
for a process rendez-vous is difficult.
 Communication always needs synchronization. Therefore, the timing of the
communication partners is closely linked.

7 - 35
Communication Mechanisms
Asynchronous communication:
 Tasks do not necessarily have to wait for each other.
 The sender just deposits its message into a channel and continues its execution;
similarly the receiver can directly access the message if at least a message has
been deposited into the channel.
 More suited for real-time systems than synchronous communication.
 Mailbox: Shared memory buffer, FIFO-queue, basic operations are send and
receive, usually has a fixed capacity.
 Problem: Blocking behavior if the channel is full or empty; alternative approach is
provided by cyclical asynchronous buffers or double buffering.

sender receiver
mailbox
7 - 36
Example: FreeRTOS (ES-Lab)

7 - 37
Example: FreeRTOS (ES-Lab)
Creating a queue (xQueueCreate()): the parameters are the maximum number of items
that the queue being created can hold at any one time and the size in bytes of each
data item; the function returns a handle to the created queue.

Sending an item to a queue (e.g. xQueueSendToBack()): the parameters are the queue
handle, a pointer to the data to be copied into the queue, and the maximum amount of
time the task should remain in the Blocked state to wait for space to become available
on the queue; the function returns pdPASS if the item was successfully added to the
queue.

7 - 38
Example: FreeRTOS (ES-Lab)
Receiving an item from a queue (xQueueReceive()): the parameters are the queue
handle, a pointer to the memory into which the received data will be copied, and the
maximum amount of time the task should remain in the Blocked state to wait for data
to become available on the queue; the function returns pdPASS if data was successfully
read from the queue.

Example:
 Two sending tasks with equal priority 1 and one receiving task with priority 2.
 FreeRTOS schedules tasks with equal priority in a round-robin manner: a blocked
or preempted task is put at the end of the ready queue for its priority. The same
holds for the currently running task when its time slice expires.

7 - 39
Example: FreeRTOS (ES-Lab)
Example cont.:

[Figure: sender 1 and sender 2 put items into the queue; the receiver takes them out.]
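A minimal sketch of this two-sender/one-receiver setup using the FreeRTOS queue API
(task names, stack sizes and the values sent are illustrative; error handling is omitted):

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

QueueHandle_t xQueue;

void vSenderTask( void *pvParameters ) {
    int32_t lValueToSend = ( int32_t ) pvParameters;   /* value identifies the sender */
    for( ;; ) {
        /* block for up to 100 ms if the queue is full */
        xQueueSendToBack( xQueue, &lValueToSend, pdMS_TO_TICKS( 100 ) );
    }
}

void vReceiverTask( void *pvParameters ) {
    int32_t lReceivedValue;
    for( ;; ) {
        /* block until an item is available */
        if( xQueueReceive( xQueue, &lReceivedValue, portMAX_DELAY ) == pdPASS ) {
            /* ... process lReceivedValue ... */
        }
    }
}

int main( void ) {
    xQueue = xQueueCreate( 5, sizeof( int32_t ) );     /* 5 items of 4 bytes each */
    if( xQueue != NULL ) {
        xTaskCreate( vSenderTask,   "Sender1",  1000, ( void * ) 100, 1, NULL );
        xTaskCreate( vSenderTask,   "Sender2",  1000, ( void * ) 200, 1, NULL );
        xTaskCreate( vReceiverTask, "Receiver", 1000, NULL,           2, NULL );
        vTaskStartScheduler();
    }
    for( ;; );
}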

7 - 40
Communication Mechanisms
Cyclical Asynchronous Buffers (CAB):
 Non-blocking communication between tasks.
 A reader gets the most recent message put into the CAB. A message is not
consumed (that is, extracted) by a receiving process but is maintained until
overwritten by a new message.
 As a consequence, once the first message has been put in a CAB, a task can never
be blocked during a receive operation. Similarly, since a new message overwrites
the old one, a sender can never be blocked.
 Several readers can simultaneously read a single message from the CAB.

[Figure: tasks writing to and reading from the CAB]
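A minimal C sketch of the CAB idea using two internal buffers (illustrative only; a real
CAB additionally uses per-buffer use counters or more buffers so that a reader is never
overtaken by the writer while it is still copying):

#include <string.h>

#define MSG_SIZE 16

typedef struct {
    char buffer[2][MSG_SIZE];   /* two internal message buffers              */
    volatile int most_recent;   /* index of the most recently written buffer */
} CAB;

/* writer: never blocks, overwrites the older message */
void cab_put(CAB *cab, const char *msg) {
    int next = 1 - cab->most_recent;            /* write into the buffer not in use */
    memcpy(cab->buffer[next], msg, MSG_SIZE);
    cab->most_recent = next;                    /* publish the new message          */
}

/* reader: never blocks, always obtains the most recent message */
void cab_get(CAB *cab, char *dst) {
    memcpy(dst, cab->buffer[cab->most_recent], MSG_SIZE);
}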

7 - 41
Embedded Systems
8. Hardware Components

© Lothar Thiele
Computer Engineering and Networks Laboratory
Where we are …

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

8-2
Do you Remember ?

8-3
8-4
High-Level Physical View

8-5
High-Level Physical View

8-6
Implementation Alternatives

General-purpose processors

Application-specific instruction set processors


(ASIPs)
• Microcontroller
Performance
• DSPs (digital signal processors)
Energy Efficiency Flexibility

Programmable hardware
• FPGA (field-programmable gate arrays)

Application-specific integrated circuits (ASICs)

8-7
Energy Efficiency

© Hugo De Man, IMEC,


Philips, 2007

8-8
Topics
 General Purpose Processors
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 System-on-Chip

8-9
General-Purpose Processors
 High performance
 Highly optimized circuits and technology
 Use of parallelism
 superscalar: dynamic scheduling of instructions
 super-pipelining: instruction pipelining, branch prediction, speculation
 complex memory hierarchy
 Not suited for real-time applications
 Execution times are highly unpredictable because of intensive resource sharing
and dynamic decisions
 Properties
 Good average performance for large application mix
 High power consumption

8 - 10
General-Purpose Processors
 Multicore Processors
 Potential of providing higher execution performance by exploiting parallelism

 Especially useful in high-performance embedded systems, e.g. autonomous driving

 Disadvantages and problems for embedded systems:


 Increased interference on shared resources such as buses and shared caches
 Increased timing uncertainty

8 - 11
Multicore Examples

48 cores

4 cores

8 - 12
Multicore Examples

Intel Xeon Phi


(5 Billion transistors, Oracle Sparc T5
22nm technology,
350mm2 area)

8 - 13
Implementation Alternatives

General-purpose processors

Application-specific instruction set processors


(ASIPs)
• Microcontroller
Performance
• DSPs (digital signal processors)
Energy Efficiency Flexibility

Programmable hardware
• FPGA (field-programmable gate arrays)

Application-specific integrated circuits (ASICs)

8 - 14
Topics
 General Purpose Processors
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 Heterogeneous Architectures

8 - 15
System Specialization
 The main difference between general purpose highest volume microprocessors
and embedded systems is specialization.

 Specialization should respect flexibility


 application domain specific systems shall cover a class of applications
 some flexibility is required to account for late changes, debugging

 System analysis required


 identification of application properties which can be used for specialization
 quantification of individual specialization effects

8 - 16
Embedded Multicore Example
Recent development:
 Specialize multicore processors towards real-time processing and low power
consumption
 Target domains:

8 - 17
Example: Code-size Efficiency
 RISC (Reduced Instruction Set Computer) machines are designed for run-time efficiency,
not for code-size efficiency.
 Compression techniques: key idea

(de)compressor

8 - 18
Example: Multimedia-Instructions
• Multimedia instructions exploit that many registers, adders etc. are
quite wide (32/64 bit), whereas most multimedia data types are
narrow (e.g. 8 bit per color, 16 bit per audio sample per channel).
• Idea: Several values can be stored per register and added in parallel.

+
4 additions per instruction; carry
disabled at word boundaries.
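A small C illustration of the packed-add idea (a plain-C emulation of what a single
multimedia/SIMD instruction does in hardware; the function name is ours):

#include <stdint.h>
#include <stdio.h>

/* add four 8-bit lanes packed into a 32-bit word; carries do not cross lane borders */
uint32_t packed_add_u8x4(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint8_t x = (a >> (8 * lane)) & 0xFF;
        uint8_t y = (b >> (8 * lane)) & 0xFF;
        result |= (uint32_t)((uint8_t)(x + y)) << (8 * lane);   /* wrap-around per lane */
    }
    return result;
}

int main(void) {
    printf("%08x\n", (unsigned) packed_add_u8x4(0x01020304u, 0x10203040u));  /* 11223344 */
    return 0;
}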

8 - 19
Example: Heterogeneous Processor Registers
Example (ADSP 210x):
[Figure: ADSP-210x data path with program memory P and data memory D, separate
register sets AX, AY, AF, AR for the ALU and MX, MY, MF, MR for the multiplier, and an
address generation unit (AGU) with address registers A0, A1, A2, …]

Different functionality of registers AR, AX, AY, AF, MX, MY, MF, MR

8 - 20
Example: Multiple Memory Banks
[Figure: the same ADSP-210x data path; program memory P and data memory D form
two separate memory banks]

Enables parallel fetches for some operations

8 - 21
Example: Address Generation Units
• Data memory can only be fetched with
Example (ADSP 210x): address contained in register file A, but
its update can be done in parallel with
operation in main data path (takes
effectively 0 time).
• Register file A contains several
precomputed addresses A[i].
• There is another register file M that
contains modification values M[j].

• Possible updates:
M[j] := ‘immediate’
A[i] := A[i] ± M[j]
A[i] := A[i] ± 1
A[i] := A[i] ± ‘immediate’
A[i] := ‘immediate’

8 - 22
Topics
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 Heterogeneous Architectures

8 - 23
Microcontroller
 Control-dominant applications
 supports process scheduling
and synchronization
 preemption (interrupt),
context switch
 short latency times

 Low power consumption

 Peripheral units often integrated 8051 core


SIECO51 (Siemens)

 Suited for real-time applications

•Major System Components 8 - 24


Microcontroller as a System-on-Chip
 complete system

 timers

 I2C-bus and par./ser.


interfaces for communi-
cation

 A/D converter

 watchdog (SW activity


timeout): safety

 on-chip memory (volatile/non-volatile)

 interrupt controller

MSP430 RISC Processor (Texas Instruments)

8 - 25
Topics
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 Heterogeneous Architectures

8 - 26
Data Dominated Systems
 Streaming oriented systems with mostly periodic behavior
 Underlying model of computation is often a signal flow graph or data flow graph:

B f1 B f2 B f3 B

B: buffer
B f2

 Typical application examples:


 signal processing
 multimedia processing
 automatic control

8 - 27
Digital Signal Processor
 optimized for data-flow applications
 suited for simple control flow
 parallel hardware units (VLIW)
 specialized instruction set
 high data throughput
 zero-overhead loops
 specialized memory

 suited for real-time applications

•Major System Components 8 - 28


Very Long Instruction Word (VLIW)
Key idea: detection of possible parallelism to be done by compiler, not
by hardware at run-time (inefficient).
VLIW: parallel operations (instructions) encoded in one long word
(instruction packet), each instruction controlling one functional unit.

8 - 29
Explicit Parallelism Instruction Computers (EPIC)
The TMS320C62xx VLIW Processor as an example of EPIC:

[Figure: fetch packet of 32-bit instructions A … G, each carrying a parallelism bit
(here 0 1 1 0 1 1 0) that indicates whether the following instruction is executed in the
same cycle]

Cycle  Instructions
1      A
2      B C D
3      E F G

8 - 30
Example Infineon
Processor core for car mirrors
Infineon

8 - 31
Example NXP Trimedia VLIW

VLIW
MIPS

8 - 32
Topics
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 System-on-Chip

8 - 33
FPGA – Basic Structure
 Logic Units
 I/O Units
 Connections

8 - 34
Floor-plan of VIRTEX II FPGAs

8 - 35
Virtex Logic
Cell

[© and source: Xilinx Inc.: Virtex-II


Pro™ Platform FPGAs: Functional
Description, Sept. 2002,
//www.xilinx.com]

8 - 36
Example Virtex-6
 Combination of flexibility (CLBs), integration and performance (heterogeneity of
hard IP blocks)

clock distribution
logic (CLB)

interfaces
(PCI, high speed)

memory (RAM)

DSP slice fast communication

8 - 37
XILINX Virtex UltraScale

Virtex-6 CLB Slice


8 - 38
Topics
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 Heterogeneous Architectures

8 - 39
Application Specific Integrated Circuits (ASICs)
Custom-designed circuits are necessary
 if ultimate speed or
 energy efficiency is the goal and
 large numbers can be sold.
Approach suffers from
 long design times,
 lack of flexibility
(changing standards) and
 high costs
(e.g. Mill. $ mask costs).

8 - 40
Topics
 System Specialization
 Application Specific Instruction Sets
 Micro Controller
 Digital Signal Processors and VLIW
 Programmable Hardware
 ASICs
 Heterogeneous Architectures

8 - 41
Example: Heterogeneous Architecture

Samsung Galaxy Note II


– Exynos 4412 System on a Chip (SoC)
– ARM Cortex-A9 processing core
– 32 nanometer: transistor gate width
– Four processing cores

8 - 42
Example: Heterogeneous Architecture
Hexagon DSP Snapdragon 835
(Galaxy S8)

8 - 43
Example: ARM big.LITTLE Architecture

8 - 44
Embedded Systems
9. Power and Energy

© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Overview

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

9-2
General Remarks

9-3
Power and Energy Consumption
 Statements that have been true for a decade or longer:
„Power is considered as the most important constraint in embedded
systems.” [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
•“Power demands are increasing rapidly, yet battery capacity cannot
keep up.” [in Diztel et al.: Power-Aware Architecting for data-dominated applications, 2007, Springer]

 Main reasons are:


 power provisioning is expensive
 battery capacity is growing only slowly
 devices may overheat
 energy harvesting (e.g. from solar cells) is limited due to the relatively low available
energy density

9-4
Some Trends

9-5
Implementation Alternatives

General-purpose processors

Application-specific instruction set processors (ASIPs)


Microcontroller
•Performance
DSPs (digital signal processors)
•Power Efficiency •Flexibility

Programmable hardware

FPGA (field-programmable gate arrays)

Application-specific integrated circuits (ASICs)

9-6
Energy Efficiency
 It is necessary to
optimize HW and SW.
 Use heterogeneous
architectures in order to
adapt to required performance
and to class of application.
 Apply specialization techniques.

•© Hugo De Man,
IMEC, Philips, 2007
9-7
Power and Energy

9-8
Power and Energy

•P

•t

In some cases, faster execution also means less energy, but


the opposite may be true if power has to be increased to allow
for a faster execution.
9-9
Power and Energy

•P

•t

In some cases, faster execution also means less energy, but


the opposite may be true if power has to be increased to allow
for a faster execution.
9 - 10
Power and Energy

•P

•t

In some cases, faster execution also means less energy, but


the opposite may be true if power has to be increased to allow
for a faster execution.
9 - 11
Power and Energy

•P

•t

In some cases, faster execution also means less energy, but


the opposite may be true if power has to be increased to allow
for a faster execution.
9 - 12
Low Power vs. Low Energy
 Minimizing the power consumption (voltage * current) is important for
 the design of the power supply and voltage regulators
 the dimensioning of interconnect between power supply and components
 cooling (short term cooling)
 high cost
 limited space
 Minimizing the energy consumption is important due to
 restricted availability of energy (mobile systems)
 limited battery capacities (only slowly improving)
 very high costs of energy (energy harvesting, solar panels, maintenance/batteries)
 long lifetimes, low temperatures

9 - 13
Power Consumption of a CMOS Gate
subthreshold (ISUB), junction (IJUNC) and
gate-oxide (IGATE) leakage

IJUNC

•Ileak : leakage current


•Iint : short circuit current
•Isw : switching current

9 - 14
Power Consumption of a CMOS Processors
Main sources:
 Dynamic power consumption
 charging and discharging capacitors
 Short circuit power consumption:
short circuit path between supply rails
during switching
 Leakage and static power
 gate-oxide/subthreshold/junction
leakage
 becomes one of the major factors
due to shrinking feature sizes in [J. Xue, T. Li, Y. Deng, Z. Yu, Full-chip leakage analysis for 65 nm CMOS
technology and beyond, Integration VLSI J. 43 (4) (2010) 353–364]
semiconductor technology

9 - 15
Reducing Static Power - Power Supply Gating
Power gating is one of the most effective ways of minimizing static power consumption
(leakage)
 Cut-off power supply to inactive units/components

9 - 16
Dynamic Voltage Scaling (DVS)
Average power consumption of CMOS •Delay of CMOS circuits:
circuits (ignoring leakage):

•: supply voltage
•: supply voltage
•: switching activity
•: threshold voltage
•: load capacity
•: clock frequency

Decreasing Vdd reduces P quadratically (f constant).


The gate delay increases reciprocally with decreasing Vdd .
Maximal frequency fmax decreases linearly with decreasing Vdd .
9 - 17
Dynamic Voltage Scaling (DVS)

Saving energy for a given task:


– reduce the supply voltage Vdd
– reduce switching activity α
– reduce the load capacitance CL
– reduce the number of cycles #cycles

9 - 18
Techniques to Reduce Dynamic Power

9 - 19
Parallelism

Vdd Vdd/2 Vdd/2


fmax fmax/2 fmax/2

9 - 20
Pipelining

Vdd Vdd/2
fmax fmax/2

Vdd/2
fmax/2

9 - 21
VLIW (Very Long Instruction Word) Architectures
 Large degree of parallelism
 many parallel computational units, (deeply) pipelined
 Simple hardware architecture
 explicit parallelism (parallel instruction set)
 parallelization is done offline (compiler) all 4 instructions are
executed in parallel

9 - 22
Example: Qualcomm Hexagon
•Hexagon DSP •Snapdragon 835
(Galaxy S8)

9 - 23
Dynamic Voltage and Frequency Scaling -
Optimization

9 - 24
Dynamic Voltage and Frequency Scaling (DVFS)
energy per cycle: reduce voltage -> reduce energy per task
maximum gate delay / frequency of operation: reduce voltage -> reduce clock frequency

Saving energy for a given task:
– reduce the supply voltage Vdd
– reduce switching activity α
– reduce the load capacitance CL
– reduce the number of cycles #cycles
9 - 25
Example DVFS: Samsung Exynos (ARM processor)
ARM processor core A53 on the Samsung Exynos 7420 (used in
mobile phones, e.g. Galaxy S6)

9 - 26
Example: Dynamic Voltage and Frequency Scaling

•[Courtesy, Yasuura, 2000] •Vdd

9 - 27
Example: DVFS – Complete Task as Early as Possible

We suppose a task that needs 10^9 cycles to execute within 25 seconds.

Ea = 10^9 x 40 x 10^-9 J = 40 J

9 - 28
Example: DVFS – Use Two Voltages

Eb = 750 x 10^6 x 40 x 10^-9 J + 250 x 10^6 x 10 x 10^-9 J = 32.5 J

9 - 29
Example: DVFS – Use One Voltage

Ec = 10^9 x 25 x 10^-9 J = 25 J

9 - 30
DVFS: Optimal Strategy
•Vdd •P(y) Execute task in fixed time T
•y
with variable voltage Vdd(t):
•P(x)
•x •gate delay:

•T∙a •T •t •execution rate:

•invariant:

 case A: execute at voltage x for T ∙ a time units and at


voltage y for (1-a) ∙ T time units;
energy consumption: T ∙ ( P(x) ∙ a + P(y) ∙ (1-a) )

9 - 31
DVFS: Optimal Strategy
•Vdd •P(y) Execute task in fixed time T
•y •P(z)
•z with variable voltage Vdd(t):
•P(x)
•x •gate delay:

•T∙a •T •t •execution rate:

•invariant:

 case A: execute at voltage x for T ∙ a time units and at


voltage y for (1-a) ∙ T time units;
energy consumption: T ∙ ( P(x) ∙ a + P(y) ∙ (1-a) )

 case B: execute at voltage z = a ∙ x + (1-a) ∙ y for T time units;


energy consumption: T ∙ P(z)
9 - 32
DVFS: Optimal Strategy
•Vdd •P(y) Execute task in fixed time T
•y •P(z)
•z with variable voltage Vdd(t):
•P(x)
•x •gate delay:

•T∙a •T •t •execution rate:

•z ∙ T = a ∙ T∙ x + (1-a) ∙ T ∙ y •invariant:
z = a ∙ x + (1-a) ∙ y
 case A: execute at voltage x for T ∙ a time units and at
voltage y for (1-a) ∙ T time units;
energy consumption: T ∙ ( P(x) ∙ a + P(y) ∙ (1-a) )

 case B: execute at voltage z = a ∙ x + (1-a) ∙ y for T time units;


energy consumption: T ∙ P(z)
9 - 33
DVFS: Optimal Strategy
Assumption: Dynamic power
is a convex function of Vdd
•P(y)
•P(x) ∙ a + P(y) ∙ (1-a)

•P(x)
•average

•P(z)

If possible, running at a constant frequency (voltage) minimizes the energy


consumption for dynamic voltage scaling:
case A is always worse if the power consumption is a convex function of the
supply voltage

9 - 34
DVFS: Real-Time Offline Scheduling on One Processor
 Let us model a set of independent tasks as follows:
 We suppose that a task vi ϵ V
 requires ci computation time at normalized processor frequency 1
 arrives at time ai
 has (absolute) deadline constraint di

 How do we schedule these tasks such that all these tasks can be finished no
later than their deadlines and the energy consumption is minimized?
 YDS Algorithm from “A Scheduling Model for Reduced CPU Energy”, Frances
Yao, Alan Demers, and Scott Shenker, FOCS 1995.

If possible, running at a constant frequency (voltage) minimizes


the energy consumption for dynamic voltage scaling.
9 - 35
YDS Optimal DVFS Algorithm for Offline Scheduling
[Figure: tasks v1 … v7 plotted over time 0 … 17 with their arrival times and deadlines]

Task parameters (ai, di, ci): v1 = (3,6,5), v2 = (2,6,3), v3 = (0,8,2), v4 = (6,14,6),
v5 = (10,14,6), v6 = (11,17,2), v7 = (12,17,2)

 Define intensity G([z, z']) in some time interval [z, z']:
 the average accumulated execution time of all tasks that have arrival and deadline
 in [z, z'], relative to the length of the interval z'-z, i.e.
 G([z, z']) = ( sum of ci over all tasks with z <= ai and di <= z' ) / (z' - z)
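A small C sketch that computes this intensity for the example task set (the data layout
and function name are ours):

#include <stdio.h>

typedef struct { double a, d, c; } Task;    /* arrival, deadline, computation time */

double intensity(const Task *t, int n, double z, double zp) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if (t[i].a >= z && t[i].d <= zp)    /* arrival and deadline inside [z, z'] */
            sum += t[i].c;
    return sum / (zp - z);
}

int main(void) {
    Task v[] = { {3,6,5}, {2,6,3}, {0,8,2}, {6,14,6}, {10,14,6}, {11,17,2}, {12,17,2} };
    printf("G([2,6])  = %.3f\n", intensity(v, 7, 2, 6));    /* 2.000, the critical interval */
    printf("G([0,17]) = %.3f\n", intensity(v, 7, 0, 17));   /* 26/17 = 1.529 */
    return 0;
}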

9 - 36
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.

1 5 3,6,5
2 6 2,6,3
4
3 7 0,8,2

0 4 8 12 16 •time 6,14,6

G([0,6]) = (5+3)/6=8/6, G([0,8]) = (5+3+2)/ (8-0) = 10/8, 10,14,6


G([0,14]) = (5+3+2+6+6)/14=11/7, G([0,17]) = (5+3+2+6+6+2+2)/17=26/17 11,17,2
G([2, 6]) = (5+3)/(6-2)=2, G([2,14]) = (5+3+6+6) / (14-2) = 5/3,
12,17,2
G([2,17]) = (5+3+6+6+2+2)/15=24/15
G([3,6]) =5/3, G([3,14]) = (5+6+6)/(14-3) = 17/11, G([3,17])=(5+6+6+2+2)/14=21/14
ai,di,ci
G([6,14]) = 12/(14-6)=12/8, G([6,17]) = (6+6+2+2)/(17-6)=16/11

G([10,14]) = 6/4, G([10,17]) = 10/7, G([11,17]) = 4/6, G([12,17]) = 2/5

9 - 37
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.

1 5 3,6,5
2 6 2,6,3
4
3 7 0,8,2

0 4 8 12 16 •time 6,14,6

G([0,6]) = (5+3)/6=8/6, G([0,8]) = (5+3+2)/ (8-0) = 10/8, 10,14,6


G([0,14]) = (5+3+2+6+6)/14=11/7, G([0,17]) = (5+3+2+6+6+2+2)/17=26/17 11,17,2
G([2, 6]) = (5+3)/(6-2)=2, G([2,14]) = (5+3+6+6) / (14-2) = 5/3,
12,17,2
G([2,17]) = (5+3+6+6+2+2)/15=24/15
G([3,6]) =5/3, G([3,14]) = (5+6+6)/(14-3) = 17/11, G([3,17])=(5+6+6+2+2)/14=21/14
ai,di,ci
G([6,14]) = 12/(14-6)=12/8, G([6,17]) = (6+6+2+2)/(17-6)=16/11

G([10,14]) = 6/4, G([10,17]) = 10/7, G([11,17]) = 4/6, G([12,17]) = 2/5

9 - 38
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 1: Execute jobs in the interval with the highest intensity by using the earliest-deadline first
schedule and running at the intensity as the frequency.

3,6,5
1 5
2 6 2,6,3
4
0,8,2
3 7
6,14,6
0 4 8 12 16 •time
10,14,6
11,17,2

12,17,2
2 1
ai,di,ci
0 4 8 12 16

9 - 39
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 2: Adjust the arrival times and deadlines by excluding the possibility to execute at the previous
critical intervals.
1 5
2 6
4 0,8,2
3 7 6,14,6
•time
0 4 8 12 16 10,14,6
11,17,2

12,17,2

ai,di,ci

9 - 40
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 2: Adjust the arrival times and deadlines by excluding the possibility to execute at the previous
critical intervals.
1 5
2 6
4 0,8,2 0,4,2
3 7 6,14,6 2,10,6
•time
0 4 8 12 16 10,14,6 6,10,6
11,17,2 7,13,2

5 12,17,2 8,13,2
6
4 ai,di,ci
3 7

0 4 8 12 16 •time

9 - 41
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
5 0,4,2
6
2,10,6
4
3 7 6,10,6

0 4 8 12 16 •time 7,13,2

G([0,4])=2/4, G([0,10]) = 14/10, G([0,13])=18/13


8,13,2

G([2,10])=12/8, G([2,13]) = 16/11, G([6,10])=6/4


•G([6,13])=10/7, G([7,13])=4/6, G([8,13])=4/5 ai,di,ci

9 - 42
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
5 0,4,2
6
2,10,6
4
3 7 6,10,6

0 4 8 12 16 •time 7,13,2

G([0,4])=2/4, G([0,10]) = 14/10, G([0,13])=18/13


8,13,2

G([2,10])=12/8, G([2,13]) = 16/11, G([6,10])=6/4


•G([6,13])=10/7, G([7,13])=4/6, G([8,13])=4/5 ai,di,ci

9 - 43
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
5 0,4,2
6
2,10,6
4
3 7 6,10,6

0 4 8 12 16 •time 7,13,2

G([0,4])=2/4, G([0,10]) = 14/10, G([0,13])=18/13


8,13,2

G([2,10])=12/8, G([2,13]) = 16/11, G([6,10])=6/4


•G([6,13])=10/7, G([7,13])=4/6, G([8,13])=4/5 ai,di,ci

4 5
0 4 8 12 16 •time

9 - 44
YDS Optimal DVFS Algorithm for Offline Scheduling
Step 3: Run the algorithm for the revised input again
Step 4: Put pieces together

frequency 0,4,2 0,2,2


2 1 4 5 7,13,2 2,5,2
•time
0 4 8 12 16 8,13,2 2,5,2

frequency
0,2,2 0,2,2
3 2 1 4 5 6 7
•time
0 4 8 12 16

v1 v2 v3 v4 v5 v6 v7
frequency 2 2 1 1.5 1.5 4/3 4/3

9 - 45
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3

1 0,8,2
3
0 4 8 12 16 time

Continuously update to the best schedule for all arrived tasks:


Time 0: task v3 is executed at 2/8

ai,di,ci

9 - 46
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3

2 2,6,3
1 0,8,2
3 2 3
0 4 8 12 16 time

Continuously update to the best schedule for all arrived tasks:


Time 0: task v3 is executed at 2/8
Time 2: task v2 arrives
 G([2,6]) = ¾, G([2,8]) = 4.5/6 = 3/4 => execute v3, v2 at ¾

ai,di,ci

9 - 47
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3 3,6,5

2 2,6,3
1 2 1
0,8,2
3 2 3
0 4 8 12 16 time

Continuously update to the best schedule for all arrived tasks:


Time 0: task v3 is executed at 2/8
Time 2: task v2 arrives
 G([2,6]) = ¾, G([2,8]) = 4.5/6 = 3/4 => execute v3, v2 at ¾
Time 3: task v1 arrives
 G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
ai,di,ci

9 - 48
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3 3,6,5

2 2,6,3
1 2 1
0,8,2
2 3 4
3 6,14,6
0 4 8 12 16 time

Continuously update to the best schedule for all arrived tasks:


Time 0: task v3 is executed at 2/8
Time 2: task v2 arrives
 G([2,6]) = ¾, G([2,8]) = 4.5/6 = 3/4 => execute v3, v2 at ¾
Time 3: task v1 arrives
 G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
Time 6: task v4 arrives
ai,di,ci
 G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16

9 - 49
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3 3,6,5

2 2,6,3
1 2 1 4 5
0,8,2
2 3 4
3 6,14,6
0 4 8 12 16 time
10,14,6
Continuously update to the best schedule for all arrived tasks:
Time 0: task v3 is executed at 2/8
Time 2: task v2 arrives
 G([2,6]) = ¾, G([2,8]) = 4.5/6 = 3/4 => execute v3, v2 at ¾
Time 3: task v1 arrives
 G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
Time 6: task v4 arrives
ai,di,ci
 G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16
Time 10: task v5 arrives
 G([10,14]) = 39/16 => execute v4 and v5 at 39/16

9 - 50
YDS Optimal DVFS Algorithm for Online Scheduling
frequency
3 3,6,5

2 2,6,3
1 2 1 4 5
0,8,2
4 6 7
2 3
3 6,14,6
0 4 8 12 16 time
10,14,6
Continuously update to the best schedule for all arrived tasks:
Time 0: task v3 is executed at 2/8 11,17,2
Time 2: task v2 arrives
 G([2,6]) = ¾, G([2,8]) = 4.5/6 = 3/4 => execute v3, v2 at ¾ 12,17,2
Time 3: task v1 arrives
 G([3,6]) = (5+3-3/4)/3=29/12, G([3,8]) < G([3,6]) => execute v2 and v1 at 29/12
Time 6: task v4 arrives
ai,di,ci
 G([6,8]) = 1.5/2, G([6,14]) = 7.5/8 => execute v3 and v4 at 15/16
Time 10: task v5 arrives
 G([10,14]) = 39/16 => execute v4 and v5 at 39/16
Time 11 and Time 12
 The arrival of v6 and v7 does not change the critical interval
Time 14:
 G([14,17]) = 4/3 => execute v6 and v7 at 4/3
9 - 51
Remarks on the YDS Algorithm
 Offline
 The algorithm guarantees the minimal energy consumption while satisfying the
timing constraints
 The time complexity is O(N³), where N is the number of tasks in V
 Finding the critical interval can be done in O(N²)
 The number of iterations is at most N
 Exercise:
 For periodic real-time tasks with deadline=period, running at constant speed with
100% utilization under EDF has minimum energy consumption while satisfying the
timing constraints.

 Online
 Compared to the optimal offline solution, the on-line schedule uses at most 27
times the minimal energy consumption.

9 - 52
Dynamic Power Management

9 - 53
Dynamic Power Management (DPM)
• Dynamic power management tries to assign optimal
power saving states during program execution
• DPM requires hardware and software support

Example: StrongARM SA1100

RUN: operational (400 mW)
IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts (50 mW)
SLEEP: shutdown of on-chip activity (160 μW)

[State diagram: RUN <-> IDLE transitions take 10 μs / 4 μJ each; RUN -> SLEEP 90 μs / 36 μJ;
IDLE -> SLEEP 90 μs / 5 μJ; SLEEP -> RUN 160 ms / 64 mJ]
9 - 54
Dynamic Power Management (DPM)
application states shut down wake up
Tw

busy waiting busy

run Tsd sleep Twu run

power states
Tsd: shutdown delay Twu: wakeup delay
Tw: waiting time

Desired: Shutdown only during long waiting times. This


leads to a tradeoff between energy saving and overhead.
9 - 55
Break-Even Time
Definition: The minimum waiting time required to compensate
the cost of entering an inactive (sleep) state.

 Enter an inactive state is beneficial only if the waiting time is longer than the
break-even time
 Assumptions for the calculation:
 No performance penalty is tolerated.
 An ideal power manager that
has the full knowledge of the future
workload trace. On the previous slide,
we supposed that the power manager
has no knowledge about the future.

9 - 56
Break-Even Time
busy waiting busy
state transition application states
run sleep run
power states

Scenario 1 (no transition):


Scenario 2 (state transition):
Break-even time: Limit for such that
break-even
time
Break-even constraint:

Time constraint:
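A hedged reconstruction of the two scenarios above (the notation Prun, Psleep, Esd, Ewu
is assumed here and not taken verbatim from the slide):

Scenario 1 (no transition):    E1 = Prun · Tw
Scenario 2 (state transition): E2 = Esd + Ewu + Psleep · (Tw - Tsd - Twu)
Break-even constraint:         E2 <= E1
Time constraint:               Tw >= Tsd + Twu

The break-even time is the smallest waiting time Tw that satisfies both constraints;
entering the sleep state only pays off if the waiting time is at least that long.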
9 - 57
Break-Even Time
busy waiting busy
state transition application states
run sleep run
power states

remove, if power manager has


no knowledge about future
Scenario 1 (no transition):
Scenario 2 (state transition):
Break-even time: Limit for such that
break-even
time
Break-even constraint:

Time constraint:
9 - 58
Power Modes in MSP432 (Lab)

The MSP432 has one


active mode in 6 different
configurations which all
allow for execution of
code.

It has 5 major low power


modes (LP0, LP3, LP4,
LP3.5, LP4.5), some of
them can be in one of
several configurations.

In total, the MSP432 can


be in 18 different low
power configurations.
active mode (32MHz): 6 - 15 mW ; low power mode (LP4): 1.5 – 2.1 µW 9 - 59
Power Modes in MSP432 (Lab)
 Transition between modes can be handled using C-level interfaces to the power
control manger.

 Examples of interface functions:


 uint8_t PCM_getPowerState (void)
 bool PCM_gotoLPM0 (void)
 bool PCM_gotoLPM3 (void)
 bool PCM_gotoLPM4 (void)
 bool PCM_shutdownDevice (uint32_t shutdownMode)

9 - 60
Battery-Operated Systems and Energy Harvesting

9 - 61
Embedded Systems in the Extreme - Permasense

9 - 62
Embedded Systems

© Lothar Thiele
Computer Engineering and Networks Laboratory
64
Reasons for Battery-Operated Devices and Harvesting
 Battery operation:
 no continuous power source available
 mobility

 Energy harvesting:
 prolong lifetime of battery-operated devices
 infinite lifetime using rechargeable batteries
 autonomous operation

radio frequency (RF) harvesting 9 - 65


Typical Power Circuitry – Power Point Tracking

Voltage
Stabilization

power point tracking / impedance


matching; conversion to voltage
of energy storage
rechargeable battery
or supercapacitor
9 - 66
Solar Panel Characteristics

9 - 67
Typical Power Circuitry – Maximum Power Point Tracking
U/I curves of a typical solar cell:
red: current for different light intensities; blue: power for different light intensities;
grey: maximal power
tracking: determine the optimal impedance seen by the solar panel

Simple tracking algorithm (assume constant illumination), per iteration k := k+1:
1. sense V(k), I(k) and compute P(k) = V(k) * I(k)
2. if P(k) > P(k-1): keep perturbing in the same direction,
   i.e. set V(k+1) = V(k) + Δ if V(k) > V(k-1), else V(k+1) = V(k) - Δ
3. otherwise: reverse the direction,
   i.e. set V(k+1) = V(k) - Δ if V(k) > V(k-1), else V(k+1) = V(k) + Δ
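A minimal C sketch of one step of this perturb-and-observe loop (the data structure and
function name are ours):

typedef struct { double v_prev, p_prev, v_ref; } MpptState;

void mppt_step(MpptState *s, double v, double i, double delta) {
    double p = v * i;                 /* P(k) = V(k) * I(k)                          */
    int p_up = (p > s->p_prev);
    int v_up = (v > s->v_prev);
    if (p_up == v_up)
        s->v_ref = v + delta;         /* power improved in this direction: continue  */
    else
        s->v_ref = v - delta;         /* power dropped: reverse the perturbation     */
    s->v_prev = v;
    s->p_prev = p;
}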
9 - 68
Maximal Power Point Tracking

9 - 69
Maximal Power Point Tracking

9 - 70
Maximal Power Point Tracking

9 - 71
Maximal Power Point Tracking

9 - 72
Maximal Power Point Tracking

9 - 73
Typical Challenge in (Solar) Harvesting Systems
Challenges:
 What is the optimal maximum capacity of the battery?
 What is the optimal area of the solar cell?
 How can we control the application such that a continuous system operation is
possible, even under a varying input energy (summer, winter, clouds)?
Example of a solar energy trace:

9 - 74
Example: Application Control
Scenario:
energy flow
energy source energy storage
information
flow
energy estimator controller consumer

 The controller can adapt the service of the consumer device, for example the
sampling rate for its sensors or the transmission rate of information. As a result,
the power consumption changes proportionally.
 Precondition for correctness of application control: Never run out of energy.
 Example for optimality criterion: Maximize the lowest service of (or
equivalently, the lowest energy flow to) the consumer.
9 - 75
Application Control
energy capacity B
Formal Model:
p(t) u(t) discrete time t
energy source energy storage

b(t)
u(t)
energy estimator controller consumer

 harvested and used energy in [t, t+1): p(t), u(t)
 battery model: b(t+1) = min{ b(t) + p(t) - u(t), B }
 failure state: b(t) + p(t) - u(t) < 0
 utility: U(t1, t2) = sum of μ(u(t)) for t1 <= t < t2, where μ is a strictly concave function;
higher used energy gives a reduced reward for the overall utility.
9 - 76
9 - 77
Application Control
 What do we want? We would like to determine an optimal control u*(t) for
time interval [t, t+1) for all t in [0, T) with the following properties:

 There is no feasible use function u(t) with a larger minimal energy:

 The use function maximizes the utility U(0, T).


 We suppose that the battery has the same or better state at the end than at the
start of the time interval, i.e., b*(T) ≥ b*(0).
 We would like to answer two questions:
 Can we say something about the characteristics of u*(t) ?
 How does an algorithm look like that efficiently computes u*(t) ?

9 - 78
Application Control
Theorem: Given a use function u*(t), such that the system never enters a
failure state. If u*(t) is optimal with respect to maximizing the minimal used energy
among all use functions and maximizes the utility U(t, T), then the following
relations hold for all :
empty battery

full battery

Sketch of a proof: First, let us show that a consequence of the above theorem is
true (just reverting the relations):

In other words, as long as the battery is neither full nor empty, the optimal use
function does not change.
9 - 79
Application Control
 Proof sketch cont.:

9 - 80
Application Control
 Proof sketch cont.:
suppose we change
the use function
locally from being
constant such that
the overall battery
state does not change

then the utility is worse


due to the concave
function : diminishing
reward for higher
use function values; and
the minimal use function
is potentially smaller
9 - 81
Application Control
 Proof sketch cont.: Now we show that for all

or equivalently

We already have shown this for . Therefore, we only need to


show that . Suppose now that we have
if the battery is full at . Then we can increase the use at
time and decrease it at time by the same amount without changing the
battery level at time . This again would increase the overall utility and
potentially increase the minimal use function.

initial, not optimal


choice of the use
function

9 - 82
Application Control
 Proof sketch cont.: Now we show that for all

or equivalently

We already have shown this for . Therefore, we only need to


show that . Suppose now that we have
if the battery is full at . Then we can increase the use at
time and decrease it at time by the same amount without changing the
battery level at time . This again would increase the overall utility and
potentially increase the minimal use function.

feasible, but
better choice of
use function with

9 - 83
Application Control

9 - 84
Application Control
 How can we efficiently compute an optimal use function?
 There are several options available as we just need to solve a convex optimization
problem.
 A simple but inefficient possibility is to convert the problem into a linear program.
At first suppose that the utility is simply

Then the linear program has the form:

[Concave functions could be


piecewise linearly approximated.
This is not shown here.]
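A sketch of such a linear program, assuming the utility is simply the sum of the used
energies and replacing the min{·, B} in the battery update by two inequalities (this
concrete formulation is our assumption, consistent with the model above):

maximize    sum of u(t) over 0 <= t < T
subject to  b(t+1) <= b(t) + p(t) - u(t)   for all 0 <= t < T
            b(t+1) <= B                    for all 0 <= t < T
            b(t) >= 0, u(t) >= 0           for all t
            b(T) >= b(0)

Maximizing the minimal use value instead can be done by introducing a variable umin,
adding umin <= u(t) for all t, and maximizing umin.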

9 - 85
[Figure: example traces of the harvested energy p(t), the use function u(t) and the
battery state b(t) over t = 0 … 6 = T]

9 - 86
[Figure: two panels comparing p(t), u(t) and b(t) over t = 0 … 6 = T for different use
functions]

9 - 87
Application Control
 But what happens if the estimation of the future incoming energy is not correct?
 If it would be correct, then we would just compute the whole future application
control now and would not change anything anymore.
 This will not work as errors will accumulate and we will end up with many
infeasible situations, i.e., the battery is completely empty and we are forced to
stop the application.
 Possibility: Finite horizon control
 At time t, we compute the optimal control (see previous slides) using the currently
available battery state b(t) with predictions for all and
.
 From the computed optimal use function for all we just take the
first use value u(t) in order to control the application.
 At the next time step, we take as initial battery state the actual state; therefore, we
take mispredictions into account. For the estimated future energy, we also take the
new estimations.
9 - 88
Application Control
 Finite horizon control:

compute the optimal use function in [t, t+T)


using the actual battery state at time t
t t+T

apply this use function in the interval [t, t+1).


t t+1

compute the optimal use function in [t+1, t+T+1)


using the actual battery state at time t+1
t+1 t+T+1

9 - 89
Application Control using Finite Horizon

estimated input
energy

still energy
breakdown
due to misprediction

9 - 90
Application Control using Finite Horizon
more pessimistic
prediction
simplified
optimization
using a look-
up-table
[not covered]

9 - 91
Remember: What you got some time ago …

10 - 1
What we told you: Be careful and please do not …

10 - 2
Return the boards at the
embedded systems exam!

10 - 3
Embedded Systems
10. Architecture Synthesis

© Lothar Thiele
Computer Engineering and Networks Laboratory
Lecture Overview

1. Introduction to Embedded Systems


2. Software Development
3. Hardware-Software Interface
4. Programming Paradigms
Software Hardware-
5. Embedded Operating Systems
Software
6. Real-time Scheduling
7. Shared Resources
8. Hardware Components
Hardware 9. Power and Energy
10. Architecture Synthesis

10 - 5
Implementation Alternatives

General-purpose processors

Application-specific instruction set processors (ASIPs)


Microcontroller
•Performance
DSPs (digital signal processors)
•Power Efficiency •Flexibility

Programmable hardware

FPGA (field-programmable gate arrays)

Application-specific integrated circuits (ASICs)

10 - 6
Architecture Synthesis
Determine a hardware architecture that efficiently executes a given algorithm.

 Major tasks of architecture synthesis:


 allocation (determine the necessary hardware resources)
 scheduling (determine the timing of individual operations)
 binding (determine relation between individual operations of the algorithm and
hardware resources)
 Classification of synthesis algorithms:
 heuristics or exact methods

 Synthesis methods can often be applied independently of granularity of


algorithms, e.g. whether operation is a whole complex task or a single
operation.
10 - 7
10 - 8
Specification Models

10 - 9
Specification
 Formal specification of the desired functionality and the structure (architecture)
of an embedded systems is a necessary step for using computer aided design
methods.

 There exist many different formalisms and models of computation, see also the
models used for real-time software and general specification models for the
whole system.

 Now, we will introduce some relevant models for architecture level (hardware)
synthesis.

10 - 10
Task Graph or Dependence Graph (DG)
Sequence
constraint Nodes are assumed to be a
„program“ described in
some programming
language, e.g. C or Java; or
just a single operation.

A dependence graph is a directed graph G = (V, E) in which E ⊆ V × V
is a partial order.
If (v1, v2) ∈ E, then v1 is called an immediate predecessor of v2 and
v2 is called an immediate successor of v1.
Suppose E* is the transitive closure of E. If (v1, v2) ∈ E*, then v1 is
called a predecessor of v2 and v2 is called a successor of v1.

10 - 11
Dependence Graph
 A dependence graph describes order relations for the execution of single
operations or tasks. Nodes correspond to tasks or operations, edges correspond
to relations („executed after“).

 Usually, a dependence graph describes a partial order between operations and


therefore, leaves freedom for scheduling (parallel or sequential). It represents
parallelism in a program but no branches in control flow.

 A dependence graph is acyclic.

 Often, there are additional quantities associated to edges or nodes such as


 execution times, deadlines, arrival times
 communication demand
10 - 12
Dependence Graph and Single Assignment Form
given basic block:
x = a + b;
y = c - d;
z = x * y;
y = b + d;

single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;

[Dependence graph: a, b -> (+) -> x; c, d -> (-) -> y; x, y -> (*) -> z; b, d -> (+) -> y1]

10 - 13
Example of a Dependence Graph

10 - 14
Marked Graph (MG)
 A marked graph G = (V, A, del) consists of
 nodes (actors) v ∈ V
 edges a = (vi, vj) ∈ A, A ⊆ V × V
 a number of initial tokens (or marking) del(a) on each edge a ∈ A

 The marking is often represented in form of a vector.

actor token

10 - 15
10 - 16
Marked Graph
 The token on the edges correspond to data that are stored in FIFO queues.
 A node (actor) is called activated if on every input edge there is at least one
token.
 A node (actor) can fire if it is activated.
 The firing of a node vi (actor operates on the first tokens in the input queues)
removes from each input edge a token and adds a token to each output edge.
The output token correspond to the processed data.

 Marked graphs are mainly used for modeling regular computations, for example
signal flow graphs.

10 - 17
Marked Graph
Example (model of a digital filter with infinite impulse response IIR)
 Filter equation:

y(l) = a · u(l) + b · y(l-1) + c · y(l-2) + d · y(l-3)

 Possible model as a marked graph:

nodes 3-5:
a d c b w
2 3 4 5 6 7 x
x+w y
output y
y
1 9 8
input u fork node 2: x=0

10 - 18
Implementation of Marked Graphs
 There are different possibilities to implement marked graphs in hardware or
software directly. Only the most simple possibilities are shown here.
 Hardware implementation as a synchronous digital circuit:
 Actors are implemented as combinatorial circuits.
 Edges correspond to synchronously clocked shift registers (FIFOs).

clock

10 - 19
Implementation of Marked Graphs
 Hardware implementation as a self-timed asynchronous circuit:
 Actors and FIFO registers are implemented as independent units.
 The coordination and synchronization of firings is implemented using a handshake
protocol.
 Delay insensitive direct implementation of the semantics of marked graphs.

ack ack

rdy rdy
actor

FIFO actor FIFO


rdy rdy

ack ack

10 - 20
Implementation of Marked Graphs
 Software implementation with static scheduling:
 At first, a feasible sequence of actor firings is determined which ends in the
starting state (initial distribution of tokens).
 This sequence is implemented directly in software.
 Example digital filter:
feasible sequence: (1, 2, 3, 9, 4, 8, 5, 6, 7)
program: while(true) {
t1 = read(u);
t2 = a*t1;
t3 = t2+d*t9;
t9 = t8;
t4 = t3+c*t9;
t8 = t6;
t5 = t4+b*t8;
t6 = t5;
write(y, t6);}
10 - 21
Implementation of Marked Graphs
 Software implementation with dynamic scheduling:

 Scheduling is done using a (real-time) operating system.


 Actors correspond to threads (or tasks).
 After firing (finishing the execution of the corresponding thread) the thread
is removed from the set of ready threads and put into wait state.
 It is put into the ready state if all necessary input data are present.
 This mode of execution directly corresponds to the semantics of marked
graphs. It can be compared with the self-timed hardware implementation.

10 - 22
Models for Architecture Synthesis
 A sequence graph is a dependence graph with a single start node
(no incoming edges) and a single end node (no outgoing edges).
VS denotes the operations of the algorithm and ES denotes the dependence relations.

 A resource graph models resources and bindings.


VT denote the resource types of the architecture and GR is a bipartite graph. An edge
represents the availability of a resource type vt for an operation vs.

 Cost function: c(vt) assigns a cost to each resource type vt ∈ VT.

 Execution times w(vs, vt) are assigned to each edge (vs, vt) ∈ ER
and denote the execution time of operation vs on resource type vt.

10 - 23
Models for Architecture Synthesis - Example
Example sequence graph:
 Algorithm (differential equation):

int diffeq(int x, int y, int u, int dx, int a) {


int x1, u1, y1;
while ( x < a ) {
x1 = x + dx;
u1 = u - (3 * x * u * dx) - (3 * y * dx);
y1 = y + u * dx;
x = x1;
u = u1;
y = y1;
}
return y;
}

10 - 24
Models for Architecture Synthesis - Example
 Corresponding sequence graph:
nop 0

1 x 2 x 6 x 8 x 10 +

3 x 7
x 9 + 11 <

-
4
-
5

nop 12 10 - 25
Models for Architecture Synthesis - Example
 Corresponding resource graph
multiplier
with one instance of a
multiplier (cost 8) and one
instance of an ALU (cost 3): c(r1) = 8

c(r2) = 3

VS ER VT

10 - 26
Allocation and Binding

10 - 27
Models for Architecture Synthesis - Example
 Corresponding resource graph
multiplier
with 4 instances of a 4
multiplier (cost 8) and two
instance of an ALU (cost 3): c(r1) = 8

2
c(r2) = 3

VS ER VT

10 - 28
Models for Architecture Synthesis - Example
 Example binding (allocation α(r1) = 4, α(r2) = 2):

10 - 29
Scheduling

10 - 30
10 - 31
Models for Architecture Synthesis - Example
Example: L = (v12) - (v0) = 7
(v0) = 1
(v1) = (v10) = 1
(v2) = (v11) = 2
(v3) = 3
(v6) = (v4) = 4
(v7) = 5
(v8) = (v5) = 6
(v9) = 7
(v12) = 8
10 - 32
Multiobjective Optimization

10 - 33
Multiobjective Optimization
 Architecture Synthesis is an optimization problem with more than one objective:
 Latency of the algorithm that is implemented
 Hardware cost (memory, communication, computing units, control)
 Power and energy consumption

 Optimization problems with several objectives are called “multiobjective


optimization problems”.

 Synthesis or design problems are typically multiobjective.

10 - 34
Multiobjective Optimization
 Let us suppose we would like to select a typewriting device. Criteria are
 mobility (related to weight)
 comfort (related to keyboard size and performance)

2020

10 - 35
Multiobjective Optimization

writing comfort
better
1 Pareto-optimal
dominated

10

0.1 2 4 10 20 weight
10 - 36
Pareto-Dominance
:

dominated by solution k

dominate solution k

10 - 37
Pareto-optimal Set
 A solution is named Pareto-optimal, if it is not Pareto-dominated by any other
solution in X.
 The set of all Pareto-optimal solutions is denoted as the Pareto-optimal set and
its image in objective space as the Pareto-optimal front.

•f2

objective space Z: •dominated

•Pareto-optimal = not dominated


f1
10 - 38
Architecture Synthesis without Resource Constraints

10 - 39
Synthesis Algorithms
Classification
 unlimited resources:
  no constraints in terms of the available resources are defined.
 limited resources:
  constraints are given in terms of the number and type of available resources.

Classes of synthesis algorithms
 iterative algorithms:
  an initial solution to the architecture synthesis is improved step by step.
 constructive algorithms:
  the synthesis problem is solved in one step.
 transformative algorithms:
  the initial problem formulation is converted into a (classical) optimization problem.

10 - 40
Synthesis/Scheduling Without Resource Constraints
The corresponding scheduling method can be used
 as a preparatory step for the general synthesis problem
 to determine bounds on feasible schedules in the general case
 if there is a dedicated resource for each operation.

10 - 41
ASAP Algorithm
ASAP = As Soon As Possible
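The idea can be sketched as follows (a sketch, not the lecture's pseudo-code; operations are assumed to be numbered in topological order, pred[i]/npred[i] give the predecessors, w[i] the execution time, and start times are counted from 1 as in the examples below):

/* ASAP: every operation starts as soon as all of its predecessors have finished. */
void asap(int n, const int *const pred[], const int npred[],
          const int w[], int tau[]) {
    for (int i = 0; i < n; i++) {          /* topological order */
        tau[i] = 1;                        /* nodes without predecessors start at 1 */
        for (int k = 0; k < npred[i]; k++) {
            int j = pred[i][k];
            if (tau[j] + w[j] > tau[i])    /* wait for the latest predecessor */
                tau[i] = tau[j] + w[j];
        }
    }
}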

10 - 42
The ASAP Algorithm - Example
Example:
w(vi) = 1

10 - 43
ALAP Algorithm
ALAP = As Late As Possible
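Analogously, a sketch of ALAP (same assumed data layout, now with successor lists succ[i]/nsucc[i] and a given latency bound Lmax; with start times counted from 1, the end node finishes at Lmax + 1):

/* ALAP: every operation starts as late as possible such that all of its
 * successors can still finish within the latency bound Lmax.
 * Operations are visited in reverse topological order. */
void alap(int n, const int *const succ[], const int nsucc[],
          const int w[], int Lmax, int tau[]) {
    for (int i = n - 1; i >= 0; i--) {
        tau[i] = Lmax + 1 - w[i];          /* nodes without successors finish last */
        for (int k = 0; k < nsucc[i]; k++) {
            int j = succ[i][k];
            if (tau[j] - w[i] < tau[i])    /* must finish before successor j starts */
                tau[i] = tau[j] - w[i];
        }
    }
}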

10 - 44
ALAP Algorithm - Example
Example: Lmax = 7, w(vi) = 1

[Figure: ALAP schedule of the example sequence graph (nop v0 … nop v12); operations that are not on the critical path are shifted to the latest possible time steps.]
10 - 45
Scheduling with Timing Constraints
There are different classes of timing constraints:
 deadlines (latest finishing times of operations)
 release times (earliest starting times of operations)
 relative constraints (differences between the starting times of a pair of operations)
10 - 46
10 - 47
Scheduling with Timing Constraints
We will model all timing constraints as relative constraints; deadlines and
release times are defined relative to the start node v0.
Minimum, maximum and equality constraints can be converted into each other:
 Minimum constraint: τ(vj) ≥ τ(vi) + lij, i.e. vj starts at least lij time units after vi.
 Maximum constraint: τ(vj) ≤ τ(vi) + uij, i.e. vj starts at most uij time units after vi; this is equivalent to the minimum constraint τ(vi) ≥ τ(vj) − uij.
 Equality constraint: τ(vj) = τ(vi) + eij, i.e. a minimum and a maximum constraint with lij = uij = eij.
10 - 48
Weighted Constraint Graph
Timing constraints can be represented in the form of a weighted constraint graph GC = (VC, EC, d): the nodes VC are the operations (including v0), and a weighted edge (vi, vj) ∈ EC with weight dij represents the relative constraint τ(vj) ≥ τ(vi) + dij.
10 - 49
Weighted Constraint Graph
 In order to represent a feasible schedule, there is one edge (vi, vj) with weight dij = w(vi) corresponding to each precedence constraint (vi, vj) ∈ ES, where w(vi) denotes the execution time of vi.

 A consistent assignment of starting times τ(vi) to all operations can be obtained by solving a single-source longest path problem.

 A possible algorithm (Bellman-Ford) has complexity O(|VC| · |EC|) ("iterative ASAP"); a sketch is given below.
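A minimal longest-path sketch in the spirit of Bellman-Ford (edge-list representation; the data layout and names are illustrative assumptions, start times are measured from τ(v0) = 0, and a full implementation would additionally detect positive cycles, which indicate inconsistent constraints):

#include <limits.h>

#define NOT_REACHED INT_MIN

/* Single-source longest paths from node 0 of the constraint graph.
 * m edges (src[e] -> dst[e]) with weight d[e]; tau[i] receives the smallest
 * consistent start time of node i. */
void longest_paths(int n, int m, const int src[], const int dst[],
                   const int d[], int tau[]) {
    for (int i = 0; i < n; i++)
        tau[i] = NOT_REACHED;
    tau[0] = 0;                                /* reference node v0 */
    for (int iter = 0; iter < n - 1; iter++) { /* at most |VC| - 1 relaxation rounds */
        for (int e = 0; e < m; e++) {
            if (tau[src[e]] != NOT_REACHED && tau[src[e]] + d[e] > tau[dst[e]])
                tau[dst[e]] = tau[src[e]] + d[e];   /* enforce tau(vj) >= tau(vi) + dij */
        }
    }
}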

10 - 50
Weighted Constraint Graph - Example
Example: w(v1) = w(v3) = 2, w(v2) = w(v4) = 1

[Figure: weighted constraint graph for the four operations; besides the precedence edges with weights w(vi), a minimum-time and a maximum-time constraint (values 3 and 4 in the original figure) are represented as additional forward and backward edges.]
10 - 51
Architecture Synthesis with Resource Constraints

10 - 52
Scheduling With Resource Constraints

A schedule is feasible if
 all data dependencies are respected, i.e. τ(vj) ≥ τ(vi) + w(vi) for all (vi, vj) ∈ ES, and
 at any moment in time and for any resource type, no more resources are in use than are available according to the allocation.
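Stated compactly (a reconstruction in standard notation, not the original slide's formula):

\[
\bigl|\{\, v_i \in V_S : \beta(v_i) = v_k \ \wedge\ \tau(v_i) \le t < \tau(v_i) + w(v_i) \,\}\bigr| \;\le\; \alpha(v_k)
\qquad \forall\, v_k \in V_T,\ \forall\, t .
\]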

10 - 53
List Scheduling
List scheduling is one of the most widely used algorithms for scheduling under
resource constraints.

Principles:
 To each operation a priority is assigned that denotes the urgency of being scheduled. This priority is static, i.e. determined before list scheduling starts.
 The algorithm schedules one time step after the other.
 Uk denotes the set of operations that (a) are mapped onto resource type vk and (b) whose predecessors have all finished.
 Tk denotes the operations currently running on instances of resource type vk.

10 - 54
List Scheduling

(In every time step, the algorithm iterates over all resource types; see the sketch below.)
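A compact sketch of the list-scheduling loop (greedy selection by static priority; the data layout, the array names and the ready test are illustrative assumptions, and every resource type is assumed to have at least one allocated instance):

/* List scheduling: n operations, r resource types.
 * type[i] : resource type of operation i      w[i]    : execution time
 * prio[i] : static priority (higher = more urgent)
 * pred[i][], npred[i] : predecessors          alloc[k]: instances of type k
 * start[i]: resulting start time, initialized to -1 (= not yet scheduled)  */
void list_schedule(int n, int r, const int type[], const int w[],
                   const int prio[], const int npred[], const int *const pred[],
                   const int alloc[], int start[]) {
    int scheduled = 0;
    for (int t = 1; scheduled < n; t++) {               /* one time step after the other */
        for (int k = 0; k < r; k++) {
            int running = 0;                             /* |Tk|: still occupying type k  */
            for (int i = 0; i < n; i++)
                if (type[i] == k && start[i] >= 0 && t < start[i] + w[i])
                    running++;
            while (running < alloc[k]) {                 /* free instances remain */
                int best = -1;
                for (int i = 0; i < n; i++) {            /* Uk: ready, unscheduled, type k */
                    if (type[i] != k || start[i] >= 0) continue;
                    int ready = 1;
                    for (int p = 0; p < npred[i]; p++) {
                        int j = pred[i][p];
                        if (start[j] < 0 || start[j] + w[j] > t) { ready = 0; break; }
                    }
                    if (ready && (best < 0 || prio[i] > prio[best]))
                        best = i;                        /* highest-priority candidate */
                }
                if (best < 0) break;                     /* no ready operation left */
                start[best] = t;                         /* schedule it in this time step */
                scheduled++;
                running++;
            }
        }
    }
}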

10 - 55
List Scheduling - Example
Example:

10 - 56
List Scheduling - Example
Solution via list scheduling:
 In the example, the solution is independent of the chosen priority function.

 Because of the greedy selection principle, all resources are occupied in the first time step.

 List scheduling is a heuristic algorithm: in this example, it does not yield the minimal latency!

10 - 57
List Scheduling
Solution via an optimal method:
 Latency is smaller than with
list scheduling.

 An example of an optimal
algorithm is the transformation
into an integer linear program as
described next.

10 - 58
Integer Linear Programming
Principle:

Synthesis Problem
   → transformation into ILP
Integer Linear Program (ILP)
   → optimization of ILP
Solution of ILP
   → back interpretation
Solution of Synthesis Problem

10 - 59
Integer Linear Program
 Yields optimal solution to synthesis problems as it is based on an exact
mathematical description of the problem.

 Solves scheduling, binding and allocation simultaneously.

 Standard optimization approaches (and software) are available for solving integer linear programs:
  in addition to linear programs (linear constraints, linear objective function), some variables are forced to be integers.
  much higher computational complexity than solving a linear program.
  efficient methods are based on (a) branch-and-bound methods and (b) determining additional hyperplanes (cuts).

10 - 60
10 - 61
Integer Linear Program
 Many variants exist, depending on available information, constraints and
objectives, e.g. minimize latency, minimize resources, minimize memory. Just an
example is given here!!

 For the following example, we use the assumptions:


 The binding is determined already, i.e. every operation vi has a unique execution
time w(vi).
 We have determined the earliest and latest starting times of operations vi as li and
hi, respectively. To this end, we can use the ASAP and ALAP algorithms that have
been introduced earlier. The maximal latency Lmax is chosen such that a feasible
solution to the problem exists.

10 - 62
Integer Linear Program
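The formulation referred to by the explanations below can be reconstructed in standard notation as follows (a reconstruction, not the original slide; xi,t = 1 iff operation vi starts at time t, and li, hi are the ASAP/ALAP bounds):

\[
\begin{aligned}
\text{minimize } & \ \tau(v_n) - \tau(v_0) \\
\text{subject to } & \ x_{i,t} \in \{0,1\} && \forall v_i \in V_S,\ l_i \le t \le h_i && (1)\\
& \textstyle\sum_{t=l_i}^{h_i} x_{i,t} = 1 && \forall v_i \in V_S && (2)\\
& \tau(v_i) = \textstyle\sum_{t=l_i}^{h_i} t \cdot x_{i,t} && \forall v_i \in V_S && (3)\\
& \tau(v_j) - \tau(v_i) \ge w(v_i) && \forall (v_i, v_j) \in E_S && (4)\\
& \textstyle\sum_{v_i:\ \beta(v_i)=v_k}\ \textstyle\sum_{t'=t-w(v_i)+1}^{t} x_{i,t'} \le \alpha(v_k) && \forall v_k \in V_T,\ \forall t && (5)
\end{aligned}
\]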

10 - 63
10 - 64
10 - 65
Integer Linear Program
Explanations:
 (1) declares the variables x to be binary.
 (2) makes sure that exactly one of the variables xi,t over all t has the value 1, all others are 0.
 (3) determines the relation between the variables x and the starting times of the operations.
In particular, if xi,t = 1 then operation vi starts at time t, i.e. τ(vi) = t.
 (4) guarantees that all precedence constraints are satisfied.
 (5) makes sure that the resource constraints are not violated. For all resource
types vk ∈ VT and for all time instants t it is guaranteed that the number of active
operations does not exceed the number of available resource instances.

10 - 66
Integer Linear Program
Explanations:
 (5) The first sum selects all operations that are mapped onto resource type vk. The second sum considers all time instants at which operation vi occupies resource type vk, i.e. all t' with t − w(vi) + 1 ≤ t' ≤ t: if vi starts at such a t', it is still executing at time t.
10 - 67
Architecture Synthesis for Iterative Algorithms and
Marked Graphs

10 - 68
Remember … : Marked Graph
Example (model of a digital filter with infinite impulse response IIR)
 Filter equation:

   y(l) = a · u(l) + b · y(l − 1) + c · y(l − 2) + d · y(l − 3)

 Possible model as a marked graph:

[Figure: marked graph of the filter with a node reading the input u (node 1), a fork node (node 2, initialized with x = 0), nodes multiplying by the coefficients a, d, c, b, nodes computing x + w, and a node producing the output y; initial tokens on the feedback edges provide the delayed output values.]

10 - 69
Iterative Algorithms
 Iterative algorithms consist of a set of indexed equations that are evaluated for
all values of an index variable l:

   xi[l] = Fi( ..., xj[l − dji], ... )   for all l

Here, the xi denote a set of indexed variables, the Fi denote arbitrary functions and the dji
are constant index displacements.

 Examples of well known representations are signal flow graphs (as used in signal
and image processing and automatic control), marked graphs and special forms
of loops.

10 - 70
Iterative Algorithms
Several representations of the same iterative algorithm:
 One indexed equation with constant index dependencies:

   y[l] = a · u[l] + b · y[l − 1] + c · y[l − 2] + d · y[l − 3]   for all l

 Equivalent set of indexed equations (introducing intermediate variables x1, x2, x3, in the order used by the loop program below):

   x1[l] = a · u[l] + d · y[l − 3]
   x2[l] = x1[l] + c · y[l − 2]
   x3[l] = x2[l] + b · y[l − 1]
   y[l]  = x3[l]                     for all l
10 - 71
Iterative Algorithms
Extended sequence graph GS = (VS, ES, d): To each edge (vi, vj)  ES there is associated
the index displacement dij. An edge (vi, vj)  ES denotes that the variable
corresponding to vj depends on variable corresponding to vi with displacement dij.

[Figure: extended sequence graph with the chain u → x1 → x2 → x3 → y (displacements 0 on the chain edges) and feedback edges from y to x1, x2 and x3 with displacements 3, 2 and 1.]

 Equivalent marked graph:

[Figure: the same structure drawn as a marked graph, where each feedback edge carries a number of initial tokens equal to its displacement.]

10 - 72
Iterative Algorithms
 Equivalent signal flow graph:

[Figure: signal flow graph with input u, a chain of three delay elements z⁻¹, multiplications by the coefficients a, d, c, b, and adders producing the output y.]

 Equivalent loop program:


while (true) {
    t1 = read(u);                    /* current input u(l) */
    t5 = a*t1 + d*t2 + c*t3 + b*t4;  /* y(l); t2, t3, t4 hold y(l-3), y(l-2), y(l-1) */
    t2 = t3;                         /* shift the delay line */
    t3 = t4;
    t4 = t5;
    write(y, t5);                    /* output y(l) */
}

10 - 73
Iterative Algorithms
 An iteration is the set of all operations necessary to compute all variables xi[l]
for a fixed index l.

 The iteration interval P is the time distance between two successive iterations of
an iterative algorithm. 1/P denotes the throughput of the implementation.

 The latency L is the maximal time distance between the starting and the
finishing times of operations belonging to one iteration.

 In a pipelined implementation (functional pipelining), there exist time instants at which operations of different iterations l are executed simultaneously.

10 - 74
Iterative Algorithms
 Implementation principles
  A simple possibility: the edges with dij > 0 are removed from the extended sequence graph, and the resulting simple sequence graph is implemented using standard methods.

Example with unlimited resources:

[Figure: schedule of one iteration with execution times w(vi) = 0, 1, 2, 2, 2 for the nodes of the extended sequence graph; without pipelining, one physical iteration is as long as one iteration, i.e. L = P = 7.]
10 - 75
Iterative Algorithms
Implementation principles
 Using functional pipelining: successive iterations overlap and a higher throughput (1/P) is obtained.

Example with unlimited resources (note the data dependencies across iterations!):

[Figure: pipelined schedule of the extended sequence graph (chain u → x1 → x2 → x3 → y with execution times 0, 1, 2, 2, 2 and feedback displacements 3, 2, 1) using 4 resources and functional pipelining; one physical iteration takes P = 2 while one iteration still has latency L = 7.]
10 - 76
Iterative Algorithms
Solving the synthesis problem using integer linear programming:
 Starting point is the ILP formulation given for simple sequence graphs.

 Now, we use the extended sequence graph (including displacements dij ).

 ASAP and ALAP scheduling for upper and lower bounds hi and li use only edges
with dij = 0 (remove dependencies across iterations).

 We suppose that a suitable iteration interval P is chosen beforehand. If it is too small, no feasible solution to the ILP exists and P must be increased.

10 - 77
Integer Linear Program

10 - 78
Iterative Algorithms
Eqn. (4) is replaced by

   τ(vj) − τ(vi) ≥ w(vi) − dij · P   for all (vi, vj) ∈ ES

Proof of correctness: operation vj of iteration l depends on the instance of operation vi from iteration l − dij. Since successive iterations start P time units apart, this instance of vi starts at τ(vi) − dij · P relative to iteration l, which yields the inequality above.

[Figure: timing diagram illustrating the displacement dij and the iteration interval P.]
10 - 79
Iterative Algorithms
Eqn. (5) is replaced by a resource constraint in which the occupancy of every resource type is evaluated periodically with the iteration interval P, so that simultaneously active operations of different iterations are counted together.

Sketch of proof: an operation vi starting at τ(vi) uses the corresponding resource at all time steps t with

   τ(vi) + p · P ≤ t ≤ τ(vi) + w(vi) − 1 + p · P,   p ∈ Z.

Therefore, for every resource type vk and every time step within one iteration interval, the number of such active operations bound to vk must not exceed α(vk).
10 - 80
Dynamic Voltage Scaling
By transforming the DVS problem into an integer linear program, we can minimize the energy consumption under dynamic voltage scaling.

This also shows how one can consider binding in an ILP.

As an example, let us model a set of tasks with dependency constraints:
 We suppose that a task vi ∈ VS can use one of the execution times wk(vi), k ∈ K, with corresponding energy ek(vi). There are |K| different voltage levels.
 We suppose that there is a deadline d(vi) for each operation vi.
 We suppose that there are no resource constraints, i.e. all tasks can be executed in parallel.

10 - 81
Dynamic Voltage Scaling
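A reconstruction of the formulation in standard notation, consistent with the numbered explanations on the following slide (decision variables yi,k = 1 iff operation vi is executed at voltage level k ∈ K):

\[
\begin{aligned}
\text{minimize } & \ \textstyle\sum_{v_i \in V_S} \sum_{k \in K} e_k(v_i)\, y_{i,k} \\
\text{subject to } & \ y_{i,k} \in \{0,1\} && \forall v_i \in V_S,\ \forall k \in K && (1)\\
& \textstyle\sum_{k \in K} y_{i,k} = 1 && \forall v_i \in V_S && (2)\\
& \tau(v_j) - \tau(v_i) \ge \textstyle\sum_{k \in K} w_k(v_i)\, y_{i,k} && \forall (v_i, v_j) \in E_S && (3)\\
& \tau(v_i) + \textstyle\sum_{k \in K} w_k(v_i)\, y_{i,k} \le d(v_i) && \forall v_i \in V_S && (4)
\end{aligned}
\]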

10 - 82
Dynamic Voltage Scaling

10 - 83
Dynamic Voltage Scaling
Explanations:
 The objective function simply sums up the individual energies of all operations.
 Eqn. (1) makes the decision variables yik binary.
 Eqn. (2) guarantees that exactly one implementation (voltage level) k ∈ K is chosen for each operation vi.
 Eqn. (3) implements the precedence constraints, where the actual execution time is selected from the set of all available ones.
 Eqn. (4) guarantees the deadlines.

10 - 84
Chapter 8
 Not covered this semester.
 Not covered in exam.

 If interested: Read

10 - 85
Remember: What you got some time ago …

10 - 86
What we told you: Be careful and please do not …

10 - 87
Return the boards at the
embedded systems exam!

10 - 88
