ECE VIII Embedded System Design (06EC82) Notes
UNIT - 1 INTRODUCTION: Overview of embedded systems, embedded system design challenges, common design metrics and optimizing them. Survey of different embedded system design technologies, trade-offs. Custom single-purpose processors, design of custom single-purpose processors. 4 Hours

UNIT - 2 SINGLE-PURPOSE PROCESSORS: Hardware, combinational logic, sequential logic, RT-level combinational and sequential components, optimizing single-purpose processors. Single-purpose processors: software, basic architecture, operation, programmer's view, development environment, ASIPs. 6 Hours

UNIT - 3 Standard single-purpose peripherals: timers, counters, UART, PWM, LCD controllers, keypad controllers, stepper motor controllers, A-to-D converters, examples. 6 Hours

UNIT - 4 MEMORY: Introduction, common memory types, composing memory, memory hierarchy and cache, advanced RAM. Interfacing: communication basics, microprocessor interfacing, arbitration, advanced communication principles, protocols - serial, parallel and wireless. 8 Hours

PART - B

UNIT - 5 INTERRUPTS: Basics, shared-data problem, interrupt latency. Survey of software architectures: round robin, round robin with interrupts, function-queue scheduling, RTOS architecture. 8 Hours

UNIT - 6 INTRODUCTION TO RTOS; MORE OS SERVICES: Tasks, states, data, semaphores and shared data. More operating system services: message queues, mailboxes, timers, events, memory management. 8 Hours

UNIT - 7 & 8 BASIC DESIGN USING RTOS: Principles, an example, encapsulating semaphores and queues. Hard real-time scheduling considerations. Saving memory space and power. Hardware-software co-design aspects in embedded systems. 12 Hours
INDEX SHEET

UNIT - 1: INTRODUCTION - Overview of embedded systems
- Embedded systems overview
- Design challenges, common design metrics
- Processor technology
- IC technology
- Design technology
- Tradeoffs

UNIT - 2: CUSTOM SINGLE-PURPOSE PROCESSORS - HARDWARE
- Introduction, combinational logic
- Sequential logic
- Custom single-purpose processor design
- Recommended questions and solutions
- ASIPs

UNIT - 3: STANDARD SINGLE-PURPOSE PROCESSORS - PERIPHERALS
- Introduction, timers, counters, watchdog timers
- UART, PWM
- LCD controllers, stepper motor controllers

UNIT - 4: MEMORY AND MICROPROCESSOR INTERFACING
- Introduction, memory write ability
- Common memory types
- Composing memory
- Memory hierarchy and cache
- Advanced RAM
- Communication basics
- Microprocessor interfacing
- Arbitration
- Multilevel bus architectures
- Advanced communication principles
- Recommended questions and solutions

UNIT - 5: INTERRUPTS AND SURVEY OF SOFTWARE ARCHITECTURE
- Shared data problem
- Round robin
- Function queues
- RTOS architecture
- Recommended questions and solutions

UNIT - 6: INTRODUCTION TO RTOS, MORE ON OS SERVICES
- Tasks, states, data
- Semaphores
- Message queues, mail boxes

UNIT - 7 & 8: BASIC DESIGN USING RTOS
- Principles
- Encapsulating semaphores
- Hard real-time scheduling considerations
- Saving memory and power
PART A UNIT- 1
INTRODUCTION: Overview of embedded systems, embedded system design challenges, common design metrics and optimizing them. Survey of different embedded system design technologies, trade-offs. Custom Single-Purpose Processors, Design of custom single purpose processors.
4 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002
2. An Embedded Software Primer - David E. Simon, Pearson Education, 1999
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005
1.1 Embedded Systems Overview
An embedded system is nearly any computing system other than a desktop computer. An embedded system is a dedicated system that performs its desired function upon power-up, repeatedly. Embedded systems are found in a variety of common electronic devices, such as: (a) consumer electronics: cell phones, pagers, digital cameras, camcorders, videocassette recorders, portable video games, calculators, and personal digital assistants; (b) home appliances: microwave ovens, answering machines, thermostats, home security, washing machines, and lighting systems; (c) office automation: fax machines, copiers, printers, and scanners; (d) business equipment: cash registers, curbside check-in, alarm systems, card readers, product scanners, and automated teller machines; (e) automobiles: transmission control, cruise control, fuel injection, anti-lock brakes, and active suspension.

Common characteristics of embedded systems: embedded systems have several common characteristics that distinguish them from other computing systems:
1. Single functioned: an embedded system executes a single program repeatedly; the entire program is executed in a loop over and over again.
2. Tightly constrained: it should cost little, perform fast enough to process data in real time, fit on a single chip, and consume as little power as possible.
3. Reactive and real time: embedded systems must continuously react to changes in the environment, and must process and compute data in real time without delay.
1.2 Common Design Metrics

NRE cost (non-recurring engineering cost): the one-time monetary cost of designing the system.
Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size: the physical space required by the system; often measured in bytes for software and in number of gates for hardware.
Performance: the execution/response time of the system.
Power: the amount of power consumed by the system, which may define the lifetime of the battery and the cooling requirements of the IC; more power means more heat.
Flexibility: the ability to change the functionality of the system.
Time-to-prototype: the time needed to build a working system without incurring heavy NRE.
Time-to-market: the time required to develop the system and release it to the market.
Maintainability: the ability to modify the system after its release to the market.
Correctness: our confidence that we have implemented the system's functionality correctly.
Safety: the probability that the system does not cause any harm.
Metrics typically compete with one another: improving one often leads to worsening of another
The time-to-market design metric: introducing an embedded system to the market early can make a big difference in the system's profitability.
Market windows are generally very narrow, often on the order of a few months; missing this window can mean a significant loss in sales.
Fig 1.3 Time to Market (A) Market window (B) simplified revenue model for computing revenue loss
Let's investigate the loss of revenue that can occur due to delayed entry of a product into the market, using a simplified triangle model: the y-axis is the market rise and the x-axis is the point of entry into the market. The revenue for an on-time market entry is the area of the triangle labeled "on-time", and the revenue for a delayed-entry product is the area of the triangle labeled "delayed". The revenue loss for a delayed entry is the difference of these two areas.

% revenue loss = ((on-time - delayed) / on-time) * 100%

Let W be the height (peak) of the market rise, D the delay of entry (in weeks or months), and 2W the product's lifetime. Then:

Area of on-time triangle = 1/2 * base * height = 1/2 * 2W * W = W^2
Area of delayed triangle = 1/2 * (W - D + W) * (W - D) = 1/2 * (2W - D) * (W - D)
% revenue loss = (D * (3W - D) / (2 * W^2)) * 100%

Example: the product's lifetime is 52 weeks (so W = 26) and the delay of entry to the market is 4 weeks; the percentage revenue loss is about 22%.
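The percentage-loss formula can be checked with a short program. The following is a minimal C sketch (the function and variable names are ours, not from the text) that evaluates the simplified triangle model for a given market peak W and delay D:

```c
#include <stdio.h>

/* Simplified triangle revenue model:
   on-time area = W * W
   delayed area = 0.5 * (2W - D) * (W - D)
   % loss       = D * (3W - D) / (2 * W * W) * 100          */
double revenue_loss_percent(double W, double D)
{
    return (D * (3.0 * W - D)) / (2.0 * W * W) * 100.0;
}

int main(void)
{
    /* Product lifetime 2W = 52 weeks -> W = 26; delay D = 4 weeks */
    printf("Loss = %.1f%%\n", revenue_loss_percent(26.0, 4.0));   /* about 22% */
    return 0;
}
```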
Unlike the other design metrics, the best technology choice depends on the number of units to be produced. Suppose three technologies A, B and C have the following costs:

Technology    NRE cost     Unit cost
A             $2,000       $100
B             $30,000      $30
C             $100,000     $2
Total cost = NRE cost + unit cost * number of units
Per-product cost = total cost / number of units = NRE cost / number of units + unit cost

1.2.3 The performance design metric: performance of a system is a measure of how long the system takes to execute our desired tasks. There are several measures of performance; the two main measures are:
- Latency (response time)
- Throughput: the number of tasks that can be processed per unit time
Speedup is a method of comparing the performance of two systems: speedup of A over B = performance of A / performance of B.
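The total-cost and per-product-cost equations above can be applied to the technology table to see which technology is cheapest at a given production volume. A minimal C sketch (the structure and names are ours; the figures are the NRE and unit costs listed above):

```c
#include <stdio.h>

struct tech { const char *name; double nre; double unit; };

/* total cost       = NRE + unit cost * volume
   per-product cost = total cost / volume                     */
int main(void)
{
    struct tech t[] = { {"A",   2000.0, 100.0},
                        {"B",  30000.0,  30.0},
                        {"C", 100000.0,   2.0} };
    long volume = 1000;                       /* units to be produced */

    for (int i = 0; i < 3; i++) {
        double total = t[i].nre + t[i].unit * volume;
        printf("Tech %s: total = $%.0f, per product = $%.2f\n",
               t[i].name, total, total / volume);
    }
    return 0;
}
```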
Application-specific processors may serve as a compromise between single-purpose and general-purpose processors. An ASIP is a programmable processor optimized for a particular class of applications having common characteristics, such as embedded control, digital signal processing, or telecommunications. This provides flexibility while still achieving good performance, low power and small size.

General-purpose processors: the designer of a general-purpose processor (microprocessor) builds a programmable device suitable for a variety of applications, to maximize the number of devices sold. Design considerations:
- Should accommodate different kinds of programs
- Should provide a general datapath to handle a variety of computations
Design technology: design technology involves converting our concept of desired functionality into an implementation. Design implementations should optimize the design metrics and should also be realized quickly. Variations of the top-down design process have become popular.
Unit cost may be relatively low in small quantities, since the processor manufacturer distributes the NRE cost over many units. Performance may be fast for computation-intensive applications, if using a fast processor, due to advanced architecture features and leading-edge IC technology. Some design-metric drawbacks:
- Unit cost may be too high for large quantities.
- Performance may be slow for certain applications.
- Size and power may be large due to unnecessary processor hardware.
Figure 1.4(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality, nothing more, nothing less.
Fig 1.4 Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.
3. Application-Specific Processors: application-specific instruction-set processors (ASIPs) are programmable processors optimized for a particular class of applications having common characteristics. They strike a compromise between general-purpose and single-purpose processors. They have a program memory, an optimized datapath and special functional units, and offer good performance with reasonable flexibility, size and power. An ASIP can serve as a compromise between the above processor options: it is designed for a particular class of applications with common characteristics, such as digital signal processing, telecommunications, or embedded control. The designer of such a processor can optimize the datapath for the application class, perhaps adding special functional units for common operations and eliminating other infrequently used units.
Digital signal processors (DSPs) are a common class of ASIP and so deserve special mention. A DSP is a processor designed to perform common operations on digital signals, which are the digital encodings of analog signals like video and audio. These operations carry out common signal-processing tasks such as signal filtering, transformation, or combination. Such operations are usually math-intensive, including operations like multiply-and-add or shift-and-add. To support such operations, a DSP may have special-purpose datapath components such as a multiply-accumulate unit, which can perform a computation like T = T + M[i]*k using only one instruction. Because DSP programs often manipulate large arrays of data, a DSP may also include special hardware to fetch sequential data memory locations in parallel with other operations, to further speed execution.

Highlight merits and demerits of single-purpose processors and general-purpose processors.
Single-Purpose Processors:
Merits:
1. They are fast
2. They consume low power
3. They have small size
4. Unit cost may be low for large quantities
Demerits:
1. NRE costs may be high
2. Low flexibility
3. Unit cost may be high for small quantities
4. Performance may not match general-purpose processors for some applications

General-Purpose Processors:
Merits:
1. High flexibility
2. Low NRE costs
3. Low time-to-market
4. Performance may be fast for computation-intensive applications
De-merits:
1. Unit cost may be relatively high for large quantities
2. Performance may be slower for certain applications
3. Size and power may be large due to unnecessary processor hardware
How is a single-purpose processor distinctly different from a general-purpose processor?

1. Single-purpose: executes exactly one program.
   General-purpose: executes any program written by the user.
2. Single-purpose: the functionality cannot be changed.
   General-purpose: the functionality can be changed by the user by writing the required program.
3. Single-purpose: does not have program memory.
   General-purpose: has program memory.
4. Single-purpose: has no flexibility and contains only the resources required for its particular functionality.
   General-purpose: has a very large amount of resources which may or may not be used for a particular functionality, as decided by the user.
5. Single-purpose merits: fast, low power consumption, small size, and unit cost may be low for large quantities.
   General-purpose merits: high flexibility, low NRE cost, low time-to-market, and performance may be fast for computation-intensive applications.
1.4 IC Technology
Every processor must eventually be implemented on an IC. IC technology involves the manner in which we map a digital (gate-level) implementation onto an IC. An IC (Integrated Circuit), often called a chip, is a semiconductor device consisting of a set of connected transistors and other devices. A number of different processes exist to build semiconductors, the most popular of which is CMOS (Complementary Metal Oxide Semiconductor). The IC technologies differ by how customized the IC is for a particular implementation. IC technology is independent from processor technology; any type of processor can be mapped to any type of IC technology.
Fig : 1. 8 The independence of processor and IC technologies: any processor technology can be mapped to any IC technology.
To understand the differences among IC technologies, we must first recognize that semiconductors consist of numerous layers. The bottom layers form the transistors. The middle layers form logic gates. The top layers connect these gates with wires. One way to create these layers is by depositing photo-sensitive chemicals on the chip surface and then shining light through masks to change regions of the chemicals. Thus, the task of building the layers is actually one of designing appropriate masks. A set of masks is often called a layout. The narrowest line that we can create on a chip is called the feature size, which today is well below one micrometer (sub-micron).
1.4.1 Full-custom/VLSI
In a full-custom IC technology, we optimize all layers for our particular embedded system's digital implementation. Such optimization includes placing the transistors to minimize interconnection lengths, sizing the transistors to optimize signal transmissions, and routing wires among the transistors. Once we complete all the masks, we send the mask specifications to a
fabrication plant that builds the actual ICs. Full-custom IC design, often referred to as VLSI (Very Large Scale Integration) design, has very high NRE cost and long turnaround times (typically months) before the IC becomes available, but can yield excellent performance with small size and power. It is usually used only in high-volume or extremely performance-critical applications.
1.4.3 PLD
In a PLD (Programmable Logic Device) technology, all layers already exist, so we can purchase the actual IC. The layers implement a programmable circuit, where programming has a lower-level meaning than a software program. The programming that takes place may consist of creating or destroying connections between wires that connect gates, either by blowing a fuse, or setting a bit in a programmable switch. Small devices, called programmers, connected to a desktop computer can typically perform such programming. We can divide PLDs into two types, simple and complex. One type of simple PLD is a PLA (Programmable Logic Array), which consists of a programmable array of AND gates and a programmable array of OR gates. Another type is a PAL (Programmable Array Logic), which uses just one programmable array to reduce the number of expensive programmable components. One type of complex PLD, growing very rapidly in popularity over the past decade, is the FPGA (Field Programmable Gate Array), which offers more general connectivity among blocks of logic, rather than just arrays of logic as with PLAs and PALs, and is thus able to implement far more complex designs. PLDs offer very low NRE cost and almost instant IC availability. However, they are typically bigger than ASICs, may have higher unit cost, may consume more power, and may be slower (especially FPGAs). They still provide reasonable performance, though, so are especially well suited to rapid prototyping.
1.5 Design Technology

Variations of a top-down design process have become popular in the past decade, an ideal form of which is illustrated in the figure. The designer refines the system through several abstraction levels. At the system level the designer describes the desired functionality in an executable language like C; this is called the system specification. The designer refines this specification by distributing portions of it among several general-purpose and/or single-purpose processors, yielding behavioral specifications for each processor. The designer refines these specifications into register-transfer (RT) specifications by converting behavior on general-purpose processors to assembly code, and by converting behavior on single-purpose processors to a connection of register-transfer components and state machines. The designer then refines the RT-level specification into a logic specification. Finally, the designer refines the remaining specifications into an implementation consisting of machine code for general-purpose processors and a gate-level netlist for single-purpose processors.
There are three main approaches to improving the design process for increased productivity, which we label as compilation/synthesis, libraries/IP, and test/verification. Several other approaches also exist.
1.5.1 Compilation/Synthesis
Compilation/synthesis lets a designer specify desired functionality in an abstract manner, and automatically generates lower-level implementation details. Describing a system at high abstraction levels can improve productivity by reducing the amount of detail, often by an order of magnitude, that a designer must specify. A logic synthesis tool converts Boolean expressions into a connection of logic gates (called a netlist). A register-transfer (RT) synthesis tool converts finite-state machines and register transfers into a datapath of RT components and a controller of Boolean equations. A behavioral synthesis tool converts a sequential program into finite-state machines and register transfers. Likewise, a software compiler converts a sequential program to assembly code, which is essentially register-transfer code. Finally, a system synthesis tool converts an abstract system specification into a set of sequential programs on general-purpose and single-purpose processors. The relatively recent maturation of RT and behavioral synthesis tools has enabled a unified view of the design process for single-purpose and general-purpose processors. Design for the former is commonly known as hardware design, and design for the latter as software design. In the past, the design processes were radically different: software designers wrote sequential programs, while hardware designers connected components.
Fig 1.10 The co-design ladder: recent maturation of synthesis enables a unified view of hardware and software.
1.5.2 Libraries/IP
Libraries involve re-use of pre-existing implementations. Using libraries of existing implementations can improve productivity if the time it takes to find, acquire, integrate and test a library item is less than that of designing the item oneself. A logic-level library may consist of layouts for gates and cells. An RT-level library may consist of layouts for RT components, like registers, multiplexors, decoders, and functional units. A behavioral-level library may consist of commonly used components, such as compression components, bus interfaces, display controllers, and even general-purpose processors. The advent of system-level integration has caused a great change in this level of library.

1.5.3 Test/Verification
Test/Verification involves ensuring that functionality is correct. Such assurance can prevent time-consuming debugging at low abstraction levels and iterating back to high abstraction levels. Simulation is the most common method of testing for correct functionality, although more formal verification techniques are growing in popularity. At the logic level, gate-level simulators provide output signal timing waveforms given input signal waveforms. Likewise, general-purpose processor simulators execute machine code. At the RT level, hardware description language (HDL) simulators execute RT-level descriptions and provide output waveforms given input waveforms. At the behavioral level, HDL simulators simulate sequential programs, and co-simulators connect HDL and general-purpose processor simulators to enable hardware/software co-verification. At the system level, a model simulator simulates the initial system specification using an abstract computation model, independent of any processor technology, to verify correctness and completeness of the specification.
General-Purpose Processors:
Merits:
1. High flexibility
2. Low NRE costs
3. Low time-to-market
4. Performance may be fast for computation-intensive applications
De-merits:
1. Unit cost may be relatively high for large quantities
2. Performance may be slower for certain applications
ASIPs (application-specific instruction-set processors) are programmable processors optimized for a particular class of applications having common characteristics. They strike a compromise between general-purpose and single-purpose processors. They have a program memory, an optimized datapath and special functional units, and offer good performance with reasonable flexibility, size and power.
4. What are the common design metrics that a design engineer should consider?
NRE cost (non-recurring engineering cost): the one-time monetary cost of designing the system.
Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size: the physical space required by the system; often measured in bytes for software and in number of gates for hardware.
Performance: the execution/response time of the system.
Power: the amount of power consumed by the system, which may define the lifetime of the battery and the cooling requirements of the IC; more power means more heat.
Flexibility: the ability to change the functionality of the system.
Time-to-prototype: the time needed to build a working system without incurring heavy NRE.
Time-to-market: the time required to develop the system and release it to the market.
Maintainability: the ability to modify the system after its release to the market.
Correctness: our confidence that we have implemented the system's functionality correctly.
Safety: the probability that the system does not cause any harm.
Metrics typically compete with one another: improving one often leads to worsening of another
The independence of processor and IC technologies: any processor technology can be mapped to any IC technology. To understand the differences among IC technologies, we must first recognize that semiconductors consist of numerous layers. The bottom layers form the transistors. The middle layers form logic gates. The top layers connect these gates with wires. One way to create these layers is by depositing photosensitive chemicals on the chip surface and then shining light through masks to change regions of the chemicals. Thus, the task of building the layers is actually one of designing appropriate masks. A set of masks is often called a layout. The narrowest line that we can create on a chip is called the feature size, which today is well below one micrometer (sub-micron). For each IC technology, all layers must eventually be built to get a working IC; the question is who builds each layer and when.
Q6. Derive the equation for percentage revenue loss for any market rise. A product was delayed by 4 weeks in releasing to market. The peak revenue for on-time entry to market would occur after 20 weeks, for a market rise angle of 45 degrees. Find the percentage revenue loss.
Ans: Let's investigate the loss of revenue that can occur due to delayed entry of a product into the market, using a simple triangle model: the y-axis is the market rise and the x-axis is the point of entry into the market. The revenue for an on-time market entry is the area of the triangle labeled "on-time", and the revenue for a delayed-entry product is the area of the triangle labeled "delayed". The revenue loss for a delayed entry is the difference of these two areas.

% revenue loss = ((on-time - delayed) / on-time) * 100%

With W the height of the market rise (the peak), D the delayed entry (in weeks or months) and 2W the product's lifetime:

Area of on-time triangle = 1/2 * base * height = 1/2 * 2W * W = W^2
Area of delayed triangle = 1/2 * (W - D + W) * (W - D) = 1/2 * (2W - D) * (W - D)
% revenue loss = (D * (3W - D) / (2 * W^2)) * 100%

For the given problem the peak occurs at W = 20 weeks and D = 4 weeks, so
% revenue loss = (4 * (60 - 4) / (2 * 400)) * 100% = 28%.

(For comparison, with a product lifetime of 52 weeks, i.e. W = 26, and a 4-week delay, the percentage revenue loss is about 22%.)
Q7. Compare GPP, SPP and ASIP along with their block diagrams.
1. General-Purpose Processors (Software)

General-purpose processors are programmable devices used in a variety of applications; they are also known as microprocessors. They have a program memory and a general datapath with a large register file and a general ALU. The datapath must be large enough to handle a variety of computations. The programmer writes the program that carries out the required functionality into the program memory, using the features (instructions) provided by the general datapath; this is the software portion of the system. The benefits of such a processor are:
- Low time-to-market and low NRE costs: design time and NRE cost are low because the designer must only write a program, and need not do any digital design.
- High flexibility: changing functionality requires only changing the program.
- Unit cost may be relatively low in small quantities, since the processor manufacturer sells large quantities to other customers and hence distributes the NRE cost over many units.
- Performance may be fast for computation-intensive applications, if using a fast processor, due to advanced architecture features and leading-edge IC technology.
Some design-metric drawbacks:
- Unit cost may be too high for large quantities.
- Performance may be slow for certain applications.
- Size and power may be large due to unnecessary processor hardware.
Figure 1.4(d) illustrates the use of a single-purpose processor in our embedded system example, representing an exact fit of the desired functionality, nothing more, nothing less.
Fig 1.4 Processors vary in their customization for the problem at hand: (a) desired functionality, (b) general-purpose processor, (c) application-specific processor, (d) single-purpose processor.
An embedded system designer creates a single-purpose processor by designing a custom digital circuit. Using a single-purpose processor in an embedded system results in several design metric benefits and drawbacks, which are essentially the inverse of those for general purpose processors. Performance may be fast, size and power may be small, and unit-cost may be low for large quantities, while design time and NRE costs may be high, flexibility is low, unit cost may be high for small quantities, and performance may not match general-purpose processors for some applications.
Digital signal processors (DSPs) are a common class of ASIP and so deserve special mention. A DSP is a processor designed to perform common operations on digital signals, which are the digital encodings of analog signals like video and audio. These operations carry out common signal-processing tasks such as signal filtering, transformation, or combination. Such operations are usually math-intensive, including operations like multiply-and-add or shift-and-add. To support such operations, a DSP may have special-purpose datapath components such as a multiply-accumulate unit, which can perform a computation like T = T + M[i]*k using only one instruction. Because DSP programs often manipulate large arrays of data, a DSP may also include special hardware to fetch sequential data memory locations in parallel with other operations, to further speed execution.
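The computation T = T + M[i]*k mentioned above is the inner step of many DSP routines. The following plain-C sketch (array and variable names are ours) shows the loop that a DSP's multiply-accumulate instruction collapses into single-cycle steps:

```c
/* Software equivalent of a DSP multiply-accumulate (MAC) loop.
   A DSP with a MAC unit performs each T = T + M[i] * k step in one
   instruction, often while fetching the next M[i] in parallel.     */
long mac(const int M[], int n, int k)
{
    long T = 0;                   /* accumulator                     */
    for (int i = 0; i < n; i++)
        T = T + (long)M[i] * k;   /* multiply-accumulate step        */
    return T;
}
```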
UNIT 2
SINGLE-PURPOSE PROCESSORS: Hardware, Combinational Logic, Sequential Logic, RT-Level Combinational and Sequential Components, Optimizing Single-Purpose Processors. Single-Purpose Processors: Software, Basic Architecture, Operation, Programmer's View, Development Environment, ASIPs. 6 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005
Fig 2.1 View of a CMOS transistor on silicon

A CMOS transistor consists of a gate, source and drain, where the gate controls the current flow from source to drain. A supply voltage of +3 V or +5 V is treated as logic 1, and a low voltage, typically ground, is treated as logic 0.
When logic 1 is applied to the gate, the transistor conducts and current flows; when logic 0 is applied to the gate, the transistor does not conduct.
Digital system designers work at the abstraction level of logic gates, where each gate is represented symbolically and with a Boolean equation, as shown in Figure 2.3.
Fig 2.4 Combinational logic design: problem description, truth table, output equations, minimized output equations, final circuit.
2.3 Sequential Logic
a. Flip-flops
b. RT-level sequential components
c. Sequential logic design
2.3.1 Flip flops
A sequential circuit is a digital circuit whose outputs are a function of the present as well as previous input values. The most basic sequential circuit is the flip-flop; a flip-flop stores a single bit.
D flip-flop: it has two inputs, D and clock. When clock is 1, the value of D is stored in the flip-flop and appears at the output Q. When clock is 0, the previously stored bit is maintained and appears at Q.

SR flip-flop: it has three inputs, S, R and clock. When clock is 1, the inputs S and R are examined: if S is 1, a 1 is stored; if R is 1, a 0 is stored; if both S and R are 0, there is no change; if both are 1, the behavior is undefined. Thus S stands for set and R for reset.
Fig 2.8 A basic processor: (a) controller and datapath, (b) view inside the controller and datapath
Example program:
- First create the algorithm
- Convert the algorithm to a "complex" state machine, known as an FSMD (finite-state machine with datapath)
- Templates can be used to perform such conversion (a C sketch of the GCD example used in this unit appears below)
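For reference, a C version of the subtraction-based GCD algorithm that serves as the running example in this unit (the exact coding style here is ours; it assumes positive inputs):

```c
/* Subtraction-based GCD: the algorithm that is converted to an FSMD
   and then to a custom single-purpose processor in this unit.
   Assumes x > 0 and y > 0.                                           */
unsigned int gcd(unsigned int x, unsigned int y)
{
    while (x != y) {              /* loop until the two values are equal  */
        if (x < y)
            y = y - x;            /* subtract the smaller from the larger */
        else
            x = x - y;
    }
    return x;                     /* x == y is the GCD                    */
}
```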
- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers and functional units, based on reads and writes; use multiplexors for multiple sources
- Create a unique identifier for each datapath component control input and output
Areas of possible improvement:
- Merge states:
  - states with constants on transitions can be eliminated; the transition taken is already known
  - states with independent operations can be merged
- Separate states:
  - states which require complex operations (e.g., a*b*c*d) can be broken into smaller states to reduce hardware size
- Scheduling
Fig 2.16 Optimizing the FSMD for GCD

Optimizing the datapath:
- Sharing of functional units: a one-to-one mapping, as done previously, is not necessary; if the same operation occurs in different states, those states can share a single functional unit.
- Multi-functional units: an ALU supports a variety of operations, so it can be shared among operations occurring in different states.
Basic Architecture:
A general-purpose processor, sometimes called a CPU, consists of a datapath and a control unit linked with memory. Note the similarity to a single-purpose processor; the key differences are:
- The datapath is general
- The control unit does not store the algorithm; the algorithm is programmed into the memory
Datapath Operations:
- Load: read a memory location into a register
- ALU operation: input certain registers through the ALU and store the result back in a register
- Store: write a register to a memory location
Control Unit:
The control unit configures the datapath operations. The sequence of desired operations (instructions) is stored in memory as the program. The instruction cycle is broken into several sub-operations, each taking one clock cycle, e.g.:
- Fetch: get the next instruction into the IR (instruction register)
- Decode: determine what the instruction means
- Fetch operands: move data from memory to datapath registers
- Execute: move data through the ALU
- Store results: write data from a register to memory
Memory:
Program information consists of the sequence of instructions that cause the processor to carry out the desired system functionality. Data information represents the values being input, output and transformed by the program. We can store program and data together or separately. In a Princeton architecture, data and program words share the same memory space. The Princeton architecture may result in a simpler hardware connection to memory, since only one connection is necessary. In a Harvard architecture, the program memory space is distinct from the data memory space. A Harvard architecture, while requiring two connections, can perform instruction and data fetches simultaneously, so may result in improved performance. Most machines have a Princeton architecture. The Intel 8051 is a well-known Harvard architecture.
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM. Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip memory.
To reduce the time needed to access (read or write) memory, a local copy of a portion of memory may be kept in a small but especially fast memory called cache.
Cache memory often resides on-chip, and often uses fast but expensive static RAM technology rather than slower but cheaper dynamic RAM. Cache memory is based on the principle that if at a particular time a processor accesses a particular memory location, then the processor will likely access that location and its immediate neighbors in the near future.
Operation:
Instruction execution consists of several stages:
1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.
If each stage takes one clock cycle, then we can see that a single instruction may take several cycles to complete.
Pipelining
Pipelining is a common way to increase the instruction throughput of a microprocessor. We first make a simple analogy of two people approaching the chore of washing and drying 8 dishes. In one approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient since at any time only one person is working and the other is idle. Obviously, a better approach is for the second person to begin drying the first dish immediately after it has been washed. This approach requires only 9 minutes: 1 minute for the first dish to be washed, and then 8 more minutes until the last dish is finally dry. We refer to this latter approach as pipelined.
Figure 2.21: Pipelining: (a) non-pipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.
Each dish is like an instruction, and the two tasks of washing and drying are like the five stages listed above. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.
Superscalar and VLIW Architectures: performance can be improved by:
- A faster clock (but there's a limit)
- Pipelining: slice an instruction into stages and overlap the stages
- Multiple ALUs to support more than one instruction stream
  - Superscalar (scalar: non-vector operations): fetches instructions in batches and executes as many as possible; may require extensive hardware to detect independent instructions
  - VLIW: each word in memory holds multiple independent instructions
  - Relies on the compiler to detect and schedule instructions
  - Currently growing in popularity
Programmer's View
- The programmer doesn't need a detailed understanding of the architecture; instead, the programmer needs to know what instructions can be executed.
- Two levels of instructions: assembly level and structured languages (C, C++, Java, etc.)
- Most development today is done using structured languages, but some assembly-level programming may still be necessary.
- Drivers: the portion of the program that communicates with and/or controls (drives) another device. Drivers often have detailed timing considerations and extensive bit manipulation, so assembly level may be best for them.
Instruction Set:
The instruction set defines the legal set of instructions for a processor:
- Data transfer: memory/register, register/register, I/O, etc.
- Arithmetic/logical: move registers through the ALU and back
- Branches: determine the next PC value when it is not just PC+1
Addressing Modes:
Program and data memory space
The embedded-system programmer must be aware of the size of the available memory for program and for data, and must not exceed these limits. In addition, the programmer will probably want to be aware of on-chip program and data memory capacity, taking care to fit the necessary program and data in on-chip memory if possible.
Registers
The assembly-language programmer must know how many registers are available for general-purpose data storage. For example, a base register may exist, which permits the programmer to use a data-transfer instruction where the processor adds an operand field to the base register to obtain an actual memory address.
I/O
The programmer should be aware of the processors input and output (I/O) facilities, with which the processor communicates with other devices. One common I/O facility is parallel I/O, in which the programmer can read or write a port (a collection of external pins) by reading or writing a special-function register. Another common I/O facility is a system bus, consisting of address and data ports that are automatically activated by
Interrupts
An interrupt causes the processor to suspend execution of the main program and instead jump to an interrupt service routine (ISR) that fulfills a special, short-term processing need. In particular, the processor stores the current PC and sets it to the address of the ISR. After the ISR completes, the processor resumes execution of the main program by restoring the PC. The programmer should be aware of the types of interrupts supported by the processor (we describe several types in a subsequent chapter), and must write ISRs when necessary. The assembly-language programmer places each ISR at a specific address in program memory. The structured-language programmer must do so also; some compilers allow a programmer to force a procedure to start at a particular memory location, while others recognize pre-defined names for particular ISRs. For example, we may need to record the occurrence of an event from a peripheral device, such as the pressing of a button. We record the event by setting a variable in memory when that event occurs, although the user's main program may not process that event until later. Rather than requiring the user to insert checks for the event throughout the main program, the programmer merely needs to write an interrupt service routine and associate it with an input pin connected to the button. The processor will then call the routine automatically when the button is pressed.
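As a rough illustration of the button example above, the sketch below only shows the software structure; the way an ISR is declared and attached to an interrupt vector is compiler- and processor-specific, so the names used here (button_isr, button_pressed) are purely illustrative:

```c
/* Illustrative ISR for the button example: the ISR only records the
   event; the main program processes it later.  How an ISR is bound
   to an interrupt vector is compiler- and processor-specific.        */
volatile unsigned char button_pressed = 0;   /* shared flag            */

void button_isr(void)                        /* runs on button interrupt */
{
    button_pressed = 1;                      /* record the event       */
}

void main_loop(void)
{
    for (;;) {
        if (button_pressed) {                /* process the event later */
            button_pressed = 0;
            /* ... handle the button press ... */
        }
        /* ... other main-program work ... */
    }
}
```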
Operating System
An operating system is an optional software layer providing low-level services to a program (application):
- File management, disk access
- Keyboard/display interfacing
- Scheduling multiple programs for execution, or even just multiple threads from one program
- The program makes system calls to the OS
Development Environment
- Development processor: the processor on which we write and debug our programs; usually a PC
- Target processor:
  the processor that the program will run on in our embedded system; often different from the development processor
Running a Program:
If the development processor is different from the target, how can we run our compiled code? Two options: download it to the target processor, or simulate it.
- Simulation:
  - One method: a hardware description language model, but this is slow and not always available
  - Another method: an instruction-set simulator (ISS), which runs on the development processor but executes the instructions of the target processor
  - An ISS gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...; but it doesn't interact with the real environment
- Download to board:
  - Use a device programmer
  - Runs in the real environment, but is not controllable
- Compromise: emulator
  - Runs in the real environment, at or near full speed
  - Supports some controllability from the PC
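The core of an instruction-set simulator is a simple fetch-decode-execute loop. The following toy sketch simulates a made-up processor with two instructions; the encoding and register set are invented here purely to show the structure of an ISS and do not correspond to any real target:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy instruction-set simulator: fetch, decode, execute.
   Hypothetical encoding: high nibble = opcode, low nibble = operand.
   opcode 0 = load immediate into ACC, opcode 1 = add immediate to ACC. */
int main(void)
{
    uint8_t program[] = { 0x05, 0x13, 0x12 };   /* ACC=5; ACC+=3; ACC+=2 */
    uint8_t pc = 0, acc = 0;

    while (pc < sizeof program) {
        uint8_t instr   = program[pc++];        /* fetch   */
        uint8_t opcode  = instr >> 4;           /* decode  */
        uint8_t operand = instr & 0x0F;
        switch (opcode) {                       /* execute */
        case 0: acc = operand;  break;
        case 1: acc += operand; break;
        }
    }
    printf("ACC = %u\n", acc);                  /* prints ACC = 10 */
    return 0;
}
```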
Application-Specific Instruction-Set Processors (ASIPs):
- General-purpose processors are sometimes too general to be effective in demanding applications; e.g., video processing requires huge video buffers and operations on large arrays of data, which is inefficient on a GPP. A single-purpose processor, however, has high NRE cost and is not programmable.
- ASIPs are targeted to a particular domain and contain architectural features specific to that domain (e.g., embedded control, digital signal processing, video processing, network processing, telecommunications), while still being programmable.

A common ASIP: the microcontroller
- For embedded control applications: reading sensors, setting actuators; mostly dealing with events (bits): data is present, but not in huge amounts; e.g., VCR, disk drive, digital camera (assuming an SPP for image compression), washing machine, microwave oven.
- Microcontroller features:
  - On-chip peripherals: timers, analog-to-digital converters, serial communication, etc., tightly integrated for the programmer and typically part of the register space
  - On-chip program and data memory
  - Direct programmer access to many of the chip's pins
  - Specialized instructions for bit manipulation and other low-level operations
DSP features:
- Several instruction execution units
- Multiply-accumulate single-cycle instruction, among other instructions
- Efficient vector operations, e.g., adding two arrays: vector ALUs, loop buffers, etc.
Selecting a Microprocessor
Issues:
- Technical: speed, power, size, cost
- Other: development environment, prior expertise, licensing, etc.
Speed: how do we evaluate a processor's speed?
- Clock speed: but instructions per cycle may differ
- Instructions per second: but work per instruction may differ
- Dhrystone: a synthetic benchmark, developed in 1984, measured in Dhrystones/sec.
- MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780), also known as Dhrystone MIPS; commonly used today. So 750 MIPS = 750 * 1757 = 1,317,750 Dhrystones per second.
- SPEC: a set of more realistic benchmarks, but oriented to desktops
- EEMBC: EDN Embedded Benchmark Consortium; suites of benchmarks for automotive, consumer electronics, networking, office automation, and telecommunications
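The Dhrystone-MIPS relationship quoted above is just a division; a one-line helper (the name is ours) makes the arithmetic explicit:

```c
/* Dhrystone MIPS: DMIPS = Dhrystones per second / 1757
   e.g., 1,317,750 Dhrystones/s  ->  750 DMIPS                          */
double dmips(double dhrystones_per_second)
{
    return dhrystones_per_second / 1757.0;
}
```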
UNIT 2 (Software)
1. Describe why a general-purpose processor could cost less than a single-purpose processor.
2. Create a table listing the address spaces for 8, 16, 24, 32 and 64-bit address sizes.
3. Illustrate how program and data memory fetches can be overlapped in a Harvard architecture.
4. For a microcontroller, create a table listing five existing variations, stressing the features that differ from the basic version.
QUESTION PAPER SOLUTION UNIT 2

Q1. Write an algorithm for GCD with more time complexity and write the FSMD; also determine the total number of steps required for GCD.
- First create the algorithm
- Convert the algorithm to a "complex" state machine, known as an FSMD (finite-state machine with datapath)
- Templates can be used to perform such conversion
GCD
- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers and functional units, based on reads and writes; use multiplexors for multiple sources
- Create a unique identifier for each datapath component control input and output
Optimization is the task of making design metric values the best possible. One optimization opportunity is the original program itself.
Memory may be read-only memory (ROM) or readable and writable memory (RAM). ROM is usually much more compact than RAM. An embedded system often uses ROM for program memory, since, unlike in desktop systems, an embedded system's program does not change. Constant data may be stored in ROM, but other data of course requires RAM. Memory may be on-chip or off-chip. On-chip memory resides on the same IC as the processor, while off-chip memory resides on a separate IC. The processor can usually access on-chip memory much faster than off-chip memory, perhaps in just one cycle, but finite IC capacity of course implies only a limited amount of on-chip memory.
Fig: Pipelining: (a) non-pipelined dish cleaning, (b) pipelined dish cleaning, (c) pipelined instruction execution.
Each dish is like an instruction, and the two tasks of washing and drying are like the five stages listed above. By using a separate unit (each akin to a person) for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.
Running a Program:
If the development processor is different from the target, how can we run our compiled code? Two options: download it to the target processor, or simulate it.
- Simulation:
  - One method: a hardware description language model, but this is slow and not always available
  - Another method: an instruction-set simulator (ISS), which runs on the development processor but executes the instructions of the target processor
  - An ISS gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...; but it doesn't interact with the real environment
- Download to board:
  - Use a device programmer
  - Runs in the real environment, but is not controllable
- Compromise: emulator
  - Runs in the real environment, at or near full speed
  - Supports some controllability from the PC
State encoding: the task of assigning a unique bit pattern to each state in an FSM. The size of the state register and the combinational logic vary with the encoding, so it can be treated as an ordering problem.
State minimization: the task of merging equivalent states into a single state. Two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state.
UNIT 3
Standard Single-Purpose Peripherals, Timers, Counters, UART, PWM, LCD Controllers, Keypad controllers, Stepper Motor Controller, A to D Converters, Examples. 6 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002
2. An Embedded Software Primer - David E. Simon, Pearson Education, 1999
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005
Fig 3.1 Timer structure: basic timer, counter, timer with terminal count, timer with prescaler.
3.3 UART
A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data (usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices separated by long distances, or when we simply have few available I/O pins. Internally, a simple UART may possess a baud-rate configuration register, and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate.
To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the receive register. For the 8051,

Baud rate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))

where smod corresponds to 2 bits in a special-function register, oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.
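The baud-rate equation can be inverted to find the timer reload value TH1 for a desired rate: TH1 = 256 - (2^smod * oscfreq) / (32 * 12 * baud). A small C sketch of this calculation (the helper name is ours; the 11.0592 MHz value is just a commonly used crystal frequency, not one taken from these notes):

```c
#include <stdio.h>

/* Invert: baud = (2^smod / 32) * oscfreq / (12 * (256 - TH1))
   =>      TH1  = 256 - (2^smod * oscfreq) / (32 * 12 * baud)           */
int th1_for_baud(double oscfreq, int smod, double baud)
{
    double th1 = 256.0 - ((1 << smod) * oscfreq) / (32.0 * 12.0 * baud);
    return (int)(th1 + 0.5);               /* round to nearest integer  */
}

int main(void)
{
    /* e.g., 11.0592 MHz crystal, smod = 0, 9600 baud -> TH1 = 253 (0xFD) */
    printf("TH1 = %d\n", th1_for_baud(11059200.0, 0, 9600.0));
    return 0;
}
```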
RS  R/W  DB7  DB6  DB5  DB4  DB3  DB2  DB1  DB0   Description
0   0    0    0    0    0    0    0    0    1     Clears all display, returns cursor home
0   0    0    0    0    0    0    0    1    *     Returns cursor home
0   0    0    0    0    0    0    1    I/D  S     Sets cursor move direction and/or specifies not to shift display
0   0    0    0    0    0    1    D    C    B     ON/OFF of all display (D), cursor ON/OFF (C), and blink of cursor position (B)
0   0    0    0    0    1    S/C  R/L  *    *     Moves cursor and shifts display
0   0    0    0    1    DL   N    F    *    *     Sets interface data length (DL), number of display lines (N), and character font (F)
1   0    WRITE DATA                               Writes data

CODES:
I/D = 1: cursor moves left      I/D = 0: cursor moves right
S   = 1: with display shift
S/C = 1: display shift          S/C = 0: cursor movement
R/L = 1: shift to right         R/L = 0: shift to left
DL  = 1: 8-bit                  DL  = 0: 4-bit
N   = 1: 2 rows                 N   = 0: 1 row
F   = 1: 5x10 dots              F   = 0: 5x7 dots
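One of the recommended questions for this unit asks for an LCD initialization routine; the sketch below is built only from the command table above. The helper write_command(), assumed to place the byte on DB7..DB0 with RS = 0 and pulse the enable line, is hypothetical and not defined in these notes:

```c
/* Hypothetical helper, assumed to drive DB7..DB0 with RS = 0, R/W = 0,
   pulse the enable line and wait for the controller to finish.         */
void write_command(unsigned char cmd);

/* Initialize the LCD using commands from the table above.              */
void lcd_init(void)
{
    write_command(0x38);   /* 0011 1000: function set, DL=1 (8-bit), N=1 (2 rows), F=0 (5x7) */
    write_command(0x0E);   /* 0000 1110: display ON (D=1), cursor ON (C=1), blink OFF (B=0)  */
    write_command(0x01);   /* 0000 0001: clear display, return cursor home                   */
    write_command(0x06);   /* 0000 0110: entry mode set, I/D=1, S=0                          */
}
```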
One of the simplest LCDs is the 7-segment LCD. Each of the 7 segments can be activated to display any digit or one of several letters and symbols.
Sequence   A   B   A'  B'
1          +   +   -   -
2          -   +   +   -
3          -   -   +   +
4          +   -   -   +
5          +   +   -   -
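A hedged C sketch of stepping a motor through the sequence above. The port name, the delay routine and the packing of coils A, B, A', B' into port bits are all assumptions made here for illustration:

```c
/* One full-step sequence for coils A, B, A', B' packed as 4 port bits.
   Assumed bit order: bit3 = A, bit2 = B, bit1 = A', bit0 = B'.
   '+' in the table is written as 1, '-' as 0.                          */
static const unsigned char step_table[4] = { 0x0C, 0x06, 0x03, 0x09 };

extern void delay_ms(unsigned int ms);     /* placeholder delay routine  */
extern volatile unsigned char MOTOR_PORT;  /* placeholder output port    */

void step_motor(int steps)                 /* rotate by 'steps' full steps */
{
    static int idx = 0;
    while (steps-- > 0) {
        MOTOR_PORT = step_table[idx];      /* energize the coils          */
        idx = (idx + 1) % 4;               /* advance to the next pattern */
        delay_ms(10);                      /* allow the rotor to settle   */
    }
}
```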
RECOMMENDED QUESTIONS UNIT - 3
1. A timer has a clock frequency of 10 MHz. Determine its range and resolution, and the terminal count needed to measure 3 ms intervals.
2. A watchdog timer that uses two cascaded 16-bit up-counters is connected to an 11.981 MHz oscillator. A timeout should occur if the watchdog reset function is not called within 5 minutes. What value should be loaded into the up-counter pair when the function is called?
3. Determine the values for smod and TH1 to generate a baud rate of 9600 for the 8051 microcontroller baud-rate equation, assuming an 11.981 MHz oscillator.
4. Using the PWM circuit, compute the value assigned to PWM1 to achieve an RPM of 8050, assuming the input voltage needed is 4.375 V.
5. Write a function in pseudocode that initializes the LCD.
6. Compute the memory needed in bytes to store a 4-bit digital encoding of a 3-second analog audio signal sampled every 10 milliseconds.
7. Given an analog input signal whose voltage ranges from -5 to 5 V, and an 8-bit digital encoding, calculate the correct encoding of 1.2 V, and then trace the successive-approximation approach to find the correct encoding.
8. Extend the ratio and resolution equations of analog-to-digital conversion to any voltage range between Vmin and Vmax rather than 0 to Vmax.
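As an illustration of question 1, the timer arithmetic is a few divisions; a minimal C sketch (names ours, and the 16-bit counter width is an assumption, since the question does not state it):

```c
#include <stdio.h>

int main(void)
{
    double clk_hz     = 10e6;                   /* 10 MHz timer clock          */
    double resolution = 1.0 / clk_hz;           /* 0.1 us per count            */
    double range      = 65536.0 * resolution;   /* 16-bit range ~ 6.55 ms      */
    double interval   = 3e-3;                   /* desired 3 ms interval       */
    double tcount     = interval / resolution;  /* terminal count = 30000      */

    printf("resolution = %g s, range = %g s, terminal count = %.0f\n",
           resolution, range, tcount);
    return 0;
}
```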
Fig: Timer structure: basic timer, counter, timer with terminal count, timer with prescaler.
Sequence   A   B   A'  B'
1          +   +   -   -
2          -   +   +   -
3          -   -   +   +
4          +   -   -   +
5          +   +   -   -
Q5. The analog input range for an 8-bit ADC is -5 V to +5 V. Determine the resolution of this ADC and also the digital output in binary when the input is -2 V. Also trace the successive-approximation steps for verification.
An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. "Analog" refers to a continuously valued signal, such as a temperature or speed represented by a voltage between 0 and 100, with infinitely many possible values in between. "Digital" refers to discretely valued signals, such as integers, and in computing systems these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment.
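A C sketch of the successive-approximation process asked for in Q5, written as a binary search over the input voltage range (the function is ours; a real converter makes each comparison in hardware against a DAC output). For the 8-bit, -5 V to +5 V case the resolution is (5 - (-5)) / (2^8 - 1), about 39.2 mV:

```c
#include <stdio.h>

/* Successive approximation: narrow the voltage range one bit at a time. */
unsigned int sar_encode(double vin, double vmin, double vmax, int nbits)
{
    unsigned int code = 0;
    double lo = vmin, hi = vmax;
    for (int i = nbits - 1; i >= 0; i--) {
        double mid = (lo + hi) / 2.0;        /* trial (DAC) voltage        */
        if (vin >= mid) {                    /* comparator decision        */
            code |= 1u << i;                 /* keep this bit set          */
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return code;
}

int main(void)
{
    /* Q5: 8-bit ADC, -5 V to +5 V, input -2 V -> prints code = 0x4C      */
    printf("code = 0x%02X\n", sar_encode(-2.0, -5.0, 5.0, 8));
    return 0;
}
```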
UNIT 4
MEMORY: Introduction, Common Memory Types, Composing Memory, Memory Hierarchy and Cache, Advanced RAM. Interfacing: Communication Basics, Microprocessor Interfacing, Arbitration, Advanced Communication Principles, Protocols - Serial, Parallel and Wireless.
8 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH, 2008
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005
3. Embedded C Programming, Barnett, Cox & O'Cull, Thomson, 2005
Fig 1: (a) memory (words and bits per word); (b) memory block diagram
RAM: can be read and written; loses stored bits without power.
The traditional ROM/RAM distinctions are blurred:
- Advanced ROMs can be written to, e.g., EEPROM
- Advanced RAMs can hold bits without power, e.g., NVRAM
Two key characteristics of a memory:
- Write ability: the manner and speed with which the memory can be written
- Storage permanence: the ability of the memory to hold stored bits after they are written
Ranges of write ability:
- Middle range: the processor writes to the memory, but more slowly, e.g., FLASH, EEPROM
- Lower range: special equipment (a "programmer") must be used to write to the memory, e.g., EPROM, OTP ROM
- Low end: bits are stored only during fabrication, e.g., mask-programmed ROM
In-system programmable memory:
- Can be written to by a processor in the embedded system that uses the memory
- Comprises memories in the high end and middle range of write ability
Storage permanence:
Range of storage permanence:
- High end: essentially never loses bits, e.g., mask-programmed ROM
- Middle range: holds bits for days, months, or years after the memory's power source is turned off, e.g., NVRAM
- Lower range: holds bits as long as power is supplied to the memory, e.g., SRAM
- Low end: begins to lose bits almost immediately after they are written, e.g., DRAM
Nonvolatile memory:
- Holds bits after power is no longer supplied
- Comprises the high end and middle range of storage permanence
Example: 8 x 4 ROM
Horizontal lines = words; vertical lines = data. Lines are connected only at circles. The decoder sets word 2's line to 1 if the address input is 010. Data lines Q3 and Q1 are set to 1 because there is a programmed connection with word 2's line; word 2 is not connected to data lines Q2 and Q0. The output is 1010.
Mask-programmed ROM: connections are programmed at fabrication by a set of masks. Lowest write ability: programmed only once. Highest storage permanence: bits never change unless damaged. Typically used for the final design of high-volume systems, spreading the NRE cost for a low unit cost.
OTP ROM: One-time programmable ROM
Connections are programmed after manufacture by the user: the user provides a file of the desired ROM contents, the file is input to a machine called a ROM programmer, each programmable connection is a fuse, and the ROM programmer blows the fuses where connections should not exist. Very low write ability: typically written only once, and a ROM programmer device is required. Very high storage permanence:
bits don't change unless the device is reconnected to the programmer and more fuses are blown. Commonly used in final products: cheaper and harder to inadvertently modify.
EEPROM: Electrically erasable programmable ROM. Programmed and erased electronically, typically by using a higher-than-normal voltage; individual words can be programmed and erased. Better write ability: can be in-system programmable with a built-in circuit to provide the higher-than-normal voltage; a built-in memory controller is commonly used to hide the details from the memory user. Writes are very slow due to erasing and programming; a busy pin indicates to the processor that the EEPROM is still writing. Can be erased and programmed tens of thousands of times. Similar storage permanence to EPROM (about 10 years). Far more convenient than EPROM, but more expensive.
Flash memory: an extension of EEPROM. Same floating-gate principle, same write ability and storage permanence. Fast erase: large blocks of memory are erased at once rather than one word at a time; blocks are typically several thousand bytes. Writes to single words may be slower: the entire block must be read, the word updated, and the entire block written back. Used in embedded systems storing large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes, cell phones.
RAM: Random-access memory. Typically volatile memory: bits are not held without a power supply. Read and written easily by the embedded system during execution. Internal structure more complex than ROM: a word consists of several memory cells, each storing 1 bit; each input and output data line connects to each cell in its column; rd/wr is connected to every cell; when a row is enabled by the decoder, each cell either stores the input data bit (when rd/wr indicates write) or outputs the stored bit (when rd/wr indicates read).
RAM variations:
PSRAM: Pseudo-static RAM DRAM with built-in memory refresh controller Popular low-cost high-density alternative to SRAM NVRAM: Nonvolatile RAM Holds data after external power removed Battery-backed RAM SRAM with own permanently connected battery writes as fast as reads no limit on number of writes unlike nonvolatile ROM-based memory SRAM with EEPROM or flash stores complete RAM contents on EEPROM or flash before power turned off
Low-cost low-capacity memory devices Commonly used in 8-bit microcontroller-based embedded systems First two numeric digits indicate device type RAM: 62 ROM: 27 Subsequent digits indicate capacity in kilobits Example: TC55V2325FF-100 memory device 2-megabit synchronous pipelined burst SRAM memory device Designed to be interfaced with 32-bit processors Capable of fast sequential reads and writes as well as single byte I/O
Composing memory
Memory size needed often differs from size of readily available memories When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
When available memory is smaller, compose several smaller memories into one larger memory Connect side-by-side to increase width of words Connect top to bottom to increase number of words added high-order address line selects smaller memory containing desired word using a decoder Combine techniques to increase number and width of words
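A small C sketch of the top-to-bottom composition described above: the added high-order address line acts as a decoder input that selects which of two smaller memories holds the requested word. The chip names and sizes are illustrative assumptions, not from the text.

#include <stdint.h>

/* Compose two 2K x 8 memories into one 4K x 8 memory.
   Address bit 11 selects the chip; the low 11 bits go to the selected chip. */
static uint8_t chip_lo[2048];   /* holds addresses 0x000 - 0x7FF */
static uint8_t chip_hi[2048];   /* holds addresses 0x800 - 0xFFF */

uint8_t mem_read(uint16_t addr)
{
    uint16_t offset = addr & 0x07FF;     /* low-order address lines        */
    if (addr & 0x0800)                   /* high-order line = chip select  */
        return chip_hi[offset];
    return chip_lo[offset];
}

void mem_write(uint16_t addr, uint8_t data)
{
    uint16_t offset = addr & 0x07FF;
    if (addr & 0x0800)
        chip_hi[offset] = data;
    else
        chip_lo[offset] = data;
}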
Memory hierarchy: Want inexpensive, fast memory Main memory Large, inexpensive, slow memory stores entire program and data Cache Small, expensive, fast memory stores copy of likely accessed parts of larger memory Can be multiple levels of cache
Cache:
Usually designed with SRAM faster but more expensive than DRAM Usually on same chip as processor space limited, so much smaller than off-chip main memory faster access ( 1 cycle vs. several cycles for main memory) Cache operation: Request for main memory access (read or write) First, check cache for copy
cache hit copy is in cache, quick access cache miss copy not in cache, read address and possibly its neighbors into cache Several cache design choices cache mapping, replacement policies, and write techniques
Cache mapping
There are far fewer cache addresses available than main-memory addresses. Are the contents of a given address in the cache? Cache mapping is used to assign a main memory address to a cache address and to determine hit or miss. Three basic techniques: direct mapping, fully associative mapping, and set-associative mapping. Caches are partitioned into indivisible blocks, or lines, of adjacent memory addresses, usually 4 or 8 addresses per line.
Direct mapping
Main memory address divided into 2 fields Index cache address number of bits determined by cache size Tag compared with tag stored in cache at address indicated by index if tags match, check valid bit
Valid bit indicates whether data in slot has been loaded from memory Offset used to find particular word in cache line
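To make the field split concrete, a hedged C sketch that extracts offset, index, and tag from a 32-bit address for a hypothetical direct-mapped cache with 16-byte lines and 256 lines; the geometry is an assumption, not a value from the text.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical cache geometry: 16-byte lines, 256 lines (4 KB total). */
#define OFFSET_BITS 4                 /* 2^4 = 16 bytes per line */
#define INDEX_BITS  8                 /* 2^8 = 256 cache lines   */

int main(void)
{
    uint32_t addr   = 0x12345678u;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    /* A hit requires: valid[index] set and tagstore[index] == tag;
       offset then selects the word (or byte) within the cache line. */
    printf("tag=0x%X index=0x%X offset=0x%X\n", tag, index, offset);
    return 0;
}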
Set-associative mapping
Compromise between direct mapping and fully associative mapping Index same as in direct mapping But, each cache address contains content and tags of 2 or more memory address locations Tags of that set simultaneously compared as in fully associative mapping Cache with set size N called N-way set-associative 2-way, 4-way, 8-way are common
Cache-replacement policy
Technique for choosing which block to replace when fully associative cache is full when set-associative caches line is full Direct mapped cache has no choice Random replace block chosen at random LRU: least-recently used replace block not accessed for longest time FIFO: first-in-first-out push block onto queue when accessed choose block to replace by popping queue
8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse) Cache performance trade-offs: Improving cache hit rate without increasing size Increase line size Change set-associativity
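A small C sketch reproducing the average-access-cost arithmetic used above (cost = hit rate x hit cost + miss rate x miss cost); the 8 Kbyte numbers come from the text.

#include <stdio.h>

/* Average memory-access cost: hitRate*hitCost + missRate*missCost. */
static double avg_cost(double miss_rate, double hit_cost, double miss_cost)
{
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}

int main(void)
{
    /* 8 Kbyte cache example from the text: 5.565% misses, 4-cycle hits,
       20-cycle misses -> about 4.89 cycles on average. */
    printf("8 KB cache: %.4f cycles\n", avg_cost(0.05565, 4.0, 20.0));
    return 0;
}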
Advanced RAM
DRAMs commonly used as main memory in processor based embedded systems high capacity, low cost Many variations of DRAMs proposed need to keep pace with processor speeds FPM DRAM: fast page mode DRAM EDO DRAM: extended data out DRAM SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM RDRAM: rambus DRAM
Basic DRAM Address bus multiplexed between row and column components Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively Refresh circuitry can be external or internal to DRAM device strobes consecutive memory address periodically causing memory content to be refreshed Refresh circuitry disabled during read or write operation
Fast Page Mode DRAM (FPM DRAM) Each row of memory bit array is viewed as a page Page contains multiple words Individual words addressed by column address Timing diagram: row (page) address sent 3 words read consecutively by sending column address for each Extra cycle eliminated on each read/write of words from same page
Extended data out DRAM (EDO DRAM) Improvement of FPM DRAM Extra latch before output buffer allows strobing of cas before data read operation completed Reduces read/write latency by additional cycle
Rambus DRAM (RDRAM) More of a bus interface architecture than DRAM architecture Data is latched on both rising and falling edge of clock Broken into 4 banks each with own row decoder can have 4 pages open at a time Capable of very high throughput DRAM integration problem SRAM easily integrated on same chip as processor DRAM more difficult Different chip making process between DRAM and conventional logic Goal of conventional logic (IC) designers: minimize parasitic capacitance to reduce signal propagation delays and power consumption Goal of DRAM designers: create capacitor cells to retain stored information Integration processes beginning to appear Memory Management Unit (MMU) Duties of MMU Handles DRAM refresh, bus interface and arbitration Takes care of memory sharing among multiple processors
Translates logic memory addresses from processor to physical memory addresses of DRAM Modern CPUs often come with MMU built-in Single-purpose processors can be used
Introduction to interfacing :
Embedded system functionality aspects Processing Transformation of data Implemented using processors Storage Retention of data Implemented using memory Communication Transfer of data between processors and memories Implemented using buses Called interfacing
A simple bus
Wires: Uni-directional or bi-directional One line may represent multiple wires Bus Set of wires with a single function Address bus, data bus Or, entire collection of wires Address, data and control Associated protocol: rules for communication
Ports
A conducting device on the periphery that connects the bus to the processor or memory; often referred to as a pin. These may be actual pins on the periphery of an IC package that plug into a socket on a printed-circuit board, sometimes metallic balls instead of pins, and today often metal pads connecting processors and memories within a single IC. A port is a single wire or a set of wires with a single function, e.g., a 12-wire address port.
Timing Diagrams
Most common method for describing a communication protocol Time proceeds to the right on x-axis Control signal: low or high May be active low (e.g., go, /go, or go_L) Use terms assert (active) and deassert Asserting go means go=0 Data signal: not valid or valid Protocol may have subprotocols Called bus cycle, e.g., read and write Each may be several clock cycles Read example
rd/wr is set low; the address is placed on addr for at least tsetup time before enable is asserted; enable triggers the memory to place the data on the data wires by time tread.
A strobe/handshake compromise
Compromises/extensions
Parallel I/O peripheral When processor only supports bus-based I/O but parallel I/O needed Each port on peripheral connected to a register within peripheral that is read/written by the processor Extended parallel I/O When processor supports port-based I/O but more ports needed One or more processor ports interface with parallel I/O peripheral extending total number of ports available for I/O e.g., extending 4 ports to 6 ports in figure
ISA bus
ISA supports standard I/O: /IOR (distinct from /MEMR) is used for peripheral reads and /IOW for writes; there is a 16-bit address space for I/O versus a 20-bit address space for memory; otherwise the protocol is very similar to the memory protocol.
Generates control signals to drive the TC55V2325FF memory chip in burst mode Addr0 is the starting address input to device GO is enable/disable input to device
What is the address (interrupt address vector) of the ISR? Fixed interrupt Address built into microprocessor, cannot be changed Either ISR stored at address or a jump to actual ISR stored if not enough bytes available Vectored interrupt Peripheral must provide the address Common when microprocessor has multiple peripherals connected by a system bus Compromise: interrupt address table
Jump to ISR Some microprocessors treat jump same as call of any subroutine Complete state saved (PC, registers) may take hundreds of cycles Others only save partial state, like PC only Thus, ISR must not modify registers, or else must save them first Assembly-language programmer must be aware of which registers stored
1. Microprocessor is executing its program. 2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int. 4. Microprocessor stops executing its program and stores its state. 5. Microprocessor asserts Inta. 6. Priority arbiter asserts Iack1 to acknowledge Peripheral1. 7. Peripheral1 puts its interrupt address vector on the system bus 8. Microprocessor jumps to the address of ISR read from data bus, ISR executes and returns (and completes handshake with arbiter). 9. Microprocessor resumes executing its program Arbitration: Priority arbiter Types of priority Fixed priority each peripheral has unique rank highest rank chosen first with simultaneous requests preferred when clear difference in rank between peripherals Rotating priority (round-robin) priority changed based on history of servicing better distribution of servicing especially among peripherals with similar priority demand Arbitration: Daisy-chain arbitration Arbitration done by peripherals Built into peripheral or external logic added req input and ack output added to each peripheral Peripherals connected to each other in daisy-chain manner One peripheral connected to resource, all others connected upstream Peripherals req flows downstream to resource, resources ack flows upstream to requesting peripheral Closest peripheral has highest priority
Arbitration: Daisy-chain arbitration Pros/cons Easy to add/remove peripheral - no system redesign needed Does not support rotating priority One broken peripheral can cause loss of access to other peripherals
Network-oriented arbitration: when multiple microprocessors share a bus (sometimes called a network), arbitration is typically built into the bus protocol. Separate processors may try to write simultaneously, causing collisions; the data must be resent. We don't want the senders to start sending again at the same time; statistical methods can be used to reduce the chances. Typically used for connecting multiple distant chips; the trend is to use it to connect multiple on-chip processors.
Typically an industry-standard bus (ISA, PCI) is used for portability. Bridge: a single-purpose processor that converts communication between busses.
Parallel communication
Multiple data, control, and possibly power wires; one bit per wire. High data throughput over short distances. Typically used when connecting devices on the same IC or the same circuit board. The bus must be kept short: long parallel wires have high capacitance, which requires more time to charge/discharge, and data misalignment between wires increases as length increases. Higher cost, bulky.
Serial communication
Single data wire, possibly also control and power wires Words transmitted one bit at a time Higher data throughput with long distances Less average capacitance, so more bits per unit of time Cheaper, less bulky More complex interfacing logic and communication protocol Sender needs to decompose word into bits Receiver needs to recompose bits into word Control signals often sent on same wire as data increasing protocol complexity
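A hedged C sketch of the sender's job in serial communication: decomposing a word into bits and shifting them out one at a time. TX_PIN_WRITE and DELAY_BIT are placeholder macros for whatever port access and timing the hardware actually provides; they are assumptions, not a real API.

#include <stdint.h>

#define TX_PIN_WRITE(level)  /* placeholder: drive the single data wire    */
#define DELAY_BIT()          /* placeholder: wait one bit period (baud)    */

/* Transmit one byte, LSB first, framed UART-style with start and stop bits. */
void serial_send_byte(uint8_t data)
{
    TX_PIN_WRITE(0);                  /* start bit                          */
    DELAY_BIT();
    for (int i = 0; i < 8; i++) {     /* decompose the word into bits       */
        TX_PIN_WRITE((data >> i) & 1u);
        DELAY_BIT();
    }
    TX_PIN_WRITE(1);                  /* stop bit                           */
    DELAY_BIT();
}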
Wireless communication
Infrared (IR) Electronic wave frequencies just below visible light spectrum Diode emits infrared light to generate signal Infrared transistor detects signal, conducts when exposed to infrared light Cheap to build Need line of sight, limited range Radio frequency (RF)
Electromagnetic wave frequencies in radio spectrum Analog circuitry and antenna needed on both sides of transmission Line of sight not needed, transmitter power determines range
Serial protocols: I2C I2C (Inter-IC) Two-wire serial bus protocol developed by Philips Semiconductors nearly 20 years ago Enables peripheral ICs to communicate using simple communication hardware Data transfer rates up to 100 kbits/s and 7-bit addressing possible in normal mode 3.4 Mbits/s and 10-bit addressing in fast-mode Common devices capable of interfacing to I2C bus: EPROMS, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers
Q1. Explain the features of flash memory, SRAM and OTP ROM.
OTP ROM: One-time programmable ROM
Connections programmed after manufacture by user user provides file of desired contents of ROM file input to machine called ROM programmer each programmable connection is a fuse ROM programmer blows fuses where connections should not exist Very low write ability typically written only once and requires ROM programmer device Very high storage permanence bits dont change unless reconnected to programmer and more fuses blown Commonly used in final products cheaper, harder to inadvertently modify
EEPROM: Electrically erasable programmable ROM: Programmed and erased electronically typically by using higher than normal voltage can program and erase individual words Better write ability can be in-system programmable with built-in circuit to provide higher than normal voltage built-in memory controller commonly used to hide details from memory user writes very slow due to erasing and programming busy pin indicates to processor EEPROM still writing can be erased and programmed tens of thousands of times Similar storage permanence to EPROM (about 10 years) Far more convenient than EPROMs, but more expensive
Flash Memory: Extension of EEPROM Same floating gate principle Same write ability and storage permanence Fast erase Large blocks of memory erased at once, rather than one word at a time Blocks typically several thousand bytes large Writes to single words may be slower Entire block must be read, word updated, then entire block written back Used with embedded systems storing large data items in nonvolatile memory e.g., digital cameras, TV set-top boxes, cell phones RAM: Random-access memory Typically volatile memory bits are not held without power supply Read and written to easily by embedded system during execution Internal structure more complex than ROM a word consists of several memory cells, each storing 1 bit each input and output data line connects to each cell in its column rd/wr connected to every cell when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read
RAM variations:
PSRAM: Pseudo-static RAM DRAM with built-in memory refresh controller Popular low-cost high-density alternative to SRAM NVRAM: Nonvolatile RAM Holds data after external power removed Battery-backed RAM SRAM with own permanently connected battery writes as fast as reads no limit on number of writes unlike nonvolatile ROM-based memory SRAM with EEPROM or flash stores complete RAM contents on EEPROM or flash before power turned off
Low-cost low-capacity memory devices Commonly used in 8-bit microcontroller-based embedded systems First two numeric digits indicate device type RAM: 62 ROM: 27 Subsequent digits indicate capacity in kilobits Example: TC55V2325FF-100 memory device 2-megabit synchronous pipelined burst SRAM memory device Designed to be interfaced with 32-bit processors Capable of fast sequential reads and writes as well as single byte I/O
Q2. Describe set associative cache mapping technique. What are its merits and demerits?
Cache:
Usually designed with SRAM faster but more expensive than DRAM Usually on same chip as processor space limited, so much smaller than off-chip main memory faster access ( 1 cycle vs. several cycles for main memory) Cache operation: Request for main memory access (read or write) First, check cache for copy cache hit copy is in cache, quick access cache miss copy not in cache, read address and possibly its neighbors into cache Several cache design choices cache mapping, replacement policies, and write techniques
Cache mapping
Far fewer number of available cache addresses Are address contents in cache? Cache mapping used to assign main memory address to cache address and determine hit or miss Three basic techniques: Direct mapping Fully associative mapping Set-associative mapping Caches partitioned into indivisible blocks or lines of adjacent memory addresses usually 4 or 8 addresses per line
Direct mapping
Main memory address divided into 2 fields Index cache address number of bits determined by cache size Tag compared with tag stored in cache at address indicated by index if tags match, check valid bit Valid bit indicates whether data in slot has been loaded from memory Offset used to find particular word in cache line
Set-associative mapping
Compromise between direct mapping and fully associative mapping Index same as in direct mapping But, each cache address contains content and tags of 2 or more memory address locations Tags of that set simultaneously compared as in fully associative mapping Cache with set size N called N-way set-associative 2-way, 4-way, 8-way are common
Cache-replacement policy
Technique for choosing which block to replace when fully associative cache is full when set-associative caches line is full Direct mapped cache has no choice Random replace block chosen at random LRU: least-recently used replace block not accessed for longest time FIFO: first-in-first-out push block onto queue when accessed choose block to replace by popping queue
Data block size Larger caches achieve lower miss rates but higher access cost e.g., 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement) 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
Q3. Explain the different protocol concepts and control methods.
A strobe/handshake compromise
Q4. Which are the two types of bus-based I/O? Explain.
Q5. How does data get transferred from a peripheral to memory without DMA and with DMA? Explain with a diagram.
Arbitration: Daisy-chain arbitration Arbitration done by peripherals Built into peripheral or external logic added req input and ack output added to each peripheral Peripherals connected to each other in daisy-chain manner One peripheral connected to resource, all others connected upstream Peripherals req flows downstream to resource, resources ack flows upstream to requesting peripheral Closest peripheral has highest priority
Arbitration: Daisy-chain arbitration Pros/cons Easy to add/remove peripheral - no system redesign needed Does not support rotating priority One broken peripheral can cause loss of access to other peripherals
Network-oriented arbitration: when multiple microprocessors share a bus (sometimes called a network), arbitration is typically built into the bus protocol. Separate processors may try to write simultaneously, causing collisions; the data must be resent. We don't want the senders to start sending again at the same time; statistical methods can be used to reduce the chances. Typically used for connecting multiple distant chips; the trend is to use it to connect multiple on-chip processors.
Serial protocols: I2C I2C (Inter-IC) Two-wire serial bus protocol developed by Philips Semiconductors nearly 20 years ago Enables peripheral ICs to communicate using simple communication hardware Data transfer rates up to 100 kbits/s and 7-bit addressing possible in normal mode 3.4 Mbits/s and 10-bit addressing in fast-mode
Common devices capable of interfacing to I2C bus: EPROMS, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers
Capable of supporting a LAN similar to Ethernet 64-bit address: 10 bits for network ids, 1023 subnetworks 6 bits for node ids, each subnetwork can have 63 nodes 48 bits for memory address, each node can have 281 terabytes of distinct locations
Later extended to 64-bit while maintaining compatibility with 32-bit schemes Synchronous bus architecture Multiplexed data/address lines
PART - B UNIT - 5 INTERRUPTS: Basics - Shared Data Problem - Interrupt latency. Survey of Software Architecture, Round Robin, Round Robin with Interrupts - Function Queues - scheduling - RTOS architecture. 8 Hours
TEXT BOOKS: 1. An Embedded software Primer - David E. Simon: Pearson Education, 1999 REFERENCE BOOKS: 1. Embedded Systems: Architecture and Programming, Raj Kamal, TMH. 2008 2. Embedded Systems Architecture A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005 3. Embedded C programming, Barnett, Cox & Ocull, Thomson (2005).
The interrupt routine can suspend the main loop and execute at any time. Consider an interrupt that occurs between the calculations of delta and offset: on the return from interrupt, the data in ADC_channel[0-2] may result in an unintended value being assigned to the calculated variable offset if the values have changed since the previous data acquisition. More subtly, the calculation of delta may also be affected because, as we'll see, even a single line of code may be interrupted.
The important point about assembly instructions with respect to shared data is that they are atomic: an assembly instruction, because it implements a fundamental machine operation (a data move between registers and memory, for example), cannot be interrupted. Now, consider the following assembler instructions, equivalent to the C code temp = temp - offset:
lwz r5, 0(r10)    ; Read temp stored at 0(r10) and put it in r5
li  r6, offset    ; Put the offset value into r6
sub r4, r5, r6    ; Subtract the offset and put the result into r4
stw r4, 0(r10)    ; Store the result back in memory

Thus, our single line of C code gets compiled into multiple lines of assembler. Consequently, whereas a single line of atomic assembler cannot be interrupted, one line of C code can be. This means that our pseudocode fragment

void main(void)
{
    while (TRUE) {
        ...
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
        ...
    }
}
This can be interrupted anywhere. In particular, it can be interrupted in the middle of the delta calculation, with the result that the variable may be determined with one new and one old data value; undoubtedly not what the programmer intended. We shall refer to a section of code that must be atomic to execute correctly as a critical section. It is incumbent upon the programmer to protect critical code sections to maintain data coherency. All microprocessors implement instructions to enable and disable interrupts, so the obvious approach is to simply not permit critical sections to be interrupted:
void main(void)
{
    while (TRUE) {
        ...
        disable();
        delta = ADC_channel[0] - ADC_channel[1];
        offset = delta * ADC_channel[2];
        enable();
        ...
    }
}
It must be kept in mind that code in the interrupt service routine has high priority for a reason: something needs to be done immediately. Consequently, it's important to disable interrupts sparingly, and only when absolutely necessary (and, naturally, to remember to enable interrupts again after the critical section). Other methods of maintaining data coherency will be discussed in the section on real-time operating systems.
Interrupt latency
How fast will a system react to interrupts? Depends on:
1. Max. time while IT-s are disabled. 2. Max. time taken to execute higher priority IT-s. 3. Time taken by ISR invocation (context save, etc.) and return (context restore) 4. Work time in ISR to generate a response. Values: For 3: see processor docs. Others: count instructions does not work well for processors with cache! General rule: WRITE SHORT IT SERVICE ROUTINES!
Disabling Interrupts
Example system: Must disable IT-s for 125uS to process pressure variables. Must disable IT-s for 250uS to manage timer Must respond to a network IT within 600uS, the network ISR takes 300uS to execute
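A hedged worked check of this example, assuming the network interrupt has the highest priority and can only be delayed by a disabled-interrupt window: worst-case response ≈ longest disable time + network ISR execution time = 250 µs + 300 µs = 550 µs, which meets the 600 µs deadline. If the two disable windows could ever overlap, or another ISR could run first, the margin would have to be re-checked.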
Software architecture, according to ANSI/IEEE Standard 1471-2000, is defined as the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution. Embedded software, as we've said, must interact with the environment through sensors and actuators, and often has hard, real-time constraints. The organization of the software, or its architecture, must reflect these realities. Usually, the critical aspect of an embedded control system is its speed of response, which is a function of (among other things) the processor speed and the number and complexity of the tasks to be accomplished, as well as the software architecture. Clearly, embedded systems with not much to do, and plenty of time in which to do it, can employ a simple software organization (a vending machine, for example, or the power seat in your car). Systems that must respond rapidly to many different events with hard real-time deadlines generally require a more complex software architecture (the avionics systems in an aircraft, or engine and transmission control, traction control and antilock brakes in your car). Most often, the various tasks managed by an embedded system have different priorities: some things have to be done immediately (fire the spark plug precisely 20 degrees before the piston reaches top dead center in the cylinder), while other tasks may have less severe time constraints. The architectures surveyed here are: round robin; round robin with interrupts; function queue scheduling; real-time operating systems (RTOS).
Round Robin
The simplest possible software architecture is called round robin.2 Round robin architecture has no interrupts; the software organization consists of one main loop wherein the processor simply polls each attached device in turn, and provides service if any is required. After all devices have been serviced, it starts over from the top. Graphically, round robin looks like Figure 1. Round robin pseudocode looks something like this:
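The figure with the pseudocode is not reproduced here; a minimal sketch of the loop it describes, in the document's C-style pseudocode (the device names are illustrative):

void main(void)
{
    while (TRUE) {
        if (deviceA_needs_service())     /* poll each device in turn */
            service_deviceA();
        if (deviceB_needs_service())
            service_deviceB();
        if (deviceC_needs_service())
            service_deviceC();
        /* ...then start over from the top */
    }
}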
One can think of many examples where round robin is a perfectly capable architecture: a vending machine, an ATM, or a household appliance such as a microwave oven (check for a button push, decrement the timer, update the display, and start over). Basically, anything where the processor has plenty of time to get around the loop, and the user won't notice the delay (usually microseconds) between a request for service and the processor's response (the time between pushing a button on your microwave and the update of the display, for example). The main advantage of round robin is that it's very simple, and often it's good enough. On the other hand, there are several obvious disadvantages. If a device has to be serviced in less time than it takes the processor to get around the loop, then it won't work. In fact, the worst-case response time for round robin is the sum of the execution times for all of the task code. It's also fragile: suppose you added one more device, or some additional processing, to a loop that was almost at its chronometric limit; then you could be in trouble.3 Some additional performance can be coaxed from the
round robin architecture, however. If one or more tasks have more stringent deadlines than the others (they have higher priority), they may simply be checked more often:
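A sketch of that idea in the same pseudocode style: the higher-priority device A is polled between every other check (device names are illustrative).

void main(void)
{
    while (TRUE) {
        if (deviceA_needs_service()) service_deviceA();   /* high priority  */
        if (deviceB_needs_service()) service_deviceB();
        if (deviceA_needs_service()) service_deviceA();   /* checked again  */
        if (deviceC_needs_service()) service_deviceC();
        if (deviceA_needs_service()) service_deviceA();
    }
}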
The obvious advantage of round robin with interrupts is that the response time to high-priority tasks is improved, since the ISR always has priority over the main loop (the main loop will always stop whatever it's doing to service the interrupt), and yet it remains fairly simple. The worst-case response time for a low-priority task is the sum of the execution times for all of the code in the main loop plus all of the interrupt service routines. With the introduction of interrupts, the problem of shared data may arise: as in the previous example, if the interrupted low-priority function is in the middle of a calculation using data that are supplied or modified by the high-priority interrupting function, care must be taken that on the return from interrupt the low-priority function's data are still valid (by disabling interrupts around critical code sections, for example).
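A hedged sketch of the round-robin-with-interrupts pattern: the interrupt routines do the urgent work and set flags, and the main loop polls the flags and does the follow-up processing. The names are illustrative, and how the handlers get installed is compiler- and target-specific.

/* Flags shared between the interrupt routines and the main loop. */
volatile int fDeviceA = FALSE;
volatile int fDeviceB = FALSE;

/* Installed as interrupt handlers in a compiler/target-specific way. */
void vDeviceA_ISR(void) { handle_deviceA_urgent(); fDeviceA = TRUE; }
void vDeviceB_ISR(void) { handle_deviceB_urgent(); fDeviceB = TRUE; }

void main(void)
{
    while (TRUE) {
        if (fDeviceA) { fDeviceA = FALSE; process_deviceA_data(); }
        if (fDeviceB) { fDeviceB = FALSE; process_deviceB_data(); }
    }
}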
order of the function in the queue; there's no reason that functions have to be placed in the queue in the order in which the interrupts occurred. They may just as easily be placed in the queue in priority order: high-priority functions at the top of the queue, and low-priority functions at the bottom. The worst-case timing for the highest-priority function is the execution time of the longest function in the queue (think of the case of the processor just starting to execute the longest function right before an interrupt places a high-priority task at the front of the queue). The worst-case timing for the lowest-priority task is infinite: it may never get executed if higher-priority code is always being inserted at the front of the queue. The advantage of function-queue scheduling is that priorities can be assigned to tasks; the disadvantages are that it's more complicated than the other architectures discussed previously, and it may be subject to shared-data problems.
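A minimal sketch of function-queue scheduling under stated assumptions: interrupt routines enqueue function pointers (kept sorted by priority), and the main loop repeatedly calls the function at the head of the queue. The queue here is a plain array, so it must be protected as shared data; the names are illustrative.

#define QUEUE_SIZE 16
typedef void (*task_fn)(void);

static task_fn fnQueue[QUEUE_SIZE];   /* shared with interrupt routines    */
static int nQueued = 0;

/* Called from an ISR: insert fn so the highest-priority entry stays at
   index 0 (insertion code omitted in this sketch). */
void enqueue_task(task_fn fn);

void main(void)
{
    while (TRUE) {
        if (nQueued > 0) {
            task_fn fn;
            disable();                        /* queue is shared with ISRs  */
            fn = fnQueue[0];
            for (int i = 1; i < nQueued; i++) /* shift remaining entries    */
                fnQueue[i - 1] = fnQueue[i];
            nQueued--;
            enable();
            fn();                             /* run the queued function    */
        }
    }
}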
Blocked: a task may be blocked waiting for data or for an event to occur. A task, if it is not preempted, will block after running to completion. Many tasks may be blocked at one time. The part of the RTOS called the scheduler keeps track of the state of each task and decides which one should be running. The scheduler is a simple-minded device: it simply looks at all the tasks in the ready state and chooses the one with the highest priority. Tasks can block themselves if they run out of things to do, and they can unblock and become ready if an event occurs, but it's the job of the scheduler to move tasks between the ready and running states based on their priorities. Since only one of the tasks can possess the semaphore at any time, coherency is assured by taking and releasing a semaphore around the shared data: if the 10 ms task attempts to take the semaphore before the 50 ms task has released it, the faster task will block until the semaphore is available. Problems, however, may arise if care is not taken in the use of semaphores: specifically, priority inversion and deadlock. Priority inversion, as the name implies, refers to a situation in which a semaphore inadvertently causes a high-priority task to block while lower-priority tasks run to completion. Consider the case where a high-priority task and a low-priority task share a semaphore, and there are tasks of intermediate priority between them. Initially, the low-priority task is running and takes the semaphore; all other tasks are blocked. Should the high-priority task unblock and attempt to take the semaphore before the low-priority task releases it, it will block again until the semaphore is available. If, in the meantime, intermediate-priority tasks have unblocked, the simple-minded RTOS will run each one in priority order, completing all the intermediate-priority tasks before finally running the low-priority function to the point where it gives up its semaphore, permitting the high-priority task to run again. The task priorities have been inverted: all of the lower-priority tasks have run before the highest-priority task gets to complete.
Different real-time operating systems employ different algorithms, or resource access protocols, to request and release semaphores in order to avoid priority inversion. A common method is called priority inheritance. In this protocol, whenever a lower priority task blocks a higher priority task, it inherits the priority of the blocked task. Reconsider our priority inversion problem, this time with priority inheritance protocol as illustrated in Figure 5. Once again, the low priority task is running and takes a semaphore; all other tasks are blocked, and again the high priority task unblocks and attempts to take the semaphore before the low priority task has released it, blocking again until the semaphore is available. In the meantime the intermediate priority tasks have unblocked, but with priority inheritance, the low priority task has inherited
the priority of the blocked high priority task. Consequently, the RTOS will schedule the blocking task with its promoted priority first, which runs until the semaphore is released, at which time the high priority task takes the semaphore and runs, and the promoted task is reassigned its initial low priority. Consequently,all tasks will run in the correct priority order. Note that if the high priority task accesses multiple shared resources (that is, there is more than one semaphore), it may potentially block as many times as there are semaphores. Furthermore, priority inheritance protocol does nothing to mitigate deadlock. A more complex algorithm is priority ceiling protocol. In priority ceiling protocol, each task is assigned a static priority, and each semaphore, or resource is assigned a ceiling priority greater than or equal to the maximum priority of all the tasks that use it. At run time, a task assumes a priority equal to the static priority or the ceiling value of its resource, whichever is larger: if a task requires a resource, the priority of the task will be raised to the ceiling priority of the resource; when the task releases the resource, the priority is reset. It can be shown that this scheme minimizes the time that the highest priority task will be blocked, and eliminates the potential of deadlock.
There are methods other than reliance on RTOS resource access protocols to assure data coherency. Though often used, such methods do not, in general, constitute good programming practice. If the shared data consist of only a single variable, a local copy may be assigned in the low priority task, thus assuring its integrity.
RECOMMENDED QUESTIONS UNIT - 5
1. Explain the microprocessor architecture.
2. With a block diagram, explain the interrupt hardware.
3. How are interrupts disabled?
4. How does the microprocessor know where to find the interrupt routine when an interrupt occurs?
5. Describe the shared data problem with an example. Show how disabling/enabling interrupts can be used to solve this problem.
6. Explain atomic code and critical sections in the context of interrupts.
7. Explain the interrupt handling procedure, context switching and critical sections.
8. Define interrupt latency and explain how to make interrupt routines short.
9. Explain the steps to disable interrupts.
10. Describe the round robin architecture for a communication bridge.
11. Explain function queue scheduling.
12. Explain the priority levels for the RTOS architecture.
13. How do you select an architecture for a system?
INTERRUPTS
The interrupt routine can suspend the main loop and execute at any time. Consider an interrupt that occurs between the calculations of delta and offset: On the return from interrupt, the data ADC_channel[0-2] may result in an unintended value being assigned to the calculated variable offset if the values have changed from the previous data acquisition. More subtly, the calculation of delta may also be affected because, as well see, even a single line of code may be interrupted.
Round Robin
The simplest possible software architecture is called round robin. 2 Round robin architecture has no interrupts; the software organization consists of one main loop wherein the processor simply polls eachattached device in turn, and provides service if any is required. After all devices have been serviced, start over from the top. Graphically, round robin looks like Figure 1. Round robin pseudocode looks something like this:
One can think of many examples where round robin is a perfectly capable architecture: A vending machine, ATM, or household appliance such as a microwave oven (check for a button push, decrement timer, update display and start over). Basically, anything where the processor has plenty of time to get around the loop, and the user wont notice the delay (usually micro-seconds) between a request for service and the processor response (the time between pushing a button on your microwave and the update of the display,
for example). The main advantage to round robin is that its very simple, and often its good enough. On the other hand, there are several obvious disadvantages. If a device has to be serviced in less time than it takes the processor to get around the loop, then it wont work. In fact, the worst case response time for round robin is the sum of the execution times for all of the task code. Its also fragile: suppose you added one more device, or some additional processing to a loop that was almost at its chronometric limit then you could be in trouble.3 Some additional performance can be coaxed from the round robin architecture, however. If one or more tasks have more stringent deadlines than the others (they have higher priority),they may simply be checked more often:
Initially, the low priority task is running and takes a semaphore; all other tasks are blocked. Should the high priority task unblock and attempt to take the semaphore before the low priority task releases it, it will block again until the semaphore is available. If, in the meantime, intermediate priority tasks have unblocked, the simple-minded RTOS will run each one in priority order, completing all the intermediate priority tasks before finally running the low priority function to the point where it gives up its semaphore, permitting the high priority task to run again. The task priorities have been inverted: all of the lower priority tasks have run before the highest priority task gets to complete.
Different real-time operating systems employ different algorithms, or resource access protocols, to request and release semaphores in order to avoid priority inversion. A common method is called priority inheritance. In this protocol, whenever a lower priority task blocks a higher priority task, it inherits the priority of the blocked task. Reconsider our priority inversion problem, this time with priority inheritance protocol as illustrated in Figure 5. Once again, the low priority task is running and takes a semaphore; all other tasks are blocked, and again the high priority task unblocks and attempts to take the semaphore before the low priority task has released it, blocking again until the semaphore is available. In the meantime the intermediate priority tasks have
unblocked, but with priority inheritance, the low priority task has inherited the priority of the blocked high priority task.
UNIT 6
INTRODUCTION TO RTOS: MORE OS SERVICES Tasks - states - Data - Semaphores and shared data. More operating system services - Message Queues - Mail Boxes - Timers - Events - Memory Management. 8 Hours
TEXT BOOKS: 1. An Embedded software Primer - David E. Simon: Pearson Education, 1999 REFERENCE BOOKS: 1. 2. 3. Embedded Systems: Architecture and Programming, Raj Kamal, TMH. 2008 Embedded Systems Architecture A Comprehensive Guide for Engineers and Programmers, Tammy Noergaard, Elsevier Publication, 2005 Embedded C programming, Barnett, Cox & Ocull, Thomson (2005).
6.2 Tasks and Data: Reentrancy. A reentrant function is one that works correctly regardless of the number of tasks that call it between interrupts. Characteristics of reentrant functions: it only accesses shared variables in an atomic way, or when the variable is on the callee's stack; a reentrant function calls only reentrant functions; a reentrant function uses system hardware (a shared resource) atomically. Inspecting code to determine reentrancy: see Fig 6.9. Where are data stored in C? Shared, non-shared, or stacked?
See Fig 6.10. Is it reentrant? What about the variable fError? Is printf reentrant? If shared variables are not protected, could they be accessed using single assembly instructions (guaranteeing atomicity)?
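A hedged illustration of the distinction (these functions are not the ones in the figures, which are not reproduced here): the first version keeps state in a shared static variable and is not reentrant; the second keeps everything on the caller's stack and is.

/* NOT reentrant: cSum is shared by every caller, so a task switch between
   the two statements can corrupt the result. */
static int cSum;
int iSumNotReentrant(int a, int b)
{
    cSum = a;
    cSum += b;
    return cSum;
}

/* Reentrant: all data live on the calling task's stack. */
int iSumReentrant(int a, int b)
{
    int iTotal = a;     /* local (stacked) variable */
    iTotal += b;
    return iTotal;
}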
6.3 Semaphores and Shared Data: a new tool for atomicity. A semaphore is a variable/lock/flag used to control access to a shared resource (to avoid shared-data problems in an RTOS). Protection at the start of the critical section is via a primitive function, called take, indexed by the semaphore; protection at the end is via a primitive function, called release, indexed by the same semaphore. Simple semaphores: binary semaphores are often adequate for shared-data problems in an RTOS (see Fig 6.12 and Fig 6.13).
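A minimal sketch of the take/release pattern, assuming RTOS primitives named TakeSemaphore and ReleaseSemaphore and a SEMAPHORE type; the task names and shared variable are illustrative, not from the figures.

static long lTankLevel;          /* data shared between two tasks          */
static SEMAPHORE semLevel;       /* binary semaphore protecting it         */

void vTaskUpdate(void)
{
    TakeSemaphore(semLevel);     /* blocks here if another task holds it   */
    lTankLevel = read_sensor();
    ReleaseSemaphore(semLevel);
}

void vTaskDisplay(void)
{
    long lCopy;
    TakeSemaphore(semLevel);
    lCopy = lTankLevel;          /* keep the critical section short        */
    ReleaseSemaphore(semLevel);
    display(lCopy);
}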
6.3 Semaphores and Shared Data 2 Reentrancy, Semaphores, Multiple Semaphores, Device Signaling, Fig 6.15 a reentrant function, protecting a shared data, cErrors, in critical section Each shared data (resource/device) requires a separate semaphore for individual protection, allowing multiple tasks and data/resources/devices to be shared exclusively, while allowing efficient implementation and response time Fig 6.16 example of a printer device signaled by a report-buffering task, via semaphore signaling, on each print of lines constituting the formatted and buffered report
6.3 Semaphores and Shared Data 3 Semaphore Problems Messing up with semaphores The initial values of semaphores when not set properly or at the wrong place The symmetry of takes and releases must match or correspond each take must have a corresponding release somewhere in the ES application Taking the wrong semaphore unintentionally (issue with multiple semaphores) Holding a semaphore for too long can cause waiting tasks deadline to be missed Priorities could be inverted and usually solved by priority inheritance/promotion (See Fig 6.17) Causing the deadly embrace problem (cycles) (See Fig 6.18)
6.3 Semaphores and Shared Data 4 Variants: Binary semaphores single resource, one-at-a time, alternating in use (also for resources) Counting semaphores multiple instances of resources, increase/decrease of integer semaphore variable Mutex protects data shared while dealing with priority inversion problem Summary Protecting shared data in RTOS Disabling/Enabling interrupts (for task code and interrupt routines), faster
Taking/releasing semaphores (can't use them in interrupt routines): slower, and it affects the response times of the tasks that need the semaphore. Disabling task switches (no effect on interrupt routines): holds up the responses of all other tasks.
PART B IN UNIT 6
7.0 MORE OS SERVICES
7.1 Message Queues, Mailboxes and Pipes
Basic techniques for inter-task communication and data sharing are interrupt enable/disable and semaphores (e.g., the tank-monitoring tasks and the serial port and printer handling tasks). Others supported by the RTOS: message queues, mailboxes and pipes.
Example of a message queue (see Fig 7.1): Task1 and Task2 (guaranteed to be reentrant) compute separate functions. They use the services of vLogError and ErrorsTask (vLogError enqueues errors for ErrorsTask to process). vLogError is supported by the AddToQueue function, which keeps a queue of integers for the RTOS to interpret or map to an error type. Using the ReadFromQueue function, the RTOS then activates ErrorsTask to handle the error if the queue is not empty, freeing Task1 and Task2 to continue their work. The functions AddToQueue and ReadFromQueue are non-reentrant, and RTOS switches between Task1 and Task2 in the middle of their execution are guaranteed to be OK.
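A hedged sketch of the error-queue pattern described above, using the function names from the text; the queue internals, the return convention of ReadFromQueue, and the error codes are illustrative assumptions.

/* Called by Task1 and Task2 whenever they detect an error. */
void vLogError(int iErrorCode)
{
    AddToQueue(iErrorCode);      /* enqueue; must not block the caller     */
}

/* Lower-priority task that drains the queue and does the slow work. */
void ErrorsTask(void)
{
    int iErrorCode;
    while (TRUE) {
        if (ReadFromQueue(&iErrorCode))   /* assumed: FALSE when empty     */
            report_error(iErrorCode);     /* e.g., format and record it    */
    }
}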
7.1 Message Queues, Mailboxes, and Pipes (continued). Difficulties in using queues: queue initialization (like semaphore initialization) must be dedicated to a separate task to (a) guarantee correct start-up values and (b) avoid uncertainty about task priorities and order of execution, which might affect the queue's contents. Queues must be tagged (identify which queue is referenced). Code is needed to manage the queue (when full and empty) if the RTOS doesn't block the reading/writing task on empty/full, plus returning an error code. The RTOS may limit the amount of information that can be written to or read from a queue in any single call (see Fig 7.2).
Message Queues, Mailboxes, and Pipes Using Pointers and Queues Code in Fig 7.2 limits the amount of data to write to or read from the queue For tasks to communicate any amount of data, create a buffer and write the pointer to the buffer to the queue. (The receiving task reads/retrieves data from the buffer via the pointer, and frees the buffer space.) (See Fig 7.3)
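A minimal sketch of the pointer-through-queue idea under stated assumptions: the WriteToQueue/ReadFromQueue calls, the queueReports handle, and BUFFER_SIZE are placeholders for whatever the RTOS and application actually provide.

#include <stdlib.h>

#define BUFFER_SIZE 256    /* illustrative buffer size */

/* Sending task: allocate a buffer, fill it, and queue only the pointer. */
void vSenderTask(void)
{
    char *p_chBuffer = (char *)malloc(BUFFER_SIZE);
    if (p_chBuffer != NULL) {
        fill_report(p_chBuffer);                 /* any amount of data     */
        WriteToQueue(queueReports, &p_chBuffer, sizeof(p_chBuffer));
    }
}

/* Receiving task: read the pointer, use the data, then free the buffer. */
void vReceiverTask(void)
{
    char *p_chBuffer;
    ReadFromQueue(queueReports, &p_chBuffer, sizeof(p_chBuffer));
    print_report(p_chBuffer);
    free(p_chBuffer);                            /* receiver owns the buffer */
}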
7.1 Message Queues, Mailboxes, and Pipes Using Mailboxes: Purpose is similar to queues (both supporting asynchronous task communication) Typical RTOS function for managing mailboxes create, write, read, check-mail, destroy Variations in RTOS implementations of mailboxes Either a single-message mailbox or multi-message mailbox (set # entries at start) # of messages per mailbox could be unlimited, but total # in the system could be (with possibility of shuffling/distributing messages among mailboxes) Mailboxes could be prioritized Examples: (from the RTOS MultiTask! ) int sndmsg (unsigned int uMbid, void *p_vMsg, unsigned int uPriority); void *rcvmsg(unsigned int uMbid, unsigned int uTimeout); void *chkmsg(unsigned int uMbid);
Using Pipes: Pipes are implemented as (special) files, using normal file-descriptors RTOS can create, read from, write to, destroy pipes (typically: each pipe has 2 ends) Details of implementation depends on RTOS Pipes can have varying length messages (unlike fixed length for queues / mailboxes) Pipes could be byte-oriented and read/write by tasks depends on # bytes specified In standard C, read/write of pipes use fread/fwrite functions, respectively Programming queues, mailboxes, and pipes caution! Coding tasks to read from or write to intended structure (RTOS cant help on mismatch) Interpretation and processing of message types (see code segments on p. 182)
Overflow of structure size could cripple the software, so need to set size as large as possible Passing pointers in structures provides unwanted opportunity to create shared data problem (See Fig 7.4)
Timer Functions. Issues: embedded systems track the passage of time and hence need to keep time (e.g., to save battery life, power needs to be shut off automatically after, say, X seconds; a message-sending task expects an ACK after Y seconds, so it is delayed Y seconds and may retransmit; a task is allowed a slice of time, after which it is blocked). The RTOS provides these timing services or functions (see Fig 7.5, the VxWorks RTOS support for the taskDelay(nticks) function in telephone-call code).
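A hedged fragment in the spirit of that example: taskDelay() is the VxWorks call that blocks the calling task for a number of system ticks, but the surrounding task and the tick-rate assumption are illustrative, not taken from the figure.

/* Assume the system tick is 10 ms, so 100 ticks = 1 second. */
#define TICKS_PER_SEC 100

void vDialTask(void)
{
    for (;;) {
        dial_next_digit();
        taskDelay(TICKS_PER_SEC / 10);   /* pause about 100 ms per digit */
    }
}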
7.2 Timer Functions Issues: How long is delay measured in ticks (a tick is like a single heartbeat timer interrupt time) (See Fig 7.6) RTOS knowledge of time/timer and specifics of nticks or timeinterval relies on microprocessors hardware timer and its interrupt cycles (RTOS writers must know this!) OR RTOS writers write watchdog timers based on non-standard timer hardware and corresponding software interrupts called each time the software timer expires RTOS vendors provide board support packages (BSP) of drivers for timers and other hardware
Length of a tick depends on the hardware timers design trade-off Accurate timing short tick intervals OR use dedicated timer for purpose
7.2 Timer Functions Other Timing Services (all based on system tick) Waiting time or delay on message, on a semaphore (but not too tight for high priority tasks to miss access to shared data) Place call to or activation of time -critical, high priority tasks inside timer interrupts or specialized-time-critical tasks inside the RTOS (Note: OS task have higher priority over other embedded software tasks). Calling a function of choice after some S nticks Example: (See Fig 7.7) The Timer Callback Function Note how wdStart function is passed a function vSetFrequency or vTurnOnTxorRx, associated nticks, and the parameter to the function. Also note how the vRadioControlTask communicates with vTurnOnTxorRx and vSetFrequency using the queue queueRadio and msgQreceive/msgQSend)
7.3 Events. In a standard OS, an event is typically an indication related to time. In an RTOS, an event is a boolean flag that is set and reset by tasks/routines for other tasks to wait on. The RTOS manages several events for the waiting tasks. Blocked or waiting tasks are unblocked after the event occurs, and the event is then reset. E.g., pulling the trigger of a cordless bar-code scanner sets the flag for a waiting task, which turns on the laser beam for scanning and starts running (see Fig 7.8 and Fig 7.9).
7.3 Events 1 Features of events (and comparison with semaphores, queues, mbox, pipes): More than one task can wait on the same event (tasks are activated by priority) Events can be grouped, and tasks may wait on a subset of events in a group Resetting events is either done by the RTOS automatically or your embedded software Tasks can wait on only one semaphore, queue, mbox or pipe, but on many events simultaneously. Semaphores are faster, but unlike queues, mboxes, and pipes, they carry 1-bit info Queues, mboxes, and pipes are error prone and message posting/retrieval is compute-intensive 7.4 Memory Management In general RTOS offer C lang equivalent of malloc and free for MM, which are slow and unpredictable Real time system engineers prefer the faster and more predictable alloc/free functions for fixed size buffers. E.g., MultiTask! RTOS allocates pools of fixed size buffers, using getbuf() [with timed task blocking on no buffers] and reqbuf() [with no blocking and return of NULL pointer on no buffers] relbuf() to free buffers in a given pool (buffer pointer must be valid) Note that most embedded sw is integrated with the RTOS (same address space) and the ES starts the microprocessor; hence your ES must tell the memory-pool (See Fig 7.10 and Fig 7.11 high priority FormatTask and low priority OutputTask)
7.5 Interrupt Routines in an RTOS Environment. Rules that interrupt routines (IRs) must comply with (but not task code): Rule 1: an IR can't call an RTOS function that will cause it to block, e.g., waiting on semaphores, reading empty queues or mailboxes, or waiting on events, to avoid high latency or large response time and potential deadlock (see Fig 7.12, which doesn't work, and Fig 7.13, which works using queues).
7.5 Interrupt Routines in an RTOS Environment (continued). Rule 2: an IR can't call RTOS functions that will cause the RTOS to switch to other tasks (except other IRs); breaking this rule will cause the RTOS to switch from the IR itself to handle the task, leaving the IR code incomplete, or delay lower-priority interrupts (see Fig 7.14, the should-work case, and Fig 7.15, the what-really-happens case).
Let the RTOS intercept all the interrupts, aided by an RTOS function which tells the RTOS where the IRs are and the corresponding interrupt hardware The RTOS then activates the calling IR or the highest priority IR Control returns to the RTOS, and the RTOS scheduler decides which task gets the microprocessor (allowing the IR to run to completion) (See Fig 7.16)
7.5 Interrupt Routines in an RTOS Environment Second solution to Rule 2: Let the IR call a function in the RTOS to inform the RTOS of an interrupt After the IR is done, control goes back to the RTOS, where another function calls the scheduler to schedule the next task (See Fig 7.17) Third solution to Rule 2: Let RTOS maintain a separate queue of specialized, interrupt-supporting functions which are called by the IR (on the appropriate interrupt). When these functions complete, control goes back to that IR (similar to Fig 7.17 with queues)
7.5 Interrupt Routines in an RTOS Environment: Nested Interrupts
If a running IR is interrupted by another, higher-priority interrupt (a kind of interrupt stacking), the RTOS should unstack the IRs so that all IRs complete before the scheduler is allowed to switch to any task code. (See Fig 7.18)
RECOMMENDED QUESTIONS UNIT -6 Introduction to RTOS and More operating systems services
1. What are the three states of a task? Explain with a neat block diagram.
2. Describe the use of take semaphore() and release semaphore() with an example.
3. Explain any six problems with semaphores.
4. Describe the use of message queues, mailboxes and pipes.
5. Explain memory management in multitasking.
6. How do interrupt routines work in an RTOS environment?
7. What are nested interrupts and how do they work?
SOLUTION FOR UNIT 6
Q1. How does a microprocessor respond to a button under an RTOS?
Issue: scheduler/task signal exchange for blocking and unblocking of tasks happens via function calls.
Issue: all tasks may be blocked and the scheduler idles forever (not desirable!).
Issue: two or more tasks may have the same priority level in the Ready state (resolved by time-slicing or FIFO).
Example: the scheduler switches from the processor-hog vLevelsTask to vButtonTask when the user presses a push-button (interrupting the microprocessor); this is controlled by main(), which initializes the RTOS, sets priority levels, and starts the RTOS.
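A minimal sketch of the main() described above. The RTOS calls (RtosInit, RtosTaskCreate, RtosStart), stack sizes and the priority convention are illustrative assumptions, not any vendor's exact API; vLevelsTask and vButtonTask are the tasks named in the example:

#define STACK_SIZE        256
#define PRIORITY_BUTTON     2      /* lower number = higher priority (assumed convention) */
#define PRIORITY_LEVELS     5

static int iButtonStack[STACK_SIZE];   /* each task needs its own stack */
static int iLevelsStack[STACK_SIZE];

/* Hypothetical RTOS calls -- illustrative only. */
void RtosInit(void);
void RtosTaskCreate(void (*pTask)(void), int *pStack, int iStackSize, int iPriority);
void RtosStart(void);              /* never returns; the scheduler takes over */

void vButtonTask(void);            /* unblocked by the button interrupt routine */
void vLevelsTask(void);            /* processor-hog calculation task            */

int main(void)
{
    RtosInit();                                            /* initialize RTOS data structures */
    RtosTaskCreate(vButtonTask, iButtonStack, STACK_SIZE, PRIORITY_BUTTON);
    RtosTaskCreate(vLevelsTask, iLevelsStack, STACK_SIZE, PRIORITY_LEVELS);
    RtosStart();                                           /* runs the highest-priority ready task */
    return 0;                                              /* never reached */
}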
Q3. What is a semaphore? How does it help in shared data access? Explain with code.
A semaphore is a variable/lock/flag used to control access to a shared resource (to avoid shared-data problems in an RTOS). Protection at the start of the critical section is via a primitive function, called take, indexed by the semaphore; protection at the end is via a primitive function, called release, indexed the same way. Simple (binary) semaphores are often adequate for shared-data problems in an RTOS.
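A minimal sketch of take/release around shared data, assuming a hypothetical binary-semaphore API and assumed hardware/display helpers:

/* Hypothetical binary-semaphore API -- illustrative names only. */
typedef struct SEMAPHORE SEMAPHORE;
void TakeSemaphore(SEMAPHORE *s);        /* blocks until available, then claims it */
void ReleaseSemaphore(SEMAPHORE *s);     /* frees it; unblocks one waiting task    */

long lReadFloatHardware(void);           /* assumed hardware helper */
void vDisplayLevel(long lLevel);         /* assumed display helper  */

static SEMAPHORE *pTankDataSemaphore;
static long lTankLevel;                  /* shared data protected by the semaphore */

void vLevelsTask(void)                   /* writer */
{
    while (1)
    {
        long lNewLevel = lReadFloatHardware();
        TakeSemaphore(pTankDataSemaphore);
        lTankLevel = lNewLevel;          /* short critical section */
        ReleaseSemaphore(pTankDataSemaphore);
    }
}

void vDisplayTask(void)                  /* reader */
{
    while (1)
    {
        long lCopy;
        TakeSemaphore(pTankDataSemaphore);
        lCopy = lTankLevel;              /* copy out, keep the critical section short */
        ReleaseSemaphore(pTankDataSemaphore);
        vDisplayLevel(lCopy);            /* work on the copy outside the critical section */
    }
}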
Priority inversion can occur; it is usually solved by priority inheritance/promotion. Semaphores can also cause the deadly embrace (deadlock) problem, i.e., cycles of tasks waiting on one another.
Q6. Describe the use of message queues.
Basic techniques for inter-task communication and data sharing are disabling/enabling interrupts and using semaphores, e.g., in the tank-monitoring tasks and the serial-port and printer-handling tasks. Other mechanisms supported by an RTOS are message queues, mailboxes and pipes.
Example of a message queue: Task1 and Task2 (guaranteed to be reentrant) compute separate functions and use the services of vLogError and ErrorsTask (vLogError enqueues errors for ErrorsTask to process). vLogError is supported by an AddToQueue function, which keeps a queue of integers for the RTOS to interpret or map to an error type. Using the ReadFromQueue function, the RTOS then activates ErrorsTask to handle the error if the queue is not empty, freeing Task1 and Task2 to continue their work. Because AddToQueue and ReadFromQueue are not reentrant, the design must guarantee that an RTOS switch between Task1 and Task2 in the middle of these calls remains safe; in practice this is exactly why the RTOS's own message-queue services are used.
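A minimal sketch of the vLogError / ErrorsTask pattern described above, assuming a hypothetical reentrant RTOS message-queue API in place of the hand-written AddToQueue/ReadFromQueue; queue identifier and vReportError are assumptions:

/* Hypothetical RTOS message-queue API -- illustrative names and signatures. */
int  RtosQueueSend(int iQueueId, int iMessage);   /* reentrant; safe from any task       */
int  RtosQueueReceive(int iQueueId);              /* blocks until a message is available */
void vReportError(int iErrorType);                /* assumed: log or print the error     */

#define QUEUE_ERRORS 3

/* Called from Task1 or Task2 whenever they detect an error. */
void vLogError(int iErrorType)
{
    (void)RtosQueueSend(QUEUE_ERRORS, iErrorType);  /* enqueue and return immediately */
}

/* Lower-priority task that processes errors without delaying Task1/Task2. */
void ErrorsTask(void)
{
    while (1)
    {
        int iErrorType = RtosQueueReceive(QUEUE_ERRORS);
        vReportError(iErrorType);
    }
}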
UNIT 7 & 8 Basic Design Using RTOS Principles- An example, Encapsulating semaphores and Queues. Hard real-time scheduling considerations Saving Memory space and power. Hardware software co-design aspects in embedded systems. 12 Hours
TEXT BOOKS:
1. Embedded System Design: A Unified Hardware/Software Introduction - Frank Vahid, Tony Givargis, John Wiley & Sons, Inc., 2002
2. An Embedded Software Primer - David E. Simon, Pearson Education, 1999
REFERENCE BOOKS:
1. Embedded Systems: Architecture and Programming - Raj Kamal, TMH, 2008
2. Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers - Tammy Noergaard, Elsevier Publication, 2005
3. Embedded C Programming - Barnett, Cox & O'Cull, Thomson, 2005
8.2 Principles
Design considerations: an embedded system (ES) is interrupt-driven and remains dormant until either time passes and an event is due (timer interrupt) or a response to an external request/interrupt is needed. Interrupts create a cascade of events, causing the RTOS tasks to act accordingly.
ES design technique: create all needed tasks and get them into a blocked or idle state, waiting on interrupts (to be generated by external events, e.g., frame arrival at a network port). (See Fig 8.1: network-port and serial-port communication via tasks that implement the DDP and ADSP protocol stacks)
8.2 Principles 1
Write short interrupt routines (IRs): even the lowest-priority IR is handled before the highest-priority task code, so short IRs minimize task-code response time. IRs are error prone and hard to debug (due to their hardware-dependent software parts). Parts of the IR code requiring an immediate/quick response should be in the core of the IR; parts needing longer processing and a not-so-urgent response should be done in a task (signaled by the IR).
8.2 Principles 2
Consider the following specs: a system responds to commands from a serial port; all commands end with a carriage return (CR); commands arrive one at a time, and the next arrives only after the preceding one is processed; the serial port's buffer is 1 character long, and characters arrive quickly (at X bps); the system's processing rate is Y characters per second.
Three possible designs:
A. Let the IR handle everything: long response time, big IR code, hard-to-debug errors.
B. Use a skeletal IR with a command-parsing task that queues commands (with all the attendant message/data queuing problems).
C. Better compromise: let the IR save characters in a mailbox-buffer until the CR arrives, then let the command-parsing task work on the whole buffer. (See Fig 8.2: the IR and the parsing task use different parts of the mail-buffer, tail and head) A sketch of this design follows below.
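A minimal sketch of design C, simplified to a single command buffer (which the spec allows, since the next command only arrives after the previous one is processed; Fig 8.2's head/tail version is more general). The RTOS signaling calls and the hardware/parser helpers are hypothetical names:

#define CMD_BUF_LEN 32

/* Hypothetical RTOS calls and assumed hardware/parsing helpers. */
void RtosSignalTask(int iTaskId);      /* wake a task (illustrative name)        */
void RtosWaitForSignal(void);          /* block the calling task until signaled  */
char chReadSerialPort(void);           /* assumed: read the 1-character buffer   */
void vParseCommand(const char *pCmd);  /* assumed: the slow command parser       */

#define TASK_PARSER 4

static char achCommand[CMD_BUF_LEN];   /* written by the IR, read by the task */
static int  iCount = 0;

/* Serial-port interrupt routine: keep it short. */
void vSerialIsr(void)
{
    char ch = chReadSerialPort();
    if (iCount < CMD_BUF_LEN - 1)
        achCommand[iCount++] = ch;
    if (ch == '\r')                    /* complete command received            */
    {
        achCommand[iCount] = '\0';
        iCount = 0;
        RtosSignalTask(TASK_PARSER);   /* hand the whole command to the task   */
    }
}

/* Command-parsing task: does the slower, Y-characters-per-second work. */
void vParserTask(void)
{
    while (1)
    {
        RtosWaitForSignal();
        vParseCommand(achCommand);
    }
}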
8.2 Principles 3
Problem decomposition into tasks: how many tasks? Considerations (+ if carefully decomposed and few tasks; - if there is no choice):
+ More tasks offer better control of overall response time.
+ Modularity: a different task for each device or piece of functionality.
+ Encapsulation: data and functionality can be encapsulated within the responsible task.
- More tasks mean data sharing, hence more protection worries and longer response times due to the associated overhead.
- More tasks mean inter-task messaging, with overhead due to queuing, mailboxing and pipe use.
- More tasks mean more space for task stacks and messages.
- More tasks mean frequent context switching (overhead) and less throughput.
- More tasks mean more frequent calls to RTOS functions (a major overhead that adds up).
Priorities (an advantage of the RTOS software architecture): decomposing based on functionality and time criticality naturally separates ES components into tasks, giving quicker response time through task prioritization: high priority for the time-critical tasks, low priority for the others.
Encapsulating functionality in tasks: a dedicated task can encapsulate the handling of each shared device (e.g., a printer or display unit) or common data structure (e.g., an error log). (See Fig 8.3) For a target whose hardware stores data in a flash memory, a single task encapsulates the handling of permission to write to the flash (set/reset of the flash at given times). (See Fig 8.4, which uses the POSIX-standard RTOS functions mq_open, mq_receive, mq_send and nanosleep)
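A minimal sketch of the flash-handling idea using the POSIX queue calls named above (mq_open, mq_send, mq_receive are real POSIX functions; the queue name, the message layout and vWriteToFlashHardware are assumptions for illustration). On Linux this typically links with -lrt:

#include <mqueue.h>
#include <fcntl.h>

#define FLASH_QUEUE_NAME "/flash_q"    /* assumed queue name */

struct FlashMsg {                      /* assumed message layout */
    long lOffset;
    char achData[32];
};

void vWriteToFlashHardware(long lOffset, const char *pData);  /* assumed driver */

/* Any task that wants data written to flash just posts a message and moves on. */
void vRequestFlashWrite(const struct FlashMsg *pMsg)
{
    mqd_t q = mq_open(FLASH_QUEUE_NAME, O_WRONLY);
    mq_send(q, (const char *)pMsg, sizeof(*pMsg), 0);
    mq_close(q);
}

/* The one task that owns the flash decides when writing is actually permitted. */
void vHandleFlashTask(void)
{
    struct FlashMsg msg;
    struct mq_attr attr;
    mqd_t q;

    attr.mq_flags = 0;
    attr.mq_maxmsg = 8;
    attr.mq_msgsize = sizeof(struct FlashMsg);
    attr.mq_curmsgs = 0;
    q = mq_open(FLASH_QUEUE_NAME, O_RDONLY | O_CREAT, 0644, &attr);

    while (1)
    {
        if (mq_receive(q, (char *)&msg, sizeof(msg), NULL) == sizeof(msg))
            vWriteToFlashHardware(msg.lOffset, msg.achData);
    }
}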
8.2 Principles 4
Other tasks? Do you need many small, simple tasks? Then worry about data sharing and inter-task communication. Do you need a task per stimulus? Same problems!
Recommended task structure (modeled/structured as state machines; a skeleton sketch appears after this subsection):
Tasks run in an infinite loop.
Tasks wait on the RTOS for an event (expected in each task's own private message queue).
Tasks declare their own private data (fully encapsulated).
Tasks block in only one place (on the RTOS signal/queue), not on any other semaphore, and share no data.
Tasks use no microprocessor time while their queues are empty.
8.2 Principles 5
Avoid creating and destroying tasks: creating tasks takes system time; destroying tasks can leave dangling pointers to messages or remove a semaphore that other tasks are waiting on (blocking them forever). Rule of thumb: create all needed tasks at start-up and keep them, if memory allows.
Turn time-slicing off: it is useful in conventional OSs for fairness to user programs, but in embedded systems fairness is not the issue, response time is. Time-slicing causes context switching, which is time consuming and diminishes throughput. Where the RTOS offers an option to turn time-slicing off, turn it off.
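Returning to the recommended task structure above, a minimal skeleton assuming a hypothetical per-task queue API (RtosReadOwnQueue and the message layout are illustrative assumptions):

/* Hypothetical per-task queue API -- illustrative names only. */
typedef struct MSG { int iEvent; int iData; } MSG;
MSG RtosReadOwnQueue(void);        /* blocks ONLY here, on this task's own queue */

enum MyState { STATE_IDLE, STATE_BUSY };

void vRecommendedTask(void)
{
    /* Private, fully encapsulated data: no sharing, no extra semaphores. */
    static enum MyState eState = STATE_IDLE;

    while (1)                                  /* tasks run in an infinite loop        */
    {
        MSG msg = RtosReadOwnQueue();          /* uses no CPU while the queue is empty */
        switch (eState)                        /* state-machine structure              */
        {
        case STATE_IDLE:
            if (msg.iEvent == 1)               /* 1 = START, an assumed event code     */
                eState = STATE_BUSY;
            break;
        case STATE_BUSY:
            /* ... act on msg.iData ... */
            eState = STATE_IDLE;
            break;
        }
    }
}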
8.2 Principles 6
Restrict the use of RTOS functions/features: customize the RTOS features to your needs. (Note: the RTOS and your ES are linked and located together into the same ROM/RAM address space; see Chapter 9.) If possible, write ES functions that interface only with selected RTOS features, to minimize excessive calls to many different RTOS functions (which increases the opportunity for errors). Develop a shell around the RTOS functions and let your own ES tasks call the shell (and not the RTOS directly); this improves portability, since only the shell needs to be rewritten from RTOS to RTOS.
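A minimal sketch of the shell idea: ES tasks call only the shell, so only this file changes when moving to another RTOS. The Shell... names, file layout and the underlying Rtos... vendor calls are all illustrative assumptions:

/* shell.h -- the only RTOS interface ES tasks are allowed to use (assumed layout). */
void ShellQueueSend(int iQueueId, int iMessage);
int  ShellQueueReceive(int iQueueId);

/* shell.c -- the one place that names the underlying RTOS. */
void RtosQueueSend(int iQueueId, int iMessage);   /* hypothetical vendor call */
int  RtosQueueReceive(int iQueueId);              /* hypothetical vendor call */

void ShellQueueSend(int iQueueId, int iMessage)
{
    RtosQueueSend(iQueueId, iMessage);   /* rewrite only this body when the RTOS changes */
}

int ShellQueueReceive(int iQueueId)
{
    return RtosQueueReceive(iQueueId);
}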
8.3 An Example: Designing an Underground Tank Monitoring System
Summary of the problem specification:
A system of 8 underground tanks.
Measurements: the temperature of the gas (thermometer) can be read at any time; the float levels (float hardware) periodically interrupt the microprocessor.
Calculate the number of gallons per tank using both measurements.
Set an alarm on a leaking tank (when the level slowly and consistently falls over time).
Set an alarm on overflow (level rising slowly, close to the full level).
User interface: a 16-button control panel, an LCD and a thermal printer. The system can override user display options and show warning messages. Histories of levels and temperatures over time can be requested by the user (reports are 30-50 lines long), and the user can queue up several reports. Issuing a command requires 2 or 3 buttons, and the system can prompt on the display in the middle of a user command sequence. Buttons interrupt the microprocessor. One dedicated button turns the alarm off (connected to the system through software).
The printer prints one line at a time and interrupts the microprocessor when done. The LCD shows the most recent line; it saves its own display data and does not need the microprocessor to retrieve it. (See Fig 8.7)
8.3 An Example (continued)
Issues that remain as incomplete specs:
What exactly is displayed? Timing information? Print-line length?
How often is the float level read?
What is the required response time for the push-button user interface?
Printer speed: how many lines per second?
What is the microprocessor speed? Which kind, 8-bit?
The time allowed to set/reset the alarm? The compute time for the number of gallons, 4-5 seconds? (This influences code design, tasking and the kind of microprocessor; if no calculation is required to set the overflow alarm, that saves time.)
Knowing the number of gallons, what is the tolerable time interval, or response time, for setting the alarm?
Is a pair of temperature and float-level readings taken for one tank at a time? How is the software interface for switching the alarm off done: writing a bit flag to memory, or cutting power to the alarm device? Does the microprocessor come with a timer?
8.3 An Example: Which Architecture?
If an RTOS is used, meeting deadlines depends on dealing with the 4-5 seconds required to calculate the number of gallons; this requires task suspension, perhaps with less use of IRs, and above all the microprocessor must support some RTOS. If no RTOS is used, meeting deadlines requires the use of several interrupts (and IRs).
BASIC DESIGN OF EMBEDDED SOFTWARE (ES) USING AN RTOS: An Example
System decomposition into tasks:
One low-priority task that handles all number-of-gallons calculations and detects leaks as well (for all tanks, one at a time).
A high-priority overflow-detection task (higher than the leak-detection work).
A high-priority float-hardware task, using semaphores to make the level-calculation and overflow-detection tasks wait on it for a reading (semaphores are simpler and faster here than queuing requests to read levels).
A high-priority button-handling task that needs a state-machine model (with internal static data structures, a simple wait on the button signal, and an action predicated on the sequence of button signals), since semaphores will not work here. (See Fig 8.8)
A high-priority display task to handle contention for the LCD.
[Turning the alarm bell on/off by the level-calculation, overflow and user-button code is typically non-contentious (an atomic operation), hence we do not
need a separate alarm-bell task.] However, we do need a module with BellOn() and BellOff() functions to encapsulate the alarm hardware (a sketch follows below). A low-priority task handles report formatting (one line at a time) and manages the report queue. (See Table 8.2)
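A minimal sketch of the alarm-bell module mentioned above: BellOn() and BellOff() are the names used in the text, while the register address and bit values are assumptions for illustration:

/* bell.c -- encapsulates the alarm-bell hardware. No separate task is needed,
   because turning the bell on or off is a single write (non-contentious). */

#define BELL_CONTROL_REG  ((volatile unsigned char *)0x4000)  /* assumed address */

void BellOn(void)
{
    *BELL_CONTROL_REG = 0x01;      /* assumed: 1 = bell on  */
}

void BellOff(void)
{
    *BELL_CONTROL_REG = 0x00;      /* assumed: 0 = bell off */
}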
8.3 An Example: Moving the System Forward
Putting it together as scenarios: the system is interrupt driven, with interrupt routines responding to signals and activating tasks to do their work.
The user presses a button; the button hardware interrupts the microprocessor; the button IR sends a message to the button-handling task to interpret the command, which in turn activates the display task or the printer task.
The timer interrupts; the timer IR signals the overflow-detection task.
Moving the System Forward: Scenarios (continued)
The user presses the print button; the print IR signals the print-formatting task, which sends the first line to the printer; the printer interrupts when done so the print IR can send the next line; when all lines of a report are done, the print IR signals the print-formatting task for the next report.
When a task needs a level reading, it starts the level-reading hardware; the hardware reads the level and its IR signals the task that a new float level is available.
Dealing with shared level data: three tasks need this data: the level-calculation task (for leak detection), the display task and the print-formatting task. Reading the level data and processing it in a given task takes on the order of milliseconds. Use semaphores: let the level-calculation and display tasks read and process the level inside a critical section (CS), and let the formatting task copy the level data inside the CS, release the semaphore, and then format outside the CS. (See Fig 8.9)
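A minimal sketch of the Fig 8.9 idea: the formatting task copies the shared level data inside the critical section, releases the semaphore, then does the slow formatting outside it. The semaphore API and the helper function are hypothetical names:

#include <string.h>

#define NUM_TANKS 8

typedef struct SEMAPHORE SEMAPHORE;            /* hypothetical RTOS type */
void TakeSemaphore(SEMAPHORE *s);              /* hypothetical */
void ReleaseSemaphore(SEMAPHORE *s);           /* hypothetical */
void vFormatReportLines(const long *pLevels);  /* assumed formatting helper */

static long alTankLevels[NUM_TANKS];           /* shared: level-calc, display, formatting */
static SEMAPHORE *pLevelSemaphore;

void vFormatTask(void)
{
    long alCopy[NUM_TANKS];

    TakeSemaphore(pLevelSemaphore);                 /* keep the critical section short: */
    memcpy(alCopy, alTankLevels, sizeof(alCopy));   /* just copy the data out           */
    ReleaseSemaphore(pLevelSemaphore);

    vFormatReportLines(alCopy);                /* slow formatting happens OUTSIDE the CS */
}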
8.4 Encapsulating Semaphores and Queues
Encapsulating semaphores: do not assume that every task will use a semaphore correctly (take/release); assuming so leads to errors. Protect semaphores and their associated data by encapsulating/hiding them in a task. Let all other tasks call a separate module (acting as an intermediary) to get to the critical section; this separate module/function in turn calls the task that encapsulates the semaphore. (See Fig 8.10, the correct code, and Fig 8.11, the incorrect alternative, which bypasses the intermediate function)
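A minimal sketch of the encapsulation principle (an assumed example, not the text's Fig 8.10 code): both the shared data and the semaphore that protects it live in one module, so no other code can take the semaphore incorrectly. The semaphore API is a hypothetical one:

/* time.c -- the shared data AND its semaphore are hidden in this module. */

typedef struct SEMAPHORE SEMAPHORE;       /* hypothetical RTOS type          */
void TakeSemaphore(SEMAPHORE *s);         /* hypothetical: blocks until free */
void ReleaseSemaphore(SEMAPHORE *s);      /* hypothetical                    */

static SEMAPHORE *pTimeSemaphore;         /* never visible to other files    */
static long lSecondsToday;

/* The ONLY ways other code may touch the shared data. */
void vSetTimeZero(void)
{
    TakeSemaphore(pTimeSemaphore);
    lSecondsToday = 0L;
    ReleaseSemaphore(pTimeSemaphore);
}

long lGetTime(void)
{
    long lCopy;
    TakeSemaphore(pTimeSemaphore);
    lCopy = lSecondsToday;
    ReleaseSemaphore(pTimeSemaphore);
    return lCopy;
}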
8.4 Encapsulating Semaphores and Queues (continued)
Encapsulating queues: when writing to or reading from a flash memory using queues to carry messages, the correctness of the Fig 8.4 implementation depends on passing the correct FLASH_MSG type. Could a message meant for the flash be enqueued elsewhere? The flash queue is also exposed to inadvertent deletion or destruction. An extra layer, a data queue holding data read from the flash, raises further questions: could this auxiliary queue be referenced wrongly? Is its type compatible with the flash contents?
Solution: encapsulate the flash queue structure inside a separate module, flash.c, with access to it only through the intermediate task vHandleFlashTask, supported by the auxiliary functions vReadFlash and vWriteFlash. (The handler task provides the interface through which all other tasks get to the queue.) (See Fig 8.13)
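A minimal sketch of the flash.c module described above: the queue is static to the module, and vHandleFlashTask, vReadFlash and vWriteFlash (names from the text; bodies assumed) are the only ways to reach it. The RTOS queue calls, message layout and driver functions are hypothetical:

/* flash.c -- the flash queue is invisible outside this file. */

#include <string.h>

typedef struct { int iCmd; long lOffset; char achData[32]; } FLASH_MSG;  /* assumed layout */

typedef struct QUEUE QUEUE;                                   /* hypothetical handle       */
void RtosQueueSend(QUEUE *q, const void *p, unsigned uLen);   /* hypothetical              */
void RtosQueueReceive(QUEUE *q, void *p, unsigned uLen);      /* hypothetical; blocks      */
void vFlashHardwareWrite(long lOffset, const char *pData);    /* assumed driver            */
void vFlashHardwareRead(long lOffset, char *pDataOut);        /* assumed driver            */

static QUEUE *pFlashQueue;             /* static: no other file can touch it */

/* Interface functions: the only ways other tasks can reach the flash. */
void vWriteFlash(long lOffset, const char *pData)
{
    FLASH_MSG msg;
    msg.iCmd = 1;                       /* WRITE */
    msg.lOffset = lOffset;
    strncpy(msg.achData, pData, sizeof(msg.achData) - 1);
    msg.achData[sizeof(msg.achData) - 1] = '\0';
    RtosQueueSend(pFlashQueue, &msg, sizeof(msg));
}

void vReadFlash(long lOffset)
{
    FLASH_MSG msg;
    msg.iCmd = 0;                       /* READ; in the full design the result comes back
                                           on a separate reply queue (assumed)            */
    msg.lOffset = lOffset;
    msg.achData[0] = '\0';
    RtosQueueSend(pFlashQueue, &msg, sizeof(msg));
}

/* The one task allowed to touch the flash hardware. */
void vHandleFlashTask(void)
{
    FLASH_MSG msg;
    while (1)
    {
        RtosQueueReceive(pFlashQueue, &msg, sizeof(msg));
        if (msg.iCmd == 1)
            vFlashHardwareWrite(msg.lOffset, msg.achData);
        else
            vFlashHardwareRead(msg.lOffset, msg.achData);
    }
}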
8.5 Hard Real-Time Scheduling Considerations
Guaranteeing that the system will meet hard deadlines comes from writing fast code: fast algorithms, efficient data structures, and code in assembly where necessary.
Characterizing a real-time system: it is made of n tasks; task n executes periodically every T_n units of time, has a worst-case execution time of C_n units and a deadline D_n, and runs at priority P_n. Assume task-switching time is zero and there is no blocking on semaphores.
Question (schedulability): for each task, does its worst-case execution time plus timing variability, C_n + J_n (where J_n is the jitter in that task's timing), together with interference from higher-priority tasks, fit within its deadline D_n (which is at most T_n)?
Predicting C_n is very important, and depends on avoiding variability in the execution times of tasks and functions, the access times of data structures/buffers, and semaphore blocking: any operation that cannot be done in the same number of time units on each execution/access.
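A small worked check (this is the standard necessary condition that total CPU utilization not exceed 100%, offered as a supplement to the per-deadline condition above, not as the notes' exact test): for three tasks with worst-case times C = 10, 20 and 40 ms and periods T = 50, 100 and 200 ms, utilization is 10/50 + 20/100 + 40/200 = 0.2 + 0.2 + 0.2 = 0.6, i.e. 60%, so the processor is not overloaded; each task's C_n + J_n must then still be checked against its own deadline D_n.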
8.6 Saving Memory Space
Considerations for the limited memory space of embedded systems: code is stored in ROM (possibly loaded into RAM for execution), data is stored in RAM (except for initialization/shadowing data). The two memory types are not interchangeable. Trade-off: packed data saves RAM space, but the unpacking code takes ROM space.
Estimating space:
A. Tasks take stack space; fewer tasks take less RAM. Inspect the code to estimate the stack bytes per task: local variables, parameters, function nesting level, the worst-case nesting of interrupt routines, and the space for the RTOS (or its selected features) from the manual.
B. Experimental runs of the code: not easy, and they will not reflect worst-case behaviour.
8.6 Saving Memory Space (continued)
Techniques / suggestions:
Substitute or eliminate large functions; watch for repeated calls to large functions.
Consider writing your own functions to replace RTOS functions; watch for RTOS functions that call several others.
Configure or customize the RTOS functions to suit only the needs of the ES.
Study the assembly listing from the cross-compiler, and rework your C code or write your own assembly unit/task.
Use static variables instead of relying on stack variables (push/pop and pointer manipulation take space).
Copy data structures passed to a function via a pointer into the function's local static variables, process the data, and copy it back into the structures; the trade-off is slower code.
For an 8-bit processor, use char instead of int variables (an int takes 2 bytes and is slower in calculations than a 1-byte char).
If ROM is really tight, experiment with coding most functions/tasks in assembly language.
Saving Power
Some embedded systems run on a battery, so turning power off for some or all devices is worthwhile. In general, how do you save power?
Look for the power-saving modes (enablers) that the manufacturer provides. Software can put the microprocessor into one of these modes via a special instruction or by writing a code to a special register in the processor; the software must be fast!
Power-saving modes: sleep, low-power, idle, standby, etc. Typically the microprocessor stops running, along with its built-in devices and the clock circuit (but static RAM is left powered, since its wattage is very small). Waking the microprocessor up is done by special circuitry and software; to avoid a full restart and reset, write a special code to a RAM address and let the software check whether this is a cold start or a restart from power-saving mode.
Alternative: the microprocessor stops running but all devices stay alive, and the microprocessor is resumed by an interrupt (this is less of a hassle than stopping all the devices).
If the software turns power to devices off and back on, the status data those devices need for resumption must be kept in EEPROM.
Turn off built-in devices that signal frequently from high to low and low to high: they are power hungry!
RECOMMENDED QUESTIONS UNIT - 7 & 8
1. Explain the basic operation of the Telegraph system as an embedded system.
2. How do you avoid creating and destroying tasks?
3. Explain the underground tank monitoring system.
4. How do you encapsulate a semaphore? Explain.
5. What are the considerations in hard real-time scheduling?
6. How do you save memory space in embedded system design?
7. Explain the techniques used to save power.
SOLUTION FOR UNIT 7 & 8
Q1. Explain the basic Telegraph operation with a block diagram.
Encapsulating functionality in tasks: a dedicated task can encapsulate the handling of each shared device (e.g., a printer or display unit) or common data structure (e.g., an error log). (See Fig 8.3) For a target whose hardware stores data in a flash memory, a single task encapsulates the handling of permission to write to the flash (set/reset of the flash at given times). (See Fig 8.4, which uses the POSIX-standard RTOS functions mq_open, mq_receive, mq_send and nanosleep)
Q4. How do you avoid creating and destroying tasks?
Avoid creating and destroying tasks: creating tasks takes system time; destroying tasks can leave dangling pointers to messages or remove a semaphore that other tasks are waiting on (blocking them forever). Rule of thumb: create all needed tasks at start-up and keep them, if memory allows.
Turn time-slicing off: it is useful in conventional OSs for fairness to user programs, but in embedded systems fairness is not the issue, response time is. Time-slicing causes context switching, which is time consuming and diminishes throughput. Where the RTOS offers an option to turn time-slicing off, turn it off.
Q6. How do you encapsulate queues?
Encapsulating queues: when writing to or reading from a flash memory using queues to carry messages, the correctness of the Fig 8.4 implementation depends on passing the correct FLASH_MSG type. Could a message meant for the flash be enqueued elsewhere? The flash queue is also exposed to inadvertent deletion or destruction. An extra layer, a data queue holding data read from the flash, raises further questions: could this auxiliary queue be referenced wrongly? Is its type compatible with the flash contents?
Solution: encapsulate the flash queue structure inside a separate module, flash.c, with access to it only through the intermediate task vHandleFlashTask, supported by the auxiliary functions vReadFlash and vWriteFlash. (The handler task provides the interface through which all other tasks get to the queue.) (See Fig 8.13)
Study the assembly listing from the cross-compiler, and rework your C code or write your own assembly unit/task. Use static variables instead of relying on stack variables (push/pop and pointer manipulation take space). Copy data structures passed to a function via a pointer into the function's local static variables, process the data, and copy it back into the structures; the trade-off is slower code. For an 8-bit processor, use char instead of int variables (an int takes 2 bytes and is slower in calculations than a 1-byte char). If ROM is really tight, experiment with coding most functions/tasks in assembly language.
Saving power: some embedded systems run on a battery, so turning power off for some or all devices is worthwhile. In general, how do you save power? Look for the power-saving modes (enablers) that the manufacturer provides. Software can put the microprocessor into one of these modes via a special instruction or by writing a code to a special register in the processor; the software must be fast! Power-saving modes: sleep, low-power, idle, standby, etc. Typically the microprocessor stops running, along with its built-in devices and the clock circuit (but static RAM is left powered, since its wattage is very small). Waking the microprocessor up is done by special circuitry and software; to avoid a full restart and reset, write a special code to a RAM address and let the software check whether this is a cold start or a restart from power-saving mode. Alternative: the microprocessor stops running but all devices stay alive, and the microprocessor is resumed by an interrupt (this is less of a hassle than stopping all the devices). If the software turns power to devices off and back on, the status data those devices need for resumption must be kept in EEPROM. Turn off built-in devices that signal frequently from high to low and low to high: they are power hungry!