Unit2 Classnotes
Usage of RTOS
Components of RTOS
Fig:1.1: Components of RTOS
The Scheduler: This component of the RTOS decides the order in which tasks are executed, which is generally based on priority.
Function Library: It is an important element of the RTOS that acts as an interface between the kernel and the application code. The application sends requests to the kernel through the function library so that the application can obtain the desired results.
Memory Management: This element is needed in the system to allocate memory to every program, and it is one of the most important elements of the RTOS.
Fast dispatch latency: This is the interval between the OS identifying that a task has terminated and the point at which the thread waiting in the ready queue actually starts executing.
User-defined data objects and classes: RTOS systems make use of programming languages like C or C++, whose data objects and classes should be organized according to their operation.
Types of RTOS
Hard Real Time :
In a Hard RTOS, the deadline is handled very strictly, which means that a given task must start executing at the specified scheduled time and must be completed within the assigned time duration.
Firm Real Time:
These types of RTOS also need to follow deadlines. However, missing a deadline may not have a big impact but could cause undesired effects, like a huge reduction in the quality of a product.
Soft Real Time:
A Soft RTOS accepts some delays by the operating system. In this type of RTOS, there is a deadline assigned for a specific job, but a delay of a small amount of time is acceptable. So, deadlines are handled softly by this type of RTOS.
Here are the essential factors that you need to consider when selecting an RTOS:
Middleware: if there is no middleware support in the real-time operating system, then the issue of time-consuming integration of processes occurs.
Error-free: RTOS systems are error-free. Therefore, there is no chance of getting an error
while performing the task.
Embedded system usage: Programs of RTOS are of small size. So we widely use RTOS for
embedded systems.
Maximum consumption: maximum consumption (utilization) of the system can be achieved with the help of an RTOS.
Unique features: A good RTOS should be capable and have some extra features, like how it operates to execute a command, efficient protection of the system's memory, etc.
24/7 performance: RTOS is ideal for those applications which need to run 24/7.
Disadvantages of RTOS
An RTOS can run only a minimal number of tasks together, and it concentrates only on those applications which contain an error so that it can avoid them.
RTOS is a system that concentrates on a few tasks. Therefore, it is really hard for these systems to do multitasking.
Specific drivers are required for the RTOS so that it can offer a fast response time to interrupt signals, which helps to maintain its speed.
Plenty of resources are used by an RTOS, which makes the system expensive.
Tasks which have a low priority need to wait for a long time, as the RTOS maintains the accuracy of the programs which are under execution.
Minimal switching of tasks is done in real-time operating systems.
It uses complex algorithms which are difficult to understand.
RTOS uses a lot of resources, which is sometimes not suitable for the system.
RTOS Architecture
For simpler applications, an RTOS is usually just a kernel, but as complexity increases, various modules like networking protocol stacks, debugging facilities, and device I/O are included in addition to the kernel. The general architecture of an RTOS is shown in Fig. 1.2 below.
Kernel
RTOS kernel acts as an abstraction layer between the hardware and the
applications. There are three broad categories of kernels
· Monolithic kernel
Monolithic kernels are part of Unix-like operating systems like Linux, FreeBSD etc.
A monolithic kernel is one single program that contains all of the code necessary to
perform every kernel related task. It runs all basic system services (i.e. process and
memory management, interrupt handling and I/O communication, file system, etc.) and provides powerful abstractions of the underlying hardware. The amount of context switching and messaging involved is greatly reduced, which makes it run faster than a microkernel.
· Microkernel
It runs only basic process communication (messaging) and I/O control. It normally
provides only the minimal services such as managing memory protection, Inter process
communication and the process management. The other functions such as running the
hardware processes are not handled directly by micro kernels. Thus, micro kernels
provide a smaller set of simple hardware abstractions. A microkernel is more stable than a monolithic kernel, as the kernel remains unaffected even if a server (e.g., the file system) fails. Microkernels are part of operating systems like AIX, BeOS, Mach, Mac OS X, MINIX, QNX, etc.
· Hybrid Kernel
Hybrid kernels are extensions of micro kernels with some properties of monolithic
kernels. Hybrid kernels are similar to micro kernels, except that they include additional
code in kernel space so that such code can run more swiftly than it would were it in user
space. Hybrid kernels are part of operating systems such as Microsoft Windows NT, 2000 and XP, DragonFly BSD, etc.
· Exokernel
Exokernels provide efficient control over hardware. They run only services protecting the resources (i.e. tracking ownership, guarding usage, revoking access to resources, etc.) by providing a low-level interface for library operating systems and leaving resource management to the application.
Six types of common services are shown in the following figure below and explained
in subsequent sections
Fig:1.3: Representation of Common Services Offered By a RTOS System
Task Management
In an RTOS, the application is decomposed into small, schedulable, and sequential program units known as “tasks”. A task is the basic unit of execution and is governed by three time-critical properties: release time, deadline and execution time. Release time refers to the point in time from which the task can be executed. Deadline is the point in time by which the task must complete. Execution time denotes the time the task takes to execute.
Suspended: Task put on hold temporarily
Task Control Block: A task uses a TCB to remember its context. TCBs are data structures residing in RAM, accessible only by the RTOS.
Scheduler: The scheduler keeps a record of the state of each task, selects from among the tasks that are ready to execute, and allocates the CPU to one of them. Various scheduling algorithms are used in an RTOS.
Fig:1.6: Process Flow of a Scheduler
Polled system with interrupts: in addition to polling, it takes care of critical tasks.
Round Robin: sequences from task to task, each task getting a slice of time.
Hybrid system: responds to time-critical interrupts, with a Round Robin system working in the background.
Fig:1.9: Non-Preemptive Scheduling or Cooperative Multitasking
Dispatcher : The dispatcher gives control of the CPU to the task selected by the scheduler
by performing context switching and changes the flow of execution.
Task Synchronization & inter task communication serves to pass information amongst
tasks.
Task Synchronization
Event Objects
Event objects are used when task synchronization is required without resource
sharing. They allow one or more tasks to keep waiting for a specified event to occur. Event
object can exist either in triggered or non-triggered state. Triggered state indicates
resumption of the task.
Semaphores.
A semaphore has an associated resource count and a wait queue. The resource count indicates the availability of the resource. The wait queue manages the tasks waiting for resources from the semaphore. A semaphore functions like a key that defines whether a task has access to the resource. A task gets access to the resource when it acquires the semaphore.
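As an illustration of the acquire/release pattern just described, the following minimal sketch uses POSIX counting semaphores (sem_init/sem_wait/sem_post); the resource count of 2 and the worker task are hypothetical examples, and an RTOS kernel would provide its own equivalent semaphore calls.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

/* Two identical resources guarded by a counting semaphore. */
static sem_t resource_sem;

static void *worker(void *arg)
{
    sem_wait(&resource_sem);              /* acquire the semaphore (the "key") */
    printf("task %ld is using the resource\n", (long)arg);
    /* ... access the shared resource ... */
    sem_post(&resource_sem);              /* release the semaphore */
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    sem_init(&resource_sem, 0, 2);        /* resource count = 2 */
    for (long i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&resource_sem);
    return 0;
}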
Inter task communication involves sharing of data among tasks through sharing of
memory space, transmission of data, etc. Inter-task communication is carried out using the following mechanisms:
Message queues
A message queue is an object used for inter-task communication through which tasks send or receive messages placed in shared memory. The queue may follow 1) First In First Out (FIFO), 2) Last In First Out (LIFO) or 3) Priority (PRI) sequence. Usually, a message queue comprises an associated queue control block (QCB), name, unique ID, memory buffers, queue length, maximum message length and one or more task waiting lists. A message queue with a length of 1 is commonly known as a mailbox.
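For illustration, the sketch below shows one concrete realization of these ideas using the POSIX message queue API (mq_open, mq_send, mq_receive); the queue name "/sensor_q", its length and its message size are hypothetical values, and an RTOS kernel would expose its own queue-creation and post/pend calls. On Linux this example is typically linked with -lrt.

#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct mq_attr attr = {0};
    attr.mq_maxmsg  = 8;     /* queue length */
    attr.mq_msgsize = 64;    /* maximum message length */

    /* Create (or open) a named queue; "/sensor_q" is a hypothetical name. */
    mqd_t q = mq_open("/sensor_q", O_CREAT | O_RDWR, 0644, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char *msg = "temperature=42";
    mq_send(q, msg, strlen(msg) + 1, 0);          /* post with priority 0 */

    char buf[64];                                  /* must be >= mq_msgsize */
    unsigned prio;
    ssize_t n = mq_receive(q, buf, sizeof(buf), &prio);
    if (n >= 0)
        printf("received: %s (priority %u)\n", buf, prio);

    mq_close(q);
    mq_unlink("/sensor_q");
    return 0;
}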
A remote procedure call permits distributed computing, where a task can invoke the execution of another task on a remote computer.
Memory Management
Two types of memory managements are provided in RTOS – Stack and Heap. Stack
management is used during context switching for TCBs. Memory other than memory used
for program code, program data and system stack is called heap memory and it is used for
dynamic allocation of data space for tasks. Management of this memory is called heap
management.
Timer Management
Tasks need to be performed after scheduled durations. To keep track of the delays,
timers- relative and absolute are provided in RTOS.
The RTOS provides various functions for interrupt and event handling, viz., defining interrupt handlers, creation and deletion of ISRs, referencing the state of an ISR, enabling and disabling of interrupts, etc. It also restricts interrupts from occurring when modifying a data structure, minimizes interrupt latencies caused by disabling interrupts while the RTOS is performing critical operations, and minimizes interrupt response times.
RTOS generally provides large number of APIs to support diverse hardware device
drivers.
Features of RTOS
Informally, each standard in the POSIX set is defined by a decimal following the
POSIX. Thus, POSIX.1 is the standard for an application program interface in
the C language. POSIX.2 is the standard shell and utility interface (that is to say, the user's
command interface with the operating system). These are the main two interfaces, but
additional interfaces, such as POSIX.4 for thread management, have been developed or are
being developed. The POSIX interfaces were developed under the auspices of the Institute
of Electrical and Electronics Engineers (IEEE).
Task – A set of related jobs that jointly provide some system functionality.
Job – A job is a small piece of work that can be assigned to a processor, and that
may or may not require resources.
Release time of a job – It's a time of a job at which job becomes ready for
execution.
Execution time of a job: It is time taken by job to finish its execution.
Deadline of a job: It's time by which a job should finish its execution.
Processors: They are also known as active resources. They are important for the
execution of a job.
Relative deadline of a job: The maximum allowable response time of a job is called its relative deadline.
Response time of a job: It is the length of time from the release time of a job to the instant when the job finishes.
Absolute deadline: This is the relative deadline of the job plus its release time.
Summary
RTOS is an operating system intended to serve real-time applications that process data as it comes in, mostly without buffer delay. It offers priority-based scheduling, which allows you to separate analytical processing from non-critical processing. Important components of an RTOS system are:
The Scheduler
Symmetric Multiprocessing
Function Library
Memory Management
Fast dispatch latency and
User-defined data objects and classes.
An RTOS occupies very little memory and consumes few resources. Performance is the most important factor to be considered while selecting an RTOS. A General-Purpose Operating System (GPOS) is used for desktop PCs and laptops, while a Real-Time Operating System (RTOS) is applied only to embedded applications. Real-time systems are used in airline reservation systems, air traffic control systems, etc. The biggest drawback of an RTOS is that the system concentrates only on a few tasks.
SCHOOL OF ELECTRICAL & ELECTRONICS ENGINEERING
UNIT - II
RTOS PROGRAMMING-SECA5302
II PERFORMANCE METRICS AND SCHEDULING ALGORITHMS
An embedded system typically has enough CPU power to do the job, but
typically only just enough — there is no excess. Memory size is usually limited.
It is not unreasonably small, but there isn't likely to be any possibility of adding
more. Power consumption is usually an issue and the software — its size and
efficiency – can have a significant bearing on the number of watts burned by the
embedded device. It is clear that it is vital in an embedded system that the real
time operating system (RTOS) has the smallest possible impact on memory
footprint and makes very efficient use of the CPU.
RTOS metrics
There are three areas of interest when looking at the performance and usage
characteristics of an RTOS:
Memory – how much ROM and RAM does the kernel need and how is
this affected by options and configuration
information about kernel objects. However, nowadays, most kernels are
dynamically configured.
RAM space will be used for kernel data structures, including some or all
of the kernel object information, again depending upon whether the kernel is
statically or dynamically configured. There will also be some global variables. If
code is copied from flash to RAM, that space must also be accounted for. There
are a number of factors that affect the memory footprint of an RTOS. The CPU
architecture is key. The number of instructions can vary drastically from one
processor to another, so looking at size figures for, say, PowerPC gives no
indication of what the ARM version might be like.
Interrupt latency
System: the total delay between the interrupt signal being asserted and
the start of the interrupt service routine execution.
OS: the time between the CPU interrupt sequence starting and the
initiation of the ISR. This is really the operating system overhead, but
many people refer to it as the latency. This means that some vendors claim
zero interrupt latency.
ƮIL = ƮH + ƮOS
where ƮIL is the total (system) interrupt latency, ƮH is the hardware-dependent part of the delay, and ƮOS is the operating system overhead.
Ideally, quoted figures should include the best and worst case scenarios. The
worst case is when the kernel disables interrupts. To measure a time interval,
like interrupt latency, with any accuracy, requires a suitable instrument. The
best tool to use is an oscilloscope. One approach is to use one pin on a GPIO
interface to generate the interrupt. This pin can be monitored on the ‘scope. At
the start of the interrupt service routine, another pin, which is also being
monitored, is toggled. The interval between the two signals may be easily read
from the instrument.
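A minimal sketch of the measurement technique just described is given below; the memory-mapped GPIO register addresses, the pin assignments and the ISR name are hypothetical placeholders for whatever the target microcontroller actually provides.

#include <stdint.h>

/* Hypothetical memory-mapped GPIO registers for the target board. */
#define GPIO_OUT_SET  (*(volatile uint32_t *)0x40020018u)
#define GPIO_OUT_CLR  (*(volatile uint32_t *)0x4002001Cu)
#define TRIGGER_PIN   (1u << 4)   /* pin wired to the interrupt input  */
#define MARKER_PIN    (1u << 5)   /* pin monitored on the oscilloscope */

/* Hypothetical ISR, registered for the interrupt raised by TRIGGER_PIN. */
void trigger_isr(void)
{
    GPIO_OUT_SET = MARKER_PIN;    /* first instruction: raise the marker pin */
    /* ... normal interrupt service work ... */
    GPIO_OUT_CLR = MARKER_PIN;
}

void start_measurement(void)
{
    /* Assert the trigger pin; the oscilloscope measures the interval between
       TRIGGER_PIN going high and MARKER_PIN going high in trigger_isr(). */
    GPIO_OUT_SET = TRIGGER_PIN;
}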
Importance
Many embedded systems are real time and it is those applications, along with
fault tolerant systems, where knowledge of interrupt latency is important. If the
requirement is to maximize bandwidth on a particular interface, the latency on
that specific interrupt needs to be measured. To give an idea of numbers, the
majority of systems exhibit no problems, even if they are subjected to interrupt
latencies of tens of microseconds.
Scheduling latency
For most RTOS there are four key categories of service call:
Threading services
Synchronization services
Inter-process communication services
Memory services
All RTOS vendors provide performance data for their products, some of
which is more comprehensive than others. This information may be very useful, but
can also be misleading if interpreted incorrectly. It is important to understand the
techniques used to make measurements and the terminology used to describe the
results. There are also trade-offs – generally size against speed – and these, too, need
to be thoroughly understood. Without this understanding, a fair comparison is not
possible. If timing is critical to your application, it is strongly recommended that you
perform your own measurements. This enables you to be sure that the hardware
and software environment is correct and that the figures are directly relevant to
your application.
The operating system must guarantee that each task is activated at its
proper rate and meets its deadline. To ensure this, some periodic scheduling
algorithms are used. There are two basic types of scheduling algorithms:
Fig:2.2:Classification of scheduling algorithm
In fixed-priority scheduling, if the kth job of a task T1 has higher priority than the kth job of task T2 according to some specified scheduling event, then every job of T1 will always execute before the corresponding job of T2, i.e. the priority does not change on the next occurrence. More formally, if job J(1,K) of task T1 has higher priority than J(2,K) of task T2, then J(1,K+1) will always have higher priority than J(2,K+1). One of the best examples of a fixed-priority algorithm is the rate monotonic scheduling algorithm.
Dynamic priority algorithms
the other tasks. One example of a dynamic priority algorithm is the earliest
deadline first algorithm.
For a given task set of n periodic tasks, the processor utilization factor U is the fraction of time that is spent in the execution of the task set. If Si is a task from the task set, then Ci/Ti is the fraction of time spent by the processor on the execution of Si. The processor utilization factor is therefore
U = sum over i = 1..n of (Ci / Ti)
If the processor utilization of a task set of n periodic tasks is greater than one, then that task set will not be schedulable by any algorithm. The processor utilization factor indicates the processor load on a single processor; U = 1 means 100% processor utilization. The following scheduling algorithms will be discussed in detail.
Rate Monotonic (RM) Scheduling Algorithm
For example, we have a task set that consists of three tasks as follows
Task | Release time (ri) | Execution time (Ci) | Deadline (Di) | Time period (Ti)
T1   | 0                 | 0.5                 | 3             | 3
T2   | 0                 | 1                   | 4             | 4
T3   | 0                 | 2                   | 6             | 6
The task set given in the above table is scheduled using RM scheduling as shown in the figure. The explanation is as follows:
1. According to RM scheduling algorithm task with shorter period has
higher priority so T1 has high priority, T2 has intermediate priority and
T3 has lowest priority. At t=0 all the tasks are released. Now T1 has
highest priority so it executes first till t=0.5.
2. At t=0.5 task T2 has higher priority than T3, so it executes first for one time unit till t=1.5. After its completion only one task remains in the system, T3, so it starts its execution and executes till t=3.
3. At t=3 T1 is released; as it has higher priority than T3, it preempts or blocks T3 and starts its execution till t=3.5. After that the remaining part of T3 executes.
4. At t=4 T2 is released and completes its execution, as there is no task running in the system at this time.
5. At t=6 both T1 and T3 are released at the same time, but T1 has higher priority due to its shorter period, so it preempts T3 and executes till t=6.5; after that T3 starts running and executes till t=8.
6. At t=8 T2, with higher priority than T3, is released, so it preempts T3 and starts its execution.
7. At t=9 T1 is released again; it preempts T3 and executes first, and at t=9.5 T3 executes its remaining part. Similarly, the execution goes on.
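As a complement to the worked example, the short sketch below computes the processor utilization of this task set and compares it against the Liu and Layland bound n(2^(1/n) - 1) for rate monotonic scheduling. The task values are taken from the table above; note that the bound is a sufficient but not necessary test, so a set that fails it may still be schedulable.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Execution times Ci and periods Ti from the example task set. */
    double C[] = {0.5, 1.0, 2.0};
    double T[] = {3.0, 4.0, 6.0};
    int n = 3;

    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += C[i] / T[i];                            /* U = sum of Ci/Ti */

    double bound = n * (pow(2.0, 1.0 / n) - 1.0);    /* Liu & Layland bound */

    printf("U = %.3f, RM bound = %.3f\n", U, bound);
    if (U <= bound)
        printf("Schedulable under RM (sufficient test passed)\n");
    else if (U <= 1.0)
        printf("Inconclusive: an exact (response-time) analysis is needed\n");
    else
        printf("Not schedulable: utilization exceeds 1\n");
    return 0;
}

For this task set U = 0.75, which is below the bound of about 0.78 for three tasks, agreeing with the schedule constructed above.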
Advantages
It is easy to implement.
If any static priority assignment algorithm can meet the deadlines, then rate monotonic scheduling can also do the same; it is optimal among static priority algorithms.
It takes the time periods of the tasks into account, unlike time-sharing algorithms such as Round Robin, which neglect the scheduling needs of the processes.
Disadvantages
It is very difficult to support aperiodic and sporadic tasks under RMA.
RMA is not optimal when task periods and deadlines differ.
The laxity of a running task does not change; it remains the same, whereas the laxity of all other tasks decreases by one after every time unit.
Example of Least Laxity first scheduling Algorithm
Task | Release time (ri) | Execution time (Ci) | Deadline (Di) | Time period (Ti)
T1   | 0                 | 2                   | 6             | 6
T2   | 0                 | 2                   | 8             | 8
T3   | 0                 | 3                   | 10            | 10
At t=0 the laxities (deadline minus current time plus remaining execution time) are:
L1 = 6-(0+2) = 4
L2 = 8-(0+2) = 6
L3 = 10-(0+3) = 7
1. As task T1 has the least laxity, it executes with the highest priority. Similarly, at t=1 the laxities are recalculated: T1 has 4, T2 has 5 and T3 has 6, so again, due to least laxity, T1 continues to execute.
2. At t=2 T1 is out of the system, so now we compare the laxities of T2 and T3 as follows:
L2 = 8-(2+2) = 4
L3 = 10-(2+3) = 5
The laxities are recomputed at every instant in the same way; for example, at t=6 (with one unit of T3 remaining) L3 = 10-(6+1) = 3, and for the second instance of T3 released at t=10 (absolute deadline 20) L3 = 20-(10+3) = 7.
LLF is an optimal algorithm: if a task set passes the utilization test, then it is surely schedulable by LLF. Another advantage of LLF is that it gives some advance knowledge about which task is going to miss its deadline. On the other hand, it also has some disadvantages: one is its enormous computation demand, as each time instant is a scheduling event, and it gives poor performance when more than one task has the least laxity.
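The laxity values used throughout this example follow the rule laxity = deadline - (current time + remaining execution time); a small helper written for illustration, with hypothetical struct fields, captures that calculation.

#include <stdio.h>

/* Hypothetical task descriptor for the LLF example. */
struct task {
    const char *name;
    double abs_deadline;    /* absolute deadline of the current job  */
    double remaining;       /* remaining execution time of that job  */
};

/* Laxity = deadline - (current time + remaining execution time). */
static double laxity(const struct task *t, double now)
{
    return t->abs_deadline - (now + t->remaining);
}

int main(void)
{
    struct task t3 = {"T3", 10.0, 3.0};
    printf("laxity of %s at t=0: %.1f\n", t3.name, laxity(&t3, 0.0));  /* 7.0 */
    return 0;
}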
Cyclic executives:
Scheduling tables
Frames
Frame size constraints
Generating schedules
Non-independent tasks
Pros and cons
Cyclic Scheduling
This is an important way to sequence tasks in a real-time system. Cyclic scheduling is static: the schedule is computed offline and stored in a table. Task scheduling is non-preemptive. Non-periodic work can be run during time slots not used by periodic tasks, which gives non-periodic work an implicit low priority; usually non-periodic work must be scheduled preemptively. The scheduling table executes completely in one hyperperiod H and then repeats. H is the least common multiple of all task periods, giving N quanta per hyperperiod. Multiple tables can support multiple system modes; e.g., an aircraft might support takeoff, cruising, landing, and taxiing modes. Mode switches are permitted only at hyperperiod boundaries; otherwise it is hard to meet deadlines.
Frames:
Divide hyperperiods into frames. Timing is enforced only at frame boundaries.
Consider a system with four tasks.
Each task is executed as a function call and must fit within a single frame. Multiple tasks may be executed in a frame. If the frame size is f, the number of frames per hyperperiod is F = H/f.
1. Tasks must fit into frames, so f ≥ Ci for all tasks. Justification: non-preemptive tasks should finish executing within a single frame.
2. f must evenly divide H; equivalently, f must evenly divide Pi for some task i. Justification: keep the table size small.
3. There should be a complete frame between the release and deadline of every task. Justification: we want to detect missed deadlines by the time the deadline arrives.
A sketch of how these constraints can be checked for candidate frame sizes is given below.
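A minimal sketch of a frame-size search under the three constraints above. The example task set is hypothetical, and constraint 3 is written in its usual closed form 2f - gcd(f, Pi) <= Di, which is one standard way of expressing "a complete frame lies between release and deadline"; treat that formulation as an assumption if your course states it differently.

#include <stdio.h>

struct task { int C, P, D; };   /* execution time, period, relative deadline */

static int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }
static int lcm(int a, int b) { return a / gcd(a, b) * b; }

int main(void)
{
    struct task ts[] = { {1, 4, 4}, {2, 5, 5}, {1, 20, 20} };   /* example set */
    int n = 3;

    int H = 1;                               /* hyperperiod = lcm of periods */
    for (int i = 0; i < n; i++) H = lcm(H, ts[i].P);

    for (int f = 1; f <= H; f++) {
        int ok = (H % f == 0);               /* constraint 2: f divides H */
        for (int i = 0; ok && i < n; i++) {
            if (f < ts[i].C) ok = 0;                          /* constraint 1 */
            if (2 * f - gcd(f, ts[i].P) > ts[i].D) ok = 0;    /* constraint 3 */
        }
        if (ok) printf("f = %d is a valid frame size (H = %d)\n", f, H);
    }
    return 0;
}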
Drawbacks:
Summary:
• Off-line scheduling
• Doesn’t use the process abstraction of the OS
• Manually created table of procedures to be called – Waits for a periodic
interrupt for synchronization
• Minor cycle – Loops the execution of the procedures in the table
1. Jane W. S Liu, “Real Time Systems” Pearson Higher Education, 3rd Edition, 2000.
3. Jean J. Labrosse, “Micro C/OS-I : The real time kernel” CMP Books, 2nd Edition,2015.
5. Richard Barry, “Mastering the Free RTOS: Real Time Kernel”, Real Time Engineers
Ltd, 1st Edition, 2016.
SCHOOL OF ELECTRICAL & ELECTRONICS ENGINEERING
UNIT - III
III RESOURCE SHARING FOR REAL TIME TASKS
Resource sharing among tasks- Priority inversion Problem - Priority inheritance and
Priority ceiling Protocols – Features of commercial and open source real time operating
systems: Vxworks, QNX, Micrium OS, RT Linux and Free RTOS
Sharing of critical resources among tasks requires a different set of rules, compared
to the rules used for sharing resources such as a CPU among tasks. We have in the last
Chapter discussed how resources such as CPU can be shared among tasks. Priority
inversion is an operating system scenario in which a higher priority process is preempted by
a lower priority process. This implies the inversion of the priorities of the two processes.
A system malfunction may occur if a high priority process is not provided the
required resources.
Priority inversion may also lead to implementation of corrective measures. These
may include the resetting of the entire system.
The performance of the system can be reduced due to priority inversion. This may
happen because it is imperative for higher priority tasks to execute promptly.
System responsiveness decreases as high priority tasks may have strict time
constraints or real time response guarantees.
Sometimes there is no harm caused by priority inversion as the late execution of the
high priority process is not noticed by the system.
Solutions of Priority Inversion
Some of the solutions to handle priority inversion are given as follows −
Priority Ceiling
All of the resources are assigned a priority that is equal to the highest priority of any
task that may attempt to claim them. This helps in avoiding priority inversion
Disabling Interrupts
There are only two priorities in this case i.e. interrupts disabled and preemptible. So
priority inversion is impossible as there is no third option.
Priority Inheritance
This solution temporarily elevates the priority of the low priority task that is
executing to the highest priority task that needs the resource. This means that
medium priority tasks cannot intervene and lead to priority inversion.
No blocking
Priority inversion can be avoided by avoiding blocking as the low priority task
blocks the high priority task.
Random boosting
The priority of the ready tasks can be randomly boosted until they exit the critical
section.
Difference between Priority Inversion and Priority Inheritance
Both of these concepts come under Priority scheduling in Operating System. In one
line, Priority Inversion is a problem while Priority Inheritance is a solution. Literally,
Priority Inversion means that priority of tasks get inverted and Priority Inheritance means
that priority of tasks get inherited. Both of these phenomena happen in priority scheduling.
Basically, in Priority Inversion, a higher priority task (H) ends up waiting for a middle priority task (M) when H shares a critical section with a lower priority task (L) and L is already in the critical section. Effectively, H waiting for M results in inverted priority, i.e. Priority Inversion. One of the solutions to this problem is Priority Inheritance.
In Priority Inheritance, when L is in the critical section, L inherits the priority of H at the time when H starts pending on the critical section. By doing so, M doesn't interrupt L and H doesn't wait for M to finish. Note that the inheriting of priority is done temporarily, i.e. L goes back to its old priority when L comes out of the critical section.
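On POSIX systems, priority inheritance (and priority ceiling) can be requested when a mutex is created; the sketch below shows the relevant attribute calls, assuming a platform that supports PTHREAD_PRIO_INHERIT and PTHREAD_PRIO_PROTECT, and the ceiling value used is a hypothetical example.

#include <pthread.h>

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);

    /* Priority inheritance: a low-priority owner temporarily inherits the
       priority of the highest-priority task blocked on the mutex. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);

    /* Alternatively, a priority ceiling could be requested instead:
       pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
       pthread_mutexattr_setprioceiling(&attr, 30);   hypothetical ceiling value */

    pthread_mutex_init(&lock, &attr);

    pthread_mutex_lock(&lock);
    /* ... critical section guarded against unbounded priority inversion ... */
    pthread_mutex_unlock(&lock);

    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}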
Priority Inheritance Protocol (PIP) is a technique which is used for sharing critical resources among different tasks. It allows the sharing of critical resources among different tasks without the occurrence of unbounded priority inversions.
The basic concept of PIP is that when a task goes through priority inversion, the
priority of the lower priority task which has the critical resource is increased by the priority
inheritance mechanism. It allows this task to use the critical resource as early as possible
without going through the preemption. It avoids the unbounded priority inversion.
Working of PIP :
When several tasks are waiting for the same critical resource, the task which is
currently holding this critical resource is given the highest priority among all the tasks
which are waiting for the same critical resource. Now after the lower priority task
having the critical resource is given the highest priority then the intermediate priority
tasks cannot preempt this task. This helps in avoiding the unbounded priority inversion.
When the task which is given the highest priority among all tasks, finishes the job and
releases the critical resource then it gets back to its original priority value (which may
be less or equal). If a task is holding multiple critical resources, then after releasing one critical resource it cannot go back to its original priority value. In this case it inherits the highest priority among all tasks waiting for the same critical resource.
Advantages of PIP :
Disadvantages of PIP :
Priority Inheritance Protocol has two major problems which may occur:
Deadlock :
Chain Blocking :
When a task goes through priority inversion each time it needs a resource then
this process is called chain blocking. For example, there are two tasks T1 and T2.
Suppose T1 has the higher priority than T2. T2 holds the critical resource CR1 and CR2.
T1 arrives and requests CR1; T1 undergoes priority inversion while T2 inherits its priority according to PIP. Now T1 requests CR2, and again T1 goes through priority inversion according to PIP. Hence, multiple priority inversions while waiting for critical resources lead to chain blocking.
The chained blocking problem of the Priority Inheritance Protocol is resolved in the
Priority Ceiling Protocol.
The basic properties of Priority Ceiling Protocols are:
1. Each of the resources in the system is assigned a priority ceiling.
2. The assigned priority ceiling is determined by the highest priority among all the jobs
which may acquire the resource.
3. It makes use of more than one resource or semaphore variable, thus eliminating chain
blocking.
4. A job is assigned a lock on a resource if no other job has acquired lock on that
resource.
5. A job J, can acquire a lock only if the job’s priority is strictly greater than the priority
ceilings of all the locks held by other jobs.
6. If a high priority job has been blocked by a resource, then the job holding that
resource gets the priority of the high priority task.
7. Once the resource is released, the priority is reset back to the original.
8. In the worst case, the highest priority job J1 can be blocked by T lower priority tasks
in the system when J1 has to access T semaphores to finish its execution.
The Priority Ceiling Protocol can thus be used to tackle the priority inversion problem without the drawbacks of the Priority Inheritance Protocol. It makes use of semaphores to share the resources with the jobs in a real-time system.
VxWorks features:
High-performance, Unix-like, multitasking environment
Scalable and hierarchical RTOS
Host- and target-based development approach
Supports Device Software Optimization: a new methodology that enables development and running of device software faster, better and more reliably
VxWorks RTOS kernel
VxWorks 6.x processor abstraction layer
The layer enables application designs to be carried over to newer processor versions later by just changing the layer's hardware interface.
Supports advanced processor architectures: ARM, ColdFire, MIPS, Intel, SuperH
Suited for hard real-time applications
Supports kernel-mode execution of tasks
Supports open source Linux and the TIPC protocol
Provides preemption points at the kernel
Provides preemptive as well as round robin scheduling
Supports POSIX standard asynchronous I/O
Supports UNIX standard buffered I/O
PTTS 1.1 (since Dec. 2007)
IPCs in TIPC for network and clustered system environments
POSIX 1003.1b standard IPCs and interfaces additionally available
Separate context for tasks and ISRs
Micrium OS:
Portable. Offering unprecedented ease-of-use, μC/OS kernels are delivered with complete
source code and in-depth documentation. The μC/OS kernels run on a huge number of processor architectures.
Scalable. The μC/OS kernels allow for unlimited tasks and kernel objects. The kernels'
memory footprint can be scaled down to contain only the features required for your
application, typically 6–24 KBytes of code space and 1 KByte of data space.
Efficient. Micrium's kernels also include valuable runtime statistics, making the internals of
your application observable. Identify performance bottlenecks, and optimize power usage,
early in your development cycle.
The features of the µC/OS kernels include:
Highly scalable: Unlimited number of tasks, priorities and kernel objects
Micrium provides two extensions to the µC/OS-II kernel that provide memory
protection and greater stability and safety for the applications.
Its features and benefits include:
Features of RT Linux:
Multi-tasking
Priority-based scheduling
Application tasks should be programmed to suit
Ability to quickly respond to external interrupts
Basic mechanisms for process communication and synchronization
Small kernel and fast context switch
Priority-based kernels for embedded applications include e.g. OSE, VxWorks, QNX, VRTX32 and pSOS; many of them are commercial kernels. Applications should be designed and programmed to suit priority-based scheduling, e.g. with deadlines expressed as priorities. Real-time extensions of existing time-sharing operating systems, e.g. Real-Time Linux and Real-Time NT, work by e.g. locking RT tasks in main memory and assigning them the highest priorities. Research RT kernels include e.g. SHARK and TinyOS. Run-time systems exist for RT programming languages, e.g. Ada, Erlang and Real-Time Java.
Free RTOS is a popular real-time operating system kernel for embedded devices,
which has been ported to 35 microcontrollers. It is distributed under the GPL with an
additional restriction and optional exception. The restriction forbids benchmarking while
the exception permits users' proprietary code to remain closed source while maintaining
the kernel itself as open source, thereby facilitating the use of Free RTOS in proprietary
applications.
Free RTOS is designed to be small and simple. The kernel itself consists of only three
or four C files. To make the code readable, easy to port, and maintainable, it is written
mostly in C, but there are a few assembly functions included where needed (mostly in
architecture-specific scheduler routines). Thread priorities are supported. In addition there
are four schemes of memory allocation provided:
Allocate only;
Allocate and free with a very simple, fast, algorithm;
A more complex but fast allocate and free algorithm with memory coalescence;
C library allocate and free with some mutual exclusion protection
Key features:
Very small memory footprint, low overhead, and very fast execution.
Tick-less option for low power applications.
Equally good for hobbyists who are new to OSes, and professional developers
working on commercial products.
Scheduler can be configured for either preemptive or cooperative operation.
Coroutine support (a coroutine in Free RTOS is a very simple and lightweight task that makes very limited use of the stack)
Trace support through generic trace macros. Tools such as Tracealyzer (a.k.a. Free RTOS+Trace, provided by the Free RTOS partner Percepio) can thereby record and visualize the runtime behavior of Free RTOS-based systems. This includes task scheduling and kernel calls for semaphore and queue operations. Tracealyzer is a commercial tool, but is also available in a feature-limited free version.
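As a small illustration of how an application typically uses the FreeRTOS kernel described above, the sketch below creates two tasks with different priorities and starts the scheduler; the task names, priorities and stack sizes are arbitrary example values, and FreeRTOSConfig.h is assumed to be set up for the target port.

#include "FreeRTOS.h"
#include "task.h"

/* A periodic task: blocks for 500 ms between activations. */
static void vSensorTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* ... read a sensor ... */
        vTaskDelay(pdMS_TO_TICKS(500));
    }
}

/* A lower-priority background task. */
static void vLoggerTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* ... write log data ... */
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

int main(void)
{
    xTaskCreate(vSensorTask, "Sensor", configMINIMAL_STACK_SIZE, NULL, 2, NULL);
    xTaskCreate(vLoggerTask, "Logger", configMINIMAL_STACK_SIZE, NULL, 1, NULL);

    vTaskStartScheduler();   /* does not return if the scheduler starts */
    for (;;) { }             /* reached only if there was insufficient heap */
}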
Features of a RTOS:
Allows multi-tasking
Scheduling of the tasks with priorities
Synchronization of the resource access
Inter-task communication
Time predictable
Interrupt handling
Predictability of timing
The timing behavior of the OS must be predictable
For all services of the OS, there is an upper bound on the execution time
Scheduling policy must be deterministic
The period during which interrupts are disabled must be short (to avoid
unpredictable delays in the processing of critical events)
The QNX RTOS v6.1 has a client-server based architecture. QNX adopts the
approach of implementing an OS with a 10 Kbytes micro-kernel surrounded by a team of
optional processes that provide higher-level OS services .Every process including the device
driver has its own virtual memory space. The system can be distributed over several nodes,
and is network transparent. The system performance is fast, predictable and robust. It supports the Intel x86 family of processors, MIPS, PowerPC, and StrongARM.
QNX has successfully been used in tiny ROM-based embedded systems and in
several hundred node distributed systems.
VxWorks is the premier development and execution environment for complex real-
time and embedded applications on a wide variety of target processors. Three highly
integrated components are included with Vxworks: a high performance scalable real-time
operating system which executes on a target processor; a set of powerful cross-development
tools; and a full range of communications software options such as Ethernet or serial line
for the target connection to the host. The heart of the OS is the Wind microkernel which
supports multitasking, scheduling, inter task management and memory management. All
other functionalities are through processes. There is no privilege protection between system
and application and also the support for communication between processes on different
processors is poor.
1. Jane W. S Liu, “Real Time Systems” Pearson Higher Education, 3rd Edition, 2000.
2. Raj Kamal, “Embedded Systems- Architecture, Programming and Design” Tata McGraw
Hill, 2nd Edition, 2014.
3. Jean J. Labrosse, “Micro C/OS-I : The real time kernel” CMP Books, 2nd Edition,2015.
5. Richard Barry, “Mastering the Free RTOS: Real Time Kernel”, Real Time Engineers Ltd,
1st Edition, 2016.
6. David E. Simon, “ An Embedded Software Primer”, Pearson Education Asia Publication
SCHOOL OF ELECTRICAL & ELECTRONICS ENGINEERING
UNIT -IV
IV APPLICATION PROGRAMMING USING RTOS
Task synchronization using semaphores, Inter task communication: message queues and
pipes, Remote procedure call- Timers and Interrupts-Memory management and I/O
management
Task Management:
Fig:4.2:Task states
Non-periodic or aperiodic tasks are all tasks that are not periodic; also known as event-driven tasks, their activations may be generated by external interrupts. Sporadic tasks are aperiodic tasks with a minimum inter-arrival time Tmin.
Managing tasks:
Task creation: create a new TCB (task control block)
Task termination: remove the TCB
Change Priority: modify the TCB
State-inquiry: read the TCB
Task synchronization:
Run:
A task enters this state as it starts executing on the processor
Ready:
State of those tasks that are ready to execute but cannot be executed, because the
processor is assigned to another task.
Wait:
A task enters this state when it executes a synchronization primitive to wait for an event,
e.g. a wait primitive on a semaphore. In this case, the task is inserted in a queue
associated with the semaphore. The task at the head is resumed when the semaphore is
unlocked by a signal primitive.
Idle:
A periodic job enters this state when it completes its execution and has to wait
for the beginning of the next period.
In computing, a named pipe (also known as a FIFO) is one of the methods for inter-process communication.
It is an extension to the traditional pipe concept on Unix. A traditional pipe is
“unnamed” and lasts only as long as the process.
A named pipe, however, can last as long as the system is up, beyond the life of the
process. It can be deleted if no longer used.
Usually a named pipe appears as a file and generally processes attach to it for inter-
process communication. A FIFO file is a special kind of file on the local storage which
allows two or more processes to communicate with each other by reading/writing
to/from this file.
A FIFO special file is entered into the filesystem by calling mkfifo() in C. Once we have
created a FIFO special file in this way, any process can open it for reading or writing, in
the same way as an ordinary file. However, it has to be open at both ends
simultaneously before you can proceed to do any input or output operations on it.
Understanding Pipes
Within a process:
Writes to files[1] can be read on files[0]
Not very useful
Between processes, after a fork():
Writes to files[1] by one process can be read on files[0] by the other
Using Pipes:
Usually, the unused end of the pipe is closed by the process
If process A is writing and process B is reading, then process A would close files[0]
and process B would close files[1]
Reading from a pipe whose write end has been closed returns 0 (end of file)
Writing to a pipe whose read end has been closed generates SIGPIPE
PIPE_BUF specifies kernel pipe buffer size
Creating a Pipe
The primitive for creating a pipe is the pipe function. This creates both the reading
and writing ends of the pipe. It is not very useful for a single process to use a pipe to talk to
itself. In typical use, a process creates a pipe just before it forks one or more child processes.
The pipe is then used for communication either between the parent or child processes, or
between two sibling processes.
The pipe function is declared in the header file unistd.h. Here is an example of a
simple program that creates a pipe. The parent process writes data to the pipe, which is read
by the child process.
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
void
read_from_pipe (int file)
{
FILE *stream;
int c;
stream = fdopen (file, "r");
while ((c = fgetc (stream)) != EOF)
putchar (c);
fclose (stream);
}
void
write_to_pipe (int file)
{
FILE *stream;
stream = fdopen (file, "w");
fprintf (stream, "hello, world!\n");
fprintf (stream, "goodbye, world!\n");
fclose (stream);
}
int
main (void)
{
  pid_t pid;
  int mypipe[2];

  /* Create the pipe. */
  if (pipe (mypipe))
    {
      fprintf (stderr, "Pipe failed.\n");
      return EXIT_FAILURE;
    }

  /* Create the child process. */
  pid = fork ();
  if (pid == (pid_t) 0)
    {
      /* This is the child process. Close the write end, then read. */
      close (mypipe[1]);
      read_from_pipe (mypipe[0]);
      return EXIT_SUCCESS;
    }
  else if (pid < (pid_t) 0)
    {
      /* The fork failed. */
      fprintf (stderr, "Fork failed.\n");
      return EXIT_FAILURE;
    }
  else
    {
      /* This is the parent process. Close the read end, then write. */
      close (mypipe[0]);
      write_to_pipe (mypipe[1]);
      return EXIT_SUCCESS;
    }
}
FIFO
A FIFO special file is similar to a pipe, except that it is created in a different way.
Instead of being an anonymous communications channel, a FIFO special file is entered into
the file system by calling mkfifo.
Once you have created a FIFO special file in this way, any process can open it for
reading or writing, in the same way as an ordinary file. However, it has to be open at both
ends simultaneously before you can proceed to do any input or output operations on it.
Opening a FIFO for reading normally blocks until some other process opens the same FIFO
for writing, and vice versa.
First: co-processes are nothing more than a process whose input and output are both redirected from another process.
FIFOs are named pipes.
With regular pipes, only processes with a common ancestor can communicate; with FIFOs, any two processes can communicate.
Creating and opening a FIFO is just like creating and opening a file.
FIFO details:
#include <sys/types.h>
#include <sys/stat.h>
int mkfifo(const char *pathname, mode_t mode);
The mode argument is just like in open()
Can be opened just like a file
When opened, the O_NONBLOCK bit is important:
Not specified: open() for reading blocks until the FIFO is opened by a writer
Specified: open() returns immediately, but returns an error if opened for writing and no reader exists
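A minimal sketch of the mkfifo()/open() sequence described above; the FIFO path is a hypothetical example, and in practice the reader and writer would normally be two unrelated processes rather than a parent and child.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/demo_fifo";   /* hypothetical FIFO path */

    /* Enter the FIFO special file into the filesystem. */
    mkfifo(path, 0666);

    if (fork() == 0) {
        /* Child: open the FIFO for reading (blocks until a writer opens it). */
        int fd = open(path, O_RDONLY);
        char buf[64];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("reader got: %s\n", buf); }
        close(fd);
        return 0;
    }

    /* Parent: open the FIFO for writing (blocks until the reader opens it). */
    int fd = open(path, O_WRONLY);
    const char *msg = "hello over the FIFO";
    write(fd, msg, strlen(msg));
    close(fd);
    wait(NULL);
    unlink(path);
    return 0;
}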
Mail box:
Mailboxes are similar to queues.
Capacity of mailboxes is usually fixed after initialization
Some RTOS allow only a single message in the mailbox. (mailbox is either full or
empty)
Some RTOS allow prioritization of messages
Mailbox functions at OS
Some OSes provide both mailbox and queue IPC functions. When the IPC functions for a mailbox are not provided by an OS, the OS employs a queue for the same purpose. A mailbox of a task can receive messages from other tasks and has a distinct ID. A mailbox (for a message) is an IPC mechanism in which a message at the OS can be received by only one single destined task; two or more tasks cannot take a message from the same mailbox. A task, through an OS function call, puts (posts, or sends) into the mailbox only a pointer to the mailbox message. A mailbox message may also include a header to identify the message-type specification.
The OS provides functions for inserting and deleting a message via the mailbox message pointer; deleting means the message pointer points to Null. Each mailbox for a message needs initialization (creation) before using the functions in the scheduler, with the message queue and message pointer pointing to Null. There may be a provision for multiple mailboxes for the multiple types or destinations of messages. Each mailbox has an ID. Each mailbox usually has one message pointer only, which can point to a message.
Mailbox Types
Fig:4.5:Classification of mailbox
When an OS call posts into the mailbox, the message consists of the number of bytes pointed to by the mailbox message pointer.
Fig:4.6:Features of mailbox
Mailbox IPC functions
1. OSMBoxCreate creates a mailbox and initializes the mailbox contents with a NULL pointer at *msg.
2. OSMBoxPost sends a message at *msg, which now does not point to Null. An ISR can also post into a mailbox for a task.
3. OSMBoxWait (Pend) waits for *msg not Null; the message is read when it is not Null and *msg again points to Null. The timeout and error handling function can be provided with the Pend function arguments. An ISR is not permitted to wait for a message in a mailbox; only a task can wait.
4. OSMBoxAccept reads the message at *msg after checking for its presence (yes or no), without waiting. It deletes (reads) the mailbox message when read, and *msg again points to Null. An ISR can also accept a mailbox message for a task.
5. OSMBoxQuery queries the mailbox *msg.
6. OSMBoxDelete deletes the mailbox.
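A minimal sketch of how these calls fit together in a µC/OS-II style application is given below; the exact spelling and argument types (OSMboxCreate/OSMboxPost/OSMboxPend, the timeout type, the error codes, the "includes.h" master header) vary with the kernel version and port, so treat the signatures used here as assumptions to be checked against the actual kernel headers.

#include "includes.h"              /* µC/OS-II master include, per the port */

static OS_EVENT *TempMbox;         /* mailbox carrying a pointer to a sample */
static INT32U    Sample;

/* Producer task: posts a pointer to the latest sample into the mailbox. */
static void ProducerTask(void *pdata)
{
    (void)pdata;
    for (;;) {
        Sample++;
        OSMboxPost(TempMbox, (void *)&Sample);
        OSTimeDly(10);             /* delay 10 ticks between posts */
    }
}

/* Consumer task: pends on the mailbox until a message pointer arrives. */
static void ConsumerTask(void *pdata)
{
    INT8U err;
    (void)pdata;
    for (;;) {
        INT32U *p = (INT32U *)OSMboxPend(TempMbox, 0, &err);  /* 0 = wait forever */
        if (err == OS_NO_ERR && p != (void *)0) {
            /* ... use *p ... */
        }
    }
}

/* During initialization: create the mailbox empty (contents = NULL). */
void AppMailboxInit(void)
{
    TempMbox = OSMboxCreate((void *)0);
}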
Tasks are event driven or time driven. In RTOS environment there is less
significance for event driven mechanism. Just as processes share the CPU, they also share
physical memory. The concept of a logical address space that is bound to a separate
physical address space is central to proper memory management.
Logical address – generated by the CPU; also referred to as virtual address
Physical address – address seen by the memory unit
Logical and physical addresses are the same in compile-time and load time address binding
schemes; logical (virtual) and physical addresses differ in execution-time address-binding
scheme. Relocatable means that the program image can reside anywhere in physical memory. Binding: programs need real memory in which to reside; when is the location of that real memory determined? This is called mapping logical to physical addresses. The binding can be done at compile/link time, which converts symbolic addresses to relocatable ones; data used within the compiled source is an offset within the object module.
Compiler: If it is known where the program will reside, then absolute code is generated; otherwise the compiler produces relocatable code.
Execution: The code can be moved around during execution, which means flexible virtual mapping.
Example of Memory Usage:
Message queues
A message queue is an object used for inter-task communication through which tasks send or receive messages placed in shared memory. The queue may follow 1) First In First Out (FIFO), 2) Last In First Out (LIFO) or 3) Priority (PRI) sequence. Usually, a message queue comprises an associated queue control block (QCB), name, unique ID, memory buffers, queue length, maximum message length and one or more task waiting lists. A message queue with a length of 1 is commonly known as a mailbox.
Fig:4.8: Message queues
Pipes
2. The client stub marshalls (pack) the parameters into a message. Marshalling includes
converting the representation of the parameters into a standard format, and copying each
parameter into the message.
3. The client stub passes the message to the transport layer, which sends it to the remote server
machine.
4. On the server, the transport layer passes the message to a server stub, which demarshalls (unpacks) the parameters and calls the desired server routine using the regular procedure call mechanism.
5. When the server procedure completes, it returns to the server stub (e.g., via a normal
procedure call return), which marshalls the return values into a message. The server stub then
hands the message to the transport layer.
6. The transport layer sends the result message back to the client transport layer, which hands
the message back to the client stub.
7. The client stub demarshalls the return parameters and execution returns to the caller.
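To make the stub's marshalling step (steps 2 and 7 above) concrete, the fragment below shows one simple way a client stub might pack an integer argument into a network-byte-order message buffer and the server stub unpack it; the message layout is entirely hypothetical, and real RPC runtimes use standardized encodings such as XDR.

#include <arpa/inet.h>   /* htonl / ntohl for network byte order */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format: 4-byte procedure id followed by one 4-byte argument. */
static size_t marshal_call(uint8_t *buf, uint32_t proc_id, int32_t arg)
{
    uint32_t p = htonl(proc_id);
    uint32_t a = htonl((uint32_t)arg);
    memcpy(buf,     &p, 4);
    memcpy(buf + 4, &a, 4);
    return 8;                         /* message length in bytes */
}

static void unmarshal_call(const uint8_t *buf, uint32_t *proc_id, int32_t *arg)
{
    uint32_t p, a;
    memcpy(&p, buf,     4);
    memcpy(&a, buf + 4, 4);
    *proc_id = ntohl(p);
    *arg     = (int32_t)ntohl(a);
}

int main(void)
{
    uint8_t msg[8];
    marshal_call(msg, 42, -7);        /* client stub side */

    uint32_t proc; int32_t arg;
    unmarshal_call(msg, &proc, &arg); /* server stub side */
    printf("procedure %u called with argument %d\n", proc, arg);
    return 0;
}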
RPC Issues
1. RPC Runtime: RPC run-time system is a library of routines and a set of services that handle
the network communications that underlie the RPC mechanism. In the course of an RPC call,
client-side and server-side run-time systems’ code handle binding, establish communications
over an appropriate protocol, pass call data between the client and server, and handle
communications errors.
2. Stub: The function of the stub is to provide transparency to the programmer-written
application code. On the client side, the stub handles the interface between the client’s local
procedure call and the run-time system, marshaling and unmarshaling data, invoking the RPC
run-time protocol, and if requested, carrying out some of the binding steps. On the server side,
the stub provides a similar interface between the run-time system and the local manager
procedures that are executed by the server.
3. Binding: The most flexible solution is to use dynamic binding and find the server at run time
when the RPC is first made. The first time the client stub is invoked, it contacts a name server
to determine the transport address at which the server resides.
Advantages of Remote Procedure Call
Remote procedure calls support process-oriented and thread-oriented models.
The internal message passing mechanism of RPC is hidden from the user.
The effort to re-write and re-develop the code is minimum in remote procedure calls.
Remote procedure calls can be used in distributed environment as well as the local
environment.
Many of the protocol layers are omitted by RPC to improve performance.
Disadvantages of Remote Procedure Call
Some of the disadvantages of RPC are as follows
The remote procedure call is a concept that can be implemented in different ways. It is
not a standard.
There is no flexibility in RPC for hardware architecture. It is only interaction based.
There is an increase in costs because of remote procedure calls.
Memory Management
The contents of a process address space do not need to be completely in place for a
process to execute. If a process references a part of its address space that is not resident in
main memory, the system pages the necessary information into memory. When system
resources are scarce, the system uses a two-level approach to maintain available resources. If
a modest amount of memory is available, the system will take memory resources away from
processes if these resources have not been used recently. Should there be a severe resource
shortage, the system will resort to swapping the entire context of a process to secondary
storage. This paging and swapping done by the system are effectively transparent to
processes, but a process may advise the system about expected future memory utilization as
a performance aid. A common technique for doing the above is virtual memory, which
simulates a much larger address space than is actually available, using a reserved disk area
for objects that are not in physical memory. The operating system’s kernel often performs
memory allocations that are needed for only the duration of a single system call. In a user
process, such short-term memory would be allocated on the run-time stack. Because the
kernel has a limited run-time stack, it is not feasible to allocate even moderately-sized blocks
of memory on it, so a more dynamic mechanism is needed. For example, when the system
must translate a path name, it must allocate a 1-kbyte buffer to hold the name. Other blocks
of memory must be more persistent than a single system call, and thus could not be allocated
on the stack even if there was space. An example is protocol-control blocks that remain
throughout the duration of a network connection.
This section discusses virtual memory techniques, memory allocation and deallocation,
memory protection and memory access control.
Virtual memory
The operating system uses virtual memory to manage the memory requirements of its
processes by combining physical memory with secondary memory (swap space) on a disk,
usually located on a hardware disk drive. Diskless systems use a page server to maintain their
swap areas on the local disk (extended memory). The translation from virtual to physical
addresses is implemented by a memory management unit (MMU), which may be either a
module of the CPU, or an auxiliary, closely coupled chip. The operating system is responsible
for deciding which parts of the program’s simulated main memory are kept in physical
memory, and also maintains the translation tables that map between virtual and physical
addresses. There are three techniques for implementing virtual memory: paging, swapping and segmentation.
(1) Paging
Almost all implementations of virtual memory divide the virtual address space of an
application program into pages; a page is a block of contiguous virtual memory addresses.
Here, the low-order bits of the binary representation of the virtual address are preserved,
and used directly as the low-order bits of the actual physical address; the high-order bits are
treated as a key to one or more address translation tables.
Almost all implementations use page tables to translate the virtual addresses seen by
the application into physical addresses (also referred to as real addresses) used by the
hardware. The operating system stores the address translation tables, i.e. the mappings from
virtual to physical page numbers, in a data structure known as a page table. When the CPU
tries to reference a memory location that is marked as unavailable, the MMU responds by
raising an exception (commonly called a page fault) with the CPU, which then jumps to a
routine in the operating system. If the page is in the swap area, this routine invokes an
operation called a page swap, to bring in the required page.
The operating systems can have one page table or a separate page table for each
application. If there is only one, different applications running at the same time will share a
single virtual address space, i.e. they use different parts of a single range of virtual addresses.
The operating systems which use multiple page tables provide multiple virtual address
spaces, so concurrent applications seem to use the same range of virtual addresses, but their
separate page tables redirect to different real addresses
Fig:4.11: Abstract model for mapping virtual addresses to physical addresses in the
implementation of virtual memory
The above figure shows the virtual address spaces of two processes, X and Y, each
with their own page tables, which map each process’s virtual pages into physical pages in
memory. This shows that process X’s virtual page frame number 0 is mapped into memory at
physical page frame number 1 and that process Y’s virtual page frame number 1 is mapped
into physical page frame number 4. Each entry in the theoretical page table contains the
following information: (1) a valid flag that indicates whether this page table entry is valid; (2)
the physical page frame number that this entry describes; (3) the access control information
that describes how the page may be used.
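A toy illustration of the address split and page-table lookup described above; the 4 KB page size, the table contents and the frame numbers are hypothetical example values, and a real MMU performs this translation in hardware.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12u                    /* hypothetical 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

struct pte {                              /* simplified page table entry */
    int      valid;                       /* is the mapping present?     */
    uint32_t frame;                       /* physical page frame number  */
};

/* Translate a virtual address using a flat page table; returns 0 on a "page fault". */
static int translate(const struct pte *table, size_t entries,
                     uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* high-order bits: page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* low-order bits: kept as-is   */
    if (vpn >= entries || !table[vpn].valid)
        return 0;                                 /* would raise a page fault */
    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return 1;
}

int main(void)
{
    /* Virtual page 0 -> physical frame 1, virtual page 2 -> physical frame 4. */
    struct pte table[4] = { {1, 1}, {0, 0}, {1, 4}, {0, 0} };
    uint32_t pa;
    if (translate(table, 4, 0x0000002Au, &pa))    /* virtual page 0, offset 0x2A */
        printf("virtual 0x2A -> physical 0x%X\n", pa);
    return 0;
}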
(2) Swapping
Swap space is a portion of hard disk used for virtual memory that is usually a
dedicated partition (i.e., a logically independent section of a hard disk drive), created during
the installation of the operating system. Such a partition is also referred to as a swap
partition. However, swap space can also be a special file. Although it is generally preferable
to use a swap partition rather than a file, sometimes it is not practical to add or expand a
partition when the amount of RAM is being increased. In such case, a new swap file can be
created with a system call to mark a swap space.
It is also possible for a virtual page to be marked as unavailable because the page was never
previously allocated. In such cases, a page of physical memory is allocated and filled with
zeros, the page table is modified to describe it, and the program is restarted as above
The above figure illustrates how the virtual memory of a process might correspond to
what exists in physical memory, on swap, and in the file system. The U-area of a process
consists of two 4 kB pages (displayed here as U1 and U2) of virtual memory containing
information about the process that is needed by the system during execution. In this example,
these pages are shown in physical memory, and the data pages, D3 and D4, are shown as
being paged out to the swap area on disk. The text page, T4, has also been paged out, but it is
not written to the swap area as it exists in the file system. Those pages that have not yet been
accessed by the process (D5, T2, and T5) do not occupy any resources in physical memory or
in the swap area. The page swap operation involves a series of steps. Firstly it selects a page
in memory; for example, a page that has not been recently accessed and (preferably) has not
been modified since it was last read. If the page has been modified, the process writes the
modified page to the swap area. The next step in the process is to read in the information in
the needed page (the page corresponding to the virtual address the original program was
trying to reference when the exception occurred) from the swap file. When the page has been
read in, the tables for translating virtual addresses to physical addresses are updated to
reflect the revised contents of physical memory. Once the page swap completes, it exits, the
program is restarted and returns to the point that caused the exception.
(3) Segmentation
Some operating systems do not use paging to implement virtual memory, but use
segmentation instead. For an application process, segmentation divides its virtual address
space into variable-length segments, so a virtual address consists of a segment number and an
offset within the segment. Memory is always physically addressed with a single number
(called absolute or linear address). To obtain it, the microprocessor looks up the segment
number in a table to find a segment descriptor. This contains a flag indicating whether the
segment is present in main memory and, if so, the address of its starting point (segment’s
base address) and its length. It checks whether the offset within the segment is less than the
length of the segment and, if not, generates an interrupt. If a segment is not present in main
memory, a hardware interrupt is raised to the operating system, which may try to read the
segment into main memory, or to swap it in. The operating system may need to remove other
segments (swap out) in order to make space for the segment to be read in.
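As a rough illustration, the descriptor lookup and limit check just described might look like the following C sketch; the structure layout and names are assumptions, not those of any specific processor.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor, found by looking up the segment number. */
typedef struct {
    bool     present;   /* is the segment currently in main memory?  */
    uint32_t base;      /* segment's base (linear) address           */
    uint32_t limit;     /* segment length in bytes                   */
} seg_desc_t;

/* Translate (segment number, offset) into a linear address.
 * Returning false models the interrupt raised on a limit violation
 * or on a segment that the OS must first read or swap in.          */
bool seg_translate(const seg_desc_t *table, size_t entries,
                   uint32_t segment, uint32_t offset, uint32_t *linear)
{
    if (segment >= entries)
        return false;
    if (!table[segment].present)          /* segment fault: swap the segment in */
        return false;
    if (offset >= table[segment].limit)   /* offset not less than the length    */
        return false;

    *linear = table[segment].base + offset;
    return true;
}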
The difference between virtual memory implementations that use pages and those
using segments is not only about the memory division. Sometimes the segmentation is actually
visible to the user processes, as part of the semantics of the memory model. In other words,
instead of a process just having a memory which looks like a single large vector of bytes or
words, it is more structured. This is different from using pages, which does not change the
model visible to the process. This has important consequences. It is possible to combine
segmentation and paging, usually by dividing each segment into pages. In such systems,
virtual memory is usually implemented by paging, with segmentation used to provide
memory protection. The segments reside in a 32-bit linear paged address space, which
segments can be moved into and out of, and pages in that linear address space can be moved
in and out of main memory, providing two levels of virtual memory. This is quite rare,
however; most systems use only paging.
Dynamic memory allocation is the allocation of memory storage for use during the
run-time of a program, and is a way of distributing ownership of limited memory resources
among many pieces of data and code. A dynamically allocated object remains allocated until
it is deallocated explicitly, either by the programmer or by a garbage collector; this is notably
different from automatic and static memory allocation. It is said that such an object has
dynamic lifetime. Memory pools provide dynamic memory allocation comparable to malloc or
the C++ operator "new"; however, because those general-purpose allocators suffer from
fragmentation due to their variable block sizes, they can be unusable in a real-time system
for performance reasons.
A more efficient solution is to pre-allocate a number of memory blocks of the same
size, called the memory pool. The application can allocate, access, and free blocks represented
by handles at run-time. Fulfilling an allocation request, which involves finding a block of
unused memory of a certain size in the heap, is a difficult problem. A wide variety of
solutions have been proposed, and some of the most commonly used are discussed here.
A free list is a data structure used in a scheme for dynamic memory allocation that
operates by connecting unallocated regions of memory together in a linked list, using the first
word of each unallocated region as a pointer to the next. It is most suitable for allocating
from a memory pool, where all objects have the same size. Free lists make the allocation and
deallocation operations very simple. To free a region, it is just added to the free list. To
allocate a region, we simply remove a single region from the end of the free list and use it. If
the regions are variable-sized, we may have to search for a large enough region, which can be
expensive. Free lists have the disadvantage, inherited from linked lists, of poor locality of
reference and thus poor data cache utilization, and they provide no way of consolidating
adjacent regions to fulfill allocation requests for large regions. Nevertheless, they are still
useful in a variety of simple applications where a full-blown memory allocator is unnecessary,
or requires too much overhead.
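A minimal sketch of such a fixed-size block pool, with the free list threaded through the first word of each unused block, could look like this in C. The names are illustrative, blocks are pushed and popped at the head of the list for simplicity, and the caller-provided storage is assumed to be suitably aligned.

#include <stddef.h>

typedef struct {
    void *free_head;    /* first unused block, or NULL when the pool is exhausted */
} pool_t;

/* Carve 'count' blocks of 'block_size' bytes out of caller-provided storage
 * and thread them onto the free list. block_size must be >= sizeof(void *). */
void pool_init(pool_t *p, void *storage, size_t block_size, size_t count)
{
    char *blk = (char *)storage;
    p->free_head = NULL;
    for (size_t i = 0; i < count; i++) {
        *(void **)blk = p->free_head;   /* first word of the block points to the next free one */
        p->free_head = blk;
        blk += block_size;
    }
}

/* Allocation: pop one block off the list; O(1), no searching, no fragmentation. */
void *pool_alloc(pool_t *p)
{
    void *blk = p->free_head;
    if (blk != NULL)
        p->free_head = *(void **)blk;
    return blk;
}

/* Deallocation: push the block back onto the free list; O(1). */
void pool_free(pool_t *p, void *blk)
{
    *(void **)blk = p->free_head;
    p->free_head = blk;
}

For example, pool_init(&p, buffer, 32, 64) would prepare 64 blocks of 32 bytes each; allocation and deallocation are then constant-time and never fragment.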
(b) Paging
As mentioned earlier, the memory access part of paging is done at the hardware level
through page tables, and is handled by the MMU. Physical memory is divided into small
blocks called pages (typically 4 kB or less in size), and each block is assigned a page number.
The operating system may keep a list of free pages in its memory, or may choose to probe the
memory each time a request is made (though most modern operating systems do the former).
In either case, when a program makes a request for memory, the operating system allocates a
number of pages to it, and keeps a list of allocated pages for that particular program in
memory.
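As an illustration of this bookkeeping, the following C sketch keeps the operating system's record of free page frames as a simple bitmap and hands a requested number of frames to a program; a real kernel's structures are of course more elaborate, and all names here are invented.

#include <stddef.h>
#include <stdint.h>

#define NUM_FRAMES 1024u                 /* e.g., 1024 frames of 4 kB = 4 MB of physical memory */

static uint8_t frame_used[NUM_FRAMES];   /* the OS's record of free/allocated page frames */

/* Allocate up to 'n' physical page frames for a program, recording the frame
 * numbers in 'frames' (the per-program list of allocated pages). Returns how
 * many frames were actually found free.                                      */
size_t alloc_pages(size_t n, uint32_t *frames)
{
    size_t got = 0;
    for (uint32_t f = 0; f < NUM_FRAMES && got < n; f++) {
        if (!frame_used[f]) {
            frame_used[f] = 1;
            frames[got++] = f;
        }
    }
    return got;
}

void free_page(uint32_t frame)
{
    frame_used[frame] = 0;
}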
Memory protection
One step beyond the single-programming model is to provide multiprogramming
without memory protection. When a program is copied into memory, a linker-loader alters
the code of the program (loads, stores, jumps) to use the address of where the program lands
in memory. In this environment, bugs in any program can cause other programs to crash,
even the operating system. The third model is to have a multitasking operating system with
memory protection, which keeps user programs from crashing one another and the operating
system. Typically, this is achieved by two hardware-supported mechanisms: address
translation and dual-mode operation.
Each process is associated with an address space, that is, all the physical addresses it can
touch. However, each process appears to own the entire memory, with the starting virtual
address of 0. The missing piece is a translation table that translates every memory reference
from virtual addresses to physical addresses. Translation provides protection because there is
no way for a process to refer to other processes' addresses, and it has no way of touching the
code or data of the operating system. The operating system uses physical addresses directly,
and involves no translation. When an exception occurs, the operating system is responsible
for allocating an area of physical memory to hold the missing information (and possibly in
the process pushing something else out to disk), bringing the relevant information in from the
disk, updating the translation tables, and finally resuming execution of the software that
incurred the exception.
Translation tables can offer protection only if a process cannot alter their content.
Therefore, a user process is restricted to touching only its own address space while in user mode.
A CPU can change from kernel to user mode when starting a program, or vice versa through
either voluntary or involuntary mechanisms. The voluntary mechanism uses system calls,
where a user application asks the operating system to do something on its behalf. A system
call passes arguments to an operating system, either through registers or copying from the
user memory to the kernel memory. A CPU can also be switched from user to kernel mode
involuntarily by hardware interrupts (e.g., I/O) and program exceptions (e.g., segmentation
fault).
Dealing with race conditions is also one of the difficult aspects of memory
management. To manage memory access requests coming from the system, a scheduler is
necessary in the application layer or in the kernel, in addition to the MMU as a hardware
manager. The most common way of protecting data from concurrent access by the memory
access request scheduler is memory request contention. The semantics and methodologies of
memory access request contention should be the same as for I/O request contention.
Timer Management
Tasks need to be performed after scheduled durations. To keep track of these delays,
timers, both relative and absolute, are provided by the RTOS.
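The distinction between relative and absolute delays can be sketched as follows. The function names are invented placeholders, not the API of any particular RTOS; most kernels offer equivalents of both.

/* Hypothetical timer calls; the names are placeholders, not a real RTOS API. */
extern unsigned rtos_tick_count(void);               /* current value of the tick counter  */
extern void     rtos_delay_relative(unsigned ticks); /* "sleep for N ticks from now"       */
extern void     rtos_delay_until(unsigned tick);     /* "sleep until the tick counter = T" */

/* Absolute delays give a drift-free period: the deadline does not slip
 * by however long the work itself takes.                               */
void periodic_task(void)
{
    unsigned next = rtos_tick_count();
    for (;;) {
        next += 100;                 /* fixed 100-tick period */
        rtos_delay_until(next);
        /* do_periodic_work(); */
    }
}

/* A relative delay is measured from the moment of the call. */
void one_shot_task(void)
{
    rtos_delay_relative(50);         /* 50 ticks from now */
    /* do_delayed_work(); */
}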
RTOS provides various functions for interrupt and event handling, viz., defining
interrupt handlers, creation and deletion of ISRs, referencing the state of an ISR, and
enabling and disabling of interrupts. It also restricts interrupts from occurring while a
data structure is being modified, minimizes interrupt latencies caused by disabling
interrupts when the RTOS is performing critical operations, and minimizes interrupt
response times.
Device I/O Management
One of the important jobs of an Operating System is to manage various I/O devices
including mouse, keyboards, touch pads, disk drives, display adapters, USB devices,
bit-mapped screens, LEDs, analog-to-digital converters, on/off switches, network connections,
audio I/O, printers, etc. An I/O system is required to take an application I/O request and
send it to the physical device, then take whatever response comes back from the device and
send it to the application. I/O devices can be divided into two categories −
Block devices − A block device is one with which the driver communicates by sending
entire blocks of data. For example, hard disks, USB cameras, Disk-On-Key, etc.
Character devices − A character device is one with which the driver communicates by
sending and receiving single characters (bytes, octets). For example, serial ports,
parallel ports, sound cards, etc.
Device Controllers
Device drivers are software modules that can be plugged into an OS to handle a
particular device. Operating System takes help from device drivers to handle all I/O devices.
The Device Controller works like an interface between a device and a device driver. I/O
units (Keyboard, mouse, printer, etc.) typically consist of a mechanical component and an
electronic component where electronic component is called the device controller. There is
always a device controller and a device driver for each device to communicate with the
operating system. A device controller may be able to handle multiple devices. As an
interface its main task is to convert serial bit stream to block of bytes, perform error
correction as necessary.
Any device connected to the computer is connected by a plug and socket, and the
socket is connected to a device controller. Following is a model for connecting the CPU,
memory, controllers, and I/O devices where CPU and device controllers all use a common
bus for communication.
Communication to I/O Devices
The CPU must have a way to pass information to and from an I/O device. There are
three approaches available for communication between the CPU and a device: special-instruction
I/O, memory-mapped I/O, and direct memory access (DMA).
Memory-Mapped I/O
When using memory-mapped I/O, the OS allocates a buffer in memory and informs the I/O
device to use that buffer to send data to the CPU. The I/O device operates asynchronously
with the CPU and interrupts the CPU when finished. The advantage of this method is that
every instruction which can access memory can be used to manipulate an I/O device.
Memory-mapped I/O is used for most high-speed I/O devices like disks and communication
interfaces.
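Because the device registers appear in the ordinary address space, plain C loads and stores are enough to drive the device. The addresses and register layout below are made up for illustration; real values come from the board's data sheet.

#include <stdint.h>

/* Made-up register addresses for illustration only. */
#define UART_BASE    0x40001000u
#define UART_STATUS  (*(volatile uint32_t *)(UART_BASE + 0x0u))
#define UART_DATA    (*(volatile uint32_t *)(UART_BASE + 0x4u))
#define TX_READY     (1u << 0)

/* Ordinary C assignments drive the device, because its registers are
 * mapped into the normal address space.                               */
void uart_putc(char c)
{
    while ((UART_STATUS & TX_READY) == 0u)
        ;                               /* wait until the device can accept a byte */
    UART_DATA = (uint32_t)(unsigned char)c;
}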
Direct Memory Access (DMA)
Slow devices like keyboards will generate an interrupt to the main CPU after each
byte is transferred. If a fast device such as a disk generated an interrupt for each byte, the
operating system would spend most of its time handling these interrupts. So a typical
computer uses direct memory access (DMA) hardware to reduce this overhead. Direct
Memory Access (DMA) means the CPU grants an I/O module the authority to read from or
write to memory without CPU involvement. The DMA module itself controls the exchange of
data between main
memory and the I/O device. The CPU is only involved at the beginning and end of the transfer
and is interrupted only after the entire block has been transferred. Direct Memory Access needs a
special hardware called DMA controller (DMAC) that manages the data transfers and
arbitrates access to the system bus. The controllers are programmed with source and
destination pointers (where to read/write the data), counters to track the number of
transferred bytes, and settings, which include I/O and memory types, interrupts, and states
for the CPU cycles.
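A hedged sketch of programming such a controller is shown below; the register block layout, base address, and bit names are invented for illustration and would differ on real hardware.

#include <stdint.h>

/* Invented register block; a real DMAC's layout is defined by its data sheet. */
typedef struct {
    volatile uint32_t src;      /* source pointer: where to read the data        */
    volatile uint32_t dst;      /* destination pointer: where to write it        */
    volatile uint32_t count;    /* number of bytes to transfer                   */
    volatile uint32_t ctrl;     /* settings: direction, width, interrupt enable  */
} dmac_t;

#define DMAC        ((dmac_t *)0x40020000u)   /* made-up base address */
#define DMA_START   (1u << 0)
#define DMA_IRQ_EN  (1u << 1)

/* Program one block transfer; the CPU is free until the controller raises
 * an interrupt at the end of the whole block.                              */
void dma_copy(uint32_t src, uint32_t dst, uint32_t nbytes)
{
    DMAC->src   = src;
    DMAC->dst   = dst;
    DMAC->count = nbytes;
    DMAC->ctrl  = DMA_START | DMA_IRQ_EN;
}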
Fig:4.16: Design of I/O software
Device Drivers
Device drivers are software modules that can be plugged into an OS to handle a
particular device. Operating System takes help from device drivers to handle all I/O devices.
Device drivers encapsulate device-dependent code and implement a standard interface in
such a way that this code contains the device-specific register reads/writes. A device driver
is generally written by the device's manufacturer and delivered along with the device on a
CD-ROM.
A device driver performs the following jobs −
The interrupt mechanism accepts an address, a number that selects a specific
interrupt handling routine/function from a small set. In most architectures, this address is
an offset stored in a table called the interrupt vector table. This vector contains the memory
addresses of specialized interrupt handlers.
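Conceptually, the vector table is just an array of handler addresses indexed by that offset, as in the following C sketch; the handler names and the vector assignment are made up for illustration.

#include <stddef.h>

#define NUM_VECTORS 32

typedef void (*isr_t)(void);                    /* an interrupt handling routine */

static void default_isr(void) { /* unexpected interrupt: log or spin */ }
static void timer_isr(void)   { /* acknowledge and service the timer */ }

/* The interrupt vector table: the offset delivered with an interrupt simply
 * indexes this array of handler addresses. Vector 5 is a made-up assignment. */
static isr_t vector_table[NUM_VECTORS] = {
    [5] = timer_isr,
};

void dispatch_interrupt(unsigned vector)
{
    if (vector < NUM_VECTORS && vector_table[vector] != NULL)
        vector_table[vector]();
    else
        default_isr();
}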
Device-Independent I/O Software
The basic function of the device-independent software is to perform the I/O functions
that are common to all devices and to provide a uniform interface to the user-level software.
Though it is difficult to write completely device-independent software, we can write some
modules which are common among all the devices. Following is a list of functions of device-
independent I/O Software −
SCHOOL OF ELECTRICAL & ELECTRONICS ENGINEERING
UNIT -V
V RTOS IMAGE BUILDING FOR DIFFERENT TARGET PLATFORMS
Porting of RTOS, Configuring RTOS for minimizing RAM consumption and increasing
Throughput- Building RTOS Image for Target platforms
Porting of RTOS:
Product development cycles are market driven, and market demands often require
vendors to compress development schedules. One approach to this is to simultaneously
develop similar products, yet with varying levels of product complexity. However,
scheduling pressures coupled with increased product complexity can be a recipe for
disaster, resulting in slipped schedules and missed opportunities. Consequently, vendors
are always on the alert for silver bullets, yet as developers, we know that they don't exist.
That said, it is still in our best interest to seek better ways of compressing development
cycles, and one way to do this is to port existing products to new hardware platforms,
adding new features along the way. This is the approach we used to demonstrate a proof of
concept when porting a legacy security application to a new hardware platform.
Our firm was hired to make enhancements to the client's existing 6502-based
product, and we quickly realized that this platform was running out of steam. Specifically,
the proposed features would significantly impact performance. Consequently, we proposed
three options for fixing this problem:
Completely rewriting the application on the current hardware.
Rewriting the application on new, higher-performance hardware.
Migrating portable portions of the application to the new hardware.
After considering the options, we decided to port to new hardware.
RTXC Overview
The Real-Time Executive kernel (RTXC) supports three kinds of priority-based
task scheduling: preemptive (the default), round-robin, and time-slice. RTXC is robust; it
supports hard deadlines, changeable task priorities, time and resource management, and
inter-task communication. It also has a small RAM/ROM code footprint and a standard API,
and it has been implemented on many processors. RTXC is divided into nine basic
components: tasks, mailboxes, messages, queues, semaphores, resources, memory
components: tasks, mailboxes, messages, queues, semaphores, resources, memory
partitions, timers, and Interrupt Service Routines (ISRs). These components are further
subdivided into three groups that are used for inter-task communication, synchronization,
and resource management. Moreover, component functionality is accessed via the standard
API interface.
Porting Activities Overview:
The first activity is design related, while the others are implementation related.
Moreover, the last three activities require an understanding of the new hardware—
knowing the specifics of what needs to happen to make the RTOS interact with the board.
System Architecture:
The best way to identify hardware components is to study the board's schematics.
Examining the NPE-167 board revealed that the I/O ports would be key for this project.
Why? Because this board used the processor's general-purpose ports to handle switches to
control CAN bus operation, the board's operating mode, control LED outputs, and
memory selection. I/O cards were controlled via the SPI bus, rather than I/O ports.
Ports can be configured as either inputs or outputs. Examination of the NPE-167 board showed
that 17 ports are used. Eleven ports are used as switch inputs. From the
schematic we saw that switches 1-7 were used to set the MAC address for the CAN device.
CAN bus speed is controlled by switches 8-9, while the board operating mode is controlled
by switches 11-12. Switch 10 is not used. Four ports control the LEDs, of which there are
three in total: one green, one red, and the third bicolor. Since the bicolor LED requires two
lines, four outputs are needed to control the three LEDs. Finally, two output ports are used
as page selection for extended
memory.
The NPE board addresses up to 512K of memory before having to make use of the page-
selection ports. Although we would configure the page-selection ports for the porting
process, we didn't have to use them because the total code footprint of the kernel, plus test
code, is 107K. RTXC's kernel is about 76K, and the porting test code fits within another
31K. In short, we would only use about 1/5 of the default memory to validate the porting
process.
The last necessary component for the port was to determine which timer to use as
the master time base. Timers are internal on the C167 processor, so they don't show up on
the schematic. So we had two options—choose a timer and write the code for that timer, or
use the BSP default timer. RTXC's C167 BSP uses a timer in its configuration. A trick to
simplify the initial porting process is to use the default timer that the BSP uses. Reviewing
the BSP documentation, we discovered that it uses timer 6 for the master timer. Once we
determined the components associated with the porting process, we could turn our
attention to figuring out which files needed to be changed.
Changing Files
We knew from the previous step that 11 ports were used for input and six ports for
output. Because these were general-purpose I/O ports, they needed to be initialized to work
as either inputs or outputs. This gave us an idea of where NPE-specific initialization code
needed to go—specifically, initialization code to set up these ports goes in the startup code.
For this project, initialization code was located in the cstart.a66 file that is located in the
Porting directory. Listing One is the code that configures the NPE-167 board I/O. Once
configured, I/O can be used by higher level RTOS and API functions. Once we figured out
where the I/O changes go, we needed to turn our attention to discovering and setting up the
master timer.
The BSP set up the master timer for us because we were using default timer 6. Setup code
for this timer is located in cstart.a66 and rtxc main.c. Listing Two is a snippet of the
RTXC-specific code. After analyzing the architecture requirements, we discovered that the
only file to change for porting the NPE-167 board was cstart.a66. Granted, we knew we
would have to change other files as well, but those files are application specific.
This brought us to the third step, which was straightforward because we knew what
needed to be changed and where. Recall that all changes for basic porting functionality
occurred in cstart.a66. We also needed to write the code for initialization. We wrote code to
initialize the switches to handle CAN—but no other code— to deal with it because it is not
used in the basic port. For specifics, look at cstart.a66 and search for npe and rtxc labels to
find code changes specific to this port. Keep in mind, when porting to new hardware you
may want to adopt a similar strategy for partitioning the code for hardware- and RTOS-
specific changes. That is because partitioning code through the use of labels helps with code
maintainability.
Test Code
Finally, we needed to create some test code to test our port. Building the test code
application was a two-step process:
We compiled the RTXC kernel into a library object (rtxc.lib).
We compiled the test code and linked in rtxc.lib to create the executable.
There are two directories for generating the test code, and they are stored at the
same level in the hierarchy. Moreover, all files for creating rtxc.lib are located in the kernel
directory. Alternatively, test code-specific files are located in the Porting directory.
The RTXCgen utility creates a set of files corresponding to each RTOS component.
For instance, application queues are defined in three files: cqueue.c, cqueue.h, and
cqueue.def. The same holds true for tasks, timers, semaphores, mailboxes, and the rest.
Changes to the number of RTOS components are handled by this utility. For example, if
we wanted to change the number of tasks used by the test code, we use RTXCgen to do it.
Figure 2 shows the contents of the task definition file for the test code application. Test code
files created by RTXCgen are placed in the Porting directory. Once RTXCgen has defined
the system resources, we are ready to build the project.
Creating the executable test code requires the build of two subprojects—the kernel
and test code. We performed builds using the Keil Microvision IDE (http://www.keil.com/).
Keil uses project files (*.prj files) to store its build information. RTXC kernel creation
consists of building the code using the librtxc.prj file located in the kernel directory.
Invoking the librtxc project compiles, links, and creates a librtxc object in the kernel
directory. Building the test code is accomplished using the NpeEg.prj file stored in the
Porting directory. Invoking the NpeEg project compiles and links files in the Porting
directory, and links the librtxc object in the kernel directory. The resulting executable is
then placed in the Porting directory as well. Once the test code was fully built, we were
ready to test the board port.
The test code is a simple application used to validate the porting process. Most of the
test code is located in main.c located in the Porting directory. The application works by
starting five tasks—two user and three system. User tasks execute alternately, while
system tasks execute in the background. One user task begins running. It then outputs data
via one of the system tasks to the console. Next, it signals the other to wake up, and it puts
itself to sleep, thus waiting for the other task to signal it to wake up again.
Broadly speaking, there is read only memory (ROM – nowadays that is usually flash
memory) and read/write memory (RAM). ROM is where the code and constant data is
stored; RAM is used for variables. However, to improve performance, it is not uncommon
to copy code/data from ROM to RAM on boot up and then use the RAM copy. This is
effective because RAM is normally faster to access than ROM. So, when thinking about
RTOS footprint, you need to consider ROM and RAM size, including the RAM copy
possibility.
The issue can become more complex. There may be on-chip RAM and external
memory available. The on-chip storage is likely to be faster, so it may be advantageous to
ensure that RTOS code/data is stored there, as its performance will affect the whole
application. In a similar fashion, code/data may be locked into cache memory, which tends
to offer even higher performance.
Compiler optimization
When building code, like an RTOS, the optimization settings applied to the compiler
affect both size and execution speed. Most of the time, code built for highest performance
(i.e., fastest) will be bigger; code optimized to be smaller will run slower. It is most likely
that an RTOS would normally be built for performance, not size, although an RTOS vendor
wanting to emphasize the small size of their product might make a different choice.
RTOS configuration
Real-time operating systems tend to be very configurable, and that configuration can
vary the RTOS size drastically. Most RTOS products are scalable, so the memory footprint
is determined by the actual services used by the application. The granularity of such
scalability varies from one product to another. In some cases, each individual service is
optional; in others, whole service groups are included or excluded – i.e. if support for a
particular type of RTOS object (e.g. semaphore) is required, all the relevant services are
included. On a larger scale, other options, like graphics, networking and other connectivity,
will affect the code size, as these options may or may not be needed/included.
Fig:5.1: RTOS Configuration
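The sketch below shows what such scalability typically looks like at build time: a configuration header whose options include or exclude whole service groups. The option names are invented; every RTOS defines its own.

/* Invented option names; each RTOS defines its own, but the principle is the
 * same: services that are configured out cost neither ROM nor RAM.           */
#define CFG_MAX_TASKS       8     /* each task control block and stack costs RAM */
#define CFG_TICK_RATE_HZ    1000
#define CFG_USE_SEMAPHORES  1     /* include the whole semaphore service group   */
#define CFG_USE_MAILBOXES   0     /* excluded: none of its code is linked in     */
#define CFG_USE_NETWORKING  0     /* large optional subsystem left out entirely  */

#if CFG_USE_SEMAPHORES
/* semaphore create/wait/signal services are compiled in here */
#endif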
Runtime library
Typically, a runtime library will be used alongside an RTOS; this code needs to be
accommodated. Again, the code, being a library, may scale well according to the needs of a
particular application.
Data size issues
Apart from a baseline amount of storage for variables, the RAM requirements of an RTOS
can similarly be affected by a number of factors:
Compiler optimization
The number of RTOS objects (tasks, mailboxes, semaphores etc.) used by the
application will affect the RTOS RAM usage, as each object needs some RAM space.
Stack
Normally, the operating system has a stack and every task has its own stack; these
must all be stored in RAM. Allocation of this space may be done differently in each RTOS,
but it can never be ignored.
Dynamic memory
RTOS for Image Processing:
The quality and the size of image data (especially 3D medical data, e.g., the
segmentation of a human brain) is constantly increasing. Fast and, ideally, interactive
post-processing of these images is a major concern. Tasks such as segmentation, morphing of
different images, sequence analysis, and measurement are difficult to perform. Especially for
segmentation and morphing purposes, level set methods play an important role.
In the case of image segmentation, f is a force which pushes the interface towards the
boundary of a segment region in an image. Usually f equals one in homogeneous regions of
the image, whereas f tends to zero close to the segment boundary. The discretization of the
level set model is performed with finite differences on a uniform quadrilateral or
octahedral grid. A characteristic of image processing methods is the multiple, iterative
processing of data sets described above. Because only limited numerical precision is
required, it is possible to work on integer data sets with a restricted range of values, i.e.,
an application-specific word length. Furthermore, it is possible to incorporate parallel
execution of the update formulas.
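For reference, a level set formulation of this kind (not written out explicitly in these notes) typically evolves an interface, represented as the zero level set of a function φ, with the speed f described above:

\[
\frac{\partial \phi}{\partial t} + f\,\lvert \nabla \phi \rvert = 0,
\qquad f \approx 1 \ \text{in homogeneous regions}, \qquad f \to 0 \ \text{near segment boundaries.}
\]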
Fig:5.2: A hardware accelerated system for data Processing: Software and
Hardware Modules
Image processing algorithms as described consist of a complex sequence of
primitive operations, which have to be performed on each nodal value. By combining the
complete sequence of primitive operations into a compound operation it is possible to
reduce the loss of performance caused by the synchronous design approach.
This design approach, which is common to CPU designs as well as for FPGA
designs, is based on the assumption that all arithmetic operations will converge well in
advance of the clock tick, which will cause the results to be post-processed. Therefore, the
maximum clock speed of such systems is defined by the slowest combinatorial path. In a
CPU this leads to ’waiting time’ for many operations. Furthermore, there is no need for
command fetching in FPGA designs, which solves another problem of CPU-based
algorithms. Additionally, it is possible to do arbitrary parallel data processing in an FPGA,
so that several nodal values can be updated simultaneously.
The input data rate for CPU-based and FPGA-based applications is determined by
the bandwidth of the available memory interface. A 256×256 image results in 64k words,
i.e., 768 kbit of data at 12-bit resolution. A CPU with 16-bit-wide data access would
need 128 kByte to store the original image data, without taking the memory for
intermediate results into account.
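These figures follow directly from the image size:

\[
256 \times 256 = 65{,}536 \ \text{pixels} \approx 64\text{k words}, \qquad
65{,}536 \times 12\ \text{bit} = 786{,}432\ \text{bit} = 768\ \text{kbit},
\]
\[
65{,}536 \times 2\ \text{byte} = 131{,}072\ \text{byte} = 128\ \text{kByte (16-bit words)}.
\]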
The discussion is not restricted to the information described in this section; there are many
real-time examples that can be considered as case studies for image processing in an RTOS.
The process for generating a target image for the QNX CAR platform is described below.
Fig:5.3: Procedure to generate a QNX CAR platform target image
As part of the installation process for the QNX CAR platform, a workspace was created for
you that contains the scripts and configuration files you'll be using. These files are located in
the following locations:
Scripts:
For Linux: $QNX_CAR_DEPLOYMENT/deployment/scripts/
For Windows: %QNX_CAR_DEPLOYMENT%\deployment\scripts
where QNX_CAR_DEPLOYMENT is install_location/qnx660/deployment/qnx-car/.
Configuration files:
For Linux: $QNX_CAR_DEPLOYMENT/boards/<platform>/etc/
For Windows: %QNX_CAR_DEPLOYMENT%\boards\<platform>\etc
2. Extract a BSP. For detailed instructions, see “Building a BSP ”.
3. Create an output directory where you want to have the image generated.
You must specify a valid directory name; the directory must exist prior to running the
mksysimage.py script, otherwise the image won't be generated.
The mksysimage.py utility generates images for various configurations. For example, for
SABRE Lite, image files are created for SD and SD/SATA:
imx61sabre-dos-sd-sata.tar
imx61sabre-dos-sd.tar
imx61sabre-os.tar
imx61sabre-sd-sata.img
imx61sabre-sd.img