Real-Time Machine Vision Based Robot
G. Sathya, G. Ramesh Chandra
Abstract
This paper discusses the practical implementation of a real-time machine vision based autonomous mobile robot. The relevant image processing concepts are implemented programmatically, and the control of the servo motors is done using the Basic Stamp 2 microcontroller. Some minor concepts regarding serial-port (RS-232) communication, together with the system block diagram, are also discussed.

Keywords: AMR, Machine Vision, Target Tracking

1. Introduction

1.1 Experimental requirements
The requirements for this project are divided into two categories: the hardware requirements and the software requirements. Both are equally important when it comes to robotics based on machine vision. The requirements are as listed below.
Hardware requirements and their specifications
• Parallax Boe-Bot Robot Kit
  - Basic Stamp 2 (BS2) microcontroller
  - Board of Education
  - Robot's aluminium body
  - Two Parallax servo motors
  - Four 1.5 V AA-sized batteries
  - An RS-232 serial cable
• A desktop or laptop PC
  - Processor: Intel Pentium 4 or higher / AMD Athlon 4400+ dual core or higher
  - Hard disk: 20 GB or higher, preferably a SATA drive
  - RAM: 512 MB or higher, with good bandwidth for data transfer
  - Motherboard: an RS-232 serial port and at least two USB ports should be available
• A wireless or wired USB web camera
  - Resolution: 320x240 (minimum) or higher
  - Video frame rate: 10-15 frames/sec or higher
  - Focusing range: 30 mm to infinity
  - Colour: 24 bit (RGB) or higher
  - In the case of a wired webcam, the wire that connects the camera to the PC should be at least 2 meters long
• A USB to RS-232 converter (in case the PC doesn't have a serial port)
  - Length of wire should be at least 2 meters
Software requirements and their specifications
• Operating system
  - Windows XP, 32 or 64 bit
  - Service Pack 2 or 3
• Visual Studio .NET 2008 with MSDN
• DirectShow.NET API with AForge.NET wrapper
• Webcam device driver
• Driver for the USB to RS-232 converter (in case the PC doesn't have a serial port)

These are the minimum hardware and software requirements as far as this robot is concerned. The use of advanced image processing and pattern recognition techniques might require high-end hardware to process each frame, if the frame rate is a matter of concern.

1.2 The Block diagram
The block diagram illustrates the various stages and sub-stages that are involved in the system; it also illustrates the closed-loop nature of the system.

Figure 1.2.1: Block diagram of the machine vision robot system

The CCD (Charge-Coupled Device) array is a semiconductor device which captures the incident light and creates a corresponding current as its output. Since the output of the CCD video signal is very low, it has to be amplified.
Hence the amplifier strengthens the video signal that comes from the CCD. Since the video signal is purely analog in nature, it is converted to digital by the analog-to-digital converter. All three of these stages reside within the camera; hence a dashed box encapsulating them is shown in figure 1.2.1.

After the conversion of the analog video to digital, it is time to process it. Since we have designed image processing filters that are meant to be used in the spatial domain, we need to convert the incoming digital video signal to a signal that can be represented in the spatial domain. Hence we introduce a digital buffer memory that stores the incoming signal in memory locations, each having its respective address. Each memory location represents a pixel's location and intensity value. In this manner, the signal is now a spatial signal on which spatial filters can be applied. The buffer also acts as an accumulator for the video frames if the digital spatial filters are not synchronous with the video camera. The spatial Gaussian filter is of 3x3 size, with an impulse response sequence as discussed in chapter 4. Since the first pass doesn't prove to be effective on the noise, which is Gaussian, the spatial signal is passed through another, cascaded Gaussian filter that helps in reducing the amplitude of the noise further, thereby making it negligible.
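As a sketch of this smoothing stage (our illustration, not the original listing), the following C# routine applies one 3x3 Gaussian pass to a grayscale frame stored as byte[h, w]. The kernel (1 2 1; 2 4 2; 1 2 1)/16 is the common discrete approximation and is an assumption here, standing in for the exact impulse response sequence referred to above.

    // One 3x3 Gaussian smoothing pass over a grayscale frame.
    static class Smoothing
    {
        static readonly int[,] Kernel = { { 1, 2, 1 }, { 2, 4, 2 }, { 1, 2, 1 } };

        public static byte[,] GaussianPass(byte[,] src)
        {
            int h = src.GetLength(0), w = src.GetLength(1);
            var dst = new byte[h, w];
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++)
                {
                    int sum = 0;
                    for (int dy = -1; dy <= 1; dy++)
                        for (int dx = -1; dx <= 1; dx++)
                            sum += Kernel[dy + 1, dx + 1] * src[y + dy, x + dx];
                    dst[y, x] = (byte)(sum / 16);   // kernel weights sum to 16
                }
            return dst;
        }
    }

Cascading two calls, as in Smoothing.GaussianPass(Smoothing.GaussianPass(frame)), reproduces the two-pass arrangement described above.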
The colour segmentation spatial filter can be compared to a bandpass filter for a 1-D signal. Each colour in an image can be attributed to a unique frequency of light. For example, red in the visible region of the electromagnetic spectrum has the lowest frequency and violet the highest; the colours between them have frequencies that lie between those of red and violet. The colour segmentation filter passes certain frequencies of light that are of interest to us and rejects the rest.

The range of colour values in this case belongs to the object that is to be tracked. Since the light distribution over the body of the object is not constant, a small range of values is taken such that its variation is small. The word segmentation implicitly tells that the object is discriminated from the background on the basis of colour. Hence the output of this filter is a binary image or frame: the object is represented by white pixels as foreground, and the rest by black pixels as background. We therefore get a stream of binary images or frames from the output of the colour segmentation filter.
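A minimal C# sketch of such a segmentation is given below; the RGB bounds are hypothetical values for a reddish target, and in practice the range would be tuned to the tracked object as described above.

    using System.Drawing;

    // Colour segmentation: pixels inside the target RGB range become
    // foreground (true); everything else becomes background (false).
    static class Segmentation
    {
        public static bool[,] Segment(Bitmap frame)
        {
            var binary = new bool[frame.Height, frame.Width];
            for (int y = 0; y < frame.Height; y++)
                for (int x = 0; x < frame.Width; x++)
                {
                    Color c = frame.GetPixel(x, y);   // simplest accessor; LockBits is faster
                    binary[y, x] = c.R >= 150 && c.G <= 80 && c.B <= 80;
                }
            return binary;
        }
    }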
Next comes the dilation filter, which operates on the incoming binary images. As discussed in chapter 3, the dilation filter grows the image. In this process of growth, the gaps or holes on the object which represent the background are eliminated, so that the object looks like a continuous rigid body. This makes tracking easier.
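A sketch of a 3x3 binary dilation in C#, written under the usual definition rather than copied from the original listing:

    // 3x3 binary dilation: a pixel becomes foreground if any pixel in its
    // 3x3 neighbourhood is foreground, which closes small holes in the object.
    static class Morphology
    {
        public static bool[,] Dilate(bool[,] src)
        {
            int h = src.GetLength(0), w = src.GetLength(1);
            var dst = new bool[h, w];
            for (int y = 1; y < h - 1; y++)
                for (int x = 1; x < w - 1; x++)
                    for (int dy = -1; dy <= 1 && !dst[y, x]; dy++)
                        for (int dx = -1; dx <= 1; dx++)
                            if (src[y + dy, x + dx]) { dst[y, x] = true; break; }
            return dst;
        }
    }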
Image processing ends here, and the tracking stage comes next. In the tracking stage we make use of a filter that extracts the foreground pixel co-ordinates. Next, the foreground pixel co-ordinates, which represent the object, are averaged in the pixel averaging filter: the average of all the co-ordinates of the pixels that represent the body is calculated to give the centroid of the object. This is done for each frame.
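A sketch of this pixel-averaging step in C# (the class and method names are illustrative):

    using System.Drawing;

    // The centroid is the mean of all foreground pixel co-ordinates;
    // null signals that the object was not found in this frame.
    static class Tracker
    {
        public static Point? Centroid(bool[,] binary)
        {
            long sumX = 0, sumY = 0, count = 0;
            int h = binary.GetLength(0), w = binary.GetLength(1);
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    if (binary[y, x]) { sumX += x; sumY += y; count++; }
            if (count == 0) return null;
            return new Point((int)(sumX / count), (int)(sumY / count));
        }
    }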
The comparator compares the centroid with a reference and gives the error in the relative position of the object with respect to that reference. Errors can be positive or negative along both the x and y axes, but here we consider only the error along the x-axis. If the error is negative, then the centroid of the object is on the left of the image or frame with respect to the reference co-ordinate, and vice-versa when the error is positive. So the output of the comparator is only the error and its direction, which is a 1-D signal.

All the image processing and tracking operations are performed on a computer programmatically, which is represented by a dashed box in figure 1.2.1 encapsulating both the image processing and tracking stages.

Once the servo controlling unit receives the error signal from the computer (from the comparator, in other words), it sends appropriate control pulses to the servo motors. Based on the direction, the desired servo motor will be activated. As outlined earlier, the control pulses are nothing but PWM pulses.
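To make the comparator-to-servo link concrete, here is a hedged C# sketch of the PC side: it derives the signed x-axis error against the frame centre and writes a one-character command to the BS2 over RS-232. The port settings and the 'L'/'R'/'C' command characters are assumptions for illustration; the exact serial protocol is not specified here.

    using System.IO.Ports;

    // PC side of the comparator-to-servo link.
    class ErrorSender
    {
        private readonly SerialPort port =
            new SerialPort("COM1", 9600, Parity.None, 8, StopBits.One);

        public ErrorSender() { port.Open(); }

        public void Send(int centroidX, int frameWidth)
        {
            // Negative error: object lies to the left of the reference (frame centre).
            int error = centroidX - frameWidth / 2;
            port.Write(error < 0 ? "L" : error > 0 ? "R" : "C");
        }
    }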
1.3 The Experimental Setup
The experimental setup deals with the arrangements and connections that are made in this project. The three main pieces of equipment in this project are the camera, the robot kit and the computer. The assembly of the robot kit after its purchase is not described here, as it is already covered in its manual. Figure 1.3.1 shows the setup required for object tracking.

Figure 1.3.1: Setup for object tracking

The camera is physically attached to the front side of the robot. However, in figure 1.3.1 the camera is shown separately for ease of understanding. The camera is connected to the PC using a special serial data bus called the Universal Serial Bus (USB).

The BS2 microcontroller that resides on the robot is interfaced to a serial RS-232 port. Hence the connection between the microcontroller and the PC is made using the RS-232 serial communication cable. Again, for the sake of understanding the connections, the microcontroller and the servo motors are shown separately in figure 1.3.1, though they are fixed to the robot.

2. Image Processing And Tracking Using PC
The video stream that comes from the camera is fed to the computer through the USB port. The video stream is first decoded and decompressed digitally. The encoding depends on the type of camera under use. Cheap cameras have a low-quality CCD compared to the expensive ones, and thus the compression that is used would also vary accordingly. The bandwidth of USB communication is 1.5 to 480 Mbps; it depends in turn on the type of USB controller used, and cheap USB controllers do not support high bandwidth.

The calculations for a video of resolution 640x480 are given in table 2.1.
Table 2.1: Calculation for 640x480 video resolution
Input parameters
  Pixel size: 640 x 480
  Frame rate: 30 FPS
  Colour depth: 8 bits per colour (24-bit RGB)
Calculation
  One frame, uncompressed size: 921.6 KB
  Pixel rate (moving image): 9.22 MHz
  Uncompressed bit rate: 221.18 Mbps, or 27.65 MBps
  Required storage:
    1 second: 27.65 MB
    30 seconds: 829.44 MB
    1 minute: 1.66 GB
    5 minutes: 8.29 GB
    1 hour: 99.53 GB
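To see where these figures come from (a worked check, assuming 3 bytes per pixel): one frame occupies 640 x 480 x 3 bytes = 921,600 bytes, about 921.6 KB; the pixel rate is 640 x 480 x 30 frames/sec, about 9.22 MHz; the bit rate is 921,600 bytes x 30 x 8 bits, about 221.18 Mbps or 27.65 MBps; and an hour of such video needs 27.65 MB/s x 3600 s, about 99.53 GB.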
From the above calculations, an uncompressed video needs a large memory (frame buffer) for temporary storage and a large data-transfer bandwidth. If the resolution is higher, we may expect even more memory and bandwidth to be needed. Hence it is necessary to compress the video stream rather than transfer raw data. If the compression algorithm is lossy, then we may lose some high-quality details. Some cameras support compressions like MPEG4 and others do not. Usually, an uncompressed video is of lower resolution. Hence, one has to accept a trade-off between compression and resolution.
The memory that is used to temporarily store the frames is called the frame buffer. Typically the frame buffer is stored in the memory chips on the video adapter. In some instances, however, the video chipset is integrated into the motherboard design, and the frame buffer is stored in general main memory. The main memory is nothing but the RAM present inside the computer. Figure 2.1 shows how RAM is used as a frame buffer.

An API (Application Programming Interface) is a set of functions, classes, protocols and interfaces that are used by applications or programs. For example, the Windows API called WIN32 is used by many user programs, like Microsoft Word, Excel and Powerpoint, to create graphical user interfaces (GUIs), access files in memory and handle other user interactions. The user program simply calls the functions or classes present in the API, passes some variables or parameters, and gets the job done.

In the case of this project, I made use of the third-party APIs DirectShow.NET and AForge.NET, because there is no managed API for DirectShow; DirectShow is made for C++ only and not for .NET. The interface ISampleGrabber is used to grab the individual frames. Because of the complexity involved in implementing DirectShow.NET, we made use of the AForge.NET framework library, which uses DirectShow but exposes functions or methods, classes and interfaces that can easily be used to capture the incoming frames and process them.
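As an illustration, a minimal frame-capture loop with AForge.NET's DirectShow wrapper might look like the following; the class and event names are those of the AForge.Video.DirectShow namespace as commonly documented, and exact signatures may vary between AForge versions.

    using System;
    using System.Drawing;
    using AForge.Video;
    using AForge.Video.DirectShow;

    class CaptureDemo
    {
        static void Main()
        {
            // Enumerate the attached video input devices (webcams).
            var devices = new FilterInfoCollection(FilterCategory.VideoInputDevice);
            var camera = new VideoCaptureDevice(devices[0].MonikerString);

            // Each incoming frame arrives as a Bitmap through this event.
            camera.NewFrame += (sender, e) =>
            {
                using (Bitmap frame = (Bitmap)e.Frame.Clone())
                {
                    // Hand the frame to the processing pipeline here.
                }
            };

            camera.Start();
            Console.ReadKey();       // capture until a key is pressed
            camera.SignalToStop();
        }
    }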
The code developed for carrying out the various operations in this project is discussed in the form of pseudo-code. Before we discuss the pseudo-code, knowledge of the Bitmap class is a must, since it is used throughout. A bitmap is a type of memory organization or image file format used to store digital images. The term bitmap comes from computer programming terminology, meaning just a map of bits: a spatially mapped array of bits. Bitmap now commonly refers to the similar concept of a spatially mapped array of pixels, and raster images in general may be referred to as bitmaps, whether synthetic or photographic, in files or in memory. In .NET, Bitmap is a class used to work with images defined by pixel data; it is available in C#.NET under the namespace System.Drawing, as using System.Drawing.Bitmap. Figure 2.2 shows a binary bitmapped image: the left half of the diagram shows the bits in the bitmap, and the right half depicts what would show on screen.
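A small usage sketch of the class follows; the file names are hypothetical, and GetPixel/SetPixel are shown only because they are the simplest accessors (LockBits is the faster route when every frame must be processed in real time).

    using System.Drawing;

    class BitmapDemo
    {
        static void Main()
        {
            Bitmap image = new Bitmap("frame.bmp");        // hypothetical input file
            Color c = image.GetPixel(10, 20);              // pixel at column 10, row 20
            int grey = (c.R + c.G + c.B) / 3;              // crude intensity value
            image.SetPixel(10, 20, Color.FromArgb(grey, grey, grey));
            image.Save("frame_out.bmp");
        }
    }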
Figure 3.1: RS-232 communication socket

Table 3.1: Pin description of DB-9
DB-9 pin   Description
1          Data Carrier Detect (DCD)
2          Receive Data (RD)
3          Transmit Data (TD)
4          Data Terminal Ready (DTR)
5          Signal Ground (SG)
6          Data Set Ready (DSR)
7          Request To Send (RTS)
8          Clear To Send (CTS)
9          No Connection (NC)

Figure 3.1 shows the pinouts of the two styles of PC serial ports and how to connect them to the BASIC Stamp's I/O pin (the 22 kΩ resistor is not needed if connecting to the SIN pin). Though not normally needed, the figure also shows loopback connections that defeat the hardware handshaking used by some PC software. It should be noted that PC serial ports are always male connectors. Asynchronous serial communication relies on precise timing: both the sender and receiver must be set for identical timing, usually expressed in bits per second (bps) and called baud. On the BS2, SERIN requires a value called Baudmode that tells it the important characteristics of the incoming serial data: the bit period, the number of data and parity bits, and the polarity.

Difference between bit rate and baud rate
The two are intertwined and often confused, but the simplest distinction is this: the bit rate is the number of data bits (0's and 1's) transmitted per second over a communication channel, while the baud rate is the number of times per second that the signal in the channel changes state. A figure of 2400 bits per second means that 2400 zeros or ones can be transmitted in one second, hence the abbreviation "bps". Individual characters (for example letters or numbers), also referred to as bytes, are composed of several bits. A baud rate, by definition, is the number of times a signal in a communications channel changes state or varies; for example, a 2400 baud rate means that the channel can change state, from 0 to 1 or from 1 to 0, up to 2400 times per second. "State" here refers to the actual condition of the connection, such as its voltage, frequency or phase level. The main difference between the two is that one change of state can transmit one bit, or slightly more or less than one bit, depending on the modulation technique used. So the bit rate (bps) and the baud rate (baud per second) are connected as follows: bps = baud per second x number of bits per baud, where the number of bits per baud is determined by the modulation technique. Here are two examples. When FSK ("Frequency Shift Keying", a transmission technique) is used, each baud transmits one bit, since only one change in state is required to send a bit; the modem's bps rate is then equal to its baud rate. When a baud rate of 2400 is used with a phase modulation technique that transmits four bits per baud, we get 2400 baud x 4 bits per baud = 9600 bps; such modems are capable of 9600 bps operation.

Now that the BS2 receives and reads the characters from the RS-232 port, it has to send PWM signals to the corresponding servo motor. For this we have to use the PBasic command PULSOUT, whose syntax is as given below.
Syntax: PULSOUT Pin, Duration. Its function is to generate a pulse on Pin with a width of Duration.
• Pin is a variable/constant/expression (0 - 15) that specifies the I/O pin to use. This pin will be set to output mode.
• Duration is a variable/constant/expression (0 - 65535) that specifies the duration of the pulse. The unit of time for Duration is described below.
PULSOUT sets Pin to output mode, inverts the state of that pin, waits for the specified Duration, and then inverts the state of the pin again, returning the bit to its original state. The following example will generate a 100 µs pulse on I/O pin 5:
PULSOUT 5, 50 ' generate a pulse on pin 5
The polarity of the pulse depends on the state of the pin before the command executes. In the example above, if pin 5 was low, PULSOUT would produce a positive pulse; if the pin was high, PULSOUT would produce a negative pulse.
If the pin is an input, the output state bit won't necessarily match the state of the pin. What happens then? For example: pin 7 is an input and pulled high by a resistor as shown below. Suppose that pin 7's output state bit is low when we execute the instruction:
PULSOUT 7, 5 ' generate a pulse on pin 7
Figure 3.2 shows the sequence of events on that pin. Initially, pin 7 is high: its output driver is turned off (because it is in input mode), so the 10 kΩ resistor sets the state on the pin. When PULSOUT executes, it turns on the output.

Figure 3.2: Sequence of events on pin 7

For the BS2, the unit for pulse Duration is 2 µs, so the maximum pulse width is 65535 x 2 µs = 131.07 ms. The entire schematic of the servo control is as shown in figure 3.3. The right servo motor is connected to PIN 13 and the left servo is connected to PIN 12 of the BS2.

Pseudo-code for servo control
The following pseudo-code is meant for programming in the PBasic programming language.
Step 1: Start
Figures 4.1.8 to 4.1.15: photographs of the robot and the tracking experiment.
Legend:
Figure   Description
4.1.8    Rear view of the robot showing the RS-232 connection
4.1.9    Side view of the robot fitted with a web camera
4.1.10   Top view of the robot showing the BS2 microcontroller and the Board of Education
4.1.11   Bottom view of the robot showing the two Parallax servo motors attached to the wheels
4.1.12   Robot connected to an HP Pavilion DV5-1106AX laptop through the serial port
4.1.13   Laptop screen showing the object tracking video frame
4.1.14   Red LED on the robot glows, indicating that the object is lost and hence there is no tracking
4.1.15   Green LED on the robot glows, indicating that the object has been found and tracking has started