
A Voice-Commandable Robotic Forklift Working Alongside Humans in Minimally-Prepared Outdoor Environments


Seth Teller, Matthew R. Walter, Matthew Antone, Andrew Correa, Randall Davis, Luke Fletcher, Emilio Frazzoli, Jim Glass, Jonathan P. How, Albert S. Huang, Jeong hwan Jeon, Sertac Karaman, Brandon Luders, Nicholas Roy, Tara Sainath

Correa, Davis, Fletcher, Glass, Huang, Roy, Teller, and Walter are at the Computer Science and Artificial Intelligence Laboratory, and Frazzoli, How, Jeon, Karaman, and Luders are at the Laboratory for Information and Decision Systems; MIT, Cambridge MA, USA. Antone is at BAE Systems, Burlington MA, USA. Sainath is at IBM T.J. Watson Research Center, Yorktown Heights NY, USA.

Abstract— One long-standing challenge in robotics is the realization of mobile autonomous robots able to operate safely in existing human workplaces in a way that their presence is accepted by the human occupants. We describe the development of a multi-ton robotic forklift intended to operate alongside human personnel, handling palletized materials within existing, busy, semi-structured outdoor storage facilities.

The system has three principal novel characteristics. The first is a multimodal tablet that enables human supervisors to use speech and pen-based gestures to assign tasks to the forklift, including manipulation, transport, and placement of palletized cargo. Second, the robot operates in minimally-prepared, semi-structured environments, in which the forklift handles variable palletized cargo using only local sensing (and no reliance on GPS), and transports it while interacting with other moving vehicles. Third, the robot operates in close proximity to people, including its human supervisor, other pedestrians who may cross or block its path, and forklift operators who may climb inside the robot and operate it manually. This is made possible by novel interaction mechanisms that facilitate safe, effective operation around people.

We describe the architecture and implementation of the system, indicating how real-world operational requirements motivated the development of the key subsystems, and provide qualitative and quantitative descriptions of the robot operating in real settings.

Fig. 1. (left) The platform is a stock 2700 kg Toyota lift truck that we developed into (right) an autonomous vehicle that operates outdoors in proximity to people; a military supervisor stands nearby. A safety driver may sit in the cabin, but does not touch the controls.

I. INTRODUCTION

Motivated by a desire for increased automation of logistics operations, we have developed a voice-commandable autonomous forklift capable of executing a limited set of commands to approach, engage, transport and place palletized cargo in a minimally-structured outdoor setting.

Rather than carefully preparing the environment to make it amenable to robot operation, we are developing a robot capable of operating in existing human-occupied environments, such as military Supply Support Activities (outdoor warehouses). The robot has to operate safely outdoors on uneven terrain, without specially-placed fiducial markers, guidewires or other localization infrastructure, alongside people on foot, human-driven vehicles, and eventually other robotic vehicles, and amidst palletized cargo stored and distributed according to existing conventions. The robot would also have to be commandable by military personnel without burdensome training. The robot also has to operate in a way that is acceptable to existing military personnel with their current operational practices and culture.

This paper presents the architecture and implementation of the robotic forklift system arising from our efforts (Fig. 1). The system has a number of noteworthy aspects:

• Autonomous operation in dynamic, minimally-prepared, real-world environments, outdoors on uneven terrain without reliance on precision GPS, and in close proximity to people;
• Speech understanding in noisy environments;
• Indication of robot state and imminent actions to bystanders;
• Supervisory gestures grounded in a world model common to human and robot; and
• Robust, closed-loop pallet manipulation using only local sensing.

These characteristics enable the forklift to operate safely and effectively despite challenging operational requirements, and differentiate our work from existing logistic automation approaches. Current warehouse automation systems [1] are designed for permanent storage and distribution facilities, where indoor environments may be highly prepared and kept free of people, and substantial prior knowledge may be assumed of manipuland placement and geometry. Some work has correspondingly focused on forklift control [2], and pallet recognition [3], [4] and manipulation [5]–[7] for limited pallet types and environment classes. In contrast, our vehicle is designed to operate in the dynamic, unstructured, and human-occupied facilities that are typical of the military supply chain, and to handle cargo pallets with differing geometry, appearance, and loads.

More generally, substantial attention has focused on developing mobile manipulators capable of operating in dynamic environments. Much of this work has focused on the problems of planning and control [8]–[10], which are non-trivial for a robot with many degrees of freedom and actuators exerting considerable force and torque. Others have studied sensing in the context of object manipulation, using tactile feedback [11] or computer vision [12] to learn grasps [13] and to manipulate articulated objects [14]. Researchers have developed remotely-controlled mobile manipulators [15] and ground robots [16], [17], requiring that the user teleoperate the vehicle, a fundamental difference from our work, which eschews teleoperation in favor of a task-level human-robot interface [18].
II. DESIGN CONSIDERATIONS

A number of elements of our system's design are dictated by the performance requirements of our task.

The forklift must operate outdoors on gravel and packed earth. Thus, we chose to adopt a non-planar terrain representation and a full 6-DOF model of chassis dynamics. We used an IMU to characterize the response of the forklift to acceleration, braking, and turning along paths of varying curvature when unloaded and loaded with various masses.

The forklift requires full-surround sensing for obstacle avoidance. We chose to base the forklift's perception on lidar sensors, due to their robustness and high refresh rate. We added cameras to provide situational awareness to a (possibly remote) human supervisor, and to support future vision-based object recognition. We developed an automatic multi-sensor calibration method to bring all lidar and camera data into a common coordinate frame.

The forklift requires an effective command mechanism usable by military personnel after minimal training. We chose to develop an interface based on spoken commands and stylus gestures made on a handheld tablet computer. Commands include: summoning the forklift to a specified area; picking up a pallet by circling its image on the tablet; and placing a pallet at a location indicated by circling.

To enable the system to accomplish complex pallet-handling tasks, we currently require the human supervisor to break down complex commands into high-level subtasks (i.e., not teleoperation). For example, to unload a truck, the supervisor must summon the forklift to the truck, indicate a pallet to pick up, summon the forklift to the pallet's destination, and indicate to the forklift where on the ground the pallet must be placed. This procedure must be repeated for each pallet on that truck. We call this task breakdown "hierarchical task-level autonomy." Our ultimate goal is to reduce the supervisor burden by making the robot capable of carrying out higher-level directives (e.g., completely unloading a truck pursuant to a single directive).

We recognize that an early deployment of the robot would not match the capability of an expert human operator. Our mental model for the robot is a "rookie operator," which behaves cautiously and asks for help with difficult maneuvers. Thus, whenever the planner cannot identify a safe action toward the desired goal, the robot can signal that it is "stuck" and request supervisor assistance. When the robot is stuck, the human supervisor can either use the remote interface to abandon the current task, or any nearby human can climb into the robot's cab and guide it through the difficulty via ordinary manned operation. The technical challenges here include designing the drive-by-wire system to seamlessly transition between unmanned and manned operation, and designing the planner to handle mixed-initiative operation.

Humans in military warehouse settings expect human forklift operators to stop whenever a warning is shouted. We have incorporated a continuously-running "shouted warning detector" into the forklift, which pauses operation whenever a shouted stop command is detected, and stays paused until given an explicit go-ahead to continue.

Humans have a lifetime of prior experience with one another, and have built up powerful predictive models of how other humans will behave in almost any ordinary situation [19]. We have no such prior models for robots, which in our view is part of the reason why humans are uncomfortable around robots: we do not have a good idea of what they will do next. A significant design priority is thus the development of subsystems to support social acceptance of the robot. We added an "annunciation subsystem" that uses visible and audible cues to announce the near-term intention of the robot to any human bystanders. The robot also uses this system to convey its own internal state, such as the perceived number and location of any bystanders.

III. MOBILE MANIPULATION PLATFORM

Our robot is based upon a Toyota 8FGU-15 manned forklift (Fig. 1), a rear wheel-steered, liquid-propane fueled lift truck with a gross vehicle weight of 2700 kg and a lift capacity of 1350 kg. We chose the Toyota vehicle for its relatively small size and the presence of electronic control of some of the vehicle's mobility and mast degrees of freedom, which facilitated our drive-by-wire modifications.

We devised a set of electrically-actuated mechanisms involving servomotors to bring the steering column, brake pedal, and parking brake under computer control. A solenoid serves to activate the release latch to disengage the parking brake. (Putting the parking brake under computer control is essential, since OSHA regulations [20] dictate that the parking brake be engaged whenever the operator exits the cabin; in our setting, the robot sets the parking brake whenever it relinquishes control to a human operator.) The interposition of circuitry into the original forklift wiring permits control of the throttle, mast, carriage, and tine degrees of freedom, and enables detection of any control actions made by a human operator. This detection capability is essential both for safety and for seamless human-robot handoff.

In addition to converting the vehicle to drive-by-wire operation, we have added proprioceptive and exteroceptive sensors, and audible and visible "annunciators" with which the robot can signal nearby humans. The system's interface, perception, planning, control, message publish-subscribe, and self-monitoring software (Fig. 2) runs as several dozen modules hosted on on-board laptop computers communicating via message-passing over a standard network. A commodity wireless access point provides network connectivity with the human supervisor's handheld tablet computer.
Fig. 2. High-level system architecture.

A. Proprioception

We equipped the forklift with an integrated GPS/IMU unit together with encoders mounted to the two (non-steering) front wheels. The system relies mainly upon dead-reckoning for navigation, using the encoders and IMU to estimate short-term 6-DOF vehicle motion. Our smoothly-varying proprioceptive strategy [21] incorporates coarse GPS estimates largely for georeferenced topological localization. The fork pose is determined from a tilt-angle sensor publishing to the Controller Area Network (CAN) bus and encoders measuring tine height and lateral shift.
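To make the dead-reckoning update concrete, the sketch below shows a deliberately simplified planar pose integration that takes traveled distance from the averaged front-wheel encoders and heading from the IMU. The actual system estimates full 6-DOF motion [21]; the constants and function names here are illustrative assumptions, not the deployed code.

    import math

    WHEEL_CIRCUMFERENCE_M = 1.0  # hypothetical; derived from the real wheel radius
    TICKS_PER_REV = 1000         # hypothetical encoder resolution

    def integrate_pose(pose, left_ticks, right_ticks, imu_yaw):
        """Advance (x, y, yaw): encoders supply distance, the IMU supplies heading."""
        x, y, _ = pose
        # Average the two (non-steering) front wheels to get traveled distance.
        d = 0.5 * (left_ticks + right_ticks) / TICKS_PER_REV * WHEEL_CIRCUMFERENCE_M
        return (x + d * math.cos(imu_yaw), y + d * math.sin(imu_yaw), imu_yaw)

    pose = (0.0, 0.0, 0.0)
    for ticks_l, ticks_r, yaw in [(100, 102, 0.00), (98, 101, 0.05), (97, 99, 0.11)]:
        pose = integrate_pose(pose, ticks_l, ticks_r, yaw)
    print(pose)  # the pose drifts slowly; coarse GPS anchors it topologically [21]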
B. Exteroception

For situational awareness and collision avoidance, we attached five lidars to the chassis in a "skirt" configuration, facing forward-left and -right, left, right, and rearward, each angled slightly downward so that the absence of a ground return would be meaningful. We also attached five lidars in a "pushbroom" configuration high up on the robot, oriented downward and looking forward, forward-left and -right, and rearward-left and -right. We attached a lidar to each fork tine, each scanning a half-disk parallel to and slightly above that tine for pallet detection. We attached a lidar under the chassis, scanning underneath the tines, allowing the forklift to detect obstacles when cargo obscures the forward-facing skirts. We attached two vertically-scanning lidars outboard of the carriage in order to see around a carried load. We attached beam-forming microphones oriented forward, left, right, and rearward to sense shouted warnings. Finally, we mounted cameras looking forward, left, right, and rearward in order to publish a 360° view of the forklift's surround to the supervisor's tablet.

For each lidar and camera, we estimate the 6-DOF rigid-body transformation relating that sensor's frame to the body frame (the "extrinsic calibration") through a chain of transformations including all intervening actuatable degrees of freedom. For each lidar and camera mounted on the forklift body, this chain contains exactly one transform; for lidars mounted on the mast, carriage, or tines, the chain has as many as four transformations (e.g., sensor-to-tine, tine-to-mast, mast-to-carriage, and carriage-to-body).
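The sketch below illustrates how such a chain composes: fixed calibrated offsets and actuated joints (tine lift, mast tilt) multiply into a single sensor-to-body transform that maps tine-lidar returns into the body frame. The matrices, joint values, and the assignment of each degree of freedom to a link are invented for the example.

    import numpy as np

    def translation(x, y, z):
        T = np.eye(4)
        T[:3, 3] = [x, y, z]
        return T

    def rot_y(theta):
        """Homogeneous rotation about the body y-axis (mast tilt)."""
        c, s = np.cos(theta), np.sin(theta)
        T = np.eye(4)
        T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
        return T

    tine_height = 0.4              # from the lift encoder (m)
    mast_tilt = np.radians(3.0)    # from the CAN tilt-angle sensor

    sensor_to_tine   = translation(0.05, 0.0, 0.02)        # fixed mounting offset
    tine_to_mast     = translation(0.0, 0.0, tine_height)  # prismatic lift
    mast_to_carriage = rot_y(mast_tilt)                    # revolute mast tilt
    carriage_to_body = translation(1.2, 0.0, 0.5)          # fixed calibration

    sensor_to_body = carriage_to_body @ mast_to_carriage @ tine_to_mast @ sensor_to_tine

    # Map one lidar return (homogeneous coordinates) into the body frame.
    p_sensor = np.array([2.0, 0.1, 0.0, 1.0])
    print(sensor_to_body @ p_sensor)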
C. Annunciation and Reflection

We added LED signage, marquee lights, and audio speakers to the exterior of the chassis and carriage, enabling the forklift to "annunciate" its intended actions before carrying them out (§ V-A). The marquee lights also provide a "reflective display," informing people nearby that the robot is aware of their presence (§ V-B), and using color coding to report other robot states.

D. Computation

Each proprioceptive and exteroceptive sensor is connected to one of four networked quad-core laptops. Three laptops (along with the network switch, power supplies and relays) are mounted in an equipment cabinet affixed to the roof, and one is mounted behind the forklift carriage. A fifth laptop located in the operator cabin provides a diagnostic display. The supervisor's tablet constitutes a distinct computational resource, maintaining a wireless connection to the forklift, interpreting the supervisor's spoken commands and stylus gestures, and displaying diagnostic information (§ IV-A).

E. Software

We use a codebase originating in MIT's DARPA Urban Challenge effort [22]. A low-level message-passing protocol [23] provides publish-subscribe inter-process communication among sensor handlers, the perception module, planner, controller, interface handler, and system monitoring and diagnostic modules (Fig. 2). An "operator-in-the-cabin" detector, buttons on the supervisor tablet, and a radio-controlled kill switch (E-stop) provide local and remote system-pause and system-stop capabilities. The tablet also maintains a 10 Hz "heartbeat" connection with the forklift, which pauses after several missed heartbeats.
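A minimal sketch of that heartbeat watchdog appears below: the forklift pauses once several consecutive 10 Hz beats from the tablet go missing. The missed-beat limit and class names are illustrative assumptions, not the deployed values.

    import time

    HEARTBEAT_PERIOD_S = 0.1  # 10 Hz heartbeat from the supervisor's tablet
    MISSED_LIMIT = 5          # hypothetical: pause after ~0.5 s of silence

    class HeartbeatMonitor:
        def __init__(self):
            self.last_beat = time.monotonic()

        def on_heartbeat(self):
            """Called whenever a heartbeat message arrives from the tablet."""
            self.last_beat = time.monotonic()

        def should_pause(self):
            """True once the equivalent of MISSED_LIMIT beats have been missed."""
            missed = (time.monotonic() - self.last_beat) / HEARTBEAT_PERIOD_S
            return missed >= MISSED_LIMIT

    monitor = HeartbeatMonitor()
    monitor.on_heartbeat()
    time.sleep(6 * HEARTBEAT_PERIOD_S)       # simulate a dropped wireless link
    print("pause:", monitor.should_pause())  # -> pause: True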
F. Robot System Integrity

The architecture of the forklift is based on a hierarchy of increasingly complex and capable layers. At the lowest level, kill-switch wiring disables ignition on command. Next, a programmable logic controller (PLC) uses a simple relay ladder program to enable the drive-by-wire circuitry and the actuator motor controllers from their default (braking) state. The PLC requires a regular heartbeat signal from the higher-level software and matching signals from the actuator modules to enable drive-by-wire control. Higher still, the software architecture is designed with redundant safety checks distributed across several networked computers that, upon detecting a fault, cause the robot to enter a "paused" state. These safety checks include a number of inter-process heartbeat messages, such as a 50 Hz autonomy state message without which all actuation processes default to a stopped (braking) state. Additional processes monitor sensor and inter-process communication timing and, upon detecting any fault, bring the robot to a safe stopped state.
IV. MINIMALLY-PREPARED ENVIRONMENTS

The forklift operates in outdoor environments with minimal physical preparation. Specifically, we assume only that the warehouse consists of adjoining regions. We capture the approximate GPS perimeter of each region and its military designation (e.g., "receiving," "storage," and "issuing"), as well as a pair of "summoning points" that specify a rough location and orientation for points of interest within each region and near each pallet bay in storage (Fig. 3). We also specify GPS waypoints along a simple road network connecting the regions. This data is provided statically to the forklift as part of an ASCII configuration file.

The specified GPS locations need not be precise; their purpose is only to provide rough goal locations for the robot to adopt in response to summoning commands. Our navigation methodology [21] emphasizes local sensing and dead-reckoning. Subsequent manipulation commands are executed using only local sensing, and thus have no reliance on GPS.

Fig. 3. A notional military warehouse layout.
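The paper does not specify the configuration file's syntax, so the sketch below only illustrates the kind of static region data it carries (perimeters, designations, summoning points); the structure, field names, and sample coordinates are all assumptions.

    from dataclasses import dataclass

    @dataclass
    class SummoningPoint:
        lat: float
        lon: float
        heading_deg: float  # rough orientation to adopt when summoned

    @dataclass
    class Region:
        name: str                              # military designation, e.g., "receiving"
        perimeter: list[tuple[float, float]]   # approximate GPS polygon
        summoning_points: list[SummoningPoint]

    regions = [
        Region("receiving",
               perimeter=[(38.700, -77.150), (38.700, -77.140),
                          (38.710, -77.140), (38.710, -77.150)],
               summoning_points=[SummoningPoint(38.705, -77.145, 90.0),
                                 SummoningPoint(38.706, -77.146, 270.0)]),
    ]

    def summon_goal(region_name: str) -> SummoningPoint:
        """Return a rough goal pose for a spoken 'Come to <region>' command."""
        region = next(r for r in regions if r.name == region_name)
        return region.summoning_points[0]

    print(summon_goal("receiving"))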
A. Summoning and Manipulation Commands

The human supervisor directs the forklift using a Nokia N810 internet tablet that recognizes spoken commands and sketched gestures [18]. Our SUMMIT library [24] handles speech recognition for summoning. Spoken commands are currently limited to a small set of utterances directing movement, such as "Come to receiving." The supervisor indicates a target pallet for manipulation using a rough circling gesture (Fig. 4(a)). The interface echoes each gesture as a cleaned-up closed shape, and publishes a "volume of interest" corresponding to the interior of the cone emanating from the camera and having the captured gesture as its planar cross section (Fig. 4(b)). The volume of interest need not contain the entire pallet for engagement to succeed. A similar gesture, made on a truck bed or on empty ground, indicates the location of a desired pallet placement. Gesture interpretation is thus context dependent.

Fig. 4. (a) A pallet pickup gesture appears in red. (b) Lidar returns (red) within the resulting volume of interest. Successful engagement does not require that the gesture enclose the entire pallet and load.
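One simple way to realize the volume-of-interest test is to keep a lidar return if its pinhole projection into the camera image lands inside the cleaned-up gesture polygon, which is equivalent to membership in the cone. The sketch below follows that approach; the intrinsics, polygon, and points are invented, and the real system's implementation may differ.

    import numpy as np

    FX = FY = 500.0         # hypothetical focal lengths (pixels)
    CX, CY = 320.0, 240.0   # hypothetical principal point

    def project(p_cam):
        """Pinhole projection of a camera-frame point (z forward), or None."""
        x, y, z = p_cam
        if z <= 0:          # behind the camera: cannot be in the cone
            return None
        return (FX * x / z + CX, FY * y / z + CY)

    def inside_polygon(pt, poly):
        """Even-odd ray-casting test against the gesture polygon."""
        x, y = pt
        inside = False
        for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
            if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
        return inside

    gesture = [(300, 200), (420, 210), (430, 300), (310, 290)]  # closed stroke
    returns = np.array([[0.4, 0.1, 5.0], [-2.0, 0.5, 4.0], [0.5, 0.3, 7.0]])

    in_voi = [p for p in returns
              if (uv := project(p)) is not None and inside_polygon(uv, gesture)]
    print(len(in_voi), "returns inside the volume of interest")  # -> 2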

B. Obstacle Detection

Obstacle detection is implemented using the skirt lidars, with an adaptation of the obstacle detection algorithm used on the DARPA Urban Challenge vehicle [22]. Returns from all lidars are collected in a smoothly-varying local coordinate frame [21], clustered based on spatiotemporal consistency, and published (Fig. 2). The lidars are intentionally tilted down by 5 degrees, so that they will generate range returns from the ground when no object is present. The existence of "infinite" range data then enables the detector to infer environmental properties from failed returns (e.g., from absorptive material). The consequence of the downward orientation is a shorter maximum range, around 15 meters. Since the vehicle's speed does not exceed 2 m/s, this still provides 7-8 seconds of sensing horizon for collision avoidance.

To reject false positives from the ground (at distances greater than the worst case ground slope), we require that consistent returns be observed from more than one lidar. Missing lidar returns are filled in at a reduced range to satisfy the conservative assumption that they arise from a human (assumed to be 30 cm wide).

Pedestrian safety is central to our design choices. Though lidar-based people detectors exist [25]–[27], we opted to avoid the risk of misclassification by treating all objects of suitable size as potential humans. The robot proceeds slowly around stationary objects. Pedestrians who approach too closely cause the robot to pause (Fig. 5), indicating as such to the pedestrian.
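The sketch below illustrates the multi-lidar consistency rule: a distant candidate is kept only if a second lidar reports a return nearby, while close candidates are accepted outright. The range threshold and sample scans are invented for the example.

    import math

    NEAR_RANGE_M = 8.0   # hypothetical: closer than this, slope cannot explain a return
    CONSISTENT_M = 0.3   # agreement radius, matching the assumed 30 cm human width

    def to_xy(bearing_rad, range_m):
        return (range_m * math.cos(bearing_rad), range_m * math.sin(bearing_rad))

    def confirmed_obstacles(detections):
        """detections: {lidar_id: [(bearing_rad, range_m), ...]} in a common frame.
        Distant candidates need a second lidar's agreement to rule out ground
        strikes on sloped terrain; near candidates are accepted directly."""
        pts = [(lid, to_xy(b, r), r)
               for lid, scan in detections.items() for b, r in scan]
        confirmed = []
        for lid, (x, y), r in pts:
            if r <= NEAR_RANGE_M:
                confirmed.append((x, y))
            elif any(other != lid and math.hypot(x - ox, y - oy) < CONSISTENT_M
                     for other, (ox, oy), _ in pts):
                confirmed.append((x, y))
        return confirmed

    scans = {"skirt_fl": [(0.000, 12.0), (0.50, 5.0)],
             "skirt_fr": [(0.001, 12.1)],
             "skirt_l":  [(1.20, 14.0)]}  # unmatched distant return: rejected
    print(confirmed_obstacles(scans))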
C. Lidar-Based Servoing

Picking up a pallet requires that the forklift accurately insert its tines into the pallet slots, a challenge for a 2700 kg forklift when the pallet's pose and insert locations are not known a priori and when pallet structure and geometry vary.
Additionally, when the pallet is to be picked up from or placed on a truck bed, the forklift must account for the unknown pose of the truck (distance from the forklift, orientation, and height), on which the pallet may be recessed. Complicating these requirements is the fact that we have only coarse extrinsic calibration for the mast lidars due to the unobservable compliance of the mast, carriage, and tines. We address these challenges with a closed-loop perception and control strategy that regulates the position and orientation of the tines based directly on lidar observations of the pallet and truck bed.

V. OPERATION IN CLOSE PROXIMITY TO PEOPLE

The robot employs a number of mechanisms intended to increase overall safety. By design, all potential robot trajectories conclude with the robot coming to a complete stop (even though this leg of the trajectory may not always be executed, particularly if another trajectory is chosen). Consequently the robot moves more slowly when close to obstacles (conservatively assumed to be people). The robot also signals its internal state and intentions, in an attempt to make people more accepting of its presence and more easily able to predict its behavior [18].

Fig. 5. An approaching pedestrian causes the robot to pause. Lights skirting the robot indicate distance to obstacles (green: far to red: close). Verbal annunciators and signage indicate the induced pause.

A. Annunciation of Intent

The LED signage displays short text messages describing current state (e.g., "paused" or "fault") and any imminent actions (e.g., forward motion or mast lifting). The marquee lights encode forklift state as colors, and imminent motion as moving patterns. Open-source software converts the text messages to spoken English for broadcast through the audio speakers. Text announcements are also exported to the tablet for display to the supervisor.

B. Awareness Display

The forklift also uses its annunciators to inform bystanders that it is aware of their presence. Whenever a human is detected in the vicinity, the marquee lights, consisting of strings of individually addressable LEDs, display a bright region oriented in the direction of the detection (Fig. 5). If the estimated motion track is converging with the forklift, the LED signage and speakers announce "Human approaching."
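A small sketch of this "reflective display" behavior appears below: one frame of marquee colors with a bright region centered on the bearing of the detected person. The LED count, colors, and geometry are illustrative, not the deployed configuration.

    import math

    NUM_LEDS = 120        # hypothetical LEDs wrapped around the chassis
    SPOT_HALF_WIDTH = 6   # LEDs lit on each side of the detection bearing

    def marquee_frame(bearing_rad, base=(0, 64, 0), spot=(255, 255, 255)):
        """One RGB frame: dim base color with a bright region toward the
        detection (0 rad = straight ahead, increasing counter-clockwise)."""
        center = round(bearing_rad / (2 * math.pi) * NUM_LEDS) % NUM_LEDS
        frame = [base] * NUM_LEDS
        for offset in range(-SPOT_HALF_WIDTH, SPOT_HALF_WIDTH + 1):
            frame[(center + offset) % NUM_LEDS] = spot
        return frame

    frame = marquee_frame(math.radians(45))  # person detected ahead-left
    print(sum(px == (255, 255, 255) for px in frame), "bright LEDs")  # -> 13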

C. Autonomy Handoff

When a human closely approaches the robot, it pauses for safety. (A speech recognizer runs on the forklift to enable detection of shouted phrases such as "Forklift stop moving," which also cause the robot to pause.) When a human (presumably a human operator) enters the cabin and sits down, the robot detects his/her presence in the cabin through the report of a seat-occupancy sensor, or any uncommanded press of the brake pedal, turn of the steering wheel, or touch of the mast or transmission levers. In this event, the robot reverts to behaving as a manned forklift, ceding autonomy.
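A minimal sketch of that takeover check follows: any seat-occupancy report or uncommanded actuator motion cedes autonomy. The field names and the small steering deadband are assumptions for illustration.

    from dataclasses import dataclass

    STEERING_DEADBAND_RAD = 0.02  # hypothetical threshold for "uncommanded" motion

    @dataclass
    class CabinState:
        seat_occupied: bool
        brake_pressed: bool        # brake press not commanded by the planner
        steering_delta_rad: float  # measured minus commanded steering angle
        mast_lever_touched: bool   # mast or transmission lever contact

    def human_takeover(state: CabinState) -> bool:
        """True if the robot should revert to behaving as a manned forklift."""
        return (state.seat_occupied
                or state.brake_pressed
                or abs(state.steering_delta_rad) > STEERING_DEADBAND_RAD
                or state.mast_lever_touched)

    print(human_takeover(CabinState(False, False, 0.15, False)))  # -> True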
VI. DEPLOYMENT AND RESULTS

We deployed our system in two test environments configured as military Supply Support Activities (SSAs), in the general form shown in Fig. 3. These outdoor warehouses included receiving, bulk yard, and issuing areas connected by a simple road network. The bulk yards contained a number of alphanumerically-labeled pallet storage bays.

An Army staff sergeant, knowledgeable in military logistics and an expert forklift operator, acted as the robot supervisor. In a brief training session, she learned how to provide speech and gesture input to the tablet computer, and use its PAUSE and RUN buttons.

A. Path Planning and Obstacle Avoidance

The most basic mobility requirement for the robot is to move safely from a starting pose to its destination pose. The path planning subsystem (Fig. 2) adapts the navigation framework developed at MIT for the DARPA Urban Challenge vehicle [22], [28]. The navigator identifies a waypoint path through the warehouse route network. A closed-loop prediction model incorporates pure pursuit steering control [29] and PI speed control. This prediction model may represent general classes of autonomous vehicles; in this case, we developed a specific model for the dynamics of our forklift platform. The motion planner uses the prediction model to grow rapidly-exploring random trees (RRTs) of dynamically feasible and safe trajectories toward these waypoints [28]. The controller executes a selected trajectory progressing toward the destination waypoint (Fig. 6). These trajectories are selected in real-time to minimize an appropriate objective function, and are safe by construction. The closed-loop nature of the algorithm [30] and the occasional use of re-planning mitigate any disturbances or modeling errors that may be present.
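For reference, the sketch below shows the classic pure pursuit steering step [29] used inside such a closed-loop prediction model: the vehicle steers along the circular arc that passes through a lookahead point on the reference path. The wheelbase and lookahead values are invented, not our platform's parameters.

    import math

    WHEELBASE_M = 1.7   # hypothetical effective wheelbase
    LOOKAHEAD_M = 3.0   # hypothetical lookahead distance

    def pure_pursuit_steer(pose, goal):
        """pose = (x, y, yaw); goal = lookahead point on the reference path.
        Returns the steering angle that arcs the vehicle through the goal."""
        x, y, yaw = pose
        # Express the goal in the vehicle frame.
        dx, dy = goal[0] - x, goal[1] - y
        gx = math.cos(-yaw) * dx - math.sin(-yaw) * dy
        gy = math.sin(-yaw) * dx + math.cos(-yaw) * dy
        # Curvature of the circular arc through the lookahead point.
        kappa = 2.0 * gy / (gx * gx + gy * gy)
        return math.atan(WHEELBASE_M * kappa)

    # Goal 3 m ahead and 1 m to the left: steer gently left.
    print(pure_pursuit_steer((0.0, 0.0, 0.0), (LOOKAHEAD_M, 1.0)))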
A key performance metric for the navigation subsystem is the ability to closely match the predicted trajectory with the actual path, as significant deviations may cause the actual path to become infeasible (e.g., due to obstacles). During normal operation in several outdoor experiments, we recorded 97 different complex paths of varying lengths (6 m to 90 m) and curvatures. For each, we measured the average and maximum error between the predicted and actual vehicle pose over the length of the path. In all cases, the average prediction error did not exceed 12 cm, while the maximum prediction error did not exceed 35 cm.

We also tested the robot's ability to accomplish commanded motion to a variety of destination poses in the vicinity of obstacles of varying sizes. When the route was feasible, the forklift identified and executed a collision-free route to the goal. For example, Fig. 6 shows an obstacle-free trajectory through a working shuttle parking lot, including pallets, traffic cones, pedestrians, and vehicles. Some actually feasible paths were erroneously classified as infeasible, due to a 25 cm safety buffer surrounding each detected obstacle. We also tested the robot's behavior when obstructed by a pedestrian (a mannequin), in which case the robot stops and waits for the pedestrian to move out of the way.

Fig. 6. (top) During a testing session, the robot navigates from a stationary position around rows of cones and palletized cargo. (bottom) The robot rounds the first row of cones, identifying a tree of feasible paths and executing an obstacle-free trajectory (magenta) through the perceived obstacle field (red, with black penalty regions) to a target pose (green).

B. Pallet Engagement: Estimation and Manipulation

A fundamental capability of our system is its ability to engage pallets, both from the ground and from truck beds. With uneven terrain supporting the pallet and vehicle, unknown truck geometry, variable unknown pallet geometry and structure, and variation in load, successfully localizing and engaging the pallet is a challenging problem.

Given the volume of interest arising from the supervisor's gesture (Fig. 4(b)), the robot must detect the indicated pallet and locate the insertion slots on the pallet face. The estimation phase proceeds as the robot scans the volume of interest with the tine-mounted lidars by varying mast tilt and height. The result is a set of planar scans (Fig. 7). The system then searches within individual scans to identify candidate returns from the pallet face. We use a fast edge detection strategy that segments a scan into returns that form edge segments. The detection algorithm then classifies sets of these weak "features" as to whether they correspond to a pallet, based upon a rough prior on general pallet structure. When a pallet is detected, the module estimates its pose, width, depth, and slot geometry. A similar module uses scans from the vertical lidars to detect the truck bed and estimate its pose relative to the robot.
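A simplified sketch of that segmentation step appears below: one planar scan is split into contiguous segments at range discontinuities, yielding the weak edge features that are then scored against the pallet-structure prior. The jump threshold and scan values are illustrative.

    JUMP_M = 0.15  # hypothetical discontinuity threshold between adjacent returns

    def segment_scan(ranges):
        """Group consecutive returns whose ranges differ by less than JUMP_M."""
        segments, current = [], [0]
        for i in range(1, len(ranges)):
            if abs(ranges[i] - ranges[i - 1]) < JUMP_M:
                current.append(i)
            else:
                segments.append(current)
                current = [i]
        segments.append(current)
        return segments

    # A pallet face with two slot openings reads as three nearby face segments
    # separated by deeper returns from inside the slots.
    scan = [2.00, 2.01, 2.02, 2.60, 2.61, 2.03, 2.02, 2.62, 2.63, 2.01, 2.00]
    for seg in segment_scan(scan):
        print([scan[i] for i in seg])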
Fig. 7. Output of the pallet estimation algorithm during engagement of a pallet on a truck bed. The figure shows a positive detection and the corresponding estimate for the pallet's pose and slot geometry based upon the lidar returns for the region of interest (in pink). Insets at lower right show scans within the interest volume that the system correctly classified as not arising from a pallet face; these scans were of the truck bed and undercarriage.

After detecting the target pallet and estimating its position and orientation, the vehicle proceeds with the manipulation phase of pallet engagement. In order to account for unavoidable drift in the vehicle's position relative to the pallet, the system reacquires the pallet several times during its approach. Finally, the vehicle stops about 2 m from the pallet, reacquires the slots, and servos the tines into the slots using the filtered lidar scans.

We tested pallet engagement in a gravel lot with pallets of different types and with different loads. Using the tablet interface, we commanded the forklift to pick up palletized cargo off of the ground as well as a truck bed from a variety of initial distances and orientations. Detection typically succeeds when the forklift starts no more than 7.5 m from the pallet, and the angle of the pallet face normal is no more than 30° off of the forklift's initial heading. In 69 trials in which detection succeeded, engaging pallets of various types from the ground and a truck bed succeeded 64 times; the 5 engagement failures occurred when the forklift's initial lateral offset from the pallet was more than 3 meters.
C. Shouted Warning Detection

Preliminary testing of the shouted warning detector was performed with five male subjects in an outdoor gravel lot on a fairly windy day (6 m/s average wind speed), with wind gusts clearly audible in the array microphones. Subjects were instructed to shout either "Forklift stop moving" or "Forklift stop" under six different operating conditions: idling (reverberant noise); beeping; revving engine; moving forward; backing up (and beeping); and moving with another truck nearby backing up (and beeping). Each subject shouted commands under each condition (typically at increasing volume) until successful detection occurred. All subjects were ultimately successful under each condition; the worst case required four attempts from one subject during the initial idling condition. Including repetitions, a total of 36 shouted commands were made, of which 26 were detected successfully on the first try. The most difficult operating condition occurred when the engine was being revved (low SNR), resulting in five missed detections and the only two false positives. The other two missed detections occurred when the secondary truck was active.
D. End-to-End Operation

The robot was successfully demonstrated outdoors over two days in June 2009 at Fort Belvoir in Virginia. Under voice and gesture command of a U.S. Army Staff Sergeant, the forklift unloaded pallets from a flatbed truck in the receiving area, drove to a bulk yard location specified verbally by the supervisor, and placed the pallet on the ground. The robot, commanded by the supervisor's stylus gesture and verbally-specified destination, retrieved another indicated pallet from the ground and placed it on a flatbed truck in the issuing area. During operation, the robot was interrupted by shouted "Stop" commands, pedestrians (mannequins) were placed in its path, and observers stood and walked nearby.

We also directed the robot to perform impossible tasks, such as lifting a pallet whose inserts were physically and visually obscured by fallen cargo. In this case, the forklift paused and requested supervisor assistance. In general, such assistance can come in three forms: the supervisor can command the robot to abandon the task; a human can modify the world to make the robot's task feasible; or a human can climb into the forklift cabin and operate it through the challenging task. (In this case, we manually moved the obstruction and resumed operation.)
E. Lessons Learned and Future Work

While our demonstrations were judged successful by military observers, the prototype capability is crude. In operational settings, the requirement that the supervisor break down each complex task into explicit subtasks, and explicitly issue a command for each subtask, would likely become burdensome. We are working on increasing the robot's autonomy level, for example, by enabling it to reason about higher-level tasks. Moreover, our robot is not yet capable of the sort of manipulations exhibited by expert human operators (e.g., lifting the edge of a pallet with one tine to rotate or reposition it, gently bouncing a load to settle it on the tines, shoving one load with another, etc.).

We learned a number of valuable lessons from testing with real military users. First, pallet indication gestures varied widely in shape and size. The resulting conical region sometimes included extraneous objects, causing the pallet detector to fail to lock on to the correct pallet. Second, people were spontaneously accommodating of the robot's limitations. For example, if a speech command or gesture was misunderstood, the supervisor would cancel execution and repeat the command; if a shout wasn't heard, the shouter would repeat it more loudly. This behavior is consistent with the way a human worker might interact with a relatively inexperienced newcomer.

Recognition of shouted speech in noisy environments has received little attention in the speech community, and presents a significant challenge to current speech recognition technology. From a user perspective, it is likely that a user may not be able to remember specific "stop" commands, and that the shouter will be stressed, especially if the forklift does not respond to an initial shout. From a safety perspective, it may be appropriate for the forklift to pause if it hears anyone shout in its general vicinity. Thus, we are collecting a much larger corpus of general shouted speech, and aim to develop a capability to identify general shouted speech, as a precursor to identifying any particular command. In addition, we are also exploring methods that allow the detection module to adapt to new audio environments through feedback from users.

Rather than require a GPS-delineated region map to be supplied prior to operation, we are developing the robot's ability to understand a narrated "guided tour" of the workspace as an initialization step. During the tour, a human would drive the forklift through the workspace and speak the name, type, or purpose of each environmental region as it is traversed, perhaps also making tablet gestures to indicate region boundaries. The robot would then infer region labels and travel patterns from the tour data.

VII. CONCLUSION

We have demonstrated a proof-of-concept of an autonomous forklift able to perform rudimentary pallet manipulation outdoors in an unprepared environment. Our design and implementation strategy involved early and frequent consultation with the intended users of our system, and development of an end-to-end capability that would be culturally acceptable in its intended environment. We introduced a number of effective mechanisms, including hierarchical task-level autonomy, "robot's-eye-view" gestures indicating manipulation and placement targets, manipulation of variable palletized cargo, annunciation of intent, continuous detection of shouted warnings, and seamless handoff between manned and unmanned operation.

ACKNOWLEDGMENTS

We gratefully acknowledge the support of the U.S. Army Logistics Innovation Agency (LIA) and the U.S. Army Combined Arms Support Command (CASCOM).

This work was sponsored by the Department of the Air Force under Air Force Contract FA8721-05-C-0002. Any opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
REFERENCES

[1] P. R. Wurman, R. D'Andrea, and M. Mountz, "Coordinating hundreds of cooperative, autonomous vehicles in warehouses," AI Magazine, vol. 29, no. 1, pp. 9–19, 2008.
[2] T. A. Tamba, B. Hong, and K.-S. Hong, "A path following control of an unmanned autonomous forklift," Int'l J. of Control, Automation and Systems, vol. 7, no. 1, pp. 113–122, 2009.
[3] R. Cucchiara, M. Piccardi, and A. Prati, "Focus-based feature extraction for pallets recognition," in Proc. British Machine Vision Conf., 2000.
[4] R. Bostelman, T. Hong, and T. Chang, "Visualization of pallets," in Proc. SPIE Optics East Conference, Oct. 2006.
[5] D. Lecking, O. Wulf, and B. Wagner, "Variable pallet pick-up for automatic guided vehicles in industrial environments," in Proc. IEEE Conf. on Emerging Technologies and Factory Automation, May 2006, pp. 1169–1174.
[6] J. Roberts, A. Tews, C. Pradalier, and K. Usher, "Autonomous hot metal carrier - navigation and manipulation with a 20 tonne industrial vehicle," in Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), Rome, Italy, 2007, pp. 2770–2771.
[7] M. Seelinger and J. D. Yoder, "Automatic visual guidance of a forklift engaging a pallet," Robotics and Autonomous Systems, vol. 54, no. 12, pp. 1026–1038, December 2006.
[8] O. Khatib, K. Yokoi, K. Chang, D. Ruspini, R. Holmberg, and A. Casal, "Coordination and decentralized cooperation of multiple mobile manipulators," J. Robotic Systems, vol. 13, no. 11, pp. 755–764, 1996.
[9] O. Brock and O. Khatib, "Elastic strips: A framework for motion generation in human environments," Int'l J. of Robotics Research, vol. 21, no. 12, pp. 1031–1052, 2002.
[10] D. Berenson, J. Kuffner, and H. Choset, "An optimization approach to planning for mobile manipulation," in Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), May 2008, pp. 1187–1192.
[11] R. Brooks et al., "Sensing and manipulating built-for-human environments," Int'l J. of Humanoid Robotics, vol. 1, no. 1, pp. 1–28, 2004.
[12] D. Kragic, L. Petersson, and H. I. Christensen, "Visually guided manipulation tasks," Robotics and Autonomous Systems, vol. 40, no. 2–3, pp. 193–203, August 2002.
[13] A. Saxena, J. Driemeyer, J. Kerns, C. Osondu, and A. Y. Ng, "Learning to grasp novel objects using vision," in Proc. Int'l Symp. on Experimental Robotics (ISER), 2006.
[14] D. Katz and O. Brock, "Manipulating articulated objects with interactive perception," in Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), 2008, pp. 272–277.
[15] J. Park and O. Khatib, "Robust haptic teleoperation of a mobile manipulation platform," in Experimental Robotics IX, ser. Springer Tracts in Advanced Robotics (STAR), M. Ang and O. Khatib, Eds., 2006, vol. 21, pp. 543–554.
[16] T. Fong, C. Thorpe, and B. Glass, "PdaDriver: A handheld system for remote driving," in Proc. IEEE Int'l Conf. Advanced Robotics, July 2003.
[17] M. Skubic, D. Anderson, S. Blisard, D. Perzanowski, and A. Schultz, "Using a hand-drawn sketch to control a team of robots," Autonomous Robots, vol. 22, no. 4, pp. 399–410, May 2007.
[18] A. Correa, M. R. Walter, L. Fletcher, J. Glass, S. Teller, and R. Davis, "Multimodal interaction with an autonomous forklift," in Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), Osaka, Japan, March 2010.
[19] B. Mutlu, F. Yamaoka, T. Kanda, H. Ishiguro, and N. Hagita, "Nonverbal leakage in robots: communication of intentions through seemingly unintentional behavior," in Proc. ACM/IEEE Int'l Conf. on Human-Robot Interaction (HRI), New York, NY, 2009, pp. 69–76.
[20] United States Department of Labor Occupational Safety & Health Administration, "Powered industrial trucks – occupational safety and health standards – 1910.178," http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9828, 1969.
[21] D. Moore, A. S. Huang, M. Walter, E. Olson, L. Fletcher, J. Leonard, and S. Teller, "Simultaneous local and global state estimation for robotic navigation," in Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), Kobe, Japan, 2009, pp. 3794–3799.
[22] J. Leonard et al., "A perception-driven autonomous urban vehicle," J. Field Robotics, vol. 25, no. 10, pp. 727–774, 2008.
[23] A. S. Huang, E. Olson, and D. Moore, "Lightweight communications and marshalling for low latency interprocess communication," MIT, Tech. Rep. MIT-CSAIL-TR-2009-041, 2009.
[24] I. L. Hetherington, "PocketSUMMIT: Small-footprint continuous speech recognition," in Proc. Interspeech, Antwerp, Aug. 2007, pp. 1465–1468.
[25] D. Hahnel, D. Schulz, and W. Burgard, "Mobile robot mapping in populated environments," Advanced Robotics, vol. 17, no. 7, pp. 579–597, 2003.
[26] J. Cui, H. Zha, H. Zhao, and R. Shibasaki, "Laser-based detection and tracking of multiple people in crowds," Computer Vision and Image Understanding, vol. 106, no. 2–3, pp. 300–312, 2007.
[27] K. O. Arras, O. M. Mozos, and W. Burgard, "Using boosted features for the detection of people in 2D range data," in Proc. IEEE Int'l Conf. on Robotics and Automation (ICRA), Rome, Italy, Apr. 2007, pp. 3402–3407.
[28] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P. How, "Real-time motion planning with applications to autonomous urban driving," IEEE Trans. Control Systems Technology, vol. 17, no. 5, pp. 1105–1118, Sept. 2009.
[29] R. C. Coulter, "Implementation of the pure pursuit path tracking algorithm," The Robotics Institute, CMU, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-92-01, Jan. 1992.
[30] B. Luders, S. Karaman, E. Frazzoli, and J. How, "Bounds on tracking error using closed-loop rapidly-exploring random trees," in Proc. American Control Conf. (ACC), Baltimore, MD, June-July 2010.
