
Robotics and Autonomous Systems 158 (2022) 104236

Journal homepage: www.elsevier.com/locate/robot

HAPTR2: Improved Haptic Transformer for legged robots' terrain classification

Michał Bednarek, Michał R. Nowicki, Krzysztof Walas
Division of Robotics, Institute of Robotics and Machine Intelligence, Poznan University of Technology, 3A Piotrowo St., Poznan, 60-965, Poland

article info a b s t r a c t

Article history: The haptic terrain classification is an essential component of a mobile walking robot control system,
Available online 22 August 2022 ensuring proper gait adaptation to the changing environmental conditions. In practice, such compo-
nents are a part of an autonomous system and thus have to be lightweight, provide fast inference time,
Keywords:
and guarantee robustness to minor changes in recorded sensory data. We propose transformer-based
Legged robots
Deep learning methods HAPTR and HAPTR2 terrain classification methods that use force and torque measurements from feet
Data sets for robot learning to meet these requirements. For reliable comparison of the proposed solutions, we adapt two classical
machine learning algorithms (DTW-KNN and ROCKET), one temporal convolution network (TCN), and
use the state-of-the-art CNN-RNN. The experiments are performed on publicly available PUTany and
QCAT datasets. We show that the proposed HAPTR and HAPTR2 methods achieve accuracy on par
or better than state-of-the-art approaches with a lower number of parameters, faster inference time,
and improved robustness to input signal distortions. These features make HAPTR and HAPTR2 excel
in terrain recognition tasks when considering real-world requirements.
© 2022 Published by Elsevier B.V.

1. Introduction

Nowadays, we observe more and more applications of walking robots. In these new deployments, the mobility of legged robots is a crucial advantage over wheeled platforms, which struggle to operate properly in human-made environments with non-flat areas, e.g., stairs. Legged robots also excel in natural settings like caves, as shown by the winning teams of the DARPA SubT Challenge. ANYbotics' ANYmal [1] or Boston Dynamics' Spot are great examples of legged robots' maturity, and we can expect that they will be deployed in significant numbers soon. But real-world scenarios require an increasing level of autonomy [2], especially when it comes to walking capabilities, i.e., robustness to all possible environments. Moreover, legged platforms require the algorithms to work within a limited computational and energy budget.

One of the challenges of the real-world operation of legged robots is the negotiation of unknown and unexplored terrain, where the robot has to adapt its gait to the changing environmental conditions. In many scenarios, it is possible to plan and predict the system's behavior in advance. However, there are cases when the robot has to react to unexpected environmental measurements [3]. Such reactive behavior was achieved in [4] by representing the robot's motion as a function of the terrain.

In non-reactive approaches, we need to perform terrain identification before taking an action. One of the approaches is terrain classification with touch, as presented in Fig. 1. It is a well-researched topic, with most solutions focusing on the force and torque signals from sensors mounted on the robot feet [5–8]. These methods focus on obtaining the best possible results measured as accuracy on the registered dataset. We consider accuracy to be only a part of the whole picture. The method must not only achieve a satisfactory terrain classification; it must also do so quickly, to allow the walking robot's dynamic action, and with limited processing resources, as an autonomous walking robot has multiple modules operating simultaneously to achieve the desired robustness to real-world challenges. Finally, a distinct terrain classification module is a valuable source of information for improving localization [9] and motion planning [10]. Therefore, there is still a need for separate classification blocks, as there are applications where end-to-end solutions are not suitable.

To meet these requirements, we propose HAPTR and HAPTR2, efficient deep learning methods based on transformers. In order to determine whether the proposed solution is a good fit for real deployment, we propose new evaluation methods targeting previously omitted aspects like inference time and robustness. To provide a baseline for these comparisons, we report the results obtained with three adapted methods, ranging from classical machine learning approaches to a convolutional neural network. Moreover, we provide results from other state-of-the-art solutions whenever possible and compare the results on two publicly available datasets.

* Corresponding author. E-mail address: michalbednarek.e@gmail.com (M. Bednarek).

https://doi.org/10.1016/j.robot.2022.104236
Our contributions can therefore be summarized as:

• The proposed transformer-based deep learning models for terrain classification, called HAPTR and HAPTR2, are lightweight, provide fast inference times, and are robust to changes in the registered signals.
• The addition of the Modality Attention Layer (MAL) as a way of combining the force and torque signals registered at the robot feet, which increases the robustness of terrain classification in real-world scenarios.
• A novel evaluation method for terrain classification algorithms, inspired by autonomy requirements and based on accuracy, accuracy on the worst terrain, inference time, and robustness.
• A transparent evaluation of the proposed and several adapted methods on two publicly available terrain classification datasets, creating a baseline for further works to compare against and overcome.

All the code and datasets needed to reproduce our results are publicly available at https://github.com/kolaszko/haptic_transformer.

Fig. 1. ANYmal robot collecting haptic force/torque (F/T) signals with its compliant feet when walking on different terrains. Terrain classification is crucial for gait adaptation to ensure the robot's stability and has to be performed on a resource-constrained processing unit within a short amount of time. Therefore, we propose the transformer-based HAPTR and HAPTR2 methods to meet these requirements.

2. Related work

We divided the related work section into two subsections, presenting the existing approaches used for terrain recognition for mobile robots and the general time series classification approaches that could be applied to the terrain classification problem.

2.1. Terrain recognition for legged robots

One of the first contemporary approaches to terrain recognition for walking robots was presented in [11], where the authors tackled the problem of blindly estimating terrain properties using only the currents from the motors and the contact forces of a quadrupedal robot leg. Features extracted from the input signals were passed to the AdaBoost [12] algorithm, which revealed that the Ground Reaction Force (GRF) is one of the essential properties describing the terrain under a moving leg. In [13], the authors proposed classifying terrains on tiny robots, where the GRF was directly measured with a purpose-designed miniature array of sensors. Direct measurements of the interaction between the robot and the ground substrate were described in [14]. Based on the terrain class, the quadruped robot adjusted its Center of Gravity to accommodate changes in the terrain properties. A step further in the use of terrain classification is to allow the robot to adapt its gait to changing traction conditions [15]. This line of work could be extended to predicting the risk of collapse of the structures negotiated by the legged machine [16].

Most of the research focuses on different types of hard materials, but there are also soft substrates in the dataset used in our work. One of the papers explicitly mentioning soft ground is [17]; the authors provide a method for measuring the terrain parameters in situ. A more analytical approach for robots dealing with soft materials was presented in [18] and extended to gait adaptation in [19]. Moreover, terrain classification plays a vital role in biology-inspired works. In [20], changes in the impedance of the leg-ground interaction were used for this purpose. The works that use CPGs [21] or investigate the negotiation of flowable grounds [22] might also benefit from a classification module that modulates the walking signals based on terrain information. Terrain classification is also an essential component of space rovers, from the foundational works on terramechanics by Mieczysław Gregory Bekker to the latest results, such as a deep-learning terrain classification system for Mars [23].

Terrain classification with other modalities, like vision [24] or acoustic sensors [25], is feasible, yet less accurate than direct force measurements. The fusion of three different modalities (vision, depth, and touch) for terrain classification was described in [26]. Moreover, tactile data allowed for self-supervised visual terrain learning in [27].

Recently, in [5], the authors analyzed tactile signals from impact motions of a quadrupedal robot's leg to classify different soil characteristics. In [8], the authors proposed a recurrent neural network with convolutional blocks that achieved state-of-the-art results in the terrain recognition task. However, that method was able to work with fixed-length input data only. In [7], that problem was revisited, and a masking mechanism for convolutional neural networks was proposed to manage variable-length signals. In [6], the authors also used raw, variable-length data recorded during walking sessions on different robots; their terrain classification method achieved better performance than frequency-domain classifiers while requiring fewer annotations due to a semi-supervised ML approach.

All the above approaches focus on achieving the highest possible accuracy on the selected dataset. In our work, we advocate using other performance measures crucial for the real-world deployment of legged systems, i.e., the number of network parameters, the inference time on limited computational resources, and the robustness to signal degradation.

2.2. Time series classification

Time-series data mining has become an emerging field over the past decades due to the increased availability of temporal data [28]. We can approach the problem by looking at the signal properties. The authors of [29] compared several similarity measures in the distance-based time series classification task, showing that no individual distance measure is significantly better than Dynamic Time Warping (DTW) [30].
Based on that distance, the popular K-Nearest Neighbors (KNN-DTW) classifier can be adapted for time-domain signals and is considered a strong baseline in the field.

Recently, learning-based approaches have started to play a significant role in the time series classification field, including Recurrent Neural Networks (RNN) with Convolutional blocks (CNN) or Fully Convolutional Neural Networks (FCN). In [31], the authors proposed the InceptionTime classifier based on the Inception-v4 [32] architecture, achieving state-of-the-art scalability with decreased training time. A similar approach, called RandOm Convolutional KErnel Transform (ROCKET), was proposed in [33]. The authors showed that their model achieves state-of-the-art accuracy while maintaining a relatively short training time by using multiple convolution kernels instead of stacking an ensemble of classifiers. In [34], the authors presented the Temporal Convolutional Network (TCN) as a convolutional alternative to recurrent neural networks, with a comprehensive comparison to the state-of-the-art recurrent models.

Despite the undeniable success of transformers [35] in natural language processing [36], image recognition [37], or object detection [38], to the best of our knowledge, no prior work has used transformers for terrain classification in a real-world scenario with a walking robot.

3. Methods

The following section presents the state-of-the-art time-series classification approaches adapted to the terrain classification problem. Secondly, we describe an improved version of our HAPtic TRansformer (HAPTR2) model, initially described in [39], together with the list of all improvements that led to the creation of HAPTR2.

3.1. Adapted methods included in the comparison

Convolutional neural network with recurrent modules (CNN-RNN). The CNN-RNN model was initially described in [7] and improved in [8] to achieve the best results in terrain classification on the PUTany dataset. The most recent version of this approach, described in [9], enhances its performance by using residual layers with masking and two bidirectional layers (Bidir) with two GRU cells in each layer.

Dynamic time warping K-nearest neighbors (DTW-KNN). DTW-KNN [29] was used as a baseline for our considerations. It uses the DTW algorithm to match a query signal (input time series) against the database of signals recorded for each terrain and to compute the distance between these sequences. For each classification, we determine the three closest matches (k=3) to the query signal by matching it against the whole training database; the query is then assigned a terrain class if the majority (at least 2) of the matched sequences represent that class. In our work, we used an efficient implementation of DTW-KNN available in the Python Sktime [40] library; a minimal usage sketch is shown below.
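The following is a minimal sketch of this baseline on top of sktime's distance-based classifier. The class name and the (instances, channels, time points) array layout follow recent sktime releases and may not match the exact version used in the experiments; all data below are placeholders.

```python
import numpy as np
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier

# placeholder F/T windows: 100 instances, 6 axes, 160 time samples
X_train = np.random.randn(100, 6, 160)
y_train = np.random.randint(0, 8, size=100)   # 8 terrain classes

# k=3 neighbors with DTW distance, majority vote over the matches
clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, distance="dtw")
clf.fit(X_train, y_train)
print(clf.predict(np.random.randn(1, 6, 160)))  # classify one query signal
```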
Random convolutional kernel transform (ROCKET). ROCKET [33] uses a large number of convolutional kernels to transform input time series into a feature space, which is then used as input to a classical, trained linear classifier. In contrast to artificial neural networks, the kernels' weights are not learnable; hence, the computational cost of training is low. The initial design was intended primarily for univariate signals. Therefore, in our work, we transformed each axis of the force/torque signal separately to a feature space using 10,000 random kernels. Once in a feature space, we concatenated the different information channels to form one feature sequence, which was then passed to the linear classifier, as sketched below. To the best of our knowledge, no prior work in robotics used this method for terrain recognition.
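A hedged sketch of the per-axis ROCKET pipeline described above, built from sktime's Rocket transform and a ridge classifier; the transform's output type and the chosen regularization grid are assumptions, and the data are placeholders.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import Rocket

X = np.random.randn(200, 6, 160)      # placeholder: 6 F/T axes, 160 samples
y = np.random.randint(0, 8, size=200)

features = []
for axis in range(X.shape[1]):        # transform each axis separately
    rocket = Rocket(num_kernels=10_000, random_state=0)
    features.append(np.asarray(rocket.fit_transform(X[:, axis:axis + 1, :])))
X_feat = np.concatenate(features, axis=1)  # concatenate channels into one feature sequence

# kernel weights stay fixed; only the linear classifier is trained
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(X_feat, y)
```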
Temporal convolutional network (TCN). The convolution-based TCN model for sequence prediction, initially proposed in [34], fits our requirements while handling longer sequences than recurrent neural networks. It does not contain any form of memory or recurrence while providing state-of-the-art results. The original TCN returns a sequence, which does not apply to our task; therefore, we equipped it with a Multi-Layered Perceptron (MLP) layer at the end to predict terrain classes, as in the sketch below. We evaluated several configurations of the TCN that differ in the number of convolution levels (LE), hidden units per level (HI), and the number of hidden neurons in the MLP:

• Light: LE=4, HI=8, MLP=128,
• Base: LE=8, HI=16, MLP=256,
• Large: LE=16, HI=25, MLP=256.

The TCN implementation was based on the original PyTorch code provided by the authors of [34].
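A sketch of the MLP classification head added on top of the TCN. It assumes the TemporalConvNet module from the reference PyTorch implementation of [34] is vendored locally as tcn.py; keeping only the last step of the output sequence is one plausible reduction, and the sizes correspond to the Base variant above.

```python
import torch
import torch.nn as nn
from tcn import TemporalConvNet  # assumption: vendored from the reference code of [34]

class TCNClassifier(nn.Module):
    def __init__(self, in_channels=6, levels=8, hidden=16, mlp=256, classes=8):
        super().__init__()
        self.tcn = TemporalConvNet(in_channels, [hidden] * levels)
        self.head = nn.Sequential(nn.Linear(hidden, mlp), nn.ReLU(),
                                  nn.Linear(mlp, classes))

    def forward(self, x):                 # x: (batch, channels, time)
        seq = self.tcn(x)                 # TCN outputs a sequence: (batch, hidden, time)
        return self.head(seq[:, :, -1])   # reduce to the last step, classify with the MLP

logits = TCNClassifier()(torch.randn(4, 6, 160))  # -> (4, 8)
```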
3.2. Improved haptic transformer (HAPTR2)

The Haptic Transformer comes from our previous work [39], and it presents an initial adaptation of transformer-based neural networks to the terrain classification problem. With a piece of domain knowledge about the problem itself, we were able to optimize its architecture into an improved Haptic Transformer, called HAPTR2, as presented in Fig. 2.

Fig. 2. Improved HAPtic TRansformer is the follow-up model based on [39] with the Modality Attention Layer (MAL).

The main novelty of the proposed solution is the Modality Attention Layer (MAL), based on the Multi-Head Attention mechanism introduced in [35]. Firstly, the input time series are split according to modalities (i.e., the force and torque signals measured at the robot feet are processed separately) and passed through 1D convolutional layers. These, in turn, transform the multi-dimensional signals into flattened modality representations of the same length as the inputs. Learnable linear layers process and shape each modality representation into so-called queries (Q), keys (K), and values (V) for the dot-product attention layer. There are as many queries as time samples in the input signal. Each sample is weighted against the keys (representing the modalities) and scaled by the factor 1/√d_k, where d_k is the dimensionality of the multiplied queries and keys. Therefore, the activation of the MAL is described by the equation:

Attention(Q, K, V) = softmax(QK^T / √d_k) V.   (1)

The keys form a d_k × d_k matrix whose d-th row represents the d-th modality, d = 1, 2, ..., d_k. Therefore, the closer a query is to the corresponding key, the higher the weight associated with that modality. When forces and torques are used as the two modalities, d_k equals 2 and the keys form a 2 × 2 matrix. Finally, a softmax is applied to the scaled dot product to obtain a probability distribution. Fig. 3 presents the principle of operation of the MAL. Typically, in transformer architectures, the self-attention layer produces weights between all time steps, e.g., in Natural Language Processing, to reveal the contextual relations between the first and other words in a sentence. In our work, however, we organized the attention layer to find importance weights between time steps and entire modalities rather than between all pairs of time steps in the input signal.

Fig. 3. The visual explanation of the Modality Attention Layer used in our experiments. We used it to increase the robustness of the HAPTR model.

A compact sketch of this layer is given below.
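A minimal, illustrative sketch of the MAL built around Eq. (1). The real HAPTR2 wiring lives in the public repository, so the pooling of keys and values via a fixed window length T=160 and all layer sizes here are assumptions; with forces and torques as the two modalities, d_k = 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAttention(nn.Module):
    def __init__(self, axes_per_modality=3, num_modalities=2, window=160):
        super().__init__()
        self.d_k = num_modalities
        # one 1D convolution per modality flattens its 3 axes into a single channel
        self.convs = nn.ModuleList(
            nn.Conv1d(axes_per_modality, 1, kernel_size=3, padding=1)
            for _ in range(num_modalities))
        self.to_q = nn.Linear(num_modalities, num_modalities)  # one query per time sample
        self.to_k = nn.Linear(window, num_modalities)          # one key per modality
        self.to_v = nn.Linear(window, num_modalities)          # one value per modality

    def forward(self, force, torque):           # each input: (batch, 3, T)
        m = torch.cat([c(x) for c, x in zip(self.convs, (force, torque))], dim=1)
        q = self.to_q(m.transpose(1, 2))        # (batch, T, d_k)
        k, v = self.to_k(m), self.to_v(m)       # (batch, d_k, d_k) each
        att = F.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)  # Eq. (1)
        return att @ v                          # per-time-step modality weighting

out = ModalityAttention()(torch.randn(4, 3, 160), torch.randn(4, 3, 160))  # (4, 160, 2)
```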
Apart from the MAL, the complete list of changes in HAPTR2 with respect to the previous HAPTR model includes:

• an improved learning rate scheduling [41] (see the sketch after this list),
• the output of the MAL concatenated with the original input signals along the channel axis,
• an average pooling layer replacing the mean operation before the MLP classification layer,
• a batch normalization layer included in the MLP classification layer.
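For the first item, the cited schedule [41] has a direct counterpart in PyTorch. A minimal sketch; the optimizer type, learning rate, and restart period are assumptions, not the values used for HAPTR2.

```python
import torch

model = torch.nn.Linear(6, 8)  # stand-in module
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)

for epoch in range(100):
    # ... forward/backward pass over the training set would go here ...
    opt.step()
    sched.step()  # cosine decay with periodic warm restarts, as in SGDR [41]
```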
Before describing the data, we would like to draw a line between identification and classification in our context. Terrain identification: the term identification indicates the presence of multiple stages involving detection, recognition, classification, and other techniques. The aim behind terrain identification goes beyond naming the terrain type; it involves all the information needed for a robot to traverse and interact with the terrain safely. Terrain classification: in machine learning, classification is the process of categorizing a set of data into predefined classes. It is the most utilized method in the field of terrain identification. However, terrain classification alone does not provide comprehensive information about traversability, since it often misses key physical parameters of the terrain. Terrain classification is a step toward robot autonomy, but it requires an extension to reach the level of complete terrain identification.

4. Datasets

Until recently, terrain classification algorithms were mainly evaluated on sequences explicitly recorded by the authors of the dataset and not made available. Without public data, it was almost impossible to compare the results of different methods, as each robot and each experimental setup was utterly different. Fortunately, there are currently two publicly available datasets, PUTany and QCAT, that we used to evaluate the proposed approaches. The following subsections introduce these datasets and explain how each dataset was used for training and testing purposes.

4.1. PUTany dataset

In our experiments, we used the PUTany dataset [9], which was recorded during walking sessions of the ANYmal robot on different real-world terrain samples with no additional exploratory moves. The created map with eight different terrain types is presented in Fig. 4.

Robot and sensor. During the walking session, the ANYmal robot presented in Fig. 1 was equipped with sensorized, compliant feet [42] that consist of flat contact surfaces with a range of motion of 50° for the pitch and 30° for the roll axis. The robot was walking on a flat surface and a ramp, requiring the feet to adapt to the terrain type and shape. The force/torque (F/T) sensors placed inside the feet were custom-made and can sense up to 1000 N in the Z direction (along the robot's leg), 400 N in the ground surface plane, and up to 10 Nm of torque in each axis, at a frequency of 400 Hz. We cropped the F/T signals to 160 samples registered during the instant of contact.

Environment. Eight different terrains are used in the experiments (Fig. 5): carpet, artificial grass, rubber, sand, foam, rocks, ceramic tiles, and PVC. One can observe that the adaptive foot slopes differently depending on the terrain type, properties, and shape. To assure walking stability, we commanded the robot to perform a statically stable gait with only one leg in a flight phase at a time. After a flight phase, all feet are placed on the ground; then, a flight phase starts all over again with a different leg.
Fig. 4. The map with eight terrain classes and a slope was used to register the PUTany dataset. The ground truth map for data labeling was registered with a 3D laser scanner (SURPHASER 100HSX), while the walking robot's pose was determined with the OptiTrack system. Colors correspond to different classes: red: rubber, green: carpet, blue: PVC, black: artificial grass, yellow: ceramic tiles, brown: sand, dark blue: rocks, gray: foam.

Data description. For the terrain classification task, we randomly divided the recorded terrain samples into 3443 training, 1148 validation, and 1148 test samples, with each sample containing 160 force and torque measurements (6 signals in total). The dataset comes from a continuous walking session and thus includes an uneven number of examples for the different classes (Fig. 6 presents their distribution). The dataset does not differentiate feet, nor does it test performance with signals of different lengths. The dataset is publicly available.

4.2. QCAT dataset

To validate the HAPTR models on another state-of-the-art dataset, we chose the QCAT dataset [43]. The dataset was recorded during multiple walking sessions of the quadrupedal, self-configurable robot called DyRET [44]. Measurements come from the Inertial Measurement Unit (IMU) mounted in its base and from spherical force sensors mounted at the tips of the robot's legs.

Robot and sensor. DyRET [44] is a four-legged robot with a dynamic morphology, designed to adapt the lengths of its limbs to different terrains. The kinematic chain of its leg is composed of two rotational joints intended for locomotion and two slow-changing prismatic joints for elongating and shortening the leg. It had an Xsens MTI-30 IMU mounted in its base, consisting of a 3-axis gyroscope, a 3-axis accelerometer, and a 3-axis magnetometer. Each foot was equipped with a 3-axis Optoforce OMD-20-SH-80N force sensor.

Environment. The QCAT dataset was collected at CSIRO's QCAT site in Brisbane, Australia, in November 2019. The robot traversed 6 different terrains: concrete, grass, gravel, mulch, dirt, and sand. Each walking session was composed of 10 trials at 6 different speeds for 8 steps each. All sensors worked at a frequency of 100 Hz. There was no information about the inclination of the traversed terrains.

Data description. The QCAT dataset consists of 2880 force and IMU samples: 6 terrains × 10 trials × 6 speeds × 8 steps. Each sample has 22 dimensions: 4 × 3-axis force sensors, 2 × 3-axis angular velocities/linear accelerations, and a 4-axis quaternion representing the base's orientation. Each signal is 662 values long. Similar to the PUTany dataset, we did not differentiate feet, nor did we evaluate performance for each foot separately. The dataset is publicly available at https://data.csiro.au/collection/csiro:46885v2. In our experiments, we propose to treat the forces from all legs as one modality, while the IMU time series is the second one; however, other splits are also possible. The sketch below lines up both datasets as arrays.
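To make the two data layouts concrete, the sketch below arranges them using the shapes stated above; the variable names and the zero-filled loading step are placeholders, not the datasets' actual file format.

```python
import numpy as np

# PUTany: 3443/1148/1148 train/val/test windows, 160 time steps, 6 F/T axes
putany_train = np.zeros((3443, 160, 6), dtype=np.float32)

# QCAT: 2880 samples (6 terrains x 10 trials x 6 speeds x 8 steps), 662 steps,
# 22 dims = 4 x 3-axis forces + 2 x 3-axis IMU channels + 4-axis quaternion
qcat = np.zeros((2880, 662, 22), dtype=np.float32)

# proposed modality split: all leg forces vs. the IMU time series
qcat_forces, qcat_imu = qcat[..., :12], qcat[..., 12:]
```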
5. Results and discussion

We start with a formal introduction of the evaluation measures. Then, we perform the accuracy evaluation and consider the performance based on the model size and processing times. Based on these results, we choose the most promising solution, which is compared to the state-of-the-art RNNs+FCL solution on two datasets. The section concludes with a robustness analysis of the best proposed method.

5.1. Real-world requirements for the terrain classification

Most commonly, the performance of supervised terrain recognition algorithms is measured using the accuracy (Acc), defined as:

Acc = c / t,   (2)

where c is the number of correct predictions and t is the total number of predictions. While the accuracy captures the overall performance of a solution, it might be skewed in favor of the recognition of overrepresented classes if the testing dataset is not balanced. We therefore introduce the measure Acc_min:

Acc_min = min_i Acc_i,   (3)

where Acc_i is the accuracy for the i-th terrain type. The goal of the Acc_min measure is to capture the performance of the classifier on the most challenging terrain type. Both measures are illustrated in the sketch below.
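A minimal transcription of Eqs. (2)-(3), assuming integer class labels in y_true and y_pred:

```python
import numpy as np

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)  # Eq. (2): c / t

def accuracy_min(y_true, y_pred):
    # Eq. (3): accuracy of the single worst-recognized terrain class
    return min(np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true))

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])
print(accuracy(y_true, y_pred))      # 4/6 correct -> 0.667
print(accuracy_min(y_true, y_pred))  # class 0 is worst -> 0.5
```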
The accuracy of a solution captures its ability to recognize the terrain, but it does not capture whether the method is viable to be deployed and used on a real robot. The proposed classification method must have a low computational burden and a fast inference time to work in real time, preventing any possible damage. We assume that an inference time below 10 ms would satisfy these needs, as the typical control loops of the legs are executed hundreds of times per second. When it comes to the model size, we want to minimize it, as a genuine autonomous robot needs its GPU capabilities for a range of other tasks, e.g., object detection. Therefore, apart from the general accuracy, the accuracy on the most challenging terrain, the model size, and the inference time should be considered when choosing the best terrain classification algorithm.
5.2. Accuracy evaluation on PUTany dataset

To compare our method with the published results, we used the PUTany dataset in a train/test split, where we chose the test data from a different run than the data used for training. The results obtained by the classical algorithms (DTW-KNN, ROCKET), the classical deep learning solution (TCN), and the transformer-based approaches (HAPTR, HAPTR2) are compared to the state-of-the-art CNN-RNN in Table 1.

Fig. 5. Terrain types included in the dataset: carpet (a), artificial grass (b), rubber (c), sand (d), foam (e), rocks (f), ceramic tiles (g), PVC (h). Each terrain gave unique F/T feedback enabling the classification of these samples.

Fig. 6. The distribution of terrain samples for the PUTany dataset.

Table 1
The accuracy comparison measured on the test set of the PUTany dataset.

Method       Variant   Acc [%]   Acc_min [%]
DTW-KNN      -         74.0      54.4 (PVC)
ROCKET       -         84.9      47.3 (PVC)
TCN          Light     84.5      68.7 (Art. grass)
TCN          Base      86.9      65.1 (Art. grass)
TCN          Large     87.5      72.3 (Art. grass)
HAPTR        Light     83.3      56.6 (Art. grass)
HAPTR        Base      90.3      80.7 (Art. grass)
HAPTR        Large     91.7      74.7 (Art. grass)
HAPTR2       Light     91.7      80.7 (Art. grass)
HAPTR2       Base      92.2      81.9 (Art. grass)
HAPTR2       Large     92.7      81.9 (Art. grass)
CNN-RNN [9]  -         93.0      86.7 (Art. grass)

The results obtained on the test set show that the deep learning methods (TCN, HAPTR, HAPTR2, CNN-RNN) achieve better performance than any of the classical approaches (DTW-KNN, ROCKET). The most basic and universal DTW-KNN approach achieved the worst performance (74%), which was improved upon by ROCKET (84.9%). The overall satisfactory accuracy of 84.9% for ROCKET comes with a poor ability to perform on the most challenging terrain (PVC), only 47.3%, which might be insufficient to take any action based on the obtained prediction.

Among the deep learning methods, the respective variants of HAPTR outperformed the classical approach (TCN). Similarly, the improved HAPTR2 outperformed HAPTR. For all of these methods, we observed that increasing a network's size improved its accuracy, and it also improved Acc_min. Nevertheless, none of the presented solutions could overcome the state-of-the-art CNN-RNN solution, which reported the best accuracy of 93.0% with the best Acc_min = 86.7%.
In contrast to [9] or [6], we measured the accuracy without using cross-validation (CV). In our work, we chose to use a test set independent of the training set, as we consider it the only way to truly measure the performance of a network in a setup resembling real-world operation. The usage of CV might not reflect this final performance, as using testing samples close to the training ones skews the final results. To verify this claim, we compared results obtained using n-fold CV with the accuracy obtained using an independent testing sequence. All of the analyzed solutions achieved accuracy lower by several percentage points on the testing sequence than with n-fold CV. Most notably, even the best method, CNN-RNN, reported 94.1% when using CV [9], while we were only able to obtain 93.0% on the independent testing sequence.

5.3. Accuracy depending on the model size

The best performance on a chosen dataset is often obtained with the largest networks, which are not practical and can be considered a dedicated solution for that particular dataset. Therefore, the applied deep learning community is currently more interested in efficiently formulated artificial neural networks, proving that an architecture is more capable than its predecessors with the same number of parameters, as in the case of EfficientNetV2 [45]. We present a similar analysis, with the accuracy as a function of the number of learnable parameters, in Fig. 7. We omitted DTW-KNN from this analysis as it is a non-learnable method.

Fig. 7. The accuracy as a function of the number of parameters reveals the efficiency of the applied method. Notice that the parameter axis is in logarithmic scale. In this context, HAPTR2 significantly outperforms other algorithms.

In Fig. 7, one can notice that the CNN-RNN achieved the highest accuracy, but its substantial number of learnable parameters would not fit the restricted computational resources of a mobile robot setup. Still, it appears to be more efficient than the classical TCN approach or the even less efficient ROCKET. Both transformer-based solutions, HAPTR and HAPTR2, require an order of magnitude fewer parameters to achieve performance similar to CNN-RNN. HAPTR2 contains more learnable parameters than HAPTR, as it adds the attention mechanism, but achieves significantly better results. Therefore, we strongly advocate using the family of transformer-based solutions whenever the efficiency of the method is the primary factor. The parameter counts used in this comparison can be read directly from each model, as in the sketch below.
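A generic one-liner for the parameter axis of Fig. 7, not tied to any specific model:

```python
import torch.nn as nn

def num_learnable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(num_learnable(nn.Linear(6, 8)))  # 6*8 weights + 8 biases = 56
```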
5.4. Inference time evaluation

The number of learnable parameters determines the ability to deploy networks in resource-constrained environments. The network's size also impacts the training time, which we might consider irrelevant, as the networks were trained offline. The remaining aspect of a chosen model size is the inference time, which determines the real-world viability of the proposed solution, as a prolonged processing time might be unacceptable from the perspective of the control performed based on this result. Therefore, we gathered the processing times on a CPU (Intel i7-9750H @ 2.60 GHz) and a GPU (NVIDIA GeForce GTX 1660 Ti Mobile) for all of the considered methods in Table 2 to determine whether the presented solutions could be used on a real robot. The sketch after Table 2 shows how such single-sample latencies can be measured.
Table 2
Inference time for a single sample for the evaluated terrain classification algorithms.

Method       Variant   CPU [ms]          GPU [ms]
DTW-KNN      -         1619.86 ± 11.11   -
ROCKET       -         180.62 ± 13.05    -
TCN          Light     1.37 ± 0.29       1.86 ± 0.05
TCN          Base      2.97 ± 0.24       3.37 ± 0.04
TCN          Large     16.82 ± 0.91      6.43 ± 0.06
HAPTR        Light     1.43 ± 0.09       1.61 ± 0.15
HAPTR        Base      2.91 ± 0.22       2.73 ± 0.06
HAPTR        Large     5.60 ± 0.59       4.87 ± 0.15
HAPTR2       Light     1.38 ± 0.03       1.53 ± 0.11
HAPTR2       Base      2.25 ± 0.05       2.13 ± 0.05
HAPTR2       Large     3.99 ± 0.09       3.39 ± 0.05
CNN-RNN [9]  -         11.68 ± 0.83      30.60 ± 5.30
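A hedged sketch of single-sample latency measurement; warm-up repetitions and CUDA synchronization matter, since otherwise the GPU numbers would only capture kernel-launch time. The model and the sample are placeholders.

```python
import time
import torch

def latency_ms(model, sample, device, reps=100):
    model, sample = model.to(device).eval(), sample.to(device)
    with torch.no_grad():
        for _ in range(10):            # warm-up passes
            model(sample)
        if device == "cuda":
            torch.cuda.synchronize()   # wait for queued GPU work
        t0 = time.perf_counter()
        for _ in range(reps):
            model(sample)              # batch size 1, as on the robot
        if device == "cuda":
            torch.cuda.synchronize()
    return 1000 * (time.perf_counter() - t0) / reps

print(latency_ms(torch.nn.Linear(6, 8), torch.randn(1, 6), "cpu"))
```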

At first, let us consider a mobile robot equipped with a powerful CPU but no GPU. The results of the accuracy comparison as a function of the mean inference time are presented in Fig. 8. In such a case, the DTW-KNN method performed poorly, taking over 1600 ms to produce a classification result. This inference time would increase further if we added more signals, making this approach slow and unscalable for any real-world deployment. A too long inference time was also observed for ROCKET, which returned results in 180 ms; this is too long to implement any reaction-based behavior on top of the obtained results. For the deep-learning-based methods, we obtained significantly lower inference times. The shortest CPU inference time was reported for HAPTR2, but it is also worth noting that HAPTR in all variants, and the TCN in the Light and Base variants, meet the previously stated requirement of a 10 ms terrain classification time. In this comparison, the state-of-the-art CNN-RNN solution did not meet our criteria, exceeding the set threshold for the inference time and thus making it unsuitable for real-time operation.

Fig. 8. The accuracy of deep learning models as a function of the mean inference time on CPU. Notice that the inference time axis is in logarithmic scale.

The results for a robot equipped with a GPU are presented in Fig. 9, showing the trade-off between accuracy and inference time on a GPU. The implementations of the classical methods (DTW-KNN and ROCKET) were not GPU-friendly and thus are not presented in the plot. For the deep learning solutions, we saw trends similar to the processing performed on a CPU. Surprisingly, we did not observe acceleration in most cases when using a GPU. In our experiments, we defined the inference time as the total time needed for the calculations and related tasks; it includes the time of transmitting data between the main and the GPU's memory, and this overhead is present in all the measurements of inference time included in the comparison. A reduced time is visible only for the TCN in the Large variant, which was brought down to the accepted threshold. Much to our surprise, the inference time of the CNN-RNN increased on the GPU. We assume that poor optimization of the RNN model on this particular GPU architecture was the root cause. Another probable reason is the sequential processing in recurrent units, which cannot fully utilize the GPU's computational power, as GPUs were designed primarily for parallel processing. Eventually, HAPTR2 appeared to be the most efficient method based on the accuracy and the inference time on both CPU and GPU.

Fig. 9. The accuracy as a function of the mean inference time on GPU. The time axis is in logarithmic scale.

5.5. The choice and analysis of the best solution

Based on the results presented in the previous subsections, it is clear that the choice of the terrain classification method should not be based solely on its accuracy. When we consider the accuracy on the most challenging terrain, the model size, and the inference time, we get a complex decision process that might yield several correct choices depending on the hardware platform. Overall, we indicate that the HAPTR2 family provides the best efficiency, regarded as the best trade-off between the accuracy and the remaining requirements. Therefore, for further analysis, we chose HAPTR2-Light. Consequently, we present the confusion matrix of this approach in Fig. 10 to provide more insight into the observed performance.
Fig. 10. The confusion matrix presents the per-class accuracy obtained with HAPTR2-Light.

The class that caused the lowest Acc_min for all tested deep learning models was artificial grass. One can observe that it was most often misinterpreted as rubber, in 9.6% of predictions. Moreover, rubber was also the most common mistake for carpet, as the system was wrong in 5.2% of cases. We assume that this is due to the designed terrain: carpet was put on one slope, artificial grass was placed close to the slope's beginning and end, and rubber was on the second slope. That confirms that terrain recognition is more challenging in non-flat areas. There is still room for further improvements, i.e., data augmentation targeting these cases or the utilization of orientation sensors to incorporate information about the inclination into the predictions. However, the highest classification error among all classes was made for the PVC terrain, incorrectly recognized as sand in 10.5% of the predictions of that class. The root cause was the fact that the two terrains were neighboring each other; hence, sand particles were present on the PVC terrain, which misled the perception system. As one can observe, the confusion matrix is related not only to the similarity between terrains but also to their placement on the map, which might be an informative cue in a localization task [9].

5.6. Robust robotic perception

A robot operating in the real-world environment has to adapt to changing conditions that cannot be predicted, and thus trained for, before the deployment. To achieve the desired generalization ability and robustness, we equipped our HAPTR2-Light model with the Modality Attention Layer. The MAL is responsible for the dynamic adaptation of the weights of the sensing modalities. In this section, we evaluated our best-performing model with and without this module to show its influence on predictions. It is important to note that not all modalities are of equal importance, which we first validated on the PUTany dataset. The HAPTR2-Light model trained solely on torques achieved 88% accuracy on the test dataset, while the same model trained on forces only resulted in 60.0%. This experiment revealed the existence of a leading modality in the PUTany dataset. Nevertheless, we achieved the highest performance for joint predictions with both modalities; due to this fact, we consider them complementary.

Firstly, as the MAL can work as a standalone module, we verified the influence of that layer on the overall accuracy of our network. We present the results of the HAPTR2-Light model obtained on the PUTany test dataset with and without the MAL in Table 3. We observe a minor improvement in the general classification accuracy (0.4%) due to the input modality weighting, but the Acc_min metric was higher by 4.7% when the architecture included the MAL. That indicates that the model is better suited for unbalanced training data.

Table 3
Classification accuracy of the HAPTR2-Light models trained with and without the additional modality attention module.

HAPTR2-Light   Acc [%]   Acc_min [%]
with MAL       91.7      80.7 (Art. grass)
without MAL    91.3      76.0 (Art. grass)

Mobile robots operate in various conditions that can influence the sensory measurements. In most cases, operation in an untrained environment results in performance inferior to that on samples familiar to the system. The goal is to design a system that can operate despite these changes. We verified the performance of our network in two simulated scenarios. In the first, we changed the robot's payload, which influences the distribution of forces. In the second, we simulated a sensory failure that might occur when a mobile robot is traversing harsh terrain. We mimicked these cases by adding a particular type of noise to the already normalized measurement input, which was determined to have zero mean and a unit standard deviation.
The inspiration for adding uniform noise in our setup was a real-world use case of inspecting mine tunnels equipped with conveyor belts. Along the way, the robot was experiencing noise from a large industrial transformer and electric motors, which were the root of the noise on the analog sensors.

In the payload change scenario, we modeled an additional increase in mass by adding a bias to the forces. Each step consists of two phases, i.e., a stance and a flight. We assumed that a robot's leg swings from 60 to 15 degrees from the normal to the ground during the stance phase. In this scenario, we added a simulated payload vector that acts along the gravity vector and has an increasing length. We are aware that changes in payload would influence the torques as well, but we did not find a convincing method to simulate this effect. Firstly, the deep learning model was trained on the original data, and its weights remained fixed during the simulation. Then, we added a bias in the range of 0.0 to 2.0 to the input force signal to measure the model's performance. These values correspond to the robot's weight increasing from its initial value to three times the original weight. Such a significant change is unrealistic, but it probes the generalization capabilities of the network. The obtained accuracy as a function of the simulated increase in weight is presented in Fig. 11. In this simulation, the HAPTR2-Light with the MAL achieved higher accuracy on the PUTany test dataset across the whole range of artificial biases introduced in the input modalities. Moreover, for an unexpected 3× change in weight, the accuracy drops by approximately 7.5% when the MAL is not used.

Fig. 11. HAPTR2-Light accuracy on the PUTany test dataset, in which a simulated payload was added to the force modality.

In the simulated case of a sensory failure, we add uniformly distributed noise from a range of 0.0 to 0.25 to the sensor measurements. As in the previous scenario, we tested the performance on the PUTany dataset, measuring the accuracy at different noise levels. Fig. 12 shows the accuracy obtained for the data degradation scenario. As one can observe, a significant improvement in the model's robustness can be noticed for both input modalities, achieving over 10% accuracy improvement for the highest noise levels when the MAL is used. Moreover, the HAPTR was more robust to changes in the force measurements, suggesting that the torque measurements might have a higher impact on the final performance of the network.

Fig. 12. HAPTR2-Light accuracy on the PUTany test dataset, in which a uniform noise with an increasing range was added to one input modality (a force or a torque) to simulate a sensor failure.

Both perturbation models are summarized in the sketch below.
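A sketch of the two perturbations applied to the normalized test signals: a constant bias on the force modality (simulated payload) and additive uniform noise (simulated sensor degradation). The sweep ranges follow the text; the (time, axes) signal layout is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_payload(force, bias):          # bias swept from 0.0 to 2.0
    return force + bias                # constant offset on the normalized forces

def add_sensor_noise(signal, level):   # level swept from 0.0 to 0.25
    return signal + rng.uniform(0.0, level, size=signal.shape)

force = rng.standard_normal((160, 3))  # one normalized force window
degraded = add_sensor_noise(add_payload(force, 0.5), 0.1)
```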

5.7. Comparison to the state-of-the-art

The terrain classification algorithms for walking robots are mostly incomparable due to the different types of terrain used and the different hardware platforms. This is changing, however, due to the emergence of public datasets that facilitate an impartial comparison between methods. In our comparison, we decided to compare our results to the most recent method, RNNs+FCL [6]. In that article, the authors evaluated the RNNs+FCL method on the PUTany dataset and on their own dataset, QCAT (which was made publicly available). To ensure a proper comparison, we evaluated our HAPTR2-Light using the same evaluation procedure on both of these datasets, using their cross-validation steps with the same data splits. The obtained results are presented in Table 4.

Table 4
Classification accuracy of the proposed HAPTR2-Light and the state-of-the-art RNNs+FCL measured on the QCAT and PUTany datasets with 10-fold cross-validation, providing the mean, standard deviation (SD), and min and max values across folds.

Dataset   Method          Mean [%]   SD [%]   Min [%]   Max [%]
PUTany    HAPTR2-Light    93.85      0.82     92.68     95.29
PUTany    RNNs+FCL [6]    93.20      0.89     92.06     95.39
QCAT      HAPTR2-Light    97.33      1.21     95.49     98.96
QCAT      RNNs+FCL [6]    96.60      0.89     95.49     98.61

Our HAPTR2-Light outperformed RNNs+FCL on both datasets, with an accuracy margin of 0.64% for PUTany and 0.73% for QCAT. The results achieved similar standard deviations, suggesting that most methods would probably work similarly in the real-world environment. However, our model is significantly smaller when comparing the number of parameters and is more suitable for deployment on a real robot than RNNs+FCL. According to the implementation shared by the authors of [6], the RNNs+FCL consists of 395,106 trainable variables with recurrent units, while our HAPTR2-Light had only 12,568 weights in total, which is over 30 times fewer. Moreover, we measured an inference time equal to 130.44 ms on a GPU and 38.44 ms on a CPU for the RNNs+FCL. Similar to [9], the inference on a GPU took longer than on a CPU. As previously stated, GPUs are preferable processing units only when we process large batches of data. Our experiments focused on the real-time robotics perspective, which prefers the inference of a single sample to reduce the delay between the measurement and the processed result.

6. Conclusions

HAPTR and HAPTR2 are novel methods to tackle the terrain recognition problem with transformer-based neural network architectures. In our work, we primarily focused on real-world applicability and compared our approach with multiple data-driven methods, including adapted non-deep learning
(DTW-KNN, ROCKET) and deep learning models (TCN, CNN-RNN). In our benchmark, we paid attention to the accuracy, the number of learnable parameters, and the inference time of each method. Our tests revealed that HAPTR2 provides the best trade-off between the accuracy and the number of parameters directly impacting the inference time. We also observed that the inference of the state-of-the-art CNN-RNN takes too long to be applied on a real robot, proving the need for a broader evaluation than a direct accuracy measurement.

Moreover, we tackled the robustness of robotic perception systems and proposed the Modality Attention Layer in HAPTR2. By assigning weights to entire modalities (forces, torques, inertial sensor readings) using the dot-product attention layer, we let the model self-attend to the relevant parts of an input data stream. It resulted in increased robustness of the perception system against payload changes and deterioration of signal quality. Notably, the MAL was implemented as a universal, standalone module that could assign weights to any user-defined modalities, creating new opportunities for future research. Afterward, to establish a fair comparison with the current state of the art, we investigated the performance of HAPTR2 on the QCAT dataset. The results showed that HAPTR2 outperformed the complex RNNs+FCL approach considering accuracy and inference time while having over 30× fewer learnable parameters.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the European Union's Horizon 2020 Research and Innovation programme under grant agreement No 780883, THING. M.R. Nowicki is supported by the Foundation for Polish Science (FNP).

References

[1] M. Hutter, et al., ANYmal - a highly mobile and dynamic quadrupedal robot, in: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 38-44.
[2] A. Bouman, et al., Autonomous Spot: Long-range autonomous exploration of extreme environments with legged locomotion, in: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 2518-2525.
[3] A. Roennau, G. Heppner, M. Nowicki, J. Zoellner, R. Dillmann, Reactive posture behaviors for stable legged locomotion over steep inclines and large obstacles, in: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 4888-4894.
[4] D. Bellicoso, et al., Perception-less terrain adaptation through whole body control and hierarchical optimization, in: IEEE-RAS International Conference on Humanoid Robots, 2016, pp. 558-564.
[5] H. Kolvenbach, C. Bärtschi, L. Wellhausen, R. Grandia, M. Hutter, Haptic inspection of planetary soils with legged robots, IEEE Robot. Autom. Lett. 4 (2) (2019) 1626-1632.
[6] A. Ahmadi, T. Nygaard, N. Kottege, D. Howard, N. Hudson, Semi-supervised gated recurrent neural networks for robotic terrain classification, IEEE Robot. Autom. Lett. 6 (2) (2021) 1848-1855, arXiv:2011.11913.
[7] J. Bednarek, M. Bednarek, L. Wellhausen, M. Hutter, K. Walas, What am I touching? Learning to classify terrain via haptic sensing, in: IEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 7187-7193.
[8] J. Bednarek, M. Bednarek, P. Kicki, K. Walas, Robotic touch: Classification of materials for manipulation and walking, in: IEEE International Conference on Soft Robotics (RoboSoft), 2019, pp. 527-533.
[9] R. Buchanan, J. Bednarek, M. Camurri, M.R. Nowicki, K. Walas, M. Fallon, Navigating by touch: Haptic Monte Carlo localization via geometric sensing and terrain classification, Auton. Robots 45 (2021) 843-857.
[10] D. Belter, J. Wietrzykowski, P. Skrzypczyński, Employing natural terrain semantics in motion planning for a multi-legged robot, J. Intell. Robot. Syst. 93 (2019), http://dx.doi.org/10.1007/s10846-018-0865-x.
[11] M.A. Hoepflinger, et al., Haptic terrain classification on natural terrains for legged robots, in: Proc. of the 13th International Conference on Climbing and Walking Robots (CLAWAR 2010), IEEE, 2010, pp. 785-792.
[12] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proc. of the 13th International Conference on Machine Learning (ICML '96), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996, pp. 148-156.
[13] X.A. Wu, T.M. Huh, R. Mukherjee, M. Cutkosky, Integrated ground reaction force sensing and terrain classification for small legged robots, IEEE Robot. Autom. Lett. 1 (2) (2016) 1125-1132.
[14] X. Li, W. Wang, J. Yi, Ground substrate classification for adaptive quadruped locomotion, in: Proceedings - IEEE International Conference on Robotics and Automation, 2017, pp. 3237-3243.
[15] X.A. Wu, T.M. Huh, A. Sabin, S.A. Suresh, M.R. Cutkosky, Tactile sensing and terrain-based gait control for small legged robots, IEEE Trans. Robot. 36 (1) (2020) 15-27.
[16] E. Tennakoon, T. Peynot, J. Roberts, N. Kottege, Probe-before-step walking strategy for multi-legged robots on terrain with risk of collapse, in: Proceedings - IEEE International Conference on Robotics and Automation, 2020, pp. 5530-5536.
[17] W. Bosworth, J. Whitney, S. Kim, N. Hogan, Robot locomotion on hard and soft ground: Measuring stability and ground properties in-situ, in: Proceedings - IEEE International Conference on Robotics and Automation, IEEE, 2016, pp. 3582-3589, http://dx.doi.org/10.1109/ICRA.2016.7487541.
[18] S. Fahmi, G. Fink, C. Semini, On state estimation for legged locomotion over soft terrain, IEEE Sens. Lett. 5 (1) (2021) 1-4, arXiv:2101.02279.
[19] S. Fahmi, M. Focchi, A. Radulescu, G. Fink, V. Barasuol, C. Semini, STANCE: Locomotion adaptation over soft terrain, IEEE Trans. Robot. 36 (2) (2020) 443-457, arXiv:1904.12306.
[20] J.C. Arevalo, D. Sanz-Merodio, M. Cestari, E. Garcia, Identifying ground-robot impedance to improve terrain adaptability in running robots, Int. J. Adv. Robot. Syst. 12 (1) (2015) 1, http://dx.doi.org/10.5772/59888.
[21] A.J. Ijspeert, A. Crespi, D. Ryczko, J.-M. Cabelguen, From swimming to walking with a salamander robot driven by a spinal cord model, Science 315 (5817) (2007) 1416-1420, http://dx.doi.org/10.1126/science.1138353.
[22] R.D. Maladen, Y. Ding, C. Li, D.I. Goldman, Undulatory swimming in sand: Subsurface locomotion of the sandfish lizard, Science 325 (5938) (2009) 314-318, http://dx.doi.org/10.1126/science.1172490.
[23] A.M. Barrett, M.R. Balme, M. Woods, S. Karachalios, D. Petrocelli, L. Joudrier, E. Sefton-Nash, NOAH-H, a deep-learning, terrain classification system for Mars: Results for the ExoMars rover candidate landing sites, Icarus 371 (2022) 114701, http://dx.doi.org/10.1016/j.icarus.2021.114701.
[24] P. Filitchkin, K. Byl, Feature-based terrain classification for LittleDog, in: IEEE International Conference on Intelligent Robots and Systems (IROS), IEEE, 2012, pp. 1387-1392.
[25] J. Christie, N. Kottege, Acoustics based terrain classification for legged robots, in: Proceedings - IEEE International Conference on Robotics and Automation, IEEE, 2016, pp. 3596-3603.
[26] K. Walas, Terrain classification and negotiation with a walking robot, J. Intell. Robot. Syst. 78 (3-4) (2015) 401-423.
[27] L. Wellhausen, A. Dosovitskiy, R. Ranftl, K. Walas, C. Cadena, M. Hutter, Where should I walk? Predicting terrain properties from images via self-supervised learning, IEEE Robot. Autom. Lett. 4 (2) (2019) 1509-1516.
[28] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, P.A. Muller, Deep learning for time series classification: a review, Data Min. Knowl. Discov. 33 (4) (2019) 917-963, arXiv:1809.04356.
[29] J. Lines, A. Bagnall, Time series classification with ensembles of elastic distance measures, Data Min. Knowl. Discov. 29 (3) (2015) 565-592.
[30] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. Acoust. Speech Signal Process. 26 (1) (1978) 43-49.
[31] H. Ismail Fawaz, et al., InceptionTime: Finding AlexNet for time series classification, Data Min. Knowl. Discov. 34 (6) (2020) 1936-1962, arXiv:1909.04939.
[32] C. Szegedy, S. Ioffe, V. Vanhoucke, A.A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI '17), AAAI Press, 2017, pp. 4278-4284.
[33] A. Dempster, F. Petitjean, G.I. Webb, ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels, Data Min. Knowl. Discov. 34 (5) (2020) 1454-1495, arXiv:1910.13051.
[34] S. Bai, J.Z. Kolter, V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, 2018, arXiv:1803.01271.
[35] A. Vaswani, et al., Attention is all you need, in: Advances in Neural Information Processing Systems, Vol. 30, NIPS, 2017.
[36] T.B. Brown, et al., Language models are few-shot learners, 2020, arXiv:2005.14165.
[37] A. Dosovitskiy, et al., An image is worth 16×16 words: Transformers for image recognition at scale, 2020, arXiv:2010.11929.
[38] N. Carion, et al., End-to-end object detection with transformers, in: Computer Vision - ECCV 2020, Springer International Publishing, Cham, 2020, pp. 213-229.
[39] M. Bednarek, M. Łysakowski, J. Bednarek, M.R. Nowicki, K. Walas, Fast haptic terrain classification for legged robots using transformer, in: 2021 European Conference on Mobile Robots (ECMR), 2021, pp. 1-7.
[40] M. Löning, et al., Sktime: A unified interface for machine learning with time series, 2019, arXiv:1909.07872.
[41] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, 2017, arXiv:1608.03983.
[42] G. Valsecchi, R. Grandia, M. Hutter, Quadrupedal locomotion on uneven terrain with sensorized feet, IEEE Robot. Autom. Lett. 5 (2) (2020) 1548-1555.
[43] R. Ahmadi, T. Nygaard, N. Kottege, D. Howard, N. Hudson, QCAT legged robot terrain classification dataset, 2020.
[44] T.F. Nygaard, C.P. Martin, J. Torresen, K. Glette, Self-modifying morphology experiments with DyRET: Dynamic robot for embodied testing, in: 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 9446-9452.
[45] M. Tan, Q.V. Le, EfficientNetV2: Smaller models and faster training, in: Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Proceedings of Machine Learning Research, vol. 139, PMLR, 2021, pp. 10096-10106.

Michał Bednarek graduated from Poznan University of Technology (PUT) and received his B.Sc. and M.Sc. in Automatic Control and Robotics. He is a Ph.D. candidate at the Faculty of Automatic Control, Robotics, and Electrical Engineering at PUT, researching the field of robust perception in robotics. Currently, he works as a Research Assistant in the Institute of Robotics and Machine Intelligence at PUT. His main point of scientific interest is the development of deep-learning-based robotic perception systems for walking and manipulation tasks.

Michal R. Nowicki graduated from Poznan University of Technology (PUT) in Poland, receiving a B.Sc. in Computer Science and both a B.Sc. and an M.Sc. in Automatic Control and Robotics. He received (with honors) a Ph.D. in Robotics in 2018 for his thesis concerning information fusion using factor graphs in SLAM. Currently, he is a Research Assistant Professor in the Institute of Robotics and Machine Intelligence at PUT, Poland. His research interests are related to optimization-based methods for multi-sensor fusion, including problems related to localization, mapping, and sensory setup calibration.

Krzysztof Walas graduated from Poznan University of Technology (PUT) in Poland, receiving an M.Sc. in Automatic Control and Robotics. He received (with honors) a Ph.D. in Robotics in 2012 for his thesis concerning legged robots' locomotion in structured environments. Currently, he is an Assistant Professor in the Institute of Robotics and Machine Intelligence at PUT, Poland. His research interests are related to robotic perception for physical interaction, applied both to walking and grasping tasks.
