TSP 2019


Automatic EMG-based Hand Gesture Recognition System using Time-Domain Descriptors and Fully-Connected Neural Networks

Ana Antonia Neacsu∗, George Cioroiu∗,†, Anamaria Radoi∗, and Corneliu Burileanu∗

∗ Center for Advanced Research on New Materials, Products and Innovative Processes, University Politehnica of Bucharest

† Research Institute for Artificial Intelligence "M. Draganescu", Romanian Academy

Bucharest, Romania

Abstract: Hand gesture recognition has numerous applications in medical (e.g., prosthetics), engineering (e.g., robot manipulation) and even military research areas (e.g., UAV control applications). This paper proposes a fast and accurate method to identify hand gesture categories based on electromyographic (EMG) signals registered by a commercial sensor (e.g., Myo Armband developed by Ontario-based Thalmic Labs), which is placed on the user's forearm. The proposed method is based on the extraction of time-domain features and a neural network architecture to perform the classification of the EMG signals. In order to evaluate the performance of the proposed algorithm, we use a publicly available dataset with 7 hand gesture categories. The proposed hand gesture recognition system achieves a 99.78% overall accuracy, which is comparable to that reported for other state-of-the-art methods, while being able to work in real-time conditions.

Keywords: hand gesture recognition, neural networks, EMG signals

This work was supported by the Ministry of Innovation and Research, UEFISCDI, project SPIA-VA, agreement 2SOL/2017, grant PN-III-P2-2.1-SOL-2016-02-0002 and project UAV Platforms, agreement 1SOL/2016, grant PN-III-P2-2.1-SOL-2016-01-0008.

I. INTRODUCTION

A significant amount of research has been focused on human-computer interaction based on gestures, vision and voice. Hand gesture recognition provides an intelligent, natural and convenient way of human-computer interaction (HCI). Its main applications are sign language recognition (SLR) and gesture-based control [1]. Sign language recognition has the goal of interpreting signs automatically using a computer, in order to help deaf people communicate more easily with society. Although it is highly structured and based on an alphabet and symbols, it also serves as a good basis for the development of general gesture-based human-computer interaction.

Surface Electromyography (sEMG) is the electrical manifestation of the neuromuscular activity associated with the contracting muscle. This technology may be used by physically impaired persons to control rehabilitation and assistive devices. EMG is also used in many research domains, e.g., biomechanics, gesture-based control applications, neuromuscular physiology, sign language recognition, military applications, games and virtual reality [2]. Among the challenges that accompany a recognition system based on sEMG signals, signal acquisition plays an important role. This process is, in general, affected by noise. For this reason, apart from using adequate signal processing techniques to reduce the effect of noise, powerful recognition algorithms are required.

In the field of gesture classification, several approaches have been proposed. For example, multistream Hidden Markov Models (HMM) have been combined with decision trees to determine 18 types of hand gestures based on data collected from accelerometer sensors and EMG data [1].

In [3], an integrated approach for the identification of daily hand movements with a view to control prosthetic members is proposed. The EMG signals were acquired using two electrodes attached on two specific muscles of the hand. Features are extracted in the frequency domain, while inserting a dimensionality reduction stage (based either on Principal Component Analysis (PCA) or the RELIEF feature selection algorithm [4]) before the application of the classifier. Results have shown that the information carried by the Empirical Mode Decomposition (EMD) extracted features may further increase the classification accuracy [5].

Recently, neural networks have shown great potential in solving gesture classification tasks [6], [7]. In [7], a convolutional network (ConvNet) is augmented with transfer learning techniques to leverage inter-user data from the first data set, alleviating the burden imposed on a single individual to generate a vast quantity of training data for sEMG-based gesture recognition. Specifically, the transfer learning technique proposed in [7] divides the recognition problem into learning a general mapping between the muscle activity and the hand gestures (i.e., source task) and learning a specific mapping (i.e., target task). The proposed transfer learning scheme achieved good results, but the features used are computationally complex, involving time-frequency transformations, e.g., the Continuous Wavelet Transform (CWT).

Another method for gesture recognition is mentioned in [8], where an EMG-based pattern recognition algorithm is proposed for classification of joint wrist angular position during flexion and extension movements from EMG signals. The algorithm considers features in both time and frequency domains. The pattern recognition stage uses a recurrent neural network (RNN) as classifier. Results show that shallow Neural Networks have better performance than architectures with numerous layers containing autoencoders.

Fig. 1. General overview of the system (blocks: Myo armband, Windowing, Feature Extraction, Classification, Optimization, Application)
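
The Windowing and Feature Extraction blocks of Fig. 1 can be illustrated with a short sketch. The following Python code is a minimal illustration and not the authors' implementation. It assumes the parameters stated in Section II (200 Hz sampling, 50-sample windows, 40-sample overlap, hence a hop of 10 samples) and the descriptor definitions (1)-(8). The function names, the default threshold `alpha` and the single-channel treatment are assumptions made for illustration; how the descriptors are combined across the 8 channels is not specified here.

```python
import math
from typing import Dict, List

WIN = 50   # 250 ms window at Fs = 200 Hz (from Section II)
HOP = 10   # 50-sample window minus 40-sample overlap


def sgn(v: float) -> int:
    """Sign function: -1, 0 or +1."""
    return (v > 0) - (v < 0)


def sliding_windows(signal: List[float], win: int = WIN, hop: int = HOP) -> List[List[float]]:
    """Split one EMG channel into overlapping rectangular windows."""
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, hop)]


def time_domain_features(x: List[float], alpha: float = 0.01) -> Dict[str, float]:
    """Descriptors (1)-(8) for one window x; alpha is an assumed ZC/SSC threshold."""
    L = len(x)
    mean = sum(x) / L
    activity = sum((v - mean) ** 2 for v in x) / L            # Eq. (7)
    sigma = math.sqrt(activity)                               # standard deviation
    iemg = sum(abs(v) for v in x)                             # Eq. (8)
    zc = sum(1 for k in range(1, L)                           # Eq. (2)
             if abs(x[k] - x[k - 1]) >= alpha and sgn(x[k]) != sgn(x[k - 1]))
    ssc = sum(1 for k in range(1, L - 1)                      # Eq. (3)
              if (x[k] - x[k - 1]) * (x[k] - x[k + 1]) >= alpha)
    wl = sum(abs(x[k] - x[k - 1]) for k in range(1, L))       # Eq. (4)
    skew = sum(((v - mean) / sigma) ** 3 for v in x) / L if sigma > 0 else 0.0  # Eq. (5)
    rms = math.sqrt(sum(v * v for v in x) / L)                # Eq. (6)
    return {"MAV": iemg / L, "ZC": zc, "SSC": ssc, "WL": wl,  # MAV is Eq. (1)
            "Skewness": skew, "RMS": rms, "Activity": activity, "IEMG": iemg}


# Synthetic single-channel signal (2 s at 200 Hz), for illustration only.
signal = [math.sin(0.3 * n) for n in range(400)]
windows = sliding_windows(signal)
features = [time_domain_features(w) for w in windows]
print(len(windows), "windows;", "first window features:", features[0])
```

All eight descriptors need only a few passes over a 50-sample window, which is consistent with the low-cost design goal stated for the time-domain features.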

This paper proposes a real-time automatic gesture recognition system for seven basic gestures, based on sEMG signals. The signals are acquired using the Myo armband, equipped with 8 circularly arranged EMG sensors, placed on the forearm. The classification is performed using a fully connected Neural Network, whose training involves a free-source data set (available at https://github.com/Giguelingueling/MyoArmbandDataset), also acquired with the Myo armband and detailed in [7].

The rest of the paper is organized as follows: in Section II the proposed method is presented, including the features and the architecture used for the classification. Section III details the experimental setup and the corresponding results. The last section, IV, is dedicated to concluding remarks.

II. PROPOSED METHOD

A general overview of the proposed method is shown in Fig. 1. The signals captured by the armband are transmitted to a computer via Bluetooth. The sampling frequency of the device is $F_s = 200$ Hz, whilst for the analysis we considered a rectangular sliding window of 250 ms (50 samples) length, with an overlap between consecutive windows of 200 ms (40 samples). The window size is chosen to allow lower statistical variance in the feature sets and a continuous classification of the EMG signals.

A. Feature Extraction

In general, the feature extraction module has an important influence over the recognition system. Numerous signal processing techniques and mathematical models are used to efficiently extract relevant information from the EMG data. So far, research and extensive efforts have been made in the signal processing area, which led to the development of faster and better algorithms. Since the purpose of this work is to design a fast and efficient algorithm for gesture recognition, the features used for classification must be simple and cost-effective to compute. Time-frequency analysis methods involving the Short-Time Fourier Transform, the wavelet transform or the wavelet packet transform would require a larger computation time, with no improvement over the classification performance [9]. For these reasons, only time-domain descriptors were considered. The features used in this paper are: Mean Absolute Value (MAV) (1), Zero Crossing Rate (ZC) (2), Slope Sign Changes (SSC) (3), Waveform Length (WL) (4), Skewness (5), Root Mean Square (RMS) (6), Hjorth Activity (7) and Integrated EMG (IEMG) (8). Considering $x$ to be the analysed signal of length $L$, the above mentioned descriptors are defined as follows:

\[ MAV(x) = \frac{1}{L} \sum_{k=0}^{L-1} |x_k| \tag{1} \]

\[ ZC(x) = \left| \left\{ k : \left( |x_k - x_{k-1}| \geq \alpha \right) \wedge \left( \mathrm{sgn}(x_k) \neq \mathrm{sgn}(x_{k-1}) \right) \right\} \right| \tag{2} \]

\[ SSC(x) = \left| \left\{ k : (x_k - x_{k-1}) \cdot (x_k - x_{k+1}) \geq \alpha \right\} \right| \tag{3} \]

\[ WL(x) = \sum_{k=1}^{L-1} |x_k - x_{k-1}| \tag{4} \]

\[ Skewness(x) = \frac{1}{L} \sum_{k=0}^{L-1} \left( \frac{x_k - \bar{x}}{\sigma} \right)^3 \tag{5} \]

\[ RMS(x) = \sqrt{ \frac{1}{L} \sum_{k=0}^{L-1} x_k^2 } \tag{6} \]

\[ Activity(x) = \frac{1}{L} \sum_{k=0}^{L-1} (x_k - \bar{x})^2 \tag{7} \]

\[ IEMG(x) = \sum_{k=0}^{L-1} |x_k| \tag{8} \]

where sgn is the sign function, $L$ is the length of the analysis window, $x_k$ is the $k$-th sample of the window, $\alpha$ is a threshold, $\sigma$ is the standard deviation of the window and $\bar{x}$ is its mean value.

Fig. 2. Proposed Neural Network-based architecture

B. Classification

Recently, the use of machine learning algorithms has become more prominent, as they are being employed in various tasks, ranging from simple regressions up to complex multinomial classification. In the field of EMG-based gesture recognition, Neural Networks can be successfully used for classification, if the data set contains sufficient examples. Deep Network-based architectures can learn very complex patterns, but they are prone to overfitting. However, such architectures
may be time-consuming, hence not adequate for real-time applications. This article proposes a fully connected architecture with a forward pass of less than 4.5 ms, including the feature extraction stage. The model parameters were determined using the cross entropy loss. Considering $l$ targets, the cross entropy loss for a single example is given by the sum:

\[ E = -\sum_{i=1}^{l} \left( t_i \log_2(y_i) + (1 - t_i) \log_2(1 - y_i) \right) \tag{9} \]

where $t_1, t_2, \ldots, t_l$ are the targets and $y_1, y_2, \ldots, y_l$ are the outputs of the neural network architecture.

The model parameters are determined via backpropagation, using the ADAM optimizer instead of the classical Stochastic Gradient Descent (SGD). The reason for using the ADAM optimizer is its robustness to changes in hyperparameters [10]. The entire architecture diagram is displayed in Fig. 2. The proposed network is composed of 6 hidden layers, having 128, 128, 128, 64, 32 and 16 neurons, respectively. After each layer, a batch normalization step is performed to avoid overfitting. The activation used for all the layers is ReLU, except for the output layer, which uses Softmax activation.

III. EXPERIMENTS

A. Experimental Setup

We used a free-source data set, considering seven gestures that relate to the basic movements of the hand, namely four gestures for hand mobility (i.e., left / wrist flexion, right / wrist extension, down / ulnar deviation, up / radial deviation), two gestures for hand grip (hold / hand close, release / hand open) and one gesture for neutral position [7]. Examples of such gestures are displayed in Fig. 3. The EMG activity was recorded using the Myo armband created by Thalmic Labs. The Myo armband has 8 dry sEMG (surface EMG) sensors placed circularly (i.e., each EMG signal is composed of 8 channels). An example of a normalized 8-channel raw signal is depicted in Fig. 4. The main advantage of the Myo armband is the fact that it is non-invasive and it can be used without any preparations. However, these benefits come with severe limitations, since dry electrodes are not as accurate in capturing the EMG activity as gel-based ones. Even under these conditions, the proposed method is able to detect the movements of the hand in a very accurate manner.

Fig. 3. The seven hand gestures considered during the experiment [7]

Fig. 4. Raw EMG signal captured with Myo

The training data set consists of 18 able-bodied subjects, whilst the evaluation data set consists of 17 able-bodied subjects performing two rounds of 7 gestures. In both cases, the subjects were asked to perform the seven gestures in two rounds, for a period of 20 seconds.

As a side remark, the position of the armband is not important because, for every user, the armband was slid until the forearm circumference matched the armband. However, the orientation is important in order to keep the 8 channels in the same order for every user.

Moreover, we consider that if the user performs the same gesture for 20 seconds, then any continuous part of the signal is that gesture. Having this in mind, we partition the signals into windows of 50 consecutive samples, each equivalent to 250 milliseconds. Each window receives the same label as the signal it originated from. For each window, we extract the eight features detailed in Section II-A.

For our neural network, we propose an architecture consisting only of Fully Connected Layers. We carried out several experiments with varying depth, from 4 to 7 hidden layers, but, in all cases, the number of neurons in a hidden layer was set to be at least as large as the number of neurons in the next hidden layer. The architecture consisting of 6 hidden layers achieved the highest performance, with 128, 128, 128, 64, 32 and 16 neurons, respectively.

B. Experimental Results

In order to measure the performance of the proposed classification system, we compute the overall classification accuracy (OA), the per-class accuracies (PC) and the Kappa index (K). These measures are computed starting from the confusion matrix $C$, which has the predicted labels on the columns and the ground truth on the rows:

\[ OA = \sum_{i=1}^{l} \frac{C_{ii}}{N} \tag{10} \]

\[ PC_i = \frac{C_{ii}}{C_{i+}}, \quad \forall i \in \{1, \ldots, l\} \tag{11} \]

\[ K = \frac{ \frac{1}{N} \sum_{i=1}^{l} C_{ii} - \frac{1}{N^2} \sum_{i=1}^{l} C_{i+} C_{+i} }{ 1 - \frac{1}{N^2} \sum_{i=1}^{l} C_{i+} C_{+i} } \tag{12} \]
In (10)-(12), $N$ represents the total number of analysed EMG signals, $l$ is the number of gesture categories, $C_{ij}$ is the number of signals in ground truth class $i$ that are classified as class $j$, and the values $C_{i+}$ and $C_{+j}$ are computed as:

\[ C_{i+} = \sum_{j=1}^{l} C_{ij} \tag{13} \]

\[ C_{+j} = \sum_{i=1}^{l} C_{ij} \tag{14} \]

Moreover, the proposed method is compared to other state-of-the-art methods, such as the transfer learning-based method presented in [7] (abbreviated DL-TL). The results are summarized in Table I.

TABLE I. PER-CLASS ACCURACY RATES (PC, %), OVERALL ACCURACY (OA, %) AND KAPPA INDEX (K) OF THE PROPOSED METHOD FOR HAND GESTURE RECOGNITION.

Class              Proposed method   DL-TL [7]
Neutral            99.96             98.89
Radial Deviation   99.75             99.46
Wrist Flexion      99.86             98.42
Ulnar Deviation    99.72             96.52
Wrist Extension    99.72             99.55
Hand Close         99.86             99.43
Hand Open          99.58             94.61
OA                 99.78             98.12
K                  0.99              0.98

Compared to the solution proposed in [7], the overall accuracy achieved by our recognition system (99.78%) is higher. Moreover, for all 7 hand gestures, the reported per-class accuracies are greater than the ones obtained by the DL-TL method [7].

An important aspect for real-time applications is the average running time, which, in the case of the proposed recognition technique, is 4.4 ms. Compared to the DL-TL method (69.9 ms), this represents a speedup of approximately 16 times. Moreover, beyond the feature extraction and prediction stages, the most time-consuming part of the DL-TL method is the transfer learning stage, which requires almost 5.25 minutes. This represents a drawback for real-time applications in the context of sEMG-based gesture recognition, since the convolutional network (ConvNet) scheme presented in [7] relies upon transfer learning techniques. These techniques leverage inter-user data and increase the overall accuracy by pre-training a model on multiple subjects before training it on a new participant, but come at the cost of spending additional time for learning a specific mapping.

The average running times for both methods, decomposed into feature extraction and prediction stages, are shown in Table II. All the experiments were carried out on an NVIDIA Quadro M4000 GPU.

TABLE II. AVERAGE RUNNING TIME PER GESTURE (DECOMPOSED INTO FEATURE EXTRACTION AND PREDICTION STAGES).

Method      Feature extraction   Prediction   Total
Proposed    1.9 ms               2.5 ms       4.4 ms
DL-TL [7]   50.2 ms              19.4 ms      69.9 ms

During our experiments, we observed that increasing the number of layers would result in a model that fails to generalize, thus being prone to overfitting. The same phenomenon occurs when there are too many identical layers repeated in the model architecture.

C. Optimization and Application

The system for hand gesture recognition described in this paper is fast because the computations needed for a forward pass are not time-consuming compared to Convolutional Neural Networks (ConvNets). For real-time applications, since the inference part of the system is done almost instantly, an optimization module can be added without creating a perceptible delay for the user.

IV. CONCLUSIONS

This paper presents a novel method for hand gesture recognition based on simple and easy to compute time-domain features. Using a relatively easy to train neural network architecture, the proposed system is able to accurately recognize 7 basic hand gestures in a timely manner, being a candidate for online recognition of rapidly varying hand gestures.

REFERENCES

[1] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Wang, and J. Yang, “A framework for hand gesture recognition based on accelerometer and EMG sensors,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 41, no. 6, pp. 1064–1076, 2011.
[2] J. Kim, S. Mastnik, and E. André, “EMG-based hand gesture recognition for realtime biosignal interfacing,” in Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008, pp. 30–39.
[3] C. Sapsanis, G. Georgoulas, and A. Tzes, “EMG based classification of basic hand movements based on time-frequency features,” in Control & Automation (MED), 2013 21st Mediterranean Conference on. IEEE, 2013, pp. 716–722.
[4] R. J. Urbanowicz, M. Meeker, W. L. Cava, R. S. Olson, and J. H. Moore, “Relief-based feature selection: Introduction and review,” CoRR, vol. abs/1711.08421, 2017.
[5] Y. Guo, Q. Wang, S. Huang, and A. Abraham, “Hand gesture recognition system using single-mixture source separation and flexible neural trees,” Journal of Vibration and Control, vol. 20, no. 9, pp. 1333–1342, 2014.
[6] M. R. Ahsan, M. I. Ibrahimy, and O. O. Khalifa, “Electromygraphy (EMG) signal based hand gesture recognition using artificial neural network (ANN),” in (ICOM), 2011 4th International Conference On Mechatronics. IEEE, 2011, pp. 1–6.
[7] U. Côté-Allard, C. L. Fall, A. Drouin, A. Campeau-Lecours, C. Gosselin, K. Glette, F. Laviolette, and B. Gosselin, “Deep learning for electromyographic hand gesture signal classification using transfer learning,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2019.
[8] A. D. Orjuela-Cañón, A. F. Ruíz-Olaya, and L. Forero, “Deep neural network for EMG signal classification of wrist position: Preliminary results,” in Computational Intelligence (LA-CCI), 2017 IEEE Latin American Conference on. IEEE, 2017, pp. 1–5.
[9] K. Englehart and B. Hudgins, “A robust, real-time control scheme for multifunction myoelectric control,” IEEE Transactions on Biomedical Engineering, vol. 50, no. 7, pp. 848–854, July 2003.
[10] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
