Chen 2016
Abstract—Due to the advantages of deep learning, in this paper, a regularized deep feature extraction (FE) method is presented for hyperspectral image (HSI) classification using a convolutional neural network (CNN). The proposed approach employs several convolutional and pooling layers to extract deep features from HSIs, which are nonlinear, discriminant, and invariant. These features are useful for image classification and target detection. Furthermore, in order to address the common issue of imbalance between high dimensionality and limited availability of training samples for the classification of HSI, a few strategies such as L2 regularization and dropout are investigated to avoid overfitting in class data modeling. More importantly, we propose a 3-D CNN-based FE model with combined regularization to extract effective spectral–spatial features of hyperspectral imagery. Finally, in order to further improve the performance, a virtual sample enhanced method is proposed. The proposed approaches are carried out on three widely used hyperspectral data sets: Indian Pines, University of Pavia, and Kennedy Space Center. The obtained results reveal that the proposed models with sparse constraints provide competitive results to state-of-the-art methods. In addition, the proposed deep FE opens a new window for further research.

Index Terms—Convolutional neural network (CNN), deep learning, feature extraction (FE), hyperspectral image (HSI) classification.

Manuscript received August 1, 2015; revised February 16, 2016; accepted June 12, 2016. Date of publication July 18, 2016; date of current version August 11, 2016. This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant HIT.NSRIF.2013028 and in part by the National Natural Science Foundation of China under Grant 61301206. (Corresponding author: Yushi Chen.)
Y. Chen, H. Jiang, and C. Li are with the Department of Information Engineering, School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China (e-mail: chenyushi@hit.edu.cn; halo91@163.com; lcy_buzz@mail.dlut.edu.cn).
X. Jia is with the School of Engineering and Information Technology, The University of New South Wales, Canberra, A.C.T. 2600, Australia (e-mail: x.jia@adfa.edu.au).
P. Ghamisi is with Signal Processing in Earth Observation, Technische Universität München, 80333 Munich, Germany, and also with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), 82234 Weßling, Germany (e-mail: p.ghamisi@gmail.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2016.2584107

I. INTRODUCTION

Hyperspectral images (HSIs) are usually composed of several hundreds of spectral data channels of the same scene. The detailed spectral information provided by hyperspectral sensors increases the power of accurately differentiating materials of interest with increased classification accuracy. Moreover, with respect to advances in hyperspectral technology, the fine spatial resolution of recently operated sensors makes the analysis of small spatial structures in images possible [1]. The aforementioned advances make hyperspectral data a useful tool for a wide variety of applications.

By increasing the dimensionality of the images in the spectral domain, theoretical and practical problems may arise. In this manner, conventional techniques which are developed for multispectral data are no longer efficient for the processing of high-dimensional data, mostly due to the so-called curse of dimensionality [2]. In order to address the curse of dimensionality, feature extraction (FE) is considered as a crucial step in HSI processing [3]. However, due to the spatial variability of spectral signatures, HSI FE is still a challenging task [4].

In the early stage of the study on HSI FE, the focus was on spectral-based methods, including principal component analysis (PCA) [5], independent component analysis (ICA) [6], linear discriminant analysis [7], etc. [8], [9]. These methods apply linear transformations to extract potentially better features of the input data in the new domain. With respect to the complex light-scattering mechanisms of natural objects (e.g., vegetation), hyperspectral data are inherently nonlinear [10], [11], which makes linear transformation-based methods less suitable for the analysis of such data.

Since 2000, when two papers on manifold learning were published in Science [12], [13], manifold learning has become a hot topic in many research areas, including hyperspectral remote sensing. Manifold learning attempts to find the intrinsic structure of nonlinearly distributed data, which is expected to be highly useful for hyperspectral FE [14].

Alternatively, the nonlinear problem can be addressed by kernel-based algorithms for data representation [15]. Kernel methods map the original data into a higher dimensional Hilbert space and offer a possibility of converting a nonlinear problem to a linear one [16].

Recent studies have suggested incorporating spatial information into a spectral-based FE system [17]. With the development of imaging technology, hyperspectral sensors can provide good spatial resolution. As a result, detailed spatial information has become available [18]. It has been found that spectral–spatial FE methods provide good improvement in terms of classification performance [19]. In [20], a method was introduced based on the fusion of morphological operators and support vector machine (SVM), which leads to high classification
0196-2892 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
CHEN et al.: DEEP FEATURE EXTRACTION AND CLASSIFICATION OF HYPERSPECTRAL IMAGES 6233
accuracy. In [21], the proposed framework extracted the spatial and spectral information using loopy belief propagation and active learning. The sparse representation [22] of extended morphological attribute profile was investigated to incorporate spatial information in remote sensing image classification in [23], which further improves classification accuracy. In the hyperspectral remote sensing community, most of the current FE methods consider only one-layer processing, which downgrades the capacity of feature learning.

Most FE and classification methods are not based on a "deep" manner. The widely used PCA and ICA are single-layer learning methods [24]. Classifiers such as linear SVMs and logistic regression (LR) can be regarded as single-layer classifiers, whereas decision trees and kernel SVMs are believed to have two layers [24].

On the other hand, it is found in neuroscience that the primate visual system is characterized by a sequence of different levels of processing (on the order of 10), and this kind of learning system performs very well in the tasks of object recognition [25]. Deep learning-based methods, which include two or more layers to extract new features, are designed to simulate the process from the retina to the cortex, and these deep architectures have a potential to yield high performances in image classification and target detection [26], [27].

Undesired scattering from other objects may deform the spectral characteristics of the object of interest. Furthermore, other factors such as different atmospheric scattering conditions and intraclass variability make it extremely difficult to extract the features of hyperspectral data effectively. To address such issues, deep architecture is known as a promising option since it can potentially lead to more abstract features at high levels, which are generally robust and invariant [28].

Very recently, some deep models have been proposed for hyperspectral remote sensing image processing [49]. To the best of our knowledge, a deep learning method, i.e., stacked autoencoder (SAE), was proposed for HSI classification in 2014 [29]. Later, an improved autoencoder was proposed based on sparse constraint [50]. In 2015, another deep model, entitled deep belief network (DBN), was proposed [30]. The deep models could extract the robust features and outperform other methods in terms of classification accuracy. However, due to the full connection of different layers in the aforementioned approaches, they require training a large number of parameters, which is undesirable given the lack of available training samples. Furthermore, SAE and DBN cannot extract the spatial information efficiently because they need to flatten the spatial information into a vector before the training stage.

Convolutional neural network (CNN) uses local connections to effectively extract the spatial information and shared weights to significantly reduce the number of parameters. Very recently, an unsupervised convolutional network has been proposed for remote sensing image analysis. This method uses greedy layerwise unsupervised pretraining to formulate a deep CNN model [31].

Compared with the unsupervised method, supervised CNN may extract more effective features with the help of class-specific information, which can be provided by training samples. To extract the spectral and spatial information of hyperspectral data simultaneously, it is reasonable to formulate a 3-D CNN. Furthermore, to address the problem of overfitting caused by limited training samples of hyperspectral data, we design a combined regularization strategy, including rectified linear unit (ReLU) and dropout, to achieve better model generalization.

In this paper, we investigate the application of supervised CNN, which is one of the deep models, in HSI FE and develop a 3-D CNN model for effective spectral-and-spatial-based HSI classification. It is challenging to apply deep learning to HSI since its data structure is complex and the number of training samples is limited. In computer vision, the number of training samples varies from tens of thousands to tens of millions [32], [33], whereas having such a large number of training samples is not common in hyperspectral remote sensing classification. In general, a neural network has a powerful representation capability with abundant training samples. Without enough training samples, a neural network faces a problem of "overfitting," which means that the classification performance on test data will be downgraded. This problem is expected when deep learning is applied to remote sensing data, and this paper presents a solution to make such approaches feasible for situations when only a limited number of training samples is available. We use several regularization methods, including L2 regularization and dropout, to handle the overfitting issue.

The main goal of this paper is to propose a deep FE method for HSI classification. With the help of training samples, the proposed CNN models extract the abstract and robust features of HSI, which are important for classification. In more detail, the main contributions are listed as follows.

1) Three deep FE architectures based on a CNN are proposed to extract the spectral, spatial, and spectral–spatial features of HSI. The designed 3-D CNN can extract the spectral–spatial features effectively, which leads to better classification performance.
2) To address the problem of overfitting caused by the limited number of training samples, some regularization strategies, including L2 regularization and dropout, are used in the training process.
3) In order to further improve the performance, a virtual sample enhanced method is proposed to create training samples from the imaging procedure perspective.
4) The hierarchical features of different depth extracted from HSI are visualized and analyzed for the first time.
5) The proposed methods are applied on three well-known hyperspectral data sets. In this context, we compared the proposed methods with some traditional methods from different perspectives such as classification accuracy, analysis of complexities, and processing time.

The remainder of this paper is organized as follows: Section II presents the description of CNN and 1-D CNN-based HSI spectral FE frameworks. Sections III and IV present the spatial and spectral–spatial FE frameworks for HSI classification, respectively. The virtual sample enhanced CNN is introduced in Section V. The experiments conducted are reported in Section VI. We conclude this paper in Section VII with some discussions.
6234 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
and classification. In the FE procedure, LR is taken into account to adjust the weights and biases in the back-propagation. After the training, the learned features can be used in conjunction with classifiers such as LR, K-nearest neighbor (KNN), and SVMs [1].

The proposed architecture is shown in Fig. 3. The input of the system is a pixel vector of hyperspectral data, and the output of the system is the label of the pixel vector. It consists of several convolutional and pooling layers and an LR layer. In Fig. 3, as an example, the flexible CNN model includes two convolution layers and two pooling layers. There are three feature maps in the first convolution layer and six feature maps in the second convolution layer.

After several layers of convolution and pooling, the input pixel vector can be converted into a feature vector, which captures the spectral information in the input pixel vector. Finally, we use LR or other classifiers to fulfill the classification step.

The power of CNN depends on the connections (weights) of the network; hence, it is very important to find a set of proper weights. Gradient back-propagation is the core algorithm for all kinds of neural networks. In this paper, the model parameters are initialized randomly and trained by an error back-propagation algorithm.

Before setting an updating rule for the weights, one needs to properly set an "error" measure, i.e., a cost function. There are several ways to define such a cost function. In our implementation, a mini-batch update strategy is adopted, which is suitable for large data set processing, and the cost is computed on a mini-batch of inputs [37]

$$c_0 = -\frac{1}{m}\sum_{i=1}^{m}\left[x_i \log(z_i) + (1 - x_i)\log(1 - z_i)\right]. \tag{4}$$

Here, $m$ denotes the mini-batch size. The two variables $x_i$ and $z_i$ denote the $i$th label and the $i$th predicted label in the mini-batch, respectively. The summation over $i$ is done over the whole mini-batch. We then optimize (4) using mini-batch stochastic gradient descent.

LR is a type of probabilistic statistical classification model. It measures the relation between a categorical variable and the input variables using probability scores as the predicted values of the input variables.

To perform classification by utilizing the learned features from the CNN, we employ an LR classifier, which uses softmax as its output-layer activation. Softmax ensures that the activations of the output units sum to 1 so that we can deem the output as a set of conditional probabilities. For a given input vector $R$, the probability that the input belongs to category $i$ can be estimated as follows:

$$P(Y = i \mid R, W, b) = s(WR + b) = \frac{e^{W_i R + b_i}}{\sum_j e^{W_j R + b_j}} \tag{5}$$

where $W$ and $b$ are the weights and biases of the LR layer, and the summation is done over all the output units.

In the LR, the size of the output layer is set to be the same as the total number of classes defined, and the size of the input layer is set to be the same as the size of the output layer of the CNN. Since the LR is implemented as a single-layer neural network, it can be merged with the former layers of networks to form a deep classifier.

D. L2 Regularization of CNN

Overfitting is a common problem of neural network approaches, which means that the classification results can be very good on the training data set but poor on the test data set. In this case, HSI will be classified with low accuracy. The number of training samples is limited in HSI classification, which often leads to the problem of overfitting.

To avoid overfitting, it is necessary to adopt additional techniques such as regularization. In this section, we introduce L2 regularization in the proposed model, which penalizes models with extreme parameter values [41].

L2 regularization encourages the sum of the squares of the parameters to be small, and it can be added to learning algorithms that minimize a cost function. Equation (4) is then modified to

$$c = c_0 + \frac{\lambda}{2m}\sum_{j=1}^{N} w_j^2 \tag{6}$$

where $m$ denotes the mini-batch size, $N$ is the number of weights, and $\lambda$ is a free parameter that needs to be tuned empirically. In addition, the coefficient 1/2 is used to simplify the derivation.

In (6), one can see that L2 regularization can make $w$ small. In most cases, it can help with the reduction of the bias of the model to mitigate the overfitting problem.
Fig. 4. Architecture of CNN with spatial features for HSI classification. The first step of processing is PCA along with spectral dimension, and then CNN is
introduced to extract layerwise deep features.
Fig. 6. Architecture of 3-D CNN with spectral–spatial features for HSI classification.
Fig. 6 shows the architecture of 3-D CNN for HSI classification. We choose K × K × B neighborhoods of a pixel as an input to the 3-D CNN model, in which B is the number of bands. Each layer of CNN contains 3-D convolution and pooling. As an example, a 4 × 4 × 32 kernel or a 5 × 5 × 32 kernel can be applied to 3-D convolution, and a 2 × 2 kernel can be applied for subsampling. After performing a deep 3-D CNN, the LR approach is conducted for the classification step.

C. Regularizations Based on Sparse Constraints

The issue of high dimensionality and limited number of training samples makes overfitting a serious problem, particularly when the input is a 3-D cube. The dimensionality of the spectral-based CNN, which is presented in Section II-C, is around a couple of hundreds (the number of bands); the dimensionality of the spatial-based CNN, which is presented in Section III-B, is around several hundreds (K × K, e.g., K = 27); the dimensionality of the spectral-and-spatial-based CNN, which is presented in Section IV-B, is around several thousands (K × K × B). It is easy to see that the high dimensionality of the input data may lead to an overfitting situation. In order to handle the issue of 3-D CNN, a combined regularization strategy based on sparse constraint is developed, which includes ReLU and dropout, and applies dropout in the fully connected layer.

There are different kinds of ReLUs available to apply. In this paper, the adopted ReLU is a simple nonlinear operation that accepts the input of a neuron if it is positive, whereas it returns 0 if the input is negative. In many applications, ReLUs in CNNs can improve the performances [42].

Dropout is a recently introduced method to handle overfitting. It sets the output of some hidden neurons to zero, which means that the dropped neurons do not contribute in the forward pass and they are not used in the back-propagation procedure. In different training epochs, the deep CNN forms a different neural network by dropping neurons randomly. The dropout method prevents complex co-adaptations [43].

By using ReLU and dropout, the outputs of many neurons are 0. We use several ReLUs and dropouts at several layers to achieve powerful sparse-based regularization for the deep network and address the overfitting problem in HSI classification.

V. VIRTUAL SAMPLE ENHANCED CNN

As a matter of fact, CNN has a lot of weights that need to be trained. Inappropriate weights may cause the training to get trapped in a local minimum of the loss function, which results in poor performance. To obtain proper weights, a lot of samples are required in the training procedure. However, these samples are usually obtained by manual labeling of a small number of pixels in an image or based on some field measurements. Therefore, the collection of these samples is both expensive and time-consuming. Consequently, the number of available training samples is usually limited, which is a challenging issue in supervised classification. To solve the dilemma, we utilize virtual samples as a promising tool from a different perspective.

The virtual sample method tries to create new training samples from given training samples. The critical issue is how to generate proper samples, and we work out a solution from the imaging procedure perspective. Because of the complex situation of lighting in a large scene, objects of the same class show different characteristics in different locations. Therefore, we can simulate a virtual sample by multiplying a training sample by a random factor and adding random noise. Furthermore, we can generate a virtual sample from two given samples of the same class with proper ratios. The virtual sample idea is helpful in the training of a CNN.

To tackle the problem of having limited training samples, instead of regularization such as L2 regularization and dropout, virtual samples have been generated and added to the training samples.

A. Changing Radiation-Based Virtual Samples

Remote sensing, including hyperspectral imaging, usually covers a large scene, whereas the objects of the same class in different locations are affected by different radiation. Virtual samples can be created by simulating the imaging procedure. A new virtual sample $y_n$ is obtained by multiplying a training sample $x_m$ by a random factor and adding random noise

$$y_n = \alpha_m x_m + \beta n. \tag{9}$$

The training sample $x_m$ is a cube extracted from the hyperspectral cube, which contains the spectral and spatial information of the pixel to be classified.

In (9), $\alpha_m$ indicates the disturbance of light intensity, which can vary under many situations such as seasons and atmospheric conditions, whereas $\beta$ controls the weight of the random Gaussian noise $n$, which may result from the interaction of adjacent pixels and instrumental error.
TABLE I
LAND-COVER CLASSES AND NUMBERS OF PIXELS ON THE INDIAN PINES DATA SET
Fig. 7. Indian Pines data set. (Left) False color composite image (bands 28, 19,
and 10) and (right) ground truth.
TABLE II
LAND-COVER CLASSES AND NUMBERS OF PIXELS ON THE UNIVERSITY OF PAVIA DATA SET
TABLE IV
ARCHITECTURES OF THE 1-D CNN ON THREE DATA SETS
Fig. 11. Weights of the first convolutional layer on the University of Pavia data set. Each tiny image (1 × 8) stands for the weights of a convolutional kernel. There are six convolutional kernels in the first convolutional layer. The intensity of each pixel stands for the value of corresponding weight. (a) Randomly initialized weights of first convolution layer. (b) Learned weights of first convolution layer.
Fig. 12. Weights of the second and third convolutional layers on the University of Pavia data set. In the first column of the image, there are 12 filters, and each tiny image contains 42 (6 × 7) weights of a convolutional kernel. The second one shows 24 filters and 96 (12 × 8) weights in a tiny image. (a) Learned weights of the second convolutional layer. (b) Learned weights of the third convolutional layer.

in a row represent the connection intensity of the network. Each convolutional kernel can extract the unique feature of the input. Fig. 11 shows the weights of the first convolutional layer on the University of Pavia data set. The weights are randomly initialized and trained using back-propagation methods. From Fig. 11(b), the learned weights show some structures. For example, the intensities of the first row are high on the left side and low on the right side. Fig. 14 shows the weights of the 2-D CNN, which helps with the understanding. Different convolutional kernels can extract the features from different perspectives, and the abundant features are helpful for further processing.

Fig. 12 shows the weights learned at the second and third convolutional layers in an image form where the brightness is proportional to the value of the weights. There are 12 and 24 convolutional kernels at layers 2 and 3, respectively. The weights, i.e., 42 at layer 2 and 96 at layer 3, are arranged in an image form artificially. Different convolutional kernels can extract the features from different perspectives. The abundant features are helpful for further processing.

The learned features, which are obtained by the convolution of inputs and kernels, on the University of Pavia data set are illustrated as curves in Fig. 13. The class of Meadows is selected for visualization, and the extracted features after each convolutional layer are shown with a different color. It is shown that these different features are extracted by different convolution kernels. The extracted features become more abstract after the third convolutional and pooling layers.

In order to evaluate the effectiveness of the extracted features, the similarity in the same class and the divisibility between different classes are shown in Table V in a quantitative way. We selected three classes for calculation and calculated the divisibility of different classes with the J–M distance. The J–M distance is defined as [45]

$$J_{ij} = 2\left(1 - e^{-B_{ij}}\right) \tag{11}$$

$$B_{ij} = \frac{1}{8}(m_i - m_j)^T \left(\frac{c_i + c_j}{2}\right)^{-1} (m_i - m_j) + \frac{1}{2}\log\left(\frac{\left|\frac{c_i + c_j}{2}\right|}{\sqrt{|c_i||c_j|}}\right) \tag{12}$$

where $m_i$ and $c_i$ are the sample's average vector and covariance matrix. $B_{ij}$ is the Bhattacharyya distance between the two classes.

The similarity in the same class is evaluated with the correlation coefficient on a scale of −1 to 1. The correlation coefficient is defined as follows:

$$\rho_{x,y} = \frac{C(x, y)}{\sqrt{D(x)}\sqrt{D(y)}} \tag{13}$$

where $x$ and $y$ are two feature vectors, whereas $C(x, y)$ is their covariance. $D(x)$ and $D(y)$ are the variances of the two vectors. We use the mean of all correlation coefficients to evaluate the similarity in the same class.

The higher similarity within class and the higher divisibility between classes make the classification step smoother. From Table V, by comparing the calculated results in different layers, one can see that features have a high similarity in the same class and large divisibility in different classes as the number of
Fig. 13. Extracted features after convolution and pooling layers on the University of Pavia data set. (a) Original spectral information. (b) and (c) Features after
the first convolutional layer. (d)–(f) Features after the second convolutional layer. (g)–(i) Features after the third convolutional layer.
TABLE V
SIMILARITY AND DIVISIBILITY OF SPECTRAL FEATURES ON THE UNIVERSITY OF PAVIA DATA SET
convolutional layers increases. Therefore, the results imply that the extracted features are valid and efficient.

3) Comparisons With Different FE Methods and Classifiers: In this set of experiments, CNN was compared with PCA, factor analysis (FA), and locally linear embedding (LLE) in order to investigate the potential of CNN for hyperspectral spectral FE. PCA is a widely used FE method. FA is a linear statistical method designed to find potential factors among the observed variables that can replace the original data [46]. LLE is a popular nonlinear dimension reduction method, which is considered as a kind of manifold learning algorithm [47]. In this paper, the effectiveness of different FE methods is evaluated mainly through classification results. We also classify the features using several classifiers such as the KNN classifier and a nonlinear SVM based on radial basis function (RBF-SVM). Using the same features with different classifiers, we can evaluate the effectiveness of the extracted features.

Tables VI–VIII show that the CNN-based FE methods always provide the best performances of OA, AA, and Kappa for all three data sets. The classification accuracy values are given in the form of mean ± standard deviation from the perspective of statistics, which is used as a measurement of volatility.

In order to have a fair comparison, we used 10% of the training samples to find the best parameters of FE methods using grid search. The results reported in Tables VI–VIII are the best classification results when the number of features was properly selected for each FE method. On the selection of parameters, the number of features was chosen in the range of 10 to N (i.e., the number of hyperspectral bands) with an interval of 10. The number of neighbors in LLE was varied from 1 to 10. The final classification results such as OA, AA, and Kappa were calculated on the test data set.
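The evaluation quantities used in this section, namely the J–M distance of (11) and (12), the mean correlation coefficient of (13), and the OA, AA, and Kappa scores, can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' code. It assumes that OA, AA, and Kappa carry their usual meanings (overall accuracy, average per-class accuracy, and Cohen's kappa), that features are stored one sample per row, that every class occurs at least once in the reference labels, and that each class provides enough samples for a nonsingular covariance matrix; all function names are hypothetical.

```python
import numpy as np


def jm_distance(feats_i, feats_j):
    """J-M distance (11) between two classes; feats_* have shape (n_samples, d), d >= 2."""
    m_i, m_j = feats_i.mean(axis=0), feats_j.mean(axis=0)
    c_i = np.cov(feats_i, rowvar=False)
    c_j = np.cov(feats_j, rowvar=False)
    c = 0.5 * (c_i + c_j)
    diff = m_i - m_j
    # First term of (12): (1/8) diff^T c^{-1} diff, solved without an explicit inverse.
    term1 = 0.125 * diff @ np.linalg.solve(c, diff)
    # Second term of (12), computed from log-determinants for numerical stability.
    logdet = lambda a: np.linalg.slogdet(a)[1]
    term2 = 0.5 * (logdet(c) - 0.5 * (logdet(c_i) + logdet(c_j)))
    return 2.0 * (1.0 - np.exp(-(term1 + term2)))


def mean_within_class_correlation(feats):
    """Mean of (13) over all pairs of feature vectors (rows) of one class."""
    rho = np.corrcoef(feats)
    upper = np.triu_indices_from(rho, k=1)  # each unordered pair once
    return rho[upper].mean()


def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy, average accuracy, and Cohen's kappa from integer labels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    total = cm.sum()
    oa = np.trace(cm) / total
    aa = (np.diag(cm) / cm.sum(axis=1)).mean()
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa
```

The J–M distance is bounded by 2, so values close to 2 indicate nearly separable classes, while a mean correlation close to 1 indicates that the features of one class are highly similar.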
TABLE VI
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT FE APPROACHES ON THE INDIAN PINES DATA SET
TABLE VII
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT FE APPROACHES ON THE UNIVERSITY OF PAVIA DATA SET
TABLE VIII
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT FE APPROACHES ON THE KSC DATA SET
Fig. 14. Weights of the first convolutional layer. Each tiny image (4 × 4) stands for a convolutional kernel. There are 32 kernels in the first convolutional layer.
The intensity of each pixel stands for the value of corresponding weight. (a) Randomly initialized weights of the first convolution layer of the University of Pavia
data set. (b) Learned weights of the first convolution layer of the University of Pavia data set.
Fig. 15. Extracted features of the University of Pavia data set. There are six rows, and each row of images represents one class. There are four columns in the
figure. The first column is allocated to the input images. The second column is allocated to the four feature maps after the first convolution. The third column is
allocated to the four feature maps after the first ReLU operation. The last column is composed of the four features of the first pooling operation.
TABLE X
SIMILARITY AND DIVISIBILITY OF SPATIAL FEATURES ON THE UNIVERSITY OF PAVIA DATA SET
TABLE XIV
ARCHITECTURE OF THE 3-D CONVOLUTION NEURAL NETWORK
TABLE XII
CLASSIFYING WITH SPATIAL FEATURES ON THE UNIVERSITY OF PAVIA DATA SET
Fig. 17. Classification results with and without dropout on the (left) Indian Pines, (middle) University of Pavia, and (right) KSC data sets.
Fig. 18. Training error with and without ReLU on the (left) Indian Pines, (middle) University of Pavia, and (right) KSC data sets.
TABLE XV
SIMILARITY AND DIVISIBILITY OF SPECTRAL–SPATIAL FEATURES ON THE UNIVERSITY OF PAVIA DATA SET
TABLE XVI
CLASSIFICATION WITH SPECTRAL–SPATIAL FEATURES ON THE INDIAN PINES DATA SET
and the learning rate was 0.003. In this set of experiments, the number of training epochs for the CNNs is 400.
There are three factors (dropout, ReLU, and the size of the
spatial window) that influence the final classification accuracy
significantly, and they are analyzed in the following.
In the proposed architecture, dropout plays an important
role in addressing overfitting. In this experiment, the results
(classification error) with and without dropout on the three data
sets are presented in Fig. 17. In the figure, the training errors
without dropout regularization are very low after dozens of
epochs, whereas the test errors without dropout are very high.
This is the problem of overfitting. With dropout, the training
errors are relatively high, whereas the test errors are relatively
low. This means that the model with dropout has a good
capability of generalization.
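As a rough illustration of the mechanism, the following is a minimal sketch of dropout applied to one layer of activations. It assumes the common "inverted" variant, which rescales the surviving units during training so that nothing changes at test time; this is a standard formulation and not necessarily the exact one used in these experiments. The function name `dropout`, the drop probability of 0.5, and the toy activations are all hypothetical.

```python
import random


def dropout(activations, drop_prob, training, rng):
    """Inverted dropout on a list of activations (illustrative sketch).

    During training, each unit is zeroed with probability `drop_prob` and the
    survivors are divided by the keep probability, so the expected value of
    every unit is unchanged. At test time the input is returned as is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)  # unit kept and rescaled
        else:
            out.append(0.0)            # unit dropped for this pass
    return out


rng = random.Random(0)               # fixed seed so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]        # hypothetical hidden-layer activations

train_out = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
test_out = dropout(hidden, drop_prob=0.5, training=False, rng=rng)
```

Each training pass draws a new random mask, so the network cannot rely on any fixed subset of units, whereas the test-time pass uses all units unchanged.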
The effectiveness of dropout can be explained in two
ways. The first is that it prevents co-adaptation of the units
on the training samples, and the second is that it averages the
predictions of many different networks [43]. If a hidden unit
relies on specific collaborating units, the network can perform
well on the training data, but these units might not perform well
on the test data set. In contrast, if a hidden unit has to adapt to
many different sets of collaborating units, it depends more
on itself than on certain combinations
of hidden units. The dropout strategy makes it possible to train
different networks, and each network gets a classification result.
As the training procedure continues, most of the networks give
the correct results, so that incorrect results are eliminated from
the final classification result.
ReLU is another important factor that influences the final
performance. Krizhevsky et al. claimed that nonsaturating
nonlinear functions such as ReLU can achieve better performance
than saturating nonlinearities such as the sigmoid function [26].
The classification errors with and without ReLU on the three
data sets are shown in Fig. 18. From Fig. 18, the models with the
sigmoid function converge more slowly than the models with
ReLU. In particular, on the Indian Pines data set, a CNN with
ReLU (red solid lines) reaches a 50% error rate six times faster
than the same network with sigmoid (blue dashed lines). In
addition, the models with ReLU reach a lower training error
(close to 0) at the end of training. In summary, a CNN with ReLU
can accelerate convergence and improve the training accuracy.
The size of the 3-D input is an important parameter too. The
size along the spectral dimension is fixed, whereas the sizes
along the spatial dimensions can be changed. A set of
experiments is organized to find a proper size of the 3-D inputs.
Fig. 19 shows the results using different sizes of spatial window.
The half widths of the spatial window are set to
W = [11, 12, 13, 14, 15], and the full width is 2W + 1. To have a
fair comparison, we resize other spatial sizes to 27 × 27 and get
classification accuracy values using the models aforementioned.
For the Indian Pines data set, the OA reaches its highest value,
nearly 98%, when the half width is 14. For the University of
Pavia and KSC data sets, the results show that the best accuracy
values are obtained when the half width is 13.
In order to evaluate the effectiveness of the extracted
spectral–spatial features, Table XV presents the similarity in
the same class and the divisibility between the different classes.
Compared with Tables V and X, after convolution operations,
the spectral–spatial features get the highest similarity in the
same class and the highest divisibility between the different
classes, which shows that the spectral–spatial features have the
potential for accurate classification.
2) Comparative Experiments With Other Spectral–Spatial
Methods: We also conducted RBF-SVM with the original data
sets and extended morphological profile (EMP) for comparison.
EMP followed by SVM is an advanced spatial–spectral
classification method for hyperspectral data. We used opening
and closing operations on the first five, seven, and three
principal components of the Indian Pines, University of Pavia, and
KSC data sets to extract structural information, respectively. In
the experiments, the structuring element used was a disk and
6248 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
TABLE XX
CLASSIFICATION ACCURACY VALUES ON THE INDIAN PINES DATA SET
TABLE XXI
CLASSIFICATION ACCURACY VALUES ON THE UNIVERSITY OF PAVIA DATA SET
TABLE XVIII
CLASSIFICATION WITH SPECTRAL–SPATIAL FEATURES ON THE KSC DATA SET
TABLE XXII
CLASSIFICATION ACCURACY VALUES ON THE KSC DATA SET
Fig. 20. Indian Pines. (a) False color image. (b)–(f) Classification maps for different classifiers: (b) 1D-SVM, (c) 3D-EMP-SVM, (d) 3D-CNN, (e) 3D-CNN with
Method A, and (f) 3D-CNN with Method B.
Fig. 21. University of Pavia. (a) False color image. (b)–(f) Classification maps for different classifiers: (b) 1D-SVM, (c) 3D-EMP-SVM, (d) 3D-CNN, (e) 3D-CNN
with Method A, and (f) 3D-CNN with Method B.
Fig. 22. KSC. (a) False color image. (b)–(f) Classification maps for different classifiers: (b) 1D-SVM, (c) 3D-EMP-SVM, (d) 3D-CNN, (e) 3D-CNN with
Method A, and (f) 3D-CNN with Method B.
Under the condition of limited training samples, CNN with
virtual samples outperformed EMP-based and original CNN
methods in terms of OA, AA, and Kappa coefficient. This
shows that CNN with virtual samples is a powerful tool for
HSI classification.
In Tables XVIII–XX, in comparison with the original CNN,
the OA improved by 0.97%, 0.12%, and 0.76% on the Indian
Pines, University of Pavia, and KSC data sets, respectively.
Moreover, the variances of OA are reduced too, which means
that the CNNs with virtual samples are less influenced by
different training samples.
It can also be found in the experiments that the CNN classifier
achieves better classification accuracy if more virtual samples
are created.
2) Classification Maps: Finally, the classification results
are examined from a visual perspective. The 1D-SVM,
3D-EMP-SVM, 3D-CNN, and 3D-CNN with virtual samples are
selected to classify the whole images. Figs. 20–22 are
classification maps of different methods investigated in this
paper for the three data sets. All parameters in these models are
optimized. From the resulting images, we can figure out how
the proposed FE method affects the classification results.
From Figs. 20–22, it is obvious that the spectral classification
method (1D-SVM) always results in noisy scattered points in the
images [see Figs. 20(b)–22(b)]. The spectral–spatial methods
correct this shortcoming and eliminate the noisy scattered
points of misclassification. The CNN with virtual sample method
gives more detailed classification maps [see Fig. 22(e) and (f)].
Both of the proposed virtual sample approaches can increase the
classification accuracy of CNN significantly under insufficient
training data.
VII. DISCUSSION AND CONCLUSION
In order to harness the power of deep models for
HSI FE and classification, in this paper, we have proposed
deep CNN architectures to extract the spectral, spatial, and
spectral-and-spatial-based deep features.
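To make the notion of spectral-and-spatial deep features more concrete, the following minimal sketch applies a single 3-D convolution followed by ReLU to a toy hyperspectral cube. The function names, the toy data, and the 3 × 2 × 2 kernel are illustrative assumptions only; they do not reproduce the architecture in Table XIV, and the kernel here is a fixed averaging filter rather than a learned one.

```python
def conv3d_valid(cube, kernel):
    """'Valid' 3-D cross-correlation of cube[band][row][col] with a kernel.

    The kernel slides jointly along the spectral axis and the two spatial
    axes, so each output value mixes neighboring bands and neighboring pixels.
    """
    n_b, n_h, n_w = len(cube), len(cube[0]), len(cube[0][0])
    k_b, k_h, k_w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for b in range(n_b - k_b + 1):
        plane = []
        for i in range(n_h - k_h + 1):
            row = []
            for j in range(n_w - k_w + 1):
                s = 0.0
                for db in range(k_b):
                    for di in range(k_h):
                        for dj in range(k_w):
                            s += cube[b + db][i + di][j + dj] * kernel[db][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def relu3d(volume):
    """Element-wise ReLU on a nested-list volume."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]


# Toy input (hypothetical): 6 spectral bands over a 5 x 5 spatial neighborhood.
bands, height, width = 6, 5, 5
cube = [[[float(b + i - j) for j in range(width)]
         for i in range(height)]
        for b in range(bands)]

# Averaging kernel spanning 3 bands and a 2 x 2 spatial window (assumed sizes).
kernel = [[[1.0 / 12.0] * 2 for _ in range(2)] for _ in range(3)]

features = relu3d(conv3d_valid(cube, kernel))
shape = (len(features), len(features[0]), len(features[0][0]))
```

With these toy sizes, the output volume has 6 − 3 + 1 = 4 positions along the spectral axis and 5 − 2 + 1 = 4 positions along each spatial axis.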
The design of proper deep CNN models is the first important
issue we are facing. In the design of the spectral deep model,
we use a small local receptive field and three to five
convolutional layers. For the spatial deep model, we use a small
local receptive field. For the spectral-and-spatial-based deep
model, we use a special 3-D CNN model with a large receptive
field in the spectral domain and a small receptive field in the
spatial domain to extract the integrated features of HSI. A proper
design balances the capacity and complexity of the network,
which is very important for further FE and classification.
In hyperspectral remote sensing cases, only limited training
samples are available. To solve the problem of overfitting, we
use L2 regularization for the spectral CNN. When the input is a
3-D cube, overfitting becomes more serious. We then adopt a
regularization method called dropout. Proper regularization
strategies play an important role in the accurate classification
of HSI.
Parameters affect the classification accuracy and computational
complexity. In the realization of deep CNNs for HSI
FE and classification, we gathered some useful experience on
parameter setting. The experimental results suggest that one
or two layers often provide limited capacity in FE of HSI.
Based on our experimental results, we suggest using a
three-layer CNN with a 4 × 4 or 5 × 5 convolution kernel and a
2 × 2 pooling kernel in each layer for HSI FE.
By using a proper architecture and powerful regularization, the
proposed 3-D deep CNN has been demonstrated to provide
excellent classification performance under the condition of
limited training samples. The proposed deep model is promising,
which opens a new window for further research.
In order to further improve the performance of CNN-based
methods, a method called virtual sample is proposed. Virtual
samples are generated by changing radiation and by using
different mixtures. Then, the training samples and the created
virtual samples are used together to train a CNN.
In summary, to address the HSI FE and classification problem
with limited training samples, we propose the idea of a big
network with strong constraints. The big feedforward DNN
using a deep 3-D CNN with virtual samples achieves by far the
best results in terms of classification accuracy.
CNN is a hot topic in machine learning and computer vision.
Various improvements have been made in recent years,
and they can also be used in the proposed CNN architecture.
The proposed model can be combined with post-classification
processing to enhance mapping performance. This deserves to be
investigated as possible future work.
REFERENCES
[1] J. A. Benediktsson and P. Ghamisi, Spectral–Spatial Classification of Hyperspectral Remote Sensing Images. Boston, MA, USA: Artech House, 2015.
[2] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.
[3] J. B. Dias et al., “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag., vol. 1, no. 2, pp. 6–36, Feb. 2013.
[4] X. Jia, B. Kuo, and M. M. Crawford, “Feature mining for hyperspectral image classification,” Proc. IEEE, vol. 101, no. 3, pp. 676–679, Mar. 2013.
[5] G. Licciardi, P. R. Marpu, J. Chanussot, and J. A. Benediktsson, “Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 3, pp. 447–451, May 2011.
[6] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, Dec. 2011.
[7] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, “Classification of hyperspectral images with regularized linear discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, pp. 862–873, Mar. 2009.
[8] L. M. Bruce, C. H. Koger, and J. Li, “Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 10, pp. 2331–2338, Oct. 2002.
[9] L. O. Jimenez and D. A. Landgrebe, “Hyperspectral data analysis and supervised feature reduction via projection pursuit,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2653–2667, Nov. 1999.
[10] D. Lunga, S. Prasad, M. M. Crawford, and O. Ersoy, “Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 55–66, Jan. 2014.
[11] T. Han and D. Goodenough, “Investigation of nonlinearity in hyperspectral imagery using surrogate data methods,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 10, pp. 2840–2847, Oct. 2008.
[12] B. Tenenbaum, V. Silva, and C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[13] S. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.
[14] C. M. Bachmann, T. L. Ainsworth, and R. A. Fusina, “Improved manifold coordinate representations of large-scale hyperspectral scenes,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 10, pp. 2786–2803, Oct. 2006.
[15] B. Scholkopf and A. J. Smola, Learning With Kernels. Cambridge, MA, USA: MIT Press, 2002.
[16] B. C. Kuo, C. H. Li, and J. M. Yang, “Kernel nonparametric weighted feature extraction for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1139–1155, Apr. 2009.
[17] A. Plaza, J. Plaza, and G. Martin, “Incorporation of spatial constraints into spectral mixture analysis of remotely sensed hyperspectral data,” in Proc. IEEE Int. Workshop Mach. Learn. Signal Process., Grenoble, France, 2009, pp. 1–6.
[18] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectral–spatial classification of hyperspectral images,” Proc. IEEE, vol. 101, no. 3, pp. 652–675, Mar. 2013.
[19] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.
[20] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[21] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Spectral–spatial classification of hyperspectral data using loopy belief propagation and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 844–856, Feb. 2013.
[22] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[23] B. Song, J. Li, J. M. Bioucas-Dias, and J. A. Benediktsson, “Remotely sensed image classification using sparse representations of morphological attribute profiles,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 5122–5136, Aug. 2013.
[24] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[25] N. Kruger et al., “Deep hierarchies in primate visual cortex: What can we learn for computer vision?” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1847–1871, Aug. 2013.
[26] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Inf. Process. Syst., Lake Tahoe, NV, USA, 2012, pp. 1106–1114.
[27] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[29] Y. Chen, Z. Lin, X. Zhao, and G. Wang, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
[30] Y. Chen, X. Zhao, and X. Jia, “Spectral–spatial classification of hyperspectral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 1–12, Jun. 2015.
[31] A. Romero, C. Gatta, and G. Camps-Valls, “Unsupervised deep feature extraction for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1349–1362, Mar. 2016.
[32] Y. LeCun, C. Cortes, and C. Burges, The MNIST Database of Handwritten Digits. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[33] J. Deng and F. Li, “ImageNet: A large-scale hierarchical image database,” in Proc. CVPR, Miami, FL, USA, 2009, pp. 248–255.
[34] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, Jul. 2006.
[35] N. LeRoux and Y. Bengio, “Deep belief networks are compact universal approximators,” Neural Comput., vol. 22, no. 8, pp. 2192–2207, Aug. 2010.
[36] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, “Stacked denoising autoencoders,” J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[37] Z. Zuo et al., “Learning contextual dependence with convolutional hierarchical recurrent neural networks,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 2983–2996, Jul. 2016.
[38] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE CVPR, Boston, MA, USA, 2015, pp. 1–9.
[39] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. ICLR, 2015, pp. 1–14.
[40] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE CVPR, Columbus, OH, USA, 2014, pp. 581–587.
[41] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for non-orthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, Jan. 1970.
[42] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. Int. Conf. Mach. Learn., Haifa, Israel, 2010, pp. 807–814.
[43] G. E. Hinton et al., “Improving neural networks by preventing co-adaptation of feature detectors,” Comput. Sci., vol. 3, no. 4, pp. 212–223, 2012.
[44] R. E. Edwards, H. Zhang, and L. E. Parker, “Approximate l-fold cross-validation with least squares SVM and kernel ridge regression,” in Proc. 12th ICMLA, Miami, FL, USA, Dec. 2013, pp. 58–64.
[45] W. Hofmann, “Remote sensing: The quantitative approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 3, no. 6, pp. 713–714, Jun. 1981.
[46] D. J. Bartholomew, F. Steele, J. Galbraith, and I. Moustaki, “Analysis of multivariate social science data,” Struct. Equation Model. Multidisciplinary J., vol. 18, no. 4, pp. 686–693, Apr. 2011.
[47] H. Yang, F. Qin, and Y. Wang, “LLE-PLS nonlinear modeling method for near infrared spectroscopy and its application,” Spectrosc. Spectral Anal., vol. 27, no. 10, pp. 1955–1958, Oct. 2007.
[48] C. Chang and C. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Mar. 2011.
[49] L. G. Chova, D. Tuia, G. Moser, and G. C. Valls, “Multimodal classification of remote sensing images: A review and future directions,” Proc. IEEE, vol. 103, no. 9, pp. 1560–1584, Nov. 2015.
[50] C. Tao, H. Pan, Y. Li, and Z. Zou, “Unsupervised spectral–spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 12, pp. 2438–2442, Dec. 2015.
[51] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–491, Mar. 2005.

Yushi Chen (M’11) received the Ph.D. degree from
the Harbin Institute of Technology, Harbin, China,
in 2008.
Currently, he is an Associate Professor with the
School of Electronics and Information Engineering,
Harbin Institute of Technology. His research interests
include remote sensing data processing and machine
learning.

Hanlu Jiang received the Bachelor’s degree in remote
sensing science and technology in 2014 from
the Harbin Institute of Technology, Harbin, China,
where she is currently working toward the Master’s
degree in the School of Electronics and Information
Engineering.
Her research area is in remote sensing image
processing technologies.

Chunyang Li has been working toward the Master’s
degree in the Department of Information Engineering,
School of Electronics and Information Engineering,
Harbin Institute of Technology, Harbin, China,
since 2015.
Her research concerns remote sensing image
processing based on deep learning methods.

Xiuping Jia (SM’03) received the B.Eng. degree
from the Beijing University of Posts and Telecommunications,
Beijing, China, in 1982 and the Ph.D.
degree in electrical engineering from The University
of New South Wales, Sydney, Australia, in 1996.
Since 1988, she has been with the School of
Engineering and Information Technology, The University
of New South Wales, Canberra, Australia,
where she is currently a Senior Lecturer. She is also a
Guest Professor with Harbin Engineering University,
Harbin, China, and an Adjunct Researcher with the
National Engineering Research Center for Information Technology in Agriculture,
Beijing. She is the coauthor of the remote sensing textbook titled Remote
Sensing Digital Image Analysis [Springer-Verlag, 3rd ed. (1999) and 4th ed.
(2006)]. Her research interests include remote sensing and image data analysis.
Dr. Jia served as the inaugural Chair of the IEEE Australia Capital Territory
and New South Wales Section GRSS Chapter from 2010 to 2013. She is
an Associate Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND
REMOTE SENSING.

Pedram Ghamisi (S’12–M’15) received the B.Sc.
degree in civil (survey) engineering from the Islamic
Azad University, South Tehran Branch, Tehran, Iran;
the M.Sc. degree (with first class honors) in remote
sensing from Khajeh Nasir Toosi University of
Technology, Tehran, in 2012; and the Ph.D. degree
in electrical and computer engineering from the
University of Iceland, Reykjavik, Iceland, in 2015.
He was a Postdoctoral Research Fellow with the
University of Iceland. Since October 2015, he has
been a Postdoctoral Research Fellow with Signal
Processing in Earth Observation, Technical University of Munich, Munich,
Germany, and a Researcher with the Remote Sensing Technology Institute
(IMF), German Aerospace Center (DLR), Weßling, Germany. His research
interests are in remote sensing and image analysis with special focus on spectral
and spatial techniques for hyperspectral image classification and the integration
of LiDAR and hyperspectral data for land-cover assessment.
In 2015, Dr. Ghamisi won the prestigious Alexander von Humboldt
Fellowship.