A2

View metadata, citation and similar papers at core.ac.uk
brought to you by CORE, provided by UMP Institutional Repository
2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications
Review of Deep Convolution Neural Network in Image Classification
Ahmed Ali Mohammed Al-Saffar, Hai Tao, Mohammed Ahmed Talab
Faculty of Computer Systems and Software Engineering
Universiti Malaysia Pahang
Pahang, Malaysia
haitao@ump.edu.my
978-1-5386-3849-1/17/$31.00 ©2017 IEEE

Abstract: With the arrival of the big data era, convolutional neural networks (CNNs) with more hidden layers have more complex network structures and more powerful feature learning and feature expression abilities than traditional machine learning methods. Since their introduction, convolutional neural network models trained with deep learning algorithms have made remarkable achievements in many large-scale recognition tasks in the field of computer vision. This paper first introduces the rise and development of deep learning and convolutional neural networks, and summarizes the basic model structure, convolution feature extraction and pooling operation of convolutional neural networks. Then, the research status and development trend of deep-learning-based convolutional neural network models in image classification are reviewed, mainly from the aspects of typical network structure construction, training method and performance. Finally, some problems in current research are briefly summarized and discussed, and new directions for future development are forecast.

Keywords: Deep learning; convolution neural network; image recognition; image classification.

I. INTRODUCTION

Computer vision (CV) is the study of how to use computers to simulate human vision; its main task is to make judgments or decisions by analyzing and understanding collected images (or video). In the past few decades, CV has made great progress. Image recognition is a technology that uses computers to process, analyze and understand images in order to identify targets and objects of different modes. It is a major research direction in the field of computer vision and plays a very important role in image-based intelligent data acquisition and processing. Image recognition technology can effectively deal with the detection and identification of specific target objects (such as faces, handwritten characters or goods), image classification, subjective image quality assessment and other issues.

At present, image recognition technology has a large commercial market and good application prospects in Internet applications such as image search, commodity recommendation, user behavior analysis and face recognition. It also has broad application prospects in high-tech industries such as intelligent robots, unmanned driving and unmanned aerial vehicles, and in biology, medicine, geology and many other disciplines. In early image recognition systems, feature extraction methods such as the scale-invariant feature transform (SIFT [1]) and the histogram of oriented gradients (HOG [2]) were used, and the extracted features were then fed into a classifier for classification and recognition. These features are essentially hand-designed. For different recognition problems, the extracted features have a direct impact on system performance, so researchers need to study the problem domain in order to design better-adapted features and thereby improve system performance. Image recognition systems of this period were generally built for a specific recognition task, the amount of data was small, and generalization ability was poor, so it was difficult to achieve accurate recognition in practical applications.

II. CONVOLUTION NEURAL NETWORK

Deep learning is a branch of machine learning and one of its major breakthroughs and research hotspots in recent years. In 2006, Geoffrey Hinton, a professor of computer science at the University of Toronto, and his student Ruslan Salakhutdinov published an article in the leading international academic journal Science [3] that introduced the concept of deep learning for the first time. That article makes two main points: (1) an artificial neural network with multiple hidden layers has very powerful feature learning ability, and the features extracted by the trained model give a more abstract and more fundamental representation of the original input data; (2) using unsupervised learning to carry out a method called "layer initialization" (layer-wise pre-training) yields a hierarchical representation of the input data, which can effectively reduce the difficulty of training deep neural networks. Subsequently, interest in deep learning in academia and industry continued to heat up, and breakthroughs were obtained in speech recognition, image recognition, natural language processing and other fields. In 2011, researchers first applied deep learning to speech recognition, and the accuracy rate improved by 20% to 30%, the biggest breakthrough in more than a decade. Only a year later, deep learning models based on convolutional neural networks achieved great performance improvements in large-scale image classification tasks and set off an upsurge of deep learning research. In [4], two acoustic modeling methods based on deep neural networks are proposed, which are more effective than the traditional modeling method and yield larger improvements in Uyghur large-vocabulary
continuous speech recognition performance. At present, Google, Microsoft, Facebook and many other Internet technology companies are competing to invest substantial resources in the research, development and deployment of large-scale deep learning systems.

In the early 1960s, Hubel and Wiesel, through their study of the cat's visual cortex, proposed the concept of the receptive field [5] and discovered the hierarchical processing mechanism of information in the visual cortical pathway, for which they were later awarded the Nobel Prize in Physiology or Medicine. By the mid-1980s, the neocognitron proposed by Fukushima et al. [6], based on the concept of the receptive field, could be seen as the first realization of convolutional neural networks (CNNs) and the first artificial neural network based on local connectivity between neurons and a hierarchical structure. The neocognitron decomposes a visual pattern into many subpatterns, and these subpattern features are processed by hierarchically cascaded feature planes, so that the model retains good recognition ability even when the target object is slightly deformed. After that, researchers began experimenting with an artificial neural network called the multi-layer perceptron [7] (actually a shallow model with only one hidden layer) in place of manual feature extraction, using a simple stochastic gradient descent method to train the model, and further proposed the back-propagation algorithm for calculating the error gradient, which was subsequently proved to be very effective [8]. In 1990, LeCun et al. [9], studying the handwritten digit recognition problem, first proposed training a convolutional neural network model with the gradient back-propagation algorithm, and on the MNIST [10] handwritten digit dataset it showed better performance than other methods of the time. The success of the gradient back-propagation algorithm and the convolutional neural network brought new hope to the machine learning field. It opened up a wave of machine learning based on statistical learning models and brought artificial neural networks into a new stage of vigorous development. At present, the convolutional neural network has become a research hotspot in speech analysis and image recognition. It is the first learning model with which multi-layer neural networks were truly trained successfully, and its advantages are more obvious when the network input is multidimensional. With the new machine learning boom, convolutional neural networks have been explored in depth and applied to different large-scale machine learning problems such as speech recognition, image recognition and natural language processing.

A. Concept

A convolutional neural network is a multi-layer artificial neural network specially designed to handle two-dimensional input data. Each layer in the network is composed of multiple two-dimensional planes, and each plane consists of multiple independent neurons; neurons in adjacent layers are connected to each other, while neurons within the same layer are not connected. CNNs are inspired by the earlier time-delay neural networks (TDNNs) [11]. A TDNN reduces the computational complexity of network training by sharing weights in the time dimension, and is suitable for processing speech signals and time-series signals. CNNs use a weight-sharing network structure that makes them more similar to a biological neural network; the capacity of the model can be adjusted by changing the depth and breadth of the network, and the model makes strong assumptions about natural images (statistical stationarity and local correlation). Therefore, compared with a fully connected network of comparable size, CNNs can effectively reduce the learning complexity of the network model, have fewer network connections and weight parameters, and are easier to train.

B. Network Structure

The structure of a simple convolutional neural network model is shown in Fig. 1. The network model consists of two convolution layers (C1, C2) and two sub-sampling layers (S1, S2) arranged alternately. First, the original input image is convolved with three trainable filters (called convolution kernels) and additive bias vectors, generating three feature maps in the C1 layer. Then, for each feature map, local regions are weighted and averaged, and three new feature maps are obtained in the S1 layer through a nonlinear activation function. These feature maps are then convolved with the three trainable filters of the C2 layer, and three feature maps are output through the S2 layer. The final output of the S2 layer is vectorized and then fed into a traditional neural network for training.

Fig. 1. Simplified convolution neural network structure.

C. Convolution Feature Extraction

Natural images have an inherent property: the statistical characteristics of one part of an image are the same as those of other parts. This means that features learned on one part can also be used on another part, so the same learned features can be used at all positions in the image. In other words, for large-image recognition problems, small local patches are randomly sampled from the image as training samples, some features are learned from these small samples, and these features are then used as filters and convolved with the original whole image, yielding the activation value of each feature at every position in the original image. Given a large image $x_{large}$ with a resolution of $r \times c$, first sample small patches $x_{small}$ of size $a \times b$ from $x_{large}$ and train a sparse autoencoder on them to obtain $k$ features with activation values $f = \sigma(W^{(1)} x_{small} + b^{(1)})$, where $W^{(1)}$ and $b^{(1)}$ are the trained parameters and $\sigma$ is the activation function of the encoder. Then, for every $a \times b$ patch $x_s$ in $x_{large}$, compute the corresponding activation value $f_s = \sigma(W^{(1)} x_s + b^{(1)})$. Collecting these activation values $f_s$ over all patch positions gives a convolved feature map of size $k \times (r - a + 1) \times (c - b + 1)$. The two-dimensional convolution operation is illustrated in Fig. 2. For example, for a raw input image with a resolution of 128 × 128, assume that 200 feature patches of size 8 × 8 have been obtained by pre-training. Convolving these 200 feature patches with every 8 × 8 block of the original image, each feature patch yields a 121 × 121 convolved feature map, so the whole image yields a 200 × 121 × 121 convolved feature map.

Fig. 2. Illustration of two-dimensional convolution operation
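
The feature-map size arithmetic in the example above can be checked with a short sketch. This is a minimal illustration in plain Python under stated assumptions, not the implementation used in the reviewed work: the helper names `valid_conv_shape` and `convolve_valid` are hypothetical, the filter is applied as a sliding dot product without flipping the kernel, and the bias and activation function are omitted.

```python
def valid_conv_shape(r, c, a, b, k):
    """Shape of the convolved features for an r x c image, a x b patches and k features."""
    if a > r or b > c:
        raise ValueError("patch must not be larger than the image")
    return (k, r - a + 1, c - b + 1)


def convolve_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and return the map of dot products."""
    r, c = len(image), len(image[0])
    a, b = len(kernel), len(kernel[0])
    out = []
    for i in range(r - a + 1):          # every vertical patch position
        row = []
        for j in range(c - b + 1):      # every horizontal patch position
            acc = 0.0
            for u in range(a):
                for v in range(b):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


# The example from the text: 200 features of size 8 x 8 on a 128 x 128 image.
print(valid_conv_shape(128, 128, 8, 8, 200))  # (200, 121, 121)

# A tiny 3 x 3 image and a 2 x 2 averaging kernel give a 2 x 2 feature map.
tiny = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mean_kernel = [[0.25, 0.25], [0.25, 0.25]]
print(convolve_valid(tiny, mean_kernel))  # [[3.0, 4.0], [6.0, 7.0]]
```

The nested loops are written for readability rather than speed; a practical system would rely on an optimized library routine for the same sliding-window computation.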
D. Pooling Operation

Feeding the features extracted by the convolution layer into a classifier for training produces the final classification result. In theory, all the features extracted by the convolution layer can be input directly into the classifier, but this requires very large computational overhead, especially for large high-resolution images. For example, for an input image of size 96 × 96, assume that the convolution layer performs convolution with 200 kernels of size 8 × 8. Each convolution kernel outputs a feature map of (96 - 8 + 1) × (96 - 8 + 1) = 7,921 dimensions, so the convolution layer outputs a feature vector of 7,921 × 200 = 1,584,200 dimensions. Feeding such high-dimensional features into the classifier requires very large computational resources and leads to a serious over-fitting problem. However, since images have a "static" property, a feature obtained in one local region of the image is highly likely to apply equally in another local region. Thus, it is possible to aggregate statistics of the features at different locations within a local region of the image, an operation referred to as "pooling". For example, computing the maximum (or average) of a convolved feature over a local region is called max pooling (or average pooling). Specifically, assuming a pooling region of size m × n, once the convolved features are obtained they are divided into disjoint regions of size m × n, and the pooling operation is performed on each region to obtain the pooled feature map, as shown in Fig. 3.

In Fig. 3, max pooling is performed with a 3 × 3 window over four non-overlapping sub-regions to obtain the pooled feature map. If contiguous regions of the image are selected as pooling regions and only convolved features generated by the same hidden neurons are pooled together, the pooled feature units are translation invariant. That is, even if the object in the original image undergoes a small translation, the same pooled features can still be obtained, and the classifier can still output the same classification result. These summary statistics not only greatly reduce the dimension of the feature vector and the computation required to train the classifier, but also effectively expand the training data, which helps prevent over-fitting.

Fig. 3. The max pooling operation

III. IMAGE CLASSIFICATION

Image classification assigns an image to one of several categories by analyzing the image, with the main emphasis on the semantic judgment of the image as a whole. There are many labeled datasets for evaluating image classification algorithms, such as CIFAR-10/100 [12], Caltech-101/256 [13-14] and ImageNet [15]; ImageNet contains more than 15,000,000 labeled high-resolution images divided into more than 22,000 categories. Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) image classification competition has been an important event for evaluating image classification algorithms. Its dataset is a subset of ImageNet, containing more than a million images divided into 1,000 categories. The 2010 and 2011 winners used traditional image classification algorithms, mainly extracting hand-crafted features with SIFT, LBP [16] and other algorithms and then feeding the extracted features to a support vector machine (SVM) or other classifier; the best result was a 28.2% error rate [17]. ILSVRC2012 was an important turning point in the field of large-scale image classification. In this competition, Alex Krizhevsky et al. proposed AlexNet [18], applying deep learning to large-scale image classification for the first time and achieving a 16.4% error rate, about 10 percentage points lower than the second-place team, which used traditional algorithms. As shown in Fig. 4, AlexNet is an 8-layer convolutional neural network: the first five layers are convolution layers and the last three are fully connected layers, with the last layer performing classification by
softmax. The model uses rectified linear units (ReLU) in place of the traditional sigmoid and tanh functions as the neurons' nonlinear activation function, and uses the Dropout method to reduce the over-fitting problem.

Fig. 4. Simplified AlexNet model structure

After AlexNet, models based on deep convolutional neural networks began to replace traditional image classification algorithms as the mainstream method used by teams in the ILSVRC image classification competition. Clarifai [19], the winning team of ILSVRC2013, proposed a visualization method for convolutional neural networks that uses a deconvolution network to visualize each convolution layer of AlexNet in order to analyze the features learned by each layer. This deepened the understanding of why convolutional neural networks achieve good results in image classification and was used to improve the model, which achieved an 11.7% error rate. The ILSVRC2014 image classification results were a major breakthrough compared with the previous year: the winning entry, GoogLeNet [20] from the Google team, achieved a 6.7% error rate, cutting the error rate of the image classification competition to about half of the previous best record.

GoogLeNet strengthens the convolutional neural network through multi-scale processing. The paper [20] proposes the Inception module, which builds on Network in Network [21]. The structure of the Inception module is shown in Fig. 5; its main idea is to find the optimal local sparse structure of a convolutional vision network and approximate it with dense components. On the one hand, this achieves effective dimensionality reduction, so that the width and depth of the network can be increased under the same computing resources. On the other hand, it reduces the number of parameters that need to be trained, which mitigates over-fitting and improves the model's generalization ability.

Fig. 5. Simplified Inception module structure [20].

In ILSVRC2014, third place, with an error rate of about 8.1%, went to SPPNet [22], designed by the Microsoft Research Asia team, which proposed a new pooling method called spatial pyramid pooling, as shown in Fig. 6. Most convolutional neural network models require the input image size to be fixed, so the original image has to be cropped, which loses part of the original image information, or resized with a changed aspect ratio, which introduces distortion. Note that the convolution layers themselves place no limit on the input image size; only the fully connected layers, whose number of parameters is fixed, need a fixed input dimension. However, the output dimension of the convolution layer changes with the input dimension, which is why the input image size has had to be fixed. The role of spatial pyramid pooling is therefore to produce a fixed-length output from feature maps of any size, so that the network can accept images of any size as input.

The pooling method divides the input into a fixed number of local spatial blocks and applies max pooling within each block to ensure that the output dimension is fixed. Using multi-level spatial block division, features at different scales can be extracted.

Fig. 6. Spatial pyramid pooling model structure [22]

At the beginning of 2015, PReLU-Nets [23], proposed by researchers at Microsoft Research Asia, achieved a top-5 error rate of 4.94% on the ILSVRC image classification dataset, becoming the first model to surpass human-level recognition performance (an error rate of about 5.1% [17]) on this dataset. Compared with previous convolutional neural network models, the model has two improvements. First, it generalizes the traditional rectified linear unit (ReLU) into the parametric rectified linear unit (PReLU). This activation function adaptively learns the parameters of the rectifier and improves recognition accuracy at negligible additional computational cost. Second, by modeling the rectifier nonlinearity (ReLU / PReLU), it derives a robust initialization method that allows models with more layers (for example, models with 30 weight layers) to converge. Shortly thereafter, Google normalized each mini-batch while training the network, called this Batch normalization, applied the training method to GoogLeNet, and achieved a top-5 error rate of 4.82% on the ILSVRC2012 dataset [24]. Normalization is a commonly used input
data preprocessing method for training deep neural networks; it reduces the influence of the initial weights on the training result and accelerates convergence. Google's researchers therefore applied normalization to the activations inside the network, normalizing the data passed between layers. Since training uses stochastic gradient descent, this normalization can only be computed over each mini-batch, hence the name Batch normalization. The method allows training with a higher learning rate, which reduces training time, and at the same time reduces over-fitting and improves accuracy.

Although convolutional neural networks already have strong image learning ability, this type of model lacks the ability to learn spatial invariance, especially rotation invariance [19]. The Spatial transformer [25] proposed by Google DeepMind aims to improve image classification accuracy by increasing the ability of the convolutional neural network to learn spatial invariance. The Spatial transformer is a module that can be added at any depth of a convolutional neural network. It performs a series of spatial transformations on the input data to produce the output features. During training, the module learns the parameters required for the spatial transformation autonomously and needs no additional supervision.

In the ILSVRC2015 results announced at the end of 2015, the 152-layer deep residual network from the Microsoft Research Asia team won the image detection, image classification and image localization tasks by a wide margin, achieving a 3.57% error rate on the image classification dataset [26]. As convolutional neural networks get deeper, training becomes more difficult, and accuracy saturates or even declines. The team's researchers argue that when a network reaches its optimal training result, some layers may need to output exactly their input; it is easier for such layers to learn a residual function whose value is 0 than to learn the identity function. Therefore, the deep residual network brings residual representations into the network and puts forward the idea of residual learning. As shown in Fig. 7, to implement residual learning, shortcut connections are applied to the connections between layers in the network, so that accuracy can keep improving as the number of layers increases rather than declining.

Fig. 7. Residual Learning Module [26]

Because ImageNet is large in scale and has many image classes, models trained on ImageNet have strong generalization ability and obtain good classification results on other datasets. If such a model is further fine-tuned on the target dataset, it usually achieves better results than a model trained only on the target dataset. The R-CNN model [27], the first to use convolutional neural networks for object detection, fine-tunes an ImageNet-trained AlexNet on the PASCAL VOC dataset and then uses it to extract image features, achieving accuracy about 20% higher than previous models. In addition, models trained on the ImageNet dataset have been applied to other types of data such as remote sensing image classification [28] and indoor scene classification [29], and have achieved better results than previous methods. Since deep learning was first applied to the image classification competition in ILSVRC2012 with remarkable results, deep learning models have been widely used in the field of image recognition; newly emerging deep neural network models keep breaking competition records and keep improving the ability of deep neural networks to learn image features. At the same time, thanks to the emergence of large-scale datasets such as ImageNet and MSCOCO, deep network models can be trained well, and models trained on large amounts of data have stronger generalization ability and adapt better to the datasets of practical applications, improving classification performance.

IV. CONCLUSIONS

Deep learning is currently a very popular research direction. Using the basic building blocks of convolutional neural networks, such as convolution layers, pooling layers and fully connected layers, the network can learn and extract relevant features by itself and put them to use. This provides great convenience for many studies and eliminates the need for a very complex modeling process. In addition, deep learning has achieved substantial results and progress in image classification, object detection, pose estimation, image segmentation and other tasks. On the one hand, deep learning is widely applicable and versatile, and work can continue to extend it to other applications. On the other hand, deep learning still has much untapped potential that is worth exploring. Many of the methods discussed above are supervised (for example, the last layer of the trained network computes a loss value against the ground truth and the parameters are then adjusted), and supervised learning has indeed achieved great success; nevertheless, the application of deep learning to unsupervised learning is likely to be a future trend. After all, humans and animals in most cases do not come to know what a thing is by being told its name. In the future of computer vision, it is expected that recurrent neural networks (RNNs) based on deep learning will become a very popular network model and will achieve further breakthroughs in more applied research. In addition, combining reinforcement learning methods to train an end-to-end learning system is gradually becoming possible, so that the
learning system has autonomous learning ability and can actively learn representations and abstractions of the relevant features. At present, research combining deep learning and reinforcement learning is still in its infancy, but some work in this area has achieved good performance in multi-object recognition tasks and video game learning, which is one of the reasons many researchers in related fields are excited. It is noteworthy that natural language processing is also an area where deep learning may show its strengths in the future; for example, for an article or a long text, methods and strategies based on deep neural network models (such as RNNs) can be designed to understand the text content effectively. In general, deep learning combined with some simple reasoning has already achieved very good results in speech and image processing. There is reason to believe that if the features extracted by current networks can be further optimized so that they express characteristics more "freely", and are coupled with more complex reasoning, deep learning will make even greater progress in artificial intelligence applications.

ACKNOWLEDGMENT

This work was supported in part by RDU1603102.

REFERENCES

[1] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60 (2): 91-110.
[2] N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society Conference on. San Diego, USA: IEEE, 2005, 1: 886-893.
[3] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks [J]. Science, 2006, 313 (5786): 504-507.
[4] T. Maimaitiaili, L. Dai. Deep neural network based Uyghur large vocabulary continuous speech recognition. Journal of Data Acquisition and Processing, 2015, 30 (2): 365-371.
[5] D. H. Hubel, T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 1962, 160 (1): 106-154.
[6] K. Fukushima, S. Miyake. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 1982, 15 (6): 455-469.
[7] D. W. Ruck, S. K. Rogers, M. Kabrisky. Feature selection using a multilayer perceptron. Journal of Neural Network Computing, 1990, 2 (2): 40-48.
[8] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning representations by back-propagating errors. Nature, 1986, 323: 533-536.
[9] Y. LeCun, et al. Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems. Colorado, USA: [s. n.], 1990: 396-404.
[10] Y. LeCun, C. Cortes. MNIST handwritten digit database [EB/OL]. http://yann.lecun.com/exdb/mnist, 2010.
[11] A. Waibel, et al. Phoneme recognition using time-delay neural networks. Acoustics, Speech and Signal Processing, IEEE Transactions on, 1989, 37 (3): 328-339.
[12] A. Krizhevsky. Learning multiple layers of features from tiny images. Toronto, Canada: University of Toronto, 2009.
[13] L. Fei-Fei, R. Fergus, P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007, 106 (1): 59-70.
[14] G. Griffin, A. Holub, P. Perona. Caltech-256 object category dataset. Technical Report 7694, http://authors.library.caltech.edu/7694, California Institute of Technology, 2007.
[15] J. Deng, et al. ImageNet: A large-scale hierarchical image database. Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. Miami, USA: IEEE, 2009: 248-255.
[16] T. Ahonen, A. Hadid, M. Pietikainen. Face description with local binary patterns: Application to face recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2006, 28 (12): 2037-2041.
[17] O. Russakovsky, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115 (3): 211-252.
[18] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2012: 1097-1105.
[19] M. D. Zeiler, R. Fergus. Visualizing and understanding convolutional networks. New York: Springer International Publishing, 2014: 818-833.
[20] C. Szegedy, et al. Going deeper with convolutions. Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. Boston, USA: IEEE, 2015: 1-9.
[21] M. Lin, Q. Chen, S. Yan. Network in network [EB/OL]. http://arxiv.org/abs/1312.4400, 2013.
[22] K. He, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. Computer Vision - ECCV 2014. New York: Springer International Publishing, 2014: 346-361.
[23] K. He, et al. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification [EB/OL]. http://arxiv.org/abs/1502.01852, 2015.
[24] S. Ioffe, C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift [EB/OL]. http://arxiv.org/abs/1502.03167, 2015.
[25] M. Jaderberg, K. Simonyan, A. Zisserman. Spatial transformer networks. Advances in Neural Information Processing Systems. Montréal, Canada: [s. n.], 2015: 2008-2016.
[26] K. He, et al. Deep residual learning for image recognition [EB/OL]. http://arxiv.org/abs/1512.03385, 2015.
[27] R. Girshick, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. Columbus, USA: IEEE, 2014: 580-587.
[28] M. Castelluccio, et al. Land use classification in remote sensing images by convolutional neural networks [EB/OL]. http://arxiv.org/abs/1508.00092, 2015.
[29] M. Hayat, et al. A spatial layout and scale invariant feature representation for indoor scene classification [EB/OL]. http://arxiv.org/abs/1506.05532, 2015.
