Autoencoder

{{Use dmy dates|date=March 2020|cs1-dates=y}}
{{Machine learning|Artificial neural network}}
An '''autoencoder''' is a type of [[artificial neural network]] used to learn [[Feature learning|efficient codings]] of unlabeled data ([[unsupervised learning]]). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an [[Feature learning|efficient representation]] (encoding) for a set of data, typically for [[dimensionality reduction]], to generate lower-dimensional embeddings for subsequent use by other [[machine learning]] algorithms.<ref>{{Cite book|last1=Bank |first1=Dor |last2=Koenigstein |first2=Noam |last3=Giryes |first3=Raja |year=2023 |chapter=Autoencoders |editor-last1=Rokach |editor-first1=Lior |editor-last2=Maimon |editor-first2=Oded |editor-last3=Shmueli |editor-first3=Erez |title=Machine learning for data science handbook |chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-24628-9_16 |language=en |pages=353–374 |doi=10.1007/978-3-031-24628-9_16|isbn=978-3-031-24627-2 }}</ref>
 
Variants exist which aim to make the learned representations assume useful properties.<ref name=":0" /> Examples are regularized autoencoders (''sparse'', ''denoising'' and ''contractive'' autoencoders), which are effective in learning representations for subsequent [[Statistical classification|classification]] tasks,<ref name=":4" /> and [[Variational_autoencoder|''variational'' autoencoders]], which can be used as [[generative model]]s.<ref name=":11">{{cite journal |arxiv=1906.02691|doi=10.1561/2200000056|bibcode=2019arXiv190602691K|title=An Introduction to Variational Autoencoders|date=2019|last1=Welling|first1=Max|last2=Kingma|first2=Diederik P.|journal=Foundations and Trends in Machine Learning|volume=12|issue=4|pages=307–392|s2cid=174802445}}</ref> Autoencoders are applied to many problems, including [[face recognition|facial recognition]],<ref>Hinton GE, Krizhevsky A, Wang SD. [http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf Transforming auto-encoders.] In International Conference on Artificial Neural Networks 2011 Jun 14 (pp. 44-51). Springer, Berlin, Heidelberg.</ref> [[Feature (computer vision)|feature detection]],<ref name=":2">{{Cite book|last=Géron|first=Aurélien|title=Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow|publisher=O’Reilly Media, Inc.|year=2019|location=Canada|pages=739–740}}</ref> [[anomaly detection]], and [[Word embedding|learning the meaning of words]].<ref>{{cite journal|doi=10.1016/j.neucom.2008.04.030|title=Modeling word perception using the Elman network|journal=Neurocomputing|volume=71|issue=16–18|pages=3150|date=2008|last1=Liou|first1=Cheng-Yuan|last2=Huang|first2=Jau-Chi|last3=Yang|first3=Wen-Chie|url=http://ntur.lib.ntu.edu.tw//handle/246246/155195 }}</ref><ref>{{cite journal|doi=10.1016/j.neucom.2013.09.055|title=Autoencoder for words|journal=Neurocomputing|volume=139|pages=84–96|date=2014|last1=Liou|first1=Cheng-Yuan|last2=Cheng|first2=Wei-Chen|last3=Liou|first3=Jiun-Wei|last4=Liou|first4=Daw-Ran}}</ref> In terms of [[Synthetic data|data synthesis]], autoencoders can also be used to randomly generate new data that is similar to the input (training) data.<ref name=":2" />
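A minimal sketch of this encoder–decoder structure, using the PyTorch library (the layer sizes, activation functions and mean-squared-error reconstruction loss below are illustrative choices, not part of the definition):

<syntaxhighlight lang="python">
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Toy single-hidden-layer autoencoder: the encoder maps an input to a
    lower-dimensional code, and the decoder reconstructs the input from it."""
    def __init__(self, input_dim: int = 784, code_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        code = self.encoder(x)      # the learned representation (encoding)
        return self.decoder(code)   # the reconstruction of the input

model = Autoencoder()
x = torch.rand(64, 784)                      # a toy batch of inputs
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error to minimize
loss.backward()                              # gradients for a training step
</syntaxhighlight>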
 
{{Toclimit|3}}
 
====Sparse autoencoder====
Inspired by the [[sparse coding]] hypothesis in neuroscience, sparse autoencoders (SAE) are variants of autoencoders, such that the codes <math>E_\phi(x)</math> for messages tend to be ''sparse codes'', that is, <math>E_\phi(x)</math> is close to zero in most entries. Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time.<ref name="domingos">{{cite book |last1=Domingos |first1=Pedro |author-link=Pedro Domingos |title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World |title-link=The Master Algorithm |date=2015 |publisher=Basic Books |isbn=978-046506192-1 |at="Deeper into the Brain" subsection |chapter=4}}</ref> Encouraging sparsity improves performance on classification tasks.<ref name=":51">{{Cite journal |last1=Frey |first1=Brendan |last2=Makhzani |first2=Alireza |date=2013-12-19 |title=k-Sparse Autoencoders |arxiv=1312.5663 |bibcode=2013arXiv1312.5663M}}</ref> [[File:Autoencoder sparso.png|thumb|Simple schema of a single-layer sparse autoencoder. The hidden nodes in bright yellow are activated, while the light yellow ones are inactive. The activation depends on the input.]]
There are two main ways to enforce sparsity. One way is to simply clamp all but the highest-k activations of the latent code to zero. This is the '''k-sparse autoencoder'''.<ref name=":1">{{cite arXiv |eprint=1312.5663 |class=cs.LG |first1=Alireza |last1=Makhzani |first2=Brendan |last2=Frey |title=K-Sparse Autoencoders |date=2013}}</ref>
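A sketch of this clamping step, written as a hypothetical PyTorch helper (the batch size, code width and value of <code>k</code> below are illustrative):

<syntaxhighlight lang="python">
import torch

def k_sparse(code: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest activations in each row of the latent code and
    set all other entries to zero, as in the k-sparse autoencoder."""
    topk_indices = code.topk(k, dim=-1).indices   # positions of the k largest activations
    mask = torch.zeros_like(code).scatter_(-1, topk_indices, 1.0)
    return code * mask

codes = torch.relu(torch.randn(4, 16))   # a toy batch of non-negative latent codes
sparse_codes = k_sparse(codes, k=3)      # at most 3 nonzero entries per row
</syntaxhighlight>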
 
For each input <math>x</math>, let the actual sparsity of activation in each layer <math>k</math> be<math display="block">\rho_k(x) = \frac 1n \sum_{i=1}^n a_{k, i}(x)</math>where <math>a_{k, i}(x)</math> is the activation of the <math>i</math>-th neuron in the <math>k</math>-th layer upon input <math>x</math>.
 
The sparsity loss upon input <math>x</math> for one layer is <math>s(\hat\rho_k, \rho_k(x))</math>, and the sparsity regularization loss for the entire autoencoder is the expected weighted sum of sparsity losses:<math display="block">L_{sparsity}(\theta, \phi) = \mathbb E_{x\sim\mu_X}\left[\sum_{k\in 1:K} w_k s(\hat\rho_k, \rho_k(x)) \right]</math>Typically, the function <math>s</math> is either the [[Kullback–Leibler divergence|Kullback-Leibler (KL) divergence]], as<ref name=":51" /><ref name=":6">Ng, A. (2011). [https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf Sparse autoencoder]. ''CS294A Lecture notes'', ''72''(2011), 1-19.</ref><ref>{{Cite journal|last1=Nair|first1=Vinod|last2=Hinton|first2=Geoffrey E.|date=2009|title=3D Object Recognition with Deep Belief Nets|url=http://dl.acm.org/citation.cfm?id=2984093.2984244|journal=Proceedings of the 22nd International Conference on Neural Information Processing Systems|series=NIPS'09|location=USA|publisher=Curran Associates Inc.|pages=1339–1347|isbn=9781615679119}}</ref><ref>{{Cite journal|last1=Zeng|first1=Nianyin|last2=Zhang|first2=Hong|last3=Song|first3=Baoye|last4=Liu|first4=Weibo|last5=Li|first5=Yurong|last6=Dobaie|first6=Abdullah M.|date=2018-01-17|title=Facial expression recognition via learning deep sparse autoencoders|journal=Neurocomputing|volume=273|pages=643–649|doi=10.1016/j.neucom.2017.08.043|issn=0925-2312}}</ref>
 
::<math>s(\rho, \hat\rho) = KL(\rho || \hat{\rho}) = \rho \log \frac{\rho}{\hat{\rho}}+(1- \rho)\log \frac{1-\rho}{1-\hat{\rho}}</math>
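As an illustration, the per-layer sparsity penalty defined above can be estimated on a batch of activations as in the following PyTorch sketch (the target sparsity of 0.05 and the clamping used for numerical stability are assumptions, not prescribed values):

<syntaxhighlight lang="python">
import torch

def sparsity_loss(activations: torch.Tensor, rho_hat: float = 0.05) -> torch.Tensor:
    """Sparsity penalty for one layer, following the formulas above:
    rho_k(x) is the mean activation over the layer's units for each input x,
    and the per-input loss KL(rho_hat || rho_k(x)) is averaged over the batch.

    activations: (batch, n_units) tensor of activations in (0, 1), e.g. sigmoid outputs
    rho_hat: desired sparsity level for this layer
    """
    rho = activations.mean(dim=1).clamp(1e-7, 1 - 1e-7)   # rho_k(x) for each input in the batch
    kl = rho_hat * torch.log(rho_hat / rho) + (1 - rho_hat) * torch.log((1 - rho_hat) / (1 - rho))
    return kl.mean()   # empirical estimate of the expectation over x

# total loss = reconstruction loss + sum over layers k of w_k * sparsity_loss(activations_k)
</syntaxhighlight>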
 
==== Minimal description length autoencoder ====
<ref name=":5">{{Cite journal |last1=Hinton |first1=Geoffrey E |last2=Zemel |first2=Richard |date=1993 |title=Autoencoders, Minimum Description Length and Helmholtz Free Energy |url=https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Morgan-Kaufmann |volume=6}}</ref>
{{Empty section|date=March 2024}}
 
== History ==
(Oja, 1982)<ref>{{Cite journal |last=Oja |first=Erkki |date=1982-11-01 |title=Simplified neuron model as a principal component analyzer |url=https://link.springer.com/article/10.1007/BF00275687 |journal=Journal of Mathematical Biology |language=en |volume=15 |issue=3 |pages=267–273 |doi=10.1007/BF00275687 |pmid=7153672 |issn=1432-1416}}</ref> noted that PCA is equivalent to a neural network with one hidden layer with an identity activation function. In the language of autoencoding, the input-to-hidden module is the encoder, and the hidden-to-output module is the decoder. Subsequently, (Baldi and Hornik, 1989)<ref>{{Cite journal |last1=Baldi |first1=Pierre |last2=Hornik |first2=Kurt |date=1989-01-01 |title=Neural networks and principal component analysis: Learning from examples without local minima |url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 |journal=Neural Networks |volume=2 |issue=1 |pages=53–58 |doi=10.1016/0893-6080(89)90014-2 |issn=0893-6080}}</ref> and (Kramer, 1991)<ref name=":12" /> generalized PCA to autoencoders, which they termed "nonlinear PCA".
 
Immediately after the resurgence of neural networks in the 1980s, it was suggested in 1986<ref>{{Cite book |last1=Rumelhart |first1=David E. |url=https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations |last2=McClelland |first2=James L. |date=1986 |publisher=The MIT Press |isbn=978-0-262-29140-8 |language=en |chapter=2. A General Framework for Parallel Distributed Processing |doi=10.7551/mitpress/5236.001.0001}}</ref> that a neural network be put in "auto-association mode". This was then implemented in (Harrison, 1987)<ref>Harrison TD (1987). A Connectionist framework for continuous speech recognition. Ph.D. dissertation, Cambridge University.</ref> and (Elman, Zipser, 1988)<ref>{{Cite journal |last1=Elman |first1=Jeffrey L. |last2=Zipser |first2=David |date=1988-04-01 |title=Learning the hidden structure of speech |url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speech |journal=The Journal of the Acoustical Society of America |language=en |volume=83 |issue=4 |pages=1615–1626 |doi=10.1121/1.395916 |pmid=3372872 |bibcode=1988ASAJ...83.1615E |issn=0001-4966}}</ref> for speech and in (Cottrell, Munro, Zipser, 1987)<ref>{{Cite journal |last1=Cottrell |first1=Garrison W. |last2=Munro |first2=Paul |last3=Zipser |first3=David |date=1987 |title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming |url=https://escholarship.org/uc/item/2zs7w6z8 |journal=Proceedings of the Annual Meeting of the Cognitive Science Society |language=en |volume=9 |issue=0}}</ref> for images.<ref name=":14" /> In (Hinton, Salakhutdinov, 2006),<ref name=":72">{{cite journal |last1=Hinton |first1=G. E. |last2=Salakhutdinov |first2=R. R. |date=2006-07-28 |title=Reducing the Dimensionality of Data with Neural Networks |url=https://www.science.org/doi/10.1126/science.1127647 |journal=Science |language=en |volume=313 |issue=5786 |pages=504–507 |bibcode=2006Sci...313..504H |doi=10.1126/science.1127647 |issn=0036-8075 |pmid=16873662 |s2cid=1658773}}</ref> [[Deep belief network|deep belief networks]] were developed. These train a pair of [[Restricted Boltzmann machine|restricted Boltzmann machines]] as an encoder-decoder pair, then train another pair on the latent representation of the first pair, and so on.<ref name="scholar">{{Cite journal |vauthors=Hinton G |year=2009 |title=Deep belief networks |journal=Scholarpedia |volume=4 |issue=5 |pages=5947 |bibcode=2009SchpJ...4.5947H |doi=10.4249/scholarpedia.5947 |doi-access=free}}</ref>
 
The first applications of autoencoders date to the early 1990s.<ref name=":0" /><ref>{{Cite journal |last=Schmidhuber |first=Jürgen |date=January 2015 |title=Deep learning in neural networks: An overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}</ref><ref name=":5" /> Their most traditional applications were [[dimensionality reduction]] and [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name="VAE">{{cite arXiv |eprint=1312.6114 |class=stat.ML |last1=Kingma |first1=Diederik P |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{url|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence|AI systems]] of the 2010s included autoencoder modules as components of larger systems, such as the VAE in [[Stable Diffusion]] and the discrete VAE in Transformer-based image generators like [[DALL-E|DALL-E 1]].
 
During the early days, when the terminology was not yet settled, the autoencoder was also called identity mapping,<ref>{{Cite journal |last1=Baldi |first1=Pierre |last2=Hornik |first2=Kurt |date=1989-01-01 |title=Neural networks and principal component analysis: Learning from examples without local minima |url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 |journal=Neural Networks |volume=2 |issue=1 |pages=53–58 |doi=10.1016/0893-6080(89)90014-2 |issn=0893-6080}}</ref><ref name=":12" /> auto-associating,<ref>{{Cite journal |last1=Ackley |first1=D |last2=Hinton |first2=G |last3=Sejnowski |first3=T |date=March 1985 |title=A learning algorithm for Boltzmann machines |url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 |journal=Cognitive Science |language=en |volume=9 |issue=1 |pages=147–169 |doi=10.1016/S0364-0213(85)80012-4}}</ref> self-supervised backpropagation,<ref name=":12" /> or Diabolo network.<ref>{{Cite journal |last1=Schwenk |first1=Holger |last2=Bengio |first2=Yoshua |date=1997 |title=Training Methods for Adaptive Boosting of Neural Networks |url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=10}}</ref><ref name="bengio" />
 
== Applications ==
The two main applications of autoencoders are [[dimensionality reduction]] and [[information retrieval]] (or [[Content-addressable memory|associative memory]]),<ref name=":0">{{Cite book|url=http://www.deeplearningbook.org|title=Deep Learning|last1=Goodfellow|first1=Ian|last2=Bengio|first2=Yoshua|last3=Courville|first3=Aaron|publisher=MIT Press|date=2016|isbn=978-0262035613}}</ref> but modern variations have been applied to other tasks.

== See also ==
* [[Sparse dictionary learning]]
* [[Deep learning]]
 
== Further reading ==
 
* {{cite book |last1=Bank |first1=Dor |title=Machine Learning for Data Science Handbook |last2=Koenigstein |first2=Noam |last3=Giryes |first3=Raja |publisher=Springer International Publishing |year=2023 |isbn=978-3-031-24627-2 |publication-place=Cham |chapter=Autoencoders |doi=10.1007/978-3-031-24628-9_16}}
* {{Cite book |last1=Goodfellow |first1=Ian |title=Deep learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |date=2016 |publisher=The MIT Press |isbn=978-0-262-03561-3 |series=Adaptive computation and machine learning |location=Cambridge, Mass |chapter=14. Autoencoders |chapter-url=https://www.deeplearningbook.org/contents/autoencoders.html}}
 
== References ==
{{Reflist}}