Autoencoder: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 17:52, 4 October 2024 edit Cosmia Nebula (talk \| contribs) Extended confirmed users 7,906 edits →History: a little harvard style citation Tag: Visual edit ← Previous edit		Latest revision as of 04:06, 2 November 2024 edit undo Citation bot (talk \| contribs) Bots 5,307,573 edits Removed parameters. \| Use this bot. Report bugs. \| #UCB_CommandLine
(11 intermediate revisions by 4 users not shown)
Line 3: {{Use dmy dates\|date=March 2020\|cs1-dates=y}} {{Machine learning\|Artificial neural network}} An '''autoencoder''' is a type of [[artificial neural network]] used to learn [[Feature learning\|efficient codings]] of unlabeled data ([[unsupervised learning]]). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an ~~[[Feature learning\|~~efficient representation]] (encoding) for a set of data, typically for [[dimensionality reduction]], to generate lower-dimensional embeddings for subsequent use by other [[machine learning]] algorithms.<ref>{{Cite book\|last1=Bank \|first1=Dor \|last2=Koenigstein \|first2=Noam \|last3=Giryes \|first3=Raja \|year=2023 \|chapter=Autoencoders \|editor-last1=Rokach \|editor-first1=Lior \|editor-last2=Maimon \|editor-first2=Oded \|editor-last3=Shmueli \|editor-first3=Erez \|title=Machine learning for data science handbook \|chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-24628-9_16 \|language=en \|pages=353–374 \|doi=10.1007/978-3-031-24628-9_16\|isbn=978-3-031-24627-2 }}</ref> Variants exist, ~~aiming~~which aim to ~~force~~make the learned representations to assume useful properties.<ref name=":0" /> Examples are regularized autoencoders (''~~Sparse~~sparse'', ''~~Denoising~~denoising'' and ''~~Contractive~~contractive'' autoencoders), which are effective in learning representations for subsequent [[Statistical classification\|classification]] tasks,<ref name=":4" /> and [[Variational_autoencoder\|''~~Variational~~variational'' autoencoders]], ~~with~~which can be ~~applications~~used as [[generative model]]s.<ref name=":11">{{cite journal \|arxiv=1906.02691\|doi=10.1561/2200000056\|bibcode=2019arXiv190602691K\|title=An Introduction to Variational Autoencoders\|date=2019\|last1=Welling\|first1=Max\|last2=Kingma\|first2=Diederik P.\|journal=Foundations and Trends in Machine Learning\|volume=12\|issue=4\|pages=307–392\|s2cid=174802445}}</ref> Autoencoders are applied to many problems, including [[~~face recognition\|~~facial recognition]],<ref>Hinton GE, Krizhevsky A, Wang SD. [http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf Transforming auto-encoders.] In International Conference on Artificial Neural Networks 2011 Jun 14 (pp. 44-51). Springer, Berlin, Heidelberg.</ref> [[Feature (computer vision)\|feature detection]],<ref name=":2">{{Cite book\|last=Géron\|first=Aurélien\|title=Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow\|publisher=O’Reilly Media, Inc.\|year=2019\|location=Canada\|pages=739–740}}</ref> [[anomaly detection]], and ~~acquiring~~[[Word embedding\|learning the meaning of words]].<ref>{{cite journal\|doi=10.1016/j.neucom.2008.04.030\|title=Modeling word perception using the Elman network\|journal=Neurocomputing\|volume=71\|issue=16–18\|pages=3150\|date=2008\|last1=Liou\|first1=Cheng-Yuan\|last2=Huang\|first2=Jau-Chi\|last3=Yang\|first3=Wen-Chie\|url=http://ntur.lib.ntu.edu.tw//handle/246246/155195 }}</ref><ref>{{cite journal\|doi=10.1016/j.neucom.2013.09.055\|title=Autoencoder for words\|journal=Neurocomputing\|volume=139\|pages=84–96\|date=2014\|last1=Liou\|first1=Cheng-Yuan\|last2=Cheng\|first2=Wei-Chen\|last3=Liou\|first3=Jiun-Wei\|last4=Liou\|first4=Daw-Ran}}</ref> ~~Autoencoders~~In ~~are~~terms ~~also~~of ~~generative~~[[Synthetic ~~models~~data\|data ~~which~~synthesis]], autoencoders can also be used to randomly generate new data that is similar to the input ~~data~~ (training) data).<ref name=":2" /> {{Toclimit\|3}} Line 48: ====Sparse autoencoder==== Inspired by the [[sparse coding]] hypothesis in neuroscience, sparse autoencoders (SAE) are variants of autoencoders, such that the codes <math>E_\phi(x)</math> for messages tend to be ''sparse codes'', that is, <math>E_\phi(x)</math> is close to zero in most entries. Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time.<ref name="domingos">{{cite book \|last1=Domingos \|first1=Pedro \|author-link=Pedro Domingos \|title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World \|title-link=The Master Algorithm \|date=2015 \|publisher=Basic Books \|isbn=978-046506192-1 \|at="Deeper into the Brain" subsection \|chapter=4}}</ref> Encouraging sparsity improves performance on classification tasks.<ref name=":51"~~>{{Cite~~ ~~journal \|last1=Frey \|first1=Brendan \|last2=Makhzani \|first2=Alireza \|date=2013-12-19 \|title=k-Sparse Autoencoders \|arxiv=1312.5663 \|bibcode=2013arXiv1312.5663M}}<~~/~~ref~~> [[File:Autoencoder sparso.png\|thumb\|Simple schema of a single-layer sparse autoencoder. The hidden nodes in bright yellow are activated, while the light yellow ones are inactive. The activation depends on the input.]] There are two main ways to enforce sparsity. One way is to simply clamp all but the highest-k activations of the latent code to zero. This is the '''k-sparse autoencoder'''.<ref name=":1">{{cite arXiv \|eprint=1312.5663 \|class=cs.LG \|first1=Alireza \|last1=Makhzani \|first2=Brendan \|last2=Frey \|title=K-Sparse Autoencoders \|date=2013}}</ref> Line 61: For each input <math>x</math>, let the actual sparsity of activation in each layer <math>k</math> be<math display="block">\rho_k(x) = \frac 1n \sum_{i=1}^n a_{k, i}(x)</math>where <math>a_{k, i}(x)</math> is the activation in the <math>i</math> -th neuron of the <math>k</math> -th layer upon input <math>x</math>. The sparsity loss upon input <math>x</math> for one layer is <math>s(\hat\rho_k, \rho_k(x))</math>, and the sparsity regularization loss for the entire autoencoder is the expected weighted sum of sparsity losses:<math display="block">L_{sparsity}(\theta, \phi) = \mathbb \mathbb E_{x\sim\mu_X}\left[\sum_{k\in 1:K} w_k s(\hat\rho_k, \rho_k(x)) \right]</math>Typically, the function <math>s</math> is either the [[Kullback–Leibler divergence\|Kullback-Leibler (KL) divergence]], as<ref name=":51" /><ref name=":6">Ng, A. (2011). [https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf Sparse autoencoder]. ''CS294A Lecture notes'', ''72''(2011), 1-19.</ref><ref>{{Cite journal\|last1=Nair\|first1=Vinod\|last2=Hinton\|first2=Geoffrey E.\|date=2009\|title=3D Object Recognition with Deep Belief Nets\|url=http://dl.acm.org/citation.cfm?id=2984093.2984244\|journal=Proceedings of the 22nd International Conference on Neural Information Processing Systems\|series=NIPS'09\|location=USA\|publisher=Curran Associates Inc.\|pages=1339–1347\|isbn=9781615679119}}</ref><ref>{{Cite journal\|last1=Zeng\|first1=Nianyin\|last2=Zhang\|first2=Hong\|last3=Song\|first3=Baoye\|last4=Liu\|first4=Weibo\|last5=Li\|first5=Yurong\|last6=Dobaie\|first6=Abdullah M.\|date=2018-01-17\|title=Facial expression recognition via learning deep sparse autoencoders\|journal=Neurocomputing\|volume=273\|pages=643–649\|doi=10.1016/j.neucom.2017.08.043\|issn=0925-2312}}</ref> ::<math>s(\rho, \hat\rho) = KL(\rho \|\| \hat{\rho}) = \rho \log \frac{\rho}{\hat{\rho}}+(1- \rho)\log \frac{1-\rho}{1-\hat{\rho}}</math> Line 96: ==== Minimal description length autoencoder ==== <ref name=":5">{{Cite journal \|last1=Hinton \|first1=Geoffrey E \|last2=Zemel \|first2=Richard \|date=1993 \|title=Autoencoders, Minimum Description Length and Helmholtz Free Energy \|url=https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=Morgan-Kaufmann \|volume=6}}</ref> {{Empty section\|date=March 2024}} Line 123: == History == (Oja, 1982)<ref>{{Cite journal \|last=Oja \|first=Erkki \|date=1982-11-01 \|title=Simplified neuron model as a principal component analyzer \|url=https://link.springer.com/article/10.1007/BF00275687 \|journal=Journal of Mathematical Biology \|language=en \|volume=15 \|issue=3 \|pages=267–273 \|doi=10.1007/BF00275687 \|pmid=7153672 \|issn=1432-1416}}</ref> noted that PCA is equivalent to a neural network with one hidden layer with identity activation function. In the language of autoencoding, the input-to-hidden module is the encoder, and the hidden-to-output module is the decoder. Subsequently, in (Baldi and Hornik, 1989)<ref>{{Cite journal \|~~last~~last1=Baldi \|~~first~~first1=Pierre \|last2=Hornik \|first2=Kurt \|date=1989-01-01 \|title=Neural networks and principal component analysis: Learning from examples without local minima \|url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 \|journal=Neural Networks \|volume=2 \|issue=1 \|pages=53–58 \|doi=10.1016/0893-6080(89)90014-2 \|issn=0893-6080}}</ref> and (Kramer, 1991)<ref name=":12" /> generalized PCA to autoencoders, which they termed as "nonlinear PCA". Immediately after the resurgence of neural networks in the 1980s, it was suggested in 1986<ref>{{Cite book \|~~last~~last1=Rumelhart \|~~first~~first1=David E. \|url=https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the \|title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations \|last2=McClelland \|first2=James L. \|last3=AU \|date=1986 \|publisher=The MIT Press \|isbn=978-0-262-29140-8 \|language=en \|chapter=2. A General Framework for Parallel Distributed Processing \|doi=10.7551/mitpress/5236.001.0001}}</ref> that a neural network be put in "auto-association mode". This was then implemented in (Harrison, 1987)<ref>Harrison TD (1987) A Connectionist framework for continuous speech recognition. Cambridge University Ph. D. dissertation</ref> and (Elman, Zipser, 1988)<ref>{{Cite journal \|~~last~~last1=Elman \|~~first~~first1=Jeffrey L. \|last2=Zipser \|first2=David \|date=1988-04-01 \|title=Learning the hidden structure of speech \|url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speechLearning \|journal=The Journal of the Acoustical Society of America \|language=en \|volume=83 \|issue=4 \|pages=1615–1626 \|doi=10.1121/1.395916 \|pmid=3372872 \|bibcode=1988ASAJ...83.1615E \|issn=0001-4966}}</ref> for speech and in (Cottrell, Munro, Zipser, 1987)<ref>{{Cite journal \|~~last~~last1=Cottrell \|~~first~~first1=Garrison W. \|last2=Munro \|first2=Paul \|last3=Zipser \|first3=David \|date=1987 \|title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming \|url=https://escholarship.org/uc/item/2zs7w6z8 \|journal=Proceedings of the Annual Meeting of the Cognitive Science Society \|language=en \|volume=9 ~~\|issue=0~~}}</ref> for images.<ref name=":14" /> In (Hinton, Salakhutdinov, 2006),<ref name=":72">{{~~Cite~~cite journal \|~~last~~last1=Hinton \|~~first~~first1=G. E. \|last2=Salakhutdinov \|first2=R. R. \|date=~~2006-07-~~28 July 2006 \|title=Reducing the Dimensionality of Data with Neural Networks ~~\|url=https://www.science.org/doi/10.1126/science.1127647~~ \|journal=Science ~~\|language=en~~ \|volume=313 \|issue=5786 \|pages=504–507 \|bibcode=2006Sci...313..504H \|doi=10.1126/science.1127647 \|~~issn~~pmid=~~0036-8075~~16873662 \|s2cid=1658773}}</ref> [[Deep belief network\|deep belief networks]] were developed. These train a pair [[Restricted Boltzmann machine\|restricted Boltzmann machines]] as encoder-decoder pairs, then train another pair on the latent representation of the first pair, and so on.<ref name="scholar">{{Cite journal \|vauthors=Hinton G \|year=2009 \|title=Deep belief networks \|journal=Scholarpedia \|volume=4 \|issue=5 \|pages=5947 \|bibcode=2009SchpJ...4.5947H \|doi=10.4249/scholarpedia.5947 \|doi-access=free}}</ref> The first applications of AE date to early 1990s.<ref name=":0" /><ref>{{Cite journal \|last=Schmidhuber \|first=Jürgen \|date=January 2015 \|title=Deep learning in neural networks: An overview \|journal=Neural Networks \|volume=61 \|pages=85–117 \|arxiv=1404.7828 \|doi=10.1016/j.neunet.2014.09.003 \|pmid=25462637 \|s2cid=11715509}}</ref><ref~~>{{Cite~~ ~~journal \|last~~name=~~Hinton \|first=Geoffrey E \|last2=Zemel \|first2=Richard \|date=1993 \|title=Autoencoders, Minimum Description Length and Helmholtz Free Energy \|url=https~~":~~//proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html~~5" ~~\|journal=Advances in Neural Information Processing Systems \|publisher=Morgan-Kaufmann \|volume=6}}<~~/~~ref~~> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name="VAE">{{cite arXiv \|eprint=1312.6114 \|class=stat.ML \|author1=Diederik P Kingma \|first2=Max \|last2=Welling \|title=Auto-Encoding Variational Bayes \|date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{url\|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence\|AIs]] in the 2010s involved autoencoder modules as a component of larger AI systems, such as VAE in [[Stable Diffusion]], discrete VAE in Transformer-based image generators like [[DALL-E\|DALL-E 1]], etc. During the early days, when the terminology was uncertain, the autoencoder has also been called identity mapping,<ref>{{Cite journal \|~~last~~last1=Baldi \|~~first~~first1=Pierre \|last2=Hornik \|first2=Kurt \|date=1989-01-01 \|title=Neural networks and principal component analysis: Learning from examples without local minima \|url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 \|journal=Neural Networks \|volume=2 \|issue=1 \|pages=53–58 \|doi=10.1016/0893-6080(89)90014-2 \|issn=0893-6080}}</ref><ref name=":12" /> auto-associating,<ref>{{Cite journal \|~~last~~last1=Ackley \|~~first~~first1=D \|last2=Hinton \|first2=G \|last3=Sejnowski \|first3=T \|date=March 1985~~-03~~ \|title=A learning algorithm for boltzmann machines \|url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 \|journal=Cognitive Science \|language=en \|volume=9 \|issue=1 \|pages=147–169 \|doi=10.1016/S0364-0213(85)80012-4}}</ref> self-supervised backpropagation,<ref name=":12" /> or Diabolo network.<ref>{{Cite journal \|last1=Schwenk \|first1=Holger \|last2=Bengio \|first2=Yoshua \|date=1997 \|title=Training Methods for Adaptive Boosting of Neural Networks \|url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=MIT Press \|volume=10}}</ref><ref name="bengio" />▼ ▲During the early days, when the terminology was uncertain, the autoencoder has also been called identity mapping,<ref>{{Cite journal \|last=Baldi \|first=Pierre \|last2=Hornik \|first2=Kurt \|date=1989-01-01 \|title=Neural networks and principal component analysis: Learning from examples without local minima \|url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 \|journal=Neural Networks \|volume=2 \|issue=1 \|pages=53–58 \|doi=10.1016/0893-6080(89)90014-2 \|issn=0893-6080}}</ref><ref name=":12" /> auto-associating,<ref>{{Cite journal \|last=Ackley \|first=D \|last2=Hinton \|first2=G \|last3=Sejnowski \|first3=T \|date=1985-03 \|title=A learning algorithm for boltzmann machines \|url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 \|journal=Cognitive Science \|language=en \|volume=9 \|issue=1 \|pages=147–169 \|doi=10.1016/S0364-0213(85)80012-4}}</ref> self-supervised backpropagation,<ref name=":12" /> or Diabolo network.<ref>{{Cite journal \|last1=Schwenk \|first1=Holger \|last2=Bengio \|first2=Yoshua \|date=1997 \|title=Training Methods for Adaptive Boosting of Neural Networks \|url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=MIT Press \|volume=10}}</ref><ref name="bengio" /> == Applications == The two main applications of autoencoders are [[dimensionality reduction]] and [[information retrieval]] (or [[Content-addressable memory\|associative memory]]),<ref name=":0">{{Cite book\|url=http://www.deeplearningbook.org\|title=Deep Learning\|last1=Goodfellow\|first1=Ian\|last2=Bengio\|first2=Yoshua\|last3=Courville\|first3=Aaron\|publisher=MIT Press\|date=2016\|isbn=978-0262035613}}</ref> but modern variations have been applied to other tasks. Line 194 ⟶ 195: * [[Sparse dictionary learning]] * [[Deep learning]] == Further reading == * {{cite book \|last1=Bank \|first1=Dor \|title=Machine Learning for Data Science Handbook \|last2=Koenigstein \|first2=Noam \|last3=Giryes \|first3=Raja \|publisher=Springer International Publishing \|year=2023 \|isbn=978-3-031-24627-2 \|publication-place=Cham \|chapter=Autoencoders \|doi=10.1007/978-3-031-24628-9_16}} * {{Cite book \|last1=Goodfellow \|first1=Ian \|title=Deep learning \|last2=Bengio \|first2=Yoshua \|last3=Courville \|first3=Aaron \|date=2016 \|publisher=The MIT press \|isbn=978-0-262-03561-3 \|series=Adaptive computation and machine learning \|location=Cambridge, Mass \|chapter=14. Autoencoders \|chapter-url=https://www.deeplearningbook.org/contents/autoencoders.html}} ==References==

Autoencoder: Difference between revisions

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.