In the ideal setting, the code dimension and the model capacity would be chosen according to the complexity of the data distribution being modeled. A standard way to do so is to modify the basic autoencoder, as detailed below.<ref name=":0" />
== Variations ==
Researchers have debated whether joint training (i.e. training the whole architecture together with a single global reconstruction objective to optimize) would be better for deep auto-encoders.<ref name=":9">{{cite arXiv |eprint=1405.1380 |last1=Zhou |first1=Yingbo |last2=Arpit |first2=Devansh |last3=Nwogu |first3=Ifeoma |last4=Govindaraju |first4=Venu |title=Is Joint Training Better for Deep Auto-Encoders? |class=stat.ML |date=2014}}</ref> A 2015 study found that joint training learns better data models and more representative features for classification, compared to the layerwise method.<ref name=":9" /> The same experiments, however, showed that the success of joint training depends heavily on the regularization strategies adopted.<ref name=":9" /><ref>{{cite conference |last1=Salakhutdinov |first1=Ruslan |last2=Hinton |first2=Geoffrey E. |title=Deep Boltzmann Machines |book-title=AISTATS |year=2009 |pages=448–455}}</ref>
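As an illustration of what joint training means in practice (a minimal sketch with arbitrary layer sizes, random placeholder data, and hyperparameters not taken from the cited studies), all encoder and decoder layers are stacked and optimized at once against a single global reconstruction loss, rather than pretraining each layer in isolation before stacking:

<syntaxhighlight lang="python">
# Sketch of joint training for a deep autoencoder: one reconstruction
# objective, gradients flow through every layer simultaneously.
import torch
from torch import nn

encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),               # code layer
)
decoder = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784),
)
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                # single global reconstruction loss

x = torch.rand(64, 784)               # stand-in for a minibatch of data
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)       # reconstruct the input
    loss.backward()                   # updates all layers together
    optimizer.step()
</syntaxhighlight>

In the layerwise alternative, each encoder–decoder pair would instead be trained separately before being stacked and, optionally, fine-tuned end to end.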
Autoencoders have been interpreted as nonlinear [[principal component analysis]]<ref name=":12" /> and in terms of the [[Minimum description length|minimum description length principle]].<ref name=":15" />
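The connection to principal component analysis is exact in the linear special case: with linear activations and squared-error loss, the optimal autoencoder spans the same subspace as PCA. The sketch below (with arbitrary synthetic data, dimensions, and training settings chosen only for illustration) compares the subspace learned by a linear autoencoder with the PCA subspace:

<syntaxhighlight lang="python">
# Sketch: a one-hidden-layer *linear* autoencoder trained with squared error
# recovers (approximately) the same subspace as PCA.
import numpy as np
import torch
from torch import nn

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated data
X = X - X.mean(axis=0)                                       # centre the data

# PCA subspace: top-2 right singular vectors of the centred data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:2]                                           # shape (2, 10)

# Linear autoencoder with a 2-dimensional code and no nonlinearity.
model = nn.Sequential(nn.Linear(10, 2, bias=False),
                      nn.Linear(2, 10, bias=False))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
Xt = torch.tensor(X, dtype=torch.float32)
for _ in range(2000):
    opt.zero_grad()
    loss = ((model(Xt) - Xt) ** 2).mean()                    # reconstruction error
    loss.backward()
    opt.step()

# Columns of the decoder weight span the learned subspace; compare it to PCA.
W = model[1].weight.detach().numpy()                         # shape (10, 2)
Q, _ = np.linalg.qr(W)                                       # orthonormal basis
print(np.linalg.svd(pca_basis @ Q)[1])                       # values near 1 mean the subspaces coincide
</syntaxhighlight>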
== History ==