
Lecture 2: Basic Artificial Neural Networks

Xuming He
SIST, ShanghaiTech
Fall 2020



Logistics
 Course project
 Each team consists of 3–5 members
 Exceptions may be made if you are among the top 10% in the first 3 quizzes

 Full course schedule on Piazza


 HW1 out next Monday
 Tutorial schedule: please vote on Piazza

 TA office hours
 See Piazza for detailed schedule and location



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Mathematical model of a neuron
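In the standard formulation (notation here may differ from the slide's figure, which is not reproduced), a neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation g:

$$z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{d} w_i x_i + b, \qquad a = g(z)$$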

Single neuron as a linear classifier
 Binary classification
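A minimal code sketch (function name, weights, and test points below are illustrative, not from the slide): with a threshold activation, the predicted class is the sign of the pre-activation, so the decision boundary is the hyperplane w·x + b = 0.

    import numpy as np

    def neuron_predict(x, w, b):
        """Single neuron as a linear classifier: predict the sign of w.x + b."""
        z = np.dot(w, x) + b           # weighted sum plus bias (pre-activation)
        return 1 if z >= 0 else -1     # threshold activation -> label in {-1, +1}

    # Hypothetical 2D example: w and b define the boundary w.x + b = 0
    w = np.array([1.0, -2.0])
    b = 0.5
    print(neuron_predict(np.array([3.0, 1.0]), w, b))   # prints 1
    print(neuron_predict(np.array([0.0, 2.0]), w, b))   # prints -1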



How do we determine the weights?
 Learning problem



Linear classification
 Learning problem: simple approach

• Drawback: Sensitive to “outliers”



1D Example
 Compare two predictors



Perceptron algorithm
 Learn a single neuron for binary classification

https://towardsdatascience.com/perceptron-explanation-implementation-and-a-visual-example-3c8e76b4e2d1



Perceptron algorithm
 Learn a single neuron for binary classification

 Task formulation
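A common way to state the task (assuming the bias is absorbed into w by appending a constant 1 to every input, which may differ from the slide's exact notation): given training pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, with $y_i \in \{-1, +1\}$, find $\mathbf{w}$ such that

$$y_i\, \mathbf{w}^\top \mathbf{x}_i > 0 \quad \text{for all } i.$$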



Perceptron algorithm
 Algorithm outline
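A minimal sketch of the classical perceptron rule (function name and defaults are illustrative): loop over the training examples and update the weights only when the current weights make a mistake.

    import numpy as np

    def perceptron_train(X, y, max_epochs=100):
        """Classical perceptron. X: (n, d) inputs with a constant-1 column appended
        so the bias is part of w; y: (n,) labels in {-1, +1}."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * np.dot(w, xi) <= 0:    # current w misclassifies this example (or it lies on the boundary)
                    w += yi * xi               # move the boundary toward correcting the mistake
                    mistakes += 1
            if mistakes == 0:                  # a full pass with no mistakes: the data are separated
                break
        return w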



Perceptron algorithm
 Intuition: correct the current mistake



Perceptron algorithm
 The Perceptron theorem
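In its standard form (which may differ slightly from the slide's wording), the perceptron convergence theorem states: if the data are linearly separable with margin $\gamma > 0$, i.e. there exists a unit vector $\mathbf{w}^{*}$ with $y_i\,\mathbf{w}^{*\top}\mathbf{x}_i \ge \gamma$ for all $i$, and $\|\mathbf{x}_i\| \le R$, then the perceptron algorithm makes at most

$$\left(\frac{R}{\gamma}\right)^{2}$$

mistakes, regardless of the order in which examples are presented.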



Hyperplane Distance
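The key fact used here is the signed distance from a point $\mathbf{x}$ to the hyperplane $\mathbf{w}^\top\mathbf{x} + b = 0$:

$$d(\mathbf{x}) = \frac{\mathbf{w}^\top \mathbf{x} + b}{\|\mathbf{w}\|}$$
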
Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem: proof intuition
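In its usual textbook form (not necessarily the exact steps on the slides), the argument tracks the weight vector over the sequence of mistakes. Each mistake update $\mathbf{w} \leftarrow \mathbf{w} + y_i\mathbf{x}_i$ gives

$$\mathbf{w}_{t+1}^\top \mathbf{w}^{*} \ge \mathbf{w}_{t}^\top \mathbf{w}^{*} + \gamma \;\Rightarrow\; \mathbf{w}_{M}^\top \mathbf{w}^{*} \ge M\gamma, \qquad \|\mathbf{w}_{t+1}\|^{2} \le \|\mathbf{w}_{t}\|^{2} + R^{2} \;\Rightarrow\; \|\mathbf{w}_{M}\| \le \sqrt{M}\,R.$$

Combining the two, $M\gamma \le \mathbf{w}_{M}^\top\mathbf{w}^{*} \le \|\mathbf{w}_{M}\| \le \sqrt{M}\,R$, hence the number of mistakes satisfies $M \le (R/\gamma)^{2}$.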



Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem



Perceptron Learning problem
 What loss function is minimized?



Perceptron algorithm
 What loss function is minimized?



Perceptron algorithm
 What loss function is minimized?
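In the standard view (stated here in generic notation), the perceptron update is stochastic (sub)gradient descent on the perceptron loss

$$\ell(\mathbf{w}; \mathbf{x}, y) = \max\bigl(0,\; -y\,\mathbf{w}^\top \mathbf{x}\bigr),$$

whose subgradient is $-y\,\mathbf{x}$ when the example is misclassified and $0$ otherwise, which matches the mistake-driven update above.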



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Single layer neural network
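A minimal code sketch of a single-layer network (names are illustrative), assuming one linear map of the input followed by an element-wise activation g:

    import numpy as np

    def single_layer_forward(x, W, b, g=np.tanh):
        """Single-layer network: each output unit is one neuron reading the raw input.
        x: (d,) input, W: (k, d) weights (one row per output unit), b: (k,) biases."""
        z = W @ x + b      # pre-activations, one per output unit
        return g(z)        # element-wise nonlinearity
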
What is the output?
 Element-wise nonlinear functions
 Independent feature/attribute detectors
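Common element-wise choices (standard definitions; the slide's plots are not reproduced) are

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z).$$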



What is the output?
 Nonlinear functions with vector input
 Competition between neurons



What is the output?
 Nonlinear functions with vector input
 Example: Winner-Take-All (WTA)
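A minimal sketch of the WTA output (function name is illustrative): the unit with the largest pre-activation outputs 1 and every other unit outputs 0.

    import numpy as np

    def winner_take_all(z):
        """WTA nonlinearity over a vector of pre-activations: one-hot of the argmax."""
        out = np.zeros_like(np.asarray(z, dtype=float))
        out[np.argmax(z)] = 1.0
        return out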



A probabilistic perspective
 Change the output nonlinearity

 From WTA to Softmax function
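In standard notation, the softmax over pre-activations $\mathbf{z} \in \mathbb{R}^{K}$ is

$$\mathrm{softmax}(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K,$$

a smooth relaxation of WTA: all outputs are positive, they sum to one, and the unit with the largest pre-activation receives the largest probability.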



Multiclass linear classifiers

 The WTA prediction: one-hot encoding of its predicted label



Probabilistic outputs



How to learn a multiclass classifier?
 Define a loss function and do minimization



Learning a multiclass linear classifier
 Design a loss function for multiclass classifiers
    Perceptron?
       Yes, see homework
    Hinge loss
       The SVM and max-margin (see CS231n)
    Probabilistic formulation
       Log loss and logistic regression
 Generalization issue
    Avoid overfitting by regularization



Example: Logistic Regression
 Learning loss: negative log likelihood
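In the standard multiclass formulation with softmax outputs and one weight vector $\mathbf{w}_k$ per class (notation may differ from the slide), the negative log-likelihood over training pairs $(\mathbf{x}_i, y_i)$ is

$$L(W) = -\sum_{i=1}^{n} \log p(y_i \mid \mathbf{x}_i; W) = -\sum_{i=1}^{n} \log \frac{\exp(\mathbf{w}_{y_i}^\top \mathbf{x}_i)}{\sum_{k=1}^{K} \exp(\mathbf{w}_{k}^\top \mathbf{x}_i)}.$$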



Logistic Regression
 Learning loss: example



Logistic Regression
 Learning loss: questions



Logistic Regression
 Learning loss: questions



Learning with regularization
 Constraints on hypothesis space
 Similar to Linear Regression



Learning with regularization
 Regularization terms

 Priors on the weights


 Bayesian: integrating out weights
 Empirical: computing MAP estimate of W
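As a concrete instance (standard form, not copied from the slide), the MAP estimate minimizes the negative log-likelihood plus a penalty induced by the prior on the weights:

$$\min_{W}\; L(W) + \lambda\,\Omega(W), \qquad \Omega(W) = \|W\|_2^{2} \ \text{(Gaussian prior, L2)} \quad \text{or} \quad \Omega(W) = \|W\|_1 \ \text{(Laplace prior, L1)}.$$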



L1 vs L2 regularization

https://www.youtube.com/watch?v=jEVh0uheCPk
L1 vs L2 regularization
 Sparsity



Optimization: gradient descent
 Gradient descent

 Learning rate matters
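In standard notation, the basic gradient descent step with learning rate $\eta$ is

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \eta\, \nabla_{\mathbf{w}} L(\mathbf{w}_{t}),$$

where a learning rate that is too small makes progress slow and one that is too large can overshoot and diverge.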



Optimization: gradient descent
 Stochastic gradient descent



Optimization: gradient descent
 Stochastic gradient descent
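A minimal sketch of minibatch SGD for the softmax classifier above (function names, defaults, and hyperparameters are illustrative, not from the slides):

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)       # subtract the row max for numerical stability
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def sgd_softmax(X, y, num_classes, lr=0.1, batch_size=32, epochs=10, seed=0):
        """Minibatch SGD on the multiclass negative log-likelihood.
        X: (n, d) inputs, y: (n,) integer labels in [0, num_classes)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = np.zeros((d, num_classes))
        for _ in range(epochs):
            order = rng.permutation(n)             # reshuffle the data every epoch
            for start in range(0, n, batch_size):
                batch = order[start:start + batch_size]
                Xb, yb = X[batch], y[batch]
                P = softmax(Xb @ W)                # predicted class probabilities, shape (b, K)
                P[np.arange(len(yb)), yb] -= 1.0   # gradient of the NLL w.r.t. the class scores
                W -= lr * (Xb.T @ P) / len(yb)     # average gradient step over the minibatch
        return W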



Interpreting network weights
 What are those weights?



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Capacity of single neuron
 Binary classification
 A neuron estimates the class posterior p(y = 1 | x)
 Its decision boundary is linear, determined by its weights



Capacity of single neuron
 Can solve linearly separable problems

 Examples



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Can we use multiple neurons to achieve this?



Capacity of single neuron
 Can’t solve non-linearly separable problems
 Unless the input is transformed into a better representation



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Unless the input is transformed into a better representation



Adding one more layer
 Single hidden layer neural network
 Called a 2-layer neural network, not counting the input units

 Q: What if using linear activation in hidden layer?
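A minimal sketch (weights hand-picked for illustration, not taken from the slides) showing that one nonlinear hidden layer is enough for XOR, which no single neuron can solve. It also illustrates the question above: if the hidden activation g were linear, the two layers would compose into a single linear map and nothing would be gained.

    import numpy as np

    def two_layer_forward(x, W1, b1, W2, b2, g=lambda z: (z > 0).astype(float)):
        """2-layer network: nonlinear hidden layer followed by a linear output unit."""
        h = g(W1 @ x + b1)           # hidden representation
        return W2 @ h + b2           # output score; threshold at 0 for the label

    # Hand-set weights: h1 = OR(x1, x2), h2 = AND(x1, x2), output = h1 - h2 - 0.5
    W1 = np.array([[1.0, 1.0],       # h1 fires when x1 + x2 - 0.5 > 0  (OR)
                   [1.0, 1.0]])      # h2 fires when x1 + x2 - 1.5 > 0  (AND)
    b1 = np.array([-0.5, -1.5])
    W2 = np.array([1.0, -1.0])
    b2 = -0.5
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        score = two_layer_forward(np.array(x, dtype=float), W1, b1, W2, b2)
        print(x, int(score > 0))     # prints 0, 1, 1, 0 -> XOR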



Capacity of neural network
 Single hidden layer neural network
 Partition the input space into regions



Capacity of neural network
 Single hidden layer neural network
 Form a stump/delta function



Capacity of neural network
 Single hidden layer neural network



Multi-layer perceptron
 Boolean case
 Multilayer perceptrons (MLPs) can compute more complex Boolean functions
 MLPs can compute any Boolean function
 Since they can emulate individual gates
 MLPs are universal Boolean functions



Capacity of neural network
 Universal approximation
 Theorem (Hornik, 1991)
A single hidden layer neural network with a linear output unit can
approximate any continuous function arbitrarily well, given enough
hidden units.
 The result applies for sigmoid, tanh and many other hidden
layer activation functions

 Caveat: good result but not useful in practice


 How many hidden units?
 How to find the parameters by a learning algorithm?



General neural network
 Multi-layer neural network
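A minimal sketch of the general forward computation (names are illustrative): each layer applies one linear map followed by a nonlinearity, with a linear final output.

    import numpy as np

    def mlp_forward(x, layers, g=np.tanh):
        """Multi-layer network: layers is a list of (W, b) pairs applied in order.
        The nonlinearity g follows every layer except the last (linear output)."""
        h = x
        for i, (W, b) in enumerate(layers):
            z = W @ h + b
            h = z if i == len(layers) - 1 else g(z)
        return h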



Multilayer networks
Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
    (Montufar et al., NIPS’14)
    Functions representable with a deep rectifier net can require an exponential number of hidden units when represented with a shallow one



Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
 Example: Boolean functions
    There are Boolean functions that require an exponential number of hidden units in the single-layer case, but only a polynomial number of hidden units if we can adapt the number of layers

 Example: multivariate polynomials (Rolnick & Tegmark, ICLR’18)
    The total number of neurons m required to approximate natural classes of multivariate polynomials of n variables grows only linearly with n for deep neural networks, but grows exponentially when merely a single hidden layer is allowed



Why more layers (deeper)?



Summary
 Artificial neurons
 Single-layer network
 Multi-layer neural networks
 Next time
    Computation in neural networks
    Convolutional neural networks

