
Lecture 2: Basic Artificial Neural Networks

Xuming He
SIST, ShanghaiTech
Fall 2020



Logistics
 Course project
 Each team consists of 3–5 members
 Exceptions may be made if you are among the top 10% in the first 3 quizzes

 Full course schedule on Piazza


 HW1 out next Monday
 Tutorial schedule: please vote on Piazza

 TA office hours
 See Piazza for detailed schedule and location



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Mathematical model of a neuron
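In the standard formulation (notation here may differ from the slide's figure, which is not reproduced), a neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation g:

$$z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{d} w_i x_i + b, \qquad a = g(z)$$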

Single neuron as a linear classifier
 Binary classification
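A minimal code sketch (function name, weights, and test points below are illustrative, not from the slide): with a threshold activation, the predicted class is the sign of the pre-activation, so the decision boundary is the hyperplane w·x + b = 0.

    import numpy as np

    def neuron_predict(x, w, b):
        """Single neuron as a linear classifier: predict the sign of w.x + b."""
        z = np.dot(w, x) + b           # weighted sum plus bias (pre-activation)
        return 1 if z >= 0 else -1     # threshold activation -> label in {-1, +1}

    # Hypothetical 2D example: w and b define the boundary w.x + b = 0
    w = np.array([1.0, -2.0])
    b = 0.5
    print(neuron_predict(np.array([3.0, 1.0]), w, b))   # prints 1
    print(neuron_predict(np.array([0.0, 2.0]), w, b))   # prints -1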



How do we determine the weights?
 Learning problem



Linear classification
 Learning problem: simple approach

• Drawback: Sensitive to “outliers”



1D Example
 Compare two predictors



Perceptron algorithm
 Learn a single neuron for binary classification

https://towardsdatascience.com/perceptron-explanation-implementation-and-a-visual-example-3c8e76b4e2d1



Perceptron algorithm
 Learn a single neuron for binary classification

 Task formulation
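A common way to state the task (assuming the bias is absorbed into w by appending a constant 1 to every input, which may differ from the slide's exact notation): given training pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, with $y_i \in \{-1, +1\}$, find $\mathbf{w}$ such that

$$y_i\, \mathbf{w}^\top \mathbf{x}_i > 0 \quad \text{for all } i.$$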



Perceptron algorithm
 Algorithm outline
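A minimal sketch of the classical perceptron rule (function name and defaults are illustrative): loop over the training examples and update the weights only when the current weights make a mistake.

    import numpy as np

    def perceptron_train(X, y, max_epochs=100):
        """Classical perceptron. X: (n, d) inputs with a constant-1 column appended
        so the bias is part of w; y: (n,) labels in {-1, +1}."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * np.dot(w, xi) <= 0:    # current w misclassifies this example (or it lies on the boundary)
                    w += yi * xi               # move the boundary toward correcting the mistake
                    mistakes += 1
            if mistakes == 0:                  # a full pass with no mistakes: the data are separated
                break
        return w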



Perceptron algorithm
 Intuition: correct the current mistake



Perceptron algorithm
 The Perceptron theorem
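In its standard form (which may differ slightly from the slide's wording), the perceptron convergence theorem states: if the data are linearly separable with margin $\gamma > 0$, i.e. there exists a unit vector $\mathbf{w}^{*}$ with $y_i\,\mathbf{w}^{*\top}\mathbf{x}_i \ge \gamma$ for all $i$, and $\|\mathbf{x}_i\| \le R$, then the perceptron algorithm makes at most

$$\left(\frac{R}{\gamma}\right)^{2}$$

mistakes, regardless of the order in which examples are presented.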



Hyperplane Distance
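The key fact used here is the signed distance from a point $\mathbf{x}$ to the hyperplane $\mathbf{w}^\top\mathbf{x} + b = 0$:

$$d(\mathbf{x}) = \frac{\mathbf{w}^\top \mathbf{x} + b}{\|\mathbf{w}\|}$$
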
Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem: proof intuition
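In its usual textbook form (not necessarily the exact steps on the slides), the argument tracks the weight vector over the sequence of mistakes. Each mistake update $\mathbf{w} \leftarrow \mathbf{w} + y_i\mathbf{x}_i$ gives

$$\mathbf{w}_{t+1}^\top \mathbf{w}^{*} \ge \mathbf{w}_{t}^\top \mathbf{w}^{*} + \gamma \;\Rightarrow\; \mathbf{w}_{M}^\top \mathbf{w}^{*} \ge M\gamma, \qquad \|\mathbf{w}_{t+1}\|^{2} \le \|\mathbf{w}_{t}\|^{2} + R^{2} \;\Rightarrow\; \|\mathbf{w}_{M}\| \le \sqrt{M}\,R.$$

Combining the two, $M\gamma \le \mathbf{w}_{M}^\top\mathbf{w}^{*} \le \|\mathbf{w}_{M}\| \le \sqrt{M}\,R$, hence the number of mistakes satisfies $M \le (R/\gamma)^{2}$.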



Perceptron algorithm
 The Perceptron theorem: proof



Perceptron algorithm
 The Perceptron theorem



Perceptron Learning problem
 What loss function is minimized?



Perceptron algorithm
 What loss function is minimized?



Perceptron algorithm
 What loss function is minimized?
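In the standard view (stated here in generic notation), the perceptron update is stochastic (sub)gradient descent on the perceptron loss

$$\ell(\mathbf{w}; \mathbf{x}, y) = \max\bigl(0,\; -y\,\mathbf{w}^\top \mathbf{x}\bigr),$$

whose subgradient is $-y\,\mathbf{x}$ when the example is misclassified and $0$ otherwise, which matches the mistake-driven update above.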



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Single layer neural network
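A minimal code sketch of a single-layer network (names are illustrative), assuming one linear map of the input followed by an element-wise activation g:

    import numpy as np

    def single_layer_forward(x, W, b, g=np.tanh):
        """Single-layer network: each output unit is one neuron reading the raw input.
        x: (d,) input, W: (k, d) weights (one row per output unit), b: (k,) biases."""
        z = W @ x + b      # pre-activations, one per output unit
        return g(z)        # element-wise nonlinearity
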
What is the output?
 Element-wise nonlinear functions
 Independent feature/attribute detectors
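Common element-wise choices (standard definitions; the slide's plots are not reproduced) are

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z).$$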



What is the output?
 Nonlinear functions with vector input
 Competition between neurons



What is the output?
 Nonlinear functions with vector input
 Example: Winner-Take-All (WTA)
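A minimal sketch of the WTA output (function name is illustrative): the unit with the largest pre-activation outputs 1 and every other unit outputs 0.

    import numpy as np

    def winner_take_all(z):
        """WTA nonlinearity over a vector of pre-activations: one-hot of the argmax."""
        out = np.zeros_like(np.asarray(z, dtype=float))
        out[np.argmax(z)] = 1.0
        return out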



A probabilistic perspective
 Change the output nonlinearity

 From WTA to Softmax function
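In standard notation, the softmax over pre-activations $\mathbf{z} \in \mathbb{R}^{K}$ is

$$\mathrm{softmax}(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K,$$

a smooth relaxation of WTA: all outputs are positive, they sum to one, and the unit with the largest pre-activation receives the largest probability.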



Multiclass linear classifiers

 The WTA prediction: one-hot encoding of its predicted label



Probabilistic outputs



How to learn a multiclass classifier?
 Define a loss function and do minimization



Learning a multiclass linear classifier
 Design a loss function for multiclass classifiers
    Perceptron?
       Yes, see homework
    Hinge loss
       The SVM and max-margin (see CS231n)
    Probabilistic formulation
       Log loss and logistic regression
 Generalization issue
    Avoid overfitting by regularization



Example: Logistic Regression
 Learning loss: negative log likelihood
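In the standard multiclass formulation with softmax outputs and one weight vector $\mathbf{w}_k$ per class (notation may differ from the slide), the negative log-likelihood over training pairs $(\mathbf{x}_i, y_i)$ is

$$L(W) = -\sum_{i=1}^{n} \log p(y_i \mid \mathbf{x}_i; W) = -\sum_{i=1}^{n} \log \frac{\exp(\mathbf{w}_{y_i}^\top \mathbf{x}_i)}{\sum_{k=1}^{K} \exp(\mathbf{w}_{k}^\top \mathbf{x}_i)}.$$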



Logistic Regression
 Learning loss: example



Logistic Regression
 Learning loss: questions



Logistic Regression
 Learning loss: questions



Learning with regularization
 Constraints on hypothesis space
 Similar to Linear Regression



Learning with regularization
 Regularization terms

 Priors on the weights


 Bayesian: integrating out weights
 Empirical: computing MAP estimate of W
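As a concrete instance (standard form, not copied from the slide), the MAP estimate minimizes the negative log-likelihood plus a penalty induced by the prior on the weights:

$$\min_{W}\; L(W) + \lambda\,\Omega(W), \qquad \Omega(W) = \|W\|_2^{2} \ \text{(Gaussian prior, L2)} \quad \text{or} \quad \Omega(W) = \|W\|_1 \ \text{(Laplace prior, L1)}.$$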



L1 vs L2 regularization

https://www.youtube.com/watch?v=jEVh0uheCPk
L1 vs L2 regularization
 Sparsity



Optimization: gradient descent
 Gradient descent

 Learning rate matters
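In standard notation, the basic gradient descent step with learning rate $\eta$ is

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \eta\, \nabla_{\mathbf{w}} L(\mathbf{w}_{t}),$$

where a learning rate that is too small makes progress slow and one that is too large can overshoot and diverge.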



Optimization: gradient descent
 Stochastic gradient descent



Optimization: gradient descent
 Stochastic gradient descent
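A minimal sketch of minibatch SGD for the softmax classifier above (function names, defaults, and hyperparameters are illustrative, not from the slides):

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)       # subtract the row max for numerical stability
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def sgd_softmax(X, y, num_classes, lr=0.1, batch_size=32, epochs=10, seed=0):
        """Minibatch SGD on the multiclass negative log-likelihood.
        X: (n, d) inputs, y: (n,) integer labels in [0, num_classes)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = np.zeros((d, num_classes))
        for _ in range(epochs):
            order = rng.permutation(n)             # reshuffle the data every epoch
            for start in range(0, n, batch_size):
                batch = order[start:start + batch_size]
                Xb, yb = X[batch], y[batch]
                P = softmax(Xb @ W)                # predicted class probabilities, shape (b, K)
                P[np.arange(len(yb)), yb] -= 1.0   # gradient of the NLL w.r.t. the class scores
                W -= lr * (Xb.T @ P) / len(yb)     # average gradient step over the minibatch
        return W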



Interpreting network weights
 What are those weights?



Outline
 Artificial neuron
    Perceptron algorithm
 Single layer neural networks
    Network models
    Example: Logistic Regression
 Multi-layer neural networks
    Limitations of single layer networks
    Networks with single hidden layer

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes

Capacity of single neuron
 Binary classification
 A neuron estimates the class posterior p(y = 1 | x)
 Its decision boundary is linear, determined by its weights



Capacity of single neuron
 Can solve linearly separable problems

 Examples



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Can we use multiple neurons to achieve this?



Capacity of single neuron
 Can’t solve non-linearly separable problems
 Unless the input is transformed into a better representation



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Unless the input is transformed into a better representation



Adding one more layer
 Single hidden layer neural network
 Called a 2-layer neural network, not counting the input units

 Q: What if using linear activation in hidden layer?
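A minimal sketch (weights hand-picked for illustration, not taken from the slides) showing that one nonlinear hidden layer is enough for XOR, which no single neuron can solve. It also illustrates the question above: if the hidden activation g were linear, the two layers would compose into a single linear map and nothing would be gained.

    import numpy as np

    def two_layer_forward(x, W1, b1, W2, b2, g=lambda z: (z > 0).astype(float)):
        """2-layer network: nonlinear hidden layer followed by a linear output unit."""
        h = g(W1 @ x + b1)           # hidden representation
        return W2 @ h + b2           # output score; threshold at 0 for the label

    # Hand-set weights: h1 = OR(x1, x2), h2 = AND(x1, x2), output = h1 - h2 - 0.5
    W1 = np.array([[1.0, 1.0],       # h1 fires when x1 + x2 - 0.5 > 0  (OR)
                   [1.0, 1.0]])      # h2 fires when x1 + x2 - 1.5 > 0  (AND)
    b1 = np.array([-0.5, -1.5])
    W2 = np.array([1.0, -1.0])
    b2 = -0.5
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        score = two_layer_forward(np.array(x, dtype=float), W1, b1, W2, b2)
        print(x, int(score > 0))     # prints 0, 1, 1, 0 -> XOR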



Capacity of neural network
 Single hidden layer neural network
 Partition the input space into regions



Capacity of neural network
 Single hidden layer neural network
 Form a stump/delta function



Capacity of neural network
 Single hidden layer neural network



Multi-layer perceptron
 Boolean case
 Multilayer perceptrons (MLPs) can compute more complex Boolean functions
 MLPs can compute any Boolean function
 Since they can emulate individual gates
 MLPs are universal Boolean functions



Capacity of neural network
 Universal approximation
 Theorem (Hornik, 1991)
A single hidden layer neural network with a linear output unit can
approximate any continuous function arbitrarily well, given enough
hidden units.
 The result applies for sigmoid, tanh and many other hidden
layer activation functions

 Caveat: good result but not useful in practice


 How many hidden units?
 How to find the parameters by a learning algorithm?



General neural network
 Multi-layer neural network
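A minimal sketch of the general forward computation (names are illustrative): each layer applies one linear map followed by a nonlinearity, with a linear final output.

    import numpy as np

    def mlp_forward(x, layers, g=np.tanh):
        """Multi-layer network: layers is a list of (W, b) pairs applied in order.
        The nonlinearity g follows every layer except the last (linear output)."""
        h = x
        for i, (W, b) in enumerate(layers):
            z = W @ h + b
            h = z if i == len(layers) - 1 else g(z)
        return h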



Multilayer networks
Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
    (Montufar et al., NIPS’14)
    Functions representable with a deep rectifier net can require an exponential number of hidden units when represented with a shallow one



Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
 Example: Boolean functions
    There are Boolean functions that require an exponential number of hidden units in the single-layer case, but only a polynomial number of hidden units if we can adapt the number of layers

 Example: multivariate polynomials (Rolnick & Tegmark, ICLR’18)
    The total number of neurons m required to approximate natural classes of multivariate polynomials of n variables grows only linearly with n for deep neural networks, but grows exponentially when merely a single hidden layer is allowed



Why more layers (deeper)?



Summary
 Artificial neurons
 Single-layer network
 Multi-layer neural networks
 Next time
    Computation in neural networks
    Convolutional neural networks

