Week-12 - Introduction To ML-NN-CNN

Dr. Ahmet Esad TOP
ahmetesadtop@aybu.edu.tr
o Formal definition by Tom M. Mitchell
o "A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P, improves with experience E."
o Learning from experience is natural
o For humans and animals
o Humans collect information from events or observations of facts
o When a new event occurs and the result is unknown,
o The collected knowledge is used

o ML enables machines to learn from previous experiences


o ML techniques learn directly from the data itself
o No pre-determined equations or explicitly programmed decisions
o They essentially build the path to the answer using just the data
o ML algorithms find patterns or regularities in data
o They can make judgments, estimations, or take actions

o Data size is crucial for ML


o As the number of samples increases, performance improves
Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
o ML is used when:
o Human expertise does not exist (navigating on Mars)
o Humans can’t explain their expertise (speech recognition)
o Models must be customized (personalized medicine)
o Models are based on huge amounts of data (genomics)

o Learning isn’t always useful:


o There is no need to “learn” to calculate payroll
o Examples: Network intrusion detection, e-mail filtering, speech recognition,
bioinformatics, and computer vision
o When developing explicit algorithms is challenging (or they fail)
o ML is employed
o ML is divided into two categories
o supervised learning
o unsupervised learning
o Unsupervised learning has no labels
o without the corresponding output
o some patterns and relations can be found
o It learns the underlying structure and distribution of the data (to model them)
o No correct answer or teacher (supervisor) is available in this learning type
o After the similarities or differences have been revealed
o the data can be grouped
o If grouped according to some rules → association solution
o association detects sets of items that frequently occur together in dataset
o If grouped according to inherent groupings in the data → clustering solution
o Clustering splits the dataset into groups according to similarities
o Common unsupervised learning algorithms are K-Means Clustering, Principal Component
Analysis (PCA), Hidden Markov Model, and Apriori algorithm
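As a rough illustration (not from the slides), here is a minimal clustering sketch in Python, assuming scikit-learn is available; the toy 2-D points are invented:

# Minimal unsupervised clustering sketch (illustrative toy data).
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points: two loose groups around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(5, 0.5, size=(20, 2))])

# K-Means groups the data purely by similarity -- no labels are given.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # learned group centers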
o Learns from labeled data
o Data must be composed of pairs
o It learns a generalized mapping rule from the pairs
o from sample inputs to their desired outputs
o After training with samples
o it produces a function (mapping)
o that maps new inputs to their unknown outputs
o the intention is to accurately discover the labels
o success depends on the generalization capacity of the algorithm
o Y=f(X) → Y is output, X is input, and f() is the learned mapping function
o In supervised learning
o a supervisor assigns labels to data,
o then it is processed by one of the supervised learning algorithms
o to generate the desired function
o Supervised learning uses two techniques:
o classification
o predicts discrete responses
o their outputs are categorical such as "black" or "white"
o E.g.: "yes" or "no"
o regression
o predicts continuous responses
o their outputs are real numbers such as "temperature"
o they are used to generate predictive models
o Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
o Learn a function f(x) to predict y given x
o y is real-valued == regression

[Figure: September Arctic Sea Ice Extent (millions of sq km) vs. Year, 1970-2020]
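As a rough sketch of the regression idea above (the numbers below are invented placeholders, not the real measurements), one can fit a line f(x) to (year, extent) pairs and predict y for a new x:

# Minimal regression sketch: learn f(x) to predict a real-valued y.
import numpy as np

years = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
extent = np.array([7.8, 6.2, 6.3, 4.9, 3.9])   # hypothetical values, 10^6 sq km

# Fit a straight line f(x) = a*x + b by least squares.
a, b = np.polyfit(years, extent, deg=1)

# Predict y for a new, unseen x.
print(a * 2025 + b)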
o Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
o Learn a function f(x) to predict y given x
o y is categorical == classification

[Figure: Breast Cancer (Malignant / Benign) - tumor size on the x-axis, label (0 = Benign, 1 = Malignant) on the y-axis, with a threshold separating "Predict Benign" from "Predict Malignant"]
Classification: (2.1, 1.8) ⇒ good
Regression: (2.1, 1.8) ⇒ 0.9
With a likelihood of 90%, this email is good
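A minimal sketch of this email example (all feature values and training data below are invented for illustration): logistic regression returns both a discrete label and the probability behind it, assuming scikit-learn is available:

# Predict a discrete label ("good"/"spam") and the probability behind it.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is an email described by two numeric features; 1 = good, 0 = spam.
X_train = np.array([[2.0, 1.7], [2.3, 1.9], [0.4, 0.2], [0.5, 0.6]])
y_train = np.array([1, 1, 0, 0])

clf = LogisticRegression().fit(X_train, y_train)

x_new = np.array([[2.1, 1.8]])
print(clf.predict(x_new))        # discrete response, e.g. [1] -> "good"
print(clf.predict_proba(x_new))  # probabilities, e.g. roughly [[0.1, 0.9]]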
o One of the most popular ML approaches is Artificial Neural Networks (ANN)
o also referred to as neural networks (NN)
o It is an information processing system
o It can be employed for both unsupervised and supervised learning
o There is no pre-knowledge or set of programmed rules for the task expected
to be performed by the ANN before the training
o An ANN simply takes input data (i.e., example data)
o and learns the ability to perform the required task
o by parsing the data and detecting patterns inside the data
o ANN learns (e.g., categorizing animals from images like ‘wolf’, ’giraffe’, or ‘dog’) by using
sample images
o it gets the image with a corresponding output (i.e., labels) as its training data
o they should be in the form of described pairs
o Initially, ANN uses the first sample image as its input
o feeds forward, and then receives the output of the first image
o According to the output, it measures the error
o analyzes how close it is to the intended result
o Then, it makes some adjustments to the weights
o by using the gradient descent algorithm
o Its weights are more accurately adjusted after several iterations
o using various samples
o An ANN is a network made up of several nodes
o each node communicates with linked nodes
o the receiving node processes what it gets and sends the new information to the next linked nodes
o Nodes are referred to as "artificial neurons" and are comparable to biological neurons
o Each node uses a nonlinear function to produce its output
o the function’s input is the sum of all the inputs of the node
o Edges, which are like "neurotransmitters", are the connections between nodes
o Each edge has its own weight, which is going to be updated after a backpropagation pass
o Layers are groups of neurons that are on the same level
o Each layer waits until the preceding layer has completed all of its computations
o In 1958, Rosenblatt invented the perceptron
o his single-layer perceptron was unable to solve the XOR problem
o until the backpropagation method was created in 1975
o One-layered perceptrons (excluding the input layer) can be used to solve "AND" and
"OR" gates
o but a one-layered perceptron cannot be used to create an "XOR" gate
o as a single line is not enough to split "XOR" in a Cartesian plane
o The structure and function of the human brain (i.e., biological neural networks and
neurons) serve as an inspiration for ANN
o Inputs are feature values
o Each feature has a weight
o Sum is the activation: activation = w1·f1 + w2·f2 + w3·f3
o If the activation is:
o Positive, output +1
o Negative, output -1
[Diagram: inputs f1, f2, f3 weighted by w1, w2, w3 feed a summation node followed by a ">0?" threshold]
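A minimal sketch of this perceptron unit (the feature values and weights below are arbitrary illustrative numbers):

# Perceptron unit: activation = sum_i w_i * f_i, output = sign of the activation.
import numpy as np

def perceptron_output(features, weights):
    activation = np.dot(weights, features)   # weighted sum of the inputs
    return 1 if activation > 0 else -1       # positive -> +1, negative -> -1

# Arbitrary illustrative numbers for (f1, f2, f3) and (w1, w2, w3).
f = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.3, -0.2])
print(perceptron_output(f, w))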
o "XOR" must be separated by using at least 2 lines
o at least a 2-layered perceptron network (excluding the input layer) is required
o 2 hidden layers, as layers between the input and the output are known as hidden layers
o This system is called as "Multilayer Perceptron (MLP)"
o Every node except the input layer uses a nonlinear activation function for its output
o e.g., sigmoid function or hyperbolic tangent
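As a sketch of why one hidden layer is enough for XOR, here is a hand-wired 2-layer perceptron (the weights are chosen by hand for illustration, not learned):

# A hand-wired MLP (one hidden layer + output) that computes XOR,
# something a single-layer perceptron cannot do.
import numpy as np

def step(x):
    return (x > 0).astype(int)   # threshold activation

def xor_mlp(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: first unit acts as OR, second as AND (hand-chosen weights).
    h = step(np.array([[1.0, 1.0], [1.0, 1.0]]) @ x + np.array([-0.5, -1.5]))
    # Output unit: OR AND (NOT AND)  ->  XOR.
    return int(step(np.array([1.0, -1.0]) @ h + np.array([-0.5]))[0])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))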
o MLPs employ the backpropagation approach for their training phase
o While training an ANN, the gradient needs to be calculated to update the weights
o after a forward pass
o this is done by backpropagation
o Gradient descent is employed
o it determines the gradient of the loss function
o The error is propagated to previous layers and neurons
o directly or indirectly connected neurons to the output neuron
o Each neuron’s net (i.e., incoming) values are calculated
o Each neuron’s out (i.e., outgoing) values are calculated
o The squared errors for the outputs are then determined
o The squared error cost function is minimized using
gradient descent
o weights are revised after each iteration, and this process
continues until the cost is as low as possible
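A minimal gradient-descent sketch for a single linear unit with a squared-error cost (the data and learning rate are illustrative, not from the lecture):

# Gradient descent on a squared-error cost for one linear unit.
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])         # true relation: y = 2x + 1

w, b = 0.0, 0.0                            # initial weights
lr = 0.05                                  # learning rate (must be small enough)

for _ in range(2000):
    error = (w * X + b) - y                # prediction error on all samples
    grad_w = 2 * np.mean(error * X)        # gradient of the squared-error cost w.r.t. w
    grad_b = 2 * np.mean(error)            # gradient w.r.t. b
    w -= lr * grad_w                       # step against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))            # approaches 2 and 1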
o Gradient descent is guaranteed to converge to a
hypothesis with minimum squared error
o If the given learning rate is sufficiently small
o Gradient descent has the risk that it can over-step
the minimum in the error surface
o If the learning rate is too large
o Gradient descent may not find the global optimum
o If there are multiple local optima in the error surface
o Converging to a local optimum is sometimes slow
o To overcome these issues, variants of gradient descent were developed
o batch gradient descent tends to overshoot the global optimum
o as it updates only after seeing the whole dataset
o stochastic gradient descent tends to get stuck at local optima
o as it updates after seeing each sample
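A small sketch of the difference in update schedule only, reusing the toy linear-unit setup from above (data and learning rate invented):

# Batch vs. stochastic gradient descent: when the update happens.
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
lr = 0.05

# Batch gradient descent: one update per pass over the whole dataset.
w = b = 0.0
for _ in range(500):
    error = (w * X + b) - y
    w -= lr * 2 * np.mean(error * X)
    b -= lr * 2 * np.mean(error)

# Stochastic gradient descent: one update per individual sample.
w_s = b_s = 0.0
for _ in range(500):
    for xi, yi in zip(X, y):
        error = (w_s * xi + b_s) - yi
        w_s -= lr * 2 * error * xi
        b_s -= lr * 2 * error

print(round(w, 2), round(b, 2), "|", round(w_s, 2), round(b_s, 2))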
o Deep learning (DL) is a subset of ML where the learning procedure occurs in deeper
structures
o Deeper structures → the presence of several hidden layers
o Deep networks may contain tens or even hundreds of hidden layers
o traditional NNs only have one or two hidden layers
o Unlike classical ML, DL eliminates the requirement for manual feature extraction
o by converting the data into intermediate feature representations
o it extracts features first-hand from the data itself
o Another standout benefit of DL is its capability to continuously enhance its performance
o it keeps getting better performance as the size of the data increases
o improvements in technology → huge amounts of data and powerful GPUs have become available
o this situation has made DL very popular recently
o "Deep Learning" term came out to AI community in 1986 by Rina Dechter
o In 1989, Yann LeCun et al. developed a DNN that could read handwritten ZIP codes from mail using
the backpropagation approach
o ML approaches were more popular back in the day because of the high processing cost
of ANNs
o Later, advances in GPU technology became more significant
o making DL considerably more popular than other methods
o The "Big Bang of DL" occurred in 2009, when Nvidia trained DNNs on Nvidia GPUs
o NNs form the basis of the majority of all DL methods
o DNN is an ANN with multiple hidden layers
o As DNNs are feedforward networks, data travels
from the input layer to the output layer
o DNNs have a strong modeling capability since they can capture linear or non-linear relations
o When modeling complex data, adding more hidden layers may reduce the number of units required in each hidden layer
o since deeper layers can build combinations of the features from previous layers
o In general, DNNs are very challenging to train
o CNN is a notable exception for training deep networks
o A 7-layered CNN called LeNet-5 was introduced in 1998 by LeCun et al. to recognize digits from 32x32 images
o The resolution was limited to 32x32 due to the limited hardware capabilities available at the time
o CNNs attracted great attention after the computing industry acquired advanced
hardware capabilities.
o Several studies have utilized and demonstrated how to train CNNs on GPUs and their appropriate
approaches
o Nowadays, CNN is one of the most prominent and effective DNN types
o It is generally applied to computer vision such as image/video recognition
o The CNN is designed to minimize the need for pre-processing
o making it a very convenient choice for many applications
o CNN does not require manual feature engineering
o CNNs have also achieved state-of-the-art results
o can be re-trained for new data or tasks
o When people view a picture of a cat
o they can identify it based on its unique features
o such as its claws, four legs, tail, and whiskers
o Similarly, a CNN can classify a picture of a cat by processing the low-level features
o such as curves and edges
o and then creating more abstract concepts using multiple convolutional layers
o Traditional MLP architectures suffer from not scaling well to higher-resolution images
o Due to the "curse of dimensionality"
o a phenomenon in which the number of weights required by the model grows rapidly with the input size
o The reason behind this is full connectivity between nodes
o a fully connected neuron on a 32x32 input image requires 1,024 weights
o a fully connected neuron on a 224x224 input image requires 50,176 weights
o However, a convolutional layer in a CNN can operate with a much smaller number of weights
o E.g., a 7x7 filter that convolves on a 32x32 or 224x224 image will always require only 49 learnable parameters
o regardless of the size of the input image
o This efficiency makes CNNs a more practical choice for image classification tasks
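A back-of-the-envelope sketch of the weight counts quoted above (weights per single fully connected output neuron vs. per single convolutional filter, biases ignored):

# Weight counts behind the numbers quoted above.
def fc_weights_per_neuron(height, width):
    return height * width             # full connectivity to every input pixel

def conv_weights_per_filter(filter_size):
    return filter_size * filter_size  # shared weights, independent of image size

print(fc_weights_per_neuron(32, 32))     # 1024
print(fc_weights_per_neuron(224, 224))   # 50176
print(conv_weights_per_filter(7))        # 49, for 32x32 and 224x224 alike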
o CNNs are composed of layers with three-dimensional neurons
o each of which is connected to a small region of the previous layer known as the receptive field
o this structure allows CNNs to operate with fewer weights compared to traditional MLP architectures
o as the connections between neurons are more localized and not fully connected
o CNNs differ from ANNs in the types of operations performed by their hidden layers
o They include a combination of
o convolutional layers,
o pooling layers,
o a softmax layer,
o fully connected layers,
o and Rectified Linear Units (ReLUs)
o The convolutional layer is always the primary component of every CNN
o The other layers are inserted between convolutional layers
o The fully connected layers are placed at the end of the network
o These hidden layers serve to introduce non-linearity and maintain the dimensions of the input data
o The convolutional layer is characterized by a set of learnable filters
o These filters are used to scan the receptive field to search for matching patterns
o The filters are represented as rectangular arrays of numbers that serve as feature identifiers
o helping the CNN to identify and extract important features from the input data
o the filters slide across the width and height of the input image
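A minimal sketch of a filter sliding over an input in plain NumPy (cross-correlation with "valid" padding; in a real CNN the filter values are learned, the ones below are hand-picked):

# A filter sliding over an image, producing a feature map.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is the filter applied to one receptive field.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)     # simple vertical-edge detector
print(conv2d(image, edge_filter))                  # 4x4 feature map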
o ReLU is a type of activation function that is
widely used in deep learning models
o introduces non-linearity to the model
o generally follows each convolutional layer
o One of the main advantages of ReLU is its
simplicity
o its implementation is pretty straightforward and
does not require additional hyperparameters
o ReLU has improved the training speed of DNNs
o compared to other activation functions (e.g.,
sigmoid or hyperbolic tangent).
o ReLU also helps to alleviate the vanishing
gradient problem
o which occurs when the gradients of the weights
in the network become too small (i.e., training
becomes slow).
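A one-function sketch of ReLU, next to sigmoid for comparison:

# ReLU: max(0, x), applied element-wise after a convolutional layer.
import numpy as np

def relu(x):
    return np.maximum(0, x)         # negatives become 0, positives pass through

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0. 0. 0. 0.5 2.]
print(sigmoid(x))  # saturates toward 0 or 1, which can shrink gradients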
o Pooling layer is an essential component of CNN
architectures
o the majority of them are used right after the convolutional
layers
o Their duty is to reduce the spatial dimensions of the results
generated by the convolutional layers
o by merging the outputs of multiple neurons into a single neuron
that utilizes non-linear functions
o This simplification (i.e., downsampling) operation reduces
the number of parameters
o hence reduces computational overhead, as well as helping to
prevent overfitting
o Overfitting occurs when a model becomes too closely tuned
(i.e., close to ideal or fully ideal) to the training data but fails
on the test data (i.e., a generalization issue)
o pooling layers also help to maintain the spatial invariance
of the network
o i.e., can recognize an object regardless of its position in the
image
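A minimal 2x2 max-pooling sketch (stride 2) showing the downsampling on a toy feature map:

# 2x2 max pooling: each 2x2 block collapses to its maximum value.
import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    # Reshape into 2x2 blocks and take the max of each block.
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 2],
                        [7, 2, 9, 4],
                        [3, 1, 5, 8]], dtype=float)
print(max_pool_2x2(feature_map))   # [[6. 2.] [7. 9.]]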
o Dropout is a handy regularization technique used to
reduce overfitting in neural networks
o It can be applied at different levels of the network
o This method consists of randomly dropping out certain
activations in a layer, with a probability commonly set at
0.5
o half of the hidden neurons are dropped out randomly
o once the training is completed, these neurons are recovered
with their weights
o This technique is beneficial for preventing overfitting
o It can also improve the model’s generalization
capabilities
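A minimal dropout sketch; the 1/p ("inverted dropout") scaling used here is one common convention so that no change is needed at test time, and the activation values are invented:

# Dropout during training: keep each activation with probability p, zero it otherwise.
import numpy as np

def dropout(activations, p=0.5, training=True, seed=0):
    if not training:
        return activations                     # at test time all neurons are used
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) < p   # randomly keep roughly half of the neurons
    return activations * mask / p              # scale so the expected activation is unchanged

layer_out = np.array([0.2, 1.5, 0.7, 2.0, 0.1, 0.9])
print(dropout(layer_out, training=True))
print(dropout(layer_out, training=False))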
o FC layers, also known as dense layers, are typically
placed at the end of a CNN
o Learns non-linear combinations of high-level feature
activations that have been extracted through a series
of convolutional and pooling layers
o FC layers are also used to map high-level feature
activations to the final output, making predictions or
classifications
o Combines and mixes important information from all
preceding convolutional layers
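A minimal dense-layer sketch: flatten the pooled feature maps, then apply weights, a bias, and a non-linearity (all shapes and values are illustrative):

# Fully connected (dense) layer: outputs = activation(W @ x + b).
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.normal(size=(8, 4, 4))     # e.g. 8 feature maps of size 4x4
x = pooled.reshape(-1)                  # flatten to a 128-dimensional vector

W = rng.normal(scale=0.1, size=(10, x.size))   # 10 output units, fully connected
b = np.zeros(10)

dense_out = np.maximum(0, W @ x + b)    # ReLU on the dense outputs
print(dense_out.shape)                  # (10,)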
o The Softmax layer’s main duty is to perform multi-class classification
o It is typically placed at the end of a CNN as the last layer
o The softmax function (i.e., a probability distribution function) inspired the name of this layer
o through the softmax function, it produces probabilities of each class
o indicates the output class that the input is most likely to belong to
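A minimal softmax sketch turning the final scores into class probabilities (the class names and scores are invented):

# Softmax: turn raw class scores into a probability distribution.
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])      # e.g. scores for "cat", "dog", "wolf"
probs = softmax(scores)
print(probs)            # sums to 1; the highest score gets the highest probability
print(probs.argmax())   # index of the predicted class (0 here)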
Thanks for your attention!
