
CHAPTER 9.

MACHINE LEARNING ALGORITHMS FOR IoT

IFA’2021 1
DATA SCIENCE OVERVIEW

IFA’2021 2
MACHINE LEARNING

IFA’2021 3
How do we deal with so many devices and such a huge amount of data in IoT?
• Device Management
* The number of devices in IoT is extremely large
* They connect to each other and to the sinks over very large distances
* Connectivity of devices is very important
* The very large volume of collected data must be managed efficiently

• Device Diversity and Interoperability
* Many products from many different companies

• Integration of Data from Multiple Sources
* Very large amounts of data will be collected from different sources such as sensors, mobile devices, etc.
* Interpretation of these data is challenging

• Scale, Data Volume, and Performance
* Big Data problem (how to handle and analyze the data)

• Flexibility and Evolution of Applications
* New use cases and new business models
IFA’2021 4
BOTTOM LINE

n Many IoT applications: health, transportation, smart home, smart city, agriculture, education, etc.

n The main element of most of these applications is an intelligent learning mechanism for prediction (i.e., regression, classification, and clustering), data mining and pattern recognition, or data analytics in general.

IFA’2021 5
MACHINE LEARNING BASICS
n Traditional Programming:
Data (Input) + Model → Computer → Output

n Machine Learning:
Data (Input) + Output → Computer → Learned Model
IFA’2021 6
MACHINE LEARNING BASICS
n Machine learning gives computers/machines the ability to learn without being explicitly programmed

(Figure: Training Data → Machine Learning Algorithm → Learned Model; Testing Data → Learned Model → Prediction)

n It consists of methods that can learn from and make predictions on data

n If the training data are labeled → Supervised Learning

n If they are unlabeled → Unsupervised Learning

IFA’2021 7
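To make this concrete, here is a minimal sketch of the supervised workflow, assuming scikit-learn is available (the classifier choice and toy data are illustrative, not from the slides): an algorithm is fit on labeled training data to produce a learned model, which then predicts labels for unseen test data.

```python
# Minimal sketch (assumes scikit-learn is installed): supervised learning with
# labeled training data, then prediction on unseen test data.
from sklearn.neighbors import KNeighborsClassifier

# Toy labeled training set: [feature1, feature2] -> class label
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = [0, 0, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)   # the "Machine Learning Algorithm"
model.fit(X_train, y_train)                   # produces the "Learned Model"

X_test = [[1.2, 1.9], [5.5, 8.5]]             # unlabeled test data
print(model.predict(X_test))                  # -> predicted labels, e.g. [0 1]
```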
EXAMPLES OF MACHINE LEARNING PROBLEMS
n Computer Vision & Speech Processing & Data Analytics

n Pattern Recognition
– Facial identities or facial expressions
– Handwritten or spoken words (e.g., Siri)
– Medical images
– Sensor Data/IoT

n Pattern Generation
– Generating images or motion sequences
n Anomaly Detection
– Unusual patterns in the telemetry from physical and/or virtual plants
– Unusual sequences of credit card transactions
– Unusual patterns of sensor data from a nuclear power plant
(Figure: facial recognition, a pattern recognition example)
n Prediction
– Future stock prices or currency exchange rates
IFA’2021 8
EXAMPLES OF MACHINE LEARNING PROBLEMS
n Object Recognition Example:
Object Detected: Motorbike

IFA’2021 9
PURPOSE OF MACHINE LEARNING ALGORITHMS

n Development of computer models for learning processes that:
– Provide solutions to the problem of knowledge acquisition
– Enhance the performance of developed systems

n Adoption of computational methods to improve machine performance by detecting and describing consistencies and patterns in training data

IFA’2021 10
MACHINE LEARNING ALGORITHMS: INTRODUCTION

n ML was introduced in the late 1950s as a technique for AI

n Over time, its focus evolved and shifted more toward algorithms that are computationally viable and robust
n In the last decade, ML techniques have been used for:
– Classification
– Regression, and
– Density estimation
in a variety of applications such as bioinformatics, speech recognition, spam detection, computer vision, fraud detection, wireless and wired networks, and computer architectures.

n Algorithms and techniques come from diverse fields including statistics, mathematics, neuroscience, electrical engineering, mechanical engineering, industrial/systems engineering, computer science, etc.
IFA’2021 11
MACHINE LEARNING ALGORITHMS: MAIN CATEGORIES

An ML algorithm takes a set of samples, called a training set, as input.

Three main categories of learning: Supervised, Unsupervised, Reinforcement

1. Supervised (Inductive) Learning:
The training set consists of samples of input vectors together with their corresponding target vectors, also known as labels. Training data include the desired outputs.
2. Unsupervised Learning:
No labels are required for the training set. Training data do not include the desired outputs.
3. Reinforcement Learning:
Deals with the problem of learning the appropriate action, or sequence of actions, to take in a given situation in order to maximize payoff. Rewards come from the sequence of actions.

FOCUS: SUPERVISED and UNSUPERVISED LEARNING, since they have been and still are widely applied in IoT smart data analysis.
IFA’2021 12
MACHINE LEARNING ALGORITHMS:
SUPERVISED LEARNING

n The objective is to learn how to predict the appropriate output vector for a given input vector.

n Applications where the target labels consist of a finite number of discrete categories are known as classification tasks. (Classes are pre-defined; ML assigns the mixed input to them.)

n Cases where the target labels are composed of one or more continuous variables are known as regression tasks.
IFA’2021 13
MACHINE LEARNING ALGORITHMS:
SUPERVISED LEARNING EXAMPLE
(Figure: known data (pictures of apples) with known responses/labels train a model; given new data, the model produces a new response: “It’s an apple!”)
IFA’2021 14
MACHINE LEARNING ALGORITHMS:
UNSUPERVISED LEARNING

n Defining the objective of unsupervised learning is difficult.

n One of the major objectives is to identify sensible clusters of similar samples within the input data, known as clustering. (Classes are not pre-defined; ML clusters similar data.)

n Alternatively, the objective may be the discovery of a useful internal representation of the input data by preprocessing the original input variables in order to transform them into a new variable space.

n This preprocessing stage can significantly improve the result of the subsequent machine learning algorithm and is named feature extraction.
IFA’2021 15
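A minimal sketch of such a feature-extraction preprocessing step, assuming scikit-learn is available (the synthetic data and the choice of PCA are illustrative): PCA learns a lower-dimensional representation of unlabeled inputs that a later ML step can consume.

```python
# Minimal sketch (assumes scikit-learn): unsupervised feature extraction with
# PCA, transforming the original inputs into a new, lower-dimensional space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 unlabeled samples, 5 raw features

pca = PCA(n_components=2)              # learn a 2-D internal representation
X_new = pca.fit_transform(X)           # preprocessed features for a later ML step

print(X_new.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)   # how much variance each component keeps
```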
MACHINE LEARNING ALGORITHMS:
UNSUPERVISED LEARNING: CLUSTERING EXAMPLE

(Figure: unlabeled input data are fed to a model (clustering algorithm), which detects a pattern and returns the data grouped into clusters, without knowing their classes.)
IFA’2021 16
OVERVIEW OF ML ALGORITHMS
• Data Classification: k-Nearest Neighbor, Naïve Bayes, Support Vector Machines
• Data Regression: Linear Regression, Support Vector Regression
• Data Classification/Regression: Classification and Regression Trees, Random Forests, Bagging
• Data Clustering: k-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• Feature Extraction: Principal Component Analysis, Canonical Correlation Analysis
• Regression/Classification/Clustering/Feature Extraction: Feed Forward Neural Network
• Anomaly Detection: One-class Support Vector Machines

IFA’2021 17
OVERVIEW OF ML ALGORITHMS AND THEIR USE CASES IN IoT

Machine Learning Algorithm | IoT, Smart City Use Cases | Metric to Optimize
Classification | Smart Traffic | Traffic Prediction, Increase Data Abbreviation
Clustering | Smart Traffic, Smart Health | Traffic Prediction, Increase Data Abbreviation
Anomaly Detection | Smart Traffic, Smart Environment | Traffic Prediction, Increase Data Abbreviation, Finding Anomalies in Power Dataset
Support Vector Regression | Smart Weather Prediction | Forecasting
Linear Regression | Economics, Market Analysis, Energy Usage | Real-Time Prediction, Reducing Amount of Data
Classification and Regression Trees | Smart Citizens | Real-Time Prediction, Passengers’ Travel Pattern
Support Vector Machine | All Use Cases | Classify Data, Real-Time Prediction
K-Nearest Neighbors | Smart Citizen | Passengers’ Travel Pattern, Efficiency of the Learned Metric
Naive Bayes | Smart Agriculture, Smart Citizen | Food Safety, Passengers’ Travel Pattern, Estimate the Number of Nodes
k-Means | Smart City, Smart Home, Smart Citizen, Controlling Air and Traffic | Outlier Detection, Fraud Detection, Analyze Small Data Sets, Forecasting Energy Consumption, Passengers’ Travel Pattern, Stream Data Analysis

IFA’2021 18
OVERVIEW OF ML ALGORITHMS AND THEIR USE CASES IN IoT

Machine Learning Algorithm | IoT, Smart City Use Cases | Metric to Optimize
Density-Based Clustering | Smart Citizen | Labeling Data, Fraud Detection, Passengers’ Travel Pattern
Feed Forward Neural Network | Smart Health | Reducing Energy Consumption, Forecast the States of Elements, Overcome Redundant Data and Information
Principal Component Analysis | Monitoring Public Places | Fault Detection
Canonical Correlation Analysis | Monitoring Public Places | Fault Detection
One-class Support Vector Machines | Smart Human Activity Control | Fraud Detection, Emerging Anomalies in the Data

IFA’2021 19
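As a concrete illustration of the clustering entries above, here is a minimal k-Means sketch, assuming scikit-learn is available (the synthetic sensor readings are illustrative placeholders, not data from the tables):

```python
# Minimal sketch (assumes scikit-learn): k-Means on unlabeled IoT-style readings,
# e.g. grouping sensor measurements into clusters for later analysis.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic groups of 2-D sensor readings (e.g. temperature, humidity)
readings = np.vstack([rng.normal([20, 30], 1, (50, 2)),
                      rng.normal([35, 60], 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print(kmeans.cluster_centers_)   # approximate group centres
print(kmeans.labels_[:5])        # cluster index assigned to the first readings
```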
NEURAL NETWORKS: INTRODUCTION

n Neural Networks are a biologically inspired programming paradigm which enables a computer to learn from observational data.

n They are modelled after the human brain and the nervous system.
– They process information much more like the brain than like a serial computer
n Two most important properties:
– Highly parallel
– Learning

n They are based on very simple principles but show very complex behaviors.

n Applications
– As powerful problem solvers: Speech Recognition, Computer Vision
– As biological models
IFA’2021 20
NEURAL NETWORKS: BIOLOGICAL NEURONS

n We are born with about 100 billion neurons

n The human brain contains approximately 10^11 neurons, each connecting to approximately 10^4 others
n Signals “move” via electrochemical signals
n Synapses release a chemical transmitter – the sum of which can cause a threshold to be reached – causing the neuron to “fire”
n Synapses can be inhibitory or excitatory

(Figure: a biological neuron with dendrites, soma, axon, and synapse – the basic computational unit of the brain.)
IFA’2021 21
Neural Networks: Biological vs Artificial Neurons

(Figure: Neuron vs Node – inputs x1 … xn feed a node that computes f(x) and produces the output y1; Synapse vs Weight – the synapse between neurons corresponds to the weight on the connection between nodes xi and yi.)
Inputs are received by dendrites, and if the input levels are over a threshold, the neuron fires,
passing a signal through the axon to the synapse which then connects to another neuron.
IFA’2021 22
NEURAL NETWORKS: INTRODUCTION

n An artificial network consists of a pool of simple processing units, also called neurons or nodes, which communicate by sending signals to each other over a large number of weighted connections.

(Figure: mapping the brain to neural networks – inputs enter an input layer, pass through weighted connections to a hidden layer of nodes, and produce outputs at an output layer.)
IFA’2021 23
NEURAL NETWORKS:
FEEDFORWARD NEURAL NETWORK

n Input Nodes:
– Provide information from the outside world to the network and are together referred to as the “Input Layer”.

n Hidden Nodes:
– Have no direct connection with the outside world (hence the name “hidden”).
– They perform computations and transfer information from the input nodes to the output nodes.
– A collection of hidden nodes forms a “Hidden Layer”.
– While a feedforward network has only a single input layer and a single output layer, it can have zero or more hidden layers.

n Output Nodes:
– Are collectively referred to as the “Output Layer” and are responsible for computations and for transferring information from the network to the outside world.

(Figure: information flows from the Input Layer through the Hidden Layer to the Output Layer, producing Output 1 and Output 2.)
IFA’2021 24
NEURAL NETWORKS:
FEEDFORWARD NEURAL NETWORK

n Information flow is unidirectional
– Data is presented to the Input Layer
– Passed on to the Hidden Layer
– Passed on to the Output Layer

n Information is distributed
n Information processing is parallel

(Figure: the hidden layer forms an internal representation (interpretation) of the data before Output 1 and Output 2 are produced.)
IFA’2021 25
PERCEPTRONS: FORWARD PROPAGATION

(Figure: inputs x1 … xm with weights θ1 … θm, plus a bias input b = 1 with weight θ0, feed a summation node followed by a non-linear activation function g, producing the output ŷ.)

Inputs → Weights → Sum → Activation Function → Output

$\hat{y} = g\left(\theta_0 + \sum_{i=1}^{m} x_i \theta_i\right)$

IFA’2021 26
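A minimal NumPy sketch of this forward-propagation rule (the sigmoid activation and the numeric values are illustrative choices, not from the slides):

```python
# Minimal sketch (NumPy): forward propagation of a single perceptron,
# y_hat = g(theta_0 + sum_i x_i * theta_i), with a sigmoid activation.
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def perceptron_forward(x, theta, theta_0):
    """x: inputs (m,), theta: weights (m,), theta_0: bias weight."""
    z = theta_0 + np.dot(x, theta)   # linear (weighted) combination + bias
    return sigmoid(z)                # non-linear activation

x = np.array([0.5, -1.0, 2.0])       # example inputs (values are illustrative)
theta = np.array([0.1, 0.4, -0.2])   # example weights
print(perceptron_forward(x, theta, theta_0=0.3))
```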
ACTIVATION (TRANSFER) FUNCTIONS

n Step: $f(n) = u_t(n)$ – at the threshold t, the output jumps from 0 to 1
n Sigmoid: $f(n) = \dfrac{1}{1 + e^{-n}}$ – compresses a real value to a number between 0 and 1 (0.9 is usually treated as 1, since the sigmoid never reaches 1, and 0.1 as 0)
n Linear: $f(n) = n$ – the weighted sum of the inputs is used directly as the activation level

n Other common activation functions:
– Hyperbolic Tangent
– Rectified Linear Unit (ReLU)
IFA’2021 27
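A minimal NumPy sketch of these activation functions (the default threshold value is an illustrative choice):

```python
# Minimal sketch (NumPy): the activation functions discussed above.
import numpy as np

def step(n, t=0.0):
    return np.where(n >= t, 1.0, 0.0)     # output becomes 1 once the threshold t is reached

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))       # squashes n into (0, 1)

def linear(n):
    return n                              # activation equals the weighted sum

def tanh(n):
    return np.tanh(n)                     # hyperbolic tangent, squashes n into (-1, 1)

def relu(n):
    return np.maximum(0.0, n)             # rectified linear unit

n = np.linspace(-3, 3, 7)
print(step(n), sigmoid(n), relu(n), sep="\n")
```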
NEURAL NETWORKS:
ACTIVATION (TRANSFER) FUNCTIONS
n Activation functions introduce non-linearities into the network (the world is non-linear)
n For example, if we want to build a neural network to distinguish between red and green points:

(Figure: two scatter plots of red and green points on the unit square.)

Linear activation functions produce linear decision boundaries no matter the network size; non-linearities allow us to approximate arbitrarily complex functions.
IFA’2021 28
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Examining a single hidden unit z2:
– Take the weighted combination of the inputs and apply the bias

n The procedure stays the same for each hidden unit, but the values may be different due to the different weights from the input layer to each hidden unit

$z_2 = \theta_{0,2}^{(1)} + \sum_{j=1}^{m} x_j \theta_{j,2}^{(1)} = \theta_{0,2}^{(1)} + x_1 \theta_{1,2}^{(1)} + x_2 \theta_{2,2}^{(1)} + \dots + x_m \theta_{m,2}^{(1)}$

(Figure: inputs x1, x2, …, xm connect through weights θ^{(1)} to hidden units z1, z2, z3, …, z_{d1}, which connect to the outputs ŷ1, ŷ2.)
IFA’2021 29
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Add a single hidden layer between the input and output of the previous network
n The hidden layer cannot be observed from the outside; it is learned
n Two weight matrices:
– θ^{(1)}: input to hidden
– θ^{(2)}: hidden to output

Hidden units: $z_i = \theta_{0,i}^{(1)} + \sum_{j=1}^{m} x_j \theta_{j,i}^{(1)}$

Final output: $\hat{y}_i = \theta_{0,i}^{(2)} + \sum_{j=1}^{d_1} g(z_j)\, \theta_{j,i}^{(2)}$

(Figure: inputs x1 … xm → hidden units z1 … z_{d1} with activations g(z) → final outputs ŷ1, ŷ2.)
IFA’2021 30
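A minimal NumPy sketch of this forward pass with the two weight matrices θ(1) and θ(2) (the layer sizes and the linear output layer are illustrative choices):

```python
# Minimal sketch (NumPy): forward pass of a single-hidden-layer network with
# two weight matrices, Theta1 (input -> hidden) and Theta2 (hidden -> output).
import numpy as np

def g(z):                                  # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, b1, Theta2, b2):
    """x: (m,), Theta1: (m, d1), b1: (d1,), Theta2: (d1, n_out), b2: (n_out,)."""
    z = b1 + x @ Theta1                    # z_i = theta_{0,i} + sum_j x_j * theta_{j,i}
    a = g(z)                               # hidden activations g(z_i)
    y_hat = b2 + a @ Theta2                # final outputs (left linear in this sketch)
    return y_hat

rng = np.random.default_rng(0)
m, d1, n_out = 3, 4, 2                     # sizes are illustrative
x = rng.normal(size=m)
print(forward(x, rng.normal(size=(m, d1)), rng.normal(size=d1),
              rng.normal(size=(d1, n_out)), rng.normal(size=n_out)))
```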
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Connections between the layers are replaced by the ☒ symbol
n The ☒ symbol denotes fully connected, or dense, layers
n Fully connected layers:
– each input to the layer is connected to each output of the layer

(Figure: inputs x1 … xm, hidden units z1 … z_{d1}, and final outputs ŷ1, ŷ2, with dense connections between consecutive layers.)

IFA’2021 31
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
MULTI LAYER

n Stack multiple hidden layers back to back to create multi-layer (deep) neural networks

(Figure: inputs x1 … xm → … → hidden units z_{k,1} … z_{k,d_k} → final outputs ŷ1, ŷ2.)

$z_{k,i} = \theta_{0,i}^{(k)} + \sum_{j=1}^{d_{k-1}} g(z_{k-1,j})\, \theta_{j,i}^{(k)}$

IFA’2021 32
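A minimal NumPy sketch generalizing the same rule to several stacked layers (the layer sizes are illustrative, and the output layer is left linear here):

```python
# Minimal sketch (NumPy): stacking hidden layers -- the forward pass loops the
# single-layer rule z_k = b_k + g(z_{k-1}) @ Theta_k over every layer.
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_forward(x, layers):
    """layers: list of (Theta, b) pairs, from the input layer to the output layer."""
    a = x
    for Theta, b in layers[:-1]:
        a = g(b + a @ Theta)          # hidden layers use the activation g
    Theta, b = layers[-1]
    return b + a @ Theta              # output layer left linear in this sketch

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]                  # illustrative: 3 inputs, two hidden layers, 2 outputs
layers = [(rng.normal(size=(m, n)), rng.normal(size=n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(deep_forward(rng.normal(size=3), layers))
```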
FEEDFORWARD NEURAL NETWORKS:
TWO TYPES

1. Single Layer Perceptron with multiple inputs: the inputs connect directly to the outputs (no hidden layers)
2. Multi Layer Perceptron with multiple inputs: one or more hidden layers sit between the input and the output layer
IFA’2021 33
FEEDFORWARD NEURAL NETWORKS:
MULTI LAYER PERCEPTRON

(Figure: an MLP with an input layer, a first hidden layer, a second hidden layer, and an output layer.)

IFA’2021 34
NEURAL NETWORKS: MODEL TRAINING MECHANISM
Training dataset: rows of input values x1, x2, x3, x4 with their target outputs y1, y2, y3.

(Figure: the inputs x1 … x4 pass through the hidden layers to produce predictions ŷ1, ŷ2, ŷ3; a loss function compares each prediction with its target, min ||yᵢ − ŷᵢ||, and backpropagation adjusts the weights.)
IFA’2021 35
PERCEPTRONS: TRAINING PERCEPTRONS

n Learning involves choosing values for the weights
n Our goal in training is to find the set of weights and biases that minimizes the loss function.
n Loss function: measures the difference between the predicted value and the actual value.

(Figure – training loop:
Step 1: a labeled input example xᵢ is presented to the perceptron with its current weights θᵢ and bias b;
Step 2: the perceptron produces the estimated output ŷ;
Step 3: the error between the target output y and ŷ is computed;
Step 4: the parameters are adjusted;
Step 5: the cycle repeats with the next example.)
IFA’2021 36
PERCEPTRONS: TRAINING PERCEPTRONS
STEP 1:
– Inputs are given random weights (usually between –0.5 and 0.5)
STEP 2:
– An item of training data, an (x, y) pair, is presented, and the perceptron produces an estimated output ŷ
STEP 3:
– The loss function ||yᵢ – ŷᵢ|| computes the error
STEP 4:
– Based on the error, the weights are modified according to (this is also known as back-propagation):

$\theta_i \leftarrow \theta_i + a \cdot x_i \cdot (y_i - \hat{y}_i)$

where yᵢ is the target output for the training example, ŷᵢ is the output generated by the perceptron, and a is the learning rate, between 0 and 1 (usually small, such as 0.1)
STEP 5:
– Cycle through the training data (x, y) elements until all examples are classified successfully (a code sketch of this loop is given below)
l Each cycle is known as an epoch
IFA’2021 37
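A minimal NumPy sketch of Steps 1 to 5 (the step activation, the AND-function toy data, and the hyper-parameters are illustrative choices):

```python
# Minimal sketch (NumPy): the perceptron training loop from Steps 1-5, using a
# step activation and the update theta_i <- theta_i + a * x_i * (y - y_hat).
import numpy as np

def train_perceptron(X, y, a=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.5, 0.5, X.shape[1])   # Step 1: random weights
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):                      # Step 5: cycle through the data (epochs)
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1.0 if bias + xi @ theta >= 0 else 0.0   # Step 2: estimated output
            err = yi - y_hat                                  # Step 3: error
            theta += a * err * xi                             # Step 4: update weights
            bias += a * err                                   #         and bias
            errors += err != 0
        if errors == 0:                          # all examples classified correctly
            break
    return theta, bias

# Toy linearly separable data (the logical AND function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
print(train_perceptron(X, y))
```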
PERCEPTRONS: TRAINING PERCEPTRONS
n Naturally, the right values for the weights and biases determine the accuracy of the predictions.

n The process of fine-tuning the weights and biases from the input data is known as training the Neural Network.

n Training determines a weight vector that causes the perceptron to produce the correct ±1 output for each of the given training examples

n The learning rate a can decay over time

n Convergence to a successful weight vector:
– When an example is classified correctly, the error is 0 and the weights do not change

– If xᵢ > 0, then increasing θᵢ brings the output closer to the correct classification
IFA’2021 38
PERCEPTRONS: DELTA RULE AND GRADIENT DESCENT

n The perceptron training rule converges, within a finite number of steps, to a weight vector that correctly classifies all training data only when the training data are linearly separable and the learning rate a is small

n If the training data are not linearly separable, another approach called the δ (delta) rule uses gradient descent
– Same basic rule for finding the update values for the weights
– Changes/Differences:
l Do not incorporate the threshold in the output value (un-thresholded perceptrons)
l Wait to update the weights until the cycle is complete
IFA’2021 39
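A minimal NumPy sketch of the delta rule with batch gradient descent on an un-thresholded linear unit (the toy data and the learning rate are illustrative):

```python
# Minimal sketch (NumPy): the delta rule with batch gradient descent on an
# un-thresholded (linear) unit -- accumulate the error over the whole cycle,
# then apply one weight update per epoch.
import numpy as np

def delta_rule(X, y, a=0.05, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.5, 0.5, X.shape[1])
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        y_hat = bias + X @ theta                 # un-thresholded outputs for all examples
        err = y - y_hat
        theta += a * X.T @ err / len(X)          # update only after the full cycle
        bias += a * err.mean()
    return theta, bias

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, 1.0, 2.0, 4.0])               # toy targets, roughly y = x1 + x2
print(delta_rule(X, y))
```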
CONVERGENCE OF DELTA RULE

n Converges asymptotically toward the minimum-error hypothesis, possibly requiring unbounded time, but converges regardless of whether the training data are linearly separable

n Converges because there is a single global minimum

n If a is too large, gradient descent may overstep the minimum, so a is usually reduced as the number of steps grows
n Can be slow – may require thousands of steps (cycles)
n If there are multiple local minima, it might not find the global minimum

IFA’2021 40
PERCEPTRONS: BACKPROPAGATION

n Each hidden node $j$ is “responsible” for some fraction of the error $\delta_j^{(l)}$ in each of the output nodes to which it connects

n $\delta_j^{(l)}$ is divided according to the strength of the connection between the hidden node and the output node

n Then, the “blame” is propagated back to provide the error values for the hidden layer
IFA’2021 41
PERCEPTRONS: BACKPROPAGATION ALGORITHM

n Initialize all network weights with small random numbers

n Until the termination condition is met, Do
– For each <x, y> in the training data, Do
Propagate the input forward through the network:
l Input the instance x to the network and compute the output $\hat{y}_u$ of every unit u in the network
Propagate the errors backward through the network:
l For each network output unit k, calculate its error term δk: $\delta_k \leftarrow \hat{y}_k (1 - \hat{y}_k)(y_k - \hat{y}_k)$
l For each hidden unit h, calculate its error term δh: $\delta_h \leftarrow \hat{y}_h (1 - \hat{y}_h) \sum_{k \in outputs} \theta_{kh}\, \delta_k$
l Update each network weight θji: $\theta_{ji} \leftarrow \theta_{ji} + \Delta\theta_{ji}$, where $\Delta\theta_{ji} = b\, \delta_j\, x_{ji}$ (b is the learning rate and $x_{ji}$ is the input from unit i to unit j)
IFA’2021 42
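A minimal NumPy sketch of one stochastic update for a network with a single sigmoid hidden layer, following the δ formulas above (the learning rate b is written as lr; the sizes and input values are illustrative):

```python
# Minimal sketch (NumPy): one stochastic backpropagation update for a network
# with one sigmoid hidden layer and sigmoid outputs, using the delta rules above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    # Forward pass
    h = sigmoid(b1 + x @ W1)                  # hidden activations
    y_hat = sigmoid(b2 + h @ W2)              # output activations

    # Error terms
    delta_out = y_hat * (1 - y_hat) * (y - y_hat)          # output units
    delta_hid = h * (1 - h) * (W2 @ delta_out)             # hidden units

    # Weight updates: delta_theta_ji = lr * delta_j * x_ji
    W2 += lr * np.outer(h, delta_out);  b2 += lr * delta_out
    W1 += lr * np.outer(x, delta_hid);  b1 += lr * delta_hid
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 2, 3, 2
W1, b1 = rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(n_hid, n_out)), np.zeros(n_out)
backprop_step(np.array([0.5, 0.2]), np.array([1.0, 0.0]), W1, b1, W2, b2)
```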
PERCEPTRONS: BACKPROPAGATION ALGORITHM

n Update weights incrementally

n This is known as a stochastic approximation to gradient descent

n When to stop:
– After a fixed number of iterations
– When the error falls below some threshold
– Once the error on a separate validation set of examples meets some criterion

n An important caveat (we will come back to this):

– It may not find the global minimum because there are many local minima.

– It can be run several times (from different initial weights) to search for the global minimum – in practice it works well
IFA’2021 43
PERCEPTRONS: BACKPROPAGATION

(Figure: a four-layer network with inputs x1, x2, bias units +1, two hidden layers, and one output; each unit computes $z_i^{(l)} \rightarrow a_i^{(l)} = g(z_i^{(l)})$ and has an associated error term $\delta_i^{(l)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$
IFA’2021 44
PERCEPTRONS: BACKPROPAGATION

(Figure: the same network; the error term at the output layer is computed first.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_1^{(4)} = a_1^{(4)} - y$
IFA’2021 45
PERCEPTRONS: BACKPROPAGATION

(Figure: the output error is propagated back to node 2 of layer 3 along the weight $\Theta_{12}^{(3)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_2^{(3)} = \Theta_{12}^{(3)} \times \delta_1^{(4)}$
IFA’2021 46
PERCEPTRONS: BACKPROPAGATION

(Figure: the output error is likewise propagated back to node 1 of layer 3 along the weight $\Theta_{11}^{(3)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_1^{(3)} = \Theta_{11}^{(3)} \times \delta_1^{(4)}$
$\delta_2^{(3)} = \Theta_{12}^{(3)} \times \delta_1^{(4)}$
IFA’2021 47
PERCEPTRONS: BACKPROPAGATION

(Figure: the layer-3 errors are combined, weighted by the connections $\Theta_{12}^{(2)}$ and $\Theta_{22}^{(2)}$, to give the error of node 2 in layer 2.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_2^{(2)} = \Theta_{12}^{(2)} \times \delta_1^{(3)} + \Theta_{22}^{(2)} \times \delta_2^{(3)}$
IFA’2021 48
EXAMPLE: HOUSING MARKET

n Given the data about the previous house sales, can we predict whether
a current house will be sold or not?

Training Data:
Price | Number of Rooms | Sale Result
$500K | 5 | 1 (sold)
$300K | 2 | 0 (not sold)
$700K | 1 | 1 (sold)
$600K | 3 | 0 (not sold)

New house to predict: $100K | 2 | ?
IFA’2021 49
EXAMPLE: HOUSING MARKET

n Step 1: Forward propagation


– All weights are randomly assigned.
n Consider that the network takes the first training example
– Input to the network = [500K, 5] ([price, # of rooms])
– Desired output from the network = [1, 0] ([prob. of sale, prob. of no sale])

(Figure: input layer (Price = 500K, # of rooms = 5) → hidden layer with weights θ1, θ2, θ3 → output layer.)

Incorrect output!
Probability of sale = 0.4 (Target = 1), Error = 1 − 0.4 = 0.6
Probability of no sale = 0.6 (Target = 0), Error = 0 − 0.6 = −0.6

IFA’2021 50
EXAMPLE: HOUSING MARKET

n Step 2: Back propagation and Weight Update


– Calculate the total error at the output nodes.
– Propagate these errors back through the network using Backpropagation to calculate the
gradients
– Use an optimization method to adjust all weights with the aim of reducing the error

(Figure: the same network, now with updated weights θ1’, θ2’, θ3’.)

IFA’2021 51
EXAMPLE: HOUSING MARKET

n Step 3: Repeat the process for all data points


– Then, our network has learned from those examples
n Now let us predict if the house will be sold or not.
– Input [100K,2]
(Figure: input [Price = 100K, # of rooms = 2] fed through the trained network with weights θ1’, θ2’, θ3’.)

Probability of sale = 0.8, Probability of no sale = 0.2 → the prediction says the house will be sold!

IFA’2021 52
EXAMPLE: STUDENT PASS/FAIL

n Given the data about the previous students, can we predict whether a
current student will pass or fail?

Training Data:
Hours Studied | Midterm Grade | Final Exam Result
30 | 67 | 1 (pass)
12 | 75 | 0 (fail)
16 | 89 | 1 (pass)
45 | 56 | 0 (fail)

New student to predict: 25 | 70 | ?
IFA’2021 53
EXAMPLE: STUDENT PASS/FAIL

n Step 1: Forward propagation


– All weights are randomly assigned.
n Consider that the network takes the first training example
– Input to the network = [30, 67] ([hours studied, midterm grade])
– Desired output from the network = [1, 0] ([prob. of pass, prob. of fail])

(Figure: input layer (Hours studied = 30, Midterm grade = 67) → hidden layer with weights θ1, θ2, θ3 → output layer.)

Incorrect output!
Probability of pass = 0.4 (Target = 1), Error = 1 − 0.4 = 0.6
Probability of fail = 0.6 (Target = 0), Error = 0 − 0.6 = −0.6

IFA’2021 54
EXAMPLE: STUDENT PASS/FAIL

n Step 2: Back propagation and Weight Update
– Calculate the total error at the output nodes.
– Propagate these errors back through the network using Backpropagation to calculate the gradients
– Use an optimization method to adjust all weights with the aim of reducing the error

(Figure: the same network, now with updated weights θ1’, θ2’, θ3’.)

IFA’2021 55
EXAMPLE: STUDENT PASS/FAIL

n Step 3: Repeat the process for all data points


– Then, our network has learned from those examples
n Now let us predict if the test student will pass or fail.
– Input [25,70]

(Figure: input [Hours studied = 25, Midterm grade = 70] fed through the trained network with weights θ1’, θ2’, θ3’.)

Probability of pass = 0.8, Probability of fail = 0.2 → the prediction says the student will pass!

IFA’2021 56
ONLINE TOOL: HTTP://PLAYGROUND.TENSORFLOW.ORG
TENSORFLOW PLAYGROUND

n An interactive visualization web application


n Select:
– Learning rate
– Activation function
n Select dataset and features:
– Select ratio of training to test data
n Adjust # of layers, # of neurons in each layer
n Click PLAY to see how the network learns:
– The weights are updated at each EPOCH
– The thickness of the lines connecting the nodes represents the weight
n Observe how the output changes

IFA’2021 57
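A rough code equivalent of a Playground experiment, assuming TensorFlow/Keras is installed (the dataset, layer sizes, activation, and learning rate mirror typical Playground settings and are illustrative, not prescribed by the tool):

```python
# Minimal sketch (assumes TensorFlow/Keras): a small network like one built in
# the Playground -- pick layer sizes, an activation, and a learning rate, then train.
import numpy as np
import tensorflow as tf

# Toy 2-D dataset: label 1 inside a circle, 0 outside (like the "circle" dataset)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (np.sum(X**2, axis=1) < 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),   # first hidden layer, 4 neurons
    tf.keras.layers.Dense(2, activation="tanh"),   # second hidden layer, 2 neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, validation_split=0.3, verbose=0)  # ratio of training to test data
print(model.evaluate(X, y, verbose=0))                       # [loss, accuracy]
```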
EXAMPLE: SENSOR LOCALIZATION
n For example, sensor node localization problem (i.e., determining node's geographical position)
n Node localization can be based on propagating angle and distance measurements of the received
signals from anchor nodes.
n Such measurements may include received signal strength indicator (RSSI), time of arrival (TOA),
and time difference of arrival (TDOA) as in Figure.
n After several training rounds, the network can compute the location of the node.

(Figure: illustration of node localization in WSNs in 3-D space using supervised neural networks.)
IFA’2021 58
DEEP LEARNING: INTRODUCTION

n Use several layers of Neural Networks

n Learn data representations with multiple levels of abstraction using a training set

IFA’2021 59
DEEP LEARNING: INTRODUCTION

n It does not require a separate feature extraction step

(Figure: a traditional ML pipeline and a deep learning pipeline both classifying images as Car / Not Car; the deep network learns the features itself from the raw input.)

IFA’2021 60
DEEP LEARNING: PERSPECTIVE

Artificial Intelligence: any technique that enables computers to mimic human intelligence.

Machine Learning: a subset of AI that includes complex statistical techniques enabling machines to improve at tasks with data.

Deep Learning: the subset of machine learning composed of algorithms that permit software to train itself, in multilayered neural networks, with vast amounts of data.

IFA’2021 61
DEEP LEARNING: VITAL FOR IoT

§ DL has been actively utilized in many IoT applications in recent years.

§ DL and IoT are among the top 3 strategic technology trends for 2017 that
were announced at Gartner Symposium/ITxpo 2016

§ Because traditional ML approaches do not address the emerging analytic needs of IoT systems.

§ IoT systems need different modern data analytic approaches and AI methods
according to the hierarchy of IoT data generation and management.

IFA’2021 62
DEEP LEARNING ALGORITHMS:

• Convolutional Neural Networks


• Recurrent Neural Networks
• Long Short Term Memory
• AutoEncoders
• Variational AutoEncoders
• Generative Adversarial Networks
• Restricted Boltzmann Machine
• Deep Belief Networks
• Ladder Networks

IFA’2021 63
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
AE | Generative | Unsupervised | Various | Suitable for feature extraction and dimensionality reduction; same number of input and output units; the output reconstructs the input data; works with unlabeled data | Machinery fault diagnosis; emotion recognition
RNN | Discriminative | Supervised | Serial, time-series | Processes sequences of data through internal memory; useful in IoT applications with time-dependent data | Identifying movement patterns; behavior detection
RBM | Generative | Unsupervised, Supervised | Various | Suitable for feature extraction, dimensionality reduction, and classification; expensive training procedure | Indoor localization; energy consumption prediction
IFA’2021 64
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
DBN | Generative | Unsupervised, Supervised | Various | Suitable for hierarchical feature discovery; greedy training of the network layer by layer | Fault detection classification; security threat identification
LSTM | Discriminative | Supervised | Serial, time-series, long time-dependent data | Good performance with data that have long time lags; access to the memory cell is protected by gates | Human activity recognition; mobility prediction
CNN | Discriminative | Supervised | 2-D (image, sound, etc.) | Convolution layers take the biggest part of the computations; fewer connections compared to DNNs; needs a large training dataset for visual tasks | Plant disease detection; traffic sign detection
IFA’2021 65
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
VAE | Generative | Semi-supervised | Various | A class of Auto-encoders; suitable for scarcity of labeled data | Intrusion detection; failure detection
GAN | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of two networks: a generator and a discriminator | Localization and wayfinding; image to text
Ladder Net | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of three networks: two encoders and one decoder | Face recognition; authentication

IFA’2021 66
RESEARCH TRENDS AND OPEN ISSUES
Challenge 1. IoT Data Characteristics

n High-quality information is required, since the quality directly affects the accuracy of knowledge extraction.
n IoT data characteristics:
– High volume
– Fast velocity
– Variety of data
– Consists mostly of raw data
– Distributed nature
n Solution: Semantic technologies tend to enhance the abstraction of IoT data through annotation algorithms, but they require further effort to cope with its velocity and volume.
IFA’2021 67
RESEARCH TRENDS AND OPEN ISSUES

Challenge 2. IoT Applications

n Each application has its own unique features.


n IoT Applications require:
– Privacy of collected personal or business data (highly critical)
– Network security and data encryption

n If security is ignored in the design and implementation, an infected network of IoT devices can lead to a crisis.

IFA’2021 68
RESEARCH TRENDS AND OPEN ISSUES

Challenge 3. IoT Data Analytic Algorithms

n According to the characteristics of smart data, analytic algorithms should be able to handle big data
n Algorithms must be able to analyze
– Data coming from a variety of sources
– In real time

n Solution: Deep learning algorithms can reach high accuracy if they have enough data and time
– Cons:
l They can be easily influenced by noisy smart data
l Neural-network-based algorithms lack interpretability
(data scientists cannot understand the reasons for the model results)
l Semi-supervised algorithms, which combine a small amount of labeled data with a large amount of unlabeled data, can help

IFA’2021 69
