
CHAPTER 9.

MACHINE LEARNING ALGORITHMS FOR IoT

IFA’2021 1
DATA SCIENCE OVERVIEW

IFA’2021 2
MACHINE LEARNING

IFA’2021 3
How do we deal with so many devices and such a huge amount of data in IoT?
• Device Management
* The number of devices in IoT is extremely large
* They connect to each other and to the sinks over very large distances
* Connectivity of devices is very important
* The very large volume of collected data must be managed efficiently

• Device Diversity and Interoperability
* Many products from many different companies

• Integration of Data from Multiple Sources
* Very large amounts of data will be collected from different sources such as sensors, mobile devices, etc.
* Interpretation of these data is challenging

• Scale, Data Volume, and Performance
* Big Data problem (how to handle and analyze the data)

• Flexibility and Evolution of Applications
* New use cases and new business models
IFA’2021 4
BOTTOM LINE

n Many IoT applications: health, transportation, smart home, smart city, agriculture, education, etc.

n The main element of most of these applications is an intelligent learning mechanism for prediction (i.e., regression, classification, and clustering), data mining and pattern recognition, or data analytics in general.

IFA’2021 5
MACHINE LEARNING BASICS
n Traditional Programming:
Data (Input) + Model → Computer → Output

n Machine Learning:
Data (Input) + Output → Computer → Learned Model
IFA’2021 6
MACHINE LEARNING BASICS
n Machine learning gives computers/machines the ability to learn without being explicitly programmed

(Figure: Training Data → Machine Learning Algorithm → Learned Model; Testing Data → Learned Model → Prediction)

n It consists of methods that can learn from and make predictions on data

n If the training data are labeled → Supervised Learning

n If they are unlabeled → Unsupervised Learning

IFA’2021 7
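To make this concrete, here is a minimal sketch of the supervised workflow, assuming scikit-learn is available (the classifier choice and toy data are illustrative, not from the slides): an algorithm is fit on labeled training data to produce a learned model, which then predicts labels for unseen test data.

```python
# Minimal sketch (assumes scikit-learn is installed): supervised learning with
# labeled training data, then prediction on unseen test data.
from sklearn.neighbors import KNeighborsClassifier

# Toy labeled training set: [feature1, feature2] -> class label
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y_train = [0, 0, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)   # the "Machine Learning Algorithm"
model.fit(X_train, y_train)                   # produces the "Learned Model"

X_test = [[1.2, 1.9], [5.5, 8.5]]             # unlabeled test data
print(model.predict(X_test))                  # -> predicted labels, e.g. [0 1]
```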
EXAMPLES OF MACHINE LEARNING PROBLEMS
n Computer Vision & Speech Processing & Data Analytics

n Pattern Recognition
– Facial identities or facial expressions
– Handwritten or spoken words (e.g., Siri)
– Medical images
– Sensor Data/IoT

n Pattern Generation
– Generating images or motion sequences
n Anomaly Detection
– Unusual patterns in the telemetry from physical and/or virtual plants
– Unusual sequences of credit card transactions
– Unusual patterns of sensor data from a nuclear power plant
(Figure: facial recognition, a pattern recognition example)
n Prediction
– Future stock prices or currency exchange rates
IFA’2021 8
EXAMPLES OF MACHINE LEARNING PROBLEMS
n Object Recognition Example:
Object Detected: Motorbike

IFA’2021 9
PURPOSE OF MACHINE LEARNING ALGORITHMS

n Development of computer models for learning processes that:
– Provide solutions to the problem of knowledge acquisition
– Enhance the performance of developed systems

n Adoption of computational methods to improve machine performance by detecting and describing consistencies and patterns in training data

IFA’2021 10
MACHINE LEARNING ALGORITHMS: INTRODUCTION

n ML was introduced in the late 1950s as a technique for AI

n Over time, its focus evolved and shifted more toward algorithms that are computationally viable and robust
n In the last decade, ML techniques have been used for:
– Classification
– Regression, and
– Density estimation
in a variety of applications such as bioinformatics, speech recognition, spam detection, computer vision, fraud detection, wireless and wired networks, and computer architectures.

n Algorithms and techniques come from diverse fields including statistics, mathematics, neuroscience, electrical engineering, mechanical engineering, industrial/systems engineering, computer science, etc.
IFA’2021 11
MACHINE LEARNING ALGORITHMS: MAIN CATEGORIES

An ML algorithm takes a set of samples, called a training set, as input.

Three main categories of learning: Supervised, Unsupervised, Reinforcement

1. Supervised (Inductive) Learning:
The training set consists of samples of input vectors together with their corresponding target vectors, also known as labels. Training data include the desired outputs.
2. Unsupervised Learning:
No labels are required for the training set. Training data do not include the desired outputs.
3. Reinforcement Learning:
Deals with the problem of learning the appropriate action, or sequence of actions, to take in a given situation in order to maximize payoff. Rewards come from the sequence of actions.

FOCUS: SUPERVISED and UNSUPERVISED LEARNING, since they have been and still are widely applied in IoT smart data analysis.
IFA’2021 12
MACHINE LEARNING ALGORITHMS:
SUPERVISED LEARNING

n The objective is to learn how to predict the appropriate output vector for a given input vector.

n Applications where the target labels consist of a finite number of discrete categories are known as classification tasks. (Classes are pre-defined; ML assigns the mixed input to them.)

n Cases where the target labels are composed of one or more continuous variables are known as regression tasks.
IFA’2021 13
MACHINE LEARNING ALGORITHMS:
SUPERVISED LEARNING EXAMPLE
(Figure: known data (pictures of apples) with known responses/labels train a model; given new data, the model produces a new response: “It’s an apple!”)
IFA’2021 14
MACHINE LEARNING ALGORITHMS:
UNSUPERVISED LEARNING

n Defining the objective of unsupervised learning is difficult.

n One of the major objectives is to identify sensible clusters of similar samples within the input data, known as clustering. (Classes are not pre-defined; ML clusters similar data.)

n Alternatively, the objective may be the discovery of a useful internal representation of the input data by preprocessing the original input variables in order to transform them into a new variable space.

n This preprocessing stage can significantly improve the result of the subsequent machine learning algorithm and is named feature extraction.
IFA’2021 15
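A minimal sketch of such a feature-extraction preprocessing step, assuming scikit-learn is available (the synthetic data and the choice of PCA are illustrative): PCA learns a lower-dimensional representation of unlabeled inputs that a later ML step can consume.

```python
# Minimal sketch (assumes scikit-learn): unsupervised feature extraction with
# PCA, transforming the original inputs into a new, lower-dimensional space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 unlabeled samples, 5 raw features

pca = PCA(n_components=2)              # learn a 2-D internal representation
X_new = pca.fit_transform(X)           # preprocessed features for a later ML step

print(X_new.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)   # how much variance each component keeps
```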
MACHINE LEARNING ALGORITHMS:
UNSUPERVISED LEARNING: CLUSTERING EXAMPLE

(Figure: unlabeled input data are fed to a model (clustering algorithm), which detects a pattern and returns the data grouped into clusters, without knowing their classes.)
IFA’2021 16
OVERVIEW OF ML ALGORITHMS
• Data Classification: k-Nearest Neighbor, Naïve Bayes, Support Vector Machines
• Data Regression: Linear Regression, Support Vector Regression
• Data Classification/Regression: Classification and Regression Trees, Random Forests, Bagging
• Data Clustering: k-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• Feature Extraction: Principal Component Analysis, Canonical Correlation Analysis
• Regression/Classification/Clustering/Feature Extraction: Feed Forward Neural Network
• Anomaly Detection: One-class Support Vector Machines

IFA’2021 17
OVERVIEW OF ML ALGORITHMS AND THEIR USE CASES IN IoT

Machine Learning Algorithm | IoT, Smart City Use Cases | Metric to Optimize
Classification | Smart Traffic | Traffic Prediction, Increase Data Abbreviation
Clustering | Smart Traffic, Smart Health | Traffic Prediction, Increase Data Abbreviation
Anomaly Detection | Smart Traffic, Smart Environment | Traffic Prediction, Increase Data Abbreviation, Finding Anomalies in Power Dataset
Support Vector Regression | Smart Weather Prediction | Forecasting
Linear Regression | Economics, Market Analysis, Energy Usage | Real-Time Prediction, Reducing Amount of Data
Classification and Regression Trees | Smart Citizens | Real-Time Prediction, Passengers’ Travel Pattern
Support Vector Machine | All Use Cases | Classify Data, Real-Time Prediction
K-Nearest Neighbors | Smart Citizen | Passengers’ Travel Pattern, Efficiency of the Learned Metric
Naive Bayes | Smart Agriculture, Smart Citizen | Food Safety, Passengers’ Travel Pattern, Estimate the Number of Nodes
k-Means | Smart City, Smart Home, Smart Citizen, Controlling Air and Traffic | Outlier Detection, Fraud Detection, Analyze Small Data Sets, Forecasting Energy Consumption, Passengers’ Travel Pattern, Stream Data Analysis

IFA’2021 18
OVERVIEW OF ML ALGORITHMS AND THEIR USE CASES IN IoT

Machine Learning Algorithm | IoT, Smart City Use Cases | Metric to Optimize
Density-Based Clustering | Smart Citizen | Labeling Data, Fraud Detection, Passengers’ Travel Pattern
Feed Forward Neural Network | Smart Health | Reducing Energy Consumption, Forecast the States of Elements, Overcome Redundant Data and Information
Principal Component Analysis | Monitoring Public Places | Fault Detection
Canonical Correlation Analysis | Monitoring Public Places | Fault Detection
One-class Support Vector Machines | Smart Human Activity Control | Fraud Detection, Emerging Anomalies in the Data

IFA’2021 19
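As a concrete illustration of the clustering entries above, here is a minimal k-Means sketch, assuming scikit-learn is available (the synthetic sensor readings are illustrative placeholders, not data from the tables):

```python
# Minimal sketch (assumes scikit-learn): k-Means on unlabeled IoT-style readings,
# e.g. grouping sensor measurements into clusters for later analysis.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic groups of 2-D sensor readings (e.g. temperature, humidity)
readings = np.vstack([rng.normal([20, 30], 1, (50, 2)),
                      rng.normal([35, 60], 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print(kmeans.cluster_centers_)   # approximate group centres
print(kmeans.labels_[:5])        # cluster index assigned to the first readings
```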
NEURAL NETWORKS: INTRODUCTION

n Neural Networks are a biologically inspired programming paradigm which enables a computer to learn from observational data.

n They are modelled after the human brain and the nervous system.
– They process information much more like the brain than like a serial computer
n Two most important properties:
– Highly parallel
– Learning

n They are based on very simple principles but show very complex behaviors.

n Applications
– As powerful problem solvers: Speech Recognition, Computer Vision
– As biological models
IFA’2021 20
NEURAL NETWORKS: BIOLOGICAL NEURONS

n We are born with about 100 billion neurons

n The human brain contains approximately 10^11 neurons, each connecting to approximately 10^4 others
n Signals “move” via electrochemical signals
n Synapses release a chemical transmitter – the sum of which can cause a threshold to be reached – causing the neuron to “fire”
n Synapses can be inhibitory or excitatory

(Figure: a biological neuron with dendrites, soma, axon, and synapse – the basic computational unit of the brain.)
IFA’2021 21
Neural Networks: Biological vs Artificial Neurons

(Figure: Neuron vs Node – inputs x1 … xn feed a node that computes f(x) and produces the output y1; Synapse vs Weight – the synapse between neurons corresponds to the weight on the connection between nodes xi and yi.)
Inputs are received by dendrites, and if the input levels are over a threshold, the neuron fires,
passing a signal through the axon to the synapse which then connects to another neuron.
IFA’2021 22
NEURAL NETWORKS: INTRODUCTION

n An artificial network consists of a pool of simple processing units, also called neurons or nodes, which communicate by sending signals to each other over a large number of weighted connections.

(Figure: mapping the brain to neural networks – inputs enter an input layer, pass through weighted connections to a hidden layer of nodes, and produce outputs at an output layer.)
IFA’2021 23
NEURAL NETWORKS:
FEEDFORWARD NEURAL NETWORK

n Input Nodes:
– Provide information from the outside world to the network and are together referred to as the “Input Layer”.

n Hidden Nodes:
– Have no direct connection with the outside world (hence the name “hidden”).
– They perform computations and transfer information from the input nodes to the output nodes.
– A collection of hidden nodes forms a “Hidden Layer”.
– While a feedforward network has only a single input layer and a single output layer, it can have zero or more hidden layers.

n Output Nodes:
– Are collectively referred to as the “Output Layer” and are responsible for computations and for transferring information from the network to the outside world.

(Figure: information flows from the Input Layer through the Hidden Layer to the Output Layer, producing Output 1 and Output 2.)
IFA’2021 24
NEURAL NETWORKS:
FEEDFORWARD NEURAL NETWORK

n Information flow is unidirectional
– Data is presented to the Input Layer
– Passed on to the Hidden Layer
– Passed on to the Output Layer

n Information is distributed
n Information processing is parallel

(Figure: the hidden layer forms an internal representation (interpretation) of the data before Output 1 and Output 2 are produced.)
IFA’2021 25
PERCEPTRONS: FORWARD PROPAGATION

(Figure: inputs x1 … xm with weights θ1 … θm, plus a bias input b = 1 with weight θ0, feed a summation node followed by a non-linear activation function g, producing the output ŷ.)

Inputs → Weights → Sum → Activation Function → Output

$\hat{y} = g\left(\theta_0 + \sum_{i=1}^{m} x_i \theta_i\right)$

IFA’2021 26
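A minimal NumPy sketch of this forward-propagation rule (the sigmoid activation and the numeric values are illustrative choices, not from the slides):

```python
# Minimal sketch (NumPy): forward propagation of a single perceptron,
# y_hat = g(theta_0 + sum_i x_i * theta_i), with a sigmoid activation.
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def perceptron_forward(x, theta, theta_0):
    """x: inputs (m,), theta: weights (m,), theta_0: bias weight."""
    z = theta_0 + np.dot(x, theta)   # linear (weighted) combination + bias
    return sigmoid(z)                # non-linear activation

x = np.array([0.5, -1.0, 2.0])       # example inputs (values are illustrative)
theta = np.array([0.1, 0.4, -0.2])   # example weights
print(perceptron_forward(x, theta, theta_0=0.3))
```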
ACTIVATION (TRANSFER) FUNCTIONS

n Step: $f(n) = u_t(n)$ – at the threshold t, the output jumps from 0 to 1
n Sigmoid: $f(n) = \dfrac{1}{1 + e^{-n}}$ – compresses a real value to a number between 0 and 1 (0.9 is usually treated as 1, since the sigmoid never reaches 1, and 0.1 as 0)
n Linear: $f(n) = n$ – the weighted sum of the inputs is used directly as the activation level

n Other common activation functions:
– Hyperbolic Tangent
– Rectified Linear Unit (ReLU)
IFA’2021 27
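A minimal NumPy sketch of these activation functions (the default threshold value is an illustrative choice):

```python
# Minimal sketch (NumPy): the activation functions discussed above.
import numpy as np

def step(n, t=0.0):
    return np.where(n >= t, 1.0, 0.0)     # output becomes 1 once the threshold t is reached

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))       # squashes n into (0, 1)

def linear(n):
    return n                              # activation equals the weighted sum

def tanh(n):
    return np.tanh(n)                     # hyperbolic tangent, squashes n into (-1, 1)

def relu(n):
    return np.maximum(0.0, n)             # rectified linear unit

n = np.linspace(-3, 3, 7)
print(step(n), sigmoid(n), relu(n), sep="\n")
```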
NEURAL NETWORKS:
ACTIVATION (TRANSFER) FUNCTIONS
n Activation functions introduce non-linearities into the network (the world is non-linear)
n For example, if we want to build a neural network to distinguish between red and green points:

(Figure: two scatter plots of red and green points on the unit square.)

Linear activation functions produce linear decision boundaries no matter the network size; non-linearities allow us to approximate arbitrarily complex functions.
IFA’2021 28
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Examining a single hidden unit z2:
– Take the weighted combination of the inputs and apply the bias

n The procedure stays the same for each hidden unit, but the values may be different due to the different weights from the input layer to each hidden unit

$z_2 = \theta_{0,2}^{(1)} + \sum_{j=1}^{m} x_j \theta_{j,2}^{(1)} = \theta_{0,2}^{(1)} + x_1 \theta_{1,2}^{(1)} + x_2 \theta_{2,2}^{(1)} + \dots + x_m \theta_{m,2}^{(1)}$

(Figure: inputs x1, x2, …, xm connect through weights θ^{(1)} to hidden units z1, z2, z3, …, z_{d1}, which connect to the outputs ŷ1, ŷ2.)
IFA’2021 29
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Add a single hidden layer between the input and output of the previous network
n The hidden layer cannot be observed from the outside; it is learned
n Two weight matrices:
– θ^{(1)}: input to hidden
– θ^{(2)}: hidden to output

Hidden units: $z_i = \theta_{0,i}^{(1)} + \sum_{j=1}^{m} x_j \theta_{j,i}^{(1)}$

Final output: $\hat{y}_i = \theta_{0,i}^{(2)} + \sum_{j=1}^{d_1} g(z_j)\, \theta_{j,i}^{(2)}$

(Figure: inputs x1 … xm → hidden units z1 … z_{d1} with activations g(z) → final outputs ŷ1, ŷ2.)
IFA’2021 30
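A minimal NumPy sketch of this forward pass with the two weight matrices θ(1) and θ(2) (the layer sizes and the linear output layer are illustrative choices):

```python
# Minimal sketch (NumPy): forward pass of a single-hidden-layer network with
# two weight matrices, Theta1 (input -> hidden) and Theta2 (hidden -> output).
import numpy as np

def g(z):                                  # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, b1, Theta2, b2):
    """x: (m,), Theta1: (m, d1), b1: (d1,), Theta2: (d1, n_out), b2: (n_out,)."""
    z = b1 + x @ Theta1                    # z_i = theta_{0,i} + sum_j x_j * theta_{j,i}
    a = g(z)                               # hidden activations g(z_i)
    y_hat = b2 + a @ Theta2                # final outputs (left linear in this sketch)
    return y_hat

rng = np.random.default_rng(0)
m, d1, n_out = 3, 4, 2                     # sizes are illustrative
x = rng.normal(size=m)
print(forward(x, rng.normal(size=(m, d1)), rng.normal(size=d1),
              rng.normal(size=(d1, n_out)), rng.normal(size=n_out)))
```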
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
SINGLE LAYER NEURAL NETWORK

n Connections between the layers are replaced by the ☒ symbol
n The ☒ symbol denotes fully connected, or dense, layers
n Fully connected layers:
– each input to the layer is connected to each output of the layer

(Figure: inputs x1 … xm, hidden units z1 … z_{d1}, and final outputs ŷ1, ŷ2, with dense connections between consecutive layers.)

IFA’2021 31
BUILDING NEURAL NETWORKS WITH PERCEPTRONS:
MULTI LAYER

n Stack multiple hidden layers back to back to create multi-layer (deep) neural networks

(Figure: inputs x1 … xm → … → hidden units z_{k,1} … z_{k,d_k} → final outputs ŷ1, ŷ2.)

$z_{k,i} = \theta_{0,i}^{(k)} + \sum_{j=1}^{d_{k-1}} g(z_{k-1,j})\, \theta_{j,i}^{(k)}$

IFA’2021 32
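A minimal NumPy sketch generalizing the same rule to several stacked layers (the layer sizes are illustrative, and the output layer is left linear here):

```python
# Minimal sketch (NumPy): stacking hidden layers -- the forward pass loops the
# single-layer rule z_k = b_k + g(z_{k-1}) @ Theta_k over every layer.
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_forward(x, layers):
    """layers: list of (Theta, b) pairs, from the input layer to the output layer."""
    a = x
    for Theta, b in layers[:-1]:
        a = g(b + a @ Theta)          # hidden layers use the activation g
    Theta, b = layers[-1]
    return b + a @ Theta              # output layer left linear in this sketch

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]                  # illustrative: 3 inputs, two hidden layers, 2 outputs
layers = [(rng.normal(size=(m, n)), rng.normal(size=n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(deep_forward(rng.normal(size=3), layers))
```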
FEEDFORWARD NEURAL NETWORKS:
TWO TYPES

1. Single Layer Perceptron with multiple inputs: the inputs connect directly to the outputs (no hidden layers)
2. Multi Layer Perceptron with multiple inputs: one or more hidden layers sit between the input and the output layer
IFA’2021 33
FEEDFORWARD NEURAL NETWORKS:
MULTI LAYER PERCEPTRON

(Figure: an MLP with an input layer, a first hidden layer, a second hidden layer, and an output layer.)

IFA’2021 34
NEURAL NETWORKS: MODEL TRAINING MECHANISM
Training dataset: rows of input values x1, x2, x3, x4 with their target outputs y1, y2, y3.

(Figure: the inputs x1 … x4 pass through the hidden layers to produce predictions ŷ1, ŷ2, ŷ3; a loss function compares each prediction with its target, min ||yᵢ − ŷᵢ||, and backpropagation adjusts the weights.)
IFA’2021 35
PERCEPTRONS: TRAINING PERCEPTRONS

n Learning involves choosing values for the weights
n Our goal in training is to find the set of weights and biases that minimizes the loss function.
n Loss function: measures the difference between the predicted value and the actual value.

(Figure – training loop:
Step 1: a labeled input example xᵢ is presented to the perceptron with its current weights θᵢ and bias b;
Step 2: the perceptron produces the estimated output ŷ;
Step 3: the error between the target output y and ŷ is computed;
Step 4: the parameters are adjusted;
Step 5: the cycle repeats with the next example.)
IFA’2021 36
PERCEPTRONS: TRAINING PERCEPTRONS
STEP 1:
– Inputs are given random weights (usually between –0.5 and 0.5)
STEP 2:
– An item of training data, an (x, y) pair, is presented, and the perceptron produces an estimated output ŷ
STEP 3:
– The loss function ||yᵢ – ŷᵢ|| computes the error
STEP 4:
– Based on the error, the weights are modified according to (this is also known as back-propagation):

$\theta_i \leftarrow \theta_i + a \cdot x_i \cdot (y_i - \hat{y}_i)$

where yᵢ is the target output for the training example, ŷᵢ is the output generated by the perceptron, and a is the learning rate, between 0 and 1 (usually small, such as 0.1)
STEP 5:
– Cycle through the training data (x, y) elements until all examples are classified successfully (a code sketch of this loop is given below)
l Each cycle is known as an epoch
IFA’2021 37
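A minimal NumPy sketch of Steps 1 to 5 (the step activation, the AND-function toy data, and the hyper-parameters are illustrative choices):

```python
# Minimal sketch (NumPy): the perceptron training loop from Steps 1-5, using a
# step activation and the update theta_i <- theta_i + a * x_i * (y - y_hat).
import numpy as np

def train_perceptron(X, y, a=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.5, 0.5, X.shape[1])   # Step 1: random weights
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):                      # Step 5: cycle through the data (epochs)
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1.0 if bias + xi @ theta >= 0 else 0.0   # Step 2: estimated output
            err = yi - y_hat                                  # Step 3: error
            theta += a * err * xi                             # Step 4: update weights
            bias += a * err                                   #         and bias
            errors += err != 0
        if errors == 0:                          # all examples classified correctly
            break
    return theta, bias

# Toy linearly separable data (the logical AND function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
print(train_perceptron(X, y))
```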
PERCEPTRONS: TRAINING PERCEPTRONS
n Naturally, the right values for the weights and biases determine the accuracy of the predictions.

n The process of fine-tuning the weights and biases from the input data is known as training the Neural Network.

n Training determines a weight vector that causes the perceptron to produce the correct ±1 output for each of the given training examples

n The learning rate a can decay over time

n Convergence to a successful weight vector:
– When an example is classified correctly, the error is 0 and the weights do not change

– If xᵢ > 0, then increasing θᵢ brings the output closer to the correct classification
IFA’2021 38
PERCEPTRONS: DELTA RULE AND GRADIENT DESCENT

n The perceptron training rule converges, within a finite number of steps, to a weight vector that correctly classifies all training data only when the training data are linearly separable and the learning rate a is small

n If the training data are not linearly separable, another approach called the δ (delta) rule uses gradient descent
– Same basic rule for finding the update values for the weights
– Changes/Differences:
l Do not incorporate the threshold in the output value (un-thresholded perceptrons)
l Wait to update the weights until the cycle is complete
IFA’2021 39
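A minimal NumPy sketch of the delta rule with batch gradient descent on an un-thresholded linear unit (the toy data and the learning rate are illustrative):

```python
# Minimal sketch (NumPy): the delta rule with batch gradient descent on an
# un-thresholded (linear) unit -- accumulate the error over the whole cycle,
# then apply one weight update per epoch.
import numpy as np

def delta_rule(X, y, a=0.05, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.5, 0.5, X.shape[1])
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        y_hat = bias + X @ theta                 # un-thresholded outputs for all examples
        err = y - y_hat
        theta += a * X.T @ err / len(X)          # update only after the full cycle
        bias += a * err.mean()
    return theta, bias

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, 1.0, 2.0, 4.0])               # toy targets, roughly y = x1 + x2
print(delta_rule(X, y))
```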
CONVERGENCE OF DELTA RULE

n Converges asymptotically toward the minimum-error hypothesis, possibly requiring unbounded time, but converges regardless of whether the training data are linearly separable

n Converges because there is a single global minimum

n If a is too large, gradient descent may overstep the minimum, so a is usually reduced as the number of steps grows
n Can be slow – may require thousands of steps (cycles)
n If there are multiple local minima, it might not find the global minimum

IFA’2021 40
PERCEPTRONS: BACKPROPAGATION

n Each hidden node $j$ is “responsible” for some fraction of the error $\delta_j^{(l)}$ in each of the output nodes to which it connects

n $\delta_j^{(l)}$ is divided according to the strength of the connection between the hidden node and the output node

n Then, the “blame” is propagated back to provide the error values for the hidden layer
IFA’2021 41
PERCEPTRONS: BACKPROPAGATION ALGORITHM

n Initialize all network weights with small random numbers

n Until the termination condition is met, Do
– For each <x, y> in the training data, Do
Propagate the input forward through the network:
l Input the instance x to the network and compute the output $\hat{y}_u$ of every unit u in the network
Propagate the errors backward through the network:
l For each network output unit k, calculate its error term δk: $\delta_k \leftarrow \hat{y}_k (1 - \hat{y}_k)(y_k - \hat{y}_k)$
l For each hidden unit h, calculate its error term δh: $\delta_h \leftarrow \hat{y}_h (1 - \hat{y}_h) \sum_{k \in outputs} \theta_{kh}\, \delta_k$
l Update each network weight θji: $\theta_{ji} \leftarrow \theta_{ji} + \Delta\theta_{ji}$, where $\Delta\theta_{ji} = b\, \delta_j\, x_{ji}$ (b is the learning rate and $x_{ji}$ is the input from unit i to unit j)
IFA’2021 42
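A minimal NumPy sketch of one stochastic update for a network with a single sigmoid hidden layer, following the δ formulas above (the learning rate b is written as lr; the sizes and input values are illustrative):

```python
# Minimal sketch (NumPy): one stochastic backpropagation update for a network
# with one sigmoid hidden layer and sigmoid outputs, using the delta rules above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    # Forward pass
    h = sigmoid(b1 + x @ W1)                  # hidden activations
    y_hat = sigmoid(b2 + h @ W2)              # output activations

    # Error terms
    delta_out = y_hat * (1 - y_hat) * (y - y_hat)          # output units
    delta_hid = h * (1 - h) * (W2 @ delta_out)             # hidden units

    # Weight updates: delta_theta_ji = lr * delta_j * x_ji
    W2 += lr * np.outer(h, delta_out);  b2 += lr * delta_out
    W1 += lr * np.outer(x, delta_hid);  b1 += lr * delta_hid
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 2, 3, 2
W1, b1 = rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(n_hid, n_out)), np.zeros(n_out)
backprop_step(np.array([0.5, 0.2]), np.array([1.0, 0.0]), W1, b1, W2, b2)
```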
PERCEPTRONS: BACKPROPAGATION ALGORITHM

n Update weights incrementally

n This is known as a stochastic approximation to gradient descent

n When to stop:
– After a fixed number of iterations
– When the error falls below some threshold
– Once the error on a separate validation set of examples meets some criterion

n An important caveat (we will come back to this):

– It may not find the global minimum because there are many local minima.

– It can be run several times (from different initial weights) to search for the global minimum – in practice it works well
IFA’2021 43
PERCEPTRONS: BACKPROPAGATION

(Figure: a four-layer network with inputs x1, x2, bias units +1, two hidden layers, and one output; each unit computes $z_i^{(l)} \rightarrow a_i^{(l)} = g(z_i^{(l)})$ and has an associated error term $\delta_i^{(l)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$
IFA’2021 44
PERCEPTRONS: BACKPROPAGATION

(Figure: the same network; the error term at the output layer is computed first.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_1^{(4)} = a_1^{(4)} - y$
IFA’2021 45
PERCEPTRONS: BACKPROPAGATION

(Figure: the output error is propagated back to node 2 of layer 3 along the weight $\Theta_{12}^{(3)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_2^{(3)} = \Theta_{12}^{(3)} \times \delta_1^{(4)}$
IFA’2021 46
PERCEPTRONS: BACKPROPAGATION

(Figure: the output error is likewise propagated back to node 1 of layer 3 along the weight $\Theta_{11}^{(3)}$.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_1^{(3)} = \Theta_{11}^{(3)} \times \delta_1^{(4)}$
$\delta_2^{(3)} = \Theta_{12}^{(3)} \times \delta_1^{(4)}$
IFA’2021 47
PERCEPTRONS: BACKPROPAGATION

(Figure: the layer-3 errors are combined, weighted by the connections $\Theta_{12}^{(2)}$ and $\Theta_{22}^{(2)}$, to give the error of node 2 in layer 2.)

$\delta_j^{(l)}$ = “error” of node $j$ in layer $l$

$\delta_2^{(2)} = \Theta_{12}^{(2)} \times \delta_1^{(3)} + \Theta_{22}^{(2)} \times \delta_2^{(3)}$
IFA’2021 48
EXAMPLE: HOUSING MARKET

n Given the data about the previous house sales, can we predict whether
a current house will be sold or not?

Training Data:
Price | Number of Rooms | Sale Result
$500K | 5 | 1 (sold)
$300K | 2 | 0 (not sold)
$700K | 1 | 1 (sold)
$600K | 3 | 0 (not sold)

New house to predict: $100K | 2 | ?
IFA’2021 49
EXAMPLE: HOUSING MARKET

n Step 1: Forward propagation


– All weights are randomly assigned.
n Consider that the network takes the first training example
– Input to the network = [500K, 5] ([price, # of rooms])
– Desired output from the network = [1, 0] ([prob. of sale, prob. of no sale])

(Figure: input layer (Price = 500K, # of rooms = 5) → hidden layer with weights θ1, θ2, θ3 → output layer.)

Incorrect output!
Probability of sale = 0.4 (Target = 1), Error = 1 − 0.4 = 0.6
Probability of no sale = 0.6 (Target = 0), Error = 0 − 0.6 = −0.6

IFA’2021 50
EXAMPLE: HOUSING MARKET

n Step 2: Back propagation and Weight Update


– Calculate the total error at the output nodes.
– Propagate these errors back through the network using Backpropagation to calculate the
gradients
– Use an optimization method to adjust all weights with the aim of reducing the error

(Figure: the same network, now with updated weights θ1’, θ2’, θ3’.)

IFA’2021 51
EXAMPLE: HOUSING MARKET

n Step 3: Repeat the process for all data points


– Then, our network has learned from those examples
n Now let us predict if the house will be sold or not.
– Input [100K,2]
(Figure: input [Price = 100K, # of rooms = 2] fed through the trained network with weights θ1’, θ2’, θ3’.)

Probability of sale = 0.8, Probability of no sale = 0.2 → the prediction says the house will be sold!

IFA’2021 52
EXAMPLE: STUDENT PASS/FAIL

n Given the data about the previous students, can we predict whether a
current student will pass or fail?

Training Data:
Hours Studied | Midterm Grade | Final Exam Result
30 | 67 | 1 (pass)
12 | 75 | 0 (fail)
16 | 89 | 1 (pass)
45 | 56 | 0 (fail)

New student to predict: 25 | 70 | ?
IFA’2021 53
EXAMPLE: STUDENT PASS/FAIL

n Step 1: Forward propagation


– All weights are randomly assigned.
n Consider that the network takes the first training example
– Input to the network = [30, 67] ([hours studied, midterm grade])
– Desired output from the network = [1, 0] ([prob. of pass, prob. of fail])

(Figure: input layer (Hours studied = 30, Midterm grade = 67) → hidden layer with weights θ1, θ2, θ3 → output layer.)

Incorrect output!
Probability of pass = 0.4 (Target = 1), Error = 1 − 0.4 = 0.6
Probability of fail = 0.6 (Target = 0), Error = 0 − 0.6 = −0.6

IFA’2021 54
EXAMPLE: STUDENT PASS/FAIL

n Step 2: Back propagation and Weight Update
– Calculate the total error at the output nodes.
– Propagate these errors back through the network using Backpropagation to calculate the gradients
– Use an optimization method to adjust all weights with the aim of reducing the error

(Figure: the same network, now with updated weights θ1’, θ2’, θ3’.)

IFA’2021 55
EXAMPLE: STUDENT PASS/FAIL

n Step 3: Repeat the process for all data points


– Then, our network has learned from those examples
n Now let us predict if the test student will pass or fail.
– Input [25,70]

(Figure: input [Hours studied = 25, Midterm grade = 70] fed through the trained network with weights θ1’, θ2’, θ3’.)

Probability of pass = 0.8, Probability of fail = 0.2 → the prediction says the student will pass!

IFA’2021 56
ONLINE TOOL: HTTP://PLAYGROUND.TENSORFLOW.ORG
TENSORFLOW PLAYGROUND

n An interactive visualization web application


n Select:
– Learning rate
– Activation function
n Select dataset and features:
– Select ratio of training to test data
n Adjust # of layers, # of neurons in each layer
n Click PLAY to see how the network learns:
– The weights are updated at each EPOCH
– The thickness of the lines connecting the nodes represents the weight
n Observe how the output changes

IFA’2021 57
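A rough code equivalent of a Playground experiment, assuming TensorFlow/Keras is installed (the dataset, layer sizes, activation, and learning rate mirror typical Playground settings and are illustrative, not prescribed by the tool):

```python
# Minimal sketch (assumes TensorFlow/Keras): a small network like one built in
# the Playground -- pick layer sizes, an activation, and a learning rate, then train.
import numpy as np
import tensorflow as tf

# Toy 2-D dataset: label 1 inside a circle, 0 outside (like the "circle" dataset)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (np.sum(X**2, axis=1) < 0.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="tanh"),   # first hidden layer, 4 neurons
    tf.keras.layers.Dense(2, activation="tanh"),   # second hidden layer, 2 neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, validation_split=0.3, verbose=0)  # ratio of training to test data
print(model.evaluate(X, y, verbose=0))                       # [loss, accuracy]
```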
EXAMPLE: SENSOR LOCALIZATION
n For example, sensor node localization problem (i.e., determining node's geographical position)
n Node localization can be based on propagating angle and distance measurements of the received
signals from anchor nodes.
n Such measurements may include received signal strength indicator (RSSI), time of arrival (TOA),
and time difference of arrival (TDOA) as in Figure.
n After several training rounds, the network can compute the location of the node.

(Figure: illustration of node localization in WSNs in 3-D space using supervised neural networks.)
IFA’2021 58
DEEP LEARNING: INTRODUCTION

n Use several layers of Neural Networks

n Learn data representations with multiple levels of abstraction using a training set

IFA’2021 59
DEEP LEARNING: INTRODUCTION

n It does not require a separate feature extraction step

(Figure: a traditional ML pipeline and a deep learning pipeline both classifying images as Car / Not Car; the deep network learns the features itself from the raw input.)

IFA’2021 60
DEEP LEARNING: PERSPECTIVE

Artificial Intelligence: any technique that enables computers to mimic human intelligence.

Machine Learning: a subset of AI that includes complex statistical techniques enabling machines to improve at tasks with data.

Deep Learning: the subset of machine learning composed of algorithms that permit software to train itself, in multilayered neural networks, with vast amounts of data.

IFA’2021 61
DEEP LEARNING: VITAL FOR IoT

§ DL has been actively utilized in many IoT applications in recent years.

§ DL and IoT are among the top 3 strategic technology trends for 2017 that
were announced at Gartner Symposium/ITxpo 2016

§ Because traditional ML approaches do not address the emerging analytic needs of IoT systems.

§ IoT systems need different modern data analytic approaches and AI methods
according to the hierarchy of IoT data generation and management.

IFA’2021 62
DEEP LEARNING ALGORITHMS:

• Convolutional Neural Networks


• Recurrent Neural Networks
• Long Short Term Memory
• AutoEncoders
• Variational AutoEncoders
• Generative Adversarial Networks
• Restricted Boltzmann Machine
• Deep Belief Networks
• Ladder Networks

IFA’2021 63
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
AE | Generative | Unsupervised | Various | Suitable for feature extraction and dimensionality reduction; same number of input and output units; the output reconstructs the input data; works with unlabeled data | Machinery fault diagnosis; emotion recognition
RNN | Discriminative | Supervised | Serial, time-series | Processes sequences of data through internal memory; useful in IoT applications with time-dependent data | Identifying movement patterns; behavior detection
RBM | Generative | Unsupervised, Supervised | Various | Suitable for feature extraction, dimensionality reduction, and classification; expensive training procedure | Indoor localization; energy consumption prediction
IFA’2021 64
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
DBN | Generative | Unsupervised, Supervised | Various | Suitable for hierarchical feature discovery; greedy training of the network layer by layer | Fault detection classification; security threat identification
LSTM | Discriminative | Supervised | Serial, time-series, long time-dependent data | Good performance with data that have long time lags; access to the memory cell is protected by gates | Human activity recognition; mobility prediction
CNN | Discriminative | Supervised | 2-D (image, sound, etc.) | Convolution layers take the biggest part of the computations; fewer connections compared to DNNs; needs a large training dataset for visual tasks | Plant disease detection; traffic sign detection
IFA’2021 65
SUMMARY OF DEEP LEARNING MODELS & IoT APPLICATIONS

Model | Category | Learning Model | Typical Input Data | Characteristics | Sample IoT Applications
VAE | Generative | Semi-supervised | Various | A class of Auto-encoders; suitable for scarcity of labeled data | Intrusion detection; failure detection
GAN | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of two networks: a generator and a discriminator | Localization and wayfinding; image to text
Ladder Net | Hybrid | Semi-supervised | Various | Suitable for noisy data; composed of three networks: two encoders and one decoder | Face recognition; authentication

IFA’2021 66
RESEARCH TRENDS AND OPEN ISSUES
Challenge 1. IoT Data Characteristics

n High-quality information is required, since the quality directly affects the accuracy of knowledge extraction.
n IoT data characteristics:
– High volume
– Fast velocity
– Variety of data
– Consists mostly of raw data
– Distributed nature
n Solution: Semantic technologies tend to enhance the abstraction of IoT data through annotation algorithms, but they require further effort to cope with its velocity and volume.
IFA’2021 67
RESEARCH TRENDS AND OPEN ISSUES

Challenge 2. IoT Applications

n Each application has its own unique features.


n IoT Applications require:
– Privacy of collected personal or business data (highly critical)
– Network security and data encryption

n If security is ignored in the design and implementation, an infected network of IoT devices can lead to a crisis.

IFA’2021 68
RESEARCH TRENDS AND OPEN ISSUES

Challenge 3. IoT Data Analytic Algorithms

n According to the characteristics of smart data, analytic algorithms should be able to handle big data
n Algorithms must be able to analyze
– Data coming from a variety of sources
– In real time

n Solution: Deep learning algorithms can reach high accuracy if they have enough data and time
– Cons:
l They can be easily influenced by noisy smart data
l Neural-network-based algorithms lack interpretability
(data scientists cannot understand the reasons for the model results)
l Semi-supervised algorithms, which combine a small amount of labeled data with a large amount of unlabeled data, can help

IFA’2021 69
