Scalable Neural Network
Artificial Neural Networks (ANNs) are multi-layer, fully-connected neural networks like the one shown in the figure below. They consist of an input layer, multiple hidden layers, and an output layer. Every node in one layer is connected to every node in the next layer. We make the network deeper by increasing the number of hidden layers.
Perceptron Training Rule
The neural network first proceeds in the forward phase, where the current weights are used to calculate estimates of the target values. If there are n output nodes, there are n target values to be estimated and n corresponding estimates.
The overall error is then calculated and backpropagated through the network to modify each weight according to its contribution to the overall error.
This entire process is iterated until either a fixed number of epochs is reached or the error falls below a predefined threshold value.
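A minimal NumPy sketch of one such training loop for a tiny single-hidden-layer network, assuming sigmoid activations and a squared-error criterion (the shapes, learning rate, and threshold below are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
x = rng.normal(size=(1, 3))          # one training instance
target = np.array([[0.0, 1.0]])      # desired target values
lr = 0.1

for epoch in range(1000):
    # Forward phase: compute estimates with the current weights
    h = sigmoid(x @ W1)
    out = sigmoid(h @ W2)
    error = target - out
    if np.mean(error ** 2) < 1e-4:   # predefined error threshold
        break
    # Backward phase: propagate the error and adjust each weight
    delta_out = error * out * (1 - out)
    delta_h = (delta_out @ W2.T) * h * (1 - h)
    W2 += lr * h.T @ delta_out
    W1 += lr * x.T @ delta_h
```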
Mapper
1. Reads weights from HDFS to initialise the network
2. Reads samples pertaining to its batch
3. Iteratively trains on its samples until the error falls below 0.01
4. Instantiates WeightWritable object with current weights
5. Outputs key-value pairs of <Long, Writable>
Reducer
1. Reads weights from each mapper and accumulates the values
2. For each weight wi,j, calculates the average of this weight value
across all mapper outputs
3. Updates new weight values
4. Compares weight differences from previous iteration
5. If difference > some predefined threshold, outputs 1 (start a
new iteration)
6. Else, outputs 0 (finish training)
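A minimal Python sketch that simulates the mapper/reducer logic above. train_local is an illustrative stand-in (a simple linear model) for the per-mapper BPNN training, and the key and threshold values are assumptions; in a real Hadoop job the weights would travel as WeightWritable objects via HDFS.

```python
import numpy as np

def train_local(weights, X, y, lr=0.1, error_threshold=0.01, max_iters=1000):
    # Placeholder local training loop (simple linear model, squared error),
    # standing in for the per-mapper BPNN training described above.
    for _ in range(max_iters):
        err = X @ weights - y
        if np.mean(err ** 2) < error_threshold:
            break
        weights = weights - lr * X.T @ err / len(X)
    return weights

def mapper(initial_weights, X, y):
    # Steps 1-5: read initial weights and this batch's samples, train locally
    # until the error threshold is reached, then emit a <key, weights> pair
    # (Hadoop would emit <Long, WeightWritable>).
    return (1, train_local(initial_weights.copy(), X, y))

def reducer(mapper_outputs, previous_weights, threshold=1e-3):
    # Average each weight w_ij across all mapper outputs, then decide whether
    # another iteration is needed (1 = start a new iteration, 0 = finish).
    new_weights = np.mean([w for _, w in mapper_outputs], axis=0)
    diff = np.abs(new_weights - previous_weights).max()
    return new_weights, int(diff > threshold)

# Illustrative usage with two mappers on synthetic linear data
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
batches = [rng.normal(size=(50, 3)) for _ in range(2)]
outs = [mapper(np.zeros(3), X, X @ true_w) for X in batches]
weights, again = reducer(outs, previous_weights=np.zeros(3))
```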
ANN using Map Reduce - Implementation 1
ANN using Map Reduce - Implementation 2 (Liu)
Cascading Model
The implementation also uses a cascading model in
order to improve classification accuracy -
● Let there be cn classes and a mappers that are grouped into g groups.
● In each iteration, the training data of a certain class is fed into the BPNNs of each group, which are trained on that class.
● The entire set of data to be classified is then fed to each of the g groups.
● Any instances belonging to classes the BPNNs have already been trained on will be predicted correctly.
● Any instances from the remaining classes will be predicted incorrectly; these form the error set, which is used as input to the next iteration, until all cn classes have been trained on (see the sketch below).
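A toy, pure-Python simulation of the cascade's control flow: each iteration "trains" on one class and passes the misclassified remainder (the error set) to the next iteration. The data, class names, and the memorisation stand-in for a trained group of BPNNs are all illustrative.

```python
# Illustrative classes and labelled instances (not from the paper)
classes = ["c1", "c2", "c3"]
data = [("x1", "c1"), ("x2", "c2"), ("x3", "c3"), ("x4", "c2")]

error_set = list(data)
predictions = {}
for cls in classes:                        # one cascade iteration per class
    train_split = [x for x, y in error_set if y == cls]   # this class's training data
    known = set(train_split)               # stand-in for a group of BPNNs trained on cls
    next_error_set = []
    for x, y in error_set:                 # classify everything that is left
        if x in known:
            predictions[x] = cls           # instances of trained classes are predicted correctly
        else:
            next_error_set.append((x, y))  # misclassified instances form the error set
    error_set = next_error_set

print(predictions)   # e.g. {'x1': 'c1', 'x2': 'c2', 'x4': 'c2', 'x3': 'c3'}
```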
ANN using Map Reduce - Implementation 3 (Liu)
In the realm of artificial neural networks (ANNs), backpropagation neural networks (BPNNs) are the most popular and are known to be capable of approximating complex nonlinear functions with arbitrary precision, given a sufficient number of neurons.
A commonly discussed problem, however, is the computational complexity of the backpropagation algorithm, which can be alleviated by the use of parallel algorithms.
This paper presents three different Map Reduce based parallel implementations of ANN to deal with
different data intensive scenarios -
● MRBPNN 1 - Scenario where test data to be classified is very large
● MRBPNN 2 - Scenario where training data is very large
● MRBPNN 3 - Scenario where number of neurons in BPNN is very large
In all three scenarios, data is input to the BPNNs in the form ⟨instancek, targetk, type⟩, where
● instancek represents the current instance
● targetk represents the desired target class for the current instance
● type marks the instance as a train or test instance (when type=test, the target field is empty); a parsing sketch follows below
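A small sketch of how such a record might be parsed, assuming a comma-separated layout with a semicolon-separated feature vector (the exact on-disk format is not specified on the slide):

```python
def parse_record(line):
    # Example line: "0.5;0.1;0.9,classA,train"  -> (instance, target, type)
    instance, target, rec_type = line.rsplit(",", 2)
    features = [float(v) for v in instance.split(";")]
    return features, (target or None), rec_type   # target is None for test records

print(parse_record("0.5;0.1;0.9,classA,train"))
print(parse_record("0.2;0.4;0.7,,test"))          # type=test, empty target field
```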
ANN using Map Reduce - Implementation 3
MRBPNN - 1
This Map Reduce based model is applicable in scenarios where
test data is very large.
● Let there be n mappers, each mapper initialises a BPNN
● Each mapper receives the entire training data as training input but only a subset of the test data as testing input
● Each mapper stochastically trains on the training instances one by one until all training instances are processed
● Each mapper then processes the test instances available to it and outputs key-value pairs of the form <instancek, ojm>, where ojm is the output of the mth mapper
● The reducer collects the output key-value pairs from all the mappers, performs majority voting for each key (instance), and outputs the final classification for each test instance (see the sketch below)
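A minimal sketch of the reducer's majority vote, assuming the mappers emit ⟨instance, predicted class⟩ pairs:

```python
from collections import Counter, defaultdict

def reduce_majority_vote(mapper_outputs):
    # mapper_outputs: iterable of (instance_key, predicted_class) pairs,
    # one pair per mapper that classified that instance.
    votes = defaultdict(list)
    for instance, predicted in mapper_outputs:
        votes[instance].append(predicted)
    # Final classification = the class predicted by the most mappers
    return {inst: Counter(preds).most_common(1)[0][0] for inst, preds in votes.items()}

outputs = [("i1", "A"), ("i1", "A"), ("i1", "B"), ("i2", "B"), ("i2", "B")]
print(reduce_majority_vote(outputs))   # {'i1': 'A', 'i2': 'B'}
```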
ANN using Map Reduce - Implementation 3
MRBPNN - 2
This Map Reduce based model is applicable in scenarios where training
data is very large.
● This model uses balanced bootstrapping to create n bootstrapped
sets for n mappers - one for each mapper
● This is done because splitting the entire training set T into n equal
parts - one for each mapper - would leave too few training instances
per mapper and hurt classification accuracy
● Each mapper reads its corresponding bootstrapped set from its
HDFS file and performs stochastic training on the instances marked
as type=train.
● Each mapper runs the feedforward phase on instances marked as
type=test and produces output in the form of a key-value pair
<instancek, ojm>
● Reducer collects outputs from all mappers and performs majority
voting to finally classify each test instance.
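One common way to build balanced bootstrap sets is to concatenate n random permutations of the training set and cut the result into n chunks, so that every instance appears exactly n times across all sets. The sketch below follows that scheme; the paper's exact sampling procedure may differ.

```python
import random

def balanced_bootstrap(train_set, n_mappers, seed=0):
    # Concatenate n shuffled copies of the training set and cut the result
    # into n chunks of size |T| - one bootstrapped set per mapper.
    rng = random.Random(seed)
    pool = []
    for _ in range(n_mappers):
        perm = train_set[:]
        rng.shuffle(perm)
        pool.extend(perm)
    size = len(train_set)
    return [pool[i * size:(i + 1) * size] for i in range(n_mappers)]

sets = balanced_bootstrap(list(range(10)), n_mappers=4)   # 4 sets of 10 instances
```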
ANN using Map Reduce - Implementation 3
MRBPNN - 3
This Map Reduce based model is applicable in scenarios where a large
number of neurons is present in the BPNN.
● In this implementation, the Map Reduce jobs run over a number of iterations - for a network of l layers, l - 1 MR jobs run
● The feedforward phase runs in all l - 1 iterations, with backpropagation running only in the final iteration
● In each iteration, mappers read one record from the file and direct their outputs to some reducer k
● These reducers in turn specify which mapper k' reads their output in the next iteration
● The above steps keep looping until the last round, in which a single reducer computes new weights and biases for each layer based on the current instance
● This entire process repeats for each instance in the dataset
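An illustrative NumPy sketch of the neuron-level parallelism behind one layer transition: each "mapper" owns a slice of the next layer's neurons and computes their activations, and the "reducer" assembles the full layer output. The actual MR job chaining and file hand-off between iterations are omitted; all names and shapes are assumptions.

```python
import numpy as np

def feedforward_neurons(prev_activations, weight_cols, bias):
    # One "mapper" computes the activations of only the neurons it owns
    # (one column of the weight matrix per neuron), using a sigmoid.
    z = prev_activations @ weight_cols + bias
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative split of a 3 -> 4 layer across two mappers, two neurons each
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(4)
x = rng.normal(size=3)

partial = [feedforward_neurons(x, W[:, s], b[s]) for s in (slice(0, 2), slice(2, 4))]
layer_output = np.concatenate(partial)   # the "reducer" assembles the full layer output
```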
ANN using Map Reduce - Implementation 4 (Chen)
ANNs working on large datasets are computationally inefficient, and parallelising these algorithms may help improve efficiency and accuracy.
The current solutions that implement BPNN in a parallelised manner have unsolved challenges, such as difficulty in generating a convergent global BPNN and the training process getting trapped in local minima.
This paper presents a novel approach that introduces a genetic algorithm based Evolution Algorithm
that views local BPNNs as candidates in a population and efficiently generates the ideal global BPNN
candidate.
Gradient Descent, an algorithm that is known to fall into local optima, is combined with the Evolution Algorithm, an algorithm that can more efficiently land at the global optimum and is much less sensitive to initial conditions.
Finally, Random Projection is introduced to further improve training efficiency. Experiments show that the algorithm can improve training efficiency and accuracy remarkably on a high-dimensional big dataset.
ANN using Map Reduce - Implementation 4
Throughout the entire process, a BPNN candidate is also expressed in terms of its weight matrix LM, since a trained ANN can be defined as the collection of all its weights.
The entire algorithm is split into three main stages -
Local Training Stage
● Each map task reads a split of the entire dataset and an initial global BPNN candidate (randomly
weighted initially)
● There will then be m splits of the data - S1, S2, …, Sm and m local BPNNs - LM1, LM2, …, LMm
● Before a reduce task pulls local BPNNs to form the global BPNN, all the local BPNNs on a certain
node are merged with the average of their connection weights in order to reduce I/O.
● Eventually n pairs {⟨Ki, LMi⟩ | 1 ≤ i ≤ n} are written into files, where Ki, the ID of the current node, is the key of the local BPNN LMi, and n is the number of nodes in the current cluster (see the sketch below).
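A minimal sketch of the node-level merge, assuming each local BPNN is represented by a single weight matrix LM (the function and variable names are illustrative):

```python
import numpy as np

def merge_local_bpnns(local_matrices):
    # Node-level merge: average the connection weights of all local BPNNs
    # trained on this node, producing one <K_i, LM_i> pair per node and
    # reducing the amount of data the reducer has to pull.
    return np.mean(np.stack(local_matrices), axis=0)

# Illustrative: three local BPNNs on one node, each a 4x5 weight matrix LM
rng = np.random.default_rng(0)
locals_on_node = [rng.normal(size=(4, 5)) for _ in range(3)]
node_id = "node-1"                        # K_i: ID of the current node
pair = (node_id, merge_local_bpnns(locals_on_node))
```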
Test Stage
● The fitness of the candidates in the population
is evaluated based on their performance
against the testBPNN function.
● For each LM, the error ei between the supposed
output and the actual output on the testBPNN
function is calculated.
● The LM that has the smallest ei and ei <= 𝛿 (𝛿 being a
predefined threshold) will be chosen as the GM or
the global BPNN.
● If no LM has an error ei <= 𝛿, then the LM with the
smallest error is chosen as the GM.
● In this way, in the next iteration, the mappers will
read a global BPNN of higher quality than the
previous iteration.
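A small sketch of this selection step; the candidate and error values are placeholders, and the threshold 𝛿 is passed as delta:

```python
def select_global_bpnn(candidates, errors, delta=0.05):
    # candidates: local BPNNs (LMs); errors: e_i from testBPNN for each LM.
    # The LM with the smallest error becomes the GM; whether any e_i falls
    # below the predefined threshold delta only signals convergence.
    best = min(range(len(candidates)), key=lambda i: errors[i])
    converged = errors[best] <= delta
    return candidates[best], converged

gm, done = select_global_bpnn(["LM1", "LM2", "LM3"], [0.12, 0.03, 0.30])
```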
ANN using Map Reduce - Implementation 4
Cascade SVM
● SVMs are really powerful classification algorithms; however, their storage and
compute requirements increase rapidly with the number of training vectors.
● The crux of an SVM is the Quadratic Programming problem, which scales with
the cube of the number of training vectors, O(k³).
● Support vectors from two SVMs are combined and they are adjusted to
optimize the combined subset. This step goes on iteratively until satisfactory
accuracy is achieved.
● For a new iteration, the SVMs in the first layer receive all the support vectors
of the last layer as input.
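A minimal two-subset sketch of one cascade layer using scikit-learn's SVC (the dataset, kernel, and split are illustrative, not from the slides): train an SVM on each subset, keep only the support vectors, and retrain on their union.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, randomly split into two subsets (one cascade layer)
X, y = make_classification(n_samples=400, random_state=0)
halves = np.array_split(np.random.default_rng(0).permutation(len(X)), 2)

support_idx = []
for idx in halves:
    svm = SVC(kernel="rbf").fit(X[idx], y[idx])
    support_idx.extend(idx[svm.support_])   # non-support vectors are discarded

combined = np.array(support_idx)
final_svm = SVC(kernel="rbf").fit(X[combined], y[combined])
# A full cascade would iterate: feed the final layer's support vectors back
# to the first layer until satisfactory accuracy is reached.
```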
Cascade SVM - Filtering
● Another way of saying this is that the interior points of a subset are likely to be interior points of the whole set. Therefore, non-support vectors of a subset are likely to be non-support vectors of the whole set, and we can eliminate them from further analysis.
(Figure: the support vectors of Subset 1 and Subset 2 are combined into Subset 1 + Subset 2.)
SVMs using Spark - Cascade SVM
● The data is randomly divided while ensuring that the ratio of positive and
negative classes in each subset is equal. This is done so that there isn't any
extreme update to the global support vectors (see the sketch below).
● The support vectors and non-support vectors of each layer may be stored
back to HDFS to be used as input for the next layers.
● Non-support vectors (NoSV) are those that violate the training rule/results
of the other subset.
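A minimal sketch of the ratio-preserving split, using scikit-learn's StratifiedKFold as an assumed stand-in for however the Spark implementation actually partitions the data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def stratified_partitions(X, y, n_parts, seed=0):
    # Divide the data randomly while keeping the positive/negative ratio
    # of each subset equal to that of the full dataset.
    skf = StratifiedKFold(n_splits=n_parts, shuffle=True, random_state=seed)
    return [test_idx for _, test_idx in skf.split(X, y)]

y = np.array([0] * 60 + [1] * 40)
X = np.arange(100).reshape(-1, 1)
for part in stratified_partitions(X, y, n_parts=4):
    print(np.bincount(y[part]))   # each partition keeps the 60/40 class ratio
```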
Decision Trees (C4.5) using Map Reduce - Intro
● Decision Trees are classifiers that work on recursive partitioning over an instance space.
● C4.5 is an extension of the ID3 algorithm, to take care of continuous values and handle
incomplete data with missing values.
● Each internal node is a decision node which represents an attribute/subset of attributes.
● Each edge represents a specific value or range
● Leaf nodes represent the class labels
Attribute Table:
● Consider an attribute ‘a’
● The most basic data structure; stores data pertaining to the attribute in the form
of <row_id, attribute_value[a], class_value>
Count Table:
● This computes the count of instances with specific class labels, if split by
attribute a.
● Stores data for each attribute like: <class_label, count>
Hash Table:
● This is the important one: it stores the link information between tree nodes
and row_ids, as well as between a node and its branches (sketched below)
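An illustrative, in-memory Python version of the three structures (the field names and example values are assumptions based on the descriptions above):

```python
from collections import Counter, defaultdict

# Attribute table for attribute a: <row_id, attribute_value[a], class_value>
records = [
    (0, "sunny", "no"), (1, "rainy", "yes"),
    (2, "sunny", "no"), (3, "overcast", "yes"),
]
attribute_table = {"a": records}

# Count table: class-label counts per attribute value, if split by attribute a
count_table = defaultdict(Counter)
for row_id, value, label in attribute_table["a"]:
    count_table[value][label] += 1        # e.g. count_table['sunny'] == {'no': 2}

# Hash table: links tree nodes to the row_ids they contain and to their branches
hash_table = {
    "node_0": {
        "rows": [0, 1, 2, 3],
        "branches": {"sunny": "node_1", "rainy": "node_2", "overcast": "node_3"},
    },
}
```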
Decision Trees - Data Preparation in Map Reduce
Based on the split made in the earlier step, we now have to update the
count table and the hash table.
Map-only jobs are now started as follows:
● The map_update_count function takes in the record from the
attribute table pertaining to the splitting attribute a_best, and emits
the count of the class labels.
● The map_hash function assigns a node_id to the best attribute found in
the previous step to make sure that records with the same values
are split into the same partition (see the sketch below)
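A hedged sketch of the two map-only functions: the names map_update_count and map_hash come from the slide, but their signatures and the record layout used below are assumptions.

```python
from collections import Counter

def map_update_count(record, a_best):
    # record: (row_id, attribute_value, class_value) from a_best's attribute table.
    # Emits the class label keyed by attribute value so counts can be aggregated
    # (signature assumed; the slides only name the function).
    row_id, value, label = record
    return ((a_best, value, label), 1)

def map_hash(record, a_best, node_ids):
    # Assigns a node_id based on the value of the best splitting attribute,
    # so records with the same value land in the same partition / tree branch.
    row_id, value, label = record
    return (node_ids[value], row_id)

# Illustrative usage: aggregate the emitted class-label counts for a_best
rows = [(0, "sunny", "no"), (1, "sunny", "yes"), (2, "rainy", "no")]
counts = Counter(map_update_count(r, "outlook")[0] for r in rows)
```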
Decision Trees - Growing the Tree🌲