
Chapter 11 Neural Nets (Python)

Chapter 11 discusses neural networks and their application in data mining for business analytics using Python. It explains the structure of neural networks, including input, hidden, and output layers, and details the training process involving weight adjustments and backpropagation. The chapter also highlights the advantages and disadvantages of neural networks, particularly in relation to deep learning and its effectiveness in image and voice recognition.


Chapter 11 – Neural Nets

Data Mining for Business Analytics in Python
Import Functionality Needed

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from dmba import classificationSummary
Basic Idea
⚫ Combine input information in a complex & flexible neural net “model”

⚫ Model “coefficients” are continually tweaked in an iterative process

⚫ The network’s interim performance in classification and prediction informs successive tweaks
Network Structure
⚫ Multiple layers
  ⚫ Input layer (raw observations)
  ⚫ Hidden layers
  ⚫ Output layer
⚫ Nodes
⚫ Weights (like coefficients, subject to iterative adjustment)
⚫ Bias values (also subject to iterative adjustment)
Schematic Diagram
Tiny Example
Predict consumer opinion of a cheese product based on fat and salt content

Obs.  Fat Score  Salt Score  Opinion
1     0.2        0.9         like
2     0.1        0.1         dislike
3     0.2        0.4         dislike
4     0.2        0.5         dislike
5     0.4        0.5         like
6     0.3        0.8         like
Example – Using fat & salt content to predict consumer acceptance of cheese

Rectangles are nodes, the w_ij on the arrows are weights, and the θ_j are node bias values
Moving Through the Network
The Input Layer

⚫ For the input layer, input = output

⚫ E.g., for record #1:
  Fat input = output = 0.2
  Salt input = output = 0.9

⚫ Output of the input layer = input into the hidden layer
The Hidden Layer
⚫ In this example, it has 3 nodes
⚫ Each node receives as input the output of all input nodes
⚫ The output of each hidden node is some function of the weighted sum of its inputs
The Weights
⚫ The weights θ (theta) and w are typically initialized to random values in the range -0.05 to +0.05

⚫ This is equivalent to a model with random prediction (in other words, no predictive value)

⚫ These initial weights are used in the first round of training
Output of Node 3 if g is a Logistic Function

Initial Pass of the Network
Node outputs (shown at right within each node), using the first record in the tiny example and a logistic function

Calculations at hidden node 3:
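The slide’s formula and worked numbers appear only as an image; as a minimal sketch of the calculation (the bias and weight values below are illustrative assumptions, not taken from the slide), the output of hidden node 3 for record #1 is:

import numpy as np

# Illustrative (assumed) bias and weights feeding hidden node 3
theta_3 = -0.3
w_13, w_23 = 0.05, 0.01

# Record #1 of the tiny example
fat, salt = 0.2, 0.9

# Weighted sum of inputs plus bias, passed through the logistic function g
z = theta_3 + w_13 * fat + w_23 * salt
output_3 = 1 / (1 + np.exp(-z))   # logistic (sigmoid) activation
print(output_3)                   # about 0.43 with these illustrative values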
Output Layer
The output of the last hidden layer becomes the input for the output layer

Mapping the output to a classification

Output = 0.506, just slightly in excess of 0.5, so the classification, at this early stage, is “like”
Relation to Linear Regression
A net with a single output node and no hidden layers, where g is the identity function, takes the same form as a linear regression model
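Written out (the general form implied by the slide), with identity activation the single output node computes

predicted y = θ + w_1·x_1 + w_2·x_2 + … + w_p·x_p

so θ plays the role of the intercept and the w_i play the role of the regression coefficients.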
Training the Model
Preprocessing Steps

⚫ Scale variables to a 0-1 range (a pandas/scikit-learn sketch follows below)

⚫ For categorical variables, create dummy variables

⚫ Transform (e.g., log) skewed variables
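A minimal sketch of these preprocessing steps, assuming a pandas DataFrame with a numeric column, a categorical column, and a skewed column (the column names here are hypothetical):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical example frame; the column names are for illustration only
df = pd.DataFrame({'Income': [30, 80, 120],
                   'Region': ['East', 'West', 'East'],
                   'Spending': [1, 10, 1000]})

# Scale a numeric variable to the 0-1 range
df[['Income']] = MinMaxScaler().fit_transform(df[['Income']])

# Create dummy variables for a categorical predictor
df = pd.get_dummies(df, columns=['Region'])

# Log-transform a skewed variable
df['Spending'] = np.log(df['Spending'])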
Initial Pass Through Network
⚫ Goal: find weights that yield the best predictions
⚫ The process described above is repeated for all records
⚫ At each record, compare the prediction to the actual value
⚫ The difference is the error for the output node
⚫ The error is propagated back and distributed to all the hidden nodes and used to update their weights
Back Propagation (“back-prop”)

⚫ Output from output node k:
⚫ Error associated with that node:
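The two formulas are shown in the slides only as images; in the standard form used in this setting (a sketch, stated as an assumption rather than copied from the slide), they are

output_k = g( θ_k + Σ_j w_jk · output_j )

err_k = output_k · (1 − output_k) · (actual_k − output_k)

where the output_j are the outputs of the nodes feeding node k and g is the logistic function.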
Error is Used to Update Weights

l = a constant between 0 and 1 that reflects the “learning rate” or “weight decay parameter”
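The update rule itself appears only as an image; in the simplified form consistent with the error term above (a sketch, with the caveat that many formulations also scale the weight update by the sending node’s output), it is

w_jk^new = w_jk^old + l · err_k
θ_k^new = θ_k^old + l · err_k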
Why It Works
⚫ Big errors lead to big changes in weights
⚫ Small errors leave weights relatively unchanged
⚫ Over thousands of updates, a given weight keeps changing until the error associated with it is negligible, at which point weights change little
Python Packages for Neural Nets

Most common for basic neural nets:
● scikit-learn

For deep learning:
● tensorflow
● keras
● pytorch
Prep for Tiny Example

example_df = pd.read_csv('TinyData.csv')   # the six-record tiny example
predictors = ['Fat', 'Salt']
outcome = 'Acceptance'
X = example_df[predictors]
y = example_df[outcome]
classes = sorted(y.unique())               # ['dislike', 'like']
Code for Tiny Example
Using MLPClassifier in scikit-learn

clf = MLPClassifier(hidden_layer_sizes=(3), activation='logistic',
                    solver='lbfgs', random_state=1)
clf.fit(X, y)
clf.predict(X)
# Look at network structure
print('Intercepts')
print(clf.intercepts_)
print('Weights')
print(clf.coefs_)

Intercepts
[array([0.13368045, 4.07247552, 7.00768104]),
array([14.30748676])]
Weights
[array([
[ -1.30656481, -4.20427792, -13.29587332],
[ -0.04399727, -4.91606924, -6.03356987]
]),
array([
[ -0.27348313],
[ -9.01211573],
[-17.63504694]
])]
Predictions
# Prediction
print(pd.concat([
example_df,
pd.DataFrame(clf.predict_proba(X), columns=classes)
], axis=1))

Fat Salt Acceptance dislike like


0 0.2 0.9 like 0.000490 0.999510
1 0.1 0.1 dislike 0.999994 0.000006
2 0.2 0.4 dislike 0.999741 0.000259
3 0.2 0.5 dislike 0.997368 0.002632
4 0.4 0.5 like 0.002133 0.997867
5 0.3 0.8 like 0.000075 0.999925
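The classificationSummary function imported from the dmba package earlier can summarize accuracy on the same data; a sketch of its use (the class_names keyword is assumed to match the book’s dmba utility):

# Confusion matrix and accuracy on the training data
classificationSummary(y, clf.predict(X), class_names=classes)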
Tiny Example - Final Weights

(the final bias values and weights are the intercepts_ and coefs_ printed above)
Common Criteria to Stop the Updating

⚫ When weights change very little from one iteration to the next

⚫ When the misclassification rate reaches a required threshold

⚫ When a limit on the number of runs is reached
Avoiding Overfitting
With sufficient iterations, a neural net can easily overfit the data

To avoid overfitting (a scikit-learn sketch follows this list):
⚫ Track error on validation data or via cross-validation
⚫ Limit the number of iterations
⚫ Limit the complexity of the network
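A minimal sketch of these controls using scikit-learn’s MLPClassifier (the data and parameter values are illustrative, not the book’s):

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data just to make the sketch self-contained
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=1)

clf = MLPClassifier(hidden_layer_sizes=(3,),   # limit network complexity
                    max_iter=200,              # limit the number of iterations
                    early_stopping=True,       # hold out a validation set and stop
                    validation_fraction=0.2,   #   when its score stops improving
                    solver='adam', random_state=1)
clf.fit(X_demo, y_demo)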
User Inputs
Specify Network Architecture
Number of hidden layers
⚫ Most popular – one hidden layer (use argument hidden_layer_sizes)

Number of nodes in hidden layer(s)
⚫ More nodes capture complexity, but increase the chances of overfitting (use argument hidden_layer_sizes; see the sketch after this list)

Number of output nodes
⚫ For classification with m classes, use m or m-1 nodes
⚫ For numerical prediction, use one
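For example (a sketch; the layer sizes here are arbitrary), a network with two hidden layers of 10 and 5 nodes is specified as:

from sklearn.neural_network import MLPClassifier

# Two hidden layers: 10 nodes in the first, 5 in the second (illustrative sizes)
clf = MLPClassifier(hidden_layer_sizes=(10, 5), solver='lbfgs', random_state=1)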
Network Architecture, cont.

“Learning Rate” (argument learning_rate; in scikit-learn the numeric rate itself is set with learning_rate_init, which is used by the sgd and adam solvers)
⚫ Low values “downweight” the new information from errors at each iteration
⚫ This slows learning, but reduces the tendency to overfit to local structure
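A sketch of setting a low learning rate in scikit-learn (the value is illustrative):

from sklearn.neural_network import MLPClassifier

# A small initial learning rate slows learning; applies to the sgd and adam solvers
clf = MLPClassifier(hidden_layer_sizes=(3,), solver='sgd',
                    learning_rate_init=0.001, random_state=1)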
Advantages

⚫Good predictive ability


⚫Can capture complex relationships
⚫No need to specify a model
Disadvantages
⚫ Considered a “black box” prediction machine, with no insight into the relationships between predictors and outcome
⚫ No variable-selection mechanism, so you have to exercise care in selecting variables
⚫ Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate)
Deep Learning
⚫ The statistical and machine learning models in this book - including standard neural nets - work where you have informative predictors (purchase information, bank account information, # of rooms in a house, etc.)
⚫ In rapidly-growing applications of voice and image recognition, you have high numbers of “low-level” granular predictors - pixel values, wave amplitudes - that are uninformative at this low level
Deep Learning
The most active application area for neural nets

• In image recognition, pixel values are the predictors, and there might be 100,000+ predictors – big data! (voice recognition is similar)
• Deep neural nets with many layers (“neural nets on steroids”) have facilitated revolutionary breakthroughs in image/voice recognition and in artificial intelligence (AI)
• The key is the ability to self-learn features (“unsupervised”)
• For example, clustering could separate the pixels in a 1” by 1” football field image into the “green field” and “yard marker” areas without knowing that those concepts exist
• From there, the concept of a boundary, or “edge”, emerges
• Successive stages move from identification of local, simple features to more global & complex features
Convolutional Neural Net example in image recognition

● A popular deep learning implementation is a convolutional neural net (CNN)
● Need to aggregate predictors (pixels)
● Rather than have weights for each pixel, group pixels together and apply the same operation: “convolution”
● A common aggregation is a 3 x 3 pixel area, for example the small area around this man’s lower chin

(figure: enlargement of the area and its pixel values; higher number = darker)
Apply the convolution

The convolution operation is “multiply the pixel matrix by the filter matrix, element by element, then sum”:

0*25 + 1*200 + 0*25 +
0*25 + 1*225 + 0*25 +
0*25 + 1*225 + 0*25
= 650

The filter matrix used here is good at identifying central vertical lines (we will see why shortly). The sum, 650, is higher than for any other arrangement of the filter matrix, because the pixel values are highest in the central column.
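A minimal sketch of this single convolution step in Python (the pixel values are the ones shown above; numpy is assumed to be available):

import numpy as np

# 3 x 3 patch of pixel values from the example (higher number = darker)
pixels = np.array([[25, 200, 25],
                   [25, 225, 25],
                   [25, 225, 25]])

# Filter that responds to a central vertical line
vertical_filter = np.array([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

# Convolution at this position: elementwise multiply, then sum
result = np.sum(pixels * vertical_filter)
print(result)   # 650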
Continue the Convolution

⚫ The filter matrix moves across the image, storing its result, yielding a smaller matrix whose values indicate the presence or absence of a vertical line
⚫ Similar filters can detect horizontal lines, curves, borders - hyper-local features
⚫ Further convolutions can be applied to these local features
⚫ Result: a multi-dimensional matrix, or tensor, of higher-level features
The Learning Process
How does the net learn which convolutions to do?
⚫ In supervised learning (where the training data has known labels), the net retains those convolutions and features which are successful in labeling (tagging) images
⚫ Note that the feature-learning process yields a reduced (simpler) set of features than the original set of pixel values
Unsupervised Learning
Autoencoding
⚫ Deep learning nets can learn higher-level features even when there are no labels to guide the process
⚫ The net adds a process to take the high-level features and generate an image
⚫ The generated image is compared to the original image, and the net retains the architecture that produces the best matches
Summary
⚫ Neural nets can capture flexible/complicated relationships between outcome and predictors
⚫ The network “learns” and updates its model iteratively as more data are fed into it
⚫ Major danger: overfitting
⚫ Requires large amounts of data
⚫ Good predictive performance, yet it’s a “black box”
⚫ Deep learning, using very complex neural nets, is effective in learning higher-level features from a multitude of lower-level ones
⚫ Deep learning is the key to image recognition and many AI applications
