Notebook - Tensorflow Keras
Notebook - Tensorflow Keras
In [ ]: import tensorflow as tf
print(tf.__version__)
2.3.0
Tensors
This is a constant tensor:
tf.Tensor(
[[5 2]
[1 3]], shape=(2, 2), dtype=int32)
In [ ]: x.numpy()
Much like a Numpy array, it features the attributes dtype and shape :
In [ ]: print('dtype:', x.dtype)
print('shape:', x.shape)
A common way to create constant tensors is via tf.ones and tf.zeros (just
like np.ones and np.zeros ):
In [ ]: print(tf.ones(shape=(2, 1)))
print(tf.zeros(shape=(2, 1)))
tf.Tensor(
[[1.]
[1.]], shape=(2, 1), dtype=float32)
tf.Tensor(
[[0.]
[0.]], shape=(2, 1), dtype=float32)
And here's an integer tensor with values drawn from a random uniform
distribution:
Variables
Variables are special tensors used to store mutable state (like the weights of a
neural network). You create a Variable using some initial value.
c = a + b
d = tf.square(c)
e = tf.exp(d)
In [ ]: a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))
tf.Tensor(
[[ 0.80471563 -0.98697984]
[-0.7596037 0.15738489]], shape=(2, 2), dtype=float32)
In [ ]: a = tf.Variable(a)
tf.Tensor(
[[ 0.80471563 -0.98697984]
[-0.7596037 0.15738489]], shape=(2, 2), dtype=float32)
For the sake of demonstration, we won't use any of the higher-level Keras
components like Layer or MeanSquaredError . Just basic ops.
In [ ]: input_dim = 2
output_dim = 1
learning_rate = 0.01
def compute_predictions(features):
return tf.matmul(features, w) + b
In [ ]: import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# Prepare a dataset.
num_samples = 10000
negative_samples = np.random.multivariate_normal(
mean=[0, 3], cov=[[1, 0.5],[0.5, 1]], size=num_samples)
positive_samples = np.random.multivariate_normal(
mean=[3, 0], cov=[[1, 0.5],[0.5, 1]], size=num_samples)
features = np.vstack((negative_samples, positive_samples)).astype(np.float32
labels = np.vstack((np.zeros((num_samples, 1), dtype='float32'),
np.ones((num_samples, 1), dtype='float32')))
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
plt.scatter(features[:, 0], features[:, 1], c=labels[:, 0])
Now let's train our linear regression by iterating over batch-by-batch over the
data and repeatedly calling train_on_batch :
In [ ]: predictions = compute_predictions(features)
plt.scatter(features[:, 0], features[:, 1], c=predictions[:, 0] > 0.5)
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Making it fast with tf.function
But how fast is our current code running?
In [ ]: import time
t0 = time.time()
for epoch in range(20):
for step, (x, y) in enumerate(dataset):
loss = train_on_batch(x, y)
t_end = time.time() - t0
print('Time per epoch: %.3f s' % (t_end / 20,))
Let's compile the training function into a static graph. Literally all we need to do
is add the tf.function decorator on it:
In [ ]: @tf.function
def train_on_batch(x, y):
with tf.GradientTape() as tape:
predictions = compute_predictions(x)
loss = compute_loss(y, predictions)
dloss_dw, dloss_db = tape.gradient(loss, [w, b])
w.assign_sub(learning_rate * dloss_dw)
b.assign_sub(learning_rate * dloss_db)
return loss
In [ ]: t0 = time.time()
for epoch in range(20):
for step, (x, y) in enumerate(dataset):
loss = train_on_batch(x, y)
t_end = time.time() - t0
print('Time per epoch: %.3f s' % (t_end / 20,))
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Time per epoch: 0.084 s
40% reduction, neat. In this case we used a trivially simple model; in general the
bigger the model the greater the speedup you can get by leveraging static
graphs.
Remember: eager execution is great for debugging and printing results line-by-
line, but when it's time to scale, static graphs are a researcher's best friends.
If you're an engineer, Keras provides you with reusable blocks such as layers,
metrics, training loops, to support common use cases. It provides a high-
level user experience that's accessible and productive.
If you're a researcher, you may prefer not to use these built-in blocks such as
layers and training loops, and instead create your own. Of course, Keras
allows you to do this. In this case, Keras provides you with templates for the
blocks you write, it provides you with structure, with an API standard for
things like Layers and Metrics. This structure makes your code easy to share
with others and easy to integrate in production workflows.
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
The base Layer class
The first class you need to know is Layer . Pretty much everything in Keras
derives from it.
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
In [ ]: from tensorflow.keras.layers import Layer
class Linear(Layer):
"""y = w.x + b"""
In [ ]: y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)
The Layer class takes care of tracking the weights assigned to it as attributes:
Note that's also a shortcut method for creating weights: add_weight . Instead of
doing
w_init = tf.random_normal_initializer()
self.w = tf.Variable(initial_value=w_init(shape=shape,
dtype='float32'))
You would typically do:
It’s good practice to create weights in a separate build method, called lazily
with the shape of the first inputs seen by your layer. Here, this pattern prevents
us from having to specify input_dim in the constructor:
In [ ]: class Linear(Layer):
"""y = w.x + b"""
In [ ]: class ComputeSum(Layer):
"""Returns the sum of the inputs."""
my_sum = ComputeSum(2)
x = tf.ones((2, 2))
y = my_sum(x)
print(y.numpy()) # [2. 2.]
y = my_sum(x)
print(y.numpy()) # [4. 4.]
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
[2. 2.]
[4. 4.]
class MLP(Layer):
"""Simple stack of Linear layers."""
def __init__(self):
super(MLP, self).__init__()
self.linear_1 = Linear(32)
self.linear_2 = Linear(32)
self.linear_3 = Linear(10)
mlp = MLP()
# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))
Built-in layers
Keras provides you with a wide range of built-in layers, so that you don't have to
implement your own layers all the time.
Convolution layers
Transposed convolutions
Separateable convolutions
Average and max pooling
Global average and max pooling
LSTM, GRU (with built-in cuDNN acceleration)
BatchNormalization
Dropout
Attention
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
ConvLSTM2D
etc.
By exposing this argument in call , you enable the built-in training and
evaluation loops (e.g. fit ) to correctly use the layer in training and inference.
In [ ]: class Dropout(Layer):
class MLPWithDropout(Layer):
def __init__(self):
super(MLPWithDropout, self).__init__()
self.linear_1 = Linear(32)
self.dropout = Dropout(0.5)
self.linear_3 = Linear(10)
mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)
In [ ]: # We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focused on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformation
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))
# A functional model already has weights, before being called on any data.
# That's because we defined its input shape in advance (in `Input`).
assert len(model.weights) == 4
The Functional API tends to be more concise than subclassing, and provides a
few other advantages (generally the same advantages that functional, typed
languages provide over untyped OO development). However, it can only be used
to define DAGs of layers -- recursive networks should be defined as Layer
subclasses instead.
Key differences between models defined via subclassing and Functional models
are explained in this blog post.
For models that are simple stacks of layers with a single input and a single
output, you can also use the Sequential class which turns a list of layers into a
Model :
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)
Loss classes
Keras features a wide range of built-in loss classes, like BinaryCrossentropy ,
CategoricalCrossentropy , KLDivergence , etc. They work like this:
In [ ]: bce = tf.keras.losses.BinaryCrossentropy()
y_true = [0., 0., 1., 1.] # Targets
y_pred = [1., 1., 1., 0.] # Predictions
loss = bce(y_true, y_pred)
print('Loss:', loss.numpy())
Loss: 11.522857
Note that loss classes are stateless: the output of __call__ is only a function of
the input.
Metric classes
Keras also features a wide range of built-in metric classes, such as
BinaryAccuracy , AUC , FalsePositives , etc.
Unlike losses, metrics are stateful. You update their state using the
update_state method, and you query the scalar metric result using result :
In [ ]: m = tf.keras.metrics.AUC()
m.update_state([0, 1, 1, 1], [0, 1, 0, 0])
print('Intermediate result: ', m.result().numpy())
You can easily roll out your own metrics by subclassing the Metric class:
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Here's a quick implementation of a BinaryTruePositives metric as a
demonstration:
In [ ]: class BinaryTruePositives(tf.keras.metrics.Metric):
def result(self):
return self.true_positives
def reset_states(self):
self.true_positive.assign(0)
Here's a simple MNSIT example that brings together loss classes, metric classes,
and optimizers.
# Prepare a dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[:].reshape(60000, 784).astype('float32') / 255
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()
# Open a GradientTape.
with tf.GradientTape() as tape:
# Forward pass.
logits = model(x)
# Logging.
if step % 100 == 0:
print('Step:', step)
print('Loss from last step:', float(loss_value))
print('Total running accuracy so far:', float(accuracy.result()))
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-dat
asets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
Step: 0
Loss from last step: 2.358793258666992
Total running accuracy so far: 0.09375
Step: 100
Loss from last step: 0.21707287430763245
Total running accuracy so far: 0.8310643434524536
Step: 200
Loss from last step: 0.2818300127983093
Total running accuracy so far: 0.8765547275543213
Step: 300
Loss from last step: 0.23447920382022858
Total running accuracy so far: 0.8955564498901367
Step: 400
Loss from last step: 0.11367885768413544
Total running accuracy so far: 0.9080813527107239
Step: 500
Loss from last step: 0.11368697881698608
Total running accuracy so far: 0.9158245921134949
Step: 600
Loss from last step: 0.0994415432214737
Total running accuracy so far: 0.9220309853553772
Step: 700
Loss from last step: 0.047019436955451965
Total running accuracy so far: 0.9272022247314453
Step: 800
Loss from last step: 0.07821480929851532
Total running accuracy so far: 0.9307116270065308
Step: 900
Loss from last step: 0.09753896296024323
Total running accuracy so far: 0.9345518946647644
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Sometimes you need to compute loss values on the fly during a foward pass
(especially regularization losses). Keras allows you to compute loss values at any
time, and to recursively keep track of them via the add_loss method.
In [ ]: class ActivityRegularization(Layer):
"""Layer that creates an activity sparsity regularization loss."""
Loss values added via add_loss can be retrieved in the .losses list property
of any Layer or Model :
class SparseMLP(Layer):
"""Stack of Linear layers with a sparsity regularization loss."""
mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))
These losses are cleared by the top-level layer at the start of each forward pass -
- they don't accumulate. So layer.losses always contain only the losses
created during the last forward pass. You would typically use these losses by
summing them before computing your gradients when writing a training loop.
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
In [ ]: # Losses correspond to the *last* forward pass.
mlp = SparseMLP(1)
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1 # No accumulation.
# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
(x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)
# A new MLP.
mlp = SparseMLP(10)
# Forward pass.
logits = mlp(x)
# Logging.
if step % 100 == 0:
print(step, float(loss))
0 4.304479598999023
100 2.2919511795043945
200 2.2856969833374023
300 2.249835968017578
400 2.154803514480591
500 2.179860830307007
600 2.0276057720184326
700 2.064443349838257
800 2.0333380699157715
900 1.828589916229248
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
A detailed end-to-end example: a Variational
AutoEncoder (VAE)
If you want to take a break from the basics and look at a slightly more advanced
example, check out this Variational AutoEncoder implementation that
demonstrates everything you've learned so far:
Subclassing Layer
Recursive layer composition
Loss classes and metric classes
add_loss
GradientTape
In [ ]: # Prepare a dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()
First, call compile to configure the optimizer, loss, and metrics to monitor.
In [ ]: model.fit(dataset, epochs=3)
Epoch 1/3
938/938 [==============================] - 2s 2ms/step - loss: 0.2215 - spar
se_categorical_accuracy: 0.9352
Epoch 2/3
938/938 [==============================] - 2s 2ms/step - loss: 0.0874 - spar
se_categorical_accuracy: 0.9737
Epoch 3/3
938/938 [==============================] - 2s 2ms/step - loss: 0.0582 - spar
se_categorical_accuracy: 0.9817
Out[ ]: <tensorflow.python.keras.callbacks.History at 0x7f29713263c8>
Note that you can also monitor your loss and metrics on some validation data
during fit .
Also, you can call fit directly on Numpy arrays, so no need for the dataset
conversion:
num_val_samples = 10000
x_val = x_train[-num_val_samples:]
y_val = y_train[-num_val_samples:]
x_train = x_train[:-num_val_samples]
y_train = y_train[:-num_val_samples]
# Instantiate an accuracy
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js metric.
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer,
loss=loss,
metrics=[accuracy])
model.fit(x_train, y_train,
validation_data=(x_val, y_val),
epochs=3,
batch_size=64)
Epoch 1/3
782/782 [==============================] - 2s 2ms/step - loss: 0.2464 - spar
se_categorical_accuracy: 0.9276 - val_loss: 0.1197 - val_sparse_categorical_
accuracy: 0.9645
Epoch 2/3
782/782 [==============================] - 2s 2ms/step - loss: 0.0940 - spar
se_categorical_accuracy: 0.9714 - val_loss: 0.0982 - val_sparse_categorical_
accuracy: 0.9723
Epoch 3/3
782/782 [==============================] - 2s 2ms/step - loss: 0.0605 - spar
se_categorical_accuracy: 0.9810 - val_loss: 0.0774 - val_sparse_categorical_
accuracy: 0.9763
Out[ ]: <tensorflow.python.keras.callbacks.History at 0x7f2970435860>
Callbacks
One of the neat features of fit (besides built-in support for sample weighting
and class weighting) is that you can easily customize what happens during
training and evaluation by using callbacks.
A callback is an object that is called at different points during training (e.g. at the
end of every batch or at the end of every epoch) and does stuff.
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer,
loss=loss,
metrics=[accuracy])
model.fit(x_train, y_train,
validation_data=(x_val, y_val),
epochs=30,
batch_size=64,
callbacks=callbacks)
Epoch 1/30
782/782 [==============================] - 2s 2ms/step - loss: 0.2457 - spar
se_categorical_accuracy: 0.9287 - val_loss: 0.1373 - val_sparse_categorical_
accuracy: 0.9582
Epoch 2/30
782/782 [==============================] - 2s 2ms/step - loss: 0.0949 - spar
se_categorical_accuracy: 0.9706 - val_loss: 0.0981 - val_sparse_categorical_
accuracy: 0.9711
Epoch 3/30
782/782 [==============================] - 2s 2ms/step - loss: 0.0610 - spar
se_categorical_accuracy: 0.9811 - val_loss: 0.0944 - val_sparse_categorical_
accuracy: 0.9717
Epoch 4/30
782/782 [==============================] - 2s 2ms/step - loss: 0.0455 - spar
se_categorical_accuracy: 0.9851 - val_loss: 0.0802 - val_sparse_categorical_
accuracy: 0.9768
Epoch 5/30
782/782 [==============================] - 2s 2ms/step - loss: 0.0334 - spar
se_categorical_accuracy: 0.9893 - val_loss: 0.0800 - val_sparse_categorical_
accuracy: 0.9780
Epoch 6/30
782/782 [==============================] - 2s 2ms/step - loss: 0.0266 - spar
se_categorical_accuracy: 0.9912 - val_loss: 0.0993 - val_sparse_categorical_
accuracy: 0.9755
Out[ ]: <tensorflow.python.keras.callbacks.History at 0x7f296e2fee10>
Parting words
I hope this guide has given you a good overview of what's possible with
TensorFlow 2.0 and Keras!
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Remember that TensorFlow and Keras don't represent a single workflow. It's a
spectrum of workflows, each with its own trade-off between usability and
flexibility. For instance, you've noticed that it's much easier to use fit than to
write a custom training loop, but fit doesn't give you the same level of
granular control for research use cases.
Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js