EarlyStopping and LiveLossPlot Callbacks in TensorFlow, Keras, and Python

The Keras library has several callbacks that make model training much more efficient. One that I love to use is EarlyStopping, which saves a great deal of time and computation. As the name suggests, it stops training early if you have scheduled more epochs than the model actually needs.

Choosing the right number of epochs up front can be tricky. It may take some trial and error to find how many epochs the model needs to converge without overfitting.

EarlyStopping is very useful in this case. You can schedule as many epochs as you want; once the monitored metric stops improving, training stops. We will work through a complete example to learn how it works.

In this tutorial, I will also touch on another cool callback from the ‘livelossplot’ package, which plots the loss and evaluation metric live as the model trains.

I used a Google Colab notebook for this exercise, but you can use any platform of your choice. First, install livelossplot by running the following line:

!pip install livelossplot

I am assuming you already know the basics of TensorFlow and data preparation, so I will move quickly through the early steps.

Here are the necessary imports; we will add more later if needed.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from livelossplot import PlotLossesKeras

I will use the fashion_mnist dataset for this example.

Please feel free to download the dataset from this link:

Fashion MNIST (kaggle.com)

Let’s start the coding part.

Reading the dataset into a DataFrame:

df = pd.read_csv('/content/fashion_mnist_train.csv')

Defining X and y:

X = df.drop(columns=['label'])
y = df['label']

Fill in any null values with zeros, and normalize the data by dividing the pixel values by 255.0:

X = X.fillna(0)
X = X/255.0

Splitting the data for training and testing:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24)

Binarize the labels:

lb = LabelBinarizer()
y_train = lb.fit_transform(y_train)
y_test = lb.transform(y_test)

Here is what y_train looks like now:

array([[0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0]])
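
As an optional sanity check, the fitted LabelBinarizer can map the one-hot rows back to the integer labels; the shape should show one column per class (the numbers below assume the standard 60,000-row Kaggle training file and the 80/20 split above):

# Recover the original integer labels from the one-hot rows
print(y_train.shape)                      # (48000, 10): one column per class
print(lb.inverse_transform(y_train[:5]))  # first five labels as integers again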

Since the labels are now one-hot encoded, we will use the categorical cross-entropy loss function for this example.

loss_function = tf.keras.losses.CategoricalCrossentropy()
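
To see why the one-hot encoding above matters: CategoricalCrossentropy expects one-hot targets. Here is a tiny toy example with made-up values:

# Toy example: one-hot target vs. predicted class probabilities
y_true_demo = [[0.0, 1.0, 0.0]]  # true class is index 1
y_pred_demo = [[0.1, 0.8, 0.1]]  # model assigns probability 0.8 to class 1
print(loss_function(y_true_demo, y_pred_demo).numpy())  # ~0.223, i.e. -log(0.8)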

Model Definition

We will use a Sequential model with two hidden dense layers, the first with 128 neurons and the second with 64, followed by a 10-neuron softmax output layer. The activation function in the hidden layers will be ‘elu’ (Exponential Linear Unit), which belongs to the ReLU family of activation functions. If you are interested in knowing a little more about that, please watch this video.

Here is the complete model:

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(128, activation = 'elu'))
model.add(tf.keras.layers.Dense(64, activation = 'elu'))
model.add(tf.keras.layers.Dense(10, activation = 'softmax'))
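
Since none of the layers specify an input shape, Keras will build the model the first time fit() is called. If you want to inspect the architecture before training, you can build it explicitly; the 784 here assumes the 28 x 28 Fashion MNIST images flattened into the CSV columns:

# Optional: build explicitly so model.summary() works before training
model.build(input_shape=(None, 784))
model.summary()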

We will use the ‘Adam’ optimizer and ‘accuracy’ as the evaluation metric. Compiling the model:

model.compile(optimizer='adam', loss=loss_function,
              metrics=['accuracy'])

Now it is time to define the callbacks for the model. First, I will define EarlyStopping with the following parameters:

monitor as ‘val_loss’, which means it will monitor the validation loss,

min_delta as 0.02, which means an epoch counts as an improvement only if the validation loss improves by at least 0.02,

patience as 5, which means it will wait 5 epochs without that minimum improvement before it stops training, and

restore_best_weights as True, which means the model will keep the weights from the epoch with the best value of the monitored metric (validation loss here).

monitor_loss = EarlyStopping(monitor='val_loss',
                             min_delta=0.02,
                             patience=5,
                             restore_best_weights=True)

Please feel free to check the documentation for more details about the other parameters:

tf.keras.callbacks.EarlyStopping | TensorFlow v2.15.0.post1
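
For example, here is a sketch (not used in this tutorial) of a variant that monitors validation accuracy instead; mode='max' tells the callback that higher is better, and verbose=1 prints a message when training stops early:

# Variant: stop when validation accuracy stops improving
monitor_acc = EarlyStopping(monitor='val_accuracy',
                            mode='max',
                            patience=5,
                            verbose=1,
                            restore_best_weights=True)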

The second callback I will use today is LiveLossPlot.

Using it is as simple as instantiating PlotLossesKeras():

cb = PlotLossesKeras()

Model training time! Alongside the training data, validation data, and number of epochs, the callbacks are passed as follows:

model.fit(X_train, y_train, epochs=1000, validation_data=(X_test, y_test),
          callbacks=[cb, monitor_loss])

It plotted the loss and accuracy graphs live as the training went on, updating after every epoch.

You can check the video version of this tutorial to see how the graphs keep plotting live with the model training:

Notice in the graph that although we set the epochs to 1000, training stopped after only 8 epochs, once the model was no longer improving. That saves a lot of time and computational power, and it also helps prevent overfitting.
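
Because restore_best_weights was set to True, the model ends up with the weights from its best epoch. As a minimal follow-up (not in the original walkthrough), you can confirm the final performance on the held-out split we validated on:

# Evaluate the restored best weights on the held-out data
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}')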

Conclusion

In this tutorial, we explained in detail how you can use the EarlyStopping callback to save training time and computation power and to prevent overfitting, and how to use LiveLossPlot to plot the losses and metrics live during training. Both should greatly improve your model training experience. I will share more of these tools in the future.
