Implementation of a Siamese Network in Keras and TensorFlow

Neural networks are powerful and very popular in AI/ML, but they usually need a lot of data to train. For tasks like object detection, signature verification, voice verification, and prescription pill recognition, regular neural network techniques can be time-consuming and expensive because of this data requirement. For this kind of work, a Siamese network can be very effective because it needs far less data than a regular neural network. It can also perform well on an imbalanced dataset.

This tutorial will give you a high-level overview of a Siamese Network and a complete example of working with it. I worked with the fashion-mnist dataset here, but a similar structure works for many other use cases.

What is a Siamese Network?

Siamese networks contain two or more identical subnetworks that share the same parameters and weights. If the weights of one subnetwork update, the weights of the other update too, so they always stay identical. Each subnetwork usually ends in an embedding layer, and the distance between the resulting embeddings is then calculated.

You feed the network a pair of inputs. Each subnetwork computes the features of its input, and the similarity of the two inputs is measured by the distance between those features. So, there are only two classes: the images are either similar or dissimilar.
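To make the weight sharing concrete, here is a minimal NumPy sketch. It is not the Keras model built later in this tutorial: the single weight matrix `W`, the helpers `embed` and `pair_distance`, and the toy vectors are all made up for illustration. The only point is that both inputs pass through the same parameters before the distance is taken.

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of weights, shared by both branches (hypothetical toy "network")
W = rng.normal(size=(4, 3))

def embed(x, weights):
    # A single linear layer followed by ReLU stands in for the real subnetwork
    return np.maximum(x @ weights, 0.0)

def pair_distance(a, b, weights):
    # Both inputs go through the SAME weights, then the embeddings are compared
    return np.linalg.norm(embed(a, weights) - embed(b, weights))

a = np.array([1.0, 0.5, -0.2, 0.3])
b = np.array([0.9, 0.4, -0.1, 0.2])

d_ab = pair_distance(a, b, W)
d_ba = pair_distance(b, a, W)
d_aa = pair_distance(a, a, W)
print(d_ab, d_ba, d_aa)
```

Because the weights are shared, the distance does not depend on the order of the two inputs, and an input compared with itself has a distance of zero.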

The concept will be much clearer once you work through an example. Learning by doing is always the best idea.

Necessary Imports and Functions Definition

Let’s start with the necessary imports. We will import more if necessary.

import os
import tensorflow.keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.layers import MaxPooling2D

As we discussed in the previous section, the Siamese network takes a pair of inputs at a time, and the output is ‘yes’ or ‘no’: ‘yes’ if the images are similar, ‘no’ otherwise. Alternatively, the Siamese network can output the distance between the two images, which is what we will do in this tutorial. So, we need to prepare our dataset accordingly. It needs to consist of pairs of images, not single images. For the positive class, there will be two images of the same type, and for the negative class, there will be two images of different types.

This next code block defines a function ‘create_pairs’ that builds pairs of images by stacking two images together. Sometimes the two images are of the same type, and sometimes they are of different types. When the two images match, the label is 1; when they do not match, the label is 0.

def create_pairs(images, labels):
    imagePairs = []
    labelPairs = []

    # Getting the indices of each class
    numclasses = len(np.unique(labels))
    idx = [np.where(labels == i)[0] for i in range(numclasses)]

    for ind in range(len(images)):
        # Getting the current image with index
        currImage = images[ind]
        # Getting the label of the image from labels
        label = labels[ind]

        # Randomly choosing another index from the same class
        indB = np.random.choice(idx[label])
        # Corresponding image for this randomly selected index
        indImage = images[indB]

        # Positive pair: same class, label 1
        imagePairs.append([currImage, indImage])
        labelPairs.append([1])

        # Getting the indices where the label differs from the current image
        diss_idx = np.where(labels != label)[0]

        # Picking a random image from a different class
        diss_image = images[np.random.choice(diss_idx)]

        # Negative pair: different class, label 0
        imagePairs.append([currImage, diss_image])
        labelPairs.append([0])

    return (np.array(imagePairs), np.array(labelPairs))
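As a sanity check on the pairing idea, here is a small self-contained sketch on made-up data. The helper `toy_pairs` is hypothetical: it is a compact version of the same idea, not the function above, and it assumes one positive and one negative pair per image, so the two classes end up balanced.

```python
import numpy as np

def toy_pairs(images, labels, seed=0):
    # Hypothetical helper: one matching and one non-matching pair per image
    rng = np.random.default_rng(seed)
    pairs, pair_labels = [], []
    for img, lab in zip(images, labels):
        same = rng.choice(np.where(labels == lab)[0])
        diff = rng.choice(np.where(labels != lab)[0])
        pairs.append([img, images[same]])
        pair_labels.append([1])   # 1 = same class
        pairs.append([img, images[diff]])
        pair_labels.append([0])   # 0 = different class
    return np.array(pairs), np.array(pair_labels)

# Six fake 2x2 "images" from three classes
images = np.arange(6 * 2 * 2, dtype="float32").reshape(6, 2, 2)
labels = np.array([0, 0, 1, 1, 2, 2])

pairs, pair_labels = toy_pairs(images, labels)
print(pairs.shape)         # (12, 2, 2, 2): 12 pairs, 2 images each, 2x2 pixels
print(pair_labels.shape)   # (12, 1)
print(pair_labels.mean())  # 0.5: half the pairs are positive
```

The pair array has one extra axis of size 2 compared with the image array, which is why the two images of each pair are later selected with `[:, 0]` and `[:, 1]`.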

The next function calculates the Euclidean distance between two images and follows the traditional formula for Euclidean distance:

def euclidean_distance(vecs):
    (imgA, imgB) = vecs
    ss = K.sum(K.square(imgA - imgB), axis=1, keepdims=True)
    return K.sqrt(K.maximum(ss, K.epsilon()))
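For intuition, the same computation can be written in plain NumPy. This is only an illustrative sketch: `euclidean_distance_np` is a made-up name, and the `eps` argument is an assumption standing in for `K.epsilon()`, which defaults to 1e-7 in Keras.

```python
import numpy as np

def euclidean_distance_np(a, b, eps=1e-7):
    # Sum of squared differences per row, kept as a column of shape (batch, 1)
    ss = np.sum(np.square(a - b), axis=1, keepdims=True)
    # Clamp before the square root so the result is never exactly zero
    return np.sqrt(np.maximum(ss, eps))

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0], [1.0, 1.0]])

d = euclidean_distance_np(a, b)
print(d.shape)  # (2, 1)
print(d[0, 0])  # 5.0, the classic 3-4-5 triangle
```

The clamp matters during training because the gradient of the square root is unbounded at zero, so identical embeddings could otherwise produce invalid gradients.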

Model and Cost

The Siamese model is pretty similar to other TensorFlow models. We will use two sets of Conv-MaxPooling-Dropout layers, GlobalAveragePooling, and a Dense layer at the end. Finally, it will return the model.

def siamese_model(input_shape, embeddingDim = 48):
    inputs = Input(input_shape)
    # First Conv-MaxPooling-Dropout set
    x = Conv2D(128, (2, 2), padding = "same", activation = "relu")(inputs)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.4)(x)

    # Second Conv-MaxPooling-Dropout set, applied to the output of the first
    x = Conv2D(128, (2, 2), padding = "same", activation = "relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.4)(x)

    pooling = GlobalAveragePooling2D()(x)
    outputs = Dense(embeddingDim)(pooling)
    model = Model(inputs, outputs)

    return model

A regular binary cross-entropy loss would be good enough, since we are doing a binary classification here. But for a Siamese network, a contrastive loss is more appropriate. The goal of a Siamese network is not just to classify pairs as similar or dissimilar but also to differentiate between them: we want to know how well the network separates similar images from dissimilar ones.

Here is the formula for contrastive loss:

$$L = Y \cdot D^2 + (1 - Y) \cdot \max(\text{margin} - D,\ 0)^2$$

Here, Y is the true label (either 0 or 1)

D is the Euclidean distance

The margin is usually 1

Let’s make a function contrastiveLoss:

def contrastiveLoss(y, y_preds, margin=1):
    y = tf.cast(y, y_preds.dtype)
    y_preds_squared = K.square(y_preds)
    margin_squared = K.square(K.maximum(margin - y_preds, 0))
    loss = K.mean(y * y_preds_squared + (1 - y) * margin_squared)
    return loss
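A few worked numbers may help show how this loss behaves. The sketch below mirrors the same arithmetic in NumPy; `contrastive_loss_np` is a made-up name, and the distances are invented for illustration.

```python
import numpy as np

def contrastive_loss_np(y, d, margin=1.0):
    # y = 1 for similar pairs, 0 for dissimilar pairs; d = predicted distance
    similar_term = y * np.square(d)
    dissimilar_term = (1 - y) * np.square(np.maximum(margin - d, 0.0))
    return np.mean(similar_term + dissimilar_term)

# Similar pair that is close: small penalty (about 0.04)
close_similar = contrastive_loss_np(np.array([1.0]), np.array([0.2]))
# Dissimilar pair that is close: large penalty (about 0.64)
close_dissimilar = contrastive_loss_np(np.array([0.0]), np.array([0.2]))
# Dissimilar pair already beyond the margin: no penalty
far_dissimilar = contrastive_loss_np(np.array([0.0]), np.array([1.5]))

print(close_similar, close_dissimilar, far_dissimilar)
```

Similar pairs are pulled together because their loss grows with the distance, while dissimilar pairs are only pushed apart until they reach the margin.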

These are some common functions that can be used for any other Siamese network of the same type as well.

Model Training


We need a dataset to train the model. For this tutorial, I will use the public fashion_mnist dataset (MIT license), which can be loaded through the TensorFlow library itself.

from tensorflow.keras.layers import Lambda
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Some simple data processing is necessary to begin with. To scale the image data, we will divide it by 255. Also, a channel dimension needs to be added to both the training and testing images to make each image three-dimensional.

x_train = x_train/255.0
x_test = x_test/255.0

x_train = np.expand_dims(x_train, axis = -1)
x_test = np.expand_dims(x_test, axis=-1)

(training_pairs, training_labels) = create_pairs(x_train, y_train)
(test_pairs, test_labels) = create_pairs(x_test, y_test)
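The effect of the scaling and reshaping steps can be checked on a small fake batch without downloading anything. In this sketch, `fake_images` is a made-up stand-in for the real data, assumed to have the same layout: grayscale, 28x28, with integer pixel values from 0 to 255.

```python
import numpy as np

# Fake batch standing in for fashion_mnist: 5 grayscale images, 28x28, uint8
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = fake_images / 255.0                    # values now in [0, 1]
with_channel = np.expand_dims(scaled, axis=-1)  # add a trailing channel axis

print(fake_images.shape)   # (5, 28, 28)
print(with_channel.shape)  # (5, 28, 28, 1)
print(with_channel.min() >= 0.0, with_channel.max() <= 1.0)  # True True
```

The trailing axis of size 1 is the channel dimension that `Conv2D` expects for grayscale input.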

Next, we will create two inputs for two images in an image pair and pass them both to the Siamese model we built earlier to extract the features from both of the images.

The Euclidean distance function will be useful here to find the distance between the two extracted feature vectors. The smaller the distance between them, the more similar the two images are.

Finally, the model takes the two images as inputs and outputs the distance.

image_shape = (28, 28, 1)
# specify the batch size and number of epochs
batch_size = 64
epochs = 70

imageA = Input(shape = image_shape)
imageB = Input(shape = image_shape)

model_build = siamese_model(image_shape)
modelA = model_build(imageA)
modelB = model_build(imageB)

distance = Lambda(euclidean_distance)([modelA, modelB])
model = Model(inputs=[imageA, imageB], outputs=distance)

Now compile the model and train it using the contrastive loss we defined earlier. The necessary arguments for training are the pairs of images, the pair labels, the batch size, and the number of epochs.

model.compile(loss = contrastiveLoss, optimizer="adam")
history = model.fit(
    [training_pairs[:, 0], training_pairs[:, 1]], training_labels[:],
    validation_data=([test_pairs[:, 0], test_pairs[:, 1]], test_labels[:]),
    batch_size = batch_size,
    epochs = epochs)


Epoch 1/70
1875/1875 [==============================] - 24s 7ms/step - loss: 0.1808 - val_loss: 0.1618
Epoch 2/70
1875/1875 [==============================] - 15s 8ms/step - loss: 0.1615 - val_loss: 0.1572
Epoch 3/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1588 - val_loss: 0.1551
Epoch 4/70
1875/1875 [==============================] - 14s 8ms/step - loss: 0.1566 - val_loss: 0.1529
Epoch 5/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1552 - val_loss: 0.1520
...
Epoch 67/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1486 - val_loss: 0.1447
Epoch 68/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1487 - val_loss: 0.1443
Epoch 69/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1484 - val_loss: 0.1447
Epoch 70/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1490 - val_loss: 0.1446

Let’s plot some of the pairs with their distances. We will take 4 pairs of images randomly. The test pairs were already scaled and given a channel dimension during preprocessing, so each image only needs an extra batch dimension before prediction. Then we will use the model to predict the distance between the images in each pair. Finally, you can plot them to see the pairs and their distances.

pairs = np.random.choice(len(test_pairs), size=4)

for i in pairs:
    # Each image is already scaled to [0, 1] with shape (28, 28, 1)
    imageA = test_pairs[i][0]
    imageB = test_pairs[i][1]

    # Keep 2D versions for plotting
    baseA = imageA.squeeze()
    baseB = imageB.squeeze()

    # Add a batch dimension: (28, 28, 1) -> (1, 28, 28, 1)
    imageA = np.expand_dims(imageA, axis=0)
    imageB = np.expand_dims(imageB, axis=0)

    # The model outputs the distance between the two images
    predicts = model.predict([imageA, imageB])
    proba = predicts[0][0]

    fig = plt.figure("Pair #{}".format(i + 1), figsize=(4, 2))
    plt.suptitle("Distance: {:.2f}".format(proba))

    ax = fig.add_subplot(1, 2, 1)
    plt.imshow(baseA, cmap="gray")
    plt.axis("off")

    ax = fig.add_subplot(1, 2, 2)
    plt.imshow(baseB, cmap="gray")
    plt.axis("off")

    plt.show()

Look at these pictures and the corresponding distances. As you can see, the predict function does not give you the label 0 or 1 in this case. It gives you the distance between the two images in each pair. When the images in a pair are more similar, the distance is much smaller.

If you want, you can set a threshold distance based on your use case to distinguish between similar and dissimilar images and get a label as well.
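Here is a minimal sketch of that thresholding step. The helper `distances_to_labels`, the threshold of 0.5, and the example distances are all assumptions made up for illustration; in practice the threshold would be tuned on validation pairs.

```python
import numpy as np

def distances_to_labels(distances, threshold=0.5):
    # Distances below the threshold are treated as "similar" (label 1)
    return (np.ravel(distances) < threshold).astype(int)

# Made-up distances and true labels, purely for illustration
distances = np.array([[0.10], [0.90], [0.35], [0.70]])
true_labels = np.array([1, 0, 1, 0])

predicted = distances_to_labels(distances, threshold=0.5)
accuracy = np.mean(predicted == true_labels)
print(predicted)  # [1 0 1 0]
print(accuracy)   # 1.0
```

The same function can be applied to the output of `model.predict`, since that output also has one distance per row.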


This same kind of model and technique can be used for many different types of tasks, as I mentioned in the introduction. Because it can work with less data, the data collection part becomes easier. Hopefully, it will be useful for you.
