Neural networks are popular across AI/ML, but they typically need large amounts of data to train. For tasks like object detection, signature verification, voice verification, and prescription pill recognition, conventional neural network techniques can be time-consuming and expensive because of this data requirement. For these kinds of tasks, a **Siamese network** can be very powerful because it requires far less data than a regular neural network. It can also perform well on imbalanced datasets.

This tutorial gives you a high-level overview of a Siamese network and a complete working example. I use the Fashion-MNIST dataset here, but the same structure works for many other use cases.

## What is a Siamese Network?

A Siamese network contains two or more identical subnetworks that share the same architecture, parameters, and weights. If the weights of one subnetwork are updated, the weights of the others are updated too, so they always stay identical. Each subnetwork usually ends in an embedding layer, and the distance between the resulting embeddings is then calculated.

You feed the network a pair of inputs. Each subnetwork computes the features of its input, and the similarity of the two inputs is measured by the distance between those features. So there are only two classes: the inputs are either similar or dissimilar.

The concept will become much clearer once you work through an example. Learning by doing is always the best idea.
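The weight-sharing idea can be illustrated without any deep learning library. The sketch below is only an illustration: `WEIGHTS`, `embed`, and `siamese_distance` are hypothetical names, and a single fixed linear layer stands in for the real subnetwork. The point is that both inputs go through the same function with the same weights before their distance is measured.

```python
import math

# Hypothetical toy "subnetwork": one linear layer with fixed, shared weights.
WEIGHTS = [[0.5, -1.0, 0.25],
           [1.5, 0.0, -0.5]]

def embed(x, weights=WEIGHTS):
    """Map an input vector to an embedding using the shared weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def siamese_distance(a, b):
    """Embed both inputs with the same function, then measure their distance."""
    emb_a, emb_b = embed(a), embed(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(emb_a, emb_b)))

same = siamese_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical inputs
near = siamese_distance([1.0, 2.0, 3.0], [1.1, 2.0, 3.0])   # slightly different
far = siamese_distance([1.0, 2.0, 3.0], [-3.0, 0.0, 5.0])   # very different
print(same, near, far)
```

Because the two branches are literally the same function, identical inputs always end up at distance zero, and the distance is symmetric in its two arguments.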

## Necessary Imports and Functions Definition

Let’s start with the necessary imports. We will import more if necessary.

```python
import os

import tensorflow.keras.backend as K
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.layers import MaxPooling2D
```

As we discussed in the previous section, the Siamese network takes a **pair of inputs** at a time and the output is ‘yes’ or ‘no’: ‘yes’ if the images are similar, ‘no’ otherwise. Alternatively, the network can output the distance between the two images, which is what we will do in this tutorial. So we need to prepare our dataset accordingly. It has to consist of pairs of images, not single images. For the positive class, a pair contains two images of the same type, and for the negative class, a pair contains two images of different types.

The next code block defines a function `create_pairs` that builds these pairs. For every image it creates one pair with another image of the same type and one pair with an image of a different type. When the two images are the same type, the label is 1, and when they are not, the label is 0.

```python
def create_pairs(images, labels):
    imagePairs = []
    labelPairs = []

    # Get the indices of the images belonging to each class
    numclasses = len(np.unique(labels))
    idx = [np.where(labels == i)[0] for i in range(numclasses)]

    for ind in range(len(images)):
        # Get the current image and its label
        currImage = images[ind]
        label = labels[ind]

        # Randomly choose the index of another image from the same class
        indB = np.random.choice(idx[label])
        indImage = images[indB]

        # Positive pair: same class, label 1
        imagePairs.append([currImage, indImage])
        labelPairs.append([1])

        # Get the indices of all images whose label differs from the current one
        diss_idx = np.where(labels != label)[0]

        # Randomly choose one of those images
        diss_image = images[np.random.choice(diss_idx)]

        # Negative pair: different class, label 0
        imagePairs.append([currImage, diss_image])
        labelPairs.append([0])

    return (np.array(imagePairs), np.array(labelPairs))
```
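To see what the pairing logic produces, here is a pure-Python analogue run on a tiny made-up dataset. It is a sketch for checking the idea, not a replacement for `create_pairs`: `make_pairs`, the item names, and the seeded `random.Random(0)` are all assumptions introduced for illustration.

```python
import random

def make_pairs(items, labels, rng):
    """Build one positive and one negative pair per item, like create_pairs."""
    pairs, targets = [], []
    for item, label in zip(items, labels):
        same = [x for x, l in zip(items, labels) if l == label]
        diff = [x for x, l in zip(items, labels) if l != label]
        pairs.append((item, rng.choice(same)))   # positive pair, target 1
        targets.append(1)
        pairs.append((item, rng.choice(diff)))   # negative pair, target 0
        targets.append(0)
    return pairs, targets

# Hypothetical toy dataset: six items in three classes.
items = ["shirt_a", "shirt_b", "boot_a", "boot_b", "bag_a", "bag_b"]
labels = [0, 0, 1, 1, 2, 2]
label_of = dict(zip(items, labels))

pairs, targets = make_pairs(items, labels, random.Random(0))
for (a, b), t in zip(pairs, targets):
    print(a, b, t)
```

Two properties are worth noticing. The output always has twice as many pairs as inputs, split evenly between the two labels, so the pair dataset is balanced even if the original classes are not. Also, as in `create_pairs`, the positive partner is drawn from the whole class, so an item can occasionally be paired with itself.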

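The next function calculates the Euclidean distance between the two feature vectors that the network produces for an image pair. It follows the standard Euclidean distance formula, and the sum of squares is clamped to a small epsilon before the square root is taken: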

```python
def euclidean_distance(vecs):
    (imgA, imgB) = vecs
    ss = K.sum(K.square(imgA - imgB), axis=1, keepdims=True)
    return K.sqrt(K.maximum(ss, K.epsilon()))
```
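A worked example in plain Python may make the arithmetic concrete. This is a sketch rather than the Keras function itself: it works on single vectors instead of batches, and `EPSILON = 1e-7` is an assumption that mirrors the default Keras fuzz factor.

```python
import math

EPSILON = 1e-7  # assumed to match the default value returned by K.epsilon()

def euclidean_distance_plain(vec_a, vec_b, epsilon=EPSILON):
    """Square root of the summed squared differences, floored at sqrt(epsilon)."""
    squared_sum = sum((a - b) ** 2 for a, b in zip(vec_a, vec_b))
    return math.sqrt(max(squared_sum, epsilon))

# sqrt(1^2 + 2^2 + 2^2) = sqrt(9) = 3
d = euclidean_distance_plain([1.0, 2.0, 2.0], [0.0, 0.0, 0.0])

# Identical vectors: the squared sum is 0, so the clamp takes over
d_identical = euclidean_distance_plain([1.0, 2.0], [1.0, 2.0])
print(d, d_identical)
```

The clamp matters mostly for training: the derivative of the square root is unbounded at zero, so keeping the argument slightly above zero is a common way to avoid unstable gradients when two embeddings coincide.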

## Model and Cost

The Siamese model is pretty similar to other TensorFlow models. We will use two sets of **Conv-MaxPooling-Dropout layers**, followed by **GlobalAveragePooling** and a Dense layer that produces the embedding. Finally, the function returns the model.

```python
def siamese_model(input_shape, embeddingDim=48):
    inputs = Input(input_shape)

    # First Conv-MaxPooling-Dropout set
    x = Conv2D(128, (2, 2), padding="same", activation="relu")(inputs)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.4)(x)

    # Second Conv-MaxPooling-Dropout set; it takes x (not inputs),
    # otherwise the first set would be discarded
    x = Conv2D(128, (2, 2), padding="same", activation="relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.4)(x)

    # Pool the feature maps and project them to the embedding
    pooling = GlobalAveragePooling2D()(x)
    outputs = Dense(embeddingDim)(pooling)

    model = Model(inputs, outputs)
    return model
```

A regular binary cross-entropy loss can be good enough when the network outputs a similar/dissimilar probability, since that is a binary classification. But for a Siamese network that outputs a distance, as ours does, a **contrastive loss** is more appropriate. The goal of a Siamese network is not just to classify pairs as similar or dissimilar but also to differentiate between them, so the loss should measure how well the network separates similar pairs from dissimilar ones.

Here is the formula for contrastive loss, in the form implemented by the function below:

$$L = Y \cdot D^2 + (1 - Y) \cdot \max(\text{margin} - D,\ 0)^2$$

Here, Y is the true label (either 0 or 1)

D is the Euclidean distance

The margin is usually 1

The loss is averaged over all pairs in a batch.

Let’s make a function contrastiveLoss:

```python
def contrastiveLoss(y, y_preds, margin=1):
    y = tf.cast(y, y_preds.dtype)
    y_preds_squared = K.square(y_preds)
    margin_squared = K.square(K.maximum(margin - y_preds, 0))
    loss = K.mean(y * y_preds_squared + (1 - y) * margin_squared)
    return loss
```
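To get a feel for the numbers, the same computation can be written in plain Python and applied to a couple of hand-picked pairs. This is a sketch under stated assumptions: `contrastive_loss_plain` is a hypothetical helper and the distances are made up, not model outputs.

```python
def contrastive_loss_plain(labels, distances, margin=1.0):
    """Mean of Y * D^2 + (1 - Y) * max(margin - D, 0)^2 over all pairs."""
    terms = []
    for y, d in zip(labels, distances):
        similar_term = y * d ** 2                             # penalizes distant similar pairs
        dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2  # penalizes close dissimilar pairs
        terms.append(similar_term + dissimilar_term)
    return sum(terms) / len(terms)

# Similar pair is close (0.1); dissimilar pair is beyond the margin (1.2)
good = contrastive_loss_plain([1, 0], [0.1, 1.2])

# Similar pair is far (0.9); dissimilar pair is close (0.2)
bad = contrastive_loss_plain([1, 0], [0.9, 0.2])

print(good, bad)  # roughly 0.005 and 0.725
```

A dissimilar pair that is already farther apart than the margin contributes nothing, so the network is not rewarded for pushing such pairs even further away.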

These functions are generic enough to be reused for other Siamese networks of the same type.

## Model Training

We need a dataset to train the model. For this tutorial, I will use the public Fashion-MNIST dataset (MIT license), which can be loaded directly from the TensorFlow library.

```python
from tensorflow.keras.layers import Lambda
from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
```

Some simple preprocessing is necessary to begin with. To scale the pixel values to the range [0, 1], we divide the image data by 255. A channel dimension also needs to be added to both the training and test images so that each image becomes three-dimensional (28 × 28 × 1). After that, we build the image pairs with `create_pairs`.

```python
x_train = x_train / 255.0
x_test = x_test / 255.0

x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

(training_pairs, training_labels) = create_pairs(x_train, y_train)
(test_pairs, test_labels) = create_pairs(x_test, y_test)
```

Next, we will create two inputs, one for each image in a pair, and pass them both through the Siamese model we built earlier to extract the features of each image. Because the same model instance processes both inputs, the weights are shared.

The **Euclidean distance** function is then used to find the distance between the two extracted feature vectors. The smaller the distance between the two feature vectors, the more similar the images are.

Finally, the overall model takes the two images as inputs and outputs the distance.

```python
image_shape = (28, 28, 1)

# specify the batch size and number of epochs
batch_size = 64
epochs = 70

imageA = Input(shape=image_shape)
imageB = Input(shape=image_shape)

model_build = siamese_model(image_shape)
modelA = model_build(imageA)
modelB = model_build(imageB)

distance = Lambda(euclidean_distance)([modelA, modelB])
model = Model(inputs=[imageA, imageB], outputs=distance)
```

Now compile the model with the contrastive loss we defined earlier, and then train it. The `fit` call needs the pairs of images, the pair labels, the batch size, and the number of epochs.

```python
model.compile(loss=contrastiveLoss, optimizer="adam")

history = model.fit(
    [training_pairs[:, 0], training_pairs[:, 1]], training_labels[:],
    validation_data=([test_pairs[:, 0], test_pairs[:, 1]], test_labels[:]),
    batch_size=batch_size,
    epochs=epochs)
```

Output:

```
Epoch 1/70
1875/1875 [==============================] - 24s 7ms/step - loss: 0.1808 - val_loss: 0.1618
Epoch 2/70
1875/1875 [==============================] - 15s 8ms/step - loss: 0.1615 - val_loss: 0.1572
Epoch 3/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1588 - val_loss: 0.1551
Epoch 4/70
1875/1875 [==============================] - 14s 8ms/step - loss: 0.1566 - val_loss: 0.1529
Epoch 5/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1552 - val_loss: 0.1520
...
Epoch 67/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1486 - val_loss: 0.1447
Epoch 68/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1487 - val_loss: 0.1443
Epoch 69/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1484 - val_loss: 0.1447
Epoch 70/70
1875/1875 [==============================] - 14s 7ms/step - loss: 0.1490 - val_loss: 0.1446
```
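The `1875` in the progress bar is the number of batches per epoch, and it can be checked from the dataset size. A small sketch, assuming the standard Fashion-MNIST split of 60,000 training and 10,000 test images and the two pairs per image that `create_pairs` produces (the variable names here are illustrative only):

```python
import math

train_images = 60_000    # Fashion-MNIST training set size
test_images = 10_000     # Fashion-MNIST test set size
pairs_per_image = 2      # one positive and one negative pair per image
batch_size = 64

num_training_pairs = train_images * pairs_per_image
num_test_pairs = test_images * pairs_per_image
steps_per_epoch = math.ceil(num_training_pairs / batch_size)

print(num_training_pairs, num_test_pairs, steps_per_epoch)  # 120000 20000 1875
```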

Let’s plot some of the pairs with their distances. We will take 4 pairs of test images at random. The test pairs were already scaled and given a channel dimension during preprocessing, so the only extra step is adding a batch dimension to each image. Then we use the model to predict the distance between the images in each pair. Finally, we plot each pair together with its distance.

```python
# Pick 4 random pairs from the test set
pairs = np.random.choice(len(test_pairs), size=4)

for i in pairs:
    imageA = test_pairs[i][0]
    imageB = test_pairs[i][1]

    # Keep 2-D copies (28 x 28) of the images for plotting
    baseA = imageA.squeeze()
    baseB = imageB.squeeze()

    # test_pairs is already scaled to [0, 1] and has a channel axis,
    # so only the batch axis is added here (no second division by 255)
    imageA = np.expand_dims(imageA, axis=0)
    imageB = np.expand_dims(imageB, axis=0)

    predicts = model.predict([imageA, imageB])
    proba = predicts[0][0]

    fig = plt.figure("Pair #{}".format(i + 1), figsize=(4, 2))
    plt.suptitle("Distance: {:.2f}".format(proba))

    ax = fig.add_subplot(1, 2, 1)
    plt.imshow(baseA, cmap=plt.cm.gray)
    plt.axis("off")

    ax = fig.add_subplot(1, 2, 2)
    plt.imshow(baseB, cmap=plt.cm.gray)
    plt.axis("off")

    plt.show()
```

Look at these pictures and the corresponding distances. As you can see, the predict function does not give you the label 0 or 1 in this case. It gives you the distance between the two images in each pair. The more similar the images in a pair are, the smaller the distance.

If you want labels as well, you can set a threshold distance based on your use case: pairs with a distance below the threshold are treated as similar, and the rest as dissimilar.
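One simple way to pick such a threshold is to try a few candidate values on held-out pairs and keep the one with the best accuracy. The sketch below assumes this approach. The helper names, the distances, and the candidate thresholds are all made up for illustration and are not results from the model above.

```python
def predict_labels(distances, threshold):
    """Label a pair 1 (similar) when its distance is below the threshold."""
    return [1 if d < threshold else 0 for d in distances]

def accuracy(predicted, actual):
    """Fraction of pairs whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def best_threshold(distances, labels, candidates):
    """Return the candidate with the highest accuracy (first one on ties)."""
    return max(candidates,
               key=lambda t: accuracy(predict_labels(distances, t), labels))

# Hypothetical validation distances and their true pair labels
val_distances = [0.08, 0.21, 0.35, 0.62, 0.90, 1.10]
val_labels = [1, 1, 1, 0, 0, 0]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

threshold = best_threshold(val_distances, val_labels, candidates)
print(threshold, predict_labels(val_distances, threshold))
```

In practice the choice depends on the use case. A signature verification system, for example, might prefer a stricter (smaller) threshold to reduce false matches, at the cost of rejecting more genuine pairs.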

## Conclusion

The same kind of model and technique can be used for many different tasks, as I mentioned in the introduction. Because it can work with less data, the data collection part becomes easier. Hopefully, it will be useful for you.