Image Classification With Convolutional Neural Network: Step By Step

A convolutional neural network (CNN) is an advanced type of neural network that works especially well on images. It condenses a picture down to a few important features. If you have worked with the FashionMNIST dataset, which contains shirts, shoes, handbags, etc., a CNN will figure out which portions of the images determine what makes a shirt a shirt or a handbag a handbag. For example, if you see a shoelace, it might be a shoe; if there is a collar and buttons, it might be a shirt; and if there is a handle, it might be a handbag.

The simple CNN we will build today to classify a set of images will consist of convolutions and pooling. Inputs get modified in the convolution layers. You can use one or more convolution layers depending on your requirements. Inputs go through several filters, and those filters slide across the inputs to learn portions of an input such as the buttons of a shirt, the handle of a handbag, or the lace of a shoe. I am not going any deeper into it today. In a future article I will write about it in detail.
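To make the idea of a filter sliding across an input a little more concrete, here is a minimal sketch in plain Python. The function name convolve2d, the toy 4 x 4 image, and the hand-picked vertical-edge filter are all made up for illustration. In a real CNN the filter values are learned during training, and a library such as Keras does this work far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k = len(kernel)                      # the kernel is k x k
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the k x k patch by the kernel element-wise and sum the result.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# A toy 4 x 4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A hand-picked 3 x 3 filter that responds to vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[27, 27], [27, 27]]
```

The large positive values show that the filter "fires" wherever the patch is darker on the left than on the right. Note that the 4 x 4 input shrinks to 2 x 2, because a 3 x 3 filter only fits in two positions along each side.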

Pooling is another very important part of a CNN. Like convolution, pooling works on each local region, but it has no filters to learn: it is a vector-to-scalar transformation that reduces each region to a single value. Average pooling computes the average of the region, while max pooling keeps the pixel with the highest intensity and discards the rest. A 2 x 2 pooling will reduce the height and width of the feature maps by a factor of 2.
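As a rough illustration of 2 x 2 max pooling, the sketch below uses plain Python lists. The function name max_pool2d and the numbers in the feature map are invented for this example; they are not part of any library.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size region."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values of one size x size region.
            region = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(region))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4 x 4 input becomes 2 x 2: each side is halved, and only the strongest value of each region survives.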

We are not going into the details of the mathematical part today. I decided to keep this a problem-solving session. Even if you don’t know the mathematics, you can still solve a deep learning problem. I will explain each and every line of code. Nowadays we have such rich libraries that we can do all this amazing work without knowing much math or coding. Let’s dive in.

I used a Google Colab notebook. If you don’t have Anaconda and Jupyter Notebook installed, you can still follow along, because Google Colab is available to everyone. There are lots of YouTube videos that show how to use Google Colab; please feel free to check those out if Colab is new to you. We will use a dataset that contains images of cats and dogs. Our goal is to develop a convolutional neural network that will successfully classify cats and dogs from a picture. We are using the dataset from Kaggle.

First import all the required packages and libraries.

import os

import zipfile

import random

import tensorflow as tf

from tensorflow.keras.optimizers import RMSprop

from tensorflow.keras.preprocessing.image import ImageDataGenerator

import shutil

It’s time to get our dataset. We will use wget to download the full Cats-v-Dogs dataset and save it in the /tmp directory.

!wget --no-check-certificate \
    "" \
    -O "/tmp/"

Now extract the data from the zip file. This will generate a /tmp/PetImages directory with two subdirectories called Cat and Dog. That’s how the data is originally structured.

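local_zip = '/tmp/'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp')  # extract everything under /tmp, which creates /tmp/PetImages
zip_ref.close()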



Let’s check the Cat and Dog folders.
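One simple way to check a folder is to count the files in it. The sketch below assumes the /tmp/PetImages layout described above; count_files is a hypothetical helper name, and it returns 0 when a folder is missing instead of raising an error.

```python
import os


def count_files(directory):
    """Return the number of entries in `directory`, or 0 if it does not exist."""
    if not os.path.isdir(directory):
        return 0
    return len(os.listdir(directory))


# Paths follow the /tmp/PetImages layout described above.
print("Cat images:", count_files("/tmp/PetImages/Cat/"))
print("Dog images:", count_files("/tmp/PetImages/Dog/"))
```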



Now that the data is available, we need to create a directory named cats-v-dogs with the subdirectories training and testing, each of which holds a cats and a dogs subdirectory.





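try:
    os.makedirs('/tmp/cats-v-dogs/training/cats')
    os.makedirs('/tmp/cats-v-dogs/training/dogs')
    os.makedirs('/tmp/cats-v-dogs/testing/cats')
    os.makedirs('/tmp/cats-v-dogs/testing/dogs')
except OSError:
    pass  # the directories already exist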


Now split the data for training and testing and put it in the correct directories with a function called split_data. split_data takes a SOURCE directory containing the files, a TRAINING directory where a slice of the data will be copied, a TESTING directory where the remaining data will be copied, and a SPLIT_SIZE that determines where to slice the data.


def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    cont = os.listdir(SOURCE)  # list the contents of the source directory
    lenList = len(cont)  # number of files in the list
    shuffleList = random.sample(cont, lenList)  # shuffle the list so the split is random
    slicePoint = round(len(shuffleList) * SPLIT_SIZE)  # index where the training data stops

    for name in shuffleList[:slicePoint]:
        # os.path.getsize gives the size of the file in bytes, so this skips empty (zero-length) files.
        if os.path.getsize(os.path.join(SOURCE, name)) != 0:
            shutil.copy(os.path.join(SOURCE, name), TRAINING)  # copy the file to the TRAINING directory

    # Check the remaining files for length in the same way and put them in the TESTING directory.
    for name in shuffleList[slicePoint:]:
        if os.path.getsize(os.path.join(SOURCE, name)) != 0:
            shutil.copy(os.path.join(SOURCE, name), TESTING)

The function is ready. Use split_data to split the files in each source directory and copy them over to the training and testing directories.

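CAT_SOURCE_DIR = "/tmp/PetImages/Cat/"
TRAINING_CATS_DIR = "/tmp/cats-v-dogs/training/cats/"
TESTING_CATS_DIR = "/tmp/cats-v-dogs/testing/cats/"
DOG_SOURCE_DIR = "/tmp/PetImages/Dog/"
TRAINING_DOGS_DIR = "/tmp/cats-v-dogs/training/dogs/"
TESTING_DOGS_DIR = "/tmp/cats-v-dogs/testing/dogs/"

split_size = 0.9  # 90% of the files go to training, the rest to testing
split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)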



Check the number of files in the training and testing directories.
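A possible way to do this check in one go is to count the files per class folder. The sketch below assumes the /tmp/cats-v-dogs layout used in this article; count_by_class is a hypothetical helper name, not a library function.

```python
import os


def count_by_class(split_dir):
    """Map each class subdirectory of `split_dir` to its number of files."""
    counts = {}
    if not os.path.isdir(split_dir):
        return counts  # nothing to count if the directory is missing
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


for split in ("training", "testing"):
    print(split, count_by_class(os.path.join("/tmp/cats-v-dogs", split)))
```

With a split size of 0.9 you would expect roughly nine times as many files in training as in testing for each class, minus any empty files that were skipped.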





Data preprocessing is done. Here comes the fun part. We will develop a Keras model to classify the cats and dogs. In this model we will use three convolutional layers and a pooling layer. You can try it with fewer or more convolution layers. We will use ReLU activation and an input_shape of 150 x 150 with 3 color channels. Every image will be resized to this same square shape, because images in the real world come in different sizes and shapes. In the first layer the filter size is 3 x 3 and the number of filters is 16. Max pooling with a 2 x 2 window will shrink the height and width by a factor of 2. We have two more convolutional layers with different numbers of filters.
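If you want to see how these numbers affect the size of the data, the arithmetic can be sketched in a few lines. The helper names below are made up for illustration, and the sketch assumes the Keras defaults of no padding and a stride of 1 for the convolution.

```python
def conv_output_size(size, kernel=3):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Side length after non-overlapping pooling (any leftover edge is dropped)."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Number of weights plus one bias per filter in a convolution layer."""
    return kernel * kernel * in_channels * filters + filters


side = 150
after_conv = conv_output_size(side)        # 150 - 3 + 1 = 148
after_pool = pool_output_size(after_conv)  # 148 // 2 = 74
first_layer_params = conv_params(3, 3, 16)  # 3*3*3*16 + 16 = 448

print(after_conv, after_pool, first_layer_params)  # 148 74 448
```

You can compare numbers like these with the output of model.summary() once the model is defined.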

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 'cats' and 1 for 'dogs'
    tf.keras.layers.Dense(1, activation='sigmoid')
])


model.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['acc'])

In model.compile we should pass at least the optimizer and loss parameters. Here learning_rate is the learning rate (older versions of Keras called this argument lr). It is important to choose a reasonable learning rate, but I am not going into detail about that here. The next step is to normalize the data.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

base_dir = '/tmp/cats-v-dogs'
TRAINING_DIR = os.path.join(base_dir, 'training')
train_datagen = ImageDataGenerator(rescale=1.0/255)
train_generator = train_datagen.flow_from_directory(TRAINING_DIR,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(150, 150))

ImageDataGenerator helps normalize the pixel values so that they lie between 0 and 1. Originally the values range from 0 to 255, as you may already know. Then we pass our data in batches for training; here we are using a batch_size of 20. We should normalize the testing data in the same way:

VALIDATION_DIR = os.path.join(base_dir, 'testing')
validation_datagen = ImageDataGenerator(rescale=1.0/255)
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR,
                                                              batch_size=20,
                                                              class_mode='binary',
                                                              target_size=(150, 150))
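The rescale step used by both generators above is nothing more than multiplying every pixel value by 1/255. A tiny plain-Python sketch of that idea, with a made-up function name and made-up pixel values, could look like this:

```python
def rescale(pixels, factor=1.0 / 255):
    """Multiply every pixel value by `factor`, as rescale=1.0/255 does."""
    return [[value * factor for value in row] for row in pixels]


# A made-up 2 x 3 grayscale patch with values in the usual 0-255 range.
raw = [
    [0, 128, 255],
    [64, 32, 16],
]

scaled = rescale(raw)
for row in scaled:
    print([round(value, 3) for value in row])
```

After rescaling, 0 stays 0, 255 becomes 1 (up to floating-point rounding), and everything else falls in between.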

Now train the model. Let’s train it for 15 epochs. You should keep track of four values: loss, accuracy, validation loss, and validation accuracy. In general, the loss should go down and the accuracy should go up as the epochs progress.

# fit_generator is deprecated in recent TensorFlow 2 releases; model.fit accepts generators directly.
history = model.fit(train_generator,
                    epochs=15,
                    verbose=1,
                    validation_data=validation_generator)

I got 89.51% accuracy on the training set and 91.76% accuracy on the validation data. I have to mention one thing here: if accuracy on the training set is really high and accuracy on the test or validation set is not that good, that is an overfitting problem. It means the model learned the training dataset so well that it only knows that training data and is not good on other, unseen data. That’s not our goal. Our goal is to develop a model that works well on most data out there. When you see overfitting, you need to modify the training parameters, for example by using fewer epochs or a different learning rate. We will talk about how to deal with overfitting in a later article.
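A quick way to look for this kind of gap is to compare the last training accuracy with the last validation accuracy. The sketch below is only an illustration: accuracy_gap is a hypothetical helper, the 0.05 threshold is an arbitrary choice rather than a rule, and the dictionary stands in for history.history, whose keys I am assuming to be 'acc' and 'val_acc' because the model was compiled with metrics=['acc'].

```python
def accuracy_gap(history, train_key="acc", val_key="val_acc"):
    """Final-epoch training accuracy minus final-epoch validation accuracy."""
    return history[train_key][-1] - history[val_key][-1]


# Final-epoch values reported above; in practice pass history.history from model.fit.
reported = {"acc": [0.8951], "val_acc": [0.9176]}

gap = accuracy_gap(reported)
print(f"gap = {gap:+.4f}")

if gap > 0.05:  # 0.05 is an arbitrary illustrative threshold, not a rule
    print("Training accuracy is well above validation accuracy: possible overfitting.")
else:
    print("No large gap between training and validation accuracy.")
```

For the numbers reported here the gap is slightly negative, so this run does not show the overfitting pattern described above.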
