Complete Implementation of a Mini VGG Network for Image Recognition

The VGG network is the basis for one of the most popular image recognition architectures, and it is worth learning because many later models build on its ideas. You need to understand how a Convolutional Neural Network (CNN) works to understand VGGNet. If you are not familiar with the CNN architecture, please feel free to go through this tutorial first.

In this article, we will focus only on the implementation of VGGNet, so we will move pretty fast here.

About the VGG Network

VGGNet is a kind of Convolutional Neural Network (CNN) that extracts features very effectively. In VGGNet, we stack multiple convolution layers. VGGNets can be shallow or deep. A shallow VGGNet usually adds only two sets of convolution layers (four convolution layers in total), as we will see soon. A deep VGGNet adds many more convolution layers. Two commonly used deep VGGNets are VGG16, which uses a total of 16 weight layers, and VGG19, which uses a total of 19. We can add batch normalization layers or leave them out; I will use them in this tutorial.

You can read more about the architecture in this link.

We are going to work on a mini VGGNet today. So it will be much simpler and easier to run but still powerful for a lot of use cases.

One important characteristic of the mini VGGNet is that it uses 3×3 filters throughout. Stacking small 3×3 filters gives the same receptive field as a single larger filter but with fewer parameters and more nonlinearity, which helps the network generalize well. Let's just get started and build a mini VGGNet in Keras and TensorFlow.

I used a Google Colaboratory notebook with GPU enabled for this; otherwise, training is very slow.

Mini VGG Network Development, Training, and Evaluation

Time to start working. We will also experiment with the architecture a little to see how changes affect the results.

These are the necessary imports:
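The original import block is not preserved here, so the following is a minimal sketch that covers everything used below; the exact module paths assume TensorFlow 2.x with its bundled Keras:

from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation
from tensorflow.keras.layers import BatchNormalization, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
from tensorflow.keras import backend as K
import matplotlib.pyplot as plt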

That’s a lot of imports!

We will use the CIFAR-10 dataset, a public dataset that is available directly in the TensorFlow library.

I used two different networks just as an experiment. The first one is the popular version; I am calling it popular because I found this architecture on Kaggle and in some other tutorials.

def build(width, height, depth, classes):
    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1
    # first CONV => Activation => CONV => Activation => POOL layer set
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(32, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # second CONV => Activation => CONV => Activation => POOL layer set
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Dense layer
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    # softmax classifier
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    # return the constructed network architecture
    return model

Let's load and prepare our CIFAR-10 dataset.
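The loading code is not shown in the original, so here is a typical version using the cifar10 module imported above; scaling pixel intensities to [0, 1] is a standard preprocessing choice:

# load the training and testing splits of CIFAR-10
((trainX, trainY), (testX, testY)) = cifar10.load_data()
# scale pixel intensities from [0, 255] to [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0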

The CIFAR-10 dataset has 10 labels: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Using the LabelBinarizer to binarize the labels:
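This is a sketch of the usual pattern: fit the binarizer on the training labels and then apply the same mapping to the test labels, turning each integer label into a 10-element one-hot vector:

lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)  # fit on training labels and one-hot encode them
testY = lb.transform(testY)        # reuse the same mapping for the test labels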

Now we compile the model. The evaluation metric is "accuracy", and we will train for 10 epochs.
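The original compile-and-train code is not shown; the sketch below assumes SGD with a 0.01 learning rate and a batch size of 64, which are common choices for this network rather than values stated in the article:

# build the first network for 32x32 RGB inputs and 10 classes
model = build(width=32, height=32, depth=3, classes=10)
# assumed optimizer and learning rate
opt = SGD(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
# train for 10 epochs; the batch size is an assumption
H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=64, epochs=10)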

Here is the result:

After 10 epochs, the accuracy is 79.46% on the training data and 78.98% on the validation data.

Keeping this in mind, I wanted to change a few things in this network and see the results. Let's redefine the network above. I used 64 filters throughout, 300 neurons in the dense layer, and 40% dropout in the last dropout layer.

Here is the new mini VGG network again:

def build(width, height, depth, classes):
    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1
    # first Conv => Activation => Conv => Activation => Pool layer set
    model.add(Conv2D(64, (3, 3), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # second Conv => Activation => Conv => Activation => Pool layer set
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Dense layer
    model.add(Flatten())
    model.add(Dense(300))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.4))
    # softmax classifier
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    return model

We will use the same optimization parameters and the same way of running the model, but this time for 20 epochs.
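Only the epoch count changes; the optimizer and batch size below are the same assumed values as before:

# build the redefined network
model = build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=SGD(learning_rate=0.01), metrics=["accuracy"])
# same assumed settings as before, but 20 epochs this time
H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=64, epochs=20)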

Here are the results:

If you notice, after 10 epochs the accuracy was already slightly higher than with the previous network, and after 20 epochs the accuracy is really good: 88.45% on the training data and 81.99% on the validation data.

Let's present the training and validation accuracies and the training and validation losses in the same plot:
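The plotting code is not preserved, so here is a minimal version using the History object returned by model.fit; the history keys "accuracy" and "val_accuracy" assume TensorFlow 2.x:

plt.figure(figsize=(8, 6))
# losses on the training and validation sets
plt.plot(H.history["loss"], label="train_loss")
plt.plot(H.history["val_loss"], label="val_loss")
# accuracies on the training and validation sets
plt.plot(H.history["accuracy"], label="train_acc")
plt.plot(H.history["val_accuracy"], label="val_acc")
plt.title("Training/Validation Loss and Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Loss / Accuracy")
plt.legend()
plt.show()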

Training loss went down very smoothly, and validation loss went down as well with some bumps.

Conclusion

Please feel free to experiment with this network. Try different parameters for your own project and see how it works for you. We will work on a deeper network later.
