Developing Your First Neural Network in PyTorch


I have been working on and writing tutorials about deep learning for a while now, mostly focused on TensorFlow. But PyTorch is another very widely used deep learning package, and I think it is a good idea to be comfortable with both. So, I decided to write tutorials on PyTorch as well.

In that context, this tutorial introduces neural networks in PyTorch for beginners. We will work through a project step by step.

The Heart.csv dataset from Kaggle will be used for this tutorial. Please feel free to download the dataset and follow along:

Heart Attack Analysis & Prediction Dataset (kaggle.com)

This is a public dataset with CC0: Public Domain License.

Let’s dive in!

I would like to start by importing the necessary packages and loading the dataset:

import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

# Path to the CSV downloaded from Kaggle; adjust it to wherever you saved the file
df = pd.read_csv('Heart.csv')

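A few columns have the data type 'object' (text). Before any modeling, these columns need to be converted to numbers. Converting each one to the pandas 'category' type and taking its category codes replaces every distinct text value with an integer:

# Replace every text (object) column with integer category codes
for i in df.columns:
    if df[i].dtype == 'object':
        df[i] = df[i].astype('category').cat.codes
df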

Displaying df now shows all the columns in numeric form.
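To see what this encoding does, here is a small sketch on a hypothetical three-row toy DataFrame (the column values are made up for illustration, not taken from the dataset). It also uses select_dtypes to list the object columns before converting them. Note that the codes follow the sorted order of the categories, so 'F' becomes 0 and 'M' becomes 1.

```python
import pandas as pd

# Hypothetical toy data: one text column (explicitly object dtype) and one numeric column
toy = pd.DataFrame({
    "Sex": pd.Series(["M", "F", "M"], dtype="object"),
    "Age": [40, 49, 37],
})

# List the columns that still hold text
object_cols = list(toy.select_dtypes(include="object").columns)
print(object_cols)  # ['Sex']

# Same loop as in the tutorial: replace each text column with its integer category codes
for col in toy.columns:
    if toy[col].dtype == "object":
        toy[col] = toy[col].astype("category").cat.codes

print(toy["Sex"].tolist())  # [1, 0, 1]
```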

The last column, 'HeartDisease', has two unique values: 0 and 1. We will use it as the target variable, so the goal of this exercise is to predict HeartDisease from the other columns in the table.

Defining the features and the target variable, and splitting the data into training (70%) and test (30%) sets:

X = df.drop(columns=['HeartDisease'])
y = df['HeartDisease']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
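As a quick check of what test_size=0.3 and random_state do, here is a sketch on a hypothetical 10-row toy table that uses the same target column name. With 10 rows, 30% (3 rows) go to the test set and the remaining 7 to the training set, and fixing random_state makes the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy table: two feature columns plus a binary HeartDisease target
toy = pd.DataFrame({"a": range(10), "b": range(10, 20), "HeartDisease": [0, 1] * 5})
X = toy.drop(columns=["HeartDisease"])
y = toy["HeartDisease"]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)
print(x_train.shape, x_test.shape)  # (7, 2) (3, 2)

# The same random_state puts exactly the same rows in the test set
_, x_test2, _, _ = train_test_split(X, y, test_size=0.3, random_state=21)
print(x_test.index.equals(x_test2.index))  # True
```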

To feed the data to a PyTorch model, it needs to be in the form of torch tensors. But x_train and x_test are DataFrames (and y_train and y_test are Series), while torch.from_numpy() only accepts NumPy arrays. So, we convert the data to NumPy arrays first and then to float32 tensors:

# DataFrame/Series -> NumPy array -> float32 tensor
trainX = torch.from_numpy(x_train.to_numpy()).float()
testX = torch.from_numpy(x_test.to_numpy()).float()
trainY = torch.from_numpy(y_train.to_numpy()).float()
testY = torch.from_numpy(y_test.to_numpy()).float()

The data is ready.
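Why the .float() call? The NumPy array built from a DataFrame with mixed integer and float columns is float64, while the weights of nn.Linear are float32 by default, and the two dtypes have to match. A small sketch with a hypothetical two-row toy DataFrame and a hypothetical layer with 2 inputs:

```python
import pandas as pd
import torch
import torch.nn as nn

# Hypothetical toy features with one integer and one float column
toy = pd.DataFrame({"Age": [40, 49], "Oldpeak": [0.0, 1.0]})
arr = toy.to_numpy()                 # mixed int/float columns -> one float64 array
t64 = torch.from_numpy(arr)          # keeps float64
t = t64.float()                      # float64 -> float32, matching nn.Linear's default dtype

layer = nn.Linear(2, 3)              # hypothetical layer with 2 input features
out = layer(t)
print(arr.dtype, t64.dtype, t.dtype, out.shape)
```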

Model Development

Since this tutorial is for beginners, we will build a simple neural network.

A neural network is a sequence of layers. We will build a simple feed-forward model with two hidden layers by subclassing nn.Module. Let's see what the model looks like, and then I will explain it.

class HeartDisease(nn.Module):
    def __init__(self):
        super().__init__()
        # 11 input features -> 128 neurons -> 64 neurons -> 1 output
        self.hidden1 = nn.Linear(11, 128)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(128, 64)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(64, 1)
        self.act_output = nn.Sigmoid()  # squashes the output into a probability between 0 and 1

    def forward(self, x):
        # Pass the input through each layer and its activation in turn
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x

This is a very simple neural network with two hidden layers. The first hidden layer takes 11 inputs and produces 128 outputs. Here, 11 is the number of feature columns used for training, and 128 is the number of neurons in the first hidden layer. I chose 128, but you can try other numbers as well. The number of neurons is a hyperparameter, and it is mostly chosen by trial and error.

The output of hidden1 is the input of hidden2, so hidden2 takes 128 inputs, and I chose 64 outputs for it. Finally, the output layer takes 64 inputs and has 1 output, because this is a binary classification problem. With 10 classes, the output layer would have 10 outputs instead.
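For the 10-class case, a common setup (an alternative, not used in this tutorial) is a final nn.Linear with 10 outputs and no sigmoid, trained with nn.CrossEntropyLoss, which applies log-softmax internally and expects integer class labels. A minimal sketch with hypothetical random inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
head = nn.Linear(64, 10)                 # hypothetical 10-class output layer
features = torch.randn(8, 64)            # hypothetical batch: 8 rows of 64 hidden activations
targets = torch.randint(0, 10, (8,))     # integer class labels in [0, 9], dtype int64

logits = head(features)                  # shape (8, 10): one raw score per class
loss = nn.CrossEntropyLoss()(logits, targets)
pred = logits.argmax(dim=1)              # predicted class = index of the highest score
print(logits.shape, pred.shape, loss.item())
```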

In a binary classification problem, the output is passed through an activation function, here a sigmoid, that turns it into a probability between 0 and 1. Rounding that probability gives the predicted class: values above 0.5 become 1, and values below 0.5 become 0.
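A small sketch of the sigmoid-then-round step on a few hypothetical raw outputs of the last linear layer. One detail: torch.round() rounds halves to the nearest even number, so a probability of exactly 0.5 becomes 0.

```python
import torch

raw = torch.tensor([-2.0, 0.0, 0.3, 3.0])  # hypothetical raw outputs of the last linear layer
probs = torch.sigmoid(raw)                   # 1 / (1 + exp(-x)), always between 0 and 1
preds = probs.round()                        # predicted class: 0 or 1
print(probs)  # tensor([0.1192, 0.5000, 0.5744, 0.9526])
print(preds)  # tensor([0., 0., 1., 1.])
```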

The number of hidden layers is also decided by trial and error.

Finally, the forward method defines how the input flows through the layers created in __init__ and returns the output, which is our predicted probability.
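Since this network is just a stack of layers, the same architecture can also be written with nn.Sequential. The sketch below is an alternative formulation, not the tutorial's class; it runs a hypothetical batch of 5 random rows through the model and counts the trainable parameters. Each nn.Linear(in, out) has in × out weights plus out biases, so the total is (11 × 128 + 128) + (128 × 64 + 64) + (64 × 1 + 1) = 9,857.

```python
import torch
import torch.nn as nn

# Same layer sizes as HeartDisease, written as an nn.Sequential stack
seq_model = nn.Sequential(
    nn.Linear(11, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

dummy = torch.randn(5, 11)        # hypothetical batch: 5 rows, 11 features
out = seq_model(dummy)
print(out.shape)                  # torch.Size([5, 1])

n_params = sum(p.numel() for p in seq_model.parameters() if p.requires_grad)
print(n_params)                   # 9857
```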

Printing the model:

model = HeartDisease()
print(model)

Output:

HeartDisease(
  (hidden1): Linear(in_features=11, out_features=128, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=128, out_features=64, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=64, out_features=1, bias=True)
  (act_output): Sigmoid()
)

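Here are the loss function and the optimizer. Binary cross-entropy (nn.BCELoss) is the usual loss for a single sigmoid output, and Adam updates the model parameters with a learning rate of 0.001: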

import torch.optim as optim
loss_fn = nn.BCELoss() #binary cross entropy 
optimizer = optim.Adam(model.parameters(), lr=0.001)
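For a predicted probability p and a true label y, binary cross-entropy is -(y·log p + (1 - y)·log(1 - p)), averaged over the batch by default. A quick check with two hypothetical predictions that nn.BCELoss matches this formula:

```python
import torch
import torch.nn as nn

p = torch.tensor([0.9, 0.2])   # hypothetical predicted probabilities
y = torch.tensor([1.0, 0.0])   # true labels

manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
builtin = nn.BCELoss()(p, y)
print(manual.item(), builtin.item())  # both about 0.1643
```

PyTorch also offers nn.BCEWithLogitsLoss, which combines the sigmoid and the loss in one numerically more stable step; using it would mean removing the final nn.Sigmoid() from the model.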

Here comes the model training part. I trained the model for 150 epochs with a batch size of 64. For each mini-batch, the model we defined calculates the predicted labels, and we calculate the loss from the predicted and true labels. The loss is then backpropagated, and the optimizer updates the weights.

The tricky part is that the gradients must be reset to zero before each backward pass. PyTorch accumulates gradients by default, so without optimizer.zero_grad() the gradients from the previous batch would be added to those of the current batch, and the model would not train correctly.
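A tiny standalone illustration of this accumulation, using a single hypothetical scalar parameter w and the function 3·w, whose gradient with respect to w is 3:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # hypothetical single parameter

(3 * w).backward()
first = w.grad.item()      # 3.0

(3 * w).backward()         # no reset: the new gradient is added to the old one
second = w.grad.item()     # 6.0

w.grad.zero_()             # reset, as optimizer.zero_grad() does for every parameter
(3 * w).backward()
third = w.grad.item()      # 3.0 again

print(first, second, third)  # 3.0 6.0 3.0
```

That is why the training loop calls optimizer.zero_grad() once per mini-batch, before loss.backward().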

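epochs = 150
batch_size = 64

for epoch in range(epochs):
    # Walk through the training data in consecutive mini-batches of 64 rows
    for i in range(0, len(trainX), batch_size):
        Xbatch = trainX[i:i+batch_size]
        ybatch = trainY[i:i+batch_size]
        y_pred = model(Xbatch)
        # The model returns shape (batch, 1); flatten it to match ybatch's shape (batch,)
        loss = loss_fn(torch.flatten(y_pred), ybatch)
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss.backward()        # compute new gradients
        optimizer.step()       # update the weights
    print(f'Finished epoch {epoch}, latest loss {loss.item()}')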

Output:

Finished epoch 0, latest loss 0.46334463357925415
Finished epoch 1, latest loss 0.5276321172714233
Finished epoch 2, latest loss 0.5331380367279053
Finished epoch 3, latest loss 0.5323242545127869
...
...
...
Finished epoch 147, latest loss 0.16034317016601562
Finished epoch 148, latest loss 0.14931809902191162
Finished epoch 149, latest loss 0.15581083297729492

I showed only a few of the printed losses, enough to see how the loss gradually went down. Now, it's time to check the performance of the model.
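Two details of the training loop are worth knowing. The printed value is the loss of the last mini-batch only, so it can move up and down from one epoch to the next, and the mini-batches are visited in the same order every epoch. A common variant, sketched below on hypothetical synthetic data rather than the heart dataset, uses TensorDataset and DataLoader with shuffle=True to reshuffle the rows each epoch, and records the average loss over the whole epoch instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Hypothetical synthetic data standing in for trainX / trainY
X = torch.randn(200, 11)
y = (X[:, 0] > 0).float()  # label depends only on the first feature

model = nn.Sequential(nn.Linear(11, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# shuffle=True draws a new random order of rows at the start of every epoch
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

losses = []
for epoch in range(20):
    total = 0.0
    for xb, yb in loader:
        loss = loss_fn(model(xb).flatten(), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)   # weight by batch size (the last batch is smaller)
    losses.append(total / len(X))        # mean loss over all rows in this epoch

print(f"first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```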

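Model's prediction accuracy on the test data:

with torch.no_grad():
    y_pred = model(testX)
# Flatten (n, 1) -> (n,) so predictions and labels are compared element-wise,
# then average the matches to get the fraction of correct predictions
accuracy = (y_pred.round().flatten() == testY).float().mean().item()
accuracy

The prediction accuracy on the training data:

with torch.no_grad():
    y_pred = model(trainX)
accuracy = (y_pred.round().flatten() == trainY).float().mean().item()
accuracy

Comparing the two numbers tells you how well the model generalizes: a training accuracy much higher than the test accuracy is a sign of overfitting.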
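Two details in the accuracy calculation are easy to get wrong, shown here with hypothetical predictions for four samples. First, the model output has shape (n, 1) while the labels have shape (n,), so comparing them without flattening broadcasts to an (n, n) matrix. Second, len() of the comparison counts elements rather than matches, so dividing it by len(labels) always gives 1.0 regardless of the predictions; the accuracy is the mean of the element-wise matches.

```python
import torch

probs = torch.tensor([[0.9], [0.2], [0.7], [0.4]])  # hypothetical model output, shape (4, 1)
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])          # true labels, shape (4,)

broadcast = probs.round() == labels               # shape (4, 4): every prediction vs every label
matches = probs.round().flatten() == labels       # shape (4,): prediction i vs label i

misleading = len(matches) / len(labels)           # always 1.0: len() counts elements, not matches
accuracy = matches.float().mean().item()          # 3 of 4 correct

print(broadcast.shape, matches.shape)  # torch.Size([4, 4]) torch.Size([4])
print(misleading, accuracy)            # 1.0 0.75
```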

Conclusion

If you are a TensorFlow user, model training in PyTorch may feel like too manual a process. But in industry and in research, many people like this manual training loop because it gives a lot of control. I think we should at least learn the process so that we can use it when necessary.
