TensorFlow is arguably the most popular package in the deep learning and neural network domain. I wrote a few tutorials before on regular dense neural networks, CNN structure, and RNNs, but all of them were about classification problems, sometimes image classification and sometimes natural language classification.
In this article, I would like to work on a regression problem and demonstrate some models built with both the Sequential and Functional APIs.
I already did all the data cleaning, and because this article is focused entirely on TensorFlow for regression problems, I did not want to show the data cleaning operations here.
The cleaned dataset, ready to use in the models, can be downloaded from this link.
The original dataset was taken from the UCI Machine Learning Repository.
So, here I am importing the dataset:
import pandas as pd
df = pd.read_csv("auto_price.csv")
I already took care of some of the null values during cleaning. But still, if any null values remain, I will simply drop them:
df = df.dropna()
It is important to get rid of all the null values before diving into the TensorFlow models because null values in the dataset will give you errors.
In this dataset, the ‘price’ column will be used as the output variable or label. The rest of the features will be used as the training features.
X = df.drop(columns="price")
y = df['price']
Next comes another common task: splitting the dataset into training, validation, and test sets.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)
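StandardScaler is imported above but not used anywhere else in this article; all the results below come from the unscaled features. If you want to experiment with feature scaling (which often helps gradient-based models), a minimal sketch could look like this; the *_scaled names are just illustrative:
# Optional: standardize the features (not used for the results in this article)
# Fit the scaler on the training set only, then apply it to validation and test sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)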
Unlike classification, regression problems cannot be evaluated by accuracy. Here I will use Root Mean Squared Error (RMSE) as the evaluation metric.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense

# custom RMSE, used both as the loss and as the evaluation metric
def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
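As a side note, tf.keras also ships a built-in RMSE metric, so you do not strictly need the custom function for the metrics argument; I keep the custom ‘rmse’ here because I also use it as the loss:
# Alternative: Keras's built-in RMSE metric
# (can be passed as metrics=[rmse_metric] in model.compile)
rmse_metric = tf.keras.metrics.RootMeanSquaredError()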
Model Development
In TensorFlow, model development is the fun part. For most real-world problems, you will try several TensorFlow models for one project. I tried eight models with this project before I started writing. I will share some of them here.
I will start with a regular DNN model using the Sequential API and then show some models using the Functional API, so you can see the difference.
I tried four DNN models and this one gave me the best results:
model4 = Sequential([
    tf.keras.layers.Input(shape=X_train.shape[1:]),
    Dense(300, activation='tanh'),
    Dense(300, activation='tanh'),
    Dense(300, activation='tanh'),
    Dense(300, activation='tanh'),
    Dense(300, activation='tanh'),
    Dense(300, activation='tanh'),
    Dense(1)
])
As you can see, we have six hidden Dense layers, each with 300 neurons and the ‘tanh’ activation function. Please feel free to try other activation functions, a different number of layers, and a different number of neurons. I will keep the activation functions consistent throughout this article so that the models can be compared.
Because this is a regression problem, there is only one output for a given input, so the final output layer has a single neuron. The input shape is the number of features.
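If you want to confirm what the model expects, a quick sanity check is to print the number of feature columns and the model summary; something like this, assuming the imports above:
# Sanity check: the input shape should match the number of feature columns
print(X_train.shape[1:])
# The summary shows the six hidden layers and the single output neuron
model4.summary()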
Let’s compile this model using ‘rmse’ as the loss, the Adam optimizer, and ‘rmse’ again as the evaluation metric.
model4.compile(
loss=rmse,
optimizer=Adam(),
metrics=[rmse]
)
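Training for 20000 epochs takes a while. No callbacks were used for the results in this article, but if you want training to stop once the validation loss stops improving, a minimal sketch with Keras’s built-in EarlyStopping callback could look like this (the patience value is just an example):
# Optional: stop training when val_loss stops improving
# (not used for the results shown below)
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              patience=500,
                                              restore_best_weights=True)
# pass callbacks=[early_stop] to model4.fit(...) to enable it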
Finally, let’s train the model for 20000 epochs:
history4 = model4.fit(X_train, y_train, epochs=20000, validation_data=(X_valid, y_valid))
Output:
Epoch 1/20000
4/4 [==============================] - 0s 40ms/step - loss: 14714.9736 - rmse: 15355.5986 - val_loss: 17081.0312 - val_rmse: 15776.2559
Epoch 2/20000
4/4 [==============================] - 0s 9ms/step - loss: 14792.2432 - rmse: 15059.4141 - val_loss: 17076.2695 - val_rmse: 15771.4033
Epoch 3/20000
4/4 [==============================] - 0s 9ms/step - loss: 14842.5195 - rmse: 15015.9941 - val_loss: 17074.5098 - val_rmse: 15769.6104
...
...
Epoch 19997/20000
4/4 [==============================] - 0s 8ms/step - loss: 7850.9614 - rmse: 8218.5664 - val_loss: 9565.8027 - val_rmse: 8386.0020
Epoch 19998/20000
4/4 [==============================] - 0s 7ms/step - loss: 7826.3198 - rmse: 7867.3975 - val_loss: 9565.8008 - val_rmse: 8386.0020
Epoch 19999/20000
4/4 [==============================] - 0s 7ms/step - loss: 7857.6772 - rmse: 7917.7451 - val_loss: 9565.8135 - val_rmse: 8386.0088
Epoch 20000/20000
4/4 [==============================] - 0s 8ms/step - loss: 7846.6616 - rmse: 8078.2676 - val_loss: 9565.8145 - val_rmse: 8386.0088
As the output shows above, the training and validation losses started at 14714 and 17081 respectively. After 20000 epochs, they came down to 7846 and 9565.
Both the training and validation data were used during training, so it is good to see how the model performs on totally unseen data. I will use the test data to evaluate the TensorFlow model’s performance:
model4.evaluate(X_test, y_test)
Output:
2/2 [==============================] - 0s 2ms/step - loss: 7129.9590 - rmse: 7222.2402
[7129.958984375, 7222.240234375]
On the test data, the loss is 7129 and the ‘rmse’ is 7222, both noticeably lower than the validation loss and ‘rmse’. Not bad!
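If you want to see actual predicted prices rather than only the aggregate ‘rmse’, you can call predict on the test set; here is a quick sketch (printing five samples is an arbitrary choice):
# Predicted vs. actual prices for the first few test samples
y_pred = model4.predict(X_test)
for pred, actual in zip(y_pred[:5].ravel(), y_test.iloc[:5]):
    print(f"predicted: {pred:10.1f}   actual: {actual:10.1f}")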
Let’s try to improve these results. TensorFlow offers the Functional API to develop more complex models.
TensorFlow Model Development with the Functional API
One way to use the Functional API is to build wide and deep neural networks. In this approach, part or all of the input goes through all the transformations in the hidden layers to reach the output layer (the deep path). In addition, the input layer can also be connected directly to the output layer (the wide path).
Here is a model using the Functional API. I will explain it after you see the code:
input = tf.keras.layers.Input(shape = X_train.shape[1:])
hidden1 = tf.keras.layers.Dense(300, activation='relu')(input)
hidden2 = tf.keras.layers.Dense(300, activation='relu')(hidden1)
concat = keras.layers.Concatenate()([input, hidden2])
output = keras.layers.Dense(1)(concat)
model51 = keras.models.Model(inputs=[input], outputs=[output])
The difference is very obvious. Each layer is given a name. After the input layer, a hidden layer is created as usual and named hidden1. As soon as hidden1 is created, it acts as a function, and the input layer is passed to it. In the same way, hidden2 is created and also acts as a function; the output of hidden1 is passed to it.
This model has only two hidden layers. The next layer is a Concatenate() layer, which again acts as a function taking the input layer and the output of the hidden2 layer. As the name suggests, this layer concatenates the input layer and the output of hidden2. Finally, the output layer is created and, like the other layers, used as a function: the output of ‘concat’ is passed to it. So, in the output layer, we have the raw input and the input transformed by the hidden1 and hidden2 layers concatenated together.
In the last line, a Keras model is created with the inputs and outputs specified explicitly.
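If you want to verify how the wide and deep paths come together, printing the summary shows the Concatenate layer receiving both the raw input and the output of hidden2:
# Inspect the graph: the Concatenate layer takes the raw input and hidden2's output
model51.summary()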
Compiling and training work the same as before. This time, the training was run for only 8000 epochs instead of 20000, because this Functional API model converged faster.
model51.compile(
loss=rmse,
optimizer=Adam(),
metrics=[rmse]
)
history5 = model51.fit(X_train, y_train, epochs=8000, validation_data=(X_valid, y_valid))
Output:
Epoch 1/8000
4/4 [==============================] - 1s 64ms/step - loss: 13258.9883 - rmse: 13171.9844 - val_loss: 14222.4844 - val_rmse: 12949.8066
Epoch 2/8000
4/4 [==============================] - 0s 8ms/step - loss: 11380.7041 - rmse: 11492.8750 - val_loss: 12479.9932 - val_rmse: 11245.1924
Epoch 3/8000
4/4 [==============================] - 0s 10ms/step - loss: 9632.6230 - rmse: 10223.1465 - val_loss: 10826.7109 - val_rmse: 9650.0918
...
...
Epoch 7998/8000
4/4 [==============================] - 0s 7ms/step - loss: 619.7231 - rmse: 626.7860 - val_loss: 3652.3462 - val_rmse: 2738.8286
Epoch 7999/8000
4/4 [==============================] - 0s 7ms/step - loss: 661.9867 - rmse: 680.8888 - val_loss: 3569.5647 - val_rmse: 2896.3765
Epoch 8000/8000
4/4 [==============================] - 0s 6ms/step - loss: 607.5422 - rmse: 579.1271 - val_loss: 3477.7332 - val_rmse: 2928.8345
After 8000 epochs, the ‘rmse’ came down to 579 for the training data and 2928 for the validation data. That is a lot better than model4, which ended at 8078 and 8386.
The training and validation ‘rmse’ values are very different, though: there is clear overfitting. What will the ‘rmse’ be for totally unseen data? Here is a check with the test data:
model51.evaluate(X_test, y_test)
Output:
[2501.0322265625, 2517.703125]
It gives an ‘rmse’ of 2517, close to the validation ‘rmse’.
Here is the plot that shows how the training and validation losses changed over the epochs:
import matplotlib.pyplot as plt
plt.plot(history5.history['loss'], label='loss')
plt.plot(history5.history['val_loss'], label='val_loss')
plt.legend()
plt.show()

model51 was a very simple Functional API model; you can design your network to be a lot more complex than that. The next model, model5, is a bit more complex, just to demonstrate how you can play with the structure.
That said, I am not in favor of complex models if a simple one can do the job: the simpler the model, the better. But when necessary, we may need more complex models to get better results on complicated projects. So, here is a demonstration:
input = tf.keras.layers.Input(shape = X_train.shape[1:])
hidden1 = tf.keras.layers.Dense(300, activation='relu')(input)
hidden2 = tf.keras.layers.Dense(300, activation='relu')(hidden1)
hidden3 = tf.keras.layers.Dense(300, activation='relu')(hidden2)
hidden4 = keras.layers.Concatenate()([input, hidden3])
hidden5 = tf.keras.layers.Dense(300, activation='relu')(hidden4)
concat = keras.layers.Concatenate()([input, hidden5])
output = keras.layers.Dense(1)(concat)
model5 = keras.models.Model(inputs=[input], outputs=[output])
As usual, the first layer is the input layer.
The second layer is the hidden1 layer, which takes the input layer.
The output of the hidden1 layer is passed on to the hidden2 layer.
The output of the hidden2 layer is passed on to the hidden3 layer.
Then we concatenate the input layer with the output of the hidden3 layer and name it hidden4.
The output of the hidden4 layer is then passed on to hidden5.
Then we concatenate the input with the output of hidden5 once more.
In the end, the output layer is created and the model is built by specifying the inputs and outputs explicitly.
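With this many branches, it can be easier to look at a picture of the graph than at the code. If pydot and graphviz are installed, tf.keras can draw the graph for you (the file name is just an example):
# Draw the model graph to a PNG file (requires pydot and graphviz)
tf.keras.utils.plot_model(model5, to_file="model5.png", show_shapes=True)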
The next steps are the same as before:
model5.compile(
loss=rmse,
optimizer=Adam(),
metrics=[rmse]
)
history5 = model5.fit(X_train, y_train, epochs=20000, validation_data=(X_valid, y_valid))
Output:
Epoch 1/20000
4/4 [==============================] - 1s 50ms/step - loss: 13927.1045 - rmse: 13915.9434 - val_loss: 13678.9092 - val_rmse: 12434.3320
Epoch 2/20000
4/4 [==============================] - 0s 8ms/step - loss: 10047.6201 - rmse: 9563.8525 - val_loss: 9203.7637 - val_rmse: 8289.6582
Epoch 3/20000
4/4 [==============================] - 0s 7ms/step - loss: 8499.3525 - rmse: 8310.4893 - val_loss: 9033.5527 - val_rmse: 8297.1348
...
...
Epoch 19998/20000
4/4 [==============================] - 0s 9ms/step - loss: 287.1166 - rmse: 261.8001 - val_loss: 3024.3147 - val_rmse: 2331.6790
Epoch 19999/20000
4/4 [==============================] - 0s 9ms/step - loss: 227.0752 - rmse: 238.3496 - val_loss: 2957.6494 - val_rmse: 2263.0854
Epoch 20000/20000
4/4 [==============================] - 0s 9ms/step - loss: 187.4945 - rmse: 180.5027 - val_loss: 3016.0168 - val_rmse: 2355.3120
As you can see, the ‘rmse’ for both the training and validation data improved further with this model.
This time the ‘rmse’ values are 180 for the training set and 2355 for the validation set. Clearly, there is still overfitting.
Evaluation with test data:
model5.evaluate(X_test, y_test)
Output:
[2499.998291015625, 2547.47216796875]
The test ‘rmse’ is 2547, a bit above the validation ‘rmse’, which looks normal. But it is very close to the model51 evaluation, so we cannot really say this model improved the result much.
Complex and larger models do not always mean better results; sometimes trying a different model is all it takes. It has happened to me a lot: I tried a simple model first, then worked through more and more complex models, and finally went back to the simpler one.
Instead of passing all the input features through the transformations, it is also possible to transform only a portion of the input features and feed part of them directly to the output layer.
For this model, the first 15 features are saved as input1 and the last 18 features are saved as input2. Since 15 plus 18 is more than the total number of feature columns, some features will be in both input1 and input2.
Here is the model:
input1 = tf.keras.layers.Input(shape = [15])
input2 = tf.keras.layers.Input(shape = [18])
hidden1 = tf.keras.layers.Dense(300, activation='relu')(input2)
hidden2 = tf.keras.layers.Dense(300, activation='relu')(hidden1)
hidden3 = tf.keras.layers.Dense(300, activation='relu')(hidden2)
hidden4 = keras.layers.Concatenate()([input2, hidden3])
hidden5 = tf.keras.layers.Dense(300, activation='relu')(hidden4)
concat = keras.layers.Concatenate()([input1, hidden5])
output = keras.layers.Dense(1)(concat)
model6 = keras.models.Model(inputs=[input1, input2], outputs=[output])
The model structure is almost the same as the previous model. The only difference is that input2 is passed through the transformations in the hidden layers, while input1 is concatenated with the last hidden layer (hidden5) right before the output. So, input1 does not go through the transformations in the hidden layers.
So, we should segregate the features:
X_train1, X_train2 = X_train.iloc[:, :15], X_train.iloc[:, 7:]
X_valid1, X_valid2 = X_valid.iloc[:, :15], X_valid.iloc[:, 7:]
X_test1, X_test2 = X_test.iloc[:, :15], X_test.iloc[:, 7:]
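A quick check that the two slices line up with the Input shapes declared above (15 and 18 features):
# The second axis of each slice must match the corresponding Input layer
print(X_train1.shape, X_train2.shape)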
Let’s compile and fit the input and output data to the model now. Remember, we have to pass both input1 and input2 as input this time.
model6.compile(
loss=rmse,
optimizer=Adam(),
metrics=[rmse]
)
history6 = model6.fit((X_train1, X_train2), y_train, epochs=20000, validation_data=((X_valid1, X_valid2), y_valid))
Output:
Epoch 1/20000
4/4 [==============================] - 1s 53ms/step - loss: 14018.6748 - rmse: 14139.8809 - val_loss: 13892.1895 - val_rmse: 12647.2930
Epoch 2/20000
4/4 [==============================] - 0s 6ms/step - loss: 10192.4922 - rmse: 9930.0068 - val_loss: 9372.5049 - val_rmse: 8391.0254
Epoch 3/20000
4/4 [==============================] - 0s 8ms/step - loss: 8807.8857 - rmse: 8779.3643 - val_loss: 9259.1514 - val_rmse: 8631.5068
...
...
Epoch 19998/20000
4/4 [==============================] - 0s 8ms/step - loss: 495.4005 - rmse: 509.9180 - val_loss: 3193.6880 - val_rmse: 3008.2153
Epoch 19999/20000
4/4 [==============================] - 0s 8ms/step - loss: 405.2184 - rmse: 446.1824 - val_loss: 3340.9062 - val_rmse: 3053.0967
Epoch 20000/20000
4/4 [==============================] - 0s 8ms/step - loss: 405.8342 - rmse: 387.4508 - val_loss: 3277.2720 - val_rmse: 3060.6477
Finally, the ‘rmse’ values for the training and validation data came down to 387 and 3060.
Here is the evaluation:
model6.evaluate([X_test1, X_test2], y_test)
Output:
[2547.850830078125, 2735.259033203125]
The ‘rmse’ on the test data is 2735. It does not look like this method did any better; in fact, it did a bit worse than the previous model.
Conclusion
I wanted to demonstrate a few different styles of regression models in TensorFlow using both the Sequential and Functional APIs and compare them on this specific project. These results cannot be generalized, though: different model architectures work better for different projects, and we need to try several models on each project to see which one works best. There are a lot of different things that can be tried, so please feel free to experiment with more models and see if you can improve the results.
Please feel free to follow me on Twitter, the Facebook page, and check out my YouTube channel.
#Tensorflow #MachineLearning #DeepLearning #ArtificialIntelligence #Python #Regression