Bubble plots are an extension of the scatter plot. A scatter plot shows two dimensions, x and y. A bubble plot adds a third dimension, z, which is drawn as the size of each point and usually denotes a weight. That way, a bubble plot gives more information visually than a two-dimensional scatter plot.
Data Preparation
For this tutorial, I will use a dataset that contains Canadian immigration information. It covers the years 1980 to 2013 and includes the number of immigrants from 195 countries. Import the necessary packages and the dataset:
import numpy as np
import pandas as pd
df = pd.read_excel('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Canada.xlsx',
                   sheet_name='Canada by Citizenship',
                   skiprows=range(20),
                   skipfooter=2)
The dataset is too big to show in a screenshot here. Let’s look at the column names.
df.columns
#Output:
Index([ 'Type', 'Coverage', 'OdName', 'AREA', 'AreaName', 'REG',
'RegName', 'DEV', 'DevName', 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988,
1989, 1990, 1991, 1992, 1993, 1994,
1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012,
2013],
dtype='object')
We are not going to use most of these columns. I dropped them and set the country names (‘OdName’) as the index.
df = df.drop(columns = ['Type', 'Coverage', 'AREA', 'AreaName', 'REG', 'RegName', 'DEV', 'DevName',]).set_index('OdName')
df.head()

I chose the data for Ireland and Brazil for this exercise. There is no special reason; I picked them at random.
Ireland = df.loc['Ireland']
Brazil = df.loc['Brazil']
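To make the drop, set_index, and loc steps easier to follow, here is a minimal sketch on a tiny made-up table. The name toy and all the values in it are assumptions for illustration only; they are not taken from the Canada dataset.

```python
import pandas as pd

# Synthetic stand-in for the immigration table (all values are made up).
toy = pd.DataFrame({
    "OdName": ["Ireland", "Brazil"],
    "AreaName": ["Europe", "Latin America and the Caribbean"],
    1980: [10, 20],
    1981: [30, 40],
})

# Drop the descriptive column and use the country name as the index.
toy = toy.drop(columns=["AreaName"]).set_index("OdName")

# .loc with a single row label returns a Series whose index is the
# remaining column labels, which here are the years.
ireland_toy = toy.loc["Ireland"]
print(ireland_toy.index.tolist())
print(ireland_toy.tolist())
```

Because the result is a Series indexed by year, it can be passed straight to a plotting function as the y values.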
Normalize the Data
There are a few different ways to normalize data. We normalize to bring the series into a similar range. The Ireland and Brazil immigration data have different ranges, and I needed every value in both to lie between 0 and 1. I simply divided the Ireland series by its maximum value and did the same for the Brazil series.
i_normal = Ireland / Ireland.max()
b_normal = Brazil / Brazil.max()
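As a small illustration of what dividing by the maximum does, and how it differs from min-max scaling (one of the other common options), here is a sketch on made-up numbers. The names counts, max_scaled, and min_max_scaled are hypothetical and the values are not from the dataset.

```python
import pandas as pd

# Made-up yearly counts, not taken from the Canada dataset.
counts = pd.Series([200, 500, 1000], index=[1980, 1981, 1982])

# Max scaling, as used in this tutorial: the largest value maps to 1.
max_scaled = counts / counts.max()

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
min_max_scaled = (counts - counts.min()) / (counts.max() - counts.min())

print(max_scaled.tolist())
print(min_max_scaled.tolist())
```

For bubble sizes, max scaling is probably the more convenient choice: with min-max scaling the smallest value becomes 0, so its bubble would have no area and disappear from the plot.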
We will plot the Ireland and Brazil data against the years, so it is useful to have the years in a list.
years = list(range(1980, 2014))
Make the Bubble Plot
Just to see the difference, let’s plot the scatter plot first.
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 8))
plt.scatter(years, Ireland, color='blue')
plt.scatter(years, Brazil, color='orange')
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)
plt.show()

Now, plot the bubble plot. We pass the normalized values defined earlier, multiplied by 2000, as the size parameter s.
plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
            color='darkblue',
            alpha=0.5,
            s=b_normal * 2000)
plt.scatter(years, Ireland,
            color='purple',
            alpha=0.5,
            s=i_normal * 2000)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)

We can get an idea of the number of immigrants from the size of the bubbles: the smaller the bubble, the smaller the number of immigrants.
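It may help to know that the s argument of scatter is the marker area in points squared, so multiplying the normalized values by 2000 gives the largest bubble an area of 2000 points squared. The sketch below uses made-up data (demo_years and demo_counts are hypothetical names) and adds a size legend with legend_elements so the bubble sizes can be read back as counts. The Agg backend line is only there so the sketch runs without a display; remove it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for one country's yearly counts.
demo_years = np.arange(1980, 1990)
demo_counts = np.array([100, 150, 120, 300, 400, 350, 500, 650, 800, 1000])

# Scale so the largest bubble has an area of 2000 points^2.
sizes = demo_counts / demo_counts.max() * 2000

fig, ax = plt.subplots(figsize=(12, 8))
bubbles = ax.scatter(demo_years, demo_counts, s=sizes,
                     alpha=0.5, color="darkblue")

# Build legend entries from the marker sizes; func converts a marker
# size back into the count it represents.
handles, labels = bubbles.legend_elements(
    prop="sizes", num=4, alpha=0.5,
    func=lambda s: s / 2000 * demo_counts.max(),
)
ax.legend(handles, labels, title="Immigrants")
ax.set_xlabel("Years", size=14)
ax.set_ylabel("Number of immigrants", size=14)
# Call plt.show() or fig.savefig(...) here to view the figure.
```

Since s is an area, a value twice as large gets a bubble with twice the area, not twice the diameter.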
We can make this plot multicolored as well. To make the colors meaningful, we need the data series sorted. You will see the reason very soon.
c_br = sorted(Brazil)
c_ir = sorted(Ireland)
Now we will pass these values to change the colors.
plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
            c=c_br,
            alpha=0.5,
            s=b_normal * 2000)
plt.scatter(years, Ireland,
            c=c_ir,
            alpha=0.5,
            s=i_normal * 2000)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)

Now we have added another dimension, color. The color changes with the number of immigrants. It does not work so well when we plot two variables, because we did not explicitly define a color for each variable. But it does a good job when we plot a single variable on the y-axis. Let’s plot the number of immigrants from Brazil per year to see the trend over the years.
plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
            c=c_br,
            alpha=0.5,
            s=b_normal * 2000)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants from Brazil", size=14)

I am sure you can see the change in colors with the number of immigrants very clearly here.
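One thing to keep in mind with the sorted lists: sorted(Brazil) reorders the values, so the bubble for the i-th year is colored by the i-th smallest value rather than by that year's own count. That matches the data only when the series rises steadily. As a possible alternative, the sketch below passes the unsorted values to c and gives each series its own colormap, so the two series stay distinguishable. The data and the names series_a and series_b are made up for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

demo_years = np.arange(1980, 1990)
# Made-up counts; series_a deliberately goes up and down.
series_a = np.array([900, 200, 700, 100, 500, 300, 1000, 400, 800, 600])
series_b = np.array([50, 80, 60, 120, 150, 140, 200, 260, 240, 300])

fig, ax = plt.subplots(figsize=(12, 8))

# Passing the values themselves to c colors each bubble by its own count.
sc_a = ax.scatter(demo_years, series_a, c=series_a, cmap="Blues",
                  s=series_a / series_a.max() * 2000, alpha=0.6)
sc_b = ax.scatter(demo_years, series_b, c=series_b, cmap="Oranges",
                  s=series_b / series_b.max() * 2000, alpha=0.6)

# One colorbar per series explains what each color scale means.
fig.colorbar(sc_a, ax=ax, label="Series A count")
fig.colorbar(sc_b, ax=ax, label="Series B count")

ax.set_xlabel("Years", size=14)
ax.set_ylabel("Number of immigrants", size=14)
# Call plt.show() or fig.savefig(...) here to view the figure.
```

With this approach the darkest bubble in each series is always the one with the highest count, wherever it falls on the x-axis.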
That was all for the bubble plot in Matplotlib. I hope it was helpful.