Seaborn is a popular Python visualization library. It is built on Matplotlib and offers many advanced plots with built-in styles. I have an article on Seaborn that covers the most popular plots (a link is provided at the end of this article), and I decided to write detailed tutorials on some of them.
In this article, I will focus on two very common plots from the Seaborn library:
- Violin plot
- Relplot
Let’s start with the violin plot.
The violin plot is useful because it combines a kernel density estimate and a boxplot, so a single plot tells you a lot about a variable. I will use the penguins dataset for the violin plots:
import seaborn as sns
import matplotlib.pyplot as plt

pen = sns.load_dataset("penguins")
I will start with the most basic violin plot and slowly move towards some more complex ones.
Here is the violin plot of the body_mass:
sns.violinplot(data = pen, x = "body_mass_g")
The outer shape is the kernel density plot, and it shows that the distribution is right-skewed. The boxplot in the middle shows the median (the small white dot), the first and third quartiles (the thick bar), and thin whiskers that reach toward the minimum and maximum.
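To make the inner boxplot concrete, here is a minimal sketch that computes the same kind of summary numbers by hand. The values are made up for illustration (they are not rows from the penguins dataset), and I am assuming the usual boxplot rule where the whiskers stop at the most extreme points within 1.5 times the interquartile range:

```python
import numpy as np

# Hypothetical body-mass values in grams, made up for illustration
body_mass = np.array([2900, 3200, 3400, 3550, 3700, 3800,
                      4050, 4300, 4750, 5400, 6300])

# The three quartiles: Q1, median (Q2), and Q3
q1, median, q3 = np.percentile(body_mass, [25, 50, 75])
iqr = q3 - q1

# Assumed whisker rule: most extreme points within 1.5 * IQR of the box
lower_whisker = body_mass[body_mass >= q1 - 1.5 * iqr].min()
upper_whisker = body_mass[body_mass <= q3 + 1.5 * iqr].max()

print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"whiskers from {lower_whisker} to {upper_whisker}")
```

In this toy sample the largest value lies beyond the upper whisker, so the whiskers only match the minimum and maximum when there are no such extreme points.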
For the next plot, I will use a categorical variable in the x-direction and the body_mass in the y-direction. I chose ‘island’ for the x-direction. There are three islands in the ‘island’ column.
We will get three violin plots for the body_mass of the penguins of the three islands:
plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g")
plt.show()
So, we have three violin plots, one per island, and each shows the kernel density and the boxplot of body_mass_g for that island’s penguins.
We can go one step further and make the plot more granular.
Using the ‘hue’ parameter, we will now separate the violin plots by the ‘sex’ column:
plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g", hue="sex")
plt.show()
For each island, we now have two violin plots: one for the male population and one for the female population. But each violin is symmetric, so the same kernel density is drawn on both of its sides. With the ‘split’ parameter, Seaborn can instead use the two sides of a single violin to plot the kernel densities of the two categories.
Here I am using one side for the male population and one side of the violin plot for the female population:
plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g", hue="sex", split=True)
plt.show()
One shortcoming of this plot is that you get only one boxplot for the overall population of each island. When we had separate violin plots for the male and female populations, we also had separate boxplots.
Instead of boxplots, we can get dotted lines representing the quartiles:
plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g", hue="sex",
               split=True, inner="quartile")
plt.show()
Instead of boxplots, we now have lines that mark the first, second, and third quartiles. As a reminder, the second quartile is the median. Because the violins are split, each island gets separate quartile lines for the male and female populations.
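If you want the numbers behind those quartile lines, a groupby in pandas gives the same kind of summary. This is only a sketch on a small made-up table shaped like the penguins data (the name toy and all the values are my own, not from the dataset):

```python
import pandas as pd

# Small hypothetical sample shaped like the penguins data (values are made up)
toy = pd.DataFrame({
    "island": ["Biscoe"] * 6 + ["Dream"] * 6,
    "sex": ["Male", "Female"] * 6,
    "body_mass_g": [5000, 4200, 5400, 4400, 5600, 4600,
                    4000, 3300, 4100, 3500, 4300, 3600],
})

# One row per island/sex pair, one column per quartile
quartiles = (
    toy.groupby(["island", "sex"])["body_mass_g"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(quartiles)
```

Each row of the result corresponds to one half of a split violin, and the three columns are the three lines drawn inside it.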
Instead of quartiles, we can also draw a line for each data point by passing “stick” as the ‘inner’ parameter. Also, so far the violin plots have always appeared in the default order of the islands: “Torgersen”, “Biscoe”, and “Dream”. The order can be changed as well:
plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g", hue="sex",
               split=True, inner="stick",
               order=["Dream", "Torgersen", "Biscoe"])
plt.show()
The order of the islands changed!
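If you are curious where the default order comes from: as far as I know, when the column holds plain strings, Seaborn uses the order in which the categories first appear in the data. Here is a small sketch of that idea with pandas, using a made-up column of island names rather than the real dataset:

```python
import pandas as pd

# Hypothetical column of island names (not the real penguins rows)
islands = pd.Series(["Torgersen", "Torgersen", "Biscoe", "Dream", "Biscoe"])

# Order of first appearance, which is what the default order looks like
default_order = list(islands.unique())
print(default_order)

# A custom order is the same set of labels in a different sequence
custom_order = ["Dream", "Torgersen", "Biscoe"]
print(sorted(custom_order) == sorted(default_order))
```

The list you pass to ‘order’ should contain the same labels as the column, only rearranged.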
Suppose the penguins of two islands are selected for a special experiment and we want to show them in a different color. Say the special islands are Biscoe and Dream:
# True for penguins from the two special islands, False otherwise
pen["special"] = pen["island"].isin(["Dream", "Biscoe"])

plt.figure(figsize=(8, 6))
sns.violinplot(data=pen, x="island", y="body_mass_g", hue="special")
plt.show()
Look, the violin plots for Biscoe and Dream are in a different color!
The last violin plot uses Seaborn’s FacetGrid option to draw one panel per category. The violinplot function itself does not have that option, because it draws on a single set of axes.
Instead, we can use ‘catplot’ with ‘kind’ set to ‘violin’:
sns.catplot(data=pen, x="island", y="body_mass_g", hue="sex", split=True,
            inner="quartile", kind="violin", col="species",
            height=5, aspect=0.6)
We get a separate plot for each species. Chinstrap penguins are found only on Dream, and Gentoo penguins only on Biscoe, so the other islands stay empty in those panels.
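The difference between violinplot and catplot can be checked directly from what each function returns. This is a minimal sketch on synthetic data that I generated for the purpose (the toy table and its values are made up, and the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Synthetic stand-in for the penguins data (all values are made up)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "species": np.repeat(["Adelie", "Gentoo"], 40),
    "island": np.tile(["Biscoe", "Dream"], 40),
    "body_mass_g": np.concatenate([rng.normal(3700, 400, 40),
                                   rng.normal(5000, 500, 40)]),
})

# Axes-level function: draws on one set of axes and returns it
ax = sns.violinplot(data=toy, x="island", y="body_mass_g")
print(isinstance(ax, Axes))

# Figure-level function: builds a grid of axes, one column per species
g = sns.catplot(data=toy, x="island", y="body_mass_g",
                kind="violin", col="species", height=4, aspect=0.8)
g.set_titles("{col_name}")  # use the species name as each panel title
print(g.axes.shape)
```

Because catplot hands back the whole grid, you can adjust titles or labels for all panels at once through the returned object.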
That’s all for the violin plot!
Seaborn’s ‘relplot’ is also very useful because it shows the statistical relationship between two variables. It can draw either a scatterplot or a line plot. This part covers the relplot in detail.
I will use the famous ‘titanic’ dataset for this one:
ti = sns.load_dataset('titanic')
These are the columns of this dataset:
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town', 'alive', 'alone'], dtype='object')
As usual, I will start with the most basic relplot, which uses two continuous variables, ‘age’ and ‘fare’:
sns.relplot(data = ti, x = 'age', y = "fare")
By default, it uses the scatterplot. We will see how to use the line plot later.
Adding a ‘hue’ parameter to get the different colors for different categories:
sns.relplot(data = ti, x = 'age', y = "fare", hue = "alive")
Let’s add one more variable to it.
Adding ‘pclass’ variable as ‘col’ to get three individual plots for three ‘pclass’:
sns.relplot(data = ti, x = 'age', y = "fare", hue = "alive", col = "pclass", height = 4, aspect = 0.8)
We have three individual plots for three ‘pclass’. Now let’s segregate the data even more.
This next plot will add rows for individual ‘embark_town’:
sns.relplot(data = ti, x = 'age', y = "fare", hue = "alive", col = "pclass", row = "embark_town", height = 4, aspect = 0.8)
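With both ‘col’ and ‘row’ set, relplot arranges the panels as a grid: one row per ‘row’ category and one column per ‘col’ category. Here is a small sketch that confirms the layout on synthetic data (the toy table is made up and only borrows the Titanic column names; the Agg backend is there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in that borrows the Titanic column names (values are made up)
rng = np.random.default_rng(1)
n = 90
toy = pd.DataFrame({
    "age": rng.uniform(1, 80, n),
    "fare": rng.uniform(5, 200, n),
    "pclass": np.tile([1, 2, 3], n // 3),
    "embark_town": np.repeat(["Southampton", "Cherbourg", "Queenstown"], n // 3),
})

g = sns.relplot(data=toy, x="age", y="fare",
                col="pclass", row="embark_town", height=2, aspect=1)

# The grid of axes is a 2D array: rows follow 'row', columns follow 'col'
n_rows, n_cols = g.axes.shape
print(n_rows, n_cols)
print(list(g.row_names))
```

A grid like this grows quickly, so smaller ‘height’ values help keep the figure readable.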
I only used the default scatterplot all this time. Let’s see how to use a line plot.
I am going back to the most basic plot again to see the relationship between ‘age’ and ‘fare’ only. But this time in line plot:
sns.relplot(data = ti, x = 'age', y = "fare", kind = 'line')
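It helps to know what the line actually represents. When several rows share the same x value, the line plot draws an aggregate instead of every point; as far as I know, Seaborn’s default is the mean, with a 95% confidence interval as the band. A minimal sketch of that aggregation with pandas, using made-up ages and fares:

```python
import pandas as pd

# Hypothetical passengers: several share the same age (values are made up)
toy = pd.DataFrame({
    "age":  [22, 22, 22, 30, 30, 45],
    "fare": [7.0, 8.0, 9.0, 20.0, 30.0, 50.0],
})

# One point per distinct age: the mean fare at that age
line_points = toy.groupby("age")["fare"].mean()
print(line_points)
```

So the line passes through one value per age, not through each passenger’s fare.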
By default, the line plot comes with a line and a confidence band around it. If you do not want the confidence band, you can turn it off with errorbar = None (in Seaborn versions before 0.12, the parameter was ci = None).
In the next plot, we will turn off the confidence band, use the ‘hue’ parameter to color the data by ‘sex’, and also set the line style and markers by ‘sex’:
sns.relplot(data=ti, x="age", y="fare", kind="line", errorbar=None,
            hue="sex", dashes=True, style="sex", markers=True)
Here, I used the same variable as the ‘hue’ parameter and the ‘style’ parameter. But if you want you can use different variables as well. Please try that and see if you like it.
For the next plot, let’s have three individual plots for the three ‘pclass’ values, keep the confidence band, and set the color by ‘alive’ and the line style by ‘sex’:
sns.relplot(data=ti, x="age", y="fare", hue="alive", col="pclass",
            height=4, aspect=0.8, style="sex", kind="line")
If this plot looks too busy to you, take off the confidence band. That may help. Also, please try using another variable in the ‘row’ as I did in the scatterplot before.
I wanted to show you two important plots from the Seaborn library that help you plot your continuous variables and can provide many useful insights. I hope this was helpful.