Pivot tables are very popular for data table manipulation in Excel. It is a powerful tool for data analysis and presentation of tabular data. Excellent in combining and summarising a useful portion of the data as well. In this article, I will solve some analytic questions using a pivot table.
I am going to use the famous titanic dataset from Kaggle to demonstrate the use of the pivot table. You can download the dataset here as well. First, import the dataset:
import pandas as pd
import numpy as np
df = pd.read_csv('titanic_data.csv')

I am simply deleting the ‘PassengerId’ column because it is unnecessary for data analysis.
df = df.drop(columns='PassengerId')
Let’s make ourselves some assignment to solve.
- Construct a table that shows the proportion of people who survived in each passenger class.
There are three classes. They are 1, 2, and 3. Replacing them with strings will look better as an index.
df['Pclass'] = df.Pclass.replace({1: "First", 2: "Second", 3: "Third"})
df.pivot_table("Survived", index=['Pclass'])

First-class passengers survived the most and the passengers in third class survived the least. There is a clear class difference here.
2. Generate a tabular sheet to show the proportion of people who survived in each class segregated by gender.
Here we will pass two layers of the index: Passenger class and Sex.
df.pivot_table("Survived", index=['Pclass', "Sex"])

Wow! The survival rate of female was much higher
3. Calculate the average fare in each class for both males and females.
df.pivot_table("Fare", index=['Pclass', "Sex"], aggfunc=np.mean)

The average fare of females was also much higher than males for the same passenger class.
4. Build a table to present the average age of survived passengers in each class through the columns segregated by ‘Sex’ and ‘Embarked’.
df.pivot_table("Age", index=["Sex", "Survived", "Embarked"], columns = ["Pclass"], aggfunc=np.mean, fill_values = 0)

It looks like the passengers who survived were younger than the passengers who did not survive. Though, more analysis is required to confirm that.
5. Construct a table to show the proportion of people who survived for different age groups, passenger class, and gender.
age = pd.cut(df["Age"], [0, 18, 38, 58, 80])
df.pivot_table("Survived", ["Sex", age], "Pclass")

It is hard to say for females that age has any effect on survival rate. But for males, the effect of age is more obvious. Younger males seem to have a higher survival rate.
6. Construct a table to present the number of people who survived and the mean fare for each class and gender.
df.pivot_table(index="Pclass", columns="Sex", aggfunc={"Survived": sum, "Fare": "mean"})

The survival rate is much higher for females and the upper passenger class. At the same time, the fare was also much higher for females and the upper passenger class for obvious reasons.
#pandas #python #pivottable #datascience #dataanalysis