 ## Animated and Racing Bar Plots Tutorial

Bar plots are basic and very common, and every plotting library offers them. This article focuses on animated bar plots. I will share the code for several of them. If you haven’t seen them before, you may enjoy them.

I tried to present them simply so that they are easy to understand. I always like to start with basic plots, so I begin with two plots that are inspired by this page.

I had to install ImageMagick in my Anaconda environment with the command below, because I wanted to save the plots as GIFs to upload here. You may want to save your animated plots to upload or present somewhere as well.

`conda install -c conda-forge imagemagick`

These are some necessary imports:

```python
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from matplotlib.animation import FuncAnimation
```

Here is the complete code for a basic animated bar plot. I will explain it after the code and the visualization:

```python
%matplotlib qt

plt.style.use("seaborn-v0_8")  # this style was called "seaborn" before Matplotlib 3.6

fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
axes.set_ylim(0, 120)

lst1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
lst2 = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100]

def animate(i):
    # Heights of the two bars for frame i
    y1 = lst1[i]
    y2 = lst2[i]
    # Draw the bars "one" and "two"; the colors here are an arbitrary choice
    plt.bar(["one", "two"], [y1, y2], color=["red", "blue"])

plt.title("Intro to Animated Bar Plot", fontsize=14)

anim = FuncAnimation(fig, animate, frames=len(lst1))
anim.save("bar1.gif", writer="imagemagick")
```

Now let’s see how the code works. First, I created two lists, one for each bar.

The animate function takes the i-th element of each list as y1 and y2 and draws two bars named ‘one’ and ‘two’ with those heights.

FuncAnimation creates the animation. It takes the figure, the animate function, and the number of frames, and it calls animate once for each frame number.

To save the plot, we need to provide a file name. Here I used a very ordinary name, ‘bar1.gif’. The writer is the ‘imagemagick’ we installed in the beginning.
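A variation worth knowing: instead of calling `plt.bar` on every frame, which draws a new pair of bars over the old ones, the bars can be drawn once and only their heights updated. The sketch below is a minimal version of that idea, not the code used for the GIF in this article. The names `bars` and `save_gif`, the bar colors, the `Agg` backend, and the use of Matplotlib's `PillowWriter` in place of ImageMagick are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

lst1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
lst2 = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100]

fig, ax = plt.subplots(figsize=(8, 6))
ax.set_ylim(0, 120)
ax.set_title("Intro to Animated Bar Plot", fontsize=14)

# Draw the two bars once, with zero height (colors are an arbitrary choice).
bars = ax.bar(["one", "two"], [0, 0], color=["red", "blue"])

def animate(i):
    """Set both bar heights to the i-th values of the two lists."""
    for bar, height in zip(bars, (lst1[i], lst2[i])):
        bar.set_height(height)
    return bars

anim = FuncAnimation(fig, animate, frames=len(lst1))

def save_gif(path="bar1_pillow.gif", fps=5):
    """Hypothetical helper: write the GIF with Matplotlib's Pillow writer."""
    anim.save(path, writer=PillowWriter(fps=fps))
```

Calling `save_gif()` should write the file without ImageMagick, provided Pillow is installed. Updating existing bars also keeps the number of drawn objects constant, which may help when an animation has many frames.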

This one is almost the same as the first one, with more bars.

```python
%matplotlib qt

plt.style.use("seaborn-v0_8")  # this style was called "seaborn" before Matplotlib 3.6

fig = plt.figure(figsize=(8, 6))
axes = fig.add_subplot(1, 1, 1)
axes.set_ylim(0, 100)

# Each list counts up with the frame number and then stays at its cap
l1 = [i if i < 20 else 20 for i in range(100)]
l2 = [i if i < 85 else 85 for i in range(100)]
l3 = [i if i < 30 else 30 for i in range(100)]
l4 = [i if i < 65 else 65 for i in range(100)]

palette = list(reversed(sns.color_palette("seismic", 4).as_hex()))

def animate(i):
    y1 = l1[i]
    y2 = l2[i]
    y3 = l3[i]
    y4 = l4[i]
    # sorted() puts the smallest value under "one" and the largest under "four"
    plt.bar(["one", "two", "three", "four"], sorted([y1, y2, y3, y4]), color=palette)

plt.title("Animated Bars", color="blue")

anim = FuncAnimation(fig, animate, frames=len(l1), interval=1)
anim.save("bar2.gif", writer="imagemagick")
```
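The four list comprehensions above follow one pattern: the value grows by one per frame until it reaches a cap, then stays there. Here is a small sketch of that pattern. The helper name `capped_ramp` is hypothetical and not part of the original code. The sketch also shows what `sorted` does to the bar heights in the last frame.

```python
def capped_ramp(cap, n_frames=100):
    """Count up by one per frame, then hold at `cap` (hypothetical helper)."""
    return [min(i, cap) for i in range(n_frames)]

l1, l2, l3, l4 = (capped_ramp(cap) for cap in (20, 85, 30, 65))

# Same values as the comprehension used in the tutorial code
assert l1 == [i if i < 20 else 20 for i in range(100)]

print(l1[18:23])  # [18, 19, 20, 20, 20]

# Heights in the last frame, before and after sorting
last = [l1[-1], l2[-1], l3[-1], l4[-1]]
print(last)          # [20, 85, 30, 65]
print(sorted(last))  # [20, 30, 65, 85]
```

Because the heights are sorted before plotting, the label ‘one’ always gets the smallest value and ‘four’ the largest, whichever list the value came from.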

Notice that I called axes.set_ylim at the beginning. If you do not set the y-limit, the y-axis keeps rescaling as the bars grow.

A few months ago I saw an animated bar plot of Covid deaths that went viral on social media, and I decided to find out how to make one. Here I am sharing that type of plot.

For the next plots, I will use this superstore dataset.

According to the description on the dataset’s page, it may be used for educational purposes only.

Let’s import the dataset and look at the columns:

```python
df = pd.read_csv("Superstore.csv", encoding='cp1252')
df.columns
```

Output:

```
Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')
```

As you can see, the dataset has many columns. But we will only use the ‘Order Date’, ‘Sales’, and ‘State’ columns for the next plots.

The plots will show monthly total sales per state.

We need to prepare the data first, because the bar_chart_race function we will use expects the data in a certain format.

First, the ‘Order Date’ column needs to be in ‘datetime’ format.

`df['Order Date'] = pd.to_datetime(df['Order Date'])`

I will use the pivot_table function to get each state’s Sales data as a column:

`pv = df.pivot_table("Sales", index = "Order Date", columns = ["State"], aggfunc = "sum")`

Here is part of the output:

It gives us many null values because not every state has sales on every day. Notice that ‘Order Date’ is now the index.

I will fill those null values with zeros. I am also sorting the dataset by ‘Order Date’:

```python
pv.sort_index(inplace=True, ascending=True)
pv = pv.fillna(0)
```

Now, we have all the null values replaced by zeroes.
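To see the reshaping on something small, here is a sketch with a made-up four-row table. The toy data and the names `orders`, `wide`, and `filled` are assumptions for illustration only; the real Superstore data is much larger.

```python
import pandas as pd

# Hypothetical toy orders with the three columns used in this tutorial
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-05", "2017-01-05", "2017-01-05", "2017-01-09"]
    ),
    "State": ["Texas", "Texas", "Ohio", "Ohio"],
    "Sales": [100.0, 50.0, 30.0, 20.0],
})

# One row per date, one column per state, sales summed within each cell
wide = orders.pivot_table("Sales", index="Order Date", columns="State", aggfunc="sum")

# Texas has no order on 2017-01-09, so that cell is null
print(int(wide.isna().sum().sum()))  # 1

filled = wide.sort_index().fillna(0)
print(filled.loc[pd.Timestamp("2017-01-05"), "Texas"])  # 150.0
print(filled.loc[pd.Timestamp("2017-01-09"), "Texas"])  # 0.0
```

The two Texas orders on the same day are summed into one cell, and the missing Texas value on the second day becomes zero.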

As I mentioned before, I want to see the Sales per state by month. So I will drop the day and keep only the year and month of each ‘Order Date’.

```python
pv['month_year'] = pv.index.strftime('%Y-%m')
pv_month = pv.set_index("month_year")
```

You will see the dataset a bit later.

To get the monthly data, we have to group by month_year and sum the values:

`pv_monthgr = pv_month.groupby('month_year').sum()`

This is what the dataset looks like now. We have the monthly Sales data per state. It is required to have the dataset in this format to use the bar_chart_race function.
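The monthly step can be checked on a tiny example. The sketch below uses made-up daily numbers and groups directly on the formatted index, which is a shortcut and not the exact steps used above. The names `daily`, `month_year`, and `monthly` are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales in wide format: dates as index, states as columns
daily = pd.DataFrame(
    {"Ohio": [30.0, 20.0, 10.0], "Texas": [150.0, 0.0, 40.0]},
    index=pd.to_datetime(["2017-01-05", "2017-01-09", "2017-02-02"]),
)

# Keep only the year and month of each date, then sum the rows of each month
month_year = daily.index.strftime("%Y-%m")
monthly = daily.groupby(month_year).sum()

print(monthly.index.tolist())           # ['2017-01', '2017-02']
print(monthly.loc["2017-01"].tolist())  # [50.0, 150.0]
print(monthly.loc["2017-02"].tolist())  # [10.0, 40.0]
```

Each row of `monthly` is one period and each column is one state, which is the wide layout described above.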

I should warn you that these plots take a lot longer to render than regular bar plots. Sometimes the rendering got stuck while I was making them for this tutorial. I gave my computer a little break, came back later, and restarted my kernel, and then it worked faster. But still, it takes longer overall.

Here is the basic bar_chart_race plot. I am setting the filter_column_colors parameter to True to avoid repeating colors across bars.

```python
import bar_chart_race as bcr

bcr.bar_chart_race(df=pv_monthgr, filename="by_month.gif",
                   filter_column_colors=True, cmap="prism",
                   title="Sales By Months")
```

You can see that it is sorted in descending order by default. Every month the order of the states changes. But it moves so fast that it is hard to even read the names of the states. The bottom right corner shows the month.

In the next plot, we will see some improvements. The period_length parameter controls the speed: it is the time, in milliseconds, spent on each period. I used 2500, which is pretty high compared with the default of 500. So each month stays on screen a lot longer.

The n_bars parameter specifies how many bars we want to see in one frame. I am fixing it at 10, so we will see the top 10 states by sales amount for every month. The fixed_max parameter fixes the x-axis at the overall maximum Sales value, so the axis does not change from frame to frame.

```python
bcr.bar_chart_race(df=pv_monthgr, filename="bar_race3.gif",
                   filter_column_colors=True, cmap='prism', sort='desc',
                   n_bars=10, fixed_max=True, period_length=2500,
                   title='Sales By Months', figsize=(10, 6))
```
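To make the effect of n_bars and fixed_max concrete, here is a rough pandas sketch on made-up numbers. It only mimics what the two settings select. It is not how bar_chart_race is implemented, and the names `monthly`, `top_states`, and `axis_max` are hypothetical.

```python
import pandas as pd

# Hypothetical monthly sales: one row per month, one column per state
monthly = pd.DataFrame(
    {"Ohio": [50.0, 10.0], "Texas": [150.0, 40.0], "Utah": [5.0, 90.0]},
    index=["2017-01", "2017-02"],
)

n_bars = 2

# The states that would be visible each month: the n_bars largest values in the row
top_states = {
    month: row.nlargest(n_bars).index.tolist()
    for month, row in monthly.iterrows()
}
print(top_states["2017-01"])  # ['Texas', 'Ohio']
print(top_states["2017-02"])  # ['Utah', 'Texas']

# With a fixed maximum, the axis is sized for the largest value in the whole table
axis_max = monthly.to_numpy().max()
print(axis_max)  # 150.0
```

Utah is missing from the first month and leads the second, which is the kind of ranking change a bar chart race is meant to show.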

In the next plot, we will add one more piece of information. Each frame shows the plot for one month, and the bottom right corner will also show the total sales amount of that month.

For that, I added a function called summary. I also included some more style parameters: the bar label size, and a bar style that makes the bars a bit transparent and gives them a border. We are also adding a perpendicular bar that shows the mean sales for each month. The mean sales vary from month to month, so the perpendicular bar will keep moving.

```python
def summary(values, ranks):
    # Total sales of the current month, rounded to the nearest hundred
    total_sales = int(round(values.sum(), -2))
    s = f'Total Sales - {total_sales:,.0f}'
    # Position, text, and style of the label shown in the bottom right corner
    return {'x': .99, 'y': .1, 's': s, 'ha': 'right', 'size': 8}

bcr.bar_chart_race(df=pv_monthgr, filename="bar_race4.gif", filter_column_colors=True,
                   cmap='prism', sort='desc', n_bars=15,
                   fixed_max=True, steps_per_period=3, period_length=1500,
                   bar_size=0.8,
                   period_label={'x': .99, 'y': .15, 'ha': 'right', 'color': 'coral'},
                   bar_label_size=6, tick_label_size=10,
                   bar_kwargs={'alpha': 0.4, 'ec': 'black', 'lw': 2.5},
                   title='Sales By Months',
                   period_summary_func=summary,
                   perpendicular_bar_func='mean',
                   figsize=(7, 5))
```
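It can help to call the summary function by hand before running the slow animation. The sketch below passes it a made-up month of values. The Series and the way `ranks` is built here are assumptions for illustration; in the real run the library supplies both arguments.

```python
import pandas as pd

def summary(values, ranks):
    # Total sales of the current month, rounded to the nearest hundred
    total_sales = int(round(values.sum(), -2))
    s = f'Total Sales - {total_sales:,.0f}'
    return {'x': .99, 'y': .1, 's': s, 'ha': 'right', 'size': 8}

# Hypothetical values for one month: one entry per state
values = pd.Series({"Texas": 12000.4, "Ohio": 345.2})
ranks = values.rank(ascending=False)  # stand-in; summary() does not use it

text_kwargs = summary(values, ranks)
print(text_kwargs['s'])  # Total Sales - 12,300
```

The total of 12,345.6 is rounded to the nearest hundred, and the `,` in the format string adds the thousands separator.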

This will be the last plot. Here I replace the mean with a custom function that places the perpendicular bar at the 90th percentile (the 0.9 quantile) of the sales. Setting steps_per_period to 3 means it takes three steps to go from one month to the next. If you watch closely, by the time the month changes in the bottom right corner label, the bars have moved three times, and you can see how they overlap and slowly change their position.

```python
def func(values, ranks):
    # Position of the perpendicular bar: the 0.9 quantile of this month's sales
    return values.quantile(0.9)

bcr.bar_chart_race(df=pv_monthgr, filename="bar_race5.gif", filter_column_colors=True,
                   cmap='prism', sort='desc', n_bars=15,
                   steps_per_period=3, period_length=2000,
                   bar_size=0.8,
                   period_label={'x': .99, 'y': .15, 'ha': 'right', 'color': 'coral'},
                   bar_label_size=6, tick_label_size=10,
                   bar_kwargs={'alpha': 0.4, 'ec': 'black', 'lw': 2.5},
                   title='Sales By Months',
                   period_summary_func=summary,
                   perpendicular_bar_func=func,
                   figsize=(7, 5))
```
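As with the summary function, the quantile function can be tried on its own first. This sketch uses eleven made-up values so the result is easy to verify by hand. The Series is hypothetical, and `ranks` is only a stand-in for what the library would pass.

```python
import pandas as pd

def func(values, ranks):
    return values.quantile(0.9)

# Hypothetical sales for one month: 0, 10, 20, ..., 100
values = pd.Series([float(v) for v in range(0, 101, 10)])
ranks = values.rank(ascending=False)  # stand-in; func() does not use it

line_position = func(values, ranks)
print(line_position)  # 90.0
print(values.mean())  # 50.0
```

With these numbers the 0.9 quantile sits at 90 while the mean is 50, so the quantile line would be drawn much further out than the mean line from the previous plot.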

If you like vertical bars instead of horizontal ones, please use `orientation='v'`. I prefer horizontal bars.

Please feel free to try out some more style options or different functions for the summary or the perpendicular bar.

You may also want to try different colors, fonts, and font sizes for labels and ticks.

## Conclusion

Some people may argue that a simple bar plot shows the information more clearly, and I agree. Still, it is good to have some interesting plots like these in your collection. You can see how the ranking of the states changes each month with the sales data, and how the mean sales or the 90th percentile moves. You cannot get a feel for this type of change from a still plot. Interesting-looking plots also grab attention.