## Groupby function in Pandas Library For Efficient Data Analysis

Pandas library in Python is an excellent tool for data analysis. It has numerous useful functions. One such useful function is ‘groupby’. It can be used in many different ways. In this article, I am going to show some use of this function with a simple example.

First, import the libraries’ pandas and NumPy. Then import the dataset. This dataset is not a real-world dataset. I made it up. This dataset shows weather data. But these codes and methods will work on any kind of data.

`import pandas as pdimport numpy as npimport matplotlib.pyplot as plt`
`df = pd.read_excel("weather_cities.xlsx")df` 1. Group this dataset by cities and return the keys that are the list of the cities.
`df.groupby('city').groups.keys()'''OUtputdict_keys(['Boston', 'Houston', 'Miami', 'Paris'])`

2. Return the data of Miami only.

`df.groupby('city').get_group('Miami')` 3. Return the maximum temperature of each city.

`df.groupby('city')['temperature'].max()'''output:city Boston      38 Houston    101 Miami      100 Paris       54 Name: temperature, dtype: int64'''`

With this same syntax, 13 other aggregation is possible. Here is the list:

sum()

mean()

size()

count()

std()

var()

sem()

first()

last()

nth()

min()

skew()

median()

4. Calculate the mean temperature for each day in each city.

`s = df.groupby(['city', 'day']).temperature.mean()` 5. Return the mean temperature for each day in Miami.

`s.loc['Miami]'''Output:day 2017-01-01    90 2017-01-02    85 2017-01-03    87 2017-01-04    92 Name: temperature, dtype: int64'''`

6. Using the DataFrame s above, get the temperature of Miami in 2017–01–01.

`s.loc['Miami', '2017-01-01']'''Output is 90.'''`

7. Make a table of average temperature and a median temperature of each city.

`df.groupby('city')['temperature'].agg({'avg_temp': np.average, 'median_temp': np.median}` 8. Present the average and median for both temperature and median for each city.

`df.groupby('city')['temperature', 'windspeed'].agg({'average': np.average, 'median': np.median})` 9. Build a table to present the maximum temperature and the size of the data for each city.

`a = df.groupby("city")['temperature'].agg([lambda z: np.mean(z), 'size'])a.columns = ["Maximum", 'Total']` Miami and Houston were the hottest cities. At the same time, Boston had a freezing temperature.

10. It is important to know that if we have enough data to make a decision. Make a plot to show how much data do we have for each city.

`plt.clf()df.groupby('city').size().plot(kind='bar')` 11. Make a bar graph of minimum temperature in each city.

`plt.clf()df.groupby('day')['temperature'].min().plot(kind='bar')` 