In time-series analysis, you often need to generate dates yourself. Sometimes the data was recorded without timestamps; sometimes you have to reuse one country’s data for research on another country, or last year’s data for this year. Holidays differ from year to year and from country to country. This article shows:
a. how to use the built-in holiday calendar.
b. how to generate a custom holiday calendar.
c. how to incorporate a series of dates into a dataset.
Time Series Considering Holidays
1. Generate a time series that accounts for holidays.
Pandas already has a US federal holiday calendar built in. Use ‘CustomBusinessDay’ to create a custom frequency, pass it the built-in US holiday calendar, and then use this custom business day as the ‘freq’ argument of ‘date_range’.
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day frequency that also skips US federal holidays
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
pd.date_range('7/1/2018', '7/10/2018', freq=usb)

#Output:
DatetimeIndex(['2018-07-02', '2018-07-03', '2018-07-05', '2018-07-06', '2018-07-09', '2018-07-10'], dtype='datetime64[ns]', freq='C')
I used the range from July 1 to July 10, 2018. Look at the output: July 3 is followed by July 5, because July 4 is a holiday. July 1, 7, and 8 are missing as well, because they fall on a weekend.
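To see which holidays the calendar actually removed from a range, you can ask the calendar directly. The snippet below is a small sketch that uses the calendar's ‘holidays’ method with the same dates as above; the variable names are mine, and the holiday's label may differ between pandas versions, so only the date is checked.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()

# return_name=True gives a Series: the index holds the observed dates,
# the values hold the holiday names
holidays_in_range = cal.holidays(start='2018-07-01', end='2018-07-10', return_name=True)
print(holidays_in_range)

# The only federal holiday in this range is July 4, 2018
holiday_dates = list(holidays_in_range.index)
```

This is a convenient sanity check before you build a frequency from a calendar you did not write yourself.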
2. Next, make your own holiday calendar. If your client is not from the US, you need a custom holiday calendar, because other countries’ holidays differ from those in the US.
To keep this tutorial simple, assume that I own a big corporation and declare my anniversary a company holiday. That is very unlikely; I am doing it only to demonstrate how to make a custom holiday calendar.
Pandas lets you subclass its calendar classes to make your own. In the pandas holiday module (‘pandas.tseries.holiday’), you will find the class ‘USFederalHolidayCalendar’, which looks like this:
class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    """
    rules = [
        Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
        # ... (list abridged)
    ]
Now, change the class name and the rules to make a custom holiday calendar. I am assuming my anniversary is on March 20. Set ‘observance’ to ‘nearest_workday’: if the date falls on a Saturday, the holiday is observed on the preceding Friday, and if it falls on a Sunday, on the following Monday. There are a few other options for ‘observance’, such as ‘sunday_to_monday’, ‘next_monday_or_tuesday’, ‘previous_friday’, and ‘next_monday’. Please feel free to try them yourself.
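If you want to compare the observance options before choosing one, you can call them directly: each one is a plain function that takes a date and returns the observed date. The sketch below applies them to a weekend of my own choosing (March 20 and 21, 2021, a Saturday and a Sunday); the ‘results’ dictionary is only for illustration.

```python
from datetime import datetime
from pandas.tseries.holiday import (
    nearest_workday,
    sunday_to_monday,
    next_monday,
    previous_friday,
    next_monday_or_tuesday,
)

saturday = datetime(2021, 3, 20)  # March 20, 2021 is a Saturday
sunday = datetime(2021, 3, 21)    # March 21, 2021 is a Sunday

rules = (nearest_workday, sunday_to_monday, next_monday,
         previous_friday, next_monday_or_tuesday)

# Map each rule name to the observed dates for the Saturday and the Sunday
results = {rule.__name__: (rule(saturday).date(), rule(sunday).date())
           for rule in rules}

for name, (sat_observed, sun_observed) in results.items():
    print(f"{name}: Saturday -> {sat_observed}, Sunday -> {sun_observed}")
```

Note that ‘sunday_to_monday’ leaves a Saturday holiday where it is, while ‘next_monday_or_tuesday’ moves a Sunday holiday to Tuesday.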
from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday

class MyAnniversaryCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("MyAnniversary", month=3, day=20, observance=nearest_workday)
    ]
Here I made my anniversary calendar. Now create a custom frequency from the class ‘MyAnniversaryCalendar’ and pass it as the value of the ‘freq’ parameter.
myday = CustomBusinessDay(calendar=MyAnniversaryCalendar())
pd.date_range('3/15/2020', periods=12, freq=myday)

#Output:
DatetimeIndex(['2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-23', '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-30', '2020-03-31', '2020-04-01'], dtype='datetime64[ns]', freq='C')
Take a look at the output. March 19 is followed by March 23: March 20, 2020 is a Friday, so the holiday creates a three-day weekend!
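In 2020 the anniversary happens to fall on a weekday, so the effect of ‘nearest_workday’ is not visible. As a rough illustration, the sketch below lists the observed holiday over three years; the date range and the variable name ‘observed’ are my own choices.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday

class MyAnniversaryCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("MyAnniversary", month=3, day=20, observance=nearest_workday)
    ]

# Observed holiday dates between 2020 and 2022
observed = MyAnniversaryCalendar().holidays(start='2020-01-01', end='2022-12-31')
print(observed)
# 2020-03-20 is a Friday, so it stays as it is.
# 2021-03-20 is a Saturday, so it is observed on Friday, 2021-03-19.
# 2022-03-20 is a Sunday, so it is observed on Monday, 2022-03-21.
```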
3. Some countries have a different work week. For example, countries such as Egypt and Qatar have Friday and Saturday as their weekend, so their business calendar should be different. Here is how to define the weekly business days and use them:
b = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
pd.date_range('6/20/2020', '6/30/2020', freq=b)

#Output:
DatetimeIndex(['2020-06-21', '2020-06-22', '2020-06-23', '2020-06-24', '2020-06-25', '2020-06-28', '2020-06-29', '2020-06-30'], dtype='datetime64[ns]', freq='C')
Check which dates are missing from the output. You will see they are Fridays and Saturdays.
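Instead of checking by eye, you can let pandas do it. This is a minimal sketch that compares the custom range with a plain daily range over the same dates and prints the weekday names of the dates that were skipped; the helper variable names are mine.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

b = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')

business_days = pd.date_range('6/20/2020', '6/30/2020', freq=b)
all_days = pd.date_range('6/20/2020', '6/30/2020', freq='D')

# Dates that are in the daily range but not in the business-day range
missing = all_days.difference(business_days)
missing_names = sorted(set(missing.day_name()))

print(missing)
print(missing_names)
```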
Use the Date Series in a Dataset
In this section, we will generate a series of dates with the ‘date_range’ function and use it in a dataset. I am using a Facebook stock dataset for this exercise. First, import the dataset:
df = pd.read_csv('FB_data_with_no_date.csv')
Now, generate the time series: the start date is January 1, 2020, ‘periods’ is the length of the dataset, and the frequency is ‘B’ (business days).
rng = pd.date_range('1/1/2020', periods = len(df), freq='B')
Set this time series as the index of the Facebook stock dataset.
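One way to do this is with ‘set_index’. The sketch below is self-contained, so it replaces the CSV file with a tiny made-up DataFrame; the column name ‘Close’ and its values are placeholders, not real Facebook prices.

```python
import pandas as pd

# Placeholder for pd.read_csv('FB_data_with_no_date.csv'); the values are made up
df = pd.DataFrame({'Close': [100.0, 101.5, 99.8, 102.3, 103.1, 104.0]})

# One business day per row, starting on January 1, 2020
rng = pd.date_range('1/1/2020', periods=len(df), freq='B')

# Use the generated dates as the index of the dataset
df = df.set_index(rng)
print(df)
```

Keep in mind that ‘B’ skips only Saturdays and Sundays, so January 1, 2020 is kept even though it is a holiday. If that matters for your data, a ‘CustomBusinessDay’ frequency like the ones from the first section could be passed as ‘freq’ instead.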
This is the time series dataset, ready to use for time series data analysis or forecasting.
I hope it was helpful.