Method chaining in Python
An example of how method chaining in Python can make code more readable and less error-prone.
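To make the idea concrete before turning to the real data, here is a minimal sketch on a toy table. The DataFrame `raw` and its values are invented for illustration only; the point is just that the chained version performs the same steps without reassigning an intermediate variable.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
raw = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Cases": [1, 3, 2, 5],
})

# Step by step: each intermediate result is stored and overwritten,
# so re-running cells out of order can leave `tmp` in an unexpected state
tmp = raw.rename(columns=str.lower)
tmp = tmp[tmp["cases"] > 1]
stepwise = tmp.groupby("country")["cases"].sum()

# Chained: the same steps as one expression, read from top to bottom
chained = (
    raw.rename(columns=str.lower)
    .query("cases > 1")
    .groupby("country")["cases"]
    .sum()
)
```

Both variants give the same result; the chained one never leaves a half-processed `tmp` lying around in the notebook.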
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
# conventional way to import pandas
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
The data
The example data comes from the Novel Coronavirus (COVID-19) Cases dataset, provided by JHU CSSE and available on their GitHub page.
This dataset has been used extensively during the coronavirus outbreak, for example to plot the latest numbers of infected people.
corona_data_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
## The classical notebook way
df = pd.read_csv(corona_data_url, index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df.head(2)
# name the column axis 'date' (the remaining column labels are the dates)
df.columns.name = 'date'
df.head(2)
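# tag every row with the kind of count this file contains
df['type'] = 'confirmed'
df.columns.name = 'date'
# reshape from wide (one column per date) to long (one row per date)
df = (df.set_index('type', append=True)            # move 'type' into the index
        .reset_index(['Lat', 'Long'], drop=True)   # drop the coordinates
        .stack()                                   # date columns -> rows
        .reset_index()
        .set_index('date')
      )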
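The reshape above is easier to follow on a tiny table. The sketch below applies the same chain to a hypothetical two-row stand-in for the JHU file (the countries, states, and counts are invented), so the shape of the result can be inspected directly.

```python
import pandas as pd

# Hypothetical stand-in for the JHU file; all values are invented
wide = pd.DataFrame({
    "Country/Region": ["X", "Y"],
    "Province/State": ["P", "Q"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 0],
    "1/23/20": [2, 3],
}).set_index(["Country/Region", "Province/State", "Lat", "Long"])

wide["type"] = "confirmed"
wide.columns.name = "date"

# Same chain as above: wide (one column per date) -> long (one row per date)
tidy = (
    wide.set_index("type", append=True)
    .reset_index(["Lat", "Long"], drop=True)
    .stack()
    .reset_index()
    .set_index("date")
)
```

Two rows with two date columns become four rows indexed by date. The last column holds the counts and has no meaningful name yet, which is why the columns are renamed in a later step.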
base_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_{name}_global.csv'
# Assumption: `name` selects the series to load; 'confirmed' matches the file used above
name = 'confirmed'
url = base_url.format(name=name)
df = pd.read_csv(url,
                 index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df['type'] = name.lower()
df.columns.name = 'date'
df = (df.set_index('type', append=True)
        .reset_index(['Lat', 'Long'], drop=True)
        .stack()
        .reset_index()
        .set_index('date')
      )
df.index = pd.to_datetime(df.index)
df.columns = ['country', 'state', 'type', 'cases']
# Move HK to country level
df.loc[df.state == 'Hong Kong', 'country'] = 'Hong Kong'
df.loc[df.state == 'Hong Kong', 'state'] = np.nan
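The two `.loc` assignments above modify `df` in place, so running the cell twice or out of order changes what they see. As a hedged sketch of a chain-friendly alternative, the same relabelling can be written with `assign` and `Series.mask`, which return a new DataFrame instead. The table `toy` is hypothetical and its values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "country": ["China", "China", "France"],
    "state": ["Hong Kong", "Hubei", np.nan],
    "cases": [1, 2, 3],
})

# mask() replaces values where the condition is True;
# without a replacement value it inserts NaN
moved = toy.assign(
    country=lambda d: d["country"].mask(d["state"] == "Hong Kong", "Hong Kong"),
    state=lambda d: d["state"].mask(d["state"] == "Hong Kong"),
)
```

The `country` column is assigned first, while `state` still holds its original values; `toy` itself stays untouched.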
# Aggregate large countries split by states into one '<country> (total)' row per date
df = pd.concat([
    df,
    (df.loc[~df.state.isna()]
       .groupby(['country', 'date', 'type'])[['cases']]  # sum only the numeric column
       .sum()
       .rename(index=lambda x: x + ' (total)', level=0)
       .reset_index(level=['country', 'type']))
])
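What the aggregation step produces is easiest to see on a small example. The sketch below uses a hypothetical three-row table (invented counts, one date) with the same column layout as `df`, and applies the same groupby, rename, and concat.

```python
import pandas as pd

# Hypothetical toy data with the same layout as df; values are invented
toy = pd.DataFrame(
    {
        "country": ["Australia", "Australia", "France"],
        "state": ["Victoria", "Queensland", None],
        "type": ["confirmed"] * 3,
        "cases": [2, 3, 7],
    },
    index=pd.DatetimeIndex(["2020-03-01"] * 3, name="date"),
)

# Sum the per-state rows and label the result '<country> (total)'
totals = (
    toy.loc[toy["state"].notna()]
    .groupby(["country", "date", "type"])[["cases"]]
    .sum()
    .rename(index=lambda x: x + " (total)", level=0)
    .reset_index(level=["country", "type"])
)

# Append the totals to the original rows
combined = pd.concat([toy, totals])
```

The two Australian states collapse into a single `Australia (total)` row with an empty `state`, while France, which has no state-level rows, is left as it was.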