Use case

This notebook demonstrates how method chaining can be used in Python to make code more readable.
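
As a minimal sketch of the idea, the following compares the step-by-step style with a chained version of the same operations. The toy DataFrame and its column names are made up for illustration and are not part of the dataset used later.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two styles.
raw = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Cases": [1, 3, 2, 5],
})

# Step-by-step style: every intermediate result is assigned to a variable.
step = raw.rename(columns=str.lower)
step = step.groupby("country")
step = step["cases"].sum()
step = step.reset_index()

# Method-chaining style: the same steps, read top to bottom as one expression.
chained = (
    raw.rename(columns=str.lower)
       .groupby("country")["cases"]
       .sum()
       .reset_index()
)

print(chained)
```

Both variants produce the same result; the chained form avoids repeatedly reassigning an intermediate variable.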

Links to other resources:


# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
# conventional way to import pandas
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

The data

The example data comes from the Novel Coronavirus (COVID-19) Cases dataset, provided by JHU CSSE and available on their GitHub page.

This dataset has been used extensively during the coronavirus outbreak, e.g. to visualize the latest numbers of infected people as plots.
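
To make the layout of the file concrete, here is a small sketch that reads a two-row excerpt in the same wide format (one row per region, one column per reporting date). The inline CSV is a hypothetical stand-in for the real file: its values are taken from the data preview in this notebook, limited to three dates, and the column order in the file is an assumption.

```python
import io

import pandas as pd

# Hypothetical excerpt; the real file has one column for every reporting date.
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,3/27/20,3/28/20,3/29/20\n"
    ",Afghanistan,33.0,65.0,110,110,120\n"
    ",Albania,41.1533,20.1683,186,197,212\n"
)

# The four descriptive columns become the row index, the dates stay as columns.
wide = pd.read_csv(
    sample_csv,
    index_col=["Country/Region", "Province/State", "Lat", "Long"],
)

print(wide.shape)
print(list(wide.columns))
```

Because every date is a separate column, the frame has to be reshaped into a long format before it can be indexed by date.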

## The classical notebook way
df = pd.read_csv(corona_data_url, index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20
Country/Region Province/State Lat Long
Afghanistan NaN 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 ... 24 24 40 40 74 84 94 110 110 120
Albania NaN 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 ... 70 76 89 104 123 146 174 186 197 212

2 rows × 68 columns

# columns to lower case and renaming
df.columns.name = 'date'
date 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20
Country/Region Province/State Lat Long
Afghanistan NaN 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 ... 24 24 40 40 74 84 94 110 110 120
Albania NaN 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 ... 70 76 89 104 123 146 174 186 197 212

2 rows × 68 columns

df['type'] = 'confirmed'
df.columns.name = 'date'
df = (df.set_index('type', append=True)
        .reset_index(['Lat', 'Long'], drop=True))
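# `url` (location of the CSV) and `name` (e.g. 'Confirmed') must be defined beforehand
df = pd.read_csv(url,
                 index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df['type'] = name.lower()
df.columns.name = 'date'
# The reshaping steps after reset_index are an assumed reconstruction:
# the lines below need a date index and exactly four columns
df = (df.set_index('type', append=True)
        .reset_index(['Lat', 'Long'], drop=True)
        .stack()
        .reset_index()
        .set_index('date'))
df.index = pd.to_datetime(df.index)
df.columns = ['country', 'state', 'type', 'cases']
# Move HK to country level
df.loc[df.state == 'Hong Kong', 'country'] = 'Hong Kong'
df.loc[df.state == 'Hong Kong', 'state'] = np.nan
# Aggregate large countries split by states
# (row selection and sum are assumed: only rows that have a state are totalled)
df = pd.concat([df,
                (df.loc[~df.state.isna()]
                   .groupby(['country', 'date', 'type'])
                   .sum(numeric_only=True)
                   .rename(index=lambda x: x + ' (total)', level=0)
                   .reset_index(level=['country', 'type']))
               ])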
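
For comparison, the same preparation steps can be sketched as a single method chain. This is only one possible arrangement, not the notebook's definitive version: the function and helper names (`load_timeseries`, `move_hong_kong`, `add_country_totals`) are hypothetical, `melt` is used here in place of `stack` for the wide-to-long reshape, and the inline CSV is a made-up excerpt (the Albania values come from the preview above, the China rows are invented).

```python
import io

import pandas as pd

# Hypothetical excerpt in the wide layout; the China rows are invented.
SAMPLE_CSV = (
    "Province/State,Country/Region,Lat,Long,3/28/20,3/29/20\n"
    ",Albania,41.1533,20.1683,197,212\n"
    "Hong Kong,China,22.3,114.2,1,2\n"
    "Hubei,China,30.98,112.27,3,4\n"
    "Beijing,China,40.18,116.41,5,6\n"
)


def move_hong_kong(df):
    """Move Hong Kong from state level to country level."""
    is_hk = df["state"] == "Hong Kong"
    return df.assign(
        country=df["country"].mask(is_hk, "Hong Kong"),
        state=df["state"].mask(is_hk),  # masked entries become missing
    )


def add_country_totals(df):
    """Append one '<country> (total)' row per date for countries split by state."""
    totals = (
        df.loc[df["state"].notna()]
          .reset_index()
          .groupby(["country", "date", "type"], as_index=False)["cases"]
          .sum()
          .assign(country=lambda d: d["country"] + " (total)")
          .set_index("date")
    )
    return pd.concat([df, totals])


def load_timeseries(source, name):
    """Read a wide CSV and return a long frame indexed by date."""
    return (
        pd.read_csv(source)
          .drop(columns=["Lat", "Long"])
          .rename(columns={"Country/Region": "country", "Province/State": "state"})
          .melt(id_vars=["country", "state"], var_name="date", value_name="cases")
          .assign(
              date=lambda d: pd.to_datetime(d["date"], format="%m/%d/%y"),
              type=name.lower(),
          )
          .set_index("date")
          .loc[:, ["country", "state", "type", "cases"]]
          .pipe(move_hong_kong)
          .pipe(add_country_totals)
    )


result = load_timeseries(io.StringIO(SAMPLE_CSV), "Confirmed")
print(result)
```

Steps that do not fit a built-in method are wrapped in small functions and attached with `.pipe`, so the whole preparation still reads as one top-to-bottom sequence without intermediate reassignments of `df`.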
