Use case

This notebook demonstrates how method chaining can make Python code more readable.
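As a minimal illustration of the idea, the sketch below runs the same three steps twice on a tiny made-up frame: once with intermediate assignments and once as a single chain. The frame `toy` and its column names are invented for this example and are not part of the COVID-19 data used later.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Cases": [1, 2, 3, 4],
})

# Step-by-step style: every step rebinds a variable
step = toy.copy()
step.columns = [c.lower() for c in step.columns]
step = step[step["cases"] > 1]
step = step.groupby("country")["cases"].sum()

# Chained style: the same steps read top to bottom as one expression
chained = (
    toy
    .rename(columns=str.lower)      # lower-case the column names
    .query("cases > 1")             # keep rows with more than one case
    .groupby("country")["cases"]
    .sum()                          # total cases per country
)
```

Both variants produce the same result; the chained one leaves `toy` untouched and has no intermediate state that can go stale between cell runs.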

Links to other resources:

Imports

# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
# conventional way to import pandas
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

The data

The example data comes from the Novel Coronavirus (COVID-19) Cases dataset, provided by JHU CSSE and available on their GitHub page.

This dataset was used extensively during the coronavirus outbreak, e.g. to plot the latest numbers of infected people.

corona_data_url='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
## The classical notebook way
df = pd.read_csv(corona_data_url,index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df.head(2)
1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20
Country/Region Province/State Lat Long
Afghanistan NaN 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 ... 24 24 40 40 74 84 94 110 110 120
Albania NaN 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 ... 70 76 89 104 123 146 174 186 197 212

2 rows × 68 columns

# name the column axis 'date' (the column labels are the dates)
df.columns.name = 'date'
df.head(2)
date 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20
Country/Region Province/State Lat Long
Afghanistan NaN 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 ... 24 24 40 40 74 84 94 110 110 120
Albania NaN 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 ... 70 76 89 104 123 146 174 186 197 212

2 rows × 68 columns
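Assigning to `df.columns.name` changes the frame in place, so it cannot be a step in a chain. `DataFrame.rename_axis` returns a new frame with the axis named and therefore fits into one. A small sketch on a made-up frame (the name `wide` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical miniature of the wide table: one column per date
wide = pd.DataFrame(
    {"1/22/20": [0, 1], "1/23/20": [2, 3]},
    index=pd.Index(["X", "Y"], name="Country/Region"),
)

# Chain-friendly equivalent of `wide.columns.name = 'date'`
named = wide.rename_axis(columns="date")
```

The original `wide` keeps its unnamed column axis; only the returned frame carries the name.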

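# tag every row with the series type and name the column axis
df['type'] = 'confirmed'
df.columns.name = 'date'
# reshape from wide (one column per date) to long (one row per date)
df = (
    df.set_index('type', append=True)           # add 'type' as an index level
      .reset_index(['Lat', 'Long'], drop=True)  # drop the coordinate levels
      .stack()                                  # move the dates from columns to rows
      .reset_index()
      .set_index('date')
)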
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _get_level_number(self, level)
   1294         try:
-> 1295             level = self.names.index(level)
   1296         except ValueError:

ValueError: 'Lat' is not in list

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-17-248097334945> in <module>
      1 df = (df.set_index('type', append=True)
----> 2             .reset_index(['Lat', 'Long'], drop=True)
      3             .stack()
      4             .reset_index()
      5             .set_index('date')

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4563             if not isinstance(level, (tuple, list)):
   4564                 level = [level]
-> 4565             level = [self.index._get_level_number(lev) for lev in level]
   4566             if len(level) < self.index.nlevels:
   4567                 new_index = self.index.droplevel(level)

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/frame.py in <listcomp>(.0)
   4563             if not isinstance(level, (tuple, list)):
   4564                 level = [level]
-> 4565             level = [self.index._get_level_number(lev) for lev in level]
   4566             if len(level) < self.index.nlevels:
   4567                 new_index = self.index.droplevel(level)

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _get_level_number(self, level)
   1296         except ValueError:
   1297             if not is_integer(level):
-> 1298                 raise KeyError(f"Level {level} not found")
   1299             elif level < 0:
   1300                 level += self.nlevels

KeyError: 'Level Lat not found'
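A `KeyError` like the one above is what one would expect when the reshaping cell runs on a frame that has already been reshaped, since `df` is overwritten and then no longer has the 'Lat' and 'Long' index levels. That reading is an assumption about what happened here. The sketch below reproduces the situation offline on a made-up miniature table. The names `raw` and `to_long` are hypothetical, and the function returns a new frame instead of overwriting its input.

```python
import pandas as pd

# Hypothetical miniature of the wide table (values are made up)
raw = pd.DataFrame(
    {
        "Country/Region": ["X", "Y"],
        "Province/State": ["P", "Q"],
        "Lat": [1.0, 2.0],
        "Long": [3.0, 4.0],
        "1/22/20": [0, 5],
        "1/23/20": [1, 6],
    }
).set_index(["Country/Region", "Province/State", "Lat", "Long"])


def to_long(wide, kind="confirmed"):
    """Return a long-format copy; the input frame is left untouched."""
    return (
        wide
        .assign(type=kind)                        # tag rows with the series type
        .rename_axis(columns="date")              # name the column axis
        .set_index("type", append=True)
        .reset_index(["Lat", "Long"], drop=True)  # drop the coordinate levels
        .stack()                                  # dates move from columns to rows
        .reset_index()
        .set_index("date")
    )


long = to_long(raw)

# Feeding the already reshaped frame back in fails, as in the traceback above
try:
    to_long(long)
    second_run_failed = False
except KeyError:
    second_run_failed = True
```

Because `raw` is never modified, calling `to_long(raw)` again is safe however often the cell is run.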
base_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_{name}_global.csv'
# pick one of the series (assumed here: 'confirmed') and build its URL
name = 'confirmed'
url = base_url.format(name=name)
df = pd.read_csv(url,
                 index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df['type'] = name.lower()
df.columns.name = 'date'
    
df = (
    df.set_index('type', append=True)
      .reset_index(['Lat', 'Long'], drop=True)
      .stack()
      .reset_index()
      .set_index('date')
)
df.index = pd.to_datetime(df.index)
df.columns = ['country', 'state', 'type', 'cases']
    
# Move Hong Kong from state level to country level
df.loc[df.state == 'Hong Kong', 'country'] = 'Hong Kong'
df.loc[df.state == 'Hong Kong', 'state'] = np.nan

# Aggregate countries that are split by state into a '<country> (total)' row
df = pd.concat([
    df,
    (df.loc[~df.state.isna()]
       .groupby(['country', 'date', 'type'])
       .sum(numeric_only=True)  # sum only 'cases'; the text column 'state' is left out
       .rename(index=lambda x: x + ' (total)', level=0)
       .reset_index(level=['country', 'type'])),
])
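The effect of the aggregation step can be checked offline on a tiny long-format frame. The sketch below uses made-up values and the hypothetical names `long`, `totals` and `combined`. It selects the `cases` column explicitly before summing, which is one way to keep the text column `state` out of the sum.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: country X is split into two states, Y is not
long = pd.DataFrame(
    {
        "country": ["X", "X", "Y"],
        "state": ["P", "Q", np.nan],
        "type": ["confirmed"] * 3,
        "cases": [1, 2, 5],
    },
    index=pd.Index(pd.to_datetime(["2020-01-22"] * 3), name="date"),
)

totals = (
    long
    .loc[long["state"].notna()]                        # only rows that belong to a state
    .groupby(["country", "date", "type"])[["cases"]]
    .sum()                                             # total per country, date and type
    .rename(index=lambda x: x + " (total)", level=0)   # mark the aggregated country rows
    .reset_index(level=["country", "type"])            # keep only 'date' as the index
)

# Append the totals to the per-state rows
combined = pd.concat([long, totals])
```

In `combined`, the aggregated row for "X (total)" has no state, so its `state` value is missing, just like for countries that were never split.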

Chaining