Use case

This notebook demonstrates how method chaining can be used in Python to make code more readable.
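To give a first impression of the idea, here is a minimal sketch that contrasts the two styles on a tiny hand-built frame. The counts for 3/28/20 and 3/29/20 are the Afghanistan and Albania values that appear in this notebook's output. The derived column `new` (cases added on 3/29/20) is only an illustration. Each pandas call in the chain returns a new DataFrame, so the calls can be written one after another without overwriting an intermediate variable.

```python
import pandas as pd

# Tiny wide frame: counts copied from the notebook output, two dates only
raw = pd.DataFrame(
    {"country": ["Afghanistan", "Albania"],
     "3/28/20": [110, 197],
     "3/29/20": [120, 212]}
)

# Classical style: every step reassigns or mutates the same variable
df = raw.set_index("country")
df["new"] = df["3/29/20"] - df["3/28/20"]
df = df.sort_values("new", ascending=False)

# Chained style: one expression, each call returns a new DataFrame
chained = (
    raw.set_index("country")
       .assign(new=lambda d: d["3/29/20"] - d["3/28/20"])  # illustrative derived column
       .sort_values("new", ascending=False)
)

print(chained)
```

Both versions produce the same frame. The chained one reads top to bottom as a list of transformations and leaves `raw` untouched.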

Links to other resources:

Imports

# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# conventional way to import pandas
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

The data

The example data comes from the Novel Coronavirus (COVID-19) Cases dataset provided by JHU CSSE, which is available on their GitHub page.

This dataset has been used extensively during the coronavirus outbreak, for example to plot the latest numbers of infected people.

corona_data_url='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
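The file behind this URL is in wide format: a few descriptive columns per country or province (name, latitude, longitude) followed by one column per reporting date. The sketch below reproduces that layout with a hypothetical two-row excerpt. Only two date columns are kept, and the counts are the ones shown in this notebook's output. It then applies the same `index_col` argument that the notebook uses in the next cell.

```python
import io
import pandas as pd

# Hypothetical excerpt with the same descriptive columns as the JHU file;
# Province/State is empty for these two countries, so it is read as NaN
csv_excerpt = """Country/Region,Province/State,Lat,Long,3/28/20,3/29/20
Afghanistan,,33.0,65.0,110,120
Albania,,41.1533,20.1683,197,212
"""

# The four descriptive columns become a MultiIndex; the date columns stay as data
df = pd.read_csv(io.StringIO(csv_excerpt),
                 index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
print(df)
```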

## The classical notebook way

df = pd.read_csv(corona_data_url,index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
df.head(2)

# name the column axis 'date' (every column is a reporting date)
df.columns.name = 'date'
df.head(2)
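Assigning to `df.columns.name` changes the frame in place and returns nothing, so it cannot be part of a chain. A chain-friendly alternative is `DataFrame.rename_axis(columns=...)`, which returns a new frame whose column axis carries the name. The small frame below is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Small wide frame: one row per country, one column per date
wide = pd.DataFrame(
    {"3/28/20": [110, 197], "3/29/20": [120, 212]},
    index=pd.Index(["Afghanistan", "Albania"], name="Country/Region"),
)

# rename_axis returns a new DataFrame, so further calls can follow it
named = wide.rename_axis(columns="date")
print(named.columns.name)  # date
print(wide.columns.name)   # None: the original frame is unchanged
```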

df['type'] = 'confirmed'

df = (df.set_index('type', append=True)
            .reset_index(['Lat', 'Long'], drop=True)
            .stack()
            .reset_index()
            .set_index('date')
         )
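The chain above reshapes the wide table into a long one. To see what each call contributes, this sketch runs the same calls one at a time on a hypothetical two-row excerpt (two date columns only) and inspects the index after each step. The names `step1` to `step5` exist only for this illustration.

```python
import io
import pandas as pd

# Hypothetical excerpt in the wide layout of the JHU file
csv_excerpt = """Country/Region,Province/State,Lat,Long,3/28/20,3/29/20
Afghanistan,,33.0,65.0,110,120
Albania,,41.1533,20.1683,197,212
"""
wide = pd.read_csv(io.StringIO(csv_excerpt),
                   index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
wide['type'] = 'confirmed'
wide.columns.name = 'date'

step1 = wide.set_index('type', append=True)           # 'type' becomes a fifth index level
step2 = step1.reset_index(['Lat', 'Long'], drop=True)  # coordinates are dropped
step3 = step2.stack()                                  # date columns move into an inner 'date' level (Series)
step4 = step3.reset_index()                            # all index levels become ordinary columns
step5 = step4.set_index('date')                        # dates become the index; counts sit in column 0

for label, obj in [('step1', step1), ('step2', step2), ('step3', step3)]:
    print(label, list(obj.index.names))
print(list(step5.columns))
```

The counts end up in a column labelled 0 because the stacked Series has no name. That is why the notebook later assigns explicit column names.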

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _get_level_number(self, level)
   1294         try:
-> 1295             level = self.names.index(level)
   1296         except ValueError:

ValueError: 'Lat' is not in list

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-17-248097334945> in <module>
      1 df = (df.set_index('type', append=True)
----> 2             .reset_index(['Lat', 'Long'], drop=True)
      3             .stack()
      4             .reset_index()
      5             .set_index('date')

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4563             if not isinstance(level, (tuple, list)):
   4564                 level = [level]
-> 4565             level = [self.index._get_level_number(lev) for lev in level]
   4566             if len(level) < self.index.nlevels:
   4567                 new_index = self.index.droplevel(level)

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/frame.py in <listcomp>(.0)
   4563             if not isinstance(level, (tuple, list)):
   4564                 level = [level]
-> 4565             level = [self.index._get_level_number(lev) for lev in level]
   4566             if len(level) < self.index.nlevels:
   4567                 new_index = self.index.droplevel(level)

~/miniconda3/envs/github_page/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _get_level_number(self, level)
   1296         except ValueError:
   1297             if not is_integer(level):
-> 1298                 raise KeyError(f"Level {level} not found")
   1299             elif level < 0:
   1300                 level += self.nlevels

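KeyError: 'Level Lat not found'

The chain itself is not the problem. The error appears because this cell had already been run once, so df was already reshaped and no longer has the Lat and Long index levels. Because every step reassigns df, re-executing a cell applies the transformation to an already transformed frame. This is a typical pitfall of the classical notebook style.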
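One way to avoid this is to keep the raw download in its own variable and put the whole transformation into a single chain inside a function that never modifies its input. Re-running the cell then always starts from the same raw data. The sketch below assumes a hypothetical two-row excerpt and a hypothetical helper name `to_long`. It uses `assign` and `rename_axis` in place of the in-place statements `df['type'] = ...` and `df.columns.name = ...` so that everything fits into one expression.

```python
import io
import pandas as pd

csv_excerpt = """Country/Region,Province/State,Lat,Long,3/28/20,3/29/20
Afghanistan,,33.0,65.0,110,120
Albania,,41.1533,20.1683,197,212
"""

# The raw data is read once and never modified afterwards
raw = pd.read_csv(io.StringIO(csv_excerpt),
                  index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])


def to_long(wide: pd.DataFrame, kind: str) -> pd.DataFrame:
    """Hypothetical helper: reshape a JHU-style wide frame into long format.

    Every call in the chain returns a new object, so `wide` is left untouched.
    """
    return (wide.assign(type=kind)                      # add the 'type' column on a copy
                .set_index('type', append=True)
                .reset_index(['Lat', 'Long'], drop=True)
                .rename_axis(columns='date')            # name the date axis before stacking
                .stack()
                .reset_index()
                .set_index('date'))


first = to_long(raw, 'confirmed')
second = to_long(raw, 'confirmed')  # calling it again gives the same result
```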

base_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_{name}_global.csv'


def read_jhu_data(name):
    # hypothetical helper name; `name` fills the {name} placeholder, e.g. 'confirmed'
    url = base_url.format(name=name)
    df = pd.read_csv(url,
                     index_col=['Country/Region', 'Province/State', 'Lat', 'Long'])
    df['type'] = name.lower()
    df.columns.name = 'date'

    # wide -> long: one row per region, date and type
    df = (df.set_index('type', append=True)
            .reset_index(['Lat', 'Long'], drop=True)
            .stack()
            .reset_index()
            .set_index('date'))
    df.index = pd.to_datetime(df.index, format='%m/%d/%y')
    df.columns = ['country', 'state', 'type', 'cases']

    # Move Hong Kong to country level
    df.loc[df.state == 'Hong Kong', 'country'] = 'Hong Kong'
    df.loc[df.state == 'Hong Kong', 'state'] = np.nan

    # Aggregate large countries split by states (sum only the case counts)
    df = pd.concat([df,
                    (df.loc[df.state.notna()]
                       .groupby(['country', 'date', 'type'])[['cases']]
                       .sum()
                       .rename(index=lambda x: x + ' (total)', level=0)
                       .reset_index(level=['country', 'type']))
                    ])
    return df


df = read_jhu_data('confirmed')
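The cleaning steps above still alternate between statements that overwrite df. Here is a sketch of how they might be folded into a single chain. `assign` with lambdas replaces the two `.loc` assignments: `Series.mask` puts "Hong Kong" into `country`, or NaN into `state`, wherever the state is Hong Kong. `pipe` then hands the intermediate frame to a helper that appends the per-country totals. The input is a toy long-format frame with made-up counts (not real data) in the shape the cell produces, and the helper name `add_state_totals` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy long-format frame with made-up counts, same columns as the notebook's df
df = pd.DataFrame(
    {"country": ["China", "China", "Albania"],
     "state": ["Hong Kong", "Hubei", np.nan],
     "type": "confirmed",
     "cases": [1, 2, 3]},
    index=pd.DatetimeIndex(["2020-03-29"] * 3, name="date"),
)


def add_state_totals(frame):
    """Hypothetical helper: append a '<country> (total)' row per date and type
    for every country that is split into states."""
    totals = (frame.loc[frame["state"].notna()]
                   .groupby(["country", "date", "type"])[["cases"]]
                   .sum()
                   .rename(index=lambda x: x + " (total)", level=0)
                   .reset_index(level=["country", "type"]))
    return pd.concat([frame, totals])


clean = (
    df.assign(
        # assign evaluates these in order; both test the original 'state' value
        country=lambda d: d["country"].mask(d["state"].eq("Hong Kong"), "Hong Kong"),
        state=lambda d: d["state"].mask(d["state"].eq("Hong Kong")),
      )
      .pipe(add_state_totals)
)
print(clean)
```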

				1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	3/20/20	3/21/20	3/22/20	3/23/20	3/24/20	3/25/20	3/26/20	3/27/20	3/28/20	3/29/20
Country/Region	Province/State	Lat	Long
Afghanistan	NaN	33.0000	65.0000	0	0	0	0	0	0	0	0	0	0	...	24	24	40	40	74	84	94	110	110	120
Albania	NaN	41.1533	20.1683	0	0	0	0	0	0	0	0	0	0	...	70	76	89	104	123	146	174	186	197	212

			date	1/22/20	1/23/20	1/24/20	1/25/20	1/26/20	1/27/20	1/28/20	1/29/20	1/30/20	1/31/20	...	3/20/20	3/21/20	3/22/20	3/23/20	3/24/20	3/25/20	3/26/20	3/27/20	3/28/20	3/29/20
Country/Region	Province/State	Lat	Long
Afghanistan	NaN	33.0000	65.0000	0	0	0	0	0	0	0	0	0	0	...	24	24	40	40	74	84	94	110	110	120
Albania	NaN	41.1533	20.1683	0	0	0	0	0	0	0	0	0	0	...	70	76	89	104	123	146	174	186	197	212

Method chaining in python
