I'm working through "Python for Data Analysis" and I don't understand one piece of functionality. Adding two pandas Series objects automatically aligns their indexes.
It makes more sense to use pd.concat(), since it can combine any number of Series as columns before summing:
import pandas as pd
import numpy as np
a = pd.Series([35000,71000,16000,5000],index=['Ohio','Texas','Oregon','Utah'])
b = pd.Series([np.nan,71000,16000,35000],index=['California', 'Texas', 'Oregon', 'Ohio'])
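# Line the two Series up as columns, then sum each row.
# min_count=1 leaves a row as NaN when it has no valid values at all.
pd.concat((a, b), axis=1).sum(axis=1, min_count=1)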
Output:
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah            5000.0
dtype: float64
Or with 3 series:
import pandas as pd
import numpy as np
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
a = pd.Series([1, np.nan, 4, 5])
b = pd.Series([3, np.nan, 5, np.nan])
c = pd.Series([np.nan, np.nan, np.nan, np.nan])
print(pd.concat((a,b,c), axis=1).sum(1, min_count=1))
#0 4.0
#1 NaN
#2 9.0
#3 5.0
#dtype: float64
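To see why min_count=1 matters in the examples above, here is a minimal sketch comparing it with the default. The small DataFrame and the names default_sum and strict_sum are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: the second row has no valid values at all.
df = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, np.nan]})

# Default (min_count=0): NaN is skipped, so an all-NaN row sums to 0.0.
default_sum = df.sum(axis=1)

# min_count=1: a row needs at least one valid value, otherwise it stays NaN.
strict_sum = df.sum(axis=1, min_count=1)

print(default_sum)
print(strict_sum)
```

Without min_count=1, a row with no data would be reported as 0.0, which is indistinguishable from a real total of zero.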
Pandas does not assume that 500+NaN=500, but it is easy to ask it to do that: a.add(b, fill_value=0)
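As a quick sketch of that call, here it is applied to the two state Series from the question. The variable name total is only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.Series([35000, 71000, 16000, 5000],
              index=["Ohio", "Texas", "Oregon", "Utah"])
b = pd.Series([np.nan, 71000, 16000, 35000],
              index=["California", "Texas", "Oregon", "Ohio"])

# fill_value=0 substitutes 0 when only ONE side is missing for a label.
# Utah exists only in a, so it is treated as 5000 + 0.
# California is missing in a and NaN in b, so the result stays NaN.
total = a.add(b, fill_value=0)
print(total)
```

Note that fill_value only applies when exactly one operand is missing for a given label; if both are missing, the result remains NaN.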
The default approach is to assume that any computation involving NaN gives NaN as the result. Anything plus NaN is NaN, anything divided by NaN is NaN, etc. If you want to fill the NaN with some value, you have to do that explicitly (as Dan Allan showed in his answer).
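A short sketch of that default propagation, reusing the Series from the question. The names plain, scalar_sum and scalar_div are just illustrative.

```python
import math

import numpy as np
import pandas as pd

# NaN propagates through ordinary float arithmetic.
scalar_sum = np.nan + 500
scalar_div = 500 / np.nan
print(math.isnan(scalar_sum), math.isnan(scalar_div))

a = pd.Series([35000, 71000, 16000, 5000],
              index=["Ohio", "Texas", "Oregon", "Utah"])
b = pd.Series([np.nan, 71000, 16000, 35000],
              index=["California", "Texas", "Oregon", "Ohio"])

# Plain + aligns on the index; any label missing from either side
# becomes NaN in the result (here Utah and California).
plain = a + b
print(plain)
```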