I have two DataFrames
(with DatetimeIndex
) and want to update the first frame (the older one) with data from the second frame (the newer one).
In addition to previous answer, after reindexing you can use
result.fillna(df1, inplace=True)
so based on Jianxun Li's code (extended with one more column) you can try this
# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(20).reshape(4,5)*2, columns='A B C D E'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))
# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))
# fill NaN from df1
result.fillna(df1, inplace=True)
Out[3]:
A B C D E
2015-07-09 12:00:00 1 1 1 NaN NaN
2015-07-09 13:00:00 1 1 1 NaN NaN
2015-07-09 14:00:00 2 2 2 2 2
2015-07-09 15:00:00 2 2 2 2 2
2015-07-09 16:00:00 2 2 2 2 2
2015-07-09 17:00:00 2 2 2 2 2
You can use the combine
function.
import pandas as pd
# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(16).reshape(4,4)*2, columns='A B C D'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))
# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))
Out[248]:
A B C D
2015-07-09 12:00:00 NaN NaN NaN NaN
2015-07-09 13:00:00 NaN NaN NaN NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
combiner = lambda x, y: np.where(x.isnull(), y, x)
# use df1 to update result
result.combine(df1, combiner)
Out[249]:
A B C D
2015-07-09 12:00:00 1 1 1 NaN
2015-07-09 13:00:00 1 1 1 NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
# maybe fillna(method='ffill') if you like
df2.combine_first(df1)
(documentation)
seems to serve your requirement; PFB code snippet & output
import pandas as pd
print 'pandas-version: ', pd.__version__
df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
('2015-07-09 13:00:00',1,1,1),
('2015-07-09 14:00:00',1,1,1),
('2015-07-09 15:00:00',1,1,1)],
columns=['Dt', 'A', 'B', 'C']).set_index('Dt')
# print df1
df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
('2015-07-09 15:00:00',2,2,2,2),
('2015-07-09 16:00:00',2,2,2,2),
('2015-07-09 17:00:00',2,2,2,2),],
columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')
res_combine1st = df2.combine_first(df1)
print res_combine1st
pandas-version: 0.15.2
A B C D
Dt
2015-07-09 12:00:00 1 1 1 NaN
2015-07-09 13:00:00 1 1 1 NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2