Setting DataFrame values with enlargement

前端未结


I have two DataFrames (with DatetimeIndex) and want to update the first frame (the older one) with data from the second frame (the newer one).
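As background for why "enlargement" matters here: `DataFrame.update` overwrites values in place, but only at row and column labels the first frame already has. The sketch below illustrates this. The names `older` and `newer` and the sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: two hourly frames that overlap on 14:00 and 15:00.
idx_old = pd.date_range("2015-07-09 12:00:00", periods=4, freq="60min")
idx_new = pd.date_range("2015-07-09 14:00:00", periods=4, freq="60min")
older = pd.DataFrame(1.0, index=idx_old, columns=list("ABC"))
newer = pd.DataFrame(2.0, index=idx_new, columns=list("ABCD"))

shape_before = older.shape
older.update(newer)  # in place; touches only labels that already exist in `older`
shape_after = older.shape

print(shape_before, shape_after)
print(older)
```

The overlapping rows take the newer values, but the rows for 16:00 and 17:00 and the column D never appear, so a different approach is needed when the result should grow.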

3 Answers

再見小時候

2020-12-20 17:29

In addition to the previous answer: after reindexing, you can use

result.fillna(df1, inplace=True)

So, based on Jianxun Li's code (extended with one more column), you can try this:

import numpy as np
import pandas as pd

# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4, 3), columns='A B C'.split(),
                   index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(20).reshape(4, 5) * 2, columns='A B C D E'.split(),
                   index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))

# processing
# =====================================================
# reindex df2 over the union of both indexes; rows found only in df1 become NaN
result = df2.reindex(df1.index.union(df2.index))
# fill those NaN cells from df1 (aligned on index and column labels)
result.fillna(df1, inplace=True)
result

Out[3]:             
                     A  B  C   D   E
2015-07-09 12:00:00  1  1  1 NaN NaN
2015-07-09 13:00:00  1  1  1 NaN NaN
2015-07-09 14:00:00  2  2  2   2   2
2015-07-09 15:00:00  2  2  2   2   2
2015-07-09 16:00:00  2  2  2   2   2
2015-07-09 17:00:00  2  2  2   2   2
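One caveat with the reindex-then-fillna approach: the result only has the columns of the frame that was reindexed, so a column that exists only in the older frame would be dropped. A minimal sketch that enlarges along both axes first, assuming that is what you want. The helper name `fill_from_older` and the sample frames are hypothetical.

```python
import pandas as pd


def fill_from_older(newer: pd.DataFrame, older: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: values from `newer` win, gaps are filled from `older`."""
    rows = older.index.union(newer.index)
    cols = older.columns.union(newer.columns)
    # Enlarge `newer` to the full grid; cells it does not cover become NaN.
    enlarged = newer.reindex(index=rows, columns=cols)
    # fillna with a DataFrame aligns on both index and column labels.
    return enlarged.fillna(older)


idx_old = pd.date_range("2015-07-09 12:00:00", periods=4, freq="60min")
idx_new = pd.date_range("2015-07-09 14:00:00", periods=4, freq="60min")
older = pd.DataFrame(1.0, index=idx_old, columns=list("ABC"))  # has C, no D
newer = pd.DataFrame(2.0, index=idx_new, columns=list("ABD"))  # has D, no C

result = fill_from_older(newer, older)
print(result)
```

Here column C survives with the older values for 12:00 through 15:00 and stays NaN afterwards, while column D is NaN for the two rows that only the older frame covers.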


夕颜

2020-12-20 17:35

You can use the DataFrame.combine method.

import numpy as np
import pandas as pd

# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))

df2 = pd.DataFrame(np.ones(16).reshape(4,4)*2, columns='A B C D'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))

# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))

Out[248]: 
                      A   B   C   D
2015-07-09 12:00:00 NaN NaN NaN NaN
2015-07-09 13:00:00 NaN NaN NaN NaN
2015-07-09 14:00:00   2   2   2   2
2015-07-09 15:00:00   2   2   2   2
2015-07-09 16:00:00   2   2   2   2
2015-07-09 17:00:00   2   2   2   2

# keep the value from result (x) where it is present, otherwise take it from df1 (y)
combiner = lambda x, y: np.where(x.isnull(), y, x)

# use df1 to update result
result.combine(df1, combiner)

Out[249]: 
                     A  B  C   D
2015-07-09 12:00:00  1  1  1 NaN
2015-07-09 13:00:00  1  1  1 NaN
2015-07-09 14:00:00  2  2  2   2
2015-07-09 15:00:00  2  2  2   2
2015-07-09 16:00:00  2  2  2   2
2015-07-09 17:00:00  2  2  2   2

# optionally forward-fill any remaining NaN afterwards with .ffill(), if you like
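As a variation on the same idea, `DataFrame.combine` aligns the two frames on the union of their labels before calling the function column by column, so the explicit reindex step is optional. The sketch below also has the combiner return a Series instead of a NumPy array. The function name `prefer_first` and the sample frames are assumptions made for illustration.

```python
import pandas as pd


def prefer_first(x: pd.Series, y: pd.Series) -> pd.Series:
    """Keep x where it has a value, otherwise fall back to y."""
    return x.where(x.notna(), y)


idx_old = pd.date_range("2015-07-09 12:00:00", periods=4, freq="60min")
idx_new = pd.date_range("2015-07-09 14:00:00", periods=4, freq="60min")
older = pd.DataFrame(1.0, index=idx_old, columns=list("ABC"))
newer = pd.DataFrame(2.0, index=idx_new, columns=list("ABCD"))

# combine() aligns both frames first, then applies prefer_first per column.
combined = newer.combine(older, prefer_first)
print(combined)
```

Because the combiner is an ordinary function, the rule for resolving overlaps can be changed (for example, to take the larger of the two values) without touching the rest of the code.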


借酒劲吻你

2020-12-20 17:47

df2.combine_first(df1) (see the pandas documentation) seems to meet your requirement; the code snippet and its output are below.

import pandas as pd

print('pandas-version: ', pd.__version__)

df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
                                 ('2015-07-09 13:00:00',1,1,1),
                                 ('2015-07-09 14:00:00',1,1,1),
                                 ('2015-07-09 15:00:00',1,1,1)],
                                columns=['Dt', 'A', 'B', 'C']).set_index('Dt')
# print(df1)

df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
                                 ('2015-07-09 15:00:00',2,2,2,2),
                                 ('2015-07-09 16:00:00',2,2,2,2),
                                 ('2015-07-09 17:00:00',2,2,2,2),],
                               columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')
res_combine1st = df2.combine_first(df1)
print(res_combine1st)

Output:

pandas-version:  0.15.2
                     A  B  C   D
Dt                              
2015-07-09 12:00:00  1  1  1 NaN
2015-07-09 13:00:00  1  1  1 NaN
2015-07-09 14:00:00  2  2  2   2
2015-07-09 15:00:00  2  2  2   2
2015-07-09 16:00:00  2  2  2   2
2015-07-09 17:00:00  2  2  2   2
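One behaviour worth knowing before relying on `combine_first`: it treats every null cell in the calling frame as a gap, so a NaN that is genuinely present in the newer frame is also filled from the older one. A minimal sketch with a real DatetimeIndex (built with `pd.to_datetime`, unlike the string index above). The frame names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

older = pd.DataFrame(
    {"A": [1.0, 1.0, 1.0]},
    index=pd.to_datetime(
        ["2015-07-09 12:00:00", "2015-07-09 13:00:00", "2015-07-09 14:00:00"]
    ),
)
newer = pd.DataFrame(
    {"A": [np.nan, 2.0], "D": [2.0, 2.0]},  # note the NaN at 14:00 in column A
    index=pd.to_datetime(["2015-07-09 14:00:00", "2015-07-09 15:00:00"]),
)

# Values from `newer` win; its null cells and missing labels come from `older`.
merged = newer.combine_first(older)
print(merged)
```

If a NaN in the newer frame should override an old value instead, `combine_first` is probably not the right tool and the overlap rule has to be written explicitly.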
