Pandas: interpolation where first and last data point in column is NaN

给你一囗甜甜゛ 提交于 2019-12-10 10:27:44

问题


I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The issue is that the first and last values in the column are often NaN and sometimes it can be many rows before a value is not NaN:

      col 1    col 2
 0    NaN      NaN
 1    NaN      NaN
...
1000   1       NaN
1001  NaN       1   <-----
1002   3       NaN  <----- only want to fill in these 'in between value' rows
1003   4        3
...
3999  NaN      NaN
4000  NaN      NaN

I am tying together a dataset which is updated 'on event' but separately for each column, and is indexed via Timestamp. This means that there are often rows where no data is recorded for some columns, hence a lot of NaNs!


回答1:


I select by min and max value of column by function idxmin and idxmax and use function fillna with method forward filling.

print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001    NaN      1
#1002      3    NaN
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()] = df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()].fillna(method='ffill')
df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()] = df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()].fillna(method='ffill')
print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001      1      1
#1002      3      1
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

Added different solution, thanks HStro.

df['col 1'].loc[df['col 1'].first_valid_index() : df['col 1'].last_valid_index()] = df['col 1'].loc[df['col 1'].first_valid_index(): df['col 1'].last_valid_index()].astype(float).interpolate()


来源:https://stackoverflow.com/questions/33691591/pandas-interpolation-where-first-and-last-data-point-in-column-is-nan

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!