Question
I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The issue is that the first and last values in the column are often NaN and sometimes it can be many rows before a value is not NaN:
      col 1  col 2
0       NaN    NaN
1       NaN    NaN
...
1000      1    NaN
1001    NaN      1   <-----
1002      3    NaN   <----- only want to fill in these 'in between value' rows
1003      4      3
...
3999    NaN    NaN
4000    NaN    NaN
I am tying together a dataset which is updated 'on event' but separately for each column, and is indexed via Timestamp. This means that there are often rows where no data is recorded for some columns, hence a lot of NaNs!
Answer 1:
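I select the rows between the labels of each column's minimum and maximum values, found with idxmin and idxmax, and forward-fill the NaNs inside that range. This works here because the values in each column are increasing, so the minimum and maximum are also the first and last known values.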
print(df)
# col 1 col 2
#0 NaN NaN
#1 NaN NaN
#1000 1 NaN
#1001 NaN 1
#1002 3 NaN
#1003 4 3
#3999 NaN NaN
#4000 NaN NaN
# Forward-fill each column only between its own idxmin and idxmax labels
df.loc[df['col 1'].idxmin():df['col 1'].idxmax(), 'col 1'] = df.loc[df['col 1'].idxmin():df['col 1'].idxmax(), 'col 1'].ffill()
df.loc[df['col 2'].idxmin():df['col 2'].idxmax(), 'col 2'] = df.loc[df['col 2'].idxmin():df['col 2'].idxmax(), 'col 2'].ffill()
print(df)
# col 1 col 2
#0 NaN NaN
#1 NaN NaN
#1000 1 NaN
#1001 1 1
#1002 3 1
#1003 4 3
#3999 NaN NaN
#4000 NaN NaN
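For a column whose known values are not sorted, the labels returned by idxmin and idxmax can differ from the first and last non-NaN labels. The following is a minimal sketch of that difference; the Series `s` and its values are hypothetical and not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical column whose known values are NOT increasing.
s = pd.Series(
    [np.nan, 5.0, np.nan, 1.0, np.nan, 3.0, np.nan],
    index=[0, 1000, 1001, 1002, 1003, 1004, 4000],
)

# Labels of the smallest and largest values (NaN is skipped by default).
value_span = (s.idxmin(), s.idxmax())

# Labels of the first and last non-NaN values.
valid_span = (s.first_valid_index(), s.last_valid_index())

# On this sorted index the idxmin label comes after the idxmax label,
# so the label slice selects no rows at all.
rows_selected = len(s.loc[s.idxmin():s.idxmax()])

print(value_span, valid_span, rows_selected)
```

Here idxmin points at label 1002 and idxmax at label 1000, while the valid range runs from 1000 to 1004, so slicing by idxmin and idxmax would miss the data entirely.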
Edit: added a different solution that interpolates between the first and last valid values (thanks, HStro):
first, last = df['col 1'].first_valid_index(), df['col 1'].last_valid_index()
df.loc[first:last, 'col 1'] = df.loc[first:last, 'col 1'].astype(float).interpolate()
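Another option that may be simpler: interpolate accepts a limit_area argument, and limit_area='inside' fills only NaNs that are surrounded by valid values, leaving leading and trailing NaNs untouched. A minimal sketch, assuming pandas 0.23 or later (where limit_area is available) and a DataFrame rebuilt from the sample rows in the question:

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the rows shown in the question (elided rows omitted).
df = pd.DataFrame(
    {
        "col 1": [np.nan, np.nan, 1, np.nan, 3, 4, np.nan, np.nan],
        "col 2": [np.nan, np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan],
    },
    index=[0, 1, 1000, 1001, 1002, 1003, 3999, 4000],
)

# Default method is linear, which treats rows as equally spaced.
# limit_area="inside" restricts filling to gaps between valid values,
# applied to every column independently.
filled = df.interpolate(limit_area="inside")

print(filled)
```

With a Timestamp index, passing method='time' as well may be a better fit, since the default linear method ignores the index spacing.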
Source: https://stackoverflow.com/questions/33691591/pandas-interpolation-where-first-and-last-data-point-in-column-is-nan