Adding missing time in pandas dataframe

て烟熏妆下的殇ゞ 提交于 2021-01-27 18:31:45

问题


I have a pandas.DataFrame with timestamps in a column. The values are in epoch and 0.1 seconds apart. Values like 1488771900.100000, 1488771900.200000 and so on. However, there are missing values. So I have 1488794389.500000 and then 1488794389.900000 with 3 missing values between. I want to insert rows in the dataframe with missing values between the max and min in this column. So if the min is 1488771900.000000 and max is 1488794660.000000, I want to insert rows with all values separated by 0.1 seconds and NA in all other columns.

I saw an answer in this link, but wasn't able to replicate the same.

How can this operation be performed?


回答1:


You can fill in your missing times using pandas.DataFrame.resample. The caveat is that the dataframe needs to have a pandas.DateTimeIndex. In your case the time is likely stored as a float in seconds since epoch, and this needs to be converted prior to re-sampling. Here is a function which will perform that operation.

Code:

import datetime as dt
import pandas as pd

def resample(dataframe, time_column, sample_period):
    # make a copy of the dataframe
    dataframe = dataframe.copy()

    # convert epoch times to datetime
    dataframe.time = dataframe.time.apply(
        lambda ts: dt.datetime.fromtimestamp(ts))

    # make the datetimes into an index
    dataframe.set_index(time_column, inplace=True)

    # resample to desired period
    dataframe = dataframe.resample(sample_period).asfreq().reset_index()

    # convert datetimes back to epoch
    epoch = dt.datetime.fromtimestamp(0)
    dataframe.time = dataframe.time.apply(
        lambda ts: (ts - epoch).total_seconds())
    return dataframe

Test Code:

values = [
    (1488771900.10, 'a'),
    (1488771900.20, 'b'),
    (1488771900.30, 'c'),
    (1488771900.60, 'f'),
]
columns = ['time', 'value']
df = pd.DataFrame(values, columns=columns)
print(df)

new_df = resample(df, 'time', '100ms')
print(new_df)

Results:

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09     f

           time value
0  1.488772e+09     a
1  1.488772e+09     b
2  1.488772e+09     c
3  1.488772e+09   NaN
4  1.488772e+09   NaN
5  1.488772e+09     f


来源:https://stackoverflow.com/questions/42673064/adding-missing-time-in-pandas-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!