need work-around for handling timestamps in dataframe and get datetime

跟風遠走 submitted on 2020-12-15 04:36:05

Question


I originally posted a question about plotting different datetime-sampling in the same plot, stored in many different dataframes.

I got help understanding I needed to convert my time-column (‘ts’) to datetime. I struggled with this, still getting messed up plots. Turns out my conversion to datetime isn’t working, and this is a known thing, as stated here.

A dataframe can’t store datetime in a column (why??), it converts it back to pandas._libs.tslibs.timestamps.Timestamp.

I need to figure out the best workaround for this to be able to plot large datasets.

In the post above, it is stated that a dataframe index can store the datetime format, but when I set my column as the index and try to loop through, I get a KeyError.

 In[]: df.index.name 
 Out[]: ‘ts’

but when I try:

for column in df.columns[1:]:
    df['ts'] = pd.to_datetime(df['ts'])

I get KeyError: 'ts'

Am I doing something wrong here? Does anyone know if datetime is stored correctly in the index?

However, I would still like to ask about the best work-around for this issue.

My bottom line is wanting to plot several dataframes correctly in the same plot. I have a lot of large datasets, and when trying out things, I am using two simplified dataframes, see below:

print(df1)
                        ts  value
0  2019-10-18 08:13:26.702     14
1  2019-10-18 08:13:26.765     10
2  2019-10-18 08:13:26.790      5
3  2019-10-18 08:13:26.889      6
4  2019-10-18 08:13:26.901      8
5  2019-10-18 08:13:27.083     33
6  2019-10-18 08:13:27.098     21
7  2019-10-18 08:13:27.101     11
8  2019-10-18 08:13:27.129     22
9  2019-10-18 08:13:27.159     29
10 2019-10-18 08:13:27.188      7
11 2019-10-18 08:13:27.212     20
12 2019-10-18 08:13:27.228     24
13 2019-10-18 08:13:27.246     30
14 2019-10-18 08:13:27.395     34
15 2019-10-18 08:23:26.375     40
16 2019-10-18 08:23:26.527     49
17 2019-10-18 08:23:26.725     48

print(df2)
                       ts  value
0 2019-10-18 08:23:26.375     27
1 2019-10-18 08:23:26.427     17
2 2019-10-18 08:23:26.437      4
3 2019-10-18 08:23:26.444      2
4 2019-10-18 08:23:26.527     39
5 2019-10-18 08:23:26.575     25
6 2019-10-18 08:23:26.662      6
7 2019-10-18 08:23:26.676     14
8 2019-10-18 08:23:26.718     11
9 2019-10-18 08:23:26.725     13

What is the best way to achieve the result I am looking for?

I have tried converting the ‘ts’ column to both an array and a list, but nothing seems to bring me closer to a final working result for plotting the datasets together. Converting to datetime in an array gives me numpy.datetime64, converting to datetime in a list gives me pandas._libs.tslibs.timestamps.Timestamp.

Any help is highly appreciated as this is really driving me crazy.

If needed, my original 'ts' values read from avro files are of type:

 '2019-10-18T08:13:27.098000'

Running:

df['ts'] = pd.to_datetime(df['ts'])

returns

'2019-10-18 08:13:27.098'  (pandas._libs.tslibs.timestamps.Timestamp)

EDIT 1

Further information about my steps, this is my df after reading the avro files:

This is my df after first attempt to turn the format into datetime, returns timestamp:

This is what my df looks like after setting 'ts' as index:

I then try to convert the timestamp to datetime while it is in the index, and I get a KeyError:


Answer 1:


I guess I am having trouble figuring out what you are asking. Given a df of the form:

                        ts  value
0  2019-10-18 08:13:26.702     14
1  2019-10-18 08:13:26.765     10
2  2019-10-18 08:13:26.790      5
3  2019-10-18 08:13:26.889      6
4  2019-10-18 08:13:26.901      8
5  2019-10-18 08:13:27.083     33

I can execute the following to convert the ts column to a datetime column (pandas stores it with a datetime64 dtype) and make the ts column the index:

df['ts'] = pd.to_datetime(df['ts'])
df = df.set_index(['ts'], drop=True)

which yields the df of form

                         value
ts
2019-10-18 08:13:26.702     14
2019-10-18 08:13:26.765     10
2019-10-18 08:13:26.790      5
2019-10-18 08:13:26.889      6
2019-10-18 08:13:26.901      8
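
The KeyError in the question is consistent with this layout: once ts is the index it is no longer a column, so df['ts'] has nothing to look up. The following is a minimal sketch, assuming a small hand-built frame (the two rows and the helper names raised and is_datetime are illustrative, not taken from the original data pipeline). It converts the index directly and checks that each element is a pandas Timestamp, which is a subclass of Python's datetime.datetime.

```python
import datetime

import pandas as pd

# Illustrative sample in the same string format as the avro values.
df = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.702000", "2019-10-18T08:13:26.765000"],
    "value": [14, 10],
})
df = df.set_index("ts")

# 'ts' is now the index name, not a column, so df["ts"] raises KeyError.
try:
    df["ts"]
    raised = False
except KeyError:
    raised = True

# Convert the index itself instead of looking for a 'ts' column.
df.index = pd.to_datetime(df.index)

# Each index element is a pandas Timestamp, a datetime.datetime subclass.
first = df.index[0]
is_datetime = isinstance(first, datetime.datetime)
print(raised, type(df.index).__name__, is_datetime)
```

Because Timestamp subclasses datetime.datetime, code that expects a datetime object will generally accept these values without a further conversion step.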

I can then print the values of the index, or for that matter use any iteration on the index I want. The following just gives the first 5 values.

for i in range(5):
    print(df.iloc[i].name)

2019-10-18 08:13:26.702000
2019-10-18 08:13:26.765000
2019-10-18 08:13:26.790000
2019-10-18 08:13:26.889000
2019-10-18 08:13:26.901000
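
For the underlying goal of drawing several dataframes in the same plot, one possible approach is to convert each ts column and plot every frame onto a shared matplotlib Axes. This is only a sketch under assumptions: it uses matplotlib directly (the question does not say which plotting library is in use), takes three rows from each of the question's df1 and df2, and the labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from df1 and df2 in the question.
df1 = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.702000", "2019-10-18T08:13:27.098000",
           "2019-10-18T08:23:26.375000"],
    "value": [14, 21, 40],
})
df2 = pd.DataFrame({
    "ts": ["2019-10-18T08:23:26.375000", "2019-10-18T08:23:26.527000",
           "2019-10-18T08:23:26.725000"],
    "value": [27, 39, 13],
})

fig, ax = plt.subplots()
for label, frame in {"df1": df1, "df2": df2}.items():
    # Convert the strings to datetime so both frames share one time axis.
    frame["ts"] = pd.to_datetime(frame["ts"])
    ax.plot(frame["ts"], frame["value"], marker="o", label=label)

ax.set_xlabel("ts")
ax.set_ylabel("value")
ax.legend()
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

Since both frames are plotted against real datetime values rather than strings, differently sampled series should land at their correct positions on the common x-axis. Calling plt.show() or fig.savefig(...) afterwards displays or stores the figure.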


Source: https://stackoverflow.com/questions/64388845/need-work-around-for-handling-timestamps-in-dataframe-and-get-datetime
