“ValueError: cannot reindex from a duplicate axis”

Anonymous (unverified), submitted on 2019-12-03 03:08:02

Question:

I have the following df:

    Timestamp              A    B    C    ...
    2014-11-09 00:00:00    NaN  1    NaN  NaN
    2014-11-09 00:00:00    2    NaN  NaN  NaN
    2014-11-09 00:00:00    NaN  NaN  3    NaN
    2014-11-09 08:24:00    NaN  NaN  1    NaN
    2014-11-09 08:24:00    105  NaN  NaN  NaN
    2014-11-09 09:19:00    NaN  NaN  23   NaN

And I would like to make the following:

    Timestamp              A    B    C    ...
    2014-11-09 00:00:00    2    1    3    NaN
    2014-11-09 00:01:00    NaN  NaN  NaN  NaN
    2014-11-09 00:02:00    NaN  NaN  NaN  NaN
    ...                    NaN  NaN  NaN  NaN
    2014-11-09 08:23:00    NaN  NaN  NaN  NaN
    2014-11-09 08:24:00    105  NaN  1    NaN
    2014-11-09 08:25:00    NaN  NaN  NaN  NaN
    2014-11-09 08:26:00    NaN  NaN  NaN  NaN
    2014-11-09 08:27:00    NaN  NaN  NaN  NaN
    ...                    NaN  NaN  NaN  NaN
    2014-11-09 09:18:00    NaN  NaN  NaN  NaN
    2014-11-09 09:19:00    NaN  NaN  23   NaN

That is: I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN wherever a column has no value for that minute.

I started in the following ways:

df.groupby('Timestamp').sum() 

and

df = df.resample('1Min', how='max') 

but I obtained the following error:

ValueError: cannot reindex from a duplicate axis 

How can I solve this problem? I'm just learning Python so I don't have experience at all.

Thank you!

Answer 1:

Assuming that Timestamp is already the index, you need to resample first and then call reset_index before doing the groupby. Here's a working sample:

    import pandas as pd

    df
                           A   B   C  ...
    Timestamp
    2014-11-09 00:00:00  NaN   1 NaN  NaN
    2014-11-09 00:00:00    2 NaN NaN  NaN
    2014-11-09 00:00:00  NaN NaN   3  NaN
    2014-11-09 08:24:00  NaN NaN   1  NaN
    2014-11-09 08:24:00  105 NaN NaN  NaN
    2014-11-09 09:19:00  NaN NaN  23  NaN

    df.resample('1Min', how='max').reset_index().groupby('Timestamp').sum()
                           A   B   C  ...
    Timestamp
    2014-11-09 00:00:00   2   1   3  NaN
    2014-11-09 00:01:00 NaN NaN NaN  NaN
    2014-11-09 00:02:00 NaN NaN NaN  NaN
    2014-11-09 00:03:00 NaN NaN NaN  NaN
    2014-11-09 00:04:00 NaN NaN NaN  NaN
    ...
    2014-11-09 09:17:00 NaN NaN NaN  NaN
    2014-11-09 09:18:00 NaN NaN NaN  NaN
    2014-11-09 09:19:00 NaN NaN  23  NaN

Hope this helps.
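For anyone on a newer pandas release, where `resample` no longer accepts the `how=` keyword, the same idea can be sketched as follows. This is only an illustration under a few assumptions: the frame is rebuilt by hand from the question's data with just columns A, B and C, and `max` is kept as the per-minute aggregation as in the answer above. The trailing `reset_index().groupby(...).sum()` is left out here, because newer pandas returns 0 rather than NaN when summing an all-NaN group, which would change the empty minutes.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (columns A, B, C only).
timestamps = pd.to_datetime([
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 08:24:00",
    "2014-11-09 08:24:00",
    "2014-11-09 09:19:00",
])
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=pd.DatetimeIndex(timestamps, name="Timestamp"),
)

# The index holds repeated timestamps; this is the "duplicate axis".
print("index is unique:", df.index.is_unique)

# One-minute bins. max() ignores NaN, so rows sharing a timestamp collapse
# into one row, and minutes with no data come out as all-NaN rows.
result = df.resample("1min").max()

print(result.loc[pd.Timestamp("2014-11-09 00:00:00")])
print(result.loc[pd.Timestamp("2014-11-09 08:24:00")])
```

If a different way of merging rows is wanted (for example `first` or `mean`), the aggregation method after `resample` is the only part that should need to change.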

Updated:

As mentioned in the comments, your 'Timestamp' isn't a datetime; it is probably stored as a string, so you cannot resample on it because resampling needs a DatetimeIndex. Just call reset_index and convert it, something like this:

    df = df.reset_index()
    df['ts'] = pd.to_datetime(df['Timestamp'])
    # 'ts' is now datetime of 'Timestamp', you just need to set it to index
    df = df.set_index('ts')
    ...

Now just run the previous code again but replace 'Timestamp' with 'ts' and you should be OK.
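The conversion step can be tried end to end with a small sketch like the one below. The frame `raw` and the name `by_minute` are made up for illustration, only a few of the question's rows and columns are used, and the modern `.resample(...).max()` spelling is assumed in place of `how='max'`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which 'Timestamp' is stored as text, not datetime.
raw = pd.DataFrame(
    {
        "Timestamp": [
            "2014-11-09 08:24:00",
            "2014-11-09 08:24:00",
            "2014-11-09 09:19:00",
        ],
        "A": [np.nan, 105.0, np.nan],
        "C": [1.0, np.nan, 23.0],
    }
)

# A non-datetime dtype here means the column cannot drive a resample yet.
print(raw["Timestamp"].dtype)

# Parse the strings into real datetimes and use them as the index.
raw["ts"] = pd.to_datetime(raw["Timestamp"])
by_minute = (
    raw.set_index("ts")
       .drop(columns="Timestamp")   # the text column is no longer needed
       .resample("1min")
       .max()
)

print(type(by_minute.index).__name__)
print(by_minute.head(2))
print(by_minute.tail(2))
```

Dropping the original text column before resampling keeps the aggregation limited to the numeric columns.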


