Weird inconsistency between df.drop() and df.idxmin()

最后都变了 - submitted on 2021-01-29 02:03:11

Question


I am encountering a weird issue with pandas. After some careful debugging I have found the problem, but I would like a fix, and an explanation as to why this is happening.

I have a dataframe which consists of a list of cities with some distances. I have to iteratively find a city which is closest to some "Seed" city (details are not too important here).

To locate the "closest" city to my seed city, I use:

id_new_point = df["Time from seed"].idxmin(skipna=True)

Then, I want to remove the city I just found from the dataframe, for which I use:

df.drop(index=emte_df.index[id_new_point], inplace=True)

Now comes the strange part: I attached the PyCharm debugger and closely watched my DataFrame as I stepped through my loops.

When you delete a row in pandas with df.drop, it seems to delete the row by row number. This means, for example, that if I delete row #126, the "Index column" of the DataFrame doesn't adapt accordingly. (See screenshot below)

idxmin seems to return the actual index label of the row, i.e. the index value associated with that row in the pandas DataFrame.
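
Both observations can be checked on a small made-up DataFrame (the city names and times below are invented; only the "Time from seed" column name comes from the question). This is a minimal sketch showing that drop(index=...) removes a row by its index label and leaves the other labels as they were, and that idxmin returns a label rather than a position.

```python
import pandas as pd

# Hypothetical data: four cities with travel times from the seed city.
df = pd.DataFrame(
    {"City": ["A", "B", "C", "D"], "Time from seed": [12.0, 7.5, 3.0, 9.0]}
)

# Drop the row whose index *label* is 1 ("B"); the remaining labels are not renumbered.
df = df.drop(index=1)
print(df.index.tolist())  # [0, 2, 3]

# idxmin returns the index label of the minimum, not its position.
label = df["Time from seed"].idxmin(skipna=True)
print(label)  # 2, i.e. city "C", which now sits at position 1
```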

Now here comes the issue: say I want to remove row 130 (EMTE WIJCHEN). I use idxmin, which gives me the index, namely 130. Then I call df.drop, but since df.drop deletes by row number instead of by DataFrame index, and I just deleted a row (#126), it now (wrongly) removes row #131, which slid up into position 130.
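
The shift can be reproduced in a few lines. This sketch assumes that emte_df in the question is the same DataFrame as df, so a single made-up df stands in for both. df.drop(index=...) itself drops by label; the shift comes from emte_df.index[id_new_point], which indexes the Index object by position. Once an earlier row has been removed, the label stored at position 130 is no longer 130.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; emte_df is assumed to be df.
df = pd.DataFrame({"Time from seed": [5.0, 4.0, 1.0, 2.0, 3.0]})

# An earlier iteration removes label 1; the remaining labels are [0, 2, 3, 4].
df = df.drop(index=1)

id_new_point = df["Time from seed"].idxmin()  # label 2 (time 1.0)

# df.index[...] is a positional lookup: position 2 now holds label 3.
wrong_label = df.index[id_new_point]
right_label = id_new_point
print(wrong_label, right_label)  # 3 2

after_wrong = df.drop(index=wrong_label)  # removes the 2.0 row; the closest city survives
after_right = df.drop(index=right_label)  # removes the 1.0 row, as intended
print(after_wrong["Time from seed"].tolist())  # [5.0, 1.0, 3.0]
print(after_right["Time from seed"].tolist())  # [5.0, 2.0, 3.0]
```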

Why is this happening, and how can I change my code so that it drops the right rows from my DataFrame? Any suggestions are greatly appreciated (also if there's a better practice for approaching these kinds of problems, maybe a solution that doesn't modify the DataFrame in place).
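
One possible approach, sketched under the assumption that emte_df and df refer to the same DataFrame, is to pass the label returned by idxmin straight to drop and to rebind a name instead of using inplace=True. The helper pick_closest_repeatedly is hypothetical, and it simplifies the problem by assuming "Time from seed" does not change between iterations; in the real problem the column would presumably be recomputed for each new seed. A string index is used to show that the approach does not rely on labels being row numbers.

```python
import pandas as pd


def pick_closest_repeatedly(df: pd.DataFrame, n: int) -> list:
    """Hypothetical helper: return the labels of up to n closest cities, one per iteration."""
    remaining = df.copy()  # work on a copy so the caller's DataFrame is never modified
    picked = []
    for _ in range(n):
        if remaining.empty:
            break
        id_new_point = remaining["Time from seed"].idxmin(skipna=True)  # an index label
        picked.append(id_new_point)
        # Drop by that same label; no positional lookup through remaining.index is needed.
        remaining = remaining.drop(index=id_new_point)
    return picked


# Hypothetical data with non-numeric labels.
cities = pd.DataFrame({"Time from seed": [12.0, 7.5, 3.0, 9.0]}, index=["A", "B", "C", "D"])
print(pick_closest_repeatedly(cities, 3))  # ['C', 'B', 'D']
```

Because the helper rebinds remaining rather than dropping in place, cities still has all four rows afterwards.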


Answer 1:


Pandas doesn't update the index because it is not always a range of numbers; it can be, for example, names (John, Maria, etc.), so it wouldn't make sense to "update" the index after dropping a specific row. You can call reset_index after each iteration like this:

df.drop(index=emte_df.index[id_new_point], inplace=True)
df = df.reset_index(drop=True)
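
Here is a small runnable sketch of this approach with made-up data. It again assumes that emte_df is the same object as df. After reset_index(drop=True) the labels are 0..n-1 again, so the label returned by idxmin equals its position and df.index[id_new_point] selects the intended row. The trade-off is that the original labels are discarded, so anything else keyed on them would no longer line up.

```python
import pandas as pd

# Hypothetical data; emte_df from the question is assumed to be the same object as df.
df = pd.DataFrame({"Time from seed": [5.0, 4.0, 1.0, 2.0, 3.0]})

removed_times = []
while not df.empty:
    id_new_point = df["Time from seed"].idxmin(skipna=True)
    removed_times.append(float(df.loc[id_new_point, "Time from seed"]))
    df.drop(index=df.index[id_new_point], inplace=True)
    # Renumber labels to 0..n-1 so that label == position on the next iteration.
    df = df.reset_index(drop=True)

print(removed_times)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```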


Source: https://stackoverflow.com/questions/60960083/weird-inconsistency-between-df-drop-and-df-idxmin
