Question
I am encountering a weird issue with pandas. After some careful debugging I have found the problem, but I would like a fix, and an explanation as to why this is happening.
I have a dataframe which consists of a list of cities with some distances. I have to iteratively find a city which is closest to some "Seed" city (details are not too important here).
To locate the "closest" city to my seed city, I use:
id_new_point = df["Time from seed"].idxmin(skipna=True)
Then, I want to remove the city I just found from the dataframe, for which I use:
df.drop(index=emte_df.index[id_new_point], inplace=True)
Now comes the strange part: I've attached the PyCharm debugger and closely observed my dataframe as I stepped through my loops.
When you delete a row in pandas with df.drop, it seems to delete the row by row number. This means, for example, that if I delete row #126, the "index column" of the dataframe doesn't adapt accordingly. (See screenshot below)
idxmin seems to return the actual index of the row, that is, the index value associated with the row in the pandas dataframe.
Now here comes the issue: say I want to remove row 130 (EMTE WIJCHEN). I would use df.idxmin, which would give me the index, namely 130. Then I call df.drop(index), but since df.drop uses row numbers to delete instead of dataframe indices, and I just deleted a row (#126), it now (wrongly) removes row #131, which slid up into place 130.
Why is this happening, and how can I improve my code so I can drop the right rows in my dataframe? Any suggestions are greatly appreciated (Also if there's a better practice for approaching these kinds of problems, maybe a solution which doesn't adapt the dataframe in place?)
Answer 1:
pandas doesn't update the index column because the index is not always a range of numbers. It can hold, for example, names (John, Maria, etc.), so it wouldn't make sense to "update" the index after dropping a specific row. You can call reset_index after each iteration like this:
df.drop(index=emte_df.index[id_new_point], inplace=True)
df = df.reset_index(drop=True)
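As a possible alternative to resetting the index on every iteration, here is a minimal sketch that passes the label returned by idxmin directly to drop. It relies on two documented behaviours: idxmin returns an index label, and drop(index=...) takes labels rather than row positions. Under that reading, the mismatch in the question would come from the extra lookup index[id_new_point], which treats the label as a position. The data, the "City" column, and the helper name visit_order are hypothetical; only the "Time from seed" column name comes from the question.

```python
import pandas as pd

# Hypothetical data; only the column name "Time from seed" is from the question.
cities = pd.DataFrame(
    {
        "City": ["A", "B", "C", "D", "E"],
        "Time from seed": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
)

# Drop label 1. The remaining labels keep their values: 0, 2, 3, 4.
df = cities.drop(index=1)

# idxmin returns an index label, not a row position.
label = df["Time from seed"].idxmin(skipna=True)

# Using that label as a position selects a different row once labels have gaps.
wrong_label = df.index[label]

# Passing the label straight to drop removes the intended row.
df = df.drop(index=label)
remaining_labels = df.index.tolist()


def visit_order(frame, column="Time from seed"):
    """Return labels in ascending order of `column` without mutating `frame`.

    Hypothetical helper: it repeats the find-and-drop loop on a copy.
    """
    remaining = frame.copy()
    order = []
    while not remaining.empty:
        closest = remaining[column].idxmin(skipna=True)
        order.append(closest)
        # drop returns a new frame, so the caller's frame stays intact.
        remaining = remaining.drop(index=closest)
    return order


print(label, wrong_label, remaining_labels)
print(visit_order(cities))
```

Because drop returns a new dataframe when inplace is not set, the loop in visit_order never modifies the original, which may suit the "not in place" variant asked about in the question. If the distances are recomputed on each iteration, that step would go inside the loop before the idxmin call.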
Source: https://stackoverflow.com/questions/60960083/weird-inconsistency-between-df-drop-and-df-idxmin