Pandas: SettingWithCopyWarning, trying to understand how to write the code better, not just whether to ignore the warning

喜欢而已 提交于 2019-12-23 23:06:59

问题


I am trying to change all date values in a spreadsheet's Date column where the year is earlier than 1900, to today's date, so I have a slice.

EDIT: previous lines of code:

df=pd.read_excel(filename)#,usecols=['NAME','DATE','EMAIL']
#regex to remove weird characters
df['DATE'] = df['DATE'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
df['DATE'] = pd.to_datetime(df['DATE'])

sample row in dataframe: name, date, email
[u'Public, Jane Q.\xa0' u'01/01/2016\xa0' u'jqpublic@email.com\xa0'] 

This line of code works.

df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()

Then, all date values are formatted:

df["DATE"] = df["DATE"].map(lambda x: x.strftime("%m/%d/%y"))

But I get an error:

SettingWithCopyWarning:  A value is trying to be set on a copy of a
slice from a DataFrame

See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-

versus-copy

I have read the documentation and other posts, where using .loc is suggested

The following is the recommended solution:

df.loc[row_indexer,col_indexer] = value

but df["DATE"].loc[df["DATE"].dt.year < 1900] = dt.datetime.today() gives me the same error, except that the line number is actually the line number after the last line in the script.

I just don't understand what the documentation is trying to tell me as it relates to my example.

I started messing around with pulling out the slice and assigning to a separate dataframe, but then I'm going to have to bring them together again.


回答1:


You are producing a view when you df["DATE"] and subsequently use a selector [df["DATE"].dt.year < 1900] and try to assign to it.

df["DATE"][df["DATE"].dt.year < 1900] is the view that pandas is complaining about.

Fix it with loc like this:

df.loc[df.DATE.dt.year < 1900, "DATE"] = pd.datetime.today()



回答2:


My thought would be that you could do

df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
df.loc[:, "DATE"] = df.DATE.map(lambda x: x.strftime("%m/%d/%y")

Not at a computer so I can't test but I think that should do it.



来源:https://stackoverflow.com/questions/38412571/pandas-settingwithcopywarning-trying-to-understand-how-to-write-the-code-bette

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!