Why does one use of iloc() give a SettingWithCopyWarning, but the other doesn't?

Submitted by 天大地大妈咪最大 on 2020-07-30 02:27:31

Question


Inside a method of a class I use this statement:

self.__datacontainer.iloc[-1]['c'] = value

Doing this, I get a "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame".

I then tried to reproduce this warning with the following simple code:

import pandas, numpy
df = pandas.DataFrame(numpy.random.randn(5,3),columns=list('ABC'))
df.iloc[-1]['C'] = 3

There I get no warning. Why do I get a warning for the first statement but not for the second?


Answer 1:


Chained indexing

As the documentation and a couple of other answers on this site ([1], [2]) suggest, chained indexing is considered bad practice and should be avoided.

Since there doesn't seem to be a graceful way of making assignments using integer-position-based indexing (i.e. .iloc) without violating the chained-indexing rule (as of pandas v0.23.4), it is advisable to use label-based indexing (i.e. .loc) for assignments whenever possible.

However, if you absolutely need to access data by row number, you can write

df.iloc[-1, df.columns.get_loc('c')] = 42

or

df.iloc[[-1, 1], df.columns.get_indexer(['a', 'c'])] = 42
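As a minimal sketch of the two single-call forms above, the snippet below uses a small all-integer frame (the column names and values are made up for illustration) and reads the values back to confirm that the original frame was modified:

```python
import pandas as pd

# Hypothetical example data; any frame with these column labels would do.
df = pd.DataFrame({'a': [4, 5, 6], 'b': [7, 8, 9], 'c': [3, 2, 1]})

# Translate one column label to its integer position, then assign
# through a single .iloc call (row position, column position).
df.iloc[-1, df.columns.get_loc('c')] = 42

# Same idea for several rows and several columns at once.
df.iloc[[0, 1], df.columns.get_indexer(['a', 'c'])] = 0

a_values = df['a'].tolist()
b_values = df['b'].tolist()
c_values = df['c'].tolist()
```

Because each assignment goes through one indexer call, there is no intermediate object that could turn out to be a copy.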

Pandas behaving oddly

From my understanding, you're absolutely right to expect the warning when trying to reproduce the problem artificially.

What I've found so far is that it depends on how the dataframe is constructed:

import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # no warning

df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': ['t', 'u', 'v']})
df.iloc[-1]['c'] = 'f' # no warning

df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # SettingWithCopyWarning: ...

It seems that pandas (at least v0.23.4) handles mixed-type and single-type dataframes differently when it comes to chained assignments [3]:

def _check_is_chained_assignment_possible(self):
    """
    Check if we are a view, have a cacher, and are of mixed type.
    If so, then force a setitem_copy check.
    Should be called just near setting a value
    Will return a boolean if it we are a view and are cached, but a
    single-dtype meaning that the cacher should be updated following
    setting.
    """
    if self._is_view and self._is_cached:
        ref = self._get_cacher()
        if ref is not None and ref._is_mixed_type:
            self._check_setitem_copy(stacklevel=4, t='referant',
                                     force=True)
        return True
    elif self._is_copy:
        self._check_setitem_copy(stacklevel=4, t='referant')
    return False

This appears really odd to me, although I'm not sure whether it is unexpected.

However, there's an old bug with similar behaviour.


UPDATE

According to the developers, the above behaviour is expected.
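One plausible way to see why the dtype mix matters (this is my reading, not something the developers' reply is quoted as saying): a row taken from a single-dtype frame keeps that dtype, whereas a row taken from a mixed-dtype frame has to be packed into a new object-dtype Series, which cannot share memory with the separately typed columns. The sketch below only inspects dtypes, using the same example frames as above:

```python
import pandas as pd

single = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
mixed = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})

# Select the last row by position in each frame.
row_single = single.iloc[-1]
row_mixed = mixed.iloc[-1]

single_dtype = row_single.dtype  # same integer dtype as the columns
mixed_dtype = row_mixed.dtype    # object: values of different types side by side
```

In the mixed case the row is necessarily a fresh object, so a chained assignment on it cannot reach the original frame, which matches the case in which the warning was seen above.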




Answer 2:


So it's pretty hard to answer this without context around your problem operation, but the pandas documentation covers this pretty well.

>>> df[['C']].iloc[0] = 2 # This is a problem
SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

Basically, it boils down to this: don't chain indexing operations together if a single operation can do the job.

>>> df.loc[0, 'C'] = 2 # This is ok

You get the warning because the assignment did not reach the original dataframe that you're presumably trying to modify. Instead, the first indexing step produced a copy and the value was set on that copy. (When this happens to me I usually don't even hold a reference to the copy, so it just gets garbage collected, which makes the warning pretty helpful.)
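A small sketch of that difference, with made-up data: the chained form writes into a temporary copy and leaves the original value in place, while the single .loc call changes it. Warnings are silenced here only to keep the output quiet.

```python
import warnings

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'A': [1.0, 2.0], 'C': [10.0, 20.0]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    # df[['C']] builds a new frame; the assignment lands in that
    # temporary object, which is discarded right afterwards.
    df[['C']].iloc[0] = 2
unchanged_after_chained = df.loc[0, 'C']  # still the original value

# A single indexing operation assigns into df itself.
df.loc[0, 'C'] = 2
changed_after_loc = df.loc[0, 'C']
```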




Answer 3:


Don't focus on the warning. It is only an indication: sometimes it doesn't come up when you expect it to, and sometimes it appears inconsistently. Instead, just avoid chained indexing, and more generally avoid assigning to anything that could be a copy.

You want to index by row integer position and by column label. That's an unnatural mix, given that pandas can index by integer positions or by labels, but not by both in a single indexer.

In this case, you can use integer positional indexing for both rows and columns via a single iat call:

df.iat[-1, df.columns.get_loc('C')] = 3

Or, if your index labels are guaranteed to be unique, you can use at:

df.at[df.index[-1], 'C'] = 3
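To check both variants on a frame like the one in the question, a minimal sketch could read the value back after each assignment (the values 3 and 7 are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))

# Positional scalar access: last row, integer position of column 'C'.
df.iat[-1, df.columns.get_loc('C')] = 3
via_iat = df['C'].iloc[-1]

# Label-based scalar access: label of the last row, column label 'C'.
# The default RangeIndex used here has unique labels.
df.at[df.index[-1], 'C'] = 7
via_at = df['C'].iloc[-1]
```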


Source: https://stackoverflow.com/questions/53806570/why-does-one-use-of-iloc-give-a-settingwithcopywarning-but-the-other-doesnt
