Understanding pandas dataframe indexing

太阳男子 2020-12-05 05:34

Summary: This doesn't work:

df[df.key==1]['D'] = 1

but this does:

df.D[df.key==1] = 1

Why?

Rep

2 answers
  • 2020-12-05 06:14

    I am fairly sure that your first form returns a copy rather than a view, so assigning to it does not change the original data. I am not sure why this happens, though.

    It seems to be related to the order in which you select rows and columns, NOT the syntax for getting columns. These both work:

    df.D[df.key == 1] = 1
    df['D'][df.key == 1] = 1
    

    And neither of these works:

    df[df.key == 1]['D'] = 1
    df[df.key == 1].D = 1
    

    From this evidence, I would assume that the slice df[df.key == 1] is returning a copy. But this is not the case! df[df.key == 1] = 0 will actually change the original data, as if it were a view.
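
The three cases above can be reproduced with a small frame. This is only a sketch: the data is made up, and `df.loc[mask, 'D']` is not mentioned in the thread; it is shown here as an assumption about the usual single-call way to assign through a mask. Warnings are silenced around the chained form because pandas may warn about it, depending on the version.

```python
import warnings

import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"key": [1, 2, 1], "D": [10, 20, 30]})
mask = df.key == 1

# Chained form: df[mask] builds a new DataFrame first, and the
# assignment lands on that temporary object, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask]["D"] = 1
after_chained = df["D"].tolist()

# Single-call form: one indexing operation applied to df itself.
df.loc[mask, "D"] = 1
after_loc = df["D"].tolist()

# Assigning straight to the mask is also a single call on the frame,
# so it changes the original rows.
df2 = pd.DataFrame({"key": [1, 2, 1], "D": [10, 20, 30]})
df2[df2.key == 1] = 0
after_mask_assign = df2.values.tolist()

print(after_chained, after_loc, after_mask_assign)
```

Here `after_chained` still holds the original values, while `after_loc` and `after_mask_assign` show the update.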

    So, I'm not sure. My sense is that this behavior has changed between pandas versions. I seem to remember that df.D used to return a copy and df['D'] used to return a view, but this no longer appears to be true (as of pandas 0.10.0).

    If you want a more complete answer, you should post in the pystatsmodels forum: https://groups.google.com/forum/?fromgroups#!forum/pystatsmodels

  • 2020-12-05 06:26

    The pandas documentation says:

    Returning a view versus a copy

    The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
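
The NumPy rule that the quoted passage refers to can be checked directly on a plain array. This sketch shows only the NumPy side; pandas adds its own layers on top, so it should not be read as a statement about what any particular pandas version does.

```python
import numpy as np

a = np.arange(6)

sliced = a[1:4]      # basic slicing: a view onto a's buffer
masked = a[a > 2]    # boolean indexing: a new array

slice_shares = np.shares_memory(a, sliced)
mask_shares = np.shares_memory(a, masked)

sliced[0] = 100      # writes through to a
masked[0] = -1       # changes only the copy

print(slice_shares, mask_shares, a.tolist())
```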

    In df[df.key==1]['D'] you first do boolean slicing (which produces a copy of the DataFrame), then you select the column ['D'] from that copy.

    In df.D[df.key==1] = 3.4, you first choose a column, then do boolean slicing on the resulting Series.

    This seems to make the difference, although I must admit that it is a little counterintuitive.

    Edit: The difference was identified by Dougal (see his comment). With version 1, the copy is made when the __getitem__ method is called for the boolean slicing. With version 2, the boolean mask only reaches the __setitem__ method, so no copy is returned and the assignment is applied directly.
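
The point about which method receives the boolean mask can be seen with a toy class that records the hooks Python calls. `Traced` is a hypothetical stand-in, not a pandas object, and the string keys "mask" and "D" merely stand for the boolean vector and the column label.

```python
class Traced:
    """Toy container that records which indexing hooks Python invokes."""

    def __init__(self, log):
        self.log = log

    def __getitem__(self, key):
        self.log.append(("getitem", key))
        # Return a fresh object, standing in for whatever the lookup yields.
        return Traced(self.log)

    def __setitem__(self, key, value):
        self.log.append(("setitem", key, value))


# Version 1 shape: obj[mask][column] = value
log1 = []
Traced(log1)["mask"]["D"] = 1

# Version 2 shape: obj[column][mask] = value
log2 = []
Traced(log2)["D"]["mask"] = 1

print(log1)
print(log2)
```

In the first log the mask goes through `__getitem__`, so the assignment targets whatever that lookup returned. In the second, the mask is only ever handed to `__setitem__`.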
