Changing values in pandas dataframe does not work

长情又很酷 2020-12-08 12:03

I'm having a problem changing values in a dataframe. I also want to consult regarding a problem I need to solve and the proper way to use pandas to solve it. I'll appreciat

1 Answer
  • 2020-12-08 12:44

    Indexing pandas objects can return two fundamentally different objects: a view or a copy.

    If mask is a basic slice, then df.loc[mask] (written df.ix[mask] before pandas 1.0 removed .ix) returns a view of df. Views share the same underlying data as the original object (df), so modifying the view also modifies the original object.

    If mask is something more complicated, such as an arbitrary sequence of indices, then df.loc[mask] returns a copy of some rows in df. Modifying the copy has no effect on the original.

    In your case, since the rows which share the same wave_path occur at arbitrary locations, ind_res.loc[example_file] returns a copy. So

    ind_res.loc[example_file]['isUsed'] = True   # .loc stands in for the removed .ix
    

    has no effect on ind_res.

    Instead, you could use

    ind_res.loc[example_file, 'isUsed'] = True
    

    to modify ind_res. However, see below for a groupby suggestion which I think might be closer to what you really want.
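The difference between the two assignments can be reproduced on a small frame. This is only a minimal sketch: the four-row `ind_res` and the file names are made up for illustration, and it uses `.loc` because `.ix` does not exist in pandas 1.0 and later. The warnings filter is there because pandas warns about the chained form.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for ind_res: the index holds wave_path values,
# and the rows for "a.wav" sit at non-adjacent positions.
ind_res = pd.DataFrame(
    {"isUsed": [False, False, False, False]},
    index=["a.wav", "b.wav", "a.wav", "c.wav"],
)
example_file = "a.wav"

# Chained indexing: the first [] produces a temporary copy of the matching
# rows, and the assignment lands on that copy.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    ind_res.loc[example_file]["isUsed"] = True
after_chained = ind_res["isUsed"].tolist()

# One indexing call with both row and column: the assignment reaches ind_res.
ind_res.loc[example_file, "isUsed"] = True
after_single = ind_res["isUsed"].tolist()

print(after_chained)
print(after_single)
```

Only the second form changes the `isUsed` values for the two "a.wav" rows.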

    Jeff has already provided a link to the Pandas docs which state that

    The rules about when a view on the data is returned are entirely dependent on NumPy.

    Here are the (complicated) rules which describe when a view or copy is returned. Basically, however, the rule is if the index is requesting a regularly spaced slice of the underlying array then a view is returned, otherwise a copy (out of necessity) is returned.
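The NumPy rule quoted above can be checked directly on a plain array. The following is a small sketch (the array and variable names are chosen only for illustration) that uses `np.shares_memory` to test whether a selection still points at the original data.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

basic = arr[0:2]      # basic slice: regularly spaced rows
fancy = arr[[0, 2]]   # fancy indexing: an arbitrary list of rows

basic_shares = np.shares_memory(arr, basic)
fancy_shares = np.shares_memory(arr, fancy)

basic[0, 0] = 100     # writes through to arr
fancy[1, 0] = -1      # changes only the copy

print(basic_shares, fancy_shares)
print(arr[0, 0], arr[2, 0])
```

The slice shares memory with `arr`, so the write to `basic` shows up in `arr`; the write to `fancy` does not.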


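    Here is a simple example which uses a basic slice. A view is returned by df.loc, so modifying subdf modifies df as well:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.arange(12).reshape(4, 3),
                      columns=list('ABC'), index=[0, 1, 2, 3])
    
    # .loc replaces .ix, which was removed in pandas 1.0.
    # Output shown is without Copy-on-Write; with it, subdf.values is read-only.
    subdf = df.loc[0]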
    print(subdf.values)
    # [0 1 2]
    subdf.values[0] = 100
    print(subdf)
    # A    100
    # B      1
    # C      2
    # Name: 0, dtype: int32
    
    print(df)           # df is modified
    #      A   B   C
    # 0  100   1   2
    # 1    3   4   5
    # 2    6   7   8
    # 3    9  10  11
    

    Here is a simple example which uses "fancy indexing" (arbitrary rows selected). A copy is returned by df.loc, so modifying subdf does not affect df.

    df = pd.DataFrame(np.arange(12).reshape(4, 3),
                      columns=list('ABC'), index=[0, 1, 0, 3])
    
    # .loc replaces .ix, which was removed in pandas 1.0.
    subdf = df.loc[0]
    print(subdf.values)
    # [[0 1 2]
    #  [6 7 8]]
    
    subdf.values[0] = 100
    print(subdf)
    #      A    B    C
    # 0  100  100  100
    # 0    6    7    8
    
    print(df)          # df is NOT modified
    #    A   B   C
    # 0  0   1   2
    # 1  3   4   5
    # 0  6   7   8
    # 3  9  10  11
    

    Notice the only difference between the two examples is that in the first, where a view is returned, the index was [0,1,2,3], whereas in the second, where a copy is returned, the index was [0,1,0,3].

    Since we are selecting rows where the index is 0, in the first example we can do that with a basic slice. In the second example, the rows where the index equals 0 could appear at arbitrary locations, so a copy has to be returned.
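One way to see what "arbitrary locations" means is to ask each index where the label 0 lives. This sketch uses `Index.get_loc`; treating its result as the thing that decides between a slice and fancy indexing is a simplification made here for illustration, not a description of pandas internals.

```python
import pandas as pd

unique_index = pd.Index([0, 1, 2, 3])
duplicated_index = pd.Index([0, 1, 0, 3])

unique_loc = unique_index.get_loc(0)          # one integer position
duplicated_loc = duplicated_index.get_loc(0)  # boolean mask over all rows

print(unique_loc)
print(duplicated_loc)
```

With the unique index the label maps to a single position, while with the duplicated, unsorted index it maps to a mask that picks out rows 0 and 2.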


    Despite having ranted on about the subtlety of Pandas/NumPy slicing, I really don't think that

    ind_res.loc[example_file, 'isUsed'] = True
    

    is what you are ultimately looking for. You probably want to do something more like

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.arange(12).reshape(4,3), 
                      columns=list('ABC'))
    df['A'] = df['A']%2
    print(df)
    #    A   B   C
    # 0  0   1   2
    # 1  1   4   5
    # 2  0   7   8
    # 3  1  10  11
    
    def calculation(grp):
        grp['C'] = True
        return grp
    
    # group_keys=False keeps the original row index in the result
    newdf = df.groupby('A', group_keys=False).apply(calculation)
    print(newdf)
    

    which yields

       A   B     C
    0  0   1  True
    1  1   4  True
    2  0   7  True
    3  1  10  True
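When the per-group calculation produces a value for each row, `groupby(...).transform` is another option that keeps the original row order and index. The sketch below swaps in a group sum of column B as a stand-in calculation, since the real calculation from the question is not shown; the column name `B_group_sum` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("ABC"))
df["A"] = df["A"] % 2

# Per-group sum of B, broadcast back to every row of its group.
df["B_group_sum"] = df.groupby("A")["B"].transform("sum")
print(df)
```

Rows 0 and 2 (group A == 0) both receive 8, and rows 1 and 3 (group A == 1) both receive 14.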
    