Modifying DataFrames inside a list is not working

别那么骄傲 2020-12-17 05:09

I have two DataFrames and I want to perform the same list of cleaning ops on both. I realized I can merge them into one and do everything in one pass, but I am still curious

3 answers
  • 2020-12-17 05:37

    All these slicing/indexing operations return new views/copies of the original dataframe. Reassigning df to one of them only rebinds the loop variable to the new object, meaning the originals are not touched at all.
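To make the rebinding point concrete, here is a minimal sketch. The sample frames are hypothetical (the question's actual data is not shown), and the loop body is an assumed reconstruction of the kind of filter being discussed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the frames from the question are not shown here.
test_1 = pd.DataFrame({"A": [1.0, np.nan, 5.0], "B": [15, 49, 34]})
test_2 = pd.DataFrame({"A": [np.nan, 3.0, 6.0], "B": [100, 200, 300]})

for df in [test_1, test_2]:
    # The filter builds a new DataFrame. Assigning it to `df` only rebinds
    # the loop variable; test_1 and test_2 still point at the old objects.
    df = df[df["A"].notnull()]

rows_after_loop = (len(test_1), len(test_2))
print(rows_after_loop)  # (3, 3): the null rows are still there
```

The loop runs without error, which is why the problem is easy to miss: the cleaned frames are created and then discarded on the next iteration.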

    Option 1
    dropna(...inplace=True)
    Try an in-place dropna call. This modifies the original object itself instead of returning a new one.

    df_list = [test_1, test_2]
    for df in df_list:
        df.dropna(subset=['A'], inplace=True)  
    

    Note, this is one of the few times that I will ever recommend an in-place modification, because of this use case in particular.


    Option 2
    enumerate with reassignment
    Alternatively, you may reassign into the list by index:

    for i, df in enumerate(df_list):
        df_list[i] = df.dropna(subset=['A'])  # df_list[i] = df[df.A.notnull()]
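One consequence of Option 2 that is easy to overlook: the list entries are replaced, but the names test_1 and test_2 still refer to the uncleaned frames. The sketch below uses hypothetical sample data to show this, and unpacks the list at the end as one possible way to update the names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's frames.
test_1 = pd.DataFrame({"A": [1.0, np.nan, 5.0], "B": [15, 49, 34]})
test_2 = pd.DataFrame({"A": [np.nan, 3.0, 6.0], "B": [100, 200, 300]})

df_list = [test_1, test_2]
for i, df in enumerate(df_list):
    # Store the cleaned frame back into the list slot.
    df_list[i] = df.dropna(subset=["A"])

list_lengths = [len(df) for df in df_list]      # cleaned copies in the list
original_lengths = [len(test_1), len(test_2)]   # originals are unchanged
print(list_lengths, original_lengths)  # [2, 2] [3, 3]

# Rebind the original names if the rest of the code uses them.
test_1, test_2 = df_list
```

If later code only ever goes through df_list, the final unpacking step is unnecessary.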
    
  • 2020-12-17 05:38

    You are modifying copies of the dataframes rather than the original dataframes.

    One way to deal with this issue is to use a dictionary. As a convenience, you can use pd.DataFrame.pipe together with dictionary comprehensions to modify your dictionaries.

    def remove_nulls(df):
        return df[df['A'].notnull()]
    
    dfs = dict(enumerate([test_1, test_2]))
    dfs = {k: v.pipe(remove_nulls) for k, v in dfs.items()}
    
    print(dfs)
    
    # {0:    A   B
    #     0  1  15
    #     1  8  49
    #     2  5  34
    #     3  6  44
    #     4  0  63,
    #  1:      A    B
    #     1  3.0  100
    #     2  6.0  200
    #     3  4.0  300
    #     4  9.0  400
    #     5  0.0  500}
    

    Note: in the result, dfs[1]['A'] remains float. This is because np.nan is a float, so the column holding it was stored as float, and nothing here has triggered a conversion back to int.
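If integer values are wanted once the nulls are gone, the conversion has to be requested explicitly. A minimal sketch, assuming a hypothetical test_2 whose column A holds whole numbers plus one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the NaN forces column A to be stored as float64.
test_2 = pd.DataFrame({"A": [np.nan, 3, 6], "B": [100, 200, 300]})
print(test_2["A"].dtype)  # float64

# Drop the null rows, then cast A back to an integer dtype.
cleaned = test_2[test_2["A"].notnull()]
cleaned = cleaned.astype({"A": "int64"})
print(cleaned["A"].dtype)  # int64
```

The cast is only safe after the null rows are removed; calling astype with an integer dtype on a column that still contains NaN raises an error.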

  • 2020-12-17 05:45

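    By using pd.concat. Note that dropna() is called here without subset, so a row with a null in any column is dropped, and concatenating upcasts A to float in both frames.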

    [x.reset_index(level=0,drop=True) for _, x in pd.concat([test_1,test_2],keys=[0,1]).dropna().groupby(level=0)]
    Out[376]: 
    [     A   B
     0  1.0  15
     1  8.0  49
     2  5.0  34
     3  6.0  44
     4  0.0  63,      A    B
     1  3.0  100
     2  6.0  200
     3  4.0  300
     4  9.0  400
     5  0.0  500]
    