What is the difference between combine_first and fillna?

挽巷 2020-12-16 17:39

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use one over the other?

1 Answer
  • 2020-12-16 18:05

    combine_first is intended for cases where the two objects have non-overlapping indices. It fills in nulls and also supplies values for index and column labels that didn't exist in the first object.

    import numpy as np
    import pandas as pd
    
    dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
    
       w    x  y
    a  1  2.0  3
    b  4  NaN  5
    
    dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])
    
       x  y  z
    b  1  2  3
    c  3  4  5
    
    dfa.combine_first(dfb)
    
         w    x    y    z
    a  1.0  2.0  3.0  NaN
    b  4.0  1.0  5.0  3.0  # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
    c  NaN  3.0  4.0  5.0  # whole new index
    

    Notice that all indices and columns from both frames are included in the result.

    Now, if we use fillna instead:

    dfa.fillna(dfb)
    
       w    x  y
    a  1  2.0  3
    b  4  1.0  5  # 1.0 filled in from `dfb`
    

    Notice that no new columns or indices from dfb are included. fillna only filled the null value at the index and column labels that dfa and dfb share.
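
To check the contrast described above programmatically, here is a minimal sketch that rebuilds dfa and dfb from the answer and compares the labels of the two results. The variable names combined and filled are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frames as in the answer above
dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

combined = dfa.combine_first(dfb)  # union of both frames' labels
filled = dfa.fillna(dfb)           # keeps only dfa's labels

# Row and column labels of each result
print(list(combined.index), list(combined.columns))
print(list(filled.index), list(filled.columns))

# Both approaches fill the missing value at ('b', 'x') from dfb
print(combined.loc['b', 'x'], filled.loc['b', 'x'])
```

The exact dtypes of the combine_first result may vary between pandas versions, so the sketch compares labels and values rather than dtypes.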


    In your case, you apply fillna and combine_first to a single column with the same index, so the two are effectively equivalent.
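
The single-column case can be sketched as follows. The question's own code is not shown here, so the frame df and its values are hypothetical; only the result names c and d follow the question's wording.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's frame
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [10.0, 20.0, 30.0]})

# Both columns share the same index, so there are no new labels to add
c = df['a'].fillna(df['b'])
d = df['a'].combine_first(df['b'])

# Compare the two results element by element
print(c.equals(d))
```

Because both Series share one index, combine_first has nothing to add beyond what fillna already fills, which is why the two results match.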
