Merge pandas DataFrame on column of float values

梦如初夏 2020-12-10 13:26

I have two data frames that I am trying to merge.

Dataframe A:

    col1    col2    sub    grade
0   1       34.32   x       a 
1   1       34.32   x          


        
2 Answers
  • 2020-12-10 13:43

    You can use a little hack: multiply the float column by a constant such as 100 or 1000 (enough to cover the decimals you care about), round and convert it to int, merge on the integer column, and finally divide by the same constant:

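    import numpy as np
    import pandas as pd

    N = 100
    # Round before casting: astype(int) alone truncates, so a product that
    # lands just below an integer would be off by one (thanks to koalo for the comment).
    A['col2'] = np.round(A['col2'] * N).astype(int)
    B['col2'] = np.round(B['col2'] * N).astype(int)
    # Merge on the integer keys, then scale back to the original float values.
    df = pd.merge(A, B, how='outer', on=['col1', 'col2'])
    df['col2'] = df['col2'] / N
    print(df)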
       col1   col2  sub grade group ID
    0     1  34.32    x     a     t  z
    1     1  34.32    x     b     t  z
    2     1  34.33    y     c     r  z
    3     2  10.14    z     b     q  z
    4     3  33.01    z     a     q  e
    5     1  54.32  NaN   NaN     s  w
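
    The same idea can be wrapped in a helper that leaves the input frames untouched. This is only a minimal sketch: the function name `merge_on_scaled_float` and the sample frames are made up for illustration, and it assumes the float column contains no NaN values (the cast to int would fail on them).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 0.1 + 0.2 is stored as 0.30000000000000004, not 0.3.
A = pd.DataFrame({"col1": [1, 1, 2], "col2": [0.1 + 0.2, 34.32, 10.14], "sub": ["x", "y", "z"]})
B = pd.DataFrame({"col1": [1, 1, 3], "col2": [0.3, 34.32, 33.01], "group": ["t", "r", "q"]})

def merge_on_scaled_float(left, right, keys, float_col, n=100, how="outer"):
    """Merge on `keys`, comparing `float_col` as round(value * n) integers."""
    # Work on copies so the callers' frames keep their float column.
    left, right = left.copy(), right.copy()
    left[float_col] = np.round(left[float_col] * n).astype("int64")
    right[float_col] = np.round(right[float_col] * n).astype("int64")
    merged = pd.merge(left, right, how=how, on=keys)
    # Scale the integer key back to a float for display.
    merged[float_col] = merged[float_col] / n
    return merged

# Exact float comparison misses the 0.3 row; the scaled merge finds it.
exact = pd.merge(A, B, how="inner", on=["col1", "col2"])
scaled = merge_on_scaled_float(A, B, ["col1", "col2"], "col2", how="inner")
print(len(exact), len(scaled))
```

    Note that after the merge the float column holds the rounded values (to two decimals for `n=100`), not the original ones.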
    
  • 2020-12-10 13:49

    I had a similar problem where I needed to identify matching rows with thousands of float columns and no identifier. This case is difficult because values can vary slightly due to rounding.

    In this case, I used scipy.spatial.distance.cosine, which returns the cosine distance between two rows; one minus that value is their cosine similarity.

    from scipy.spatial import distance

    threshold = 0.99999
    # row1 and row2 are the float values of the two rows being compared
    similarity = 1 - distance.cosine(row1, row2)

    if similarity >= threshold:
        pass  # it's a match
    else:
        pass  # loop and check another row pair
    

    This won't work if you have duplicate or very similar rows, but when you have a large number of float columns and not too many rows, it works well.
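
    The loop described above might look roughly like the following. It is a sketch under assumptions, not the exact code I used: the function name `match_rows` and the random test data are hypothetical, rows are plain NumPy arrays, and no row is all zeros (the cosine distance is undefined there). Cosine similarity also ignores scale, so a row and a multiple of it would count as a match.

```python
import numpy as np
from scipy.spatial import distance

def match_rows(left, right, threshold=0.99999):
    """Return (i, j) pairs where left[i] and right[j] have cosine similarity >= threshold."""
    matches = []
    for i, row1 in enumerate(left):
        for j, row2 in enumerate(right):
            similarity = 1 - distance.cosine(row1, row2)
            if similarity >= threshold:
                matches.append((i, j))
                break  # first match wins; assumes no duplicate rows
    return matches

# Hypothetical data: 5 rows of 1000 floats, then a shuffled copy with tiny noise
# standing in for rounding differences.
rng = np.random.default_rng(0)
left = rng.random((5, 1000))
order = [3, 0, 4, 1, 2]
right = (left + rng.normal(scale=1e-6, size=left.shape))[order]

pairs = match_rows(left, right)
print(pairs)
```

    The nested loop compares every pair of rows, which is why this only suits tables with not too many rows.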
