Most efficient way to construct similarity matrix

孤城傲影 2020-12-31 13:53

I'm using the following links to create a "Euclidean Similarity Matrix" (which I then convert to a DataFrame): https://stats.stackexchange.com/questions/53068/euclidean-distan
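For reference, the "Euclidean similarity" reproduced in the answer below maps each Euclidean distance d to 1 / (1 + d). Identical columns therefore score 1, and the score shrinks toward 0 as the distance grows. Here x_i denotes the i-th column of the DataFrame (g1, g2, g3) and k runs over its rows (s1 to s4). The notation is an assumption chosen for illustration, read off the answer's code rather than taken from the linked post:

```latex
S_{ij} = \frac{1}{1 + d(\mathbf{x}_i, \mathbf{x}_j)}, \qquad
d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\sum_{k} \left( x_{ki} - x_{kj} \right)^2}
```

For example, columns g1 and g2 of the answer's data differ by (2.2, 1.7, 1.6, 1.7), so d = √13.18 ≈ 3.630 and S ≈ 0.216, matching the matrix printed in the answer.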

5 Answers
  •  南方客
    2020-12-31 14:11

    The simplest way I can find to get the same result as the OP is to use distance_matrix, also from scipy.spatial, which returns the full matrix of pairwise Euclidean distances. Turning those distances into similarities then takes a single (somewhat long) statement.

    import numpy as np
    import pandas as pd
    from scipy.spatial import distance_matrix
    
    # Original code from OP, slightly reformatted
    DF_var = pd.DataFrame.from_dict({
        "s1":[1.2,3.4,10.2],
        "s2":[1.4,3.1,10.7],
        "s3":[2.1,3.7,11.3],
        "s4":[1.5,3.2,10.9]
    }).T
    DF_var.columns = ["g1","g2","g3"]
    
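    # Whole similarity computation in one statement:
    # distance_matrix(X, X) gives the pairwise Euclidean distances between
    # the rows of X (here the columns g1..g3, hence the .T), and
    # 1 / (1 + d) turns each distance into a similarity in (0, 1]
    df_euclid = pd.DataFrame(
        1 / (1 + distance_matrix(DF_var.T, DF_var.T)),
        columns=DF_var.columns, index=DF_var.columns
    )
    
    print(df_euclid)
    #           g1        g2        g3
    # g1  1.000000  0.215963  0.051408
    # g2  0.215963  1.000000  0.063021
    # g3  0.051408  0.063021  1.000000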
    

    The code above should run as-is when pasted into any Python environment.
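    If efficiency is the concern, one alternative (not the method used above) is scipy.spatial.distance.pdist plus squareform. pdist evaluates each unordered pair of columns only once and returns the condensed upper triangle of n(n-1)/2 distances, and squareform expands it back into the symmetric n x n matrix. Below is a minimal sketch on the same toy data. The variable name df_euclid_pdist is hypothetical, and whether this is measurably faster than distance_matrix depends on the data size, so it is worth timing on real data:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Same toy data as above: rows are samples s1..s4, columns are g1..g3
# (orient="index" builds the same frame as from_dict(...).T)
DF_var = pd.DataFrame.from_dict({
    "s1": [1.2, 3.4, 10.2],
    "s2": [1.4, 3.1, 10.7],
    "s3": [2.1, 3.7, 11.3],
    "s4": [1.5, 3.2, 10.9],
}, orient="index", columns=["g1", "g2", "g3"])

# pdist compares the rows of its input, so transpose to compare columns;
# it returns one Euclidean distance per unordered pair (n*(n-1)/2 values)
condensed = pdist(DF_var.T.to_numpy(), metric="euclidean")

# squareform expands the condensed vector into a symmetric n x n matrix
# with zeros on the diagonal, so 1 / (1 + 0) gives 1.0 for self-similarity
similarity = 1 / (1 + squareform(condensed))

# df_euclid_pdist is a hypothetical name for this alternative result
df_euclid_pdist = pd.DataFrame(
    similarity, index=DF_var.columns, columns=DF_var.columns
)
print(df_euclid_pdist)
```

    Both versions apply the same 1 / (1 + d) transform to the same pairwise Euclidean distances, so the resulting DataFrame should match df_euclid from the code above.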
