Most efficient way to construct similarity matrix

孤城傲影 2020-12-31 13:53

I'm using the following links to create a "Euclidean Similarity Matrix" (that I convert to a DataFrame). https://stats.stackexchange.com/questions/53068/euclidean-distan

5 answers

南方客 (original poster)

2020-12-31 14:11

The simplest way I can find to get the same result as the OP is to use distance_matrix, also from scipy.spatial. The whole thing can be done in one sort-of-long line.

import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

# Original code from OP, slightly reformatted
DF_var = pd.DataFrame.from_dict({
    "s1":[1.2,3.4,10.2],
    "s2":[1.4,3.1,10.7],
    "s3":[2.1,3.7,11.3],
    "s4":[1.5,3.2,10.9]
}).T
DF_var.columns = ["g1","g2","g3"]

# Whole similarity algorithm in one line
df_euclid = pd.DataFrame(
    1 / (1 + distance_matrix(DF_var.T, DF_var.T)),
    columns=DF_var.columns, index=DF_var.columns
)

#           g1        g2        g3
# g1  1.000000  0.215963  0.051408
# g2  0.215963  1.000000  0.063021
# g3  0.051408  0.063021  1.000000

The code above should run as-is when pasted into any Python IDE, provided numpy, pandas, and scipy are installed.
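For comparison, here is a minimal sketch of the same similarity calculation using pdist and squareform from scipy.spatial.distance. This is an alternative I am suggesting, not something from the question: pdist computes each pair of columns only once, which may matter for wide data, though I have not benchmarked it here. The name df_euclid_alt is just an illustrative choice.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Same sample data as above: rows s1..s4, columns g1..g3
DF_var = pd.DataFrame.from_dict({
    "s1": [1.2, 3.4, 10.2],
    "s2": [1.4, 3.1, 10.7],
    "s3": [2.1, 3.7, 11.3],
    "s4": [1.5, 3.2, 10.9],
}).T
DF_var.columns = ["g1", "g2", "g3"]

# pdist returns the condensed (upper-triangle) Euclidean distances
# between the rows of its input, so pass the transpose to compare columns
condensed = pdist(DF_var.T.values, metric="euclidean")

# squareform expands the condensed vector into a full symmetric matrix
# with zeros on the diagonal
dist = squareform(condensed)

# Same similarity transform as above: 1 / (1 + distance)
df_euclid_alt = pd.DataFrame(
    1 / (1 + dist),
    index=DF_var.columns, columns=DF_var.columns
)

print(df_euclid_alt)
```

On this sample data the result should match the matrix printed above; for only three columns the difference in work is negligible.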
