I have the following Pandas DataFrame:
In [31]:
import pandas as pd
sample = pd.DataFrame({'Sym1': ['a','a','a','d'],'Sym2':['a','c','b','
For large data, I found a fast way to do this. Assume your data is already a NumPy array named a.
from sklearn.metrics.pairwise import euclidean_distances
dist = euclidean_distances(a, a)
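For intuition, here is a minimal NumPy sketch of the dot-product expansion that the scikit-learn documentation describes for euclidean_distances: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y. The helper name pairwise_euclidean is hypothetical, and this is not scikit-learn's actual implementation. That this expansion explains the speed difference is an assumption on my part; it turns most of the work into a single matrix product.

```python
import numpy as np

def pairwise_euclidean(x):
    """Pairwise Euclidean distances between the rows of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    sq_norms = np.einsum("ij,ij->i", x, x)  # squared norm of each row
    # ||xi - xj||^2 = ||xi||^2 + ||xj||^2 - 2 * (xi . xj)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives caused by rounding
    np.fill_diagonal(sq_dists, 0.0)          # distance from a row to itself is exactly 0
    return np.sqrt(sq_dists)

a = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = pairwise_euclidean(a)
```

The expansion can lose some precision compared with computing each difference directly, so results may differ slightly from pdist in the last digits.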
Below is an experiment comparing the time needed for the two approaches (pdist plus squareform versus euclidean_distances):
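import time
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics.pairwise import euclidean_distances

a = np.random.rand(1000, 1000)

# Approach 1: SciPy pdist + squareform
time1 = time.time()
distances = pdist(a, metric='euclidean')
dist_matrix = squareform(distances)
time2 = time.time()
time2 - time1  # 0.3639109134674072

# Approach 2: scikit-learn euclidean_distances
time1 = time.time()
dist = euclidean_distances(a, a)
time2 = time.time()
time2 - time1  # 0.08735871315002441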
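A single timed run like the one above can be noisy, so it may be worth repeating each measurement and keeping the fastest run. The sketch below uses only the standard library; best_of is a hypothetical helper name, and the workload passed to it here is just a stand-in.

```python
import time

def best_of(fn, repeats=5):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; swap in the real computation to benchmark it
elapsed = best_of(lambda: sum(i * i for i in range(10_000)), repeats=3)
```

With the names from this answer, the two approaches could be timed as best_of(lambda: squareform(pdist(a, metric='euclidean'))) and best_of(lambda: euclidean_distances(a, a)).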