Customize Distance Formula of K-means in Apache Spark Python
I'm using k-means for clustering, following this tutorial and the API docs. However, I want to use a custom formula to calculate distances. How can I pass a custom distance function to k-means in PySpark?

zero323: In general, using a different distance measure doesn't make sense, because the k-means algorithm (unlike k-medoids) is well defined only for Euclidean distances. See "Why does k-means clustering algorithm use only Euclidean distance metric?" for an explanation. Moreover, MLlib algorithms are implemented in Scala, and PySpark provides only the wrappers required to execute the Scala code. Therefore a custom distance function cannot simply be passed in from Python.
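If a non-Euclidean distance is genuinely required, one option is to implement the clustering loop yourself rather than going through MLlib. Below is a minimal pure-Python sketch of Lloyd's algorithm with a pluggable distance function (the function names `kmeans`, `euclidean`, and `manhattan` are illustrative, not part of any Spark API). Note the caveat from the answer above still applies: assigning points with an arbitrary metric while updating centroids as means is a heuristic and loses k-means' convergence guarantees.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard Euclidean distance, the metric k-means is defined for."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Example custom metric; with mean updates this is only a heuristic."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points: List[Point], k: int,
           distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
           max_iter: int = 100) -> Tuple[List[List[float]], List[List[Point]]]:
    # Naive init: first k points. Real implementations use k-means++.
    centroids = [list(points[i]) for i in range(k)]
    clusters: List[List[Point]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # under the supplied distance function.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: recompute each centroid as the coordinate-wise mean;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_centroids.append(
                    [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)])
            else:
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters
```

For large datasets this loses Spark's parallelism, of course; a middle ground is to broadcast the centroids and run the assignment step over an RDD with `map`, but that is beyond this sketch.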