Inconsistent results with KMeans between Apache Spark and scikit_learn
I am performing clustering on a dataset using PySpark. To find the number of clusters I performed clustering over a range of values (2,20) and found the wsse (within-cluster sum of squares) values for each value of k . This where I found something unusual. According to my understanding when you increase the number of clusters, the wsse decreases monotonically. But results I got say otherwise. I 'm displaying wsse for first few clusters only Results from spark For k = 002 WSSE is 255318.793358 For k = 003 WSSE is 209788.479560 For k = 004 WSSE is 208498.351074 For k = 005 WSSE is 142573.272672