k-means

Spark ML KMeans gives: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$2: (vector) => int)

Submitted by 别说谁变了你拦得住时间么 on 2019-12-11 05:03:59
Question: I am trying to load a KMeansModel and then get the cluster labels out of it. Here is the code I have written:

val kMeansModel = KMeansModel.load(trainedMlModel.mlModelFilePath)
val arrayOfElements = measurePoint.measurements.map(a => a._2).toSeq
println(s"ArrayOfELements::::$arrayOfElements")
val arrayDF = sparkContext.parallelize(arrayOfElements).toDF()
arrayDF.show()
val vectorDF = new VectorAssembler().setInputCols(arrayDF.columns).setOutputCol("features").transform(arrayDF)
vectorDF.printSchema
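This exception usually means the model's prediction UDF received a column of a type it did not expect (for example an org.apache.spark.mllib vector where an org.apache.spark.ml vector is required), so comparing vectorDF.printSchema against the vector type the model was trained on is a reasonable first check. Conceptually, the UDF that fails is just a nearest-centroid lookup; here is a minimal pure-Python sketch of that step, with made-up toy centroids that are not from the question:

```python
def predict_cluster(point, centroids):
    """Return the index of the centroid nearest to `point` (squared Euclidean),
    which is what the KMeans prediction UDF computes per row."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

# Illustrative centroids only
centroids = [[0.0, 0.0], [10.0, 10.0]]
print(predict_cluster([9.0, 11.0], centroids))  # → 1 (nearest the second centroid)
```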

How to visualize effect of running kmeans algorithm in SPSS?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-11 04:39:10
Question: How do I visualize the result of running the k-means algorithm in SPSS? I really don't see any additional graphical options there, but I think I've seen some visualizations of k-means results made in SPSS that seemed dedicated to the k-means procedure. I would like to visualize the values of the cluster centers.

Answer 1: You might be interested in the cluster silhouette plots available from the STATS CLUS SIL extension command for any clustering method. It requires the Python Essentials available from the SPSS
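The silhouette plot suggested in the answer scores each point by s = (b − a) / max(a, b), where a is the point's mean distance to the rest of its own cluster and b its mean distance to the nearest other cluster. A stdlib-Python sketch of the per-point computation, on toy data and not tied to SPSS:

```python
import math

def silhouette(point, own_members, other_clusters):
    """Silhouette coefficient for one point.
    own_members: the OTHER members of the point's own cluster.
    other_clusters: list of clusters (lists of points) the point is not in."""
    a = sum(math.dist(point, q) for q in own_members) / len(own_members)
    b = min(sum(math.dist(point, q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

# A point deep inside its own cluster scores near +1
s = silhouette((0.0, 0.5), [(0.0, 0.0), (0.0, 1.0)], [[(10.0, 10.0), (10.0, 11.0)]])
print(round(s, 3))  # close to 1.0
```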

MATLAB - Classification output

Submitted by 烈酒焚心 on 2019-12-11 03:52:02
Question: My program uses K-means clustering with a number of clusters set by the user. Here k = 4, but I would like to run the clustered data through MATLAB's naive Bayes classifier afterwards. Is there a way to split the clusters up and feed them into different naive Bayes classifiers in MATLAB?

Naive Bayes:

class = classify(test, training, target_class, 'diaglinear');

K-means:

%% generate sample data
K = 4;
numObservarations = 5000;
dimensions = 42;
%% cluster
opts = statset('MaxIter', 500,
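The question reduces to partitioning the rows by their cluster index and then fitting one classifier per partition. Since the MATLAB snippets above are truncated, the grouping step is sketched here in Python with toy data (all values illustrative):

```python
from collections import defaultdict

def split_by_cluster(rows, labels):
    """Group data rows by their k-means cluster label, so each group can be
    fed to its own classifier afterwards."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    return dict(groups)

rows = [[1.0], [1.1], [9.0], [9.2]]   # toy observations
labels = [0, 0, 1, 1]                  # toy k-means output
parts = split_by_cluster(rows, labels)
# parts[0] and parts[1] can now each train a separate naive Bayes model
```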

custom function in mutate/tibble

Submitted by 折月煮酒 on 2019-12-11 03:32:56
Question: I am following a tutorial and am trying to apply this part to my own data/problem:

kclusts <- tibble(k = 1:9) %>%
  mutate(
    kclust = map(k, ~kmeans(points, .x)),
    tidied = map(kclust, tidy),
    glanced = map(kclust, glance),
    augmented = map(kclust, augment, points)
  )

However, my data is slightly different from the tutorial's. I am trying to apply the final line, augmented = map(kclust, augment, points). Code that works (without the final line):

kclust <- results %>% as_tibble() %>% select(-id_row
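The purrr pattern above fits one model per k and stores its summaries alongside it in the same row. The same shape in plain Python, with a deliberately tiny 1-D Lloyd's algorithm standing in for kmeans() (all data and names illustrative):

```python
def fit_kmeans_1d(points, k, iters=10):
    """Tiny 1-D Lloyd's algorithm: returns (centers, labels, inertia)."""
    centers = sorted(points)[:k]  # naive init: first k sorted values
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: (p - centers[i]) ** 2) for p in points]
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centers[i] = sum(members) / len(members)
    labels = [min(range(k), key=lambda i: (p - centers[i]) ** 2) for p in points]
    inertia = sum((p - centers[l]) ** 2 for p, l in zip(points, labels))
    return centers, labels, inertia

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
# One "row" per k, like the tibble built by mutate(...) above
results = [{"k": k, "fit": fit_kmeans_1d(points, k)} for k in range(1, 4)]
```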

Plotting clusters using k-means with distance from centroid

Submitted by [亡魂溺海] on 2019-12-11 03:13:40
Question: I am trying to create a plot similar to this: there are three clusters, and all the data points (circles) are plotted according to their Euclidean distance from their centroid. Using this image it is easy to see that 5 samples from class 2 ended up in the wrong clusters. I'm running k-means using kmeans and can't figure out how to plot this type of graph. For example purposes we can use the iris dataset:

> iri <- iris
> cl <- kmeans(iri[, 1:4], 3)
> cl
K-means clustering with 3 clusters of sizes
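The y-axis of the described plot is each point's Euclidean distance to its assigned centroid, which kmeans does not return directly but which is straightforward to compute from the labels and centers (cl$cluster and cl$centers in R). A stdlib-Python sketch of that computation, on toy coordinates:

```python
import math

def dist_to_centroid(points, labels, centers):
    """Distance from each point to the center of the cluster it was assigned to —
    the quantity plotted on the y-axis of the figure described above."""
    return [math.dist(p, centers[l]) for p, l in zip(points, labels)]

d = dist_to_centroid([(0.0, 0.0), (3.0, 4.0)], [0, 0], [(0.0, 0.0)])
print(d)  # [0.0, 5.0] — the second point sits far from its centroid
```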

Set static centers for kmeans in R

Submitted by 霸气de小男生 on 2019-12-11 02:31:29
Question: I want to group a list of longitudes and latitudes (my_long_lats) based on pre-determined center points (my_center_Points). When I run:

k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))

k$centers does not equal my_center_Points. I assume k-means has adjusted my center points towards the optimal centers. But what I need is for my_center_Points not to change, and to group my_long_lats around them. In this link they talk about setting initial centers, but how do I set centers that won't
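kmeans() always updates the centers, so keeping them fixed means skipping the update step entirely and performing only the assignment step of Lloyd's algorithm against the given centers. A stdlib-Python sketch with toy coordinates (in R the same effect can be had by computing distances to each fixed center and taking the row-wise minimum):

```python
import math

def assign_to_fixed_centers(points, centers):
    """Assignment step only: each point gets the index of its nearest center.
    The centers themselves are never moved."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

labels = assign_to_fixed_centers([(0.1, 0.0), (9.9, 10.2)],
                                 [(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 1]
```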

PySpark 2: KMeans The input data is not directly cached

Submitted by 那年仲夏 on 2019-12-10 17:56:05
Question: I don't know why I receive the warning "WARN KMeans: The input data is not directly cached, which may hurt performance if its parent RDDs are also uncached." when I try to use Spark KMeans:

df_Part = assembler.transform(df_Part)
df_Part.cache()
while (k <= max_cluster) and (wssse > seuilStop):
    kmeans = KMeans().setK(k)
    model = kmeans.fit(df_Part)
    wssse = model.computeCost(df_Part)
    k = k + 1

It says that my input (DataFrame) is not cached! I tried to print df_Part.is_cached and I received True, which

Scikit-learn, KMeans: How to use max_iter

Submitted by 三世轮回 on 2019-12-10 17:15:03
Question: I'd like to understand the parameter max_iter of the class sklearn.cluster.KMeans. According to the documentation:

max_iter : int, default: 300 — Maximum number of iterations of the k-means algorithm for a single run.

But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times to classify every object. On the other hand, it makes no sense to run several times over all objects. What is my misconception, and how do I have to
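The misconception: one k-means iteration is one full pass that reassigns ALL points and then updates the centers, so max_iter bounds the number of passes, not the number of objects, and Lloyd's algorithm typically converges in far fewer than 300 passes regardless of dataset size. A stdlib-Python sketch that makes the iteration count visible, on 1-D toy data:

```python
def lloyd_iterations(points, centers, max_iter=300):
    """Each iteration reassigns every point and recomputes every center;
    returns the final centers and how many iterations were actually needed."""
    for it in range(1, max_iter + 1):
        labels = [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
                  for p in points]
        new = [sum(m) / len(m) if (m := [p for p, l in zip(points, labels) if l == i])
               else centers[i]                       # keep an emptied center as-is
               for i in range(len(centers))]
        if new == centers:                           # converged: nothing changed
            return centers, it
        centers = new
    return centers, max_iter

centers, n_iter = lloyd_iterations([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 1.0])
print(centers, n_iter)  # converges after a handful of passes, not one per point
```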

Join neighbour cluster centroids Matlab

Submitted by 二次信任 on 2019-12-10 12:15:51
Question: I have used K-means to cluster data into 8 different clusters using [X,C] = kmeans(XX, 8); this means I have 8 centroids whose locations are stored in C (example shown below, with X, Y, Z as columns). I want to connect the 8 centroids together, where only the centroids of clusters that are close to each other ("have borders between each other") are connected, while centroids of clusters that are not close to each other are not connected. Could anyone please advise?

C = -0
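One simple heuristic (only a sketch; a Delaunay triangulation of the centroids is the more principled definition of "neighbouring") is to connect every pair of centroids closer than a chosen gap. In stdlib Python, with illustrative 3-D centroids:

```python
import math

def centroid_edges(centers, max_gap):
    """Return index pairs (i, j) of centroids closer than max_gap;
    these pairs get a connecting line in the plot."""
    return [(i, j)
            for i in range(len(centers))
            for j in range(i + 1, len(centers))
            if math.dist(centers[i], centers[j]) < max_gap]

edges = centroid_edges([(0, 0, 0), (1, 0, 0), (10, 10, 10)], max_gap=2.0)
print(edges)  # [(0, 1)] — only the two nearby centroids are joined
```

The choice of max_gap is data-dependent; a common starting point is a small multiple of the median nearest-centroid distance.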

More questions on “optimizing K-means algorithm”

Submitted by 流过昼夜 on 2019-12-10 11:49:27
Question: I want to implement the paper titled "An Optimized Version of the K-Means Clustering Algorithm". The paper is at this link: https://fedcsis.org/proceedings/2014/pliks/258.pdf. The paper is not obvious to me. I saw on Stack Overflow that @Vpp Man asked some questions about it (Optimizing K-means algorithm), but because I have extra questions about it, I created a new question page. My questions: 1) Is algorithm 2 the whole algorithm, or must I put it into part of algorithm 1 (in step 2 of algorithm 1)? 2