cluster-analysis

How to identify sequences within each cluster?

六眼飞鱼酱① submitted on 2019-12-11 02:27:25

Question: Using the biofam dataset that comes as part of TraMineR:

    library(TraMineR)
    data(biofam)
    lab <- c("P","L","M","LM","C","LC","LMC","D")
    biofam.seq <- seqdef(biofam[,10:25], states=lab)
    head(biofam.seq)
             Sequence
    1167 P-P-P-P-P-P-P-P-P-LM-LMC-LMC-LMC-LMC-LMC-LMC
    514  P-L-L-L-L-L-L-L-L-L-L-LM-LMC-LMC-LMC-LMC
    1013 P-P-P-P-P-P-P-L-L-L-L-L-LM-LMC-LMC-LMC
    275  P-P-P-P-P-L-L-L-L-L-L-L-L-L-L-L
    2580 P-P-P-P-P-L-L-L-L-L-L-L-L-LMC-LMC-LMC
    773  P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P

I can perform a cluster analysis: …
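In R, the usual TraMineR route is to compute pairwise dissimilarities between sequences (e.g., with seqdist), cluster that matrix, cut the tree into k groups, and then subset the sequence object by the resulting membership vector. As a language-neutral illustration of that bookkeeping step only (mapping cluster labels back to individual rows), here is a hedged Python/SciPy sketch; the random distance matrix and the cluster count of 4 are made-up stand-ins, not part of the original question.

```python
# Hedged sketch (not TraMineR): given a precomputed pairwise distance matrix
# between sequences, cluster hierarchically and list which rows fall in each cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
dist = rng.random((20, 20))                      # stand-in for a seqdist-style matrix
dist = (dist + dist.T) / 2                       # make it symmetric
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist), method="average")  # hierarchical clustering on condensed distances
labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into 4 clusters

# Map cluster labels back to the original sequence indices (rows).
for c in sorted(set(labels)):
    members = np.where(labels == c)[0]
    print(f"cluster {c}: rows {members.tolist()}")
```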

Clustering algorithm for rays

試著忘記壹切 submitted on 2019-12-11 02:26:27

Question: I know that there are clustering algorithms for points, obviously, but I have a different scenario. I have many rays, all of whose start points lie on a sphere in 3D and whose direction vectors point inwards into the sphere. Some of the rays point towards a point A, others towards a point B, etc., with some noise (i.e., the rays don't perfectly intersect each other). Is there a clustering algorithm that will allow me to cluster the rays based on which point they are pointing …
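One practical angle, offered only as a hedged sketch rather than a canonical algorithm: define a pairwise "distance" between rays as how close the two rays come to each other in space, then feed that precomputed matrix to any distance-based clusterer (DBSCAN here). Rays aimed at the same target pass close to one another, so they end up in the same cluster. The toy targets, the 0.02 noise level, and the eps value below are illustrative assumptions.

```python
# Hedged sketch: treat "how close do two rays come to each other" as a distance,
# then run a precomputed-distance clustering (DBSCAN) on that matrix.
import numpy as np
from sklearn.cluster import DBSCAN

def line_line_distance(p1, d1, p2, d2):
    """Minimum distance between the infinite lines p1+t*d1 and p2+s*d2 (ray clamping omitted)."""
    n = np.cross(d1, d2)
    nn = np.linalg.norm(n)
    if nn < 1e-12:                               # (nearly) parallel lines
        w = p2 - p1
        return np.linalg.norm(w - np.dot(w, d1) / np.dot(d1, d1) * d1)
    return abs(np.dot(p2 - p1, n)) / nn

# toy data: rays starting on a unit sphere, aimed (noisily) at two interior targets
rng = np.random.default_rng(1)
targets = np.array([[0.4, 0.0, 0.0], [-0.4, 0.3, 0.1]])
starts = rng.normal(size=(100, 3))
starts /= np.linalg.norm(starts, axis=1, keepdims=True)
which = rng.integers(0, 2, size=100)
dirs = targets[which] + 0.02 * rng.normal(size=(100, 3)) - starts

n = len(starts)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = line_line_distance(starts[i], dirs[i], starts[j], dirs[j])

labels = DBSCAN(eps=0.1, min_samples=5, metric="precomputed").fit_predict(D)
print(np.bincount(labels[labels >= 0]))          # cluster sizes (noise points excluded)
```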

How to use ggplot to plot T-SNE clustering

百般思念 submitted on 2019-12-11 00:44:04

Question: Here is the t-SNE code using the IRIS data:

    library(Rtsne)
    iris_unique <- unique(iris) # Remove duplicates
    iris_matrix <- as.matrix(iris_unique[,1:4])
    set.seed(42) # Set a seed if you want reproducible results
    tsne_out <- Rtsne(iris_matrix) # Run TSNE
    # Show the objects in the 2D tsne representation
    plot(tsne_out$Y, col=iris_unique$Species)

Which produces this plot: [image omitted] How can I use ggplot to make that figure?

Answer 1: I think the easiest/cleanest ggplot way would be to store all the info you need in a …
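The gist of the answer is to put the two t-SNE coordinates and the species labels into a single data frame and map them to aesthetics. That is R/ggplot2-specific; purely as a rough parallel, and assuming the Python plotnine package (a ggplot2-style grammar) plus scikit-learn's TSNE are acceptable stand-ins for ggplot2 and Rtsne, a hedged sketch of the same idea could look like this.

```python
# Hedged Python parallel of the R answer: put the 2D embedding and the labels
# into one data frame, then plot with a ggplot-style grammar (plotnine).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from plotnine import ggplot, aes, geom_point

iris = load_iris(as_frame=True)
X = iris.data.drop_duplicates()                  # roughly mirrors unique(iris)
y = iris.target.loc[X.index].map(dict(enumerate(iris.target_names)))

emb = TSNE(n_components=2, random_state=42).fit_transform(X.to_numpy())
df = pd.DataFrame({"tsne1": emb[:, 0], "tsne2": emb[:, 1], "species": y.to_numpy()})

p = ggplot(df, aes("tsne1", "tsne2", color="species")) + geom_point()
p.save("tsne_iris.png")                          # or print(p) in an interactive session
```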

Predict in Clustering

十年热恋 submitted on 2019-12-10 22:51:33

Question: In the R language, is there a predict function for clustering like the one we have in classification? What can we conclude from the clustering graph result that we get from R, other than comparing two clusters?

Answer 1: Clustering does not pay attention to prediction capabilities. It just tries to find objects that seem to be related. That is why there is no "predict" function for clustering results. However, in many situations, learning classifiers based on the clusters offers an improved performance. …
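The same idea carries over outside R: centroid-based methods can assign a new point to its nearest centroid, and for algorithms without any predict step you can train a classifier on the cluster labels, which is exactly what the answer suggests. The original question is about R, so the following scikit-learn sketch is only an illustration of both routes, with made-up toy data.

```python
# Hedged sketch: two ways to "predict" cluster membership for new data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2)) + rng.integers(0, 3, size=(300, 1)) * 4.0
X_new = rng.normal(size=(5, 2))

# Route 1: centroid-based clustering has a natural predict (nearest centroid).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
print("KMeans predict:", km.predict(X_new))

# Route 2: for algorithms without predict, fit a classifier on the cluster labels.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, km.labels_)
print("classifier predict:", clf.predict(X_new))
```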

Determining cluster membership in SOM (Self Organizing Map) for time series data

馋奶兔 submitted on 2019-12-10 21:12:11

Question: I am also working on a project that requires clustering of time series data. I am using the SOM Toolbox that works in MATLAB for clustering purposes and am stuck with the following problem: "How can we determine which data belongs to which cluster?" SOM randomly chooses a data sample from the dataset and finds the BMU for each data sample. As far as I know, the data sample identifier is not regarded as a dimension of the data in the SOM algorithm. If that is the case, then how can we track the samples? I don't think that …
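The general answer is that the identifier never needs to enter the SOM at all: after training, pass each sample through the map once more, in its original row order, and record its best matching unit (BMU); the row index is the identifier, and the BMU (or a further clustering of the map units) is the membership. The question concerns the MATLAB SOM Toolbox; the sketch below only illustrates the idea with the Python minisom package and toy data, and is not the toolbox's own API.

```python
# Hedged sketch (minisom, not the MATLAB SOM Toolbox): recover cluster membership
# by looking up each sample's BMU after training, keyed by the sample's row index.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 16))          # toy stand-in: 100 time series of length 16

som = MiniSom(4, 4, input_len=data.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)

# Row i of `data` is sample i; its BMU (map coordinates) acts as its cluster label.
membership = {i: som.winner(data[i]) for i in range(len(data))}
print(membership[0], membership[1])
```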

Spectral clustering using scikit learn on graph generated through networkx

喜夏-厌秋 submitted on 2019-12-10 19:23:38

Question: I have a 3000x50 feature vector matrix. I obtained a similarity matrix for this using sklearn.metrics.pairwise_distances as 'Similarity_Matrix'. Then I used networkx to create a graph from the similarity matrix generated in the previous step, as G=nx.from_numpy_matrix(Similarity_Matrix). I now want to perform spectral clustering on this graph G, but several Google searches have failed to provide a decent example of scikit-learn spectral clustering on this graph :( The official documentation …
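Two points are worth making, hedged as a sketch rather than a definitive answer: pairwise_distances returns distances, not similarities, so they usually need to be turned into an affinity (e.g., an RBF kernel) before spectral clustering; and SpectralClustering can consume the graph's weighted adjacency matrix directly via affinity='precomputed', so the networkx round trip is optional. The data size, gamma, and cluster count below are illustrative assumptions.

```python
# Hedged sketch: spectral clustering on a precomputed affinity (adjacency) matrix.
import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                 # stand-in for the 3000x50 feature matrix

# Distances are not similarities: build an affinity matrix instead (RBF kernel here).
affinity = rbf_kernel(X, gamma=1.0 / X.shape[1])

# Optional networkx round trip, if the graph is needed for other reasons.
G = nx.from_numpy_array(affinity)              # from_numpy_matrix was removed in newer networkx
A = nx.to_numpy_array(G, weight="weight")

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            assign_labels="kmeans", random_state=0).fit_predict(A)
print(np.bincount(labels))
```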

Removing cycles in weighted directed graph

拈花ヽ惹草 submitted on 2019-12-10 18:32:26

Question: This is a follow-up question to my other post, "Algorithm for clustering with size constraints". I'm working on a clustering algorithm. After some reclustering, I now have a set of points none of which is in its optimal cluster, but they cannot be reassigned individually, since that would violate the size constraint. I'm trying to use a graph structure to solve the problem but have come across a few issues in implementing it. I'm a beginner, so please let me know if I'm wrong. Per @Kittsil's answer, build a …
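For the title question itself (breaking cycles in a weighted directed graph), one simple heuristic is to repeatedly find a cycle and delete its lightest edge until the graph is acyclic. This is only a greedy approximation of the minimum feedback arc set problem, which is NP-hard in general; the graph below is a made-up example. A hedged networkx sketch:

```python
# Hedged sketch: greedily break cycles by removing the lightest edge of each cycle found.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 3.0), ("b", "c", 1.0), ("c", "a", 5.0),   # cycle a->b->c->a
    ("c", "d", 2.0), ("d", "b", 0.5),                    # cycle b->c->d->b
])

removed = []
while not nx.is_directed_acyclic_graph(G):
    cycle = nx.find_cycle(G)                             # list of (u, v) edges on some cycle
    u, v = min(cycle, key=lambda e: G[e[0]][e[1]]["weight"])
    G.remove_edge(u, v)
    removed.append((u, v))

print("removed edges:", removed)
print("is DAG:", nx.is_directed_acyclic_graph(G))
```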

Clustering uni-variate Time series using sklearn

不想你离开。 submitted on 2019-12-10 18:04:02

Question: I have a pandas DataFrame for which I would like to do clustering on each column. I am using sklearn and this is what I have:

    data = pd.read_csv("data.csv")
    data = pd.DataFrame(data)
    data = data.set_index("Time")
    #print(data)
    cluster_numbers = 2
    list_of_cluster = []
    for k, v in data.iteritems():
        temp = KMeans(n_clusters=cluster_numbers)
        temp.fit(data[k])
        print(k)
        print("predicted", temp.predict(data[k]))
        list_of_cluster.append(temp.predict(data[k]))

When I try to run it, I have this error: ValueError: …
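The ValueError most likely arises because data[k] is a 1-D Series, while scikit-learn estimators expect a 2-D array of shape (n_samples, n_features); reshaping each column to a single-feature column vector (or selecting it as data[[k]]) is the usual fix. A hedged sketch with a made-up frame standing in for data.csv (note also that DataFrame.iteritems was renamed to items in recent pandas):

```python
# Hedged sketch: cluster each column separately, reshaping to (n_samples, 1) for sklearn.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# toy frame standing in for data.csv (two value columns indexed by Time)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Time": range(100),
    "sensor_a": np.r_[rng.normal(0, 1, 50), rng.normal(8, 1, 50)],
    "sensor_b": rng.normal(5, 1, 100),
}).set_index("Time")

cluster_numbers = 2
list_of_cluster = []
for k in data.columns:                              # iterate column names instead of iteritems()
    col = data[k].to_numpy().reshape(-1, 1)         # 2-D: (n_samples, 1 feature)
    km = KMeans(n_clusters=cluster_numbers, n_init=10, random_state=0).fit(col)
    print(k, "->", np.bincount(km.labels_))
    list_of_cluster.append(km.labels_)
```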

Scikit-learn, KMeans: How to use max_iter

三世轮回 submitted on 2019-12-10 17:15:03

Question: I'd like to understand the parameter max_iter of the class sklearn.cluster.KMeans. According to the documentation:

    max_iter : int, default: 300
        Maximum number of iterations of the k-means algorithm for a single run.

But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times to classify every object. And on the other hand, it makes no sense to run several times over all objects. What is my misconception and how do I have to …
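The misconception: one k-means iteration already visits all n objects (assign every object to its nearest centroid, then recompute the centroids), and max_iter only caps how many of those full passes a single run may perform before it is cut off; it is unrelated to the number of objects. The number of passes actually used is exposed as n_iter_, and n_init controls how many independent runs are made. A hedged sketch with made-up data:

```python
# Hedged sketch: max_iter bounds full assign/update passes per run, not per object.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(5000, 2)) for loc in (0.0, 5.0, 10.0)])  # 15000 objects

km = KMeans(n_clusters=3, max_iter=300, n_init=10, random_state=0).fit(X)
print("objects:", X.shape[0])
print("iterations actually used by the best run:", km.n_iter_)   # typically far below 300
```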

Data clustering in C++ using openGL

南楼画角 submitted on 2019-12-10 14:45:56

Question: I am working on a project for object tracking, where I am getting data (distance in mm and amplitude) from a Lidar sensor (Pepperl-Fuchs R2000). Using OpenGL and C++ I am displaying the data on a Linux machine. Now I want to group the points into clusters based on distance. I don't know how to put all the clusters into separate containers in C++. Is there any possibility that I can use the output data from OpenGL as input data in OpenCV for object tracking?

Answer 1: You should transform the OpenGL data into …
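Independent of the rendering question, the clustering step itself is a standard point-clustering problem once the lidar readings are converted from polar (angle, range) to Cartesian coordinates; a density-based method such as DBSCAN then gives one label per point, and grouping points by label yields the "separate containers". The original question is about C++, so the following is only a hedged Python sketch of the idea, with a synthetic scan and an eps value that would need tuning to the real sensor.

```python
# Hedged sketch: polar lidar readings -> Cartesian points -> DBSCAN -> one container per cluster.
import numpy as np
from collections import defaultdict
from sklearn.cluster import DBSCAN

# toy scan: 360 range readings in mm, with two nearby "objects" and a far background
angles = np.deg2rad(np.arange(360))
ranges = np.full(360, 4000.0)
ranges[40:60] = 1200.0                      # object 1
ranges[200:230] = 2500.0                    # object 2

points = np.column_stack([ranges * np.cos(angles), ranges * np.sin(angles)])
labels = DBSCAN(eps=150.0, min_samples=5).fit_predict(points)   # eps in mm, tune per sensor

clusters = defaultdict(list)                # the "separate containers": label -> list of points
for point, label in zip(points, labels):
    if label != -1:                         # -1 is DBSCAN's noise label
        clusters[label].append(point)

for label, pts in clusters.items():
    print(f"cluster {label}: {len(pts)} points")
```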