cluster-analysis

How to identify sequences within each cluster?

六眼飞鱼酱① submitted on 2019-12-11 02:27:25

Question: Using the biofam dataset that comes as part of TraMineR:

    library(TraMineR)
    data(biofam)
    lab <- c("P","L","M","LM","C","LC","LMC","D")
    biofam.seq <- seqdef(biofam[,10:25], states=lab)
    head(biofam.seq)
             Sequence
    1167 P-P-P-P-P-P-P-P-P-LM-LMC-LMC-LMC-LMC-LMC-LMC
    514  P-L-L-L-L-L-L-L-L-L-L-LM-LMC-LMC-LMC-LMC
    1013 P-P-P-P-P-P-P-L-L-L-L-L-LM-LMC-LMC-LMC
    275  P-P-P-P-P-L-L-L-L-L-L-L-L-L-L-L
    2580 P-P-P-P-P-L-L-L-L-L-L-L-L-LMC-LMC-LMC
    773  P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P

I can perform a cluster analysis: …
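In R, the usual TraMineR route is to compute pairwise dissimilarities between sequences (e.g., with seqdist), cluster that matrix, cut the tree into k groups, and then subset the sequence object by the resulting membership vector. As a language-neutral illustration of that bookkeeping step only (mapping cluster labels back to individual rows), here is a hedged Python/SciPy sketch; the random distance matrix and the cluster count of 4 are made-up stand-ins, not part of the original question.

```python
# Hedged sketch (not TraMineR): given a precomputed pairwise distance matrix
# between sequences, cluster hierarchically and list which rows fall in each cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
dist = rng.random((20, 20))                      # stand-in for a seqdist-style matrix
dist = (dist + dist.T) / 2                       # make it symmetric
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist), method="average")  # hierarchical clustering on condensed distances
labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into 4 clusters

# Map cluster labels back to the original sequence indices (rows).
for c in sorted(set(labels)):
    members = np.where(labels == c)[0]
    print(f"cluster {c}: rows {members.tolist()}")
```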

Clustering algorithm for rays

試著忘記壹切 submitted on 2019-12-11 02:26:27

Question: I know that there are clustering algorithms for points, obviously, but I have a different scenario. I have many rays, all of whose start points lie on a sphere in 3D and whose direction vectors point inwards into the sphere. Some of the rays point towards a point A, others towards a point B, etc., with some noise (i.e., the rays don't perfectly intersect each other). Is there a clustering algorithm that will allow me to cluster the rays based on which point they are pointing …
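One practical angle, offered only as a hedged sketch rather than a canonical algorithm: define a pairwise "distance" between rays as how close the two rays come to each other in space, then feed that precomputed matrix to any distance-based clusterer (DBSCAN here). Rays aimed at the same target pass close to one another, so they end up in the same cluster. The toy targets, the 0.02 noise level, and the eps value below are illustrative assumptions.

```python
# Hedged sketch: treat "how close do two rays come to each other" as a distance,
# then run a precomputed-distance clustering (DBSCAN) on that matrix.
import numpy as np
from sklearn.cluster import DBSCAN

def line_line_distance(p1, d1, p2, d2):
    """Minimum distance between the infinite lines p1+t*d1 and p2+s*d2 (ray clamping omitted)."""
    n = np.cross(d1, d2)
    nn = np.linalg.norm(n)
    if nn < 1e-12:                               # (nearly) parallel lines
        w = p2 - p1
        return np.linalg.norm(w - np.dot(w, d1) / np.dot(d1, d1) * d1)
    return abs(np.dot(p2 - p1, n)) / nn

# toy data: rays starting on a unit sphere, aimed (noisily) at two interior targets
rng = np.random.default_rng(1)
targets = np.array([[0.4, 0.0, 0.0], [-0.4, 0.3, 0.1]])
starts = rng.normal(size=(100, 3))
starts /= np.linalg.norm(starts, axis=1, keepdims=True)
which = rng.integers(0, 2, size=100)
dirs = targets[which] + 0.02 * rng.normal(size=(100, 3)) - starts

n = len(starts)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = line_line_distance(starts[i], dirs[i], starts[j], dirs[j])

labels = DBSCAN(eps=0.1, min_samples=5, metric="precomputed").fit_predict(D)
print(np.bincount(labels[labels >= 0]))          # cluster sizes (noise points excluded)
```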

How to use ggplot to plot T-SNE clustering

百般思念 submitted on 2019-12-11 00:44:04

Question: Here is the t-SNE code using the IRIS data:

    library(Rtsne)
    iris_unique <- unique(iris) # Remove duplicates
    iris_matrix <- as.matrix(iris_unique[,1:4])
    set.seed(42) # Set a seed if you want reproducible results
    tsne_out <- Rtsne(iris_matrix) # Run TSNE
    # Show the objects in the 2D tsne representation
    plot(tsne_out$Y, col=iris_unique$Species)

Which produces this plot: [image omitted] How can I use ggplot to make that figure?

Answer 1: I think the easiest/cleanest ggplot way would be to store all the info you need in a …
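The gist of the answer is to put the two t-SNE coordinates and the species labels into a single data frame and map them to aesthetics. That is R/ggplot2-specific; purely as a rough parallel, and assuming the Python plotnine package (a ggplot2-style grammar) plus scikit-learn's TSNE are acceptable stand-ins for ggplot2 and Rtsne, a hedged sketch of the same idea could look like this.

```python
# Hedged Python parallel of the R answer: put the 2D embedding and the labels
# into one data frame, then plot with a ggplot-style grammar (plotnine).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from plotnine import ggplot, aes, geom_point

iris = load_iris(as_frame=True)
X = iris.data.drop_duplicates()                  # roughly mirrors unique(iris)
y = iris.target.loc[X.index].map(dict(enumerate(iris.target_names)))

emb = TSNE(n_components=2, random_state=42).fit_transform(X.to_numpy())
df = pd.DataFrame({"tsne1": emb[:, 0], "tsne2": emb[:, 1], "species": y.to_numpy()})

p = ggplot(df, aes("tsne1", "tsne2", color="species")) + geom_point()
p.save("tsne_iris.png")                          # or print(p) in an interactive session
```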

Predict in Clustering

十年热恋 submitted on 2019-12-10 22:51:33

Question: In the R language, is there a predict function for clustering like the one we have in classification? What can we conclude from the clustering graph result that we get from R, other than comparing two clusters?

Answer 1: Clustering does not pay attention to prediction capabilities. It just tries to find objects that seem to be related. That is why there is no "predict" function for clustering results. However, in many situations, learning classifiers based on the clusters offers an improved performance. …
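The same idea carries over outside R: centroid-based methods can assign a new point to its nearest centroid, and for algorithms without any predict step you can train a classifier on the cluster labels, which is exactly what the answer suggests. The original question is about R, so the following scikit-learn sketch is only an illustration of both routes, with made-up toy data.

```python
# Hedged sketch: two ways to "predict" cluster membership for new data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2)) + rng.integers(0, 3, size=(300, 1)) * 4.0
X_new = rng.normal(size=(5, 2))

# Route 1: centroid-based clustering has a natural predict (nearest centroid).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
print("KMeans predict:", km.predict(X_new))

# Route 2: for algorithms without predict, fit a classifier on the cluster labels.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, km.labels_)
print("classifier predict:", clf.predict(X_new))
```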

Determining cluster membership in SOM (Self Organizing Map) for time series data

馋奶兔 submitted on 2019-12-10 21:12:11

Question: I am also working on a project that requires clustering of time series data. I am using the SOM Toolbox that works in MATLAB for clustering purposes and am stuck with the following problem: "How can we determine which data belongs to which cluster?" SOM randomly chooses a data sample from the dataset and finds the BMU for each data sample. As far as I know, the data sample identifier is not regarded as a dimension of the data in the SOM algorithm. If that is the case, then how can we track the samples? I don't think that …
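The general answer is that the identifier never needs to enter the SOM at all: after training, pass each sample through the map once more, in its original row order, and record its best matching unit (BMU); the row index is the identifier, and the BMU (or a further clustering of the map units) is the membership. The question concerns the MATLAB SOM Toolbox; the sketch below only illustrates the idea with the Python minisom package and toy data, and is not the toolbox's own API.

```python
# Hedged sketch (minisom, not the MATLAB SOM Toolbox): recover cluster membership
# by looking up each sample's BMU after training, keyed by the sample's row index.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 16))          # toy stand-in: 100 time series of length 16

som = MiniSom(4, 4, input_len=data.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, num_iteration=1000)

# Row i of `data` is sample i; its BMU (map coordinates) acts as its cluster label.
membership = {i: som.winner(data[i]) for i in range(len(data))}
print(membership[0], membership[1])
```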

Spectral clustering using scikit learn on graph generated through networkx

喜夏-厌秋 submitted on 2019-12-10 19:23:38

Question: I have a 3000x50 feature vector matrix. I obtained a similarity matrix for this using sklearn.metrics.pairwise_distances as 'Similarity_Matrix'. Then I used networkx to create a graph from the similarity matrix generated in the previous step, as G=nx.from_numpy_matrix(Similarity_Matrix). I now want to perform spectral clustering on this graph G, but several Google searches have failed to provide a decent example of scikit-learn spectral clustering on this graph :( The official documentation …
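Two points are worth making, hedged as a sketch rather than a definitive answer: pairwise_distances returns distances, not similarities, so they usually need to be turned into an affinity (e.g., an RBF kernel) before spectral clustering; and SpectralClustering can consume the graph's weighted adjacency matrix directly via affinity='precomputed', so the networkx round trip is optional. The data size, gamma, and cluster count below are illustrative assumptions.

```python
# Hedged sketch: spectral clustering on a precomputed affinity (adjacency) matrix.
import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                 # stand-in for the 3000x50 feature matrix

# Distances are not similarities: build an affinity matrix instead (RBF kernel here).
affinity = rbf_kernel(X, gamma=1.0 / X.shape[1])

# Optional networkx round trip, if the graph is needed for other reasons.
G = nx.from_numpy_array(affinity)              # from_numpy_matrix was removed in newer networkx
A = nx.to_numpy_array(G, weight="weight")

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            assign_labels="kmeans", random_state=0).fit_predict(A)
print(np.bincount(labels))
```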

Removing cycles in weighted directed graph

拈花ヽ惹草 submitted on 2019-12-10 18:32:26

Question: This is a follow-up question to my other post, "Algorithm for clustering with size constraints". I'm working on a clustering algorithm. After some reclustering, I now have a set of points none of which is in its optimal cluster, but they cannot be reassigned individually, since that would violate the size constraint. I'm trying to use a graph structure to solve the problem but have come across a few issues in implementing it. I'm a beginner, so please let me know if I'm wrong. Per @Kittsil's answer, build a …
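For the title question itself (breaking cycles in a weighted directed graph), one simple heuristic is to repeatedly find a cycle and delete its lightest edge until the graph is acyclic. This is only a greedy approximation of the minimum feedback arc set problem, which is NP-hard in general; the graph below is a made-up example. A hedged networkx sketch:

```python
# Hedged sketch: greedily break cycles by removing the lightest edge of each cycle found.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 3.0), ("b", "c", 1.0), ("c", "a", 5.0),   # cycle a->b->c->a
    ("c", "d", 2.0), ("d", "b", 0.5),                    # cycle b->c->d->b
])

removed = []
while not nx.is_directed_acyclic_graph(G):
    cycle = nx.find_cycle(G)                             # list of (u, v) edges on some cycle
    u, v = min(cycle, key=lambda e: G[e[0]][e[1]]["weight"])
    G.remove_edge(u, v)
    removed.append((u, v))

print("removed edges:", removed)
print("is DAG:", nx.is_directed_acyclic_graph(G))
```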

Clustering uni-variate Time series using sklearn

不想你离开。 submitted on 2019-12-10 18:04:02

Question: I have a pandas DataFrame for which I would like to do clustering on each column. I am using sklearn and this is what I have:

    data = pd.read_csv("data.csv")
    data = pd.DataFrame(data)
    data = data.set_index("Time")
    #print(data)
    cluster_numbers = 2
    list_of_cluster = []
    for k, v in data.iteritems():
        temp = KMeans(n_clusters=cluster_numbers)
        temp.fit(data[k])
        print(k)
        print("predicted", temp.predict(data[k]))
        list_of_cluster.append(temp.predict(data[k]))

When I try to run it, I have this error: ValueError: …
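The ValueError most likely arises because data[k] is a 1-D Series, while scikit-learn estimators expect a 2-D array of shape (n_samples, n_features); reshaping each column to a single-feature column vector (or selecting it as data[[k]]) is the usual fix. A hedged sketch with a made-up frame standing in for data.csv (note also that DataFrame.iteritems was renamed to items in recent pandas):

```python
# Hedged sketch: cluster each column separately, reshaping to (n_samples, 1) for sklearn.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# toy frame standing in for data.csv (two value columns indexed by Time)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Time": range(100),
    "sensor_a": np.r_[rng.normal(0, 1, 50), rng.normal(8, 1, 50)],
    "sensor_b": rng.normal(5, 1, 100),
}).set_index("Time")

cluster_numbers = 2
list_of_cluster = []
for k in data.columns:                              # iterate column names instead of iteritems()
    col = data[k].to_numpy().reshape(-1, 1)         # 2-D: (n_samples, 1 feature)
    km = KMeans(n_clusters=cluster_numbers, n_init=10, random_state=0).fit(col)
    print(k, "->", np.bincount(km.labels_))
    list_of_cluster.append(km.labels_)
```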

Scikit-learn, KMeans: How to use max_iter

三世轮回 submitted on 2019-12-10 17:15:03

Question: I'd like to understand the parameter max_iter of the class sklearn.cluster.KMeans. According to the documentation:

    max_iter : int, default: 300
        Maximum number of iterations of the k-means algorithm for a single run.

But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times to classify every object. And on the other hand, it makes no sense to run several times over all objects. What is my misconception and how do I have to …
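The misconception: one k-means iteration already visits all n objects (assign every object to its nearest centroid, then recompute the centroids), and max_iter only caps how many of those full passes a single run may perform before it is cut off; it is unrelated to the number of objects. The number of passes actually used is exposed as n_iter_, and n_init controls how many independent runs are made. A hedged sketch with made-up data:

```python
# Hedged sketch: max_iter bounds full assign/update passes per run, not per object.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(5000, 2)) for loc in (0.0, 5.0, 10.0)])  # 15000 objects

km = KMeans(n_clusters=3, max_iter=300, n_init=10, random_state=0).fit(X)
print("objects:", X.shape[0])
print("iterations actually used by the best run:", km.n_iter_)   # typically far below 300
```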

Data clustering in C++ using openGL

南楼画角 submitted on 2019-12-10 14:45:56

Question: I am working on a project for object tracking, where I am getting data (distance in mm and amplitude) from a Lidar sensor (Pepperl-Fuchs R2000). Using OpenGL and C++ I am displaying the data on a Linux machine. Now I want to group the points into clusters based on distance. I don't know how to put all the clusters into separate containers in C++. Is there any possibility that I can use the output data from OpenGL as input data in OpenCV for object tracking?

Answer 1: You should transform the OpenGL data into …
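Independent of the rendering question, the clustering step itself is a standard point-clustering problem once the lidar readings are converted from polar (angle, range) to Cartesian coordinates; a density-based method such as DBSCAN then gives one label per point, and grouping points by label yields the "separate containers". The original question is about C++, so the following is only a hedged Python sketch of the idea, with a synthetic scan and an eps value that would need tuning to the real sensor.

```python
# Hedged sketch: polar lidar readings -> Cartesian points -> DBSCAN -> one container per cluster.
import numpy as np
from collections import defaultdict
from sklearn.cluster import DBSCAN

# toy scan: 360 range readings in mm, with two nearby "objects" and a far background
angles = np.deg2rad(np.arange(360))
ranges = np.full(360, 4000.0)
ranges[40:60] = 1200.0                      # object 1
ranges[200:230] = 2500.0                    # object 2

points = np.column_stack([ranges * np.cos(angles), ranges * np.sin(angles)])
labels = DBSCAN(eps=150.0, min_samples=5).fit_predict(points)   # eps in mm, tune per sensor

clusters = defaultdict(list)                # the "separate containers": label -> list of points
for point, label in zip(points, labels):
    if label != -1:                         # -1 is DBSCAN's noise label
        clusters[label].append(point)

for label, pts in clusters.items():
    print(f"cluster {label}: {len(pts)} points")
```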