cluster-analysis | 易学教程

Understanding heatmap dendrogram clustering in R

Question: I would appreciate any information on the dendrograms (Colv, Rowv) of R's heatmap function, such as how the clustering works (is it Euclidean distance?). You don't have to post lengthy explanations; I would already be happy with some keywords that put me on the right track so I can do some online research. Here is an excerpt from the help manual, which confuses me a little. What does "honored" mean in this context, and how is it different from reordering? If either Rowv or Colv

MATLAB kMeans does not always converge to global minima

Question: I wrote a k-means clustering algorithm in MATLAB, and I thought I'd try it against MATLAB's built-in kmeans(X,k). However, for the very easy four-cluster setup (see picture), MATLAB's kmeans does not always converge to the optimal solution (left) but sometimes to the one on the right. The one I wrote does not always find it either, but shouldn't the built-in function be able to solve such an easy problem and always find the optimal solution?

Answer 1: As @Alexandre C. explained, the K-means algorithm depends on the
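For context, k-means in its standard form (Lloyd's algorithm) only guarantees a local optimum, and which one it reaches depends on where it starts. A common remedy is to run it several times from different random starts and keep the run with the lowest within-cluster sum of squares; MATLAB's kmeans exposes this through its 'Replicates' option. The sketch below illustrates the idea in plain Python. The function names `kmeans_once` and `kmeans_restarts` are hypothetical and are not taken from the question or its answers.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random start (illustrative sketch)."""
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

def kmeans_restarts(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])
```

Restarts make a poor local optimum less likely but do not guarantee the global one, which is why even a built-in implementation can occasionally return the "wrong" picture on an easy data set when run only once.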

Clustering from the cosine similarity values

Question: I have extracted words from a set of URLs and calculated the cosine similarity between each URL's contents. I have also normalized the values to the range 0-1 (using min-max). Now I need to cluster the URLs based on the cosine similarity values to find similar URLs. Which clustering algorithm would be most suitable? Please suggest a dynamic clustering method, since it would let me increase the number of URLs on demand and would also be more natural. Please correct me if you feel I'm

What is an intuitive explanation of the Expectation Maximization technique? [closed]

Question: Expectation Maximization (EM) is a kind of probabilistic method to classify data. Please correct me if I am wrong and it is not a classifier. What is an intuitive explanation of the EM technique? What is the expectation here, and what is being maximized?

Answer 1: Note: the code behind this
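As a rough illustration of the two steps the question asks about, here is a minimal sketch of EM for a mixture of two one-dimensional Gaussians. It is not the code the answer refers to; the function name `em_two_gaussians` and the choice of model are assumptions made for this example. The E-step computes, for each point, the expected membership (responsibility) in component 0 under the current parameters; the M-step then picks the parameters that maximize the expected log-likelihood given those responsibilities.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mu, var, weight=0.5, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch).

    mu, var: two-element lists of initial guesses.
    weight:  initial mixing weight of component 0.
    Assumes neither component ends up with zero total responsibility.
    """
    mu, var = list(mu), list(var)
    n = len(data)
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], var[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted means, variances and mixing weight.
        n0 = sum(resp)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
        var[0] = max(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0, 1e-6)
        var[1] = max(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1, 1e-6)
        weight = n0 / n
    return mu, var, weight
```

The responsibilities are soft assignments, so in this form EM behaves more like a soft clustering or parameter-estimation procedure than a classifier trained on labeled data.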

How to determine if two partitions (clusterings) of data points are identical?

Question: I have n data points in some arbitrary space and I cluster them. The result of my clustering algorithm is a partition represented by an int vector l of length n assigning each point to a cluster. Values of l range from 0 to (possibly) n-1. Example: l_1 = [ 1 1 1 0 0 2 6 ] is a partition of n=7 points into 4 clusters: the first three points are clustered together, the fourth and fifth are together, and the last two points form two distinct singleton clusters. My question: Suppose I have two
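One way to approach the question in the title, sketched here under the assumption that both label vectors refer to the same points in the same order: two vectors describe the same partition exactly when one can be turned into the other by renaming labels. Relabeling each vector by order of first appearance gives a canonical form that can be compared directly. The helper names `canonical_labels` and `same_partition` are hypothetical.

```python
def canonical_labels(labels):
    """Relabel clusters by order of first appearance, e.g. [1, 1, 0, 6] -> [0, 0, 1, 2]."""
    mapping = {}
    # setdefault assigns the next unused index the first time a label is seen.
    return [mapping.setdefault(lab, len(mapping)) for lab in labels]

def same_partition(a, b):
    """True if label vectors a and b group the points identically."""
    if len(a) != len(b):
        return False
    return canonical_labels(a) == canonical_labels(b)

l_1 = [1, 1, 1, 0, 0, 2, 6]
l_2 = [3, 3, 3, 5, 5, 0, 1]   # same grouping, different label names
l_3 = [1, 1, 1, 0, 0, 2, 2]   # last two points merged: a different partition

print(same_partition(l_1, l_2))
print(same_partition(l_1, l_3))
```

The comparison takes a single pass over each vector, so it runs in time linear in n.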

clustering and matlab

Question: I'm trying to cluster some data I have from the KDD Cup 1999 dataset. The output from the file looks like this: 0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. with 48 thousand different records in that format. I have cleaned the data up and removed the text, keeping only the numbers. The output looks like this now: I created a comma-delimited file in Excel and saved it as a CSV file, then created a

Grouping points that represent lines

Question: I am looking for an algorithm that is able to solve this problem. The problem: I have the following set of points: I want to group the points that represent a line (with some epsilon) into one group. So, the optimal output will be something like: Some notes: Each point belongs to one and only one line. If a point could belong to two lines, it should belong to the stronger one. A line is considered stronger than another when more points belong to it. The algorithm should not cover all points because

Create a summary description of a schedule given a list of shifts

Question: Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? It is quite common for most of the shifts to fall into some sort of recurrence pattern (e.g., Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g., one of the shifts fell on a holiday and was rescheduled for the next day).

How do I manually create a dendrogram (or “hclust”) object? (in R)

Question: I have a dendrogram given to me as images. Since it is not very large, I can construct it "by hand" as an R object. So my question is: how do I manually create a dendrogram (or "hclust") object when all I have is the dendrogram image? I see that there is a function called "as.dendrogram", but I wasn't able to find an example of how to use it. (P.S.: This post follows up on my question from here.) Many thanks, Tal

Answer 1: I think you are better off creating an hclust object, and then converting it to