cluster-analysis

Server-side clustering for Google Maps API v3

做~自己de王妃 submitted on 2019-12-20 12:36:34
Question: I am currently developing a kind of Google Maps overview widget that displays locations as markers on the map. The number of markers varies from several hundred up to thousands (10,000 and more). Right now I am using MarkerClusterer for Google Maps v3 1.0 together with the Google Maps JavaScript API v3 (Premier), and it works reasonably well for, let's say, a hundred markers. Because the number of markers will keep increasing, I need a new way of clustering them. From what I read, the only …
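The question above is truncated; as a rough illustration of what "server-side clustering" can look like, here is a minimal Python sketch (not from the thread) that buckets markers into a pixel grid per zoom level, which is essentially what MarkerClusterer does in the browser. The function name, grid size and tile size are illustrative assumptions.

```python
import math
from collections import defaultdict

def cluster_markers(markers, zoom, grid_px=60, tile_size=256):
    """Grid-based server-side clustering.

    markers: iterable of (lat, lng) pairs.
    zoom:    Google Maps zoom level (0-21).
    Returns a list of dicts with cluster centroid and point count.
    """
    world_px = tile_size * (2 ** zoom)          # map width in pixels at this zoom
    cells = defaultdict(list)

    for lat, lng in markers:
        # Web Mercator projection to pixel coordinates.
        x = (lng + 180.0) / 360.0 * world_px
        siny = min(max(math.sin(math.radians(lat)), -0.9999), 0.9999)
        y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * world_px
        cells[(int(x // grid_px), int(y // grid_px))].append((lat, lng))

    clusters = []
    for points in cells.values():
        lat_c = sum(p[0] for p in points) / len(points)
        lng_c = sum(p[1] for p in points) / len(points)
        clusters.append({"lat": lat_c, "lng": lng_c, "count": len(points)})
    return clusters
```

The client then only requests the clusters for the current zoom level instead of all raw markers, so the payload stays small even with tens of thousands of points.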

K-means clustering for multidimensional data

有些话、适合烂在心里 submitted on 2019-12-20 10:57:33
Question: If the data set has 440 objects and 8 attributes (the data set is taken from the UCI Machine Learning Repository), how do we calculate centroids for such a data set? (Wholesale customers data: https://archive.ics.uci.edu/ml/datasets/Wholesale+customers) If I calculate the mean of the values of each row, will that be the centroid? And how do I plot the resulting clusters in MATLAB? Answer 1: OK, first of all, in the data set, 1 row corresponds to a single example in the data; you have 440 rows, which means the …
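To make the centroid computation concrete: a centroid is the column-wise mean of the rows assigned to a cluster (an 8-dimensional vector for this data set), not the mean of a single row. Below is a minimal NumPy sketch, assuming the 440 x 8 data is already loaded into an array X (the questioner works in MATLAB; this is only an illustration of the computation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: X has shape (n_samples, n_features), e.g. 440 x 8."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # k random rows as initial centroids
    for _ in range(iters):
        # assign each row to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
        # a centroid is the column-wise mean of the rows in its cluster,
        # i.e. an 8-dimensional vector, not the mean of a single row
        new_centroids = []
        for j in range(k):
            members = X[labels == j]
            new_centroids.append(members.mean(axis=0) if len(members) else centroids[j])
        centroids = np.array(new_centroids)
    return centroids, labels
```

For plotting, pick two attributes (or two principal components) and scatter the points coloured by label; the same idea applies in MATLAB with scatter and gscatter.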

Java machine learning library for commercial use? [closed]

房东的猫 submitted on 2019-12-20 09:56:47
Question: Does anyone know a good Java machine learning library I can use for a commercial product? Weka and RapidMiner unfortunately do not allow this. I have already found Apache Mahout and the Java Data Mining Package. Does anyone have experience with them and can provide some decision support? The task calls for clustering and …

News clustering

戏子无情 submitted on 2019-12-20 08:38:32
Question: How do Google News and Techmeme cluster similar news items? Are there any well-known algorithms used to achieve this? Appreciate your help. Thanks in advance. Answer 1: One fairly common way to cluster text based on content is to use Principal Component Analysis on the word vectors (a vector of n dimensions where each possible word represents one dimension and the magnitude in each direction, for each vector, is the number of occurrences of that word in the particular article), …
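As a hedged illustration of the approach the answer describes (word vectors reduced with a PCA-style step, then grouped), here is a small scikit-learn sketch. TruncatedSVD stands in for PCA because it works directly on sparse term matrices, and the example headlines are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

articles = [
    "Fed raises interest rates again",
    "Central bank lifts rates to fight inflation",
    "New phone announced with better camera",
]

# Word vectors: one dimension per word, weighted by tf-idf.
X = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Dimensionality reduction in the spirit of PCA (works on sparse input).
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Group the reduced vectors; similar stories should land in the same cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)
```

Production systems add many refinements (entity extraction, time decay, incremental clustering), but the vector-space core is the same.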

What makes the distance measure in k-medoid “better” than k-means?

落花浮王杯 submitted on 2019-12-20 08:10:54
Question: I am reading about the difference between k-means clustering and k-medoid clustering. Supposedly there is an advantage to using the pairwise distance measure in the k-medoid algorithm, instead of the more familiar sum-of-squared-Euclidean-distance metric used to evaluate variance in k-means. And apparently this different distance metric somehow reduces noise and outliers. I have seen this claim, but I have yet to see any good reasoning as to the mathematics behind it. …
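A tiny numerical example of the robustness claim (not from the thread): with a single extreme outlier, the mean (the k-means centre, which minimises the sum of squared distances) moves a long way, while the medoid, which must be an actual data point and minimises the sum of unsquared dissimilarities, barely moves.

```python
import numpy as np

points = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one extreme outlier

# k-means centre: the arithmetic mean minimises the sum of squared distances.
mean = points.mean()

# k-medoid centre: an actual data point minimising the sum of absolute distances.
medoid = points[np.argmin([np.abs(points - p).sum() for p in points])]

print(mean)    # 22.0 -- dragged far toward the outlier
print(medoid)  # 3.0  -- barely affected
```

Squaring distances makes far-away points dominate the objective, which is exactly why a single outlier can pull a k-means centroid much further than it pulls a medoid.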

Clustering Algorithm for Paper Boys

纵然是瞬间 submitted on 2019-12-20 08:03:57
Question: I need help selecting or creating a clustering algorithm according to certain criteria. Imagine you are managing newspaper delivery persons. You have a set of street addresses, each of which is geocoded. You want to cluster the addresses so that each cluster is assigned to a delivery person. The number of delivery persons, or clusters, is not fixed; if needed, I can always hire more delivery persons or lay them off. Each cluster should have about the same number of addresses. However, a …
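The question is truncated above; as one possible heuristic (not proposed in the thread), a "sweep" approach sorts addresses by bearing around a depot and cuts the sorted list into consecutive groups of roughly equal size, which keeps cluster sizes balanced and members geographically adjacent. A minimal Python sketch, with the depot location and carrier capacity as illustrative assumptions:

```python
import math

def sweep_clusters(addresses, depot, per_carrier=25):
    """Sweep heuristic: sort addresses by bearing around the depot and
    slice the sorted list into consecutive, equally sized groups."""
    if not addresses:
        return []
    dlat, dlng = depot

    def bearing(addr):
        lat, lng = addr
        return math.atan2(lat - dlat, lng - dlng)

    ordered = sorted(addresses, key=bearing)
    n_clusters = max(1, math.ceil(len(ordered) / per_carrier))
    size = math.ceil(len(ordered) / n_clusters)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Hiring or laying off delivery persons then simply changes per_carrier (or the number of slices); more sophisticated alternatives are capacitated k-means or vehicle-routing solvers.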

How to apply the DBSCAN algorithm to group similar URLs [closed]

时光毁灭记忆、已成空白 submitted on 2019-12-20 07:56:13
Question: How can I group similar URLs using the DBSCAN algorithm? I have seen many data sets, but none of them were about URLs; I want to take similar types of URLs and group them together. Here I am not able to work out the distance (eps) and …
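One way to make DBSCAN work on URLs is to define a string-level distance and hand DBSCAN a precomputed distance matrix; eps then becomes a threshold on that distance, and min_samples the smallest group size. A minimal scikit-learn sketch using a Jaccard distance over URL path segments (the distance function, the example URLs and the eps/min_samples values are illustrative assumptions, not from the thread):

```python
import numpy as np
from urllib.parse import urlparse
from sklearn.cluster import DBSCAN

urls = [
    "http://example.com/sports/football/news",
    "http://example.com/sports/tennis/news",
    "http://example.com/shop/cart/view",
    "http://example.com/shop/cart/checkout",
]

def url_distance(a, b):
    """Jaccard distance between the sets of path segments of two URLs."""
    sa = set(urlparse(a).path.strip("/").split("/"))
    sb = set(urlparse(b).path.strip("/").split("/"))
    return 1.0 - len(sa & sb) / len(sa | sb)

# DBSCAN cannot embed strings itself, so pass a precomputed distance matrix.
D = np.array([[url_distance(a, b) for b in urls] for a in urls])
labels = DBSCAN(eps=0.6, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)   # e.g. [0 0 1 1]: sports URLs in one cluster, shop URLs in another
```

Choosing eps is then a matter of inspecting the distance distribution (e.g. a sorted k-distance plot) rather than guessing in an abstract feature space.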

Display Edge Label only when Hovering Over it with Cursor - VisNetwork Igraph

淺唱寂寞╮ submitted on 2019-12-20 07:26:22
Question: Referring back to one of my previous posts, which contains the full reproducible code: VisNetwork from IGraph - Can't Implement Cluster Colors to Vertices. My goal here is to change some of the visualization options of the visNetwork graph. There are currently too many labels when I zoom in, and it is very tough to distinguish which node belongs to which label. Is it possible to remove the labels from the visNetwork graph and only display them when I hover over a node? I have …
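The thread is about the R visNetwork package, which wraps vis.js; the usual trick there is to move text from the always-drawn label field into the title field, which vis.js renders as a tooltip on hover. As a rough Python analogue (pyvis wraps the same vis.js library), here is a sketch of that idea; the node ids, labels and output file name are illustrative:

```python
from pyvis.network import Network

net = Network(height="500px", width="100%")

# Put the text in `title` (shown as a tooltip on hover) and keep `label`
# minimal, so the canvas stays readable until the cursor reaches a node/edge.
net.add_node(1, label=" ", title="Cluster A - node 1")
net.add_node(2, label=" ", title="Cluster A - node 2")
net.add_edge(1, 2, title="edge weight: 0.8")   # edge tooltip, visible only on hover

net.save_graph("hover_labels.html")
```

In visNetwork the same idea applies through the title column of the nodes/edges data frames, since both libraries hand these fields to vis.js unchanged.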

How to reduce memory usage within Prado's k-means framework used on big data in R?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-20 06:17:44
Question: I am trying to validate Prado's k-means framework for clustering trading strategies based on the returns correlation matrix, as found in his paper, using R for a large number of strategies, say 1000. He tries to find the optimal k and the optimal initialization for k-means using two for-loops over all possible k's and a number of initializations, i.e. k goes from 2 to N-1, where N is the number of strategies. The issue is that running k-means that many times, and especially with that many clusters, is memory …
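The question is truncated above; as a simplified Python sketch of the loop structure being described (not Prado's exact procedure, and not the R code in question), the search below tries every k and several initializations but keeps only the running best, so memory stays bounded no matter how many fits are run. The distance transform, scoring choice and parameter names are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(corr, max_k=None, n_init=10, seed=0):
    """Search over k and initialisations on a returns-correlation matrix.

    Each strategy is represented by its column of the distance matrix
    sqrt(0.5 * (1 - corr)); only the best (score, k, labels) is retained.
    """
    X = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))   # correlation -> distance features
    n = X.shape[0]
    max_k = max_k or n - 1
    rng = np.random.default_rng(seed)

    best = (-1.0, None, None)                 # (score, k, labels)
    for k in range(2, max_k + 1):
        for _ in range(n_init):
            km = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.integers(1_000_000)))
            labels = km.fit_predict(X)
            score = silhouette_score(X, labels)
            if score > best[0]:
                best = (score, k, labels)     # keep only the running best
    return best

# Example: 50 random strategies' return series -> correlation matrix.
rets = np.random.default_rng(1).normal(size=(250, 50))
corr = np.corrcoef(rets, rowvar=False)
score, k, labels = best_kmeans(corr, max_k=10)
```

For ~1000 strategies, capping max_k well below N-1 and using MiniBatchKMeans are the usual ways to keep both memory and runtime manageable; the same keep-only-the-best pattern carries over directly to R.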