cluster-analysis

ELKI - Use List<String> of objects to populate the Database

醉酒当歌 submitted on 2020-01-05 18:55:39
Question: Sorry for the naive question, but I got stuck while following the available tutorials. Is there a way to populate a Database db from a simple List<String> rather than loading it from a file? Basically, what I'm looking for is something like: List<String> objects = ... Database db = ClassGenericsUtil.parameterizeOrAbort(ArrayDatabase.class, params, objects); db.initialize(); Thanks in advance. Answer 1: What are the contents of your Strings? Are they in the same format that the ELKI parsers understand? This

How do I automate the number of clusters? [duplicate]

佐手、 submitted on 2020-01-05 08:36:10
Question: This question already has answers here: Cluster analysis in R: determine the optimal number of clusters (7 answers). Closed 10 months ago. I've been playing with the script below:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import textract
import os
folder_to_scan = '/media/sf_Documents/clustering'
dict_of_docs = {}
# Gets all the files to scan with textract
for root, sub, files in os.walk
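
One common way to automate the choice of k, sketched here under assumptions, is to fit KMeans for several candidate values and keep the one with the highest mean silhouette score. The helper name `pick_k`, the candidate range, and the synthetic blobs (standing in for the TF-IDF vectors in the script above) are all hypothetical; this is an illustration, not the approach taken in the linked answers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k(X, k_values, seed=0):
    """Return (best_k, scores): the k with the highest mean silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical data: three well-separated 2-D blobs instead of document vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = pick_k(X, range(2, 7))
print(best_k, scores)
```

The silhouette score needs at least two clusters, so the candidate range starts at 2. On real text data the scores are often much flatter than on this toy example, so the result is a hint rather than a definitive answer.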

Unsupervised high dimension clustering

对着背影说爱祢 submitted on 2020-01-05 07:16:34
Question: I have a dataset of records where each record has 5 labels, and each label has a different importance. I know the order of the labels by importance but not by how much they differ, so the distance between two records looks like: a*dist of label1 + b*dist of label2 + c*dist of label3, such that a + b + c = 1. The dataset contains around 3000 records and I want to cluster it in some way (the number of clusters is unknown). I thought about DBSCAN, but it is not really good with high
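
The weighted distance described above can be sketched as a precomputed distance matrix that is then handed to any clustering algorithm accepting one. Everything specific here is an assumption: numeric labels, absolute difference as the per-label distance, the weights (0.5, 0.3, 0.2), and the DBSCAN parameters are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def weighted_distance_matrix(X, weights):
    """Pairwise distance sum_l w_l * |x_il - x_jl| with weights summing to 1."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    if X.shape[1] != w.shape[0]:
        raise ValueError("need one weight per label")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    # diffs[i, j, l] = |X[i, l] - X[j, l]|; memory grows as n * n * n_labels.
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    return diffs @ w


# Hypothetical records with three numeric labels each.
records = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
D = weighted_distance_matrix(records, weights=[0.5, 0.3, 0.2])

# eps and min_samples are illustrative values, not tuned recommendations.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
print(D.round(2))
print(labels)
```

Because the weights are only known by rank, one option is to repeat the clustering for a few weight vectors that respect the ordering and compare how stable the clusters are.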

Louvain community detection in R using igraph - format of edges and vertices

五迷三道 submitted on 2020-01-05 06:49:06
Question: I have a correlation matrix of scores that I would like to run community detection on using the Louvain method in igraph, in R. I converted the correlation matrix to a distance matrix using cor2dist, as below: distancematrix <- cor2dist(correlationmatrix) This gives a 400 x 400 matrix of distances ranging from 0 to 2. I then made the list of edges (the distances) and vertices (each of the 400 individuals) using the method below from http://kateto.net/networks-r-igraph (section 3.1). library(igraph)

Python number line cluster exercise

旧城冷巷雨未停 submitted on 2020-01-05 06:27:19
Question: I am working through an exercise in my textbook (Ex 4.7) and am implementing the code in Python to practice dynamic programming. I am having some trouble actually executing Algorithm 4.8. I understand what is going on until I get to 'Otherwise range s from 1 to t-1 and set s to minimize f(s)'. Why is the book using s in the for loop as well as setting it to the function f(s)? How should one go about implementing that line in Python? [current code at bottom] My current code so far is this: x
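
One possible reading of the quoted step is "try every s from 1 to t-1 and keep the s for which f(s) is smallest". A minimal sketch of that reading follows; the helper name `argmin_s` and the cost function `f` are hypothetical, since the textbook's Algorithm 4.8 and its f are not shown here.

```python
def argmin_s(f, t):
    """Return the s in 1..t-1 that minimizes f(s) (first one on ties)."""
    if t < 2:
        raise ValueError("t must be at least 2 so that 1..t-1 is non-empty")
    return min(range(1, t), key=f)


def f(s):
    # Hypothetical cost function, used only to make the example runnable.
    return (s - 3) ** 2


best_s = argmin_s(f, 6)  # candidates are s = 1, 2, 3, 4, 5
print(best_s, f(best_s))
```

Under this reading, s plays two roles in the book's wording: it is the loop variable while the candidates are scanned, and afterwards it names the winning candidate.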

Is it possible to use KDTree with cosine similarity?

こ雲淡風輕ζ submitted on 2020-01-05 05:46:27
Question: It looks like I can't use this similarity metric with sklearn's KDTree, for example, but I need it because I am measuring the similarity of word vectors. What is a fast, robust alternative for this case? I know about Locality-Sensitive Hashing, but it has to be tuned and tested a lot to find good parameters. Answer 1: The ranking you would get with cosine similarity is equivalent to the rank order of the Euclidean distance when you normalize all the data points first. So you can use a KD tree to the
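
The equivalence mentioned in the answer can be sketched as follows: for unit-length vectors u and v, ||u - v||^2 = 2 - 2*cos(u, v), so the nearest neighbours by Euclidean distance are also the most similar by cosine. The random vectors, the dimensions, and the helper name `unit_rows` are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KDTree


def unit_rows(X):
    """Scale each row to unit Euclidean length (rows must be non-zero)."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


# Hypothetical word vectors and a single query vector.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 16))
query = rng.normal(size=(1, 16))

U = unit_rows(vectors)
q = unit_rows(query)

tree = KDTree(U)                 # default Euclidean metric
dist, idx = tree.query(q, k=5)   # 5 nearest neighbours, closest first

# Convert Euclidean distances on the unit sphere back to cosine similarity.
cos_sim = 1.0 - dist ** 2 / 2.0

# Brute-force check: top 5 by cosine similarity.
brute = np.argsort(-(U @ q.ravel()))[:5]
print(idx[0], brute)
```

KD trees tend to lose their speed advantage as the dimensionality grows, so for typical word-vector sizes it may be worth benchmarking this against a brute-force search.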

Bag of feature: how to create the query histogram?

ⅰ亾dé卋堺 submitted on 2020-01-05 04:28:07
Question: I'm trying to implement the Bag of Features model. Given a descriptor matrix (representing an image) that belongs to the initial dataset, computing its histogram is easy, since we already know from k-means which cluster each descriptor vector belongs to. But what if we want to compute the histogram of a query matrix? The only solution that comes to mind is to compute the distance from each descriptor vector to each of the k cluster centroids. This can be inefficient:
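
The brute-force idea described in the question, assigning each query descriptor to its nearest centroid and counting the assignments, can be sketched with vectorized NumPy. The function name `bof_histogram` and the toy centroids and descriptors are hypothetical; the cost is still proportional to (number of descriptors) x k, only without Python-level loops.

```python
import numpy as np


def bof_histogram(descriptors, centroids):
    """Count, for each centroid, how many descriptors are nearest to it."""
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(centroids, dtype=float)
    # Squared Euclidean distances via ||d||^2 - 2 d.c + ||c||^2, shape (n, k).
    d2 = (d ** 2).sum(axis=1)[:, None] - 2.0 * d @ c.T + (c ** 2).sum(axis=1)[None, :]
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(c))


# Hypothetical 2-D centroids (from k-means) and query descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
query_descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [1.0, 9.0]]

hist = bof_histogram(query_descriptors, centroids)
print(hist)  # one count per centroid
```

If a fitted scikit-learn KMeans model is available, its `predict` method performs the same nearest-centroid assignment.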