distance

Effective clustering of a similarity matrix

喜欢而已 submitted on 2019-12-03 07:53:25
My topic is similarity and clustering of (a bunch of) texts. In a nutshell: I want to cluster collected texts together, and they should end up in meaningful clusters. My approach so far is as follows; my problem is in the clustering step. The current software is written in PHP. 1) Similarity: I treat every document as a "bag of words" and convert words into vectors. I use filtering (only "real" words), tokenization (split sentences into words), stemming (reduce words to their base form; Porter's stemmer) and pruning (cut off words with too high and too low frequency) as methods for
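Once every pair of documents has a similarity score, one common option (not necessarily what the asker used) is agglomerative clustering with average linkage: repeatedly merge the two clusters with the highest average pairwise similarity, and stop when no pair reaches a chosen threshold. The sketch below assumes the documents are already filtered, tokenized, stemmed and pruned into word lists, uses cosine similarity over bag-of-words counts, and treats the threshold value as a tuning assumption.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(docs):
    bags = [Counter(tokens) for tokens in docs]
    return [[cosine(x, y) for y in bags] for x in bags]


def average_linkage(sim, threshold):
    """Agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the pair of clusters with the highest average pairwise
    similarity; stops once that best average drops below `threshold`.
    Roughly O(n^3), which is fine for a few hundred documents.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical, already-preprocessed token lists
docs = [["cat", "sat", "mat"], ["cat", "sat", "hat"],
        ["stock", "market", "fell"], ["stock", "market", "rose"]]
clusters = average_linkage(similarity_matrix(docs), threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

Because the loop stops at a similarity threshold rather than a fixed cluster count, the number of clusters does not have to be known in advance; raising the threshold yields more, tighter clusters.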

ElasticSearch — use distance from point to affect query relevance

僤鯓⒐⒋嵵緔 submitted on 2019-12-03 07:19:36
I am trying to use ElasticSearch to create a search that uses distance from a center point to influence relevance. I don't want to simply sort on distance from a point, which I know is possible, because I want relevance to the search query to affect the results as well. I'd like to pass in a search string, say "coffee", and a lat/lon, say "38, -77", and get my results ordered by a combination of how related they are to "coffee" and how close they are to "38, -77". Thanks!
Answer: You can use a distance function in the script of the Custom Score Query to modify _score based on the distance from a center point.
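The answer refers to the old Custom Score Query; newer Elasticsearch versions replace it with the function_score query, whose decay functions (gauss, linear, exp) combine text relevance with distance from an origin. The sketch below only builds the request body as a Python dict. The field names name (a text field) and location (a geo_point) and the 2km scale are assumptions, not taken from the question.

```python
import json


def relevance_near(text, lat, lon, scale="2km"):
    """Build a function_score body: text relevance multiplied by a distance decay.

    Assumed mapping (hypothetical): `name` is a text field, `location` is a geo_point.
    """
    return {
        "query": {
            "function_score": {
                # Ordinary full-text relevance for the search string
                "query": {"match": {"name": text}},
                "functions": [
                    {
                        # Gaussian decay: factor 1.0 at the origin, 0.5 at `scale` away
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": scale,
                            }
                        }
                    }
                ],
                # Final _score = text score * distance factor
                "boost_mode": "multiply",
            }
        }
    }


body = relevance_near("coffee", 38, -77)
print(json.dumps(body, indent=2))  # send as the body of a _search request
```

Changing scale (or adding the offset and decay parameters) controls how quickly distance starts to outweigh text relevance.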

MongoDB Bound Queries: How do I convert miles to radians?

守給你的承諾、 submitted on 2019-12-03 07:11:55
I have a collection of stores with a geospatial index on the location property. Given the user's latitude, longitude and a search radius (in miles), I want to return the list of stores that fall within those parameters. I saw the following example in the MongoDB documentation (http://www.mongodb.org/display/DOCS/Geospatial+Indexing), but it looks like the distance is in radians:
center = [50, 50]
radius = 10
db.places.find({"loc" : {"$within" : {"$center" : [center, radius]}}})
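An angle in radians is the surface distance divided by the Earth's radius, so miles convert by dividing by roughly 3963.2, the figure MongoDB's documentation uses. Note that the $center operator in the quoted example uses flat geometry, with the radius in the same units as the coordinates; a spherical query in miles needs $centerSphere, which takes [[longitude, latitude], radius_in_radians]. Newer MongoDB versions spell $within as $geoWithin. The sketch builds the filter as a Python dict; the location field name comes from the question, and the coordinates in the usage line are made up.

```python
EARTH_RADIUS_MILES = 3963.2  # approximate Earth radius, as used in MongoDB's docs


def miles_to_radians(miles):
    """Convert a distance along the Earth's surface to an angle in radians."""
    return miles / EARTH_RADIUS_MILES


def stores_within(lng, lat, radius_miles):
    """Filter document for stores within `radius_miles` of (lat, lng)."""
    # $centerSphere wants [[longitude, latitude], radius in radians]; note the order
    return {
        "location": {
            "$geoWithin": {
                "$centerSphere": [[lng, lat], miles_to_radians(radius_miles)]
            }
        }
    }


# e.g. with pymongo: db.stores.find(stores_within(-77.0, 38.0, 5))
query = stores_within(-77.0, 38.0, 5)
```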

Computation of Kullback-Leibler (KL) distance between text-documents using numpy

与世无争的帅哥 submitted on 2019-12-03 06:55:29
My goal is to compute the KL distance between the following text documents:
1) The boy is having a lad relationship
2) The boy is having a boy relationship
3) It is a lovely day in NY
First, I vectorised the documents so that I could easily apply numpy:
1) [1,1,1,1,1,1,1]
2) [1,2,1,1,1,2,1]
3) [1,1,1,1,1,1,1]
I then applied the following code to compute the KL distance between the texts:
import numpy as np
import math
from math import log
v = [[1,1,1,1,1,1,1],[1,2,1,1,1,2,1],[1,1,1,1,1,1,1]]
c = v[0]
def kl(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(np.where
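KL divergence is defined between probability distributions, so the count vectors need to be normalised first, and both vectors must be indexed over the same shared vocabulary (position i must mean the same word in every document). The sketch below computes D(P||Q) = sum_i p_i * log(p_i / q_i) in nats, skipping terms where p_i = 0. If Q has zero probability where P does not, the divergence is infinite, which is why smoothing is commonly added for text; that case is reported as inf here. It uses plain float rather than np.float, which was removed in NumPy 1.24.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(P || Q) in nats for two count vectors over the same vocabulary."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise raw counts into probability distributions
    p = p / p.sum()
    q = q / q.sum()
    # Terms with p_i == 0 contribute nothing by convention
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")  # P puts mass where Q has none
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


v = [[1, 1, 1, 1, 1, 1, 1], [1, 2, 1, 1, 1, 2, 1], [1, 1, 1, 1, 1, 1, 1]]
print(kl_divergence(v[0], v[1]), kl_divergence(v[1], v[0]))  # asymmetric in general
```

With the vectors as given, documents 1 and 3 get a divergence of exactly 0 even though they share few words, a symptom of the vectors not being aligned to one vocabulary. If SciPy is available, scipy.stats.entropy(p, q) computes the same normalised divergence.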

PHP/MySQL: Select locations close to a given location from DB

情到浓时终转凉″ submitted on 2019-12-03 06:19:17
In PHP, I have the following code for calculating the distance between two locations:
<?php
function distance($lat1, $long1, $lat2, $long2) {
    // DEGREE TO RADIAN
    $latitude1 = $lat1/180*pi();
    $longitude1 = $long1/180*pi();
    $latitude2 = $lat2/180*pi();
    $longitude2 = $long2/180*pi();
    // FORMULA: e = ARCCOS ( SIN(Latitude1) * SIN(Latitude2) + COS(Latitude1) * COS(Latitude2) * COS(Longitude2-Longitude1) ) * EARTH_RADIUS
    $distance = acos(sin($latitude1)*sin($latitude2)+cos($latitude1)*cos(
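A common pattern for "locations near X" is to let the database discard most rows with a cheap latitude/longitude bounding box, which can use an ordinary index, and then apply the exact great-circle formula only to the rows that survive. The sketch below swaps in Python's built-in sqlite3 module instead of PHP/MySQL so that it is self-contained, and registers the spherical-law-of-cosines formula from the question's PHP code as an SQL function. The table places, its columns and its rows are hypothetical, and this simple box does not handle the poles or the 180° meridian.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def distance_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, the formula from the question's PHP code."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp: rounding can push c slightly above 1, which would make acos fail
    return math.acos(max(-1.0, min(1.0, c))) * EARTH_RADIUS_KM


def nearby(conn, lat, lon, radius_km):
    """Names and distances of places within radius_km, closest first."""
    # Bounding box in degrees: cheap, index-friendly prefilter
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-12)
    return conn.execute(
        "SELECT name, dist(?, ?, lat, lon) AS d FROM places "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "AND dist(?, ?, lat, lon) <= ? ORDER BY d",
        (lat, lon, lat - dlat, lat + dlat, lon - dlon, lon + dlon,
         lat, lon, radius_km),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("dist", 4, distance_km)  # expose the formula to SQL
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
conn.execute("CREATE INDEX idx_lat_lon ON places (lat, lon)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [("A", 52.5200, 13.4050), ("B", 52.5300, 13.4100), ("C", 48.8566, 2.3522)],
)
print(nearby(conn, 52.52, 13.405, 5.0))
```

The arccos form is poorly conditioned for very short distances (it can report tens of centimetres for identical points); the haversine formula is the usual alternative when that matters.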

Efficiently finding the closest coordinate pair from a set in Python

帅比萌擦擦* submitted on 2019-12-03 06:16:36
The Problem
Imagine I am standing in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am standing in?
Inputs
A coordinate pair (x,y) representing the location I am standing at.
A set of coordinate pairs [(a1,b1), (a2,b2)...] where each coordinate pair represents one airport.
Desired Output
A coordinate pair (a,b) from the set of airport coordinate pairs representing the closest airport to the point (x,y).
Inefficient Solution
Here is my inefficient attempt at solving this problem. It is clearly linear in the length of the set of airports.
shortest
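A linear scan is already optimal for a single query; the speed-up comes from building a spatial index once and reusing it for many lookups. A k-d tree is a common choice (SciPy's cKDTree, for example). The pure-Python sketch below builds a tiny k-d tree itself, assuming the pairs are (latitude, longitude) in degrees. Each point is first mapped onto the unit sphere in 3-D, because straight-line (chord) distance there grows monotonically with great-circle distance, so the nearest point in 3-D is also the nearest airport on the globe. The airport coordinates in the usage lines are approximate and only for illustration.

```python
import math


def to_xyz(lat, lon):
    """Point on the unit sphere for a (lat, lon) pair in degrees."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def build(items, depth=0):
    """Build a k-d tree from (xyz, payload) items; returns nested tuples or None."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis, build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return (squared chord distance, payload) of the closest item to target."""
    if node is None:
        return best
    (xyz, payload), axis, left, right = node
    d = sq_dist(xyz, target)
    if best is None or d < best[0]:
        best = (d, payload)
    diff = target[axis] - xyz[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Only search the other side if the splitting plane is closer than the best match
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def closest_airport(tree, lat, lon):
    return nearest(tree, to_xyz(lat, lon))[1]


airports = [(51.47, -0.45), (49.01, 2.55), (40.64, -73.78), (35.55, 139.78)]
tree = build([(to_xyz(a, b), (a, b)) for a, b in airports])  # build once, query many times
print(closest_airport(tree, 51.50, -0.12))
```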

Objective-C string formatter for distances

醉酒当歌 submitted on 2019-12-03 06:14:02
I have a distance as a float and I'm looking for a way to format it nicely for human readers. Ideally, I'd like it to switch from m to km as it gets bigger, and to round the number nicely. Converting to miles would be a bonus. I'm sure many people have needed one of these, and I'm hoping there's some code floating around somewhere. Here's how I'd like the formats:
0-100m: 47m (as a whole number)
100-1000m: 325m or 320m (round to the nearest 5 or 10 meters)
1000-10000m: 1.2km
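The rules in the question translate into a short chain of range checks. The sketch below is in Python rather than Objective-C and picks rounding to the nearest 5 m for the 100-1000 m band; because the question is cut off after the 1000-10000 m rule, whole kilometres beyond 10 km is an assumption. Rounding is done half-up explicitly, since Python's round() rounds halves to even. On Apple platforms, MapKit's MKDistanceFormatter may already cover this, including miles.

```python
import math


def _round_half_up(x, step=1.0):
    """Round x to the nearest multiple of step, with halves rounded up."""
    return step * math.floor(x / step + 0.5)


def format_distance(meters):
    """Human-readable distance following the ranges in the question."""
    if meters < 0:
        raise ValueError("distance must be non-negative")
    if _round_half_up(meters) < 100:
        return f"{int(_round_half_up(meters))}m"  # e.g. 47m
    to_five = _round_half_up(meters, 5)
    if to_five < 1000:
        return f"{int(to_five)}m"  # e.g. 325m
    km = meters / 1000
    if km < 9.95:
        return f"{_round_half_up(km, 0.1):.1f}km"  # e.g. 1.2km
    return f"{int(_round_half_up(km))}km"  # assumed rule above 10 km


for d in (47.3, 323, 1234, 25000):
    print(format_distance(d))
```

The checks compare the already-rounded value against each boundary, so a value such as 99.6 m becomes "100m" rather than falling into the wrong band.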

How to compute distances between centroids and data matrix (for the k-means algorithm)

心已入冬 submitted on 2019-12-03 06:13:45
I am a student of clustering and R. To get a better grip on both, I would like to compute the distance between the centroids and my xy-matrix at each iteration until it "converges". How can I solve steps 2 and 3 using R?
library(fields)
x <- c(3,6,8,1,2,2,6,6,7,7,8,8)
y <- c(5,2,3,5,4,6,1,8,3,6,1,7)
df <- data.frame(x,y) # initial matrix
a <- c(3,6,8)
b <- c(5,2,3)
df1 <- data.frame(a,b) # initial centroids
Here is what I want to do:
I0 <- t(rdist(df, df1)) # after zero iteration
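Each k-means (Lloyd) iteration computes the centroid-to-point distance matrix, assigns every point to its nearest centroid, then moves each centroid to the mean of its points, stopping once the centroids no longer move. The sketch below is in Python/NumPy rather than R and reuses the question's data; the distance matrix has one row per centroid, matching the orientation of t(rdist(df, df1)). Keeping an empty cluster's centroid in place is an assumption; implementations handle that case differently.

```python
import numpy as np

x = [3, 6, 8, 1, 2, 2, 6, 6, 7, 7, 8, 8]
y = [5, 2, 3, 5, 4, 6, 1, 8, 3, 6, 1, 7]
points = np.column_stack([x, y]).astype(float)         # the xy-matrix (df)
init = np.array([[3, 5], [6, 2], [8, 3]], dtype=float)  # initial centroids (df1)


def kmeans(points, centroids, max_iter=100):
    centroids = centroids.copy()
    for it in range(max_iter):
        # Distance matrix: rows = centroids, columns = points (like t(rdist(df, df1)))
        dist = np.linalg.norm(centroids[:, None, :] - points[None, :, :], axis=2)
        labels = dist.argmin(axis=0)  # nearest centroid for each point
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):  # centroids stopped moving: converged
            return centroids, labels, dist, it
        centroids = new
    return centroids, labels, dist, max_iter


centroids, labels, dist, iterations = kmeans(points, init)
print(iterations, centroids, labels, sep="\n")
```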

Find all coordinates within a circle in geographic data in Python

隐身守侯 submitted on 2019-12-03 05:54:43
I've got millions of geographic points. For each one, I want to find all "neighboring points", i.e., all other points within some radius, say a few hundred meters. There is a naive O(N^2) solution to this problem: simply calculate the distance between all pairs of points. However, because I'm dealing with a proper distance metric (geographic distance), there should be a quicker way to do this. I would like to do this in Python. One solution that comes to mind is to use some database
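Because the radius is small and fixed, a grid (spatial hash) avoids comparing every pair: put each point in a cell whose side equals the search radius, then only compare points in the same or adjacent cells. The sketch below is a pure-Python illustration rather than a tuned solution for millions of points. It works on the unit sphere in 3-D so that the 180° meridian needs no special handling: a great-circle distance d corresponds to a chord length 2*sin(d / 2R), so any two points within the radius land in neighboring cells of a grid sized by that chord. A spherical Earth with R = 6,371 km is assumed. For real workloads, a k-d tree or ball tree (for example in SciPy or scikit-learn) or a spatial database is the usual route.

```python
import math
from collections import defaultdict
from itertools import product

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth assumption


def to_unit_xyz(lat, lon):
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def neighbors_within(points, radius_m):
    """Map each point index to the indices of all other points within radius_m."""
    # Great-circle distance d <= radius_m  <=>  chord length <= 2*sin(radius_m / 2R)
    chord = 2 * math.sin(radius_m / (2 * EARTH_RADIUS_M))
    chord_sq = chord * chord
    xyz = [to_unit_xyz(lat, lon) for lat, lon in points]

    def cell_of(p):
        return tuple(math.floor(c / chord) for c in p)

    grid = defaultdict(list)
    for i, p in enumerate(xyz):
        grid[cell_of(p)].append(i)

    result = {}
    for i, p in enumerate(xyz):
        cx, cy, cz = cell_of(p)
        found = []
        # Points within `chord` of p can only sit in the 3x3x3 block of cells around it
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j != i and sum((a - b) ** 2 for a, b in zip(p, xyz[j])) <= chord_sq:
                    found.append(j)
        result[i] = found
    return result


points = [(40.7128, -74.0060), (40.7138, -74.0060), (40.7300, -74.0060)]
print(neighbors_within(points, 300))  # {0: [1], 1: [0], 2: []}
```

The cost is roughly linear in the number of points when they are spread out, but degrades toward O(N^2) if very many points share a few cells.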

How to estimate the distance between two Android devices? (Bluetooth preferred)

僤鯓⒐⒋嵵緔 submitted on 2019-12-03 05:52:44
The goal is not to get the real distance; it is something simpler. The goal is to check whether another device is very, very close: true or false. Let's say 10 or 15 cm counts as close, so the check is true, and any device further away fails the check, so it is false. My first approach was to use the API method fetchUuidsWithSdp(), but it failed! Latency seemed the same whether the devices were a couple of centimeters apart or at opposite ends of a large room! Any solution, even without Bluetooth, is acceptable. For
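Radio signals cross a room in tens of nanoseconds, far below the millisecond-scale latency of an SDP request, which is why latency did not change with distance; received signal strength (RSSI) is the usual proxy instead (on Android, discovery results carry an RSSI value via BluetoothDevice.EXTRA_RSSI). One commonly used approximation is the log-distance path-loss model, RSSI = TxPower - 10*n*log10(d), where TxPower is the RSSI measured at 1 m and n is an environment-dependent exponent (about 2 in free space). The sketch below inverts that model and applies a 15 cm threshold to the median of several samples. The calibration defaults are hypothetical placeholders that would have to be measured per device pair, and RSSI is noisy enough that very short ranges are only roughly distinguishable.

```python
from statistics import median


def estimate_distance_m(rssi_dbm, tx_power_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Invert RSSI = TxPower - 10*n*log10(d) to get an approximate distance in meters.

    tx_power_1m_dbm and path_loss_exponent are hypothetical calibration values.
    """
    return 10 ** ((tx_power_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def is_very_close(rssi_samples, threshold_m=0.15, **calibration):
    """True if the median RSSI sample maps to a distance within threshold_m."""
    # The median damps the large sample-to-sample swings RSSI typically shows
    return estimate_distance_m(median(rssi_samples), **calibration) <= threshold_m


print(is_very_close([-36, -38, -35, -60, -37]))  # True: one outlier, median -37
print(is_very_close([-71, -68, -74, -70, -69]))  # False: median -70, about 3.5 m
```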