nearest-neighbor

Why doesn't scikit-learn's Nearest Neighbor seem to return proper cosine similarity distances?

ε祈祈猫儿з posted on 2019-12-04 07:00:44
I am trying to use scikit-learn's Nearest Neighbor implementation to find the closest column vectors to a given column vector, out of a matrix of random values. This code is supposed to find the nearest neighbors of column 21, then check the actual cosine similarity of those neighbors against column 21.
    from sklearn.neighbors import NearestNeighbors
    import sklearn.metrics.pairwise as smp
    import numpy as np
    test=np.random.randint(0,5,(50,50))
    nbrs = NearestNeighbors(n_neighbors=5, algorithm='auto', metric=smp.cosine_similarity).fit(test)
    distances, indices = nbrs.kneighbors(test)
    x=21
    for idx,d in
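
The excerpt is cut off, so the following is only a sketch of one likely cause, not the accepted answer. `cosine_similarity` is a similarity rather than a distance, and a callable `metric` in scikit-learn is expected to take two 1-D vectors and return a scalar. Assuming the built-in `metric="cosine"` (cosine distance, i.e. 1 minus similarity) is acceptable, and transposing so that columns become samples, the check could look like this:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
test = rng.integers(0, 5, size=(50, 50))

# NearestNeighbors treats rows as samples, so transpose to compare columns.
X = test.T.astype(float)

# "cosine" is a distance (1 - cosine similarity); brute force supports it.
nbrs = NearestNeighbors(n_neighbors=5, algorithm="brute", metric="cosine").fit(X)
distances, indices = nbrs.kneighbors(X)

x = 21
# Recompute the similarity of column x against its reported neighbors.
sims = cosine_similarity(X[[x]], X[indices[x]])[0]
for idx, d, s in zip(indices[x], distances[x], sims):
    print(idx, d, 1.0 - s)  # the last two columns should agree
```

The first neighbor reported for column 21 is normally column 21 itself, at a distance of roughly zero.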

kNN with big sparse matrices in Python

烈酒焚心 posted on 2019-12-04 06:40:22
I have two large sparse matrices:
    In [3]: trainX
    Out[3]: <6034195x755258 sparse matrix of type '<type 'numpy.float64'>' with 286674296 stored elements in Compressed Sparse Row format>
    In [4]: testX
    Out[4]: <2013337x755258 sparse matrix of type '<type 'numpy.float64'>' with 95423596 stored elements in Compressed Sparse Row format>
They take about 5 GB of RAM in total to load. Note that these matrices are HIGHLY sparse (0.0062% occupied). For each row in testX, I want to find the nearest neighbor in trainX and return its corresponding label, found in trainY. trainY is a list with the same length as trainX and
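
A minimal sketch of the lookup described above, on toy data. The excerpt does not name a distance metric, so cosine distance is an assumption here, as are the names `train_X`, `test_X`, `train_y`, and `predict_in_chunks`. Brute-force search in scikit-learn accepts CSR input directly, and querying in chunks keeps the intermediate distance matrix small:

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for the real matrices (assumed sizes, far smaller than the question's).
train_X = sparse.random(200, 1000, density=0.01, format="csr", random_state=0)
test_X = sparse.random(50, 1000, density=0.01, format="csr", random_state=1)
train_y = np.random.default_rng(0).integers(0, 3, size=train_X.shape[0])

# Brute-force search works on sparse input without densifying it.
nn = NearestNeighbors(n_neighbors=1, algorithm="brute", metric="cosine").fit(train_X)

def predict_in_chunks(test_matrix, chunk_rows=16):
    """Return the label of the nearest training row for every test row."""
    labels = []
    for start in range(0, test_matrix.shape[0], chunk_rows):
        chunk = test_matrix[start:start + chunk_rows]
        _, idx = nn.kneighbors(chunk)      # idx has shape (rows_in_chunk, 1)
        labels.append(train_y[idx[:, 0]])
    return np.concatenate(labels)

pred_y = predict_in_chunks(test_X)
print(pred_y[:10])
```

Whether this scales to millions of rows depends on the hardware and the metric. The chunk size is the knob that trades memory for the number of passes.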

r - Finding closest coordinates between two large data sets

本秂侑毒 posted on 2019-12-04 05:31:53
Question: I am aiming to identify the nearest entry in dataset 2 to each entry in dataset 1, based on the coordinates in both datasets. Dataset 1 contains 180,000 rows (only 1,800 unique coordinates) and dataset 2 contains 4,500 rows (all 4,500 coordinates unique). I have attempted to replicate the answers from similar questions on Stack Overflow. For example: R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long Calculating the distance
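
The question is about R, but the idea can be sketched in Python with NumPy only. This is an illustration under assumptions, not the answer from the thread: coordinates are taken to be (latitude, longitude) in degrees, distance is great-circle (haversine), and the function names are made up. Deduplicating dataset 1 first means only 1,800 × 4,500 distances would be needed for the sizes quoted above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def nearest_entries(coords1, coords2):
    """For each (lat, lon) row of coords1, return the index of the closest
    row of coords2 and the distance to it in km."""
    # Compute distances once per unique coordinate, then map back to all rows.
    unique1, inverse = np.unique(coords1, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    d = haversine_km(unique1[:, None, 0], unique1[:, None, 1],
                     coords2[None, :, 0], coords2[None, :, 1])
    nearest = d.argmin(axis=1)
    dist = d[np.arange(len(unique1)), nearest]
    return nearest[inverse], dist[inverse]

# London, Paris, London again (a duplicate row, as in dataset 1).
coords1 = np.array([[51.5, -0.1], [48.85, 2.35], [51.5, -0.1]])
# New York, Brussels, Berlin.
coords2 = np.array([[40.71, -74.0], [50.85, 4.35], [52.52, 13.40]])

idx, dist_km = nearest_entries(coords1, coords2)
print(idx, dist_km)
```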

Benefits of nearest neighbor search with Morton-order?

孤人 posted on 2019-12-04 05:26:59
While working on the simulation of particle interactions, I stumbled across grid indexing in Morton order (Z-order) (Wikipedia link), which is regarded as providing an efficient nearest-neighbor cell search. The main reason I have read is the almost sequential ordering of spatially close cells in memory. Being in the middle of a first implementation, I cannot wrap my head around how to efficiently implement the algorithm for the nearest neighbors, especially in comparison to a basic uniform grid. Given a cell (x,y) it is trivial to obtain the 8 neighbor cell indices and compute the
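
To make the comparison concrete, here is a small sketch of 2-D Morton encoding by bit interleaving, plus the simplest way to get neighbor indices: step in (x, y) space and re-encode. The function names are illustrative and 16-bit coordinates are assumed. It is not a claim about the fastest approach, since neighbor codes can also be derived with bit tricks directly on the code.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def compact1by1(n):
    """Inverse of part1by1: gather every second bit back together."""
    n &= 0x55555555
    n = (n | (n >> 1)) & 0x33333333
    n = (n | (n >> 2)) & 0x0F0F0F0F
    n = (n | (n >> 4)) & 0x00FF00FF
    n = (n | (n >> 8)) & 0x0000FFFF
    return n

def morton_encode(x, y):
    """Interleave bits: x occupies even bit positions, y the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)

def morton_decode(code):
    return compact1by1(code), compact1by1(code >> 1)

def neighbor_codes(x, y, grid_size):
    """Morton codes of the up-to-8 cells around (x, y) inside the grid."""
    codes = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < grid_size and 0 <= ny < grid_size:
                codes.append(morton_encode(nx, ny))
    return codes

print(morton_encode(3, 5))       # 39
print(morton_decode(39))         # (3, 5)
print(neighbor_codes(0, 0, 4))   # [1, 2, 3]
```

Neighboring cells often, but not always, land close together in the resulting order. Jumps appear wherever a step crosses a power-of-two boundary.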

Spatial nearest neighbor assignment in R

﹥>﹥吖頭↗ posted on 2019-12-04 01:57:36
Question: I am working on a study that is trying to assign particulate matter (PM) exposure to specific individuals based on their addresses. I have two data sets with longitude and latitude coordinates. One is for individuals and one is for PM exposure blocks. I want to assign each subject the PM exposure block that is closest.
    library(sp)
    library(raster)
    library(tidyverse)
    #subject level data
    subjectID<-c("A1","A2","A3","A4")
    subjects<-data.frame(tribble( ~lon,~lat, -70.9821391, 42

Implementing KNN with different distance metrics using R

谁说我不能喝 posted on 2019-12-03 20:26:42
I am working on a dataset in order to compare the effect of different distance metrics. I am using the KNN algorithm. The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own. I would like to find the number of correct class label matches between the nearest neighbor and the target. I first prepared the data, which I call wdbc_n, and chose K=1. I have used the Euclidean distance as a test.
    library(philentropy)
    knn <- function(xmat, k,method){
      n <- nrow(xmat)
      if (n <= k) stop("k can not be more than n-1")
      neigh <- matrix(0, nrow = n, ncol = k)
      for(i in 1
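
The R code above is truncated, so this is not a completion of it. It is a Python sketch of the same experiment under assumptions: K=1, each row's nearest neighbor is searched among all other rows, and the metric is passed by name to SciPy's `cdist`. The function name and the toy data are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def loo_1nn_matches(X, y, metric="euclidean"):
    """Count rows whose nearest other row (under `metric`) has the same label."""
    d = cdist(X, X, metric=metric)
    np.fill_diagonal(d, np.inf)        # a row must not match itself
    nearest = d.argmin(axis=1)
    return int(np.sum(y[nearest] == y))

# Two well-separated toy clusters.
X = np.array([[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

for metric in ("euclidean", "cityblock", "cosine"):
    print(metric, loo_1nn_matches(X, y, metric))
```

On this toy data the Euclidean and city-block metrics match all six labels. Cosine distance ignores vector length, so it can pair points from different clusters that lie in the same direction.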

Image interpolation mode in Chrome/Safari?

☆樱花仙子☆ posted on 2019-12-03 16:27:08
Question: I need to have an image render with nearest-neighbor resizing and not the bicubic way that is currently used. I currently use the following: -ms-interpolation-mode: nearest-neighbor; image-rendering: -moz-crisp-edges; This works in IE and Firefox, but not in Chrome and Safari. Are there any WebKit alternatives or any other way to achieve this effect? Answer 1: Edit: It's now possible with image-rendering: -webkit-optimize-contrast; . https://developer.mozilla.org/en-US/docs/CSS/image-rendering

Efficient implementation of the Nearest Neighbour Search

陌路散爱 posted on 2019-12-03 15:42:29
I am trying to implement an efficient algorithm for the nearest-neighbour search problem. I have read tutorials about some data structures that support operations for this kind of problem (for example, R-tree, cover tree, etc.), but all of them are difficult to implement. I also cannot find sample source code for these data structures. I know C++ and I am trying to solve this problem in that language. Ideally, I need links that describe how to implement these data structures, with source code. You could try a linesweep algorithm to find the closest pair of points: http://community.topcoder
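
As a reference point that is simpler than an R-tree or cover tree, here is a compact k-d tree sketch. It is written in Python for brevity rather than the C++ the asker wants, it is a swapped-in technique rather than the linesweep approach mentioned above, and the names `Node`, `build`, and `nearest` are illustrative.

```python
import math
from typing import NamedTuple, Optional, Tuple

class Node(NamedTuple):
    point: Tuple[float, ...]
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]

def build(points, depth=0):
    """Build a k-d tree by splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return (point, distance) of the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[1]:
        best = (node.point, d)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if abs(diff) < best[1]:
        best = nearest(far, target, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))
```

The same structure carries over to C++ with a struct per node and recursion or an explicit stack. Sorting at every level keeps the sketch short at the cost of build time.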

ERROR: subquery in FROM cannot refer to other relations of same query level

江枫思渺然 posted on 2019-12-03 13:50:29
Question: I'm working with PostgreSQL 9 and I want to find the nearest neighbor inside table RP for all tuples in RQ, comparing the dates (t), but I get this error:
    ERROR: subquery in FROM cannot refer to other relations of same query level
using this query:
    SELECT * FROM RQ, (SELECT * FROM RP ORDER BY ABS(RP.t - RQ.t) LIMIT 1) AS RA
RQ.t in the subquery seems to be the problem. How can I avoid this error? How can I access RQ from the subquery? Answer 1: Update: LATERAL joins allow that and were

Incremental Nearest Neighbor Algorithm in Python

痞子三分冷 posted on 2019-12-03 12:15:31
Question: Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm? Answer 1: I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced and you can't do simple tree rotation to fix balance problems and keep consistency. At the
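
One common workaround for the balancing problem, sketched here as an assumption rather than as what the truncated answer goes on to recommend: keep a batch-built tree for the bulk of the points, hold new points in a small buffer that is searched by brute force, and rebuild the tree once the buffer grows past a threshold. The class name `IncrementalNN` and the parameter `rebuild_every` are made up. `cKDTree` is SciPy's batch k-d tree.

```python
import numpy as np
from scipy.spatial import cKDTree

class IncrementalNN:
    """Nearest-neighbor index that accepts points one at a time."""

    def __init__(self, dim, rebuild_every=256):
        self.rebuild_every = rebuild_every
        self._indexed = np.empty((0, dim))   # points already in the tree
        self._tree = None
        self._buffer = []                    # recent points, searched linearly

    def add(self, point):
        self._buffer.append(np.asarray(point, dtype=float))
        if len(self._buffer) >= self.rebuild_every:
            # Fold the buffer into the tree with a fresh, balanced rebuild.
            self._indexed = np.vstack([self._indexed, np.array(self._buffer)])
            self._tree = cKDTree(self._indexed)
            self._buffer = []

    def query(self, point):
        """Return (distance, insertion index) of the closest stored point."""
        point = np.asarray(point, dtype=float)
        best_dist, best_idx = np.inf, -1
        if self._tree is not None:
            best_dist, best_idx = self._tree.query(point)
        if self._buffer:
            d = np.linalg.norm(np.array(self._buffer) - point, axis=1)
            j = int(d.argmin())
            if d[j] < best_dist:
                best_dist, best_idx = d[j], len(self._indexed) + j
        return float(best_dist), int(best_idx)

index = IncrementalNN(dim=2, rebuild_every=4)
pts = np.random.default_rng(0).random((10, 2))
for p in pts:
    index.add(p)
print(index.query([0.5, 0.5]))
```

The threshold trades insertion cost against query cost: a larger buffer means fewer rebuilds but a longer linear scan per query.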