nearest-neighbor | 易学教程

Why doesn't scikit-learn's Nearest Neighbor seem to return proper cosine similarity distances?

I am trying to use scikit-learn's Nearest Neighbor implementation to find the column vectors closest to a given column vector in a matrix of random values. This code is supposed to find the nearest neighbors of column 21, then check the actual cosine similarity of those neighbors against column 21.

    from sklearn.neighbors import NearestNeighbors
    import sklearn.metrics.pairwise as smp
    import numpy as np

    test = np.random.randint(0, 5, (50, 50))
    nbrs = NearestNeighbors(n_neighbors=5, algorithm='auto', metric=smp.cosine_similarity).fit(test)
    distances, indices = nbrs.kneighbors(test)
    x = 21
    for idx,d in
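
One plausible source of the confusion, sketched here under assumptions: a nearest-neighbor search treats its metric as a distance (smaller means closer), while cosine similarity grows as two vectors align. The pure-Python sketch below does not use scikit-learn; the helpers `cosine_similarity`, `cosine_distance`, and `nearest` and the toy vectors are hypothetical and only illustrate how the ranking flips when a similarity is used where a distance is expected.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Turn the similarity into a dissimilarity: 0.0 means same direction."""
    return 1.0 - cosine_similarity(u, v)

def nearest(query, vectors, metric, k=1):
    """Indices of the k vectors with the SMALLEST metric value."""
    order = sorted(range(len(vectors)), key=lambda i: metric(query, vectors[i]))
    return order[:k]

# Toy data (made up for illustration).
vectors = [[1, 0], [1, 1], [0, 1], [-1, 0]]
query = [2, 0.1]

# Ranking by distance returns the best-aligned vectors first.
by_distance = nearest(query, vectors, cosine_distance, k=2)

# Ranking by raw similarity returns the WORST-aligned vectors first,
# because the search minimizes whatever value the metric returns.
by_similarity = nearest(query, vectors, cosine_similarity, k=2)

print(by_distance, by_similarity)
```

In scikit-learn itself, the string metric 'cosine' requests a cosine distance rather than a similarity. Note also that NearestNeighbors treats each row as a sample, so comparing columns would presumably require fitting on the transposed matrix.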

kNN with big sparse matrices in Python

I have two large sparse matrices:

    In [3]: trainX
    Out[3]: <6034195x755258 sparse matrix of type '<type 'numpy.float64'>' with 286674296 stored elements in Compressed Sparse Row format>

    In [4]: testX
    Out[4]: <2013337x755258 sparse matrix of type '<type 'numpy.float64'>' with 95423596 stored elements in Compressed Sparse Row format>

About 5 GB RAM in total to load. Note these matrices are HIGHLY sparse (0.0062% occupied). For each row in testX, I want to find the Nearest Neighbor in trainX and return its corresponding label, found in trainY. trainY is a list with the same length as trainX and

r - Finding closest coordinates between two large data sets

Question: I am aiming to identify the nearest entry in dataset 2 to each entry in dataset 1, based on the coordinates in both datasets. Dataset 1 contains 180,000 rows (only 1,800 unique coordinates) and dataset 2 contains 4,500 rows (all 4,500 coordinates unique). I have attempted to replicate the answers from similar questions on Stack Overflow, for example: R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long; Calculating the distance
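
Since dataset 1 repeats a small number of coordinates, one option is to search once per unique coordinate and reuse the result. The sketch below illustrates that idea in Python rather than R, with a brute-force search and the haversine great-circle distance. The function names and the sample coordinates are hypothetical, and the Earth radius is an approximation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth."""
    radius_km = 6371.0  # approximate mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_index(points1, points2):
    """For each (lat, lon) in points1, the index of the closest entry in points2.

    Results are cached per unique coordinate, so repeated coordinates in
    points1 cost only a dictionary lookup.
    """
    cache = {}
    result = []
    for p in points1:
        if p not in cache:
            cache[p] = min(
                range(len(points2)),
                key=lambda j: haversine_km(p[0], p[1], points2[j][0], points2[j][1]),
            )
        result.append(cache[p])
    return result

# Made-up sample data: (lat, lon) pairs, with one repeated coordinate.
dataset1 = [(42.36, -71.06), (40.71, -74.01), (42.36, -71.06)]
dataset2 = [(40.73, -73.99), (42.35, -71.05), (41.88, -87.63)]

matches = nearest_index(dataset1, dataset2)
print(matches)
```

With 1,800 unique coordinates against 4,500 candidates, a brute-force pass like this is about 8.1 million distance evaluations; a spatial index would only become necessary for much larger inputs.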

Benefits of nearest neighbor search with Morton-order?

While working on a simulation of particle interactions, I stumbled across grid indexing in Morton order (Z-order), which is said to provide an efficient nearest-neighbor cell search. The main reason I have read is the almost sequential ordering of spatially close cells in memory. Being in the middle of a first implementation, I cannot wrap my head around how to implement the nearest-neighbor algorithm efficiently, especially in comparison to a basic uniform grid. Given a cell (x,y) it is trivial to obtain the 8 neighbor cell indices and compute the
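
To make the question concrete, here is a minimal Python sketch of 2D Morton encoding by bit interleaving, together with one straightforward way to get the neighbor codes: decode to (x, y), offset, and re-encode. The function names are hypothetical, coordinates are assumed to fit in 16 bits, and this is not claimed to be the most efficient neighbor computation.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def compact1by1(n):
    """Inverse of part1by1: gather every other bit back together."""
    n &= 0x55555555
    n = (n | (n >> 1)) & 0x33333333
    n = (n | (n >> 2)) & 0x0F0F0F0F
    n = (n | (n >> 4)) & 0x00FF00FF
    n = (n | (n >> 8)) & 0x0000FFFF
    return n

def morton_encode(x, y):
    """Interleave bits: x occupies even bit positions, y the odd ones."""
    return part1by1(x) | (part1by1(y) << 1)

def morton_decode(code):
    """Recover (x, y) from a Morton code."""
    return compact1by1(code), compact1by1(code >> 1)

def neighbor_codes(x, y, size):
    """Morton codes of the up-to-8 neighbors of cell (x, y) in a size x size grid."""
    codes = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                codes.append(morton_encode(nx, ny))
    return codes

print(morton_encode(3, 5), morton_decode(39))
print(sorted(neighbor_codes(1, 1, 4)))
```

The sorted neighbor codes of cell (1, 1) are close together but not contiguous, which illustrates the "almost sequential" ordering mentioned above: spatial neighbors tend to land near each other in memory, yet each neighbor index still has to be computed individually.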

Spatial nearest neighbor assignment in R

Question: I am working on a study that is trying to assign particulate matter (PM) exposure to specific individuals based on their addresses. I have two data sets with longitude and latitude coordinates: one is for individuals and one is for PM exposure blocks. I want to assign each subject the PM exposure block that is closest.

    library(sp)
    library(raster)
    library(tidyverse)

    # subject level data
    subjectID <- c("A1", "A2", "A3", "A4")
    subjects <- data.frame(tribble(
      ~lon, ~lat,
      -70.9821391, 42

Implementing KNN with different distance metrics using R

I am working on a dataset in order to compare the effect of different distance metrics, using the KNN algorithm. The KNN algorithm in R uses the Euclidean distance by default, so I wrote my own. I would like to find the number of correct class label matches between the nearest neighbor and the target. I prepared the data first and named it wdbc_n, then chose K=1 and used Euclidean distance as a test.

    library(philentropy)
    knn <- function(xmat, k, method){
      n <- nrow(xmat)
      if (n <= k) stop("k can not be more than n-1")
      neigh <- matrix(0, nrow = n, ncol = k)
      for(i in 1
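
The task described above, counting how often a point's single nearest neighbor shares its class label under a chosen metric, can be sketched as follows. This is written in Python rather than R, it is not a reconstruction of the truncated `knn` function, and the function names and toy data are hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))

def loo_1nn_matches(rows, labels, metric):
    """Leave-one-out 1-NN: count rows whose nearest OTHER row has the same label."""
    matches = 0
    for i, row in enumerate(rows):
        candidates = [j for j in range(len(rows)) if j != i]
        nearest = min(candidates, key=lambda j: metric(row, rows[j]))
        if labels[nearest] == labels[i]:
            matches += 1
    return matches

# Made-up toy data: two tight clusters plus one point labelled "B"
# that sits slightly closer to the "M" cluster.
rows = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (3.0, 3.2)]
labels = ["B", "B", "M", "M", "B"]

for name, metric in [("euclidean", euclidean), ("manhattan", manhattan)]:
    print(name, loo_1nn_matches(rows, labels, metric), "of", len(rows))
```

Passing the metric as a function keeps the neighbor search identical across metrics, so any difference in the match count comes from the metric alone.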

Image interpolation mode in Chrome/Safari?

Question: I need an image to render with nearest-neighbor resizing rather than the bicubic resizing that is currently used. I currently use the following:

    -ms-interpolation-mode: nearest-neighbor;
    image-rendering: -moz-crisp-edges;

This works in IE and Firefox, but not in Chrome and Safari. Are there any WebKit alternatives, or any other way to achieve this effect?

Answer 1: Edit: It's now possible with image-rendering: -webkit-optimize-contrast; . https://developer.mozilla.org/en-US/docs/CSS/image-rendering

Efficient implementation of the Nearest Neighbour Search

I am trying to implement an efficient algorithm for the nearest-neighbour search problem. I have read tutorials about some data structures that support operations for this kind of problem (for example, R-tree, cover tree, etc.), but all of them are difficult to implement. I also cannot find sample source code for these data structures. I know C++ and I am trying to solve this problem in that language. Ideally, I need links that describe how to implement these data structures, with source code.

You could try a linesweep algorithm to find the closest pair of points: http://community.topcoder
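
As an illustration of the line-sweep idea mentioned in the answer, here is a minimal Python sketch of a closest-pair search. It is an assumption about what such a sweep could look like, not the content of the linked article, and `closest_pair` is a hypothetical name. A plain sorted list stands in for the balanced tree a C++ version would likely use (for example `std::set`), so insertions and removals here cost O(n) each.

```python
import bisect
import math

def closest_pair(points):
    """Return (distance, (p, q)) for the closest pair, sweeping in x order."""
    pts = sorted(points)
    best = math.inf
    best_pair = None
    active = []   # (y, x) tuples of points within `best` of the sweep line, sorted by y
    left = 0      # index in pts of the oldest point still in `active`
    for x, y in pts:
        # Drop points that are too far behind the sweep line to matter.
        while x - pts[left][0] > best:
            lx, ly = pts[left]
            active.pop(bisect.bisect_left(active, (ly, lx)))
            left += 1
        # Only points whose y lies within `best` of the current y can improve.
        lo = bisect.bisect_left(active, (y - best, -math.inf))
        hi = bisect.bisect_right(active, (y + best, math.inf))
        for ay, ax in active[lo:hi]:
            d = math.hypot(x - ax, y - ay)
            if d < best:
                best = d
                best_pair = ((ax, ay), (x, y))
        bisect.insort(active, (y, x))
    return best, best_pair

# Made-up sample points.
sample = [(0, 0), (5, 5), (1, 1), (9, 9), (5, 6)]
distance, pair = closest_pair(sample)
print(distance, pair)
```

Note that this solves the closest-pair problem (the two nearest points in one set), which is related to but not the same as answering nearest-neighbour queries for arbitrary points.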

ERROR: subquery in FROM cannot refer to other relations of same query level

Question: I'm working with PostgreSQL 9 and I want to find the nearest neighbor in table RP for every tuple in RQ, comparing the dates (t), but I get this error:

    ERROR: subquery in FROM cannot refer to other relations of same query level

using this query:

    SELECT * FROM RQ, (SELECT * FROM RP ORDER BY ABS(RP.t - RQ.t) LIMIT 1) AS RA

The reference to RQ.t in the subquery seems to be the problem. How can I avoid this error? How can I access RQ from the subquery?

Answer 1: Update: LATERAL joins allow that and were

Incremental Nearest Neighbor Algorithm in Python

Question: Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm?

Answer 1: I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced and you can't do simple tree rotation to fix balance problems and keep consistency. At the
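
To illustrate the balance concern raised in the answer, here is a minimal Python sketch of a k-d tree that accepts points one at a time and never rebalances. The class and method names are hypothetical, and this is only a sketch of the naive approach, not the solution the truncated answer goes on to propose. It requires Python 3.8+ for `math.dist`.

```python
import math

class KDNode:
    def __init__(self, point, axis):
        self.point = point
        self.axis = axis      # coordinate index this node splits on
        self.left = None
        self.right = None

class IncrementalKDTree:
    """k-d tree with one-at-a-time insertion and no rebalancing."""

    def __init__(self, dims):
        self.dims = dims
        self.root = None

    def insert(self, point):
        if self.root is None:
            self.root = KDNode(point, 0)
            return
        node = self.root
        while True:
            side = "left" if point[node.axis] < node.point[node.axis] else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, KDNode(point, (node.axis + 1) % self.dims))
                return
            node = child

    def nearest(self, query):
        """Return (point, distance) of the stored point closest to query."""
        best = [math.inf, None]

        def visit(node):
            if node is None:
                return
            d = math.dist(query, node.point)
            if d < best[0]:
                best[0], best[1] = d, node.point
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near)
            # The far side can only help if the splitting plane is closer than best.
            if abs(diff) < best[0]:
                visit(far)

        visit(self.root)
        return best[1], best[0]

    def depth(self):
        """Number of nodes on the longest root-to-leaf path."""
        deepest, stack = 0, [(self.root, 1)]
        while stack:
            node, d = stack.pop()
            if node is not None:
                deepest = max(deepest, d)
                stack.extend([(node.left, d + 1), (node.right, d + 1)])
        return deepest

tree = IncrementalKDTree(dims=2)
for p in [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]:
    tree.insert(p)
first = tree.nearest((9, 2))
tree.insert((9, 3))             # the index is updated without a rebuild
second = tree.nearest((9, 2))

# Inserting already-sorted points degenerates the tree into a chain.
chain = IncrementalKDTree(dims=2)
for i in range(10):
    chain.insert((i, i))
print(first, second, chain.depth())
```

Queries stay correct after each insertion, but the chain example shows the tree depth growing linearly with sorted input, which is the imbalance the answer warns about.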