distance

Return Similarity Matrix From Two Variable-length Arrays of Strings (scipy option?)

大兔子大兔子 posted on 2021-02-10 12:29:06
Question: Say I have two arrays:
import numpy as np
arr1 = np.array(['faucet', 'faucets', 'bath', 'parts', 'bathroom'])
arr2 = np.array(['faucett', 'faucetd', 'bth', 'kichen'])
I want to compute the similarity of the strings in arr2 to the strings in arr1. arr1 is an array of correctly spelled words; arr2 is an array of words not recognized in a dictionary of words. I want to return a matrix, which will then be turned into a pandas DataFrame. My current solution (credit): from scipy.spatial
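The kind of matrix the question asks for can be sketched with the standard library alone. This is a minimal sketch, not the asker's scipy-based solution: it assumes that "similarity" means the ratio reported by `difflib.SequenceMatcher`, and the helper name `similarity_matrix` is made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_matrix(candidates, references):
    """Return a nested list: one row per candidate, one column per reference.

    Each value is SequenceMatcher's ratio, which lies in [0, 1]
    (1.0 means the two strings are identical).
    """
    return [
        [SequenceMatcher(None, cand, ref).ratio() for ref in references]
        for cand in candidates
    ]


arr1 = ['faucet', 'faucets', 'bath', 'parts', 'bathroom']  # correctly spelled
arr2 = ['faucett', 'faucetd', 'bth', 'kichen']             # unrecognized

matrix = similarity_matrix(arr2, arr1)

# Report the closest correctly spelled word for each unrecognized word.
for word, row in zip(arr2, matrix):
    best = max(range(len(arr1)), key=row.__getitem__)
    print(word, '->', arr1[best], round(row[best], 3))
```

If pandas is available, `pd.DataFrame(matrix, index=arr2, columns=arr1)` turns the nested list into a labelled table.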

Distance from a point to a line segment in 3d (Python)

拈花ヽ惹草 posted on 2021-02-08 03:31:31
Question: I am looking for a Python function that computes the distance from a point in 3D (x_0, y_0, z_0) to a line segment defined by its endpoints (x_1, y_1, z_1) and (x_2, y_2, z_2). I have only found a solution to this problem in 2D. There are solutions for finding the distance from a point to a line in 3D, but not to a line segment, like here: (picture taken from Calculate distance point to line segment with special cases)
Answer 1: This answer is adapted from here: Calculate the euclidian distance between an
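Since the answer is cut off in this excerpt, here is a minimal sketch of one common approach, not necessarily the one the answer uses: project the point onto the line through the endpoints, clamp the projection parameter to [0, 1] so it stays on the segment, and measure the distance to the clamped point. The function name `point_segment_distance` is illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b (3D tuples)."""
    ab = [bi - ai for ai, bi in zip(a, b)]   # direction of the segment
    ap = [pi - ai for ai, pi in zip(a, p)]   # from endpoint a to the point
    ab_len_sq = sum(c * c for c in ab)

    # Degenerate segment: both endpoints coincide.
    if ab_len_sq == 0.0:
        return math.dist(p, a)

    # Parameter of the orthogonal projection onto the infinite line,
    # clamped so the closest point cannot leave the segment.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_len_sq
    t = max(0.0, min(1.0, t))

    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


print(point_segment_distance((0, 1, 0), (-1, 0, 0), (1, 0, 0)))  # projection falls inside the segment
print(point_segment_distance((3, 0, 0), (-1, 0, 0), (1, 0, 0)))  # closest point is the endpoint b
```

The clamping step is what distinguishes the segment case from the point-to-line case mentioned in the question.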

Find the most similar row using Python

大兔子大兔子 posted on 2021-02-07 10:11:11
Question: I have two data frames (df1 and df2). df1 stores one row with a set of values, and I want to find the most similar row in df2.
import pandas as pd
import numpy as np
# df1 has only one row and four columns.
df1 = pd.DataFrame(np.array([[30, 60, 70, 40]]), columns=['A', 'B', 'C', 'D'])
# df2 has 50 rows and four columns.
df2 = pd.DataFrame(np.random.randint(0, 100, size=(50, 4)), columns=list('ABCD'))
Based on df1, what is the most similar row in df2?
Answer 1: If you want the
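The answer is truncated here, so the following is only a sketch under one assumption: "most similar" is taken to mean the smallest Euclidean distance. It uses plain lists so the logic is visible; the function name and the sample rows are invented for illustration.

```python
import math


def most_similar_row(target, rows):
    """Return (index, distance) of the row closest to target in Euclidean distance."""
    distances = [math.dist(target, row) for row in rows]
    best = min(range(len(rows)), key=distances.__getitem__)
    return best, distances[best]


target = [30, 60, 70, 40]        # the single row of df1
rows = [                         # stand-in for the rows of df2
    [10, 90, 20, 5],
    [32, 58, 71, 44],
    [30, 60, 70, 90],
]

index, distance = most_similar_row(target, rows)
print(index, distance)
```

With the data frames from the question, the same idea can be written as `((df2 - df1.iloc[0]) ** 2).sum(axis=1).idxmin()`, which returns the index label of the closest row. A different notion of similarity (cosine, Manhattan, scaled columns) would need a different distance function.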

Scikit-learn: How do we define a distance metric's parameter for grid search

╄→гoц情女王★ posted on 2021-02-07 09:45:48
Question: I have the following code snippet that attempts a grid search in which one of the grid parameters is the distance metric used by the KNN algorithm. The example below fails if I use the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.
# Define the parameter values that should be searched
k_range = range(1, 31)
weights = ['uniform', 'distance']
algos = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_sizes = range(10, 60, 10)
metrics = ["euclidean", "manhattan", "chebyshev",

Distance between vectors with missing values

拜拜、爱过 posted on 2021-02-07 02:58:57
Question: For vectors A and B, the Euclidean distance is sqrt((A1-B1)^2+(A2-B2)^2+...+(An-Bn)^2):
A <- c(5, 4, 3, 2, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, 4, 3, 2, 3, 1, 3)
dist(rbind(A,B), method= "euclidean")
7.681146
How is the distance calculated when vectors A and B contain missing values? Here is an example; the R output for the distance is 8.485281, but how is it calculated?
A <- c(5, NA, NA, NA, 1, 1, 2, 3, 5)
B <- c(1, 0, 6, NA, NA, NA, NA, 1, 3)
dist(rbind(A,B), method= "euclidean")
8.485281
Answer 1: Entries with NA
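Both numbers in the question can be reproduced by hand. R's `dist` drops every position where either vector is NA and then scales the sum of squared differences up in proportion to the number of positions used. In the second example only 3 of the 9 positions survive, with squared differences 16, 4 and 4, so the result is sqrt(24 * 9 / 3) = sqrt(72) ≈ 8.485281. The Python sketch below mirrors that rule; `None` stands in for NA and the function name is illustrative.

```python
import math


def dist_with_missing(a, b):
    """Euclidean distance that skips missing pairs and rescales, as R's dist() does."""
    # Keep only positions where both values are present.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float('nan')  # nothing left to compare

    sq_sum = sum((x - y) ** 2 for x, y in pairs)
    # Scale the partial sum up by (total positions) / (positions used).
    return math.sqrt(sq_sum * len(a) / len(pairs))


A_full = [5, 4, 3, 2, 1, 1, 2, 3, 5]
B_full = [1, 0, 6, 4, 3, 2, 3, 1, 3]
A_na = [5, None, None, None, 1, 1, 2, 3, 5]
B_na = [1, 0, 6, None, None, None, None, 1, 3]

print(round(dist_with_missing(A_full, B_full), 6))
print(round(dist_with_missing(A_na, B_na), 6))
```

The rescaling treats the observed positions as representative of the missing ones, which is why the distance with NAs can come out larger than the distance computed on the complete vectors.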

Calculate distance between two x/y coordinates?

对着背影说爱祢 posted on 2021-02-06 07:56:53
Question: I would like to calculate the distance between two x/y coordinates on the surface of a torus. That is, a normal grid whose corners and sides are 'connected'. For example, on a 500x500 grid, the point at (499, 499) is adjacent to (0, 0), and the distance between e.g. (0, 0) and (0, 495) should be 5. Is there a good mathematical way of calculating this?
Answer 1: So you are looking for the Euclidean distance on the two-dimensional surface of a torus, I gather. sqrt
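The answer breaks off at `sqrt`, so what follows is a sketch of the usual wrap-around formula rather than a reconstruction of the answer's exact expression. Along each axis, take the shorter of the direct difference and the difference going around the edge, then combine the two with the Pythagorean theorem. Coordinates are assumed to lie in [0, width) and [0, height).

```python
import math


def torus_distance(p, q, width, height):
    """Euclidean distance between p and q on a width x height grid that wraps around."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    # Going "around the edge" may be shorter than going straight across.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


print(torus_distance((0, 0), (0, 495), 500, 500))    # the example from the question
print(torus_distance((499, 499), (0, 0), 500, 500))  # diagonal neighbours across the corner
```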

How can I calculate distance between multiple latitude and longitude data?

强颜欢笑 posted on 2021-02-05 11:28:46
Question: I have location data (latitude and longitude) for 1100 stations and 10000 houses. Is it possible to calculate, in R, the shortest distance between a station and each house? I also want to know which station gives that shortest distance for each house.
Answer 1: Here's a toy example for finding mass distances between m points and n cities. It should translate directly to your station/house problem. I brought up worldcities, spun the globe (so
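The question and its answer are about R; the sketch below shows the same brute-force idea in Python for comparison. It assumes great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km, and the function names and sample coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def nearest_station(house, stations):
    """Return (station index, distance in km) of the station closest to house."""
    distances = [haversine_km(house[0], house[1], s[0], s[1]) for s in stations]
    best = min(range(len(stations)), key=distances.__getitem__)
    return best, distances[best]


stations = [(10.0, 10.0), (0.0, 1.0), (5.0, 5.0)]   # hypothetical (lat, lon) pairs
houses = [(0.0, 0.0), (9.0, 9.5)]

for house in houses:
    print(house, nearest_station(house, stations))
```

At 1100 stations by 10000 houses this is 11 million distance evaluations, which is feasible but slow in pure Python; a vectorized or spatial-index approach would be the next step if speed matters.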

How to calculate distance and time between two locations

倖福魔咒の posted on 2021-01-29 16:27:02
Question: Here's a sample of some data:
   Tag.ID TimeStep.coa        Latitude.coa Longitude.coa
   <chr>  <dttm>                     <dbl>         <dbl>
 1 1657   2017-08-17 12:00:00         72.4         -81.1
 2 1657   2017-08-17 18:00:00         72.3         -81.1
 3 1658   2017-08-14 18:00:00         72.3         -81.2
 4 1658   2017-08-15 00:00:00         72.3         -81.3
 5 1659   2017-08-14 18:00:00         72.3         -81.1
 6 1659   2017-08-15 00:00:00         72.3         -81.2
 7 1660   2017-08-20 18:00:00         72.3         -81.1
 8 1660   2017-08-21 00:00:00         72.3         -81.2
 9 1660   2017-08-21 06:00:00         72.3         -81.2
10 1660   2017-08-21 12:00:00         72.3         -81.3
11 1661   2017-08

Grouping people in pandas dataframe with customized function

最后都变了- posted on 2021-01-29 15:57:15
Question: Introduction: I have a pandas dataframe of people who live in different locations (latitude, longitude, floor number). I want to cluster them into groups of 3 people each, so that at the end of this process every person is assigned to exactly one group. The length of my dataframe is a multiple of 9 (e.g. 18 people). The tricky part is that people in the same group are not allowed to share the same location in terms of latitude and longitude. What is going wrong? After I apply my function to the

R distance matrix build

生来就可爱ヽ(ⅴ<●) posted on 2021-01-29 08:22:55
Question: New to R. I have a matrix of coordinates of several components in R, which looks like:
      x      y      z
C1    0.3    0.2   -1.2
C2   -1.5    0.7    0
C3    0.2   -0.75   0.22
...
My question is how to build a distance matrix of each pair of components in R, like:
      C1    C2    C3   ...
C1    0     0.2   0.7  ...
C2    0.2   0     1.2  ...
C3    0.7   1.2   0    ...
...
Answer 1: You would do DistMatrix <- as.matrix(dist(Matrix)) Then: rownames(DistMatrix) <- colnames(DistMatrix) <- rownames(Matrix)
Source: https://stackoverflow.com/questions/13843048/r-distance-matrix-build
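For readers who want to see what `dist` computes in this case, here is a small Python sketch of the same pairwise Euclidean distance matrix, using the three coordinate rows from the question. The function name `distance_matrix` and the dict-of-dicts layout are choices made for this illustration, not part of the R answer.

```python
import math


def distance_matrix(points):
    """Return a dict of dicts with the Euclidean distance between every pair of labels."""
    names = list(points)
    return {a: {b: math.dist(points[a], points[b]) for b in names} for a in names}


coords = {
    'C1': (0.3, 0.2, -1.2),
    'C2': (-1.5, 0.7, 0.0),
    'C3': (0.2, -0.75, 0.22),
}

D = distance_matrix(coords)
for name, row in D.items():
    print(name, [round(v, 3) for v in row.values()])
```

The result is symmetric with zeros on the diagonal. The values in the question's sample output are placeholders, so they will not match what the real coordinates produce.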