fuzzyjoin

How to join location data (lat,lon)

微笑、不失礼 posted on 2021-02-18 17:12:36
Question: I have two datasets: one with some locations (lat, lon), called test, and one with the lat/lon information of all zip codes in NYC, called test2.

    test <- structure(list(trip_count = 1:10,
      dropoff_longitude = c(-73.959862, -73.882202, -73.934113, -73.992203, -74.00563, -73.975189, -73.97448, -73.974838, -73.981377, -73.955093),
      dropoff_latitude = c(40.773617, 40.744175, 40.715923, 40.749203, 40.726158, 40.729824, 40.763599, 40.754135, 40.759987, 40.765224)),
      row.names = c(NA, -10L), class = c("tbl
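One way to approach this kind of join is a nearest-neighbour lookup on great-circle distance. The following is only a sketch in base R, not the asker's or the package's method: test2 is not visible in the excerpt, so the `zips` table below, its column names (`zip`, `lat`, `lon`) and its coordinates are hypothetical placeholders, and `haversine_km` is a helper written here for illustration.

```r
# First three drop-off points from the question's `test` data.
test <- data.frame(
  trip_count = 1:3,
  dropoff_longitude = c(-73.959862, -73.882202, -73.934113),
  dropoff_latitude  = c(40.773617, 40.744175, 40.715923)
)

# HYPOTHETICAL stand-in for test2: labels and coordinates are placeholders,
# not real zip code centroids.
zips <- data.frame(
  zip = c("zip_A", "zip_B", "zip_C"),
  lat = c(40.77, 40.74, 40.72),
  lon = c(-73.96, -73.88, -73.93)
)

# Great-circle distance in km (haversine formula, spherical Earth, r = 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# For each trip, find the row index of the closest zip entry.
nearest <- vapply(seq_len(nrow(test)), function(i) {
  which.min(haversine_km(test$dropoff_latitude[i], test$dropoff_longitude[i],
                         zips$lat, zips$lon))
}, integer(1))

# Attach the nearest zip label and the distance to it.
test$zip <- zips$zip[nearest]
test$dist_km <- haversine_km(test$dropoff_latitude, test$dropoff_longitude,
                             zips$lat[nearest], zips$lon[nearest])
test
```

This compares every trip with every zip row, which is fine for a few thousand rows but grows quickly. Note also that the nearest centroid is not necessarily the zip code whose boundary contains the point.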

How to fuzzy join 2 dataframes on 2 variables with differing “fuzzy logic”?

元气小坏坏 posted on 2021-02-11 13:31:44
Question:

    # example
    a <- data.frame(name=c("A","B","C"), KW=c(201902,201904,201905),price=c(1.99,3.02,5.00))
    b <- data.frame(KW=c(201903,201904,201904),price=c(1.98,3.00,5.00),name=c("a","b","c"))

I want to match a and b with fuzzy logic, using the variables KW and price. I want to allow a tolerance of +/- 1 for KW and a tolerance of +/- 0.02 for price. The desired outcome should look like this:

      name.x   KW.x price.x   KW.y price.y name.y
    1      A 201902    1.99 201903    1.98      a
    2      B 201904    3.02 201904    3.00      b
    3      C
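A possible approach, sketched here with `fuzzy_inner_join()` from fuzzyjoin, is to pass one matching function per column through `match_fun`. The helper names `kw_close` and `price_close` and the small `eps` allowance are assumptions added for illustration. The allowance matters because `3.02 - 3.00` is slightly larger than `0.02` in binary floating point, so a plain `<= 0.02` test would reject row B.

```r
library(fuzzyjoin)

a <- data.frame(name = c("A", "B", "C"),
                KW = c(201902, 201904, 201905),
                price = c(1.99, 3.02, 5.00))
b <- data.frame(KW = c(201903, 201904, 201904),
                price = c(1.98, 3.00, 5.00),
                name = c("a", "b", "c"))

# eps is an assumption: it absorbs floating-point rounding error in the
# price difference (abs(3.02 - 3.00) <= 0.02 evaluates to FALSE).
eps <- 1e-8

# One vectorised predicate per join column.
kw_close    <- function(x, y) abs(x - y) <= 1
price_close <- function(x, y) abs(x - y) <= 0.02 + eps

# A pair of rows is kept only if both predicates hold.
res <- fuzzy_inner_join(a, b,
                        by = c("KW", "price"),
                        match_fun = list(kw_close, price_close))
res
```

Because both data frames share the column names, the result carries `.x` and `.y` suffixes (`name.x`, `KW.x`, `price.x`, `KW.y`, `price.y`, `name.y`).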

How do I do one fuzzy and one exact match in a dataframe?

风格不统一 posted on 2021-02-10 14:28:45
Question: I want to be able to fuzzy match one column and exact match another column. Say df1 looks like this: (table not shown) And df2 looks like this: (table not shown) I want to fuzzy match the "Name" but exact match the "Year." So "Ashley" and "Ashlee" would be a match. This is what I have so far:

    res <- fuzzy_left_join(
      df, df2,
      by=c("Year","Name"),
      list(`==`, function(x,y) stringdist(tolower(x), tolower(y), method="lv") <= 3)
    )
    res %>% select(Year = Year.x, everything(), - Year.y)

It appears to be over-matching, though. Not sure
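One likely source of over-matching is the threshold itself: a Levenshtein distance of 3 is large relative to short names. The sketch below compares a loose and a strict threshold. The data frames are hypothetical, since the question's tables are not visible, and `name_close` is a helper introduced here for illustration.

```r
library(fuzzyjoin)
library(stringdist)

# HYPOTHETICAL data standing in for the question's df1 and df2.
df1 <- data.frame(Name = c("Ashley", "Ann"), Year = c(2019, 2019))
df2 <- data.frame(Name = c("Ashlee", "Amy", "Ashley"),
                  Year = c(2019, 2019, 2021),
                  score = c(10, 20, 30))

# Build a case-insensitive Levenshtein predicate for a given threshold.
name_close <- function(max_dist) {
  function(x, y) stringdist(tolower(x), tolower(y), method = "lv") <= max_dist
}

# Threshold 3: "Ann" also matches "Amy" (distance 2), which is probably unwanted.
loose <- fuzzy_left_join(df1, df2, by = c("Year", "Name"),
                         match_fun = list(`==`, name_close(3)))

# Threshold 1: only "Ashley" ~ "Ashlee" in the same year survives;
# "Ann" is kept by the left join with NA in the df2 columns.
strict <- fuzzy_left_join(df1, df2, by = c("Year", "Name"),
                          match_fun = list(`==`, name_close(1)))
strict
```

A fixed distance treats short and long names alike, so a relative measure (for example distance divided by name length) may suit real data better. That choice depends on the data and is not something the excerpt settles.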

R: Regex_Join/Fuzzy_Join - Join Inexact Strings in Different Word Orders

非 Y 不嫁゛ posted on 2021-02-08 04:03:45
Question:

    library(dplyr)
    library(fuzzyjoin)
    df1 <- tibble(a =c("Apple Pear Orange", "Sock Shoe Hat", "Cat Mouse Dog"))
    df2 <- tibble(b =c("Kiwi Lemon Apple", "Shirt Sock Glove", "Mouse Dog"), c = c("Fruit", "Clothes", "Animals"))
    # Appends 'Animals'
    df3 <- regex_left_join(df1,df2, c("a" = "b"))
    # Appends Nothing
    df3 <- stringdist_left_join(df1, df2, by = c("a" = "b"), max_dist = 3, method = "lcs")

I want to append column c of df2 to df1 using the strings 'Apple', 'Sock' and 'Mouse Dog'. I
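Since the shared words appear in different positions, one option is a custom `match_fun` that ignores word order entirely. This is a sketch under the assumption that "match" means "the two strings share at least one whole word"; `shares_word` is a hypothetical helper, not part of fuzzyjoin.

```r
library(dplyr)
library(fuzzyjoin)

df1 <- tibble(a = c("Apple Pear Orange", "Sock Shoe Hat", "Cat Mouse Dog"))
df2 <- tibble(b = c("Kiwi Lemon Apple", "Shirt Sock Glove", "Mouse Dog"),
              c = c("Fruit", "Clothes", "Animals"))

# TRUE when the two strings have at least one whole word in common
# (case-sensitive, words separated by spaces).
shares_word <- function(x, y) {
  mapply(function(s, t) {
    length(intersect(strsplit(s, " +")[[1]], strsplit(t, " +")[[1]])) > 0
  }, x, y, USE.NAMES = FALSE)
}

# Left join keeps every row of df1 and appends b and c where a word is shared.
df3 <- fuzzy_left_join(df1, df2, by = c("a" = "b"), match_fun = shares_word)
df3
```

With this rule a single common word is enough, so frequent words such as "and" or "the" would over-match on real text. Requiring two or more shared words, or filtering out stop words first, would tighten it.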

fuzzy outer join/merge in R

泄露秘密 posted on 2021-01-28 03:12:05
Question: I have 2 datasets and want to do a fuzzy join. Here are the two datasets.

    library(data.table)
    # data1
    dt1 <- fread("NAME State type
    ABERCOMBIE TOWNSHIP ND TS
    ABERDEEN TOWNSHIP NJ TS
    ABERDEEN TOWNSHIP SD TS
    ABBOTSFORD CITY WI CI
    ABERDEEN CITY WA CI
    ADA TOWNSHIP MI TS
    ADAMS IL TS", header = T)
    # data2
    dt2 <- fread("NAME State type
    ABERDEEN TWP N J NJ TS
    ABERDEEN WASH WA CI
    ABBOTSFORD WIS WI CI
    ADA TWP MICH MI TS
    ADA OHIO OH CI
    ADAMS MASS MA CI
    ADAMSVILLE ALA AL CI", header = T)

Two datasets have

How to fuzzy join based on multiple columns and conditions?

戏子无情 posted on 2021-01-27 05:54:21
Question: I'm trying to left join two data frames (df1, df2). The data frames have two columns in common: zone and slope. Zone is a factor column and slope is numeric.

    df1 = data.frame(slope = c(1:6), zone = c(rep("Low", 3), rep("High", 3)))
    df2 = data.frame(slope = c(2.4, 2.4,6.2), zone = c(rep("Low", 1), rep("High", 2)), other = c(rep("a", 1), rep("b", 1), rep("c", 1)))
    df1
    df2

I want to join the data frames such that they are first matched exactly on zone, and then on the closest match for slope. If
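The "exact on zone, then closest slope" rule can be sketched in base R without fuzzyjoin: join on zone to get all candidate pairs, then keep the candidate with the smallest slope gap for each df1 row. The helper columns `id`, `slope.df2` and `gap` are names introduced here. The excerpt cuts off before any tie-breaking rule is stated, so keeping the first candidate on ties is an assumption.

```r
df1 <- data.frame(slope = 1:6,
                  zone = c(rep("Low", 3), rep("High", 3)))
df2 <- data.frame(slope = c(2.4, 2.4, 6.2),
                  zone = c("Low", "High", "High"),
                  other = c("a", "b", "c"))

# Row id so each df1 row can be tracked through the join.
df1$id <- seq_len(nrow(df1))

# Exact match on zone; all.x = TRUE keeps df1 rows whose zone has no partner.
cand <- merge(df1, df2, by = "zone", all.x = TRUE, suffixes = c("", ".df2"))

# Absolute slope difference for every candidate pair.
cand$gap <- abs(cand$slope - cand$slope.df2)

# Sort so the smallest gap comes first within each df1 row, then keep it.
# Assumption: on ties, the first candidate in sort order wins.
cand <- cand[order(cand$id, cand$gap), ]
nearest <- cand[!duplicated(cand$id), ]
nearest
```

For this data the three Low rows pick up "a", slope 4 picks up "b" (gap 1.6 versus 2.2), and slopes 5 and 6 pick up "c".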

fuzzy and exact match of two databases

∥☆過路亽.° posted on 2020-12-13 03:40:13
Question: I have two databases. The first one has about 70k rows with 3 columns. The second one has 790k rows with 2 columns. Both databases have a common variable, grantee_name. I want to match each row of the first database to one or more rows of the second database based on this grantee_name. Note that merge will not work because the grantee_name values do not match perfectly: there are different spellings, etc. So I am using the fuzzyjoin package and trying the following: library("haven"); library(
