Question
Need help here. I am trying to create a new column that lists the number of restaurants within 200 meters of each restaurant, using latitude and longitude. I couldn't find anything on Stack Overflow, and I am no R ninja. Any help would be appreciated!
head()
business_id restaurantType full_address open city
1 --5jkZ3-nUPZxUvtcbr8Uw Greek 1336 N Scottsdale Rd\nScottsdale, AZ 85257 1 Scottsdale
2 --BlvDO_RG2yElKu9XA1_g Sushi Bars 14870 N Northsight Blvd\nSte 103\nScottsdale, AZ 85260 1 Scottsdale
3 -_Ke8q969OAwEE_-U0qUjw Beer, Wine & Spirits 18555 N 59th Ave\nGlendale, AZ 85308 0 Glendale
4 -_npP9XdyzILAjtFfX8UAQ Vietnamese 6025 N 27th Avenue\nSte 24\nPhoenix, AZ 85073 1 Phoenix
5 -2xCV0XGD9NxfWaVwA1-DQ Pizza 9008 N 99th Ave\nPeoria, AZ 85345 1 Peoria
6 -3WVw1TNQbPBzaKCaQQ1AQ Chinese 302 E Flower St\nPhoenix, AZ 85012 1 Phoenix
review_count name longitude state stars latitude type categories1 categories2
1 11 George's Gyros Greek Grill -111.9269 AZ 4.5 33.46337 business Greek <NA>
2 37 Asian Island -111.8983 AZ 4.0 33.62146 business Sushi Bars Hawaiian
3 6 Jug 'n Barrel Wine Shop -112.1863 AZ 4.5 33.65387 business <NA> Beer, Wine & Spirits
4 15 Thao's Sandwiches -112.0739 AZ 3.0 33.44990 business Vietnamese Sandwiches
5 4 Nino's Pizzeria 2 -112.2766 AZ 4.0 33.56626 business Pizza <NA>
6 145 China Chili -112.0692 AZ 3.5 33.48585 business Chinese <NA>
avgStar duration delta
1 3.694030 381 0
2 3.661017 690 0
3 3.555556 604 1
4 3.577778 1916 0
5 3.482036 226 0
6 3.535928 2190 0
str()
'data.frame': 2833 obs. of 28 variables:
$ business_id : Factor w/ 2833 levels "--5jkZ3-nUPZxUvtcbr8Uw",..: 1 2 3 4 5 6 7 8 9 10 ...
$ restaurantType: Factor w/ 118 levels "Afghan","African",..: 60 106 15 117 89 31 17 7 84 31 ...
$ full_address : Factor w/ 2586 levels "1 E Jackson St\nPhoenix, AZ 85004",..: 274 371 642 1825 2368 1102 1000 1143 2169 1669 ...
$ open : int 1 1 0 1 1 1 1 1 1 1 ...
$ city : Factor w/ 44 levels "Ahwatukee","Anthem",..: 34 34 19 31 30 31 34 4 18 31 ...
$ review_count : int 11 37 6 15 4 145 255 35 7 7 ...
$ name : Factor w/ 2652 levels "#1 Brother's Pizza",..: 885 127 1167 2318 1601 453 591 697 1492 1319 ...
$ longitude : num -112 -112 -112 -112 -112 ...
$ state : Factor w/ 2 levels "AZ","SC": 1 1 1 1 1 1 1 1 1 1 ...
$ stars : num 4.5 4 4.5 3 4 3.5 4.5 4 2.5 4.5 ...
$ latitude : num 33.5 33.6 33.7 33.4 33.6 ...
$ type : Factor w/ 1 level "business": 1 1 1 1 1 1 1 1 1 1 ...
$ categories1 : Factor w/ 103 levels "Afghan","African",..: 50 93 NA 102 78 26 14 7 73 26 ...
$ Freq : int 66 58 8 44 166 166 98 35 45 166 ...
$ avgRev : num 31.3 68.6 34.3 63.2 30.8 ...
$ avgStar : num 3.69 3.66 3.56 3.58 3.48 ...
$ duration : int 381 690 604 1916 226 2190 1968 1338 1606 56 ...
Answer 1:
This is base R and untested, but you should get the idea.
For each restaurant, I count how many rows fall inside the circle x^2 + y^2 <= R^2 centred on it, excluding the restaurant itself, and store that count in the new column. Note that the radius in my example is 200, but yours will be different: your x, y are latitude and longitude in degrees, so you will have to convert the 200 metres to an angle by multiplying by 2*pi radians / circumference of the Earth, or by 360 degrees / circumference of the Earth. Away from the equator a degree of longitude is shorter than a degree of latitude, so this conversion is only approximate.
df <- data.frame(
latitude = runif(n=10,min=0,max=1000),
longitude = runif(n=10,min=0,max=1000)
)
for (i in seq_len(nrow(df)))
{
# circle's centre
xcentre <- df[i,'latitude']
ycentre <- df[i,'longitude']
# checking how many restaurants lie within 200 m of the above centre, noofcloserest column will contain this value
df[i,'noofcloserest'] <- sum(
(df[,'latitude'] - xcentre)^2 +
(df[,'longitude'] - ycentre)^2
<= 200^2
) - 1
# logging part for deeper analysis
cat(i,': ')
# this prints the true/false vector for which row is within the radius, and which row isn't
cat((df[,'latitude'] - xcentre)^2 +
(df[,'longitude'] - ycentre)^2
<= 200^2)
cat('\n')
}
Output:
1 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
2 : FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
4 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
5 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
6 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
7 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
8 : FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
9 : FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
10 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
> df
latitude longitude noofcloserest
1 189.38878 270.25004 2
2 402.36853 879.26657 0
3 747.46417 581.66627 1
4 291.64303 157.75450 2
5 830.10699 736.19586 2
6 299.06803 157.76147 2
7 725.68360 58.53049 1
8 893.31904 772.46217 1
9 45.47875 701.82201 0
10 645.44772 226.95042 1
What that output means is that three rows lie within the radius of 200 around the coordinates in row 1: row 1 itself, and rows 4 and 6. Subtracting 1 for the row itself gives the 2 shown in noofcloserest.
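The degree scaling described in this answer can be sketched as follows. This is a rough illustration, not part of the original answer: it assumes a spherical Earth with a mean radius of 6,371 km, and the names earth_radius_m, radius_deg_lat, radius_deg_lon and ref_lat are made up for the example. The reference latitude of 33.5 is simply close to the latitudes in the question's sample rows.

```r
# Assumption: spherical Earth with a mean radius of 6371 km.
earth_radius_m  <- 6371000
circumference_m <- 2 * pi * earth_radius_m

radius_m <- 200

# 200 m expressed in degrees of latitude (roughly 0.0018 degrees).
radius_deg_lat <- radius_m * 360 / circumference_m

# A degree of longitude is shorter by a factor of cos(latitude),
# so the same 200 m spans more degrees of longitude.
ref_lat <- 33.5   # hypothetical reference latitude, near the sample data
radius_deg_lon <- radius_deg_lat / cos(ref_lat * pi / 180)

radius_deg_lat
radius_deg_lon
```

Because the two axes scale differently, comparing raw degree differences against a single radius would distort the circle into an ellipse. Dividing the latitude differences by radius_deg_lat and the longitude differences by radius_deg_lon, then testing whether the sum of squares is <= 1, is one way to keep the flat approximation reasonable over short distances.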
Answer 2:
One approach is to compute the distance matrix and then count, for each restaurant, the others that are sufficiently close (here I use 20 kilometers rather than 200 meters so that the counts aren't all 0):
# Load the fields library
library(fields)
# Create a simple data frame to demonstrate (each row is a restaurant). The rdist.earth function
# we're about to call takes as input something where the first column is longitude and the second
# column is latitude.
df = data.frame(longitude=c(-111.9269, -111.8983, -112.1863, -112.0739, -112.2766, -112.0692),
latitude=c(33.46337, 33.62146, 33.65387, 33.44990, 33.56626, 33.48585))
# Let's compute the distance between each restaurant.
distances = rdist.earth(df, miles=FALSE)
distances
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.00000 17.79813 32.07533 1.373515e+01 34.41932 1.344867e+01
# [2,] 17.79813 0.00000 26.93558 2.510519e+01 35.61413 2.189270e+01
# [3,] 32.07533 26.93558 0.00000 2.498676e+01 12.85352 2.162964e+01
# [4,] 13.73515 25.10519 24.98676 1.344145e-04 22.84310 4.025824e+00
# [5,] 34.41932 35.61413 12.85352 2.284310e+01 0.00000 2.122719e+01
# [6,] 13.44867 21.89270 21.62964 4.025824e+00 21.22719 9.504539e-05
# Compute the number of restaurants within 20 kilometers of the restaurant in each row.
df$num.close = colSums(distances <= 20) - 1
df$num.close
# [1] 3 1 1 2 1 2
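For the 200 m radius asked about in the question, the same counting idea can be sketched without extra packages. This is a minimal sketch, not the answer's method: it swaps rdist.earth for the haversine formula, assumes a spherical Earth with a mean radius of 6,371 km, and uses hypothetical helper names (haversine_m, count_within). The three coordinates are made up so that two points lie close together.

```r
# Haversine great-circle distance in metres from one point to a vector of points.
# Assumption: spherical Earth with a mean radius of 6371 km.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each row, count the other rows within radius_m metres.
count_within <- function(lat, lon, radius_m = 200) {
  vapply(seq_along(lat), function(i) {
    d <- haversine_m(lat[i], lon[i], lat, lon)
    sum(d <= radius_m) - 1L   # subtract the point itself
  }, integer(1))
}

# Made-up coordinates: points 1 and 2 are about 111 m apart, point 3 is far away.
pts <- data.frame(latitude  = c(33.4500, 33.4510, 33.6000),
                  longitude = c(-112.0700, -112.0700, -112.0700))
pts$num_close <- count_within(pts$latitude, pts$longitude)
pts$num_close
# [1] 1 1 0
```

On the question's data this would presumably be called as count_within(df$latitude, df$longitude). Looping row by row avoids holding the full n x n distance matrix in memory, although for a few thousand restaurants either approach should be fine.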
Source: https://stackoverflow.com/questions/20695849/listing-number-of-obervations-by-location