Listing number of observations by location

Question

I need help here. I am trying to create a new column that lists, for each restaurant, the number of other restaurants within 200 meters, using latitude and longitude. I couldn't find anything on Stack Overflow, and I am no R ninja. Any help would be appreciated!

head()

 business_id                        restaurantType                                               full_address open       city
1 --5jkZ3-nUPZxUvtcbr8Uw                Greek             1336 N Scottsdale Rd\nScottsdale, AZ 85257    1 Scottsdale
2 --BlvDO_RG2yElKu9XA1_g           Sushi Bars 14870 N Northsight Blvd\nSte 103\nScottsdale, AZ 85260    1 Scottsdale
3 -_Ke8q969OAwEE_-U0qUjw Beer, Wine & Spirits                   18555 N 59th Ave\nGlendale, AZ 85308    0   Glendale
4 -_npP9XdyzILAjtFfX8UAQ           Vietnamese          6025 N 27th Avenue\nSte 24\nPhoenix, AZ 85073    1    Phoenix
5 -2xCV0XGD9NxfWaVwA1-DQ                Pizza                      9008 N 99th Ave\nPeoria, AZ 85345    1     Peoria
6 -3WVw1TNQbPBzaKCaQQ1AQ              Chinese                     302 E Flower St\nPhoenix, AZ 85012    1    Phoenix
  review_count                       name longitude state stars latitude     type categories1          categories2
1           11 George's Gyros Greek Grill -111.9269    AZ   4.5 33.46337 business       Greek                 <NA>
2           37               Asian Island -111.8983    AZ   4.0 33.62146 business  Sushi Bars             Hawaiian
3            6    Jug 'n Barrel Wine Shop -112.1863    AZ   4.5 33.65387 business        <NA> Beer, Wine & Spirits
4           15          Thao's Sandwiches -112.0739    AZ   3.0 33.44990 business  Vietnamese           Sandwiches
5            4          Nino's Pizzeria 2 -112.2766    AZ   4.0 33.56626 business       Pizza                 <NA>
6          145                China Chili -112.0692    AZ   3.5 33.48585 business     Chinese                 <NA>

   avgStar duration delta
1 3.694030      381     0
2 3.661017      690     0   
3 3.555556      604     1
4 3.577778     1916     0
5 3.482036      226     0
6 3.535928     2190     0

str()

'data.frame':   2833 obs. of  28 variables:
 $ business_id   : Factor w/ 2833 levels "--5jkZ3-nUPZxUvtcbr8Uw",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ restaurantType: Factor w/ 118 levels "Afghan","African",..: 60 106 15 117 89 31 17 7 84 31 ...
 $ full_address  : Factor w/ 2586 levels "1 E Jackson St\nPhoenix, AZ 85004",..: 274 371 642 1825 2368 1102 1000 1143 2169 1669 ...
 $ open          : int  1 1 0 1 1 1 1 1 1 1 ...
 $ city          : Factor w/ 44 levels "Ahwatukee","Anthem",..: 34 34 19 31 30 31 34 4 18 31 ...
 $ review_count  : int  11 37 6 15 4 145 255 35 7 7 ...
 $ name          : Factor w/ 2652 levels "#1 Brother's Pizza",..: 885 127 1167 2318 1601 453 591 697 1492 1319 ...
 $ longitude     : num  -112 -112 -112 -112 -112 ...
 $ state         : Factor w/ 2 levels "AZ","SC": 1 1 1 1 1 1 1 1 1 1 ...
 $ stars         : num  4.5 4 4.5 3 4 3.5 4.5 4 2.5 4.5 ...
 $ latitude      : num  33.5 33.6 33.7 33.4 33.6 ...
 $ type          : Factor w/ 1 level "business": 1 1 1 1 1 1 1 1 1 1 ...
 $ categories1   : Factor w/ 103 levels "Afghan","African",..: 50 93 NA 102 78 26 14 7 73 26 ...
 $ Freq          : int  66 58 8 44 166 166 98 35 45 166 ...
 $ avgRev        : num  31.3 68.6 34.3 63.2 30.8 ...
 $ avgStar       : num  3.69 3.66 3.56 3.58 3.48 ...
 $ duration      : int  381 690 604 1916 226 2190 1968 1338 1606 56 ...

Answer 1:

This is untested base R code, but it should convey the idea.

For each restaurant, I test how many rows satisfy the circle inequality x^2 + y^2 <= R^2 centred on that restaurant, subtract one for the restaurant itself, and store the result in a new column. Note that the radius in the code below is 200, but your x and y are latitude and longitude in degrees, so you will have to convert the 200 metres into degrees first: multiply by 360 degrees / circumference of the earth (or by 2*pi / circumference of the earth if you work in radians).

df <- data.frame(
  latitude = runif(n=10,min=0,max=1000),
  longitude = runif(n=10,min=0,max=1000)
  )

for (i in seq_len(nrow(df)))
{
  # circle's centre
  xcentre <- df[i,'latitude']
  ycentre <- df[i,'longitude']

  # checking how many restaurants lie within 200 m of the above centre, noofcloserest column will contain this value
  df[i,'noofcloserest'] <- sum(
    (df[,'latitude'] - xcentre)^2 + 
      (df[,'longitude'] - ycentre)^2 
    <= 200^2
  ) - 1

  # logging part for deeper analysis
  cat(i,': ')
  # this prints the true/false vector for which row is within the radius, and which row isn't
  cat((df[,'latitude'] - xcentre)^2 + 
    (df[,'longitude'] - ycentre)^2 
  <= 200^2)

  cat('\n')

}

Output -

1 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
2 : FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
4 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
5 : FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
6 : TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
7 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
8 : FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
9 : FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
10 : FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
> df
    latitude longitude noofcloserest
1  189.38878 270.25004             2
2  402.36853 879.26657             0
3  747.46417 581.66627             1
4  291.64303 157.75450             2
5  830.10699 736.19586             2
6  299.06803 157.76147             2
7  725.68360  58.53049             1
8  893.31904 772.46217             1
9   45.47875 701.82201             0
10 645.44772 226.95042             1

In this output, three rows lie within 200 m of the coordinates in row 1: row 1 itself and rows 4 and 6. Subtracting 1 for the restaurant itself gives noofcloserest = 2.
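The scaling step described in this answer can be sketched as follows. This is only an illustration under stated assumptions: the coordinates are made-up points near Phoenix, the names `rest`, `metres_per_degree` and `count_within` are hypothetical, and the conversion uses an equatorial circumference of about 40,075,017 m together with a flat-earth approximation. That approximation is reasonable over a few hundred metres but not over long distances. Degrees of longitude are additionally shrunk by the cosine of the latitude, because meridians converge towards the poles.

```r
# Made-up coordinates: points 1 and 2 are roughly 100 m apart,
# point 3 is roughly 2 km north of them.
rest <- data.frame(
  latitude  = c(33.4500, 33.4509, 33.4700),
  longitude = c(-112.0700, -112.0700, -112.0700)
)

# Assumption: equatorial circumference of the earth in metres, divided by 360 degrees.
metres_per_degree <- 40075017 / 360
radius_m <- 200

# Hypothetical helper: for each point, count the other points within radius_m.
count_within <- function(lat, lon, radius_m) {
  vapply(seq_along(lat), function(i) {
    # North-south and east-west offsets in metres (flat-earth approximation).
    dy <- (lat - lat[i]) * metres_per_degree
    dx <- (lon - lon[i]) * metres_per_degree * cos(lat[i] * pi / 180)
    # The point itself always satisfies the inequality, so subtract 1.
    sum(dx^2 + dy^2 <= radius_m^2) - 1L
  }, integer(1))
}

rest$noofcloserest <- count_within(rest$latitude, rest$longitude, radius_m)
rest
```

Working in metres rather than converting the radius to degrees keeps the threshold readable, and it lets the latitude and longitude offsets use different scale factors.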

Answer 2:

One approach would be to compute the distance matrix, and then to figure out the ones that are sufficiently close (here I demonstrate being within 20 kilometers so the numbers aren't all 0):

# Load the fields library
library(fields)

# Create a simple data frame to demonstrate (each row is a restaurant). The rdist.earth function
# we're about to call takes as input something where the first column is longitude and the second
# column is latitude.
df = data.frame(longitude=c(-111.9269, -111.8983, -112.1863, -112.0739, -112.2766, -112.0692),
                latitude=c(33.46337, 33.62146, 33.65387, 33.44990, 33.56626, 33.48585))

# Let's compute the distance between each pair of restaurants, in kilometers (miles=FALSE).
distances = rdist.earth(df, miles=FALSE)
distances

#          [,1]     [,2]     [,3]         [,4]     [,5]         [,6]
# [1,]  0.00000 17.79813 32.07533 1.373515e+01 34.41932 1.344867e+01
# [2,] 17.79813  0.00000 26.93558 2.510519e+01 35.61413 2.189270e+01
# [3,] 32.07533 26.93558  0.00000 2.498676e+01 12.85352 2.162964e+01
# [4,] 13.73515 25.10519 24.98676 1.344145e-04 22.84310 4.025824e+00
# [5,] 34.41932 35.61413 12.85352 2.284310e+01  0.00000 2.122719e+01
# [6,] 13.44867 21.89270 21.62964 4.025824e+00 21.22719 9.504539e-05

# Compute the number of restaurants within 20 kilometers of the restaurant in each row.
df$num.close = colSums(distances <= 20) - 1
df$num.close
# [1] 3 1 1 2 1 2
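For the 200 m threshold from the question, the same comparison would use 0.2 km instead of 20. If installing a package is not an option, a base R sketch of the same distance-matrix idea might look like the following. It is not the `fields` implementation: `haversine_km` is a hypothetical helper using the haversine formula with an assumed mean earth radius of 6371 km, so its distances differ slightly from the `rdist.earth` output above.

```r
# Hypothetical helper: great-circle distance in km via the haversine formula.
# Assumption: mean earth radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# The six sample restaurants from the question.
df <- data.frame(
  longitude = c(-111.9269, -111.8983, -112.1863, -112.0739, -112.2766, -112.0692),
  latitude  = c(33.46337, 33.62146, 33.65387, 33.44990, 33.56626, 33.48585)
)

# Full n x n distance matrix; outer() passes vectors of row indices i and j.
idx <- seq_len(nrow(df))
d <- outer(idx, idx, function(i, j) {
  haversine_km(df$longitude[i], df$latitude[i],
               df$longitude[j], df$latitude[j])
})

# Number of other restaurants within 200 m (0.2 km) of each restaurant.
df$num.close <- rowSums(d <= 0.2) - 1
df$num.close
```

With only these six rows every count is 0, since the closest pair (rows 4 and 6) is about 4 km apart. On the full data set the matrix has 2833 x 2833 entries, which fits in memory but grows quadratically with the number of rows.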

Source: https://stackoverflow.com/questions/20695849/listing-number-of-obervations-by-location

Tags

latitude-longitude