Reverse geocoding with a big array: what is the fastest way? - javascript and performance

Submitted by 我的未来我决定 on 2019-12-01 00:41:06

🚀 Rock your calculations

Step #1 Calculate distance of all locations.

Step #2 Sort the result by the distance value

Step #3 Find locations; repeat with a wider radius until at least one record is found.

Check the DEMO: calculating a list with 7320 records took ~17.22119140625 ms.

const cities = [
  [`Abano Terme (PD)`, 45.3594, 11.7894],
  [`Abbadia Cerreto (LO)`, 45.3122, 9.5928],
  [`Abbadia Lariana (LC)`, 45.8992, 9.3336],
  [`Abbadia San Salvatore (SI)`, 42.8800, 11.6775],
  [`Abbasanta (OR)`, 40.1250, 8.8200]
]

function distance(lat, long) {
  const R = 6372.795477598 // mean Earth radius in km
  const PI = Math.PI / 180 // degrees-to-radians factor
  return cities
    .map(city => {
      const laA = city[1] * PI
      const laB = lat * PI
      const loA = city[2] * PI
      const loB = long * PI
      // great-circle distance via the spherical law of cosines;
      // `|| 0` guards against NaN when acos receives a value just above 1
      const dist = R * Math.acos(
        Math.sin(laA) * Math.sin(laB) +
        Math.cos(laA) * Math.cos(laB) * Math.cos(loA - loB)
      ) || 0
      return { dist, city }
    })
    .sort((a, b) => a.dist - b.dist)
}


function nearest(dist, lat, long) {
  const locations = distance(lat, long) // already sorted by distance
  function find(delta) {
    const result = []
    for (let location of locations) {
      if (location.dist > delta) break // sorted list: nothing closer will follow
      result.push(location.city)
    }
    // widen the search radius by 50 km until at least one city is found
    return result.length > 0
      ? result
      : find(delta + 50)
  }
  return find(dist)
}

const result = nearest(50, 41.89595563, 12.48325842)
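
To reproduce a measurement like the ~17 ms figure quoted above, a minimal sketch using the standard console.time / console.timeEnd API (the label string is arbitrary, and the numbers will of course vary per machine and dataset):

console.time('nearest')
nearest(50, 41.89595563, 12.48325842)
console.timeEnd('nearest') // e.g. "nearest: 17.2ms"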

Since the city data does not change dynamically and you need to calculate the distance / nearest neighbour frequently, using a geospatial index (KD-tree, R-tree, etc.) would make sense.

Here's an example implementation using geokdbush, which is based on a static spatial index (a KD-tree) and takes Earth curvature and date-line wrapping into account.

const kdbush = require('kdbush');
const geokdbush = require('geokdbush');

// I've stored the data points as objects to make the values unambiguous
const cities = [
  { name: "Abano Terme (PD)", latitude: 45.3594, longitude: 11.7894 },
  { name: "Abbadia Cerreto (LO)", latitude: 45.3122, longitude: 9.5928 },
  { name: "Abbadia Lariana (LC)", latitude: 45.8992, longitude: 9.3336 },
  { name: "Abbadia San Salvatore (SI)", latitude: 42.8800, longitude: 11.6775 },
  { name: "Abbasanta (OR)", latitude: 40.1250, longitude: 8.8200 }
];

// Create the index over city data ONCE
const index = kdbush(cities, ({ longitude }) => longitude, ({ latitude }) => latitude);

// Get the nearest neighbour in a radius of 50km for a point with latitude 43.7051 and longitude 11.4363
const nearest = geokdbush.around(index, 11.4363, 43.7051, 1, 50);

Once again, bear in mind that kdbush is a static index and cannot be changed (you cannot add or remove cities from it). If you need to change the city data after initialisation, depending on how often you do it, using an index might prove too costly.
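If the list only changes occasionally, one option (a sketch, not from the original answer; rebuildIndex is a made-up helper name) is simply to rebuild the whole index after each batch of changes, reusing the same constructor call as above:

function rebuildIndex(updatedCities) {
  // kdbush builds a fresh static index from scratch; the old one is discarded
  return kdbush(updatedCities, ({ longitude }) => longitude, ({ latitude }) => latitude);
}

// e.g. after adding a city
cities.push({ name: "Roma (RM)", latitude: 41.8931, longitude: 12.4828 });
const newIndex = rebuildIndex(cities);

Whether that is acceptable depends entirely on how often the data changes and how large it is.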

You may want to sort by the second array element (the computed distance):

citta_vicine.sort(function(a, b) {
  return a[1] - b[1];
});

To get the nearest city...

If you're only interested in the nearest city, you don't need to sort the whole list. That's your first performance gain in one line of code!

// Unneeded sort:
const closest = cityDistancePairs.sort((a, b) => a[1] - b[1])[0];

// Only iterates once instead:
const closestAlt = cityDistancePairs.reduce(
  (closest, current) => current[1] < closest[1] ? current : closest
);

To further optimize you'll need to benchmark which parts of the code take the longest to run. Some ideas:

  • Do a quick check on the lat/lon difference before calculating the precise value. If coordinates are further than a certain delta apart, you already know they are out of range (see the sketch after this list).
  • Cache the calculated distances by implementing a memoization pattern, so that on a second pass with a different limit (50 -> 100) you don't recalculate the distances.
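
For the first idea, a rough bounding-box check before the exact trigonometry might look like the sketch below. It reuses the cities array from the first answer; the helper name nearestWithPrefilter and the KM_PER_DEG_LAT constant are made up for illustration, and the memoization idea could be layered on top by keeping the returned { dist, city } pairs around between calls.

const KM_PER_DEG_LAT = 111.32 // roughly one degree of latitude in km

function nearestWithPrefilter(maxKm, lat, long) {
  // Size of the bounding box in degrees. Longitude degrees shrink with
  // cos(latitude), so dividing by it widens the box enough to stay safe.
  const dLat = maxKm / KM_PER_DEG_LAT
  const dLon = maxKm / (KM_PER_DEG_LAT * Math.cos(lat * Math.PI / 180))

  // Cheap rectangular pre-filter: discard cities that are obviously too far away
  const candidates = cities.filter(([, cLat, cLon]) =>
    Math.abs(cLat - lat) <= dLat && Math.abs(cLon - long) <= dLon
  )

  // Precise great-circle distance only for the remaining candidates
  const PI = Math.PI / 180
  const R = 6372.795477598
  return candidates
    .map(city => {
      const laA = city[1] * PI, laB = lat * PI
      const loA = city[2] * PI, loB = long * PI
      const dist = R * Math.acos(
        Math.sin(laA) * Math.sin(laB) +
        Math.cos(laA) * Math.cos(laB) * Math.cos(loA - loB)
      ) || 0
      return { dist, city }
    })
    .filter(({ dist }) => dist <= maxKm)
    .sort((a, b) => a.dist - b.dist)
}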

However, I can't imagine that a loop of 8000 distance calculations is the real performance drain... I'm guessing that parsing 300 kB of JavaScript is the real bottleneck. How are you initializing the city array?

Make sure you strip the data set down to only the properties and values you actually use. If you know you're only going to use the name and lat/lon, you can preprocess the data to include only those. That'll probably make it much smaller than 300 kB and easier to work with.
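
For example, a one-off preprocessing step (sketch only; rawCities and its field names are assumed, not taken from the question) could reduce the dataset to exactly the triple used by the first answer:

// keep only name, latitude and longitude, in the compact array form used above
const slimCities = rawCities.map(c => [c.name, c.lat, c.lon]);

// write it out once (Node.js) and ship the smaller file to the client
require('fs').writeFileSync('cities.min.json', JSON.stringify(slimCities));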
