Speeding up a “closest” string match algorithm
I am currently processing a very large database of locations and trying to match them with their real-world coordinates. To do this, I downloaded the GeoNames dataset, which contains a large number of entries, each giving possible names and latitude/longitude coordinates. To speed up the process, I reduced the huge CSV file from 1.6 GB to 0.45 GB by removing entries that are irrelevant to my dataset. However, it still contains 4 million entries. Now I have many entries such as: