I've written some code that includes a nested loop where the inner loop is executed about 1.5 million times. I have a function in this loop that I'm trying to optimize.
The fastest way to speed this up is to avoid computing the function for every pair of points at all, assuming your smaller collection isn't so tiny that brute force is already cheap.
Some databases have geo-indexes you could use (MySQL, Oracle, MongoDB, ...), or you can implement something yourself.
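As one illustration of the database route, MongoDB lets you put a 2dsphere index on the larger collection and query for nearby documents directly. A minimal sketch with pymongo; the database, collection, and field names here are made up:

```python
from pymongo import MongoClient, GEOSPHERE

client = MongoClient()  # assumes a local mongod is running
big = client.geo_demo.big_points  # hypothetical db/collection names

# Index the larger collection once. GeoJSON coordinate order is [lon, lat].
big.create_index([("loc", GEOSPHERE)])

def docs_near(lon, lat, max_metres=5000):
    """Documents in the big collection within max_metres of (lon, lat)."""
    return big.find({"loc": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_metres,  # tune to your problem
    }}})
```

With the index in place, each query touches only documents near the point instead of scanning the whole collection.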
You could use python-geohash. For each doc in the smaller collection, you need to quickly find the set of documents in the larger collection that share a hash from geohash.neighbors at the longest hash length that has matches, and then compute distances only against those candidates. You'll need an appropriate data structure for the lookup (e.g. a dict keyed by hash prefix), or this will be slow; see the sketch below.
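A minimal sketch of that bucketing with python-geohash. It uses one fixed precision plus the cell's eight neighbors rather than the adaptive longest-prefix search, which keeps the idea visible; the PRECISION value is an assumption you'd tune to your typical match distance:

```python
import collections
import geohash  # python-geohash

PRECISION = 6  # cells of roughly a kilometre; an assumption, tune as needed

def build_index(points):
    """Bucket (lat, lon) points from the large collection by geohash cell."""
    index = collections.defaultdict(list)
    for lat, lon in points:
        index[geohash.encode(lat, lon, PRECISION)].append((lat, lon))
    return index

def candidates(index, lat, lon):
    """Points in the query point's own cell plus its 8 neighboring cells."""
    cell = geohash.encode(lat, lon, PRECISION)
    cands = []
    for h in [cell] + geohash.neighbors(cell):
        cands.extend(index.get(h, []))
    return cands
```

Building the index is O(n) once; each lookup then inspects only a handful of cells instead of all ~1.5 million pairs.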
For finding the distance between the candidate points: the error of the simple flat-earth approach grows as the distance between the points increases, and it also depends on the latitude. See http://www.movable-type.co.uk/scripts/gis-faq-5.1.html for example.
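If the simple approximation isn't accurate enough for your distances, the haversine formula discussed at that link stays well-behaved at any separation. A minimal version:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

Since you only need it for the small candidate sets the geohash lookup returns, the extra trigonometry per call won't matter much.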