How to cluster latitude and longitude data in Python (or remove unwanted data)?

Posted by 独自空忆成欢 on 2019-12-24 07:16:08

Question


I have latitude and longitude data of size (34000 * 2) in a pandas DataFrame:

df =

Index       Latitude            Longitude
0           66.36031097267725   23.714807357485936
1           66.36030099322495   23.71479548193769
2
.
.
.
.
34000       66.27918383581169   23.568631229948359

Important note: the route above was covered twice. If I had covered it only once, the latitude and longitude data would have size (34000/2, 2), for example.

Problem

I only want the latitude and longitude data for one selected area, so I filtered the DataFrame using the starting and ending latitude and longitude points. After doing that, another part of the area is selected as well (see the picture below, taken after filtering).

Requirement

How can I remove the additional area? I am sure there is an easy approach to this problem. Note: the filtered latitude and longitude data also covers the route twice.

Filtered

def apply_geofence_on_data(interpolated_data, min_latitude=66.27832887852133, max_latitude=66.37098470528755, min_longitude=23.568626549485927,
                               max_longitude=23.71481685393929):

    interpolated_data = interpolated_data[interpolated_data['Latitude'] > min_latitude]
    interpolated_data = interpolated_data[interpolated_data['Latitude'] < max_latitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] < max_longitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] > min_longitude]

    return interpolated_data
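
The four successive filters above amount to a single rectangular mask, which is why every point inside the bounding box survives, including points from a different branch of the route. A minimal sketch of that equivalence follows; the function name `geofence_mask` and the three sample rows are made up for illustration and are not part of the original data.

```python
import pandas as pd

def geofence_mask(df, min_lat, max_lat, min_lon, max_lon):
    """Return a boolean mask of rows strictly inside the lat/lon rectangle.

    Hypothetical helper; column names follow the question's DataFrame.
    """
    lat, lon = df["Latitude"], df["Longitude"]
    return (lat > min_lat) & (lat < max_lat) & (lon > min_lon) & (lon < max_lon)

# Made-up sample rows: two inside the box, one south of it.
sample = pd.DataFrame({
    "Latitude": [66.36, 66.30, 66.20],
    "Longitude": [23.70, 23.69, 23.60],
})

inside = sample[geofence_mask(
    sample,
    min_lat=66.27832887852133, max_lat=66.37098470528755,
    min_lon=23.568626549485927, max_lon=23.71481685393929,
)]
print(inside)
```

A rectangle cannot tell two route branches apart when both fall inside it, so some extra geometric condition is needed to separate them.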

Answer 1:


Here is a solution to test. The idea is to keep only the points that lie above a line; you choose the value of p to select the right line.

from random import uniform
import matplotlib.pyplot as plt
import pandas as pd

def newpoint(lon_min=-180.0, lon_max=180.0, lat_min=-90.0, lat_max=90.0):  # returns (lon, lat)
    return uniform(lon_min, lon_max), uniform(lat_min, lat_max)

lon_min = 23.568626549485927; lon_max = 23.71481685393929
lat_min = 66.27832887852133; lat_max = 66.37098470528755
p = 0.25  # sample value; for your case I think a value nearer to 0.75 would fit

# generate 10 random sample points inside the bounding box
n = 10
points = [newpoint(lon_min, lon_max, lat_min, lat_max) for _ in range(n)]
Lon = [lon for lon, lat in points]
Lat = [lat for lon, lat in points]
df = pd.DataFrame({'Lat': Lat, 'Lon': Lon})
print(df)

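# equation of the line through A = (xa, lat_min) and B = (xb, m*xb + z) -> y = m*x + z
# the slope m is that of the bounding-box diagonal; p shifts the line along the longitude axis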
m = (lat_max - lat_min)/(lon_max - lon_min)
z = lat_min - m * (lon_min + p * (lon_max - lon_min))
xa = lon_min + p * (lon_max - lon_min)
xb = lon_max

#you could uncomment to display result 
#df['calcul'] = df['Lon'] * m + z

#select only points above the line
df = df[df['Lon'] * m + z < df['Lat']]
print(df)

#plot to show result
plt.plot([xa, xb] , [m * xa + z, m * xb + z])
plt.plot(df.Lon, df.Lat, 'ro')
plt.show()

Initial output:

         Lat        Lon
0  66.343486  23.674008
1  66.281614  23.678554
2  66.359215  23.637975
3  66.303976  23.659128
4  66.302640  23.589577
5  66.313877  23.634785
6  66.309733  23.683281
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376


Final result: the points at index 1, 3 and 6 have been removed (they are below the line).

         Lat        Lon
0  66.343486  23.674008
2  66.359215  23.637975
4  66.302640  23.589577
5  66.313877  23.634785
7  66.365582  23.667262
8  66.344611  23.688108
9  66.352028  23.673376
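
The same line test can be wrapped in a small function so that different values of p can be tried on real data. This is only a sketch of the answer's idea: the name `keep_above_line` and its parameters are hypothetical, and the four rows are the first four points of the sample output above.

```python
import pandas as pd

def keep_above_line(df, p, lon_min, lon_max, lat_min, lat_max,
                    lat_col="Lat", lon_col="Lon"):
    """Keep rows whose latitude lies above the line y = m*x + z.

    m is the slope of the bounding-box diagonal; the line crosses
    lat_min at longitude lon_min + p * (lon_max - lon_min).
    Hypothetical helper, not part of the original answer.
    """
    m = (lat_max - lat_min) / (lon_max - lon_min)
    x_a = lon_min + p * (lon_max - lon_min)
    z = lat_min - m * x_a
    return df[df[lat_col] > m * df[lon_col] + z]

# First four points from the sample output above.
points = pd.DataFrame({
    "Lat": [66.343486, 66.281614, 66.359215, 66.303976],
    "Lon": [23.674008, 23.678554, 23.637975, 23.659128],
})

kept = keep_above_line(
    points, p=0.25,
    lon_min=23.568626549485927, lon_max=23.71481685393929,
    lat_min=66.27832887852133, lat_max=66.37098470528755,
)
print(kept)
```

For the question's DataFrame, the column names would be passed as lat_col="Latitude" and lon_col="Longitude", and p would need tuning by eye against a plot of the route.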



Source: https://stackoverflow.com/questions/55518674/how-to-cluster-latitude-and-longitude-data-in-python-or-remove-unwanted-data
