Geospatial Analytics in Python

Posted by £可爱£侵袭症+ on 2019-12-18 09:38:53

Question


I have been looking for a package to install and use for geospatial analytics.

The closest I have found is https://github.com/harsha2010/magellan. However, it only has a Scala interface, and there is no documentation on how to use it with Python.

Does anyone know of a package I can use?

What I am trying to do is analyse Uber's data, map it to the actual postcodes/suburbs, and run it through SGD to predict the number of trips to a particular suburb.

There is already a lot of information here - http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/#comment-606532 - and I am looking for ways to do the same in Python.


Answer 1:


In Python I'd take a look at GeoPandas. It provides a data structure called GeoDataFrame: it's a list of features, each one having a geometry and some optional attributes. You can join two GeoDataFrames together based on geometry intersection, and you can aggregate the number of rows (say, trips) within a single geometry (say, postcode).

  1. I'm not familiar with Uber's data, but I'd try to find a way to get it into a GeoPandas GeoDataFrame.
  2. Likewise postcodes can be downloaded from places like the U.S. Census, OpenStreetMap[1], etc, and coerced into a GeoDataFrame.
  3. Join #1 to #2 based on geometry intersection. You want a new GeoDataFrame with one row per Uber trip, but with the postcode attached to each. Another StackOverflow post discusses how to do this, and it's currently harder than it ought to be.
  4. Aggregate this by postcode and count the trips in each. The code will look like joined_dataframe.groupby('postcode').count().
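Steps 3 and 4 can be sketched roughly as follows. This is only an illustration, not the answerer's code: it assumes a GeoPandas release that provides geopandas.sjoin, and the postcode squares, trip points, and column names (postcode, trip_id) are made-up stand-ins for real Uber and postcode data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical postcode polygons: two adjacent unit squares.
# Real boundaries would be loaded from a file and carry a CRS.
postcodes = gpd.GeoDataFrame(
    {"postcode": ["2000", "2010"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Hypothetical trip drop-off points (made-up coordinates).
trips = gpd.GeoDataFrame(
    {"trip_id": [1, 2, 3]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.1), Point(1.5, 0.5)],
)

# Step 3: spatial join. The default join condition is geometry
# intersection, so each trip row gains the attributes of the
# postcode polygon it falls in.
joined = gpd.sjoin(trips, postcodes, how="inner")

# Step 4: count trips per postcode.
trip_counts = joined.groupby("postcode").size()
print(trip_counts)
```

Here .size() is used instead of .count() so the result is a single count per postcode rather than one count per column; either works for this purpose.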

My concern with the above process is that if you have hundreds of thousands of very complex trip geometries, it could take forever on one machine. The link you posted uses Spark, and you may end up wanting to parallelize this after all. You can write Python against a Spark cluster(!), but I'm not the person to help you with that component.

Finally, for the prediction component (e.g. SGD), check out scikit-learn: it's a pretty fully featured machine learning package, with a dead simple API.
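As a rough sketch of what the prediction step might look like with scikit-learn's SGDRegressor: the two features and the synthetic trip counts below are invented purely for illustration, and a real model would use features derived from the aggregated Uber data. Scaling the features first is an assumption on my part, though it generally helps SGD converge.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): two hypothetical numeric features per
# suburb/time slot, and a trip count that depends linearly on them.
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(200, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Standardize the features, then fit a linear model with
# stochastic gradient descent.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)

# Predict trip counts for the first five rows.
predictions = model.predict(X[:5])
print(predictions)
```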

[1]: There is a separate package called geopandas_osm that grabs OSM data and returns a GeoDataFrame: https://michelleful.github.io/code-blog/2015/04/27/osm-data/




Answer 2:


I realize this is an old question, but I'd like to build on Jeff G's answer.

If you arrive at this page looking for help putting together a suite of geospatial analytics tools in Python, I would highly recommend this tutorial.

https://geohackweek.github.io/vector

It really picks up steam in the 3rd section.

It shows how to integrate

  1. GeoPandas
  2. PostGIS
  3. Folium
  4. rasterstats

Add in scikit-learn, NumPy, and SciPy and you can really accomplish a lot. You can grab information from this ndarray tutorial as well.



Source: https://stackoverflow.com/questions/33427170/geospatial-analytics-in-python
