How to Load Spatial Data using the Hadoop GIS framework

Submitted by 一笑奈何 on 2020-01-16 10:13:22

Question


I am trying to use the Hadoop GIS framework to add spatial support to Hive. One of the things I want to do is create a spatial table from external data (from PostGIS). Unfortunately, the serializer provided by Esri maps to an Esri JSON format rather than to standards such as WKT or GeoJSON. What I ended up doing was a bit of a workaround.

The first thing was to export my PostGIS data as a tab-separated file, transforming the geometry field into GeoJSON.

\COPY (select id, ST_AsGeoJSON(geom) from grid_10) TO '/tmp/grid_10.geojson';

Then I put it somewhere in the S3 filesystem and loaded it as a delimited text file. This created a table with two fields: an integer and a text field (containing GeoJSON).

CREATE EXTERNAL TABLE grid_10 (id bigint, json STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://some-url/data/grids/geojson';

I can generate geometry correctly from this GeoJSON, using this query:

SELECT ID, ST_AsText(ST_GeomFromGeoJSON(json)) from grid_10 limit 3;

Which outputs:

Now I wanted to convert this table into an actual spatial table, where the geometry is stored as a BLOB rather than as text. I did it with the following query:

create table new_grid as SELECT ID, ST_GeomFromGeoJSON(json) as geom from grid_10;  

To my surprise, the content of this table is the same geometry, repeated over and over.
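One way to confirm the symptom is to compare the row count with the number of distinct geometries. This is only a sketch, assuming the `ST_AsText` function used earlier is registered in the session; the column aliases are illustrative:

```sql
-- Compare total rows with distinct geometries (compared via their WKT text).
-- If the geometry really is repeated, distinct_geom_cnt will be far below row_cnt.
SELECT count(*) AS row_cnt,
       count(DISTINCT ST_AsText(geom)) AS distinct_geom_cnt
FROM new_grid;
```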

I tried the same approach (creating a geometry from WKT or GeoJSON and writing it into a table) with the same results. Is this a bug? Does it mean I am condemned to perform spatial queries using on-the-fly conversions, as in the query below? And isn't that much more costly in computational terms than manipulating BLOBs?

create table grid_cnt as
SELECT grid_10.id, count(grid_10.id) as ptcnt
FROM grid_10 JOIN tweets
WHERE ST_Contains(ST_GeomFromGeoJSON(grid_10.json), ST_Point(tweets.longitude, tweets.latitude)) = true
GROUP BY grid_10.id;

I was wondering if anybody has experienced the same issues.

Update: This problem was happening with Hive 0.11, running on Amazon's Hadoop distribution 3.3.1. I was also pulling the Esri jars from this link:

https://github.com/Esri/gis-tools-for-hadoop/archive/master.zip

When I switched to version 2.0 of the jar and the latest Hive (0.13), the problem disappeared.

You can find my issue report here. Hope this helps someone experiencing the same issues.


Answer 1:


I ran into the same issue you describe. The solution I got from an expert was to store the geometry information as WKT, i.e. in text format, instead of in the binary geometry format you tried.
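A minimal sketch of this workaround, assuming the Esri functions from the question (`ST_GeomFromGeoJSON`, `ST_AsText`, `ST_Contains`, `ST_Point`) plus `ST_GeomFromText` are registered in the Hive session. The table name `grid_10_wkt` and the column name `wkt` are hypothetical:

```sql
-- Store the geometry as WKT text instead of a binary geometry column.
-- grid_10_wkt and wkt are illustrative names, not from the original post.
CREATE TABLE grid_10_wkt AS
SELECT id, ST_AsText(ST_GeomFromGeoJSON(json)) AS wkt
FROM grid_10;

-- Rebuild the geometry from WKT at query time, mirroring the join in the question.
SELECT g.id, count(g.id) AS ptcnt
FROM grid_10_wkt g JOIN tweets t
WHERE ST_Contains(ST_GeomFromText(g.wkt), ST_Point(t.longitude, t.latitude))
GROUP BY g.id;
```

Note that this still parses text into a geometry on every query, so it sidesteps the repeated-geometry problem rather than removing the conversion cost raised in the question.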



Source: https://stackoverflow.com/questions/27147274/how-to-load-spatial-data-using-the-hadoop-gis-framework
