How to enable PostGIS queries in Spark SQL

Submitted by 浪尽此生 on 2021-02-18 19:09:58

Question


I have a PostgreSQL database with the PostGIS extension, so I can run queries like:

SELECT *
FROM poi_table
WHERE (ST_DistanceSphere(the_geom, ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 6000)

And with Spark SQL, I can query the table in my Spark Application (in Scala) like:

spark.sql("select the_geom from poi_table where the_geom is not null").show

The problem is that Spark SQL doesn't support the PostGIS extension. For example, when I query the table using the PostGIS function ST_DistanceSphere, I get this error:

scala> spark.sql("select * FROM poi_table WHERE (ST_DistanceSphere(the_geom, ST_GeomFromText('POINT(121.37796 31.208297)', 4326)) < 60)")
org.apache.spark.sql.AnalysisException: Undefined function: 'ST_DistanceSphere'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 65
  at
...

With Python, I can create a PostgreSQL connection and send this query to the PostgreSQL server to execute it.

So, is there a similar workaround in Spark/Scala?
Or, even better, is there a jar I can use to make Spark SQL support the PostGIS extension?


Answer 1:


"With Python, I can create a PostgreSQL connection and send this query to the PostgreSQL server to execute it."

You can do the same in Scala: use JDBC (java.sql.{Connection, DriverManager}) to send the query to the PostgreSQL server and read the result set.
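A minimal sketch of that JDBC route, assuming the PostgreSQL JDBC driver is on the classpath. The object name PostgisJdbcSketch, the geom_wkt alias, and the connection settings in the usage comment are hypothetical; the table, column, and PostGIS functions are the ones from the question, except that the point is built with ST_SetSRID(ST_MakePoint(...)) so that the coordinates can be bound as parameters.

```scala
import java.sql.Connection
import scala.collection.mutable.ListBuffer

object PostgisJdbcSketch {
  // The PostGIS query from the question, with the point and the radius as
  // bind parameters. ST_AsText turns the geometry into WKT text so that it
  // can be read as a plain string on the Scala side.
  val nearbySql: String =
    """SELECT ST_AsText(the_geom) AS geom_wkt
      |FROM poi_table
      |WHERE ST_DistanceSphere(the_geom, ST_SetSRID(ST_MakePoint(?, ?), 4326)) < ?""".stripMargin

  // Runs the query on the PostgreSQL server and collects the WKT strings.
  def findNearby(conn: Connection, lon: Double, lat: Double, radiusMeters: Double): List[String] = {
    val stmt = conn.prepareStatement(nearbySql)
    try {
      stmt.setDouble(1, lon)
      stmt.setDouble(2, lat)
      stmt.setDouble(3, radiusMeters)
      val rs = stmt.executeQuery()
      val out = ListBuffer.empty[String]
      while (rs.next()) out += rs.getString("geom_wkt")
      out.toList
    } finally stmt.close() // closing the statement also closes its result set
  }

  // Usage (hypothetical URL and credentials):
  //   val conn = java.sql.DriverManager.getConnection(
  //     "jdbc:postgresql://localhost:5432/gis", "user", "password")
  //   try findNearby(conn, 121.37796, 31.208297, 6000.0).foreach(println)
  //   finally conn.close()
}
```

This runs entirely on the driver and bypasses Spark, so it only suits result sets small enough to hold in memory there.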

"Or even better, any jar I can use to enable Spark SQL supporting Postgis extension"

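You cannot, because this is not a Postgres query. Whatever you pass to spark.sql is parsed and executed by Spark, which knows nothing about PostGIS functions. What you can do is push the PostGIS part down to the database as a subquery when defining the JDBC table: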

  • In Apache Spark 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)?
  • How to use SQL query to define table in dbtable?

This may fit your requirements, as long as the query doesn't have to be dynamic. Unfortunately, Spark SQL doesn't support geometric types either, so you may have to cast the geometry to something Spark can consume, or define your own dialect.
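As a rough sketch of the subquery approach, the PostGIS filter can be placed in the dbtable option of Spark's JDBC data source, so that PostgreSQL evaluates it and Spark only sees the result. The names PostgisPushdownSketch, nearbySubquery, loadNearby, geom_wkt, and nearby_poi are hypothetical, and converting the geometry with ST_AsText is just one possible way to hand Spark a type it understands.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PostgisPushdownSketch {
  // Builds a parenthesized, aliased subquery for the JDBC "dbtable" option.
  // PostgreSQL runs it, so the PostGIS functions are resolved there, and
  // ST_AsText returns the geometry as WKT text that Spark reads as a string.
  def nearbySubquery(lon: Double, lat: Double, radiusMeters: Double): String =
    s"""(SELECT ST_AsText(the_geom) AS geom_wkt
       |   FROM poi_table
       |  WHERE ST_DistanceSphere(the_geom, ST_GeomFromText('POINT($lon $lat)', 4326)) < $radiusMeters
       |) AS nearby_poi""".stripMargin

  // Loads the filtered rows as a DataFrame; the URL and credentials are
  // supplied by the caller (assumed values, not taken from the question).
  def loadNearby(spark: SparkSession, jdbcUrl: String, user: String, password: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", nearbySubquery(121.37796, 31.208297, 6000.0))
      .option("user", user)
      .option("password", password)
      .load()
}
```

The resulting DataFrame can be registered with createOrReplaceTempView and queried through spark.sql like any other table, but the PostGIS part stays fixed inside the subquery string.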



Source: https://stackoverflow.com/questions/48305560/how-to-enable-postgis-query-in-spark-sql
