Spark Athena connector

[愿得一人] 2020-12-20 06:58

I need to use Athena from Spark, but Spark uses a PreparedStatement when it goes through JDBC drivers, and that gives me an exception: "com.amazonaws.athena.jdbc.NotImplementedException: Meth

5 answers
  •  别那么骄傲
    2020-12-20 07:49

    I don't know how you'd connect to Athena from Spark, but you don't need to - you can very easily query the data that Athena contains (or, more correctly, "registers") from Spark.

    There are two parts to Athena:

    1. A Hive Metastore (now called the Glue Data Catalog), which contains the mappings between database and table names and all the underlying files
    2. The Presto query engine, which translates your SQL into data operations against those files

    When you start an EMR cluster (v5.8.0 and later), you can instruct it to connect to your Glue Data Catalog. This is a checkbox in the 'create cluster' dialog. When you check this option, your Spark SQLContext will connect to the Glue Data Catalog, and you'll be able to see the same tables that Athena sees.
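
    If you create the cluster from a script rather than the console, the checkbox corresponds to a cluster configuration entry. The sketch below builds that entry as JSON. It assumes the `spark-hive-site` classification and the client factory class given in the EMR documentation linked in this answer; `GLUE_CATALOG_CONFIG` and `configurations_json` are names made up for this example, so check the values against the docs for your EMR release.

```python
import json

# Assumption: classification and property taken from the EMR release guide
# page on using the Glue Data Catalog as the metastore for Spark SQL.
GLUE_CATALOG_CONFIG = [
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": (
                "com.amazonaws.glue.catalog.metastore."
                "AWSGlueDataCatalogHiveClientFactory"
            )
        },
    }
]


def configurations_json() -> str:
    """Serialize the configuration list to a JSON string."""
    return json.dumps(GLUE_CATALOG_CONFIG, indent=2)


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file and supplied at cluster creation.
    print(configurations_json())
```

    The printed JSON can be saved to a file and supplied when the cluster is created, for example through the `--configurations` option of `aws emr create-cluster`.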

    You can then query these tables as normal.
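
    As a rough illustration of "query as normal", here is a minimal PySpark sketch. It assumes the cluster is already configured to use the Glue Data Catalog as described above. The database and table names (`my_database`, `my_table`) and the helper names are placeholders, not anything from the original question.

```python
def qualified_name(database: str, table: str) -> str:
    """Return a backtick-quoted `database`.`table` identifier for Spark SQL."""
    return f"`{database}`.`{table}`"


def read_catalog_table(database: str, table: str, limit: int = 10):
    """Return a DataFrame with up to `limit` rows of a catalog table."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("glue-catalog-example")
        .enableHiveSupport()  # resolve tables through the configured metastore
        .getOrCreate()
    )
    return spark.sql(
        f"SELECT * FROM {qualified_name(database, table)} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Placeholder names: replace with a database and table from your catalog.
    read_catalog_table("my_database", "my_table").show()
```

    Because Spark reads the underlying files directly, no Athena JDBC driver is involved, so the PreparedStatement limitation from the question does not come into play.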

    See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html for more.
