save Spark dataframe to Hive: table not readable because “parquet not a SequenceFile”

走了就别回头了 · asked 2020-12-28 22:08

I'd like to save data in a Spark (v 1.3.0) DataFrame to a Hive table using PySpark.

The documentation states:

"spark.sql.hive.convertMetasto

4 Answers
  •  挽巷 (OP)
     2020-12-28 22:44

    I've been there...
    The API is somewhat misleading on this one.
    DataFrame.saveAsTable does not create a Hive table, but an internal Spark table source.
    It does store an entry in the Hive metastore, but not the kind of table you intend, which is why Hive then fails to read the Parquet data.
    This behavior was discussed on the spark-user mailing list regarding Spark 1.3.

    If you wish to create a Hive table from Spark, you can use this two-step approach:
    1. Issue CREATE TABLE ... through SparkSQL so the Hive metastore records a genuine Hive table.
    2. Use DataFrame.insertInto(tableName, overwrite) to write the actual data (the Spark 1.3 signature is insertInto(tableName, overwrite=False)).
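    The two steps above can be sketched in PySpark as follows. The table name `my_table` and its schema are hypothetical placeholders; `save_to_hive` is a helper name invented for this sketch, and `df.insertInto` is the Spark 1.3 `DataFrame` API (in later versions it moved to `DataFrameWriter`):

```python
# Step 1's Hive DDL: create the table through the Hive metastore so it
# is a genuine Hive Parquet table, not a Spark-internal table source.
# Table name and schema are hypothetical placeholders.
DDL = ("CREATE TABLE IF NOT EXISTS my_table (id INT, name STRING) "
       "STORED AS PARQUET")

def save_to_hive(df, sql_context):
    """Create a real Hive table, then insert the DataFrame's rows."""
    # Step 1: register the table via Hive DDL.
    sql_context.sql(DDL)
    # Step 2: write the data (Spark 1.3 DataFrame.insertInto API);
    # overwrite=True replaces any existing rows in the table.
    df.insertInto("my_table", overwrite=True)
```

    With a `HiveContext` in hand (a plain `SQLContext` cannot talk to the Hive metastore), this would be called as `save_to_hive(df, sqlContext)`.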
