Save Spark dataframe as dynamic partitioned table in Hive


I have a sample application working that reads from csv files into a dataframe. The dataframe can be stored to a Hive table in parquet format using the method df.saveAsTable(tablename, mode). How can I save the dataframe to a dynamically partitioned table instead?
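For context, a minimal sketch of the kind of write described above, using the DataFrameWriter API; the CSV path and table name are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Read the CSV files into a dataframe (path is hypothetical)
    df = spark.read.csv("/data/input", header=True, inferSchema=True)

    # Store to a Hive table in parquet format; to partition the output,
    # add .partitionBy("<column>") before saveAsTable
    df.write.mode("overwrite").format("parquet").saveAsTable("my_table")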

6 Answers
  •  盖世英雄少女心
    2020-12-02 09:49

    These properties can be set on the SparkSession like this:

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        ...
        .config("spark.hadoop.hive.exec.dynamic.partition", "true") \
        .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict") \
        .enableHiveSupport() \
        .getOrCreate()
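    With these settings in place you can append to an existing partitioned Hive table and let Hive create partitions on the fly. A minimal sketch, assuming a hypothetical table events partitioned by a year column; note that insertInto matches columns by position, so the partition column must come last:

        # Assumes an existing table, e.g.:
        #   CREATE TABLE events (id INT, name STRING)
        #   PARTITIONED BY (year INT) STORED AS PARQUET
        # insertInto resolves columns by position: partition column goes last.
        df.select("id", "name", "year") \
            .write \
            .mode("append") \
            .insertInto("events")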
    

    or you can add them to a .properties file.
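    For example, the equivalent static entries in conf/spark-defaults.conf (assuming that is the properties file meant) would be:

        spark.hadoop.hive.exec.dynamic.partition       true
        spark.hadoop.hive.exec.dynamic.partition.mode  nonstrict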

    The spark.hadoop. prefix is required (at least in Spark 2.4): Spark copies every spark.hadoop.* entry from the SparkConf into the Hadoop configuration with the prefix stripped. Here is how Spark applies these settings:

      /**
       * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop
       * configuration without the spark.hadoop. prefix.
       */
      def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
        SparkHadoopUtil.appendSparkHadoopConfigs(conf, hadoopConf)
      }
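    As a quick sanity check that the prefixed settings were forwarded, you can read them back from the Hadoop configuration. In PySpark this goes through the internal _jsc handle, so treat it as a debugging aid rather than a stable API:

        # The un-prefixed keys should now be visible in the Hadoop configuration
        hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()  # internal handle
        print(hadoop_conf.get("hive.exec.dynamic.partition"))        # expected: true
        print(hadoop_conf.get("hive.exec.dynamic.partition.mode"))   # expected: nonstrict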
    
