I have a sample application working that reads from CSV files into a DataFrame. The DataFrame can be stored to a Hive table in parquet format using the method
df.saveAsTable(tablename, mode).
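For example, once dynamic partitioning is enabled (see the configuration below), a write into a partitioned Hive table could look roughly like this; the table name my_db.events and the partition column event_date are only placeholders:

df.write \
    .mode("append") \
    .format("parquet") \
    .partitionBy("event_date") \
    .saveAsTable("my_db.events")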
Dynamic partitioning can be configured on the SparkSession like this:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    ...
    .config("spark.hadoop.hive.exec.dynamic.partition", "true") \
    .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict") \
    .enableHiveSupport() \
    .getOrCreate()
or you can add them to a .properties file such as spark-defaults.conf.
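For instance, the same two settings in spark-defaults.conf (or any file passed to spark-submit with --properties-file) would look like this:

spark.hadoop.hive.exec.dynamic.partition        true
spark.hadoop.hive.exec.dynamic.partition.mode   nonstrict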
The spark.hadoop. prefix is needed by the Spark config (at least in 2.4); it tells Spark to copy the setting into the Hadoop configuration, and here is how Spark does that:
/**
 * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop
 * configuration without the spark.hadoop. prefix.
 */
def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
  SparkHadoopUtil.appendSparkHadoopConfigs(conf, hadoopConf)
}
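If you want to check that the prefix really was stripped, you can read the resulting Hadoop configuration back from the running session. This is just a quick sanity check; _jsc is an internal handle to the JavaSparkContext, not a public API:

# assumes `spark` is the session built above
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print(hadoop_conf.get("hive.exec.dynamic.partition"))        # true
print(hadoop_conf.get("hive.exec.dynamic.partition.mode"))   # nonstrict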