Question
I can write a Parquet file into partitions in PySpark like this:
(rdd.write
    .partitionBy("created_year", "created_month")
    .parquet("hdfs:///my_file"))
The Parquet output is automatically partitioned by created_year and created_month. How can I do the same in Java? I don't see an option for this in the ParquetWriter class. Is there another class that can do it?
Thanks,
Answer 1:
You have to convert your RDD into a DataFrame and then call its Parquet write function.
df = sql_context.createDataFrame(rdd)
df.write.parquet("hdfs:///my_file", partitionBy=["created_year", "created_month"])
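Since the question asks about Java, here is a rough sketch of what the same write might look like through Spark's Java API. It assumes Spark 2.x or later (for SparkSession) with spark-sql on the classpath. The class name PartitionedParquetExample, the helper writePartitioned, the id column, the sample rows, and the default local output path are all made up for illustration; only the created_year and created_month column names come from the question. Treat it as a starting point, not a tested recipe for your cluster.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of Spark or of the original question.
public class PartitionedParquetExample {

    // Writes df as Parquet with one directory level per partition column,
    // e.g. <path>/created_year=2016/created_month=10/part-*.parquet
    // Uses the default save mode, so the write fails if <path> already exists.
    static void writePartitioned(Dataset<Row> df, String path) {
        df.write()
          .partitionBy("created_year", "created_month")
          .parquet(path);
    }

    public static void main(String[] args) {
        // local[*] is an assumption for trying this on one machine;
        // drop .master(...) when submitting to a real cluster.
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-parquet-example")
                .master("local[*]")
                .getOrCreate();

        // Illustrative schema: "id" is invented, the other two columns
        // are the partition columns named in the question.
        StructType schema = new StructType()
                .add("id", DataTypes.LongType)
                .add("created_year", DataTypes.IntegerType)
                .add("created_month", DataTypes.IntegerType);

        // Made-up sample rows, one per target partition.
        List<Row> rows = Arrays.asList(
                RowFactory.create(1L, 2016, 10),
                RowFactory.create(2L, 2016, 11));

        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Output path: first CLI argument, else a hypothetical local default.
        // Pass "hdfs:///my_file" to match the question.
        String path = args.length > 0 ? args[0] : "/tmp/my_file_partitioned";
        writePartitioned(df, path);

        spark.stop();
    }
}
```

If you start from a JavaRDD instead of a list of rows, spark.createDataFrame also has an overload that takes a JavaRDD<Row> plus a StructType, which mirrors the createDataFrame(rdd) step above. The low-level ParquetWriter class writes a single file, so the directory-per-partition layout comes from Spark's DataFrameWriter, not from ParquetWriter itself.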
Source: https://stackoverflow.com/questions/40234731/how-to-write-parquet-file-in-partition-in-java-similar-to-pyspark