How to write partitioned Parquet files in Java, similar to PySpark?

梦想与她 submitted on 2019-12-24 05:58:59

Question


I can write a partitioned Parquet file in PySpark like this:

(rdd.write
 .partitionBy("created_year", "created_month")
 .parquet("hdfs:///my_file"))

The Parquet output is automatically partitioned by created_year and created_month. How can I do the same in Java? I don't see an option for this in the ParquetWriter class. Is there another class that can do it?

Thanks,


Answer 1:


You have to convert your RDD into a DataFrame and then call the DataFrame writer's parquet function.

df = sql_context.createDataFrame(rdd)
df.write.parquet("hdfs:///my_file", partitionBy=["created_year", "created_month"])
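
For the Java side of the question: if Spark is available, the same chain exists on a Dataset<Row> in the Java API, as df.write().partitionBy("created_year", "created_month").parquet("hdfs:///my_file"). Without Spark, ParquetWriter writes a single file, so the partitioning has to be done by hand. What partitionBy produces is a directory layout of the form created_year=.../created_month=... under the base path. The following is only a minimal sketch of that grouping step using the Java standard library. The Event class and the partitionPath and groupByPartition helpers are hypothetical names introduced for illustration, and the actual Parquet writing is left as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: group rows by Hive-style partition directory, the layout partitionBy produces. */
final class PartitionLayout {

    /** Hypothetical row type; only the two partition columns matter here. */
    static final class Event {
        final int createdYear;
        final int createdMonth;
        final String payload;

        Event(int createdYear, int createdMonth, String payload) {
            this.createdYear = createdYear;
            this.createdMonth = createdMonth;
            this.payload = payload;
        }
    }

    /** Builds a directory such as "hdfs:///my_file/created_year=2019/created_month=12". */
    static String partitionPath(String basePath, Event e) {
        return basePath
                + "/created_year=" + e.createdYear
                + "/created_month=" + e.createdMonth;
    }

    /** Groups rows by their target directory, preserving first-seen order. */
    static Map<String, List<Event>> groupByPartition(String basePath, List<Event> rows) {
        Map<String, List<Event>> groups = new LinkedHashMap<>();
        for (Event e : rows) {
            groups.computeIfAbsent(partitionPath(basePath, e), k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Event> rows = Arrays.asList(
                new Event(2019, 11, "a"),
                new Event(2019, 12, "b"),
                new Event(2019, 12, "c"));

        groupByPartition("hdfs:///my_file", rows).forEach((dir, part) -> {
            // Assumption: real code would open one ParquetWriter per directory
            // here and write the rows in `part` to a file inside it.
            System.out.println(dir + " -> " + part.size() + " row(s)");
        });
    }
}
```

Each key of the returned map is one partition directory, so a writer can be opened per key and closed once its rows are written.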


Source: https://stackoverflow.com/questions/40234731/how-to-write-parquet-file-in-partition-in-java-similar-to-pyspark
