How can I configure spark so that it creates “_$folder$” entries in S3?

Submitted by 青春壹個敷衍的年華 on 2021-02-10 14:39:47

Question


When I write my dataframe to S3 using

df.write
  .format("parquet")
  .mode("overwrite")
  .partitionBy("year", "month", "day", "hour", "gen", "client")
  .option("compression", "gzip")
  .save("s3://xxxx/yyyy")

I get the following in S3

year=2018
year=2019

but I would like to have this instead:

year=2018
year=2018_$folder$
year=2019
year=2019_$folder$

The scripts reading from that S3 location depend on the *_$folder$ entries, but I haven't found a way to configure Spark/Hadoop to generate them.

Any idea which Hadoop or Spark configuration setting controls the generation of *_$folder$ files?


Answer 1:


Those markers are a legacy feature; I don't think anything creates them any more... though they are generally ignored when actually listing directories (that is, even if they are present, they get stripped from listings and replaced with directory entries).
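If your downstream scripts really do require those markers, one possible workaround (not part of the original answer, only a sketch) is to create the zero-byte "_$folder$" objects yourself after the write, using the Hadoop FileSystem API. The bucket/prefix below are placeholders, and the s3a:// scheme is assumed:

import org.apache.hadoop.fs.{FileSystem, Path}

val basePath = new Path("s3a://xxxx/yyyy")  // placeholder bucket/prefix
val fs = FileSystem.get(basePath.toUri, spark.sparkContext.hadoopConfiguration)

// For every partition directory under the base path, write an empty
// "<dirname>_$folder$" object next to it, mimicking the legacy marker layout.
def addFolderMarkers(dir: Path): Unit = {
  fs.listStatus(dir).filter(_.isDirectory).foreach { status =>
    val marker = new Path(status.getPath.getParent, status.getPath.getName + "_$folder$")
    if (!fs.exists(marker)) fs.create(marker, true).close()  // zero-byte marker object
    addFolderMarkers(status.getPath)                         // recurse into nested partitions
  }
}

addFolderMarkers(basePath)

The older S3 connectors used exactly such empty objects to simulate directories, which is why modern listings tend to hide them; creating them manually keeps legacy consumers happy without affecting the Parquet data itself.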



Source: https://stackoverflow.com/questions/55693083/how-can-i-configure-spark-so-that-it-creates-folder-entries-in-s3
