How to name the file when using saveAsTextFile in Spark?

别跟我提以往 2021-02-20 17:45

When saving as a text file in Spark version 1.5.1, I use: rdd.saveAsTextFile('').

But if I want to find the file in that directory, how d

3 Answers
  •  执笔经年
    2021-02-20 18:31

    The correct answer to this question is that saveAsTextFile does not allow you to name the actual file.

    The reason is that the data is partitioned: saveAsTextFile(...) treats the path you pass as a directory and writes one file per partition into it.

    You can call rdd.coalesce(1).saveAsTextFile('/some/path/somewhere') and it will create a single output file, /some/path/somewhere/part-00000. The file name itself is still chosen by Spark.
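
If the output directory is on a local filesystem, one way to end up with a name of your choosing is to move the single part file after the job finishes. The sketch below is only an illustration under that assumption: `rename_single_part_file` and the target path are hypothetical names, not part of Spark, and for HDFS or other remote storage you would need that system's own rename operation instead.

```python
import glob
import os
import shutil


def rename_single_part_file(output_dir, target_path):
    """Move the one part file found in output_dir to target_path.

    Hypothetical helper, not a Spark API. Assumes output_dir is a local
    directory written by rdd.coalesce(1).saveAsTextFile(output_dir).
    """
    # Spark names its output files part-00000, part-00001, ...
    # Marker files such as _SUCCESS do not match this pattern.
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if len(part_files) != 1:
        raise ValueError(
            "expected exactly one part file in %s, found %d"
            % (output_dir, len(part_files))
        )
    shutil.move(part_files[0], target_path)
    return target_path


# Intended use after the Spark job (not run here; the target path is made up):
# rdd.coalesce(1).saveAsTextFile('/some/path/somewhere')
# rename_single_part_file('/some/path/somewhere', '/some/path/result.txt')
```

The helper refuses to guess when it finds zero or several part files, since that usually means coalesce(1) was not applied or the job did not complete.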

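    If you need more control than this, you will need to do an actual file operation on your end after calling rdd.collect().

    Note that both approaches concentrate the data in one place: coalesce(1) puts everything into a single partition on one executor, and collect() pulls everything into the driver, so you may run into memory issues. That's the risk you take.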
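
The collect() route could look roughly like the sketch below: once the records are in driver memory as a plain Python list, an ordinary file write lets you pick any name. `write_lines` and the path in the comment are hypothetical names for illustration, and this only makes sense when the data comfortably fits in driver memory.

```python
def write_lines(lines, target_path):
    """Write each record as one line of text to target_path.

    Hypothetical helper: `lines` is any iterable, for example the list
    returned by rdd.collect().
    """
    with open(target_path, "w", encoding="utf-8") as f:
        for line in lines:
            # Convert each record to a string and write one per line.
            f.write(str(line) + "\n")
    return target_path


# Intended use with a live RDD (not run here; the path is made up):
# write_lines(rdd.collect(), '/some/path/result.txt')
```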
