In Spark SQL, I use DF.write.mode(SaveMode.Append).json(xxxx), but this method produces output files whose names are complex and random; I can't use them.
Since Spark writes through HDFS, this is the typical output it produces. You can use Hadoop's FileUtil to merge the part files back into one. This is an efficient solution because it doesn't require Spark to pull the whole dataset into a single partition's memory (as repartitioning down to 1 would). This is the approach I follow.
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Hadoop filesystem handle from the Spark context
val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)

// Target path for the single merged JSON file
val mergedPath = "merged-" + filePath + ".json"
val merged = new Path(mergedPath)
if (hdfs.exists(merged)) {
  hdfs.delete(merged, true)
}

// Write the part files, then concatenate them into one file
df.write.mode(SaveMode.Append).json(filePath)
FileUtil.copyMerge(hdfs, new Path(filePath), hdfs, merged, false, hadoopConf, null)
You can then read the single file from the mergedPath location. Hope it helps.
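For completeness, here is a minimal sketch of reading the merged output back, assuming the same sqlContext and the mergedPath variable from above (mergedDf is just an illustrative name):

// Spark's JSON writer emits one JSON object per line, so the concatenated
// file produced by copyMerge is still valid input for read.json
val mergedDf = sqlContext.read.json(mergedPath)
mergedDf.show()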