In Spark SQL, I use DF.write.mode(SaveMode.Append).json(xxxx), but this method produces output files whose names are complex and random; I can't use them.
Since Spark writes through HDFS, this is the typical output it produces. You can use Hadoop's FileUtil to merge the part files back into one. This is an efficient solution because it doesn't require Spark to pull the whole dataset into a single partition's memory (as repartitioning down to 1 would). This is the approach I follow.
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Hadoop filesystem handle from the Spark context
val hadoopConf = sqlContext.sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)

// Target path for the single merged JSON file
val mergedPath = "merged-" + filePath + ".json"
val merged = new Path(mergedPath)
if (hdfs.exists(merged)) {
  hdfs.delete(merged, true)
}

// Write the part files, then concatenate them into one file
df.write.mode(SaveMode.Append).json(filePath)
FileUtil.copyMerge(hdfs, new Path(filePath), hdfs, merged, false, hadoopConf, null)
You can then read the single file from the mergedPath location. Hope it helps.
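For completeness, here is a minimal sketch of reading the merged output back, assuming the same sqlContext and the mergedPath variable from above (mergedDf is just an illustrative name):

// Spark's JSON writer emits one JSON object per line, so the concatenated
// file produced by copyMerge is still valid input for read.json
val mergedDf = sqlContext.read.json(mergedPath)
mergedDf.show()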