Question
I have an RDD that is generated using Spark. If I want to write this RDD to a CSV file, I am provided with methods like "saveAsTextFile()", which write the output to HDFS.
I want to write the file to my local file system so that my SSIS process can pick the files from the system and load them into the DB.
I am currently unable to use sqoop.
Is there a way to do this in Java, other than writing shell scripts?
If anything needs clarifying, please let me know.
Answer 1:
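saveAsTextFile is able to take local file system paths (e.g. file:///tmp/magic/...). However, if you're running on a distributed cluster, each executor writes its own partitions to its own local disk, so you most likely want to collect() the data back to the driver and then save it with standard file operations. Note that collect() only works if the data fits in the driver's memory.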
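The "standard file operations" step can be sketched as follows. This is a minimal sketch, not something from the original answer: LocalCsvWriter and writeLines are hypothetical names, and it assumes the rows are already formatted as CSV strings (for example, the List<String> returned by calling collect() on a JavaRDD<String>).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Spark.
final class LocalCsvWriter {
    private LocalCsvWriter() {}

    // Writes each row as one line of a local file, creating the file
    // or overwriting it if it already exists.
    // Assumption: in a Spark job, rows would be the result of rdd.collect(),
    // so the whole data set must fit in the driver's memory.
    static void writeLines(List<String> rows, Path target) throws IOException {
        Files.write(target, rows, StandardCharsets.UTF_8);
    }
}
```

In a driver program this might be called as LocalCsvWriter.writeLines(rdd.collect(), Paths.get("/tmp/output.csv")), where rdd is a JavaRDD<String> and the output path is only an example. The file ends up on the machine running the driver, which is where a local process such as SSIS would need to look for it.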
Source: https://stackoverflow.com/questions/31239161/save-a-spark-rdd-to-the-local-file-system-using-java