How to overwrite the output of RDD saveAsPickleFile(path) if the file already exists in PySpark?

Posted by 女生的网名这么多〃 on 2019-12-06 09:03:41

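You can save an RDD over an existing output path by setting spark.hadoop.validateOutputSpecs to false, which disables the check that fails when the output directory already exists. Note: the code is in Scala, but the same logic should apply in Python. I am using Spark 2.3.0.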

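  import org.apache.spark.{SparkConf, SparkContext}

  // Disable Hadoop's output-spec validation so saving to an existing path does not fail
  val sconf = new SparkConf().set("spark.hadoop.validateOutputSpecs", "false").setMaster("local[*]").setAppName("test")
  val scontext = new SparkContext(sconf)
  val lines = scontext.textFile("C:\\Users\\...\\Desktop\\Sampledata.txt", 1)
  println(lines.first)
  // Succeeds even if the sample2 directory already exists
  lines.saveAsTextFile("C:\\Users\\...\\Desktop\\sample2")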
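
Since the question is about PySpark, here is a minimal Python sketch of the same idea, applied to saveAsPickleFile. The helper names make_context and save_pickle_overwriting, and the OVERWRITE_SETTINGS dictionary, are hypothetical and exist only for illustration; the one piece taken from the answer is the spark.hadoop.validateOutputSpecs setting.

```python
# Setting taken from the Scala answer: turns off the "output path already exists" check.
OVERWRITE_SETTINGS = {"spark.hadoop.validateOutputSpecs": "false"}


def make_context(app_name="test", master="local[*]"):
    """Build a SparkContext with output-spec validation disabled (hypothetical helper)."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master).setAppName(app_name)
    for key, value in OVERWRITE_SETTINGS.items():
        conf.set(key, value)
    return SparkContext(conf=conf)


def save_pickle_overwriting(sc, data, path):
    """Save `data` as a pickle file at `path`, even if `path` already exists.

    `sc` should come from make_context(); a context created without the
    setting above is expected to raise an error when `path` exists.
    """
    rdd = sc.parallelize(data, 1)  # one partition, as in the Scala example
    rdd.saveAsPickleFile(path)
    return rdd
```

A typical call would be `sc = make_context()` followed by `save_pickle_overwriting(sc, range(10), some_path)`, where some_path is whatever output directory you use. One caveat: with validation disabled, Spark does not clear the existing directory first, so part files from an earlier run with more partitions may be left behind. If that matters, deleting the directory before saving is the safer route.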

Or, if you are working with a DataFrame, use:

DF.write.mode(SaveMode.Overwrite).parquet("path.parquet")

An RDD has no write mode, but you can convert the RDD to a DataFrame and use the DataFrame's overwrite mode, as follows:

rdd.coalesce(1).map(lambda x: (x,)).toDF().write.csv(path=yourpath, mode='overwrite')