Write each row of a Spark DataFrame as a separate file

时光说笑 2020-12-18 14:12

I have a Spark DataFrame with a single column, where each row is a long string (actually an XML file). I want to go through the DataFrame and save the string from each row as a

2 answers
  •  暖寄归人
    2020-12-18 14:56

    I would do this in Java with the Hadoop FileSystem API. Similar code can be written in Python.

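    // Needs java.util.*, java.io.OutputStream, java.nio.charset.StandardCharsets,
    // org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path and the Spark Java API.
    // The enclosing method must declare IOException.
    List<String> strings = Arrays.asList("file1", "file2", "file3");
    JavaSparkContext jsc = new JavaSparkContext(); // settings come from spark-submit
    JavaRDD<String> stringRdd = jsc.parallelize(strings);
    FileSystem fs = FileSystem.get(jsc.hadoopConfiguration());
    // collect() pulls every row to the driver, so this only suits data that fits there.
    for (String x : stringRdd.collect()) {
        Path outputPath = new Path(x); // in this demo each string doubles as the file name
        // try-with-resources closes the stream once the row has been written
        try (OutputStream os = fs.create(outputPath)) {
            os.write(x.getBytes(StandardCharsets.UTF_8));
        }
    }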
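
The loop in this answer runs on the driver after `collect()`. As a rough sketch of the per-row write step on its own, the helper below takes an iterator of strings and writes each one to a separate file. Everything in it is an assumption for illustration and not part of the answer: the class name `RowFileWriter`, the `part-<partition>-<n>.xml` naming scheme, and the use of `java.nio.file` on a locally reachable directory in place of the Hadoop `FileSystem` calls (note that `Path` here is `java.nio.file.Path`, not the Hadoop class).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not from the answer): writes each string to its own file.
final class RowFileWriter {
    private RowFileWriter() {}

    /**
     * Writes every element of rows to outputDir/part-PPPPP-NNNNNN.xml, where P is
     * the partition id and N a running counter, and returns the created paths in
     * iteration order.
     */
    static List<Path> writeAll(Iterator<String> rows, String outputDir, int partitionId)
            throws IOException {
        Path dir = Paths.get(outputDir);
        Files.createDirectories(dir); // does nothing if the directory already exists
        List<Path> written = new ArrayList<>();
        int n = 0;
        while (rows.hasNext()) {
            // Partition id plus a per-call counter keeps names unique across partitions.
            String name = String.format("part-%05d-%06d.xml", partitionId, n++);
            Path target = dir.resolve(name);
            Files.write(target, rows.next().getBytes(StandardCharsets.UTF_8));
            written.add(target);
        }
        return written;
    }
}
```

One possible use, assuming the output directory is visible to every executor (local mode or a shared mount), would be to call `writeAll` from `JavaRDD.foreachPartition` and pass `TaskContext.getPartitionId()` as the partition id, which avoids collecting the rows on the driver. The directory is taken as a `String` because `java.nio.file.Path` is not serializable and could not be captured in a Spark closure. For HDFS, the same loop would use `FileSystem.create` as in the answer instead of `Files.write`.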
    
