How to write Dataset to a excel file using hadoop office library in apache spark java

微笑、不失礼 提交于 2019-12-10 16:49:05

问题


Currently I am using com.crealytics.spark.excel to read excel file,but using this library I can't write the dataset to an excel file. this link says that using hadoop office library (org.zuinnote.spark.office.excel) we can read and write to the excel file

Please help me to write dataset object to an excel file in spark java.


回答1:


You can use org.zuinnote.spark.office.excel for both reading and writing excel file using Dataset. Examples are given at https://github.com/ZuInnoTe/spark-hadoopoffice-ds/. However, there is one issue if you read the Excel in Dataset and try to write it in another Excel file. Please see the issue and workaround in scala at https://github.com/ZuInnoTe/hadoopoffice/issues/12.

I have written a sample program in Java using org.zuinnote.spark.office.excel and workaround given at that link. Please see if this helps you.

public class SparkExcel {
    public static void main(String[] args) {
        //spark session
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkExcel")
                .master("local[*]")
                .getOrCreate();

        //Read
        Dataset<Row> df = spark
                .read()
                .format("org.zuinnote.spark.office.excel")
                .option("read.locale.bcp47", "de")
                .load("c:\\temp\\test1.xlsx");

        //Print
        df.show();
        df.printSchema();

        //Flatmap function
        FlatMapFunction<Row, String[]> flatMapFunc = new FlatMapFunction<Row, String[]>() {
            @Override
            public Iterator<String[]> call(Row row) throws Exception {
                ArrayList<String[]> rowList = new ArrayList<String[]>();
                List<Row> spreadSheetRows = row.getList(0);
                for (Row srow : spreadSheetRows) {
                    ArrayList<String> arr = new ArrayList<String>();
                    arr.add(srow.getString(0));
                    arr.add(srow.getString(1));
                    arr.add(srow.getString(2));
                    arr.add(srow.getString(3));
                    arr.add(srow.getString(4));
                    rowList.add(arr.toArray(new String[] {}));
                }
                return rowList.iterator();
            }
        };

        //Apply flatMap function
        Dataset<String[]> df2 = df.flatMap(flatMapFunc, spark.implicits().newStringArrayEncoder());

        //Write
        df2.write()
           .mode(SaveMode.Overwrite)
           .format("org.zuinnote.spark.office.excel")
           .option("write.locale.bcp47", "de")
           .save("c:\\temp\\test2.xlsx");

    }
}

I have tested this code with Java 8 and Spark 2.1.0. I am using maven and added dependency for org.zuinnote.spark.office.excel from https://mvnrepository.com/artifact/com.github.zuinnote/spark-hadoopoffice-ds_2.11/1.0.3



来源:https://stackoverflow.com/questions/44799949/how-to-write-dataset-to-a-excel-file-using-hadoop-office-library-in-apache-spark

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!