Create Empty dataframe Java Spark

Submitted by 帅比萌擦擦* on 2021-02-08 10:22:21

Question


There are many examples of creating an empty DataFrame/Dataset with Spark in Scala or Python, but I would like to know how to create one in Java Spark.

I need to create an empty DataFrame with a single column named Column_1 of type String.


Answer 1:


Alternative-1: Create an empty DataFrame with a user-defined schema

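        // alternative - 1
        // imports used here: org.apache.spark.sql.* and org.apache.spark.sql.types.*
        // `spark` is an existing SparkSession
        StructType s = new StructType()
                .add(new StructField("Column_1", DataTypes.StringType, true, Metadata.empty()));
        // parse an empty Dataset<String> as CSV with the schema above: no rows, one column
        Dataset<Row> csv = spark.read().schema(s).csv(spark.emptyDataset(Encoders.STRING()));
        csv.show(false);
        csv.printSchema();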
        /**
         * +--------+
         * |Column_1|
         * +--------+
         * +--------+
         *
         * root
         *  |-- Column_1: string (nullable = true)
         */
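
A shorter variant of the same idea, as a minimal sketch: start from an empty Dataset<String> and rename its single column with toDF, which avoids going through the CSV reader. The class name, the helper name and the local[1] master are assumptions made for illustration only; they are not part of the original answer.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class EmptyViaToDfSketch {

    // Hypothetical helper: returns a DataFrame with zero rows and one string column.
    static Dataset<Row> emptyStringColumn(SparkSession spark) {
        // emptyDataset(Encoders.STRING()) has a single column called "value";
        // toDF("Column_1") renames it and converts the result to Dataset<Row>.
        return spark.emptyDataset(Encoders.STRING()).toDF("Column_1");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")              // assumption: local test run
                .appName("empty-via-todf-sketch")
                .getOrCreate();

        Dataset<Row> df = emptyStringColumn(spark);
        df.show(false);
        df.printSchema();

        spark.stop();
    }
}
```

This only works this directly for a single column of a type that has a built-in encoder; for several columns, an explicit StructType as in Alternative-1 is probably the clearer route.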

Alternative-2: Create a DataFrame with a single null row, with the column type set by a SQL cast (note that this DataFrame is not empty: it contains one row)

        Dataset<Row> df4 = spark.sql("select cast(null as string) Column_1");
        df4.show(false);
        df4.printSchema();
        /**
         * +--------+
         * |Column_1|
         * +--------+
         * |null    |
         * +--------+
         *
         * root
         *  |-- Column_1: string (nullable = true)
         */
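
Since the query above returns one row holding a null, it does not strictly answer the question. One way to keep the SQL-defined schema while dropping that row is to append limit(0), sketched below. The class name, the helper name and the local[1] master are assumptions for illustration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class EmptyFromSqlSketch {

    // Hypothetical helper: same query as above, truncated to zero rows.
    static Dataset<Row> emptyFromSql(SparkSession spark) {
        // limit(0) keeps the column name and type produced by the cast
        // but returns no rows.
        return spark.sql("select cast(null as string) Column_1").limit(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")              // assumption: local test run
                .appName("empty-from-sql-sketch")
                .getOrCreate();

        Dataset<Row> df = emptyFromSql(spark);
        df.show(false);
        df.printSchema();

        spark.stop();
    }
}
```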

Alternative-3: Create a DataFrame from an empty RDD and a user-defined schema

        // additional import used here: scala.reflect.ClassTag
        // the Scala emptyRDD API needs an explicit ClassTag when called from Java
        ClassTag<Row> rowTag = scala.reflect.ClassTag$.MODULE$.apply(Row.class);
        Dataset<Row> df5 = spark.createDataFrame(spark.sparkContext().emptyRDD(rowTag),
                new StructType()
                        .add(new StructField("Column_1", DataTypes.StringType, true, Metadata.empty())));
        df5.show(false);
        df5.printSchema();
        /**
         * +--------+
         * |Column_1|
         * +--------+
         * +--------+
         *
         * root
         *  |-- Column_1: string (nullable = true)
         */
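
If the ClassTag plumbing feels awkward from Java, the createDataFrame overload that takes a java.util.List<Row> and a StructType can be fed an empty list instead of an empty RDD. The following is a minimal sketch of that variant; the class name, the helper name and the local[1] master are assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

class EmptyFromListSketch {

    // Hypothetical helper: empty DataFrame built from an empty Java list.
    static Dataset<Row> emptyFromList(SparkSession spark) {
        // One nullable string column, as asked in the question.
        StructType schema = new StructType()
                .add("Column_1", DataTypes.StringType, true);

        // The typed local variable selects the (List<Row>, StructType) overload.
        List<Row> noRows = Collections.emptyList();
        return spark.createDataFrame(noRows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")              // assumption: local test run
                .appName("empty-from-list-sketch")
                .getOrCreate();

        Dataset<Row> df = emptyFromList(spark);
        df.show(false);
        df.printSchema();

        spark.stop();
    }
}
```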

Use spark.emptyDataFrame() to create a DataFrame with no columns and no rows

        Dataset<Row> rowDataset = spark.emptyDataFrame();
        rowDataset.show(false);
        rowDataset.printSchema();
        /**
         * ++
         * ||
         * ++
         * ++
         *
         * root
         */


Source: https://stackoverflow.com/questions/62898190/create-empty-dataframe-java-spark
