Creating a simple 1-row Spark DataFrame with Java API

无人及你 2021-02-04 14:54

In Scala, I can create a single-row DataFrame from an in-memory string like so:

val stringAsList = List("buzz")
val df = sqlContext.sparkContext.parallelize(js


        
2 Answers
  •  眼角桃花
    2021-02-04 15:14

    I have created 2 examples for Spark 2 if you need to upgrade:

    Simple Fizz/Buzz (or foe/bar - old generation :) ):

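        // Imports needed by this snippet:
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.RowFactory;
        import org.apache.spark.sql.SparkSession;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructField;
        import org.apache.spark.sql.types.StructType;

        SparkSession spark = SparkSession.builder().appName("Build a DataFrame from Scratch").master("local[*]")
                .getOrCreate();

        // The in-memory data: a single string value
        List<String> stringAsList = new ArrayList<>();
        stringAsList.add("bar");

        JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());

        // Wraps each string in a one-column Row
        JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String row) -> RowFactory.create(row));

        // Creates schema: one non-nullable string column named "foe"
        StructType schema = DataTypes.createStructType(
                new StructField[] { DataTypes.createStructField("foe", DataTypes.StringType, false) });

        Dataset<Row> df = spark.createDataFrame(rowRDD, schema);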
    

    2x2 data:

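        // Imports needed by this snippet:
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.RowFactory;
        import org.apache.spark.sql.SparkSession;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructField;
        import org.apache.spark.sql.types.StructType;

        SparkSession spark = SparkSession.builder().appName("Build a DataFrame from Scratch").master("local[*]")
                .getOrCreate();

        // Each String[] becomes one row; each array element becomes one column
        List<String[]> stringAsList = new ArrayList<>();
        stringAsList.add(new String[] { "bar1.1", "bar2.1" });
        stringAsList.add(new String[] { "bar1.2", "bar2.2" });

        JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());

        // The cast to Object[] spreads the array over RowFactory.create's varargs
        JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList)
                .map((String[] row) -> RowFactory.create((Object[]) row));

        // Creates schema: two non-nullable string columns
        StructType schema = DataTypes
                .createStructType(new StructField[] { DataTypes.createStructField("foe1", DataTypes.StringType, false),
                        DataTypes.createStructField("foe2", DataTypes.StringType, false) });

        Dataset<Row> df = spark.createDataFrame(rowRDD, schema);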
    

    Code can be downloaded from: https://github.com/jgperrin/net.jgp.labs.spark.
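
    For a single row, the RDD step in the examples above may not be needed. The sketch below is not taken from the linked repository; it assumes Spark 2.x or later and relies on the `SparkSession.createDataFrame(List<Row>, StructType)` overload. The class name `OneRowDataFrame` and the helper `build` are made up for illustration.

    ```java
    import java.util.Collections;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical class name, used only for this illustration.
    public class OneRowDataFrame {

        // Builds a one-row, one-column DataFrame straight from a java.util.List<Row>,
        // without going through JavaSparkContext.parallelize.
        static Dataset<Row> build(SparkSession spark) {
            List<Row> rows = Collections.singletonList(RowFactory.create("bar"));

            // Same schema as the first example: one non-nullable string column "foe"
            StructType schema = DataTypes.createStructType(new StructField[] {
                    DataTypes.createStructField("foe", DataTypes.StringType, false) });

            return spark.createDataFrame(rows, schema);
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("One-row DataFrame without an RDD")
                    .master("local[*]")
                    .getOrCreate();

            build(spark).show();

            spark.stop();
        }
    }
    ```

    Whether this is preferable depends on the data size: a local list is convenient for a handful of rows, while the RDD route in the examples above fits data that is already distributed.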
