Dataframe from List in Java

盖世英雄少女心 2021-01-15 17:23
  • Spark Version : 1.6.2
  • Java Version: 7

I have a List of data, something like:

[[dev, engg, 10000], [kar         


        
3 Answers
  •  暗喜 (original poster)
    2021-01-15 17:54

    You can create a DataFrame from the List, then use selectExpr with split to get the desired DataFrame.

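    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SQLContext;

    public class SparkSample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkSample").setMaster("local[*]");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            SQLContext sqc = new SQLContext(jsc);
            // Sample data: one comma-separated string per row
            List<String> data = new ArrayList<String>();
            data.add("dev, engg, 10000");
            data.add("karthik, engg, 20000");
            // Single-column DataFrame; the column is named "value"
            DataFrame df = sqc.createDataset(data, Encoders.STRING()).toDF();
            df.printSchema();
            df.show();
            // Split each string on commas into three named columns
            DataFrame df1 = df.selectExpr("split(value, ',')[0] as name", "split(value, ',')[1] as degree", "split(value, ',')[2] as salary");
            df1.printSchema();
            df1.show();
        }
    }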
    

    You will get the output below.

    root
     |-- value: string (nullable = true)
    
    +--------------------+
    |               value|
    +--------------------+
    |    dev, engg, 10000|
    |karthik, engg, 20000|
    +--------------------+
    
    root
     |-- name: string (nullable = true)
     |-- degree: string (nullable = true)
     |-- salary: string (nullable = true)
    
    +-------+------+------+
    |   name|degree|salary|
    +-------+------+------+
    |    dev|  engg| 10000|
    |karthik|  engg| 20000|
    +-------+------+------+
    

    The sample data you provided contains extra spaces after the commas. To remove them and have the salary column typed as integer, you can use the trim and cast functions as below.

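    // Needs: import static org.apache.spark.sql.functions.col;
    //        import static org.apache.spark.sql.functions.trim;
    df1 = df1.select(trim(col("name")).as("name"), trim(col("degree")).as("degree"), trim(col("salary")).cast("integer").as("salary"));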
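
    As a rough illustration of what the split, trim, and cast steps do to each row, here is a plain-Java sketch with no Spark dependency. The class RowSplitSketch and its Employee holder are hypothetical names made up for this example. It only mirrors the string handling and says nothing about how Spark's own cast treats whitespace.

```java
import java.util.Arrays;
import java.util.List;

public class RowSplitSketch {

    /** Hypothetical holder for one parsed row; not a Spark type. */
    static final class Employee {
        final String name;
        final String degree;
        final int salary;

        Employee(String name, String degree, int salary) {
            this.name = name;
            this.degree = degree;
            this.salary = salary;
        }
    }

    /** Splits a line on commas, trims each field, and parses the third as an int. */
    static Employee parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        // Splitting "dev, engg, 10000" on "," leaves a leading space on the
        // second and third fields. Integer.parseInt rejects " 10000", so the
        // field has to be trimmed before it is converted.
        return new Employee(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("dev, engg, 10000", "karthik, engg, 20000");
        for (String line : data) {
            Employee e = parse(line);
            System.out.println(e.name + "|" + e.degree + "|" + e.salary);
        }
    }
}
```

    Running main prints one line per row with the fields separated by "|", for example dev|engg|10000 for the first row.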
    
