How to convert an RDD object to a DataFrame in Spark

慢半拍i 2020-11-22 14:59

How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a DataFrame (org.apache.spark.sql.DataFrame)? I converted a DataFrame to an RDD using .rdd; after processing it, I want the result back as a DataFrame.

11 answers
  •  清歌不尽
    2020-11-22 15:44

    Method 1: (Scala)

    // Spark 1.x: create a SQLContext and import its implicits to get .toDF on RDDs
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val df_2 = sc.parallelize(Seq((1L, 3.0, "a"), (2L, -1.0, "b"), (3L, 0.0, "c"))).toDF("x", "y", "z")
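
    Note that .toDF comes from the implicits import and works for RDDs of tuples or case classes; it does not apply to an RDD[Row], which is what Method 2 and the schema-based sketch after it handle.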
    

    Method 2: (Scala)

    import org.apache.spark.sql.Row

    // Define a case class matching the row layout, then pattern-match each Row into it
    // (relies on the import sqlContext.implicits._ from Method 1)
    case class Temp(val1: String, val3: Double)

    val rdd = sc.parallelize(Seq(
      Row("foo", 0.5), Row("bar", 0.0)
    ))
    val rows = rdd.map { case Row(val1: String, val3: Double) => Temp(val1, val3) }.toDF()
    rows.show()
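
    Since the question starts from an RDD[Row], you can also skip the case class and hand createDataFrame an explicit schema. A minimal sketch, reusing the rdd and sqlContext defined above:

    import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

    // Describe the two Row columns explicitly instead of inferring them
    val schema = StructType(Seq(
      StructField("val1", StringType, nullable = true),
      StructField("val3", DoubleType, nullable = true)
    ))

    // createDataFrame accepts an RDD[Row] together with a StructType
    val df = sqlContext.createDataFrame(rdd, schema)
    df.show()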
    

    Method 1: (Python)

    # Build Row objects from the tuples and let createDataFrame infer the schema
    from pyspark.sql import Row

    l = [('Alice', 2)]
    Person = Row('name', 'age')
    rdd = sc.parallelize(l)
    person = rdd.map(lambda r: Person(*r))
    df2 = sqlContext.createDataFrame(person)
    df2.show()
    

    Method 2: (Python)

    # Spell out the schema so Spark does not have to infer the column types
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    l = [('Alice', 2)]
    rdd = sc.parallelize(l)
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    df3 = sqlContext.createDataFrame(rdd, schema)
    df3.show()
    

    Another variation: extract the values from the Row objects first, then apply a case class to convert the RDD to a DataFrame.

    // attrib1 and attrib2 are RDD[Row]s of single Int columns; format each key as a string
    val temp1 = attrib1.map { case Row(key: Int) => s"$key" }
    val temp2 = attrib2.map { case Row(key: Int) => s"$key" }

    case class RLT(id: String, attrib_1: String, attrib_2: String)
    import hiveContext.implicits._

    // result is an RDD whose rows carry three fields; s(0) etc. return Any,
    // so convert to String before building the case class, then call toDF
    val df = result.map { s => RLT(s(0).toString, s(1).toString, s(2).toString) }.toDF()
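
    In Spark 2.x and later, the same conversion goes through SparkSession instead of SQLContext/HiveContext. A minimal sketch, assuming the spark session that spark-shell provides:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val rowRdd = spark.sparkContext.parallelize(Seq(Row("Alice", 2), Row("Bob", 5)))
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))
    // SparkSession exposes the same createDataFrame(RDD[Row], StructType) entry point
    val dfFromRows = spark.createDataFrame(rowRdd, schema)
    dfFromRows.show()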
    
