How can I create a Spark DataFrame from a nested array of struct elements?

Asked by 北恋 on 2020-12-24 09:21 · 3 answers · 1087 views

I have read a JSON file into Spark. This file has the following structure:

scala> tweetBlob.printSchema
root
 |-- related: struct (nullable = true)
 |             
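
For context, loading such a JSON file might look like the sketch below. This is an assumption, not code from the question: the file name `tweets.json` is a placeholder, and the question's Spark version is not stated (an older setup would use `sqlContext.read.json` instead of a `SparkSession`).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup; in spark-shell a SparkSession named `spark` already exists.
val spark = SparkSession.builder().appName("tweets").getOrCreate()

// "tweets.json" is a placeholder path, not from the original post.
val tweetBlob = spark.read.json("tweets.json")
tweetBlob.printSchema()
```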


        
3 Answers
  •  天命终不由人
     Answered 2020-12-24 10:02

    scala> import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.DataFrame
    
    scala> import org.apache.spark.sql.types._
    import org.apache.spark.sql.types._
    
    scala> case class Bar(x: Int, y: String)
    defined class Bar
    
    scala> case class Foo(bar: Bar)
    defined class Foo
    
    scala> val df = sc.parallelize(Seq(Foo(Bar(1, "first")), Foo(Bar(2, "second")))).toDF
    df: org.apache.spark.sql.DataFrame = [bar: struct<x:int,y:string>]
    
    
    scala> df.printSchema
    root
     |-- bar: struct (nullable = true)
     |    |-- x: integer (nullable = false)
     |    |-- y: string (nullable = true)
    
    
    scala> df.select("bar.*").printSchema
    root
     |-- x: integer (nullable = true)
     |-- y: string (nullable = true)
    
    
    scala> 
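
    The session above flattens a plain struct; the question title asks about a nested *array* of structs. A hedged sketch for that case, using `explode` to turn each array element into its own row before flattening with `col.*` (the names `Tweet`, `Tag`, and `tags` are illustrative, not from the original post):

    ```scala
    import org.apache.spark.sql.functions.explode

    // Illustrative schema: each row holds an array of structs.
    case class Tag(name: String, score: Int)
    case class Tweet(id: Long, tags: Seq[Tag])

    val tweets = sc.parallelize(Seq(
      Tweet(1L, Seq(Tag("spark", 10), Tag("scala", 7))),
      Tweet(2L, Seq(Tag("json", 3)))
    )).toDF

    // explode yields one row per array element; "tag.*" then promotes
    // the struct's fields (name, score) to top-level columns.
    tweets.select($"id", explode($"tags").as("tag"))
          .select($"id", $"tag.*")
          .printSchema
    ```

    As with the struct-only example, the resulting top-level columns become nullable once lifted out of their parent.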
    
