I am trying to convert a simple DataFrame to a Dataset using the example in the Spark SQL programming guide: https://spark.apache.org/docs/latest/sql-programming-guide.html
This is how you create a Dataset from a case class:
case class Person(name: String, age: Long)
Keep the case class outside of the class that contains the code below; if it is nested inside a class, Spark may not be able to derive an encoder for it.
import spark.implicits._  // needed for toDS() and the Person encoder

val primitiveDS = Seq(1, 2, 3).toDS()
val augmentedDS = primitiveDS.map(i => Person("var_" + i.toString, (i + 1).toLong))
augmentedDS.show()
// augmentedDS is already a Dataset[Person], so .as[Person] is a no-op here:
augmentedDS.as[Person].show()
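For reference, the linked guide also builds a Dataset from case-class instances directly; a minimal sketch of that, assuming the same Person class and spark session:

import spark.implicits._

// Build a Dataset[Person] straight from case-class instances,
// as shown in the linked programming guide.
val caseClassDS = Seq(Person("Andy", 32)).toDS()
caseClassDS.show()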
Hope this helped
If you change Int to Long (or BigInt), it works fine:
case class Person(name: String, age: Long)
import spark.implicits._
val path = "examples/src/main/resources/people.json"
val peopleDS = spark.read.json(path).as[Person]
peopleDS.show()
Output:
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
EDIT:
spark.read.json parses numbers as Long by default, which is the safer choice. You can change the column type afterwards with a cast or a UDF.
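For example, here is a minimal sketch of narrowing age back to Int after the read, assuming the path and spark session from above (PersonInt is a hypothetical name, and Option[Int] guards against the null row):

import org.apache.spark.sql.functions.col

// Hypothetical variant of Person with an Int age; Option covers the null row.
case class PersonInt(name: String, age: Option[Int])

val peopleIntDS = spark.read.json(path)
  .withColumn("age", col("age").cast("int"))  // narrow Long -> Int
  .as[PersonInt]
peopleIntDS.show()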
EDIT2:
To answer your second question: you need to name the columns correctly before the conversion to Person will work:
val primitiveDS = Seq(1, 2, 3).toDS()
val augmentedDS = primitiveDS
  .map(i => ("var_" + i.toString, (i + 1).toLong))
  .withColumnRenamed("_1", "name")
  .withColumnRenamed("_2", "age")
augmentedDS.as[Person].show()
Output:
+-----+---+
| name|age|
+-----+---+
|var_1| 2|
|var_2| 3|
|var_3| 4|
+-----+---+
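An equivalent, slightly shorter sketch (not from the original answer) names the tuple columns in one call with toDF instead of two withColumnRenamed calls; augmentedDS2 is just an illustrative name:

import spark.implicits._

val augmentedDS2 = Seq(1, 2, 3).toDS()
  .map(i => ("var_" + i.toString, (i + 1).toLong))
  .toDF("name", "age")  // name both tuple columns at once
  .as[Person]
augmentedDS2.show()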