Custom schema in spark-csv throwing error in spark 1.4.1

Submitted by 只愿长相守 on 2019-12-13 01:27:30

Question


I am trying to process a CSV file using the spark-csv package in spark-shell on Spark 1.4.1.

scala> import org.apache.spark.sql.hive.HiveContext                                                                                                  
import org.apache.spark.sql.hive.HiveContext                                                                                                         

scala> import org.apache.spark.sql.hive.orc._                                                                                                        
import org.apache.spark.sql.hive.orc._                                                                                                               

scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};                                                         
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}                                                                 

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)                                                                               
15/12/21 02:06:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.                                                                       
15/12/21 02:06:24 INFO HiveContext: Initializing execution hive, version 0.13.1                                                                      
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@74cba4b                                                   

scala> val customSchema = StructType(Seq(StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true)))
customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,IntegerType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))                                                     

scala> val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType, true).add("comment", StringType, true).add("blank", StringType, true)
<console>:24: error: not enough arguments for constructor StructType: (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType. Unspecified value parameter fields.

val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType,true).add("comment", StringType, true).add("blank", StringType, true)   

Answer 1:


According to the Spark 1.4.1 documentation, StructType has no no-arg constructor, which is why you are getting the error: in that version, new StructType requires the fields array as an argument. You need to either upgrade to 1.5.x, which provides the no-arg constructor, or create the schema as you did in your first example:

val customSchema = StructType(Seq(StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true)))
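If the one-line form gets hard to read, the same schema can be built from a list of (name, type) pairs, which still goes through the StructType(Seq[StructField]) factory and so should work on 1.4.1. The sketch below is meant to be pasted into spark-shell with the spark-csv package on the classpath. The loadCsv helper, its path argument, and the header option are assumptions for illustration and are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

// Column names and types copied from the schema in the question.
val columns: Seq[(String, DataType)] = Seq(
  "year"    -> IntegerType,
  "make"    -> StringType,
  "model"   -> StringType,
  "comment" -> StringType,
  "blank"   -> StringType
)

// Uses the StructType(Seq[StructField]) factory, so neither the
// no-arg constructor nor add(...) is needed.
val customSchema: StructType = StructType(
  columns.map { case (name, dataType) => StructField(name, dataType, nullable = true) }
)

// Hypothetical helper: reads a CSV file through spark-csv with the schema above.
// Assumes the file has a header row; drop or change the option if it does not.
def loadCsv(sqlContext: SQLContext, path: String): DataFrame =
  sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .schema(customSchema)
    .load(path)
```

Because HiveContext extends SQLContext, the hiveContext from the question can be passed in directly, for example loadCsv(hiveContext, path) with path pointing at your own CSV file.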


Source: https://stackoverflow.com/questions/34398237/custom-schema-in-spark-csv-throwing-error-in-spark-1-4-1
