Spark RDD: correct date format in Scala?

Anonymous (unverified), submitted 2019-12-03 01:06:02

Question:

This is the date value I want to use when I convert RDD to Dataframe.

Sun Jul 31 10:21:53 PDT 2016 

Using "DataTypes.DateType" in the schema throws an error:

java.util.Date is not a valid external type for schema of date 

So I want to prepare the RDD in advance so that the above schema works. How can I correct the date format so the conversion to a DataFrame succeeds?

// Schema for the DataFrame
val schema = StructType(
    StructField("lotStartDate", DateType, false) ::
    StructField("pm", StringType, false) ::
    StructField("wc", LongType, false) ::
    StructField("ri", StringType, false) :: Nil)

// rddRow: [Sun Jul 31 10:21:53 PDT 2016, "PM", 11, "ABC"]
val df = spark.createDataFrame(rddRow, schema)

Answer 1:

Spark's DateType can be encoded from java.sql.Date, so you should convert your input RDD to use that type, e.g.:

val inputRdd: RDD[(Int, java.util.Date)] = ??? // however it's created

// convert java.util.Date to java.sql.Date:
val fixedRdd = inputRdd.map {
  case (id, date) => (id, new java.sql.Date(date.getTime))
}

// now you can convert to DataFrame given your schema:
val schema = StructType(
  StructField("id", IntegerType) ::
  StructField("date", DateType) ::
  Nil
)

val df = spark.createDataFrame(
  fixedRdd.map(record => Row.fromSeq(record.productIterator.toSeq)),
  schema
)

// or, even easier - let Spark figure out the schema:
val df2 = fixedRdd.toDF("id", "date")

// both will evaluate to the same schema, in this case
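Note that if the input column actually arrives as a string like "Sun Jul 31 10:21:53 PDT 2016" (the default java.util.Date.toString format) rather than as a java.util.Date object, it has to be parsed first. A minimal sketch of that step, with no Spark required (the toSqlDate helper name is my own, not from the question):

```scala
import java.text.SimpleDateFormat
import java.util.Locale

// Pattern matching strings like "Sun Jul 31 10:21:53 PDT 2016",
// i.e. java.util.Date's default toString format
val fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US)

// Parse the string and wrap the epoch millis in java.sql.Date,
// which is the external type Spark's DateType accepts
def toSqlDate(s: String): java.sql.Date =
  new java.sql.Date(fmt.parse(s).getTime)

val d = toSqlDate("Sun Jul 31 10:21:53 PDT 2016")
```

One caveat: SimpleDateFormat is not thread-safe, so inside an RDD transformation it is safer to create the formatter per partition (e.g. in mapPartitions) than to share a single instance across tasks.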

