Question
I'm reading in CSV files in which one column contains a string that should be converted to a datetime. The string is in the form MM/dd/yyyy HH:mm. However, when I try to transform it using Joda-Time, I always get the error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported
I don't know what exactly the problem is...
val input = c.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  val date: DateTime = formatter.parseDateTime(p(0))
  StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()
Can anybody help?
Answer 1:
I don't know what exactly the problem is...
Well, the source of the problem is pretty well described by the error message: Spark SQL doesn't support Joda-Time DateTime as an input. A valid input for a date field is java.sql.Date (see the Spark SQL and DataFrame Guide, Data Types section, for reference).
The simplest solution is to adjust the StockData class so that it takes java.sql.Date as an argument, and to replace:
val date: DateTime = formatter.parseDateTime(p(0))
with something like this:
val date: java.sql.Date = new java.sql.Date(
  formatter.parseDateTime(p(0)).getMillis)
or
val date: java.sql.Timestamp = new java.sql.Timestamp(
  formatter.parseDateTime(p(0)).getMillis)
if you want to preserve hours and minutes.
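Putting it together, here is a minimal end-to-end sketch. The StockData field names other than date are guesses, and it assumes c is the SparkContext and that the SQLContext implicits (import sqlContext.implicits._) are in scope for .toDF():

import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// Assumed case class shape: same fields as in the question,
// but java.sql.Timestamp instead of org.joda.time.DateTime
case class StockData(date: Timestamp, open: Double, high: Double, low: Double,
                     close: Double, volume: Int, adjVolume: Int)

val df = c.textFile("C:\\Users\\AAPL.csv")
  .map(_.split(","))
  .map { p =>
    val formatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
    // Convert the parsed Joda DateTime to a type Spark SQL understands
    val date = new Timestamp(formatter.parseDateTime(p(0)).getMillis)
    StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble,
      p(4).toDouble, p(5).toInt, p(6).toInt)
  }
  .toDF()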
If you plan to use window functions with a range clause, a better option is to pass the string to the DataFrame and convert it to an integer timestamp there:
import org.apache.spark.sql.functions.unix_timestamp
df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
See Spark Window Functions - rangeBetween dates for details.
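To illustrate why the integer timestamp helps, here is a hedged sketch of a range-based window (the column names date and close are assumptions, not from the question):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, unix_timestamp}

// Assumes df keeps the raw "MM/dd/yyyy HH:mm" string in a column named "date"
val withTs = df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))

// rangeBetween operates on the numeric ts column (seconds); here, the trailing 7 days
val w = Window.orderBy($"ts").rangeBetween(-7 * 24 * 60 * 60, 0)

val result = withTs.withColumn("avg_close_7d", avg($"close").over(w))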
Source: https://stackoverflow.com/questions/33688945/convert-string-with-form-mm-dd-yyyy-hhmm-to-joda-datetime-in-dataframe-in-spa