Question
I'm reading in CSV files in which one column contains a string that should be converted to a datetime. The string is in the form MM/dd/yyyy HH:mm. However, when I try to transform it using Joda-Time, I always get the error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported
I don't know what exactly the problem is...
val input = c.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  val date: DateTime = formatter.parseDateTime(p(0))
  StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()
Can anybody help?
Answer 1:
I don't know what exactly the problem is...
Well, the source of the problem is pretty well described by the error message: Spark SQL doesn't support Joda-Time DateTime as an input. A valid input for a date field is java.sql.Date (see the Spark SQL and DataFrame Guide, Data Types section, for reference).
The simplest solution is to adjust the StockData class so that it takes java.sql.Date as an argument, and to replace:
val date: DateTime = formatter.parseDateTime(p(0))
with something like this:
val date: java.sql.Date = new java.sql.Date(
  formatter.parseDateTime(p(0)).getMillis)
or
val date: java.sql.Timestamp = new java.sql.Timestamp(
  formatter.parseDateTime(p(0)).getMillis)
if you want to preserve hours and minutes.
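Putting it together, here is a minimal end-to-end sketch. The StockData field names other than date are guesses, and it assumes c is the SparkContext and that the SQLContext implicits (import sqlContext.implicits._) are in scope for .toDF():

import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// Assumed case class shape: same fields as in the question,
// but java.sql.Timestamp instead of org.joda.time.DateTime
case class StockData(date: Timestamp, open: Double, high: Double, low: Double,
                     close: Double, volume: Int, adjVolume: Int)

val df = c.textFile("C:\\Users\\AAPL.csv")
  .map(_.split(","))
  .map { p =>
    val formatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
    // Convert the parsed Joda DateTime to a type Spark SQL understands
    val date = new Timestamp(formatter.parseDateTime(p(0)).getMillis)
    StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble,
      p(4).toDouble, p(5).toInt, p(6).toInt)
  }
  .toDF()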
If you plan to use window functions with a range clause, a better option is to pass the string to the DataFrame and convert it to an integer timestamp there:
import org.apache.spark.sql.functions.unix_timestamp
df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
See Spark Window Functions - rangeBetween dates for details.
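To illustrate why the integer timestamp helps, here is a hedged sketch of a range-based window (the column names date and close are assumptions, not from the question):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, unix_timestamp}

// Assumes df keeps the raw "MM/dd/yyyy HH:mm" string in a column named "date"
val withTs = df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))

// rangeBetween operates on the numeric ts column (seconds); here, the trailing 7 days
val w = Window.orderBy($"ts").rangeBetween(-7 * 24 * 60 * 60, 0)

val result = withTs.withColumn("avg_close_7d", avg($"close").over(w))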
Source: https://stackoverflow.com/questions/33688945/convert-string-with-form-mm-dd-yyyy-hhmm-to-joda-datetime-in-dataframe-in-spa