Question
I'm migrating some code from Spark 1.6 to Spark 2.1 and struggling with the following issue:
This worked perfectly in Spark 1.6:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}
val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
val rows = sparkContext.parallelize(Seq(Row(Some(1L))))
sqlContext.createDataFrame(rows, schema).show
The same code in Spark 2.1.1:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}
val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
val rows = ss.sparkContext.parallelize(Seq(Row(Some(1L))))
ss.createDataFrame(rows, schema).show
gives the following runtime exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 72, i89203.sbb.ch, executor 9): java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.Some is not a valid external type for schema of bigint
So how should I translate such code to Spark 2.x if I want to have nullable Longs rather than using Option[Long]?
Answer 1:
There is actually a JIRA ticket, SPARK-19056, about this behavior, which turns out not to be a bug. The behavior is intentional. As the ticket explains:
Allowing Option in Row is never documented and brings a lot of troubles when we apply the encoder framework to all typed operations. Since Spark 2.0, please use Dataset for typed operation/custom objects. e.g.
val ds = Seq(1 -> None, 2 -> Some("str")).toDS
ds.toDF // schema: <_1: int, _2: string>
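Applied to the single nullable Long column from the question, that advice might look like the sketch below. The case class Record, the object name, and the local master setting are assumptions made for illustration; they do not come from the ticket or the question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class for illustration: an Option[Long] field
// is encoded as a nullable long (bigint) column.
case class Record(i: Option[Long])

object OptionDatasetExample {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in a real job, reuse the existing session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("option-dataset-example")
      .getOrCreate()
    import spark.implicits._

    // Some(1L) becomes 1 and None becomes null in the resulting column.
    val ds = Seq(Record(Some(1L)), Record(None)).toDS()
    ds.printSchema()
    ds.toDF().show()

    spark.stop()
  }
}
```

Here the Option lives in a typed object and the encoder derives the nullable column, so no Row ever has to hold a Some or None.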
Answer 2:
The error message is clear: a Some was supplied where a bigint was required.
scala.Some is not a valid external type for schema of bigint
So you need to unwrap the Option with getOrElse(null), so that the Row holds either the plain value or null when the Option is empty. The following code should work for you:
val sc = ss.sparkContext
val sqlContext = ss.sqlContext
val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
// Unwrap the Option so the Row holds a plain Long (or null), not a Some
val rows = sc.parallelize(Seq(Row(Option(1L).getOrElse(null))))
sqlContext.createDataFrame(rows, schema).show
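If many fields are Options, the same unwrapping can be pulled into a small helper. This is only a sketch: RowOptions, unwrap, and rowOf are hypothetical names and not part of Spark. Only Row.fromSeq is an actual Spark call.

```scala
import org.apache.spark.sql.Row

object RowOptions {
  // Hypothetical helper: Some(x) becomes x, None becomes null,
  // and any other value is passed through unchanged.
  def unwrap(value: Any): Any = value match {
    case Some(x) => x
    case None    => null
    case other   => other
  }

  // Builds a Row whose fields are plain values or null, never Option.
  def rowOf(values: Any*): Row = Row.fromSeq(values.map(unwrap))
}
```

With this in scope, the rows above could be built as sc.parallelize(Seq(RowOptions.rowOf(Some(1L)), RowOptions.rowOf(None))) and passed to createDataFrame with the same schema.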
I hope this answer is helpful
Source: https://stackoverflow.com/questions/44324195/problems-to-create-dataframe-from-rows-containing-optiont