Dataframe to Dataset which has type Any

user6910411

Unless you're interested in limited and ugly workarounds like Encoders.kryo:

import org.apache.spark.sql.Encoders

case class FooBar(foo: Int, bar: Any)

// Kryo-serializes whole FooBar objects into a single binary column
spark.createDataset(
  sc.parallelize(Seq(FooBar(1, "a")))
)(Encoders.kryo[FooBar])

or

spark.createDataset(
  // foo stays a typed Int column; bar is Kryo-serialized into a binary blob
  sc.parallelize(Seq(FooBar(1, "a"))).map(x => (x.foo, x.bar))
)(Encoders.tuple(Encoders.scalaInt, Encoders.kryo[Any]))

you don't. All fields / columns in a Dataset have to be of a known, homogeneous type for which there is an implicit Encoder in scope. There is simply no place for Any there.
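
For comparison, here is a minimal sketch of the standard path once the field has a concrete type; the FooBarTyped name and the local SparkSession setup are illustrative, not from the original question:

import org.apache.spark.sql.SparkSession

// With a concrete field type Spark can derive the product encoder automatically
case class FooBarTyped(foo: Int, bar: String)

object TypedExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-example").getOrCreate()
    import spark.implicits._

    // toDS() picks up the implicit Encoder[FooBarTyped]; bar becomes a real string column
    val ds = Seq(FooBarTyped(1, "a")).toDS()
    ds.printSchema()

    spark.stop()
  }
}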

The UDT API provides a bit more flexibility and allows for limited polymorphism, but it is private, not fully compatible with the Dataset API, and comes with a significant performance and storage penalty.

If, for a given execution, all values are of the same type, you can of course create specialized classes and decide which one to use at run time, as sketched below.
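
A minimal sketch of that idea, assuming only two possible payload types; the names FooBarString, FooBarInt, and the useStrings flag are hypothetical, chosen just for illustration:

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical specialized variants that replace the single `bar: Any` field
case class FooBarString(foo: Int, bar: String)
case class FooBarInt(foo: Int, bar: Int)

object SpecializedExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("specialized-example").getOrCreate()
    import spark.implicits._

    // Decide once, at run time, which representation this execution uses;
    // each branch yields a Dataset with a fully typed schema.
    val useStrings = args.headOption.contains("strings")

    if (useStrings) {
      val ds: Dataset[FooBarString] = Seq(FooBarString(1, "a")).toDS()
      ds.show()
    } else {
      val ds: Dataset[FooBarInt] = Seq(FooBarInt(1, 42)).toDS()
      ds.show()
    }

    spark.stop()
  }
}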
