GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table

五迷三道 提交于 2019-12-01 19:26:13

What you actually get during map phase is not an ArrayBuffer[(Int, String)] but an ArrayBuffer[Row] hence the error. Ignoring other columns what you need is something like this:

import org.apache.spark.sql.Row

tempHiveQL.map((a: Row) =>
    a.getAs[Seq[Row]](4).map{case Row(k: Int, v: String) => (k, v)}.toSet)

It looks like this issue has been fixed in Spark 1.5.0.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!