Converting String RDD to Int RDD

Posted by こ雲淡風輕ζ on 2021-02-08 05:38:42

Question


I am new to Scala. When processing large datasets with Scala in Spark, is it possible to read the data as an RDD of Int instead of an RDD of String?

I tried the following:

val intArr = sc
              .textFile("Downloads/data/train.csv")
              .map(line=>line.split(","))
              .map(_.toInt)

But I am getting the error:

error: value toInt is not a member of Array[String]

I need to convert to an Int RDD because further down the line I need to do the following:

val vectors = intArr.map(p => Vectors.dense(p))

which requires the type to be integer

Any kind of help is truly appreciated. Thanks in advance.


Answer 1:


As far as I understand, each line should produce one vector, so it should look like this:

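import org.apache.spark.mllib.linalg.Vectors // assumed: RDD-based MLlib API

val result = sc
           .textFile("Downloads/data/train.csv")
           .map(line => line.split(","))
           // parse each field as Int, then widen to Double for Vectors.dense
           .map(numbers => Vectors.dense(numbers.map(_.toInt.toDouble)))

numbers.map(_.toInt) converts every element of the array to Int, so its result type is Array[Int]. Vectors.dense accepts only Double values, so each Int is widened with .toDouble before the vector is built.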
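
The error in the question comes from split returning an Array[String], so _.toInt was being called on the whole array rather than on each field. To check the types at each step without a cluster, here is a minimal sketch that uses plain Scala collections in place of the RDD (an assumption for illustration only; RDD.map takes the same kind of function). The sample lines and the names SplitToInt, parseLine and toDoubles are hypothetical, not part of Spark.

```scala
object SplitToInt {
  // Split one CSV line and convert each field to Int.
  // The inner map runs per field, which is what the question's code was missing.
  def parseLine(line: String): Array[Int] =
    line.split(",").map(_.trim.toInt)

  // Widen Int values to Double, the element type Vectors.dense works with.
  def toDoubles(row: Array[Int]): Array[Double] =
    row.map(_.toDouble)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows standing in for the lines of train.csv.
    val lines = Seq("1,2,3", "4,5,6")

    val intRows: Seq[Array[Int]] = lines.map(parseLine)
    val doubleRows: Seq[Array[Double]] = intRows.map(toDoubles)

    intRows.foreach(row => println(row.mkString(",")))
    doubleRows.foreach(row => println(row.mkString(",")))
  }
}
```

With a real RDD the same two functions could be passed to map in the same order. Note that toInt throws a NumberFormatException on any non-numeric field, so a header row in the CSV would need to be dropped first.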



Source: https://stackoverflow.com/questions/39727964/converting-string-rdd-to-int-rdd
