Splitting strings in Apache Spark using Scala

独厮守ぢ 2021-02-03 14:24

I have a dataset, which contains lines in the format (tab separated):

Title<\t>Text

Now for every word in Text, I want to c

4 Answers
  •  不要未来只要你来
    2021-02-03 15:13

    The answer provided above is not good enough. .map(line => line.split("\t")) may cause:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 4 times, most recent failure: Lost task 0.3 in stage 18.0 (TID 1485, ip-172-31-113-181.us-west-2.compute.internal, executor 10): java.lang.RuntimeException: Error while encoding: java.lang.ArrayIndexOutOfBoundsException: 14

    in case the last column is empty. The best solution is explained here: Split 1 column into 3 columns in spark scala
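
    The failure on an empty last column comes from how String.split works on the JVM: without a limit argument it discards trailing empty strings, so a row ending in a tab yields fewer columns than expected. A minimal sketch of the difference, assuming a made-up sample row and a hypothetical helper name splitKeepEmpty (neither is from the original thread or the linked answer):

```scala
object SplitTrailingEmpty {
  // Hypothetical helper: split a tab-separated line and keep trailing empty columns.
  // A negative limit tells String.split not to discard trailing empty strings.
  def splitKeepEmpty(line: String): Array[String] = line.split("\t", -1)

  def main(args: Array[String]): Unit = {
    // Assumed sample row whose last column is empty.
    val line = "Title\tText\t"

    val dropped = line.split("\t")     // trailing empty column is discarded
    val kept    = splitKeepEmpty(line) // trailing empty column is preserved

    println(dropped.length) // 2
    println(kept.length)    // 3
  }
}
```

    In a Spark job the same call would presumably go inside the map, e.g. .map(line => line.split("\t", -1)), so that every row produces the same number of columns.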
