I have a dataset, which contains lines in the format (tab separated):
Title<\t>Text
Now for every word in Text, I want to c
The answer provided above is not good enough.
.map( line => line.split("\t") )
may cause:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 4 times, most recent failure: Lost task 0.3 in stage 18.0 (TID 1485, ip-172-31-113-181.us-west-2.compute.internal, executor 10): java.lang.RuntimeException: Error while encoding: java.lang.ArrayIndexOutOfBoundsException: 14
when the last column is empty. The best solution is explained here: Split 1 column into 3 columns in spark scala
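For what it's worth, the likely reason the last column goes missing is that `String.split` with a single argument drops trailing empty strings, so a line that ends in a tab yields a shorter array than expected. Below is a minimal plain-Scala sketch (no Spark needed) that shows the difference. `SplitDemo` and `splitKeepEmpty` are names I made up for illustration, and this is not necessarily the approach used in the linked answer.

```scala
object SplitDemo {
  // Hypothetical helper: split a tab-separated line and keep trailing
  // empty fields. A negative limit tells String.split not to discard them.
  def splitKeepEmpty(line: String): Array[String] = line.split("\t", -1)

  def main(args: Array[String]): Unit = {
    val line = "Title\t" // the last column (Text) is empty

    // Default split: the trailing empty string is removed.
    val dropped = line.split("\t")
    println(dropped.length) // 1

    // Split with limit -1: the empty last column is preserved.
    val kept = splitKeepEmpty(line)
    println(kept.length) // 2
  }
}
```

In the Spark job the same change would presumably be `.map( line => line.split("\t", -1) )`, assuming every line really has the expected number of tab separators.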