I'm trying to run this example from spark.apache.org (code is below; the entire tutorial is here: https://spark.apache.org/docs/latest/mllib-feature-extraction.html) using
sc.textFile splits on newlines only, and text8 contains no newlines.
So you are creating a 1-row RDD, and .map(line => line.split(" ").toSeq) just turns it into another 1-row RDD of type RDD[Seq[String]].
Word2Vec works best with one sentence per RDD row (and this should also avoid Java heap errors). Unfortunately text8 has had its periods stripped out, so you can't just split on them, but you can find the raw version here, as well as the Perl script used to process it, and it isn't hard to edit the script so that it keeps the periods.
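If you'd rather not regenerate the corpus, a rough stopgap is to cut the single long line into fixed-size runs of words so the RDD ends up with many rows. This is not real sentence splitting, just a stand-in for it. The sketch below is plain Scala; `Text8Chunks`, `chunkWords` and `wordsPerRow` are made-up names for illustration, not part of Spark.

```scala
// Hypothetical helper (not a Spark API): cut one long line of
// whitespace-separated words into rows of at most `wordsPerRow` words.
object Text8Chunks {
  def chunkWords(line: String, wordsPerRow: Int): Seq[Seq[String]] = {
    require(wordsPerRow > 0, "wordsPerRow must be positive")
    line
      .split("\\s+")          // split on any run of whitespace
      .filter(_.nonEmpty)     // drop the empty token a leading space produces
      .toSeq
      .grouped(wordsPerRow)   // consecutive groups; the last one may be shorter
      .toVector
  }
}
```

In the tutorial code you would then swap the `.map(...)` for something like `sc.textFile("text8").flatMap(line => Text8Chunks.chunkWords(line, 1000))`, which still gives an RDD[Seq[String]] but with many rows instead of one. The 1000 is an arbitrary guess, not a tuned value. Note that the whole file is still read as a single line first, so this may not remove all of the memory pressure on its own.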