I am trying to implement word vectorization using Spark's MLlib. I am following the example given here.
I have a bunch of sentences which I want to give as input to t
Your input is correct. However, Word2Vec automatically drops every word that occurs fewer than a minimum number of times across the whole corpus (all sentences combined). By default this minimum is 5. In your case, it is highly likely that no word occurs 5 or more times in the data you use, so the vocabulary ends up empty.
To change the minimum number of occurrences, use setMinCount(). For example, to require at least 2 occurrences:
import org.apache.spark.mllib.feature.Word2Vec
val word2vec = new Word2Vec().setMinCount(2)
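If you are unsure which threshold suits your data, it can help to count word occurrences yourself first. The following is a minimal sketch in plain Scala (no Spark needed) that mimics the filtering described above. The sample sentences, the whitespace tokenization, and the helper name surviving are assumptions made for illustration; they are not part of the Word2Vec API.

```scala
// Hypothetical sample data; substitute your own sentences.
val sentences = Seq(
  "spark makes word vectors",
  "word vectors need enough data",
  "spark counts every word"
)

// Count how often each token occurs across all sentences combined.
// Assumes tokens are separated by single spaces.
val counts: Map[String, Int] =
  sentences
    .flatMap(_.split(" "))
    .groupBy(identity)
    .map { case (word, occurrences) => (word, occurrences.size) }

// Hypothetical helper: the tokens that occur at least minCount times.
def surviving(minCount: Int): Set[String] =
  counts.filter { case (_, n) => n >= minCount }.keySet

println(surviving(5)) // no token occurs 5 times here, so this set is empty
println(surviving(2)) // tokens that occur at least twice
```

With this sample, a threshold of 5 leaves nothing to train on, while a threshold of 2 keeps a few words. Running the same count on your real sentences should show whether the default is the cause of the problem.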