Spark MLlib Word2Vec Error: The vocabulary size should be > 0

温柔的废话 2020-12-21 12:56

I am trying to implement word vectorization using Spark's MLlib. I am following the example given here.

I have a bunch of sentences which I want to give as input to t

1 answer

暖寄归人 (original poster)

2020-12-21 13:09
Your input is correct. However, Word2Vec automatically drops any word that occurs fewer than a minimum number of times across the whole corpus (all sentences combined). By default this minimum is 5. In your case, it is highly likely that no word occurs 5 or more times in your data, so every word is dropped and the vocabulary ends up empty.

To change the minimum required word occurrences use setMinCount(), for example a min count of 2:
```
import org.apache.spark.mllib.feature.Word2Vec
val word2vec = new Word2Vec().setMinCount(2)
```
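To see why a small dataset can end up with an empty vocabulary, here is a rough plain-Scala sketch of a minimum-count filter. It does not use Spark, and it is not Spark's actual implementation; `MinCountDemo`, `vocabulary`, and the sample sentences are made up for illustration. It assumes a word is kept when its total count is at least the minimum.

```scala
// Plain-Scala illustration (no Spark dependency) of a minimum-count vocabulary filter.
// `MinCountDemo` and `vocabulary` are hypothetical names, not part of Spark's API.
object MinCountDemo {
  // Count each token across all sentences and keep those seen at least minCount times.
  def vocabulary(sentences: Seq[Seq[String]], minCount: Int): Map[String, Int] =
    sentences.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
      .filter { case (_, count) => count >= minCount }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: only "word" and "vectors" appear twice.
    val sentences = Seq(
      "spark makes word vectors".split(" ").toSeq,
      "word vectors need enough data".split(" ").toSeq
    )
    println(vocabulary(sentences, 5)) // nothing reaches 5 occurrences, so the map is empty
    println(vocabulary(sentences, 2)) // keeps only "word" and "vectors"
  }
}
```

With the default-like threshold of 5, nothing in this tiny corpus survives, which matches the situation described in the error. Lowering the threshold, as with `setMinCount(2)` above, leaves a non-empty vocabulary.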