Word2Vec: Number of Dimensions

再見小時候 2020-12-28 17:02

I am using Word2Vec with a dataset of roughly 11,000,000 tokens, looking to do both word similarity (as part of synonym extraction for a downstream task) but I don't have a

3 Answers
  •  醉话见心
    2020-12-28 17:41

    The number of dimensions controls how prone the model is to over- or underfitting. 100 to 300 dimensions is the commonly used range. Start with one value in that range and compare the accuracy on your test set with the accuracy on your training set. The larger the dimension size, the easier it is to overfit the training set and perform badly on the test set. This parameter needs tuning when you see high accuracy on the training set and low accuracy on the test set: that gap means the dimension size is too big, and reducing it may solve the overfitting problem.
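The tuning loop described above could be sketched roughly as follows. This is only an illustration, not part of the original answer: `pick_dimension`, `evaluate`, and `max_gap` are hypothetical names, and `evaluate(dim)` is assumed to train embeddings of size `dim` (for example with gensim's `Word2Vec(vector_size=dim)`) and return a `(train_score, test_score)` pair for whatever downstream task you measure.

```python
from typing import Callable, Iterable, List, Tuple

# (dimension, train_score, test_score, train_score - test_score)
Result = Tuple[int, float, float, float]


def pick_dimension(
    candidates: Iterable[int],
    evaluate: Callable[[int], Tuple[float, float]],
    max_gap: float = 0.05,
) -> Tuple[int, List[Result]]:
    """Pick an embedding size by comparing train and test scores.

    `evaluate` is a hypothetical callback supplied by the caller. It is
    assumed to train a model with the given dimension and return
    (train_score, test_score), where higher is better.
    `max_gap` is an assumed tolerance for the train/test gap, not a
    standard value.
    """
    results: List[Result] = []
    for dim in candidates:
        train_score, test_score = evaluate(dim)
        results.append((dim, train_score, test_score, train_score - test_score))

    # Treat a large train/test gap as a sign of overfitting and skip those
    # sizes. If every size exceeds the tolerance, fall back to all of them.
    acceptable = [r for r in results if r[3] <= max_gap] or results

    # Among the remaining sizes, keep the one with the best test score.
    best = max(acceptable, key=lambda r: r[2])
    return best[0], results
```

Calling it with something like `pick_dimension([100, 200, 300], evaluate)` returns the chosen size together with the per-size scores, so the train/test gap can be inspected before settling on a value.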
