What is the minimum dataset size needed for good performance with doc2vec?
Question

How does doc2vec perform when trained on datasets of different sizes? There is no mention of dataset size in the original corpus, so I am wondering what minimum size is required to get good performance out of doc2vec.

Answer 1:

A bunch of things have been called 'doc2vec', but the name most often seems to refer to the 'Paragraph Vector' technique from Le and Mikolov. The original 'Paragraph Vector' paper describes evaluating it on three datasets: 'Stanford Sentiment Treebank': 11,825 sentences of movie