Using Word2Vec for polysemy solving problems

Submitted by 旧街凉风 on 2019-12-24 17:50:15

Question


I have some questions about Word2Vec:

  1. What determines the dimensionality of the resulting model's vectors?

  2. What are the elements of these vectors?

  3. Can I use Word2Vec to resolve polysemy (state = administrative unit vs state = condition), if I already have texts for every meaning of a word?


Answer 1:


(1) You pick the desired dimensionality, as a meta-parameter of the model. Rigorous projects with enough time may try different sizes, to see what works best for their qualitative evaluations.
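As a rough illustration of point (1), the dimensionality is simply the length you choose for each word's vector before training starts. The sketch below is not a real word2vec implementation; `init_word_vectors` and the toy vocabulary are hypothetical names used only to show that the size is an input, not something learned. (In gensim 4.x the corresponding constructor parameter is `vector_size`; older releases called it `size`.)

```python
import random

def init_word_vectors(vocabulary, vector_size, seed=0):
    """Give every word a small random starting vector of length `vector_size`.

    Hypothetical helper: training would later adjust the values,
    but never the length chosen here.
    """
    rng = random.Random(seed)
    bound = 0.5 / vector_size
    return {
        word: [rng.uniform(-bound, bound) for _ in range(vector_size)]
        for word in vocabulary
    }

vocabulary = ["state", "country", "condition"]

# Same vocabulary, two different choices of the meta-parameter.
vectors_50 = init_word_vectors(vocabulary, vector_size=50)
vectors_300 = init_word_vectors(vocabulary, vector_size=300)

print(len(vectors_50["state"]), len(vectors_300["state"]))
```

Trying several sizes then amounts to training one model per value and comparing them on whatever evaluation matters for the project.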

(2) In vanilla word2vec, the individual dimensions/elements of each word-vector (floating-point numbers) are not easily interpretable. It's only the arrangement of words as a whole that is useful: placing similar words near each other, and making relative directions (e.g. "towards 'queen' from 'king'") match human intuitions about categories and continuous properties. And, because the algorithms use explicit randomization, and optimized multi-threaded operation introduces thread-scheduling randomness into the order of training examples, even the exact same data can result in different (but equally good) vector coordinates from run to run.
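One idealized way to see why different coordinates can be "equally good" is that similarity depends only on how vectors sit relative to each other. The toy sketch below uses made-up 2-D vectors (not trained ones) and treats a second "run" as a rotated copy of the first. Real runs differ in messier ways than a pure rotation, so this is an assumption chosen for clarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rotate(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

# Made-up vectors standing in for one training run.
run_a = {"king": (0.9, 0.3), "queen": (0.8, 0.5), "apple": (-0.4, 0.9)}

# A second "run": every coordinate changes, the arrangement does not.
run_b = {word: rotate(vec, math.radians(70)) for word, vec in run_a.items()}

for first, second in [("king", "queen"), ("king", "apple")]:
    sim_a = cosine(run_a[first], run_a[second])
    sim_b = cosine(run_b[first], run_b[second])
    print(first, second, round(sim_a, 4), round(sim_b, 4))
```

The individual numbers in `run_b` bear no resemblance to those in `run_a`, yet every pairwise similarity is unchanged, which is why single coordinates carry no stable meaning.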

(3) Basic word2vec doesn't have an easy fix, but there are a number of hints of polysemy in the vectors, and there is research work on doing more to disambiguate contrasting senses.

For example, more-polysemous word-tokens generally wind up with word-vectors that are some combination of their multiple senses, and (often) of smaller magnitude than those of less-polysemous words.
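A toy calculation gives some intuition for the smaller magnitude. Word2vec does not literally average sense vectors; the sketch below simply assumes two hypothetical unit-length sense vectors pointing in different directions and shows that a blend of them is shorter than either.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def mean(vectors):
    """Element-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical sense vectors, each of length 1.
state_as_region = [1.0, 0.0, 0.0]
state_as_condition = [0.0, 1.0, 0.0]

# A single token forced to serve both senses ends up somewhere in between.
blended_state = mean([state_as_region, state_as_condition])

print(blended_state, norm(blended_state))
```

Here the blended vector has length of about 0.71: pulls in conflicting directions partly cancel, while a word used in one consistent sense has nothing to cancel against.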

This early paper used multiple representations per word to help discover polysemy. Similar later papers like this one use clustering-of-contexts to discover polysemous words then relabel them to give each sense its own vector.
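Since the question says texts are already available for each meaning, the clustering step could in principle be skipped and the relabeling done directly. The following is a minimal sketch of that idea, not the method of either paper; `relabel_sense`, the sense tags, and the example sentences are all hypothetical.

```python
def relabel_sense(sentences, target, sense_tag):
    """Replace each occurrence of `target` with `<target>_<sense_tag>`.

    `sentences` is a list of token lists, all known to use `target`
    in the same sense.
    """
    labelled = f"{target}_{sense_tag}"
    return [
        [labelled if token == target else token for token in sentence]
        for sentence in sentences
    ]

# Hypothetical pre-sorted texts, one collection per meaning.
region_texts = [["the", "state", "borders", "two", "others"]]
condition_texts = [["the", "engine", "is", "in", "a", "poor", "state"]]

# After relabeling, each sense is a distinct token and gets its own vector.
corpus = (
    relabel_sense(region_texts, "state", "region")
    + relabel_sense(condition_texts, "state", "condition")
)

print(corpus)
```

The relabeled corpus can then be fed to an ordinary word2vec trainer. Note that any new text would still need its occurrences of the word assigned to a sense before the matching vector can be looked up.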

This paper manages an impressive job of detecting alternate senses via postprocessing of normal word2vec vectors.



Source: https://stackoverflow.com/questions/51330549/using-word2vec-for-polysemy-solving-problems
