Word2vec is an open-source tool from Google for computing distances between words. You input a word, and it outputs a list of words ranked by their similarity to it.
I stumbled on this question while looking for a way to compute the similarity between two given words by modifying the original distance.c, rather than by using another library such as gensim.
I didn't find an answer, so I did some research, and I'm sharing it here for others who want to do this in the original implementation.
Looking through the C source, you will find that 'bi' is an array of vocabulary indices. If you enter two words, the index of the first word is stored in bi[0] and the index of the second word in bi[1].
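For reference, the lookup that fills bi is a plain linear scan over the vocabulary. The sketch below restates it as a standalone function. The name find_word_index is hypothetical, and it assumes the vocabulary layout used by the version of distance.c I read: each word sits in a fixed-width slot of max_w characters inside one char array.

```c
#include <string.h>

/* Hypothetical helper mirroring the lookup loop in distance.c.
   vocab holds `words` NUL-terminated strings, each in a slot of max_w chars.
   Returns the word's index (the value distance.c stores in bi[]),
   or -1 if the word is not in the vocabulary. */
long long find_word_index(const char *vocab, long long words,
                          long long max_w, const char *word) {
  for (long long b = 0; b < words; b++) {
    if (!strcmp(&vocab[b * max_w], word)) return b;
  }
  return -1; /* distance.c reports this case as an out-of-dictionary word */
}
```

In the modified distance.c you don't need this function, since bi is already filled in by the time you compute the distance; it only shows where the indices come from.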
The model 'M' is a flat array of floats that stores all word vectors back to back. Each word is a vector of 'size' floats, so the vector for the word at index i starts at M[i * size].
Using these two indices, look up both vectors in M and take their dot product. Because distance.c normalizes every vector to unit length when it loads the model, this dot product equals the cosine similarity of the two words:
```c
dist = 0;
/* Vectors in M are unit length, so this dot product is the cosine similarity */
for (a = 0; a < size; a++) {
  dist += M[a + bi[0] * size] * M[a + bi[1] * size];
}
```
After the loop completes, 'dist' holds the cosine similarity between the two words.