How to calculate the sentence similarity using word2vec model of gensim with python

一向 2020-11-28 00:31

According to Gensim Word2Vec, I can use the word2vec model in the gensim package to calculate the similarity between two words.

e.g.

trained_model.simi         


        
14 answers
  •  醉酒成梦
    2020-11-28 01:00

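    If you are not tied to Word2Vec, another option is to embed the sentences with a BERT-based model and compare the embeddings. Reference: https://github.com/UKPLab/sentence-transformers

    Install the package with the pip command below, then run the Python code that follows it.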

    pip install -U sentence-transformers
    
    from sentence_transformers import SentenceTransformer
    import scipy.spatial
    
    embedder = SentenceTransformer('bert-base-nli-mean-tokens')
    
    # Corpus with example sentences
    corpus = ['A man is eating a food.',
              'A man is eating a piece of bread.',
              'The girl is carrying a baby.',
              'A man is riding a horse.',
              'A woman is playing violin.',
              'Two men pushed carts through the woods.',
              'A man is riding a white horse on an enclosed ground.',
              'A monkey is playing drums.',
              'A cheetah is running behind its prey.'
              ]
    corpus_embeddings = embedder.encode(corpus)
    
    # Query sentences:
    queries = ['A man is eating pasta.', 'Someone in a gorilla costume is playing a set of drums.', 'A cheetah chases prey on across a field.']
    query_embeddings = embedder.encode(queries)
    
    # Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity
    closest_n = 5
    for query, query_embedding in zip(queries, query_embeddings):
        # cdist returns cosine distances (0 = same direction), one per corpus sentence
        distances = scipy.spatial.distance.cdist([query_embedding], corpus_embeddings, "cosine")[0]
    
        # Pair each corpus index with its distance and sort ascending (closest first)
        results = sorted(enumerate(distances), key=lambda x: x[1])
    
        print("\n\n======================\n\n")
        print("Query:", query)
        print("\nTop %d most similar sentences in corpus:" % closest_n)
    
        for idx, distance in results[0:closest_n]:
            # Convert the distance back to a similarity score for display
            print(corpus[idx].strip(), "(Score: %.4f)" % (1 - distance))
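
To make the ranking step in the loop above easier to follow, here is a minimal sketch of the same idea in plain Python, without sentence-transformers or scipy. The helper names `cosine_similarity` and `rank_by_similarity` and the toy 3-dimensional vectors are made up for illustration; real sentence embeddings have hundreds of dimensions. The score is the cosine similarity, which is what the code above prints as `1 - distance`.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_vec, corpus_vecs, top_n=5):
    """Return (index, score) pairs for the top_n corpus vectors, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy vectors standing in for real sentence embeddings (hypothetical values)
toy_corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
toy_query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(toy_query, toy_corpus, top_n=2)
for idx, score in ranking:
    print(idx, "(Score: %.4f)" % score)
```

Sorting by similarity in descending order gives the same ordering as sorting by cosine distance in ascending order, so either form can be used for the top-N lookup.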
    

    Another project worth a look: https://github.com/hanxiao/bert-as-service
