Check if two words are related to each other

傲寒 2020-12-18 09:56

I have two lists: the first holds the user's interests, and the second holds keywords about a book. I want to recommend the book to the user based on their list of interests. I am usi

  •  一整个雨季
    2020-12-18 10:25

    At first, I thought of using regular expressions to run additional tests that discriminate among matches with a low ratio. That can solve a specific problem, such as the one with words ending in "ing". But that's only one limited case, and there can be numerous other cases, each needing its own specific treatment.
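
    For illustration, here is a minimal sketch of that special-case approach, assuming a hypothetical strip_ing() helper that removes a trailing "ing" with re.sub before the ratio is computed. It fixes the "shooting"/"looping" pair, but it does nothing for any other suffix, which is exactly the limitation described above.

```python
import re
from difflib import SequenceMatcher

def strip_ing(word):
    """Hypothetical special case: drop a trailing 'ing' ('shooting' -> 'shoot')."""
    return re.sub(r'ing$', '', word)

def ratio(a, b):
    """Plain SequenceMatcher similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()

# Without the special case, the shared 'ing' pushes the score above 0.5.
raw = ratio('shooting', 'looping')
# With it, only the stems 'shoot' and 'loop' are compared.
stemmed = ratio(strip_ing('shooting'), strip_ing('looping'))

print('raw:     %f' % raw)
print('stemmed: %f' % stemmed)
```

    A separate rule like this would be needed for "-ed", "-er", plurals, and so on.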

    Then I thought we could look for an additional criterion that eliminates words which are not semantically related but whose letter-similarity ratio is just high enough for them to be detected as matching,
    WHILE at the same time catching genuinely matching terms whose ratio is low only because they are short.

    Here's one possibility:

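    from difflib import SequenceMatcher
    
    interests = ('shooting', 'gaming', 'looping')
    keywords = ('loop', 'looping', 'game')
    
    # One matcher reused for every pair (see the note on set_seq2 below).
    s = SequenceMatcher(None)
    
    limit = 0.50  # minimum similarity ratio required for a match
    
    for interest in interests:
        s.set_seq2(interest)            # cached side: set once per interest
        for keyword in keywords:
            s.set_seq1(keyword)         # varying side: one keyword at a time
            ratio = s.ratio()
            # get_matching_blocks() always ends with a dummy block of size 0,
            # so a length of 2 means the words share exactly one contiguous run.
            is_match = ratio >= limit and len(s.get_matching_blocks()) == 2
            print('%10s %-10s  %f  %s' % (interest, keyword, ratio,
                                          '** MATCH **' if is_match else ''))
        print()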
    

    gives

      shooting loop        0.333333  
      shooting looping     0.666667  
      shooting game        0.166667  
    
        gaming loop        0.000000  
        gaming looping     0.461538  
        gaming game        0.600000  ** MATCH **
    
       looping loop        0.727273  ** MATCH **
       looping looping     1.000000  ** MATCH **
       looping game        0.181818  
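
    To see why the block-count test rejects "shooting"/"looping" (ratio 0.67) while keeping "gaming"/"game" (ratio 0.60), you can print the matching blocks directly. get_matching_blocks() returns Match(a, b, size) triples and always ends with a dummy block of size 0, so the rejected pair has two real common runs while the accepted pair has only one. The argument order mirrors the loop above (keyword as the first sequence).

```python
from difflib import SequenceMatcher

pairs = [('looping', 'shooting'), ('game', 'gaming')]

for keyword, interest in pairs:
    s = SequenceMatcher(None, keyword, interest)
    blocks = s.get_matching_blocks()
    # Skip the trailing dummy block (size 0) and keep only the real common runs.
    runs = [keyword[m.a:m.a + m.size] for m in blocks if m.size]
    print('%-8s %-8s ratio=%.2f runs=%s' % (keyword, interest, s.ratio(), runs))
```

    "looping" and "shooting" share "oo" and "ing" as two separate runs, which is what pushes their ratio up without the words being related.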
    

    Note this from the difflib documentation:

    SequenceMatcher computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use set_seq2() to set the commonly used sequence once and call set_seq1() repeatedly, once for each of the other sequences.
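
    Following that advice, the loop above can be wrapped into a small function that answers the original question (should this book be recommended?). This is only a sketch: the name matching_pairs, the default limit of 0.50, and the "recommend if any pair matches" rule are assumptions for illustration, not part of difflib.

```python
from difflib import SequenceMatcher

def matching_pairs(interests, keywords, limit=0.50):
    """Hypothetical helper: return (interest, keyword) pairs that pass both tests.

    A pair matches when the similarity ratio is at least `limit` and the two
    words share exactly one contiguous run of characters.
    """
    s = SequenceMatcher(None)
    pairs = []
    for interest in interests:
        # Cached side: set once per interest, as the difflib docs recommend.
        s.set_seq2(interest)
        for keyword in keywords:
            s.set_seq1(keyword)
            # get_matching_blocks() ends with a size-0 dummy block, hence == 2.
            if s.ratio() >= limit and len(s.get_matching_blocks()) == 2:
                pairs.append((interest, keyword))
    return pairs

matches = matching_pairs(('shooting', 'gaming', 'looping'),
                         ('loop', 'looping', 'game'))
recommend = bool(matches)  # assumed rule: recommend if any interest matches
print(matches)
print('recommend:', recommend)
```

    Returning the pairs rather than a bare boolean keeps the reason for each recommendation visible, which helps when tuning the limit.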
