Fuzzy Group By, Grouping Similar Words

耶瑟儿~ 2020-12-10 07:44

This question has been asked here before:

What is a good strategy to group similar words?

but no clear answer was given on how to "group" items. The solution based

5 Answers
  •  陌清茗 (original poster)
    2020-12-10 08:06

    Here is an approach based on medoids. First, install mlpy. On Ubuntu:

    sudo apt-get install python-mlpy
    

    Then

    import numpy as np
    import mlpy
    
    class distance:
        """Levenshtein (edit) distance, passed to mlpy.Kmedoids as `dist`."""
        def compute(self, s1, s2):
            l1 = len(s1)
            l2 = len(s2)
            # matrix[zz][sz] holds the distance between s2[:zz] and s1[:sz]
            matrix = [list(range(zz, zz + l1 + 1)) for zz in range(l2 + 1)]
            for zz in range(l2):
                for sz in range(l1):
                    # matching characters cost 0, a substitution costs 1
                    cost = 0 if s1[sz] == s2[zz] else 1
                    matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1,   # skip a character of s1
                                             matrix[zz][sz+1] + 1,   # skip a character of s2
                                             matrix[zz][sz] + cost)  # match or substitute
            return matrix[l2][l1]
    
    x =  np.array(['ape', 'appel', 'apple', 'peach', 'puppy'])
    
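    # k-medoids with k=3 clusters, using the edit distance defined above
    km = mlpy.Kmedoids(k=3, dist=distance())
    # medoids: indices of the medoids; clusters: indices of the remaining items;
    # a: cluster number of each remaining item; b: not used here
    medoids,clusters,a,b = km.compute(x)
    
    print(medoids)
    print(clusters)
    print(a)
    
    print(x[medoids])
    for i,c in enumerate(x[medoids]):
        print("medoid %s" % c)
        print(x[clusters[a==i]])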
    

    The output is

    [4 3 1]
    [0 2]
    [2 2]
    ['puppy' 'peach' 'appel']
    medoid puppy
    []
    medoid peach
    []
    medoid appel
    ['ape' 'apple']
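
    One way to read the arrays printed above, assuming `km.compute` returns the medoid indices, the indices of the remaining items, and the cluster number of each remaining item: the sketch below rebuilds the groups in plain Python 3 from the printed values. The names `words`, `non_medoids`, `labels` and `groups` are illustrative and not part of mlpy.

```python
# Values copied from the example output above.
words = ['ape', 'appel', 'apple', 'peach', 'puppy']
medoids = [4, 3, 1]      # first array printed: indices of the medoids
non_medoids = [0, 2]     # second array printed: indices of the other words
labels = [2, 2]          # third array printed: position in `medoids` for each other word

# Start every group with its own medoid, then attach each remaining word.
groups = {words[m]: [words[m]] for m in medoids}
for idx, label in zip(non_medoids, labels):
    groups[words[medoids[label]]].append(words[idx])

for medoid, members in groups.items():
    print(medoid, members)
```

    Read this way, 'puppy' and 'peach' end up alone in their clusters, which is why the loop in the answer prints an empty list for them: it lists only the non-medoid members.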
    

    With a bigger word list and k=10, the output is:

    medoid he
    ['or' 'his' 'my' 'have' 'if' 'year' 'of' 'who' 'us' 'use' 'people' 'see'
     'make' 'be' 'up' 'we' 'the' 'one' 'her' 'by' 'it' 'him' 'she' 'me' 'over'
     'after' 'get' 'what' 'I']
    medoid out
    ['just' 'only' 'your' 'you' 'could' 'our' 'most' 'first' 'would' 'but'
     'about']
    medoid to
    ['from' 'go' 'its' 'do' 'into' 'so' 'for' 'also' 'no' 'two']
    medoid now
    ['new' 'how' 'know' 'not']
    medoid time
    ['like' 'take' 'come' 'some' 'give']
    medoid because
    []
    medoid an
    ['want' 'on' 'in' 'back' 'say' 'and' 'a' 'all' 'can' 'as' 'way' 'at' 'day'
     'any']
    medoid look
    ['work' 'good']
    medoid will
    ['with' 'well' 'which']
    medoid then
    ['think' 'that' 'these' 'even' 'their' 'when' 'other' 'this' 'they' 'there'
     'than' 'them']
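
    If mlpy is not available, the same idea can be sketched without it. The block below is a minimal Python 3 illustration, not mlpy's implementation: it uses a simple alternating scheme (assign each word to its nearest medoid, then re-pick each cluster's medoid), and the names `levenshtein`, `assign` and `k_medoids`, as well as the `max_iter` and `seed` parameters, are hypothetical. Results depend on the random start, so they may differ from the output above.

```python
import random

def levenshtein(s1, s2):
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                 # drop c1
                            curr[j - 1] + 1,             # add c2
                            prev[j - 1] + (c1 != c2)))   # match or substitute
        prev = curr
    return prev[-1]

def assign(items, medoids, dist):
    """Map each medoid index to the indices of the items closest to it."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: dist(item, items[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(items, k, dist, max_iter=100, seed=0):
    """Alternating k-medoids; returns {medoid: [members, medoid included]}."""
    items = list(dict.fromkeys(items))   # drop duplicates so no cluster ends up empty
    rng = random.Random(seed)
    medoids = rng.sample(range(len(items)), k)
    clusters = assign(items, medoids, dist)
    for _ in range(max_iter):
        # New medoid of a cluster: the member with the smallest total distance to the rest.
        new_medoids = [
            min(members, key=lambda c: sum(dist(items[c], items[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
        clusters = assign(items, medoids, dist)
    return {items[m]: [items[i] for i in members] for m, members in clusters.items()}

words = ['ape', 'appel', 'apple', 'peach', 'puppy']
for medoid, members in k_medoids(words, k=3, dist=levenshtein).items():
    print("medoid", medoid, members)
```

    Unlike the listing in the answer, each group here includes its own medoid, so no group is printed as empty.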
    
