Co-occurrence matrix from nested list of words

无人及你 2020-11-30 10:27

I have a list of names like:

names = ['A', 'B', 'C', 'D']

and a list of documents, where in each document some of these names are m

8 Answers
  •  情话喂你
    2020-11-30 10:36

    Here is another solution using itertools and the Counter class from the collections module.

    import numpy
    import itertools
    from collections import Counter
    
    document = [['A', 'B'], ['C', 'B'], ['A', 'B', 'C', 'D']]
    
    # Get all of the unique entries you have
    varnames = tuple(sorted(set(itertools.chain(*document))))
    
    # Get a list of all of the combinations you have
    expanded = [tuple(itertools.combinations(d, 2)) for d in document]
    expanded = itertools.chain(*expanded)
    
    # Sort the combinations so that A,B and B,A are treated the same
    expanded = [tuple(sorted(d)) for d in expanded]
    
    # count the combinations
    c = Counter(expanded)
    
    
    # Create the table
    table = numpy.zeros((len(varnames),len(varnames)), dtype=int)
    
    # Fill the upper triangle and mirror it so the table is symmetric
    for i, v1 in enumerate(varnames):
        for j, v2 in enumerate(varnames[i:], start=i):
            table[i, j] = c[v1, v2]
            table[j, i] = c[v1, v2]
    
    # Display the output
    for row in table:
        print(row)
    

    The output (which could easily be turned into a DataFrame) is:

    [0 2 1 1]
    [2 0 2 1]
    [1 2 0 1]
    [1 1 1 0]
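
    As a minimal sketch of the DataFrame conversion mentioned above, assuming pandas is installed: the block rebuilds the same table in condensed form and labels both axes with the sorted names. The names `pairs` and `df` are illustrative and not part of the original answer.

```python
import itertools
from collections import Counter

import numpy
import pandas

document = [['A', 'B'], ['C', 'B'], ['A', 'B', 'C', 'D']]

# Same steps as the answer: unique names, then counts of sorted pairs
varnames = tuple(sorted(set(itertools.chain(*document))))
pairs = Counter(
    tuple(sorted(pair))
    for d in document
    for pair in itertools.combinations(d, 2)
)

# Symmetric co-occurrence table, filled from the upper triangle
table = numpy.zeros((len(varnames), len(varnames)), dtype=int)
for i, v1 in enumerate(varnames):
    for j, v2 in enumerate(varnames[i:], start=i):
        table[i, j] = table[j, i] = pairs[v1, v2]

# Label rows and columns with the names so cells can be looked up by name
df = pandas.DataFrame(table, index=list(varnames), columns=list(varnames))
print(df)
```

    With the labels in place, a single count can be read by name, for example `df.loc['A', 'B']`, which corresponds to the 2 in the first row of the output above.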
    
