How to normalize a Counter and combine 2 normalized Counters? - python

Submitted by 你说的曾经没有我的故事 on 2019-12-06 02:15:22

Question


Firstly, I have two lists of strings:

['abc','abc','def','jkl']
['abc','def','def','pqr', 'pr', 'foo', 'bar']

And then I need counters of the lists that are normalized such that the sum of the values in each counter equals 1:

Counter({'abc': 0.8164965809277261, 'jkl': 0.4082482904638631, 'def': 0.4082482904638631})
Counter({'abc': 1.1498299142610595, 'def': 1.0749149571305296, 'jkl': 0.4082482904638631, 'pr': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})

The normalizing factor is

math.sqrt(sum(i*i for i in counter.values()))
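This factor is the Euclidean (L2) norm of the counts. Dividing by it makes the *squared* values sum to 1, not the values themselves. The following is a minimal check on the first list; the helper name `l2_norm` is hypothetical and not from the question:

```python
import math
from collections import Counter

def l2_norm(counter):
    # Euclidean norm of the counts: sqrt of the sum of squared values
    return math.sqrt(sum(v * v for v in counter.values()))

x = Counter(['abc', 'abc', 'def', 'jkl'])
norm = l2_norm(x)                       # sqrt(2*2 + 1*1 + 1*1) = sqrt(6)
unit = {k: v / norm for k, v in x.items()}

print(round(sum(v * v for v in unit.values()), 10))  # 1.0: the squares sum to 1
print(round(sum(unit.values()), 4))                  # 1.633: the plain sum does not
```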

I've tried the following, iterating through the counter keys, but is there another way of achieving, say, the x+y Counter?

>>> from collections import Counter
>>> import math
>>> x = Counter(['abc','abc','def','jkl'])
>>> denominator = 1/math.sqrt(sum(math.pow(i,2) for i in x.values()))
>>> for i in x:
...     x[i]*=denominator
... 
>>> x
Counter({'abc': 0.8164965809277261, 'jkl': 0.4082482904638631, 'def': 0.4082482904638631})
>>> y = Counter(['abc','def','def','pqr', 'pr', 'foo', 'bar'])
>>> denominator2 = 1/math.sqrt(sum(math.pow(i,2) for i in y.values()))
>>> for i in y:
...     y[i]*=denominator2
... 
>>> y
Counter({'def': 0.6666666666666666, 'pr': 0.3333333333333333, 'abc': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})
>>> x+y
Counter({'abc': 1.1498299142610595, 'def': 1.0749149571305296, 'jkl': 0.4082482904638631, 'pr': 0.3333333333333333, 'bar': 0.3333333333333333, 'pqr': 0.3333333333333333, 'foo': 0.3333333333333333})
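A more compact alternative to the explicit loops is a helper that builds a new Counter with a dict comprehension and leaves the original untouched. This is a sketch that reuses the question's L2 factor; `normalized` is a hypothetical helper name. Note that Counter's `+` operator keeps only keys whose combined value is positive, which is harmless here because every normalized count is positive.

```python
import math
from collections import Counter

def normalized(counter):
    """Return a new Counter scaled by the question's L2 factor (assumes a non-empty counter)."""
    norm = math.sqrt(sum(v * v for v in counter.values()))
    return Counter({k: v / norm for k, v in counter.items()})

x = normalized(Counter(['abc', 'abc', 'def', 'jkl']))
y = normalized(Counter(['abc', 'def', 'def', 'pqr', 'pr', 'foo', 'bar']))

# Key-wise sum; a key missing from one side contributes 0
combined = x + y
print(combined)
```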

Answer 1:


You need to sum the values, then divide each count by the sum:

total = sum(x.values(), 0.0)
for key in x:
    x[key] /= total

By starting the sum with 0.0 we make sure total is a floating point value, avoiding the Python 2 floor division behaviour of / with integer operands.

Demo:

>>> from collections import Counter
>>> x = Counter(['abc','abc','def','jkl'])
>>> total = sum(x.values(), 0.0)
>>> for key in x:
...     x[key] /= total
... 
>>> x
Counter({'abc': 0.5, 'jkl': 0.25, 'def': 0.25})
>>> y = Counter(['abc','def','def','pqr', 'pr', 'foo', 'bar'])
>>> total = sum(y.values(), 0.0)
>>> for key in y:
...     y[key] /= total
... 
>>> y
Counter({'def': 0.2857142857142857, 'pr': 0.14285714285714285, 'abc': 0.14285714285714285, 'bar': 0.14285714285714285, 'pqr': 0.14285714285714285, 'foo': 0.14285714285714285})
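In Python 3, `/` always performs true division, so the `0.0` start value only matters for Python 2 (it is harmless either way). If you would rather not modify the original counter in place, a minimal sketch builds a new one instead; the name `sum_normalized` is hypothetical:

```python
from collections import Counter

def sum_normalized(counter):
    """Return a new Counter whose values sum to 1 (assumes a non-empty counter)."""
    total = sum(counter.values(), 0.0)
    return Counter({k: v / total for k, v in counter.items()})

x = Counter(['abc', 'abc', 'def', 'jkl'])
px = sum_normalized(x)
print(px['abc'], px['def'])  # 0.5 0.25
print(x['abc'])              # 2: the original counts are unchanged
```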

If you need to sum the counters, you'd need to re-normalize the resulting counter separately; summing two normalized counters gives you a new counter whose values sum to 2, for example.
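To make that concrete: adding two sum-normalized counters gives values that total 2, and dividing by the new total brings them back to 1. Because both inputs sum to 1, this re-normalization is the same as averaging the two distributions key by key. It differs from normalizing the raw combined counts, which would weight each list by its length. A sketch, assuming a hypothetical `sum_normalized` helper that applies the answer's divide-by-total step:

```python
from collections import Counter

def sum_normalized(counter):
    # Divide every count by the total so the values sum to 1
    total = sum(counter.values(), 0.0)
    return Counter({k: v / total for k, v in counter.items()})

px = sum_normalized(Counter(['abc', 'abc', 'def', 'jkl']))
py = sum_normalized(Counter(['abc', 'def', 'def', 'pqr', 'pr', 'foo', 'bar']))

both = px + py
print(sum(both.values()))        # about 2.0, since each input sums to 1

combined = sum_normalized(both)  # back to a total of 1
print(combined['abc'])           # equals (0.5 + 1/7) / 2
```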




Answer 2:


Normalizing a Counter object (c1) built from a list (l1) means dividing each count by the total number of elements in the list, i.e. the length of the list (total). This is cheaper than computing the total count from c1 with sum(c1.values(), 0.0), because len() does not have to iterate over the counts.

The following example uses the first list from the question:

from collections import Counter

l1 = ['abc','abc','def','jkl']
c1 = Counter(l1)
# Normalization: divide each count by the number of elements in the list
total = 1.0 * len(l1) # converting to float to avoid floor division in Python 2.x
for k in c1:
    c1[k] /= total
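The shortcut relies on the original list still being at hand and on the Counter not having been modified since it was built: a Counter created directly from a list counts every element exactly once, so its values add up to len(list). A small sketch checking that both totals give the same result (the names `by_len`, `by_sum` and `values_total` are illustrative only):

```python
from collections import Counter

l1 = ['abc', 'abc', 'def', 'jkl']
c1 = Counter(l1)

# Each list element is counted exactly once, so the counts add up to len(l1)
values_total = sum(c1.values())
assert values_total == len(l1)

total = len(l1)  # Python 3: '/' is true division, no float conversion needed
by_len = {k: v / total for k, v in c1.items()}
by_sum = {k: v / values_total for k, v in c1.items()}
print(by_len == by_sum)  # True
```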


Source: https://stackoverflow.com/questions/22428842/how-to-normalize-a-counter-and-combine-2-normalized-counters-python
