Question
I am using pandas to create a DataFrame that looks like this:
import pandas

ratings = pandas.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1]
}, index=['Alice', 'Bob', 'Carol', 'Dave'])
I would like to compute another matrix from this one by comparing each row against every other row. Suppose, for example, that the computation is a function that returns the size of the intersection of two rows. I'd like an output DataFrame whose first row contains len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), and len(intersection(Alice,Dave)), with each subsequent row following the same pattern against the others. For this example input, the output matrix would be 4x3:
len(intersection(Alice,Bob)),len(intersection(Alice,Carol)),len(intersection(Alice,Dave))
len(intersection(Bob,Alice)),len(intersection(Bob,Carol)),len(intersection(Bob,Dave))
len(intersection(Carol,Alice)),len(intersection(Carol,Bob)),len(intersection(Carol,Dave))
len(intersection(Dave,Alice)),len(intersection(Dave,Bob)),len(intersection(Dave,Carol))
Is there a named method for this kind of function-based computation in pandas? What would be the most efficient way to accomplish this?
Answer 1:
I am not aware of a named method, but I have a one-liner.
In [21]: ratings.apply(lambda row: ratings.apply(
... lambda x: np.equal(row, x), 1).sum(1), 1)
Out[21]:
       Alice  Bob  Carol  Dave
Alice      5    3      2     0
Bob        3    5      4     2
Carol      2    4      5     3
Dave       0    2      3     5
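Note that the one-liner above counts the positions where two rows hold the same value, shared zeros included, which is not the same as the intersection size the question uses as its example. The following is a minimal sketch of both quantities without nested `apply` calls. It assumes the ratings are strictly 0/1, and the names `vals`, `matches`, and `shared` are illustrative only.

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

vals = ratings.to_numpy()

# Equality count via NumPy broadcasting:
# vals[:, None, :] has shape (4, 1, 5) and vals[None, :, :] has shape (1, 4, 5),
# so the comparison yields a (4, 4, 5) boolean array that is summed over articles.
matches = pd.DataFrame((vals[:, None, :] == vals[None, :, :]).sum(axis=2),
                       index=ratings.index, columns=ratings.index)

# Intersection size, assuming 0/1 ratings: the dot product of two rows
# counts the articles that both people rated 1.
shared = ratings.dot(ratings.T)

print(matches)
print(shared)
```

The broadcasting version builds an intermediate array with one entry per (row, row, column) triple, so it trades memory for speed. For large frames the matrix product is likely the cheaper of the two.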
Answer 2:
@Dan Allan's solution is 'right'; here's a slightly different way of approaching the problem.
In [26]: ratings
Out[26]:
       article_a  article_b  article_c  article_d  article_e
Alice          1          1          1          0          0
Bob            1          0          0          0          0
Carol          0          0          0          0          0
Dave           0          0          0          1          1
In [27]: ratings.apply(lambda x: (ratings.T.sub(x,'index')).sum(),1)
Out[27]:
       Alice  Bob  Carol  Dave
Alice      0   -2     -3    -1
Bob        2    0     -1     1
Carol      3    1      0     2
Dave       1   -1     -2     0
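Each cell in the output above works out to the column person's row total minus the row person's row total, so the same table can be built from the row sums alone. This is a minimal sketch using NumPy broadcasting on the same `ratings` frame; the names `totals` and `diff` are illustrative only.

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

# Row totals: the sum of each person's ratings.
totals = ratings.sum(axis=1).to_numpy()

# diff[i, j] = totals[j] - totals[i]; broadcasting a row vector against
# a column vector fills the whole 4x4 table in one step.
diff = pd.DataFrame(totals[None, :] - totals[:, None],
                    index=ratings.index, columns=ratings.index)

print(diff)
```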
Source: https://stackoverflow.com/questions/16924421/pandas-apply-function-to-current-row-against-all-other-rows