Inputs for nDCG in sklearn

本秂侑毒 submitted on 2019-12-10 15:01:30

Question


I'm unable to understand the input format of sklearn's nDCG: http://sklearn.apachecn.org/en/0.19.0/modules/generated/sklearn.metrics.ndcg_score.html

Currently I have the following problem: I have multiple queries, and for each of them I have successfully calculated the ranking probabilities. Now I need to calculate nDCG on the test set, and I would like to use sklearn's nDCG function for that. The example given on the linked page is:

>>> y_true = [1, 0, 2]
>>> y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> ndcg_score(y_true, y_score, k=2)
1.0

According to the documentation, y_true is the ground truth and y_score contains the probabilities. So my questions are:

  1. Is this example for just one query or for multiple queries?
  2. If it is for just one query, what does y_true represent: the original rankings?
  3. If it is for a single query, why are there multiple input probabilities?
  4. How can this method be applied to multiple queries and their resulting probabilities?

Answer 1:


You can look at it as similar to a multiclass classification problem.

So, to answer your questions:

  1. Is this example for just one query or for multiple queries?

One query.

  2. If it is for just one query, what does y_true represent: the original rankings?

I would refer to it as the relevance labels of the documents, since it may contain duplicate values.
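To make the "relevance labels" reading concrete, here is a minimal pure-Python sketch of nDCG for a single query. It assumes linear gains (relevance / log2(position + 1)), the form used by recent scikit-learn releases; the function names and the four-document data are hypothetical. Two of the documents share the label 1, which a strict ranking could not express.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values given in ranked order.

    Linear gains: sum of rel / log2(position + 1), with positions starting at 1.
    """
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg_single_query(y_true, y_score, k=None):
    """nDCG@k for ONE query (hypothetical helper).

    y_true:  relevance label of each document (duplicates allowed, e.g. 0/1/2)
    y_score: predicted score of each document (higher means ranked earlier)
    k:       only the top-k positions count; None means all documents
    """
    # Order document indices by predicted score, highest first.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order][:k])
    # Ideal DCG: the same labels placed in the best possible order.
    ideal = dcg(sorted(y_true, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# One query with four documents; two of them share relevance label 1.
relevance = [2, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4]
print(ndcg_single_query(relevance, scores))
print(ndcg_single_query(relevance, scores, k=2))
```

Here the model ranks the irrelevant document second, so the score falls below 1.0. Swapping the scores of the two documents labelled 1 would not change the result, because they carry the same gain.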

  3. If it is for a single query, why are there multiple input probabilities?

y_score gives, for each document, a probability distribution over the classes it may belong to. In your example, y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]] means that the 0th document belongs to class 1 (0.55 is the maximum), the 1st document belongs to class 0 (0.7 is the maximum) and the 2nd document belongs to class 2 (0.9 is the maximum). The documentation is lacking and the example is misleading as well. It would be clearer with four documents, so that the number of documents (rows) differs from the number of classes (columns).
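The argmax reading above, and the example's reported result of 1.0, can be reproduced with the sketch below. It is one possible interpretation of the 0.19 example, not scikit-learn's actual code: for each row (document) the classes are ranked by probability, the true class from y_true gets relevance 1 and every other class 0, nDCG@k is computed per row, and the rows are averaged. The helper names `row_ndcg` and `multiclass_ndcg` are hypothetical.

```python
import math


def row_ndcg(true_class, class_probs, k):
    """nDCG@k of one row, where the true class has relevance 1 and every other class 0."""
    # Rank this row's classes from most to least probable and keep the top k.
    ranked = sorted(range(len(class_probs)), key=lambda c: class_probs[c], reverse=True)[:k]
    # Only the true class carries gain, so DCG is 1 / log2(position + 1) if it is in the top k.
    dcg = sum(1 / math.log2(pos + 1) for pos, c in enumerate(ranked, start=1) if c == true_class)
    # The ideal DCG is 1.0 (true class ranked first, assuming k >= 1), so no division is needed.
    return dcg


def multiclass_ndcg(y_true, y_score, k):
    """Mean of the per-row nDCG@k values."""
    scores = [row_ndcg(t, row, k) for t, row in zip(y_true, y_score)]
    return sum(scores) / len(scores)


y_true = [1, 0, 2]
y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]

# Most probable class of each document (row); here it matches y_true exactly.
predicted = [max(range(len(row)), key=lambda c: row[c]) for row in y_score]
print(predicted)                              # [1, 0, 2]
print(multiclass_ndcg(y_true, y_score, k=2))  # 1.0
```

Because every row's most probable class equals its label in y_true, each row scores 1.0 and so does the average.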

  4. How can this method be applied to multiple queries and their resulting probabilities?

You can compute the nDCG score for each query separately and then average these scores across all queries.
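Newer scikit-learn releases (0.22 and later) provide `ndcg_score(y_true, y_score, k=None, sample_weight=None, ignore_ties=False)`, whose input format differs from the 0.19 page linked in the question: both arguments are 2-D arrays of shape (n_queries, n_documents), each row holds one query's relevance labels and predicted scores, and the function returns the mean nDCG over the rows. The sketch below assumes such a release. Because queries usually have different numbers of documents, it scores each query separately and then averages, as suggested above. The `mean_ndcg` helper and the test data are hypothetical.

```python
from sklearn.metrics import ndcg_score  # assumes scikit-learn >= 0.22


def mean_ndcg(queries, k=None):
    """Average nDCG over queries that may each have a different number of documents.

    queries: list of (relevance_labels, predicted_scores) pairs, one pair per query.
    Assumes every query has at least two documents and non-negative labels,
    since recent scikit-learn releases reject other inputs.
    """
    per_query = [
        # Wrap each query in an outer list: ndcg_score expects shape (n_queries, n_documents).
        ndcg_score([y_true], [y_score], k=k)
        for y_true, y_score in queries
    ]
    return sum(per_query) / len(per_query), per_query


# Hypothetical test set: relevance labels and predicted probabilities per query.
test_queries = [
    ([2, 1, 0, 1], [0.9, 0.2, 0.6, 0.4]),
    ([0, 1, 0], [0.1, 0.8, 0.3]),
    ([1, 0], [0.3, 0.7]),
]
mean_score, per_query = mean_ndcg(test_queries)
print(per_query)
print(mean_score)
```

When every query has the same number of documents, stacking them as rows and calling `ndcg_score` once gives the same mean, since the function averages over rows.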



Source: https://stackoverflow.com/questions/49989128/inputs-for-ndcg-in-sklearn
