Question
I can't understand the input format of sklearn's ndcg_score: http://sklearn.apachecn.org/en/0.19.0/modules/generated/sklearn.metrics.ndcg_score.html
I have multiple queries, and the ranking probabilities for each of them have already been computed. I now need to calculate nDCG on the test set and would like to use sklearn's ndcg_score for this. The example given at the link is:
>>> y_true = [1, 0, 2]
>>> y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> ndcg_score(y_true, y_score, k=2)
1.0
According to the site, y_true is the ground truth and y_score holds the probabilities. My questions are:
- Is this example for just one query or for multiple queries?
- If it is for just one query, what does y_true represent: the original rankings?
- If it is for a single query, why are there multiple input probabilities?
- How can this method be applied to multiple queries and their resulting probabilities?
Answer 1:
You can treat it like a multiclass classification problem.
To answer your questions:
- Is this example for just one query or multiple queries?
One query
- If it is for just one query, what does y_true represent: the original rankings?
I would call it the relevance label of each document rather than a ranking, since it may contain duplicate values.
- If it is for a single query, why are there multiple input probabilities?
y_score is the probability distribution of each document over the classes. In your example, y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]] means that document 0 belongs to class 1 (0.55 is the max), document 1 belongs to class 0 (0.7 is the max), and document 2 belongs to class 2 (0.9 is the max). The documentation is lacking and the example is misleading as well. It would be clearer with four documents, since a 3x3 matrix does not show which axis is the documents and which is the classes.
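To make that reading concrete, the sketch below picks the most probable class in each row of y_score and compares it with y_true. It uses plain Python rather than sklearn, and the name predicted is introduced here purely for illustration.

```python
y_true = [1, 0, 2]
y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]

# For each document (row), take the index of the largest probability.
predicted = [max(range(len(row)), key=row.__getitem__) for row in y_score]

print(predicted)            # [1, 0, 2]
print(predicted == y_true)  # True
```

Every document's most probable class matches its label, which is consistent with the score of 1.0 in the example.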
- How can this method be applied to multiple queries and their resulting probabilities?
Compute the nDCG score for each query separately, then average those scores across all queries.
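A minimal sketch of that averaging step, assuming each query has one relevance label and one predicted score per document. This is a hand-written nDCG (linear gain, log2 discount), not the function from the linked page; dcg_at_k, ndcg_at_k, and the sample queries are all hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """DCG of the first k items: linear gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(y_true, y_score, k):
    """nDCG@k for one query (y_true: relevance labels, y_score: predicted scores)."""
    # Order the documents by predicted score, highest first.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(y_true, reverse=True), k)
    # Assumption: a query with no relevant documents scores 0.0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical data: one (relevance labels, predicted scores) pair per query.
queries = [
    ([3, 2, 0, 1], [0.9, 0.7, 0.2, 0.4]),  # documents ranked in the ideal order
    ([0, 1, 2], [0.8, 0.5, 0.1]),          # most relevant document ranked last
]

per_query = [ndcg_at_k(labels, scores, k=3) for labels, scores in queries]
mean_ndcg = sum(per_query) / len(per_query)
print(per_query, mean_ndcg)
```

The first query is ranked perfectly, so its nDCG is 1.0. The second puts its most relevant document last and scores lower, and the mean falls between the two.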
Source: https://stackoverflow.com/questions/49989128/inputs-for-ndcg-in-sklearn