How to get the log likelihood for a logistic regression model in sklearn?

旧时模样 submitted on 2019-12-08 07:51:01

Question


I'm using a logistic regression model in sklearn and I would like to retrieve the log likelihood of the model in order to perform an ordinary likelihood ratio test, as suggested here.

The model uses the log loss as its scoring rule. In the documentation, the log loss is defined "as the negative log-likelihood of the true labels given a probabilistic classifier’s predictions". However, the value is always positive, whereas the log likelihood should be negative. As an example:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

lr = LogisticRegression()
lr.fit(X_train, y_train)
y_prob = lr.predict_proba(X_test)
log_loss(y_test, y_prob)    # 0.66738

I do not see any method for this in the model's documentation. Is there another option that I am not aware of?


Answer 1:


Read the definition closely: the log loss is the negative log-likelihood. Since the log-likelihood is, as you say, negative, its negative is a positive number.

Let's see an example with dummy data:

from sklearn.metrics import log_loss
import numpy as np

y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])

log_loss(y_true, y_pred)
# 0.60671964791658428

Now, let's manually compute the log-likelihood elements (i.e. one value per label-prediction pair), using the formula given in the scikit-learn docs you linked to, but without the minus sign:

log_likelihood_elements = y_true*np.log(y_pred) + (1-y_true)*np.log(1-y_pred)
log_likelihood_elements
# array([-0.10536052, -1.60943791, -0.10536052])

Now, given the log-likelihood elements (which are indeed negative), the log loss is the negative of their sum, divided by the number of samples:

-np.sum(log_likelihood_elements)/len(y_true)
# 0.60671964791658428

# Compare with a tolerance rather than ==, since both sides are floats
np.isclose(log_loss(y_true, y_pred), -np.sum(log_likelihood_elements)/len(y_true))
# True
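Applied to a fitted model, the same identity gives the quantity the question asks for. The sketch below is only an illustration under several assumptions that are not in the original post: the data come from `make_classification`, the reduced model simply keeps the first feature, a large `C` is used to make scikit-learn's default L2 penalty negligible, and the p-value is taken from `scipy.stats.chi2`. The helper name `log_likelihood` is hypothetical, and the likelihoods are evaluated on the training data, as a likelihood ratio test normally requires.

```python
from scipy.stats import chi2
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def log_likelihood(model, X, y):
    """Total log-likelihood of labels y under a fitted classifier (hypothetical helper)."""
    y_prob = model.predict_proba(X)
    # log_loss is the mean negative log-likelihood, so undo both steps.
    return -log_loss(y, y_prob, labels=model.classes_) * len(y)


# Synthetic data, purely for illustration (not from the question).
X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

# Assumption: a large C approximates the unpenalized maximum-likelihood fit.
full = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
reduced = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, :1], y)

ll_full = log_likelihood(full, X, y)
ll_reduced = log_likelihood(reduced, X[:, :1], y)

# Likelihood ratio statistic; degrees of freedom = number of dropped features.
lr_stat = 2 * (ll_full - ll_reduced)
p_value = chi2.sf(lr_stat, df=X.shape[1] - 1)
print(ll_full, ll_reduced, lr_stat, p_value)
```

Both log-likelihoods come out negative, and the statistic is non-negative (up to optimizer tolerance) because the reduced model is nested in the full one.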



Answer 2:


To get the total log likelihood, negate the log loss and multiply it by the number of samples:

-log_loss(y_true, y_pred)*len(y_true)
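As a quick check of this formula, the following sketch reuses the dummy data from the first answer and compares three routes to the same number. It relies on the `normalize` parameter of `log_loss`, which returns the summed rather than the mean loss when set to `False`. The variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])

# Mean loss times the sample count, negated, gives the total log-likelihood.
ll_from_mean = -log_loss(y_true, y_pred) * len(y_true)

# normalize=False makes log_loss return the summed loss directly.
ll_from_sum = -log_loss(y_true, y_pred, normalize=False)

# Manual computation from the Bernoulli log-likelihood formula.
ll_manual = np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(ll_from_mean, ll_from_sum, ll_manual)
```

All three values agree to floating-point precision, at roughly -1.82 for this data.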


Source: https://stackoverflow.com/questions/48185090/how-to-get-the-log-likelihood-for-a-logistic-regression-model-in-sklearn
