What approach does BernoulliNB in the sklearn package use for prediction?

Submitted by 痞子三分冷 on 2020-01-16 15:27:17

Question


I was reading up on the implementation of Naive Bayes in sklearn, and I could not understand the predict part of BernoulliNB:

Code borrowed from the source:

def _joint_log_likelihood(self, X):
    # ... some code omitted

    neg_prob = np.log(1 - np.exp(self.feature_log_prob_))
    # Compute  neg_prob · (1 - X).T  as  ∑neg_prob - X · neg_prob
    jll = safe_sparse_dot(X, (self.feature_log_prob_ - neg_prob).T)
    jll += self.class_log_prior_ + neg_prob.sum(axis=1)

    return jll

What is the role of neg_prob here? Can someone explain this approach?

Everything I have read online (source) describes a simpler approach:

for word in document:
    for cls in all_classes:
        # add the log probability of the word given that class
        # (precomputed from the training data)
        class_score[cls] += np.log(word_prob[cls][word])

# finally, add the log prior probability of the class itself
for cls in all_classes:
    class_score[cls] += np.log(class_prior[cls])

But this does not give quite the same result as BernoulliNB.

Any information is much appreciated. Please let me know if I should add more detail, thanks.


Answer 1:


I found out that BernoulliNB differs from MultinomialNB in how it treats terms that are absent from the document.

As explained here: http://blog.datumbox.com/machine-learning-tutorial-the-naive-bayes-text-classifier/

Terms that do not occur in the document also contribute to the score, each with probability (1 - conditional_probability_of_term_in_class).
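Written out, this gives the rearrangement used in the snippet from the question. The symbols are my own notation, not sklearn's: x_i is 1 if term i occurs in the document and 0 otherwise, and p_{ci} is the probability that term i occurs in a document of class c.

```latex
\begin{aligned}
\log P(c, x)
  &= \log P(c) + \sum_i \bigl[\, x_i \log p_{ci} + (1 - x_i)\log(1 - p_{ci}) \,\bigr] \\
  &= \log P(c) + \sum_i \log(1 - p_{ci})
     + \sum_i x_i \bigl[\, \log p_{ci} - \log(1 - p_{ci}) \,\bigr]
\end{aligned}
```

Matching this against the snippet: feature_log_prob_ holds log p_{ci}, neg_prob holds log(1 - p_{ci}), the last sum is the safe_sparse_dot call, and the first two terms are class_log_prior_ + neg_prob.sum(axis=1).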

The Bernoulli variation, as described by Manning et al (2008), generates a Boolean indicator about each term of the vocabulary equal to 1 if the term belongs to the examining document and 0 if it does not. The model of this variation is significantly different from Multinomial not only because it does not take into consideration the number of occurrences of each word, but also because it takes into account the non-occurring terms within the document. While in Multinomial model the non-occurring terms are completely ignored, in Bernoulli model they are factored when computing the conditional probabilities and thus the absence of terms is taken into account.

The algorithm used in sklearn is described here: https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html
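As a quick check, here is a minimal sketch in plain Python (no NumPy, so every term stays visible) that computes the Bernoulli score twice: once by looping over present and absent terms, and once in the rearranged form from the snippet in the question. The function names and the toy probabilities are made up for illustration and are not part of sklearn.

```python
import math

def explicit_jll(x, log_prior, feature_log_prob):
    """Score per class, summing over both present and absent terms."""
    scores = []
    for c, log_p in enumerate(feature_log_prob):
        total = log_prior[c]
        for x_i, lp in zip(x, log_p):
            if x_i:
                total += lp                          # term present: log p
            else:
                total += math.log(1 - math.exp(lp))  # term absent: log(1 - p)
        scores.append(total)
    return scores

def rearranged_jll(x, log_prior, feature_log_prob):
    """Same score as x . (log p - neg_prob) + prior + sum(neg_prob)."""
    scores = []
    for c, log_p in enumerate(feature_log_prob):
        neg_prob = [math.log(1 - math.exp(lp)) for lp in log_p]
        dot = sum(x_i * (lp - n) for x_i, lp, n in zip(x, log_p, neg_prob))
        scores.append(dot + log_prior[c] + sum(neg_prob))
    return scores

# Hypothetical toy model: 2 classes, 3 vocabulary terms.
log_prior = [math.log(0.6), math.log(0.4)]
feature_log_prob = [
    [math.log(0.8), math.log(0.1), math.log(0.5)],
    [math.log(0.2), math.log(0.7), math.log(0.5)],
]
x = [1, 0, 1]  # binary document vector: terms 0 and 2 are present

a = explicit_jll(x, log_prior, feature_log_prob)
b = rearranged_jll(x, log_prior, feature_log_prob)
print(a)
print(b)
```

Both lists should agree up to floating-point error. The rearranged form is convenient because it needs only one product with X, which can stay sparse.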



Source: https://stackoverflow.com/questions/49473494/what-is-the-approach-used-by-bernoullinb-in-sklearn-package-for-prediction
