R: Naives Bayes classifier bases decision only on a-priori probabilities

余生颓废 提交于 2019-12-04 09:55:27

问题


I'm trying to classify tweets according to their sentiment into three categories (Buy, Hold, Sell). I'm using R and the package e1071.

I have two data frames: one trainingset and one set of new tweets which sentiment need to be predicted.

trainingset dataframe:

   +--------------------------------------------------+

   **text | sentiment**

   *this stock is a good buy* | Buy

   *markets crash in tokyo* | Sell

   *everybody excited about new products* | Hold

   +--------------------------------------------------+

Now I want to train the model using the tweet text trainingset[,2] and the sentiment category trainingset[,4].

classifier<-naiveBayes(trainingset[,2],as.factor(trainingset[,4]), laplace=1)

Looking into the elements of classifier with

classifier$tables$x

I find that the conditional probabilities are calculated..There are different probabilities for every tweet concerning Buy,Hold and Sell.So far so good.

However when I predict the training set with:

predict(classifier, trainingset[,2], type="raw")

I get a classification which is based only on the a-priori probabilities, which means every tweet is classified as Hold (because "Hold" had the largest share among the sentiment). So every tweet has the same probabilities for Buy, Hold, and Sell:

      +--------------------------------------------------+

      **Id | Buy | Hold | Sell**

      1  |0.25 | 0.5  | 0.25

      2  |0.25 | 0.5  | 0.25

      3  |0.25 | 0.5  | 0.25

     ..  |..... | ....  | ...

      N  |0.25 | 0.5  | 0.25

     +--------------------------------------------------+

Any ideas what I'm doing wrong? Appreciate your help!

Thanks


回答1:


It looks like you trained the model using whole sentences as inputs, while it seems that you want to use words as your input features.

Usage:

## S3 method for class 'formula'
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
## Default S3 method:
naiveBayes(x, y, laplace = 0, ...)


## S3 method for class 'naiveBayes'
predict(object, newdata,
  type = c("class", "raw"), threshold = 0.001, ...)

Arguments:

  x: A numeric matrix, or a data frame of categorical and/or
     numeric variables.

  y: Class vector.

In particular, if you train naiveBayes this way:

x <- c("john likes cake", "marry likes cats and john")
y <- as.factor(c("good", "bad")) 
bayes<-naiveBayes( x,y )

you get a classifier able to recognize just these two sentences:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x,y = y)

A-priori probabilities:
y
 bad good 
 0.5  0.5 

Conditional probabilities:
            x
      x
y      john likes cake marry likes cats and john
  bad                0                         1
  good               1                         0

to achieve a word level classifier you need to run it with words as inputs

x <-             c("john","likes","cake","marry","likes","cats","and","john")
y <- as.factors( c("good","good", "good","bad",  "bad",  "bad", "bad","bad") )
bayes<-naiveBayes( x,y )

you get

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = x,y = y)

A-priori probabilities:
y
 bad good 
 0.625 0.375 

Conditional probabilities:
      x
y            and      cake      cats      john     likes     marry
  bad  0.2000000 0.0000000 0.2000000 0.2000000 0.2000000 0.2000000
  good 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000

In general R is not well suited for processing NLP data, python (or at least Java) would be much better choice.

To convert a sentence to the words, you can use the strsplit function

unlist(strsplit("john likes cake"," "))
[1] "john"  "likes" "cake" 


来源:https://stackoverflow.com/questions/18257260/r-naives-bayes-classifier-bases-decision-only-on-a-priori-probabilities

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!