How to use a PoS tag as a feature for training a Naive Bayes classifier?

Submitted by 我是研究僧i on 2019-12-11 03:05:33

Question


I'm researching how to extract keyphrases from documents for my thesis.

In my research, I use a Naive Bayes classifier to train a model on features of candidate terms. One of the features is the PoS tag, which I think is important for deciding whether a term is a keyphrase or not.

But the input to a Naive Bayes (NB) classifier is numeric, and a PoS tag is a string.

So I don't know how to represent the PoS tag as a number so that it can be used as an input feature for the NB classifier.

Any advice would be appreciated.

Thanks and regards, Hien Su


Answer 1:


You can treat a POS tag as a word. Then you can use POS unigrams, bigrams, or trigrams as features.

Example:

They/PRP refuse/VBP to/TO permit/VB us/PRP to/TO obtain/VB the/DT refuse/NN permit/NN.

If you take POS trigrams as features, you can construct a vector with the following features:

Feature          Value
(PRP,VBP,TO)      1
(VBP,TO,VB)       1
(TO,VB,PRP)       1

and so on.
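
The n-gram extraction described above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the helper names `pos_ngrams`, `count_features`, and `to_vector` are made up for this example, the sentence is assumed to be tagged already, and "us" is tagged PRP, the Penn Treebank tag for personal pronouns.

```python
from collections import Counter

# The example sentence as (word, tag) pairs; assumed to come from any POS tagger.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"), ("permit", "VB"),
          ("us", "PRP"), ("to", "TO"), ("obtain", "VB"), ("the", "DT"),
          ("refuse", "NN"), ("permit", "NN")]

def pos_ngrams(tagged_tokens, n):
    """Return the list of POS n-grams (as tuples) for one tagged sentence."""
    tags = [tag for _, tag in tagged_tokens]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def count_features(tagged_tokens, n):
    """Map each POS n-gram to the number of times it occurs."""
    return Counter(pos_ngrams(tagged_tokens, n))

def to_vector(counts, vocabulary):
    """Turn n-gram counts into a numeric vector over a fixed vocabulary."""
    return [counts.get(feature, 0) for feature in vocabulary]

trigram_counts = count_features(tagged, 3)

# In practice the vocabulary would be built from the whole training set;
# here it is built from this single sentence for illustration.
vocabulary = sorted(trigram_counts)
vector = to_vector(trigram_counts, vocabulary)

for feature, value in zip(vocabulary, vector):
    print(feature, value)
```

A vector of counts like this is numeric, so it can be passed to a classifier that expects count features, such as a multinomial Naive Bayes model.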

You can also use tf-idf values for the POS features.
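
As a rough sketch of the tf-idf idea, the snippet below weights POS features using one common variant of the formula: term frequency as a relative count, and idf as log(N / df). Other variants (smoothing, normalization) exist, and the answer does not say which one is meant. The function name `tf_idf` and the three tag sequences are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf weights for each document.

    docs: list of documents, each a list of features
          (POS tags here, but POS n-gram tuples work the same way).
    Returns one {feature: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each feature appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            feature: (count / total) * math.log(n_docs / df[feature])
            for feature, count in counts.items()
        })
    return weights

# Hypothetical tag sequences standing in for three tagged candidate phrases.
docs = [
    ["DT", "NN", "VBZ"],
    ["DT", "JJ", "NN"],
    ["PRP", "VBP", "DT", "NN"],
]

for doc_weights in tf_idf(docs):
    print(doc_weights)
```

With this variant, a tag that occurs in every document (DT and NN here) gets weight 0, while rarer tags get larger weights.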



Source: https://stackoverflow.com/questions/31091082/how-to-use-pos-tag-as-a-feature-for-training-data-by-naive-bayes-classifier
