Part of Speech (POS) tag Feature Selection for Text Classification

我怕爱的太早我们不能终老 submitted on 2019-12-09 07:01:42

Question


I have POS-tagged sentences obtained using the Stanford POS tagger, e.g.:

The/DT island/NN was/VBD very/RB beautiful/JJ ./. I/PRP love/VBP it/PRP ./.

(The XML format is also available.)

Can anyone explain how to perform feature selection on these POS-tagged sentences and convert them into feature vectors for text classification using a machine learning method?


Answer 1:


A simple way to start out would be something like the following (assuming word order is not important for your classification algorithm).

First, you would manually label a number of sentences with their classes. This is your training dataset. Generally, the more sentences you label for each class, the higher the accuracy you can expect. Keep in mind that, in a supervised approach like this, features can only be selected from your manually labeled sentences. Each unique word/POS combination that occurs across your training sentences is one feature.
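The feature definition above can be sketched in plain Python (standard library only). This is a minimal illustration, assuming the `word/TAG` format shown in the question: each token is split on its last `/`, words are lowercased (an assumption, not something the answer requires), each word/POS string is a feature, and a sentence becomes a vector of feature counts over the training vocabulary. The helper names `parse_tagged`, `features`, `build_vocabulary` and `vectorize` are hypothetical, not part of any tagger's API.

```python
from collections import Counter


def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the LAST '/', so './.' becomes ('.', '.') and
        # words that themselves contain '/' stay intact.
        word, _, tag = token.rpartition("/")
        # Lowercasing is an assumption of this sketch.
        pairs.append((word.lower(), tag))
    return pairs


def features(sentence: str) -> Counter:
    """Count each word/POS combination in one tagged sentence."""
    return Counter(f"{word}/{tag}" for word, tag in parse_tagged(sentence))


def build_vocabulary(train_sentences: list[str]) -> dict[str, int]:
    """Map every word/POS feature seen in training to a vector index."""
    vocab = sorted({f for s in train_sentences for f in features(s)})
    return {f: i for i, f in enumerate(vocab)}


def vectorize(sentence: str, vocab: dict[str, int]) -> list[int]:
    """Turn a tagged sentence into a count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for f, count in features(sentence).items():
        if f in vocab:  # features never seen in training are dropped
            vec[vocab[f]] += count
    return vec


train = [
    "The/DT island/NN was/VBD very/RB beautiful/JJ ./.",
    "I/PRP love/VBP it/PRP ./.",
]
vocab = build_vocabulary(train)
print(len(vocab), vectorize("I/PRP love/VBP it/PRP ./.", vocab))
```

Because word order is ignored, this is a bag-of-features representation, matching the assumption stated at the start of the answer. Features absent from the training sentences simply get no index, which reflects the point that only features from labeled sentences can be selected.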

Finally, you must choose a feature selection algorithm. There are many, but a popular one is chi-squared; others include Information Gain and Mutual Information. With chi-squared, you measure the dependence between the class variable and each feature individually. You then pick a threshold, such as the top 10% of features with the highest chi-squared values (i.e. the strongest dependence on the class), and keep only those features for later use in your classifier.
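A minimal, standard-library sketch of chi-squared selection follows. It assumes binary features (a word/POS string is either present in a sentence or not), scores each feature with the 2×2 chi-squared statistic for each class versus the rest, keeps the maximum over classes, and retains the highest-scoring fraction. The max-over-classes rule, the toy data and the names `chi2_2x2` and `select_features` are illustrative assumptions. scikit-learn's `SelectKBest` with `chi2` is a ready-made alternative, although its statistic is computed somewhat differently.

```python
def chi2_2x2(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-squared statistic for a 2x2 feature/class contingency table.

    n11: docs in the class that contain the feature
    n10: docs outside the class that contain the feature
    n01: docs in the class without the feature
    n00: docs outside the class without the feature
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:  # degenerate table: no evidence of dependence
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs: list[set[str]], labels: list[str],
                    keep_fraction: float = 0.1):
    """Keep the fraction of features with the HIGHEST chi-squared score."""
    vocab = set().union(*docs)
    scores = {}
    for f in vocab:
        best = 0.0
        for c in set(labels):
            n11 = sum(1 for d, y in zip(docs, labels) if f in d and y == c)
            n10 = sum(1 for d, y in zip(docs, labels) if f in d and y != c)
            n01 = sum(1 for d, y in zip(docs, labels) if f not in d and y == c)
            n00 = len(docs) - n11 - n10 - n01
            # Assumption: a feature's score is its best one-vs-rest score.
            best = max(best, chi2_2x2(n11, n10, n01, n00))
        scores[f] = best
    k = max(1, int(len(vocab) * keep_fraction))
    # Higher score = stronger dependence; ties broken alphabetically.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k], scores


# Hypothetical toy data: each sentence reduced to its set of word/POS features.
docs = [{"beautiful/JJ", "island/NN"}, {"beautiful/JJ", "love/VBP"},
        {"ugly/JJ", "island/NN"}, {"ugly/JJ", "hate/VBP"}]
labels = ["pos", "pos", "neg", "neg"]
kept, scores = select_features(docs, labels, keep_fraction=0.4)
print(kept)  # ['beautiful/JJ', 'ugly/JJ']
```

In this toy data, `island/NN` occurs equally often in both classes and scores 0, so it is discarded, while the adjectives that separate the classes are kept.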

The choice of feature selection algorithm is important and should suit the classification algorithm you are using. For example, chi-squared is useful when you want features that correlate either positively or negatively with your class. In other circumstances you might want only positively correlated features, in which case you would need to pick another algorithm or modify an existing one.
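The sketch below illustrates why chi-squared picks up both directions: a feature that appears in every document of a class and in no others, and a feature with the opposite pattern, get the same score, because the statistic squares the deviation from the expected counts. One possible modification for keeping only positively correlated features, assumed here rather than taken from the answer, is to attach the sign of the association (observed co-occurrence above or below expectation) and discard negative scores. The function names are hypothetical.

```python
def chi2_2x2(n11: int, n10: int, n01: int, n00: int) -> float:
    """2x2 chi-squared statistic (same cell meanings as n11..n00 above:
    in-class/with-feature, out-of-class/with-feature, and so on)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def signed_chi2(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-squared score carrying the sign of the association.

    n11 * n00 > n10 * n01 exactly when the feature occurs in the class
    more often than expected under independence (positive correlation).
    """
    score = chi2_2x2(n11, n10, n01, n00)
    return score if n11 * n00 >= n10 * n01 else -score


# Feature present in both class docs and absent from both others,
# versus a feature with the opposite pattern:
print(chi2_2x2(2, 0, 0, 2), chi2_2x2(0, 2, 2, 0))        # 4.0 4.0
print(signed_chi2(2, 0, 0, 2), signed_chi2(0, 2, 2, 0))  # 4.0 -4.0
```

Filtering on `signed_chi2(...) > 0` before ranking would keep only features whose presence points toward the class.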

Hope that helps, William Riley-Land



Source: https://stackoverflow.com/questions/5499448/part-of-speech-pos-tag-feature-selection-for-text-classification
