Feature Selection and Reduction for Text Classification


北恋 2020-12-07 07:29

I am currently working on a project: a simple sentiment analyzer that will handle 2 classes in one case and 3 classes in another.

5 answers

太阳男子 (original poster)

2020-12-07 08:19

There is a Python library for feature selection called TextFeatureSelection. It assigns a score to each word token, bigram, trigram, etc., that measures how well it discriminates between classes.

For those familiar with feature selection methods in machine learning: the library is based on the filter method and gives ML engineers the tools they need to improve classification accuracy in their NLP and deep learning models. It offers four metrics, namely chi-square, mutual information, proportional difference, and information gain, to help select words as features before they are fed into machine learning classifiers.

from TextFeatureSelection import TextFeatureSelection

# Multiclass classification problem
input_doc_list = [
    'i am very happy',
    'i just had an awesome weekend',
    'this is a very difficult terrain to trek. i wish i stayed back at home.',
    'i just had lunch',
    'Do you want chips?',
]
target = ['Positive', 'Positive', 'Negative', 'Neutral', 'Neutral']
fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
result_df = fsOBJ.getScore()  # scores for each token / n-gram
print(result_df)

# Binary classification
input_doc_list = [
    'i am content with this location',
    'i am having the time of my life',
    'you cannot learn machine learning without linear algebra',
    'i want to go to mars',
]
target = [1, 1, 0, 1]
fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
result_df = fsOBJ.getScore()  # scores for each token / n-gram
print(result_df)

Check the project for details: https://pypi.org/project/TextFeatureSelection/
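
To make the idea of a filter score more concrete, here is a minimal hand-rolled sketch of the chi-square metric for a single class against the rest. It is not the library's implementation: the function name chi_square_scores, the whitespace tokenization, and the toy documents are all assumptions made for illustration only.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label):
    """Score each token with the chi-square statistic of its 2x2 table:
    token present/absent vs. document in/out of positive_label.
    Hypothetical helper for illustration, not part of TextFeatureSelection."""
    n = len(docs)
    # Assumption: lowercase + whitespace split; sets count a token once per doc.
    token_sets = [set(doc.lower().split()) for doc in docs]
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos

    df_all = Counter()  # number of documents containing the token
    df_pos = Counter()  # number of positive documents containing the token
    for tokens, y in zip(token_sets, labels):
        df_all.update(tokens)
        if y == positive_label:
            df_pos.update(tokens)

    scores = {}
    for token, present in df_all.items():
        a = df_pos[token]  # token present, positive class
        b = present - a    # token present, other classes
        c = n_pos - a      # token absent, positive class
        d = n_neg - b      # token absent, other classes
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the token (or class) carries no contrast.
        scores[token] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data (made up for this sketch)
docs = ["good movie", "good fun", "bad movie", "bad plot"]
labels = [1, 1, 0, 0]
scores = chi_square_scores(docs, labels, positive_label=1)

# Keep the two highest-scoring tokens as features
top_tokens = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_tokens)
```

In this toy set, "good" and "bad" each appear in only one class and get the highest score, while "movie" appears equally in both classes and scores zero, which is exactly the kind of word a filter method would drop.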
