Feature Selection and Reduction for Text Classification

北恋 2020-12-07 07:29

I am currently working on a project: a simple sentiment analyzer that uses two classes in one case and three classes in another.

5 answers
  •  太阳男子
    2020-12-07 08:19

    There's a Python library for feature selection called TextFeatureSelection. It assigns a score to each word token, bigram, trigram, etc., reflecting how well that feature discriminates between classes.

    For those familiar with feature selection in machine learning: the library uses the filter approach, scoring each feature independently of any classifier, which can help ML engineers improve classification accuracy in their NLP and deep learning models. It offers four metrics (chi-square, mutual information, proportional difference, and information gain) for selecting words as features before they are fed into a machine learning classifier.

    from TextFeatureSelection import TextFeatureSelection

    # Multiclass classification: one label per document
    input_doc_list = [
        'i am very happy',
        'i just had an awesome weekend',
        'this is a very difficult terrain to trek. i wish i stayed back at home.',
        'i just had lunch',
        'Do you want chips?',
    ]
    target = ['Positive', 'Positive', 'Negative', 'Neutral', 'Neutral']
    fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
    result_df = fsOBJ.getScore()  # scores for each token
    print(result_df)

    # Binary classification: labels are 0 or 1
    input_doc_list = [
        'i am content with this location',
        'i am having the time of my life',
        'you cannot learn machine learning without linear algebra',
        'i want to go to mars',
    ]
    target = [1, 1, 0, 1]
    fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
    result_df = fsOBJ.getScore()
    print(result_df)
    

    Check the project for details: https://pypi.org/project/TextFeatureSelection/
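
    To give a feel for what a filter-style score such as chi-square measures, here is a minimal standard-library sketch. It is not the TextFeatureSelection implementation: the function name chi_square_scores, the whitespace tokenizer, and the one-vs-rest 2x2 contingency table are assumptions made for illustration, and the library may compute its scores differently.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label):
    """Score each token with the chi-square statistic of a 2x2 table:
    token present/absent vs. document inside/outside positive_label.
    (Hypothetical helper for illustration only.)"""
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n_docs - n_pos

    # Document frequency of each token, counted separately per class.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        tokens = set(text.lower().split())  # naive whitespace tokenizer (assumption)
        (df_pos if y == positive_label else df_neg).update(tokens)

    scores = {}
    for token in set(df_pos) | set(df_neg):
        a = df_pos[token]   # positive docs containing the token
        b = df_neg[token]   # other docs containing the token
        c = n_pos - a       # positive docs without the token
        d = n_neg - b       # other docs without the token
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the table is degenerate; score it as 0.
        scores[token] = n_docs * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


# Same documents and labels as the binary example above.
docs = [
    'i am content with this location',
    'i am having the time of my life',
    'you cannot learn machine learning without linear algebra',
    'i want to go to mars',
]
labels = [1, 1, 0, 1]
scores = chi_square_scores(docs, labels, positive_label=1)

# Print the five highest-scoring tokens (ties broken alphabetically).
for token, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:5]:
    print(token, round(score, 3))
```

    A higher score means the token's presence is more strongly associated with one class than the other. To use the scores as a filter, keep only the top-scoring tokens as features before training the classifier.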
