I am currently working on a project, a simple sentiment analyzer such that there will be 2 and 3 classes in separate cases
I would recommend dimensionality reduction instead of feature selection. Consider either singular value decomposition, principal component analysis, or even better considering it's tailored for bag-of-words representations, Latent Dirichlet Allocation. This will allow you to notionally retain representations that include all words, but to collapse them to fewer dimensions by exploiting similarity (or even synonymy-type) relations between them.
All these methods have fairly standard implementations that you can get access to and run---if you let us know which language you're using, I or someone else will be able to point you in the right direction.