I need to do tokenization as a preprocessing step for NLP, and I am supposed to use Naive Bayes or logistic regression for it. What should the feature sets look like? I could not find any examples on the internet.
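
To make the question concrete, here is a minimal sketch of one possible framing. It is an assumption on my part, not an established recipe: treat every position between two characters as a binary classification example (boundary or no boundary) and describe it with the surrounding characters and their character classes. The names `char_class`, `boundary_features` and `make_examples`, the window size, and the feature names are all hypothetical.

```python
def char_class(ch):
    """Coarse character class used as a feature (hypothetical helper)."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "punct"


def boundary_features(text, i, window=2):
    """Features for the gap between text[i-1] and text[i], with 1 <= i < len(text)."""
    feats = {}
    for k in range(1, window + 1):
        # k-th character to the left and right of the gap, padded at the edges
        left = text[i - k] if i - k >= 0 else "<s>"
        right = text[i + k - 1] if i + k - 1 < len(text) else "</s>"
        feats[f"L{k}={left}"] = 1
        feats[f"R{k}={right}"] = 1
    prev_cls, next_cls = char_class(text[i - 1]), char_class(text[i])
    feats["prev_class=" + prev_cls] = 1
    feats["next_class=" + next_cls] = 1
    feats["class_change"] = int(prev_cls != next_cls)
    return feats


def make_examples(text, tokens):
    """Turn raw text plus its gold tokens (in order) into (features, label) pairs."""
    boundaries = set()
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # assumes each token occurs verbatim in text
        boundaries.update((start, start + len(tok)))
        pos = start + len(tok)
    X, y = [], []
    for i in range(1, len(text)):
        X.append(boundary_features(text, i))
        y.append(1 if i in boundaries else 0)  # 1 = a token starts or ends here
    return X, y


X, y = make_examples("Hi, Bob.", ["Hi", ",", "Bob", "."])
```

If this framing is reasonable, I assume the feature dicts could be vectorized with scikit-learn's `DictVectorizer` and passed to `LogisticRegression` or `BernoulliNB`, but I am not sure these are the right features.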