Merging CountVectorizer in Scikit-Learn feature extraction

大城市里の小女人 submitted on 2019-12-05 22:04:55

You can try:

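from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

# Fit each vectorizer on its own document type (add any other CountVectorizer options you need).
vecA = CountVectorizer(token_pattern="[a-zA-Z]+")
vecA.fit(list_of_type_A_document_content)
vecB = CountVectorizer(token_pattern="[a-zA-Z0-9]+")
vecB.fit(list_of_type_B_document_content)
# Both vectorizers are already fitted, so the union is only used to transform.
combined_features = FeatureUnion([('CountVectorizer', vecA), ('CountVect', vecB)])
combined_features.transform(test_data)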

You can read more about FeatureUnion, which is available from version 0.13.1, at http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
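To see what the combined output looks like, here is a minimal sketch on toy data. The corpora and the names `type_a_docs`, `type_b_docs`, `vec_a`, `vec_b` and `union` are made up for illustration and are not from the question. It assumes each vectorizer is fitted separately on its own corpus, as in the snippet above, and that the union is then used only for `transform`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical toy corpora standing in for the two document types.
type_a_docs = ["red apple", "green apple pie"]
type_b_docs = ["model x200 released", "x200 beats model t1000"]

# Letters-only tokens for type A, alphanumeric tokens for type B.
vec_a = CountVectorizer(token_pattern="[a-zA-Z]+").fit(type_a_docs)
vec_b = CountVectorizer(token_pattern="[a-zA-Z0-9]+").fit(type_b_docs)

# The union wraps the two already-fitted vectorizers.
union = FeatureUnion([("type_a", vec_a), ("type_b", vec_b)])

test_docs = ["apple x200", "green model"]
features = union.transform(test_docs)  # sparse matrix, one row per test document

# Columns from vec_a come first, followed by the columns from vec_b.
n_a = len(vec_a.vocabulary_)
n_b = len(vec_b.vocabulary_)
print(features.shape, n_a, n_b)
```

The first `n_a` columns are the counts under the type A vocabulary and the remaining `n_b` columns are the counts under the type B vocabulary, so a token such as `x200` is only counted in the second group.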
