Scikit-Learn's Pipeline: A sparse matrix was passed, but dense data is required

傲寒 2020-12-07 19:04

I'm finding it difficult to understand how to fix a Pipeline I created (read: largely pasted from a tutorial). It's Python 3.4.2:

df = pd.DataFrame
df = Da         


        
5 answers
  •  [愿得一人]
    2020-12-07 19:35

    You can convert a pandas Series to a NumPy array using the .values attribute.

    pipeline.fit(df[0].values, df[1].values)
    

    However, I think the actual issue is that CountVectorizer() returns a sparse matrix by default, which cannot be piped directly into the RF classifier. CountVectorizer() does have a dtype parameter, but it only sets the element type of the returned matrix, not whether the output is sparse or dense; to get a dense array you have to convert the output yourself, e.g. by calling .toarray() on it. That said, you usually need some sort of dimensionality reduction to use random forests for text classification, because bag-of-words feature vectors are very long.
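
As a rough illustration of the two options mentioned above, here is a minimal sketch. It assumes a recent scikit-learn release that provides FunctionTransformer and TruncatedSVD. The toy texts and labels, the step names, and the to_dense helper are made up for the example and are not from the original question.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: turn a SciPy sparse matrix into a dense ndarray."""
    return X.toarray() if hasattr(X, "toarray") else np.asarray(X)


# Toy data, invented purely for illustration.
texts = [
    "cheap pills buy now",
    "meeting at noon tomorrow",
    "buy cheap watches now",
    "lunch meeting moved to friday",
]
labels = ["spam", "ham", "spam", "ham"]

# Option 1: densify the bag-of-words matrix before the classifier.
# This can use a lot of memory when the vocabulary is large.
dense_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
dense_pipeline.fit(texts, labels)

# Option 2: reduce dimensionality first. TruncatedSVD accepts sparse
# input and returns a dense array with n_components columns.
svd_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
svd_pipeline.fit(texts, labels)
```

Option 2 is usually the more practical choice for real text data, since the classifier then sees a small dense matrix rather than one column per vocabulary word; the value n_components=2 here is only sized for the toy data.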
