Scikit-Learn's Pipeline: A sparse matrix was passed, but dense data is required

傲寒 2020-12-07 19:04

I'm finding it difficult to understand how to fix a Pipeline I created (read: largely pasted from a tutorial). It's Python 3.4.2:

df = pd.DataFrame
df = Da         


        
5 Answers
  •  慢半拍i (original poster)
    2020-12-07 19:40

    The tersest solution is to use a FunctionTransformer to convert to dense: it automatically implements the fit, transform and fit_transform methods, as in David's answer. Additionally, if I don't need special names for my pipeline steps, I like to use the sklearn.pipeline.make_pipeline convenience function, which allows a more minimal description of the model:

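    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer

    pipeline = make_pipeline(
        CountVectorizer(),
        # Densify the sparse matrix. toarray() returns an ndarray, whereas
        # todense() returns np.matrix, which newer scikit-learn versions reject.
        FunctionTransformer(lambda x: x.toarray(), accept_sparse=True),
        RandomForestClassifier(),
    )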
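
To see the pipeline above run end to end, here is a minimal sketch. The toy `texts` and `labels`, the helper name `to_dense`, and the `n_estimators` and `random_state` settings are all assumptions made up for illustration; none of them come from the question. It uses a named function instead of a lambda, which is one way to keep the fitted pipeline picklable.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(x):
    """Convert a SciPy sparse matrix to a dense NumPy array."""
    return x.toarray()


# Hypothetical toy data, invented purely for illustration.
texts = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

pipeline = make_pipeline(
    CountVectorizer(),                                  # text -> sparse counts
    FunctionTransformer(to_dense, accept_sparse=True),  # sparse -> dense
    RandomForestClassifier(n_estimators=10, random_state=0),
)
pipeline.fit(texts, labels)
predictions = pipeline.predict(["great movie", "awful movie"])

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipeline.steps]
```

If the step names matter (for example, to address parameters in a grid search), the auto-generated names in `step_names` can be used, or the steps can be passed to `Pipeline` with explicit names instead.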
    
