I'm finding it difficult to understand how to fix a Pipeline I created (read: largely pasted from a tutorial). It's Python 3.4.2:
df = pd.DataFrame
df = Da
The tersest solution would be to use a FunctionTransformer to convert to dense: this automatically implements the fit, transform and fit_transform methods as in David's answer. Additionally, if I don't need special names for my pipeline steps, I like to use the sklearn.pipeline.make_pipeline convenience function, which allows a more minimal description of the model:
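from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

pipeline = make_pipeline(
    CountVectorizer(),
    # toarray() yields a plain ndarray (todense() would yield np.matrix)
    FunctionTransformer(lambda x: x.toarray(), accept_sparse=True),
    RandomForestClassifier(),
)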
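To see the pipeline above run end to end, here is a minimal sketch on made-up data. The texts, the labels, the helper name to_dense and the n_estimators/random_state settings are all assumptions added for illustration; none of them come from the question. A named function stands in for the lambda only because lambdas cannot be pickled with the standard pickle module, which may matter if the fitted pipeline needs to be saved.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Convert a SciPy sparse matrix to a plain NumPy array."""
    return X.toarray()


# Hypothetical toy data, invented purely for this example.
texts = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

pipeline = make_pipeline(
    CountVectorizer(),
    FunctionTransformer(to_dense, accept_sparse=True),
    # Small, seeded forest so the example runs quickly and repeatably.
    RandomForestClassifier(n_estimators=10, random_state=0),
)
pipeline.fit(texts, labels)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# Predict a label for one unseen document.
predictions = pipeline.predict(["good film"])
print(predictions)
```

The automatically generated step names are what you would use as prefixes in set_params or a grid search, e.g. randomforestclassifier__n_estimators.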