I'm finding it difficult to understand how to fix a Pipeline I created (read: largely pasted from a tutorial). It's Python 3.4.2:
df = pd.DataFrame
df = Da
You can convert a pandas Series to a NumPy array using its .values attribute:
pipeline.fit(df[0].values, df[1].values)
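A minimal sketch of the conversion, using a small hypothetical two-column DataFrame (column 0 holds the text, column 1 the label; the data is invented for illustration). Note that .values is an attribute, not a method, so it takes no parentheses. In recent pandas versions, df[0].to_numpy() is the recommended equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: integer column labels 0 (text) and 1 (label)
df = pd.DataFrame([["free money now", "spam"],
                   ["lunch at noon?", "ham"]])

X = df[0].values  # Series -> NumPy array of strings
y = df[1].values  # Series -> NumPy array of labels

print(type(X), X.shape)
```

These arrays can then be passed to pipeline.fit(X, y) as in the line above.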
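However, I think the underlying issue is that CountVectorizer() returns a sparse matrix by default, which older scikit-learn releases could not pass to a random forest classifier. CountVectorizer() does have a dtype parameter, but it only sets the numeric type of the matrix entries (e.g. numpy.int64); it does not make the output dense, so you would have to convert it yourself, for example with .toarray(). That said, you usually need some sort of dimensionality reduction to use random forests for text classification, because bag-of-words feature vectors are very long (one dimension per vocabulary term).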
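A minimal sketch of both points, assuming a recent scikit-learn and a made-up toy corpus. The step names ("vect", "svd", "clf"), n_components=3, and the choice of TruncatedSVD are illustrative assumptions, not part of the original answer. TruncatedSVD is one common dimensionality-reduction step for sparse bag-of-words matrices because it accepts sparse input and returns a small dense array. The first part also checks that setting dtype leaves the CountVectorizer output sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy corpus and labels, purely illustrative (1 = spam, 0 = not spam)
docs = np.array([
    "cheap pills buy now",
    "meeting moved to friday",
    "win a free prize now",
    "project report attached",
    "free offer click now",
    "lunch with the team tomorrow",
])
labels = np.array([1, 0, 1, 0, 1, 0])

# dtype only changes the element type; the result is still a scipy sparse matrix
counts = CountVectorizer(dtype=np.float32).fit_transform(docs)
print(sparse.issparse(counts), counts.dtype)

# Explicit densification, if a downstream step needs a dense array
dense_counts = counts.toarray()

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    # Reduce the long, sparse count vectors to a few dense components
    ("svd", TruncatedSVD(n_components=3, random_state=0)),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipeline.fit(docs, labels)
predictions = pipeline.predict(["free prize now"])
print(predictions)
```

The number of components is a tuning choice; on a real corpus it is typically in the hundreds rather than 3. Recent scikit-learn releases also let random forests consume sparse input directly, so the SVD step is there to shorten the feature vectors rather than to make the pipeline run at all.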