MemoryError in toarray when using DictVectorizer of Scikit Learn

無奈伤痛 2021-01-06 05:47

I am trying to implement the SelectKBest algorithm on my data to get the best features out of it. For this I am first preprocessing my data using DictVectorizer and the data

7 Answers

我在风中等你 (OP)

2021-01-06 06:39
The problem is the call to toarray(). DictVectorizer from sklearn, which is designed for vectorizing categorical features with high cardinality, outputs scipy.sparse matrices by default. You are running out of memory because you ask for the dense representation by calling fit_transform().toarray().
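To see why the dense conversion fails, here is a rough back-of-the-envelope estimate. The sizes used (500,000 rows, 50,000 one-hot columns, about 20 non-zero entries per row) are hypothetical, and the CSR figure assumes float64 values with 32-bit indices, which is what scipy typically uses when the indices fit.

```python
def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Memory for a dense float64 array of shape (n_rows, n_cols)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows: int, nnz: int, itemsize: int = 8, index_size: int = 4) -> int:
    """Approximate CSR memory: values + column indices + row pointers.

    Assumes 32-bit indices (index_size=4); large matrices may need 64-bit ones.
    """
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size


# Hypothetical dataset size, chosen only for illustration.
n_rows, n_cols, nnz_per_row = 500_000, 50_000, 20

dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, n_rows * nnz_per_row)
print(f"dense: {dense / 1e9:.1f} GB")  # what .toarray() would need
print(f"CSR:   {sparse / 1e6:.1f} MB")  # what fit_transform() returns
```

With these numbers the dense array needs about 200 GB while the sparse matrix needs about 122 MB, because the dense form stores every zero produced by one-hot encoding.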

Just use:
```
# Keep the scipy.sparse output; do not call .toarray()
quote_data = DV.fit_transform(quote_data)
```
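A minimal sketch of the whole sparse pipeline, assuming a toy list of dicts and labels in place of the real quote_data (the records, labels and k=2 are hypothetical) and assuming chi2 as the scoring function, since the question does not say which one is used. chi2 accepts scipy.sparse input but requires non-negative features, which holds for one-hot columns and non-negative counts.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical records standing in for quote_data: string values are
# one-hot encoded by DictVectorizer, numeric values are kept as-is.
records = [
    {"city": "Paris", "product": "A", "qty": 3},
    {"city": "Berlin", "product": "B", "qty": 1},
    {"city": "Paris", "product": "B", "qty": 2},
    {"city": "Rome", "product": "A", "qty": 5},
]
labels = [1, 0, 0, 1]  # hypothetical target used to score features

DV = DictVectorizer()  # sparse=True is the default
X = DV.fit_transform(records)  # scipy.sparse matrix, no .toarray()

# Score every column with chi2 and keep the 2 best; the result stays sparse.
selector = SelectKBest(chi2, k=2)
X_best = selector.fit_transform(X, labels)

# Map the kept columns back to their names.
kept = [name for name, keep in zip(DV.feature_names_, selector.get_support()) if keep]

print(issparse(X), X.shape)
print(issparse(X_best), X_best.shape)
print(kept)
```

Nothing in this flow ever materializes a dense matrix, so memory use scales with the number of non-zero entries rather than with rows times columns.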