MemoryError in toarray when using DictVectorizer of Scikit Learn

無奈伤痛 2021-01-06 05:47

I am trying to implement the SelectKBest algorithm on my data to get the best features out of it. For this I am first preprocessing my data using DictVectorizer and the data

7 answers
  •  我在风中等你
    2021-01-06 06:39

    The problem is the call to toarray(). DictVectorizer from sklearn (which is designed for vectorizing categorical features with high cardinality) outputs a SciPy sparse matrix by default. Calling fit_transform().toarray() converts it to a dense array that stores every zero explicitly, and that conversion is what runs out of memory.

    Drop the toarray() call and keep the sparse result:

    quote_data = DV.fit_transform(quote_data)
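
Since the question goes on to SelectKBest, here is a minimal sketch of passing the sparse output straight into feature selection without densifying it. The records, labels, the chi2 score function, and k=2 are all assumptions made for illustration; the original post does not show them. chi2 is used here only because it accepts sparse, non-negative input.

```python
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy records standing in for quote_data (not from the original post).
records = [
    {"city": "London", "plan": "basic", "visits": 3},
    {"city": "Paris", "plan": "premium", "visits": 1},
    {"city": "London", "plan": "premium", "visits": 4},
    {"city": "Berlin", "plan": "basic", "visits": 2},
]
labels = [0, 1, 1, 0]  # hypothetical target values

DV = DictVectorizer()            # sparse=True is the default
X = DV.fit_transform(records)    # SciPy sparse matrix, no toarray() needed

# chi2 works on sparse, non-negative features, so X can be passed as-is.
selector = SelectKBest(chi2, k=2)
X_best = selector.fit_transform(X, labels)

# Map the selection mask back to the feature names DictVectorizer generated.
selected = [
    name for name, keep in zip(DV.feature_names_, selector.get_support()) if keep
]
print(sparse.issparse(X), X.shape, X_best.shape)
print(selected)
```

If a later step really does need a dense array, converting only the reduced matrix (X_best) is much cheaper than converting the full one.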
    
