Is there a way to convert from a pandas.SparseDataFrame to scipy.sparse.csr_matrix, without generating a dense matrix in memory?
sc
Here's a solution that fills the sparse matrix column by column (assumes you can fit at least one column to memory).
import pandas as pd
import numpy as np
from scipy.sparse import lil_matrix
def sparse_df_to_array(df):
""" Convert sparse dataframe to sparse array csr_matrix used by
scikit learn. """
arr = lil_matrix(df.shape, dtype=np.float32)
for i, col in enumerate(df.columns):
ix = df[col] != 0
arr[np.where(ix), i] = df.ix[ix, col]
return arr.tocsr()