Pandas sparse DataFrame to scipy sparse matrix, without generating a dense matrix in memory

北荒 2020-11-30 08:48

Is there a way to convert from a pandas.SparseDataFrame to scipy.sparse.csr_matrix, without generating a dense matrix in memory?

6 Answers
  •  不知归路
    2020-11-30 09:53

    Here's a solution that fills the sparse matrix column by column (it assumes you can fit at least one dense column in memory).

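    import numpy as np
    import pandas as pd
    from scipy.sparse import lil_matrix
    
    def sparse_df_to_array(df):
        """ Convert a sparse dataframe to the scipy csr_matrix format used by
        scikit-learn, one column at a time. """
        arr = lil_matrix(df.shape, dtype=np.float32)
        for i, col in enumerate(df.columns):
            # Only this single column is densified at any one time.
            values = np.asarray(df[col])
            # Row positions of the non-zero entries in this column.
            rows = np.flatnonzero(values)
            if rows.size:
                # df.ix was removed in pandas 1.0; index the array directly.
                arr[rows, i] = values[rows]
    
        return arr.tocsr()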
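
On pandas 0.25 and later, where columns with a sparse dtype take the place of `SparseDataFrame`, the `DataFrame.sparse.to_coo()` accessor may avoid the Python-level loop altogether. The following is a minimal sketch, assuming every column has a sparse dtype with a fill value of 0. The helper name `sparse_df_to_csr` and the example data are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix


def sparse_df_to_csr(df: pd.DataFrame) -> csr_matrix:
    """Convert a DataFrame whose columns are all sparse into a CSR matrix."""
    # to_coo() assembles a scipy COO matrix from the stored (non-fill)
    # values of each column; tocsr() then converts COO to CSR.
    return df.sparse.to_coo().tocsr()


# Hypothetical example data: a 4x3 frame with two stored values,
# built from a scipy matrix so that no dense array is created here either.
src = coo_matrix(
    (np.array([1.5, -2.0], dtype=np.float32), ([0, 3], [1, 2])),
    shape=(4, 3),
)
df = pd.DataFrame.sparse.from_spmatrix(src, columns=["a", "b", "c"])

mat = sparse_df_to_csr(df)
print(mat.shape, mat.nnz)
```

If some columns are not sparse, or use a non-zero fill value such as NaN, this shortcut is not expected to apply as written, and the column-by-column approach above is the safer fallback.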
    
