Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory

北荒 2020-11-30 08:48

Is there a way to convert from a pandas.SparseDataFrame to scipy.sparse.csr_matrix, without generating a dense matrix in memory?

6 answers
  •  迷失自我
    2020-11-30 09:47

    Pandas 0.20.0+:

    As of pandas version 0.20.0, released May 5, 2017, there is a one-liner for this:

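    from scipy import sparse
    
    
    def sparse_df_to_csr(df):
        # df is a pandas.SparseDataFrame (pandas 0.20 through 0.25.x); to_coo()
        # returns a scipy.sparse.coo_matrix, which is then converted to CSR.
        return sparse.csr_matrix(df.to_coo())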
    

    This uses the new to_coo() method.
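`SparseDataFrame` (and with it this `to_coo()` method) was removed in pandas 1.0. On current pandas, a minimal sketch of the same conversion uses an ordinary DataFrame whose columns all have a `SparseDtype`, plus the `DataFrame.sparse.to_coo()` accessor method (available since pandas 0.25). The helper name and sample data are illustrative only. `fill_value=0.0` is set explicitly because float `SparseArray`s default to a NaN fill value, which does not correspond to the implicit zeros of a COO matrix.

```python
import pandas as pd
from scipy import sparse


def sparse_df_to_csr(df: pd.DataFrame) -> sparse.csr_matrix:
    # Requires every column to have a SparseDtype; the .sparse accessor
    # raises an error otherwise. Only the stored values are copied.
    return df.sparse.to_coo().tocsr()


# Build a sparse-backed DataFrame directly from SparseArrays, so no dense
# 2-D array is ever allocated.
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0.0, 1.0, 0.0, 0.0], fill_value=0.0),
    "b": pd.arrays.SparseArray([0.0, 0.0, 2.0, 0.0], fill_value=0.0),
})
csr = sparse_df_to_csr(df)
print(csr.shape, csr.nnz)  # (4, 2) 2
```

The reverse direction, `pd.DataFrame.sparse.from_spmatrix(csr)`, likewise builds sparse columns without densifying.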

    Earlier Versions:

    Building on Victor May's answer, here's a slightly faster implementation. It only works if every column of the SparseDataFrame is sparse and uses a BlockIndex (this is the case if the frame was created with get_dummies).

    Edit: I modified this so it will work with a non-zero fill value. CSR has no native non-zero fill value, so you will have to record it externally.

    import numpy as np
    import pandas as pd
    from scipy import sparse
    
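    def sparse_BlockIndex_df_to_csr(df):
        columns = df.columns
        # For each column, take the stored (non-fill) values shifted so that the
        # fill value maps to 0, plus the integer row positions of those values.
        zipped_data = zip(*[(df[col].sp_values - df[col].fill_value,
                             df[col].sp_index.to_int_index().indices)
                            for col in columns])
        data, rows = map(list, zipped_data)
        # Column index i, repeated once per stored value in column i. Use an
        # integer dtype: these are matrix indices, not data values.
        cols = [np.full(len(a), i, dtype=np.int64) for (i, a) in enumerate(data)]
        data_f = np.concatenate(data)
        rows_f = np.concatenate(rows)
        cols_f = np.concatenate(cols)
        # Assemble in COO format and convert to CSR; no dense array is built.
        arr = sparse.coo_matrix((data_f, (rows_f, cols_f)),
                                shape=df.shape, dtype=np.float64)
        return arr.tocsr()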
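Because `SparseDataFrame` no longer exists in pandas 1.0 and later, here is a minimal sketch of the same shifted-fill-value idea for a DataFrame whose columns all have a `SparseDtype`. The helper `sparse_df_to_csr_with_fill` is hypothetical. Since CSR cannot hold a non-zero fill value, it returns the per-column fill values alongside the matrix, so entry (i, j) of the original equals `csr[i, j] + fills[j]`. It assumes finite fill values: with a NaN fill (the float default), every shifted entry would become NaN.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_df_to_csr_with_fill(df):
    """Hypothetical helper: return (csr, fills) where stored entries are
    value - fill_value, so df.iat[i, j] == csr[i, j] + fills[j]."""
    data, rows, cols, fills = [], [], [], []
    for j, name in enumerate(df.columns):
        arr = df[name].array  # a pandas SparseArray
        fill = arr.fill_value
        # Row positions of the stored (non-fill) values, for either index kind.
        idx = arr.sp_index.to_int_index().indices
        data.append(np.asarray(arr.sp_values, dtype=np.float64) - fill)
        rows.append(idx)
        cols.append(np.full(len(idx), j, dtype=np.int64))
        fills.append(fill)
    coo = sparse.coo_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=df.shape, dtype=np.float64)
    return coo.tocsr(), np.asarray(fills, dtype=np.float64)


# Column "a" uses a non-zero fill value of 1.0; only the 5.0 is stored.
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([1.0, 1.0, 5.0, 1.0], fill_value=1.0),
    "b": pd.arrays.SparseArray([0.0, 3.0, 0.0, 0.0], fill_value=0.0),
})
csr, fills = sparse_df_to_csr_with_fill(df)
print(csr[2, 0] + fills[0])  # 5.0
```

Returning the fill values as a separate array keeps the matrix itself a plain CSR matrix that other scipy code can consume unchanged.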
    
