Efficient way to normalize a Scipy Sparse Matrix

孤独总比滥情好 2020-12-29 20:49

I'd like to write a function that normalizes the rows of a large sparse matrix (such that they sum to one).

from pylab import *
import scipy.sparse as sp

d         


        
5 Answers
  •  暖寄归人
    2020-12-29 21:53

    This approach avoids importing sklearn, converting to a dense matrix, or multiplying matrices. It works directly on the data representation of CSR matrices:

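    from scipy.sparse import isspmatrix_csr

    def normalize(W):
        """Row-normalize a scipy sparse CSR matrix in place (rows sum to one).

        Assumes W has a floating-point dtype; in-place division fails on integer data.
        """
        if not isspmatrix_csr(W):
            raise ValueError('W must be in CSR format.')
        for i in range(W.shape[0]):
            # Stored values of row i live in W.data[start:end].
            start, end = W.indptr[i], W.indptr[i + 1]
            row_sum = W.data[start:end].sum()
            # Skip empty rows and rows summing to zero to avoid dividing by zero.
            if row_sum != 0:
                W.data[start:end] /= row_sum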
    

    Remember that W.indices is the array of column indices, W.data is the array of the corresponding nonzero values, and W.indptr marks where each row starts in indices and data.
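    To make the three CSR arrays concrete, here is a small sketch on a made-up 3x3 matrix (the values are only for illustration). It also shows one possible way to get all row sums from the raw arrays without a Python loop, using numpy.repeat and numpy.bincount; this is an alternative worth testing on your own data, not part of the function above.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative matrix; the middle row is entirely zero.
W = sp.csr_matrix(np.array([[1.0, 0.0, 3.0],
                            [0.0, 0.0, 0.0],
                            [0.0, 2.0, 2.0]]))

print(W.data)     # stored nonzero values, row by row
print(W.indices)  # column index of each stored value
print(W.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# np.diff(indptr) gives the number of stored values per row, so repeating
# each row number that many times labels every entry of W.data with its row.
n_rows = W.shape[0]
row_ids = np.repeat(np.arange(n_rows), np.diff(W.indptr))

# Sum the stored values per row label; empty rows get a sum of 0.
row_sums = np.bincount(row_ids, weights=W.data, minlength=n_rows)
print(row_sums)
```

    Because row_ids has the same length as W.data, the same labels could be used to divide every stored value by its row sum in one vectorized step, as long as zero sums are handled first.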

    To normalize by the L1 norm instead, apply numpy.abs() to the row values before taking the sum. To normalize by the maximum value per row, replace the sum with numpy.max().
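    A minimal sketch of those variants, assuming a floating-point CSR matrix. The function name normalize_rows and its mode parameter are made up for this example and are not part of scipy.

```python
import numpy as np
import scipy.sparse as sp

def normalize_rows(W, mode="sum"):
    """Scale each row of a float CSR matrix in place.

    mode: "sum" (plain row sum), "l1" (sum of absolute values),
    or "max" (largest stored value in the row).
    """
    if W.format != "csr":
        raise ValueError("W must be in CSR format.")
    for i in range(W.shape[0]):
        row = W.data[W.indptr[i]:W.indptr[i + 1]]  # view into W.data
        if row.size == 0:
            continue  # empty row: nothing to scale
        if mode == "sum":
            scale = row.sum()
        elif mode == "l1":
            scale = np.abs(row).sum()
        elif mode == "max":
            scale = row.max()  # looks at stored values only
        else:
            raise ValueError("unknown mode: " + mode)
        if scale != 0:
            row /= scale  # in-place division writes through to W.data

# Illustrative data with a negative entry, where "sum" and "l1" differ.
A = sp.csr_matrix(np.array([[1.0, -3.0], [0.0, 0.0], [2.0, 2.0]]))
normalize_rows(A, mode="l1")
print(A.toarray())
```

    Note that the "max" mode only sees the stored values, so a row whose stored entries are all negative is scaled by a negative number even though its implicit zeros are larger. Using np.abs(row).max() may be the safer choice in that case.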
