How to access sparse matrix elements?

情歌与酒 2021-01-31 02:07
type(A)

A.shape
(8529, 60877)
print(A[0, :])
  (0, 25)   1.0
  (0, 7422) 1.0
  (0, 26062)    1.0
  (0, 31804)    1.0
  (0, 41         


        
4 answers
  • 2021-01-31 02:19

    If this is for calculating TF-IDF scores with TfidfTransformer, you can get the IDF values from tfidf.idf_. To see the sparse result as a dense array, call toarray() on it: for a sparse matrix named a, that is a.toarray().

    toarray returns an ndarray; todense returns a matrix. If you want a matrix, use todense; otherwise, use toarray.
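
    The difference between the two return types can be checked directly. This is a minimal sketch using a small made-up matrix (not the asker's data); the variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Small illustrative matrix (an assumption, not the asker's data).
a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

dense_array = a.toarray()   # numpy.ndarray
dense_matrix = a.todense()  # numpy.matrix, which always stays 2-D

print(type(dense_array).__name__)
print(type(dense_matrix).__name__)

# Indexing one row behaves differently for the two types:
# the ndarray drops a dimension, the matrix keeps two.
print(dense_array[1].shape)
print(dense_matrix[1].shape)
```

    For most downstream NumPy code the ndarray from toarray is probably the easier one to work with.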

  • 2021-01-31 02:32

    A[1,:] is itself a sparse matrix with shape (1, 60877). This is what you are printing, and it has only one row, so all the row coordinates are 0.

    For example:

    In [40]: from scipy.sparse import csc_matrix
    
    In [41]: a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])
    
    In [42]: a.todense()
    Out[42]: 
    matrix([[ 1,  0,  0,  0],
            [ 0,  0, 10, 11],
            [ 0,  0,  0, 99]], dtype=int64)
    
    In [43]: print(a[1, :])
      (0, 2)    10
      (0, 3)    11
    
    In [44]: print(a)
      (0, 0)    1
      (1, 2)    10
      (1, 3)    11
      (2, 3)    99
    
    In [45]: print(a[1, :].toarray())
    [[ 0  0 10 11]]
    

    You can select columns the same way, but if a column has no nonzero elements, print displays nothing for it:

    In [46]: a[:, 3].toarray()
    Out[46]: 
    array([[ 0],
           [11],
           [99]])
    
    In [47]: print(a[:,3])
      (1, 0)    11
      (2, 0)    99
    
    In [48]: a[:, 1].toarray()
    Out[48]: 
    array([[0],
           [0],
           [0]])
    
    In [49]: print(a[:, 1])
    
    
    In [50]:
    

    The last print call shows no output because the column a[:, 1] has no nonzero elements.
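
    If the goal is to read values rather than inspect the printed coordinates, a few direct accessors may help. The following is a sketch on the same small matrix; the variable names are chosen here for illustration.

```python
from scipy.sparse import csc_matrix

a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

# Indexing with two integers returns a plain scalar, not a sparse matrix.
value = a[1, 2]
print(value)

# nnz is the number of stored entries; 0 here explains why print() shows nothing.
empty_col = a[:, 1]
print(empty_col.shape, empty_col.nnz)

# A row slice converted to a flat 1-D array for ordinary indexing.
row = a[1, :].toarray().ravel()
print(row.tolist())
```

    Checking nnz before printing a slice is a cheap way to tell an empty result from a failed selection.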

  • 2021-01-31 02:36

    To answer the question in your title with a different technique from the one in the question body:

    csc_matrix gives you the method .nonzero().

    Given:

    >>> import numpy as np
    >>> from scipy.sparse import csc_matrix
    >>> 
    >>> row = np.array( [0, 1, 3])
    >>> col = np.array( [0, 2, 3])
    >>> data = np.array([1, 4, 16])
    >>> A = csc_matrix((data, (row, col)), shape=(4, 4))
    

    You can access the indices pointing to non-zero data with:

    >>> rows, cols = A.nonzero()
    >>> rows
    array([0, 1, 3], dtype=int32)
    >>> cols
    array([0, 2, 3], dtype=int32)
    

    You can then use these indices to access your data, without ever needing to make a dense version of your sparse matrix:

    >>> [((i, j), A[i,j]) for i, j in zip(*A.nonzero())]
    [((0, 0), 1), ((1, 2), 4), ((3, 3), 16)]
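
    As a possible variation, scipy.sparse.find and the COO view return the indices together with the values in one step, which avoids a separate A[i, j] lookup per entry. This is a sketch on the same example matrix; the names entries and coo_entries are made up here, and the results are sorted because the order in which entries come back is not something this sketch relies on.

```python
import numpy as np
from scipy.sparse import csc_matrix, find

row = np.array([0, 1, 3])
col = np.array([0, 2, 3])
data = np.array([1, 4, 16])
A = csc_matrix((data, (row, col)), shape=(4, 4))

# find() returns row indices, column indices, and values of the nonzero entries.
rows, cols, vals = find(A)
entries = sorted((int(i), int(j), int(v)) for i, j, v in zip(rows, cols, vals))
print(entries)

# The COO view exposes the same triplets as the attributes row, col, and data.
coo = A.tocoo()
coo_entries = sorted(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
print(coo_entries)
```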
    
  • 2021-01-31 02:39

    I fully acknowledge all the other given answers. This is simply a different approach.

    To demonstrate this example I am creating a new sparse matrix:

    from scipy.sparse import csc_matrix
    a = csc_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])
    print(a)
    

    Output:

    (0, 0)  1
    (1, 2)  10
    (1, 3)  11
    (2, 3)  99
    

    To access this easily, like the way we access a list, I converted it into a list.

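    temp_list = []
    # Iterating over a sparse matrix yields one 1 x n sparse row at a time.
    for row in a:
        # toarray() gives a (1, n) ndarray; take its only row as a plain list.
        temp_list.append(row.toarray()[0].tolist())
    
    print(temp_list)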
    

    Output:

    [[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]]
    

    This might look stupid, since I am creating a sparse matrix and then converting it back, but tools such as TfidfVectorizer return a sparse matrix as output, and handling it can be tricky. This is one way to extract data from a sparse matrix.
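
    For matrices too large to convert, one alternative is to read a row's stored entries straight from the CSR arrays. The sketch below assumes the matrix is in CSR format (other formats can be converted with tocsr()); row_entries is a hypothetical helper written for this example, not part of SciPy.

```python
from scipy.sparse import csr_matrix

m = csr_matrix([[1, 0, 0, 0], [0, 0, 10, 11], [0, 0, 0, 99]])

def row_entries(mat, i):
    """Return the (column, value) pairs stored in row i of a CSR matrix."""
    # indptr[i]:indptr[i + 1] is the slice of indices/data belonging to row i.
    start, end = mat.indptr[i], mat.indptr[i + 1]
    cols = mat.indices[start:end].tolist()
    vals = mat.data[start:end].tolist()
    return list(zip(cols, vals))

print(row_entries(m, 1))

# When the matrix is small enough, the nested list is a single call away.
print(m.toarray().tolist())
```

    The helper touches only the entries stored for the requested row, so its cost does not depend on the number of columns.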
