Converting scipy.sparse.csr.csr_matrix to a list of lists

Happy的楠姐 2021-01-18 13:08

I am learning multi-label classification and trying to implement the tf-idf tutorial from scikit-learn. I am working with a text corpus and calculating its tf-idf scores. I am

3 answers
  • 2021-01-18 13:51

    I don't know what tf-idf expects, but I may be able to help with the sparse side.

    Make a sparse matrix:

    In [526]: M=sparse.random(4,10,.1)
    In [527]: M
    Out[527]: 
    <4x10 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in COOrdinate format>
    In [528]: print(M)
      (3, 1)    0.281301619779
      (2, 6)    0.830780358032
      (1, 1)    0.242503399296
      (2, 2)    0.190933579917
    

    Now convert it to COO format. It is already in that format (I could have passed a format parameter to sparse.random). In any case, the values of a COO matrix are stored in three arrays:

    In [529]: Mc=M.tocoo()
    In [530]: Mc.data
    Out[530]: array([ 0.28130162,  0.83078036,  0.2425034 ,  0.19093358])
    In [532]: Mc.row
    Out[532]: array([3, 2, 1, 2], dtype=int32)
    In [533]: Mc.col
    Out[533]: array([1, 6, 1, 2], dtype=int32)
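    Since the question starts from a csr_matrix rather than the COO matrix that sparse.random returns by default, here is a minimal self-contained sketch that pulls the same three arrays out of a small CSR matrix. The matrix X and its values are hypothetical stand-ins for a tf-idf result; the last lines show the format parameter of sparse.random.

```python
import numpy as np
from scipy import sparse

# Hypothetical small CSR matrix standing in for a tf-idf result
X = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.7],
]))

C = X.tocoo()
print(C.row)   # row index of each stored value: [0 2 2]
print(C.col)   # column index of each stored value: [1 0 3]
print(C.data)  # the stored values: [0.5 0.2 0.7]

# sparse.random can also return CSR directly via its format parameter;
# density 0.1 on a 4x10 matrix gives round(0.1 * 40) = 4 stored values
R = sparse.random(4, 10, density=0.1, format="csr")
print(R.format, R.nnz)  # csr 4
```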
    

    It looks like you want to ignore Mc.row and somehow combine the other two arrays.

    For example, as a dictionary (note that column 1 appears in two rows, so its later value overwrites the earlier one):

    In [534]: {k:v for k,v in zip(Mc.col, Mc.data)}
    Out[534]: {1: 0.24250339929583264, 2: 0.19093357991697379, 6: 0.83078035803205375}
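    A single dict keyed on column index merges entries from every row. If each row (document) should keep its own entries, one possible sketch reads them straight from the CSR arrays indptr, indices and data. The helper name row_dicts and the matrix values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical small matrix standing in for a tf-idf result
X = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.7],
]))

def row_dicts(m):
    """Return one {column_index: value} dict per row (hypothetical helper)."""
    m = sparse.csr_matrix(m)  # CSR gives direct access to indptr/indices/data
    result = []
    for i in range(m.shape[0]):
        # Row i's entries live in the slice indptr[i]:indptr[i + 1]
        start, end = m.indptr[i], m.indptr[i + 1]
        cols = m.indices[start:end].tolist()
        vals = m.data[start:end].tolist()
        result.append(dict(zip(cols, vals)))
    return result

print(row_dicts(X))  # [{1: 0.5}, {}, {0: 0.2, 3: 0.7}]
```

    Empty rows come back as empty dicts, so the list stays aligned with the rows of the matrix.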
    

    or as columns of a 2D array:

    In [535]: np.column_stack((Mc.col, Mc.data))
    Out[535]: 
    array([[ 1.        ,  0.28130162],
           [ 6.        ,  0.83078036],
           [ 1.        ,  0.2425034 ],
           [ 2.        ,  0.19093358]])
    

    (Equivalently, np.array((Mc.col, Mc.data)).T.)

    Or simply as a list of arrays, [Mc.col, Mc.data], or as a list of lists, [Mc.col.tolist(), Mc.data.tolist()], and so on.
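    One detail about the np.column_stack option: stacking the int32 column indices with the float64 values upcasts the whole array to float64, which is why Out[535] shows 1., 6. and so on. If the indices should stay integers, a sketch that zips the Python-level lists instead follows. The matrix is rebuilt with rounded copies of the values above, for illustration only.

```python
import numpy as np
from scipy import sparse

# Illustrative COO matrix with the same positions as above (values rounded)
Mc = sparse.coo_matrix(
    ([0.28, 0.83, 0.24, 0.19], ([3, 2, 1, 2], [1, 6, 1, 2])), shape=(4, 10)
)

stacked = np.column_stack((Mc.col, Mc.data))
print(stacked.dtype)  # float64: the column indices were upcast

# Zip plain Python lists so each column index stays an int
pairs = [[c, v] for c, v in zip(Mc.col.tolist(), Mc.data.tolist())]
print(pairs)  # [[1, 0.28], [6, 0.83], [1, 0.24], [2, 0.19]]
```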

    Can you take it from there?

  • 2021-01-18 13:53

    Using SciPy, I suggest this method: convert to a dense array, then to nested lists.

    ndarray = yourMatrix.toarray()   # dense 2D NumPy array, zeros included
    listOflist = ndarray.tolist()    # nested Python lists, one inner list per row
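    This works for any sparse format, but it materializes every cell, zeros included, so the result grows with rows times columns rather than with the number of stored values. A minimal self-contained check on a hypothetical small matrix:

```python
import numpy as np
from scipy import sparse

# Hypothetical small CSR matrix standing in for a tf-idf result
X = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.7],
]))

dense_lists = X.toarray().tolist()  # one inner list per row, zeros included
print(dense_lists)
# [[0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.2, 0.0, 0.0, 0.7]]

# The dense form stores every cell, the sparse form only the nonzeros
print(sum(len(r) for r in dense_lists), X.nnz)  # 12 3
```

    For a large tf-idf matrix (many documents by a large vocabulary) this dense copy can be far bigger than the sparse original.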
    
  • 2021-01-18 14:06

    For this purpose, choosing the right scipy.sparse matrix type matters. Here scipy.sparse.lil_matrix is ideal: its data attribute is a NumPy object array holding, for each row, a list of that row's nonzero values (the matching column indices are in its rows attribute). A brief script follows:

    arrays_of_list = matriz.tolil().data     # object array: one list of nonzero values per row
    list_of_list = arrays_of_list.tolist()   # plain Python list of those per-row lists
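    To see what the lil_matrix attributes hold, here is a small sketch on a hypothetical matrix: data gives the nonzero values of each row, and rows gives the matching column indices, so the two can be zipped back into (column, value) pairs if the positions are needed.

```python
import numpy as np
from scipy import sparse

# Hypothetical small CSR matrix standing in for a tf-idf result
X = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.7],
]))

L = X.tolil()
values_per_row = L.data.tolist()   # nonzero values of each row
columns_per_row = L.rows.tolist()  # matching column indices of each row
print(values_per_row)   # [[0.5], [], [0.2, 0.7]]
print(columns_per_row)  # [[1], [], [0, 3]]

# Pair each row's column indices with its values
pairs_per_row = [list(zip(c, v)) for c, v in zip(columns_per_row, values_per_row)]
print(pairs_per_row)  # [[(1, 0.5)], [], [(0, 0.2), (3, 0.7)]]
```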
    