How to format in numpy savetxt such that zeros are saved only as “0”

难免孤独 2020-12-19 10:14

I am saving a sparse array (converted to a dense NumPy array) to a CSV file, and the resulting file is 3 GB. The problem is that 95% of the cells are written as 0.0000. I used fmt='%5.4f'.

3 Answers
  •  爱一瞬间的悲伤
    2020-12-19 11:09

    If you look at the source code of np.savetxt, you'll see that, while there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it is ultimately a simple Python loop over the rows, in which each row is formatted and written to the file. So you won't lose any performance if you write your own. For example, here's a pared-down function that writes each zero as a bare "0":

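    def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
        # Write the 2-d array x to fname, one row per line.
        with open(fname, 'w') as fh:
            for row in x:
                # Exact zeros become "0"; everything else is formatted with fmt.
                line = delimiter.join("0" if value == 0 else fmt % value for value in row)
                fh.write(line + '\n')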
    

    For example:

    In [70]: x
    Out[70]: 
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
    
    In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')
    
    In [72]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0
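
    One caveat with the `value == 0` test above: a tiny nonzero value such as 0.00001 is not equal to zero, so with fmt='%.4f' it is still written as 0.0000. A minimal sketch of a variant that compares the formatted text instead is shown here. The names `format_row_compact` and `savetxt_compact_rounded` are hypothetical, and the sketch assumes that fmt produces a string that `float()` can parse (true for the usual `%f`, `%g` and `%e` formats).

```python
def format_row_compact(row, fmt="%.4f", delimiter=","):
    """Format one row, collapsing anything that prints as zero to "0"."""
    fields = []
    for value in row:
        text = fmt % value
        # Assumption: text is parseable by float(). Values that round to
        # zero at this precision (including "-0.0000") are written as "0".
        fields.append("0" if float(text) == 0 else text)
    return delimiter.join(fields)


def savetxt_compact_rounded(fname, x, fmt="%.4f", delimiter=","):
    """Write the rows of x to fname, one line per row."""
    with open(fname, "w") as fh:
        for row in x:
            fh.write(format_row_compact(row, fmt, delimiter) + "\n")
```

    The extra `float()` call costs some time per cell, so this is only worth it if the data actually contains values that round to zero at the chosen precision.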
    

    Since you are writing your own savetxt function anyway, you might as well make it handle sparse matrices directly, so you don't have to convert the whole matrix to a dense numpy array before saving it. (I assume the sparse array is implemented using one of the sparse representations from scipy.sparse.) In the following function, the only change is from `... for value in row` to `... for value in row.A[0]`, which densifies one row at a time.

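    def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
        # x is a scipy.sparse matrix; iterating over it yields 1-row sparse matrices.
        with open(fname, 'w') as fh:
            for row in x:
                # row.A[0] is the current row as a dense 1-d array.
                line = delimiter.join("0" if value == 0 else fmt % value for value in row.A[0])
                fh.write(line + '\n')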
    

    Example:

    In [112]: a
    Out[112]: 
    <6x5 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>
    
    In [113]: a.A
    Out[113]: 
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
    
    In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')
    
    In [115]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0
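
    As an alternative to densifying each row with `row.A[0]`, the stored entries can be read straight from the CSR arrays. The following is only a sketch under some assumptions: the name `savetxt_csr_compact` is hypothetical, the input is anything `scipy.sparse.csr_matrix` accepts, and a copy is made so that merging duplicate entries does not modify the caller's matrix.

```python
from scipy import sparse


def savetxt_csr_compact(fname, x, fmt="%.6g", delimiter=","):
    """Write a sparse matrix as text, with every zero cell written as "0"."""
    # Copy into CSR form and merge duplicate entries so that each
    # (row, column) position appears at most once in the stored data.
    csr = sparse.csr_matrix(x, copy=True)
    csr.sum_duplicates()
    n_rows, n_cols = csr.shape
    with open(fname, "w") as fh:
        for i in range(n_rows):
            # Start from a row of zeros, then fill in the stored entries.
            fields = ["0"] * n_cols
            start, stop = csr.indptr[i], csr.indptr[i + 1]
            for j, value in zip(csr.indices[start:stop], csr.data[start:stop]):
                if value != 0:  # skip explicitly stored zeros
                    fields[j] = fmt % value
            fh.write(delimiter.join(fields) + "\n")
```

    This only formats the nonzero values, so for a matrix that is 95% zeros most of the per-cell work is replaced by building a list of "0" strings.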
    
