Creating a zero-filled pandas data frame

情书的邮戳 2020-12-07 14:36

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data), len(feature_list)))
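
A minimal sketch of how that array could be wrapped in a frame, assuming `data` is any sized collection and `feature_list` is a list of column names (the values below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `data` and `feature_list`.
data = [10, 20, 30]
feature_list = ["a", "b"]

# Build the zero array first, then label its columns.
zero_data = np.zeros(shape=(len(data), len(feature_list)))
zero_df = pd.DataFrame(zero_data, columns=feature_list)

# Alternative: pass the scalar fill value and let pandas broadcast it.
zero_df2 = pd.DataFrame(0.0, index=range(len(data)), columns=feature_list)
```

Both forms should give a float64 frame with one row per element of `data` and one column per feature.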


        
6 Answers
  •  萌比男神i
    2020-12-07 14:54

    This assumes you have a template DataFrame that you would like to copy with every value set to zero.

    If you have no NaNs in your data set, multiplying by zero can be significantly faster:

    In [19]: columns = ["col{}".format(i) for i in range(3000)]

    In [20]: indices = range(2000)
    
    In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)
    
    In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
    100 loops, best of 3: 12.6 ms per loop
    
    In [23]: %timeit d = orig_df * 0.0
    100 loops, best of 3: 7.17 ms per loop
    

    The improvement depends on the DataFrame size, but I have never found it to be slower.
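
The "no NaNs" condition matters because multiplying by zero does not erase missing values. A small sketch, using a made-up template frame, of what happens when the data does contain NaN or infinity:

```python
import numpy as np
import pandas as pd

# Small hypothetical template frame containing one NaN and one infinity.
template = pd.DataFrame({"a": [1.0, np.nan], "b": [np.inf, 4.0]})

zeroed = template * 0.0
# NaN * 0.0 and inf * 0.0 both yield NaN under IEEE 754 rules,
# so two cells of `zeroed` are not zero.
nan_count = int(zeroed.isna().sum().sum())

# Building the zeros from the shape alone ignores the values entirely.
safe = pd.DataFrame(np.zeros(template.shape), index=template.index, columns=template.columns)
```

If the template may contain such values, the shape-based construction is the safer choice even though it can be slower.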

    And just for the heck of it:

    In [24]: %timeit d = orig_df * 0.0 + 1.0
    100 loops, best of 3: 13.6 ms per loop
    
    In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
    100 loops, best of 3: 8.36 ms per loop
    

    But:

    In [24]: %timeit d = orig_df.copy()
    10 loops, best of 3: 24 ms per loop
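
`%timeit` is only available inside IPython. A rough sketch for repeating the comparison from a plain script with the standard `timeit` module, assuming a smaller frame than above so that it runs quickly (the `candidates` names are just labels for this example):

```python
import timeit

import numpy as np
import pandas as pd

columns = ["col{}".format(i) for i in range(300)]
orig_df = pd.DataFrame(42.0, index=range(200), columns=columns)

# Each candidate builds a new frame from the template.
candidates = {
    "zeros": lambda: pd.DataFrame(
        np.zeros(orig_df.shape), index=orig_df.index, columns=orig_df.columns
    ),
    "multiply": lambda: orig_df * 0.0,
    "copy": lambda: orig_df.copy(),
}

# Best of 3 runs of 10 calls each, reported per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print("{}: {:.3f} ms per call".format(name, seconds * 1e3))
```

The absolute numbers will vary with the machine, the pandas version and the frame size, so treat them as relative indications only.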
    

    EDIT!!!

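    Assuming your frame uses float64, this is the fastest by a huge margin! Note, however, that `+` binds more tightly than `>`, so the expression as written compares against `1.7976931348623157e+308 + 0.0` and returns a boolean frame of `False` rather than float zeros. Replacing 0.0 with another number therefore does not by itself produce a different fill value.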

    In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
    100 loops, best of 3: 3.68 ms per loop
    

    Depending on taste, you can define nan externally to get a general solution that does not depend on the particular float type:

    In [39]: nan = np.nan
    In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
    100 loops, best of 3: 4.39 ms per loop
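
A small sketch for checking what the comparison actually returns. It uses plain operators on a tiny made-up frame instead of `pd.eval`; the precedence rules are the same:

```python
import numpy as np
import pandas as pd

orig_df = pd.DataFrame(42.0, index=range(3), columns=["a", "b"])

# `+` binds tighter than `>`, so this is orig_df > (np.nan + 0.0).
# Any comparison with NaN is False, so the result is an all-False boolean frame.
mask = orig_df > np.nan + 0.0

# Cast explicitly when float zeros are needed.
zeros = mask.astype("float64")
```

The cast adds some cost, so the timings above may not carry over if a float result is required.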
    
