Pandas: convert floats in scientific notation to strings

北恋 2020-12-17 01:57

I used read_csv() to load a dataset that looks like this:

userid
NaN
1.091178e+11
1.137856e+11

I want to convert the user ids t

2 answers
  • 2020-12-17 02:38

    I ran into this problem after reading a DataFrame from a JSON file with read_json, which unfortunately does not have a keep_default_na parameter.

    The solution was to convert the long floats to np.int64 before converting them to str.

    In [53]: tweet_id_sample = tweets.iloc[0]['id']
             tweet_id_sample
    Out[53]: 8.924206435553362e+17
    
    In [54]: tweet_id_sample.astype(str)
    Out[54]: '8.924206435553362e+17'
    
    In [55]: tweet_id_sample.astype(np.int64).astype(str)
    Out[55]: '892420643555336192'
    
    In [56]: # This overflows on platforms where int maps to a 32-bit integer
             tweet_id_sample.astype(int)
    Out[56]: -2147483648
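
The transcript above converts a single value. As a minimal sketch for a whole column, assuming the IDs are integer-valued floats and that missing values should stay missing, the same np.int64 step could be applied only to the non-NaN rows. The helper name float_ids_to_str is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def float_ids_to_str(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: render integer-valued float IDs as digit strings."""
    # Start from an all-missing object column so NaN rows stay missing.
    out = pd.Series([None] * len(ids), index=ids.index, dtype=object)
    present = ids.notna()
    # np.int64 is named explicitly because a bare int can be 32-bit on some platforms.
    out.loc[present] = ids[present].astype(np.int64).astype(str)
    return out


ids = pd.Series([np.nan, 1.091178e+11, 8.924206435553362e+17])
converted = float_ids_to_str(ids)
print(converted.tolist())
```

Note that float64 represents integers exactly only up to 2**53, so very large IDs may already have lost digits by the time they are stored as floats. Converting to a string afterwards cannot recover them.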
    
  • 2020-12-17 02:58

    You can use map or apply, as mentioned in this comment:

    print(df.userid.map(lambda x: '{:.0f}'.format(x)))
    0             nan
    1    109117800000
    2    113785600000
    Name: userid, dtype: object
    

    df.userid = df.userid.map(lambda x: '{:.0f}'.format(x))
    print(df)
             userid
    0           nan
    1  109117800000
    2  113785600000
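
In the output above, the missing value becomes the literal string 'nan'. If that is not wanted, one possible variation (a sketch, not part of the original answer) is to pass na_action='ignore' to Series.map so that missing values are skipped rather than formatted. The sample frame below is rebuilt from the values in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the values shown in the question.
df = pd.DataFrame({"userid": [np.nan, 1.091178e+11, 1.137856e+11]})

# na_action="ignore" leaves missing values as NaN instead of formatting them.
as_text = df["userid"].map("{:.0f}".format, na_action="ignore")
print(as_text.tolist())
```

The bound method "{:.0f}".format is passed directly, which behaves the same as the lambda wrapper used in the answer.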
    

    I wondered whether map would be faster, but it is the same:

    #[300000 rows x 1 columns]
    df = pd.concat([df]*100000).reset_index(drop=True)
    #print (df)
    
    In [40]: %timeit (df.userid.map(lambda x: '{:.0f}'.format(x)))
    1 loop, best of 3: 211 ms per loop
    
    In [41]: %timeit (df.userid.apply(lambda x: '{:.0f}'.format(x)))
    1 loop, best of 3: 210 ms per loop
    

    Another option is to_string, but it is slow, and it returns one formatted string for the whole column rather than a Series:

    print(df.userid.to_string(float_format='{:.0f}'.format))
    0            nan
    1   109117800000
    2   113785600000
    
    In [41]: %timeit (df.userid.to_string(float_format='{:.0f}'.format))
    1 loop, best of 3: 2.52 s per loop
    