Why does pd.concat change the resulting datatype from int to float?

柔情痞子 posted on 2019-12-05 17:27:54

Because of this -

timestamp      7188 non-null int64
sunrise        7176 non-null float64
...

timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. The remaining 12 values in each of those columns are null, meaning they're NaNs.

NaN is a floating-point value, so an int64 column that picks one up cannot stay int64: every other value in that column is automatically upcast to float64. Float numbers that big are usually displayed in scientific notation.
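To see the upcast in isolation, here is a minimal sketch. The two frames are made-up stand-ins (the names left and right and their values are assumptions, not the data from the question); the only point is that one frame is shorter than the other.

```python
import pandas as pd

# Hypothetical frames: `left` has one more row than `right`.
left = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
right = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Both columns start out as integers.
print(left.dtypes)
print(right.dtypes)

# The default join='outer' keeps every index label, so the last row
# has no sunrise value and a NaN is inserted there.
combined = pd.concat((left, right), axis=1)

# timestamp is untouched; sunrise now holds a NaN and is float64.
print(combined.dtypes)
print(combined)
```

Only the column that received a NaN changes dtype, which matches the info() output above where timestamp is still int64.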

That's the why, but it doesn't really solve your problem. Your options at this point are:

  1. drop the rows that contain NaNs using dropna
  2. fill the NaNs with some default integral value using fillna

(After either of these, you may cast the affected columns back to int.)

  3. Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.

    pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')
    
           timestamp        sunrise         sunset  temperature     pressure  \    
    0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000   
    1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333   
    2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667   
    3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000   
    4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333   
    
       humidity  
    0      99.0  
    1      99.0  
    2      99.0  
    3      99.0  
    4      99.0 
    

With option 3, an inner join is performed on the indexes of the dataframes, so only rows whose index label appears in every dataframe are kept.
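The options can be compared side by side in a rough sketch. The frames below are small made-up stand-ins that only reuse the names timestamp and dataSun from the snippet above; the fill value 0 is an arbitrary placeholder, not a recommendation.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Default outer join: the last row gets a NaN, so sunrise becomes float64.
outer = pd.concat((timestamp, dataSun), axis=1)

# dropna: drop the incomplete rows, then cast the column back to int64.
dropped = outer.dropna().astype({"sunrise": "int64"})

# fillna: replace NaN with a placeholder (0 is an assumption), then cast.
filled = outer.fillna({"sunrise": 0}).astype({"sunrise": "int64"})

# join='inner': the NaN is never created, so no cast is needed.
inner = pd.concat((timestamp, dataSun), axis=1, join="inner")

print(dropped.dtypes)
print(filled.dtypes)
print(inner.dtypes)
```

Note that dropna and the inner join both lose the rows that have no match, while fillna keeps them at the cost of a sentinel value you have to remember later.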
