Python - Turn all items in a Dataframe to strings

梦毁少年i 2020-12-15 16:18

I followed the procedure from "In Python, how do I convert all of the items in a list to floats?" because each column of my DataFrame is a list, but instead o

4 Answers
  •  甜味超标
    2020-12-15 16:55

    With pandas >= 1.0 there is now a dedicated string datatype.

    You can convert your DataFrame (or a single column) to this pandas string datatype using .astype('string'):

    df = df.astype('string')
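
A minimal sketch of what the whole-frame call does, assuming pandas >= 1.1 (as far as I know, pandas 1.0.x could only cast data that was already str to 'string', so there you would chain .astype(str).astype('string')). The column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with numeric columns (illustrative data only).
df = pd.DataFrame({
    "zipcode": [90210, 90211],
    "score": [1.5, 2.5],
})

# Convert every column at once to the dedicated pandas string dtype.
df_string = df.astype("string")

print(df_string.dtypes)
print(df_string["zipcode"].tolist())
```

Every column ends up with the string dtype, whatever it held before, and the values come back as ordinary Python strings.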
    

    This is different from using str, which sets the pandas 'object' datatype:

    df = df.astype(str)
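
One practical difference worth checking on your own installation is how missing values are treated. The sketch below uses a made-up column with one missing entry. The 'string' dtype keeps it as a missing value; what astype(str) does with it has differed between pandas versions (on 1.x and 2.x it becomes literal text), so that result is only printed, not assumed:

```python
import pandas as pd

# Hypothetical column with one missing entry (illustrative data only).
s = pd.Series(["90210", None])

as_str = s.astype(str)          # conversion via Python's str
as_string = s.astype("string")  # dedicated pandas string dtype

# The 'string' dtype keeps the missing entry as a missing value.
print(as_string.isna().tolist())

# The astype(str) result is version-dependent, so inspect it directly.
print(as_str.tolist(), as_str.dtype)
```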
    

    You can see the difference in datatypes when you look at the info of the dataframe:

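    import pandas as pd

    df = pd.DataFrame({
        'zipcode_str': [90210, 90211],
        'zipcode_string': [90210, 90211],
    })

    # astype(str) gives the NumPy-backed 'object' dtype
    df['zipcode_str'] = df['zipcode_str'].astype(str)
    # astype('string') gives the dedicated pandas string dtype; the source is
    # the already-converted str column because pandas 1.0.x could only cast
    # existing str data to 'string' (pandas 1.1 relaxed this)
    df['zipcode_string'] = df['zipcode_str'].astype('string')

    df.info()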
    
    # you can see that the first column has dtype object
    # while the second column has the new dtype string
     #   Column          Non-Null Count  Dtype 
    ---  ------          --------------  ----- 
     0   zipcode_str     2 non-null      object
     1   zipcode_string  2 non-null      string
    dtypes: object(1), string(1)
    


    From the docs:

    The 'string' extension type solves several issues with object-dtype NumPy arrays:

    1) You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

    2) object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.

    3) When reading code, the contents of an object dtype array is less clear than string.
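
The first two points from the docs can be tried out with a small sketch. The data and names here are hypothetical, and the exact exception type raised for a non-string value is an assumption that may vary with the pandas version and string storage backend, so both ValueError and TypeError are caught:

```python
import pandas as pd

# Point 1: an object-dtype column happily holds a mix of str and int ...
mixed = pd.Series(["a", 1], dtype=object)

# ... while a 'string' array refuses a non-string value.
arr = pd.array(["a", "b"], dtype="string")
try:
    arr[0] = 1
    rejected = False
except (ValueError, TypeError):
    rejected = True

# Point 2: with the 'string' dtype, text columns can be selected by dtype
# without also picking up other object-dtype columns.
df = pd.DataFrame({
    "mixed_object": mixed,
    "text": pd.Series(["a", "b"], dtype="string"),
})
text_cols = df.select_dtypes(include="string").columns.tolist()

print(rejected, text_cols)
```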


    Information about pandas 1.0 can be found here:
    https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html
