Python - Turn all items in a Dataframe to strings

梦毁少年i 2020-12-15 16:18

I followed the procedure from "In Python, how do I convert all of the items in a list to floats?" because each column of my DataFrame is a list, but instead o

4 Answers
  • 2020-12-15 16:48

    You can use the applymap method:

    df = df.applymap(str)
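
As a small illustration of the call above, the sketch below applies str to every cell of a made-up frame (the column names a and b are hypothetical). It also relies on a detail this answer does not mention: pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, so the code picks whichever one is available.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# Assumption: DataFrame.map replaces the deprecated applymap in pandas >= 2.1.
if hasattr(pd.DataFrame, "map"):
    as_text = df.map(str)
else:
    as_text = df.applymap(str)

# Collect the Python types found in the converted cells.
cell_types = {type(value) for value in as_text.to_numpy().ravel()}
print(cell_types)
```

The call returns a new frame, which is why the answer assigns the result back to df.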
    
  • 2020-12-15 16:55

    With pandas >= 1.0 there is now a dedicated string datatype.

    You can convert your column to this pandas string datatype using .astype('string'):

    df = df.astype('string')
    

    This is different from using str which sets the pandas 'object' datatype:

    df = df.astype(str)
    

    You can see the difference in datatypes when you look at the info of the dataframe:

    import pandas as pd

    df = pd.DataFrame({
        'zipcode_str': [90210, 90211],
        'zipcode_string': [90210, 90211],
    })
    
    df['zipcode_str'] = df['zipcode_str'].astype(str)
    df['zipcode_string'] = df['zipcode_str'].astype('string')
    
    df.info()
    
    # you can see that the first column has dtype object
    # while the second column has the new dtype string
     #   Column          Non-Null Count  Dtype 
    ---  ------          --------------  ----- 
     0   zipcode_str     2 non-null      object
     1   zipcode_string  2 non-null      string
    dtypes: object(1), string(1)
    


    From the docs:

    The 'string' extension type solves several issues with object-dtype NumPy arrays:

    1) You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

    2) object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.

    3) When reading code, the contents of an object dtype array is less clear than string.
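
The first two points quoted above can be tried out with a short sketch. The frame and its column names (text, mixed) are hypothetical, and the exact exception type raised on a bad assignment may differ between pandas versions, so the code catches both TypeError and ValueError.

```python
import pandas as pd

# Hypothetical frame: one column with the dedicated string dtype and one
# object column that silently mixes a string with an integer.
df = pd.DataFrame({
    "text": pd.Series(["a", "b"], dtype="string"),
    "mixed": pd.Series(["a", 1], dtype=object),
})

# Point 2: selecting by the 'string' dtype picks up only the text column.
text_columns = list(df.select_dtypes(include="string").columns)

# Point 1: a string array refuses a non-string value on assignment.
strings = pd.array(["a", "b"], dtype="string")
try:
    strings[0] = 1
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(text_columns, rejected)
```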


    Information about pandas 1.0 can be found here:
    https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

  • 2020-12-15 17:00

    You can use this:

    df = df.astype(str)
    

    Out of curiosity, I decided to see whether there is any difference in efficiency between the accepted solution and mine.

    The results are below:

    example df:

    df = pd.DataFrame([list(range(1000))], index=[0])
    

    test df.astype:

    %timeit df.astype(str) 
    100 loops, best of 3: 2.18 ms per loop
    

    test df.applymap:

    %timeit df.applymap(str)
    1 loops, best of 3: 245 ms per loop
    

    It seems df.astype is quite a lot faster :)
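
The %timeit lines above only run inside IPython. A rough equivalent as a plain script, using the standard timeit module, might look like the sketch below. The helper names with_astype and with_elementwise are made up for illustration, and the timings will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Same shape as the example frame in this answer: 1 row, 1000 columns.
df = pd.DataFrame([list(range(1000))], index=[0])


def with_astype():
    return df.astype(str)


def with_elementwise():
    # Assumption: DataFrame.map replaces the deprecated applymap in pandas >= 2.1.
    if hasattr(pd.DataFrame, "map"):
        return df.map(str)
    return df.applymap(str)


# Check that both approaches produce the same values before timing them.
same_values = bool((with_astype() == with_elementwise()).all().all())

# Average seconds per call over a few runs.
runs = 5
astype_seconds = timeit.timeit(with_astype, number=runs) / runs
elementwise_seconds = timeit.timeit(with_elementwise, number=runs) / runs
print(same_values, astype_seconds, elementwise_seconds)
```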

  • 2020-12-15 17:02

    This worked for me:

    dt.applymap(lambda x: x[0] if type(x) is list else None)
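
To make the effect of that lambda concrete, here is a minimal sketch on a made-up frame (the names frame, first_item, a and b are hypothetical). Each list cell is replaced by its first element, and any cell that is not a list becomes a missing value. As in the other answers, the call returns a new frame rather than modifying the original.

```python
import pandas as pd

# Hypothetical frame whose cells are mostly lists; names are illustrative.
frame = pd.DataFrame({
    "a": [["x"], ["y"]],
    "b": [["z"], "not a list"],
})


def first_item(cell):
    # Same logic as the lambda above: first element of a list, else None.
    return cell[0] if type(cell) is list else None


# Assumption: DataFrame.map replaces the deprecated applymap in pandas >= 2.1.
if hasattr(pd.DataFrame, "map"):
    firsts = frame.map(first_item)
else:
    firsts = frame.applymap(first_item)

print(firsts)
```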
    