How to translate “bytes” objects into literal strings in pandas Dataframe, Python3.x?

青春惊慌失措 2020-11-29 08:32

I have a Python 3.x pandas DataFrame in which certain columns hold strings expressed as bytes (as in Python 2.x).

import pandas as pd
df = pd.DataFrame         


        
5 Answers
  •  栀梦 (OP)
    2020-11-29 08:54

    I came across this thread while trying to solve the same problem, but more generally for a Series where some values may be of type str and others of type bytes. Drawing on earlier solutions, I achieved this selective decoding as follows, resulting in a Series whose values are all of type str. (Python 3.6.9, pandas 1.0.5)

    >>> import pandas as pd
    >>> ser = pd.Series(["value_1".encode("utf-8"), "value_2"])
    >>> ser.values
    array([b'value_1', 'value_2'], dtype=object)
    >>> ser2 = ser.str.decode("utf-8")
    >>> ser[~ser2.isna()] = ser2
    >>> ser.values
    array(['value_1', 'value_2'], dtype=object)
    

    Maybe a more convenient or efficient one-liner exists for this use case? At first I figured there would be some value to pass to the "errors" kwarg of str.decode, but I didn't find one documented.
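
As far as I can tell, the "errors" kwarg of str.decode is simply forwarded to the underlying bytes decoding ("strict", "ignore", "replace", ...), so it governs how invalid byte sequences are handled rather than what happens to elements that are not bytes. A minimal sketch of that behaviour, where the sample values and the names `raw` and `lenient` are made up for illustration:

```python
import pandas as pd

# Two bytes values: one valid UTF-8 sequence, one invalid byte (hypothetical data).
raw = pd.Series([b"caf\xc3\xa9", b"\xff"])

# errors="replace" swaps undecodable bytes for U+FFFD instead of raising,
# mirroring bytes.decode("utf-8", errors="replace").
lenient = raw.str.decode("utf-8", errors="replace")
```

With the default errors="strict", the second value would raise a UnicodeDecodeError instead.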

    EDIT: One can definitely achieve the same in one line, but the ways I have thought of to do so take about 25% longer (tested for Series of length 10^4 and 10^6), though they presumably do no copying. E.g.:

    ser[ser.apply(type) == bytes] = ser.str.decode("utf-8")
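
Since the question is about a whole DataFrame, the same selective decoding could be applied column by column. The following is only a sketch under my own assumptions: `decode_bytes` and `decode_bytes_columns` are hypothetical helper names, not pandas functions, and the sample frame is invented. It uses a plain element-wise `map` with an `isinstance` check rather than `str.decode`, so values that are already str pass through untouched.

```python
import pandas as pd


def decode_bytes(value, encoding="utf-8"):
    """Decode bytes to str; return any other value unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with bytes values in object columns decoded."""
    out = df.copy()
    for col in out.columns:
        # Only object-dtype columns can hold bytes; skip numeric columns.
        if out[col].dtype == object:
            out[col] = out[col].map(lambda v: decode_bytes(v, encoding))
    return out


# Hypothetical example frame with a mixed bytes/str column.
df = pd.DataFrame({"a": [b"value_1", "value_2"], "b": [1, 2]})
decoded = decode_bytes_columns(df)
```

This trades speed for clarity: the element-wise `map` is likely slower than the vectorised `str.decode` approach on large frames, but it leaves the original DataFrame unmodified.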
    
