Can someone explain the use of unicode_escape as an encoding argument in Python 3.6?

Submitted by 杀马特。学长 韩版系。学妹 on 2020-12-11 06:41:00


I work with large pandas DataFrames on a daily basis. They are fed information that we parse from a web API (the XML encoding is UTF-8) local to our network.

After I populate the DataFrame and export it as a CSV file, I start getting encoding errors (the local machine uses cp1252), which I've had to deal with for the past few weeks.

The solution I finally found was [here][1], in tangfucious's response:

    df['crumbs'] = df['crumbs'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This line takes each string, encodes it with .encode('unicode_escape'), and then decodes the resulting bytes as UTF-8.
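To make the question concrete, here is a minimal sketch of what that lambda appears to do to a single value, outside pandas. The helper name `escape_non_ascii` and the sample string are my own, purely for illustration.

```python
def escape_non_ascii(text: str) -> str:
    """Mimic the lambda from the answer on one string (hypothetical helper)."""
    # str -> bytes: non-ASCII characters become ASCII escape sequences
    # such as \xe9 or \u4e2d; a literal backslash is doubled.
    escaped_bytes = text.encode("unicode_escape")
    # bytes -> str: the bytes are plain ASCII at this point, so decoding
    # them as UTF-8 just turns them back into a str of the same characters.
    return escaped_bytes.decode("utf-8")


sample = "café 中 €"  # made-up sample value
escaped = escape_non_ascii(sample)
print(escaped)  # caf\xe9 \u4e2d \u20ac
```

As far as I can tell, the result contains literal backslash sequences instead of the original characters, so the text is changed, not just re-encoded.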

Can someone explain how this code works? Unfortunately, I'm a beginner and new to SO, so I wasn't able to comment on his response.

What is the purpose of unicode-escape under the hood (aside from the obvious: adding a \ to each Unicode code point)? How does this affect decoding into UTF-8? Why is this necessary? Isn't it always better to encode and decode using the same encoding?

Is there another use for 'unicode_escape'?
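Here is a small experiment that may help frame the "same encoding" part of the question. It is only a sketch with a made-up sample string: it checks whether the escaped bytes are pure ASCII (in which case the codec used for the decode step would not matter) and whether decoding with 'unicode_escape' restores the original text.

```python
text = "naïve 中\n"  # made-up sample with non-ASCII characters and a newline

raw = text.encode("unicode_escape")

# Every byte is below 128, i.e. the escaped form is plain ASCII.
is_ascii = all(b < 128 for b in raw)

# ASCII bytes decode identically under any ASCII-compatible codec.
same_result = raw.decode("utf-8") == raw.decode("ascii") == raw.decode("cp1252")

# Decoding with the matching codec reverses the escaping.
restored = raw.decode("unicode_escape")

print(raw)               # b'na\\xefve \\u4e2d\\n'
print(is_ascii)          # True
print(same_result)       # True
print(restored == text)  # True
```

If that reading is right, decoding as UTF-8 leaves the escape sequences in place, whereas decoding with 'unicode_escape' would bring the original characters back.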

