Delete every non-UTF-8 symbol from a string

故里飘歌 2020-11-29 05:46

I have a large number of files and a parser. What I have to do is strip all non-UTF-8 symbols and put the data in MongoDB. Currently I have code like this:

with op         


        
4 answers
  •  慢半拍i (original poster)
    2020-11-29 06:08

    For Python 3, as mentioned in a comment in this thread, you can do:

    line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
    

    The 'ignore' error handler makes decode() drop any bytes that are not valid UTF-8 instead of raising a UnicodeDecodeError.

    If your line is already a bytes object (e.g. b'my string'), you only need to call line.decode('utf-8', 'ignore') on it.
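
As a rough sketch of the two cases above, the helpers below show one way to apply the 'ignore' handler. The function names (strip_invalid_utf8, strip_unencodable) are hypothetical and not part of the original answer. Note that a Python 3 str is already decoded, so round-tripping it through UTF-8 normally returns it unchanged; the handler only does real work on raw bytes, or on a str holding lone surrogates, where it has to be passed to encode() rather than decode().

```python
def strip_invalid_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, silently dropping invalid byte sequences."""
    return raw.decode("utf-8", errors="ignore")


def strip_unencodable(text: str) -> str:
    """Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates).

    The 'ignore' handler is given to encode() here, because that is the
    step that would otherwise raise UnicodeEncodeError for such a str.
    """
    return text.encode("utf-8", errors="ignore").decode("utf-8")


# Bytes case: 0xff and 0xfe can never appear in valid UTF-8, so both are dropped.
print(strip_invalid_utf8(b"caf\xc3\xa9 \xff\xfe ok"))

# Str case: a lone surrogate is removed, the rest is kept.
print(strip_unencodable("a\udcffb"))
```

When the data comes from files, passing the same handler to open(), as in open(path, encoding='utf-8', errors='ignore'), should avoid the manual decode step. Whether silently dropping bytes is acceptable depends on the data, since the original content cannot be recovered afterwards.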
