How to remove those “\x00\x00”

喜你入骨 提交于 2019-12-10 02:34:00

问题


How to remove those "\x00\x00" in a string ? I have many of those strings (example shown below). I can use re.sub to replace those "\x00". But I am wondering whether there is a better way to do that ? Converting between unicode, bytes and string is always confusing.

'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.


回答1:


Use rstrip

>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'

It removes all \x00 characters at the end of the string.




回答2:


>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' 
>>> a.replace('\x00','')
'Hello'



回答3:


I think the more general solution is to use:

cleanstring = nullterminatedstring.split('\x00',1)[0]

Which will split the string using \x00 as the delimeter 1 time. The split(...) returns a 2 element list: everything before the null in addition to everything after the null (it removes the delimeter). Appending [0] only returns the portion of the string before the first null (\x00) character, which I believe is what you're looking for.

The convention in some languages, specifically C-like, is that a single null character marks the end of the string. For example, you should also expect to see strings that look like:

'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'

The answer supplied here will handle that situation as well as the other examples.




回答4:


Building on the answers supplied, I suggest that strip() is more generic than rstrip() for cleaning up data, as strip() removes chars from the beginning and the end of the supplied string, whereas rstrip() simply removes chars from the end of the string.

However, NUL chars are not treated as whitespace by default by strip(), and as such you need to specify explicitly. This can catch you out, as print() will of course not show the NUL chars. My solution that I used was to clean the string using ".strip().strip('\x00')":

>>> arbBytesFromSocket = b'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii')
>>> print(arbBytesAsString)
hello
>>> str(arbBytesAsString)
'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii').strip().strip('\x00')
>>> str(arbBytesAsString)
'hello'
>>>

This gives you the string required, without the NUL chars.



来源:https://stackoverflow.com/questions/38883476/how-to-remove-those-x00-x00

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!