Removing spaces and non-printable characters in Python

Posted by 风流意气都作罢 on 2019-12-13 04:41:36

Question


I am working with an XML file using the xpath method of lxml's etree. My code is:

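from lxml import etree

# Raw string so the backslash in the Windows path is not read as an escape (\f is a form feed)
File = r"c:\file.xml"
doc = etree.parse(File)
# Collect every text node in the document, then join them into one string
alltext = doc.xpath('descendant-or-self::text()')
clump = "".join(alltext)
clump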

I got the following output:

             "'\n\t\n\t\t\n\t\t\n\t\t\n\t\t\n\t\n\t\n\t\t\t\n\t\n\t\t\n\t\t\t\n\t\t\t\tIntroduction\n\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\tAccessibility\n\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\tOpening eBooks\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\

I want to remove the spaces and all the tabs from the output, so I tried some other code, but it failed to produce the desired output. Here is that code:

import string
filter(lambda x: x in string.printable, clump)

I only want to get the text from the output, which is "Introduction, Accessibility, Opening eBooks".


Answer 1:


If you don't mind using a regex:

import re
clump = re.sub(r'[\n\t]+', ' ', clump)

If there are other characters you want to remove, just add them inside the [].
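As a rough sketch of extending the character class, the example below adds the carriage return and the space to the set and trims the ends with strip(). The sample string is a shortened, made-up stand-in for the output in the question, not the real file contents. It also checks that tab and newline belong to string.printable, which would explain why the filter in the question leaves them in place.

```python
import re
import string

# Shortened stand-in for the question's output (illustrative sample, not the real file).
clump = "\n\t\n\t\t\n\t\t\tIntroduction\n\t\t\t\n\t\t\tAccessibility\n\t\t\n\t\t\tOpening eBooks\n\t\t\n"

# Tab and newline count as printable, so filtering on string.printable keeps them.
print("\t" in string.printable, "\n" in string.printable)  # True True

# Character class extended with carriage return and space; each run becomes one space.
cleaned = re.sub(r"[\n\t\r ]+", " ", clump).strip()
print(cleaned)  # Introduction Accessibility Opening eBooks
```

Because a whole run of matching characters collapses to a single space, the space inside "Opening eBooks" survives while the indentation disappears.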




Answer 2:


You can try this:

''.join(clump.split())

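Hopefully that solves the problem. Note that it removes every whitespace character, including the spaces between words. To keep the words separated, you can use re instead, building on Sabuj's code and stripping the leading and trailing whitespace first: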

>>> import re
>>> re.sub(r'[\n\t]+', ' ', clump.strip())
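If the goal is the comma-separated list from the question, one possible alternative (not taken from either answer) is to split the text into lines, strip each one, and keep only the non-empty results. This is a minimal sketch that assumes each title sits on its own line, as the output in the question suggests; the sample string is again a made-up, shortened stand-in.

```python
# Shortened stand-in for the question's output (illustrative sample, not the real file).
clump = "\n\t\n\t\t\tIntroduction\n\t\t\t\n\t\t\tAccessibility\n\t\t\n\t\t\tOpening eBooks\n\t\t\n"

# Strip each line and drop the ones that are empty after stripping.
titles = [line.strip() for line in clump.splitlines() if line.strip()]
print(titles)             # ['Introduction', 'Accessibility', 'Opening eBooks']
print(", ".join(titles))  # Introduction, Accessibility, Opening eBooks
```

Working line by line keeps multi-word titles such as "Opening eBooks" intact and lets you choose the separator when joining.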


Source: https://stackoverflow.com/questions/22795189/removing-spaces-and-non-printable-character-in-python
