How to properly escape single and double quotes

Submitted by 北城余情 on 2019-12-19 09:16:22

Question


I have an lxml etree HTMLParser object, and I'm trying to build XPath expressions against it to assert the path of a tag, its attributes, and its text. I ran into a problem when the text of the tag contains single quotes (') or double quotes ("), and I've exhausted all my options.

Here's a sample object I created:

from io import StringIO
from lxml import etree
parser = etree.HTMLParser()
tree = etree.parse(StringIO("""<html><body><p align="center">Here is my 'test' "string"</p></body></html>"""), parser)

Here is the snippet of code, followed by the different variations of the variable being read in:

   def getXpath(self):
     # xpath and attrsCount are set up earlier in this method (not shown)
     xpath += 'starts-with(., \'' + self.text + '\') and '
     xpath += ('count(@*)=' + str(attrsCount) if self.exactMatch else "1=1") + ']'
     return xpath

self.text is basically the expected text of the tag, in this case: Here is my 'test' "string"

This fails when I try to call the xpath method on the parsed tree:

tree.xpath(self.getXpath())

The reason is that the XPath it gets is '/html/body/p[starts-with(.,'Here is my 'test' "string"') and 1=1]', where the single quote before test ends the string literal early.

How can I properly escape the single and double quotes in the self.text variable? I've tried triple quoting, wrapping self.text in repr(), and using re.sub or string.replace to escape ' and " as \' and \".


Answer 1:


According to Wikipedia and W3Schools, you should not have ' and " in node content, even though only < and & are said to be strictly illegal. They should be replaced by the corresponding "predefined entity references", which are &apos; and &quot;.

By the way, the Python parsers I use will take care of this transparently: when writing, they are replaced; when reading, they are converted.

After a second reading of your question, I tested a few things with ' and " in the Python interpreter, and it escapes everything for you!

>>> 'text {0}'.format('blabla "some" bla')
'text blabla "some" bla'
>>> 'ntsnts {0}'.format("ontsi'tns")
"ntsnts ontsi'tns"
>>> 'ntsnts {0}'.format("ontsi'tn' \"ntsis")
'ntsnts ontsi\'tn\' "ntsis'

So we can see that Python escapes things correctly. Could you then copy-paste the error message you get (if any)?
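
One caveat worth noting: the backslashes above belong to Python's string syntax only, and XPath 1.0 string literals have no escape character at all. A common workaround is to build the literal with concat() when the text contains both kinds of quote. Below is a minimal sketch of that idea; xpath_literal is a hypothetical helper name, not part of lxml.

```python
def xpath_literal(text):
    """Return text as an XPath 1.0 string literal (hypothetical helper)."""
    # Only one kind of quote (or none): wrap in the other kind.
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both kinds present: split on single quotes and glue the pieces back
    # together with concat(), writing each single quote as "'".
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


print(xpath_literal("Here is my 'test' \"string\""))
```

The result can be dropped into the expression in place of the hand-quoted text, for example 'starts-with(., ' + xpath_literal(self.text) + ') and '.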




Answer 2:


There are more options to choose from; the """ and ''' forms in particular might be what you want.

s = "a string with a single ' quote"
s = 'a string with a double " quote'
s = """a string with a single ' and a double " quote"""
s = '''another string with those " quotes '.'''
s = r"raw strings keep \ as a literal backslash (but cannot end in one)"
s = r'''and can be added \ to " any ' of """ those things'''
s = """The three-quote-forms
       may contain
       newlines."""
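
Whichever Python quoting style is used, the quote characters still end up inside the XPath expression itself. With lxml, one way to sidestep that is to pass the text as an XPath variable, so it never has to be quoted in the expression. A minimal sketch, assuming the sample document from the question; the variable name text and the names html, expected_text and matches are arbitrary choices for illustration.

```python
from io import StringIO
from lxml import etree

html = """<html><body><p align="center">Here is my 'test' "string"</p></body></html>"""
tree = etree.parse(StringIO(html), etree.HTMLParser())

expected_text = "Here is my 'test' \"string\""

# $text is an XPath variable; lxml binds it from the keyword argument,
# so the quotes in expected_text never appear in the expression.
matches = tree.xpath("/html/body/p[starts-with(., $text)]", text=expected_text)
print(len(matches))
```

In getXpath this would mean emitting starts-with(., $text) and supplying text=self.text in the tree.xpath() call, rather than concatenating self.text into the expression.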


Source: https://stackoverflow.com/questions/7802418/how-to-properly-escape-single-and-double-quotes
