I have an lxml etree HTMLParser object that I'm using to build XPaths, so that I can assert on the XPath itself, the attributes of the matched tag, and the text of that tag. I ran into a problem when the te
According to Wikipedia and W3Schools, you should not have ' and " in node content, even though only < and & are said to be strictly illegal. They should be replaced by the corresponding "predefined entity references", which are &apos; and &quot;.
By the way, the Python parsers I use take care of this transparently: when writing, the special characters are replaced by entity references; when reading, the entity references are converted back.
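To make that round trip concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made only so it runs without extra installs), and the element name, attribute and sample text are made up for illustration. Note that this particular serializer escapes &, < and > in text and also " inside attribute values, but leaves quotes in text content alone, which is still valid XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element; the tag, attribute and text are illustrative only.
elem = ET.Element("p", {"title": 'say "hi"'})
elem.text = "Tom & Jerry's <best> \"episode\""

# Writing: special characters are replaced by entity references.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Reading: entity references are converted back to the original characters.
parsed = ET.fromstring(serialized)
print(parsed.text)
print(parsed.get("title"))

# &apos; and &quot; in the input are decoded on reading as well.
from_entities = ET.fromstring("<p>&apos;quoted&apos; &quot;text&quot;</p>")
print(from_entities.text)
```

In both directions the text you get back from .text is the plain string, so assertions can compare against the unescaped value.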
After a second reading of your answer, I tested how ' and " behave in the Python interpreter, and it will escape everything for you!
>>> 'text {0}'.format('blabla "some" bla')
'text blabla "some" bla'
>>> 'ntsnts {0}'.format("ontsi'tns")
"ntsnts ontsi'tns"
>>> 'ntsnts {0}'.format("ontsi'tn' \"ntsis")
'ntsnts ontsi\'tn\' "ntsis'
So we can see that Python escapes things correctly. Could you then copy-paste the error message you get (if any)?
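In case the problem turns out to be quotes inside the text you put into the XPath expression itself (an assumption on my part, since I have not seen the error yet), keep in mind that XPath 1.0 string literals have no escape character: a literal delimited by ' cannot contain ', and likewise for ". A common workaround is to assemble the value with concat(). The helper below is a hypothetical sketch, not part of lxml:

```python
def xpath_literal(text):
    """Return an XPath 1.0 expression that evaluates to `text`.

    Hypothetical helper for illustration; it is not an lxml function.
    """
    if "'" not in text:
        return "'{0}'".format(text)
    if '"' not in text:
        return '"{0}"'.format(text)
    # Both quote types are present: split on single quotes, wrap each piece
    # in single quotes, and join the pieces with a double-quoted "'" literal.
    parts = text.split("'")
    pieces = ", \"'\", ".join("'{0}'".format(p) for p in parts)
    return "concat({0})".format(pieces)


expr = "//p[text() = {0}]".format(xpath_literal("ontsi'tn' \"ntsis"))
print(expr)
```

The resulting string can then be passed to tree.xpath(expr). lxml also accepts XPath variables, e.g. tree.xpath('//p[text() = $t]', t=some_text), which avoids building the literal by hand.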