Question
As mentioned in the question, I am trying to generate XML output (for an iPhone app) using PHP, which reads the data from a MySQL text field.
Whenever there is a horizontal ellipsis character (…) in the field, the XML is not generated properly.
I have tried a few ways to escape it, as shown below, but none of them seems to work:
$row['detail'] = str_replace("&", "&amp;", $row['detail']);
$row['detail'] = str_replace("…", "&hellip;", $row['detail']); //<-- prob is here
$row['detail'] = str_replace("<", "&lt;", $row['detail']);
$row['detail'] = str_replace("\'", "&apos;", $row['detail']);
$row['detail'] = str_replace(">", "&gt;", $row['detail']);
$row['detail'] = str_replace("\"", "&quot;", $row['detail']);
I basically have two questions:
1. How do I handle the horizontal ellipsis character?
2. Are there other characters that could cause the same problem? A reference to a list of them, and how to handle them, would be great!
Thanks
Answer 1:
It is possible (and the recommended way) to use the literal, actual character in XML output. Don't use HTML entity based workarounds - it's unnecessary.
The reason it doesn't work for you is probably that the ellipsis character's encoding doesn't match the encoding of the XML file being generated.
You just need to make sure they match. For example, if you're generating a UTF-8 XML file, the ellipsis character needs to be encoded as UTF-8 as well.
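As a minimal sketch of making the encodings match, the snippet below normalizes a field to UTF-8 before placing it in a document whose XML declaration also says UTF-8. The helper names `toUtf8` and `buildDetailXml`, the `<detail>` element, and the guess that any non-UTF-8 input is Windows-1252 are assumptions made for illustration; none of them come from the question. It relies on PHP's mbstring extension.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// ASSUMPTION: anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Hypothetical helper: wrap one field in a UTF-8 XML document.
function buildDetailXml(string $detail): string
{
    // Escape only the XML markup characters; the ellipsis stays a literal character.
    $escaped = htmlspecialchars(toUtf8($detail), ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<detail>' . $escaped . '</detail>' . "\n";
}
```

If the data comes through mysqli, calling `set_charset('utf8mb4')` on the connection is the usual way to ask MySQL for UTF-8 results, so that the conversion step above becomes a no-op; whether that applies depends on how the table and connection are actually configured.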
Answer 2:
Raw XML does not know about any named entities except the five predefined ones: &gt;, &lt;, &amp;, &apos; and &quot;. All other characters need to be written as numeric character references, or else you need to declare the entities in the Doctype or DTD.
The &hellip; entity is defined in the HTML DTD, which is understood by all browsers, but it isn't defined in most other XML DTDs.
In general, if you're working with a DTD, most of the time it will be a third party DTD that you have no control over, so you can't go adding entities to them. You also don't want to be adding entities ad-hoc to your own DTDs either.
I would avoid putting entity declarations into the doctype header as well. It's unnecessary fluff that doesn't really add much unless you're repeating the same entity over and over in a document.
Therefore my recommendation would be simply to use numeric entities.
So instead of &hellip;, you would use the numeric character reference &#8230; or &#x2026;. The same would apply for any other non-ASCII character.
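The numeric-entity approach can be sketched as follows. The function name `xmlEscapeAscii` is hypothetical, and the sketch assumes the input is already valid UTF-8 and that the mbstring extension is available. It first escapes the five XML markup characters, then rewrites every non-ASCII character as a decimal numeric reference.

```php
<?php
// Hypothetical helper: make a UTF-8 string safe for XML text or attribute
// content while emitting ASCII only.
// ASSUMPTION: $utf8Text is valid UTF-8.
function xmlEscapeAscii(string $utf8Text): string
{
    // Step 1: escape the five XML markup characters (& < > " ').
    $escaped = htmlspecialchars($utf8Text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // Step 2: turn every code point from U+0080 upward into a decimal
    // numeric character reference. The map is [start, end, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo xmlEscapeAscii("Tom & Jerry\u{2026}"), "\n"; // prints: Tom &amp; Jerry&#8230;
```

Doing the markup escaping first matters: the second step only touches characters at U+0080 and above, so the ampersands it introduces are never escaped a second time.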
The other option, of course, is to output the XML using UTF-8 or UTF-16 character encoding, which negates the need for any entities at all. That may or may not be an option for you, but if it is possible, it may be the best way to go.
If you have a specific character which you need to find the numeric entity codes for, there are plenty of places on the web to find references for them. Here is the one from Wikipedia: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Hope that helps.
Answer 3:
XML predefines entities for only five characters: ' " & < >. Any other named entity will make the document invalid unless it is declared. You can try adding the entity to the DTD with
<!DOCTYPE text [ <!ENTITY hellip "&#8230;"> ]>
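To illustrate how such a declaration behaves, here is a small sketch that parses a document declaring `hellip` in its internal DTD subset. The function name `readTextWithEntities` and the sample document are made up for the example, and it assumes PHP's DOM extension is installed.

```php
<?php
// Hypothetical helper: parse an XML string and return the root element's text
// with declared entities expanded.
function readTextWithEntities(string $xml): string
{
    $doc = new DOMDocument();
    // LIBXML_NOENT asks libxml to substitute entity references with their values.
    // ASSUMPTION: the input is trusted; avoid this flag for untrusted XML.
    $doc->loadXML($xml, LIBXML_NOENT);

    return $doc->documentElement->textContent;
}

// Sample document (made up for illustration) with the entity declared inline.
$sample = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<!DOCTYPE text [ <!ENTITY hellip "&#8230;"> ]>' . "\n"
        . '<text>Wait for it&hellip;</text>';

echo readTextWithEntities($sample), "\n";
```

Without the DOCTYPE line, the same document would fail to parse because `&hellip;` would be an undeclared entity.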
Source: https://stackoverflow.com/questions/6536182/how-to-handle-horizontal-ellipsis-three-dots-character-in-xml-output-through-p