How do I XML-encode a string in Erlang?

Submitted by 喜欢而已 on 2019-12-21 09:26:23

Question


I have an Erlang string which may contain characters like &, " and <:

1> Unenc = "string & \"stuff\" <".
"string & \"stuff\" <"

Is there an Erlang function somewhere that takes the string and encodes all the needed HTML/XML entities, like this?

2> Enc = xmlencode(Unenc).
"string &amp; &quot;stuff&quot; &lt;"

My use case is for relatively short strings, which come from user input. The output strings of the xmlencode function will be the content of XML attributes:

<company name="Acme &amp; C." currency="&euro;" />

The final XML will be sent over the wire appropriately.


Answer 1:


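There is a function in the Erlang distribution (in the xmerl application) that escapes angle brackets and ampersands, but it isn't documented, so it's probably best not to rely on it. Note also that it leaves double quotes alone, which matters if the result goes into an attribute value: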

1> xmerl_lib:export_text("string & \"stuff\" <").
"string &amp; \"stuff\" &lt;"

If you want to build and encode whole XML structures (rather than just a single string), the xmerl API is a good option, e.g.:

2> xmerl:export_simple([{foo, [], ["string & \"stuff\" <"]}], xmerl_xml).
["<?xml version=\"1.0\"?>",
 [[["<","foo",">"],
   ["string &amp; \"stuff\" &lt;"],
   ["</","foo",">"]]]]
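For the attribute case in the question, a minimal sketch along the same lines: pass the raw user input as attribute values and let xmerl:export_simple/2 do the escaping (in the OTP releases I know, attribute values get &, < and " escaped). The module name company_xml and its company/2 function are hypothetical. The exported result is a deep character list that may contain code points above 255, so it is converted to a UTF-8 binary before being sent:

```erlang
-module(company_xml).
-export([company/2]).

%% Build <company name="..." currency="..."/> from raw user input.
%% xmerl:export_simple/2 escapes the attribute values.
company(Name, Currency) ->
    Doc = xmerl:export_simple(
            [{company, [{name, Name}, {currency, Currency}], []}],
            xmerl_xml),
    %% Doc is chardata (possibly with code points > 255),
    %% so encode it as UTF-8 before it goes over the wire.
    unicode:characters_to_binary(Doc).
```

For example, company_xml:company("Acme & C.", [8364]) yields a binary containing name="Acme &amp; C." and the euro sign as its three UTF-8 bytes.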



Answer 2:


If your needs are simple, you could do this with a map over the chars in the string.

quote($<) -> "&lt;";
quote($>) -> "&gt;";
quote($&) -> "&amp;";
quote($") -> "&quot;";
quote(C) -> C.

Then you would do

1> Raw = "string & \"stuff\" <".
2> Quoted = lists:map(fun quote/1, Raw).

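Quoted will not be a flat list, but that is fine if you are going to write it to a file or send it as an HTTP reply, since file and socket functions accept nested lists (see Erlang's iolists). Strictly, an iolist may only contain bytes (0 to 255) and binaries, so if the string can contain code points above 255, convert it with unicode:characters_to_binary/1 first. Use lists:flatten/1 if you need a flat string.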
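Put together as a module you can compile, this approach might look like the sketch below. The module name xml_quote and the encode/1 and to_utf8/1 wrappers are illustrative additions, not part of the original answer:

```erlang
-module(xml_quote).
-export([quote/1, encode/1, to_utf8/1]).

%% Map one character to its XML entity, or leave it unchanged.
quote($<) -> "&lt;";
quote($>) -> "&gt;";
quote($&) -> "&amp;";
quote($") -> "&quot;";
quote(C) -> C.

%% Escape a string (a list of code points) and flatten the result.
encode(Raw) ->
    lists:flatten(lists:map(fun quote/1, Raw)).

%% Skip the flatten and encode straight to a UTF-8 binary for output;
%% this also works when Raw contains code points above 255.
to_utf8(Raw) ->
    unicode:characters_to_binary(lists:map(fun quote/1, Raw)).
```

With this, xml_quote:encode("string & \"stuff\" <") returns "string &amp; &quot;stuff&quot; &lt;", the result asked for in the question.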

More recent Erlang releases also include the unicode module, whose functions (such as unicode:characters_to_list/1 and unicode:characters_to_binary/1) convert between multibyte UTF-8 data and lists of code points.
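As a small illustration of the unicode module (utf8_input and to_codepoints/1 are hypothetical names), UTF-8 input such as the euro sign's three bytes 226, 130, 172 decodes to the single code point 8364. If the bytes arrive as a list rather than a binary, convert them with list_to_binary/1 first, because integers inside a list are treated as code points, not as UTF-8 bytes:

```erlang
-module(utf8_input).
-export([to_codepoints/1]).

%% Turn UTF-8 encoded bytes (e.g. user input read from a socket)
%% into a list of Unicode code points, so that every character,
%% including multibyte ones like the euro sign, is one list element.
to_codepoints(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        List when is_list(List) ->
            {ok, List};
        {error, _Converted, _Rest} ->
            {error, invalid_utf8};
        {incomplete, _Converted, _Rest} ->
            {error, truncated_utf8}
    end.
```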


Reformatted comments, to make code examples stand out:

ettore: That's kind of what I am doing, although I do have to support multibyte characters. Here's my code:

xmlencode([], Acc) -> Acc; 
xmlencode([$<|T], Acc) -> xmlencode(T, Acc ++ "&lt;");
xmlencode([226,130,172|T], Acc) -> xmlencode(T, Acc ++ "&#8364;"); % euro symbol as UTF-8 bytes
xmlencode([OneChar|T], Acc) -> xmlencode(T, lists:flatten([Acc,OneChar])). 

Although I would prefer not to reinvent the wheel if possible.

dsmith: The string that you are using would normally be a list of Unicode code points (i.e. a list of integers), so any particular byte encoding is irrelevant. You would only need to worry about specific encodings if you were working directly with binaries.

To clarify, the Unicode code point for the euro symbol (decimal 8364) would be a single element in your list. So you would just do this:

xmlencode([8364|T], Acc) -> xmlencode(T, Acc ++ "&#8364;"); 
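Following dsmith's point, a minimal sketch that handles the euro sign generically instead of with one clause per character. It assumes the input is already a list of code points (not UTF-8 bytes); the module name xml_attr and the choice to turn every non-ASCII code point into a numeric character reference are my own, not something from the thread. It also swaps Acc ++ ... for a reversed accumulator:

```erlang
-module(xml_attr).
-export([encode/1]).

%% Escape a list of Unicode code points for use inside a double-quoted
%% XML attribute. Non-ASCII code points become numeric character
%% references (e.g. 8364 -> "&#8364;"), so the result is plain ASCII.
encode(Chars) ->
    encode(Chars, []).

%% The accumulator holds escaped chunks in reverse order; reversing once
%% at the end avoids the quadratic cost of Acc ++ ... on every step.
encode([], Acc) ->
    lists:flatten(lists:reverse(Acc));
encode([$< | T], Acc) -> encode(T, ["&lt;" | Acc]);
encode([$> | T], Acc) -> encode(T, ["&gt;" | Acc]);
encode([$& | T], Acc) -> encode(T, ["&amp;" | Acc]);
encode([$" | T], Acc) -> encode(T, ["&quot;" | Acc]);
encode([C | T], Acc) when C > 127 ->
    encode(T, ["&#" ++ integer_to_list(C) ++ ";" | Acc]);
encode([C | T], Acc) ->
    encode(T, [C | Acc]).
```

Emitting numeric references keeps the output ASCII-only, which sidesteps encoding questions on the wire at the cost of slightly longer output; sending the code points as UTF-8 instead would work just as well.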



Answer 3:


I'm not aware of one in the standard OTP packages. However, Mochiweb's mochiweb_html module has an escape function (see mochiweb_html.erl) that handles lists, binaries, and atoms. Note that escape/1 does not touch double quotes; for attribute values, the same module's escape_attr/1 escapes them as well.

For URL encoding, check out the mochiweb_util module (mochiweb_util.erl) and its urlescape function.

You could use either of those libraries to get what you need.



Source: https://stackoverflow.com/questions/3339014/how-do-i-xml-encode-a-string-in-erlang
