General string quoting for TCL

Posted by 谁说我不能喝 on 2019-12-31 19:37:20

Question


I'm writing a utility (which happens to be in Python) that generates output in the form of a TCL script. Given some arbitrary string variable (not unicode) in the Python code, I want to produce a TCL line like

set s something

... which will set TCL variable 's' to that exact string, regardless of what strange characters are in it. Without getting too weird, I don't want to make the output messier than needed. I believe a decent approach is

  1. if the string is not empty and contains only alphanumerics, and some characters like .-_ (but definitely not $"{}\) then it can be used as-is;

  2. if it contains only printable characters and no double-quotes or curly braces (and does not end in a backslash) then simply put {} around it;

  3. otherwise, put "" around it after using \ escapes for " { } \ $ [ ] , and \nnn escapes for non-printing characters.

Question: is that the full set of characters which need escaping inside double quotes? I can't find this in the docs. And did I miss something? (I almost missed that strings for (2) can't end in \, for instance.)

I know there are many other strings which can be quoted by {}, but it seems difficult to identify them easily. Also, it looks like non-printing characters (in particular, newline) are OK with (2) if you don't mind them being literally present in the TCL output.
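
The three rules proposed above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the name tcl_quote and the exact "bare word" character set are invented here, and the input is assumed to contain only code points below 256, in line with the "not unicode" constraint.

```python
import re

# Rule 1 character set (an assumed, conservative choice). \Z is used instead
# of $ so that a trailing newline cannot slip through.
_BARE = re.compile(r"[A-Za-z0-9._-]+\Z")
# Characters that get a backslash escape inside double quotes (rule 3).
_SPECIAL = set('"{}\\$[]')


def _printable(c):
    return 32 <= ord(c) < 127


def tcl_quote(s):
    """Return a Tcl word intended to evaluate to exactly s."""
    if any(ord(c) > 255 for c in s):
        raise ValueError("this sketch only handles code points below 256")
    if _BARE.match(s):
        return s  # rule 1: usable as-is
    if (all(_printable(c) for c in s)
            and not any(c in '{}"' for c in s)
            and not s.endswith("\\")):
        return "{" + s + "}"  # rule 2: brace-quote
    out = []
    for c in s:
        if c in _SPECIAL:
            out.append("\\" + c)
        elif _printable(c):
            out.append(c)
        else:
            out.append("\\%03o" % ord(c))  # rule 3: three-digit octal escape
    return '"' + "".join(out) + '"'
```

A generated line would then be built as "set s " + tcl_quote(value). Always emitting three octal digits keeps a following digit in the data from being absorbed into the escape.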


Answer 1:


You really only need 2 rules,

  • Escape curly braces
  • Wrap the output in curly braces

You don't need to worry about newlines, non-printable characters, etc. They are valid in a literal string, and Tcl has excellent Unicode support.

set s { 
this is
a 
long 
string. I have $10 [10,000 cents] only curly braces \{ need \} to be escaped.
\t is not  a real tab, but '    ' is. "quoting somthing" :
{matchin` curly braces are okay, list = string in tcl}
}

Edit: In light of your comment, you can do the following:

  • escape [] {} and $
  • wrap the whole output in set s [subst { $output } ]

The beauty of Tcl is that it has a very simple grammar. No characters other than the three above need to be escaped.

Edit 2: One last try.

If you pass subst some options, you will only need to escape \ and {}

set s [subst -nocommands -novariables { $output } ]

You would, however, need to come up with a regex to convert non-printable characters to their escaped codes.
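
One possible shape for such a regex, sketched in Python (the generating language in the question). The helper name escape_non_printable is hypothetical, and Python 3 text strings are assumed. It relies on subst still performing backslash substitution when only -nocommands and -novariables are given, and it should run after any literal backslashes in the data have been doubled, so that the escapes it adds are not doubled as well.

```python
import re

# Anything outside printable ASCII (space through tilde).
_NON_PRINTABLE = re.compile(r"[^\x20-\x7e]")


def _to_escape(match):
    cp = ord(match.group())
    if cp > 0xFFFF:
        # \uXXXX carries four hex digits only, so give up on larger values.
        raise ValueError("cannot write U+%06X as \\uXXXX" % cp)
    return "\\u%04x" % cp


def escape_non_printable(s):
    """Replace each non-printable or non-ASCII character with \\uXXXX."""
    return _NON_PRINTABLE.sub(_to_escape, s)
```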

Good luck!




Answer 2:


Tcl has very few metacharacters once you're inside a double-quoted string, and all of them can be quoted by putting a backslash in front of them. The characters you must quote are \ itself, $ and [, but it's considered good practice to also quote ], { and } so that the script itself is embeddable. (Tcl's own list command does this, except that it doesn't actually wrap the double quotes so it also handles backslashes and it will also try to use other techniques on “nice” strings. There's an algorithm for doing this, but I advise not bothering with that much complexity in your code; simple universal rules are much better for correct coding.)
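
A minimal Python sketch of that simple universal rule, for illustration only. The function name tcl_dquote is made up here. Besides the six characters named above, it also escapes the double quote itself, which the paragraph does not list but which would otherwise end the word early.

```python
def tcl_dquote(s):
    """Wrap s in double quotes, backslash-escaping Tcl metacharacters."""
    out = []
    for ch in s:
        if ch in '\\$[]{}"':
            out.append("\\" + ch)  # backslash + char yields the literal char
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# Example: build a complete Tcl command for an arbitrary value.
line = "set s " + tcl_dquote('costs $10 [approx] {net}')
```

Everything else, including newlines and non-ASCII characters, is passed through unchanged, so the encoding of the generated script still has to be dealt with separately.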

The second step is to get the data into Tcl. If you are generating a file, your best option is to write it as UTF-8 and use the -encoding option to tclsh/wish or to the source command to explicitly state what the encoding is. (If you're inside the same process, write UTF-8 data into a string and evaluate that. Job Done.) That option (introduced in Tcl 8.5) is specifically for dealing with this sort of problem:

source -encoding "utf-8" theScriptYouWrote.tcl
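
On the Python side, writing the generated script as UTF-8 might look like the sketch below. The function name write_tcl_script is hypothetical, and script_text is assumed to be a text (unicode) string that has already been quoted.

```python
import io


def write_tcl_script(path, script_text):
    """Write the generated Tcl script to path, encoded as UTF-8."""
    # newline="\n" keeps line endings identical on every platform.
    with io.open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(script_text)
```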

If that's not possible, you're going to have to fall back to adding additional quoting. The best thing is to then assume you've only got ASCII support available (a good lowest common denominator) and quote everything else as a separate step from the quoting described in the first paragraph. To quote, convert every Unicode character from U+0080 up to an escape sequence of the form \uXXXX where XXXX are exactly four hex digits[1] and the other two are literal characters. Don't use the \xXX form, as that has some “surprising” misfeatures (alas).


[1] There's an open bug in Tcl about handling characters outside the Basic Multilingual Plane, part of which is that the \u form isn't able to cope. Fortunately, non-BMP characters are still reasonably rare in practice.
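
The ASCII fallback step could be sketched in Python as below. The name tcl_ascii_escape is invented for this example and Python 3 text strings are assumed. It is meant to run after the metacharacter quoting, so that the backslashes it introduces are not themselves escaped. Because of the limitation in the footnote, it refuses non-BMP characters rather than guessing.

```python
def tcl_ascii_escape(text):
    """Rewrite every character from U+0080 up as a four-digit \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through untouched
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)  # exactly four hex digits
        else:
            raise ValueError("non-BMP character U+%06X has no \\uXXXX form" % cp)
    return "".join(out)
```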




Answer 3:


To do it right you should also specify the encoding your Python string is in, typically sys.getdefaultencoding(). Otherwise you might garble encodings when translating it to Tcl.

If you have binary data in your string and want Tcl binary strings as a result this will always work:

# mystring is a byte string (str in Python 2, bytes in Python 3)
data = "".join("\\u00%02x" % b for b in bytearray(mystring))
tcltxt = "set x %s" % data

It will look like a hex dump, but then, it is a hex dump.

If you use any special encoding like UTF-8 you can enhance that a bit by using encoding convertfrom/convertto and the appropriate Python idiom.

# myutf8string holds UTF-8 encoded bytes (str in Python 2, bytes in Python 3)
data = "".join("\\u00%02x" % b for b in bytearray(myutf8string))
tcltext = "set x [encoding convertfrom utf-8 %s]" % data

You can of course refine this a bit by avoiding the \u encoding for all the non-special characters, but the above is safe in any case.



Source: https://stackoverflow.com/questions/5302120/general-string-quoting-for-tcl
