Can printf "%x\n" \'a be performed in awk?

筅森魡賤 submitted on 2020-01-14 04:15:30

Question


In bash, the hex code of any printable character can be displayed this way:

printf "%x\n"  \'a
61

awk 'BEGIN{printf("%x\n",\\'a)}'
awk 'BEGIN{printf("%x\n",\'a)}'

Neither of these works in awk. Is there no way to do this in awk?
Doesn't awk provide this kind of printf format, as bash does?

awk -v var="a"  'BEGIN{printf("%x\n", var)}'
0
echo -n  a|xxd
0000000: 61   

Getting a printable character's hex code with echo -n a|xxd is simple. My question is whether awk provides this kind of printf format as bash does, not how to get the hex code value in awk by some other method.

awk -v var="a"  'BEGIN{printf("%x\n", \'var)}'
bash: syntax error near unexpected token `)'
debian8@debian:~$ awk -v var="a"  "BEGIN{printf("%x\n", \'var)}"
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1:              ^ syntax error
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1:                   ^ backslash not last character on line
awk: cmd. line:1: BEGIN{printf(%xn, \'var)}
awk: cmd. line:1:                   ^ syntax error

Conclusion: awk doesn't support this kind of printf format.


Answer 1:


Here's a command that shows that awk's printf function indeed does not support the '-prefixed syntax for getting a character's code point (applies to GNU Awk, Mawk, and BSD/macOS Awk):

$ awk -v char="'a" 'BEGIN { printf "%x\n", char }'
0  # The string 'a is converted to a number; it has no leading numeric part, so the result is 0
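
Since awk's printf does not understand the leading-quote syntax, one possible workaround is to let the shell's printf do the conversion and hand the resulting number to awk. The following is only a sketch under that assumption; `hex_of` is a hypothetical helper name, not something from the question or the answers:

```shell
# hex_of CHAR: print the hex code of the first character of CHAR.
hex_of() {
    # "'$1" gives the shell's printf an argument that starts with a quote,
    # so %d yields the numeric code of the character that follows it.
    awk -v code="$(printf '%d' "'$1")" 'BEGIN { printf "%x\n", code }'
}

hex_of a    # prints 61
```

For characters outside ASCII the result depends on the shell and the locale, so this is best treated as an ASCII-range illustration.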

Note that Bash v4+'s printf builtin is Unicode-aware:

$ printf '%x\n' \'€
20ac  # U+20AC is the Unicode code point of the EURO symbol

A hex-dump utility such as xxd will only give you the byte representation of a character, which is only the same as the code point in the 7-bit ASCII range.
In a UTF-8-based locale (which is typical these days), anything beyond the ASCII range will print the bytes that make up the UTF-8-encoded form of the character:

$ xxd <<<€
00000000: e282 ac0a # 0xe2 0x82 0xac are the UTF-8 encoding of Unicode char. U+20AC; the trailing 0x0a is the newline appended by <<<
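
To make the byte-versus-code-point distinction concrete, here is a small sketch that dumps only the bytes of the euro sign, without the trailing newline. It uses od rather than xxd (a substitution made here because od is specified by POSIX), and `euro_bytes` is a hypothetical helper name:

```shell
# Emit the three bytes of the UTF-8 encoding of U+20AC. They are written as
# octal escapes so the example does not depend on this text's own encoding.
euro_bytes() { printf '\342\202\254'; }

# Byte view: one hex pair per byte, with od's padding stripped.
euro_bytes | od -An -tx1 | tr -d ' \n'; echo    # prints e282ac
```

The output is the byte sequence e2 82 ac, not the code point 20ac.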

The ord() function used with GNU Awk in Ed Morton's helpful answer is limited to ASCII characters. Any character with a code point beyond 0x7f results in a negative value.

The create-a-map-of-all-characters workaround from James Brown's helpful answer:

  • is limited to ASCII characters in Mawk and BSD/macOS Awk

  • in principle works with all Unicode characters in GNU Awk, but the fact that a map of all characters must be built makes this somewhat impractical; here's a version that covers the Unicode BMP (basic multilingual plane), into which the most widely used characters fall.

    $ gawk -v char=€ 'BEGIN{ for(n=0;n<=0xffff;n++) ord[sprintf("%c",n)]=n; printf "%x\n", ord[char]}'
    20ac
    



Answer 2:


If you want a character's hex code value:

$ echo a|awk 'BEGIN { for(n=0;n<256;n++) ord[sprintf("%c",n)]=n }{printf "%x\n", ord[$0]}'
61

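Since awk has no built-in function that returns a character's code, you have to build a lookup table first:

BEGIN { for(n=0;n<256;n++)            # for every single-byte value (0-255)
            ord[sprintf("%c",n)]=n }  # build a lookup table: ord[char]=code
      {printf "%x\n", ord[$0] }       # look up the input line and print its code in hex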
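
The same lookup-table idea can be extended to print the code of every character on a line rather than treating the whole line as one key. This is a minimal sketch, restricted to ASCII (1-127) on the assumption that this range behaves the same across awk implementations; `hexcodes` is a hypothetical function name:

```shell
# hexcodes: read lines on stdin, print the hex code of each character.
hexcodes() {
    awk '
    BEGIN {
        # Build the lookup table once: ord[char] = numeric code.
        for (n = 1; n < 128; n++)
            ord[sprintf("%c", n)] = n
    }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            sep = (i > 1) ? " " : ""
            out = out sep sprintf("%x", ord[substr($0, i, 1)])
        }
        print out
    }'
}

echo 'awk' | hexcodes    # prints 61 77 6b
```

Characters outside the table are looked up as missing keys and therefore print as 0.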



Answer 3:


Regarding your first attempt, which produces a syntax error: you cannot include a ' inside any '-delimited script called from the shell, and no amount of escaping will let you. Regarding your second attempt: a "-delimited script can contain " characters, but they need to be escaped. Both of the syntax errors reported above come from this. They have nothing to do with printing hex from awk, and are not really about awk at all: they are shell quoting problems that you'd hit calling any tool with a quote-delimited script.
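
The quoting rules above can be illustrated with a few commands. These are only sketches of common ways to get a quote character into an awk script, not a fix that makes awk's printf understand \'a:

```shell
# 1. Single-quoted script: write a literal ' as the octal escape \047,
#    which awk (not the shell) interprets inside a string.
awk 'BEGIN { printf "%s\n", "\047a" }'      # prints 'a

# 2. Double-quoted script: escape each inner " and \ so that the shell
#    passes them through to awk unchanged.
awk "BEGIN { printf \"%x\\n\", 97 }"        # prints 61

# 3. Close the single-quoted string, add an escaped quote, then reopen it.
awk 'BEGIN { printf "%s\n", "'\''a" }'      # prints 'a
```

The third form does not put a ' inside the quoted string; it ends the string, adds the quote outside it, and starts a new one.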

Now - is this what you're trying to do?

$ awk -v var='a' -l ordchr 'BEGIN{printf "%x\n", ord(var)}'
61

The above uses GNU awk for the ord() function.



Source: https://stackoverflow.com/questions/43340142/can-printf-x-n-a-be-performed-in-awk
