Unix - How to convert octal escape sequences via pipe

Submitted by 半世苍凉 on 2020-06-29 09:31:39

Question


I'm pulling data from a file (in this case an exim mail log), and it often stores characters as escaped octal sequences of the form \NNN, where each 'N' is an octal digit (0-7). This mainly happens when the subject is written in non-Latin characters (Arabic, for example).

My goal is to find the cleanest way to convert these octal characters to display correctly in my utf-8 enabled terminal, specifically in 'less' as there is the potential for lots of output.

The best approach I have found so far is as follows:

arbitrary_stream | { while read -r temp; do printf %b "$temp\n"; done } | less

This seems to work pretty well, but I would assume there is some translator tool, or maybe even a flag built into 'less', to handle this. I also found that if you use something like sed to inject a 0 after each \, you can store the result in a variable and then use 'echo -e $data', but this was messier than the previous solution.
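
For illustration, the sed variant could look roughly like the sketch below. It rests on one fact and one assumption: POSIX specifies the octal escape for printf %b as \0ddd (bash's builtin also accepts the bare \ddd form used in the log), and the input is assumed to contain only three-digit \NNN sequences. The function name decode_octal_sed is made up for this example. Note that %b also expands other escapes such as \n or \t if they appear in the data.

```shell
# Hypothetical helper: rewrite every \NNN as \0NNN, then let printf %b expand it.
decode_octal_sed() {
    # BRE: a backslash followed by exactly three octal digits gets a 0 injected.
    sed 's/\\\([0-7]\{3\}\)/\\0\1/g' |
    while IFS= read -r line || [ -n "$line" ]; do
        # IFS= and -r keep leading whitespace and backslashes intact;
        # the || clause handles a final line without a trailing newline.
        printf '%b\n' "$line"
    done
}

# Usage (non-interactive): printf '%s\n' 'test 123 \342\202\254' | decode_octal_sed
```

Streaming through sed once, instead of storing everything in a variable first, keeps the pipeline usable for large logs and can still be piped into less.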

Test case:

octalvar="\342\202\254"

Expected output in less:

€

I'm looking for something cleaner, more complete or just better than my above solution in the form of either:

echo $octalvar | do_something | less

or

echo $octalvar | less --some_magic_flag

Any suggestions? Or is my solution about as clean as I can expect?


Answer 1:


Conversion in GNU awk (needed for strtonum). It turned out to be a hassle, so the code is a mess and could probably be streamlined; feel free to advise:

awk '{
    while (match($0, /\\[0-7]{3}/)) {        # search for the next \NNN
        o = substr($0, RSTART, RLENGTH)      # extract it
        sub(/\\/, "0", o)                    # replace \ with 0 so strtonum reads it as octal
        c = sprintf("%c", strtonum(o))       # convert to a character
        # splice the character in place of the \NNN; sub() would treat "&" specially
        $0 = substr($0, 1, RSTART - 1) c substr($0, RSTART + RLENGTH)
    }
}1' foo > bar

Alternatively, paste the code between the single quotes into a file named above_program.awk and run it as awk -f above_program.awk foo > bar. Test file foo:

test 123 \342\202\254

Run it in a non-UTF-8 locale; I used the C locale:

$ locale 
...
LC_ALL=C
$ awk -f above_program.awk foo
test 123 €

If you run it in a UTF-8 locale, gawk treats each value as a code point and encodes it as UTF-8 again, so the output is garbled:

$ locale
...
LC_ALL=en_US.utf8
$ awk -f above_program.awk foo
test 123 â¬
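
Since the result depends on the locale, one option is to force the C locale for the awk process only, so the rest of the session stays in UTF-8. The sketch below is a variation on the idea above, not the original program: it computes the octal value by hand so that strtonum (and therefore gawk) is not required, and it spells out the three digits in the regex in case the awk in use lacks {3} intervals. The function name decode_octal_awk is hypothetical.

```shell
# Hypothetical helper: decode \NNN sequences byte by byte in the C locale.
decode_octal_awk() {
    LC_ALL=C awk '{
        out = ""
        while (match($0, /\\[0-7][0-7][0-7]/)) {
            # octal NNN -> number: first digit * 64 + second * 8 + third
            n = substr($0, RSTART + 1, 1) * 64 \
              + substr($0, RSTART + 2, 1) * 8 \
              + substr($0, RSTART + 3, 1)
            # keep the text before the match, append the decoded byte
            out = out substr($0, 1, RSTART - 1) sprintf("%c", n)
            # continue scanning only the part after the match
            $0 = substr($0, RSTART + 4)
        }
        print out $0
    }'
}

# Usage (non-interactive): decode_octal_awk < foo > bar
```

Building the result in a separate variable means already-decoded bytes are never rescanned, so a decoded backslash or "&" cannot be misread as part of another escape.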



Answer 2:


This is my current version:

echo "$arbitrary" | { IFS=$'\n'; while read -r temp; do printf %b "$temp\n"; done; unset IFS; } | iconv -f utf-8 -t utf-8 -c | less
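
The extra stage compared with the version in the question is the iconv call: converting from UTF-8 to UTF-8 with -c omits any bytes that do not form valid UTF-8, so a malformed sequence in the log does not end up as garbage in less. A minimal sketch of that stage on its own, assuming an iconv that supports -c (glibc and GNU libiconv do); the function name drop_invalid_utf8 is made up for this example:

```shell
# Hypothetical helper: pass valid UTF-8 through, silently drop invalid bytes.
drop_invalid_utf8() {
    iconv -f utf-8 -t utf-8 -c
}

# Usage (non-interactive): the byte \377 can never appear in valid UTF-8,
# so it is removed while the euro sign is kept:
#   printf 'ok \342\202\254 bad \377 end\n' | drop_invalid_utf8
```

Some iconv implementations exit with a non-zero status when they had to drop something, which matters if the pipeline runs under set -e or pipefail.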


Source: https://stackoverflow.com/questions/43461003/unix-how-to-convert-octal-escape-sequences-via-pipe
