Python - Convert Octal to non-English Text from file

爱⌒轻易说出口 提交于 2019-12-24 08:22:47

问题


I am trying to convert Non-English file encoded in Octal back into it's native format and store it in another file. The files include:

  • i_file : The input Original File with Octal Encoded text
  • o_file : The output Target File which should contain the Kannada(The non-English Language in question) text.
  • octal_to_text.py : The python program that should take the octal text in in the input file and generate corrusponding non-English text in the target file.

Sample i_file

\340\262\270\340\263\215-\340\262\207+\340\262\241\340\263\215
\340\262\205-\340\262\246\340\263\215+\340\262\255\340\263\215
\340\262\205-\340\262\252\340\263\215+\340\262\252\340\263\215
\340\262\250\340\263\215-\340\262\205+\340\262\265\340\263\215
\340\262\205+\340\262\246\340\263\215
\340\262\266\340\263\215-\340\262\206+\340\262\270\340\263\215
\340\262\246\340\263\215-\340\262\205+\340\262\252\340\263\215
\340\262\244\340\263\215-\340\262\205+\340\262\237\340\263\215 \340\262\250\340\263\215-\340\262\205+\340\262\265\340\263\215
\340\262\247\340\263\215-\340\262\205
\340\262\247\340\263\215-\340\262\212
\340\262\205-\340\262\234\340\263\215+\340\262\206
\340\262\263\340\263\215-\340\262\205
\340\262\263\340\263\215-\340\262\207
\340\262\263\340\263\215-\340\262\211
\340\262\212+\340\262\263\340\263\215 \340\262\247\340\263\215-\340\262\212
sp
\340\262\256\340\263\215-\340\262\217+\340\262\262\340\263\215
\340\262\254\340\263\215-\340\262\216+\340\262\237\340\263\215
\340\262\260\340\263\215-\340\262\205+\340\262\271\340\263\215
\340\262\252\340\263\215+\340\262\260\340\263\215 \340\262\205-\340\262\252\340\263\215+\340\262\252\340\263\215
\340\262\265\340\263\215-\340\262\207+\340\262\270\340\263\215 \340\262\270\340\263\215-\340\262\207+\340\262\241\340\263\215
\340\262\217-\340\262\225\340\263\215+\340\262\205
\340\262\211+\340\262\227\340\263\215 \340\262\263\340\263\215-\340\262\211
\340\262\243\340\263\215-\340\262\205+\340\262\227\340\263\215
\340\262\212-\340\262\256\340\263\215+\340\262\254\340\263\215
\340\262\250\340\263\215-\340\262\216+\340\262\263\340\263\215
\340\262\216+\340\262\244\340\263\215
\340\262\205-\340\262\260\340\263\215+\340\262\256\340\263\215
\340\262\260\340\263\215+\340\262\205
\340\262\260\340\263\215+\340\262\206 \340\262\260\340\263\215+\340\262\205
\340\262\260\340\263\215+\340\262\207
\340\262\260\340\263\215+\340\262\212 \340\262\260\340\263\215+\340\262\207
\340\262\260\340\263\215+\340\262\223 \340\262\260\340\263\215+\340\262\207
\340\262\255\340\263\215-\340\262\205+\340\262\246\340\263\215
\340\262\205-\340\262\247\340\263\215+\340\262\257\340\263\215
\340\262\211-\340\262\237\340\263\215+\340\262\211
\340\262\206+\340\262\225\340\263\215
\340\262\205-\340\262\260\340\263\215 \340\262\205-\340\262\260\340\263\215+\340\262\256\340\263\215
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\206 \340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\207 \340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\211 \340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\212 \340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\222 \340\262\250\340\263\215-\340\262\250\340\263\215+\340\262\205
\340\262\220-\340\262\250\340\263\215
\340\262\206-\340\262\252\340\263\215+\340\262\206 \340\262\205-\340\262\252\340\263\215+\340\262\252\340\263\215
\340\262\206-\340\262\252\340\263\215+\340\262\223 \340\262\205-\340\262\252\340\263\215+\340\262\252\340\263\215
\340\262\222-\340\262\246\340\263\215+\340\262\205
\340\262\225\340\263\215-\340\262\222+\340\262\237\340\263\215
\340\262\205-\340\262\270\340\263\215+\340\262\205
\340\262\205-\340\262\270\340\263\215+\340\262\207
\340\262\205-\340\262\256\340\263\215+\340\262\270\340\263\215 \340\262\212-\340\262\256\340\263\215+\340\262\254\340\263\215
\340\262\244\340\263\215-\340\262\205+\340\262\260\340\263\215
\340\262\230\340\263\215-\340\262\205 \340\262\263\340\263\215-\340\262\205
\340\262\265\340\263\215-\340\262\206+\340\262\227\340\263\215 \340\262\266\340\263\215-\340\262\206+\340\262\270\340\263\215
\340\262\270\340\263\215-\340\262\205+\340\262\270\340\263\215 \340\262\244\340\263\215-\340\262\205+\340\262\260\340\263\215
\340\262\244\340\263\215-\340\262\211+\340\262\265\340\263\215
\340\262\257\340\263\215

The code i thought would work would be to convert the text into a byte array using bytearray(), then decode it to utf-8 and write it to the targe file. octal_to_text.py

"""
Convert file contents from Octal to text

"""

with open('i_file','r') as tl, open('o_file','w+') as tk:
    for line in tl.readlines():
        line = (line.strip())
        br = bytearray(line)
        tk.write("{}\n".format(br.decode('utf-8')))

In the code above however, the output file generated is the same as the input. bytearray doesn't seem to be doing anything. What exactly am I doing wrong here? Could you provide a python2.7 solution?

NOTE The output file should contain characters like the ones shown below

ಅ
ಆ
ಇ
ಈ
ಉ
ಊ
ಋ
ಎ
ಏ
ಐ
ಒ
ಓ
ಔ
ಕ್
ಖ್
ಗ್
ಘ್
ಚ್
ಛ್
ಜ್
ಝ್
ಟ್
ಠ್
ಡ್
ಢ್
ಣ್

回答1:


The print statement for an Octal encoded string, decodes Octal to the non-English notation only when working the python interpreter.

Thus, a simple work around is to perform the following steps:

  1. Copy the contents of the file to be decoded
  2. Open the python interpreter and assign the file contents to a variable in a multi line string.
  3. Print the variable
  4. Copy contents to a new file


来源:https://stackoverflow.com/questions/43675989/python-convert-octal-to-non-english-text-from-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!