Subprocess command encoding

跟風遠走 submitted on 2021-01-27 20:55:22

Question


I'm currently migrating a script from Perl to Python 3 (3.6.5). It runs on Windows Server 2016. The script builds a command line with arguments and executes the resulting string with subprocess.check_output. One of the options is -location:"my street". The location can contain special characters such as umlauts (äöß) or others (áŠ).

When I run the Perl script the special chars are passed correctly to the application. When I run the Python script the special chars are replaced by question marks in the application. I think the called application needs a UTF-8 encoded argument string.

The Perl script runs in UTF-8 mode

use UTF8;
binmode( STDOUT, ":utf-8" );

The Python script is created with PyCharm, UTF-8 encoded and the first line of the script contains

# -*- coding: utf-8 -*-

I tried several ways of setting the encoding of the subprocess arguments to UTF-8, but none of them worked. I used procmon.exe to compare the application call made by the Perl script with the one made by the Python script. The command line that procmon displays for the Python subprocess call is readable to me; the one for the working Perl call is not. In procmon, the location string from the Perl script looks like this:

-location:"HQ/äöööStraße".

The Perl code looks like this:

$command = "C:\\PROGRAM FILES\\Application\\bin\\cfg.exe";
$operand = "-modify -location:123á456ß99";
$result  = `$command $operand`;

The Python code looks like this:

# -*- coding: utf-8 -*-
import subprocess
result = subprocess.check_output(['C:\\PROGRAM FILES\\Application\\bin\\cfg.exe', "-modify", "-location:123á456ß99"], shell=False, stderr=subprocess.STDOUT)

Any idea what I have to do so that the Python arguments are passed correctly to the application?


Answer 1:


In Python 3.3+ you can separately indicate that you expect text in a particular encoding. Python 3.7 added text=True as a more accurate and transparent alias for the keyword argument universal_newlines=True.

This keyword basically says "just use whatever encoding is the default on my system" (so UTF-8 on anything reasonably modern, except on Windows, where you get the system's default code page).

In the absence of this keyword, subprocesses receive and return bytes in Python 3.

Of course, if you know the encoding, you can also separately .decode() the bytes you get back.

If you know the encoding it's probably useful to use the encoding= keyword argument (even if you assume it is also the system encoding; this was added in Python 3.6).

response = subprocess.check_output([...], text=True)
response = subprocess.check_output([...], encoding='utf-8')
response = subprocess.check_output([...]).decode('utf-8')
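The variants above can be tried without the real application. The following is a minimal sketch that uses a child Python process as a hypothetical stand-in for cfg.exe (the child command and variable names are assumptions, not part of the question). Note that these keywords control how the child's output is decoded; they do not change how the command-line arguments are passed to the child.

```python
import subprocess
import sys

# Hypothetical stand-in for the real application: a child Python process
# that writes the UTF-8 bytes for "äöß" to its stdout.
child_cmd = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write(b'\xc3\xa4\xc3\xb6\xc3\x9f')",
]

# Without any text-related keyword, check_output returns raw bytes.
raw = subprocess.check_output(child_cmd)

# Decode the bytes manually once the encoding is known.
decoded_manually = raw.decode("utf-8")

# Or let subprocess do the decoding (encoding= needs Python 3.6+).
decoded_by_subprocess = subprocess.check_output(child_cmd, encoding="utf-8")

print(type(raw).__name__, decoded_manually == decoded_by_subprocess)
```

text=True is left out here on purpose: it needs Python 3.7+ (the question uses 3.6.5), and it decodes with the system default encoding, which on Windows may not be UTF-8.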



Answer 2:


The trick to get the script running is to encode the arguments as 'utf8' and then decode them as 'ansi'.

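import subprocess

command = r'C:\PROGRAM FILES\Application\bin\cfg.exe'
argument = ["-modify", "-location:123á456ß99"]

# Encode each argument as UTF-8, then reinterpret those bytes as ANSI text
# ('ansi' is a Windows-only codec alias).
argument_ansi = []
for x in argument:
    argument_ansi.append(x.encode('utf-8').decode('ansi', 'replace'))
cmd = [command]
cmd.extend(argument_ansi)
# encoding="utf-8" decodes the application's output as UTF-8 text.
result = subprocess.check_output(cmd, shell=False, encoding="utf-8", universal_newlines=True)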
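To see what this encode/decode step does to a string, here is a small sketch that runs on any platform. It assumes cp1252 as a stand-in for the Windows-only 'ansi' codec; the actual ANSI code page depends on the system, so the exact characters may differ there. The idea appears to be that each UTF-8 byte becomes one ANSI character, so converting the string back to ANSI bytes yields the original UTF-8 bytes, which is presumably what the application ends up reading.

```python
# cp1252 is an assumed stand-in for the Windows-only 'ansi' codec.
original = "-location:123á456ß99"

# Step 1: the UTF-8 bytes the application presumably expects.
utf8_bytes = original.encode("utf-8")

# Step 2: reinterpret each byte as one cp1252 character ("mojibake").
mojibake = utf8_bytes.decode("cp1252", "replace")

# Encoding the mojibake with the same code page restores the UTF-8 bytes.
round_tripped = mojibake.encode("cp1252")
print(round_tripped == utf8_bytes)

# Caveat: bytes that are undefined in the code page are lost with 'replace'.
# "Á" is C3 81 in UTF-8, and 0x81 is undefined in Python's cp1252 codec.
lossy = "Á".encode("utf-8").decode("cp1252", "replace")
print("\ufffd" in lossy)
```

The mojibake string is also why the working Perl call looks unreadable in procmon. Because of the 'replace' error handler, characters whose UTF-8 bytes have no mapping in the code page are silently replaced, so the trick is not safe for arbitrary input.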


Source: https://stackoverflow.com/questions/58522863/subprocess-command-encoding
