Question
I'm currently migrating a script from Perl to Python 3 (3.6.5). It is running on Windows Server 2016. The script builds a command line with arguments and executes the resulting string with subprocess.check_output. One of the argument options is called -location:"my street". The location can contain special characters such as umlauts (äöß) or (áŠ).
When I run the Perl script, the special characters are passed correctly to the application. When I run the Python script, the special characters are replaced by question marks in the application. I think the called application needs a UTF-8 encoded argument string.
The Perl script runs in UTF-8 mode:
use utf8;
binmode( STDOUT, ":utf8" );
The Python script was created with PyCharm, is UTF-8 encoded, and its first line contains
# -*- coding: utf-8 -*-
I tried several things to set the encoding to UTF-8 for the subprocess arguments, but none of them worked. I used procmon.exe to compare the application call between the Perl and the Python script. The command line that procmon displays for the Python subprocess call is readable to me; the one for the working Perl call is not. In procmon, the location string for the Perl script looks like this:
-location:"HQ/äöööStraße"
The Perl code looks like this:
$command = "C:\\PROGRAM FILES\\Application\\bin\\cfg.exe";
$operand = "-modify -location:123á456ß99";
$result  = `$command $operand`;
The Python code looks like this:
# -*- coding: utf-8 -*-
import subprocess
result = subprocess.check_output(
    ['C:\\PROGRAM FILES\\Application\\bin\\cfg.exe', "-modify", "-location:123á456ß99"],
    shell=False, stderr=subprocess.STDOUT)
Any idea what I have to do so that the Python arguments are passed correctly to the application?
Answer 1:
In Python 3.3+ you can separately indicate that you expect text rather than bytes. The keyword argument universal_newlines=True was renamed in 3.7 to the more accurate and transparent text=True.
This keyword basically says "just use whatever encoding is the default on my system" (so essentially UTF-8 on anything reasonably modern, except on Windows, where you get the system's legacy ANSI code page).
In the absence of this keyword, subprocesses receive and return bytes in Python 3.
Of course, if you know the encoding, you can also separately .decode() the bytes you get back.
If you know the encoding, it is probably better to use the encoding= keyword argument, even if you assume it matches the system encoding (it was added in Python 3.6).
response = subprocess.check_output([...], text=True)
response = subprocess.check_output([...], encoding='utf-8')
response = subprocess.check_output([...]).decode('utf-8')
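For illustration, here is a self-contained sketch of the bytes-vs-text behavior described above. The child process is hypothetical: it reuses the current Python interpreter to echo a string with non-ASCII characters, and PYTHONIOENCODING pins the child's output encoding so the example behaves the same on any platform.

```python
import os
import subprocess
import sys

# Hypothetical child process: reuse the current interpreter so the
# example runs anywhere; force its stdout to UTF-8 for determinism.
child = [sys.executable, "-c", "print('123á456ß99')"]
env = {**os.environ, "PYTHONIOENCODING": "utf-8"}

# Default: bytes come back, and decoding is up to the caller.
raw = subprocess.check_output(child, env=env)
decoded = raw.decode("utf-8")

# With encoding= (Python 3.6+), check_output returns str directly.
direct = subprocess.check_output(child, env=env, encoding="utf-8")
```

On Python 3.7+, text=True would likewise return str, but using the default locale encoding instead of an explicit one.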
Answer 2:
The trick to get the script running is to encode the arguments to UTF-8 and then decode them as ANSI:
import subprocess

command = r'C:\PROGRAM FILES\Application\bin\cfg.exe'
argument = ["-modify", "-location:123á456ß99"]
argument_ansi = []
for x in argument:
    # Re-interpret the UTF-8 bytes as ANSI text ('ansi' is the Windows
    # codec alias added in Python 3.6), so the application ends up
    # receiving the original UTF-8 byte sequence.
    argument_ansi.append(x.encode('utf-8').decode('ansi', 'replace'))
cmd = [command]
cmd.extend(argument_ansi)
result = subprocess.check_output(cmd, shell=False, encoding='utf-8')
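Why this works: decoding UTF-8 bytes with a single-byte codec is (mostly) lossless, so the original byte sequence survives until the receiving application decodes it again. A minimal, platform-neutral sketch of that round-trip, using 'latin-1' as a stand-in since the 'ansi' codec only exists on Windows:

```python
# 'latin-1' stands in for the Windows-only 'ansi' codec; it maps every
# byte value to exactly one character, so no information is lost.
s = "123á456ß99"

# Step 1: the UTF-8 bytes, re-read as single-byte text (mojibake).
mangled = s.encode("utf-8").decode("latin-1")

# Step 2: the receiving side turns the characters back into bytes and
# decodes them as UTF-8, recovering the original string.
recovered = mangled.encode("latin-1").decode("utf-8")
```

Note that the real ANSI code page (e.g. cp1252) leaves a few byte values unmapped, which is why the answer passes 'replace' as the error handler.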
Source: https://stackoverflow.com/questions/58522863/subprocess-command-encoding