Why does the following occur:
>>> u'\u0308'.encode('mbcs') #UMLAUT
'\xa8'
>>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL L
DISCLAIMER: I'm the author of the fix described below.
To support Unicode command lines on Windows with Python 2.7, you can use
this patch to subprocess.Popen(..)
The situation
Python 2's support for Unicode command lines on Windows is very poor.
Two things are severely bugged:
issuing a Unicode command line to the system from the caller side (via subprocess.Popen(..)),
and reading the current command line's Unicode arguments from the callee side (via sys.argv).
Both bugs are acknowledged and won't be fixed in Python 2. They are fixed in Python 3.
Technical Reasons
In Python 2, the Windows implementations of subprocess.Popen(..) and sys.argv are not Unicode-ready: subprocess.Popen(..) uses the non-Unicode Windows system call CreateProcess(..) (see python code, and MSDN doc of CreateProcess), and sys.argv does not use GetCommandLineW(..).
In Python 3, the Windows implementation of subprocess.Popen(..) uses the correct Windows system call CreateProcessW(..) starting from 3.0 (see code in 3.0), and sys.argv uses GetCommandLineW(..) starting from 3.3 (see code in 3.3).
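To illustrate the callee side, here is a minimal sketch of reading the arguments through GetCommandLineW(..) with ctypes. It is not the fix from the blog post; get_unicode_argv is a hypothetical helper name, and the sketch assumes that interpreter options (python.exe, -u, ...) come first on the command line, so that the last len(sys.argv) entries correspond to sys.argv. Outside Windows it simply falls back to sys.argv.

```python
import sys


def get_unicode_argv():
    """Hypothetical helper: return the script arguments as unicode strings."""
    if sys.platform != "win32":
        # Nothing to work around outside Windows in this sketch.
        return list(sys.argv)

    import ctypes

    kernel32 = ctypes.WinDLL("kernel32")
    shell32 = ctypes.WinDLL("shell32")

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = ctypes.c_wchar_p
    shell32.CommandLineToArgvW.argtypes = [ctypes.c_wchar_p,
                                           ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(kernel32.GetCommandLineW(),
                                      ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError()
    try:
        # Copy the strings out before the array is released.
        args = [argv[i] for i in range(argc.value)]
    finally:
        kernel32.LocalFree(argv)

    # Assumption: interpreter options precede the script arguments,
    # so keep only as many trailing entries as sys.argv has.
    return args[len(args) - len(sys.argv):]
```

Calling get_unicode_argv() at the top of a script gives a list with the same length as sys.argv, but with unicode entries on Windows.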
How is it fixed
The given patch leverages the ctypes module to call the Windows
system function CreateProcessW(..) directly. It provides a new, fixed Popen object by overriding the private method Popen._execute_child(..) and the private function _subprocess.CreateProcess(..) so that they set up and use CreateProcessW(..) from the Windows system library, in a way that mimics as closely as possible how it is done in Python 3.6.
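The core idea of calling CreateProcessW(..) through ctypes can be sketched as follows. This is not the patch itself: it does not override Popen._execute_child(..) and handles no pipes or handle inheritance. run_unicode_command is a hypothetical helper name, and the type aliases are assumptions spelled with fixed-size ctypes primitives so the definitions load on any platform.

```python
import ctypes
import sys

# Windows type aliases, spelled with fixed-size ctypes primitives.
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
LPWSTR = ctypes.c_wchar_p


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", LPWSTR), ("lpDesktop", LPWSTR),
        ("lpTitle", LPWSTR), ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD), ("dwXCountChars", DWORD),
        ("dwYCountChars", DWORD), ("dwFillAttribute", DWORD),
        ("dwFlags", DWORD), ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", ctypes.c_void_p), ("hStdInput", HANDLE),
        ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", HANDLE), ("hThread", HANDLE),
                ("dwProcessId", DWORD), ("dwThreadId", DWORD)]


def run_unicode_command(command_line):
    """Hypothetical helper: run a unicode command line, return its exit code."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        LPWSTR, LPWSTR, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int,
        DWORD, ctypes.c_void_p, LPWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION)]
    kernel32.CreateProcessW.restype = ctypes.c_int
    kernel32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    kernel32.WaitForSingleObject.restype = DWORD
    kernel32.GetExitCodeProcess.argtypes = [HANDLE, ctypes.POINTER(DWORD)]
    kernel32.GetExitCodeProcess.restype = ctypes.c_int
    kernel32.CloseHandle.argtypes = [HANDLE]
    kernel32.CloseHandle.restype = ctypes.c_int

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(STARTUPINFOW)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line in place,
    # so it gets a writable buffer rather than an immutable string.
    buf = ctypes.create_unicode_buffer(command_line)

    ok = kernel32.CreateProcessW(None, buf, None, None, False, 0, None, None,
                                 ctypes.byref(startup), ctypes.byref(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(info.hProcess, 0xFFFFFFFF)  # INFINITE
        exit_code = DWORD()
        if not kernel32.GetExitCodeProcess(info.hProcess,
                                           ctypes.byref(exit_code)):
            raise ctypes.WinError(ctypes.get_last_error())
        return exit_code.value
    finally:
        kernel32.CloseHandle(info.hThread)
        kernel32.CloseHandle(info.hProcess)
```

On Windows, a call such as run_unicode_command(u'python.exe script.py \u041a') would pass the Cyrillic argument through unchanged; script.py is a placeholder name here.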
How to use it
This blog post explains how to use the given patch. It additionally shows how to read the current process's
sys.argv with another fix.