I need to step through a Python string one character at a time, but a simple "for" loop gives me UTF-16 code units instead:
str = "abc\u20ac\U00010302\
Python's default builds store Unicode values internally as UCS2, so any character outside the Basic Multilingual Plane is held as a UTF-16 surrogate pair. The code point U+10302 (\U00010302) is represented in UTF-16 as \uD800\uDF02, which is why you got that result.
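To make the arithmetic behind that pair concrete, here is a minimal sketch of the standard UTF-16 surrogate calculation. The function name to_surrogate_pair is only illustrative and is not part of Python.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x10302)
print("U+%04X U+%04X" % (high, low))  # U+D800 U+DF02
```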
That said, some Python builds use UCS4 instead, but UCS2 and UCS4 builds are not compatible with each other.
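If you are not sure which kind of build you are running, sys.maxunicode should tell you: it is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build. A small sketch follows; the helper name unicode_build is made up for this example, and on Python 3.3 and later the value is always 0x10FFFF.

```python
import sys

def unicode_build():
    """Return 'UCS2' on a narrow build and 'UCS4' on a wide build."""
    return "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"

print("%s build, sys.maxunicode = U+%04X" % (unicode_build(), sys.maxunicode))
```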
The documentation for Py_UNICODE describes both build types:
Py_UNICODE This type represents the storage type which is used by Python internally as basis for holding Unicode ordinals. Python’s default builds use a 16-bit type for Py_UNICODE and store Unicode values internally as UCS2. It is also possible to build a UCS4 version of Python (most recent Linux distributions come with UCS4 builds of Python). These builds then use a 32-bit type for Py_UNICODE and store Unicode data internally as UCS4. On platforms where wchar_t is available and compatible with the chosen Python Unicode build variant, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for either unsigned short (UCS2) or unsigned long (UCS4).
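Given all that, one possible way to walk the string by code point on either kind of build is to join surrogate pairs yourself. This is only a sketch, not an official recipe; iter_code_points is a hypothetical helper name, and it passes unpaired surrogates through unchanged.

```python
def iter_code_points(text):
    """Yield one integer code point per character, joining surrogate pairs."""
    i = 0
    n = len(text)
    while i < n:
        unit = ord(text[i])
        i += 1
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= unit <= 0xDBFF and i < n:
            following = ord(text[i])
            if 0xDC00 <= following <= 0xDFFF:
                unit = 0x10000 + ((unit - 0xD800) << 10) + (following - 0xDC00)
                i += 1
        yield unit

for code in iter_code_points(u"abc\u20ac\U00010302"):
    print("U+%04X" % code)
```

On a UCS4 build the surrogate branch is simply never taken for this input, so the same loop should behave identically on both build types.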