In Python 3.x, a string consists of items of Unicode ordinal. (See the quotation from the language reference below.) What is the internal representation of Unicode string? I
The internal representation will change in Python 3.3 which implements PEP 393. The new representation will pick one or several of ascii, latin-1, utf-8, utf-16, utf-32, generally trying to get a compact representation.
Implicit conversions into surrogate pairs will only be done when talking to legacy APIs (those only exist on windows, where wchar_t is two bytes); the Python string will be preserved. Here are the release notes.