When writing a Python 3.1 CGI script, I run into horrible UnicodeDecodeErrors. However, when running the script on the command line, everything works.
You shouldn't read your IO streams as strings for CGI/WSGI; they aren't Unicode strings, they're explicitly byte sequences.
(Consider that Content-Length is measured in bytes, not characters; imagine trying to read a multipart/form-data binary file upload that has been crunched into UTF-8-decoded strings, or trying to return a binary file download...)
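As a quick illustration (the bytes here are just a made-up sample, not from the question), decoding arbitrary binary data as text blows up immediately:

```python
# The first four bytes of a JPEG file are not valid UTF-8, so treating
# an uploaded image as text fails on the very first byte.
b"\xff\xd8\xff\xe0".decode("utf-8")  # raises UnicodeDecodeError
```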
So instead use sys.stdin.buffer and sys.stdout.buffer to get the raw byte streams for stdio, and read/write binary with them. It is up to the form-reading layer to convert those bytes into Unicode string parameters where appropriate, using whichever encoding your web page has.
Unfortunately the standard library CGI and WSGI interfaces don't get this right in Python 3.1: the relevant modules were crudely converted from the Python 2 originals using 2to3, and consequently there are a number of bugs that surface as UnicodeError.
The first version of Python 3 that is usable for web applications is 3.2; using 3.0/3.1 for this is pretty much a waste of time. It took a lamentably long time to get this sorted out and PEP 3333 accepted.