SimpleHTTPServer in Python 3.6.4 cannot handle non-ASCII strings (Chinese in my case)

Question

I run SimpleHTTPServer in Python 3.6.4 (64-bit) with this command:

python -m http.server --cgi

Then I make a form in test.py and submit it to test_form_action.py, which prints the input text.

cgi-bin/test.py

# coding=utf-8
from __future__ import unicode_literals, absolute_import

print("Content-Type: text/html")  # HTML is following
print()
reshtml = '''<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html" charset="utf-8"/>
</head>
<body>
<div style="text-align: center;">
    <form action="/cgi-bin/test_form_action.py" method="POST"
          target="_blank">
        输入:<input type="text" id="id" name="name"/>
        <button type="submit">Submit</button>
    </form>
</div>
</body>
</html>'''

print(reshtml)

cgi-bin/test_form_action.py

# coding=utf-8
from __future__ import unicode_literals, absolute_import

# Import modules for CGI handling
import cgi, cgitb
cgitb.enable()

if __name__ == '__main__':
    print("Content-Type: text/html")  # HTML is following
    print()

    form = cgi.FieldStorage()
    print(form)
    # Fields are keyed by the input's name attribute; id is never submitted
    name = form.getvalue("name")

    print(name)

When I visit http://127.0.0.1:8000/cgi-bin/test.py, the Chinese characters "输入" don't display correctly; they look like "��". I have to manually change the Text Encoding of the page from "Unicode" to "Chinese Simplified" in Firefox to make them display normally.

It's weird, since I put charset="utf-8" in cgi-bin/test.py.

Furthermore, when I enter some Chinese text in the input field and submit the form, the cgi-bin/test_form_action.py page is blank.

Meanwhile, the following error appears in the Windows terminal where I run SimpleHTTPServer:

127.0.0.1 - - [23/Mar/2018 23:43:32] Error in sys.excepthook:
Traceback (most recent call last):
  File "E:\Python\Python36\Lib\cgitb.py", line 268, in __call__
    self.handle((etype, evalue, etb))
  File "E:\Python\Python36\Lib\cgitb.py", line 288, in handle
    self.file.write(doc + '\n')
UnicodeEncodeError: 'gbk' codec can't encode character '\ufffd' in position 1894: illegal multibyte sequence

Original exception was:
Traceback (most recent call last):
  File "G:\Python\Project\VideoHelper\cgi-bin\test_form_action.py", line 13, in <module>
    print(form)
UnicodeEncodeError: 'gbk' codec can't encode character '\ufffd' in position 52: illegal multibyte sequence

127.0.0.1 - - [23/Mar/2018 23:43:32] CGI script exit status 0x1

Answer 1:

When you call print(), Python converts the strings to bytes, i.e. it encodes them using a default codec. That default depends on the environment and is reported by sys.stdout.encoding. For a CGI script on Windows, stdout is a pipe rather than the console, so the system's ANSI code page is used. In your case that seems to be GBK, judging from the error message.
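As a quick illustration, the snippet below does three things. It encodes "输入" with both codecs. It then simulates a browser decoding GBK bytes under a UTF-8 declaration. Finally, it checks that U+FFFD, the character named in your traceback, has no GBK mapping at all. This is a sketch; the helper names are hypothetical.

```python
import sys


def encode_both(text: str) -> dict:
    """Return the bytes `text` becomes under UTF-8 and under GBK."""
    return {"utf-8": text.encode("utf-8"), "gbk": text.encode("gbk")}


def gbk_read_as_utf8(text: str) -> str:
    """Mimic a browser that receives GBK bytes but was told they are UTF-8."""
    return text.encode("gbk").decode("utf-8", errors="replace")


def can_encode(text: str, codec: str) -> bool:
    """True if every character of `text` has a mapping in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The codec print() uses; typically 'cp936' (GBK) for a CGI script on Chinese Windows.
    print("stdout encoding:", sys.stdout.encoding)
    print(encode_both("输入"))              # two different byte sequences
    print(ascii(gbk_read_as_utf8("输入")))  # contains '\ufffd' replacement characters
    print(can_encode("\ufffd", "gbk"))       # False, hence the UnicodeEncodeError
```

The garbled "��" in Firefox and the UnicodeEncodeError in the terminal are two sides of the same mismatch: bytes produced with one codec are interpreted with another.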

In the HTML page your CGI script returns, you specify the codec ("charset") as UTF-8. You can of course change this to GBK, but it will only solve your first problem (display of test.py), not the second one (encoding error in test_form_action.py). Instead, it's probably better to get Python to send UTF-8-encoded data on STDOUT.
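The "\ufffd" in the second traceback has one plausible origin. Assume Firefox was switched to "Chinese Simplified" (GBK) before the form was submitted. Browsers percent-encode form fields in the page's encoding when no accept-charset is given. cgi.FieldStorage, however, decodes urlencoded data as UTF-8 with errors='replace' by default. The sketch below reproduces that path with urllib.parse; simulate_form_post is a hypothetical helper, not part of the cgi module.

```python
from urllib.parse import parse_qs, quote


def simulate_form_post(value: str, page_charset: str) -> str:
    """Hypothetical helper: percent-encode `value` as a browser would for a
    page in `page_charset`, then decode it the way cgi.FieldStorage does by
    default (encoding='utf-8', errors='replace')."""
    body = "name=" + quote(value, encoding=page_charset)
    fields = parse_qs(body, encoding="utf-8", errors="replace")
    return fields["name"][0]


if __name__ == "__main__":
    print(ascii(simulate_form_post("输入", "utf-8")))  # '\u8f93\u5165', i.e. '输入'
    print(ascii(simulate_form_post("输入", "gbk")))    # replacement characters
```

If that is what happened, serving the form page as genuine UTF-8 fixes both ends. The page displays correctly, and the browser posts UTF-8, which FieldStorage decodes cleanly.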

One approach is to replace all occurrences of

print(x)

with

import sys
sys.stdout.buffer.write((str(x) + '\n').encode('utf8'))  # print() also appends a newline
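print() also appends a newline and accepts non-string arguments such as the FieldStorage object. A small wrapper therefore keeps the call sites close to the original. Here is a sketch for Python 3, where print_utf8 is a hypothetical name. It also declares charset=utf-8 in the Content-Type header, so the browser does not have to rely on the meta tag alone.

```python
import sys
from typing import BinaryIO, Optional


def print_utf8(*objects, sep: str = " ", end: str = "\n",
               stream: Optional[BinaryIO] = None) -> None:
    """Hypothetical print() replacement that always emits UTF-8 bytes.

    Writes to sys.stdout.buffer by default, bypassing the locale codec
    (e.g. GBK) that the text layer of sys.stdout would apply.
    """
    if stream is None:
        sys.stdout.flush()  # emit anything print() already buffered, keeping order
        stream = sys.stdout.buffer
    text = sep.join(str(obj) for obj in objects) + end
    stream.write(text.encode("utf-8"))
    stream.flush()


if __name__ == "__main__":
    print_utf8("Content-Type: text/html; charset=utf-8")
    print_utf8()  # blank line ends the CGI headers
    print_utf8("<p>输入</p>")
```

Flushing sys.stdout first matters only if plain print() calls are mixed in. Without that flush, text still buffered in the old text layer could reach the browser after the bytes written directly.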

Alternatively, you can replace sys.stdout with a re-encoded wrapper at the top of the script, before anything has been printed, without changing the print() occurrences:

import sys
sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')
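A variant of the same idea wraps the existing sys.stdout.buffer in io.TextIOWrapper instead of reopening the file descriptor with open(). This swaps in a different technique rather than reproducing the line above, and the helper names are hypothetical. One detail matters for test_form_action.py. cgitb.enable() creates a hook that stores whatever sys.stdout is at the moment it is called, which is why cgitb itself failed with the GBK codec in the traceback. The replacement should therefore happen before cgitb.enable():

```python
import io
import sys
from typing import BinaryIO, TextIO


def utf8_text_stream(binary: BinaryIO) -> TextIO:
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # line_buffering=True flushes whenever a newline is written.
    return io.TextIOWrapper(binary, encoding="utf-8", line_buffering=True)


def install_utf8_stdout() -> None:
    """Point sys.stdout at a UTF-8 wrapper around the same underlying buffer."""
    sys.stdout.flush()  # don't leave earlier output stranded in the old wrapper
    sys.stdout = utf8_text_stream(sys.stdout.buffer)


if __name__ == "__main__":
    # In test_form_action.py this would run right after the imports,
    # before cgitb.enable(), so cgitb's hook captures the UTF-8 stream.
    install_utf8_stdout()
    print("Content-Type: text/html; charset=utf-8")
    print()
    print("<p>输入</p>")
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') achieves the same in one call; it is not available in 3.6.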

Note: these two solutions don't work as-is in Python 2.x, where sys.stdout has no .buffer attribute and the built-in open() has no encoding parameter. I mention this because your code has from __future__ import statements, which serve no purpose in code that only ever runs under Python 3.

Source: https://stackoverflow.com/questions/49453682/simplehttpserver-in-python3-6-4-can-not-handle-non-ascii-stringchinese-in-my-ca

Tags

python

unicode

encoding

webserver

simplehttpserver