Python unicode behaviour in Google App Engine

大兔子大兔子 提交于 2019-12-06 14:54:23

问题


I got completely confused with gae. I have a script, that does a post request(using urlfetch from Google App Engine api) as a response we get a cp1251 encoded html page.

Then I decode it, using .decode('cp1251') and parse with lxml.

My code works totally fine on my local machine:

import re
import leaf #simple wrapper for lxml
weekdaysD={u'понедельник':1, u'вторник':2, u'среда':3, u'четверг':4, u'пятница':5, u'суббота':6}
document = leaf.parse(leaf.strip_symbols(leaf.strip_accents(html_in_cp1251.decode('cp1251'))))
table=document.get('table')
trs=table('tr') #leaf syntax
for tr in trs:
    tds=tr.xpath('td')
    for td in tds:
        if td.colspan=='3':
            curweek=re.findall('\w+(?=\-)', td.text)[0]               
            curday=weekdaysD[td.text.split(u',')[0]]

but when I deploy it to gae, I get:

curday=weekdaysD[td.text.split(u',')[0]]
KeyError: u'\xd0\xb2\xd1\x82\xd0\xbe\xd1\x80\xd0\xbd\xd0\xb8\xd0\xba'

How is non unicode characters there at all? And why is everything ok locally? I've tried all variations of decoding\encoding placing in my code - nothing helped. I'm stuck for a few days now.

UPD: also, if I add to my script on GAE:

print type(weekdaysD.keys()[0]), type(td.text.split(u',')[0]) 

It returns both as 'unicode'. So, I belive that html was decoded correctly. Could it be something with lxml on GAE?


回答1:


That string you got in the error message has unicode for its type but the contents is actually the bytes that would be the UTF-8 encoding of вторник. It would be helpful if you showed us the code that does the urlfetch call, since there is nothing wrong with the code you are showing.




回答2:


Well, the workaround of adding .encode('latin1').decode('utf-8', 'ignore') did the trick. I wish I could explain why it behaves so.



来源:https://stackoverflow.com/questions/9793086/python-unicode-behaviour-in-google-app-engine

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!