If I have an object like:
d = {'a': 1, 'en': 'hello'}
...then I can pass it to urllib.urlencode, no problem:
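For context, a minimal Python 2 sketch of the working call, plus a dict with a unicode value (my assumption of the failing case the answers below address):
import urllib

d = {'a': 1, 'en': 'hello'}
print urllib.urlencode(d)    # e.g. 'a=1&en=hello' (key order may vary)

d2 = {'a': 1, 'en': 'hello', 'pt': u'ol\xe1'}
urllib.urlencode(d2)         # raises UnicodeEncodeError: str() on the unicode value uses ASCII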
Nothing new to add except to point out that the urlencode algorithm is nothing tricky. Rather than processing your data once and then calling urlencode on it, it would be perfectly fine to do something like:
from urllib import quote_plus

def urlencode_utf8(params):
    # Accept either a dict or an iterable of (key, value) pairs
    if hasattr(params, 'items'):
        params = params.items()
    return '&'.join(
        quote_plus(k.encode('utf8'), safe='/') + '=' + quote_plus(v.encode('utf8'), safe='/')
        for k, v in params)
Looking at the source code for the urllib module (Python 2.6), their implementation does not do much more. There is an optional feature where values in the parameters that are themselves 2-tuples are turned into separate key-value pairs, which is sometimes useful, but if you know you won't need that, the above will do.
You can even get rid of the if hasattr(params, 'items'): check if you know you won't need to handle lists of 2-tuples as well as dicts.
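A quick usage sketch (the sample values are my own, and both keys and values are assumed to be strings):
print urlencode_utf8({u'pt': u'ol\xe1', u'en': u'hello'})
# e.g. 'pt=ol%C3%A1&en=hello' (key order may vary)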
This one line works fine in my case:
urllib.quote(unicode_string.encode('utf-8'))
Thanks @IanCleland and @PavelVlasov.
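For example (the sample string is my own), assuming the standard Python 2 urllib:
import urllib
print urllib.quote(u'ol\xe1'.encode('utf-8'))   # 'ol%C3%A1'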
Seems like it is a wider topic than it looks, especially when you have to deal with more complex dictionary values. I found three ways of solving the problem:
1. Patch urllib.py to accept an encoding parameter:
def urlencode(query, doseq=0, encoding='ascii'):
and replace all str(v) conversions with something like v.encode(encoding). Obviously not good, since it is hardly redistributable and even harder to maintain.
2. Change the default Python encoding as described here. The author of that blog post describes some problems with this solution pretty clearly, and who knows how many more could be lurking in the shadows. So it doesn't look good to me either.
3. I personally ended up with this abomination, which encodes all unicode strings to UTF-8 byte strings in any (reasonably) complex structure:
def encode_obj(in_obj):
    # Recursively encode unicode strings to UTF-8 inside lists, tuples and dicts
    def encode_list(in_list):
        out_list = []
        for el in in_list:
            out_list.append(encode_obj(el))
        return out_list

    def encode_dict(in_dict):
        out_dict = {}
        for k, v in in_dict.iteritems():
            out_dict[k] = encode_obj(v)
        return out_dict

    if isinstance(in_obj, unicode):
        return in_obj.encode('utf-8')
    elif isinstance(in_obj, list):
        return encode_list(in_obj)
    elif isinstance(in_obj, tuple):
        return tuple(encode_list(in_obj))
    elif isinstance(in_obj, dict):
        return encode_dict(in_obj)
    return in_obj
You can use it like this: urllib.urlencode(encode_obj(complex_dictionary))
To encode the keys as well, out_dict[k] can be replaced with out_dict[k.encode('utf-8')], but that was a bit too much for me.
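A quick sketch of how that plays out, with sample data of my own (doseq=1 so the list value becomes repeated key=value pairs):
import urllib

complex_dictionary = {'name': u'caf\xe9', 'tags': [u'ol\xe1', u'tsch\xfcss']}
print urllib.urlencode(encode_obj(complex_dictionary), doseq=1)
# e.g. 'name=caf%C3%A9&tags=ol%C3%A1&tags=tsch%C3%BCss' (key order may vary)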
It seems that you can't pass a Unicode object to urlencode, so before calling it you should encode every unicode parameter. How to do this properly seems very dependent on the context, but in your code you should always be aware of when to use the unicode Python object (the unicode representation) and when to use the encoded object (the bytestring).
Also, encoding values that are already str is superfluous: What is the difference between encode/decode?
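A minimal sketch of that idea, assuming a flat dict and Python 2 (the helper name encode_params is my own):
import urllib

def encode_params(params):
    # Encode only unicode keys/values; leave ints and existing bytestrings alone
    def enc(x):
        return x.encode('utf-8') if isinstance(x, unicode) else x
    return dict((enc(k), enc(v)) for k, v in params.items())

print urllib.urlencode(encode_params({'a': 1, 'en': 'hello', 'pt': u'ol\xe1'}))
# e.g. 'a=1&en=hello&pt=ol%C3%A1' (key order may vary)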
Why such long answers? Simply encode the unicode values to UTF-8 before they go into urlencode:
urlencode({'pt': unicode_string.encode('utf-8')})
I had the same problem with German umlauts. The solution is pretty simple:
In Python 3+, urlencode lets you specify the encoding:
>>> from urllib.parse import urlencode
>>> args = {'a': 1, 'en': 'hello', 'pt': 'olá'}
>>> urlencode(args, encoding='utf-8')
'a=1&en=hello&pt=ol%C3%A1'