Converting list of strings with u'…' to a list of normal strings [duplicate]

Question

I'm new to Python, so apologies for a very basic question.

I'm working with the Python pattern.en library and trying to get the synonyms of a word. This is my code, and it works fine:

from pattern.en import wordnet
a=wordnet.synsets('human')
print a[0].synonyms

This is the output I get:

[u'homo', u'man', u'human being', u'human']

But for my program I need the list in this form:

['homo', 'man', 'human being', 'human']

How do I get the output shown above, without the 'u' in front of each string?

Thanks in advance!

Answer 1:

Try encoding the strings. Be aware, though, that the u has no effect on the data: it is just an explicit marker that the value is a unicode object rather than a byte string. If your code needs unicode again later, it is better to keep feeding it unicode.

>>> d = [u'homo', u'man', u'human being', u'human']
>>> print [i.encode('utf-8') for i in d]
['homo', 'man', 'human being', 'human']
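
To illustrate that the u belongs only to how the list is displayed, here is a minimal sketch that should run on both Python 2 and Python 3. The variable names are illustrative and not part of the pattern library; the list is copied from the question.

```python
words = [u'homo', u'man', u'human being', u'human']

# Printing the whole list shows the repr() of each element.
# On Python 2 that repr includes the u prefix; on Python 3 it does not.
print(words)

# Printing the elements one by one shows only the text itself.
for word in words:
    print(word)

# Joining the elements also yields plain text with no prefix.
joined = u', '.join(words)
print(joined)
```

So if the goal is only clean output, printing or joining the elements may be enough, and no conversion is needed.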

Answer 2:

In short:

There's no need to convert your list of unicode objects into strings. For text like this they compare equal to plain strings, and the u only appears when the list is displayed.

In detail:

The u'...' prefix marks a string literal as a Unicode object, a type introduced in Python 2.0; see https://docs.python.org/2/tutorial/introduction.html#unicode-strings

Starting with Python 2.0 a new data type for storing text data is available to the programmer: the Unicode object. It can be used to store and manipulate Unicode data (see http://www.unicode.org/) and integrates well with the existing string objects, providing auto-conversions where necessary.

And since Python 3.0, see https://docs.python.org/3.2/tutorial/introduction.html#about-unicode:

Starting with Python 3.0 all strings support Unicode (see http://www.unicode.org/).

Regardless of which string type is the default, u'man' and 'man' compare equal in both Python 2.x and 3.x:

alvas@ubi:~$ python2
Python 2.7.11 (default, Dec 15 2015, 16:46:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type(u'man')
<type 'unicode'>
>>> type('man')
<type 'str'>
>>> u'man' == 'man'
True

alvas@ubi:~$ python3
Python 3.4.1 (default, Jun  4 2014, 11:27:44) 
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(u'man')
<class 'str'>
>>> type('man')
<class 'str'>
>>> u'man' == 'man'
True

And in Python 2, when you are required to convert from unicode to the str type, say because of type checks, e.g.:

alvas@ubi:~$ python3
>>> u'man' == 'man'
True
>>> type(u'man') == type('man')
True
>>> exit()
alvas@ubi:~$ python2
>>> u'man' == 'man'
True
>>> type(u'man') == type('man')
False

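then you should be able to simply convert it with str(u'man') or u'man'.encode('utf-8'). Note that in Python 2, str() uses the default ASCII codec, so it only works when the text is pure ASCII.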
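
A minimal sketch of such a conversion, written so that the same code runs on Python 2 and Python 3. The helper name to_native_str and the PY2 flag are hypothetical names chosen for this example; they do not come from pattern or any other library.

```python
import sys

# True only when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2


def to_native_str(value, encoding='utf-8'):
    """Return value as the interpreter's native str type (hypothetical helper)."""
    # The name `unicode` exists only on Python 2; the PY2 check short-circuits,
    # so it is never looked up on Python 3.
    if PY2 and isinstance(value, unicode):
        return value.encode(encoding)
    # On Python 3 (or for values that are already str) nothing needs to change.
    return value


synonyms = [u'homo', u'man', u'human being', u'human']
native = [to_native_str(s) for s in synonyms]
print(native)
```

On Python 3 the helper returns its input unchanged, because every str is already Unicode there. On Python 2 it encodes to UTF-8 bytes, which is the same thing the list comprehension with encode('utf-8') does.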

But expect some pain, in the form of recurring encoding errors, if your unicode string contains characters outside the ASCII range and you try to write it to a file or print it to a console whose default encoding is not 'utf-8'. In that case, watch https://www.youtube.com/watch?v=sgHbC6udIqc
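
The failure mode can be reproduced in a small sketch. The example string is an assumption made for illustration and is not taken from the question; the explicit encode('ascii') call stands in for what an ASCII-only default encoding would do implicitly.

```python
# A string with one character outside the ASCII range (e with acute accent).
text = u'caf\u00e9'

# Encoding with an ASCII-only codec fails for that character.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 works and can be reversed without loss.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')

print(ascii_ok)
print(round_trip == text)
```

Choosing the encoding explicitly at the point where text leaves the program (file or console) tends to avoid relying on whatever default the environment happens to have.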

Additionally, here are similar questions relating to the u'...' prefix:

What does the 'u' symbol mean in front of string values?
Why is there a 'u' before every line of my output?
Python string prints as [u'String']
https://stackoverflow.com/questions/4855645/how-to-turn-unicode-strings-into-regular-strings
What's the u prefix in a python string
Printing a string prints 'u' before the string in Python?

Source: https://stackoverflow.com/questions/34986329/converting-list-of-strings-with-u-to-a-list-of-normal-strings

Tags: python, nlp, nltk, wordnet