Question
I have a list of words, for example:
words = ['one','two','three four','five','six seven']
I am trying to create a new list in which each item is a single word, so that I would have:
words = ['one','two','three','four','five','six','seven']
Would the best approach be to join the entire list into a string and then tokenize that string? Something like this:
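import nltk
word_string = ' '.join(words)
tokenize_list = nltk.word_tokenize(word_string)  # nltk.tokenize is a module, not a callable; needs NLTK's tokenizer data installed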
Or is there a better option?
Answer 1:
You can join using a space separator and then split again:
In [22]:
words = ['one','two','three four','five','six seven']
' '.join(words).split()
Out[22]:
['one', 'two', 'three', 'four', 'five', 'six', 'seven']
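As a rough sketch of why this works even on less tidy input: `str.split()` called with no arguments splits on any run of whitespace and drops empty strings. The `messy_words` list below is made up for illustration and is not from the question.

```python
# Hypothetical input with irregular whitespace, not from the question.
messy_words = ['one', ' two', 'three   four', 'five\tsix', '']

# split() with no arguments splits on runs of whitespace
# and discards the empty strings that would otherwise appear.
flat_words = ' '.join(messy_words).split()
print(flat_words)
```

Note that calling `split(' ')` with an explicit space would behave differently and keep empty strings, so the argument-free form is probably the safer choice here.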
Answer 2:
words = ['one','two','three four','five','six seven']
With a loop:
words_result = []
for item in words:
    for word in item.split():
        words_result.append(word)
or as a comprehension:
words = [word for item in words for word in item.split()]
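A third variant, not part of the answer above, is to flatten with `itertools.chain.from_iterable` from the standard library. This is only a sketch of an alternative; the name `flat_words` is chosen here for illustration.

```python
from itertools import chain

words = ['one', 'two', 'three four', 'five', 'six seven']

# Each item.split() yields a list of words; chain.from_iterable
# concatenates those lists lazily, and list() materializes the result.
flat_words = list(chain.from_iterable(item.split() for item in words))
print(flat_words)
```

For a list this small the comprehension is arguably more readable; the `chain` form mostly helps when the result should be consumed lazily instead of being built as a list.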
Answer 3:
Here's a solution with a slight use of regular expressions:
import re
words = ['one','two','three four','five','six seven']
result = re.findall(r'[a-zA-Z]+', str(words))
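One caveat worth checking before relying on this: the pattern `[a-zA-Z]+` keeps only ASCII letters, so digits are dropped and apostrophes split a word in two. The sketch below uses a made-up input to show the difference, alongside a possible alternative that matches runs of non-whitespace per item. The alternative is an assumption about what is wanted, not the answer author's method.

```python
import re

# Hypothetical input with an apostrophe and a digit, not from the question.
words = ["don't stop", 'route 66', 'six seven']

# Pattern from the answer above, applied to the list's string representation.
letters_only = re.findall(r'[a-zA-Z]+', str(words))

# Alternative sketch: match runs of non-whitespace characters in each item.
whitespace_split = [token for item in words for token in re.findall(r'\S+', item)]

print(letters_only)
print(whitespace_split)
```

With this input, the letters-only pattern loses `66` and breaks `don't` into `don` and `t`, while the per-item version keeps both intact.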
Source: https://stackoverflow.com/questions/30085694/python-convert-list-of-multiple-words-to-single-words