How to use the regex module in Python to split a string of text into words only?

别说谁变了你拦得住时间么 submitted on 2020-01-15 12:49:28

Question


Here's what I'm working with…

import re

string1 = "Dog,cat,mouse,bird. Human."

def string_count(text):
    # Split on runs of one or more non-word characters
    text = re.split(r'\W+', text)
    count = 0
    for x in text:
        count += 1
        print(count)
        print(x)

    return text

print(string_count(string1))

…and here's the output…

1
Dog
2
cat
3
mouse
4
bird
5
Human
6

['Dog', 'cat', 'mouse', 'bird', 'Human', '']

Why am I getting a 6 even though there are only 5 words? I can't seem to get rid of the '' (empty string)! It's driving me insane.


Answer 1:


Because the pattern also matches the final dot, the split produces an empty piece at the end.

You split the input string on \W+, which matches one or more non-word characters. The regex therefore matches the last dot as well and splits there. Since nothing follows the last dot, the piece after it is an empty string.
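
To see this behavior in isolation, here is a minimal sketch. The short sample strings are illustrative assumptions, not taken from the question. It shows that re.split returns an empty string wherever the pattern matches at the very start or the very end of the input.

```python
import re

# A trailing separator leaves an empty string at the end of the result.
trailing = re.split(r'\W+', "Dog,cat.")
print(trailing)  # ['Dog', 'cat', '']

# A leading separator leaves an empty string at the start.
leading = re.split(r'\W+', ",Dog,cat")
print(leading)   # ['', 'Dog', 'cat']

# With no separator at either edge, no empty strings appear.
clean = re.split(r'\W+', "Dog,cat")
print(clean)     # ['Dog', 'cat']
```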




Answer 2:


Avinash Raj correctly stated WHY it's doing that. Here's how to fix it:

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = [word for word in re.split(r'\W+', string1) if word]
# include the word in the list if it's not the empty string

Alternatively (and this is better):

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = re.findall(r'\w+', string1)
# find all words in string1
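
As a rough sketch of how the findall approach could replace the counting loop from the question: count_words below is a hypothetical helper name introduced only for illustration, not something from the original post.

```python
import re

def count_words(text):
    """Return the list of words in text and how many there are.

    Hypothetical helper for illustration only.
    """
    # \w+ matches each run of word characters, so no empty strings can appear.
    words = re.findall(r'\w+', text)
    return words, len(words)

words, total = count_words("Dog,cat,mouse,bird. Human.")
print(total)  # 5
print(words)  # ['Dog', 'cat', 'mouse', 'bird', 'Human']
```

Keep in mind that \w also matches digits and underscores, and that an apostrophe counts as a non-word character, so "it's" would come back as 'it' and 's'.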


Source: https://stackoverflow.com/questions/25496670/how-to-use-the-regex-module-in-python-to-split-a-string-of-text-into-the-words-o
