Replace special characters in a string in Python

感动是毒 2020-12-04 20:06

I am using urllib to get a string of HTML from a website and need to put each word in the HTML document into a list.

Here is the code I have so far. I keep getting a

5 answers
  •  天命终不由人
    2020-12-04 20:38

    One way is to use re.sub, which is my preferred approach.

    import re

    my_str = "hey th~!ere"
    # Remove every character that is not a letter, digit, space, newline, or period.
    my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
    print(my_new_string)
    

    Output:

    hey there
    

    Another way is to build a character class from string.punctuation, using re.escape so that the punctuation is matched literally:

    import re
    import string

    my_str = "hey th~!ere"

    # Escape the punctuation so it is matched literally inside the character class.
    chars = re.escape(string.punctuation)
    print(re.sub('[' + chars + ']', '', my_str))
    

    Output:

    hey there
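
The reason for re.escape is that string.punctuation contains characters such as ], \, ^ and -, which have a special meaning inside a character class. Below is a minimal sketch of the same idea with the pattern compiled once. The helper name strip_punctuation and the constant PUNCT_RE are hypothetical, chosen only for illustration.

```python
import re
import string

# Escape the punctuation so that ']', '\', '^' and '-' are taken literally
# inside the character class instead of closing it or forming a range.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_punctuation(text):
    """Remove every ASCII punctuation character from text (hypothetical helper)."""
    return PUNCT_RE.sub("", text)


print(strip_punctuation("hey th~!ere"))   # hey there
print(strip_punctuation("a-b]c^d\\e"))    # abcde
```

Unlike the whitelist pattern, this version only removes ASCII punctuation, so non-ASCII letters such as é are left in place.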
    

    A small tip on naming style: PEP 8 recommends snake_case, so the name should be remove_special_chars and not removeSpecialChars.

    Also, the space inside the character class is what keeps spaces in the output. If you want to remove spaces as well, change [^a-zA-Z0-9 \n\.] to [^a-zA-Z0-9\n\.]
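
To see what the space inside the brackets does, and to get from the cleaned string to the list of words the question asks for, here is a minimal sketch. It wraps the two patterns in a function named remove_special_chars (the name mentioned in the tip above). The keep_spaces flag and the sample string are assumptions made for illustration, not part of the original code.

```python
import re

# Negated character classes: anything NOT listed inside the brackets is removed.
KEEP_SPACES_RE = re.compile(r"[^a-zA-Z0-9 \n\.]")   # space listed, so spaces survive
DROP_SPACES_RE = re.compile(r"[^a-zA-Z0-9\n\.]")    # space not listed, so spaces are removed


def remove_special_chars(text, keep_spaces=True):
    """Strip characters outside the whitelist; keep_spaces is a hypothetical flag."""
    pattern = KEEP_SPACES_RE if keep_spaces else DROP_SPACES_RE
    return pattern.sub("", text)


sample = "hey th~!ere, wor#ld"
cleaned = remove_special_chars(sample)
words = cleaned.split()  # split on whitespace to get a list of words

print(cleaned)                                          # hey there world
print(words)                                            # ['hey', 'there', 'world']
print(remove_special_chars(sample, keep_spaces=False))  # heythereworld
```

Splitting only works if the spaces are still there, so keep the space in the class when the goal is a list of words.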
