strip punctuation with regex - python

太阳男子 2020-12-09 03:41

I need to use regex to strip punctuation at the start and end of a word. It seems like regex would be the best option for this. I don't want punctuation r

3 answers
  • 2020-12-09 04:16

    You don't need a regular expression for this task. Use str.strip with string.punctuation:

    >>> import string
    >>> string.punctuation
    '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    >>> '!Hello.'.strip(string.punctuation)
    'Hello'
    
    >>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
    "Hello world I'm a boy you're a girl"
    
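If a regular expression is still preferred, as the question asks, the same trimming can be sketched with re.sub. This is only a minimal sketch: the helper name strip_edge_punct and the two module-level patterns are hypothetical, and it assumes that "punctuation" means the ASCII characters in string.punctuation.

```python
import re
import string

# Character class built from string.punctuation. re.escape guards the
# characters that are special inside a class (such as ], \, ^ and -).
_PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# A run of punctuation at the very start, or a run at the very end.
_EDGE_PUNCT = re.compile("^" + _PUNCT_CLASS + "+|" + _PUNCT_CLASS + "+$")


def strip_edge_punct(word):
    """Hypothetical helper: drop punctuation at both ends, keep it inside."""
    return _EDGE_PUNCT.sub("", word)


print(strip_edge_punct("!Hello."))      # Hello
print(strip_edge_punct("...you're?!"))  # you're
```

For this character set the result should match word.strip(string.punctuation), so the regex version is mainly useful when the pattern has to be combined with other regex logic.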
  • 2020-12-09 04:33

    I think this function is a concise way to remove punctuation from every word in a list. Note that it removes punctuation anywhere in a word, not only at the start and end:

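    import re

    def remove_punct(words):
        """Remove all punctuation, including underscores, from each word in a list."""
        new_words = []
        for word in words:
            # [^\w\s] matches anything that is not a word character or whitespace.
            # \w includes the underscore, so "_" is listed as a separate alternative.
            w = re.sub(r'[^\w\s]|_', '', word)
            new_words.append(w)
        return new_words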
    
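The same character classes can be anchored to the ends of each word so that inner punctuation such as apostrophes survives, which is closer to what the question describes. The following is a rough sketch rather than part of the answer above: strip_edges and EDGE are hypothetical names, and it assumes the input is already a list of whitespace-free words.

```python
import re

# [\W_] matches any character that is not a letter or digit. With str
# patterns in Python 3, \W is Unicode-aware, so quotes like « » count too.
EDGE = re.compile(r"^[\W_]+|[\W_]+$")


def strip_edges(words):
    """Hypothetical helper: trim non-alphanumerics from both ends of each word."""
    return [EDGE.sub("", w) for w in words]


print(strip_edges(["«Hello»,", "_you're_", "snake_case!"]))
# ['Hello', "you're", 'snake_case']
```

Because \W also matches whitespace, this pattern would trim surrounding spaces as well if the words were not split beforehand.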
  • 2020-12-09 04:39

    You can remove punctuation from a text file or from a particular string as follows. Note that this uses plain string operations rather than a regular expression:

    new_data = []
    # You can add or remove punctuation characters as needed.
    punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''

    with open('/home/rahul/align.txt', 'r') as f:
        all_words = f.read().split()

    # To process a string instead of a file, split it the same way:
    # all_words = "My name$#@ is . rahul -~".split()

    for word in all_words:
        # Strip punctuation from both ends of the word. Tokens made up
        # entirely of punctuation become empty and are skipped.
        stripped = word.strip(punctuations)
        if stripped:
            new_data.append(stripped)

    print(new_data)
    

    P.S. Adjust the indentation as required. Hope this helps!

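For the string case mentioned in the comments of the answer above, one alternative is str.translate, which deletes every listed character wherever it occurs, not only at the ends of a word. This is a sketch under assumptions: the variable names table and cleaned are hypothetical, and the punctuation set is simply the one from the answer.

```python
# Same custom punctuation set as in the answer; the backslash is escaped.
punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", punctuations)

new_str = "My name$#@ is . rahul -~"

# Delete the punctuation, then collapse the leftover runs of whitespace.
cleaned = " ".join(new_str.translate(table).split())
print(cleaned)  # My name is rahul
```

Unlike str.strip, this also removes apostrophes and hyphens inside words, so it suits cases where no punctuation should remain at all.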