I would like to replace (not remove) every punctuation character in a Python string with " ".
Is there something efficient of the following flavour?
text = text.translate(string.maketrans("",""), string.punctuation)
This answer is for Python 2 and will only work for ASCII strings:
The string module contains two things that will help you: a string of punctuation characters (string.punctuation) and the "maketrans" function. Here is how you can use them:
import string

replace_punctuation = string.maketrans(string.punctuation, ' ' * len(string.punctuation))
text = text.translate(replace_punctuation)
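Since the answer above is limited to ASCII, here is a minimal Python 3 sketch of the same translate idea for Unicode text. It assumes that "punctuation" means any code point whose Unicode category starts with "P"; the names PUNCT_TO_SPACE and replace_unicode_punctuation are made up for this illustration. Note that this definition differs from string.punctuation: symbols such as "$", "+" and "=" belong to "S" categories and are left alone.

```python
import sys
import unicodedata

# Assumption: "punctuation" = any code point in a Unicode "P" category
# (Pc, Pd, Ps, Pe, Pi, Pf, Po). Symbols like "$" or "+" are in "S"
# categories and are not touched by this table.
PUNCT_TO_SPACE = {
    codepoint: ' '
    for codepoint in range(sys.maxunicode + 1)
    if unicodedata.category(chr(codepoint)).startswith('P')
}


def replace_unicode_punctuation(text):
    """Return text with every Unicode punctuation character replaced by a space."""
    # str.translate accepts a dict mapping code points to replacement strings.
    return text.translate(PUNCT_TO_SPACE)


print(repr(replace_unicode_punctuation('¡Hola, mundo! «ok»')))
# ' Hola  mundo   ok '
```

Building the table walks the whole Unicode range once, so it is worth doing at import time and reusing the table for every call.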
Modified solution from "Best way to strip punctuation from a string in Python":
import re
import string

regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', "This is, fortunately. A Test! string")
# out = 'This is  fortunately  A Test  string' (each punctuation mark becomes one space)
There is a more robust solution which relies on a regex exclusion rather than inclusion through an extensive list of punctuation characters.
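import re

# Removes punctuation; pass ' ' as the replacement to substitute a space instead
print(re.sub(r'[^\w\s]', '', 'This is, fortunately. A Test! string'))
# Output: This is fortunately A Test string
The regex matches anything that is not a word character (alphanumeric or underscore) or whitespace, so unlike string.punctuation it leaves "_" in place.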
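To make the exclusion approach replace rather than remove, and to show how the underscore is handled, here is a small sketch. The pattern names and the sample string are made up for the illustration; the second pattern is just one possible way to treat "_" as punctuation too.

```python
import re

# \w covers letters, digits and the underscore, so "_" survives this pattern.
NON_WORD = re.compile(r'[^\w\s]')
# Variant that also treats the underscore as punctuation.
NON_WORD_OR_UNDERSCORE = re.compile(r'[^\w\s]|_')

sample = 'snake_case, fortunately. A Test!'

kept_underscore = NON_WORD.sub(' ', sample)
no_underscore = NON_WORD_OR_UNDERSCORE.sub(' ', sample)

print(repr(kept_underscore))  # 'snake_case  fortunately  A Test '
print(repr(no_underscore))    # 'snake case  fortunately  A Test '
```

In Python 3, \w is Unicode-aware for str patterns, so accented letters are kept as well.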
What's the difference between translating all ";" into "" and removing all ";"?
Here is how to remove all ";":
import string

s = 'dsda;;dsd;sad'
table = string.maketrans('', '')  # identity table (Python 2 only)
string.translate(s, table, ';')  # returns 'dsdadsdsad'
And you can do your replacement with translate.
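For comparison, here is a sketch of both operations in Python 3, where maketrans is a static method of str instead of a function in the string module. The variable names are only illustrative.

```python
s = 'dsda;;dsd;sad'

# Two-argument form: each character of the first string is mapped to the
# character at the same position in the second string.
replace_table = str.maketrans(';', ' ')
replaced = s.translate(replace_table)

# Three-argument form: every character in the third string is deleted.
delete_table = str.maketrans('', '', ';')
deleted = s.translate(delete_table)

print(repr(replaced))  # 'dsda  dsd sad'
print(repr(deleted))   # 'dsdadsdsad'
```

Replacing keeps the string length unchanged, while deleting shortens it; that is the practical difference between the two.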
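In my specific case, I wanted to keep "+" and "&", so I removed them from the punctuation list before stripping the rest:
import re
import string

all_punctuations = string.punctuation
selected_punctuations = re.sub(r'(\&|\+)', "", all_punctuations)
print(selected_punctuations)

# named text rather than str to avoid shadowing the built-in
text = "he+llo* ithis& place% if you * here @@"
punctuation_regex = re.compile('[%s]' % re.escape(selected_punctuations))
punc_free = punctuation_regex.sub("", text)
print(punc_free)
Result: "he+llo ithis& place if you  here " (the removed characters leave a double space and a trailing space behind)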
This approach works in Python 3:
import string

ex_str = 'SFDF-OIU .df !hello.dfasf sad - - d-f - sd'
# one space per punctuation character (len(string.punctuation) is 32)
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
res = ex_str.translate(table)
# res = 'SFDF OIU  df  hello dfasf sad     d f   sd'
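Replacing every punctuation character with a space leaves runs of consecutive spaces wherever punctuation sat next to whitespace. If single spaces are wanted, one option is to collapse the runs afterwards. The following is only a sketch; the function name is made up, and it also normalises tabs and newlines to single spaces, which may or may not be desirable.

```python
import string

# Translation table mapping each ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def punctuation_to_single_spaces(text):
    """Replace ASCII punctuation with spaces, then collapse whitespace runs."""
    spaced = text.translate(_PUNCT_TABLE)
    # split() without arguments splits on any run of whitespace and drops
    # leading/trailing whitespace, so join() leaves single spaces only.
    return ' '.join(spaced.split())


print(punctuation_to_single_spaces('SFDF-OIU .df !hello.dfasf sad - - d-f - sd'))
# SFDF OIU df hello dfasf sad d f sd
```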