I am using urllib to get a string of html from a website and need to put each word in the html document into a list.
Here is the code I have so far. I keep getting a
You need to call replace on z and not on str, since you want to replace characters located in the string variable z:
removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
But this will not work, because replace looks for the whole argument as a single substring rather than for each character individually. You will most likely need to use the regular expression module re with the sub function:
import re
removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)
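Don't forget the surrounding [], which indicate that this is a set of characters to be replaced. Inside the set, a literal [, ], \ or - has to be escaped with a backslash; otherwise the ] ends the set early and the - is read as a range.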
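Putting it together, here is a minimal sketch of going from the cleaned string to a list of words. The sample string and the helper names remove_special_chars and to_words are made up for illustration; in your code z would be the HTML string you got from urllib.

```python
import re

# Hypothetical sample standing in for the string z fetched with urllib.
z = "Hello, world! <b>foo-bar</b> [baz]"

# Character set of the symbols to strip. The [, ], \ and - are escaped
# so they are treated as literal characters inside the set.
SPECIAL_CHARS = re.compile(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]")


def remove_special_chars(text):
    """Replace every special character with a single space."""
    return SPECIAL_CHARS.sub(" ", text)


def to_words(text):
    """Strip special characters, then split on runs of whitespace."""
    return remove_special_chars(text).split()


# str.replace only matches the whole substring, so this changes nothing.
unchanged = "a!b".replace("!@", " ")

words = to_words(z)
print(words)
```

Calling split() with no arguments collapses consecutive spaces, so the extra spaces left behind by the substitution do not produce empty strings in the list. Note that this only strips punctuation; tag names such as b still end up in the list, so a real HTML parser may be a better fit if you only want the visible text.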