tm custom removePunctuation except hashtag
I have a corpus of tweets from Twitter. I clean this corpus (removeWords, tolower, delete URLs) and finally also want to remove punctuation. Here is my code:

    tweetCorpus <- tm_map(tweetCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)

The problem is that doing so also removes the hashtag (#). Is there a way to remove punctuation with tm_map but keep the hashtag?

You could adapt the existing removePunctuation to suit your needs. For example:

    removeMostPunctuation <- function(x, preserve_intra_word_dashes = FALSE) {
      rmpunct <- function(x) {
        # Swap "#" for a placeholder control character (\002) so that it
        # survives the punctuation removal
        x <- gsub("#", "\002", x)
        x <-