Replace words in corpus according to dictionary data frame

Submitted by 时光毁灭记忆、已成空白 on 2019-12-01 01:10:10

I would suggest not using a data.frame for the dictionary: a named vector, R's most basic data structure, already works as a key-value lookup.

      dict  <- c('primo', 'secondo', 'testo')
names(dict) <- c('first', 'second', 'text')

Then to "translate" x, where x might be "second", simply use:

   dict[[x]]

You don't even need a wrapper function.
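
For several words at once, single-bracket indexing does the lookup in one step. The following is a minimal sketch, assuming the same dictionary as above; the `words` and `translated` names and the fall-back rule for unknown words are illustrative choices, not part of the original answer.

```r
# Named vector used as a dictionary: names are the keys, elements the values.
dict <- c(first = "primo", second = "secondo", text = "testo")

# Vectorized lookup: `[` accepts many keys at once; unname() drops the keys.
words <- c("text", "first", "unknown")
translated <- unname(dict[words])

# Keys that are not in the dictionary come back as NA.
# Here (an assumption) we keep the original word in that case.
missing_key <- is.na(translated)
translated[missing_key] <- words[missing_key]

translated
```

Note that `dict[[x]]` stops with an error when `x` is not a name in `dict`, whereas `dict[x]` returns `NA`, which is why the sketch uses single brackets.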


If you want to translate in the opposite direction, where x is now a translated value such as "secondo", use

   names(dict)[dict %in% x]

Or you can flip the dictionary:

         dict.flip  <- names(dict)
   names(dict.flip) <- dict
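
The flip can also be written in one line with `setNames()`. This is a small sketch rather than a required change; `dict_flip` and `back` are illustrative names, and it assumes every translation is unique, since duplicated values would produce duplicated keys in the flipped dictionary.

```r
dict <- c(first = "primo", second = "secondo", text = "testo")

# setNames(object, nm): the old keys become the values, the old values the keys.
dict_flip <- setNames(names(dict), dict)

# Reverse lookup through the flipped dictionary.
back <- dict_flip[["secondo"]]

# Equivalent reverse lookup without building a second vector:
# match() returns the position of the first matching value.
back_direct <- names(dict)[match("secondo", dict)]
```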

In combination with the tm_map function of the tm package, you can use stri_replace_all_fixed from the stringi package. With vectorize_all = FALSE, every word/translation pair is applied in turn to each document. For instance:

library(tm)
library(stringi)

docs <- c("first text", "second text")
corp <- Corpus(VectorSource(docs))

word <- c('first', 'second', 'text')
tran <- c('primo', 'secondo', 'testo')

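# content_transformer() lets the function work on the document text for any corpus type
corp <- tm_map(corp, content_transformer(function(x) stri_replace_all_fixed(x, word, tran, vectorize_all = FALSE)))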
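
Fixed matching replaces substrings, so "text" inside "context" would also be translated. If only whole words should change, one option is a regular expression with word boundaries. The sketch below uses base R only; `replace_words` is a hypothetical helper, not a tm or stringi function, and it assumes the dictionary keys contain no regex metacharacters.

```r
dict <- c(first = "primo", second = "secondo", text = "testo")
docs <- c("first text", "second context")

# Hypothetical helper: replace each dictionary key with its value,
# matching whole words only (\\b marks a word boundary).
replace_words <- function(x, dict) {
  for (key in names(dict)) {
    pattern <- paste0("\\b", key, "\\b")
    x <- gsub(pattern, dict[[key]], x, perl = TRUE)
  }
  x
}

translated_docs <- replace_words(docs, dict)
translated_docs
# "primo testo" "secondo context"
```

Under the same assumptions, the helper should also be usable on a corpus as `tm_map(corp, content_transformer(replace_words), dict = dict)`, since tm_map passes extra arguments on to the transformation.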