Extract English words from a text in R

蹲街弑〆低调 提交于 2019-11-29 08:42:15

You could use the package I maintain qdapDictionaries (no need for the parent package qdap to be installed). If your data is more complex you may need to use tools like tolower etc. to make it work. The idea here is basically to see where a known word list ?GradyAugmented intersects with your words. Here are two very similar approaches, the first is likely slightly faster depending on data:

vector <- c("picture", "carpet", "lamp", "notaword", "anothernotaword")

library(qdapDictionaries)
vector[vector %in% GradyAugmented]

## [1] "picture" "carpet"  "lamp"

intersect(vector, GradyAugmented)

## [1] "picture" "carpet"  "lamp"   

The error you are receiving with installing qdap sounds like @Ben Bolker is correct. You will need a newer version (I'd suggest the latest version) of data.table installed (use packageVersion("data.table") to check this). That is an oversight on my part with not requiring a minimal version of data.table, I thought setDT (a function in the data.table package) was always around but it appears to not be in your version. But to solve this particular problem you wouldn't need to install the parent qdap package, just qdapDictionaries.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!