Is it possible to use `kwic` function to find words near to each other?

Submitted by 你说的曾经没有我的故事 on 2021-01-28 08:24:46

Question


I found this reference: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch05s07.html Is it possible to use it with the kwic function in the quanteda package to find documents in a corpus containing words that are not directly adjacent but close to each other, with maybe a few other words in between?

For example, if I give the function two words, I would like to find the documents in the corpus where both words occur, possibly with a few words between them. Given "engine" and "electrical", I would also get the reports where "electrical synchronous engine" appears, but not the ones in which "engine" and "electrical" appear in completely different contexts.


Answer 1:


quanteda does not have a NEAR operator, but you can get the same effect with the window argument of tokens_select(). In this example, I keep only the tokens within five words of "america*" and then search those tokens using kwic():

require(quanteda)
toks <- tokens(data_corpus_inaugural)
toks_america <- tokens_select(toks, "america*", window = 5)

kwic(toks_america, "econom*")
# [2013-Obama, 45] has been tested by crises | economic | recovery has begun. America's

kwic(toks_america, "power")
# [1997-Clinton, 85] it can give Americans the | power | to make a government is
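
The same approach can be applied to the "engine" / "electrical" case from the question. The following is only a minimal sketch: the three toy documents, their names (d1 to d3), and the window size of 2 are made up for illustration and are not from the original post. It keeps the tokens within two words of "engine", searches what remains for "electrical", and collects the names of the matching documents.

```r
library(quanteda)

# Hypothetical toy corpus (not from the original post)
txt <- c(
  d1 = "The electrical synchronous engine failed during the test.",
  d2 = "The engine was replaced. Many pages later, an unrelated electrical fault in the cabin lighting was reported.",
  d3 = "No relevant terms here."
)
toks <- tokens(txt, remove_punct = TRUE)

# Keep "engine" plus up to two tokens on either side of it
toks_engine <- tokens_select(toks, "engine", window = 2)

# Search only the retained tokens for the second word
hits <- kwic(toks_engine, "electrical")

# Names of the documents where the two words occur close together
docs <- as.character(unique(hits$docname))
print(docs)
```

In d2 both words occur, but "electrical" is more than two tokens away from "engine", so only d1 should be reported. Widening or narrowing the window argument changes how far apart the two words may be.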


Source: https://stackoverflow.com/questions/49907577/is-it-possible-to-use-kwic-function-to-find-words-near-to-each-other
