quanteda kwic regex operation

野趣味 2020-12-21 21:59

Further edit to original question.
The question arose from the expectation that regexes would work identically, or nearly so, to "grep" or to some programming l

2 answers
  • 2020-12-21 22:19

    The examples from the ITAUR repository are based on an older syntax. What you need is the phrase() wrapper; see ?phrase. You should also check the regular expression syntax you are trying to use with the *, since it may not do what you want: in a regular expression, * is a quantifier that repeats the preceding element, so a pattern cannot start with "*". (This might help.) The default "glob" valuetype will probably achieve what you want.

    library("quanteda")
    ## Package version: 1.1.4
    ## Parallel computing: 2 of 8 threads used.
    ## See https://quanteda.io for tutorials and examples.
    
    kwic(data_char_ukimmig2010, phrase("will deport"))
    
    ## [BNP, 156:157] nation.- The BNP | will deport | all foreigners convicted of crimes
    
    kwic(data_char_ukimmig2010, phrase("will .*deport.*"), valuetype = "regex")
    
    ## [BNP, 156:157] nation.- The BNP | will deport | all foreigners convicted of crimes
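
    To make the glob versus regex distinction concrete, here is a minimal sketch in base R only (quanteda is not needed to run it). The token vector is made up for illustration, and glob2rx() from the utils package is used as a stand-in for glob matching; quanteda's own glob handling may differ in detail.

    ```r
    # Hypothetical tokens, invented for this illustration.
    tokens_example <- c("will", "deport", "deported", "deportation", "report")

    # Glob: "*" by itself stands for any run of characters.
    glob_hits <- grepl(glob2rx("deport*"), tokens_example)

    # Regex: "*" only repeats what precedes it, so "any characters" is ".*".
    regex_hits <- grepl("^deport.*", tokens_example)

    print(tokens_example[glob_hits])
    print(identical(glob_hits, regex_hits))
    ```

    Both patterns select the same tokens here; the difference is only in how the wildcard is written.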
    
  • 2020-12-21 22:30

    You are trying to match a phrase with your pattern. By default, the pattern argument is treated as a space-separated list of keywords, and the search is performed against this list. So you can get the expected result by wrapping the pattern in phrase():

    > kwic(immigCorpus, phrase("will deport"), window = 3)
    [BNP, 156:157] - The BNP | will deport | all foreigners convicted
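
    Why a multi-word pattern needs phrase() can be sketched in base R. This is only an illustration of token-wise matching under simplifying assumptions (a made-up sentence split on single spaces), not quanteda's actual implementation.

    ```r
    # Hypothetical mini-text, tokenized naively on spaces.
    toks <- strsplit("The BNP will deport all foreigners convicted of crimes", " ")[[1]]

    # A pattern containing a space can never equal a single token.
    single_hits <- which(toks == "will deport")

    # Phrase-style matching: split the pattern, then look for adjacent tokens.
    pat <- strsplit("will deport", " ")[[1]]
    starts <- which(toks[-length(toks)] == pat[1] & toks[-1] == pat[2])

    print(length(single_hits))
    print(toks[starts:(starts + 1)])
    ```

    The first search finds nothing, while the second locates the two adjacent tokens, which is the behaviour phrase() asks for.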
    

    Setting valuetype = "regex" only makes sense if your pattern is actually a regular expression. For example, to match both "shall deport" and "will deport", use

    > kwic(immigCorpus, phrase("(will|shall) deport"), window = 3, valuetype = "regex")
    
       [BNP, 156:157]             - The BNP | will deport  | all foreigners convicted
     [BNP, 1951:1952] illegal immigrants We | shall deport | all illegal immigrants  
     [BNP, 2584:2585]  Foreign Criminals We | shall deport | all criminal entrants 
    

    See this kwic documentation.
