How do I extract contents from a koRpus object in R?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-24 21:13:40

Question


I'm using the tm package and want to get Flesch-Kincaid scores for documents in R. I found that the koRpus package offers a lot of metrics, including reading level, and started using it. However, the object it returns seems to be a very complicated S4 object, and I don't understand how to parse it.

So, I apply this to my corpus:

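library(tm)
library(koRpus)
library(foreach)

txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))

# tokenize a document, then compute its Flesch-Kincaid score
f <- function(x) tokenize(x, format = "obj", lang = "en")
g <- function(x) flesch.kincaid(x)
# without a registered parallel backend, %dopar% runs sequentially (with a warning)
x <- foreach(i = 1:5) %dopar% g(f(d[[i]]))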

x is then a list of flesch.kincaid() results, one for each of the five Ovid texts:

> x[[1]]

Flesch-Kincaid Grade Level
  Parameters: default 
       Grade: 13.62 
         Age: 18.62 

Text language: en 

How can I get just the returned values, grade = 13.62 and age = 18.62? The output of str(x[[1]]) is so large that it's hard to parse, i.e.:

> str(x[[1]])
Formal class 'kRp.readability' [package "koRpus"] with 49 slots
  ..@ hyphen                   :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
  .. .. ..@ lang  : chr "en"
  .. .. ..@ desc  :List of 5
  .. .. .. ..$ num.syll         : num 196
  .. .. .. ..$ syll.distrib     : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. .. .. ..$ avg.syll.word    : num 2.18
  .. .. .. ..$ syll.per100      : num 218
  .. .. ..@ hyphen:'data.frame':    90 obs. of  2 variables:
  .. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
  .. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ...
  ..@ param                    :List of 1
  .. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
  .. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
  ..@ ARI                      :List of 1
  .. ..$ : logi NA
  ..@ ARI.NRI                  :List of 1
  .. ..$ : logi NA
  ..@ ARI.simple               :List of 1
  .. ..$ : logi NA
  ..@ Bormuth                  :List of 1
  .. ..$ : logi NA
  ..@ Coleman                  :List of 1
  .. ..$ : logi NA
  ..@ Coleman.Liau             :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall               :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall.PSK           :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall.old           :List of 1
  .. ..$ : logi NA
  ..@ Danielson.Bryan          :List of 1
  .. ..$ : logi NA
  ..@ Dickes.Steiwer           :List of 1
  .. ..$ : logi NA
  ..@ DRP                      :List of 1
  .. ..$ : logi NA
  ..@ ELF                      :List of 1
  .. ..$ : logi NA
  ..@ Flesch                   :List of 1
  .. ..$ : logi NA
  ..@ Flesch.PSK               :List of 1
  .. ..$ : logi NA
  ..@ Flesch.de                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.es                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.fr                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.nl                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.Kincaid           :List of 3
  .. ..$ flavour: chr "default"
  .. ..$ grade  : num 13.6
  .. ..$ age    : num 18.6
  ..@ Farr.Jenkins.Paterson    :List of 1
  .. ..$ : logi NA
  ..@ Farr.Jenkins.Paterson.PSK:List of 1
  .. ..$ : logi NA
  ..@ FOG                      :List of 1
  .. ..$ : logi NA
  ..@ FOG.PSK                  :List of 1
  .. ..$ : logi NA
  ..@ FOG.NRI                  :List of 1
  .. ..$ : logi NA
  ..@ FORCAST                  :List of 1
  .. ..$ : logi NA
  ..@ FORCAST.RGL              :List of 1
  .. ..$ : logi NA
  ..@ Fucks                    :List of 1
  .. ..$ : logi NA
  ..@ Harris.Jacobson          :List of 1
  .. ..$ : logi NA
  ..@ Linsear.Write            :List of 1
  .. ..$ : logi NA
  ..@ LIX                      :List of 1
  .. ..$ : logi NA
  ..@ RIX                      :List of 1
  .. ..$ : logi NA
  ..@ SMOG                     :List of 1
  .. ..$ : logi NA
  ..@ SMOG.de                  :List of 1
  .. ..$ : logi NA
  ..@ SMOG.C                   :List of 1
  .. ..$ : logi NA
  ..@ SMOG.simple              :List of 1
  .. ..$ : logi NA
  ..@ Spache                   :List of 1
  .. ..$ : logi NA
  ..@ Spache.old               :List of 1
  .. ..$ : logi NA
  ..@ Strain                   :List of 1
  .. ..$ : logi NA
  ..@ Traenkle.Bailer          :List of 1
  .. ..$ : logi NA
  ..@ TRI                      :List of 1
  .. ..$ : logi NA
  ..@ Wheeler.Smith            :List of 1
  .. ..$ : logi NA
  ..@ Wheeler.Smith.de         :List of 1
  .. ..$ : logi NA
  ..@ Wiener.STF               :List of 1
  .. ..$ : logi NA
  ..@ lang                     : chr "en"
  ..@ desc                     :List of 26
  .. ..$ sentences          : int 10
  .. ..$ words              : int 90
  .. ..$ letters            : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
  .. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
  .. ..$ all.chars          : int 692
  .. ..$ syllables          : Named num [1:5] 196 25 32 25 8
  .. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
  .. ..$ lttr.distrib       : num [1:6, 1:11] 0 0 90 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
  .. ..$ syll.distrib       : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. ..$ syll.uniq.distrib  : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. ..$ punct              : int 17
  .. ..$ conjunctions       : int 0
  .. ..$ prepositions       : int 0
  .. ..$ pronouns           : int 0
  .. ..$ foreign            : int 0
  .. ..$ TTR                : num 0.844
  .. ..$ avg.sentc.length   : num 9
  .. ..$ avg.word.length    : num 5.47
  .. ..$ avg.syll.word      : num 2.18
  .. ..$ sntc.per.word      : num 0.111
  .. ..$ sntc.per100        : num 11.1
  .. ..$ lett.per100        : num 547
  .. ..$ syll.per100        : num 218
  .. ..$ FOG.hard.words     : NULL
  .. ..$ Bormuth.NOL        : NULL
  .. ..$ Dale.Chall.NOL     : NULL
  .. ..$ Harris.Jacobson.NOL: NULL
  .. ..$ Spache.NOL         : NULL
  ..@ TT.res                   :'data.frame':   107 obs. of  6 variables:
  .. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ...
  .. ..$ tag   : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ...
  .. ..$ lemma : chr [1:107] "" "" "" "" ...
  .. ..$ lttr  : num [1:107] 2 4 2 3 5 6 3 5 6 1 ...
  .. ..$ wclass: chr [1:107] "word" "word" "word" "word" ...
  .. ..$ desc  : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ...
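
A note on reading this str() output: lines marked `..@` are S4 slots (accessed with `@` or `slot()`), while lines marked `..$` are ordinary list elements (accessed with `$` or `[[`). The `Flesch.Kincaid` slot is a plain list, so the grade sits at `x[[1]]@Flesch.Kincaid$grade`. The sketch below demonstrates this on a small hypothetical stand-in class, `MockReadability`, so it runs without koRpus installed; it is not the real `kRp.readability` class, and its values are illustrative.

```r
library(methods)

# Hypothetical stand-in for koRpus's kRp.readability class, keeping only two slots
setClass("MockReadability",
         slots = c(Flesch.Kincaid = "list", lang = "character"))

obj <- new("MockReadability",
           Flesch.Kincaid = list(flavour = "default", grade = 13.62, age = 18.62),
           lang = "en")

slotNames(obj)                        # S4 slot names (the "..@" lines in str())
obj@Flesch.Kincaid                    # the slot holds a plain list (the "..$" lines)
obj@Flesch.Kincaid$grade              # slot access, then list-element access
slot(obj, "Flesch.Kincaid")[["age"]]  # the same lookup using slot() and [[
```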

Ideally, I'd like to store the F-K score in the document metadata, meta(d), back in tm.

I'd appreciate learning how to understand this return object and extract its values, and also, if there's a better or faster way to get an F-K score, I'm all ears!
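
The object above also exposes its own inputs: the `param` slot holds the Flesch-Kincaid coefficients (`asl` = 0.39, `asw` = 11.8, `const` = 15.59) and the `desc` slot holds the counts (10 sentences, 90 words, 196 syllables). Plugging those into the standard grade-level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, reproduces the reported grade, and in this output the age is simply the grade plus 5. So if word, sentence and syllable counts are already available, a sketch like the following computes the score directly. The function name `fk_grade` is hypothetical, and syllable counting, the hard part that koRpus does for you, is not included.

```r
# Flesch-Kincaid grade level from pre-computed counts.
# Default coefficients are the values shown in the "param" slot above.
fk_grade <- function(words, sentences, syllables,
                     asl = 0.39, asw = 11.8, const = 15.59) {
  asl * (words / sentences) + asw * (syllables / words) - const
}

# Counts taken from the "desc" slot of x[[1]]
grade <- fk_grade(words = 90, sentences = 10, syllables = 196)
age   <- grade + 5   # matches the age koRpus reports for this text
round(c(grade = grade, age = age), 2)
```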


Answer 1:


Similar to @Paul's answer, but as a one-liner:

   sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade'))
      [,1]     [,2]     [,3]     [,4]     [,5]  
age   18.61778 17.62351 17.77699 18.29032 18.645
grade 13.61778 12.62351 12.77699 13.29032 13.645
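
Because `'['` returns a sub-list, the result above is a matrix of mode list (each cell holds a one-element list). It prints nicely but can be awkward to compute with. A sketch that yields a plain numeric matrix instead uses `vapply()` plus `unlist()`; the two mock objects and their numbers are hypothetical stand-ins for `x`, built from a made-up `MockReadability` class rather than real koRpus output.

```r
library(methods)

# Hypothetical stand-in class; real kRp.readability objects have many more slots
setClass("MockReadability", slots = c(Flesch.Kincaid = "list"))

x <- list(
  new("MockReadability", Flesch.Kincaid = list(flavour = "default", grade = 13.6, age = 18.6)),
  new("MockReadability", Flesch.Kincaid = list(flavour = "default", grade = 12.6, age = 17.6))
)

# vapply() requires every call to return exactly two doubles, so the result
# is a numeric 2 x n matrix (row names taken from FUN.VALUE) rather than a list matrix
fk <- vapply(x,
             function(r) unlist(slot(r, "Flesch.Kincaid")[c("age", "grade")]),
             FUN.VALUE = c(age = 0, grade = 0))
fk
```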



Answer 2:


Just use:

slot(x[[1]], "Flesch.Kincaid")

to get the subset of the object that contains these values. To get these in a list for each element in x, do something like:

list_fk = lapply(x, slot, "Flesch.Kincaid")

...and to get a vector of grades:

grades = sapply(list_fk, "[[", "grade")
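
To line the scores up with the documents, as the question wants for `meta(d)`, one option is to collect them with `vapply()`, which fails loudly if any element lacks a single numeric `grade`, and bind them into a data frame. The mock objects and the `doc` IDs below are hypothetical. The commented `meta()` assignment is how tm usually stores per-document metadata on a corpus, but treat it as an assumption to check against your tm version.

```r
library(methods)

# Hypothetical stand-in objects for x; real ones come from flesch.kincaid()
setClass("MockReadability", slots = c(Flesch.Kincaid = "list"))
x <- lapply(c(13.6, 12.6, 12.8), function(g)
  new("MockReadability",
      Flesch.Kincaid = list(flavour = "default", grade = g, age = g + 5)))

list_fk <- lapply(x, slot, "Flesch.Kincaid")
# unlike sapply(), vapply() insists on exactly one double per element
grades <- vapply(list_fk, `[[`, numeric(1), "grade")

# Hypothetical document IDs; with tm these could come from names(d)
scores <- data.frame(doc = paste0("doc", seq_along(grades)), grade = grades)
scores

# With a tm corpus d of the same length, the grades could be stored as
# document-level metadata:
# meta(d, "fk_grade") <- grades
```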


Source: https://stackoverflow.com/questions/14835894/how-do-i-extract-contents-from-a-korpus-object-in-r
