POS tagging for each record in R

蹲街弑〆低调 submitted on 2019-12-11 09:13:28

Question


I have a data frame like

Task    Response

1   NA
2   NA
3   EFFICACY
4   I was sent to external vendor for solution (PDA parts), but at PDA parts they identified within few minites that new battery would not solve the issue. I wonder why this diagnosis part could no have been done at the locla IS service in the Amgen office. Now I spent time to visit PDA parts at their place, while this finally did not bring any solution.
5   Issue could not be resolved

The two columns are Task and Response, and Response contains some NA values.

I want to create POS tags for each record and extract only the nouns.

For the 5 records, the POS-tagged output should look like this:

Task   POSTagged
1      NA/NNP
2      NA/NNP
3      EFFICACY/NNP
4       vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN
5      Issue/NN

So the result should be a matrix of 2 columns and 5 rows.
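Once each record has been tagged, the noun-only output described above amounts to keeping the tokens whose tag starts with NN. A minimal sketch in base R, assuming the tagged text is already available as space-separated word/TAG strings; `keep_nouns` is a hypothetical helper written for illustration and is not part of NLP or openNLP, and the sample input is made up rather than produced by a real tagger:

```r
# Hypothetical helper (not part of NLP/openNLP): keep only the noun tokens
# from a string of space-separated word/TAG pairs.
keep_nouns <- function(tagged) {
  # Leave missing records as NA instead of tagging the literal text "NA"
  if (is.na(tagged)) return(NA_character_)
  tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
  # Penn Treebank noun tags all start with NN: NN, NNS, NNP, NNPS
  nouns <- tokens[grepl("/NN[A-Z]*$", tokens)]
  paste(nouns, collapse = " ")
}

# Made-up tagged input: one tagged sentence and one missing record
tagged <- c("Issue/NN could/MD not/RB be/VB resolved/VBN", NA)
nouns <- vapply(tagged, keep_nouns, character(1), USE.NAMES = FALSE)
```

Unlike the desired output listed above, this sketch returns NA for missing records rather than NA/NNP, which avoids treating a missing value as a proper noun; drop the is.na() check if the NA/NNP behaviour is actually wanted.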

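I am trying to use this function:

library(NLP)      # as.String(), annotate()
library(openNLP)  # Maxent_* annotators

tagPOS =  function(x) {
  # as.String() collapses a character vector into one newline-separated string
  s <- as.String(x)

  sent_token_annotator = Maxent_Sent_Token_Annotator()
  word_token_annotator = Maxent_Word_Token_Annotator()
  # Sentence and word boundaries first, then POS tags on top of them
  a2 = annotate(s, list(sent_token_annotator, word_token_annotator))
  pos_tag_annotator = Maxent_POS_Tag_Annotator()
  a3 = annotate(s, pos_tag_annotator, a2)
  a3w = subset(a3, type == "word")
  # Pull the POS tag out of each word annotation's feature list
  POStags = unlist(lapply(a3w$features, `[[`, "POS"))
  gc()
  return(paste(POStags,collapse = " "))
}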

I have tried lapply, with, and by to loop through the records, but all of them return the combined POS-tagged string for all 5 records against each record.

That is, for each record I get the POS-tagged output

NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN

What I am getting is:

Task Response
1   NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN
2   NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN

3   NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN

4   NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN

5   NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN

This is not what I am looking for. The code I have tried:

lapply(df2$Task, tagPOS (df2$Response), data = df2)
resultset <- group_by(df2, Task) %>% do(tagPOS (df2$Response))
df2[,c("Keywords"):= tagPOS(strip(df2$Response)),by = Task]
Responsedf<-lapply(Response, extractPOS, "NN")
df2$noun <- with(df2, extractPOS(df2$Response, "NN"))

But nothing has worked so far. I hope this makes sense.

Any suggestions would be appreciated.


Answer 1:


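Found the solution:

# R vectors are 1-indexed, so iterate over 1..nrow(df2) rather than 0:nrow(df2)
for (i in seq_len(nrow(df2))) {
  # Tag one record at a time so each row gets only its own nouns
  df2$noun[i]<-lapply(df2$short_description[i], extractPOS, "NN")
  gc()
}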
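The loop probably works because each call now receives a single record. As far as I know, NLP's as.String() joins all elements of a character vector into one newline-separated string, so passing the whole column (as in tagPOS(df2$Response)) tags everything at once and the single result is recycled onto every row. The sketch below reproduces that behaviour in base R only; `toy_tag` and the /X tag are stand-ins made up for illustration, and no real tagger is called:

```r
# Stand-in for tagPOS(): it collapses its whole input into one string
# (mimicking as.String()) and then "tags" every word with a dummy /X tag.
toy_tag <- function(x) {
  s <- paste(x, collapse = "\n")
  words <- strsplit(s, "[[:space:]]+")[[1]]
  paste0(words, "/X", collapse = " ")
}

df2 <- data.frame(Task = 1:3,
                  Response = c(NA, "EFFICACY", "Issue could not be resolved"),
                  stringsAsFactors = FALSE)

# Whole column in one call: a single combined string for all records
combined <- toy_tag(df2$Response)

# One element per call: one result per row, with no explicit loop
df2$POSTagged <- vapply(df2$Response, toy_tag, character(1),
                        USE.NAMES = FALSE)
```

With a real tagger in place of toy_tag, the same vapply() call is an alternative to the for loop; it also guarantees that every row yields exactly one string.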

Thanks.



Source: https://stackoverflow.com/questions/45287154/pos-tagging-for-each-record-in-r
