Question
I have a tm Corpus object like this:
> summary(corp.eng)
A corpus with 154 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID
The metadata for each document in the corpus looks like this:
> meta(corp.eng[[1]])
Available meta data pairs are:
Author :
DateTimeStamp: 2013-04-18 14:37:24
Description :
Heading :
ID : Smith-John_e.txt
Language : en_CA
Origin :
I know that I can set the Author of one document at a time with this:
meta(corp.eng[[1]],tag="Author") <-
paste(
rev(
unlist(
strsplit(meta(corp.eng[[1]],tag="ID"), c("[-_]"))
)[1:2]
), collapse=' ')
which gives me a result like this:
> meta(corp.eng[[1]],tag="Author")
[1] "John Smith"
How do I batch the job?
Answer 1:
NOTE: This should probably still be a comment, but part of it works, so here is an example:
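library(tm)
data(crude)
# Read the per-document "Places" values once, then write a transformed value back to each document
extracted.values <- meta(crude,tag="Places",type="local")
for (i in seq_along(extracted.values)) {
  meta(crude[[i]],tag="Places") <- substr(extracted.values[[i]],1,3)
}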
One should be able to do this with lapply as well, but as I am not familiar with the inner workings of tm, I'll stick with a loop. Substitute substr with the function you need, and change the tag on the left-hand side of the assignment as well, of course. Hope this helps.
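Applied to the question's own case, the loop might look like the sketch below. The helper names id_to_author and set_authors are made up for illustration, and the tag names "ID" and "Author" are taken from the question's output; other versions of tm may spell them differently, so treat this as a starting point rather than a tested recipe for every tm release.

```r
# Hypothetical helper: turn IDs like "Smith-John_e.txt" into "John Smith".
# Uses only base R, so it can be checked without a corpus.
id_to_author <- function(id) {
  parts <- strsplit(id, "[-_]")                # split on "-" or "_"
  vapply(parts,
         function(p) paste(rev(p[1:2]), collapse = " "),  # "Smith","John" -> "John Smith"
         character(1))
}

# Hypothetical wrapper: same loop pattern as the crude example above.
# Assumes library(tm) is loaded and that the documents carry the "ID" and
# "Author" tags shown in the question.
set_authors <- function(corp) {
  for (i in seq_len(length(corp))) {
    meta(corp[[i]], tag = "Author") <- id_to_author(meta(corp[[i]], tag = "ID"))
  }
  corp
}

id_to_author("Smith-John_e.txt")
```

With the corpus from the question, the call would be corp.eng <- set_authors(corp.eng). Keeping the string parsing in its own function makes it easy to try on a few IDs before touching the corpus.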
Source: https://stackoverflow.com/questions/16090001/how-to-set-author-for-each-doc-in-a-corpus-by-parsing-doc-id