Question
I have a large data set with text comments and their ratings on different variables, like so:
df <- data.frame(
comment = c("commentA","commentB","commentB","commentA","commentA","commentC"),
sentiment=c(1,2,1,4,1,2),
tone=c(1,5,3,2,6,1)
)
Every comment appears between one and three times, because multiple people are sometimes asked to rate the same comment.
I'm looking to create a data frame in which the "comment" column holds only unique values and the ratings are spread across columns, so that each comment gets one "sentiment" and one "tone" column per rating (up to the largest number of ratings any comment received). Comments with fewer ratings will get NAs in the extra columns, but that's okay:
df <- data.frame(
comment = c("commentA","commentB","commentC"),
sentiment.1=c(1,2,2),
sentiment.2=c(4,1,NA),
sentiment.3=c(1,NA,NA),
tone.1=c(1,5,1),
tone.2=c(2,3,NA),
tone.3=c(6,NA,NA)
)
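The number of sentiment.k and tone.k columns in that target is set by the most-rated comment. A quick base R count on the example data (a minimal sketch) confirms it is three:

```r
df <- data.frame(
  comment = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone = c(1, 5, 3, 2, 6, 1)
)

# number of ratings per comment
counts <- table(df$comment)
print(counts)

# the most-rated comment decides how many sentiment.k / tone.k columns are needed
n_cols <- max(counts)
print(n_cols)
```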
I've been trying to go from long to wide with reshape, using
reshape(df,
idvar = "comment",
timevar = c("sentiment","tone"),
direction = "wide"
)
But that results in all possible combinations of sentiment and tone, rather than simply duplicating the sentiment and tone columns independently.
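reshape() documents timevar as the single variable that distinguishes the repeated records of one id, and the example data has no such column yet. A base R sketch (not from the original thread) that first builds a per-comment counter with ave() and then reshapes; the helper column name n is hypothetical:

```r
df <- data.frame(
  comment = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone = c(1, 5, 3, 2, 6, 1)
)

# hypothetical counter column: 1 for a comment's first rating, 2 for its second, ...
df$n <- ave(seq_along(df$comment), df$comment, FUN = seq_along)

# n is the single timevar; sentiment and tone become sentiment.1, tone.1, ...
wide <- reshape(df, idvar = "comment", timevar = "n", direction = "wide")

# reshape() interleaves sentiment.k and tone.k, so reorder to the desired layout
k <- seq_len(max(df$n))
wide <- wide[c("comment", paste0("sentiment.", k), paste0("tone.", k))]
print(wide)
```

reshape() places sentiment.1, tone.1, sentiment.2, ... side by side, hence the final reordering step.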
I also tried gather, like so: df %>% gather(key, value, -comment), but that only gets me halfway there...
Could anyone please point me in the right direction?
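The gather() attempt stalls because the long data has no column recording which rating of a comment each row is. Once such a counter exists, tidyr's pivot_wider() (the successor to spread()) can go straight to the wide layout. A sketch, not from the original thread, assuming dplyr and tidyr are installed; the counter column n is an illustrative name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  comment = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone = c(1, 5, 3, 2, 6, 1)
)

wide <- df %>%
  group_by(comment) %>%
  mutate(n = row_number()) %>%  # illustrative counter: 1, 2, 3 within each comment
  ungroup() %>%
  pivot_wider(names_from = n, values_from = c(sentiment, tone))

print(wide)
```

pivot_wider() joins the names with "_" by default (sentiment_1); passing names_sep = "." would give sentiment.1 as in the question.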
Answer 1:
You need to create a variable to supply the numbers in the new column names. rowid(comment) does the trick: it numbers each row within its comment (1, 2, 3, ...).
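To see the counter that rowid() builds, it can be printed on its own. A small sketch, assuming data.table is installed:

```r
library(data.table)

comment <- c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC")

# position of each row within its comment group, in order of appearance
ids <- rowid(comment)
print(ids)
# [1] 1 1 2 2 3 1
```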
In dcast you put the row identifiers to the left of ~ and the column identifiers to the right. value.var is then a character vector of all the columns you want to include in this long-to-wide transformation.
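library(data.table)
setDT(df)  # convert df to a data.table by reference
# one row per comment; rowid(comment) numbers the columns for each rating
dcast(df, comment ~ rowid(comment), value.var = c('sentiment', 'tone'))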
# comment sentiment_1 sentiment_2 sentiment_3 tone_1 tone_2 tone_3
# 1: commentA 1 4 1 1 2 6
# 2: commentB 2 1 NA 5 3 NA
# 3: commentC 2 NA NA 1 NA NA
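The output names use an underscore (sentiment_1), while the question asked for sentiment.1. data.table's dcast() takes a sep argument for that separator, and setDF() turns the result back into a plain data.frame. A sketch under those assumptions:

```r
library(data.table)

df <- data.frame(
  comment = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone = c(1, 5, 3, 2, 6, 1)
)

setDT(df)
# sep = "." yields sentiment.1 ... tone.3, matching the desired output
wide <- dcast(df, comment ~ rowid(comment), value.var = c("sentiment", "tone"), sep = ".")
setDF(wide)  # convert back to a plain data.frame by reference
print(wide)
```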
Source: https://stackoverflow.com/questions/59310225/r-combine-duplicate-rows-by-appending-columns