Replicate each row of data.frame and specify the number of replications for each row?

Submitted by 北城余情 on 2019-11-27 15:49:51

Update

Upon revisiting this question, I have a feeling that @Codoremifa was correct in their assumption that your "frequency" column might be a factor.

Here's an example if that were the case. It won't match your actual data since I don't know what other levels are in your dataset.
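The snippets in this answer use a data frame called mydf whose definition isn't shown. The following is a hypothetical reconstruction, with values read off the printed outputs here, so that the examples can be run as-is:

```r
# Hypothetical mydf: the question's actual data is not shown, so these
# values are inferred from the printed outputs in this answer.
mydf <- data.frame(
  a = c(5, 5, 9, 12, 12),
  b = c(3, 7, 1, 4, 5),
  frequency = c(2, 1, 40, 5, 13)
)

# A correct expansion should produce this many rows
sum(mydf$frequency)
# [1] 61
```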

mydf$F2 <- factor(as.character(mydf$frequency))
## expandRows(mydf, "F2")
mydf[rep(rownames(mydf), mydf$F2), ]
#      a b frequency F2
# 1    5 3         2  2
# 1.1  5 3         2  2
# 1.2  5 3         2  2
# 2    5 7         1  1
# 3    9 1        40 40
# 3.1  9 1        40 40
# 3.2  9 1        40 40
# 3.3  9 1        40 40
# 4   12 4         5  5
# 4.1 12 4         5  5
# 4.2 12 4         5  5
# 4.3 12 4         5  5
# 4.4 12 4         5  5
# 5   12 5        13 13
# 5.1 12 5        13 13

Hmmm. That doesn't look like 61 rows to me. Why not? Because rep uses the integer codes underlying the factor, which in this case are quite different from the displayed labels:

as.numeric(mydf$F2)
# [1] 3 1 4 5 2

To properly convert it, you would need:

as.numeric(as.character(mydf$F2))
# [1]  2  1 40  5 13
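A quick check, again on a hypothetical mydf reconstructed from the printed outputs, that expanding with the converted counts gives the expected 61 rows. The last line uses the as.numeric(levels(f))[f] idiom, which the R FAQ (question 7.10) gives as an equivalent and more efficient way to do the same conversion:

```r
# Hypothetical mydf, reconstructed from the printed outputs
mydf <- data.frame(
  a = c(5, 5, 9, 12, 12),
  b = c(3, 7, 1, 4, 5),
  frequency = c(2, 1, 40, 5, 13)
)
mydf$F2 <- factor(as.character(mydf$frequency))

# Wrong: rep() sees the factor's integer codes (3 1 4 5 2)
bad <- mydf[rep(rownames(mydf), mydf$F2), ]
nrow(bad)
# [1] 15

# Right: convert the labels, not the codes, to numbers first
counts <- as.numeric(as.character(mydf$F2))
good <- mydf[rep(rownames(mydf), counts), ]
nrow(good)
# [1] 61

# Equivalent conversion from the R FAQ: index the numeric levels by the codes
identical(as.numeric(levels(mydf$F2))[mydf$F2], counts)
# [1] TRUE
```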

Original answer

A while ago I wrote a function that generalizes @Simono101's answer a bit. The function looks like this:

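expandRows <- function(dataset, count, count.is.col = TRUE) {
  if (!isTRUE(count.is.col)) {
    # count is a number or a vector of counts, not a column of dataset
    if (length(count) == 1) {
      # Same count for every row; "each" keeps copies of a row together
      dataset[rep(rownames(dataset), each = count), , drop = FALSE]
    } else {
      # One count per row
      if (length(count) != nrow(dataset)) {
        stop("Expand vector does not match number of rows in data.frame")
      }
      dataset[rep(rownames(dataset), count), , drop = FALSE]
    }
  } else {
    # count names (or indexes) the column holding the counts: repeat the
    # rows by that column, then drop the column from the result
    # (drop = FALSE keeps a one-column result as a data.frame)
    dataset[rep(rownames(dataset), dataset[[count]]),
            setdiff(names(dataset), names(dataset[count])), drop = FALSE]
  }
}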
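The core of the function is rep() applied to the row names. A small sketch of the two pieces it relies on: rep() with a vector of counts repeats each element by its matching count, and indexing a data frame with duplicated row names returns duplicated rows, which R renames with .1, .2, and so on (the same 1, 1.1, 1.2 naming visible in the outputs here). The df used below is a hypothetical toy example:

```r
# rep() with a vector of counts repeats each element by the matching count
rep(c("1", "2", "3"), c(2, 1, 3))
# [1] "1" "1" "2" "3" "3" "3"

# Hypothetical toy data frame
df <- data.frame(x = c(10, 20, 30))

# Duplicated row names in the index give duplicated rows;
# the new row names are made unique by appending .1, .2, ...
out <- df[rep(rownames(df), c(2, 1, 3)), , drop = FALSE]
rownames(out)
# [1] "1"   "1.1" "2"   "3"   "3.1" "3.2"
```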

For your purposes, you could just use expandRows(mydf, "frequency"), which also drops the frequency column from the result:

head(expandRows(mydf, "frequency"))
#     a b
# 1   5 3
# 1.1 5 3
# 2   5 7
# 3   9 1
# 3.1 9 1
# 3.2 9 1

Other options are to repeat each row the same number of times:

expandRows(mydf, 2, count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 1.1  5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 3.1  9 1        40
# 4   12 4         5
# 4.1 12 4         5
# 5   12 5        13
# 5.1 12 5        13
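In the single-count case expandRows() calls rep() with each = count, which is why the copies of each row sit next to each other (1, 1.1, 2, 2.1, ...). A short comparison of each and times on a plain vector:

```r
# 'each' keeps the copies of an element adjacent ...
rep(c("1", "2", "3"), each = 2)
# [1] "1" "1" "2" "2" "3" "3"

# ... whereas a single 'times' value cycles through the whole vector
rep(c("1", "2", "3"), times = 2)
# [1] "1" "2" "3" "1" "2" "3"
```

With times = count instead of each = count, the same rows would come back, but as the whole data frame stacked twice rather than interleaved.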

Or to specify a vector giving the number of times to repeat each row:

expandRows(mydf, c(1, 2, 1, 0, 2), count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 5   12 5        13
# 5.1 12 5        13

Note that count.is.col = FALSE is required in those last two options.
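Why the flag matters: with the default count.is.col = TRUE, a number is passed to dataset[[count]] and treated as a column index, so the call runs without error but repeats rows by whatever that column holds. A sketch on a hypothetical mydf reconstructed from the printed outputs (the function body is copied from above, with drop = FALSE added):

```r
# expandRows() as defined above, with drop = FALSE added so that
# single-column results stay data frames
expandRows <- function(dataset, count, count.is.col = TRUE) {
  if (!isTRUE(count.is.col)) {
    if (length(count) == 1) {
      dataset[rep(rownames(dataset), each = count), , drop = FALSE]
    } else {
      if (length(count) != nrow(dataset)) {
        stop("Expand vector does not match number of rows in data.frame")
      }
      dataset[rep(rownames(dataset), count), , drop = FALSE]
    }
  } else {
    dataset[rep(rownames(dataset), dataset[[count]]),
            setdiff(names(dataset), names(dataset[count])), drop = FALSE]
  }
}

# Hypothetical mydf, reconstructed from the printed outputs
mydf <- data.frame(
  a = c(5, 5, 9, 12, 12),
  b = c(3, 7, 1, 4, 5),
  frequency = c(2, 1, 40, 5, 13)
)

# Intended: every row twice
nrow(expandRows(mydf, 2, count.is.col = FALSE))
# [1] 10

# Flag forgotten: 2 is read as "column 2", so rows are repeated by
# column b (3, 7, 1, 4, 5) and b is dropped, silently
oops <- expandRows(mydf, 2)
nrow(oops)
# [1] 20
names(oops)  # only "a" and "frequency" remain
```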

Nearly. Pass the [ operator a vector of integer row indices rather than row names. Try this...

jb[ rep( seq_len( nrow(jb) ) , times = jb$frequency ) , ]

rep( seq_len( nrow(jb) ) , times = jb$frequency ) 
# [1] 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [39] 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5
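A runnable sketch of the index-based approach. The question's jb is not shown here, so the jb below is hypothetical, with frequencies matching the rep() output and the other columns borrowed from the first answer; keeping only columns a and b is also an assumption:

```r
# Hypothetical jb: frequencies match the rep() output, other values assumed
jb <- data.frame(
  a = c(5, 5, 9, 12, 12),
  b = c(3, 7, 1, 4, 5),
  frequency = c(2, 1, 40, 5, 13)
)

# Integer row positions, each repeated by its frequency
idx <- rep(seq_len(nrow(jb)), times = jb$frequency)
jb.expanded <- jb[idx, c("a", "b")]
nrow(jb.expanded)
# [1] 61

# Each column is just its original values repeated by frequency
identical(jb.expanded$a, rep(jb$a, jb$frequency))
# [1] TRUE
```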

This might be more of a comment, but since all the other answers suggest new approaches: if you correct the spelling of jb$freqency when creating jb.expanded, and convert jb$frequency to an integer (via as.character(), so that you get the labels rather than the factor codes), then the construction you mention in your question also works.

I suspect jb$frequency is a factor because the incorrect frequencies are so neatly ordered: 11, 12, 13, 14.
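A sketch of that diagnosis and fix, assuming (hypothetically) a jb whose frequency column ended up as a factor. The question's exact code is not reproduced here, so the row-name based expansion below is only illustrative:

```r
# Hypothetical jb whose frequency column is a factor
jb <- data.frame(
  a = c(5, 5, 9, 12, 12),
  b = c(3, 7, 1, 4, 5),
  frequency = factor(c("2", "1", "40", "5", "13"))
)

# as.integer() alone returns the level codes, not the counts
codes <- as.integer(jb$frequency)
codes
# [1] 3 1 4 5 2

# Go through the labels to recover the intended counts
jb$frequency <- as.integer(as.character(jb$frequency))
jb$frequency
# [1]  2  1 40  5 13

# A row-name based expansion now gives the expected number of rows
jb.expanded <- jb[rep(row.names(jb), jb$frequency), 1:2]
nrow(jb.expanded)
# [1] 61
```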
