How to sum every nth (200) observation in a data frame using R [duplicate]

Posted by 别来无恙 on 2020-01-14 06:24:10

Question


I am new to R so any help is greatly appreciated!

I have a data frame with 278,800 observations of each of 10 variables. I am trying to create an 11th variable that sums every 200 observations (rows) of a specific variable/column (rows 1:200, 201:400, 401:600, etc.), similar to the OFFSET function in Excel. I have tried subsetting my data to just the variable of interest, with the aim of adding a new variable that sums each consecutive block of 200 rows, but I cannot figure it out. I understand my new "variable" will contain 1,394 data points (278,800 / 200). I have tried the rollapply function, but the output does not sum in blocks of 200; it sums rows 1:200, 2:201, 3:202, etc.

Thanks,

E


Answer 1:


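rollapply (from the zoo package) has a by= argument for that: setting by equal to the window width moves the window forward a whole block at a time instead of one row at a time. Here is a smaller example using n = 3 instead of n = 200. Note that 1+2+3=6, 4+5+6=15, 7+8+9=24 and 10+11+12=33.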

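library(zoo)  # provides rollapply()

# test data
DF <- data.frame(x = 1:12)

n <- 3
# window width n, advancing n rows at a time, so the windows do not overlap
rollapply(DF$x, n, sum, by = n)
## [1]  6 15 24 33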
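For a quick cross-check that does not need zoo, the same block sums can be reproduced in base R. This is only a minimal sketch, not part of the original answer: it assumes the vector length is an exact multiple of n (true for 278800 and 200), and block_sums is a made-up helper name.

```r
# Hypothetical helper (not from the answer): sum consecutive blocks of n values.
# Assumes length(x) is an exact multiple of n.
block_sums <- function(x, n) {
  stopifnot(length(x) %% n == 0)
  # Fill a matrix column by column so each column holds one block of n values
  colSums(matrix(x, nrow = n))
}

DF <- data.frame(x = 1:12)
block_sums(DF$x, 3)
## [1]  6 15 24 33
```

With the question's data, a call such as block_sums(your_column, 200) should return 1,394 sums.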



Answer 2:


First let's generate some data and get a label for each group:

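library(tidyverse)
# Build the tibble directly; calling as_tibble() on a bare vector is discouraged in recent tibble versions
df <-
  tibble(value = rnorm(1000)) %>% 
  mutate(grp = floor(1 + (row_number() - 1) / 200))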

> df
# A tibble: 1,000 x 2
    value   grp
     <dbl> <dbl>
 1  -1.06      1
 2   0.668     1
 3  -2.02      1
 4   1.21      1
...
1000 0.78      5

This draws 1000 random N(0,1) values, turns them into a data frame, and then adds an incrementing numeric label for each group of 200 rows.
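The label arithmetic can be checked on its own in base R. The sketch below uses integer division (%/%) instead of floor(), which should give the same labels; n_rows and block_size are illustrative names, not part of the answer.

```r
n_rows <- 1000
block_size <- 200

# Rows 1..200 get label 1, rows 201..400 get label 2, and so on
grp <- (seq_len(n_rows) - 1) %/% block_size + 1

# Same labels as the floor() expression used in the answer
stopifnot(all(grp == floor(1 + (seq_len(n_rows) - 1) / block_size)))

grp[c(1, 200, 201, 1000)]
## [1] 1 1 2 5
table(grp)  # each of the 5 labels should appear 200 times
```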

df %>% 
  group_by(grp) %>% 
  summarize(grp_sum = sum(value))

# A tibble: 5 x 2
    grp grp_sum
  <dbl>   <dbl>
1     1    9.63
2     2  -12.8 
3     3  -18.8 
4     4   -8.93
5     5  -25.9 

The code above groups by the label column (grp) and sums the values within each group. You can use the pull() operation to get the results as a vector:

df %>% 
  group_by(grp) %>% 
  summarize(grp_sum = sum(value)) %>% 
  pull(grp_sum)
[1]   9.62529 -12.75193 -18.81967  -8.93466 -25.90523



Answer 3:


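I created a vector a with 278800 observations and summed it in blocks of 200 with a loop:

 a <- rnorm(278800)
 n <- 200
 b <- numeric(length(a) / n)  # preallocate the column of interest
 j <- 1
 for (i in seq(1, length(a), by = n)) {
   # parentheses are required: i:i+199 would parse as (i:i) + 199
   b[j] <- sum(a[i:(i + n - 1)])  # b is your column of interest
   j <- j + 1
 }
 View(b)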
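A caveat when indexing blocks like this: in R the `:` operator binds more tightly than `+`, so `i:i+199` is read as `(i:i) + 199`, a single index, rather than the range `i:(i+199)`. The sketch below illustrates the difference and, as an alternative to the loop (not part of the original answer), computes the block sums with tapply(). The names n and block_id and the seed are illustrative assumptions.

```r
i <- 1
i:i + 199            # (i:i) + 199, a single number
## [1] 200
length(i:(i + 199))  # the intended range of 200 indices
## [1] 200

set.seed(1)          # arbitrary seed so the sketch is reproducible
a <- rnorm(278800)
n <- 200

# Label each element with its block number, then sum within each label
block_id <- (seq_along(a) - 1) %/% n
b <- as.vector(tapply(a, block_id, sum))
length(b)
## [1] 1394
```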


Source: https://stackoverflow.com/questions/52339492/how-to-sum-every-nth-200-observation-in-a-data-frame-using-r
