conditionally duplicating rows in a data frame

…衆ロ難τιáo~ submitted on 2021-01-28 03:12:23

Question


This is a sample of my data set:

   day city count
1   1    A    50
2   2    A   100
3   2    B   110
4   2    C    90

Here is the code for reproducing it:

  df <- data.frame(
    day = c(1,2,2,2),
    city = c("A","A","B","C"),
    count = c(50,100,110,90)
    )

As you can see, the count is missing for cities B and C on day 1. I want to use city A's count as an estimate for the other two cities, so the desired output would be:

   day city count
1   1    A    50
2   1    B    50
3   1    C    50
4   2    A   100
5   2    B   110
6   2    C    90

I could write a for loop to do this, but I feel there should be an easier way. My idea is to count the observations for each day and, for any day with fewer observations than there are cities in the data set, replicate the existing row to complete that day. Any better ideas, or a more efficient for loop? Thanks.


Answer 1:


With dplyr and tidyr, we can do:

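library(dplyr)
library(tidyr)

df %>% 
  expand(day, city) %>%                      # every day/city combination
  left_join(df, by = c("day", "city")) %>%   # attach known counts, NA where missing
  group_by(day) %>% 
  fill(count, .direction = "up") %>%         # within each day, fill NAs from the row below
  fill(count, .direction = "down")           # then fill any remaining NAs from the row above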

Alternatively, we can avoid the left_join by using complete(), as in thelatemail's solution:

df %>% 
  complete(day, city) %>% 
  group_by(day) %>% 
  fill(count, .direction = "up") %>% 
  fill(count, .direction = "down")

Both return:

# A tibble: 6 x 3
    day city  count
  <dbl> <fct> <dbl>
1    1. A       50.
2    1. B       50.
3    1. C       50.
4    2. A      100.
5    2. B      110.
6    2. C       90.
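
On tidyr 1.0.0 or later, the two fill() calls can probably be collapsed into a single call with .direction = "updown", which fills up first and then down. A minimal sketch, assuming that version is installed and using the modified data from this answer; the name `filled` is only illustrative:

```r
library(dplyr)
library(tidyr)

# Modified sample data: day 1 only has a count for city B
df <- data.frame(
  day = c(1, 2, 2, 2),
  city = c("B", "A", "B", "C"),
  count = c(50, 100, 110, 90)
)

filled <- df %>%
  complete(day, city) %>%                  # add the missing day/city rows with NA counts
  group_by(day) %>%
  fill(count, .direction = "updown") %>%   # fill up, then down, within each day
  ungroup()                                # drop the grouping before further use

filled
```

The ungroup() at the end is optional; it just keeps the grouping from leaking into later steps.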

Data (slightly modified so that day 1's only count belongs to city B, which means the fill has to go in both directions):

df <- data.frame(
  day = c(1,2,2,2),
  city = c("B","A","B","C"),
  count = c(50,100,110,90)
)
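
For anyone who would rather not load extra packages, roughly the same result can be sketched in base R. This is not part of the original answer: the names `grid`, `full` and `fallback` are illustrative, and the sketch assumes every day has at least one non-missing count. It uses the first non-missing count of each day as the estimate, which can differ from fill() when a day has several known counts.

```r
# Modified sample data: day 1 only has a count for city B
df <- data.frame(
  day = c(1, 2, 2, 2),
  city = c("B", "A", "B", "C"),
  count = c(50, 100, 110, 90),
  stringsAsFactors = FALSE
)

# Every day/city combination
grid <- expand.grid(
  day = unique(df$day),
  city = unique(df$city),
  stringsAsFactors = FALSE
)

# Attach the known counts; combinations without data get NA
full <- merge(grid, df, by = c("day", "city"), all.x = TRUE)
full <- full[order(full$day, full$city), ]
rownames(full) <- NULL

# Per-day fallback: the first non-missing count observed on that day
fallback <- ave(full$count, full$day, FUN = function(x) x[!is.na(x)][1])

# Keep known counts, use the fallback only where the count is missing
full$count <- ifelse(is.na(full$count), fallback, full$count)

full
```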


Source: https://stackoverflow.com/questions/49184893/conditionally-duplicating-rows-in-a-data-frame
