问题
Good morning
I have a dataset of fisheries data with several variables that look like this:
ID Day Month Year Depth Haul number Count LengthClass
H111200840 11 1 2008 -80 40 4 10-20
H111200840 11 1 2008 -80 40 15 20-30
H29320105 29 3 2010 -40 5 3 50-60
H29320105 29 3 2010 -40 5 8 60-70
The column ID is a unique ID made by paste
the columns day,month,Year and Haul.number. As you can see for the same ID I have data of different Length Class. En each Haul, fish from different lengths are captured.
However, LengthClass is a factor variable with the following levels: 10-20, 20-30, 30-40, 40-50 and fish of a certain length class that is not captured in a Haul is not recorded in the dataset.
I need to include in the above data.frame example new rows for each ID with the missing levels of LengthClass.
The missing Length classes should have a Count of 0 but the rest of the variables have to be the same.
This is an example of what I would like
ID Day Month Year Depth Haul number Count LengthClass
H111200840 11 1 2008 -80 40 4 10-20
H111200840 11 1 2008 -80 40 15 20-30
H111200840 11 1 2008 -80 40 0 30-40
H111200840 11 1 2008 -80 40 0 40-50
H111200840 11 1 2008 -80 40 0 50-60
H29320105 29 3 2010 -40 5 3 40-60
H29320105 29 3 2010 -40 5 8 50-60
H29320105 29 3 2010 -40 5 0 10-20
H29320105 29 3 2010 -40 5 0 20-30
H29320105 29 3 2010 -40 5 0 30-40
Is there anyway to do this in R? I have tried for loops with if arguments but with no luck and also the example of this post:
Thanks for any advice in advance
回答1:
You can use tidyr
for this.
First use tidyr::complete to fill in all the combinations of LengthClass
, specifying that Count
should be filled in as 0
.
Then sort the data and use tidyr::fill to fill in the same values for the other columns (other than ID
, LengthClass
, and Count
).
Create Data
library(tidyr)
library(dplyr)
df <- readr::read_csv(
'ID,Day,Month,Year,Depth,Haul_number,Count,LengthClass
H111200840,11,1,2008,-80,40,4,10-20
H111200840,11,1,2008,-80,40,15,20-30
H29320105,29,3,2010,-40,5,3,50-60
H29320105,29,3,2010,-40,5,8,60-70') %>%
mutate(LengthClass = as.factor(LengthClass))
df
#> # A tibble: 4 x 8
#> ID Day Month Year Depth Haul_number Count LengthClass
#> <chr> <int> <int> <int> <int> <int> <int> <fctr>
#> 1 H111200840 11 1 2008 -80 40 4 10-20
#> 2 H111200840 11 1 2008 -80 40 15 20-30
#> 3 H29320105 29 3 2010 -40 5 3 50-60
#> 4 H29320105 29 3 2010 -40 5 8 60-70
Fill in the extra rows
df %>%
group_by(ID) %>%
complete(LengthClass, fill = list(Count = 0)) %>%
arrange(ID, Day) %>%
fill(-ID, -LengthClass, -Count, .direction = "down") %>%
ungroup()
#> # A tibble: 8 x 8
#> ID LengthClass Day Month Year Depth Haul_number Count
#> <chr> <fctr> <int> <int> <int> <int> <int> <dbl>
#> 1 H111200840 10-20 11 1 2008 -80 40 4
#> 2 H111200840 20-30 11 1 2008 -80 40 15
#> 3 H111200840 50-60 11 1 2008 -80 40 0
#> 4 H111200840 60-70 11 1 2008 -80 40 0
#> 5 H29320105 50-60 29 3 2010 -40 5 3
#> 6 H29320105 60-70 29 3 2010 -40 5 8
#> 7 H29320105 10-20 29 3 2010 -40 5 0
#> 8 H29320105 20-30 29 3 2010 -40 5 0
来源:https://stackoverflow.com/questions/44271398/for-loops-including-rows-in-a-dataframe-by-the-missing-values-of-factor-levels