Question
   Groups Names COL1  COL2  COL3        COL4
1      G1   SP1    1 0.400 0.500   Sequence1
2      G1   SP1    1 0.004 0.005   Sequence2
3      G1   SP1    0 0.004 0.005   Sequence3
4      G1   SP2    0 0.400 0.005 Sequence123
5      G1   SP2    0 0.004 0.500  Sequence14
6      G1   SP3    0 0.005 0.006  Sequence15
7      G1   SP5    1 0.400 0.006  Sequence16
8      G1   SP6    1 0.008 0.002  Sequence20
10     G2   Sp1    0 0.004 0.005  Sequence17
11     G2   SP1    0 0.050 0.600  Sequence18
12     G2   SP1    0 0.400 0.600   Sequence3
13     G2   SP2    0 0.004 0.005  Sequence22
14     G2   SP2    0 0.004 0.005  Sequence23
15     G2   SP5    0 0.004 0.005  Sequence16
16     G2   SP6    0 0.003 0.002  Sequence21
17     G2   SP7    0 0.560 0.760  Sequence67
Here is the dput:
dput(test_df)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("G1", "G2"), class = "factor"),
Names = structure(c(2L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 1L, 2L,
2L, 3L, 3L, 5L, 6L, 7L), .Label = c("Sp1", "SP1", "SP2",
"SP3", "SP5", "SP6", "SP7"), class = "factor"), COL1 = c(1L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), COL2 = c(0.4, 0.004, 0.004, 0.4, 0.004, 0.005, 0.4, 0.008,
0.004, 0.05, 0.4, 0.004, 0.004, 0.004, 0.003, 0.56), COL3 = c(0.5,
0.005, 0.005, 0.005, 0.5, 0.006, 0.006, 0.002, 0.005, 0.6,
0.6, 0.005, 0.005, 0.005, 0.002, 0.76), COL4 = structure(c(1L,
8L, 13L, 2L, 3L, 4L, 5L, 9L, 6L, 7L, 13L, 11L, 12L, 5L, 10L,
14L), .Label = c("Sequence1", "Sequence123", "Sequence14",
"Sequence15", "Sequence16", "Sequence17", "Sequence18", "Sequence2",
"Sequence20", "Sequence21", "Sequence22", "Sequence23", "Sequence3",
"Sequence67"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "10", "11", "12", "13", "14",
"15", "16", "17"))
From this data frame I want to get another data frame like this:
    G1 G2
SP1 A  B
SP2 x  x
SP3 x  NA
SP4 NA NA
SP5 A  X
SP6 a  x
SP7 NA b
The idea is, for each group (Groups), to list the Names present in it and fill each cell with A, B, X (upper- or lowercase) or NA. Whether the letter is upper- or lowercase depends on whether at least one COL4 value of that Name is also found for the same Name in another group.
- A is put when any row for the Name has COL1 > 0 AND at least one COL4 value of the Name is also found for the same Name in a different group.
- a is put when any row for the Name has COL1 > 0 AND no COL4 value of the Name is found for the same Name in a different group.
- B is put when any row for the Name has COL1 = 0 AND both COL2 and COL3 > 0.05 AND at least one COL4 value of the Name is also found for the same Name in a different group.
- b is put when any row for the Name has COL1 = 0 AND both COL2 and COL3 > 0.05 AND no COL4 value of the Name is found for the same Name in a different group.
- X is put when all rows for the Name have COL1 = 0 AND COL2 OR COL3 > 0.05 AND at least one COL4 value of the Name is also found for the same Name in a different group.
- x is put when all rows for the Name have COL1 = 0 AND COL2 OR COL3 > 0.05 AND no COL4 value of the Name is found for the same Name in a different group.
- NA is put when the Name is not present in the group.
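Ignoring upper/lowercase for a moment, the letter for one Groups/Names cell can be sketched as a small base-R function. This is a sketch, not the asker's code: `base_letter()` is a hypothetical name, and X is assumed to be the fallback (no row with COL1 > 0 and no row with both COL2 and COL3 > 0.05), since that is how the expected table treats G1-SP2 and G1-SP3.

```r
# Hypothetical helper: letter (before the case rule) for one Groups/Names cell.
# col1, col2, col3 are the COL1, COL2, COL3 values of all rows in that cell.
# Assumption: X is the fallback (no row with COL1 > 0 and no row with both
# COL2 and COL3 > 0.05), as in example 3 and the expected table.
base_letter <- function(col1, col2, col3) {
  if (any(col1 > 0, na.rm = TRUE)) {
    "A"
  } else if (any(col2 > 0.05 & col3 > 0.05, na.rm = TRUE)) {
    "B"
  } else {
    "X"
  }
}

# G1-SP1 (rows 1-3): row 1 has COL1 > 0
base_letter(c(1, 1, 0), c(0.4, 0.004, 0.004), c(0.5, 0.005, 0.005))  # "A"
# G2-SP1 (rows 11-12): row 12 has both COL2 and COL3 > 0.05
base_letter(c(0, 0), c(0.05, 0.4), c(0.6, 0.6))                      # "B"
# G1-SP2 (rows 4-5): each row has only one of COL2/COL3 > 0.05
base_letter(c(0, 0), c(0.4, 0.004), c(0.005, 0.5))                   # "X"
```

The `if`/`else if` order matters: a Name with any COL1 > 0 gets A even if another of its rows would qualify for B.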
Let's take 4 examples:
1)
For G1-SP1, row 1 has COL1 > 0, so the cell will get A or a in the new data frame.
To decide between A and a we look at COL4: Sequence3 (row 3) is also present for SP1 in G2 (row 12), so it will be an 'A'.
2)
For G2-SP1, row 12 has both COL2 and COL3 > 0.05, so the cell will get B or b in the new data frame.
It will be B because Sequence3 (row 12) is also present for SP1 in G1 (row 3).
3)
For G2-SP2, no row has COL1 > 0 and no row has both COL2 and COL3 > 0.05, so the cell will get X or x in the new data frame.
It will be x because no SP2 in another group has the same sequence (Sequence22 or Sequence23).
4)
For G1-SP6, row 8 has COL1 > 0, so the cell will get A or a in the new data frame.
It will be a because no SP6 in another group has the same sequence (Sequence20).
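The upper/lowercase decision in these examples comes down to one membership test: does any COL4 value of this Groups/Names cell also occur for the same Name in another group? A minimal base-R sketch, assuming a hypothetical helper `shares_sequence()` and a reduced copy of the data (`seq_df`, only Groups, Names and COL4) typed in by hand from the dput. Names are compared exactly, so `Sp1` (row 10) counts as a different Name from `SP1`, as it does in the dput's factor levels.

```r
# Reduced copy of the question's data: only the columns the case rule needs,
# typed in from the dput (first 8 rows are G1, last 8 rows are G2).
seq_df <- data.frame(
  Groups = rep(c("G1", "G2"), each = 8),
  Names  = c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP5", "SP6",
             "Sp1", "SP1", "SP1", "SP2", "SP2", "SP5", "SP6", "SP7"),
  COL4   = paste0("Sequence", c(1, 2, 3, 123, 14, 15, 16, 20,
                                17, 18, 3, 22, 23, 16, 21, 67)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when at least one COL4 value of (group, name)
# also occurs for the same Name in a different group -> uppercase letter.
shares_sequence <- function(df, group, name) {
  here      <- df$COL4[df$Groups == group & df$Names == name]
  elsewhere <- df$COL4[df$Groups != group & df$Names == name]
  any(here %in% elsewhere)
}

shares_sequence(seq_df, "G1", "SP1")  # TRUE: Sequence3 is also in G2-SP1 (example 1)
shares_sequence(seq_df, "G2", "SP1")  # TRUE: Sequence3 is also in G1-SP1 (example 2)
shares_sequence(seq_df, "G2", "SP2")  # FALSE: Sequence22/23 are not in G1-SP2 (example 3)
shares_sequence(seq_df, "G1", "SP6")  # FALSE: Sequence20 is not in G2-SP6 (example 4)
```

If `Sp1` is meant to be the same species as `SP1`, the Names would have to be normalised first (for example with `toupper()`).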
For that I tried:
library(dplyr)
library(tidyr)
library(stringr)

Env_table <- as.data.frame(test_df) %>%
  group_by(Groups, Names) %>%
  # replace_na() needs a replacement of the same type, so COL1 is made character first
  mutate(Env_variable = replace_na(as.character(COL1), "."),
         Env_variable = ifelse(any(COL1 >= 1), "A", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 > 0.05) && all(COL3 > 0.05), "B", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 < 0.05) && all(COL3 < 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 < 0.05) && all(COL3 > 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(COL2 > 0.05) && all(COL3 < 0.05), "X", Env_variable)) %>%
  mutate(Env_variable = ifelse(all(COL1 == 0) && all(!is.na(COL1)) && all(COL2 > 0.05) && all(COL3 > 0.05), "*", Env_variable)) %>%
  slice(1) %>%
  pivot_wider(id_cols = Names, names_from = Groups, values_from = Env_variable) %>%
  arrange(as.integer(str_extract(Names, "\\d+")))
where Env_variable is just a helper column that stores the A, B, X or NA values.
Thanks for your help.
Answer 1:
Your question is not crystal-clear, but here is an attempt to answer:
library(dplyr)
library(tidyr)

test_df %>%
  group_by(Groups, Names) %>%
  summarise(
    x = case_when(
      any(COL1 >= 1, na.rm = TRUE) ~ "A",
      any(COL1 == 0 & (COL2 > 0.05 & COL3 > 0.05), na.rm = TRUE) ~ "B",
      any(COL1 == 0 & (COL2 < 0.05 | COL3 < 0.05), na.rm = TRUE) ~ "X",
      TRUE ~ NA_character_
    )
  ) %>%
  pivot_wider(names_from = Groups, values_from = x)
This would give this output:
  Names G1    G2
  <fct> <chr> <chr>
1 SP1   A     B
2 SP2   X     X
3 SP3   X     NA
4 SP5   A     X
5 SP6   A     X
6 Sp1   NA    X
7 SP7   NA    B
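The answer above leaves out the upper/lowercase part of the question. One way to add it is sketched below, under assumptions that are not from the thread: the data are typed in from the dput as plain character columns, X is used as the fallback (as in example 3), and a Names/COL4 pair counts as shared when it occurs in more than one Groups value. The helper columns `seq_shared`, `letter` and `shared` are made-up names for this sketch.

```r
library(dplyr)
library(tidyr)

# The question's data typed in from the dput, with plain character columns.
test_df <- data.frame(
  Groups = rep(c("G1", "G2"), each = 8),
  Names  = c("SP1", "SP1", "SP1", "SP2", "SP2", "SP3", "SP5", "SP6",
             "Sp1", "SP1", "SP1", "SP2", "SP2", "SP5", "SP6", "SP7"),
  COL1 = c(1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  COL2 = c(0.4, 0.004, 0.004, 0.4, 0.004, 0.005, 0.4, 0.008,
           0.004, 0.05, 0.4, 0.004, 0.004, 0.004, 0.003, 0.56),
  COL3 = c(0.5, 0.005, 0.005, 0.005, 0.5, 0.006, 0.006, 0.002,
           0.005, 0.6, 0.6, 0.005, 0.005, 0.005, 0.002, 0.76),
  COL4 = paste0("Sequence", c(1, 2, 3, 123, 14, 15, 16, 20,
                              17, 18, 3, 22, 23, 16, 21, 67)),
  stringsAsFactors = FALSE
)

result <- test_df %>%
  # seq_shared: this Names/COL4 pair occurs in more than one group
  group_by(Names, COL4) %>%
  mutate(seq_shared = n_distinct(Groups) > 1) %>%
  # one letter per Groups/Names cell; X is the fallback
  group_by(Groups, Names) %>%
  summarise(
    letter = case_when(
      any(COL1 > 0, na.rm = TRUE) ~ "A",
      any(COL2 > 0.05 & COL3 > 0.05, na.rm = TRUE) ~ "B",
      TRUE ~ "X"
    ),
    shared = any(seq_shared),
    .groups = "drop"
  ) %>%
  # lowercase when none of the cell's sequences is shared with another group
  mutate(letter = if_else(shared, letter, tolower(letter))) %>%
  select(-shared) %>%
  # cells for Names absent from a group become NA
  pivot_wider(names_from = Groups, values_from = letter)

result
```

SP4 never appears in `test_df`, so it gets no row here; Names that are absent from every group would have to be added separately if they are needed in the final table.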
Source: https://stackoverflow.com/questions/61595711/using-dplyr-to-create-new-dataframe-depending-on-thresholds