How to make several for loops to perform different functions using R

*爱你&永不变心* submitted on 2020-06-13 11:45:11

Question


I've hit my limit here and need help from R experts with a quick way to loop my code. I have a df like this:

GENES <- c('RCD-7', 'ADF-1', 'BBF-10', 'BBF-10', 'BBF-10', 'CCF-103')
pos_1 <- c('T', 'G', 'T', 'A', 'C', 'T')
pos_2 <- c('G', 'T', 'A', 'A', 'C', 'G')
df <- data.frame(GENES, pos_1, pos_2)

print(df)

GENES   pos_1 pos_2
RCD-7     T     G
ADF-1     G     T
BBF-10    T     A
BBF-10    A     A
BBF-10    C     C
CCF-103   T     G

What I want to do with the df is calculate the percentage of each nucleotide (i.e. each letter) at each position (the columns), and then get the maximum percentage at each position for each gene in the first column. I have gotten the desired output by writing separate lines of code for each position. However, my df has more than 200 rows and columns, so I want to avoid pasting the same code for every position again and again.
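Because the largest percentage in a row is just the count of the most frequent letter divided by that gene's number of rows, the calculation for a single position can be sketched as a per-gene summary with `tapply()`. This is a minimal sketch, not the asker's method; `max_pct_by_gene` is a hypothetical helper name, and the example vectors are copied from the question.

```r
# Hypothetical helper: for one position column, the percentage of the most
# frequent nucleotide within each gene
max_pct_by_gene <- function(genes, bases) {
  tapply(bases, genes, function(b) 100 * max(table(b)) / length(b))
}

# example vectors copied from the question
GENES <- c('RCD-7', 'ADF-1', 'BBF-10', 'BBF-10', 'BBF-10', 'CCF-103')
pos_1 <- c('T', 'G', 'T', 'A', 'C', 'T')

res <- max_pct_by_gene(GENES, pos_1)
res
```

For `pos_1` this gives 100 for ADF-1, CCF-103 and RCD-7 and 33.33 for BBF-10, the same values as the `pos_1_max` column of the desired output.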

Here is the code I used to do the calculations and get the desired output (shown for just two positions):

library(dplyr)  # for %>% and mutate()

# count each nucleotide per gene at each position
counts1 <- table(df$GENES, df$pos_1)
counts2 <- table(df$GENES, df$pos_2)
# convert the tables to data frames (genes become row names)
counts_df1 <- as.data.frame(unclass(counts1))
counts_df2 <- as.data.frame(unclass(counts2))
# move the row names into a GENES column
ordered_df1 <- tibble::rownames_to_column(counts_df1, "GENES")
ordered_df2 <- tibble::rownames_to_column(counts_df2, "GENES")
# table() sorts its columns alphabetically (A, C, G, T); assumes all four occur
colnames(ordered_df1) <- c("GENES", "A1", "C1", "G1", "T1")
colnames(ordered_df2) <- c("GENES", "A2", "C2", "G2", "T2")
# make all four count columns (2:5) numeric
ordered_df1[, 2:5] <- sapply(ordered_df1[, 2:5], as.numeric)
ordered_df2[, 2:5] <- sapply(ordered_df2[, 2:5], as.numeric)
# convert counts to row-wise percentages
final_df1 <- cbind(ordered_df1[1], prop.table(as.matrix(ordered_df1[-1]), margin = 1) * 100)
final_df2 <- cbind(ordered_df2[1], prop.table(as.matrix(ordered_df2[-1]), margin = 1) * 100)
# highest percentage per gene at each position
row_max_df1 <- final_df1 %>% mutate(pos_1_max = pmax(A1, C1, G1, T1))
row_max_df2 <- final_df2 %>% mutate(pos_2_max = pmax(A2, C2, G2, T2))
# GENES plus both max columns, with the second one named explicitly
col_combined1 <- cbind(row_max_df1[, c(1, 6)], pos_2_max = row_max_df2[, 6])
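The per-position blocks in this code differ only in the column they read, so the whole pipeline can be sketched as a single `for` loop over the position columns. This assumes every position column name starts with `pos_` (based on the example only), and `pos_cols`, `result` and `row_max` are illustrative names. Instead of renaming the count columns, it takes the row-wise maximum with `apply()`, which also keeps working when a nucleotide never appears in a column.

```r
# example data copied from the question
GENES <- c('RCD-7', 'ADF-1', 'BBF-10', 'BBF-10', 'BBF-10', 'CCF-103')
pos_1 <- c('T', 'G', 'T', 'A', 'C', 'T')
pos_2 <- c('G', 'T', 'A', 'A', 'C', 'G')
df <- data.frame(GENES, pos_1, pos_2, stringsAsFactors = FALSE)

# assumption: every position column is named pos_<something>
pos_cols <- grep("^pos_", names(df), value = TRUE)

# one row per gene, sorted the same way table() sorts its rows
result <- data.frame(GENES = sort(unique(df$GENES)), stringsAsFactors = FALSE)

for (col in pos_cols) {
  counts <- table(df$GENES, df[[col]])         # genes x nucleotides
  pct <- prop.table(counts, margin = 1) * 100  # row-wise percentages
  row_max <- apply(pct, 1, max)                # highest percentage per gene
  # index by gene name so rows line up, then store as e.g. pos_1_max
  result[[paste0(col, "_max")]] <- unname(row_max[result$GENES])
}

result
```

Adding more `pos_` columns to `df` only adds more iterations; the loop body does not change.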

The desired output should be:

GENES     pos_1_max        pos_2_max
ADF-1     100.00000        100.00000
BBF-10    33.33333         66.66667
CCF-103   100.00000        100.00000
RCD-7     100.00000        100.00000
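A table like the desired one can also be sketched without any explicit loop by reshaping the data to long format (one row per gene, position and nucleotide), counting within groups, and spreading the positions back into columns. This is an alternative technique, not the asker's approach; it uses only base R (`aggregate()`, `ave()`, `reshape()`), again assumes the `pos_` prefix, and the intermediate names (`long`, `best`, `wide`) are illustrative.

```r
# example data copied from the question
GENES <- c('RCD-7', 'ADF-1', 'BBF-10', 'BBF-10', 'BBF-10', 'CCF-103')
pos_1 <- c('T', 'G', 'T', 'A', 'C', 'T')
pos_2 <- c('G', 'T', 'A', 'A', 'C', 'G')
df <- data.frame(GENES, pos_1, pos_2, stringsAsFactors = FALSE)

pos_cols <- grep("^pos_", names(df), value = TRUE)

# long format: one row per (gene, position, nucleotide) observation
long <- data.frame(
  GENES    = rep(df$GENES, times = length(pos_cols)),
  position = rep(pos_cols, each = nrow(df)),
  base     = unlist(df[pos_cols], use.names = FALSE),
  n        = 1,
  stringsAsFactors = FALSE
)

# count each nucleotide within each gene/position group
counts <- aggregate(n ~ GENES + position + base, data = long, FUN = sum)

# convert counts to percentages of the gene/position total
counts$pct <- 100 * counts$n / ave(counts$n, counts$GENES, counts$position, FUN = sum)

# keep the highest percentage per gene and position
best <- aggregate(pct ~ GENES + position, data = counts, FUN = max)

# spread positions back into columns (pct.pos_1, ...) and rename to pos_1_max, ...
wide <- reshape(best, idvar = "GENES", timevar = "position", direction = "wide")
names(wide) <- sub("^pct\\.(.*)$", "\\1_max", names(wide))
rownames(wide) <- NULL
wide
```

With 200+ position columns the long table simply gets longer; no per-position code is needed.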

I couldn't even get started on a loop for the first two lines of my code, so I would really appreciate any help.
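For the first two lines specifically, one option (a sketch, again assuming the `pos_` prefix) is to build every count table in one call with `lapply()`, which returns a named list in place of `counts1`, `counts2` and so on. The percentage and maximum steps can then be applied to each element of that list; `counts`, `pct` and `max_pct` are illustrative names.

```r
# example data copied from the question
GENES <- c('RCD-7', 'ADF-1', 'BBF-10', 'BBF-10', 'BBF-10', 'CCF-103')
pos_1 <- c('T', 'G', 'T', 'A', 'C', 'T')
pos_2 <- c('G', 'T', 'A', 'A', 'C', 'G')
df <- data.frame(GENES, pos_1, pos_2, stringsAsFactors = FALSE)

pos_cols <- grep("^pos_", names(df), value = TRUE)

# one gene x nucleotide count table per position (replaces counts1, counts2, ...)
counts <- lapply(df[pos_cols], function(bases) table(df$GENES, bases))

# the percentage step applied to every table in the list
pct <- lapply(counts, function(tab) prop.table(tab, margin = 1) * 100)

# highest percentage per gene: a genes x positions matrix
max_pct <- sapply(pct, function(p) apply(p, 1, max))

pct$pos_2["BBF-10", "A"]
max_pct
```

Here `pct$pos_2["BBF-10", "A"]` is about 66.67, and `max_pct` has one row per gene and one column per position.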

Source: https://stackoverflow.com/questions/62130265/how-to-make-several-for-loops-to-perform-different-functions-using-r
