R: How to change the column names in a data frame based on a specification

一曲冷凌霜 submitted on 2019-12-07 13:32:18

Question


I have a data frame, the start of which is shown below:

                   SM_H1455 SM_V1456 SM_K1457 SM_X1461 SM_K1462
ENSG00000000419.8       290      270      314      364      240
ENSG00000000457.8       252      230      242      220      106
ENSG00000000460.11      154      158      162      136       64
ENSG00000000938.7     20106    18664    19764    15640    19024
ENSG00000000971.11       30       10        4        2       10

Note that there are many more cols and rows.
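
For anyone who wants to try the answers below, the sample above can be rebuilt as a small data frame. This is only a reproducible sketch typed in from the printout (it covers just the five rows and columns shown, not the full data), and it uses the name `x` that the answers assume.

```r
# Reproducible sample typed in from the printout above:
# gene IDs become row names, sample IDs become column names.
x <- data.frame(
  SM_H1455 = c(290, 252, 154, 20106, 30),
  SM_V1456 = c(270, 230, 158, 18664, 10),
  SM_K1457 = c(314, 242, 162, 19764, 4),
  SM_X1461 = c(364, 220, 136, 15640, 2),
  SM_K1462 = c(240, 106, 64, 19024, 10),
  row.names = c("ENSG00000000419.8", "ENSG00000000457.8",
                "ENSG00000000460.11", "ENSG00000000938.7",
                "ENSG00000000971.11")
)
```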

Here's what I want to do: I want to change the names of the columns. The most important information in a column's name, e.g. SM_H1455, is the 4th character of the string; in this case it's an H. I want to change the "SM" part to "Control" if the 4th character is "H" or "K", and to "Case" if the 4th character is "X" or "V", while keeping everything else in the name. In the end, I'd like a table like this:

                   Control_H1455 Case_V1456 Control_K1457 Case_X1461 Control_K1462
ENSG00000000419.8            290        270           314        364           240
ENSG00000000457.8            252        230           242        220           106
ENSG00000000460.11           154        158           162        136            64
ENSG00000000938.7          20106      18664         19764      15640         19024
ENSG00000000971.11            30         10             4          2            10

Please keep in mind that whether the 4th character is "V", "X", "K" or "H" is completely random.

I'd appreciate any help! Thanks.


Answer 1:


One way, where x is your data frame:

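# Positions of the control (H/K) and case (X/V) columns, by 4th character
controls <- which(substring(names(x), 4, 4) %in% c("H", "K"))
cases <- which(substring(names(x), 4, 4) %in% c("X", "V"))
# Replace the leading "SM" with the group label, keeping the rest of the name
names(x)[controls] <- sub("^SM", "Control", names(x)[controls])
names(x)[cases] <- sub("^SM", "Case", names(x)[cases])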
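
The same rule can also be written as a lookup table keyed by the code letter, which handles both groups in one pass and is easy to extend if more codes turn up. This is a sketch rather than part of the answer: `prefix_for` and `relabel` are hypothetical names, and it assumes every name has a two-character prefix such as `SM` before the `_` and the code letter.

```r
# Hypothetical lookup table: code letter (4th character) -> new prefix
prefix_for <- c(H = "Control", K = "Control", X = "Case", V = "Case")

relabel <- function(nms) {
  code <- substring(nms, 4, 4)         # e.g. "H" in "SM_H1455"
  new_prefix <- prefix_for[code]       # NA where the code is not in the table
  known <- !is.na(new_prefix)
  # Swap the 2-character prefix, keeping everything from "_" onwards
  nms[known] <- paste0(new_prefix[known], substring(nms[known], 3))
  nms                                  # names with unknown codes are unchanged
}

relabel(c("SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_Q9999"))
# "Control_H1455" "Case_V1456" "Control_K1457" "Case_X1461" "SM_Q9999"

# On the question's data frame: names(x) <- relabel(names(x))
```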

Alternatively:

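names(x) <- sapply(names(x), function(z) {
    code <- substring(z, 4, 4)           # 4th character: H, K, X or V
    if (code %in% c("H", "K"))
        sub("^SM", "Control", z)
    else if (code %in% c("X", "V"))
        sub("^SM", "Case", z)
    else
        z                                # leave any other name unchanged
}, USE.NAMES = FALSE)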
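
Because `sapply()` only simplifies to a character vector when every call returns a single string, a stricter sketch uses `vapply()`, which declares the expected result type up front and stops with an error if any call returns something else (for example `NULL` from an `if` with no matching branch). The helper name `relabel_one` is hypothetical.

```r
# Hypothetical helper: new name for a single column name
relabel_one <- function(z) {
  code <- substring(z, 4, 4)
  if (code %in% c("H", "K")) {
    sub("^SM", "Control", z)
  } else if (code %in% c("X", "V")) {
    sub("^SM", "Case", z)
  } else {
    z  # unknown code: keep the name as it is
  }
}

nms <- c("SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_K1462")
# character(1): every call must return exactly one string
new_nms <- vapply(nms, relabel_one, character(1), USE.NAMES = FALSE)
new_nms
# "Control_H1455" "Case_V1456" "Control_K1457" "Case_X1461" "Control_K1462"

# For the data frame itself:
# names(x) <- vapply(names(x), relabel_one, character(1), USE.NAMES = FALSE)
```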



Answer 2:


One-line alternative:

names(x) <- sub("^..(.(H|K))", "Control\\1", sub("^..(.(X|V))", "Case\\1", names(x)))

The inner sub() runs first and gives the names whose 4th character is X or V the "Case" prefix; the outer sub() then gives the names in that result whose 4th character is H or K the "Control" prefix.
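
To see what the two regular expressions do, here they are applied one step at a time to the five example names. In `^..(.(X|V))`, `^..` matches the two-character prefix (`SM`) and the group captures the separator plus the code letter (`_X`), which `\\1` puts back after the new prefix. Names already renamed to `Case_...` cannot match the second pattern, because their 4th character is `e`.

```r
nms <- c("SM_H1455", "SM_V1456", "SM_K1457", "SM_X1461", "SM_K1462")

# Step 1: names whose 4th character is X or V get the "Case" prefix
step1 <- sub("^..(.(X|V))", "Case\\1", nms)
# "SM_H1455" "Case_V1456" "SM_K1457" "Case_X1461" "SM_K1462"

# Step 2: in that result, names whose 4th character is H or K get "Control";
# "Case_..." names are untouched because their 4th character is "e"
step2 <- sub("^..(.(H|K))", "Control\\1", step1)
# "Control_H1455" "Case_V1456" "Control_K1457" "Case_X1461" "Control_K1462"
```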



Source: https://stackoverflow.com/questions/17970287/r-how-to-change-the-column-names-in-a-data-frame-based-on-a-specification
