Question
Sorry if this is too basic; I am not familiar with R.
I have a data frame in which several columns share the same name, so when it was imported into R, suffixes were added to make the names unique. Something like this:
A = c(2, 3, 5)
A.1 = c('aa', 'bb', 'cc')
A.2 = c(TRUE, FALSE, TRUE)
B = c(1, 2, 5)
B.1 = c('bb', 'cc', 'dd')
B.2 = c(TRUE, TRUE, TRUE)
df = data.frame(A, A.1, A.2, B, B.1, B.2)
df
  A A.1   A.2 B B.1  B.2
1 2  aa  TRUE 1  bb TRUE
2 3  bb FALSE 2  cc TRUE
3 5  cc  TRUE 5  dd TRUE
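The ".1", ".2" suffixes most likely come from R's own de-duplication of names. Assuming the data were read with read.csv() or built with data.frame() under the default check.names = TRUE, a minimal sketch of that renaming rule is:

```r
# make.unique() keeps the first occurrence and appends ".1", ".2", ...
# to later repeats; this returns c("A", "A.1", "A.2", "B", "B.1", "B.2")
fixed <- make.unique(c("A", "A", "A", "B", "B", "B"))
print(fixed)

# data.frame() applies the same rule to duplicated argument names,
# because its default is check.names = TRUE
dup <- data.frame(A = 1, A = "x", B = 2, B = "y")
print(names(dup))  # "A" "A.1" "B" "B.1"
```

Knowing the suffix is always "." followed by digits is what makes the pattern-based selections in the answers work.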
I would like to extract all columns whose names start with A, regardless of the suffix, so that the result looks like this:
  A A.1   A.2
1 2  aa  TRUE
2 3  bb FALSE
3 5  cc  TRUE
I know I can do this:
df2 = df[, c("A", "A.1", "A.2")]
But I have many columns like this, so I do not want to type each name individually. I am sure there is a smarter way to do this.
Thanks!
Answer 1:
Try this to get all the columns whose names start with "A":
df2 = df[, grepl("^A", names(df))]
R's extraction function '[' accepts a logical vector as an index: in its two-argument form, df[, i], it keeps exactly the columns for which i is TRUE, and grepl() returns one TRUE/FALSE per column name. You will find R's regex functions very useful; I recommend reading ?regex, as well as looking for examples by @G. Grothendieck on SO and in the R-help archives.
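One caveat with "^A" is that it matches any name beginning with A, not just A and its numbered copies. The sketch below adds a hypothetical column AB (not in the question) to show this, and uses a stricter anchored pattern, assuming every suffix has the form ".<digits>":

```r
df <- data.frame(A = c(2, 3, 5), A.1 = c("aa", "bb", "cc"),
                 A.2 = c(TRUE, FALSE, TRUE), B = c(1, 2, 5),
                 AB = c(0, 0, 0))  # hypothetical extra column

# grepl() returns one logical per column name
keep <- grepl("^A", names(df))
keep  # TRUE TRUE TRUE FALSE TRUE: "^A" also matches "AB"

# "A", optionally followed by "." and digits, and nothing else
keep_strict <- grepl("^A(\\.[0-9]+)?$", names(df))

# drop = FALSE keeps a data frame even if only one column matches
df2 <- df[, keep_strict, drop = FALSE]
names(df2)
```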
Answer 2:
library(stringr)
A = c(2, 3, 5)
A.1 = c('aa', 'bb', 'cc')
A.2 = c(TRUE, FALSE, TRUE)
B = c(1, 2, 5)
B.1 = c('bb', 'cc', 'dd')
B.2 = c(TRUE, TRUE, TRUE)
df = data.frame(A, A.1, A.2, B)
# str_detect() matches 'A' anywhere in a name, not only at the start
df[, str_detect(names(df), 'A')]
  A A.1   A.2
1 2  aa  TRUE
2 3  bb FALSE
3 5  cc  TRUE
# To select the columns whose names contain A or B:
A = c(2, 3, 5)
A.1 = c('aa', 'bb', 'cc')
A.2 = c(TRUE, FALSE, TRUE)
B = c(1, 2, 5)
B.1 = c('bb', 'cc', 'dd')
F.2 = c(TRUE, TRUE, TRUE)
df = data.frame(A, A.1, A.2, B,F.2)
df[, str_detect(names(df), 'A|B')]
  A A.1   A.2 B
1 2  aa  TRUE 1
2 3  bb FALSE 2
3 5  cc  TRUE 5
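The 'A|B' pattern puts all matching columns into one data frame. If instead you want every group separately, which suits the question's "many columns like this" case, one base R sketch is to strip the suffix and split the columns by base name. This assumes suffixes always look like ".<digits>"; an original name that itself ends in ".<digits>" would be grouped incorrectly:

```r
df <- data.frame(A = c(2, 3, 5), A.1 = c("aa", "bb", "cc"),
                 A.2 = c(TRUE, FALSE, TRUE), B = c(1, 2, 5),
                 B.1 = c("bb", "cc", "dd"), B.2 = c(TRUE, TRUE, TRUE))

# Remove a trailing ".<digits>" to recover each column's base name
base <- sub("\\.[0-9]+$", "", names(df))

# split.default() treats the data frame as a list of columns,
# returning one data frame per base name
groups <- split.default(df, base)

names(groups)  # "A" "B"
groups$A       # columns A, A.1, A.2
groups$B       # columns B, B.1, B.2
```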
Answer 3:
If we are using the tidyverse, starts_with() is one way:
library(tidyverse)
df %>%
  select(starts_with("A"))
#  A A.1   A.2
#1 2  aa  TRUE
#2 3  bb FALSE
#3 5  cc  TRUE
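Note that tidyselect's starts_with() ignores case by default (ignore.case = TRUE), so a lowercase column would also be picked up. A base R equivalent that is case-sensitive is startsWith() (available since R 3.3.0); the lowercase column "a" below is hypothetical:

```r
df <- data.frame(A = c(2, 3, 5), A.1 = c("aa", "bb", "cc"),
                 A.2 = c(TRUE, FALSE, TRUE),
                 a = c(9, 9, 9))  # hypothetical lowercase column

# startsWith() compares the prefix literally (case-sensitive, no regex)
res <- df[, startsWith(names(df), "A"), drop = FALSE]
names(res)  # "A" "A.1" "A.2"
```

With dplyr, passing ignore.case = FALSE, as in select(starts_with("A", ignore.case = FALSE)), should give the same columns.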
Source: https://stackoverflow.com/questions/44121843/how-to-extract-columns-with-same-name-but-different-identifiers-in-r