Selecting by both rows and columns in a symmetrical matrix in R

Question

I have a symmetrical dataframe and would like to select a subset of the data to use for analysis. This means selecting both the desired rows and the matching columns, in the same order, so that the new dataframe is still square and symmetrical. With example data:

# Example data 
Sample <- c('Sample_A', 'Sample_B', 'Sample_C', 'Sample_D', 'Sample_E') 
Sample_A <- c(0, 3.16, 1, 1.41, 3) 
Sample_B <- c(3.16, 0, 3, 2.83, 1) 
Sample_C <- c(1, 3, 0, 1, 2.83) 
Sample_D <- c(1.41, 2.83, 1, 0, 2.65) 
Sample_E <- c(3, 1, 2.83, 2.65, 0) 
df = data.frame(Sample, Sample_A, Sample_B, Sample_C, Sample_D, Sample_E)
df

Then I separately define the samples I'm interested in e.g.

samples_to_use <- c("Sample_B", "Sample_D", "Sample_E")

What I want to end up with looks like this

# Desired output
Sample <- c('Sample_B', 'Sample_D', 'Sample_E')
Sample_B <- c(0, 2.83, 1)
Sample_D <- c(2.83, 0, 2.65)
Sample_E <- c(1, 2.65, 0)
df_2 = data.frame(Sample, Sample_B, Sample_D, Sample_E)
df_2

i.e. I select the rows and columns that match samples_to_use.

I've tried separately selecting the rows by merging df with a dataframe of samples_to_use but that seems inelegant and also leaves me with the wrong columns that no longer match the rows. Looking for a more elegant solution, thanks!

Answer 1:

We can index the columns by name with 'samples_to_use', and index the rows with a logical vector that checks which elements of the 'Sample' column are %in% 'samples_to_use'

df[df$Sample %in% samples_to_use, c("Sample", samples_to_use)]
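One thing worth keeping in mind: the logical %in% index returns rows in the order they appear in 'df', not in the order of 'samples_to_use'. That makes no difference for the example, but if the selection were listed in another order, the rows and columns would stop lining up. A minimal sketch of an order-preserving variant using match() is below; 'samples_reordered', 'idx' and 'df_sub' are hypothetical names introduced only for illustration.

```r
# Example data from the question
Sample <- c("Sample_A", "Sample_B", "Sample_C", "Sample_D", "Sample_E")
df <- data.frame(
  Sample,
  Sample_A = c(0, 3.16, 1, 1.41, 3),
  Sample_B = c(3.16, 0, 3, 2.83, 1),
  Sample_C = c(1, 3, 0, 1, 2.83),
  Sample_D = c(1.41, 2.83, 1, 0, 2.65),
  Sample_E = c(3, 1, 2.83, 2.65, 0)
)

# Hypothetical selection, deliberately not in the order used by df
samples_reordered <- c("Sample_E", "Sample_B", "Sample_D")

# match() gives the row position of each requested sample,
# in the order of samples_reordered (NA if a name is missing)
idx <- match(samples_reordered, df$Sample)
stopifnot(!anyNA(idx))  # fail early if a requested sample is absent

# Rows and columns now follow the same order
df_sub <- df[idx, c("Sample", samples_reordered)]
df_sub
```

The stopifnot() guard is a design choice rather than a requirement: without it, an unknown sample name would silently produce a row of NA values.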

NOTE: This is not a symmetric matrix as it stands; it is a 'data.frame' whose first column holds the sample names. If a symmetric matrix is needed, drop the first column, convert the 'data.frame' to a 'matrix', and use the 'Sample' values as row names

m1 <- as.matrix(df[-1])
row.names(m1) <- df$Sample

Then, the subsetting is easier

m1[samples_to_use, samples_to_use]
#         Sample_B Sample_D Sample_E
#Sample_B     0.00     2.83     1.00
#Sample_D     2.83     0.00     2.65
#Sample_E     1.00     2.65     0.00
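As a rough check that the subset is still symmetric, and to get back to the data frame layout shown in the question, something along these lines should work. It is only a sketch: 'sub' is a hypothetical name, and the data is rebuilt here so the snippet runs on its own.

```r
# Example data from the question
Sample <- c("Sample_A", "Sample_B", "Sample_C", "Sample_D", "Sample_E")
df <- data.frame(
  Sample,
  Sample_A = c(0, 3.16, 1, 1.41, 3),
  Sample_B = c(3.16, 0, 3, 2.83, 1),
  Sample_C = c(1, 3, 0, 1, 2.83),
  Sample_D = c(1.41, 2.83, 1, 0, 2.65),
  Sample_E = c(3, 1, 2.83, 2.65, 0)
)
samples_to_use <- c("Sample_B", "Sample_D", "Sample_E")

# Matrix with sample names as row names, as in the answer
m1 <- as.matrix(df[-1])
row.names(m1) <- df$Sample

# Subset rows and columns by name, in the same order
sub <- m1[samples_to_use, samples_to_use]

# isSymmetric() compares the matrix with its transpose
# (row and column names must also agree)
isSymmetric(sub)

# Back to a data frame with a leading 'Sample' column
df_2 <- data.frame(Sample = rownames(sub), sub, row.names = NULL)
df_2
```

Indexing by name on both dimensions is what keeps rows and columns aligned, whatever order 'samples_to_use' is given in.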

Source: https://stackoverflow.com/questions/53383376/selecting-by-both-rows-and-columns-in-a-symmetrical-matrix-in-r

Tags

dataframe

merge