Question
I have a symmetrical dataframe and would like to select a subset of the data to use for analysis. This means selecting both the desired rows and the desired columns, and keeping them in the same order, so that the new dataframe is still square and symmetrical. With example data:
# Example data
Sample <- c('Sample_A', 'Sample_B', 'Sample_C', 'Sample_D', 'Sample_E')
Sample_A <- c(0, 3.16, 1, 1.41, 3)
Sample_B <- c(3.16, 0, 3, 2.83, 1)
Sample_C <- c(1, 3, 0, 1, 2.83)
Sample_D <- c(1.41, 2.83, 1, 0, 2.65)
Sample_E <- c(3, 1, 2.83, 2.65, 0)
df = data.frame(Sample, Sample_A, Sample_B, Sample_C, Sample_D, Sample_E)
df
Then I separately define the samples I'm interested in, e.g.:
samples_to_use <- c("Sample_B", "Sample_D", "Sample_E")
What I want to end up with looks like this:
# Desired output
Sample <- c('Sample_B', 'Sample_D', 'Sample_E')
Sample_B <- c(0, 2.83, 1)
Sample_D <- c(2.83, 0, 2.65)
Sample_E <- c(1, 2.65, 0)
df_2 = data.frame(Sample, Sample_B, Sample_D, Sample_E)
df_2
i.e. I select the rows and columns that match samples_to_use.
I've tried separately selecting the rows by merging df with a dataframe of samples_to_use but that seems inelegant and also leaves me with the wrong columns that no longer match the rows. Looking for a more elegant solution, thanks!
Answer 1:
We can index the columns directly with 'samples_to_use', while the row index can be a logical vector that checks which elements of the column 'Sample' are %in% 'samples_to_use':
df[df$Sample %in% samples_to_use, c("Sample", samples_to_use)]
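One thing to watch with the `%in%` approach: it keeps the rows in the order they appear in `df`, while the columns follow the order of `samples_to_use`. If the two orders differ, rows and columns no longer line up. A minimal sketch of an order-preserving variant using `match()` is below; the reordered `samples_to_use` and the name `df_sub` are assumptions for illustration, not part of the original question.

```r
# Example data from the question, rebuilt so this block runs on its own
df <- data.frame(
  Sample   = c("Sample_A", "Sample_B", "Sample_C", "Sample_D", "Sample_E"),
  Sample_A = c(0, 3.16, 1, 1.41, 3),
  Sample_B = c(3.16, 0, 3, 2.83, 1),
  Sample_C = c(1, 3, 0, 1, 2.83),
  Sample_D = c(1.41, 2.83, 1, 0, 2.65),
  Sample_E = c(3, 1, 2.83, 2.65, 0)
)

# Hypothetical selection, deliberately NOT in the same order as df$Sample
samples_to_use <- c("Sample_E", "Sample_B", "Sample_D")

# match() returns the row position of each requested sample, in request order
idx <- match(samples_to_use, df$Sample)

# Rows and columns now follow the same order, so the block stays symmetric
df_sub <- df[idx, c("Sample", samples_to_use)]
rownames(df_sub) <- NULL  # drop the old row numbers
df_sub
```

If a requested name is missing from `df$Sample`, `match()` returns `NA` for it, so it may be worth checking `anyNA(idx)` before subsetting.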
NOTE: 'df' is not a symmetric matrix, because it is a 'data.frame' that stores the sample names in its first column. If it needs to be a symmetric matrix, remove the first column, convert the 'data.frame' to a 'matrix', and use the sample names as row names:
m1 <- as.matrix(df[-1])
row.names(m1) <- df$Sample
Then the subsetting is easier, because both rows and columns can be indexed by name, in the order given by 'samples_to_use':
m1[samples_to_use, samples_to_use]
#          Sample_B Sample_D Sample_E
# Sample_B     0.00     2.83     1.00
# Sample_D     2.83     0.00     2.65
# Sample_E     1.00     2.65     0.00
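If the end result should again be a data frame with a 'Sample' column, as in the desired output of the question, the sub-matrix can be converted back. The following is a sketch under that assumption; the names `sub` and `df_2` are chosen here for illustration, and `isSymmetric()` is only used as a sanity check.

```r
# Example data from the question, rebuilt so this block runs on its own
df <- data.frame(
  Sample   = c("Sample_A", "Sample_B", "Sample_C", "Sample_D", "Sample_E"),
  Sample_A = c(0, 3.16, 1, 1.41, 3),
  Sample_B = c(3.16, 0, 3, 2.83, 1),
  Sample_C = c(1, 3, 0, 1, 2.83),
  Sample_D = c(1.41, 2.83, 1, 0, 2.65),
  Sample_E = c(3, 1, 2.83, 2.65, 0)
)
samples_to_use <- c("Sample_B", "Sample_D", "Sample_E")

# Matrix with sample names as both row and column names (as in the answer)
m1 <- as.matrix(df[-1])
row.names(m1) <- df$Sample

# Select rows and columns by name in one step
sub <- m1[samples_to_use, samples_to_use]

# Sanity check: the selected block should still be symmetric
isSymmetric(sub)

# Back to a data frame with an explicit 'Sample' column
df_2 <- data.frame(Sample = rownames(sub), sub, row.names = NULL)
df_2
```

Keeping the data as a matrix is usually the more convenient choice for distance-like data; the conversion back is only needed when downstream code expects a data frame.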
Source: https://stackoverflow.com/questions/53383376/selecting-by-both-rows-and-columns-in-a-symmetrical-matrix-in-r