Split data.frame into groups by column name

Question

I'm new to R. I have a data frame with column names like these:

file_001   file_002   block_001   block_002   red_001   red_002   ...
  0.05       0.2        0.4         0.006       0.05       0.3
  0.01       0.87       0.56        0.4         0.12       0.06

I want to split them into groups by the column name, to get a result like this:

group_file
file_001   file_002
  0.05       0.2
  0.01       0.87

group_block
block_001   block_002
  0.4        0.006
  0.56       0.4

group_red
red_001    red_002
  0.05       0.3
  0.12       0.06

... etc.

My file is huge, and I don't know the number of groups in advance. The grouping needs to be based only on the start of each column name.

Answer 1:

In base R, you can use sub and split.default like this to return a list of data.frames:

myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))

This returns

myDfList
$block
  block_001 block_002
1      0.40     0.006
2      0.56     0.400

$file
  file_001 file_002
1     0.05     0.20
2     0.01     0.87

$red
  red_001 red_002
1    0.05    0.30
2    0.12    0.06

split.default splits a data.frame by column (variable) according to the grouping vector given as its second argument. Here, we use sub and the regular expression _\d+ (written "_\\d+" as an R string) to remove the underscore and the digits following it from each column name, which returns the splitting values "block", "file", and "red".
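
To make the two steps concrete, here is a minimal sketch that rebuilds the sample data from the question and runs the split. The intermediate variable groups is introduced only for illustration; it does not appear in the answer's one-liner.

```r
# Rebuild the example data from the question (values copied from the post).
dat <- data.frame(
  file_001  = c(0.05, 0.01),
  file_002  = c(0.2, 0.87),
  block_001 = c(0.4, 0.56),
  block_002 = c(0.006, 0.4),
  red_001   = c(0.05, 0.12),
  red_002   = c(0.3, 0.06)
)

# Step 1: derive one group label per column by stripping "_<digits>".
groups <- sub("_\\d+", "", names(dat))

# Step 2: split the columns (not the rows) by those labels.
# The result is a named list with one data.frame per label.
myDfList <- split.default(dat, groups)

names(myDfList)   # group names, in sorted order
myDfList$file     # the data.frame holding file_001 and file_002
```

Because the labels are computed from the column names, nothing in this sketch depends on knowing the number of groups beforehand.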

As a side note, it is typically a good idea to keep these data.frames in a list and work with them through functions like lapply. See gregor's answer to this post for some motivating examples.
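
As a rough illustration of working with the list through lapply, the sketch below computes per-row means within each group. The choice of rowMeans is only an example and is not part of the original answer; any function that accepts a data.frame could be used instead.

```r
# A smaller version of the example data (values copied from the question).
dat <- data.frame(
  file_001 = c(0.05, 0.01), file_002 = c(0.2, 0.87),
  red_001  = c(0.05, 0.12), red_002  = c(0.3, 0.06)
)
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))

# Apply the same function to every group; the result is again a named list.
groupRowMeans <- lapply(myDfList, rowMeans)

# sapply() simplifies the result to a matrix with one column per group.
groupMeanMatrix <- sapply(myDfList, rowMeans)
```

Keeping the groups in one list means the same code works unchanged whether the split produces three groups or three hundred.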

Answer 2:

Thank you, lmo. Your code didn't work quite as I wanted, but thanks to your guidance I came up with a solution.

So, to split a data frame into a list of data frames by column-name prefix:

myDfList <- split.default(dat, sub("_.*", "", names(dat)))

Hope this helps people in the future!
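
The poster does not say why the first pattern fell short. One plausible reason (an assumption) is that the real column names were less regular than the simplified ones in the question. The sketch below compares the two patterns on hypothetical column names that are not taken from the post.

```r
# Hypothetical column names, invented for this comparison.
cols <- c("file_001", "file_a_002", "block_x1")

# Pattern from answer 1: strips only an underscore followed by digits.
byDigits <- sub("_\\d+", "", cols)

# Pattern from answer 2: strips everything from the first underscore on.
byPrefix <- sub("_.*", "", cols)

byDigits  # "file"  "file_a"  "block_x1"
byPrefix  # "file"  "file"    "block"
```

With the first pattern, file_a_002 and block_x1 end up in their own groups; the second pattern groups strictly by whatever precedes the first underscore, which matches the requirement of grouping by the start of the column name.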

Source: https://stackoverflow.com/questions/47287845/split-data-frame-into-groups-by-column-name

Tags

dataframe

strsplit