Question
I'm new to R. I have a data frame with column names like these:
file_001 file_002 block_001 block_002 red_001 red_002 ... etc.
0.05 0.2 0.4 0.006 0.05 0.3
0.01 0.87 0.56 0.4 0.12 0.06
I want to split them into groups by the column name, to get a result like this:
group_file
file_001 file_002
0.05 0.2
0.01 0.87
group_block
block_001 block_002
0.4 0.006
0.56 0.4
group_red
red_001 red_002
0.05 0.3
0.12 0.06
... etc.
My file is huge and the number of groups isn't known in advance, so the grouping has to be based only on the start of each column name.
Answer 1:
In base R, you can use sub and split.default like this to return a list of data.frames:
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))
This returns:
myDfList
$block
block_001 block_002
1 0.40 0.006
2 0.56 0.400
$file
file_001 file_002
1 0.05 0.20
2 0.01 0.87
$red
red_001 red_002
1 0.05 0.30
2 0.12 0.06
split.default splits a data.frame by variable (that is, by column) according to its second argument. Here, we use sub with the regular expression _\d+ (written "_\\d+" as an R string) to remove the underscore and the digits following it, which returns the splitting values "block", "file", and "red".
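To try this out, here is a minimal reproducible sketch. It assumes the sample values from the question are typed in by hand as dat; the intermediate variable groups is only there for illustration.

```r
# Sample data copied from the question
dat <- data.frame(
  file_001  = c(0.05, 0.01),
  file_002  = c(0.2, 0.87),
  block_001 = c(0.4, 0.56),
  block_002 = c(0.006, 0.4),
  red_001   = c(0.05, 0.12),
  red_002   = c(0.3, 0.06)
)

# Grouping key: each column name with its "_<digits>" part removed
groups <- sub("_\\d+", "", names(dat))

# split.default treats the data frame as a list of columns,
# so each group keeps the columns that share a key
myDfList <- split.default(dat, groups)

names(myDfList)
```

The groups come back in alphabetical order of the key, not in the original column order.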
As a side note, it is typically a good idea to keep these data.frames in a list and work with them through functions like lapply. See gregor's answer to this post for some motivating examples.
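As a rough illustration of working with the list through lapply, the sketch below computes row means within each group. The choice of rowMeans is just an example, not something the question asks for, and the data is a cut-down copy of the sample from the question.

```r
# Two of the groups from the question, rebuilt by hand
dat <- data.frame(
  file_001  = c(0.05, 0.01),
  file_002  = c(0.2, 0.87),
  block_001 = c(0.4, 0.56),
  block_002 = c(0.006, 0.4)
)

# One data frame per column-name prefix
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))

# Apply the same function to every group in one call
groupMeans <- lapply(myDfList, rowMeans)

groupMeans$file
```

Because the groups stay in one list, the same call keeps working however many prefixes the real file turns out to have.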
Answer 2:
Thank you lmo. Your code didn't work quite as I wanted, but thanks to your guidance I came up with a solution.
So, to split the data frame into a list of data frames:
myDfList <- split.default(dat, sub(x = as.character(names(dat)), pattern = "\\_.*", ""))
Hope it helps people in the future!
Source: https://stackoverflow.com/questions/47287845/split-data-frame-into-groups-by-column-name
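The post doesn't say why the first pattern fell short. One possible reason is that the two patterns behave differently when a name contains more than one underscore or has text after the digits. The column names below are hypothetical, made up only to show the difference, and the second pattern is written without the redundant backslash.

```r
# Hypothetical column names, not from the question
nms <- c("file_001_raw", "my_file_001")

# Pattern from the first answer: drops only the first "_<digits>" run
keyDigits <- sub("_\\d+", "", nms)

# Pattern from the second answer: drops everything from the first underscore on
keyPrefix <- sub("_.*", "", nms)

keyDigits  # "file_raw" "my_file"
keyPrefix  # "file"     "my"
```

For names shaped exactly like those in the question (prefix_digits), both patterns give the same keys.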