Read csv with two headers into a data.frame

时间秒杀一切 提交于 2019-11-29 15:48:57

Use base R reshape():

temp = read.delim(text="a,,,b,,
x,y,z,x,y,z
10,1,5,22,1,6
12,2,6,21,3,5
12,2,7,11,3,7
13,1,4,33,2,8
12,2,5,44,1,9", header=TRUE, skip=1, sep=",")
names(temp)[1:3] = paste0(names(temp[1:3]), ".0")
OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
OUT
#     time  x y z id
# 1.0    0 10 1 5  1
# 2.0    0 12 2 6  2
# 3.0    0 12 2 7  3
# 4.0    0 13 1 4  4
# 5.0    0 12 2 5  5
# 1.1    1 22 1 6  1
# 2.1    1 21 3 5  2
# 3.1    1 11 3 7  3
# 4.1    1 33 2 8  4
# 5.1    1 44 1 9  5

Basically, you should just skip the first row, where there are the letters a-g every third column. Since the sub-column names are all the same, R will automatically append a grouping number after all of the columns after the third column; so we need to add a grouping number to the first three columns.

You can either then create an "id" variable, or, as I've done here, just use the row names for the IDs.

You can change the "time" variable to your "cell" variable as follows:

# Change the following to the number of levels you actually have
OUT$cell = factor(OUT$time, labels=letters[1:2])

Then, drop the "time" column:

OUT$time = NULL

Update

To answer a question in the comments below, if the first label was something other than a letter, this should still pose no problem. The sequence I would take would be as follows:

temp = read.csv("path/to/file.csv", skip=1, stringsAsFactors = FALSE)
GROUPS = read.csv("path/to/file.csv", header=FALSE, 
                  nrows=1, stringsAsFactors = FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
names(temp)[1:3] = paste0(names(temp[1:3]), ".0")
OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
OUT$cell = factor(temp$time, labels=GROUPS)
OUT$time = NULL
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!