Read csv with two headers into a data.frame

轮回少年 2020-12-19 19:20

Apologies for the seemingly simple question, but I can't seem to find a solution to the following re-arrangement problem.

I'm used to using read.csv

1 Answer
  • 2020-12-19 20:07

    Use base R reshape():

    temp = read.delim(text="a,,,b,,
    x,y,z,x,y,z
    10,1,5,22,1,6
    12,2,6,21,3,5
    12,2,7,11,3,7
    13,1,4,33,2,8
    12,2,5,44,1,9", header=TRUE, skip=1, sep=",")
    names(temp)[1:3] = paste0(names(temp[1:3]), ".0")
    OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
    OUT
    #     time  x y z id
    # 1.0    0 10 1 5  1
    # 2.0    0 12 2 6  2
    # 3.0    0 12 2 7  3
    # 4.0    0 13 1 4  4
    # 5.0    0 12 2 5  5
    # 1.1    1 22 1 6  1
    # 2.1    1 21 3 5  2
    # 3.1    1 11 3 7  3
    # 4.1    1 33 2 8  4
    # 5.1    1 44 1 9  5
    

    Basically, skip the first row, which holds the group labels (the letters a-g, one every third column). Because the sub-column names repeat, R makes them unique when it reads the file by appending a numeric suffix to each repeat after the third column (x.1, y.1, z.1, and so on). The first three columns get no suffix, so we add ".0" to them ourselves; reshape() can then split every column name into a variable name and a group number.
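
    To see the renaming at work, here is a minimal sketch on a shortened copy of the sample data. The names csv_lines and raw_names are only illustrative, and the data is passed as a character vector (one element per line) so that no file is needed:

    ```r
    # Shortened copy of the sample data, one element per line
    csv_lines <- c("a,,,b,,",
                   "x,y,z,x,y,z",
                   "10,1,5,22,1,6",
                   "12,2,6,21,3,5")

    # Skip the label row; read.csv makes the repeated names unique
    temp <- read.csv(text = csv_lines, skip = 1)
    raw_names <- names(temp)
    print(raw_names)

    # Give the first group the same "name.number" shape as the others
    names(temp)[1:3] <- paste0(names(temp)[1:3], ".0")
    print(names(temp))
    ```

    The first print should show x, y, z followed by x.1, y.1, z.1; after the rename, every name ends in a group number that reshape() can pick up.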

    You can either then create an "id" variable, or, as I've done here, just use the row names for the IDs.
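
    If you would rather have an explicit "id" variable, one possible sketch is below. The id column and the helper name wide_cols are my own choices for illustration, not something the data requires:

    ```r
    # Shortened copy of the sample data, one element per line
    csv_lines <- c("a,,,b,,",
                   "x,y,z,x,y,z",
                   "10,1,5,22,1,6",
                   "12,2,6,21,3,5",
                   "12,2,7,11,3,7")

    temp <- read.csv(text = csv_lines, skip = 1)
    names(temp)[1:3] <- paste0(names(temp)[1:3], ".0")

    # Explicit id column (hypothetical name) instead of the row names
    temp$id <- seq_len(nrow(temp))

    # Every column except "id" varies by group
    wide_cols <- setdiff(names(temp), "id")
    OUT <- reshape(temp, direction = "long", idvar = "id", varying = wide_cols)
    print(OUT)
    ```

    An explicit id is handy when the rows already carry a meaningful key, since you can put that key in the id column instead of a running number.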

    You can change the "time" variable to your "cell" variable as follows:

    # Change the following to the number of levels you actually have
    OUT$cell = factor(OUT$time, labels=letters[1:2])
    

    Then, drop the "time" column:

    OUT$time = NULL
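
    If you don't want to hard-code the number of levels, you can count the groups from the reshaped data. This is only a sketch, using a made-up three-group version of the sample data, and it assumes the groups should simply be labelled a, b, c, ... in column order:

    ```r
    # Made-up sample with three groups (a, b, c), one element per line
    csv_lines <- c("a,,,b,,,c,,",
                   "x,y,z,x,y,z,x,y,z",
                   "10,1,5,22,1,6,30,2,4",
                   "12,2,6,21,3,5,31,1,3")

    temp <- read.csv(text = csv_lines, skip = 1)
    names(temp)[1:3] <- paste0(names(temp)[1:3], ".0")
    OUT <- reshape(temp, direction = "long", ids = rownames(temp),
                   varying = 1:ncol(temp))

    # Count the groups instead of typing the number in by hand
    n_groups <- length(unique(OUT$time))
    OUT$cell <- factor(OUT$time, labels = letters[seq_len(n_groups)])
    OUT$time <- NULL
    print(OUT)
    ```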
    

    Update

    To answer a question in the comments below: if the labels in the first row are something other than letters, this still poses no problem. Read the labels from the first row of the file and use them as the factor labels. The sequence I would follow is:

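    # Read the data, skipping the first row (the group labels)
    temp = read.csv("path/to/file.csv", skip=1, stringsAsFactors = FALSE)
    # Read only the first row to recover the group labels
    GROUPS = read.csv("path/to/file.csv", header=FALSE, 
                      nrows=1, stringsAsFactors = FALSE)
    # Keep the non-empty cells; the blank ones are read as NA
    GROUPS = GROUPS[!is.na(GROUPS)]
    names(temp)[1:3] = paste0(names(temp)[1:3], ".0")
    OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
    # Label the groups from OUT$time (temp has no "time" column)
    OUT$cell = factor(OUT$time, labels=GROUPS)
    OUT$time = NULL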
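
    To try the same sequence without a file on disk, here is a self-contained sketch that reads from an in-memory character vector. The labels "ctrl" and "treated" are invented for the example, and the code assumes the labels appear in the first row in the same left-to-right order as the column groups:

    ```r
    # Invented sample: non-letter group labels in the first row
    csv_lines <- c("ctrl,,,treated,,",
                   "x,y,z,x,y,z",
                   "10,1,5,22,1,6",
                   "12,2,6,21,3,5")

    # Data without the label row
    temp <- read.csv(text = csv_lines, skip = 1, stringsAsFactors = FALSE)

    # Label row only; drop the cells that are empty or NA
    first_row <- read.csv(text = csv_lines, header = FALSE, nrows = 1,
                          stringsAsFactors = FALSE)
    GROUPS <- as.character(unlist(first_row))
    GROUPS <- GROUPS[!is.na(GROUPS) & nzchar(GROUPS)]

    names(temp)[1:3] <- paste0(names(temp)[1:3], ".0")
    OUT <- reshape(temp, direction = "long", ids = rownames(temp),
                   varying = 1:ncol(temp))

    # time is 0, 1, ... in column order, so the labels line up by position
    OUT$cell <- factor(OUT$time, labels = GROUPS)
    OUT$time <- NULL
    print(OUT)
    ```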
    