Reading two-line headers in R

前端 未结 5 747
梦如初夏
梦如初夏 2020-11-29 05:22

What is the best way to read a file into R when the header has two necessary lines for the header?

This happens to me all the time, as people often use one line for

相关标签:
5条回答
  • 2020-11-29 05:40

    Almost the same method to the other answers, just shortening to 2 statements:

    dat <- "trt   biomass    yield
    crop    Mg/ha    bu/ac
    C2      17.76   205.92
    C2      17.96   207.86
    CC      17.72   197.22
    CC      18.42   205.20
    CCW     18.15   200.51
    CCW     17.45   190.59
    P       3.09    0.00
    P       3.34    0.00
    S2      5.13    49.68
    S2      5.36    49.72"
    
    header <- sapply(read.table(text=dat, nrow=2), paste, collapse="_")
    result <- read.table(text=dat, skip=2, col.names=header)
    

    Result:

    > head(result,2)
      trt_crop biomass_Mg/ha yield_bu/ac
    1       C2         17.76      205.92
    2       C2         17.96      207.86
    ...
    
    0 讨论(0)
  • 2020-11-29 05:42

    Use readLines with 2 for the limit, parse it, paste0 them together, then read in with read.table with skip =2 and header=FALSE (the default). Finish the process off with assignment of the column names:

    dat <- "trt biomass yield
     crop   Mg/ha   bu/ac
     C2 17.76   205.92
     C2 17.96   207.86
     CC 17.72   197.22
     CC 18.42   205.20
     CCW    18.15   200.51
     CCW    17.45   190.59
     P  3.09    0.00
     P  3.34    0.00
     S2 5.13    49.68
     S2 5.36    49.72
     "
    

    You would probably use a file argument but using the text argument to the read-functions makes this more self-contained:

     readLines(textConnection(dat),n=2)
    #[1] "trt\tbiomass\tyield" "crop\tMg/ha\tbu/ac" 
     head2 <- read.table(text=readLines(textConnection(dat),n=2), sep="\t", stringsAsFactors=FALSE)
     with(head2, paste0(head2[1,],head2[2,]) )
    # [1] "trtcrop"      "biomassMg/ha" "yieldbu/ac"  
     joinheadrs <- with(head2, paste0(head2[1,],head2[2,]) )
    
    newdat <- read.table(text=dat, sep="\t",skip=2)
    colnames(newdat)<- joinheadrs
    #-------------------
    > newdat
       trtcrop biomassMg/ha yieldbu/ac
    1       C2        17.76     205.92
    2       C2        17.96     207.86
    3       CC        17.72     197.22
    4       CC        18.42     205.20
    5      CCW        18.15     200.51
    6      CCW        17.45     190.59
    7        P         3.09       0.00
    8        P         3.34       0.00
    9       S2         5.13      49.68
    10      S2         5.36      49.72
    

    Might be better to use paste with an underscore-sep:

    joinheadrs <- with(head2, paste(head2[1,],head2[2,] ,sep="_")  )
    joinheadrs
    #[1] "trt_crop"      "biomass_Mg/ha" "yield_bu/ac"  
    
    0 讨论(0)
  • 2020-11-29 05:45

    A slightly different explained step by step approach:

    1. Read only the first two lines of the files as data (without headers):

      headers <- read.table("data.txt", nrows=2, header=FALSE)
      
    2. Create the headers names with the two (or more) first rows, sappy allows to make operations over the columns (in this case paste) - read more about sapply here :

      headers_names <- sapply(headers,paste,collapse="_")
      
    3. Read the data of the files (skipping the first 2 rows):

      data <- read.csv(file="data.txt", skip = 2, header=FALSE)
      
    4. And assign the headers of step two to the data:

      names(data) <- headers_names
      

    The advantage is that you would have clear control of the the parameters of read.table (such as sep for commas, and stringAsFactors - for both the headers and the data)

    0 讨论(0)
  • 2020-11-29 05:59

    I would do two steps, assuming we know that the first row contains the labels, and there are always two headers.

    header <- scan("file.txt", nlines = 1, what = character())
    data <- read.table("file.txt", skip = 2, header = FALSE)
    

    Then add the character vector header on as the names component:

    names(data) <- header
    

    For your data this would be

    header <- scan("data.txt", nlines = 1, what = character())
    data <- read.table("data.txt", skip = 2, header = FALSE)
    names(data) <- header
    
    head(data)
    
    >     head(data)
      trt biomass  yield
    1  C2   17.76 205.92
    2  C2   17.96 207.86
    3  CC   17.72 197.22
    4  CC   18.42 205.20
    5 CCW   18.15 200.51
    6 CCW   17.45 190.59
    

    If you want the units, as per @DWin's answer, then do a second scan() on line 2

    header2 <- scan("data.txt", skip = 1, nlines = 1, what = character())
    names(data) <- paste0(header, header2)
    
    > head(data)
      trtcrop biomassMg/ha yieldbu/ac
    1      C2        17.76     205.92
    2      C2        17.96     207.86
    3      CC        17.72     197.22
    4      CC        18.42     205.20
    5     CCW        18.15     200.51
    6     CCW        17.45     190.59
    
    0 讨论(0)
  • 2020-11-29 06:00

    Here is a function to read in headers over multiple lines, largely based on Gavin Simpson's excellent answer.

    The function defaults to comma separated values and two lines of header, and returns a data.frame with the first line in the file as header.

    The function:

    read.multi.line.header <- function(path, header.lines = 2, sep = ","){
    
      header <- scan(path, nlines = 1, what = character(), sep = sep)
    
      data <- read.table(path, skip = header.lines, header = FALSE, sep = sep)
    
      base::names(data) <- header
    
      return(data)
    }
    

    Produces:

    mydata <- read.multi.line.header(path = "data.txt")
    
    > head(mydata)
          trt      biomass      yield
    1      C2        17.76     205.92
    2      C2        17.96     207.86
    3      CC        17.72     197.22
    4      CC        18.42     205.20
    5     CCW        18.15     200.51
    6     CCW        17.45     190.59
    
    0 讨论(0)
提交回复
热议问题