R - Reading lines from a .txt-file after a specific line

心在旅途 2020-12-10 16:55

I have a bunch of output .txt files that consist of a large parameter list and an X-Y coordinate set. I need to extract these coordinates from all files so that only those l

3 Answers
  • 2020-12-10 17:09

    1) read.pattern. read.pattern in the gsubfn package reads only the lines that match a given pattern. In this example the pattern matches: beginning of line, optional spaces, one or more digits, one or more spaces, an optional minus sign followed by one or more digits, optional spaces, end of line. The parts of each line that match the parenthesized groups of the regexp are returned as the columns of a data.frame. In this self-contained example, text = Lines can be replaced with "myfile.txt", say, if the data comes from a file. Modify the pattern to suit.

    Lines <- "junk
    junk
    ##XYDATA= (X++(Y..Y))
    131071    -2065
    131070    -4137
    131069    -6408
    131068    -8043"
    
    library(gsubfn)
    DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")
    

    giving:

    > DF
          V1    V2
    1 131071 -2065
    2 131070 -4137
    3 131069 -6408
    4 131068 -8043
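
    For readers without gsubfn, the same filtering idea can be sketched in base R. This is an assumed rough equivalent, not how read.pattern is implemented: it keeps whole lines that match the pattern and lets read.table split them, rather than returning the capture groups. The names sample_lines and DF2 are made up for the illustration.

```r
# Sample input mirroring the Lines example above (illustrative data only)
sample_lines <- c("junk", "junk", "##XYDATA= (X++(Y..Y))",
                  "131071    -2065", "131070    -4137")

# Same regexp as in the read.pattern call
pattern <- "^ *(\\d+) +(-?\\d+) *$"

# Keep only the lines that match the whole pattern
xy_lines <- grep(pattern, sample_lines, value = TRUE, perl = TRUE)

# Let read.table split the surviving lines on whitespace
DF2 <- read.table(text = xy_lines, col.names = c("V1", "V2"))
print(DF2)
```

    To read from a file instead, sample_lines could be replaced with readLines("myfile.txt").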
    

    2) read twice. Another possibility, using only base R, is to read the input once to determine the value of skip= and a second time to do the actual read with that value. To read from a file myfile.txt, replace both text = Lines and textConnection(Lines) with "myfile.txt".

    read.table(text = Lines, 
        skip = grep("##XYDATA=", readLines(textConnection(Lines))))
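
    Since the question mentions a whole set of output files, the read-twice idea could be wrapped in a small helper and applied to each file. The following is only a sketch under assumptions: the helper name read_xy, the column names X and Y, and the throwaway demo files are invented here, and the marker is taken from the example above.

```r
# Hypothetical helper: find the marker line, then skip up to and including it
read_xy <- function(path, marker = "##XYDATA=") {
  hit <- grep(marker, readLines(path), fixed = TRUE)   # first pass
  if (length(hit) == 0) stop("marker not found in ", path)
  read.table(path, skip = hit[1], col.names = c("X", "Y"))  # second pass
}

# Two throwaway files stand in for the real output files
demo_dir <- tempfile("xydemo")
dir.create(demo_dir)
for (i in 1:2) {
  writeLines(c("param1 = 1", "param2 = 2", "##XYDATA= (X++(Y..Y))",
               "131071    -2065", "131070    -4137"),
             file.path(demo_dir, paste0("run", i, ".txt")))
}

# Apply the helper to every .txt file in the directory
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
all_xy <- lapply(files, read_xy)
names(all_xy) <- basename(files)
str(all_xy)
```

    The result is a named list with one data.frame per file; do.call(rbind, all_xy) would stack them if a single table is wanted.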
    

    Added: some revisions and a second approach.

  • 2020-12-10 17:25

    A possible approach could be the following:

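    conn <- file("file.txt", open = "rt")
    # Consume one line at a time until the marker line has been read
    repeat {
      line <- readLines(conn, n = 1)
      if (length(line) == 0) stop("marker 'coordinatesXY' not found")  # reached end of file
      if (grepl("coordinatesXY", line, fixed = TRUE)) break
    }
    # read.table continues from the current position of the open connection;
    # add any further read.table arguments (header, sep, etc.) as needed
    ret <- read.table(conn)
    close(conn)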
    

    You read one line at a time from the input file and stop when you find the indicator string. Then you read the rest of the file through read.table. With this approach you don't store the entire file in memory, just the piece you need.
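
    The same idea can be packaged as a function so it is easy to try on a small file. This is a sketch under assumptions: the function name read_after_marker, the column names and the temporary demo file are invented for illustration, and the marker string is the one used above.

```r
# Hypothetical helper: advance an open connection past the marker line,
# then hand the connection to read.table
read_after_marker <- function(path, marker, ...) {
  conn <- file(path, open = "rt")
  on.exit(close(conn))                      # always release the connection
  repeat {
    line <- readLines(conn, n = 1)
    if (length(line) == 0) stop("marker not found: ", marker)
    if (grepl(marker, line, fixed = TRUE)) break
  }
  read.table(conn, ...)                     # reads from the current position
}

# Small throwaway file standing in for a real output file
tmp <- tempfile(fileext = ".txt")
writeLines(c("param1 = 1", "param2 = 2", "coordinatesXY",
             "131071 -2065", "131070 -4137"), tmp)

ret <- read_after_marker(tmp, "coordinatesXY", col.names = c("X", "Y"))
print(ret)
unlink(tmp)
```

    Extra arguments are passed through to read.table, so header or sep settings can be supplied at the call site.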

  • 2020-12-10 17:28

    This looks like a job for data.table's fread

    library(data.table)
    impcoord <- fread("file.txt",skip="coordinatesXY")
    

    --edit--

    That is why it is good to give a reproducible example. That error means your file is causing trouble.

    The skip argument searches the file for the text you give it to identify the line to start at, so you need to give it a unique string from the start of the line that you want reading to begin from. It would work for something like this:

    ## some random text
    ## some more random text
    ## More random text
    table_heading1, table_heading2, table_heading3 ...etc
    value1, value2, value3 ... etc
    etc
    
    Just_The_Table <- fread("the_above_as_a_text_file.txt", skip = "table_heading1", header = TRUE)
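
    A small self-contained check of this behaviour might look like the following. It assumes the data.table package is installed; the file contents and the column names x and y are invented for the demonstration.

```r
library(data.table)

# Throwaway file: comment lines, then a header row, then the data
tmp <- tempfile(fileext = ".txt")
writeLines(c("## some random text",
             "## some more random text",
             "x,y",
             "131071,-2065",
             "131070,-4137",
             "131069,-6408"), tmp)

# skip = "x,y" makes fread start reading at the line containing that string,
# which is then used as the header row
xy <- fread(tmp, skip = "x,y", header = TRUE)
print(xy)
unlink(tmp)
```

    Because the matched line itself is the first line read, the string should come from the header row (or from the first data row with header = FALSE), not from a marker line that sits above the table.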
    