Manipulating multiple files in R

没有蜡笔的小新 2020-12-13 11:41

I am new to R and am looking for code to manipulate hundreds of files that I have at hand. They are .txt files with a few rows of unwanted text, followed by columns of dat

1 Answer
  • 2020-12-13 12:10

    Alright - I think I hit on all your questions here, but let me know if I missed something. The general process that we will go through here is:

    1. Identify all of the files that we want to read in and process in our working directory
    2. Use lapply to iterate over each of those file names to create a single list object that contains all of the data
    3. Select your columns of interest
    4. Merge them together by the common column

    For the purposes of the example, suppose I have four files named file1.txt through file4.txt that all look like this:

        x           y          y2
    1   1  2.44281173 -2.32777987
    2   2 -0.32999022 -0.60991623
    3   3  0.74954561  0.03761497
    4   4 -0.44374491 -1.65062852
    5   5  0.79140012  0.40717932
    6   6 -0.38517329 -0.64859906
    7   7  0.92959219 -1.27056731
    8   8  0.47004041  2.52418636
    9   9 -0.73437337  0.47071120
    10 10  0.48385902  1.37193941
    
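    To try the steps on throwaway data, here is a minimal sketch that writes four files shaped like the listing above. The directory name `exampleDir`, the seed, and the random values are assumptions made for illustration; they are not the data behind the results shown in this answer.

```r
# Write four small example files into a temporary directory.
# exampleDir and the seed are hypothetical choices for this sketch.
set.seed(42)
exampleDir <- file.path(tempdir(), "merge_example")
dir.create(exampleDir, showWarnings = FALSE)

for (i in 1:4) {
  dat <- data.frame(x = 1:10, y = rnorm(10), y2 = rnorm(10))
  # Default row.names = TRUE gives a leading row-number column, so the
  # header line has one field fewer than the data lines.
  write.table(dat, file.path(exampleDir, sprintf("file%d.txt", i)),
              quote = FALSE)
}

list.files(exampleDir, pattern = "file.*\\.txt$")
```

    Either call `setwd(exampleDir)` before running the steps, or pass `path = exampleDir, full.names = TRUE` to `dir()`.
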
    ##1. identify files to read in
    filesToProcess <- dir(pattern = "file.*\\.txt$")
    filesToProcess
    # [1] "file1.txt" "file2.txt" "file3.txt" "file4.txt"
    
    
    ##2. Iterate over each of those file names with lapply
    listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE))
    
    ##3. Select columns x and y2 from each of the objects in our list
    listOfFiles <- lapply(listOfFiles, function(z) z[c("x", "y2")])
    
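    ##NOTE: you can combine steps 2 and 3 by passing the colClasses argument to read.table;
    #a column given the class "NULL" is skipped. colClasses is specified per column of the
    #file, so a leading row-names column, if your files have one, needs an entry of its own.
    #For a file with just the columns x, y, y2, that code would be:
    listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE
      , colClasses = c("integer","NULL","numeric")))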
    
    ##4. Merge all of the objects in the list together with Reduce.
    # x is the common column to join on
    out <- Reduce(function(x,y) {merge(x,y, by = "x")}, listOfFiles)
    #clean up the column names (anchor the pattern so only the extension is removed)
    colnames(out) <- c("x", sub("\\.txt$", "", filesToProcess))
    
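    The question mentions a few rows of unwanted text at the top of each file, which the example files here do not have. One way to handle that is the `skip` argument of `read.table`. The sketch below is an illustration only: the three junk lines, their wording, and the value of `nSkip` are assumptions, so adjust the count to match your real files.

```r
# Hypothetical file: three unwanted lines, then a header and the data.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Instrument report",
  "Operator: unknown",
  "----",
  "x y y2",
  "1 0.5 -1.2",
  "2 1.5 0.3"
), tmp)

# Number of leading lines to discard (an assumption for this sketch).
nSkip <- 3

# skip is applied before the header line is read.
dat <- read.table(tmp, header = TRUE, skip = nSkip)
dat
```

    The same argument can be added to the `read.table` call inside the `lapply` in step 2, provided every file has the same number of unwanted rows.
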

    This results in the following:

    > out
        x       file1        file2       file3        file4
    1   1 -2.32777987 -0.671934857 -2.32777987 -0.671934857
    2   2 -0.60991623 -0.822505224 -0.60991623 -0.822505224
    3   3  0.03761497  0.049694686  0.03761497  0.049694686
    4   4 -1.65062852 -1.173863215 -1.65062852 -1.173863215
    5   5  0.40717932  1.189763270  0.40717932  1.189763270
    6   6 -0.64859906  0.610462808 -0.64859906  0.610462808
    7   7 -1.27056731  0.928107752 -1.27056731  0.928107752
    8   8  2.52418636 -0.856625895  2.52418636 -0.856625895
    9   9  0.47071120 -1.290480033  0.47071120 -1.290480033
    10 10  1.37193941 -0.235659079  1.37193941 -0.235659079
    
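    A possible variation on step 4: because every data frame carries a column called y2, repeated merges rely on the `.x`/`.y` suffixes and can leave duplicated intermediate column names before the final renaming. The sketch below renames y2 to the file's base name before merging instead. The three small data frames and `fileNames` are made-up stand-ins for `listOfFiles` and `filesToProcess`.

```r
# Made-up stand-ins for listOfFiles and filesToProcess.
listOfFiles <- list(
  data.frame(x = 1:3, y2 = c(0.1, 0.2, 0.3)),
  data.frame(x = 1:3, y2 = c(1.1, 1.2, 1.3)),
  data.frame(x = 1:3, y2 = c(2.1, 2.2, 2.3))
)
fileNames <- c("file1.txt", "file2.txt", "file3.txt")

# Rename y2 in each data frame to the file name without its extension.
renamed <- Map(function(df, f) {
  names(df)[names(df) == "y2"] <- tools::file_path_sans_ext(basename(f))
  df
}, listOfFiles, fileNames)

# Merge on the common column; names are already unique, so no suffixes appear.
out <- Reduce(function(a, b) merge(a, b, by = "x"), renamed)
names(out)
```

    Note that `merge` keeps only rows whose x value appears in both inputs by default; pass `all = TRUE` if the files may not share every x value and you want to keep unmatched rows.
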