Combining tab delim files into a single file using R

烈酒焚心 提交于 2019-12-19 03:42:14

问题


I have several txt files with 3 columns in each files like this: file 1:

ProbeID X_Signal_intensity X_P-Value   
xxx         2.34          .89
xxx         6.45          .04 
xxx         1.09          .91  
xxx         5.87          .70
.            .            . 
.            .            .
.            .            .     

file 2:

ProbeID Y_Signal_intensity Y_P-Value   
xxx         1.4             .92
xxx         2.55            .14 
xxx         4.19            .16  
xxx         3.47            .80
.            .               . 
.            .               .
.            .               . 

file 3:

ProbeID Z_Signal_intensity Z_P-Value   
xxx         9.40             .82
xxx         1.55            .04 
xxx         3.19            .56  
xxx         2.47            .90
.            .               . 
.            .               .
.            .               . 

In all the above files the values of ProbeID column are identical but not the other columns.Now I want to combine the all the above files using a for-loop into a single file like this:

ProbeID X_intensity X_P-Value   Y_intensity Y_P-Value   Z_intensity Z_P-Value     
xxx      2.34          .89       1.4             .92     9.40            .82
xxx      6.45          .04       2.55            .14     1.55            .04
xxx      1.09          .91       4.19            .16     3.19            .56
xxx      5.87          .70       3.47            .80     2.47            .90

Please do help me.


回答1:


My approach is to read the files into data.frames

see help(read.delim) for reading modes.

After you have your three data.frames you can use

total <- merge(dataframeA,dataframeB,by="ProbeID")

look here http://www.statmethods.net/management/merging.html for documentation.




回答2:


Read in the files as given by Richie Cotton, but make sure you add the appropriate extra arguments in the apply call. For one, header=TRUE should probably be added.

file.names <- c("file X.txt", "file Y.txt", "file Z.txt")
file.list <- lapply(file.names, read.table, header=TRUE)

Then you'll probably need a merge_recurse from the reshape package :

require(reshape)
mynewframe <- merge_recurse(file.list,all.x=TRUE,all.y=TRUE,by="ProbeID")

This will work for any given number of dataframes, provided it's not a billion of them. For more information on the arguments used, see the help page of ?merge.

CORRECTION : in merge_recurse, you have to use all.x and all.y as shown in the correction above. You can't just use the shortcut all or you'll get errors.

Small demonstration :

X2 <- data.frame(ProbeID=(2:4),Z2=4:6)
X1 <- data.frame(ProbeID=1:3,Z1=1:3)
X3 <- data.frame(ProbeID=1:3,Z3=7:9)
file.list <- list(X1,X2,X3)
mynewframe <- merge_recurse(file.list,all.x=TRUE,all.y=TRUE,by="ProbeID")
> mynewframe
  ProbeID Z1 Z2 Z3
1       1  1 NA  7
2       2  2  4  8
3       3  3  5  9
4       4 NA  6 NA



回答3:


Read in your files

filenames <- c("file X.txt", "file Y.txt", "file Z.txt")
data_list <- lapply(filenames, read.table)

Combine them into one big data frame

all_data <- do.call(cbind, data_list)

all_data <- do.call(merge, data_list, by = "ProbeID")

This gives a good lesson to "always concentrate when providing an answer". cbind isn't smart enough to do ID matching, and merge isn't smart enough to handle more than two data frames. Take a look at Joris's answer and use merge_recurse instead. Or forget what you thought you wanted and use my other answer below.


Actually, a better idea, rather than having many columns would be to have just 4 columns: ProbeID, Signal_intensity, P_value and Source_file.

data_list <- lapply(data_list, function(x) {
  colnames(x) <- c("ProbeID", "Signal_intensity", "P_value")
  x
})

all_data <- do.call(rbind, data_list)
all_data$Source_file <- rep(filenames, times = sapply(data_list, nrow))



回答4:


I am going to throw another approach into the mix which uses Reduce

Reduce(function(...) merge(..., all = T), file.list)


来源:https://stackoverflow.com/questions/6942662/combining-tab-delim-files-into-a-single-file-using-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!