I would like to merge/combine two files, so that if an entry in column B of my first file falls into the range of columns B and C in my second file, the output will contain
Something like this should do the trick. You could probably make it more concise, but to elucidate all the steps I made it overly obvious.
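# Start with empty vectors; one element is added per matching (i, j) pair.
NewMatrixCol1 <- c()
NewMatrixCol2 <- c()
NewMatrixCol3 <- c()
NewMatrixCol4 <- c()
NewMatrixCol5 <- c()
for (i in seq_along(file1$A)) {
  for (j in seq_along(file2$A)) {
    LowNumber <- file2$B[j]
    HighNumber <- file2$C[j]
    if (LowNumber <= file1$B[i] && file1$B[i] <= HighNumber) {
      # append() returns a new vector, so the result has to be assigned back.
      # as.character() keeps the labels if column A was read in as a factor.
      NewMatrixCol1 <- append(NewMatrixCol1, as.character(file1$A[i]))
      NewMatrixCol2 <- append(NewMatrixCol2, file1$B[i])
      NewMatrixCol3 <- append(NewMatrixCol3, as.character(file2$A[j]))
      NewMatrixCol4 <- append(NewMatrixCol4, file2$B[j])
      NewMatrixCol5 <- append(NewMatrixCol5, file2$C[j])
    }
  }
}
dataframe <- data.frame(Col1 = NewMatrixCol1, Col2 = NewMatrixCol2, Col3 = NewMatrixCol3, Col4 = NewMatrixCol4, Col5 = NewMatrixCol5)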
EDIT1: I misunderstood the question, and am now working on it.
EDIT2: This new solution should work as indicated.
EDIT3: There was a missing ), as indicated by mfk534.
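For what it's worth, the same comparison can probably be done without the explicit loops. Below is a minimal sketch using outer() and which(arr.ind = TRUE) from base R. The file1 and file2 data frames here are hypothetical stand-ins built from the sample values in this thread, and merged is just an illustrative name. One difference to be aware of: the rows come out ordered by the file2 row first, not by the file1 row as in the loop.

```r
# Hypothetical stand-ins for file1 and file2 (assumed column names A, B, C).
file1 <- data.frame(A = c("rs10", "rs100", "rs234"),
                    B = c(23353, 10000, 54440))
file2 <- data.frame(A = c("E235", "E255"),
                    B = c(20000, 50000),
                    C = c(30000, 60000))

# inside[i, j] is TRUE when file1$B[i] lies within [file2$B[j], file2$C[j]].
inside <- outer(file1$B, file2$B, ">=") & outer(file1$B, file2$C, "<=")

# Row and column positions of every TRUE cell, i.e. every matching (i, j) pair.
hits <- which(inside, arr.ind = TRUE)

merged <- data.frame(Col1 = file1$A[hits[, "row"]],
                     Col2 = file1$B[hits[, "row"]],
                     Col3 = file2$A[hits[, "col"]],
                     Col4 = file2$B[hits[, "col"]],
                     Col5 = file2$C[hits[, "col"]])
```

This builds a full length(file1$B) by length(file2$B) logical matrix, so it trades memory for speed and may not suit very large inputs.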
I see you've already accepted an answer, but here is another possible solution.
This function was just hacked together, and could be generalized with some more work.
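myfun = function(DATA1, DATA2, MATCH1, MIN, MAX) {
  # One column per row of DATA2: TRUE where the DATA1 value lies in that range
  temp = sapply(seq_len(nrow(DATA2)),
                function(x) DATA1[[MATCH1]] >= DATA2[[MIN]][x] &
                  DATA1[[MATCH1]] <= DATA2[[MAX]][x])
  # Keep the DATA1 rows that fall in at least one range. This is defined
  # unconditionally so that temp1 exists even when no row is dropped.
  temp1 = DATA1[rowSums(temp) > 0, ]
  # Sort both sides and bind them side by side (the shorter side is recycled)
  OUT = cbind(temp1[order(temp1[[MATCH1]]), ],
              DATA2[order(DATA2[[MIN]]), ], row.names=NULL)
  # Columns 2, 4 and 5 are hard-coded: this assumes DATA1 has two columns
  # and the range sits in the second and third columns of DATA2
  condition = !(OUT[4] <= OUT[2] & OUT[2] <= OUT[5])
  if (isTRUE(any(condition))) {
    OUT[-which(condition), ]
  } else {
    OUT
  }
}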
Here's what the function does:
1. It compares the values in the match column of the first data.frame with the values in the second and third columns of the second data.frame.
2. It finds the rows that are FALSE for both conditions, and removes them from the first data.frame.
3. It sorts the first data.frame by the second column, and the second data.frame by the "min" match column.
4. It binds the two side by side and drops any row where the value still falls outside the range next to it.

Now, here is some sample data. A and B are the same as your provided data. X and Y have been changed for further testing purposes. In the merge between X and Y, there should be only one row.
A = read.table(header=TRUE, text="A B
rs10 23353
rs100 10000
rs234 54440")
B = read.table(header=TRUE, text="A B C
E235 20000 30000
E255 50000 60000")
X = A[c(3, 1, 2), ]
X[1, 2] = 57000
Y = B
Y[2, 3] = 55000
Here's how you would use the function and the output you would get.
myfun(A, B, 2, 2, 3)
# A B A B C
# 1 rs10 23353 E235 20000 30000
# 2 rs234 54440 E255 50000 60000
myfun(X, Y, 2, 2, 3)
# A B A B C
# 1 rs10 23353 E235 20000 30000
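As a cross-check on the output above, here is a rough sketch of a different approach (not what myfun does): build every pairing of rows with merge(..., by = NULL), then keep the pairings where the value is inside the range. The names all_pairs and result are only for illustration. Note that merge() renames the shared columns with its default suffixes, so the columns come out as A.x, B.x, A.y, B.y, C.

```r
A <- read.table(header = TRUE, text = "A B
rs10 23353
rs100 10000
rs234 54440")
B <- read.table(header = TRUE, text = "A B C
E235 20000 30000
E255 50000 60000")

# by = NULL makes merge() return the Cartesian product of the two data frames.
# Shared column names get the default suffixes: A.x, B.x (from A), A.y, B.y (from B).
all_pairs <- merge(A, B, by = NULL)

# Keep only the pairings where the value from A falls inside the range from B.
result <- all_pairs[all_pairs$B.x >= all_pairs$B.y &
                      all_pairs$B.x <= all_pairs$C, ]
rownames(result) <- NULL
```

The intermediate table has nrow(A) * nrow(B) rows, so this is only reasonable for small inputs. Unlike myfun, it does not rely on sorting or on a value matching at most one range.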
UPDATE: This question was more complicated than indicated here. The solution can be found here: Merge by Range in R - Applying Loops, and is delivered by using the GenomicRanges package in Bioconductor. Very useful package!