match two columns with two other columns

既然无缘 2020-12-10 13:47

I have several rows of data (tab separated). I want to find the row which matches elements from two columns (3rd & 4th) in each row with two other colum

3 answers
  •  死守一世寂寞
    2020-12-10 14:51

    The function compare below takes advantage of R's capability for fast sorting. Its arguments a and b are matrices; each row of a is screened for a matching row in b, across any number of columns. If column order is irrelevant, set row_order=TRUE so that the entries within each row are sorted in increasing order before comparing. I guess the function should also work with data frames and character/factor columns, as well as with duplicate entries in a and/or b. Despite using for and while loops, it is relatively quick at returning the first row match in b for each row of a (or 0 if no match is found).

    compare<-function(a,b,row_order=TRUE){
    
        len1<-dim(a)[1]
        len2<-dim(b)[1]
        if(row_order){
            a<-t(apply(t(a), 2, sort))
            b<-t(apply(t(b), 2, sort))
        }
        ord1<-do.call(order, as.data.frame(a))
        ord2<-do.call(order, as.data.frame(b))
        a<-a[ord1,]
        b<-b[ord2,] 
        found<-rep(0,len1)  
        dims<-dim(a)[2]
        do_dims<-c(1:dim(a)[2])
        at<-1
        for(i in 1:len1){
            for(m in do_dims){
                while(b[at,m]<a[i,m]){
                    at<-at+1
                    if(at>len2){break}
                }
                if(at>len2){break}
                if(b[at,m]>a[i,m]){break}
                if(m==dims){found[i]<-at}
            }
            if(at>len2){break}
        }
        return(found[order(ord1)]) # indicates the first match of a found in b and zero otherwise
    
    }
    
    
    # example data sets:
    a <- matrix(sample.int(1E4,size = 1E4, replace = TRUE), ncol = 4)
    b <- matrix(sample.int(1E4,size = 1E4, replace = TRUE), ncol = 4)
    b <- rbind(a,b) # example of b containing a
    
    
    # run the function
    found<-compare(a,b,row_order=TRUE)
    # check
    all(found>0) 
    # rows in a not contained in b (none in this example):
    a[found==0,]
    
