How to use an apply function instead of a for loop when multiple if conditions have to be executed

Submitted by 可紊 on 2019-12-25 11:47:15

Question


1st DF:

t.d
  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  3  8 13 18
4  4  9 14 19
5  5 10 15 20


names(t.d) <- c("ID","A","B","C")

t.d$FinalTime <- c("7/30/2009 08:18:35","9/30/2009 19:18:35","11/30/2009 21:18:35","13/30/2009 20:18:35","15/30/2009 04:18:35")

t.d$InitTime <- c("6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35")

>t.d
  ID  A  B  C           FinalTime          InitTime
1  1  6 11 16  7/30/2009 08:18:35 6/30/2009 9:18:35
2  2  7 12 17  9/30/2009 19:18:35 6/30/2009 9:18:35
3  3  8 13 18 11/30/2009 21:18:35 6/30/2009 9:18:35
4  4  9 14 19 13/30/2009 20:18:35 6/30/2009 9:18:35
5  5 10 15 20 15/30/2009 04:18:35 6/30/2009 9:18:35

2nd DF:

> s.d
   F  D  E                Time
1  10 19 28  6/30/2009 08:18:35
2  11 20 29  8/30/2009 19:18:35
3  12 21 30  9/30/2009 21:18:35
4  13 22 31 01/30/2009 20:18:35
5  14 23 32 10/30/2009 04:18:35
6  15 24 33 11/30/2009 04:18:35
7  16 25 34 12/30/2009 04:18:35
8  17 26 35 13/30/2009 04:18:35
9  18 27 36 15/30/2009 04:18:35

Desired output:

From DF "t.d" I have to calculate the time interval for each row between "FinalTime" and "InitTime" (InitTime will always be less than FinalTime).

Another DF "temp" from "s.d" has to be formed having data only within the above time interval, and then the most recent values of "F","D","E" have to be taken and attached to the 'ith' row of "t.d" from which the time interval was calculated.
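The two steps described above can be sketched for a single row as follows. This is only an illustration under some assumptions: the timestamps are taken to be month/day/year, the helper columns `Init`, `Final` and `When` are names introduced here, and only the sample rows whose timestamps parse in that format are used (the sample rows with month values 13 and 15 would become `NA`).

```r
# Subset of the sample data from the question (rows with parseable timestamps).
t.d <- data.frame(
  ID = 1:3, A = 6:8, B = 11:13, C = 16:18,
  FinalTime = c("7/30/2009 08:18:35", "9/30/2009 19:18:35", "11/30/2009 21:18:35"),
  InitTime  = rep("6/30/2009 9:18:35", 3),
  stringsAsFactors = FALSE
)
s.d <- data.frame(
  F = 10:16, D = 19:25, E = 28:34,
  Time = c("6/30/2009 08:18:35", "8/30/2009 19:18:35", "9/30/2009 21:18:35",
           "01/30/2009 20:18:35", "10/30/2009 04:18:35", "11/30/2009 04:18:35",
           "12/30/2009 04:18:35"),
  stringsAsFactors = FALSE
)

# Assumed format: month/day/year hour:minute:second.
fmt <- "%m/%d/%Y %H:%M:%S"
t.d$Init  <- as.POSIXct(t.d$InitTime,  format = fmt, tz = "UTC")
t.d$Final <- as.POSIXct(t.d$FinalTime, format = fmt, tz = "UTC")
s.d$When  <- as.POSIXct(s.d$Time,      format = fmt, tz = "UTC")

# Length of the interval for every row of t.d, in days.
t.d$IntervalDays <- as.numeric(difftime(t.d$Final, t.d$Init, units = "days"))

# For one row i: keep the s.d rows inside [Init, Final], then take the newest one.
i <- 2
temp <- s.d[s.d$When >= t.d$Init[i] & s.d$When <= t.d$Final[i], ]
latest <- temp[which.max(as.numeric(temp$When)), c("F", "D", "E")]
```

Parsing the timestamps once, outside any loop, matters at this scale: string-to-time conversion repeated for every row is likely to be a large share of the run time.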

We also have to check whether the following condition holds for each row of the newly formed DF "temp" (here 'j' is the row index):

if ((temp$F[j] < 35.5) + (temp$D[j] >= 100) >= 1)
{
  temp$Flag <- 1
} else{
  temp$Flag <- 0
}
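A check like this does not need a loop over 'j'. As a minimal sketch with made-up values for `temp` (the numbers below are hypothetical and not from the question), the same condition can be evaluated for every row in one vectorized step. For non-missing values, "at least one of the two tests is true" is the same as a logical OR:

```r
# Hypothetical values, chosen only to exercise both branches of the condition.
temp <- data.frame(F = c(10, 40, 50), D = c(19, 120, 20))

# One flag per row: 1 if F < 35.5 or D >= 100, otherwise 0.
temp$Flag <- as.integer(temp$F < 35.5 | temp$D >= 100)
```

Further columns that depend on other conditions can be added the same way, one vectorized assignment per column.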

My actual data has 3 million rows in the dataframe and 20 columns in each DF.

I have solved the above problem using a for loop, but it takes 2 to 3 days because there are so many rows.

(Also, what if I have to add new columns to the resulting DF when multiple conditions are satisfied on a row?)

Can anybody suggest a different technique, such as using the apply functions?


Answer 1:


My suggestion is:

  • use lapply over the row indices
  • handle your if branches inside the function
  • return either your dataframe or NULL
  • combine everything with rbind
  • replace lapply with mclapply from the 'parallel' package to have the code executed in parallel.

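    resultList <- lapply(seq_len(nrow(t.d)), function(i) {
        # do stuff for row i (placeholder)
        if (condition) {
            return(df)
        } else {
            return(NULL)
        }
    })
    # rbind skips the NULL entries and stacks the remaining data frames
    resultDF <- do.call(rbind, resultList)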
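
To make the outline above concrete, here is one possible way to fill it in for the question's task. It is a sketch under assumptions rather than a definitive solution: timestamps are assumed to be month/day/year, only the sample rows with parseable timestamps are used, rows of `t.d` with no matching `s.d` rows are dropped by returning NULL, and the names `init`, `final`, `when` and `newest` are introduced here.

```r
t.d <- data.frame(
  ID = 1:3, A = 6:8, B = 11:13, C = 16:18,
  FinalTime = c("7/30/2009 08:18:35", "9/30/2009 19:18:35", "11/30/2009 21:18:35"),
  InitTime  = rep("6/30/2009 9:18:35", 3),
  stringsAsFactors = FALSE
)
s.d <- data.frame(
  F = 10:16, D = 19:25, E = 28:34,
  Time = c("6/30/2009 08:18:35", "8/30/2009 19:18:35", "9/30/2009 21:18:35",
           "01/30/2009 20:18:35", "10/30/2009 04:18:35", "11/30/2009 04:18:35",
           "12/30/2009 04:18:35"),
  stringsAsFactors = FALSE
)

# Parse all timestamps once, before iterating (assumed month/day/year format).
fmt   <- "%m/%d/%Y %H:%M:%S"
init  <- as.POSIXct(t.d$InitTime,  format = fmt, tz = "UTC")
final <- as.POSIXct(t.d$FinalTime, format = fmt, tz = "UTC")
when  <- as.POSIXct(s.d$Time,      format = fmt, tz = "UTC")

resultList <- lapply(seq_len(nrow(t.d)), function(i) {
  # Indices of the s.d rows that fall inside this row's interval.
  idx <- which(when >= init[i] & when <= final[i])
  if (length(idx) == 0) {
    return(NULL)  # nothing in the interval: this row is dropped
  }
  # Most recent s.d row inside the interval.
  newest <- idx[which.max(as.numeric(when[idx]))]
  out <- cbind(t.d[i, ], s.d[newest, c("F", "D", "E")])
  # Flag from the question: 1 if F < 35.5 or D >= 100, otherwise 0.
  out$Flag <- as.integer(out$F < 35.5 | out$D >= 100)
  out
})
resultDF <- do.call(rbind, resultList)
```

On Unix-like systems the `lapply` call can be swapped for `parallel::mclapply(seq_len(nrow(t.d)), ..., mc.cores = 2)`; `mclapply` relies on forking, so on Windows it only runs with `mc.cores = 1`. With millions of rows, calling `rbind` once at the end, as here, is much cheaper than growing a data frame inside the loop.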
    


Source: https://stackoverflow.com/questions/41934056/how-to-use-apply-function-instead-of-for-loop-if-you-have-multiple-if-conditions
