How to avoid a loop to calculate competition index

故里飘歌 2020-12-20 08:56

I have to calculate the so-called competition index for a couple of experiments. The position and size of each object are known. I'd like to calculate the sum of the sizes

2 Answers
  • 2020-12-20 09:17

    I use dplyr and a self-join on exp, then summarise for each (generated) id.

    res <- df %>%
      mutate(id = row_number()) %>%
      merge(df, by = 'exp') %>%
      mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
      filter(dist < 2) %>%
      group_by(id, x.x, y.x, di.x) %>%
      summarise(comp1 = sum(di.y),
                dist  = sum(dist))
    

    This results in:

    Source: local data frame [2,000 x 6]
    Groups: id, x.x, y.x [?]
    
          id       x.x       y.x       di.x      comp1     dist
       <int>     <dbl>     <dbl>      <dbl>      <dbl>    <dbl>
    1      1 127.36166  89.64637 -0.2508979 -0.2508979 0.000000
    2      2  90.98491 153.17911  1.4561061  1.4561061 0.000000
    3      3  58.96620 144.72710  2.7909274  2.7909274 0.000000
    4      4 162.44443 132.35379  3.0175213  3.0175213 0.000000
    5      5 184.52673  47.12997  1.1127618  1.1127618 0.000000
    6      6  57.07334 126.03554 -0.2508979 -0.2508979 0.000000
    7      7  22.28946 110.69319  1.4561061  2.5688679 1.267998
    8      8  40.54007 123.32645  2.7909274  2.7909274 0.000000
    9      9 179.37667  61.45213  3.0175213  3.0175213 0.000000
    10    10  73.82714  67.86194  1.1127618  1.1127618 0.000000
    # ... with 1,990 more rows
    

    PS: looking at the criterion if(dist < 2 & x$exp[i] == x$exp[j]) from the original loop, only a few rows actually satisfy the dist < 2 condition.

  • 2020-12-20 09:21

    Loops like this are a perfect candidate for speeding up with Rcpp. The logic translates across unchanged:

    library(Rcpp)
    
    cppFunction('
    List
    computeIndex(const NumericVector x,
                 const NumericVector y, 
                 const NumericVector di,
                 const CharacterVector ex)
    {
        int n = x.size();
        NumericVector comp1(n), dist(n);
    
        for(int i = 0; i < n; ++i)
        {
            for(int j = 0; j < n; ++j)
            {
                double dx = x[j] - x[i], dy = y[j] - y[i];
                double d = std::sqrt(dx*dx + dy*dy);
    
                if((d < 2) && (ex[i] == ex[j]))
                {
                    comp1[i] += di[j];
                    dist[i] +=  d;
                }
            }
        }
    
        return List::create(Named("comp1") = comp1,
                            Named("dist") = dist);
    }
    ')
    
    res <- data.frame(computeIndex(df$x, df$y, df$di, df$exp))
    

    Not only is this faster than the equivalent R-only code, it also avoids allocating any O(N^2) objects. You can also combine it with dplyr to avoid needless comparisons between rows with different exp values:

    df %>%
        group_by(exp) %>%
        do({
            res <- computeIndex(.$x, .$y, .$di, .$exp)
            data.frame(., res)
        })
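
    For anyone who wants to sanity-check the pairwise logic outside of R, the same double loop can be written as a standalone C++ sketch. The sample points, sizes, and experiment labels below are made up purely for illustration:

    ```cpp
    #include <cmath>
    #include <string>
    #include <vector>

    // Pairwise competition index: for every object i, accumulate the sizes (di)
    // and distances of all objects j in the same experiment within distance 2.
    void computeIndex(const std::vector<double>& x, const std::vector<double>& y,
                      const std::vector<double>& di,
                      const std::vector<std::string>& ex,
                      std::vector<double>& comp1, std::vector<double>& dist) {
        int n = x.size();
        comp1.assign(n, 0.0);
        dist.assign(n, 0.0);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                double dx = x[j] - x[i], dy = y[j] - y[i];
                double d = std::sqrt(dx * dx + dy * dy);
                if (d < 2 && ex[i] == ex[j]) {  // same criterion as the question
                    comp1[i] += di[j];          // j == i is included, so di[i] counts
                    dist[i]  += d;
                }
            }
        }
    }

    int main() {
        // Hypothetical sample: two close points in experiment "a", one far away
        std::vector<double> x  = {0.0, 1.0, 10.0}, y = {0.0, 0.0, 0.0};
        std::vector<double> di = {1.5, 2.5, 3.0};
        std::vector<std::string> ex = {"a", "a", "a"};
        std::vector<double> comp1, dist;
        computeIndex(x, y, di, ex, comp1, dist);
        return 0;
    }
    ```

    With points 0 and 1 one unit apart in the same experiment, each accumulates both sizes (comp1 = 1.5 + 2.5 = 4.0) plus the mutual distance, while the far-away point only counts itself, matching the pattern in the table above where isolated rows have comp1 equal to di.x and dist equal to zero.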
    