I have to calculate a so-called competition index for a couple of experiments. I know the position of each object and its size. I'd like to calculate the sum of the sizes
I use dplyr and a join on exp, then summarise for each (generated) id.
library(dplyr)

res <- df %>%
  mutate(id = row_number()) %>%                           # generated id per row
  merge(df, by = 'exp') %>%                               # self-join within each exp
  mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
  filter(dist < 2) %>%
  group_by(id, x.x, y.x, di.x) %>%
  summarise(comp1 = sum(di.y),
            dist = sum(dist))
This results in:
Source: local data frame [2,000 x 6]
Groups: id, x.x, y.x [?]

      id       x.x       y.x       di.x      comp1     dist
   <int>     <dbl>     <dbl>      <dbl>      <dbl>    <dbl>
 1     1 127.36166  89.64637 -0.2508979 -0.2508979 0.000000
 2     2  90.98491 153.17911  1.4561061  1.4561061 0.000000
 3     3  58.96620 144.72710  2.7909274  2.7909274 0.000000
 4     4 162.44443 132.35379  3.0175213  3.0175213 0.000000
 5     5 184.52673  47.12997  1.1127618  1.1127618 0.000000
 6     6  57.07334 126.03554 -0.2508979 -0.2508979 0.000000
 7     7  22.28946 110.69319  1.4561061  2.5688679 1.267998
 8     8  40.54007 123.32645  2.7909274  2.7909274 0.000000
 9     9 179.37667  61.45213  3.0175213  3.0175213 0.000000
10    10  73.82714  67.86194  1.1127618  1.1127618 0.000000
# ... with 1,990 more rows
PS: looking at the criterion if(dist < 2 & x$exp[i] == x$exp[j]), only a few rows satisfy dist < 2.
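To make the join-then-filter step concrete, here is a minimal base-R sketch on a made-up data frame. The name toy and all of its values are hypothetical and not taken from the question; it only assumes the same columns (exp, x, y, di). It shows how many rows the self-join produces and how few survive the distance filter.

```r
# Hypothetical toy data: two experiments ("a" with 3 objects, "b" with 2)
toy <- data.frame(
  exp = c("a", "a", "a", "b", "b"),
  x   = c(0, 1, 5, 0, 0.5),
  y   = c(0, 0, 0, 0, 0),
  di  = c(1, 2, 3, 4, 5)
)
toy$id <- seq_len(nrow(toy))

# Self-join on exp: every object is paired with every object of the same exp,
# including itself, so the row count is the sum of squared group sizes
joined <- merge(toy, toy, by = "exp")
n_joined <- nrow(joined)   # 3^2 + 2^2 = 13

# Euclidean distance for each pair, then keep only close pairs
joined$dist <- sqrt((joined$x.x - joined$x.y)^2 + (joined$y.x - joined$y.y)^2)
near <- joined[joined$dist < 2, ]
n_near <- nrow(near)       # 9

# Sum of neighbour sizes per focal object (each object counts itself, dist 0)
comp1 <- tapply(near$di.y, near$id.x, sum)
```

Because each object is also joined to itself at distance 0, an object with no neighbours ends up with comp1 equal to its own di, which matches most rows in the output above.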
Loops like this are a perfect candidate for speeding up with Rcpp. The logic translates across unchanged:
library(Rcpp)

cppFunction('
List computeIndex(const NumericVector x,
                  const NumericVector y,
                  const NumericVector di,
                  const CharacterVector ex)
{
    int n = x.size();
    NumericVector comp1(n), dist(n);  // zero-initialised accumulators

    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            double dx = x[j] - x[i], dy = y[j] - y[i];
            double d = std::sqrt(dx*dx + dy*dy);

            // same test as the R loop: within distance 2 and same experiment
            if((d < 2) && (ex[i] == ex[j]))
            {
                comp1[i] += di[j];
                dist[i] += d;
            }
        }
    }

    return List::create(Named("comp1") = comp1,
                        Named("dist") = dist);
}
')

res <- data.frame(computeIndex(df$x, df$y, df$di, df$exp))
Not only is this faster than the equivalent R-only code, but it avoids having to allocate any O(N^2) objects. You can also combine this with dplyr to avoid needless comparisons between rows with different exp values:
df %>%
  group_by(exp) %>%
  do({
    res <- computeIndex(.$x, .$y, .$di, .$exp)
    data.frame(., res)
  })
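For a quick sanity check of any of these versions, a slow plain-R reference can help. The sketch below is an assumption on my part: reference_index and check_df are hypothetical names, and the loop is reconstructed from the criterion dist < 2 within the same exp rather than copied from the question. It is only meant for small inputs.

```r
# Plain-R double loop, O(N^2) time but no O(N^2) memory; for testing only
reference_index <- function(df, cutoff = 2) {
  n <- nrow(df)
  comp1 <- numeric(n)
  dist  <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- sqrt((df$x[j] - df$x[i])^2 + (df$y[j] - df$y[i])^2)
      # accumulate only close pairs from the same experiment (self included)
      if (d < cutoff && df$exp[i] == df$exp[j]) {
        comp1[i] <- comp1[i] + df$di[j]
        dist[i]  <- dist[i] + d
      }
    }
  }
  data.frame(comp1 = comp1, dist = dist)
}

# Hypothetical data: rows 1 and 3 share a position but belong to different exp
check_df <- data.frame(
  exp = c("a", "a", "b", "b"),
  x   = c(0, 1, 0, 3),
  y   = c(0, 0, 0, 0),
  di  = c(1, 2, 4, 8)
)

ref <- reference_index(check_df)
# ref$comp1 is 3, 3, 4, 8 and ref$dist is 1, 1, 0, 0
```

If computeIndex has been compiled, comparing data.frame(computeIndex(check_df$x, check_df$y, check_df$di, check_df$exp)) against ref with all.equal should return TRUE.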