Choose n most evenly spread points across point dataset in R

问题

Given a set of points, I am trying to select a subset of n points that are most evenly distributed across this set of points. In other words, I am trying to thin out the dataset while still evenly sampling across space.

So far, I have the following, but this approach likely won't do well with larger datasets. Maybe there is a more intelligent way to choose the subset of points in the first place... The following code randomly chooses a subset of the points, and seeks to minimize the distance between the points within this subset and the points outside of this subset.

Suggestions appreciated!

evenSubset <- function(xy, n) {

    bestdist <- NA
    bestSet <- NA
    alldist <- as.matrix(dist(xy))
    diag(alldist) <- NA
    alldist[upper.tri(alldist)] <- NA
    for (i in 1:1000){
        subset <- sample(1:nrow(xy),n)
        subdists <- alldist[subset,-subset]
        distsum <- sum(subdists,na.rm=T)
        if (distsum < bestdist | is.na(bestdist)) {
            bestdist <- distsum
            bestSet <- subset
        }
    }
    return(xy[bestSet,])
}

xy2 <- evenSubset(xy=cbind(rnorm(1000),rnorm(1000)), n=20)
plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)

回答1:

Following @Spacedman's suggestion, I used voronoi tesselation to identify and drop those points that were closest to other points.

Here, the percentage of points to drop is given to the function. This appears to work quite well, except for the fact that it is slow with large datasets.

library(tripack)
voronoiFilter <- function(occ,drop) {
    n <- round(x=(nrow(occ) * drop),digits=0)
    subset <- occ
    dropped <- vector()
    for (i in 1:n) {
        v <- voronoi.mosaic(x=subset[,'Longitude'],y=subset[,'Latitude'],duplicate='error')
        info <- cells(v)
        areas <- unlist(lapply(info,function(x) x$area))
        smallest <- which(areas == min(areas,na.rm=TRUE))
        dropped <- c(dropped,which(paste(occ[,'Longitude'],occ[,'Latitude'],sep='_') == paste(subset[smallest,'Longitude'],subset[smallest,'Latitude'],sep='_')))
        subset <- subset[-smallest,]
    }
    return(occ[-dropped,])
}

xy <- cbind(rnorm(500),rnorm(500))
colnames(xy) <- c('Longitude','Latitude')
xy2 <- voronoiFilter(xy, drop=0.7)

plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)

来源：https://stackoverflow.com/questions/22228946/choose-n-most-evenly-spread-points-across-point-dataset-in-r

标签

distance