Partition into classes: jenks vs kmeans

一曲冷凌霜 提交于 2019-11-30 18:53:37

To answer your original question:

What makes the Jenks algorithm so slow, and is there a faster way to run it?

Indeed, meanwhile there is a faster way to apply the Jenks algorithm, the setjenksBreaks function in the BAMMtools package.

However, be aware that you have to set the number of breaks differently, i.e. if you set the breaks to 5 in the the classIntervals function of the classInt package you have to set the breaks to 6 the setjenksBreaks function in the BAMMtools package to get the same results.

# Install and load library
install.packages("BAMMtools")
library(BAMMtools)

# Set up example data
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)

# Apply function
getJenksBreaks(x, 6)

The speed up is huge, i.e.

> microbenchmark( getJenksBreaks(x, 6, subset = NULL),  classIntervals(x, n = 5, style = "jenks"), unit="s", times=10)
Unit: seconds
                                      expr         min          lq        mean      median          uq         max neval cld
       getJenksBreaks(x, 6, subset = NULL) 0.002824861 0.003038748 0.003270575 0.003145692 0.003464058 0.004263771    10  a 
 classIntervals(x, n = 5, style = "jenks") 2.008109622 2.033353970 2.094278189 2.103680325 2.111840853 2.231148846    10   

From ?BAMMtools::getJenksBreaks

The Jenks natural breaks method was ported to C from code found in the classInt R package.

The two programs are the same; one is faster than the other because of their implementation (C vs R).

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!