I have numeric data in a vector and I'm trying to run kmeans on it. The following gives an error:

    > kmeans( mydata, centers = 2 ) # trying centers 2 to 20 but failing at 2
    Error in do_one(nmeth) : NAs in foreign function call (arg 13)
    In addition: Warning message:
    In do_one(nmeth) : NAs introduced by coercion
    > str(mydata)
     num [1:44990687] 3.44e-06 3.44e-06 3.44e-06 3.44e-06 4.35e-05 ...
    > is.numeric(mydata)
    [1] TRUE
My code works for the datasets that are smaller than this one, so I suspect it may have something to do with the size of the data. Any ideas on how to fix the error? Thanks in advance.
Update: I've tried the following:
    > x <- length(mydata)
    > kmeans( mydata[1:(x/2)], centers = 2 )
    > kmeans( mydata[(x/2):x], centers = 2 )
Both calls to kmeans finish with no errors. So it does look like it has something to do with the size of the data and not the format/types. If that's the case, what should I do to be able to handle it? Thanks again.
Try using a previous version of R, like 2.15.3. That worked for me.
I ran into the same NA coercion error with the most current version as of this writing, v3.1.2.
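If switching R versions is not practical, one possible workaround (my own suggestion, not something confirmed elsewhere in this thread) is to fit kmeans on a random subsample and then assign every value to its nearest center. The sketch below uses simulated data as a stand-in for mydata; the sample size of 1000 and nstart = 5 are arbitrary choices for illustration.

```r
# Toy stand-in for mydata: two well-separated groups (assumed for illustration).
set.seed(42)
x <- c(rnorm(5000, mean = 0), rnorm(5000, mean = 20))

# Fit k-means on a random subsample only, so the call stays small.
idx <- sample(length(x), 1000)
fit <- kmeans(x[idx], centers = 2, nstart = 5)

# Sort the centers so that cluster 1 is always the lower group.
centers <- sort(as.numeric(fit$centers))

# In one dimension, the nearest center is determined by the midpoints
# between adjacent sorted centers.
breaks <- (centers[-length(centers)] + centers[-1]) / 2
cluster <- findInterval(x, breaks) + 1L

table(cluster)
```

The result is an approximation of running kmeans on the full vector, so it is worth comparing the centers across a few different subsamples before relying on it.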
I created a similar thread here: kmeans on 46 million elements coerces NA values
This is a bug that I introduced in R 3.0.1 while fixing another bug: http://bugs.r-project.org/bugzilla/show_bug.cgi?id=15364#c6
The bug was fixed 10 minutes ago in R 3.2.0 alpha (to appear as R 3.2.0 in about two weeks). Note, however, that your nrow(x) was already within a factor of 50 of the maximal 32-bit integer (2^31 - 1), which is a strict upper limit on the number of rows for the default kmeans() algorithm in R, since it currently uses standard Fortran, which does not allow larger matrix dimensions.
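To make the "factor of 50" remark concrete, here is a small sketch of the arithmetic. The value of n is taken from the str() output in the question; the version check simply reflects the release mentioned above and is only an illustration.

```r
n <- 44990687                     # length(mydata), from the str() output
int_max <- .Machine$integer.max   # 2^31 - 1 = 2147483647

# How much headroom remains before the 32-bit row limit is reached.
ratio <- int_max / n
ratio                             # a bit under 48, i.e. within a factor of 50

# The fix is stated to land in R 3.2.0, so older versions may still fail.
if (getRversion() < "3.2.0") {
  message("This R version may predate the kmeans() fix described above.")
}
```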
I had the same error, and the error log mentioned NaN, which means "Not a Number". So I double-checked my data set and, sure enough, one row contained a string (words). I removed that string and it worked perfectly.
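For anyone who wants to check for this case, here is a minimal sketch of how stray text entries can be located before calling kmeans. The vector raw is made up for illustration and stands in for a column as it was read from a file.

```r
# Hypothetical raw column: one entry is text rather than a number.
raw <- c("3.44e-06", "4.35e-05", "oops", "1.2e-04")

# as.numeric() turns unparseable entries into NA (with a warning).
vals <- suppressWarnings(as.numeric(raw))

# Positions that became NA only because of the conversion.
bad <- which(is.na(vals) & !is.na(raw))
raw[bad]                          # shows the offending entries

# Keep only the values that converted cleanly.
clean <- vals[!is.na(vals)]
fit <- kmeans(clean, centers = 2)
```

Note that this is a different cause from the size-related bug discussed above: in the question, is.numeric(mydata) already returns TRUE.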