klaR kmodes error: Column index must be at most 5 if positive, not 6

筅森魡賤 submitted on 2019-12-20 03:52:12

Question


I am applying the klaR kmodes algorithm to the dataset below:

> summary(raw)
    CREDIT_LIMIT         CP        gender     IE_CHILD_NB IE_TOT_DEP_NB    TOTAL_INCOME   IE_HOUSE_CHARGE  maritial    
 >2000    :  612   11500  :  145   MM: 5435   0:7432      0:1446        >2000    :3524   >2000    :    2   D   : 1195  
 0-500    :10458   11100  :   90   MR:12983   1:4119      1:3748        0-500    :1503   0-500    :17146   M   :10507  
 1000-1500: 2912   08830  :   71              2:5787      2:3386        1000-1500:6649   1000-1500:   44   MISS: 1446  
 1500-2000: 2254   11406  :   68              3: 947      3:3740        1500-2000:4116   1500-2000:    5   Ot  : 1043  
 500-1000 : 2182   35018  :   66              4: 133      4:6098        500-1000 :2626   500-1000 : 1221   S   : 4227  
                   11510  :   62                                                                                       
                   (Other):17916                                                                                       
  new_age      job_age     
 >70  : 295   0-20 :14627  
 0-30 : 815   20-30: 1986  
 30-40:4867   30-40:  612  
 40-50:7293   40-50:  124  
 50-60:3883   50-60: 1069  
 60-70:1265              

I get the following error:

> cluster.results <-kmodes(data=raw, modes=4, iter.max = 10, weighted=FALSE )
Error: Column index must be at most 5 if positive, not 6

Any idea what this error means?

Best regards


Answer 1:


Partial answer for anyone searching for this error: it means that somewhere an object is being asked to return elements outside its range, such as more columns than exist. For example, with a tibble:

> aa <- tibble(bb = c(1,2))
> aa
# A tibble: 2 x 1
     bb
  <dbl>
1  1.00
2  2.00
> aa[,2]
Error: Column index must be at most 1 if positive, not 2

I'm not sure of the exact source of the error in this case, and I don't use that package. It doesn't occur with lists or plain data frames: a data frame reports "undefined columns selected", and a list returns NULL.
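
To make that comparison concrete, here is a small base-R sketch of how a plain data frame and a list respond to the same kind of out-of-range access. The object names are made up for illustration, and the exact wording of the data frame message may vary with locale.

```r
# Hypothetical toy objects, used only to illustrate the difference.
df  <- data.frame(bb = c(1, 2))
lst <- list(bb = c(1, 2))

# A plain data frame raises an error ("undefined columns selected" in an
# English locale) when asked for a column that does not exist.
df_msg <- tryCatch(df[, 2], error = function(e) conditionMessage(e))
print(df_msg)

# A list returns NULL when a missing element is requested by name.
lst_missing <- lst$cc
print(is.null(lst_missing))
```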




Answer 2:


I experienced the same problem when trying to use kmodes to cluster the following categorical data frame:

 > summary(raw_df)
  Age       Years_At_Present_Employment Marital_Status_Gender Dependents Housing       Job      
  (0,20] :  80   A71: 310                    A91: 250              1:4225     A151: 895   A171: 110  
  (20,30]:1975   A72: 860                    A92:1550              2: 775     A152:3565   A172:1000  
  (30,45]:2015   A73:1695                    A93:2740                         A153: 540   A173:3150  
  (45,60]: 705   A74: 870                    A94: 460                                     A174: 740  
  (60,75]: 225   A75:1265                                                                            

  Foreign_Worker Current_Address_Yrs Telephone  
  A201:4815      Min.   :1.000       A191:2980  
  A202: 185      1st Qu.:2.000       A192:2020  
                 Median :3.000                  
                 Mean   :2.845                  
                 3rd Qu.:4.000                  
                 Max.   :4.000  

Then I got this error:

 > (raw_clusters <- klaR::kmodes(raw_df, 5))
 Error: Column index must be at most 4 if positive, not 6

It seems that this implementation of kmodes (klaR) requires the categorical variables to be numeric, so you need to convert them from factors to numeric values (keeping in mind that they are still categorical):

library(dplyr)  # provides %>% and mutate()

# Replace each factor column with its numeric level codes.
raw_4clust <- raw_df %>%
  mutate(
    Age = as.numeric(Age),
    Years_At_Present_Employment = as.numeric(Years_At_Present_Employment),
    Marital_Status_Gender = as.numeric(Marital_Status_Gender),
    Housing = as.numeric(Housing),
    Job = as.numeric(Job),
    Foreign_Worker = as.numeric(Foreign_Worker),
    Telephone = as.numeric(Telephone)
  )

After that, it worked for me.

Hope that helps
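
One detail about this kind of conversion is worth keeping in mind: as.numeric() applied to a factor returns the internal level codes, not the labels. The sketch below uses a made-up factor to show the difference. For clustering this may not matter, since the codes still distinguish the categories, but the numbers should not be read as quantities.

```r
# Hypothetical factor whose labels happen to look like numbers.
f <- factor(c("10", "20", "10", "30"))

# as.numeric() on a factor gives the integer level codes.
codes <- as.numeric(f)
print(codes)

# Going through as.character() first recovers the original labels as numbers.
labels_num <- as.numeric(as.character(f))
print(labels_num)
```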




Answer 3:


In my case, I had used dplyr for the data transformation, so I converted my object to a plain data frame:

tmp = as.data.frame(tmp)

That solved my problem.
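
As a rough sketch of that workflow, the snippet below coerces the input to a plain data frame and only then calls kmodes. The helper name to_plain_df and the toy data are assumptions made for illustration. The kmodes call reuses the arguments shown in the question and is skipped if klaR is not installed.

```r
# Hypothetical helper: coerce the input and check that the result is a plain
# data.frame with no extra classes (such as tbl_df) left on it.
to_plain_df <- function(x) {
  x <- as.data.frame(x)
  stopifnot(identical(class(x), "data.frame"))
  x
}

# Made-up categorical data standing in for the real dataset.
tmp <- data.frame(
  a = factor(rep(c("x", "y", "z"), each = 10)),
  b = factor(rep(c("p", "q", "r"), each = 10))
)
tmp <- to_plain_df(tmp)

# Cluster only if klaR is available.
if (requireNamespace("klaR", quietly = TRUE)) {
  set.seed(1)
  res <- klaR::kmodes(tmp, modes = 2, iter.max = 10, weighted = FALSE)
  print(table(res$cluster))
}
```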



Source: https://stackoverflow.com/questions/50025225/error-package-klar-kmodes-error-column-index-must-be-at-most-5-if-positive-n
