How do I test for numeric values in a dataframe of characters, and convert those to numeric?

倾然丶 夕夏残阳落幕 提交于 2019-12-13 11:48:28

问题


I have a dataframe somewhat like the following:

> theDF
   ID        Ticker INDUSTRY_SECTOR              VAR             CVAR
1   1      USD CASH                                0                0
12  2      ZAR CASH                 -181412.82055904 -301731.22832191
23  3 BAT SJ EQUITY       Financial  61711.951234826 102641.162795691
34  4 HCI SJ EQUITY       Financial 1095.16002541256 1821.50290513369
45  5 PSG SJ EQUITY       Financial 16498.2192382422  27440.331617902

We can see these are all character columns:

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

I would like something that will change ONLY the numeric-type vectors to numeric. Basically, if it "looks like" a numeric, make it numeric, otherwise leave it be. I cannot find anything on StackOverflow which does not require knowing the names or columns you want to convert before hand. This DF will not always be in the same order, or have the columns, so I need some dynamic way to check if the columns "look like" numeric and make those columns numerics.

This (obviously) gives me a bunch of NA;s for the character columns:

> apply(theDF, 2, as.numeric)
     ID Ticker INDUSTRY_SECTOR        VAR        CVAR
[1,]  1     NA              NA       0.00       0.000
[2,]  2     NA              NA -181412.82 -301731.228
[3,]  3     NA              NA   61711.95  102641.163
[4,]  4     NA              NA    1095.16    1821.503
[5,]  5     NA              NA   16498.22   27440.332

I tried something like this, but not only does it not work, it seems horribly inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
     ID  Ticker          INDUSTRY_SECTOR VAR                CVAR              
[1,] "1" "USD CASH"      ""              "0"                "0"               
[2,] "2" "ZAR CASH"      ""              "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial"     "61711.951234826"  "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial"     "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial"     "16498.2192382422" "27440.331617902" 

Is there a better way to do this?

EDIT: People keep asking for this, so here goes...

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> apply(theDF, 2, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

回答1:


Looks like a job for type.convert().

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE allows us to keep characters as such, where they otherwise would be coerced to factors.

Update: For columns that are not character, they will need to be first coerced to character.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))


来源:https://stackoverflow.com/questions/35896290/how-do-i-test-for-numeric-values-in-a-dataframe-of-characters-and-convert-those

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!