R: convert integers in a character vector (json) to multiple boolean columns

Submitted by 心不动则不痛 on 2019-12-11 17:35:34

Question


I actually have a data frame with 2000 rows (different days); each row contains a character "vector" holding binary information on 30 different skills: if a skill has been used, its number appears in the vector. But to simplify:
If I have a data frame with 3 observations (3 days) of 10 different skills, stored in a column named "S_total":
S_total = [1,3,7,8,9,10], [5,9], [] and a variable Day = 1, 2, 3, I'd like to construct a data frame with 3 rows and 12 columns.
The columns being Day, S_total, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, where the numbered variables are TRUE/FALSE.

I have thought in the direction of as.numeric(read.csv) followed by a for loop containing cbind.
But there must be a better way? The tidyverse? I would hope for someone to demonstrate a regular expression and the Map command.
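A minimal sketch of the regular expression + Map direction asked about here (not taken from the answers below). It assumes the skills are numbered 1 to 10, as in the simplified example, and that S_total holds JSON-like strings such as "[1,3,7,8,9,10]"; the names df, n_skills, skills, flags and result are illustrative only:

```r
# Example input from the question (assumed format: JSON-like strings)
df <- data.frame(Day = 1:3,
                 S_total = c("[1,3,7,8,9,10]", "[5,9]", "[]"),
                 stringsAsFactors = FALSE)

n_skills <- 10  # assumption: skills are numbered 1..10

# Regular expression: extract every run of digits from each string,
# giving one integer vector of skill numbers per day
skills <- lapply(regmatches(df$S_total, gregexpr("[0-9]+", df$S_total)),
                 as.integer)

# Map over the skill numbers: for skill k, a logical vector saying
# on which days it occurs (%in% compares whole numbers, not substrings)
flags <- Map(function(k) vapply(skills, function(x) k %in% x, logical(1)),
             seq_len(n_skills))
names(flags) <- paste0("s", seq_len(n_skills))

# Day, S_total, then s1..s10: 3 rows and 12 columns
result <- cbind(df, as.data.frame(flags))
print(result)
```

Because the numbers are extracted as integers first, skill 1 is never confused with skill 10.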


Answer 1:


You can simply add a new column with either dataFrame$newColumn or dataFrame[, "newColumn"]. Then you can use grepl to test whether a skill is found in dataFrame$S_total. Wrap the number in word boundaries (\\b) so that skill 1 does not also match 11 or 12, e.g.

dataFrame[, "1"] <- grepl("\\b1\\b", dataFrame$S_total)
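Why the word boundaries matter: in this comma-separated format a bare "1" is also found inside "11" and "12". A small check on a hypothetical vector s in the same format:

```r
# Sample S_total strings in the format used by this answer
s <- c("1, 3, 7, 8, 11, 20", "5, 12", "")

# Plain pattern: "1" is found inside "11" and "12" as well
plain <- grepl("1", s)

# Word boundaries (\b) only match 1 as a whole number
bounded <- grepl("\\b1\\b", s)

print(plain)    # TRUE TRUE FALSE
print(bounded)  # TRUE FALSE FALSE
```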

To get all distinct skills that occur in the dataset, you can split the character vectors into single numbers and then use unique. Then you can loop through the skills and create one new column for each skill:

    > dataFrame <- data.frame(S_total = c(toString(c(1,3,7,8,11,20)), toString(c(5,12)), ""),
    +                         Day = c(1,2,3),
    +                         stringsAsFactors = FALSE)
    > 
    > dataFrame
                 S_total Day
    1 1, 3, 7, 8, 11, 20   1
    2              5, 12   2
    3                      3
    > 
    > # distinct skills in numeric order (sorting the strings would put "11" before "3")
    > allSkill <- as.character(sort(as.integer(unique(unlist(strsplit(dataFrame$S_total, ", "))))))
    > for(i in allSkill){
    +   # word boundaries (\b) stop skill 1 from also matching 11 or 12
    +   dataFrame[, i] <- grepl(paste0("\\b", i, "\\b"), dataFrame$S_total)
    + }
    > dataFrame
                 S_total Day     1     3     5     7     8    11    12    20
    1 1, 3, 7, 8, 11, 20   1  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
    2              5, 12   2 FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
    3                      3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

If your dataset is not that large, this will do it. If you have a very large dataset and performance matters, you can first create the empty columns and then loop through them to fill them in, which performs better.
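One possible reading of the preallocate-then-fill advice, as a sketch under the assumption that S_total uses the same comma-separated format as the example above: split every row only once, create an all-FALSE logical matrix with one column per skill, fill it column by column, and bind it to the data frame in a single step. The matrix approach is an assumption, not necessarily the answerer's exact method:

```r
# Example data in the answer's comma-separated format
dataFrame <- data.frame(S_total = c("1, 3, 7, 8, 11, 20", "5, 12", ""),
                        Day = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# Split every row once, up front, into integer skill numbers
skills <- lapply(strsplit(dataFrame$S_total, ", "), as.integer)
allSkill <- sort(unique(unlist(skills)))  # numeric sort: 1, 3, ..., 11, 12, 20

# Preallocate an all-FALSE logical matrix: one row per day, one column per skill
flags <- matrix(FALSE, nrow = nrow(dataFrame), ncol = length(allSkill),
                dimnames = list(NULL, as.character(allSkill)))

# Fill the preallocated columns; %in% avoids the "1" vs "11" substring problem
for (j in seq_along(allSkill)) {
  flags[, j] <- vapply(skills, function(x) allSkill[j] %in% x, logical(1))
}

# Attach all skill columns in one step
dataFrame <- cbind(dataFrame, flags)
print(dataFrame)
```

Binding once at the end avoids growing the data frame one column at a time inside the loop.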

In my opinion, there is no need to use Map or any of the tidyverse packages.




Answer 2:


Very cool solution. I only needed to remove my brackets to get this to work. So, assuming my vector "S_total" has brackets, I'd have to do:

S_total_nobracket <- gsub("\\[|\\]", "", S_total)
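If the real data look like the question's example, e.g. "[1,3,7,8,9,10]", there is no space after the commas, so once the brackets are removed, splitting on ", " as in Answer 1 would leave each row as one single token. A minimal sketch that splits on a comma with optional surrounding whitespace and tests exact membership; the names other than S_total and S_total_nobracket are illustrative:

```r
# Example input in the bracketed format shown in the question
S_total <- c("[1,3,7,8,9,10]", "[5,9]", "[]")

# Remove the brackets (same gsub call as above)
S_total_nobracket <- gsub("\\[|\\]", "", S_total)

# Split on a comma with optional surrounding whitespace, so both
# "1,3" and "1, 3" work, then convert to integers
skills <- lapply(strsplit(S_total_nobracket, "\\s*,\\s*"), as.integer)
allSkill <- sort(unique(unlist(skills)))

# Rows = days, columns = skills; %in% tests exact membership
flags <- vapply(allSkill,
                function(k) vapply(skills, function(x) k %in% x, logical(1)),
                logical(length(skills)))
colnames(flags) <- paste0("s", allSkill)
print(flags)
```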

Thanks a million for your answer. It was just what I needed :-)



Source: https://stackoverflow.com/questions/47398026/r-convert-integers-in-a-character-vector-json-to-multiple-boolean-columns
