R/dplyr: How to only keep integers in a data frame?

女生的网名这么多〃 submitted on 2020-06-16 09:44:56

Question


I have a data frame that has years in it (data type chr):

Years:
5 yrs
10 yrs
20 yrs
4 yrs

I want to keep only the integers to get a data frame like this (data type num):

Years:
5
10
20
4

How do I do this in R?


Answer 1:


You need to strip the non-numeric text and convert what remains to numeric:

df$Years <- as.numeric(sub(" yrs", "", df$Years))
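
As a self-contained check of the approach above, the sketch below rebuilds the question's data and confirms the result is numeric. The data frame name `df` and the column name `Years` are assumed from the question, and `fixed = TRUE` is an optional addition so that the pattern is treated as a literal string rather than a regular expression.

```r
# Example data, assumed to mirror the question
df <- data.frame(Years = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
                 stringsAsFactors = FALSE)

# Remove the literal suffix " yrs", then convert the remainder to numeric
df$Years <- as.numeric(sub(" yrs", "", df$Years, fixed = TRUE))

is.numeric(df$Years)
#> [1] TRUE
```

This only works when every value ends in exactly " yrs"; anything else is left with its text in place and becomes NA with a coercion warning.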



Answer 2:


Per your additional requirements, here is a more general-purpose solution, though it has limits too. The nice thing about the more complicated Years3 solution is that it deals more gracefully with unexpected but quite possible answers.

library(dplyr)
library(stringr)
library(purrr)

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "4-5 yrs",
           "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever")
df <- data.frame(Years)

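# First run of digits only: drops the -5 in "4-5", and any text before
# the digits (the ">" in ">1") is kept, so as.numeric() returns NA
df$Years1 <- as.numeric(sub("(\\d{1,4}).*", "\\1", df$Years))
#> Warning: NAs introduced by coercion

# First run of digits via str_extract: still drops the -5 in "4-5";
# the result is a character column
df$Years2 <- str_extract(df$Years, "[0-9]+")

# Averaging ranges takes more work: extract every number, then take the mean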

df$Years3 <- str_extract_all(df$Years, "[0-9]+") %>%
  purrr::map( ~ ifelse(length(.x) == 1, 
                as.numeric(.x), 
                mean(unlist(as.numeric(.x)))))

df
#>                           Years Years1 Years2 Years3
#> 1                         5 yrs      5      5      5
#> 2                        10 yrs     10     10     10
#> 3                        20 yrs     20     20     20
#> 4                         4 yrs      4      4      4
#> 5                       4-5 yrs      4      4    4.5
#> 6           75 to 100 YEARS old     75     75   87.5
#> 7 >1 yearsmispelled or whatever     NA      1      1
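
For comparison, the averaging idea can also be sketched in base R without stringr or purrr. This is only one possible variant, not the answerer's code: the helper name `mean_of_numbers` is hypothetical, and returning NA for strings that contain no digits is an assumption about the desired behaviour. Unlike `purrr::map()`, which returns a list, `vapply()` here yields a plain numeric vector.

```r
Years <- c("5 yrs", "10 yrs", "4-5 yrs", "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever")

# Pull every run of digits out of each string (a list of character vectors)
digits <- regmatches(Years, gregexpr("[0-9]+", Years))

# Average the numbers found in one string; no digits at all gives NA
# (hypothetical helper, not part of the original answer)
mean_of_numbers <- function(x) {
  if (length(x) == 0) NA_real_ else mean(as.numeric(x))
}

# One numeric value per input string
years_avg <- vapply(digits, mean_of_numbers, numeric(1))
years_avg
```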



Answer 3:


Base R solution:

clean_years <- as.numeric(gsub("\\D", "", Years))

Data:

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "5 yrs")

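One caveat worth checking before using this on messier input: because the pattern `\\D` deletes every non-digit, a string containing more than one number has its digits joined together. A minimal sketch, using a hypothetical vector `mixed` with values borrowed from the second answer's examples:

```r
mixed <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old")

# Deleting every non-digit joins the remaining digits together,
# so "4-5 yrs" becomes 45 and "75 to 100 YEARS old" becomes 75100
cleaned <- as.numeric(gsub("\\D", "", mixed))
cleaned
```

For data that only ever holds a single number per entry, as in the question, the one-liner is fine as is.
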

Source: https://stackoverflow.com/questions/62177380/r-dplyr-how-to-only-keep-integers-in-a-data-frame
