Convert from lowercase to uppercase all values in all character variables in dataframe

萌比男神i 2020-12-05 09:17

I have a mixed dataframe of character and numeric variables.

city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,         


        
7 answers
  • 2020-12-05 09:41

    If you need to deal with data.frames that include factors you can use:

    df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)
    
    df
        v1 v2 v3 v4        v5
        1  a  1  j  a 0.1774909
        2  b  2  k  b 0.4405019
        3  c  3  l  c 0.7042878
        4  d  4  m  d 0.8829965
        5  e  5  n  e 0.9702505
    
    
    sapply(df,class)
             v1          v2          v3          v4          v5
    "character"   "integer" "character"    "factor"   "numeric"
    

    Use mutate_each_ to convert factors to character, then convert all character columns to uppercase (note that mutate_each_ and funs are deprecated in current dplyr releases):

       library(dplyr)

       # convert factor columns to character, then uppercase every character column
       upper_it = function(X){X %>% mutate_each_( funs(as.character(.)), names( .[sapply(., is.factor)] )) %>%
       mutate_each_( funs(toupper), names( .[sapply(., is.character)] ))}
    

    This gives:

      upper_it(df)
          v1 v2 v3 v4        v5
        1  A  1  J  A 0.1774909
        2  B  2  K  B 0.4405019
        3  C  3  L  C 0.7042878
        4  D  4  M  D 0.8829965
        5  E  5  N  E 0.9702505
    

    The column classes afterwards are:

    sapply( upper_it(df),class)
             v1          v2          v3          v4          v5
    "character"   "integer" "character" "character"   "numeric"
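
The same two-step idea (factor to character, then uppercase) can be sketched with across(), which replaces the deprecated mutate_each_() and funs(). This assumes dplyr 1.0.0 or later; the name upper_it2 and the cut-down sample frame are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical sample data: character, integer and factor columns
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v4 = factor(letters[1:5]),
                 stringsAsFactors = FALSE)

# upper_it2 is an illustrative name, not an existing function
upper_it2 <- function(x) {
  x %>%
    mutate(across(where(is.factor), as.character)) %>%  # factor -> character
    mutate(across(where(is.character), toupper))        # uppercase character columns only
}

res <- upper_it2(df)
sapply(res, class)
```

Numeric and integer columns are never passed to toupper() here, so they keep their type.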
    
  • 2020-12-05 09:42

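    From the dplyr package you can also use the mutate_all() function in combination with toupper(). This affects every column: character and factor columns are uppercased, but factor and numeric columns are also returned as character, because toupper() always returns a character vector.

    library(dplyr)
    df <- mutate_all(df, .funs = toupper)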
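
Why applying toupper() to every column changes column types can be checked in base R alone: toupper() coerces any non-character input with as.character() before uppercasing. A small check, using made-up vectors:

```r
# toupper() returns a character vector whatever it is given
num <- toupper(1:3)                   # integers become strings
fac <- toupper(factor(c("a", "b")))   # factor labels become plain strings

class(num)
class(fac)
```

If numeric columns should stay numeric, toupper() has to be applied only to the character (or factor) columns.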
    
  • 2020-12-05 09:48

    Alternatively, if you just want to convert one particular column to uppercase, use the code below:

    df[[1]] <- toupper(df[[1]])
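
The same pattern extends to selecting a column by name, or several columns at once. A minimal base R sketch; the sample frame and the cols vector are made up for illustration:

```r
# Hypothetical sample data
df <- data.frame(city = c("Austin", "Dallas"),
                 sex  = c("Female", "Male"),
                 age  = c(46, 32),
                 stringsAsFactors = FALSE)

# One column, selected by name instead of position
df[["city"]] <- toupper(df[["city"]])

# Several named columns at once; age is left untouched
cols <- c("city", "sex")
df[cols] <- lapply(df[cols], toupper)
```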
    
  • 2020-12-05 09:51

    Starting with the following sample data:

    df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)
    
      v1 v2 v3
    1  a  1  j
    2  b  2  k
    3  c  3  l
    4  d  4  m
    5  e  5  n
    

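    You can use:

    # uppercase character columns, leave all other columns untouched
    data.frame(lapply(df, function(v) {
      if (is.character(v)) return(toupper(v))
      else return(v)
    }), stringsAsFactors = FALSE)  # keeps strings as character in R < 4.0 too
    

    Which gives: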

      v1 v2 v3
    1  A  1  J
    2  B  2  K
    3  C  3  L
    4  D  4  M
    5  E  5  N
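
A possible variant of the lapply() approach that also handles factor columns, by uppercasing their levels so they stay factors, and that assigns back with df[] so the object remains a data frame. This is a sketch under assumptions: upper_col is a made-up helper name and the sample data are illustrative.

```r
# Hypothetical sample data with character, integer and factor columns
df <- data.frame(v1 = letters[1:3], v2 = 1:3, v4 = factor(c("x", "y", "x")),
                 stringsAsFactors = FALSE)

# upper_col is an illustrative helper, not a base R function
upper_col <- function(v) {
  if (is.character(v)) {
    toupper(v)
  } else if (is.factor(v)) {
    levels(v) <- toupper(levels(v))  # relabel levels, keep the factor
    v
  } else {
    v                                # numeric etc. returned unchanged
  }
}

df[] <- lapply(df, upper_col)        # df[] keeps the data.frame structure
```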
    
  • 2020-12-05 09:57

    A side comment for those using any of these answers. Juba's answer is great, as it is very selective when your variables are either numeric or character strings. If, however, a column holds a combination of letters and digits (e.g. a1, b1, a2, b2), it will not necessarily be converted properly, for instance when such a column is stored as a factor, which is.character() does not match.

    As @Trenton Hoffman notes,

    library(dplyr)
    df <- mutate_each(df, funs(toupper))
    

    affects both character and factor classes and works for "mixed variables"; e.g. if your variable contains both a letter and a numeric value (e.g. a1), the whole value is converted. Note that toupper() returns character vectors, so factor columns come back as character rather than factor. Overall this isn't too much of a concern, but it matters if you end up wanting to match data.frames, for example:

    df3 <- df1[df1$v1 %in% df2$v1,]
    

    where df1 has been converted and df2 is a non-converted data.frame or similar, this may cause some problems. The workaround is to first run:

    df2 <- df2 %>% mutate_each(funs(toupper), v1)   # uppercase only v1
    #or
    df2 <- df2 %>% mutate_each(funs(toupper))       # uppercase every column
    #and then
    df3 <- df1[df1$v1 %in% df2$v1,]
    

    Knowing this can come in handy if you work with genomic data.
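
The matching problem described above can be reproduced in base R. In this sketch the identifiers in df1 and df2 are made up; df1 is assumed to be already uppercased and df2 is not.

```r
# Hypothetical data: df1 already uppercased, df2 still lowercase
df1 <- data.frame(v1 = c("A1", "B1", "A2"), stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("a1", "b2", "a2"), stringsAsFactors = FALSE)

# %in% is case-sensitive, so nothing matches yet
before <- df1[df1$v1 %in% df2$v1, , drop = FALSE]

# Normalise the case of the key column, then match again
df2$v1 <- toupper(df2$v1)
df3 <- df1[df1$v1 %in% df2$v1, , drop = FALSE]
```

The drop = FALSE argument only keeps the result a data frame when it has a single column; it does not affect the matching.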

  • 2020-12-05 10:00

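    It is simple with the apply function in R:

    f <- apply(f,2,toupper)
    

    No need to check whether a column is character or any other type. Note, however, that apply() first coerces the data frame to a matrix, so the result is typically a character matrix rather than a data frame, and numeric columns come back as strings.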
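
What the apply() call hands back can be inspected with a small example. The sample frame is made up, and the second half is one possible alternative that keeps a data frame, not part of the original answer.

```r
# Hypothetical sample data
f <- data.frame(city = c("Austin", "Dallas"), n = c(46, 32),
                stringsAsFactors = FALSE)

m <- apply(f, 2, toupper)
is.matrix(m)      # the result is a matrix, not a data frame
is.character(m)   # every cell, including n, is now a string

# Alternative that keeps a data frame and leaves numeric columns alone
chr <- vapply(f, is.character, logical(1))
f[chr] <- lapply(f[chr], toupper)
```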
