How to fix spaces in column names of a data.frame (remove spaces, inject dots)?

我寻月下人不归 2020-12-07 15:28

After importing a file, I always try to remove spaces from the column names so that the columns are easier to refer to.

Is there a better way to do this other than

12 Answers
  • 2020-12-07 16:05

    It's often convenient to change the names of your columns within one chunk of dplyr code rather than renaming the columns after you've created the data frame. Piping in rename_all() is very useful in these situations:

    ctm2 %>% rename_all(function(x) gsub(" ", "_", x))
    

    The code above will replace all spaces in every column name with an underscore.
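As a rough sketch of how this fits into a longer pipeline, the example below builds a made-up stand-in for `ctm2` (the column names and values are assumptions, not from the question). It uses `rename_with()`, which supersedes `rename_all()` in dplyr 1.0.0 and later and takes the same kind of function.

```r
library(dplyr)

# Hypothetical stand-in for ctm2; check.names = FALSE keeps the spaces
ctm2 <- data.frame(`Country Code` = c("AU", "NZ"),
                   `Total Pop 2020` = c(25.7, 5.1),
                   check.names = FALSE)

renamed <- ctm2 %>%
  rename_with(function(x) gsub(" ", "_", x)) %>%  # same function as in rename_all()
  filter(Total_Pop_2020 > 10)                     # the new names work straight away

names(renamed)
```

Because the renaming happens inside the chain, later verbs such as `filter()` can use the cleaned names without backticks.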

  • 2020-12-07 16:06

    There is an easy way to remove spaces from column names with data.table. You will first have to convert your data frame to a data.table.

    setnames(x=DT, old=names(DT), new=gsub(" ","",names(DT)))
    

    Country Code will be converted to CountryCode
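A minimal end-to-end sketch of that conversion, assuming a small made-up data frame `df` (hypothetical, not from the question). `as.data.table()` makes a copy, while `setnames()` then renames the columns of `DT` by reference.

```r
library(data.table)

# Hypothetical data frame standing in for the imported file
df <- data.frame(`Country Code` = c("AU", "NZ"),
                 `GDP per capita` = c(1, 2),
                 check.names = FALSE)

DT <- as.data.table(df)   # copy; setDT(df) would convert df in place instead
setnames(DT, old = names(DT), new = gsub(" ", "", names(DT)))

names(DT)
```

Since `DT` is a copy here, the original `df` keeps its spaced names.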

  • 2020-12-07 16:09

    To replace only the first space in each column name you could also do:

    names(ctm2) <- sub(" ", ".", names(ctm2))
    

    or to replace all spaces (which seems like it would be a little more useful):

    names(ctm2) <- gsub(" ", "_", names(ctm2))
    

    or, as mentioned in the first answer (though not in a way that would fix all spaces):

    spaceless <- function(x) {colnames(x) <- gsub(" ", "_", colnames(x));x}
    newDF <- spaceless(ctm2)
    

    where x is the name of your data.frame. I prefer to use "_" to avoid issues with "." as part of an ID.

    The point is that gsub doesn't stop at the first instance of a pattern match.
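The difference between `sub()` and `gsub()` is easiest to see on a name with more than one space. The names in this small base-R sketch are made up for illustration.

```r
nm <- c("Country Code", "GDP per capita")

first_only <- sub(" ", ".", nm)    # replaces the first space in each name
all_spaces <- gsub(" ", "_", nm)   # replaces every space in each name

first_only
# "Country.Code"   "GDP.per capita"
all_spaces
# "Country_Code"   "GDP_per_capita"
```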

  • 2020-12-07 16:10

    Assign the names like this. The pattern "\\s" matches any whitespace character (spaces, tabs, newlines), not just a literal space, and each match is replaced with an underscore.

    names(ctm2)<-gsub("\\s","_",names(ctm2))
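To illustrate what `"\\s"` adds over a literal `" "`, here is a small base-R comparison on made-up names containing a tab and a double space. The `"\\s+"` variant is an optional extra, not part of the answer above: it collapses each run of whitespace into a single underscore.

```r
nm <- c("Country Code", "GDP\tper  capita")   # a tab and a double space

space_only <- gsub(" ", "_", nm)      # literal spaces only; the tab survives
any_ws     <- gsub("\\s", "_", nm)    # every whitespace character, one for one
collapsed  <- gsub("\\s+", "_", nm)   # each run of whitespace becomes one "_"

any_ws
# "Country_Code"    "GDP_per__capita"
collapsed
# "Country_Code"   "GDP_per_capita"
```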

  • 2020-12-07 16:13

    There is a more elegant and general solution for this:

    tidy.name.vector <- make.names(name.vector, unique=TRUE)
    

    make.names() makes syntactically valid names out of character vectors. A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.

    Additionally, the argument unique=TRUE lets you avoid duplicates among the new column names.

    For example, when reading a file:

    library(readr)  # provides read_delim()
    d <- read_delim(urltxt, delim = "\t")
    names(d) <- make.names(names(d), unique = TRUE)
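To see what `make.names()` does with awkward input, this base-R sketch runs it on a made-up `name.vector` (the values are assumptions chosen to cover a duplicate, a leading digit, and a hyphen).

```r
name.vector <- c("Country Code", "Country Code", "2020 value", "a-b")

tidy.name.vector <- make.names(name.vector, unique = TRUE)
tidy.name.vector
# "Country.Code"   "Country.Code.1" "X2020.value"    "a.b"
```

Invalid characters become dots, a name starting with a digit gets an `X` prefix, and the duplicate receives a numeric suffix. If you prefer underscores, you would still need a `gsub()` step afterwards.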
    
  • 2020-12-07 16:15

    There is a very useful package for that, called janitor, that makes cleaning up column names very simple. It removes special characters, replaces spaces with _ and, by default, converts the names to lower case.

    library(janitor)

    # call it directly
    ctm2 <- clean_names(ctm2)

    # or pipe into it, as in a `dplyr` chain
    ctm2 <- ctm2 %>%
      clean_names()
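A quick way to check what `clean_names()` does is to run it on a tiny data frame. The one below is a hypothetical stand-in for `ctm2`, with column names chosen to include a space and some punctuation.

```r
library(janitor)

# Hypothetical stand-in for ctm2; check.names = FALSE keeps the raw names
ctm2 <- data.frame(`Country Code` = c("AU", "NZ"),
                   `GDP (USD)` = c(1, 2),
                   check.names = FALSE)

cleaned <- clean_names(ctm2)   # default case is snake_case
names(cleaned)
```

With the default settings the names should come back in lower-case snake_case. If you want to keep the original capitalisation, the `case` argument of `clean_names()` is the place to look.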
    