Removing Whitespace From a Whole Data Frame in R

后端 未结 10 615
孤街浪徒
孤街浪徒 2020-12-05 03:01

I\'ve been trying to remove the white space that I have in a data frame (using R). The data frame is large (>1gb) and has multiple columns that contains whi

10条回答
  •  北荒
    北荒 (楼主)
    2020-12-05 03:32

    If i understood you correctly then you want to remove all the white spaces from entire data frame, i guess the code which you are using is good for removing spaces in the column names.I think you should try this:

     apply(myData,2,function(x)gsub('\\s+', '',x))
    

    Hope this works.

    This will return a matrix however, if you want to change it to data frame then do:

    as.data.frame(apply(myData,2,function(x)gsub('\\s+', '',x)))
    

    EDIT In 2020:

    Using lapply and trimws function with both=TRUE can remove leading and trailing spaces but not inside it.Since there was no input data provided by OP, I am adding a dummy example to produce the results.

    DATA:

    df <- data.frame(val = c(" abc"," kl m","dfsd "),val1 = c("klm ","gdfs","123"),num=1:3,num1=2:4,stringsAsFactors = FALSE)
    

    #situation: 1 (Using Base R), when we want to remove spaces only at the leading and trailing ends NOT inside the string values, we can use trimws

    cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
    df[,cols_to_be_rectified] <- lapply(df[,cols_to_be_rectified], trimws)
    

    # situation: 2 (Using Base R) , when we want to remove spaces at every place in the dataframe in character columns (inside of a string as well as at the leading and trailing ends).

    (This was the initial solution proposed using apply, please note a solution using apply seems to work but would be very slow, also the with the question its apparently not very clear if OP really wanted to remove leading/trailing blank or every blank in the data)

    cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
    df[,cols_to_be_rectified] <- lapply(df[,cols_to_be_rectified], function(x)gsub('\\s+','',x))
    

    ## situation: 1 (Using data.table, removing only leading and trailing blanks)

    library(data.table)
    setDT(df)
    cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
    df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]
    

    Output from situation1:

        val val1 num num1
    1:  abc  klm   1    2
    2: kl m gdfs   2    3
    3: dfsd  123   3    4
    

    ## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)

    cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
    df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]
    

    Output from situation2:

        val val1 num num1
    1:  abc  klm   1    2
    2:  klm gdfs   2    3
    3: dfsd  123   3    4
    

    Note the difference between the outputs of both situation, In row number 2: you can see that, with trimws we can remove leading and trailing blanks, but with regex solution we are able to remove every blank(s).

    I hope this helps , Thanks

提交回复
热议问题