Removing Whitespace From a Whole Data Frame in R

I've been trying to remove the white space that I have in a data frame (using R). The data frame is large (>1 GB) and has multiple columns that contain whi

10 Answers

被撕碎了的回忆

2020-12-05 03:55
A lot of the answers are older, so here in 2019 is a simple dplyr solution that will operate only on the character columns to remove trailing and leading whitespace.
```
library(dplyr)
library(stringr)

# dplyr < 1.0.0 (mutate_if() is superseded in later versions)
df %>%
  mutate_if(is.character, str_trim)

## ===== 2020 edit for dplyr (>= 1.0.0) =====
df %>%
  mutate(across(where(is.character), str_trim))
```
You can switch out the str_trim() function for other ones if you want a different flavor of whitespace removal.
```
# for example, remove all spaces
# (a lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0)
df %>%
  mutate(across(where(is.character), ~ str_remove_all(.x, fixed(" "))))
```
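To make the difference between these flavors concrete, here is a minimal sketch on a made-up data frame (the name toy and its contents are assumptions for illustration only). It compares str_trim(), which strips only leading and trailing whitespace, with stringr's str_squish(), which also collapses runs of internal whitespace to a single space.
```
library(dplyr)
library(stringr)

# toy data frame (hypothetical) with padded and internally spaced strings
toy <- data.frame(
  id   = 1:2,
  name = c("  Ann  Lee ", "Bob "),
  stringsAsFactors = FALSE
)

# leading/trailing whitespace only; the integer column is left untouched
trimmed <- toy %>%
  mutate(across(where(is.character), str_trim))

# additionally collapse repeated internal whitespace
squished <- toy %>%
  mutate(across(where(is.character), str_squish))
```
Note that neither call modifies toy itself; the result has to be assigned, as above.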
逝去的感伤

2020-12-05 03:55
If you're dealing with large data sets like this, you could really benefit from the speed of data.table.
```
library(data.table)

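setDT(df)  # convert to a data.table by reference (no copy)

# trim each character column in place; other column types are left untouched
for (j in names(df)) {
  if (is.character(df[[j]])) set(df, j = j, value = trimws(df[[j]]))
}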
```
I would expect this to be the fastest solution. The loop uses data.table's set() function, which assigns by reference and so can update columns one at a time without copying the whole table. There is a nice explanation here: Fast looping with set.
慢半拍i

2020-12-05 03:57
If you want to maintain the variable classes in your data.frame, you should know that using apply will clobber them, because it first coerces the data frame to a matrix, in which all columns share a single type (character, as soon as any column is non-numeric). Building upon the code of Fremzy and Anthony Simon Mielniczuk, you can loop through the columns of your data.frame and trim the white space off only columns of class factor or character (and maintain your data classes):
```
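for (i in names(mydata)) {
  if (is.character(mydata[[i]])) {
    mydata[[i]] <- trimws(mydata[[i]])
  } else if (is.factor(mydata[[i]])) {
    # trimws() on a factor returns character, so trim the levels instead
    levels(mydata[[i]]) <- trimws(levels(mydata[[i]]))
  }
}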
```
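The same idea can also be written without an explicit loop. The following is only a sketch in base R: the helper name trim_column and the toy data are assumptions for illustration, not part of the answers referenced above. Assigning into mydata[] replaces the columns while keeping the data.frame itself.
```
# toy data frame (hypothetical) with mixed column types
mydata <- data.frame(
  n   = c(1.5, 2),
  txt = c(" a ", "b  "),
  grp = factor(c(" x", "y ")),
  stringsAsFactors = FALSE
)

# hypothetical helper: trim text, leave every other class alone
trim_column <- function(x) {
  if (is.character(x)) {
    trimws(x)
  } else if (is.factor(x)) {
    levels(x) <- trimws(levels(x))  # trim the labels, keep the factor
    x
  } else {
    x  # numeric, logical, dates, ...: returned unchanged
  }
}

# mydata[] keeps the data.frame structure while replacing every column
mydata[] <- lapply(mydata, trim_column)
```
One thing to be aware of: if two factor levels become identical after trimming, assigning the levels merges them into one.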
温柔的废话

2020-12-05 03:58
Picking up on Fremzy and the comment from Stamper, this is now my handy routine for cleaning up whitespace in data:
```
df <- data.frame(lapply(df, trimws), stringsAsFactors = FALSE)
```
As others have noted, this changes all column types to character. In my work, I first determine the types present in the original and the conversions required. After trimming, I re-apply the types needed.
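As a rough sketch of that last step, base R's type.convert() can re-infer column types after trimming. This is a swapped-in shortcut, not the exact routine described above: it guesses types from the trimmed strings rather than restoring the recorded originals, so factors or dates would still need explicit handling. The toy data frame is an assumption for illustration.
```
# toy data frame (hypothetical): one numeric column, one padded text column
df <- data.frame(x = c(1.5, 2), y = c(" a ", "b "), stringsAsFactors = FALSE)

# trimming via lapply() turns every column into character
df <- data.frame(lapply(df, trimws), stringsAsFactors = FALSE)

# let R re-infer column types; as.is = TRUE keeps strings as character
df <- type.convert(df, as.is = TRUE)
```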

If your original types are OK, apply the solution from MarkusN below https://stackoverflow.com/a/37815274/2200542

Those working with Excel files may wish to explore the readxl package, which defaults to trim_ws = TRUE when reading.