Question
I have a data frame (`data`) in R with thousands of rows and 10 columns. Nine of the columns contain factors with several levels.
Here is a small portion of the data frame.
```
A gr1
10 303.90
11 304.1
12 303.6
13 303.90 obs
14 303.90k
```
For example, one factor has a level "303.90" and another level "303.90 obs". I want to change "303.90 obs" to "303.90". I am using the following command to rename the level:
```r
data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.90 obs","303.90", fixed = T, x)}))
```
However, this does not change the level "303.90 obs" to "303.90"; it just stays the same. The same command does work for other strings, e.g. "303.9" gets changed to "303.90" when I use:
```r
data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.9 obs","303.90", fixed = T, x)}))
```
Any suggestions as to why this might be?
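The post does not show the raw data, so the cause cannot be confirmed from it. One possible explanation, offered here only as an assumption, is that the character between "303.90" and "obs" is not an ordinary space (for example a non-breaking space, U+00A0), so the fixed string never matches. This sketch builds such a hypothetical value `x`, shows that the fixed-string `gsub()` leaves it untouched, inspects its code points with `utf8ToInt()`, and then uses a regex that keeps only the leading number, so whatever follows it no longer matters.

```r
# Hypothetical value: "303.90" + non-breaking space (U+00A0) + "obs".
# It prints like "303.90 obs" but is not equal to it.
x <- factor(c("303.90", "303.90\u00a0obs", "304.1"))

# The fixed-string replacement finds nothing to replace
gsub("303.90 obs", "303.90", x, fixed = TRUE)

# Inspecting the code points shows 160 (U+00A0) where 32 (a space) was expected
utf8ToInt(as.character(x)[2])

# Keeping only the leading "digits.digits" part ignores whatever follows it
sub("^(\\d+\\.\\d+).*$", "\\1", x)
```

Both `gsub()` and `sub()` coerce the factor to character and return a character vector, so wrap the result in `factor()` if a factor is still needed.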
Answer 1:
I'm not that familiar with `lapply`, so my solution simply loops over the columns of the data frame. This works as expected.
```r
# Example data: two numeric columns and two factor columns whose
# levels carry extra text after the number
col1 <- 1:10
col2 <- 21:30
col3 <- c("503.90", "303.90 obs", "803.90sfsdf sf", "203.90 obs", "303.90",
          "103.90 obs", "303.90", "403.90 obs", "803.90sfsdf sf", "303.90 obs")
col4 <- c("303.90", "303.90 obs", "303.90", "203.90 obs", "303.90",
          "107.40fghfg", "303.90", "303.90 obs", "303.90", "303.90 obs")
data <- data.frame(col1, col2, col3, col4)
data$col3 <- as.factor(data$col3)
data$col4 <- as.factor(data$col4)

# For each factor column, keep only the first "digits.digits" number
for (i in 3:4) {
  matchedExpression <- regexpr(pattern = "\\d+\\.\\d+", text = data[, i])
  # regmatches() drops values with no match, so every value must contain a number
  data[, i] <- regmatches(x = data[, i], m = matchedExpression)
  data[, i] <- as.factor(data[, i])
}
```
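Since the answer mentions being unfamiliar with `lapply`, here is a sketch of the same clean-up written with it. It swaps in a slightly different technique: instead of calling `regmatches()` on every value, it rewrites each factor's levels with `sub()`. Levels that become identical are merged into one, and a level that does not start with a number (the hypothetical `"n/a"` value) is kept as-is rather than dropped, which with `regmatches()` would leave the result shorter than the column. It assumes, as in the example data, that the number is at the start of each value.

```r
# Example data (values from the answer, plus a hypothetical "n/a" with no number)
data <- data.frame(
  col1 = 1:5,
  col3 = factor(c("503.90", "303.90 obs", "803.90sfsdf sf", "203.90 obs", "303.90")),
  col4 = factor(c("303.90", "303.90 obs", "107.40fghfg", "303.90", "n/a"))
)

# lapply() version: rewrite the levels of every factor column.
# sub() keeps the leading "digits.digits" part; levels that become identical
# are merged, and levels without a leading number are left unchanged.
data[] <- lapply(data, function(x) {
  if (is.factor(x)) {
    levels(x) <- sub("^(\\d+\\.\\d+).*$", "\\1", levels(x))
  }
  x
})

levels(data$col4)
```

Working on `levels(x)` touches each distinct label once instead of every row, which can matter with thousands of rows.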
EDIT
The OP changed the description. To change all of these factor levels to 303.90, a regex is a better solution. However, more information is needed from the OP to give a general solution, e.g. is 303.90 the only value that should be changed?
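For the narrower goal in this edit, mapping every level that begins with 303.90 to exactly 303.90, a regex applied to the levels is enough. A minimal sketch, assuming a hypothetical factor `f` built from the values shown in the question:

```r
# Hypothetical factor built from the values shown in the question
f <- factor(c("303.90", "304.1", "303.6", "303.90 obs", "303.90k"))

# Every level that starts with "303.90" becomes exactly "303.90".
# The dot is escaped so it matches a literal "." rather than any character.
levels(f) <- sub("^303\\.90.*$", "303.90", levels(f))

table(f)
```

This pattern would also catch a level such as "303.905"; if that can occur, a tighter pattern like `"^303\\.90([^0-9].*)?$"` only accepts non-digit text after "303.90".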
EDIT2
Updated the script, since the OP provided more information, e.g. that columns can contain levels other than 303.90.
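Before settling on a general rule, it can help to see which levels actually carry extra text. This sketch (the `data` and `odd_levels` names and values are hypothetical) lists, for each factor column, the levels that are not a bare `digits.digits` number; non-factor columns are skipped with `Filter(is.factor, ...)`.

```r
# Hypothetical data: one numeric column and two factor columns
data <- data.frame(
  col1 = 1:4,
  col3 = factor(c("503.90", "303.90 obs", "803.90sfsdf sf", "303.90")),
  col4 = factor(c("303.90", "107.40fghfg", "303.90", "303.90 obs"))
)

# For every factor column, list the levels that are not just "digits.digits";
# these are the levels a general clean-up would have to rewrite
odd_levels <- lapply(Filter(is.factor, data), function(x) {
  grep("^\\d+\\.\\d+$", levels(x), value = TRUE, invert = TRUE)
})

odd_levels
```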
Source: https://stackoverflow.com/questions/49431216/using-gsub-on-columns-in-r