How to replace the second and later occurrences of a dot in a column name

Question

Folks, how can I replace the second occurrence of a dot in column names?

Sample data:

age.range.abc = sample(c("ar2-15", "ar16-29", "ar30-44"), 200, replace = TRUE)
gender.region.q = sample(c("M", "F"), 200, replace = TRUE)
region_g.a = sample(c("A", "B", "C"), 200, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 200, replace = TRUE)
survey = data.frame(age.range.abc, gender.region.q, region_g.a, physi)
head(survey)

I tried this, but it replaces every dot with an underscore. I want to replace only the second and later occurrences.

names(survey) = gsub("\\.", "_", names(survey))
names(survey)
# [1] "age_range_abc"   "gender_region_q" "region_g_a"      "physi"

Thanks, J

Answer 1:

In the spirit of your original code:

names(survey) = sub("(\\..*?)\\.", "\\1_", names(survey))
names(survey)
# [1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"
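Note that sub() substitutes only the first match, so the call above changes just the second dot; a name with three or more dots would keep the later ones. A minimal sketch for replacing every dot after the first is given here. The helper name replace_after_first_dot and the test name "a.b.c.d" are made up for illustration and are not part of the original answer. It finds the first dot with regexpr() and rewrites only the remainder of each name:

```r
# Hypothetical helper (not from the original answer): keep the first dot,
# turn every later dot into an underscore.
replace_after_first_dot <- function(x) {
  # Position of the first literal dot in each name; -1 where there is none.
  pos <- as.integer(regexpr(".", x, fixed = TRUE))
  has_dot <- pos > 0
  out <- x
  # Part up to and including the first dot, left unchanged.
  head_part <- substr(x[has_dot], 1, pos[has_dot])
  # Remainder after the first dot, where all dots become underscores.
  tail_part <- substr(x[has_dot], pos[has_dot] + 1, nchar(x[has_dot]))
  out[has_dot] <- paste0(head_part, gsub(".", "_", tail_part, fixed = TRUE))
  out
}

replace_after_first_dot(c("age.range.abc", "region_g.a", "physi", "a.b.c.d"))
# [1] "age.range_abc" "region_g.a"    "physi"         "a.b_c_d"
```

Working on positions rather than on a single regex keeps underscores that already precede the first dot (as in region_g.a) untouched.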

A little extra detail, in case it helps:

\\. matches the first dot.
.*? then matches the characters between the first and second dots. On its own, . matches any character and .* matches zero or more of them, but .* is greedy: it matches as much as possible. Adding ? makes the match lazy, so .*? consumes only as many characters as needed to reach the next element of the pattern, which is ...
another \\. to match the second dot.
Because the first part is enclosed in parentheses, (\\..*?), it is captured as group 1. The replacement \\1_ therefore puts back everything up to the second dot and substitutes _ for the second dot itself.
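To see the difference between the lazy and greedy forms, here is a small sketch on a made-up name with three dots. Passing perl = TRUE is an assumption added here so that the PCRE matching rules apply; the original call relies on R's default regex engine.

```r
x <- "a.b.c.d"  # made-up name with three dots

# Lazy .*? stops at the nearest following dot, so the second dot is replaced.
lazy <- sub("(\\..*?)\\.", "\\1_", x, perl = TRUE)

# Greedy .* runs to the last dot it can reach, so the final dot is replaced.
greedy <- sub("(\\..*)\\.", "\\1_", x, perl = TRUE)

print(c(lazy = lazy, greedy = greedy))
#      lazy    greedy
# "a.b_c.d" "a.b.c_d"
```

With only two dots in a name, as in the sample data, both forms give the same result; the distinction matters once a name has three or more.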

Answer 2:

One option is to use strsplit:

names(survey) <- sapply(strsplit(names(survey), "[.]"), function(x) 
    if(length(x) >1) paste(x[1], paste(x[-1], collapse="_"), sep=".") else x)
names(survey)
#[1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"
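The same idea can be wrapped in a reusable function. This is only a sketch: the name collapse_later_dots is hypothetical, and it swaps sapply() for vapply() so that the result is guaranteed to be a character vector with one element per name. It assumes no name is empty and no name ends in a dot, since strsplit() drops a trailing empty field.

```r
# Hypothetical wrapper around the strsplit approach (name is illustrative).
collapse_later_dots <- function(nms) {
  # Split each name on literal dots; fixed = TRUE avoids regex escaping.
  parts <- strsplit(nms, ".", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) > 1) {
      # Keep the first dot, join the remaining pieces with underscores.
      paste(p[1], paste(p[-1], collapse = "_"), sep = ".")
    } else {
      p  # no dot in this name, return it unchanged
    }
  }, character(1), USE.NAMES = FALSE)
}

collapse_later_dots(c("age.range.abc", "gender.region.q", "region_g.a", "physi"))
# [1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"
```

Unlike the sub() approach, this version replaces every dot after the first, not just the second one.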

Source: https://stackoverflow.com/questions/43077846/how-to-replace-second-or-more-occurrences-of-a-dot-from-a-column-name

Tags

regex

gsub