Sort each row of character strings alphabetically in R

Posted by 心不动则不痛 on 2021-02-10 21:36:07

Question


I've looked around and can't seem to find a decent way to solve this issue.

I have a data frame column in which each row holds a comma-separated list of names. I'd like to sort the names within each row alphabetically so that I can later identify rows that contain the same names in a different order.

The data looks like this:

names <- c("John D., Josh C., Karl H.",
        "John D., Bob S., Tim H.",
        "Amy A., Art U., Wes T.",
        "Josh C., John D., Karl H.")

var1 <- rnorm(n = length(names), mean = 0, sd = 2)
var2 <- rnorm(n = length(names), mean = 20, sd = 5)

df <- data.frame(names, var1, var2)
df

                      names       var1     var2
1 John D., Josh C., Karl H. -0.3570142 15.58512
2   John D., Bob S., Tim H. -3.0022367 12.32608
3    Amy A., Art U., Wes T. -0.6900956 18.01553
4 Josh C., John D., Karl H. -2.0162847 16.04281

For example, row 4 would get sorted to look like row 1. Row 2 would get sorted as Bob, John, and Tim.

I've tried sort(df$names), but that just reorders the rows (the whole strings) alphabetically; it does not sort the names inside each row.


Answer 1:


With dplyr, you can try:

library(dplyr)

# Row by row: split on ", ", sort the pieces, and paste them back together
df %>%
 rowwise() %>%
 mutate(names = paste(sort(unlist(strsplit(names, ", ", fixed = TRUE))), collapse = ", "))

  names                       var1  var2
  <chr>                      <dbl> <dbl>
1 John D., Josh C., Karl H. -0.226  19.9
2 Bob S., John D., Tim H.    0.424  24.8
3 Amy A., Art U., Wes T.     1.42   25.0
4 John D., Josh C., Karl H.  5.42   20.4

Sample data, with names stored as character rather than factor:

df <- data.frame(names, var1, var2,
                 stringsAsFactors = FALSE)
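
The stringsAsFactors = FALSE argument matters on R versions before 4.0.0, where data.frame() converted strings to factors by default, and strsplit() does not accept a factor. A minimal sketch of that failure and the as.character() workaround (the variable names x, res and ok are only illustrative):

```r
# A factor, as data.frame() would have produced by default before R 4.0.0
x <- factor("Josh C., John D., Karl H.")

# strsplit() rejects non-character input; capture the error message instead of stopping
res <- tryCatch(strsplit(x, ", ", fixed = TRUE),
                error = function(e) conditionMessage(e))

# Converting to character first makes the split work
ok <- strsplit(as.character(x), ", ", fixed = TRUE)[[1]]
sort(ok)
```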



Answer 2:


In base R you could do this:

# Converting factor to character
df$names <- as.character(df$names)

# Splitting string on comma+space(s), sorting them in list, 
# and pasting them back together with a comma and a space
df$names <- sapply(lapply(strsplit(df$names, split = ",\\s*"), sort), paste, collapse = ", ")

df
                      names      var1     var2
1 John D., Josh C., Karl H. -2.285181 15.82278
2   Bob S., John D., Tim H.  2.797259 21.42946
3    Amy A., Art U., Wes T.  1.001353 17.30004
4 John D., Josh C., Karl H.  4.034996 24.86374
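
Once each row is sorted, the goal stated in the question (finding rows with the same names in a different order) can be approached with duplicated(). This is only a sketch in base R, reusing the sample names from the question; the column names key and shared are hypothetical and not part of the original answer:

```r
names <- c("John D., Josh C., Karl H.",
           "John D., Bob S., Tim H.",
           "Amy A., Art U., Wes T.",
           "Josh C., John D., Karl H.")
df <- data.frame(names, stringsAsFactors = FALSE)

# Build an order-independent key: split, sort, and paste back together
df$key <- sapply(lapply(strsplit(df$names, split = ",\\s*"), sort),
                 paste, collapse = ", ")

# Flag every row whose key occurs more than once (checking from both ends
# so that the first occurrence is flagged as well)
df$shared <- duplicated(df$key) | duplicated(df$key, fromLast = TRUE)

which(df$shared)
```

Keeping the sorted string in a separate column leaves the original names untouched, which may be convenient if the original order still matters.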



Answer 3:


Define a function Sort that uses scan() to split a string into its individual names, sorts them, and joins them back together with toString(). Then apply it to the names column with sapply(). No packages are used.

Sort <- function(x) {
  s <- scan(text = as.character(x), what = "", sep = ",", 
    strip.white = TRUE, quiet = TRUE)
  toString(sort(s))
}
transform(df, names = sapply(names, Sort))

giving:

                      names      var1     var2
1 John D., Josh C., Karl H. -0.324619 28.02955
2   Bob S., John D., Tim H.  1.126112 14.21096
3    Amy A., Art U., Wes T.  3.295635 23.28294
4 John D., Josh C., Karl H. -1.546707 32.74496


Source: https://stackoverflow.com/questions/57258712/sort-each-row-of-character-strings-alphabetically-in-r
