Question
I'm creating a Shiny app in which the user uploads a .csv file containing several variables. Using dplyr, I select the first four variables, shown below, and convert them from long to wide format.
DATA
df <- read.table(text = c("
Customer Rate Factor Power
W1 6 TK1 5
W2 3 TK1 0
W3 1 TK1 0
W4 2 TK1 0
W5 4 TK1 0
W6 8 TK1 0
W7 5 TK1 0
W8 7 TK1 3
W1 6 TK2 0
W2 3 TK2 1
W3 1 TK2 0
W4 2 TK2 5
W5 4 TK2 0
W6 8 TK2 0
W7 5 TK2 0
W8 7 TK2 3
W1 6 TK3 0
W2 3 TK3 5
W3 1 TK3 1
W4 2 TK3 0
W5 4 TK3 0
W6 8 TK3 0
W7 5 TK3 0
W8 7 TK3 0
W1 6 TK4 0
W2 3 TK4 3
W3 1 TK4 0
W4 2 TK4 0
W5 4 TK4 0
W6 8 TK4 0
W7 5 TK4 0
W8 7 TK4 0
W1 6 TK5 1
W2 3 TK5 0
W3 1 TK5 5
W4 2 TK5 0
W5 4 TK5 1
W6 8 TK5 0
W7 5 TK5 0
W8 7 TK5 0
W1 6 TK6 0
W2 3 TK6 0
W3 1 TK6 0
W4 2 TK6 0
W5 4 TK6 0
W6 8 TK6 0
W7 5 TK6 5
W8 7 TK6 0
W1 6 TK7 0
W2 3 TK7 0
W3 1 TK7 0
W4 2 TK7 0
W5 4 TK7 0
W6 8 TK7 3
W7 5 TK7 3
W8 7 TK7 0
W1 6 TK8 0
W2 3 TK8 0
W3 1 TK8 1
W4 2 TK8 0
W5 4 TK8 0
W6 8 TK8 3
W7 5 TK8 0
W8 7 TK8 0
W1 6 TK9 0
W2 3 TK9 0
W3 1 TK9 0
W4 2 TK9 0
W5 4 TK9 5
W6 8 TK9 0
W7 5 TK9 0
W8 7 TK9 0
W1 6 TK10 0
W2 3 TK10 0
W3 1 TK10 0
W4 2 TK10 0
W5 4 TK10 0
W6 8 TK10 5
W7 5 TK10 0
W8 7 TK10 0
W1 6 TK11 0
W2 3 TK11 0
W3 1 TK11 0
W4 2 TK11 0
W5 4 TK11 0
W6 8 TK11 0
W7 5 TK11 0
W8 7 TK11 3
W1 6 TK12 0
W2 3 TK12 0
W3 1 TK12 0
W4 2 TK12 0
W5 4 TK12 0
W6 8 TK12 0
W7 5 TK12 0
W8 7 TK12 5"), header = TRUE, stringsAsFactors = TRUE)
I used the code below to convert from long to wide format:
LONG TO WIDE
library(dplyr)
library(tidyr)
df_wide <- df %>%
  tidyr::spread(Factor, Power)
RESULT
> df_wide
Customer Rate TK1 TK10 TK11 TK12 TK2 TK3 TK4 TK5 TK6 TK7 TK8 TK9
1 W1 6 5 0 0 0 0 0 0 1 0 0 0 0
2 W2 3 0 0 0 0 1 5 3 0 0 0 0 0
3 W3 1 0 0 0 0 0 1 0 5 0 0 1 0
4 W4 2 0 0 0 0 5 0 0 0 0 0 0 0
5 W5 4 0 0 0 0 0 0 0 1 0 0 0 5
6 W6 8 0 5 0 0 0 0 0 0 0 3 3 0
7 W7 5 0 0 0 0 0 0 0 0 5 3 0 0
8 W8 7 3 0 3 5 3 0 0 0 0 0 0 0
The wide format orders the columns by the levels of the Factor variable, which are sorted alphabetically, so TK10 comes right after TK1:
> levels(df$Factor)
[1] "TK1" "TK10" "TK11" "TK12" "TK2" "TK3" "TK4" "TK5" "TK6" "TK7" "TK8" "TK9"
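The ordering above comes from plain character sorting, which compares strings position by position rather than reading the trailing digits as a number. A minimal sketch of the effect, using a made-up vector `x` (not part of the original data):

```r
# Illustrative labels only; `x` is a hypothetical vector, not the uploaded data.
x <- c("TK1", "TK2", "TK10", "TK12")

# Character sorting: "TK10" and "TK12" land before "TK2".
alphabetical <- sort(x)
alphabetical
# [1] "TK1"  "TK10" "TK12" "TK2"

# Ordering by the numeric suffix gives the intended sequence.
numeric_order <- x[order(as.integer(sub("^\\D+", "", x)))]
numeric_order
# [1] "TK1"  "TK2"  "TK10" "TK12"
```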
I want the levels of Factor to run in numeric order: TK1, TK2, and so on up to TK12.
I can fix this by setting the levels manually:
df$Factor <- factor(df$Factor, levels = c("TK1", "TK2" , "TK3" , "TK4", "TK5" , "TK6" , "TK7" , "TK8" , "TK9", "TK10", "TK11", "TK12"))
However, the number of levels of the Factor variable depends on the user's input. It might be 14, 15 or 20.
QUESTION
Is there any way to arrange the levels of the Factor column from lowest to highest regardless of the user's input?
Answer 1:
We can convert the column to a factor with the levels specified explicitly:
df %>%
  mutate(Factor = factor(Factor, levels = paste0("TK", 1:12))) %>%
  spread(Factor, Power)
Or, to make it more dynamic, we can extract the non-numeric and numeric parts of 'Factor' into separate columns ('Factor1', 'Factor2'). We then convert 'Factor' to a factor whose levels are built by pasting the first value of 'Factor1' onto the sequence from the minimum to the maximum of 'Factor2'. Finally, we drop 'Factor1' and 'Factor2' and spread.
library(tidyr)
res <- df %>%
  extract(Factor, into = c("Factor1", "Factor2"), "(\\D+)(\\d+)",
          remove = FALSE, convert = TRUE) %>%
  mutate(Factor = factor(Factor, levels = paste0(Factor1[1],
                                                 min(Factor2):max(Factor2)))) %>%
  select(-Factor1, -Factor2) %>%
  spread(Factor, Power)
head(res, 2)
# Customer Rate TK1 TK2 TK3 TK4 TK5 TK6 TK7 TK8 TK9 TK10 TK11 TK12
#1 W1 6 5 0 0 0 1 0 0 0 0 0 0 0
#2 W2 3 0 1 5 3 0 0 0 0 0 0 0 0
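The approach above builds the levels from a single prefix and a contiguous min-to-max range. As a possible variation, the levels can be derived only from the labels that actually occur, ordered by their numeric suffix. The following is a base-R sketch under the assumption that every label is a non-numeric prefix followed by digits; `natural_levels` and `labels_in` are hypothetical names introduced here for illustration, not part of dplyr or tidyr.

```r
# Hypothetical helper: return the distinct labels ordered by their numeric suffix.
# Assumes each label looks like <non-digits><digits>, e.g. "TK7".
natural_levels <- function(x) {
  labels <- unique(as.character(x))
  suffix <- as.integer(sub("^\\D*", "", labels))  # strip the leading prefix
  labels[order(suffix, labels)]                   # ties broken alphabetically
}

# Illustrative input in the alphabetical order R produces by default.
labels_in <- factor(c("TK1", "TK10", "TK11", "TK12", "TK2", "TK3"))

reordered <- factor(labels_in, levels = natural_levels(labels_in))
levels(reordered)
# [1] "TK1"  "TK2"  "TK3"  "TK10" "TK11" "TK12"
```

Following the same pattern as the answer, the helper could be applied with `mutate(Factor = factor(Factor, levels = natural_levels(Factor)))` before `spread(Factor, Power)`, so the column order follows however many levels the uploaded file contains.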
Source: https://stackoverflow.com/questions/38194583/dplyr-and-tidyr-convert-long-to-wide-format-and-arrange-columns