Question
I have this dataset:
df <- structure(list(V1 = c("B1D01", "B1D01", "B1D01", "B1D01", "B1D01",
"B1D01", "U0155"), V2 = c("U0155", "U0155", "U0155", "U0155",
"U0155", "U0155", "U3003"), V3 = c("U3003", "U3003", "C1B00",
"U3003", "U3003", "U3003", "C1B00"), V4 = c("C1B00", "C1B00",
"U0073", "C1B00", "C1B00", "C1B00", "P037D"), V5 = c("P037D",
"P037D", NA, "P037D", "P037D", "P037D", "P0616"), V6 = c("P0616",
"P0616", NA, "P0616", "P0616", "P0616", "P0562"), V7 = c("P0562",
"P0562", NA, "P0562", "P0562", "P0562", "U0073"), V8 = c("U0073",
"U0073", NA, "U0073", "U0073", "U0073", NA)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8"), row.names = 1719:1725, class = "data.frame")
When I print(df), I get:
        V1    V2    V3    V4    V5    V6    V7    V8
1719 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1720 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1721 B1D01 U0155 C1B00 U0073  <NA>  <NA>  <NA>  <NA>
1722 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1723 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1724 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1725 U0155 U3003 C1B00 P037D P0616 P0562 U0073  <NA>
As you can see, the codes are not tied to fixed columns. For instance, U3003 is mostly in V3, but it can also appear in V2 (last row).
I would like to reorganize this data frame under these conditions:
- Each code should be placed in its own column.
- Each column should be named after the code it holds.
- If there are more than 8 distinct codes, the number of columns should match the number of codes.
- The cell values should keep the name of the code.
- If a code is not present in a row, NA must appear in that cell.
Be aware that my original data frame contains many more rows than this small example extracted from it.
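To make the third condition concrete, here is a minimal base R sketch of how the number of output columns could be determined, namely by counting the distinct non-NA codes. The `toy` data frame and the names `all_codes` and `distinct_codes` are hypothetical and only illustrate the idea; they are not the real data.

```r
# Hypothetical toy data in the same shape as df (not the real data).
toy <- data.frame(
  V1 = c("B1D01", "U0155"),
  V2 = c("U0155", "U3003"),
  V3 = c("U3003", NA),
  stringsAsFactors = FALSE
)

# Flatten every cell into one character vector and drop the missing values.
all_codes <- unlist(toy, use.names = FALSE)
all_codes <- all_codes[!is.na(all_codes)]

# Distinct codes: the reorganized data frame needs one column per element.
distinct_codes <- unique(all_codes)
n_columns <- length(distinct_codes)
```

With the full data, `n_columns` may exceed the original 8 columns, which is the case the third condition describes.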
Answer 1:
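The best way I found is to 'massage' the data frame: pivot it to a longer form, then pivot it back to a wide form with one column per code:
library(tidyverse)

df %>%
  # keep the original row names as an id column
  rownames_to_column() %>%
  # one row per (rowname, code) pair; NA cells are dropped
  pivot_longer(-rowname, values_drop_na = TRUE) %>%
  # one column per code, filled with the code itself
  pivot_wider(id_cols = rowname, names_from = value, values_from = value)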
#> # A tibble: 7 x 9
#>   rowname B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#>   <chr>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1719    B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 2 1720    B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 3 1721    B1D01 U0155 <NA>  C1B00 <NA>  <NA>  <NA>  U0073
#> 4 1722    B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 5 1723    B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 6 1724    B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 7 1725    <NA>  U0155 U3003 C1B00 P037D P0616 P0562 U0073
Created on 2020-04-03 by the reprex package (v0.3.0)
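For anyone who would rather not depend on the tidyverse, the same reshaping can probably be done in base R. The following is only a sketch and not part of the original answer: `df_small` is a hypothetical toy data frame in the same shape as `df`, and filling a pre-allocated matrix code by code is just one possible approach.

```r
# Hypothetical toy data in the same shape as df (values are illustrative).
df_small <- data.frame(
  V1 = c("B1D01", "B1D01", "U0155"),
  V2 = c("U0155", "U0155", "U3003"),
  V3 = c("U3003", "C1B00", "C1B00"),
  V4 = c("C1B00", NA, NA),
  row.names = c("1719", "1721", "1725"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df_small)

# Distinct codes in order of first appearance, reading the data row by row.
v <- as.vector(t(m))
codes <- unique(v[!is.na(v)])

# Start from an all-NA matrix: one row per original row, one column per code.
wide <- matrix(NA_character_, nrow = nrow(m), ncol = length(codes),
               dimnames = list(rownames(m), codes))

# For each code, write it into the rows where it occurs in any column.
for (code in codes) {
  present <- rowSums(m == code, na.rm = TRUE) > 0
  wide[present, code] <- code
}

wide <- as.data.frame(wide, stringsAsFactors = FALSE)
```

Here `wide` keeps the original row names and has one column per code, with NA wherever a code is absent from a row. Replacing `df_small` with `df` should give the same layout as the tibble above, apart from the row names staying as row names rather than becoming a `rowname` column.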
Source: https://stackoverflow.com/questions/61009363/reorganize-data-frame-elements-depending-on-the-content-of-the-rows-in-r