R convert Datatable distinct column values to column names and column values as values from another column [duplicate]

Submitted by 眉间皱痕 on 2020-01-03 04:45:05

Question


Question 1)

I have an R data table with three columns (the actual dataset is bigger, but I have simplified it here to make the problem easier to follow):

Column_One, Column_Two, Column_Three

A, 1, 4
A, 2, 3
A, 3, 77
B, 1, 44
B, 2, 32
B, 3, 770
C, 1, 43
C, 2, 310
C, 3, 68

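I want to create a new matrix (data table) from the above, as shown below: the distinct values of Column_One become the column names, and the values of Column_Three become the cell values.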

A, B, C
4, 44, 43
3, 32, 310
77, 770, 68

Please note that in the actual data table there are hundreds of different values in columns one and two, so a generic solution is needed.

If you have any questions, please let me know. Any suggestions are much appreciated.

Question 2)

There could be another level: a fourth column, Column_Zero, which groups several of the Column_One values together. In this case we need to create a new data table for each value of Column_Zero and then apply the Question 1 solution to each sub data table. Please suggest the quickest and simplest way to do this.

Column_Zero, Column_One, Column_Two, Column_Three

XX, A, 1, 4
XX, A, 2, 3
XX, A, 3, 77
XX, B, 1, 44
XX, B, 2, 32
XX, B, 3, 770
XX, C, 1, 43
XX, C, 2, 310
XX, C, 3, 68
YY, A1, 1, 4
YY, A1, 2, 3
YY, A1, 3, 77
YY, B1, 1, 44
YY, B1, 2, 32
YY, B1, 3, 770
YY, C1, 1, 43
YY, C1, 2, 310
YY, C1, 3, 68
YY, D2, 1, 4
YY, D2, 2, 5
YY, D2, 3, 6

--------- And so on -----

We then need to create:

------ Data Table one ------

A, B, C
4, 44, 43
3, 32, 310
77, 770, 68

------ Data Table Two ------

A1, B1, C1, D2
4, 44, 43, 4
3, 32, 310, 5
77, 770, 68, 6

------ and so on -----

Related Question:

Once this matrix is split and recast, it becomes important to know the dimensions of the new data structure and its components and also how to access them individually, which is discussed here:

R Finding Multidimension Array Dimension Sizes


Answer 1:


We can use acast from the reshape2 package to convert the data from 'long' to 'wide' format. The result is a matrix whose rows come from Column_Two and whose columns come from Column_One.

library(reshape2)
acast(df1, Column_Two~Column_One, value.var="Column_Three")
#   A   B   C
#1  4  44  43
#2  3  32 310
#3 77 770  68
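
The answer assumes a data frame called df1 that already holds the question's sample data; the post does not show how it was built. The following is a minimal sketch, assuming the column names from the question, that rebuilds that input and produces the same wide matrix with base R's tapply, which may be handy when reshape2 is not installed. This is a swapped-in technique, not the answer's own method, and taking the first value per cell assumes each (Column_Two, Column_One) pair occurs exactly once.

```r
# Rebuild the question's sample data (df1 is the name the answer assumes).
df1 <- data.frame(
  Column_One   = rep(c("A", "B", "C"), each = 3),
  Column_Two   = rep(1:3, times = 3),
  Column_Three = c(4, 3, 77, 44, 32, 770, 43, 310, 68),
  stringsAsFactors = FALSE
)

# Base R alternative to acast(): rows come from Column_Two, columns from
# Column_One. Assumption: each (Column_Two, Column_One) pair occurs once,
# so keeping the first value per cell loses nothing.
wide <- tapply(
  df1$Column_Three,
  list(df1$Column_Two, df1$Column_One),
  function(v) v[1]
)

wide
```

Because the column names are derived from the distinct values of Column_One, the same call should work unchanged when there are hundreds of distinct values.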

For the second dataset, we can split the data by "Column_Zero" and then loop over the resulting list, applying acast to each piece as before. The result is a named list with one matrix per Column_Zero value.

lst <- lapply(split(df2[-1], df2$Column_Zero), function(x)
  acast(x, Column_Two ~ Column_One, value.var = "Column_Three"))

lst
#$XX
#   A   B   C
#1  4  44  43
#2  3  32 310
#3 77 770  68

#$YY
#  A1  B1  C1 D2
#1  4  44  43  4
#2  3  32 310  5
#3 77 770  68  6
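
The related question above asks how to find the dimensions of the split result and how to reach its individual components. As a hedged sketch, the block below rebuilds the second sample dataset under the name df2 (the name the answer assumes), performs the same split-and-reshape with base R's tapply instead of acast (a substitution made here so the example needs no packages), and then inspects the resulting list. It assumes each (Column_Two, Column_One) pair occurs once within a group.

```r
# Rebuild the second sample dataset from the question.
df2 <- data.frame(
  Column_Zero  = rep(c("XX", "YY"), times = c(9, 12)),
  Column_One   = rep(c("A", "B", "C", "A1", "B1", "C1", "D2"), each = 3),
  Column_Two   = rep(1:3, times = 7),
  Column_Three = c(4, 3, 77, 44, 32, 770, 43, 310, 68,
                   4, 3, 77, 44, 32, 770, 43, 310, 68, 4, 5, 6),
  stringsAsFactors = FALSE
)

# One wide matrix per Column_Zero value, stored in a named list.
lst <- lapply(split(df2, df2$Column_Zero), function(x) {
  tapply(x$Column_Three, list(x$Column_Two, x$Column_One), function(v) v[1])
})

names(lst)          # the group labels taken from Column_Zero
sapply(lst, ncol)   # number of columns in each group's matrix
dim(lst$YY)         # rows and columns of one matrix
lst$YY[, "D2"]      # a single column of a single matrix
```

Keeping the pieces in a named list rather than as separate variables means the number of Column_Zero groups does not need to be known in advance.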



Answer 2:


A possible tidyr / dplyr solution:

library(dplyr)
library(tidyr)

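# df1 is the three-column data frame from Question 1.
# Dropping Column_Two by name (rather than select(2:4)) keeps this generic
# when Column_One has more than three distinct values.
df1 %>% spread(Column_One, Column_Three) %>% select(-Column_Two)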

#   A   B   C
#1  4  44  43
#2  3  32 310
#3 77 770  68
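
In current tidyr releases spread() is superseded by pivot_wider(). As a minimal sketch, assuming tidyr 1.0.0 or later and the question's sample data rebuilt under the name df1, the same reshape could be written as follows. The helper column is dropped by name so the code does not depend on how many distinct values Column_One has.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's sample data (df1 is an assumed name).
df1 <- data.frame(
  Column_One   = rep(c("A", "B", "C"), each = 3),
  Column_Two   = rep(1:3, times = 3),
  Column_Three = c(4, 3, 77, 44, 32, 770, 43, 310, 68),
  stringsAsFactors = FALSE
)

# Spread Column_One into column names, filling cells from Column_Three.
# Column_Two identifies the rows and is removed afterwards.
wide_df <- df1 %>%
  pivot_wider(names_from = Column_One, values_from = Column_Three) %>%
  select(-Column_Two)

wide_df
```

Unlike acast, this returns a tibble rather than a matrix; as.matrix(wide_df) converts it if a matrix is needed.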


Source: https://stackoverflow.com/questions/34994790/r-convert-datatable-distinct-column-values-to-column-names-and-column-values-as
