Data reorganization in R

Submitted by 浪子不回头ぞ on 2019-12-07 16:28:33

This may be roughly what you want:

Person <- c("A", "B", "C", "AB", "BC", "AC",  "D", "E")
Father <- c(NA,  NA,  NA,   "A", "B", "C",    NA, "D")
Mother <- c(NA,  NA,  NA, "B",   "C", "A", "C",    NA)
var1 <- c(  1,   2,   3,     4,   2,   1,     6, 9)
var2 <- c(1.4, 2.3, 4.3,  3.4, 4.2, 6.1,   2.6, 8.2)
myd <- data.frame(Person, Father, Mother, var1, var2, stringsAsFactors = FALSE)

Note the slight change in the definition of myd: stringsAsFactors = FALSE keeps the names as character strings rather than factors.

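# Return the row of person number x together with the rows of its known parents
parentage <- function(x, myd) {
    y <- myd[x, ]
    p1 <- y$Father
    p2 <- y$Mother
    out <- y
    # Append the father's row if the father is known
    if (!is.na(p1)) {
        out <- rbind(out, myd[myd$Person == p1, ])
    }
    # Append the mother's row if the mother is known
    if (!is.na(p2)) {
        out <- rbind(out, myd[myd$Person == p2, ])
    }
    # Label every row with the row index of the trio's child
    out$Trio <- x
    out
}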

ans <- lapply(seq_along(myd$Person), parentage, myd)

> ans
[[1]]
  Person Father Mother var1 var2 Trio
1      A   <NA>   <NA>    1  1.4    1

[[2]]
  Person Father Mother var1 var2 Trio
2      B   <NA>   <NA>    2  2.3    2

[[3]]
  Person Father Mother var1 var2 Trio
3      C   <NA>   <NA>    3  4.3    3

[[4]]
   Person Father Mother var1 var2 Trio
4      AB      A      B    4  3.4    4
2       A   <NA>   <NA>    1  1.4    4
21      B   <NA>   <NA>    2  2.3    4

[[5]]
  Person Father Mother var1 var2 Trio
5     BC      B      C    2  4.2    5
2      B   <NA>   <NA>    2  2.3    5
3      C   <NA>   <NA>    3  4.3    5

[[6]]
   Person Father Mother var1 var2 Trio
6      AC      C      A    1  6.1    6
3       C   <NA>   <NA>    3  4.3    6
31      A   <NA>   <NA>    1  1.4    6

[[7]]
  Person Father Mother var1 var2 Trio
7      D   <NA>      C    6  2.6    7
3      C   <NA>   <NA>    3  4.3    7

[[8]]
  Person Father Mother var1 var2 Trio
8      E      D   <NA>    9  8.2    8
7      D   <NA>      C    6  2.6    8

If you want a single data frame instead of a list, you can use the plyr package:

library(plyr)
ans <- adply(seq_along(myd$Person), 1, parentage, myd)
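
If you would rather not depend on plyr, base R can stack the trios as well. The following is only a sketch: it repeats the definition of myd so that it runs on its own, it looks up parents with match() instead of calling parentage(), and the name trios is purely illustrative.

```r
Person <- c("A", "B", "C", "AB", "BC", "AC", "D", "E")
Father <- c(NA, NA, NA, "A", "B", "C", NA, "D")
Mother <- c(NA, NA, NA, "B", "C", "A", "C", NA)
var1 <- c(1, 2, 3, 4, 2, 1, 6, 9)
var2 <- c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2)
myd <- data.frame(Person, Father, Mother, var1, var2, stringsAsFactors = FALSE)

# For each person, collect its own row index plus the row indices of its parents.
# match() returns NA for an unknown parent, and those entries are dropped.
trios <- do.call(rbind, lapply(seq_len(nrow(myd)), function(i) {
    idx <- c(i,
             match(myd$Father[i], myd$Person),
             match(myd$Mother[i], myd$Person))
    idx <- idx[!is.na(idx)]
    cbind(myd[idx, ], Trio = i)
}))

# Replace the de-duplicated row names produced by rbind with plain 1..n
rownames(trios) <- NULL
```

Each group of rows sharing a Trio value corresponds to one element of the list returned by lapply above.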

I would represent your problem as a graph and then design a graph traversal algorithm to collect all the trios that you are looking for.

For instance, here you have a subset of the trios in your problem:

A    B    C
 \  / \  /
  vv   vv
  AB   BC 

You could start with the vertices that have no outgoing edges (AB and BC), and create a trio from each one and its parents. Then move to the parents and repeat the process. You will need a way to keep track of which vertices (persons) you have already visited, so that you do not explore the same vertex more than once.
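
The traversal described above can be sketched in base R with a simple queue. This is one possible reading of the idea rather than a definitive implementation: it uses the myd data frame from the first answer (repeated here so the snippet is self-contained), and the names queue, visited and trios are illustrative.

```r
Person <- c("A", "B", "C", "AB", "BC", "AC", "D", "E")
Father <- c(NA, NA, NA, "A", "B", "C", NA, "D")
Mother <- c(NA, NA, NA, "B", "C", "A", "C", NA)
myd <- data.frame(Person, Father, Mother, stringsAsFactors = FALSE)

# Start from persons who are nobody's parent, i.e. vertices with no outgoing edge
is_parent <- myd$Person %in% c(myd$Father, myd$Mother)
queue <- myd$Person[!is_parent]
visited <- character(0)
trios <- list()

while (length(queue) > 0) {
    child <- queue[1]
    queue <- queue[-1]
    # Skip persons that were already explored
    if (child %in% visited) next
    visited <- c(visited, child)

    rec <- myd[myd$Person == child, ]
    parents <- c(rec$Father, rec$Mother)
    parents <- parents[!is.na(parents)]

    # Record the child with its known parents, then explore the parents next
    if (length(parents) > 0) trios[[child]] <- c(child, parents)
    queue <- c(queue, parents)
}
```

Persons with no recorded parents (A, B and C here) are visited but produce no entry in trios; whether they should count as trios of their own, as in the lapply output, is a design choice.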

R has several packages for working with graphs; igraph, for instance, is worth a look.
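
As a rough illustration of how the pedigree might be loaded into igraph, the sketch below builds a directed graph with one edge from each known parent to the child. It assumes the igraph package is installed and reuses the myd columns from the first answer; the names edges, leaves and parents_of_AB are illustrative only.

```r
library(igraph)

Person <- c("A", "B", "C", "AB", "BC", "AC", "D", "E")
Father <- c(NA, NA, NA, "A", "B", "C", NA, "D")
Mother <- c(NA, NA, NA, "B", "C", "A", "C", NA)

# One edge per known parent, pointing from the parent to the child
edges <- rbind(
    data.frame(from = Father, to = Person, stringsAsFactors = FALSE),
    data.frame(from = Mother, to = Person, stringsAsFactors = FALSE)
)
edges <- edges[!is.na(edges$from), ]

g <- graph_from_data_frame(edges, directed = TRUE,
                           vertices = data.frame(name = Person))

# Vertices with no outgoing edge are the starting points of the traversal
leaves <- V(g)[degree(g, mode = "out") == 0]$name

# The parents of a person are its in-neighbours
parents_of_AB <- neighbors(g, "AB", mode = "in")$name
```

From here, the same visit-and-mark loop can walk from each leaf to its in-neighbours to assemble the trios.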
