Question
I want to check for equality within a data set. The data set looks like this:
Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Notifs <- c(10,10,20,55,63,67,71,73,73,73,81,81,83,32,32,32,32,
47,48,45,45,45,51,51,55,56,69,65,88)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
"Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
"Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
"Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)
df <- data.frame(Equips,Notifs,Comps,rank)
The data frame should be read line by line.
My problem is the following: I have a very big data set, and I want to check whether the Comps within one Equips are the same across all ranks.
To be specific: Equips 1 has ranks 1 and 2, and I want to check whether there is a component listed in both rank 1 and rank 2 (in this example: yes).
Equips 2 has 3 ranks, and here there is no Comps which is listed in the first, second and third rank.
Equips 5 has 4 ranks, and yes, here there is a Comps which appears in every rank, namely "Lichtmaschine".
So what is my desired output? It would be enough to get an output with the Equips number and TRUE or FALSE (like the summary command).
TRUE should be the output if there is a Comps which is listed in every rank (within one Equips).
Some further notes: the data set is very big, so I need an automated version and, if possible, one that uses only standard R without any packages.
Many thanks for your effort.
Charly
Answer 1:
Here is an answer which uses the plyr package:
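library(plyr)
ddply(df, .(Equips), function(d) {
  # Number of distinct ranks within this Equips
  nb.comps <- length(unique(d$rank))
  # TRUE where a Comps appears at least once in a given rank
  tab <- table(d$rank, d$Comps) > 0
  # For each Comps, the number of ranks it appears in
  tab <- margin.table(tab, 2)
  # TRUE if at least one Comps appears in every rank
  return(sum(tab>=nb.comps)>0)
})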
Which gives (shown for Equips 1 to 5; Equips 6 to 8 each have only one rank and so also yield TRUE):
  Equips    V1
1      1  TRUE
2      2 FALSE
3      3 FALSE
4      4 FALSE
5      5  TRUE
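To see why the function returns TRUE for a given Equips, its steps can be run by hand on a single group. The following is a minimal sketch for Equips 5, assuming the data from the question; the names n.ranks, present, ranks.per.comp and common are illustrative and not part of the original answer.

```r
# Sample data from the question (only the columns needed here)
Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
           "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
           "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
           "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)
df <- data.frame(Equips, Comps, rank)

# Restrict to one Equips
d <- df[df$Equips == 5, ]

# Number of distinct ranks in this group
n.ranks <- length(unique(d$rank))

# Logical rank-by-Comps matrix: TRUE where the Comps occurs in that rank
present <- table(d$rank, d$Comps) > 0

# Number of ranks each Comps occurs in (column sums of the logical matrix)
ranks.per.comp <- colSums(present)

# Comps that occur in every rank of this Equips
common <- names(ranks.per.comp)[ranks.per.comp == n.ranks]
print(common)
```

For Equips 5 this leaves only "Lichtmaschine", which matches the example given in the question.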
If you really don't want to use plyr, you can use the by function:
by(df, df$Equips, function(d) {
  nb.comps <- length(unique(d$rank))
  tab <- table(d$rank, d$Comps) > 0
  tab <- margin.table(tab, 2)
  return(sum(tab>=nb.comps)>0)
})
df$Equips: 1
[1] TRUE
--------------------------------------------------------
df$Equips: 2
[1] FALSE
--------------------------------------------------------
df$Equips: 3
[1] FALSE
--------------------------------------------------------
df$Equips: 4
[1] FALSE
--------------------------------------------------------
df$Equips: 5
[1] TRUE
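If a plain data frame with one row per Equips is preferred over the printed by output, one possible base R sketch is shown here. Note that it swaps in a different technique, Reduce(intersect, ...) over the Comps of each rank, rather than the table() approach used in this answer; the names has.common, comps.by.rank and out are illustrative assumptions.

```r
# Sample data from the question (only the columns needed here)
Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
           "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
           "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
           "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)
df <- data.frame(Equips, Comps, rank)

# For each Equips: is there a Comps that occurs in every rank?
has.common <- sapply(split(df, df$Equips), function(d) {
  # List with one character vector of Comps per rank
  comps.by.rank <- split(as.character(d$Comps), d$rank)
  # Intersect the vectors pairwise; non-empty means a shared Comps exists
  length(Reduce(intersect, comps.by.rank)) > 0
})

# One row per Equips with a TRUE/FALSE flag
out <- data.frame(Equips = names(has.common), common = unname(has.common))
print(out)
# common is TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE for Equips 1 to 8
```

The result can then be filtered with out[out$common, ] or counted with table(out$common).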
If you want to summarize the result, you can do something like this:
result <- by(df, df$Equips, function(d) {
  # Number of distinct ranks within this Equips
  nb.comps <- length(unique(d$rank))
  tab <- table(d$rank, d$Comps) > 0
  tab <- margin.table(tab, 2)
  return(sum(tab>=nb.comps)>0)
})
data.frame(nb.equips=dim(result), nb.matched=sum(result))
Which gives (for the five Equips shown in the output above):
  nb.equips nb.matched
1         5          2
Source: https://stackoverflow.com/questions/14769346/checking-for-equality