Thank you so much for your help in advance!
I am trying to modify an existing matrix such that, when a new line is added to the matrix, it removes values from the pr
You could try something like this. Having 'nrow' in there will allow you to run it multiple times with other 'I.vec's. I tried to do this in a single line with 'apply' but couldn't get a matrix to come out again.
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I.r<-rbind(mat1,I.vec)
for(i in 1:ncol(mat.I.r))
{
ifelse(mat.I.r[nrow(mat.I.r),i]==1, mat.I.r[sample(which(mat.I.r[1:(nrow(mat.I.r)-1),i]==1),1), i] <- 0, "")
}
mat.I.r
There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:
1
s in relevant columns can be set to 0
I'll just alter the original function (see below). The change is to the line defining rows
. I now have (there was a bug in the original - the version below is revised to handle deal with the bug):
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] == 1L)
out <- if(length(ones) == 1L) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
Basically, what this does is, for each column we need to swap a 1
to a 0
, we work out which rows of the column contain 1
s and sample one of these.
Edit: We have to handle the case where there is only a single 1
in a column. If we just sample from a length 1 vector, R's sample()
will treat it as if we wanted to sample from the set seq_len(n)
not from the length 1 set n
. We handle this now with an if, else
statement.
We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which()
and sample()
, but how escapes me at the moment, because we do have to handle the case where there is only one 1
in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] == 1L)
out <- if(length(ones) == 1L) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
and here it is in action:
> set.seed(2)
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
and it works when there is only one 1
in a column we want to do a swap in:
> foo(mat1, c(0,0,1,1))
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 0 1 1
Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:
mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)
## Set a seed to make reproducible
set.seed(2)
## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)
## which of ivec are 1L
cols <- which(ivec == 1L)
## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
## Compute the index of each rows cols combination
## if we treated mat1 as a vector
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)
## copy for illustration
mat2 <- mat1
## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0
## bind on ivec
mat2 <- rbind(mat2, ivec)
This gives us:
> mat2
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
ivec 0 1 1 0
If I were doing this more than once or twice, I'd wrap this in a function:
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
Which gives:
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
If you wanted to do this for multiple ivec
s, growing mat1
each time, then you probably don't want to do that in a loop as growing objects is slow (it involves copies etc). But you could just modify the definition of ind
to include the extra n
rows you bind on for the n
ivec
s.