Question
Let's use the dataset from this question:
# 100 sequences (rows) of length 20; random, so the exact numbers below vary between runs
dat <- data.frame(replicate(20, sample(c("A", "B", "C", "D"), size = 100, replace = TRUE)))
Then we can build the transition matrix and the Markov chain:
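# Build transition matrix: count transitions between consecutive columns
# (each row is one sequence), then optionally normalise each row to sum to 1
trans.matrix <- function(X, prob = TRUE)
{
  tt <- table(c(X[, -ncol(X)]), c(X[, -1]))
  if (prob) tt <- tt / rowSums(tt)
  tt
}
trans.mat <- trans.matrix(as.matrix(dat))
# drop the "table" class so markovchain receives a plain numeric matrix
trans.mat <- unclass(trans.mat)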
# Build markovchain
library(markovchain)
chain <- new('markovchain', transitionMatrix = trans.mat)
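To see exactly what trans.matrix() counts, here is a small hand-checkable sketch using the same function body on two made-up toy sequences: each row of X is one sequence, and every pair of neighbouring columns contributes one transition.

```r
# Same counting logic as trans.matrix() above
trans.matrix <- function(X, prob = TRUE) {
  # "from" states: all columns but the last; "to" states: all but the first
  tt <- table(c(X[, -ncol(X)]), c(X[, -1]))
  # divide each row by its total so rows become conditional probabilities
  if (prob) tt <- tt / rowSums(tt)
  tt
}

# Hypothetical toy data: two sequences of length 3
X <- rbind(c("A", "B", "A"),   # transitions A->B, B->A
           c("B", "B", "A"))   # transitions B->B, B->A
tm <- trans.matrix(X)
print(tm)
```

Row B gives 2/3 for B->A because two of the three transitions leaving B go to A, and row A gives 1 for A->B because A is left only once.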
If I now encounter a new sequence, say AAABCAD, can I calculate the probability of observing this sequence given this Markov chain?
Answer 1:
I cannot see a function in markovchain for exactly that, but it is easy to do manually. There is one caveat, though: the transition matrix does not provide the probability of observing the first A, so you need to supply it yourself. Let it be 0.25, as it would be if all four states were equally likely (which is true in your example).
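If a uniform 0.25 is not justified, two other common choices for the first-state probability are the empirical frequency of first states in the training data (e.g. prop.table(table(dat[, 1]))) or the chain's stationary distribution. Below is a minimal base-R sketch of the latter; the helper name stationary() and the 2-state matrix are illustrative assumptions, and it assumes an irreducible chain. For a markovchain object, the package's own steadyStates() serves the same purpose.

```r
# Stationary distribution pi solves pi %*% P = pi, i.e. it is the left
# eigenvector of P for eigenvalue 1 (unique up to scaling when the chain
# is irreducible), rescaled so its entries sum to 1.
stationary <- function(P) {
  e <- eigen(t(P))
  v <- Re(e$vectors[, which.min(Mod(e$values - 1))])
  setNames(v / sum(v), rownames(P))
}

# Hypothetical 2-state example
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))
stationary(P)
```

For the question's data one would call stationary(trans.mat) and use its "A" entry in place of 0.25.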
Store the observed sequence as a character vector; the transitions in it are then the pairs of consecutive elements:
obs <- strsplit("AAABCAD", "")[[1]]
cbind(head(obs, -1), obs[-1])
# [,1] [,2]
# [1,] "A" "A"
# [2,] "A" "A"
# [3,] "A" "B"
# [4,] "B" "C"
# [5,] "C" "A"
# [6,] "A" "D"
Probabilities for each of those transitions then are
trans.mat[cbind(head(obs, -1), obs[-1])]
# [1] 0.2268722 0.2268722 0.2268722 0.2926316 0.2791165 0.2665198
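The lookup trans.mat[cbind(...)] relies on R's matrix indexing: a two-column character matrix used as an index picks one element per row, matching the first column against the row names and the second against the column names. A small sketch with a made-up 2 x 2 matrix:

```r
# Toy matrix with dimnames (values chosen only to make the lookups visible)
M <- matrix(1:4, nrow = 2,
            dimnames = list(c("A", "B"), c("A", "B")))
# Each row of the index matrix is one (row name, column name) pair
idx <- cbind(c("A", "B", "B"), c("B", "A", "B"))
M[idx]   # M["A", "B"], M["B", "A"], M["B", "B"]
```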
By the Markov property, P(AAABCAD) = P(A) * P(A|A) * P(A|A) * P(B|A) * P(C|B) * P(A|C) * P(D|A), so the final answer is 0.25 times the product of the above vector:
0.25 * prod(trans.mat[cbind(head(obs, -1), obs[-1])])
# [1] 6.355069e-05
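The same calculation can be wrapped in a small helper. The name seq_prob() and its arguments are hypothetical, not part of markovchain. Summing logs instead of multiplying keeps long sequences from underflowing to zero; this sketch assumes every state in obs appears in the matrix's dimnames.

```r
# Probability of an observed state sequence under a first-order Markov chain.
# trans: square transition matrix with matching row/column names
# obs:   character vector of states, e.g. strsplit("AAABCAD", "")[[1]]
# p0:    assumed probability of the first state (not in the matrix)
seq_prob <- function(trans, obs, p0, log = FALSE) {
  steps <- cbind(head(obs, -1), obs[-1])   # consecutive (from, to) pairs
  lp <- log(p0) + sum(log(trans[steps]))   # sum of logs avoids underflow
  if (log) lp else exp(lp)
}

# Check against a case with a known answer: a uniform 4-state chain,
# where every length-7 sequence has probability 0.25^7.
states <- c("A", "B", "C", "D")
U <- matrix(0.25, 4, 4, dimnames = list(states, states))
obs <- strsplit("AAABCAD", "")[[1]]
seq_prob(U, obs, p0 = 0.25)
seq_prob(U, obs, p0 = 0.25, log = TRUE)
```

With trans.mat and obs from the answer, seq_prob(trans.mat, obs, p0 = 0.25) should match the 6.355069e-05 above, since it is the same product computed on the log scale.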
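For comparison, we may estimate this probability by simulation. The data in the question were generated by drawing every letter independently and uniformly, so under that generating process each length-7 sequence has probability 0.25^7 ≈ 6.10e-05, and we can sample from the same process:
sims <- replicate(2000000, paste(sample(c("A", "B", "C", "D"), size = 7, replace = TRUE), collapse = ""))
mean(sims == "AAABCAD")
# [1] 6.55e-05
Looks close enough! With only about 130 matches out of 2 million draws, the simulated value has a standard error of roughly 5.5e-06, so it cannot separate 0.25^7 from the 6.36e-05 obtained with the estimated matrix; the small gap between those two comes from the transition probabilities being estimated from a finite sample rather than being exactly 0.25.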
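To check the formula against a chain's own transition probabilities rather than against independent uniform draws, sequences can be simulated from the transition matrix itself. A minimal vectorised base-R sketch; the helper simulate_seqs() and the 2-state test matrix are illustrative assumptions:

```r
# Simulate nsim sequences of length len from a Markov chain with transition
# matrix P (rows = "from" states, with row names) and first-state distribution p0.
simulate_seqs <- function(P, p0, len, nsim) {
  states <- rownames(P)
  k <- length(states)
  s <- matrix(NA_integer_, nsim, len)
  s[, 1] <- sample.int(k, nsim, replace = TRUE, prob = p0)
  for (t in seq_len(len)[-1]) {
    for (i in seq_len(k)) {
      # all chains currently in state i move according to row i of P
      idx <- which(s[, t - 1] == i)
      if (length(idx) > 0)
        s[idx, t] <- sample.int(k, length(idx), replace = TRUE, prob = P[i, ])
    }
  }
  # paste each row's state labels into one string, e.g. "AAB"
  do.call(paste0, lapply(seq_len(len), function(t) states[s[, t]]))
}

set.seed(1)
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))
sims <- simulate_seqs(P, p0 = c(0.5, 0.5), len = 3, nsim = 1e5)
mean(sims == "AAB")   # exact value: 0.5 * 0.9 * 0.1 = 0.045
```

For the question's four-state chain, mean(simulate_seqs(trans.mat, rep(0.25, 4), 7, 2e6) == "AAABCAD") would then be compared with 0.25 * prod(trans.mat[cbind(head(obs, -1), obs[-1])]). Looping over states rather than over chains keeps the number of sample.int() calls at len * k instead of len * nsim.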
Source: https://stackoverflow.com/questions/55611370/calculate-probability-of-observing-sequence-using-markovchain-package