Calculate probability of observing sequence using markovchain package

问题

Let's use the dataset from this question:

dat<-data.frame(replicate(20,sample(c("A", "B", "C","D"), size = 100, replace=TRUE)))

Then we can build the transition matrix and the markov chain:

# Build transition matrix
trans.matrix <- function(X, prob=T)
{
  tt <- table( c(X[,-ncol(X)]), c(X[,-1]) )
  if(prob) tt <- tt / rowSums(tt)
  tt
}
trans.mat <- trans.matrix(as.matrix(dat))
attributes(trans.mat)$class <- 'matrix'

# Build markovchain
library(markovchain)
chain <- new('markovchain', transitionMatrix = trans.mat)

If I now encounter a new sequence, let's say AAABCAD can I then calculate the probability of observing this sequence given this markovchain?

回答1:

I cannot see a function in markovchain exactly for that, but it can be easily done manually too. There's one caveat though: the transition matrix does not provide the probability of observing the first A, which needs to be provided by you. Let it be 0.25, as it would be if all four states were equally likely (which is true in your example).

Then the transitions in the observed chain can be obtained with

cbind(head(obs, -1), obs[-1])
#      [,1] [,2]
# [1,] "A"  "A" 
# [2,] "A"  "A" 
# [3,] "A"  "B" 
# [4,] "B"  "C" 
# [5,] "C"  "A" 
# [6,] "A"  "D"

Probabilities for each of those transitions then are

trans.mat[cbind(head(obs, -1), obs[-1])]
# [1] 0.2268722 0.2268722 0.2268722 0.2926316 0.2791165 0.2665198

and the final answer is 0.25 * (the product of the above vector):

0.25 * prod(trans.mat[cbind(head(obs, -1), obs[-1])])
# [1] 6.355069e-05

For comparison, we may estimate this probability by generating many chains of length 7:

dat <- replicate(2000000, paste(sample(c("A", "B", "C", "D"), size = 7, replace = TRUE), collapse = ""))
mean(dat == "AAABCAD")
# [1] 6.55e-05

Looks close enough!

来源：https://stackoverflow.com/questions/55611370/calculate-probability-of-observing-sequence-using-markovchain-package

标签

markov-chains