TraMineR substitution cost

Submitted by 空扰寡人 on 2019-12-13 19:24:54

Question


I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR.

I try to give you a simple example (very simple, but I hope useful to explain my problem):

There are three sequences and I want to calculate the dissimilarity matrix. The alphabet is: H (in health), I (ill at home), IH (ill at hospital), D (died).

I observe the 3 subjects for 5 observations. These are the sequences:

H – H – I – D – D 
H – I – I – I – D 
I – I – H – IH – IH 

The substitution cost matrix is a 4x4 table (state x state). Must it be symmetric? This is my logical problem: while it is possible to “transit” from states H, I or IH to state D (died), the reverse is illogical.

Can I use a non-symmetric substitution cost matrix in TraMineR?

If, in my data, the substitution cost (calculated with sm = "TRATE", for instance) from state “I” to “D” is lower (0.5) than the substitution cost from state “I” to “IH” (0.6), the OM algorithm substitutes “I” with “D” rather than with “IH”.


Answer 1:


It seems to me that you're looking for a custom cost matrix. It is not mandatory to use either the TRATE or the CONSTANT method.

To create a custom matrix, you just have to do something like this:

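library(TraMineR)
# Custom symmetric substitution cost matrix for a 3-state alphabet
myscm <- matrix(c(0,1,2,
                  1,0,2,
                  2,2,0), nrow=3, ncol=3, byrow=TRUE)
# my.seq is a state sequence object created with seqdef()
dist.om <- seqdist(my.seq, method="OM", sm=myscm)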

where myscm is your custom matrix
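For the four-state alphabet in the question, the same idea could look roughly like the sketch below. The cost values and the indel cost are hypothetical, chosen only to illustrate a symmetric matrix in which any substitution involving D is the most expensive; `seq_data` and `states` are names introduced here for illustration.

```r
library(TraMineR)

# The three example sequences from the question, one row per subject
seq_data <- matrix(c("H", "H", "I", "D",  "D",
                     "H", "I", "I", "I",  "D",
                     "I", "I", "H", "IH", "IH"),
                   nrow = 3, byrow = TRUE)
states <- c("H", "I", "IH", "D")
my.seq <- seqdef(seq_data, alphabet = states)

# Hypothetical symmetric costs; rows and columns follow the order of `states`
myscm <- matrix(c(0, 1, 2, 3,
                  1, 0, 1, 3,
                  2, 1, 0, 3,
                  3, 3, 3, 0),
                nrow = 4, byrow = TRUE, dimnames = list(states, states))

# indel is set explicitly (assumed here: half the largest substitution cost)
dist.om <- seqdist(my.seq, method = "OM", indel = 1.5, sm = myscm)
print(dist.om)
```

Because the matrix is symmetric, the cost of replacing I by D equals the cost of replacing D by I; the high value simply makes the algorithm avoid matching a living state with D when a cheaper alignment exists.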

This was taken from http://lists.r-forge.r-project.org/pipermail/traminer-users/2011-July/000075.html

I believe you have two options:

1) Create a rationale for all the transitions and a full custom matrix

2) Get the transition matrix you've already generated (using seqsubm(your.seq, method = "TRATE") ) and change just the inconsistent values. That's what I've done in my last analysis.
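Option 2 might be sketched as follows, using the question's three sequences. The helper `set_cost`, the helper `idx`, and the replacement value 3 are assumptions made for illustration; the point is only that each change is written to both `[a, b]` and `[b, a]` so the matrix stays symmetric.

```r
library(TraMineR)

seq_data <- matrix(c("H", "H", "I", "D",  "D",
                     "H", "I", "I", "I",  "D",
                     "I", "I", "H", "IH", "IH"),
                   nrow = 3, byrow = TRUE)
my.seq <- seqdef(seq_data, alphabet = c("H", "I", "IH", "D"))

# Start from the costs derived from the transition rates
sc <- seqsubm(my.seq, method = "TRATE")

# Position of a state in the alphabet (hypothetical helper; avoids relying
# on the row and column labels of the cost matrix)
idx <- function(state) match(state, alphabet(my.seq))

# Overwrite one pair of costs symmetrically (hypothetical helper)
set_cost <- function(m, a, b, value) {
  m[idx(a), idx(b)] <- value
  m[idx(b), idx(a)] <- value
  m
}

# Make every substitution involving D expensive (3 is an arbitrary choice)
for (s in c("H", "I", "IH")) {
  sc <- set_cost(sc, s, "D", 3)
}

dist.om <- seqdist(my.seq, method = "OM", indel = max(sc) / 2, sm = sc)
```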

But keep in mind the point made by Gilbert in the post titled An "asymmetric" pairwise distance matrix.




Answer 2:


The transition rates (estimated transition probabilities) should not be confused with the substitution costs. Substitution costs are supposed to reflect the dissimilarities between states.

The matrix of transition rates (returned by seqtrate) is NOT symmetric.
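As a quick check on the question's three sequences, something like the following could be used to inspect the transition rates. This is only a sketch; `unname()` is applied because the row and column labels of the rate matrix differ, which would otherwise make `isSymmetric()` fail regardless of the values.

```r
library(TraMineR)

seq_data <- matrix(c("H", "H", "I", "D",  "D",
                     "H", "I", "I", "I",  "D",
                     "I", "I", "H", "IH", "IH"),
                   nrow = 3, byrow = TRUE)
my.seq <- seqdef(seq_data, alphabet = c("H", "I", "IH", "D"))

# Rows are origin states, columns are destination states
tr <- seqtrate(my.seq)
print(round(tr, 2))

# Compare the values only, ignoring the labels
rates_symmetric <- isSymmetric(unname(tr))
print(rates_symmetric)
```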

The substitution costs used to compute distances, such as the optimal matching distance, must be symmetric. Otherwise, the result would not be a distance matrix, and passing such a non-symmetric matrix to, for example, a clustering procedure would lead to unexpected results.

Deriving substitution costs from transition rates is just one of several possible ways to define substitution costs. Letting $p(i|j)$ be the probability of a transition from $j$ to $i$, it consists in defining the substitution cost as

$c(i,j) = 2 - p(i|j) - p(j|i)$
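The formula can be reproduced in base R as a small sketch. The rate matrix below contains illustrative values (rows are origin states, columns are destination states), and `trate_cost` is a hypothetical helper name, not a TraMineR function. Adding the matrix to its transpose is what makes the resulting costs symmetric even though the rates are not.

```r
states <- c("H", "I", "IH", "D")

# Illustrative transition rates: tr[j, i] plays the role of p(i|j)
tr <- matrix(c(0.25, 0.50, 0.25, 0,
               1/6,  0.50, 0,    1/3,
               0,    0,    1,    0,
               0,    0,    0,    1),
             nrow = 4, byrow = TRUE, dimnames = list(states, states))

# c(i, j) = 2 - p(i|j) - p(j|i), with zero cost on the diagonal
trate_cost <- function(rates) {
  cost <- 2 - rates - t(rates)
  diag(cost) <- 0
  cost
}

cost <- trate_cost(tr)
print(round(cost, 3))
```

Here the rate from I to D is 1/3 while the rate from D to I is 0, yet both `cost["I", "D"]` and `cost["D", "I"]` equal 2 - 1/3 - 0.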



Source: https://stackoverflow.com/questions/28586009/traminer-substitution-cost
