sequence-analysis

How to compute dissimilarities between sequences when sequences contain gaps?

拈花ヽ惹草 submitted on 2019-12-24 11:24:03
Question: I want to cluster sequences using optimal matching with TraMineR::seqdist() on data that contains missing values, i.e. sequences containing gaps.

    library(TraMineR)
    data(ex1)
    sum(is.na(ex1))
    # [1] 38
    sq <- seqdef(ex1[1:13])
    sq
    #    Sequence
    # s1 *-*-*-A-A-A-A-A-A-A-A-A-A
    # s2 D-D-D-B-B-B-B-B-B-B
    # s3 *-D-D-D-D-D-D-D-D-D-D
    # s4 A-A-*-*-B-B-B-B-D-D
    # s5 A-*-A-A-A-A-*-A-A-A
    # s6 *-*-*-C-C-C-C-C-C-C
    # s7 *-*-*-*-*-*-*-*-*-*-*-*-*
    sm <- seqsubm(sq, method='TRATE')
    round(sm,digits=3)
    #     A-> B-> C-> D->
    # A->
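A minimal sketch of one way to keep gaps in the computation, assuming the missing state should be treated as an extra element of the alphabet. The toy data frame `df` is hypothetical and not taken from the question; `with.missing = TRUE` is the TraMineR argument that asks `seqsubm()` and `seqdist()` to include the missing state.

```r
library(TraMineR)

# Hypothetical toy data (not from the question): NA marks an unobserved position.
df <- data.frame(
  t1 = c("A", "D", NA,  "A"),
  t2 = c("A", "D", "D", NA),
  t3 = c(NA,  "B", "D", "A"),
  t4 = c("A", "B", "D", "A"),
  stringsAsFactors = FALSE
)

# With the default settings, leading NAs and internal gaps are kept as the missing state.
sq <- seqdef(df)

# Transition-rate substitution costs, with the missing state as an extra row and column.
sm <- seqsubm(sq, method = "TRATE", with.missing = TRUE)

# Optimal matching distances that also compare missing positions.
d <- seqdist(sq, method = "OM", indel = 1, sm = sm, with.missing = TRUE)
round(d, 3)
```

Whether a gap should count as a state of its own depends on the data; dropping or imputing the affected positions are alternatives that this sketch does not cover.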

TraMineR substitution cost

空扰寡人 submitted on 2019-12-13 19:24:54
Question: I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR. Here is a simple example (very simple, but I hope it helps explain my problem). There are three sequences and I want to calculate the dissimilarity matrix. The alphabet is: H (in health), I (ill at home), IH (ill at hospital), D (died). I observe the 3 subjects for 5 observations. These are the sequences:

    H – H – I – D – D
    H – I – I – I – D
    I – I – H –

Fitting a VLMC to very long sequences

霸气de小男生 submitted on 2019-12-11 03:39:26
Question: I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:

    # Load libraries
    library(PST)
    library(RCurl)
    library(TraMineR)
    # Load and transform data
    x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
    data <- read.csv(text = x)
    data.seq <- seqdef(data[,2:ncol(data)], missing = NA, right = NA, nr = "*")
    S1 <- pstree(data.seq, ymin = 0.01, lik

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

北城余情 submitted on 2019-12-10 11:27:22
Question: I want to select columns from a data frame so that the resulting contiguous column sequences are as long as possible while the number of rows containing NAs is as small as possible, because those rows have to be dropped afterwards. (The reason is that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states and cluster::agnes() with NA states in the

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

自闭症网瘾萝莉.ら submitted on 2019-12-06 04:12:56
I want to select columns from a data frame so that the resulting contiguous column sequences are as long as possible while the number of rows containing NAs is as small as possible, because those rows have to be dropped afterwards. (The reason is that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states, and cluster::agnes() with NA states in the matrix doesn't necessarily make much sense.) For that purpose I already wrote a working function that
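The trade-off described above can be sketched in base R. This is not the poster's function, which is cut off in the excerpt. It is a hypothetical helper, `best_window()`, that scores every contiguous column window by width times the number of NA-free rows. That score is an assumption chosen for illustration, and other weightings of length against dropped rows are equally plausible.

```r
# Hypothetical helper: scan all contiguous column windows [from, to] and keep the
# one that maximises (window width) * (number of rows without NA in the window).
best_window <- function(df) {
  na_mat <- is.na(as.matrix(df))
  p <- ncol(na_mat)
  best <- list(score = -1)
  for (i in seq_len(p)) {
    row_has_na <- rep(FALSE, nrow(na_mat))
    for (j in i:p) {
      # Extend the window by one column and update which rows contain an NA.
      row_has_na <- row_has_na | na_mat[, j]
      score <- (j - i + 1) * sum(!row_has_na)
      if (score > best$score) {  # strict ">" keeps the first window on ties
        best <- list(from = i, to = j, rows = which(!row_has_na), score = score)
      }
    }
  }
  best
}

# Toy data, invented for this example.
toy <- data.frame(
  a = c(NA, NA, NA, 1),
  b = c(1, 2, NA, 4),
  c = c(1, 2, 3, 4),
  d = c(1, 2, 3, NA)
)
res <- best_window(toy)
sub <- toy[res$rows, res$from:res$to, drop = FALSE]  # NA-free block
```

The scan visits every pair of start and end columns, so its cost grows quadratically with the number of columns. For a very wide matrix, a restricted search (for example a minimum window width) may be needed.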