traminer

Too many unique sequences

蹲街弑〆低调 submitted on 2021-02-10 20:17:17
Question: I have a large dataset with more than 2 million sequences, including about 180,000 unique ones. I am using the seqdist command to measure distances, and I'll ultimately also try to identify clusters of sequences. Below is the error message I get: [image: code and error message] Is there any way of setting a different maximum number of sequences, or some other workaround? Thank you very much in advance! Answer 1: The size limit for the distance matrix follows from the maximum allowed index value. This value is
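To make the size constraint concrete, here is a rough back-of-the-envelope sketch in base R. It assumes (the truncated answer does not confirm this) that the limiting index value is R's largest integer, `.Machine$integer.max`, and that the distance object stores either one entry per pair of sequences or a full n x n matrix. The helper names `n_pairs` and `max_sequences` are hypothetical.

```r
# Assumption: the largest usable index is .Machine$integer.max (2^31 - 1).
limit <- .Machine$integer.max

# Number of pairwise distances among n sequences (computed in double precision).
n_pairs <- function(n) n * (n - 1) / 2

# Hypothetical helper: largest n whose pair count still fits below the limit.
max_sequences <- function(limit) floor((1 + sqrt(1 + 8 * limit)) / 2)

n_max_pairs <- max_sequences(limit)   # bound if only the pairs are stored
n_max_full  <- floor(sqrt(limit))     # bound if a full n x n matrix is stored

# Sanity checks on the first bound.
n_pairs(n_max_pairs) <= limit         # TRUE
n_pairs(n_max_pairs + 1) <= limit     # FALSE

# How far the 180,000 unique sequences from the question exceed the limit.
n_pairs(180000) / limit
```

Under either assumption, 180,000 unique sequences are well beyond what a single distance object can index, so a workaround has to reduce the number of sequences compared at once.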

Formatting timestamps to avoid R/TraMineR crash?

荒凉一梦 submitted on 2021-01-27 12:03:55
Question: I have a sequence dataset where the timestamp is in seconds since the epoch:
        id  event       time        end
    1   723 opened      1356963741  1356963741
    2   722 opened      1356931342  1356931342
    3   721 referenced  1356988206  1356988206
    4   721 referenced  1356988186  1356988186
    5   721 closed      1356988186  1356988186
    6   721 merged      1356988186  1356988186
    7   721 closed      1356988186  1356988186
    8   721 merged      1356988186  1356988186
    9   721 discussed   1356966433  1356966433
    10  721 discussed   1356963870  1356963870
I want to create an STS sequence
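A minimal base-R sketch of one way to tame such timestamps, assuming the trouble comes from using raw epoch seconds (values around 1.36e9) as sequence positions. The choice of hourly resolution and the column names `datetime` and `hour` are assumptions made for illustration; only the first rows of the data above are reproduced.

```r
# First rows of the event data shown in the question.
events <- data.frame(
  id    = c(723, 722, 721, 721),
  event = c("opened", "opened", "referenced", "closed"),
  time  = c(1356963741, 1356931342, 1356988206, 1356988186)
)

# Human-readable timestamps; epoch seconds count from 1970-01-01 UTC.
events$datetime <- as.POSIXct(events$time, origin = "1970-01-01", tz = "UTC")

# Assumption: hourly resolution is fine enough. Re-base on the earliest
# event so that positions start at 1 instead of about 1.36e9.
events$hour <- as.integer((events$time - min(events$time)) %/% 3600) + 1L

events[order(events$time), ]
```

With re-based positions the time axis spans a few dozen units rather than billions, which keeps any state-sequence representation built from it small.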

Is it possible to make a graph with pattern fills using TraMineR and R base graphs?

半世苍凉 submitted on 2020-01-23 03:06:04
Question: A common problem in the publication of a sequence analysis, or more generally of graphs with many categorical states, is that they are not easily transferable to b/w paper publications. There are some tools, like ColorBrewer, which can help to make a well-informed decision on grey scale colors. Nonetheless, the results are unsatisfactory if the palette needs more than about five shades of grey. Thus, it would be really helpful to add pattern fills to certain graph areas
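As a small illustration of pattern fills in R base graphics, the sketch below uses the `density` and `angle` arguments of `barplot()` to hatch the components of a stacked bar chart. The state shares are made-up toy numbers, and the sketch deliberately uses plain `barplot()`; whether TraMineR's own plotting functions pass these arguments through is not confirmed here.

```r
# Toy state distribution (made-up numbers): rows are states, columns are time points.
shares <- rbind(A = c(0.5, 0.3), B = c(0.3, 0.3), C = c(0.2, 0.4))
colnames(shares) <- c("t1", "t2")

# One hatching pattern per state: line density (lines per inch) and angle (degrees).
fill <- data.frame(density = c(10, 20, 30), angle = c(45, 135, 0))

# Stacked bars with hatched components; the legend reuses the same patterns.
mid <- barplot(shares,
               col = "black",
               density = fill$density,
               angle = fill$angle,
               legend.text = rownames(shares))
```

Hatching distinguishes states without relying on grey levels, at the cost of a busier plot when there are many states.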

TraMineR: Can I get the complete sequence if I give an event sub sequence?

一世执手 submitted on 2020-01-05 05:35:06
Question: I have a sequence dataset like below:
    customerid flag  0  1  2  3  4  5  6  7  8  9 10 11
    abc234     1     3  4  3  4  5  8  4  3  3  2 14 14
    abc233     0     4  4  4  4  4  4  4  4  4  4  4  4
    qpr81      0     9  8  7  8  8  7  8  8  7  8  8  7
    qnr94      0    14 14 14  2 14 14 14 14 14 14 14 14
The values in columns 0 to 11 are the sequences. There are two sets of customers, with flag=1 and flag=0, and I have the discriminating event subsequences for both sets. (Only frequencies and residuals for the 2 groups are shown here.)
    Subsequence Freq.0 Freq.1 Resid.0 Resid.1
    (3>4) 0.19208177

Problem with big data (?) during computation of sequence distances using TraMineR

天大地大妈咪最大 submitted on 2019-12-29 05:29:07
Question: I am trying to run an optimal matching analysis using TraMineR, but I seem to be running into a problem with the size of the dataset. I have a large dataset of European countries containing employment spells: more than 57,000 sequences, each 48 units long and consisting of 9 distinct states. To give an idea of the data, here is the head of the sequence object employdat.sts:
    [1] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...
    [2] EF-EF-EF-EF-EF-EF
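One common way to shrink such a problem is to compute distances only between distinct sequences and carry the number of duplicates along as weights. The base-R sketch below shows the bookkeeping on a toy matrix; the state labels "UN" and "PT" and all object names are hypothetical stand-ins, not taken from the dataset in the question.

```r
# Toy stand-in for the sequence data: one row per case, one column per time unit.
seqs <- rbind(
  c("EF", "EF", "EF", "UN"),
  c("EF", "EF", "EF", "UN"),
  c("EF", "PT", "PT", "PT"),
  c("EF", "EF", "EF", "UN"),
  c("UN", "UN", "EF", "EF")
)

# Collapse each row into a single key so identical sequences can be detected.
key <- apply(seqs, 1, paste, collapse = "-")
unique_keys <- unique(key)

# Keep one copy of each distinct sequence, in order of first appearance.
unique_seqs <- seqs[!duplicated(key), , drop = FALSE]

# Weight = number of cases sharing each distinct sequence.
weights <- as.vector(table(factor(key, levels = unique_keys)))

# Index that maps every original case back to its row in unique_seqs.
case_to_unique <- match(key, unique_keys)
```

A distance matrix `d` computed on `unique_seqs` can be expanded back to all cases with `d[case_to_unique, case_to_unique]`, or `weights` can be passed to a clustering routine that accepts case weights. The WeightedCluster package provides `wcAggregateCases()` for the same kind of aggregation.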

R: sequence analysis of consumer baskets

为君一笑 submitted on 2019-12-24 12:01:16
Question: I have a 3-year history of real transaction data for 700 consumers and 400 different products. I am trying to run a sequence analysis using the TraMineR package and the instructions from http://analyzecore.com/2014/12/04/sequence-carts-in-depth-analysis-with-r/ Unfortunately I have encountered several problems:
- The end date ("to" parameter) of some purchases is the same as the beginning of the next ones. I worked around this by using every second order; it worked, but I would like to keep all orders.
- While trying to make

How to compute dissimilarities between sequences when sequences contain gaps?

拈花ヽ惹草 submitted on 2019-12-24 11:24:03
Question: I want to cluster sequences with optimal matching using TraMineR::seqdist() on data that contains missing values, i.e. sequences containing gaps.
    library(TraMineR)
    data(ex1)
    sum(is.na(ex1))
    # [1] 38
    sq <- seqdef(ex1[1:13])
    sq
    #    Sequence
    # s1 *-*-*-A-A-A-A-A-A-A-A-A-A
    # s2 D-D-D-B-B-B-B-B-B-B
    # s3 *-D-D-D-D-D-D-D-D-D-D
    # s4 A-A-*-*-B-B-B-B-D-D
    # s5 A-*-A-A-A-A-*-A-A-A
    # s6 *-*-*-C-C-C-C-C-C-C
    # s7 *-*-*-*-*-*-*-*-*-*-*-*-*
    sm <- seqsubm(sq, method='TRATE')
    round(sm,digits=3)
    #     A-> B-> C-> D->
    # A->

Measuring reliability of tree/dendrogram (Traminer)

寵の児 submitted on 2019-12-24 03:12:47
Question: I ran an analysis using TraMineR to measure the similarity among sequences of spatial use (for example Rural (R) vs. Urban (U); sequence example: RRRRRUUURRUUU). A requirement of my analysis is that states are compared at the same moment in time, and therefore I used the Hamming sequence similarity. Based on the similarity matrix I created a dendrogram giving the distances among individual sequences, which helps to identify "behavioral similarities" in sequential spatial use. Now I am
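The workflow described above (position-wise Hamming comparison, then a dendrogram) can be sketched in base R as follows. The five toy sequences are invented apart from the first, which is the example from the question. The cophenetic correlation computed at the end is offered only as one possible indicator of how faithfully a tree reproduces the original distances; it is an assumption that this is the kind of reliability measure being asked about.

```r
# Toy sequences of equal length (s1 is the example from the question).
seqs <- c(s1 = "RRRRRUUURRUUU", s2 = "RRRRRRUURRUUU", s3 = "UUUUURRRUURRR",
          s4 = "UUUUUURRUURRR", s5 = "RRRRUUUURRRUU")

# One row per sequence, one column per time position.
chars <- do.call(rbind, strsplit(seqs, ""))
n <- nrow(chars)

# Hamming distance: number of positions at which two sequences differ.
ham <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    ham[i, j] <- sum(chars[i, ] != chars[j, ])
  }
}

# Hierarchical clustering on the distances (average linkage chosen arbitrarily).
d <- as.dist(ham)
tree <- hclust(d, method = "average")

# Cophenetic correlation: agreement between tree heights and original distances.
coph <- cor(d, cophenetic(tree))
coph
```

Values of the cophenetic correlation close to 1 indicate that the dendrogram distorts the original distances little; comparing it across linkage methods is a simple first check.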

Pivoting a CSV file using R

家住魔仙堡 submitted on 2019-12-24 00:59:24
Question: I have a file that looks like this:
        type               created_at           repository_name
    1   IssuesEvent        2012-03-11 06:48:31  bootstrap
    2   IssuesEvent        2012-03-11 06:48:31  bootstrap
    3   IssuesEvent        2012-03-11 06:48:31  bootstrap
    4   IssuesEvent        2012-03-11 06:52:50  bootstrap
    5   IssuesEvent        2012-03-11 06:52:50  bootstrap
    6   IssuesEvent        2012-03-11 06:52:50  bootstrap
    7   IssueCommentEvent  2012-03-11 07:03:57  bootstrap
    8   IssueCommentEvent  2012-03-11 07:03:57  bootstrap
    9   IssueCommentEvent  2012-03-11 07:03:57  bootstrap
    10  IssuesEvent
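Since the question is cut off, the target layout is unknown. The sketch below assumes the goal is a wide table with one row per repository and one column per event type, holding event counts, and builds it with base R's `xtabs()`. Only a few rows of the data above are reproduced, and the object names are hypothetical.

```r
# A few rows in the same shape as the file shown in the question.
events <- data.frame(
  type = c("IssuesEvent", "IssuesEvent", "IssuesEvent", "IssueCommentEvent"),
  created_at = c("2012-03-11 06:48:31", "2012-03-11 06:48:31",
                 "2012-03-11 06:52:50", "2012-03-11 07:03:57"),
  repository_name = "bootstrap"
)

# Cross-tabulate repository by event type, then turn the table into a
# plain data frame with one column per event type.
counts <- xtabs(~ repository_name + type, data = events)
wide <- as.data.frame.matrix(counts)
wide
```

For a real file, `events` would come from `read.csv()` instead of being typed in; the cross-tabulation step stays the same.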

Multiple events in traminer

自作多情 submitted on 2019-12-23 15:11:45
Question: I'm trying to analyse multiple sequences with TraMineR at once. I've had a look at seqdef, but I'm struggling to understand how I'd create a TraMineR dataset when I'm dealing with multiple variables. I guess I'm working with something similar to the dataset used by Aassve et al. (as mentioned in the tutorial), whereby each wave has information about several states (e.g. children, marriage, employment). All my variables are binary. Here's an example of a dataset with three waves (D,W2,W3) and
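One simple option with several binary variables per wave is to merge them into a single combined state per wave, so that each case ends up with one sequence. The base-R sketch below does this for two hypothetical variables; the column names (`child.D`, `married.W2`, and so on), the values, and the "C0M1"-style labels are all invented for illustration, since the example dataset in the question is cut off.

```r
# Hypothetical wide data: two binary variables observed at waves D, W2 and W3.
waves <- c("D", "W2", "W3")
dat <- data.frame(
  id         = 1:3,
  child.D    = c(0, 0, 1), child.W2   = c(0, 1, 1), child.W3   = c(1, 1, 1),
  married.D  = c(0, 1, 1), married.W2 = c(1, 1, 1), married.W3 = c(1, 1, 0)
)

# For each wave, paste the binary indicators into one combined state label,
# e.g. "C0M1" = no child, married.
combined <- sapply(waves, function(w) {
  paste0("C", dat[[paste0("child.", w)]], "M", dat[[paste0("married.", w)]])
})
rownames(combined) <- dat$id
combined
```

The resulting character matrix has one row per case and one column per wave, which is the kind of input that could then be handed to `seqdef()`. The number of combined states doubles with every added binary variable, so this works best with a handful of variables.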