adist: different Levenshtein alignments depending on how the strings are entered

Submitted by 梦想与她 on 2019-12-12 13:32:45

Question


When using the adist function in R to compute the Levenshtein alignments between pairs of character strings, I get different results depending on whether I run the function once for each pair or use vectors to enter several pairs at once. Why is that?

Example: Transformations for the string pairs 'knijpen'-'kneifen', 'grijpen'-'greifen' and 'lopen'-'laufen':

attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
#      [,1]      
# [1,] "MMIMSDMM"

attr(adist("grijpen", "greifen", counts = TRUE), "trafos")
#      [,1]      
# [1,] "MMIMSDMM"

attr(adist("lopen", "laufen", counts = TRUE), "trafos")
#      [,1]    
# [1,] "MSSIMM"

These agree with my own manual solutions. When I enter the strings using vectors, though, I get a slightly different result:

dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
attr(adist(dutch, german, counts = TRUE), "trafos")
#      [,1]       [,2]       [,3]      
# [1,] "MMIMSDMM" "SSIMSDMM" "SSSSDMMM"
# [2,] "SSIMSDMM" "MMIMSDMM" "SSSSDMMM"
# [3,] "SSSIIMMM" "SSSIIMMM" "MSSIMMM" 

The [3,3] element in this matrix should correspond to attr(adist("lopen", "laufen", counts = TRUE), "trafos") (i.e., "MSSIMM"), but it has another M tacked onto it. Why?
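To make the discrepancy easier to inspect, here is a minimal sketch that puts the two ways of calling adist side by side. It assumes, as reported above, that the single-pair calls give the intended alignments; the variable names pairwise and vectorised are only illustrative. It does not explain the extra M, it only shows which pairs differ.

```r
dutch  <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")

# One adist() call per pair, exactly like the single-pair examples above.
pairwise <- mapply(
  function(x, y) attr(adist(x, y, counts = TRUE), "trafos")[1, 1],
  dutch, german,
  USE.NAMES = FALSE
)

# A single adist() call on both vectors returns an all-pairs matrix;
# the matching pairs sit on its diagonal.
trafos <- attr(adist(dutch, german, counts = TRUE), "trafos")
idx <- seq_along(dutch)
vectorised <- trafos[cbind(idx, idx)]

# Side-by-side comparison; 'same' flags pairs where the two calls agree.
comparison <- data.frame(dutch, german, pairwise, vectorised,
                         same = pairwise == vectorised)
print(comparison)
```

If the per-pair alignments are the ones wanted, the pairwise vector from mapply can be used directly instead of the diagonal of the matrix.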

Source: https://stackoverflow.com/questions/30590126/adist-different-levenshtein-alignments-depending-on-how-the-strings-are-entered
