Question
When using the adist function in R to compute the Levenshtein alignments between pairs of character strings, I get different results depending on whether I call the function once per pair or pass vectors to process several pairs at once. Why is that?
Example: Transformations for the string pairs 'knijpen'-'kneifen', 'grijpen'-'greifen' and 'lopen'-'laufen':
attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("grijpen", "greifen", counts = TRUE), "trafos")
# [,1]
# [1,] "MMIMSDMM"
attr(adist("lopen", "laufen", counts = TRUE), "trafos")
# [,1]
# [1,] "MSSIMM"
These agree with my own manual solutions. When I enter the strings using vectors, though, I get a slightly different result:
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")
attr(adist(dutch, german, counts = TRUE), "trafos")
# [,1] [,2] [,3]
# [1,] "MMIMSDMM" "SSIMSDMM" "SSSSDMMM"
# [2,] "SSIMSDMM" "MMIMSDMM" "SSSSDMMM"
# [3,] "SSSIIMMM" "SSSIIMMM" "MSSIMMM"
The [3,3] element in this matrix should correspond to attr(adist("lopen", "laufen", counts = TRUE), "trafos") (i.e., "MSSIMM"), but it has an extra M tacked onto the end. Why?
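For a side-by-side check, here is a minimal sketch that computes each pair with its own adist call and sets the results next to the diagonal of the vectorized result. The names pairwise and vectorized are only illustrative labels, and the output may differ between R versions, so none is shown here.

```r
dutch <- c("knijpen", "grijpen", "lopen")
german <- c("kneifen", "greifen", "laufen")

# One adist() call per pair; each call yields a 1 x 1 "trafos" matrix,
# from which the single element is taken.
pairwise <- mapply(
  function(a, b) attr(adist(a, b, counts = TRUE), "trafos")[1, 1],
  dutch, german,
  USE.NAMES = FALSE
)

# One vectorized call; the diagonal elements are the same three pairs.
trafos <- attr(adist(dutch, german, counts = TRUE), "trafos")
idx <- seq_along(dutch)
vectorized <- trafos[cbind(idx, idx)]

# Compare the two approaches pair by pair.
comparison <- data.frame(
  dutch, german, pairwise, vectorized,
  same = pairwise == vectorized
)
print(comparison)
```

Any row where same is FALSE is a pair whose alignment depends on how the strings were passed in.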
Source: https://stackoverflow.com/questions/30590126/adist-different-levenshtein-alignments-depending-on-how-the-strings-are-entered