According to the python-Levenshtein.ratio source:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L722
it's computed as (lensum - ldist) / lensum.
By looking more carefully at the C code, I found that this apparent contradiction is due to the fact that ratio treats the "replace" edit operation differently than the other operations (i.e. with a cost of 2), whereas distance treats them all the same with a cost of 1.
This can be seen in the calls to the internal levenshtein_common function made within the ratio_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L727
    static PyObject*
    ratio_py(PyObject *self, PyObject *args)
    {
      size_t lensum;
      long int ldist;

      if ((ldist = levenshtein_common(args, "ratio", 1, &lensum)) < 0) // Call
        return NULL;

      if (lensum == 0)
        return PyFloat_FromDouble(1.0);

      return PyFloat_FromDouble((double)(lensum - ldist)/(lensum));
    }
and within the distance_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L715
    static PyObject*
    distance_py(PyObject *self, PyObject *args)
    {
      size_t lensum;
      long int ldist;

      if ((ldist = levenshtein_common(args, "distance", 0, &lensum)) < 0)
        return NULL;

      return PyInt_FromLong((long)ldist);
    }
which ultimately results in different cost arguments being sent to another internal function, lev_edit_distance, which has the following doc snippet:
@xcost: If nonzero, the replace operation has weight 2, otherwise all
edit operations have equal weights of 1.
Code of lev_edit_distance():
    /**
     * lev_edit_distance:
     * @len1: The length of @string1.
     * @string1: A sequence of bytes of length @len1, may contain NUL characters.
     * @len2: The length of @string2.
     * @string2: A sequence of bytes of length @len2, may contain NUL characters.
     * @xcost: If nonzero, the replace operation has weight 2, otherwise all
     *         edit operations have equal weights of 1.
     *
     * Computes Levenshtein edit distance of two strings.
     *
     * Returns: The edit distance.
     **/
    _LEV_STATIC_PY size_t
    lev_edit_distance(size_t len1, const lev_byte *string1,
                      size_t len2, const lev_byte *string2,
                      int xcost)
    {
      size_t i;
So in my example, ratio('ab', 'ac') involves one replace operation (cost of 2), so ldist = 2, while the total length of the two strings is lensum = 4; hence (lensum - ldist)/lensum = (4 - 2)/4 = 0.5.
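As a sanity check, the arithmetic can be reproduced in a few lines of pure Python. This is only a sketch, not the library's code: weighted_distance and ratio_sketch are hypothetical names of my own, and the dynamic-programming loop is the textbook edit-distance recurrence with a configurable replace cost, which I am assuming matches what lev_edit_distance does with xcost.

```python
def weighted_distance(s1, s2, replace_cost=1):
    """Edit distance where insert/delete cost 1 and replace costs replace_cost.

    Hypothetical helper: a textbook DP, not the library's implementation.
    """
    # prev[j] holds the distance between the current prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            substitute = prev[j - 1] + (0 if c1 == c2 else replace_cost)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def ratio_sketch(s1, s2):
    """Mirror of the formula in ratio_py: (lensum - ldist) / lensum."""
    lensum = len(s1) + len(s2)
    if lensum == 0:
        return 1.0
    ldist = weighted_distance(s1, s2, replace_cost=2)  # replace weighs 2
    return (lensum - ldist) / lensum


print(weighted_distance('ab', 'ac', replace_cost=1))  # 1 (plain distance)
print(weighted_distance('ab', 'ac', replace_cost=2))  # 2 (as used by ratio)
print(ratio_sketch('ab', 'ac'))                       # 0.5
```

With a replace cost of 2, a replacement is never cheaper than a delete followed by an insert, which is why the two functions can disagree about how "far apart" the same pair of strings is.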
That explains the "how". The only remaining question is the "why", but for the moment I'm satisfied with this understanding.