Is there a way to measure string similarity in Google BigQuery

礼貌的吻别 2020-12-03 15:35

I'm wondering if anyone knows of a way to measure string similarity in BigQuery.

Seems like it would be a neat function to have.

My case is I need to compare

7 answers
  •  陌清茗 (original poster)
    2020-12-03 16:17

    Levenshtein via JS would be the way to go. You can use the algorithm to get the absolute string distance, or convert it to a percentage similarity by calculating (strlen - distance) / strlen, where strlen is the length of the longer string.

    The easiest way to implement this would be to define a Levenshtein UDF that takes two inputs, a and b, and calculates the distance between them. The function could return a, b, and the distance.
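
As a rough sketch of what such a UDF could look like, the JavaScript below computes the Levenshtein distance with a two-row dynamic-programming table and wraps it in the row-in, row-out shape that BigQuery legacy SQL UDFs use. The names levenshtein, similarity, and levenshteinUdf are illustrative choices, not taken from the answer, and the code assumes both inputs are non-null strings. The registration call uses bigquery.defineFunction from the legacy SQL UDF API and is guarded so the file also runs outside BigQuery.

```javascript
// Levenshtein edit distance between strings a and b.
// Only two rows of the dynamic-programming table are kept in memory.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev.push(j);

  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[k] + 1,        // deletion
        curr[k - 1] + 1,    // insertion
        prev[k - 1] + cost  // substitution
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1], relative to the longer string: 1 means identical.
function similarity(a, b) {
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are treated as identical
  return (longest - levenshtein(a, b)) / longest;
}

// UDF body: receives a row with fields a and b, emits a, b, and the distance.
// Assumes row.a and row.b are non-null strings.
function levenshteinUdf(row, emit) {
  emit({ a: row.a, b: row.b, distance: levenshtein(row.a, row.b) });
}

// Registration only runs where the legacy SQL `bigquery` global exists.
if (typeof bigquery !== 'undefined') {
  bigquery.defineFunction(
    'Levenshtein',                      // name used to call the function from SQL
    ['a', 'b'],                         // input column names
    [
      { name: 'a', type: 'string' },
      { name: 'b', type: 'string' },
      { name: 'distance', type: 'integer' }
    ],                                  // output schema
    levenshteinUdf                      // function reference
  );
}
```

For example, levenshtein('kitten', 'sitting') returns 3, so similarity('kitten', 'sitting') is (7 - 3) / 7, roughly 0.57.
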

    To invoke it, you'd then pass in the two URLs as columns aliased to 'a' and 'b':

    SELECT a, b, distance
    FROM
      Levenshtein(
         SELECT
           some_url AS a, other_url AS b
         FROM
           your_table
      )
    
