Is there a way to measure string similarity in Google BigQuery?

礼貌的吻别 2020-12-03 15:35

I'm wondering if anyone knows of a way to measure string similarity in BigQuery.

It seems like it would be a neat function to have.

My case is that I need to compare

7 answers
  •  清歌不尽
    2020-12-03 16:20

    Below is a much simpler version of Hamming distance that uses WITH OFFSET instead of ROW_NUMBER() OVER():

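    #standardSQL
    -- Sample strings to compare against the target 'abcdef'
    WITH Input AS (
      SELECT 'abcdef' AS strings UNION ALL
      SELECT 'defdef' UNION ALL
      SELECT '1bcdef' UNION ALL
      SELECT '1bcde4' UNION ALL
      SELECT '123de4' UNION ALL
      SELECT 'abc123'
    )
    SELECT 'abcdef' AS target, strings,
      -- Split both strings into characters, pair the characters by position
      -- (offset), and count the positions where they differ.
      -- Positions past the end of the shorter string are not counted, so this
      -- is a true Hamming distance only when both strings have the same length.
      (SELECT COUNT(1)
        FROM UNNEST(SPLIT('abcdef', '')) a WITH OFFSET x
        JOIN UNNEST(SPLIT(strings, '')) b WITH OFFSET y
        ON x = y AND a != b) AS hamming_distance
    FROM Input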
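
    Because the question asks about similarity rather than distance, one possible extension (a sketch, not part of the original answer) is to divide the mismatch count by the string length, so that 1 means identical and 0 means every position differs. The `Pairs` CTE and its `target`/`candidate` columns are hypothetical, and the query assumes both strings in each pair have the same, non-zero length. `SAFE_DIVIDE` is used so that an empty target yields NULL instead of a division error.

    ```sql
    #standardSQL
    -- Sketch: normalized Hamming similarity in [0, 1].
    -- Hypothetical input: Pairs(target, candidate) with equal-length strings.
    WITH Pairs AS (
      SELECT 'abcdef' AS target, '1bcde4' AS candidate UNION ALL
      SELECT 'abcdef', 'abc123'
    )
    SELECT
      target,
      candidate,
      -- 1 minus (mismatched positions / string length);
      -- SAFE_DIVIDE returns NULL instead of failing when the length is 0.
      1 - SAFE_DIVIDE(
        (SELECT COUNT(1)
          FROM UNNEST(SPLIT(target, '')) a WITH OFFSET x
          JOIN UNNEST(SPLIT(candidate, '')) b WITH OFFSET y
          ON x = y AND a != b),
        LENGTH(target)) AS similarity
    FROM Pairs
    ```

    For the two sample rows, '1bcde4' differs from 'abcdef' in 2 of 6 positions and 'abc123' in 3 of 6, so their scores are 1 - 2/6 and 1 - 3/6 respectively.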
    
