Is there a way to measure string similarity in Google BigQuery?

礼貌的吻别 2020-12-03 15:35

I'm wondering if anyone knows of a way to measure string similarity in BigQuery.

It seems like it would be a neat function to have.

My case is that I need to compare

7 Answers
  •  生来不讨喜
    2020-12-03 16:19

    If you're familiar with Python's fuzzywuzzy, you can use the same functions in BigQuery through its JavaScript port (fuzzball), loaded as an external library from GCS.

    Steps:

    1. Download the JavaScript version of fuzzywuzzy (fuzzball)
    2. Take the compiled file of the library, dist/fuzzball.umd.min.js, and rename it to something clearer (like fuzzball.js)
    3. Upload it to a Google Cloud Storage bucket
    4. Create a temp function to use the lib in your query (set the path in OPTIONS to the path of the uploaded file)
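    -- JavaScript UDF that wraps fuzzball's token_set_ratio.
    -- Replace the gs:// path with the location of your uploaded library file.
    CREATE TEMP FUNCTION token_set_ratio(a STRING, b STRING)
    RETURNS FLOAT64
    LANGUAGE js AS """
      return fuzzball.token_set_ratio(a, b);
    """
    OPTIONS (
      library="gs://my-bucket/fuzzball.js");

    -- Sample input: one pair of strings to compare.
    WITH data AS (SELECT "my_test_string" AS a, "my_other_string" AS b)

    SELECT a, b, token_set_ratio(a, b) AS similarity FROM data;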
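
    If you want a feel for what a 0-100 similarity score looks like before uploading anything, here is a minimal sketch in plain JavaScript (the same language as the UDF body). Note that this is not fuzzball's token_set_ratio: it is a simple Levenshtein-based ratio, and the names levenshtein and simpleRatio are made up for this illustration. Since it needs no external library, the same logic should also work when pasted into the body of a LANGUAGE js function, though I'd treat that as an assumption to verify.

    ```javascript
    // Hypothetical helper: classic dynamic-programming Levenshtein edit distance.
    function levenshtein(a, b) {
      // prev[j] is the distance between the first i-1 characters of a
      // and the first j characters of b.
      let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
      for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
          const cost = a[i - 1] === b[j - 1] ? 0 : 1;
          curr[j] = Math.min(
            prev[j] + 1,        // deletion
            curr[j - 1] + 1,    // insertion
            prev[j - 1] + cost  // substitution (free when characters match)
          );
        }
        prev = curr;
      }
      return prev[b.length];
    }

    // Hypothetical helper: similarity on a 0-100 scale, where 100 means identical.
    function simpleRatio(a, b) {
      const longest = Math.max(a.length, b.length);
      if (longest === 0) return 100; // two empty strings are treated as identical
      return Math.round(100 * (1 - levenshtein(a, b) / longest));
    }

    console.log(simpleRatio("my_test_string", "my_other_string")); // prints 73
    ```

    The score from fuzzball will generally differ, because token_set_ratio compares sets of tokens rather than raw characters.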
    
