find all two word phrases that appear in more than one row in a dataset

后端 未结 2 909
旧巷少年郎
旧巷少年郎 2020-12-18 13:56

We would like to run a query that returns two word phrases that appear in more than one row. So for e.g. take the string \"Data Ninja\". Since it appears in more than one ro

2条回答
  •  伪装坚强ぢ
    2020-12-18 14:14

    A very useful hack which inspired me to solve my problem, thanks.

    My data is a combination of passengers and their age where age is a string of numbers:

    adults ages
    ------ -------------
      4    "53,67,65,68"       
      4    "44,45,69,65" 
      3    "20,21,20"
      3    "30,32,62"
    

    I wanted to add a column on each row containing the difference in age between the highest and lowest value

    adults ages          agediff
    ------ ------------- -------
       4   "53,67,65,68" 15       
       4   "44,45,69,65" 25
       3   "20,21,20"    1
       3   "30,32,62"    32
    

    This was done by the following, heavily inspired by the hack:

    SELECT adults, ages, SUBTRACT(INTEGER(maxage),INTEGER(minage)) agediff FROM 
     (SELECT adults, ages, max(age) maxage, min(age) minage FROM
      (SELECT adults, ages, age FROM 
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="3")),
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'\d\d\,([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="3")),
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'\d\d\,\d\d\,([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="3"))
      ),
      (SELECT adults, ages, age FROM 
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="4")),
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'\d\d\,([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="4")),
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'\d\d\,\d\d\,([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="4")),
       (SELECT adults, ages, REGEXP_EXTRACT(ages, r'\d\d\,\d\d\,\d\d\,([\d\d\,]{2})') age FROM [PaxAgeCombinations] WHERE (adults="4"))
      )
    

    )

提交回复
热议问题