BigQuery to find the Subsequence

孤街浪徒 提交于 2020-01-03 02:38:06

问题


Assuming my table is

WITH `sample_project.sample_dataset.table` AS (
  SELECT 'user1' user, 2 sequence, 'T1' ts UNION ALL
  SELECT 'user1', 2, 'T2' UNION ALL
  SELECT 'user1', 1, 'T3' UNION ALL
  SELECT 'user1', 1, 'T4' UNION ALL
  SELECT 'user1', 3, 'T5' UNION ALL
  SELECT 'user1', 2, 'T6' UNION ALL
  SELECT 'user1', 3, 'T7' UNION ALL
  SELECT 'user1', 3, 'T8' 
)

Can I find Subsequence of Integers available in sequence column without using STRING_AGG and REGEX OR JOIN operations ? This is to make query more efficient.

A subsequence is a part of String. For example consider String "banana", A sample subsequence is "anna" as each index character of "anna" from banana is strictly increasing. Characters in a subsequence need not be contiguous.

Say for the above table when order by timestamp (INCREASING), I would get STRING_AGG for sequence column as 22113233. In the String 22113233 subsequence 1 2 3 is available whereas subsequence 3 2 1 is not available. Given a subsequence 213, How can I say if this subsequence is available or not (in 22113233 which sorted by timestamp) ?


回答1:


Given a subsequence 213, How can I say if this subsequence is available or not (in 22113233 ...

Below example is for BigQuery SQL

#standardSQL
WITH `sequences` AS (
  SELECT '22113233' sequence_list 
), `subsequenses` AS (
  SELECT '123' subsequence UNION ALL
  SELECT '321' UNION ALL
  SELECT '213'
)
SELECT sequence_list, subsequence, 
  REGEXP_CONTAINS(sequence_list, REGEXP_REPLACE(subsequence, '', '.*')) available
FROM `sequences` l
CROSS JOIN `subsequenses` s   

with result as below

sequence_list   subsequence     available    
22113233        321             false    
22113233        123             true     
22113233        213             true     

if you are looking for specific subsequence - this can be further simplified as

#standardSQL
WITH `sequences` AS (
  SELECT '22113233' sequence_list UNION ALL
  SELECT '11223322'
)
SELECT sequence_list,  
  REGEXP_CONTAINS(sequence_list, REGEXP_REPLACE('213', '', '.*')) available
FROM `sequences`

with result as

sequence_list   available    
22113233        true     
11223322        false    


来源:https://stackoverflow.com/questions/47819258/bigquery-to-find-the-subsequence

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!