BQ scripting: Writing results of a loop to a table

Submitted by 北城以北 on 2020-01-14 04:14:29

Question


I am working with BigQuery scripting. I have written a simple WHILE loop that iterates through daily Google Analytics tables and sums the visits, and now I'd like to write these results out to a table.

I've gotten as far as creating the table, but I can't capture the value of visits from my SQL query to populate the table. Date works fine, because it is defined outside of the SQL. I tried to assign the value of visits to a newly declared variable, but this does not work either, because visits is not known outside of the SELECT statement:

SET vis = visits;

How can I correctly write my results out to a table?

DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),"-","");
DECLARE vis INT64;

CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

WHILE d > '2019-10-01' DO
  SELECT d, SUM(totals.visits) AS visits
  FROM `project.dataset.ga_sessions_*`
  WHERE _table_suffix = pfix
  GROUP  BY Date;
  SET d = DATE_SUB(d, INTERVAL 1 DAY);
  SET vis = visits;
  INSERT INTO test.looped_results VALUES (d, visits);
END WHILE;

Update: I also tried an alternative solution, assigning visits to its own variable, but this produces the same error:

WHILE d > '2019-10-01' DO
  SET vis_count = (SELECT SUM(totals.visits) AS visits
                    FROM `mindful-agency-136314.43786551.ga_sessions_*`
                    WHERE _table_suffix = pfix);

  INSERT INTO test.looped_results VALUES (d, vis_count);

  SET d = DATE_SUB(d, INTERVAL 1 DAY);

END WHILE;

Results:

In my results I see the correct number of rows created, with the correct dates, but the value of visits for each is the value for the most recent day.


Answer 1:


Actually, you need to update the pfix variable inside the loop; otherwise every iteration queries the same daily table. It is also a good idea to initialize visits. Finally, you don't need a GROUP BY at all, since the pfix filter already restricts the query to a single day's table.

This should do it:

DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),'-','');
DECLARE visits INT64;
SET visits = 0;

CREATE OR REPLACE TABLE project.dataset.looped_results (Date DATE, Visits INT64);

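WHILE d > '2019-10-01' DO
  SET visits = (SELECT SUM(totals.visits) FROM `project.dataset.ga_sessions_*` WHERE _table_suffix = pfix);
  -- Insert before decrementing d so each row pairs a date with its own visits.
  INSERT INTO project.dataset.looped_results VALUES (d, visits);
  SET d = DATE_SUB(d, INTERVAL 1 DAY);
  -- Refresh the suffix so the next iteration reads the previous day's table.
  SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
END WHILE;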

Hope it helps.




Answer 2:


I would also move the INSERT INTO outside of the WHILE loop by collecting the results in an array variable (along with a few other minor changes), as in the example below:

DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING; 
DECLARE vis_count INT64;
DECLARE result ARRAY<STRUCT<vis_date DATE, vis_count INT64>> DEFAULT [];

CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

WHILE d > '2019-10-01' DO
  SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
  SET vis_count = (SELECT SUM(totals.visits) AS visits
                    FROM `project.dataset.ga_sessions_*`
                    WHERE _table_suffix = pfix);
  SET result = ARRAY_CONCAT(result, [STRUCT(d, vis_count)]);
  SET d = DATE_SUB(d, INTERVAL 1 DAY);
END WHILE;

INSERT INTO test.looped_results SELECT * FROM UNNEST(result);

Note: I hope your example is for learning scripting and not for production. Whenever possible, stick with set-based processing, which is easy to do in your case.
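
As a rough sketch of that set-based alternative, the whole loop can be collapsed into a single INSERT that filters the wildcard table on a range of suffixes. It assumes the same placeholder names used above (`project.dataset.ga_sessions_*`, test.looped_results) and that the table suffixes are dates in YYYYMMDD form. The start_date and end_date variables are hypothetical and introduced only for illustration.

```sql
-- Hypothetical bounds for illustration; they mirror the range the loop covers.
DECLARE start_date DATE DEFAULT DATE '2019-10-02';
DECLARE end_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);

CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

-- One query over all matching daily tables instead of one query per day.
INSERT INTO test.looped_results (Date, Visits)
SELECT
  PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS visit_date,  -- suffix assumed to be YYYYMMDD
  SUM(totals.visits) AS visits
FROM `project.dataset.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', start_date)
                        AND FORMAT_DATE('%Y%m%d', end_date)
GROUP BY visit_date;
```

Because the date is derived from _TABLE_SUFFIX in the same row as the sum, the date and the visit count cannot drift apart the way two separately updated script variables can.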




Answer 3:


Here is a faster way that does not use a loop.

Basically, you build an array of table suffixes and do the SELECT and INSERT in a single query:

DECLARE date_range ARRAY<DATE> DEFAULT
  GENERATE_DATE_ARRAY(DATE '2019-10-01', DATE '2019-10-10', INTERVAL 1 DAY);

DECLARE suffix_array ARRAY<STRING> 
  DEFAULT (SELECT ARRAY_AGG(REGEXP_REPLACE(CAST(dates AS STRING),"-","")) 
           FROM UNNEST(date_range) dates);

CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

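-- The GA export stores its date field as a STRING, so derive a DATE from the table suffix.
INSERT INTO test.looped_results
SELECT PARSE_DATE('%Y%m%d', _table_suffix) AS visit_date, SUM(totals.visits)
FROM `project.dataset.ga_sessions_*`
WHERE _table_suffix IN UNNEST(suffix_array)
GROUP BY visit_date;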



Answer 4:


Having reviewed my code (several times!), I realized that I wasn't refreshing, within the loop, the variable that transforms the date into the table suffix.

Here is a working version of the script, where I set pfix at the end of the loop:

DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),"-","");
DECLARE vis_count INT64;
CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

WHILE d > '2019-10-01' DO
  SET vis_count = (SELECT SUM(totals.visits) AS visits
                    FROM `project.dataset.ga_sessions_*`
                    WHERE _table_suffix = pfix);
  INSERT INTO test.looped_results VALUES (d, vis_count);
  SET d = DATE_SUB(d, INTERVAL 1 DAY);
  SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
END WHILE;
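
A variation that may help avoid this kind of stale-variable bug is to derive the suffix from d directly inside the query, so there is no second variable to keep in sync. This is only a sketch under the same placeholder names as above. It uses FORMAT_DATE with '%Y%m%d' in place of the REGEXP_REPLACE/CAST combination, which should produce the same YYYYMMDD string.

```sql
DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE vis_count INT64;

CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);

WHILE d > '2019-10-01' DO
  -- The suffix is computed from the current value of d on every iteration.
  SET vis_count = (SELECT SUM(totals.visits)
                   FROM `project.dataset.ga_sessions_*`
                   WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', d));
  INSERT INTO test.looped_results VALUES (d, vis_count);
  SET d = DATE_SUB(d, INTERVAL 1 DAY);
END WHILE;
```

With the suffix computed inline, the only loop state left to update is d itself.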


Source: https://stackoverflow.com/questions/58267001/bq-scripting-writing-results-of-a-loop-to-a-table
