问题
Given the install date of users, I would like to get the Firebase (1) Event Occurrences and (2) Event Distinct Users' Count for all our 200+ Firebase events on Day0 to Day30. I simulated the output table below (for D0-D30) in a screenshot, but the code is only for Day0-Day7.
(1) Event Occurrences
SELECT
event.name as event_name,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN event_count END) AS D0_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170802' AND _TABLE_SUFFIX < '20170803' THEN event_count END) AS D1_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170803' AND _TABLE_SUFFIX < '20170804' THEN event_count END) AS D2_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170804' AND _TABLE_SUFFIX < '20170805' THEN event_count END) AS D3_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170805' AND _TABLE_SUFFIX < '20170806' THEN event_count END) AS D4_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170806' AND _TABLE_SUFFIX < '20170807' THEN event_count END) AS D5_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170807' AND _TABLE_SUFFIX < '20170808' THEN event_count END) AS D6_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170808' AND _TABLE_SUFFIX < '20170809' THEN event_count END) AS D7_USERS
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000;
and
(2) Event Distinct Users' Count
SELECT
event.name as event_name,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN user_dim.app_info.app_instance_id END) AS D0_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170802' AND _TABLE_SUFFIX < '20170803' THEN user_dim.app_info.app_instance_id END) AS D1_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170803' AND _TABLE_SUFFIX < '20170804' THEN user_dim.app_info.app_instance_id END) AS D2_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170804' AND _TABLE_SUFFIX < '20170805' THEN user_dim.app_info.app_instance_id END) AS D3_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170805' AND _TABLE_SUFFIX < '20170806' THEN user_dim.app_info.app_instance_id END) AS D4_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170806' AND _TABLE_SUFFIX < '20170807' THEN user_dim.app_info.app_instance_id END) AS D5_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170807' AND _TABLE_SUFFIX < '20170808' THEN user_dim.app_info.app_instance_id END) AS D6_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170808' AND _TABLE_SUFFIX < '20170809' THEN user_dim.app_info.app_instance_id END) AS D7_USERS
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809'
AND user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY 1;
Questions:
- Is there a more optimised way to write this? For small amount of columns it makes sense (D0-D7), but for D0-D30 I thought there might be a better way. Any suggestions are much appreciated !
Final Answer after Mikhail's feedback:
I combined both queries in one query and created a pivot table thereafter. Remember to select "Standard SQL" in BigQuery editor before execution.
SELECT
event.name AS event_name,
_TABLE_SUFFIX as day,
COUNT(1) as event_occurances,
COUNT(DISTINCT user_dim.app_info.app_instance_id) as event_unique_users
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170901' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY event_name, day
ORDER BY event_name;
Appendix Notes:
Timestamp Conversion of 1 Aug 2017
- Epoch timestamp: 1501545600
- Timestamp in milliseconds: 1501545600000
Timestamp Conversion of 2 Aug 2017
- Epoch timestamp: 1501632000
- Timestamp in milliseconds: 1501632000000
回答1:
Is there a more optimised way to write this?
1. One way to optimize this is to rewrite below
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN event_count END) AS D0_USERS
to this
COUNTIF(_TABLE_SUFFIX = '20170801') AS D0_USERS
:o( You still will need to write this line 31 times for the D0-D30 case, but at least it is less heavy
2. Another (proper) way is to follow best practices and separate retrieval of data from data visualization
So you can do something like below to retrieve needed data
#standardSQL
SELECT
event.name AS event_name,
_TABLE_SUFFIX as day,
COUNT(1) as users
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY event_name, day
Then you can pivot this result with whatever tool you prefer
For example, with BigQuery Mate without leaving UI you can get pivot that will look like below
As a quick disclosure - I am an author of the BigQuery Mate Chrome Extension
Please note: I have not adjusted or changed anyhow logic of your query - i just answered your specific question - Is there a more optimised way to write this?
来源:https://stackoverflow.com/questions/46550365/firebase-event-occurrences-for-new-installed-users-in-bigquery