BigQuery subselect in JOIN does not recognize fields

Submitted by 六月ゝ 毕业季﹏ on 2020-01-05 03:33:10

Question


I've got a table with multiple dated snapshots per user and a table with the latest date of the snapshot for each user (generated via a query).

I've tried a number of variations to get a simple join of the two to work but I'm having no luck. I want to select all records from the snapshots table that match the user id and date from the other table.

I've gotten a variety of errors, but this is the latest (the sub-selects and renames were added to debug which field might be causing the problem):

SELECT t1.uuid, t1.username, t1.d 
  FROM (SELECT uuid, username, date AS d FROM [Activity.user_snapshots]) as t1
  JOIN EACH (SELECT uuid, date AS dg FROM [Activity.latest_snapshots]) as t2
  ON t1.uuid = t2.uuid AND t1.d = t2.dg;

The error response that I get in this case is:

Error: Field 'dg' not found in table '__S0'.

When I try the much more straightforward query:

SELECT t1.uuid, t1.username, t1.date
  FROM [Activity.user_snapshots] as t1
  JOIN EACH [Activity.latest_snapshots] as t2
  ON t1.uuid = t2.uuid AND t1.date = t2.date;

I get this error:

Error: Field date from table __S0 is not a leaf field.

Any ideas?


Answer 1:


There is a bug joining on timestamp values. If you coerce them to their underlying microsecond values, you should be good. This query should work:

SELECT t1.uuid, t1.username, USEC_TO_TIMESTAMP(t1.d)
  FROM (
    SELECT uuid, username, TIMESTAMP_TO_USEC(date) AS d 
    FROM [Activity.user_snapshots]) as t1
  JOIN EACH (
    SELECT uuid, TIMESTAMP_TO_USEC(date) AS dg 
    FROM [Activity.latest_snapshots]) as t2
  ON t1.uuid = t2.uuid AND t1.d = t2.dg;



Answer 2:


In case it's helpful to anyone else: the problem was in how I created the latest_snapshots table. I had to convert the STRING date field into a timestamp in order to apply MAX to it, so it was saved to the resulting table as a TIMESTAMP. The two date columns in the join therefore no longer had the same type.

So the error messages are misleading. Annoyingly, I had to create a new table in which I converted the timestamp back into a string, since there was no way to do that conversion in the JOIN ... ON clause.

If anyone knows how to do all of this in one query without all the extra table creation, that would be cool. Thus far, my attempts to do it with sub-selects have failed.
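One possible single-query approach, offered as an untested sketch rather than a confirmed fix: compute the latest snapshot per user inside a sub-select and convert both sides to microseconds before the join, so the ON clause only compares plain integer fields. It assumes that date in [Activity.user_snapshots] is a STRING in a format that TIMESTAMP() can parse; the aliases d and dg are reused from the question, and the query is written in the same legacy BigQuery SQL dialect as the rest of the thread.

```sql
-- Sketch only: assumes [Activity.user_snapshots].date is a STRING that
-- TIMESTAMP() can parse (e.g. 'YYYY-MM-DD HH:MM:SS').
SELECT t1.uuid, t1.username, t1.date
  FROM (
    -- Keep the original string date for output, and add an integer
    -- microsecond value (d) to join on.
    SELECT uuid, username, date,
           TIMESTAMP_TO_USEC(TIMESTAMP(date)) AS d
    FROM [Activity.user_snapshots]) AS t1
  JOIN EACH (
    -- Latest snapshot per user, computed inline instead of in a
    -- separate latest_snapshots table.
    SELECT uuid,
           MAX(TIMESTAMP_TO_USEC(TIMESTAMP(date))) AS dg
    FROM [Activity.user_snapshots]
    GROUP EACH BY uuid) AS t2
  ON t1.uuid = t2.uuid AND t1.d = t2.dg;
```

Because both d and dg are computed in the sub-selects, no type conversion is needed in the ON clause itself, which is the restriction described above.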

Note: the join-on-timestamp issue was fixed in a previous release; please let us know if you continue to see problems with it.



Source: https://stackoverflow.com/questions/17054338/bigquery-subselect-in-join-does-not-recognize-fields
