Compare Tables in BigQuery

Submitted by 瘦欲@ on 2019-12-03 17:00:54

Now that I have your actual sample dataset, I can write a query that finds every domain in one table that is not in the other:

Table 1024 (https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1024) has 24,729,816 rows, and table 1025 (https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1025) has 24,732,640 rows.

Let's look at everything in 1025 that is not in 1024:

SELECT a.domain
FROM [inbound-acolyte-377:demo.1025] a
LEFT OUTER JOIN EACH [inbound-acolyte-377:demo.1024] b
ON a.domain = b.domain
WHERE b.domain IS NULL

Result: 39,629 rows. (8.1s elapsed, 2.04 GB processed)
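The query above is an anti-join: every row of 1025 is kept by the LEFT OUTER JOIN, and rows that found no partner in 1024 have NULL in all `b` columns, which `WHERE b.domain IS NULL` selects. A minimal local sketch of the same logic, assuming Python's built-in `sqlite3` as a stand-in for BigQuery: the table names `t1024`/`t1025` and the domains are hypothetical toy data (SQLite needs quoting for purely numeric table names), and `EACH` is dropped because it is BigQuery legacy SQL syntax that SQLite does not have.

```python
import sqlite3

# Hypothetical toy stand-ins for demo.1024 and demo.1025 (not the real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1024 (domain TEXT)")
conn.execute("CREATE TABLE t1025 (domain TEXT)")
conn.executemany("INSERT INTO t1024 VALUES (?)", [("a.com",), ("b.com",), ("c.com",)])
conn.executemany("INSERT INTO t1025 VALUES (?)", [("a.com",), ("c.com",), ("d.com",), ("e.com",)])

# Anti-join: keep rows of t1025 whose domain has no match in t1024.
# Unmatched rows from the LEFT JOIN carry NULL in every b.* column.
rows = conn.execute(
    """
    SELECT a.domain
    FROM t1025 a
    LEFT OUTER JOIN t1024 b ON a.domain = b.domain
    WHERE b.domain IS NULL
    ORDER BY a.domain
    """
).fetchall()

only_in_1025 = [r[0] for r in rows]
print(only_in_1025)  # ['d.com', 'e.com']
```

Swapping the two table names gives the reverse direction (domains in 1024 that disappeared from 1025).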

To find the rows whose values changed (assuming tkey is your unique row identifier):

SELECT a.tkey, a.name, b.name
FROM [your.tableold] a
JOIN EACH [your.tablenew] b
ON a.tkey = b.tkey
WHERE a.name != b.name
LIMIT 100
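Here the join is an inner join, so a key that exists in only one table is dropped: this query reports changed rows, not added or deleted ones. A small sketch of that behavior, again assuming Python's `sqlite3` as a local stand-in for BigQuery, with hypothetical `tableold`/`tablenew` contents:

```python
import sqlite3

# Hypothetical toy versions of the old and new tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableold (tkey INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tablenew (tkey INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO tableold VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany(
    "INSERT INTO tablenew VALUES (?, ?)",
    [(1, "alice"), (2, "robert"), (3, "carol"), (4, "dave")],
)

# Inner join on the key, then keep only pairs whose name differs.
# tkey 4 exists only in tablenew, so the inner join drops it (a new row, not a change).
changed = conn.execute(
    """
    SELECT a.tkey, a.name, b.name
    FROM tableold a
    JOIN tablenew b ON a.tkey = b.tkey
    WHERE a.name != b.name
    ORDER BY a.tkey
    LIMIT 100
    """
).fetchall()

print(changed)  # [(2, 'bob', 'robert')]
```

One caveat worth checking in your own dialect: under SQL's usual NULL semantics (as in SQLite), `a.name != b.name` is not true when either side is NULL, so a name that changed to or from NULL would not be reported without an explicit NULL check.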

For the new rows, one way is the one you proposed:

SELECT col1, col2
FROM table2
WHERE col1 NOT IN
  (SELECT col1 FROM Table1)

(Once Table1 gets too large for this subquery, you'll have to switch to a JOIN EACH, as in the LEFT OUTER JOIN EACH query above.)
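The NOT IN form and the LEFT JOIN ... IS NULL form express the same anti-join, which is why one can replace the other when the table grows. A sketch comparing the two on hypothetical toy data, assuming Python's `sqlite3` as a local stand-in (SQLite has no `EACH` keyword, so a plain LEFT OUTER JOIN is used):

```python
import sqlite3

# Hypothetical toy data: table2 has one row (col1 = 3) that table1 lacks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE table2 (col1 INTEGER, col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])

# Form 1: the NOT IN subquery proposed in the question.
not_in = conn.execute(
    "SELECT col1, col2 FROM table2 "
    "WHERE col1 NOT IN (SELECT col1 FROM table1) ORDER BY col1"
).fetchall()

# Form 2: the same anti-join written as LEFT JOIN ... IS NULL
# (the shape the answer suggests moving to, via JOIN EACH, in legacy BigQuery).
left_join = conn.execute(
    """
    SELECT a.col1, a.col2
    FROM table2 a
    LEFT OUTER JOIN table1 b ON a.col1 = b.col1
    WHERE b.col1 IS NULL
    ORDER BY a.col1
    """
).fetchall()

print(not_in)     # [(3, 'z')]
print(left_join)  # [(3, 'z')]
```

The two forms agree here, but they can diverge if table1.col1 contains NULLs: under standard SQL semantics (which SQLite follows), `NOT IN` against a list containing NULL matches no rows, while the LEFT JOIN form is unaffected.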
