BigQuery: error in SPLIT() returns

馋奶兔 提交于 2019-12-11 05:57:25

问题


I have a table TabA in BigQuery which has a column ColA and the column ColA has the following structure

1038627|21514184

and the table TabA has more than a million records. I used this to split into multiple columns

SELECT ColA,FIRST(SPLIT(ColA, '/')) part1,
       NTH(2, SPLIT(ColA, '/')) part2
FROM TabA

But for some reason, after certain rows the split seems to be not working properly.

We are getting records like this,

     ColA            part1   part2
1038627|21507470    1038627 21507470     
1038627|21534857    1038627 21507470     
1038627|21546455    1038627 21507470     
1038627|21577167    1038627 21507470

It his happening on a random basis. Not sure where is there error.

SELECT COUNT(*) FROM TabA - returns say 1.7M records


SELECT ColA,FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns 1.7M records with the wrong split


SELECT FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns just 1.4L records with correct split

Don't know what exactly is happening...is it the problem with the data or the problem with the split ??

Any help would be greatly appreciated. Thanks in advance!!


回答1:


is it the problem with the data or the problem with the split ??

To help in troubleshooting - I would recommend running same logic in BigQuery Standard SQL

#standardSQL
SELECT 
  ColA,
  SPLIT(ColA, '|')[SAFE_OFFSET(0)] AS part1,
  SPLIT(ColA, '|')[SAFE_OFFSET(1)] AS part2
FROM TabA


来源:https://stackoverflow.com/questions/44828808/bigquery-error-in-split-returns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!