google-bigquery

Counting substrings in a string

Submitted by 时光怂恿深爱的人放手 on 2020-01-04 19:56:19
Question: How can I count the number of times a substring appears in a string? In this case, I'm looking for every occurrence of " connect.facebook.net/en_US/all.js " in the HTML bodies of the top 300K internet sites (stored in httparchive). Answer 1: You could use SPLIT() on the string and count the number of records produced: SELECT fb_times, COUNT(*) n_pages FROM (SELECT COUNT(splits)-1 WITHIN RECORD AS fb_times FROM (SELECT SPLIT(body, 'connect.facebook.net/en_US/all.js') splits FROM [httparchive
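The trick behind the answer (split on the needle, then count the pieces) can be sketched outside BigQuery. A minimal Python illustration, with an invented HTML string standing in for a page body:

```python
def count_occurrences(body: str, needle: str) -> int:
    # Splitting on the needle yields one more piece than the number of
    # non-overlapping occurrences, which is the same trick the SQL answer
    # uses with SPLIT() and COUNT(splits) - 1.
    return len(body.split(needle)) - 1

# Invented sample page body, for illustration only.
html = '<script src="http://connect.facebook.net/en_US/all.js"></script>'
print(count_occurrences(html, "connect.facebook.net/en_US/all.js"))  # 1
```

Note that this counts non-overlapping matches, which matches SPLIT()'s behavior.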

Semijoin expression must be a part of logical AND

Submitted by Deadly on 2020-01-04 14:37:32
Question: I have a table (call it "A") with some fields (model:string, age:integer, code1:integer, code2:integer, code3:integer) and another table ("codes") with classified codes (code:integer, codetype:string, description:string). The codetype field is there to group codes; for example, codes between 200 and 300 are brown. Every item can have up to 3 codes. Now I just want to run this simple query: SELECT model,age FROM dataset.A WHERE code1 IN (SELECT code FROM
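The query the question is building amounts to a semijoin: keep a row of A if any of its three code columns appears in a set of codes selected by type. A small Python sketch of that semantics, with invented rows standing in for both tables:

```python
# Hypothetical stand-ins for tables A and codes; the real data lives in BigQuery.
table_a = [
    {"model": "X1", "age": 3, "code1": 210, "code2": 0, "code3": 0},
    {"model": "Y2", "age": 5, "code1": 10, "code2": 20, "code3": 30},
]
codes = [
    {"code": 210, "codetype": "brown"},
    {"code": 250, "codetype": "brown"},
    {"code": 10, "codetype": "green"},
]

# The semijoin: first collect the codes of the wanted type...
brown = {c["code"] for c in codes if c["codetype"] == "brown"}

# ...then keep a row if any of its three code columns is in that set.
result = [
    (r["model"], r["age"])
    for r in table_a
    if r["code1"] in brown or r["code2"] in brown or r["code3"] in brown
]
print(result)  # [('X1', 3)]
```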

Activities to avoid BigQuery warm-up occurrence

Submitted by 半世苍凉 on 2020-01-04 09:06:17
Question: The streaming-data article mentions that "After several hours of inactivity, the warm-up period will occur again during the next insert." https://developers.google.com/bigquery/streaming-data-into-bigquery What activities can keep the connection warm? I'll be writing a connection pool, and it is expected to provide a BigQuery object that can put data in without any warm-up period. Answer 1: The data goes in whether it's warming up or not; it's only added to a queue to get inserted

More detailed error messages from Node.js BigQuery client library

Submitted by 浪尽此生 on 2020-01-04 07:37:10
Question: I'm using the official Google Node.js client library for BigQuery. I have the following snippet to stream records into the database:

module.exports.sendToBigQuery = (rows) => {
  bigquery
    .dataset(DATASET_NAME)
    .table(TABLE_NAME)
    .insert(rows)
    .catch(err => {
      if (err && err.name === 'PartialFailureError') {
        if (err.errors && err.errors.length > 0) {
          console.log('Insert errors:');
          err.errors.forEach(e => console.error(e));
        }
      } else {
        console.error('ERROR:', err);
      }
    });
};

Unfortunately, whenever my

Lead & Analytical Functions in BigQuery

Submitted by 我的未来我决定 on 2020-01-04 06:07:53
Question: Assume my table is this. I am trying to modify my table with this information. I have added two columns: WhenWasLastBasicSubjectDone tells you in which semester the student completed his latest Basic course (sorted by Semester), and TotalBasicSubjectsDoneTillNow counts how many times the student has completed a Basic course (Subject) up to that point (sorted by Semester). I think this is easy to solve with joins as well as with UDFs, but I want to use the power of
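The two columns are running window aggregates over rows ordered by semester (the kind of computation an analytical function with an OVER clause performs in SQL). A small Python sketch of the logic, with invented sample rows for a single student:

```python
# Hypothetical course records; the question's actual table is only partly visible.
rows = [
    {"student": "S1", "semester": 1, "subject": "Basic"},
    {"student": "S1", "semester": 2, "subject": "Advanced"},
    {"student": "S1", "semester": 3, "subject": "Basic"},
]

# Order by semester, then sweep once, carrying two running values:
# the semester of the most recent Basic course, and a cumulative count.
rows.sort(key=lambda r: r["semester"])
last_basic, total_basic = None, 0
for r in rows:
    if r["subject"] == "Basic":
        last_basic = r["semester"]
        total_basic += 1
    r["WhenWasLastBasicSubjectDone"] = last_basic
    r["TotalBasicSubjectsDoneTillNow"] = total_basic

print([(r["WhenWasLastBasicSubjectDone"], r["TotalBasicSubjectsDoneTillNow"])
       for r in rows])
# [(1, 1), (1, 1), (3, 2)]
```

With multiple students, the same sweep would be applied per student (a PARTITION BY in SQL terms).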

BigQuery - Backend error when loading from Java API

Submitted by 拥有回忆 on 2020-01-04 04:42:08
Question: I'm getting a 503 Backend Error when trying to set up a load job using the Java API. The file I am trying to load is on Google Cloud Storage, and if I load the data from the BigQuery web interface, telling it to load from Google Cloud Storage, everything works. Probably there is something wrong with how I use the APIs. Here is the code:

String url = "[GOOGLE_CLOUD_STORAGE_URL]";
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new

How to do repeatable sampling in BigQuery Standard SQL?

Submitted by 感情迁移 on 2020-01-04 03:54:07
Question: In this blog post, a Google Cloud employee explains how to do repeatable sampling of data sets for machine learning in BigQuery. This is very important for creating (and replicating) train/validation/test partitions of your data. However, the blog uses Legacy SQL, which Google has now deprecated in favor of Standard SQL. How would you rewrite the blog's sampling code, shown below, in Standard SQL? #legacySQL SELECT date, airline, departure_airport, departure_schedule, arrival_airport, arrival
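One common Standard SQL rewrite replaces Legacy SQL's HASH() with FARM_FINGERPRINT(), e.g. WHERE MOD(ABS(FARM_FINGERPRINT(date)), 10) < 8 for a repeatable 80% sample keyed on date. The underlying idea, a deterministic hash of a key followed by a modulus bucket, can be sketched in Python (hashlib's digest differs from BigQuery's fingerprint function, so the buckets won't match across the two systems):

```python
import hashlib

def bucket(key: str, n_buckets: int = 10) -> int:
    # Deterministically map a key to one of n_buckets. Unlike random
    # sampling, the same key always lands in the same bucket, so the
    # train/validation/test split is reproducible run after run.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

dates = ["2020-01-01", "2020-01-02", "2020-01-03"]
train = [d for d in dates if bucket(d) < 8]       # ~80% of keys
holdout = [d for d in dates if bucket(d) >= 8]    # ~20% of keys

# Repeatability: hashing the same key twice gives the same bucket.
print(bucket("2020-01-01") == bucket("2020-01-01"))  # True
```

Splitting on a hashed key rather than RAND() also keeps all rows sharing a key (e.g. all flights on one date) in the same partition, which avoids leakage between train and test sets.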

NativeApplicationClient and OAuth2Authenticator not resolved

Submitted by 可紊 on 2020-01-04 03:29:04
Question: I am writing a console application to download data from BigQuery. Once again, the .NET library is vague and confusing. In this question, two Google employees have posted responses, and neither of them works on my machine, because they haven't quite made it clear which references they are using. I'll paste the code once again and elaborate: using DotNetOpenAuth.OAuth2; using Google.Apis.Authentication.OAuth2; using Google.Apis.Authentication.OAuth2.DotNetOpenAuth; using Google.Apis
