google-bigquery

How to get the correct centroid of a bigquery polygon with st_centroid

Submitted by 可紊 on 2021-01-28 18:47:55

Question: I'm having some trouble with the ST_CENTROID function in BigQuery. There is a difference between the centroid of a GEOGRAPHY column and the centroid computed from the same column's WKT representation. The table is generated with bq load, using a GEOGRAPHY column and a newline_delimited_json file containing the polygon as WKT text. Example:

select st_centroid(polygon) loc, st_centroid(ST_GEOGFROMTEXT(st_astext(polygon))) loc2, polygon
from table_with_polygon

Result:

POINT(-174.333247842246 -51.6549479435566)
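
A minimal diagnostic sketch with the Python client (the table name is a placeholder): comparing ST_AREA of the stored GEOGRAPHY and of the WKT round trip alongside the two centroids can show whether re-parsing the WKT changed which side of the ring is treated as the polygon's interior, which would also move the centroid.

from google.cloud import bigquery

client = bigquery.Client()

# Compare the stored GEOGRAPHY with the same shape re-parsed from WKT.
# A large difference in area suggests the round trip flipped the polygon.
sql = """
SELECT
  ST_CENTROID(polygon)                             AS loc,
  ST_CENTROID(ST_GEOGFROMTEXT(ST_ASTEXT(polygon))) AS loc2,
  ST_AREA(polygon)                                 AS area_original,
  ST_AREA(ST_GEOGFROMTEXT(ST_ASTEXT(polygon)))     AS area_reparsed
FROM `my_project.my_dataset.table_with_polygon`    -- placeholder table name
"""

for row in client.query(sql).result():
    print(row.loc, row.loc2, row.area_original, row.area_reparsed)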

BigQuery: JOIN ON with repeated / array STRUCT field in Standard SQL?

Submitted by ♀尐吖头ヾ on 2021-01-28 13:39:53

Question: I basically have two tables, Orders and Items. Because these tables are imported from Google Cloud Datastore backup files, references are not made by a simple ID field but by a <STRUCT> for one-to-one relationships, where its id field holds the actual unique ID I want to match on. For one-to-many (REPEATED) relationships, the schema uses an ARRAY of <STRUCT>. I can query the one-to-one relationships with a LEFT OUTER JOIN, and I also know how to join on a non-repeated struct and a repeated string or int,
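
The excerpt cuts off before the actual question, but for the repeated case a common pattern is to flatten the ARRAY of STRUCT with UNNEST and then join on the struct's id field. A hedged sketch via the Python client; Orders, Items, order_id, item_refs and the Datastore-style __key__.id field are assumptions to adjust to the real schema:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  o.order_id,
  i.name AS item_name
FROM `my_project.my_dataset.Orders` AS o
LEFT JOIN UNNEST(o.item_refs) AS ref            -- repeated ARRAY<STRUCT> field
LEFT JOIN `my_project.my_dataset.Items` AS i
  ON i.__key__.id = ref.id                      -- match the struct's id field
"""

for row in client.query(sql).result():
    print(row.order_id, row.item_name)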

Take one table as input and output using another table BigQuery

Submitted by 被刻印的时光 ゝ on 2021-01-28 12:14:49

Question: I have one table and want to use it as the input for a query pulling from another table.

Input table:

+----------+--------+
| item     | period |
+----------+--------+
| HD.4TB   | 6      |
| 12333445 | 7      |
| 12344433 | 5      |
+----------+--------+

And I'm using this query to consume the input:

SELECT snapshot, item_name, commodity_code, planning_category, type, SUM(quantity) qty, sdm_month_start_date,
FROM planning_extract
WHERE planning_category IN (SELECT item FROM input)
GROUP BY snapshot, item_name,
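
The query above is cut off mid-statement. As a hedged sketch (dataset names are placeholders, columns trimmed), a JOIN against the input table does the same filtering as the IN (...) subquery while also keeping the input table's period column available in the outer query:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  p.snapshot,
  p.item_name,
  p.planning_category,
  i.period,
  SUM(p.quantity) AS qty
FROM `my_project.my_dataset.planning_extract` AS p
JOIN `my_project.my_dataset.input` AS i
  ON p.planning_category = i.item
GROUP BY p.snapshot, p.item_name, p.planning_category, i.period
"""

for row in client.query(sql).result():
    print(row.snapshot, row.item_name, row.planning_category, row.period, row.qty)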

Error connecting to BigQuery from Dataproc with Datalab using BigQuery Spark connector (Error getting access token from metadata server at)

Submitted by 本小妞迷上赌 on 2021-01-28 12:10:25

Question: I have a BigQuery table and a Dataproc cluster (with Datalab), and I am following this guide: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

bucket = spark._jsc.hadoopConfiguration().get("fs.gs.system.bucket")
project = spark._jsc.hadoopConfiguration().get("fs.gs.project.id")

# Set an input directory for reading data from Bigquery.
todays_date = datetime.strftime(datetime.today(), "%Y-%m-%d-%H-%M-%S")
input_directory = "gs://{}/tmp/bigquery-{}".format(bucket, todays_date)
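
For context, a rough sketch (from memory, so property and class names may differ from the current guide) of how the linked tutorial continues from this point: the connector stages the table into the GCS temp directory and Spark reads it back as JSON. It reuses the spark session and variables from the snippet above, with the public shakespeare table standing in for the real input table.

sc = spark.sparkContext

# Configuration consumed by the Hadoop BigQuery connector.
conf = {
    "mapred.bq.project.id": project,
    "mapred.bq.gcs.bucket": bucket,
    "mapred.bq.temp.gcs.path": input_directory,
    "mapred.bq.input.project.id": "publicdata",
    "mapred.bq.input.dataset.id": "samples",
    "mapred.bq.input.table.id": "shakespeare",
}

# Each record comes back as (row index, JSON text of the row).
table_data = sc.newAPIHadoopRDD(
    "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "com.google.gson.JsonObject",
    conf=conf,
)
print(table_data.take(5))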

How do I calculate an average time using standardSQL

Submitted by ぃ、小莉子 on 2021-01-28 12:03:03

Question: At the moment I have a column in a table with this information, for example:

00:11:35
00:20:53
00:17:52
00:06:41

And I need to display the average of those times; these values would give an average of 00:14:15. How do I do that? Ah, I'm trying to display this in Metabase, so I'd need a conversion so that after averaging, the time is converted to a string. So maybe it's not that simple. The structure of the field is:

Table Field: tma (type TIME)

Answer 1: Below is for BigQuery Standard SQL

#standardSQL
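
The answer body is truncated right after the #standardSQL tag. As a hedged sketch of one way to do it (not necessarily the original answer's approach): convert each TIME to seconds since midnight, average, convert back, and cast to STRING so Metabase can display it. The table name is a placeholder; tma is the field from the question.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
#standardSQL
SELECT
  CAST(
    TIME_ADD(
      TIME '00:00:00',
      INTERVAL CAST(AVG(TIME_DIFF(tma, TIME '00:00:00', SECOND)) AS INT64) SECOND
    ) AS STRING
  ) AS avg_tma
FROM `my_project.my_dataset.my_table`   -- placeholder table name
"""

# For the four sample values this prints 00:14:15.
print(list(client.query(sql).result())[0].avg_tma)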

Restructure table and check for values

Submitted by ε祈祈猫儿з on 2021-01-28 07:51:43

Question: I have one table (Table 1) which looks like this:

keys
AAB12B34
CC34DE5W
SEF5C6T4
SQA7ZZ87
LM24NO3P
X34YY78Z

And another table (Table 2) which looks like this:

category_id  category_name  associated_keys
111          Books          CC34DE5W|SQA7ZZ87|LM24NO3P
222          Office         LM24NO3P|AAB12B34
444          Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
222          Office         X34YY78Z

I want to do 2 tasks. Task 1: at any given point I want to have only one row for each category_id. If there are 2 rows (meaning if the id
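
The question is cut off mid-sentence, but for Task 1 as stated (one row per category_id) one hedged sketch is to split the pipe-delimited associated_keys, flatten them with UNNEST, and re-aggregate the distinct keys per category. Dataset and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  category_id,
  ANY_VALUE(category_name) AS category_name,
  STRING_AGG(DISTINCT key, '|') AS associated_keys
FROM `my_project.my_dataset.table_2`,
  UNNEST(SPLIT(associated_keys, '|')) AS key
GROUP BY category_id
"""

for row in client.query(sql).result():
    print(row.category_id, row.category_name, row.associated_keys)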

loading avro files with different schemas into one bigquery table

Submitted by 早过忘川 on 2021-01-28 07:51:08

Question: I have a set of Avro files with slightly varying schemas which I'd like to load into one BQ table. Is there a way to do that with one line? Any automatic way of handling the schema differences would be fine for me. Here is what I have tried so far.

0) If I try to do it in a straightforward way, bq fails with an error:

bq load --source_format=AVRO myproject:mydataset.logs gs://mybucket/logs/*
Waiting on bqjob_r4e484dc546c68744_0000015bcaa30f59_1 ... (4s) Current status: DONE
BigQuery error in load
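
The error text is cut off above. One hedged idea (not from the original post, and it may still require loading files with incompatible schemas in separate jobs): append the files with schema update options enabled, so loads that add or relax fields can still land in the same table. Bucket and table names are the placeholders from the question.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
        bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
    ],
)

job = client.load_table_from_uri(
    "gs://mybucket/logs/*",
    "myproject.mydataset.logs",
    job_config=job_config,
)
job.result()  # waits for the load job and raises on error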

Using external data sources in BQ with specific generation from Google Storage

Submitted by 梦想的初衷 on 2021-01-28 05:40:30

Question: I want to use external data sources in a BQ SELECT statement with not the latest but a specific generation of a file from Google Cloud Storage. I currently use the following:

val sourceFile = "gs://test-bucket/flights.csv"
val queryConfig = QueryJobConfiguration.newBuilder(query)
  .addTableDefinition("tmpTable", ExternalTableDefinition.newBuilder(sourceFile, schema, format)
    .setCompression("GZIP")
    .build())
  .build();

bigQuery.query(queryConfig)

I tried to set the sourceFile variable as follows
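
For reference, the equivalent temporary external table setup with the Python client (the schema fields are placeholders). This only mirrors the Scala snippet above; whether a generation-pinned URI is accepted in source_uris is exactly what the question is asking, so that part is left open.

from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://test-bucket/flights.csv"]
external_config.compression = "GZIP"
external_config.schema = [
    bigquery.SchemaField("flight_id", "STRING"),      # placeholder schema
    bigquery.SchemaField("departure", "TIMESTAMP"),
]

job_config = bigquery.QueryJobConfig(table_definitions={"tmpTable": external_config})

sql = "SELECT COUNT(*) AS n FROM tmpTable"
for row in client.query(sql, job_config=job_config).result():
    print(row.n)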

Google big query API returns “too many free query bytes scanned for this project”

Submitted by 前提是你 on 2021-01-28 05:00:02

Question: I am using Google's BigQuery API to retrieve results from their n-gram dataset, so I send multiple queries of:

SELECT ngram from trigram_dataset where ngram == 'natural language processing'

I'm basically using the same code posted here (https://developers.google.com/bigquery/bigquery-api-quickstart), with my query statement substituted in. On every program run, I have to get a new authorization code and type it into the console, which gives my program authorization to send queries to Google
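
A hedged sketch (not from the original thread) of the authentication half of this: using a service account key avoids pasting a new OAuth code on every run. The key path and table reference are placeholders, and credentials alone do not change the project's free query quota, which is what the error message itself is about.

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"   # placeholder key file
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

sql = """
SELECT ngram
FROM `my_project.my_dataset.trigram_dataset`   -- placeholder table reference
WHERE ngram = 'natural language processing'
"""

for row in client.query(sql).result():
    print(row.ngram)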

Is it possible to have both Pub/Sub and BigQuery as inputs in Google Dataflow?

Submitted by 最后都变了- on 2021-01-28 03:02:39

Question: In my project, I am looking to use a streaming pipeline in Google Dataflow to process Pub/Sub messages. While cleaning the input data, I also want a side input from BigQuery. This has presented a problem that causes one of the two inputs not to work. I have set streaming=True in my pipeline options, which allows the Pub/Sub input to process properly. But BigQuery is not compatible with streaming pipelines (see the link below): https://cloud.google.com/dataflow/docs
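
A hedged sketch (not from the original post) of one common workaround: fetch the BigQuery lookup data once at pipeline-construction time with the plain client and pass it into the streaming pipeline as a static side input. Topic, table, and column names are placeholders, and the side data is frozen at launch time rather than refreshed.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from google.cloud import bigquery


def load_lookup():
    # Runs on the launcher, before the streaming pipeline starts.
    client = bigquery.Client()
    rows = client.query(
        "SELECT key, value FROM `my_project.my_dataset.lookup`").result()
    return {row.key: row.value for row in rows}


def run():
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    lookup = load_lookup()

    with beam.Pipeline(options=options) as p:
        side = p | "MakeSideInput" >> beam.Create([lookup])

        (p
         | "ReadPubSub" >> beam.io.ReadFromPubSub(
               topic="projects/my_project/topics/my_topic")
         | "Clean" >> beam.Map(
               lambda msg, lk: (msg.decode("utf-8"), lk.get(msg.decode("utf-8"))),
               lk=beam.pvalue.AsSingleton(side))
         | "Print" >> beam.Map(print))


if __name__ == "__main__":
    run()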