google-bigquery

How to get the correct centroid of a bigquery polygon with st_centroid

Submitted by 可紊 on 2021-01-28 18:47:55

Question: I'm having some trouble with the ST_CENTROID function in BigQuery. There is a difference between the centroid of a GEOGRAPHY column and the centroid computed from the same column's WKT representation. The table is generated with bq load, using a GEOGRAPHY column and a newline_delimited_json file containing the polygon as WKT text. Example:

select st_centroid(polygon) loc, st_centroid(ST_GEOGFROMTEXT(st_astext(polygon))) loc2, polygon
from table_with_polygon

Result:

POINT(-174.333247842246 -51.6549479435566)
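
A minimal diagnostic sketch with the Python client (the table name is a placeholder): comparing ST_AREA of the stored GEOGRAPHY and of the WKT round trip alongside the two centroids can show whether re-parsing the WKT changed which side of the ring is treated as the polygon's interior, which would also move the centroid.

from google.cloud import bigquery

client = bigquery.Client()

# Compare the stored GEOGRAPHY with the same shape re-parsed from WKT.
# A large difference in area suggests the round trip flipped the polygon.
sql = """
SELECT
  ST_CENTROID(polygon)                             AS loc,
  ST_CENTROID(ST_GEOGFROMTEXT(ST_ASTEXT(polygon))) AS loc2,
  ST_AREA(polygon)                                 AS area_original,
  ST_AREA(ST_GEOGFROMTEXT(ST_ASTEXT(polygon)))     AS area_reparsed
FROM `my_project.my_dataset.table_with_polygon`    -- placeholder table name
"""

for row in client.query(sql).result():
    print(row.loc, row.loc2, row.area_original, row.area_reparsed)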

BigQuery: JOIN ON with repeated / array STRUCT field in Standard SQL?

Submitted by ♀尐吖头ヾ on 2021-01-28 13:39:53

Question: I basically have two tables, Orders and Items. Because these tables are imported from Google Cloud Datastore backup files, references are not made by a simple ID field but by a <STRUCT> for one-to-one relationships, where its id field holds the actual unique ID I want to match on. For one-to-many (REPEATED) relationships, the schema uses an ARRAY of <STRUCT>. I can query the one-to-one relationships with a LEFT OUTER JOIN, and I also know how to join on a non-repeated struct and a repeated string or int,
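
The excerpt cuts off before the actual question, but for the repeated case a common pattern is to flatten the ARRAY of STRUCT with UNNEST and then join on the struct's id field. A hedged sketch via the Python client; Orders, Items, order_id, item_refs and the Datastore-style __key__.id field are assumptions to adjust to the real schema:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  o.order_id,
  i.name AS item_name
FROM `my_project.my_dataset.Orders` AS o
LEFT JOIN UNNEST(o.item_refs) AS ref            -- repeated ARRAY<STRUCT> field
LEFT JOIN `my_project.my_dataset.Items` AS i
  ON i.__key__.id = ref.id                      -- match the struct's id field
"""

for row in client.query(sql).result():
    print(row.order_id, row.item_name)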

Take one table as input and output using another table BigQuery

Submitted by 被刻印的时光 ゝ on 2021-01-28 12:14:49

Question: I have one table and want to use it as the input for a query pulling from another table.

Input table:

+----------+--------+
| item     | period |
+----------+--------+
| HD.4TB   | 6      |
| 12333445 | 7      |
| 12344433 | 5      |
+----------+--------+

And I'm using this query to consume the input:

SELECT snapshot, item_name, commodity_code, planning_category, type, SUM(quantity) qty, sdm_month_start_date,
FROM planning_extract
WHERE planning_category IN (SELECT item FROM input)
GROUP BY snapshot, item_name,
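
The query above is cut off mid-statement. As a hedged sketch (dataset names are placeholders, columns trimmed), a JOIN against the input table does the same filtering as the IN (...) subquery while also keeping the input table's period column available in the outer query:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  p.snapshot,
  p.item_name,
  p.planning_category,
  i.period,
  SUM(p.quantity) AS qty
FROM `my_project.my_dataset.planning_extract` AS p
JOIN `my_project.my_dataset.input` AS i
  ON p.planning_category = i.item
GROUP BY p.snapshot, p.item_name, p.planning_category, i.period
"""

for row in client.query(sql).result():
    print(row.snapshot, row.item_name, row.planning_category, row.period, row.qty)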

Error connecting to BigQuery from Dataproc with Datalab using BigQuery Spark connector (Error getting access token from metadata server at)

Submitted by 本小妞迷上赌 on 2021-01-28 12:10:25

Question: I have a BigQuery table and a Dataproc cluster (with Datalab), and I am following this guide: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

bucket = spark._jsc.hadoopConfiguration().get("fs.gs.system.bucket")
project = spark._jsc.hadoopConfiguration().get("fs.gs.project.id")

# Set an input directory for reading data from Bigquery.
todays_date = datetime.strftime(datetime.today(), "%Y-%m-%d-%H-%M-%S")
input_directory = "gs://{}/tmp/bigquery-{}".format(bucket, todays_date)
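
For context, a rough sketch (from memory, so property and class names may differ from the current guide) of how the linked tutorial continues from this point: the connector stages the table into the GCS temp directory and Spark reads it back as JSON. It reuses the spark session and variables from the snippet above, with the public shakespeare table standing in for the real input table.

sc = spark.sparkContext

# Configuration consumed by the Hadoop BigQuery connector.
conf = {
    "mapred.bq.project.id": project,
    "mapred.bq.gcs.bucket": bucket,
    "mapred.bq.temp.gcs.path": input_directory,
    "mapred.bq.input.project.id": "publicdata",
    "mapred.bq.input.dataset.id": "samples",
    "mapred.bq.input.table.id": "shakespeare",
}

# Each record comes back as (row index, JSON text of the row).
table_data = sc.newAPIHadoopRDD(
    "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "com.google.gson.JsonObject",
    conf=conf,
)
print(table_data.take(5))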

How do I calculate an average time using standardSQL

Submitted by ぃ、小莉子 on 2021-01-28 12:03:03

Question: At the moment I have a column in a table with this information, for example:

00:11:35
00:20:53
00:17:52
00:06:41

And I need to display the average of those times; these values would give an average of 00:14:15. How do I do that? Ah, I'm trying to display this in Metabase, so I'd need a conversion so that after averaging, the time is converted to a string. So maybe it's not that simple. The structure of the field is:

Table Field: tma (type TIME)

Answer 1: Below is for BigQuery Standard SQL

#standardSQL
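
The answer body is truncated right after the #standardSQL tag. As a hedged sketch of one way to do it (not necessarily the original answer's approach): convert each TIME to seconds since midnight, average, convert back, and cast to STRING so Metabase can display it. The table name is a placeholder; tma is the field from the question.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
#standardSQL
SELECT
  CAST(
    TIME_ADD(
      TIME '00:00:00',
      INTERVAL CAST(AVG(TIME_DIFF(tma, TIME '00:00:00', SECOND)) AS INT64) SECOND
    ) AS STRING
  ) AS avg_tma
FROM `my_project.my_dataset.my_table`   -- placeholder table name
"""

# For the four sample values this prints 00:14:15.
print(list(client.query(sql).result())[0].avg_tma)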

Restructure table and check for values

Submitted by ε祈祈猫儿з on 2021-01-28 07:51:43

Question: I have one table (Table 1) which looks like this:

keys
AAB12B34
CC34DE5W
SEF5C6T4
SQA7ZZ87
LM24NO3P
X34YY78Z

And another table (Table 2) which looks like this:

category_id  category_name  associated_keys
111          Books          CC34DE5W|SQA7ZZ87|LM24NO3P
222          Office         LM24NO3P|AAB12B34
444          Furniture      X34YY78Z|LM24NO3P|SQA7ZZ87|SEF5C6T4|CC34DE5W|AAB12B34
222          Office         X34YY78Z

I want to do 2 tasks. Task 1: at any given point I want to have only one row for each category_id. If there are 2 rows (meaning if the id
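
The question is cut off mid-sentence, but for Task 1 as stated (one row per category_id) one hedged sketch is to split the pipe-delimited associated_keys, flatten them with UNNEST, and re-aggregate the distinct keys per category. Dataset and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  category_id,
  ANY_VALUE(category_name) AS category_name,
  STRING_AGG(DISTINCT key, '|') AS associated_keys
FROM `my_project.my_dataset.table_2`,
  UNNEST(SPLIT(associated_keys, '|')) AS key
GROUP BY category_id
"""

for row in client.query(sql).result():
    print(row.category_id, row.category_name, row.associated_keys)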

loading avro files with different schemas into one bigquery table

Submitted by 早过忘川 on 2021-01-28 07:51:08

Question: I have a set of Avro files with slightly varying schemas which I'd like to load into one BQ table. Is there a way to do that with one line? Any automatic way of handling the schema differences would be fine for me. Here is what I have tried so far.

0) If I try to do it in a straightforward way, bq fails with an error:

bq load --source_format=AVRO myproject:mydataset.logs gs://mybucket/logs/*
Waiting on bqjob_r4e484dc546c68744_0000015bcaa30f59_1 ... (4s) Current status: DONE
BigQuery error in load
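
The error text is cut off above. One hedged idea (not from the original post, and it may still require loading files with incompatible schemas in separate jobs): append the files with schema update options enabled, so loads that add or relax fields can still land in the same table. Bucket and table names are the placeholders from the question.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
        bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
    ],
)

job = client.load_table_from_uri(
    "gs://mybucket/logs/*",
    "myproject.mydataset.logs",
    job_config=job_config,
)
job.result()  # waits for the load job and raises on error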

Using external data sources in BQ with specific generation from Google Storage

Submitted by 梦想的初衷 on 2021-01-28 05:40:30

Question: I want to use external data sources in a BQ SELECT statement with not the latest but a specific generation of a file from Google Cloud Storage. I currently use the following:

val sourceFile = "gs://test-bucket/flights.csv"
val queryConfig = QueryJobConfiguration.newBuilder(query)
  .addTableDefinition("tmpTable", ExternalTableDefinition.newBuilder(sourceFile, schema, format)
    .setCompression("GZIP")
    .build())
  .build();

bigQuery.query(queryConfig)

I tried to set the sourceFile variable as follows
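
For reference, the equivalent temporary external table setup with the Python client (the schema fields are placeholders). This only mirrors the Scala snippet above; whether a generation-pinned URI is accepted in source_uris is exactly what the question is asking, so that part is left open.

from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://test-bucket/flights.csv"]
external_config.compression = "GZIP"
external_config.schema = [
    bigquery.SchemaField("flight_id", "STRING"),      # placeholder schema
    bigquery.SchemaField("departure", "TIMESTAMP"),
]

job_config = bigquery.QueryJobConfig(table_definitions={"tmpTable": external_config})

sql = "SELECT COUNT(*) AS n FROM tmpTable"
for row in client.query(sql, job_config=job_config).result():
    print(row.n)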

Google big query API returns “too many free query bytes scanned for this project”

Submitted by 前提是你 on 2021-01-28 05:00:02

Question: I am using Google's BigQuery API to retrieve results from their n-gram dataset, so I send multiple queries of:

SELECT ngram from trigram_dataset where ngram == 'natural language processing'

I'm basically using the same code posted here (https://developers.google.com/bigquery/bigquery-api-quickstart), with my query statement substituted in. On every program run, I have to get a new authorization code and type it into the console, which gives my program authorization to send queries to Google
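
A hedged sketch (not from the original thread) of the authentication half of this: using a service account key avoids pasting a new OAuth code on every run. The key path and table reference are placeholders, and credentials alone do not change the project's free query quota, which is what the error message itself is about.

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"   # placeholder key file
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

sql = """
SELECT ngram
FROM `my_project.my_dataset.trigram_dataset`   -- placeholder table reference
WHERE ngram = 'natural language processing'
"""

for row in client.query(sql).result():
    print(row.ngram)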

Is it possible to have both Pub/Sub and BigQuery as inputs in Google Dataflow?

Submitted by 最后都变了- on 2021-01-28 03:02:39

Question: In my project, I am looking to use a streaming pipeline in Google Dataflow to process Pub/Sub messages. While cleaning the input data, I also want a side input from BigQuery. This has presented a problem that causes one of the two inputs not to work. I have set streaming=True in my pipeline options, which allows the Pub/Sub input to process properly. But BigQuery is not compatible with streaming pipelines (see the link below): https://cloud.google.com/dataflow/docs
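
A hedged sketch (not from the original post) of one common workaround: fetch the BigQuery lookup data once at pipeline-construction time with the plain client and pass it into the streaming pipeline as a static side input. Topic, table, and column names are placeholders, and the side data is frozen at launch time rather than refreshed.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from google.cloud import bigquery


def load_lookup():
    # Runs on the launcher, before the streaming pipeline starts.
    client = bigquery.Client()
    rows = client.query(
        "SELECT key, value FROM `my_project.my_dataset.lookup`").result()
    return {row.key: row.value for row in rows}


def run():
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    lookup = load_lookup()

    with beam.Pipeline(options=options) as p:
        side = p | "MakeSideInput" >> beam.Create([lookup])

        (p
         | "ReadPubSub" >> beam.io.ReadFromPubSub(
               topic="projects/my_project/topics/my_topic")
         | "Clean" >> beam.Map(
               lambda msg, lk: (msg.decode("utf-8"), lk.get(msg.decode("utf-8"))),
               lk=beam.pvalue.AsSingleton(side))
         | "Print" >> beam.Map(print))


if __name__ == "__main__":
    run()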