google-bigquery

BigQuery - querying only a subset of keys in a table with key value schema

青春壹個敷衍的年華 submitted on 2020-01-14 03:52:11
Question: I have a table with the following schema: timestamp: TIMESTAMP, key: STRING, value: FLOAT. There are around 200 unique keys. I am partitioning the dataset by date. I want to run several queries (5-6 currently, but I expect to add at least 15 more) on a daily basis on this database. Brute-forcing these would cost me a lot daily, which I want to avoid. The issue is that because of this key-value format, and BigQuery being a columnar database, each query scans the whole day's data, despite

Tag huge list of elements with lat/long with large list of geolocation data

夙愿已清 submitted on 2020-01-13 20:46:34
Question: I have a huge list of geolocation events: Event (1 billion rows): id, datetime, lat, long. I also have a list of points of interest loaded from OpenStreetMap: POI (1 million rows): id, tag (shop, restaurant, etc.), lat, long. I would like to assign to each event the tag of the point of interest. What is the best architecture to solve this problem? We tried using Google BigQuery, but we have to do a cross join and it does not work. We are open to using any other big data system. Answer 1: Using Dataflow
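The excerpt cuts off before the answer's approach is described, so the following is not the answerer's method. It is only a minimal sketch of one common way to avoid a full cross join: bucket the POIs into a coarse lat/long grid and compare each event only against POIs in its own and neighbouring cells. The cell size `CELL_DEG`, the function names, and the dictionary row layout are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed grid cell size in degrees (roughly 1 km in latitude); tune for real data.
CELL_DEG = 0.01


def cell_of(lat, lon):
    """Return the integer grid cell that contains a coordinate."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(pois):
    """Group POIs by grid cell so lookups only touch nearby candidates."""
    index = defaultdict(list)
    for poi in pois:
        index[cell_of(poi["lat"], poi["long"])].append(poi)
    return index


def nearest_tag(index, lat, lon):
    """Return the tag of the closest POI found in the 3x3 block of cells
    around the event, or None if that block holds no POI.

    This is approximate: POIs outside the neighbouring cells are never
    considered, and longitude wrap-around at +/-180 is ignored.
    """
    ci, cj = cell_of(lat, lon)
    lon_scale = math.cos(math.radians(lat))  # shrink longitude degrees away from the equator
    best_tag, best_d2 = None, float("inf")
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for poi in index.get((ci + di, cj + dj), ()):
                d2 = (poi["lat"] - lat) ** 2 + ((poi["long"] - lon) * lon_scale) ** 2
                if d2 < best_d2:
                    best_tag, best_d2 = poi["tag"], d2
    return best_tag
```

With 1 million POIs the index fits in memory on a single worker, so each of the 1 billion events needs only a handful of distance computations instead of 1 million; the same bucketing idea can be expressed as an equality join on the cell key in a distributed system.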

SQL convert yyyymmdd to timestamp in BigQuery?

丶灬走出姿态 submitted on 2020-01-13 20:31:07
Question: I'm attempting to convert a string to a timestamp within SQL. The question is really quite simple: how can I convert this string into a timestamp that starts at midnight on that day? Within my database I also have a field stored in timestamp_micros; either one of these could work, and I think converting the micros to a timestamp would be easier than converting the string. For example: 20170118 => timestamp. Query: WITH allTables as ( SELECT event.date as date, count(*) as totalSessions, count(DISTINCT user
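The excerpt ends before any answer, so the following only illustrates the two conversions the question describes, written in Python so the logic can be checked locally. In BigQuery standard SQL, the `PARSE_TIMESTAMP` and `TIMESTAMP_MICROS` functions should cover the same ground. The function names below are hypothetical, and UTC is assumed as the time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def yyyymmdd_to_timestamp(day: str) -> datetime:
    """Parse a 'YYYYMMDD' string into a UTC timestamp at midnight of that day."""
    return datetime.strptime(day, "%Y%m%d").replace(tzinfo=timezone.utc)


def micros_to_timestamp(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch into a UTC timestamp."""
    return EPOCH + timedelta(microseconds=micros)


def micros_to_midnight(micros: int) -> datetime:
    """Truncate a microsecond timestamp to midnight (UTC) of the same day."""
    ts = micros_to_timestamp(micros)
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)
```

Both routes land on the same value for the question's example day, so either column could serve as the join or grouping key once normalised to midnight.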

Is “count distinct” exact with BigQuery new standard SQL syntax?

天涯浪子 submitted on 2020-01-13 19:53:13
Question: With the legacy BigQuery syntax, we have to use the exact_count_distinct function if we want the exact number of distinct values for a field. With the Standard SQL 2011 syntax, I wonder whether "count(distinct myfield)" will always return the exact number of distinct values if I don't select the 'Use Legacy SQL' option. Answer 1: COUNT(DISTINCT input) gives an exact count in standard SQL. One important distinction is that COUNT(DISTINCT input) is more scalable than EXACT_COUNT_DISTINCT(input)

What are the pros and cons of loading data directly into Google BigQuery vs going through Cloud Storage first?

匆匆过客 submitted on 2020-01-13 19:13:06
Question: Also, is there anything wrong with doing transforms/joins directly within BigQuery? I'd like to minimize the number of components and steps involved for a data warehouse I'm setting up (simple transaction and inventory data for a chain of retail stores). Answer 1: Loading data via Cloud Storage is the fastest (and the cheapest) way. Loading directly can be done from an app (using streaming inserts, which add some additional cost). As for doing transformations: if what you plan/need to do can be done

How to obtain the most recent row per type and perform calculations, depending on the row type?

一个人想着一个人 submitted on 2020-01-13 16:29:52
Question: I need some help writing/optimizing a query to retrieve the latest version of each row by type and perform some calculations depending on the type. I think it would be best if I illustrate it with an example. Given the following dataset:
+-------+-------------------+---------------------+-------------+---------------------+--------+----------+
| id | event_type | event_timestamp | message_id | sent_at | status | rate |
+-------+-------------------+---------------------+-------------+---------
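The dataset and any answer are cut off in this excerpt, so the following is only a rough sketch of the "latest row per type" step, in Python for easy local checking. It assumes rows are dictionaries carrying the `event_type` and `event_timestamp` columns from the header above, and that timestamps are strings in a sortable `YYYY-MM-DD HH:MM:SS` form. The function name is hypothetical. In SQL, this pattern is commonly written with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC)` and a filter on the first row.

```python
from typing import Dict, Iterable


def latest_per_type(rows: Iterable[dict]) -> Dict[str, dict]:
    """Keep, for each event_type, the row with the greatest event_timestamp.

    Assumes event_timestamp values compare correctly with `>`
    (e.g. ISO-like strings or datetime objects).
    """
    latest: Dict[str, dict] = {}
    for row in rows:
        key = row["event_type"]
        current = latest.get(key)
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            latest[key] = row
    return latest
```

The per-type calculations the question mentions would then run over the values of the returned dictionary; they are not sketched here because the excerpt does not say what they are.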

How to convert an Epoch timestamp to a Date in Standard SQL

余生长醉 submitted on 2020-01-13 08:38:07
Question: I didn't find any simple answer to this while I was looking around, so I thought I'd put it up here in case anyone was having the same problem as me with what should have been a trivial issue. I was using ReDash analytics with Google's BigQuery and had turned on Standard SQL in the datasource settings. For the purposes of my query, I needed to convert a timestamp (Unix time in milliseconds, stored as a string) to a Date format so that I could use the DATE_DIFF method. As an example...
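The poster's own example is cut off here, so what follows is not their query. It is a small Python sketch of the same conversion chain: string, then integer milliseconds, then timestamp, then date. In BigQuery standard SQL the equivalent steps would presumably be a `CAST` to `INT64`, `TIMESTAMP_MILLIS`, and `DATE`. The function names are hypothetical, and UTC is assumed.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_string_to_date(value: str) -> date:
    """Convert Unix time in milliseconds, given as a string, to a UTC date."""
    millis = int(value)  # the string must contain only an integer
    return (EPOCH + timedelta(milliseconds=millis)).date()


def days_between(later: str, earlier: str) -> int:
    """Whole-day difference between two epoch-millisecond strings,
    computed on calendar dates in the way a date-difference function would."""
    return (epoch_millis_string_to_date(later) - epoch_millis_string_to_date(earlier)).days
```

For instance, `epoch_millis_string_to_date("1500000000000")` yields `date(2017, 7, 14)`.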

Is user_pseudo_id the same as a session id? How to group all events by session? - Firebase BigQuery

岁酱吖の submitted on 2020-01-13 05:54:30
Question: I have an iOS app. I am trying to figure out how users move through my app. I am looking for a way to group all the events by some sort of session id. I assumed all Firebase events would have a session id. This does not seem to be the case. I noticed there is a user_pseudo_id. I did some testing, where I logged an event that only I could ever have created. I noticed that sometimes the user_pseudo_id changes. Any idea what triggers a new id? I restarted and deleted/reinstalled the app many

What's the practical difference between Google Datastore NoSQL and Google BigQuery SQL?

主宰稳场 submitted on 2020-01-13 03:04:30
Question: I want to know how to evaluate one tool over another. My major concern is as follows: In Google Datastore, we define a 'kind'. Each 'entity' has 'properties'. The Datastore backend then uses those properties to index data for future queries. The query itself uses almost the same idea as SQL, though with different syntax, to filter data and find what we want. If you index every property, the index metadata would be even bigger than the real data. Google BigQuery uses its own dialect of SQL. And it's fully
