Use Google BigQuery to build a histogram graph


再見小時候

How can I write a query that makes histogram rendering easier?

For example, we have 100 million people with ages, and we want to draw the histogram/buckets for age 0

7 answers

南笙

2020-12-17 22:26

With #standardSQL and an auxiliary stats query, we can define the range the histogram should look into.

Here it is for the flight time between SFO and JFK, with 10 buckets:

```
-- Pick the rows to plot and name the measured column "datapoint".
WITH data AS (
    SELECT *, ActualElapsedTime datapoint
    FROM `fh-bigquery.flights.ontime_201903`
    WHERE FlightDate_year = "2018-01-01"
    AND Origin = 'SFO' AND Dest = 'JFK'
)
-- One row per bucket: [min, max) ranges of equal width.
, stats AS (
  SELECT min+step*i min, min+step*(i+1) max
  FROM (
    -- GENERATE_ARRAY is inclusive, so i runs 0..10; the last range starts at the overall max.
    SELECT max-min diff, min, max, (max-min)/10 step, GENERATE_ARRAY(0, 10, 1) i
    FROM (
      SELECT MIN(datapoint) min, MAX(datapoint) max
      FROM data
    )
  ), UNNEST(i) i
)

-- Count the rows in each bucket, keyed by the bucket midpoint.
SELECT COUNT(*) count, (min+max)/2 avg
FROM data
JOIN stats
ON data.datapoint >= stats.min AND data.datapoint < stats.max
GROUP BY avg
ORDER BY avg
```

If you need round numbers, see: https://stackoverflow.com/a/60159876/132438
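
For readers who want to check the bucket arithmetic outside BigQuery, here is a minimal Python sketch of the same equal-width idea. The function name `equal_width_histogram` and its parameters are made up for illustration. One deliberate difference: the query above generates 11 ranges (i from 0 to 10), so the maximum value lands in an extra range of its own, while this sketch clamps the maximum into the last of the `num_buckets` buckets.

```python
from collections import Counter


def equal_width_histogram(values, num_buckets=10):
    """Return (lower, upper, count) for equal-width buckets over values.

    Hypothetical helper, not part of any BigQuery client library.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single bucket holds everything.
        return [(lo, hi, len(values))]
    step = (hi - lo) / num_buckets
    counts = Counter()
    for v in values:
        # Clamp so the maximum falls into the last bucket, not a new one.
        idx = min(int((v - lo) / step), num_buckets - 1)
        counts[idx] += 1
    return [
        (lo + step * i, lo + step * (i + 1), counts.get(i, 0))
        for i in range(num_buckets)
    ]


if __name__ == "__main__":
    for lower, upper, count in equal_width_histogram(list(range(11)), 5):
        print(f"[{lower}, {upper}): {count}")
```

Whether to clamp or to keep an extra range for the maximum is a presentation choice; the counts still add up to the number of rows either way.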


攒了一身酷

2020-12-17 22:32

Write a subquery like this:

```
(SELECT '1' AS agegroup, count(*) AS num FROM people WHERE AGE >= 0 AND AGE < 10)
```

Then combine one subquery per bucket. In legacy SQL, a comma between subqueries in FROM means UNION ALL. The ranges are half-open so that nobody aged exactly 10 or 20 is counted twice:

```
SELECT * FROM
(SELECT '1' AS agegroup, count(*) AS num FROM people WHERE AGE >= 0 AND AGE < 10),
(SELECT '2' AS agegroup, count(*) AS num FROM people WHERE AGE >= 10 AND AGE < 20),
(SELECT '3' AS agegroup, count(*) AS num FROM people WHERE AGE >= 20 AND AGE <= 120)
```

The result will look like:

```
Row  agegroup  num
1    1         somenumber
2    2         somenumber
3    3         somenumber
```

The agegroup label can be any string, for example '0 to 10'.


梦谈多话

2020-12-17 22:33

See the 2019 update, with #standardSQL --Fh

The subquery idea works, as does "CASE WHEN" and then doing a group by:

```
SELECT SUM(field1), bucket
FROM (
    SELECT field1, CASE WHEN age >=  0 AND age < 10 THEN 1
                        WHEN age >= 10 AND age < 20 THEN 2
                        WHEN age >= 20 AND age < 30 THEN 3
                        ...
                        ELSE -1 END as bucket
    FROM table1)
GROUP BY bucket
```

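Alternatively, if the buckets are regular, you can just divide and cast to an integer. INTEGER() is legacy SQL; in standard SQL, CAST(FLOOR(age / 10) AS INT64) gives the same bucket for non-negative ages:

```
SELECT SUM(field1), bucket
FROM (
    SELECT field1, INTEGER(age / 10) as bucket FROM table1)
GROUP BY bucket
```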
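
To try the divide-and-truncate idea without a BigQuery project, a rough equivalent can be run against an in-memory SQLite database from Python. This is only a sketch: the `people` table, the `age_histogram` function and the `width` parameter are hypothetical, and it relies on SQLite performing integer division when both operands are integers, which plays the role of INTEGER(age / 10) here.

```python
import sqlite3


def age_histogram(ages, width=10):
    """Return [(bucket, count), ...] where bucket is age divided by width.

    Hypothetical helper that uses SQLite as a stand-in for BigQuery.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (age INTEGER)")
    conn.executemany(
        "INSERT INTO people (age) VALUES (?)", [(a,) for a in ages]
    )
    rows = conn.execute(
        """
        SELECT age / ? AS bucket, COUNT(*) AS num
        FROM people
        GROUP BY bucket
        ORDER BY bucket
        """,
        (width,),  # integer parameter, so SQLite does integer division
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for bucket, num in age_histogram([0, 5, 9, 10, 19, 20, 35]):
        print(f"ages {bucket * 10}-{bucket * 10 + 9}: {num}")
```

Buckets with no rows simply do not appear in the output, which is also true of the GROUP BY queries above, so a chart may need to fill in the empty buckets itself.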


执笔经年

2020-12-17 22:46
Using a cross join to get your min and max values (not that expensive on a single tuple), you can get a normalized bucket list for any given bucket count:
```
select
  min(data.VAL) as min,
  max(data.VAL) as max,
  count(data.VAL) as num,
  -- scale VAL to the range 0..8 and cast to an integer bucket index
  integer((data.VAL-value.min)/(value.max-value.min)*8) as bucket
from [table] data
CROSS JOIN (SELECT MAX(VAL) as max, MIN(VAL) as min from [table]) value
GROUP BY bucket
ORDER BY bucket
```
In this example we get 8 buckets (0 to 7), plus a bucket 8 for rows where VAL equals the maximum, and one more for NULL VAL.
广开言路

2020-12-17 22:48
There is now the APPROX_QUANTILES aggregation function in standard SQL.
```
SELECT
    APPROX_QUANTILES(column, number_of_bins)
...    
```
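
Note that quantiles give bucket boundaries with roughly equal counts per bucket, not counts for equal-width buckets. As a rough local illustration, the Python sketch below computes exact boundaries with the standard library; BigQuery's function is approximate, so its output may differ slightly. The helper name `quantile_boundaries` is hypothetical, and prepending the minimum and appending the maximum is done here to mirror the fact that APPROX_QUANTILES returns number + 1 elements.

```python
import statistics


def quantile_boundaries(values, number_of_bins):
    """Return number_of_bins + 1 boundaries: min, the inner cut points, max.

    Hypothetical helper; exact quantiles, unlike BigQuery's approximation.
    """
    # statistics.quantiles returns only the number_of_bins - 1 inner cut points.
    cuts = statistics.quantiles(values, n=number_of_bins, method="inclusive")
    return [min(values)] + cuts + [max(values)]


if __name__ == "__main__":
    print(quantile_boundaries(list(range(101)), 4))
```

Each pair of neighbouring boundaries then describes one bucket holding about the same number of rows, which is useful when the data is heavily skewed.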
滥情空心

2020-12-17 22:52
You're looking for a single vector of information. I would normally query it like this:
```
select
  count(*) as num,
  integer( age / 10 ) as age_group
from mytable
group by age_group 
```
A big case statement will be needed for arbitrary groups. It would be simple but much longer. My example should be fine if every bucket contains N years.
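
For arbitrary groups, the logic of that long CASE statement can be sketched locally in Python with a sorted list of lower bounds. This is an illustration only: `bucket_counts` and the example bounds are made up, and values below the first bound get the label -1, mirroring an ELSE -1 branch.

```python
from bisect import bisect_right
from collections import Counter


def bucket_counts(ages, lower_bounds):
    """Count ages per bucket, where lower_bounds is sorted ascending.

    Bucket i covers lower_bounds[i] up to (not including) lower_bounds[i + 1];
    the last bucket is open-ended. Ages below lower_bounds[0] map to -1.
    """
    counts = Counter()
    for age in ages:
        # bisect_right finds how many bounds are <= age; subtract 1 for the index.
        counts[bisect_right(lower_bounds, age) - 1] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical groups: 0-17, 18-64, 65 and over.
    print(bucket_counts([3, 17, 18, 40, 65, 90, -1], [0, 18, 65]))
```

The same list of bounds could be used to generate the WHEN branches of the SQL CASE expression, so the boundaries live in one place.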