Random Sampling in Google BigQuery

攒了一身酷 2020-11-29 21:48

I just discovered that the RAND() function, while undocumented, works in BigQuery. I was able to generate a (seemingly) random sample of 10 words from the Shakespeare datase

4 Answers
  •  青春惊慌失措
    2020-11-29 22:29

    Great to know RAND() is available!

    In my case I needed a predefined sample size. Rather than looking up the total number of rows and dividing the sample size by that total, I'm using the following query:

    # Tag each row with a random value, sort by it, and keep the first 10 rows.
    SELECT word, RAND(5) AS rand
    FROM [publicdata:samples.shakespeare]
    ORDER BY rand
    # Sample size needed = 10
    LIMIT 10
    

    In summary, I use ORDER BY + LIMIT to randomize the rows and then extract a fixed number of samples.
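
    To make the ORDER BY + LIMIT idea concrete outside BigQuery, here is a minimal local sketch in Python. It is only an illustration of the technique, not how BigQuery runs the query; the function name `sample_rows`, its `seed` parameter, and the word list are hypothetical and chosen just for this example.

```python
import random


def sample_rows(rows, sample_size, seed=None):
    """Pick sample_size rows at random, without replacement.

    Hypothetical helper mirroring the query: attach a random key to
    every row, sort by that key, and keep the first sample_size rows.
    """
    rng = random.Random(seed)
    # Equivalent of `rand(5) as rand`: one random key per row.
    keyed = [(rng.random(), row) for row in rows]
    # Equivalent of ORDER BY rand.
    keyed.sort(key=lambda pair: pair[0])
    # Equivalent of LIMIT sample_size.
    return [row for _, row in keyed[:sample_size]]


if __name__ == "__main__":
    # Made-up stand-in for the word column of the table.
    words = ["brave", "new", "world", "that", "has", "such", "people"]
    print(sample_rows(words, 3, seed=5))
```

    The total row count is never needed: the sort shuffles the rows and the limit fixes the sample size. The cost is that every row gets a key and is sorted, which may be expensive on a large table.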
