Find out the amount of space each field takes in Google BigQuery

Submitted by 只谈情不闲聊 on 2020-01-05 04:02:27

Question


I want to optimize the space used by my BigQuery and Google Cloud Storage tables. Is there an easy way to find out the cumulative space that each field in a table takes up? This is not straightforward in my case, since I have a complicated hierarchy with many repeated records.


Answer 1:


You can do this in the Web UI by simply typing (but not running) the query below, changing <column_name> to the field you are interested in:

SELECT <column_name>
FROM YourTable

and then looking at the validation message, which shows the size of that column.

Important: you do not need to run the query. Just check the validation message for the number of bytes processed; that number is the size of the respective column.

Validation is free; it invokes a so-called dry run.

If you need to do this kind of “column profiling” for many tables, or for a table with many columns, you can code it in your preferred language: use the Tables.get API to get the table schema, then loop through all fields, build the respective SELECT statement for each one, and dry-run it (within the loop, once per column) to get totalBytesProcessed, which, as noted above, is the size of that column.
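A minimal Python sketch of that loop, assuming the `google-cloud-bigquery` client library is installed and application default credentials are configured. `Client.get_table` wraps Tables.get to fetch the schema, and a `QueryJobConfig` with `dry_run=True` validates each single-column query without running it, exposing the estimate as `total_bytes_processed`. The helper names `build_column_queries` and `profile_columns` are hypothetical, and the table ID in the example comment is a placeholder.

```python
"""Dry-run one SELECT per top-level column to estimate each column's size.

Assumes google-cloud-bigquery is installed and default credentials are set.
"""
from typing import Dict, List


def build_column_queries(table_id: str, column_names: List[str]) -> Dict[str, str]:
    """Return a {column: SQL} map with one single-column SELECT per column."""
    # Backticks quote both the column name and the project.dataset.table path
    # in standard SQL.
    return {name: f"SELECT `{name}` FROM `{table_id}`" for name in column_names}


def profile_columns(table_id: str) -> Dict[str, int]:
    """Dry-run each single-column query and collect total_bytes_processed."""
    # Imported here so build_column_queries works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table(table_id)  # fetches the schema via Tables.get
    queries = build_column_queries(table_id, [field.name for field in table.schema])

    # dry_run validates the query and estimates bytes without executing it;
    # disabling the query cache avoids a cached result reporting 0 bytes.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    sizes = {}
    for name, sql in queries.items():
        job = client.query(sql, job_config=job_config)
        sizes[name] = job.total_bytes_processed
    return sizes


# Example (hypothetical table ID):
#   sizes = profile_columns("my-project.my_dataset.my_table")
#   for column, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
#       print(f"{column}: {size} bytes")
```

For a top-level RECORD column, the reported figure covers all of its nested fields. Breaking a repeated record down to individual leaf fields would need one query per leaf (for example via UNNEST), which this sketch does not attempt.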




Answer 2:


I don't think this is exposed in any of the metadata. However, depending on your needs, you may be able to get good approximations easily. The number of rows is provided, so for some data types you can calculate the size directly: https://cloud.google.com/bigquery/pricing

For types such as STRING, you could get the average length by querying a sample, e.g. the first 1000 values, and use it in your storage calculations.
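A rough Python sketch of this approximation, assuming the per-value sizes from BigQuery's published data size table (8 bytes for INT64/FLOAT64, 1 byte for BOOL, 2 bytes plus the UTF-8 encoded length for STRING, and so on). It treats every row as holding exactly one non-NULL value, so it ignores NULLs and repeated fields; the names `FIXED_SIZE_BYTES`, `average_utf8_length`, and `estimate_column_bytes` are hypothetical.

```python
"""Rough per-column storage estimate from row count and data type."""
from typing import List, Optional

# Per-value sizes in bytes for fixed-size types (assumed from BigQuery's
# data size table; check the current docs). Legacy names such as INTEGER
# are included because the table schema may report them.
FIXED_SIZE_BYTES = {
    "INT64": 8, "INTEGER": 8,
    "FLOAT64": 8, "FLOAT": 8,
    "NUMERIC": 16, "BIGNUMERIC": 32,
    "BOOL": 1, "BOOLEAN": 1,
    "DATE": 8, "DATETIME": 8, "TIME": 8, "TIMESTAMP": 8,
}


def average_utf8_length(sample: List[str]) -> float:
    """Average UTF-8 encoded length, in bytes, of a sample of strings."""
    if not sample:
        return 0.0
    return sum(len(value.encode("utf-8")) for value in sample) / len(sample)


def estimate_column_bytes(field_type: str, num_rows: int,
                          sample: Optional[List[str]] = None) -> float:
    """Estimate a non-repeated column's size as num_rows * per-value size."""
    field_type = field_type.upper()
    if field_type in FIXED_SIZE_BYTES:
        return num_rows * FIXED_SIZE_BYTES[field_type]
    if field_type == "STRING":
        # Each STRING value is assumed to take 2 bytes plus its UTF-8 length.
        return num_rows * (2 + average_utf8_length(sample or []))
    raise ValueError(f"no estimate rule for type {field_type!r}")
```

For example, `estimate_column_bytes("INT64", 1000)` gives 8000, and a STRING column of 1000 rows whose sample is `["ab", "cde"]` (average length 2.5 bytes) gives 4500. The STRING estimate is only as representative as the sample it is based on.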



Source: https://stackoverflow.com/questions/39079195/find-out-the-amount-of-space-each-field-takes-in-google-big-query
