Splitting a dataset for training and evaluation in BigQuery ML

ε祈祈猫儿з submitted on 2021-01-29 13:36:44

Question


Does BigQuery ML automatically split the dataset for training and evaluation? Or do we have to manually take 80% of the dataset for training, 10% for validation, and 10% for evaluation when using logistic regression in BigQuery ML? If both are possible, which would be better?

Thanks


Answer 1:


Yes, BigQuery ML will automatically split the data for its validation process. It is also fairly common practice to manually set aside a holdout set, so that you can perform additional validation on data that the model has never seen.
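To illustrate the manual option, here is a minimal Python sketch of a deterministic 80/10/10 assignment. Hashing a row key into 100 buckets is my own assumption about how one might do it, not something BigQuery ML prescribes, and `assign_split` and its parameters are hypothetical names.

```python
import hashlib


def assign_split(row_key: str, train_pct: int = 80, validation_pct: int = 10) -> str:
    """Map a row key to 'train', 'validation', or 'evaluation'.

    Hypothetical helper: the same key always lands in the same split,
    so the holdout rows stay unseen across repeated training runs.
    """
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    # Hash the key and reduce it to a bucket in the range 0..99.
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + validation_pct:
        return "validation"
    return "evaluation"


# Example: count how 10,000 synthetic keys are distributed.
counts = {"train": 0, "validation": 0, "evaluation": 0}
for i in range(10_000):
    counts[assign_split(f"row-{i}")] += 1
print(counts)
```

Because the assignment depends only on the key, the proportions are approximate rather than exact, but the split is reproducible.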

You can use the DATA_SPLIT_METHOD option of the CREATE MODEL statement to tell BigQuery ML how you want to split the data. The default is AUTO_SPLIT, which is defined as follows:

When there are fewer than 500 rows in the input data, all rows are used as training data. When there are between 500 and 50,000 rows in the input data, 20% of the data is used as evaluation data in a RANDOM split. When there are more than 50,000 rows in the input data, only 10,000 of them are used as evaluation data in a RANDOM split.
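The AUTO_SPLIT rule quoted above can be sketched as a small Python function. This is only an illustration of the thresholds as stated; `auto_split_eval_rows` is a hypothetical name, and rounding the 20% share down is my assumption, since the actual random split may differ by a few rows.

```python
def auto_split_eval_rows(total_rows: int) -> int:
    """Approximate number of evaluation rows under the quoted AUTO_SPLIT rule."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if total_rows < 500:
        # Fewer than 500 rows: everything is used for training.
        return 0
    if total_rows <= 50_000:
        # Between 500 and 50,000 rows: 20% goes to evaluation
        # (integer division is an assumption about rounding).
        return total_rows // 5
    # More than 50,000 rows: evaluation is capped at 10,000 rows.
    return 10_000


for n in (499, 500, 50_000, 1_000_000):
    print(n, auto_split_eval_rows(n))
```

Note that the two upper branches agree at the boundary: 20% of 50,000 rows is exactly the 10,000-row cap.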

For more information, I would recommend reading the official documentation.



Source: https://stackoverflow.com/questions/58913361/spliting-dataset-for-training-and-evaluation-in-bigquery-ml
