How to perform linear regression in BigQuery?

北荒 2020-12-09 06:29

BigQuery has some statistical aggregation functions such as STDDEV(X) and CORR(X, Y), but it doesn't offer functions to directly perform linear regression.

How can

3 Answers
  •  死守一世寂寞
    2020-12-09 07:10

    Here is the code to create a linear regression model with BigQuery ML from the public natality (live births) dataset. The model is written to a dataset named demo_bq_ml, which must be created before running the statement below.

    %%bq query
    CREATE OR REPLACE MODEL demo_bq_ml.babyweight_model_asis
    OPTIONS
      (model_type='linear_reg', input_label_cols=['weight_pounds']) AS
    
    WITH natality_data AS (
      SELECT
        weight_pounds, -- this is the label; because it is continuous, we need to use regression
        CAST(is_male AS STRING) AS is_male,
        mother_age,
        CAST(plurality AS STRING) AS plurality,
        gestation_weeks,
        CAST(alcohol_use AS STRING) AS alcohol_use,
        CAST(year AS STRING) AS year,
        -- hash of year and month, used below for a repeatable train/eval split
        ABS(FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING)))) AS hashmonth
      FROM
        publicdata.samples.natality
      WHERE
        year > 2000
        AND gestation_weeks > 0
        AND mother_age > 0
        AND plurality > 0
        AND weight_pounds > 0
    )
    
    SELECT
        weight_pounds,
        is_male,
        mother_age,
        plurality,
        gestation_weeks,
        alcohol_use,
        year
    FROM
        natality_data
    WHERE
      MOD(hashmonth, 4) < 3  -- select 75% of the data as training
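
    The training query above keeps the rows whose hash bucket is 0, 1 or 2, which leaves bucket 3 (roughly 25% of the data) unused. As a sketch of a possible next step, not part of the statement above, the held-out rows could be passed to ML.EVALUATE to check the fit. This assumes the model demo_bq_ml.babyweight_model_asis has already been trained; the query repeats the same filters and casts so that the feature columns match the training data. In a notebook cell it would be prefixed with %%bq query like the statement above.

    ```sql
    -- Evaluate the model on the rows the training query skipped (hash bucket 3).
    SELECT
      *
    FROM
      ML.EVALUATE(
        MODEL demo_bq_ml.babyweight_model_asis,
        (
          SELECT
            weight_pounds,  -- label column, needed so predictions can be scored
            CAST(is_male AS STRING) AS is_male,
            mother_age,
            CAST(plurality AS STRING) AS plurality,
            gestation_weeks,
            CAST(alcohol_use AS STRING) AS alcohol_use,
            CAST(year AS STRING) AS year
          FROM
            publicdata.samples.natality
          WHERE
            year > 2000
            AND gestation_weeks > 0
            AND mother_age > 0
            AND plurality > 0
            AND weight_pounds > 0
            -- same hash as hashmonth in the training query, complementary bucket
            AND MOD(ABS(FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING)))), 4) = 3
        )
      )
    ```

    For a linear regression model the result should be a single row of regression metrics such as mean_absolute_error, mean_squared_error and r2_score.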
    
