Casting not working correctly in Amazon Athena (Presto)?

Submitted by 不羁的心 on 2019-12-31 05:15:06

Question


I have a doctor license registry dataset that includes the total_submitted_charge_amount for each doctor as well as the number of entitlements with Medicare and Medicaid. I used the query from the answer suggested below:

    WITH datamart AS (
        SELECT npi,
               provider_last_name,
               provider_first_name,
               provider_mid_initial,
               provider_address_1,
               provider_address_2,
               provider_city,
               provider_zipcode,
               provider_state_code,
               provider_country_code,
               provider_type,
               number_of_services,
               CASE
                   WHEN REPLACE(num_entitlement_medicare_medicaid, ',', '') = '' THEN NULL
                   ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
               END AS medicare_medicaid_entitlement,
               CASE
                   WHEN REPLACE(total_submitted_charge_amount, ',', '') = '' THEN NULL
                   ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
               END AS total_submitted_charge_amount
        FROM cmsaggregatepayment2017
    )
    SELECT *
    FROM datamart
    ORDER BY total_submitted_charge_amount DESC

Unfortunately, I get this error:

    INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)
    This query ran against the aggregatepayment_data_2017 database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5.

I have tried DECIMAL, REAL, and BIGINT, and nothing works for casting num_entitlement_medicare_medicaid. Below is a screenshot of what the data looks like:

[screenshot of the data]

Can someone please suggest how to rephrase this query?


Answer 1:


Instead of putting cast/replace in your queries, you could convert the data into a new table with 'clean' data:

CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
AS
SELECT
  npi,
  provider_last_name,
  provider_first_name,
  ...
  CASE WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN null
       ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
       END AS medicare_medicaid_entitlement,
  CASE WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN null
       ELSE CAST(REPLACE(total_submitted_charge_amount,',', '') AS DECIMAL)
       END AS total_submitted_charge_amount
  FROM cmsaggregatepayment2017

You can then SELECT ... FROM clean_table without having to do any conversions.
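
For example, the ordering from the question could be run directly against the new table. This is a minimal sketch, assuming the CTAS statement completed and produced the column names shown in it; NULLS LAST is spelled out only to make the placement of missing amounts explicit.

```sql
-- Assumes clean_table already exists and holds the converted DECIMAL columns.
SELECT npi,
       provider_last_name,
       total_submitted_charge_amount
FROM clean_table
ORDER BY total_submitted_charge_amount DESC NULLS LAST
```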

In data warehousing, this type of process is known as ETL (Extract, Transform, Load). The cleaning process is the 'transform' to convert the data into a more useful format.

See: CREATE TABLE AS - Amazon Athena




Answer 2:


The reason you are getting the error is that the column contains blank values (empty strings, not NULL), and an empty VARCHAR '' cannot be cast to DECIMAL. You can handle this with a CASE expression. Also, according to the dataset, the column num_entitlement_medicare_medicaid contains commas (','), which you are not replacing.

    SELECT npi,
        -- regexp_replace (not REPLACE) is needed for a regex pattern;
        -- it strips every character except digits and '.'
        CASE
            WHEN regexp_replace(num_entitlement_medicare_medicaid, '[^0-9.]', '') = '' THEN NULL
            ELSE CAST(regexp_replace(num_entitlement_medicare_medicaid, '[^0-9.]', '') AS DECIMAL)
        END AS medicare_medicaid_entitlement,
        CASE
            WHEN regexp_replace(total_submitted_charge_amount, '[^0-9.]', '') = '' THEN NULL
            ELSE CAST(regexp_replace(total_submitted_charge_amount, '[^0-9.]', '') AS DECIMAL)
        END AS total_submitted_charge_amount
    FROM cmsaggregatepayment2017
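
As a possibly shorter alternative to the CASE expressions, Presto's NULLIF and TRY_CAST can express the same cleanup. The sketch below assumes both functions are available in the Athena engine version in use. The inline VALUES rows and the DECIMAL(18, 2) precision are illustrative assumptions, not taken from the dataset.

```sql
-- Sample rows are made up for illustration; in practice, replace the
-- VALUES clause with the real table (cmsaggregatepayment2017).
SELECT
    raw_amount,
    -- REPLACE strips the thousands separators, NULLIF maps '' to NULL,
    -- and TRY_CAST returns NULL instead of failing on any other bad value.
    TRY_CAST(NULLIF(REPLACE(raw_amount, ',', ''), '') AS DECIMAL(18, 2)) AS amount
FROM (
    VALUES '1,234.50', '', '42'
) AS t (raw_amount)
```

Two things may be worth checking when adapting this. First, each cast should read the same column that its guard tests: in the question's query, the second CASE tests total_submitted_charge_amount but casts num_entitlement_medicare_medicaid, which could let an empty string reach the cast. Second, as the error message shows, a plain DECIMAL resolves to DECIMAL(38, 0), so an explicit scale may be preferable if the charge amounts carry fractional digits.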


Source: https://stackoverflow.com/questions/59132598/casting-not-working-correctly-in-amazon-athena-presto
