Question
I have a doctor license registry dataset that includes the total_submitted_charge_amount for each doctor as well as the number of entitlements with Medicare & Medicaid. I used the query from the answer suggested below:
with datamart AS
(SELECT npi,
provider_last_name,
provider_first_name,
provider_mid_initial,
provider_address_1,
provider_address_2,
provider_city,
provider_zipcode,
provider_state_code,
provider_country_code,
provider_type,
number_of_services,
CASE
WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE
WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017)
SELECT *
FROM datamart
ORDER BY total_submitted_charge_amount DESC
Unfortunately, I get this error:
INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)
This query ran against the aggregatepayment_data_2017 database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5
I have tried DECIMAL, REAL, and BIGINT, and nothing works for casting num_entitlement_medicare_medicaid. Below is a screenshot of what the data looks like:
Can someone please suggest how to rephrase this query?
Answer 1:
Instead of putting cast/replace in your queries, you could convert the data into a new table with 'clean' data:
CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
AS
SELECT
npi,
provider_last_name,
provider_first_name,
...
CASE WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN null
ELSE CAST(REPLACE(total_submitted_charge_amount,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
You can then run SELECT ... FROM clean_table without having to do any conversions.
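As a rough illustration, the ordering from the question could then be written against the cleaned table as below. This is only a sketch: it assumes clean_table was created by the CREATE TABLE AS statement above, and the LIMIT 10 is an arbitrary choice for the example.

```sql
-- Sketch: clean_table is assumed to exist from the CTAS statement above.
-- The numeric columns are already typed, so no CAST/REPLACE is needed here.
SELECT npi,
       provider_last_name,
       provider_first_name,
       total_submitted_charge_amount
FROM clean_table
ORDER BY total_submitted_charge_amount DESC
LIMIT 10  -- arbitrary cutoff for illustration
```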
In data warehousing, this type of process is known as ETL (Extract, Transform, Load). The cleaning process is the 'transform' to convert the data into a more useful format.
See: CREATE TABLE AS - Amazon Athena
Answer 2:
The reason you are getting the error is that the column contains blank values (empty strings, not NULL), and an empty VARCHAR '' cannot be cast to DECIMAL. You can handle this with a CASE expression. Also, according to the data set, the column num_entitlement_medicare_medicaid contains commas (','), which you are not replacing.
SELECT npi,
-- REGEXP_REPLACE (not REPLACE) is needed for a regex pattern;
-- the same pattern is used in the test and in the cast
case
when REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') AS DECIMAL)
end as medicare_medicaid_entitlement,
case
when REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') AS DECIMAL)
end as total_submitted_charge_amount
FROM cmsaggregatepayment2017
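A more compact alternative to the CASE expressions above is TRY_CAST, which in Presto/Athena returns NULL instead of raising an error when a value cannot be converted. The following is a sketch rather than a tested fix for this dataset. The column and table names come from the question. The DECIMAL(38, 2) scale is an assumption in case the charge amounts carry cents, since a bare DECIMAL resolves to DECIMAL(38, 0) as the error message shows.

```sql
-- Sketch: TRY_CAST yields NULL for '' (or any other uncastable string)
-- instead of failing with INVALID_CAST_ARGUMENT.
SELECT npi,
       TRY_CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL(38, 0))
           AS medicare_medicaid_entitlement,
       -- scale of 2 is an assumption; adjust to match the actual data
       TRY_CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL(38, 2))
           AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
ORDER BY total_submitted_charge_amount DESC NULLS LAST
```

One trade-off is that TRY_CAST also silently turns genuinely malformed values into NULL, so it may be worth counting the NULLs afterwards to check how many rows were affected.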
Source: https://stackoverflow.com/questions/59132598/casting-not-working-correctly-in-amazon-athena-presto