amazon-athena | 易学教程

Unable to locate hive jars to connect to metastore : while using pyspark job to connect to athena tables

Question: We use a SageMaker instance to connect to EMR in AWS. We have some PySpark scripts that unload Athena tables and process them as part of a pipeline. The Athena tables are accessed through the Glue catalog, but when we run the job via spark-submit, the job fails. Code snippet:

    from pyspark import SparkContext, SparkConf
    from pyspark.context import SparkContext
    from pyspark.sql import Row, SQLContext, SparkSession
    import pyspark.sql.dataframe

    def process_data():
        conf = SparkConf()

Presto (Athena) loading of a CSV file with quote-escaped commas

Question: Consider the following row in a CSV file:

    1,0,True,"{""foo"":null,""bar"":null}",0,1

The comma between null and ""bar"" is part of a column value. That is, the full text "{""foo"":null,""bar"":null}" is the value of a single column. However, AWS Athena interprets that comma as a column delimiter and incorrectly splits the text into multiple columns. I know I could change the column delimiter to something else to avoid this problem. My question is: is this a bug in AWS Athena / Presto? How
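
For comparison, the row in the question follows the common CSV quoting convention, in which a doubled quote inside a quoted field stands for one literal quote. The sketch below uses Python's standard csv module, which is not Athena's parser and is shown only to illustrate the split the asker expects: six columns, with the JSON text kept in a single field.

```python
import csv
import io

# The row from the question, exactly as it appears in the file.
line = '1,0,True,"{""foo"":null,""bar"":null}",0,1\n'

# csv.reader defaults: delimiter=',', quotechar='"', doublequote=True,
# so the comma inside the quoted field is not treated as a delimiter.
row = next(csv.reader(io.StringIO(line)))

print(len(row))  # number of columns parsed
print(row[3])    # the JSON text, with doubled quotes collapsed
```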

How S3 select pricing works? What is data returned and scanned in s3 select means

Question: I have 1M rows of CSV data. If I select 10 rows, will I be billed for 10 rows? What do "data returned" and "data scanned" mean in S3 Select? There is little documentation on these S3 Select terms.

Answer 1: To keep things simple, let's forget for a moment that S3 reads in a columnar way. Suppose you have the following data:

| City       | Last Updated Date   |
|------------|---------------------|
| London     | 1st Jan             |
| London     | 2nd Jan             |
| New Delhi  | 2nd Jan             |

A query for fetching the latest update date forces

Presto check if NULL and return default (NVL analog)

Question: Is there an analog of NVL in Presto? I need to check whether a field is NULL and return a default value. I currently solve it like this:

    SELECT CASE WHEN my_field IS NULL THEN 0 ELSE my_field END FROM my_table

But I'm curious whether something could simplify this code.

Answer 1: The ISO SQL function for that is COALESCE:

    coalesce(my_field, 0)

https://prestodb.io/docs/current/functions/conditional.html

P.S. COALESCE can be used with multiple arguments. It will return the first (from the left)
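
To see that the CASE expression and COALESCE give the same result, here is a minimal sketch that runs both queries against an in-memory SQLite database. SQLite is used only as a convenient stand-in that also implements COALESCE; it is not Presto, and the sample values in my_table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_field INTEGER)")
# Hypothetical sample data: one NULL between two non-NULL values.
conn.executemany("INSERT INTO my_table VALUES (?)", [(5,), (None,), (7,)])

# The verbose form from the question.
case_rows = conn.execute(
    "SELECT CASE WHEN my_field IS NULL THEN 0 ELSE my_field END "
    "FROM my_table ORDER BY rowid"
).fetchall()

# The shorter COALESCE form from the answer.
coalesce_rows = conn.execute(
    "SELECT COALESCE(my_field, 0) FROM my_table ORDER BY rowid"
).fetchall()

print(case_rows == coalesce_rows)
print(coalesce_rows)
conn.close()
```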

AWS Athena (Presto) how to transpose map to columns

Question: I have a nested map in my rows, and I would like to transpose its keys to columns. I could name the columns explicitly, like items['label_a'], but in this case the keys are actually dynamic. From these rows:

    {id=1, items={label_a=foo, label_b=foo}}
    {id=2, items={label_a=bar, label_c=bar}}
    {id=3, items={label_b=baz, label_c=baz}}

I would like to get a table like so:

| id | label_a | label_b | label_c |
------------------------------------
| 1  | foo     | foo     |         |
|
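
The desired reshaping can be sketched client-side in Python. This is not an Athena query and not the answer from the original thread; it only illustrates the target result when the map keys are not known in advance. The function name pivot_items is hypothetical, and the rows are transcribed from the question as Python dictionaries.

```python
# Rows from the question, written as Python dictionaries.
rows = [
    {"id": 1, "items": {"label_a": "foo", "label_b": "foo"}},
    {"id": 2, "items": {"label_a": "bar", "label_c": "bar"}},
    {"id": 3, "items": {"label_b": "baz", "label_c": "baz"}},
]


def pivot_items(rows):
    """Turn each distinct key of the 'items' map into its own column."""
    # Discover the dynamic column names by scanning every row.
    columns = sorted({key for row in rows for key in row["items"]})
    # Missing keys become None, i.e. an empty cell in the wide table.
    table = [
        {"id": row["id"], **{col: row["items"].get(col) for col in columns}}
        for row in rows
    ]
    return columns, table


columns, table = pivot_items(rows)
for record in table:
    print(record)
```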

Querying optional nested JSON fields in Athena

Question: I have JSON data that looks something like:

    { "col1" : 123, "metadata" : { "opt1" : 456, "opt2" : 789 } }

where the various metadata fields (of which there are many) are optional and may or may not be present. My query is:

    select col1, metadata.opt1 from "db-name".tablename

If opt1 is not present in any rows, I would expect this to return all rows with a blank for the opt1 column, but if there wasn't a row with opt1 in metadata when the crawler ran (and might still not be present in data
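
The behaviour the asker expects (every row returned, with a blank where opt1 is absent) can be illustrated with a small Python sketch. It says nothing about how Athena or the Glue crawler resolve the schema; the second and third records and the function name select_col1_opt1 are hypothetical additions for illustration.

```python
import json

# The first record is from the question; the other two are made-up
# examples of rows where opt1 or the whole metadata object is missing.
records = [
    '{"col1": 123, "metadata": {"opt1": 456, "opt2": 789}}',
    '{"col1": 124, "metadata": {"opt2": 790}}',
    '{"col1": 125}',
]


def select_col1_opt1(lines):
    """Return (col1, metadata.opt1) per record, using None for a missing opt1."""
    result = []
    for line in lines:
        doc = json.loads(line)
        metadata = doc.get("metadata") or {}
        result.append((doc.get("col1"), metadata.get("opt1")))
    return result


selected = select_col1_opt1(records)
print(selected)
```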

athena presto - multiple columns from long to wide

Question: I am new to Athena and am trying to understand how to turn multiple columns from long to wide format. It seems Presto is what is needed, but I have only been able to apply map_agg successfully to one variable. I think my final outcome below can be achieved with multimap_agg, but I cannot quite get it to work. Below I walk through my steps and data. If you have suggestions or questions, please let me know! First, the data starts like this:

id | letter | number | value
------------------

Create Table in Athena From Nested JSON

Question: I have nested JSON of the form [{ "emails": [{ "label": "", "primary": "", "relationdef_id": "", "type": "", "value": "" }], "licenses": [{ "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": "" }, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": "" }] }, { "emails": [{ "label": "", "primary": "", "relationdef_id": "", "type": "", "value": "" }], "licenses": [{
