amazon-athena

Partitioning Athena Tables from Glue CloudFormation template

风格不统一 submitted on 2021-02-19 06:28:26
Question: Using AWS::Glue::Table, you can set up an Athena table as shown here. Athena supports partitioning data based on the folder structure in S3. I would like to partition my Athena table from my Glue template. From the AWS Glue Table TableInput, it appears that I can use PartitionKeys to partition my data, but when I try to use the template below, Athena fails and can't get any data.

Resources:
  ...
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref MyGlueDatabase
      CatalogId: !Ref AWS:
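For reference, a minimal boto3 sketch of the same idea, showing where PartitionKeys sits in the TableInput structure that the CloudFormation resource mirrors. All names, column types and the S3 location below are assumptions. Note also that declaring PartitionKeys only defines the partition columns; the individual partitions still have to be registered (for example with MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION) before Athena returns any data.

import boto3

glue = boto3.client("glue")

# Hypothetical database, table, columns and S3 location, for illustration only.
glue.create_table(
    DatabaseName="my_glue_database",
    TableInput={
        "Name": "my_glue_table",
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are declared here, outside StorageDescriptor.Columns.
        "PartitionKeys": [{"Name": "dt", "Type": "string"}],
        "StorageDescriptor": {
            "Columns": [
                {"Name": "id", "Type": "string"},
                {"Name": "value", "Type": "double"},
            ],
            "Location": "s3://my-bucket/my-data/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    },
)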

Athena can only see the first JSON record written to Firehose by Kinesis Analytics

为君一笑 submitted on 2021-02-19 03:53:25
Question: I am using Kinesis Analytics to read JSON from Kinesis Firehose. I am successfully filtering out some of the records and writing a subset of the JSON properties to another Firehose. I wanted to execute an Athena query on the data being written to S3 via the destination Firehose. However, the JSON records written to the files in S3 do not have any newlines. Consequently, when I query the data using Athena, it only returns the first record in each file. When I write records to the source
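The question is cut off above. For context, Athena's JSON SerDes read one JSON object per line, so records concatenated without newlines collapse into a single row. One common workaround, not from the original post and assuming a Firehose data-transformation Lambda can be attached to the destination delivery stream, is to append a newline to every record before delivery. A minimal sketch:

import base64

def lambda_handler(event, context):
    """Firehose data-transformation Lambda: append a newline to every record
    so objects land in S3 as newline-delimited JSON that Athena can read."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload + b"\n").decode("utf-8"),
        })
    return {"records": output}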

How can I get result format JSON from Athena in AWS?

白昼怎懂夜的黑 submitted on 2021-02-19 03:18:58
Question: I want to get the result in JSON format from Athena in AWS. When I select from Athena, the result looks like this: {test.value={report_1=test, report_2=normal, report_3=hard}}. Is there any way to get a JSON-formatted result without replacing "=" with ":"? The column type is map<string,map<string,string>>.

Answer 1:

select mycol from mytable ;
+--------------------------------------------------------------+
| mycol                                                         |
+--------------------------------------------------------------+
| {test.value=
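The answer above is truncated. As a point of reference, Presto (and therefore Athena) can render a map column as a real JSON string by casting it to JSON in the query itself, which avoids the "=" map syntax entirely. A sketch run through boto3, where the database and result location are assumptions:

import boto3

athena = boto3.client("athena")

# CAST(... AS JSON) converts the map, and json_format() renders it as a JSON string.
query = "SELECT json_format(CAST(mycol AS JSON)) AS mycol_json FROM mytable"

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},  # assumed
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # assumed
)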

Athena puts data in incorrect columns when input data format changes

这一生的挚爱 submitted on 2021-02-17 05:38:30
Question: We have some pipe-delimited .txt reports coming into a folder in S3, on which we run a Glue crawler to determine the schema and query in Athena. The format of the report changed recently, so there are two new columns in the middle.

Old files:
Columns: A  B  C  D  E  F
Data:    a1 b1 c1 d1 e1 f1

New files with extra "G" and "H" columns:
Columns: A  B  G  H  C  D  E  F
Data:    a2 b2 g2 h2 c2 d2 e2 f2

What we get in the table created by the crawler, as seen in Athena:
Columns: A B C D E F G H  <- Puts new columns at the
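The question is cut off above. For context, delimited text tables are read strictly by column position, so a single table definition cannot serve both layouts. One common workaround, not from the original post and assuming the two layouts can be kept under separate S3 prefixes with one table each (called reports_old and reports_new here), is to align them in a view:

import boto3

athena = boto3.client("athena")

# reports_old / reports_new are assumed tables, each pointing at the prefix
# that holds only one of the two file layouts.
query = """
CREATE OR REPLACE VIEW reports AS
SELECT a, b, CAST(NULL AS varchar) AS g, CAST(NULL AS varchar) AS h, c, d, e, f
FROM reports_old
UNION ALL
SELECT a, b, g, h, c, d, e, f
FROM reports_new
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},  # assumed
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # assumed
)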

How to unpivot COLUMNS into ROWS in an AWS Glue / PySpark script

心已入冬 submitted on 2021-02-11 15:05:00
Question: I have a large nested JSON document for each year (say 2018, 2017), which has aggregated data for each month (Jan-Dec) and each day (1-31).

{
  "2018": {
    "Jan": {
      "1": { "u": 1, "n": 2 },
      "2": { "u": 4, "n": 7 }
    },
    "Feb": {
      "1": { "u": 3, "n": 2 },
      "4": { "u": 4, "n": 5 }
    }
  }
}

I have used the AWS Glue Relationalize.apply function to convert the above hierarchical data into a flat structure:

dfc = Relationalize.apply(frame = datasource0, staging_path = my_temp_bucket, name = my_ref_relationalize_table,
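The Relationalize call above is cut off. Separately, a common way to turn metric columns into rows in PySpark is the stack() SQL expression. A minimal sketch over a made-up flattened frame, since the question's actual flattened schema is not shown:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical flattened frame: one row per day, one column per metric.
df = spark.createDataFrame(
    [("2018", "Jan", "1", 1, 2), ("2018", "Jan", "2", 4, 7)],
    ["year", "month", "day", "u", "n"],
)

# stack(N, label1, col1, label2, col2, ...) turns the metric columns into rows.
unpivoted = df.selectExpr(
    "year", "month", "day",
    "stack(2, 'u', u, 'n', n) as (metric, value)",
)
unpivoted.show()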

Calculate Median for each group in AWS Athena table

强颜欢笑 submitted on 2021-02-11 15:03:33
Question: Below is the schema for the Athena table. I wish to calculate the median of 'parameter_value' grouped by standard_lab_parameter_name & units. For this I followed this link: https://docs.aws.amazon.com/redshift/latest/dg/r_MEDIAN.html. But on running the query

select median(parameter_value) from table_name group by standard_lab_parameter_name, units

it throws the error SYNTAX_ERROR: line 1:8: Function median not registered. Any help? Or an alternative query would be great.

Answer 1: Athena is based on Presto 0
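The answer is cut off above. Presto, which Athena is built on, has no median() function, but approx_percentile(x, 0.5) gives an approximate median per group. A sketch run through boto3, with the database and output location as assumptions:

import boto3

athena = boto3.client("athena")

# approx_percentile at the 0.5 quantile stands in for the missing median().
query = """
SELECT standard_lab_parameter_name,
       units,
       approx_percentile(parameter_value, 0.5) AS median_parameter_value
FROM table_name
GROUP BY standard_lab_parameter_name, units
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},  # assumed
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # assumed
)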

Presto SQL / Athena: select between times across different days

让人想犯罪 __ submitted on 2021-02-11 14:21:37
Question: I have a database that contains a series of events and their timestamps. I need to select all events that happen between 11:00 and 11:10 and between 21:00 and 21:05, for all days. What I would do is extract the hour and the minute from the timestamp, and:

SELECT * WHERE (hour = 11 AND minute <= 10) OR (hour = 21 AND minute <= 05)

However, I was wondering if there is a simpler / less verbose way to do this, such as when you query between dates:

SELECT * WHERE date BETWEEN '2020-07-01'
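The example is cut off above. One less verbose option, offered as a sketch rather than the accepted answer, is to cast the timestamp to a TIME so BETWEEN can compare the time of day directly. Table and column names are assumptions:

import boto3

athena = boto3.client("athena")

# Casting the timestamp to TIME compares the time of day, independent of the date.
query = """
SELECT *
FROM events  -- assumed table and column names
WHERE CAST(event_ts AS time) BETWEEN TIME '11:00:00' AND TIME '11:10:00'
   OR CAST(event_ts AS time) BETWEEN TIME '21:00:00' AND TIME '21:05:00'
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},  # assumed
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # assumed
)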

Can you use Athena ODBC/JDBC to return the S3 location of results?

▼魔方 西西 submitted on 2021-02-11 14:06:01
Question: I've been using the metis package to run Athena queries via R. While this is great for small queries, there still does not seem to be a viable solution for queries with very large return datasets (tens of thousands of rows, for example). However, when running these same queries in the AWS console, it is fast and straightforward to use the download link to obtain the CSV file of the query result. This got me thinking: is there a mechanism for sending the query via R but returning/obtaining the S3:
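The question is cut off above and it asks about R, but the underlying mechanism is the Athena API itself, which records the S3 location of the result CSV for every query. A Python/boto3 sketch of that flow, with the query, database and output location as assumptions:

import time
import boto3

athena = boto3.client("athena")

# Start the query, wait for it to finish, then read back the S3 location
# of the CSV that Athena wrote for the result set.
response = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 100000",  # assumed query
    QueryExecutionContext={"Database": "my_database"},  # assumed
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},  # assumed
)
query_id = response["QueryExecutionId"]

while True:
    execution = athena.get_query_execution(QueryExecutionId=query_id)
    state = execution["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# The CSV with the full result set lives at this S3 key; downloading it directly
# avoids paging tens of thousands of rows through the ODBC/JDBC driver.
print(execution["QueryExecution"]["ResultConfiguration"]["OutputLocation"])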