amazon-athena

Athena: $path vs. partition

不想你离开。 Submitted on 2021-01-28 10:55:47

Question: I'm storing daily reports per client for querying with Athena. At first I thought I'd use a client=c_1/month=12/day=01/ or client=c2/date=2020-12-01/ folder structure and run MSCK REPAIR TABLE daily to make the new day's partition available for query. Then I realized there's the $path special column, so if I store files as 2020-12-01.csv I could run a query with WHERE "$path" LIKE '%12-01%', thus saving a partition and the need to detect/add it daily. I can see this having an impact on performance if …
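The tradeoff can be sketched with two hypothetical tables (reports partitioned, reports_flat not); note that Athena requires the $path pseudo-column to be double-quoted in queries:

```sql
-- Partitioned approach: register each day's partition, then prune by it.
ALTER TABLE reports ADD IF NOT EXISTS
  PARTITION (client = 'c2', dt = '2020-12-01')
  LOCATION 's3://my-bucket/reports/client=c2/date=2020-12-01/';

SELECT * FROM reports
WHERE client = 'c2' AND dt = '2020-12-01';

-- "$path" approach: no partitions to maintain, but Athena must still
-- enumerate (and generally read) every object under the table LOCATION
-- before the path filter applies, which is the performance concern.
SELECT * FROM reports_flat
WHERE "$path" LIKE '%2020-12-01%';
```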

How to rename AWS Athena columns with parquet file source?

|▌冷眼眸甩不掉的悲伤 Submitted on 2021-01-28 09:28:59

Question: I have data loaded in my S3 bucket folder as multiple Parquet files. After loading them into Athena I can query the data successfully. What are the ways to rename the Athena table columns for a Parquet file source and still be able to see the data under the renamed column after querying? Note: I checked the edit-schema option; the column gets renamed, but after querying you will not see data under that column.

Answer 1: There is, as far as I know, no way to create a table with different names for the …
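The usual explanation is that Athena's Parquet SerDe resolves columns by name by default, so renaming a column in the catalog breaks the mapping to the field in the file. Two workaround sketches, with hypothetical table and column names:

```sql
-- Workaround 1: leave the table untouched and rename through a view.
CREATE OR REPLACE VIEW my_data_renamed AS
SELECT old_name AS new_name,
       other_col
FROM my_data;

-- Workaround 2: recreate the table resolving Parquet columns by
-- position rather than by name, so catalog names are free to differ.
CREATE EXTERNAL TABLE my_data_by_index (
  new_name  string,
  other_col int
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-data/'
TBLPROPERTIES ('parquet.column.index.access' = 'true');
```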

Create athena table with column as unstructured JSON from S3

萝らか妹 Submitted on 2021-01-28 07:41:21

Question: I am currently creating an Athena table as follows:

CREATE EXTERNAL TABLE `foo_streaming`(
  `type` string,
  `message` struct<a:string,b:string,c:string>)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://foo/data'

However, instead of treating the message struct as structured data, I would like to read it …
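One way to get at the raw JSON text is to declare the field as a plain string and parse it per query. This is a sketch only: it assumes the OpenX JSON SerDe (a common swap for this use case), and whether a given SerDe hands back an object-typed field as its raw JSON string should be verified against your data:

```sql
CREATE EXTERNAL TABLE foo_streaming_raw (
  `type`    string,
  `message` string)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://foo/data';

-- Pull individual fields out of the unstructured string at query time:
SELECT json_extract_scalar(message, '$.a') AS a
FROM foo_streaming_raw;
```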

Aws Athena - Rename column name

纵然是瞬间 Submitted on 2021-01-28 04:20:37

Question: I am trying to change a column name in an AWS Athena table, from old_name to new_name. Normal DDL commands do not affect the table (they cannot be executed). Is it possible to change a column name without deleting and re-creating the table from scratch?

Answer 1: I was mistaken; Athena uses Hive DDL syntax, so the correct command is: ALTER TABLE %%table-name%% CHANGE %%old-column-name%% %%new-column-name%% <type>; I based my answer on a Hive-related question.

Answer 2: You can find more about …
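Filling in Answer 1's placeholders with hypothetical names gives the statement below. Treat it as a sketch: whether a given Athena engine version accepts Hive's CHANGE COLUMN is worth checking, and ALTER TABLE ... REPLACE COLUMNS is the alternative Athena documents:

```sql
ALTER TABLE my_table
  CHANGE COLUMN old_name new_name string;
```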

Cross-account access to AWS Glue Data Catalog via Athena

假装没事ソ Submitted on 2021-01-23 02:01:10

Question: Is it possible to directly access the AWS Glue Data Catalog of Account B via the Athena interface of Account A?

Answer 1: I was just trying to resolve this same issue in my own setup, but then stumbled across this bummer (the last bullet under Cross-Account Access Limitations on this page): "Cross-account access to the Data Catalog is not supported when using an AWS Glue crawler, Amazon Athena, or Amazon Redshift." So it sounds like even with the cross-account access that is possible today, they won't …

Athena DDL for Ion format?

五迷三道 Submitted on 2021-01-05 07:22:46

Question: I'm trying to use Athena to query some files in Ion format produced by the recently added Export to S3 feature of DynamoDB backups. This is a blatantly stupid format which is basically the string $ion_1_0 followed by JSON. The unquoted $ion_1_0 string at the front makes the data invalid JSON. I tried using the Ion SerDe from here:

CREATE EXTERNAL TABLE mydb.mytable (
  `myfields` string,
  ...
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
LOCATION 's3:/.../dynamodb-export …
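For reference, the Ion Hive SerDe ships its own input/output format classes, and I believe the DDL needs them declared alongside the SerDe class (bucket path elided as in the question):

```sql
CREATE EXTERNAL TABLE mydb.mytable (
  `myfields` string
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS
  INPUTFORMAT  'com.amazon.ionhiveserde.formats.IonInputFormat'
  OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION 's3://.../dynamodb-export';
```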

Specify a SerDe serialization lib with AWS Glue Crawler

守給你的承諾、 Submitted on 2021-01-02 20:09:22

Question: Every time I run a Glue crawler on existing data, it changes the SerDe serialization lib to LazySimpleSerDe, which doesn't classify correctly (e.g. for quoted fields containing commas). I then need to manually edit the table details in the Glue Catalog to change it to org.apache.hadoop.hive.serde2.OpenCSVSerde. I've tried making my own CSV classifier, but that doesn't help. How do I get the crawler to specify a particular serialization lib for the tables it produces or updates?

Answer 1: You can't …
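Since the crawler can't be told which SerDe to emit, a common workaround is to define the table by hand with the OpenCSVSerDe and set the crawler's schema-change policy to leave existing table definitions alone. A minimal sketch with hypothetical names:

```sql
CREATE EXTERNAL TABLE my_csv (
  id    string,
  notes string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
LOCATION 's3://my-bucket/csv-data/'
TBLPROPERTIES ('skip.header.line.count' = '1');
```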

Athena/Presto - UNNEST MAP to columns

狂风中的少年 Submitted on 2020-12-26 12:26:28

Question: Assume I have a table like this,

table: qa_list

 id | question_id | question  | answer
----+-------------+-----------+--------
  1 | 100         | question1 | answer
  2 | 101         | question2 | answer
  3 | 102         | question3 | answer
  4 | ...         | ...       | ...

and a query that gives the result below (since I couldn't find a direct way to transpose the table),

table: qa_map

 id | qa_map
----+--------
  1 | {question1=answer, question2=answer, question3=answer, ...}

where qa_map is the result of a map_agg of …
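Two Presto/Athena idioms for getting map entries back out as columns or rows, assuming the qa_map table sketched above:

```sql
-- Known keys: project each one out with element_at (or qa_map['key']).
SELECT id,
       element_at(qa_map, 'question1') AS question1,
       element_at(qa_map, 'question2') AS question2
FROM qa_map;

-- Arbitrary keys: explode the map back into key/value rows.
SELECT id, kv.question, kv.answer
FROM qa_map
CROSS JOIN UNNEST(qa_map) AS kv(question, answer);
```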
