amazon-athena

Athena: $path vs. partition

不想你离开。 Submitted on 2021-01-28 10:55:47

Question: I'm storing daily reports per client for querying with Athena. At first I thought I'd use a client=c_1/month=12/day=01/ or client=c2/date=2020-12-01/ folder structure and run MSCK REPAIR TABLE daily to make the new day's partition available for query. Then I realized there's the $path special column, so if I store files as 2020-12-01.csv I could run a query with WHERE "$path" LIKE '%12-01%', thus saving a partition and the need to detect/add it daily. I can see this having an impact on performance if …
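The tradeoff can be sketched with two hypothetical tables (reports partitioned, reports_flat not); note that Athena requires the $path pseudo-column to be double-quoted in queries:

```sql
-- Partitioned approach: register each day's partition, then prune by it.
ALTER TABLE reports ADD IF NOT EXISTS
  PARTITION (client = 'c2', dt = '2020-12-01')
  LOCATION 's3://my-bucket/reports/client=c2/date=2020-12-01/';

SELECT * FROM reports
WHERE client = 'c2' AND dt = '2020-12-01';

-- "$path" approach: no partitions to maintain, but Athena must still
-- enumerate (and generally read) every object under the table LOCATION
-- before the path filter applies, which is the performance concern.
SELECT * FROM reports_flat
WHERE "$path" LIKE '%2020-12-01%';
```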

How to rename AWS Athena columns with parquet file source?

|▌冷眼眸甩不掉的悲伤 Submitted on 2021-01-28 09:28:59

Question: I have data loaded in my S3 bucket folder as multiple Parquet files. After loading them into Athena I can query the data successfully. What are the ways to rename the Athena table columns for a Parquet file source and still be able to see the data under the renamed column after querying? Note: I checked the edit-schema option; the column gets renamed, but after querying you will not see data under that column.

Answer 1: There is, as far as I know, no way to create a table with different names for the …
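The usual explanation is that Athena's Parquet SerDe resolves columns by name by default, so renaming a column in the catalog breaks the mapping to the field in the file. Two workaround sketches, with hypothetical table and column names:

```sql
-- Workaround 1: leave the table untouched and rename through a view.
CREATE OR REPLACE VIEW my_data_renamed AS
SELECT old_name AS new_name,
       other_col
FROM my_data;

-- Workaround 2: recreate the table resolving Parquet columns by
-- position rather than by name, so catalog names are free to differ.
CREATE EXTERNAL TABLE my_data_by_index (
  new_name  string,
  other_col int
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-data/'
TBLPROPERTIES ('parquet.column.index.access' = 'true');
```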

Create athena table with column as unstructured JSON from S3

萝らか妹 Submitted on 2021-01-28 07:41:21

Question: I am currently creating an Athena table as follows:

CREATE EXTERNAL TABLE `foo_streaming`(
  `type` string,
  `message` struct<a:string,b:string,c:string>)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://foo/data'

However, instead of treating the message struct as structured data, I would like to read it …
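One way to get at the raw JSON text is to declare the field as a plain string and parse it per query. This is a sketch only: it assumes the OpenX JSON SerDe (a common swap for this use case), and whether a given SerDe hands back an object-typed field as its raw JSON string should be verified against your data:

```sql
CREATE EXTERNAL TABLE foo_streaming_raw (
  `type`    string,
  `message` string)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://foo/data';

-- Pull individual fields out of the unstructured string at query time:
SELECT json_extract_scalar(message, '$.a') AS a
FROM foo_streaming_raw;
```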

Aws Athena - Rename column name

纵然是瞬间 Submitted on 2021-01-28 04:20:37

Question: I am trying to change a column name in an AWS Athena table, from old_name to new_name. Normal DDL commands do not affect the table (they cannot be executed). Is it possible to change a column name without deleting and re-creating the table from scratch?

Answer 1: I was mistaken; Athena uses Hive DDL syntax, so the correct command is: ALTER TABLE %%table-name%% CHANGE %%old-column-name%% %%new-column-name%% <type>; I based my answer on a Hive-related question.

Answer 2: You can find more about …
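Filling in Answer 1's placeholders with hypothetical names gives the statement below. Treat it as a sketch: whether a given Athena engine version accepts Hive's CHANGE COLUMN is worth checking, and ALTER TABLE ... REPLACE COLUMNS is the alternative Athena documents:

```sql
ALTER TABLE my_table
  CHANGE COLUMN old_name new_name string;
```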

Cross-account access to AWS Glue Data Catalog via Athena

假装没事ソ Submitted on 2021-01-23 02:01:10

Question: Is it possible to directly access the AWS Glue Data Catalog of Account B via the Athena interface of Account A?

Answer 1: I was just trying to resolve this same issue in my own setup, but then stumbled across this bummer (the last bullet under Cross-Account Access Limitations on this page): "Cross-account access to the Data Catalog is not supported when using an AWS Glue crawler, Amazon Athena, or Amazon Redshift." So it sounds like even with the cross-account access that is possible today, they won't …

Athena DDL for Ion format?

五迷三道 Submitted on 2021-01-05 07:22:46

Question: I'm trying to use Athena to query some files in Ion format produced by the recently added Export to S3 feature of DynamoDB backups. This is a blatantly stupid format which is basically the string $ion_1_0 followed by JSON. The unquoted $ion_1_0 string at the front makes the data invalid JSON. I tried using the Ion SerDe from here:

CREATE EXTERNAL TABLE mydb.mytable (
  `myfields` string,
  ...
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
LOCATION 's3:/.../dynamodb-export …
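For reference, the Ion Hive SerDe ships its own input/output format classes, and I believe the DDL needs them declared alongside the SerDe class (bucket path elided as in the question):

```sql
CREATE EXTERNAL TABLE mydb.mytable (
  `myfields` string
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS
  INPUTFORMAT  'com.amazon.ionhiveserde.formats.IonInputFormat'
  OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION 's3://.../dynamodb-export';
```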

Specify a SerDe serialization lib with AWS Glue Crawler

守給你的承諾、 Submitted on 2021-01-02 20:09:22

Question: Every time I run a Glue crawler on existing data, it changes the SerDe serialization lib to LazySimpleSerDe, which doesn't classify correctly (e.g. for quoted fields containing commas). I then need to manually edit the table details in the Glue Catalog to change it to org.apache.hadoop.hive.serde2.OpenCSVSerde. I've tried making my own CSV classifier, but that doesn't help. How do I get the crawler to specify a particular serialization lib for the tables it produces or updates?

Answer 1: You can't …
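Since the crawler can't be told which SerDe to emit, a common workaround is to define the table by hand with the OpenCSVSerDe and set the crawler's schema-change policy to leave existing table definitions alone. A minimal sketch with hypothetical names:

```sql
CREATE EXTERNAL TABLE my_csv (
  id    string,
  notes string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
LOCATION 's3://my-bucket/csv-data/'
TBLPROPERTIES ('skip.header.line.count' = '1');
```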

Athena/Presto - UNNEST MAP to columns

狂风中的少年 Submitted on 2020-12-26 12:26:28

Question: Assume I have a table like this,

table: qa_list

 id | question_id | question  | answer
----+-------------+-----------+--------
  1 | 100         | question1 | answer
  2 | 101         | question2 | answer
  3 | 102         | question3 | answer
  4 | ...         | ...       | ...

and a query that gives the result below (since I couldn't find a direct way to transpose the table),

table: qa_map

 id | qa_map
----+--------
  1 | {question1=answer, question2=answer, question3=answer, ...}

where qa_map is the result of a map_agg of …
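Two Presto/Athena idioms for getting map entries back out as columns or rows, assuming the qa_map table sketched above:

```sql
-- Known keys: project each one out with element_at (or qa_map['key']).
SELECT id,
       element_at(qa_map, 'question1') AS question1,
       element_at(qa_map, 'question2') AS question2
FROM qa_map;

-- Arbitrary keys: explode the map back into key/value rows.
SELECT id, kv.question, kv.answer
FROM qa_map
CROSS JOIN UNNEST(qa_map) AS kv(question, answer);
```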
