amazon-athena

Create Table in Athena From Nested JSON

Submitted by ≯℡__Kan透↙ on 2021-01-29 22:23:02

Question: I have nested JSON of the form [{ "emails": [{ "label": "", "primary": "", "relationdef_id": "", "type": "", "value": "" }], "licenses": [{ "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": "" }, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": "" }] }, { "emails": [{ "label": "", "primary": "", "relationdef_id": "", "type": "", "value": "" }], "licenses": [{
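
A hedged sketch of one way to model this shape in Athena, assuming each top-level object is stored as a single line of JSON in S3 (the OpenX JSON SerDe reads one object per line). The table name and S3 location are placeholders, and struct field names that collide with reserved words (e.g. primary) may need backquoting or renaming:

```sql
-- Sketch: map each JSON key to a column; the nested arrays of objects
-- become array<struct<...>> columns.
CREATE EXTERNAL TABLE nested_json_example (
  emails ARRAY<STRUCT<
    label: STRING,
    `primary`: STRING,
    relationdef_id: STRING,
    type: STRING,
    value: STRING>>,
  licenses ARRAY<STRUCT<
    allocated: STRING,
    parent_type: STRING,
    parentid: STRING,
    product_type: STRING,
    purchased_license_id: STRING,
    service_type: STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/nested-json/';

-- Flatten the arrays at query time with CROSS JOIN UNNEST:
SELECT e.value, l.product_type
FROM nested_json_example
CROSS JOIN UNNEST(emails) AS t(e)
CROSS JOIN UNNEST(licenses) AS u(l);
```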

AWS Athena row cast fails when key is a reserved keyword despite double quotes

Submitted by 社会主义新天地 on 2021-01-29 18:55:57

Question: I'm working with data in AWS Athena and trying to match the structure of some input data, which involves a nested structure where "from" is a key. This consistently throws errors. I've narrowed the issue down to the fact that Athena queries fail when you use a reserved keyword as a field name in a row. The following examples demonstrate this behavior. This simple case, SELECT CAST(ROW(1) AS ROW("from" INTEGER)), fails with the following error: GENERIC_INTERNAL_ERROR: Unable to create
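
The failure above reflects a limitation of Athena's Presto-based engine: a CAST to a ROW type can reject reserved words as field names even when they are double-quoted. A hedged sketch of two workarounds (the field and key names here are illustrative, not from the truncated answer):

```sql
-- Works: use a non-reserved field name inside the ROW...
SELECT CAST(ROW(1) AS ROW(from_addr INTEGER)) AS r;

-- ...or, if the literal key "from" must survive (e.g. when the row is
-- later serialized to JSON), use a MAP: map keys are plain strings,
-- not identifiers, so reserved words are fine.
SELECT MAP(ARRAY['from'], ARRAY[1]) AS m;
```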

Special characters in AWS Athena show up as question marks

Submitted by 我的梦境 on 2021-01-29 11:08:42

Question: I've added a table in AWS Athena from a CSV file that uses the special characters "æøå". These show up as � in the output. The CSV file is encoded as Unicode; I've also tried changing the encoding to UTF-8, with no luck. I uploaded the CSV to S3 and then added the table to Athena using the following DDL: CREATE EXTERNAL TABLE `regions_dk`( `postnummer` string COMMENT 'from deserializer', `kommuner` string COMMENT 'from deserializer', `regioner` string COMMENT 'from deserializer') ROW
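
A likely cause: a file saved as "Unicode" (typically UTF-16, e.g. from Excel) is read by Athena's SerDes as if it were UTF-8, producing �. A hedged sketch of the table-side fix: re-encode the file to UTF-8 before upload, or declare the file's real encoding via the serialization.encoding serde property, which LazySimpleSerDe honors (OpenCSVSerde has no such knob). Names and the encoding value below are placeholders for this situation:

```sql
CREATE EXTERNAL TABLE regions_dk (
  postnummer STRING,
  kommuner STRING,
  regioner STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  -- tell the SerDe the file's actual encoding instead of assuming UTF-8
  'serialization.encoding' = 'windows-1252'
)
LOCATION 's3://my-bucket/regions/';
```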

Athena displays special characters as ?

Submitted by 喜欢而已 on 2021-01-29 10:30:51

Question: I have an external table with the below DDL: CREATE EXTERNAL TABLE `table_1`( `name` string COMMENT 'from deserializer', `desc1` string COMMENT 'from deserializer', `desc2` string COMMENT 'from deserializer', ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'quoteChar'='\"', 'separatorChar'='|', 'skip.header.line.count'='1') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

AWS Athena: Named boto3 queries not creating corresponding tables

Submitted by 拈花ヽ惹草 on 2021-01-29 10:24:06

Question: I have the following draft boto3 script: #!/usr/bin/env python3 import boto3 client = boto3.client('athena') BUCKETS='buckets.txt' DATABASE='some_db' QUERY_STR="""CREATE EXTERNAL TABLE IF NOT EXISTS some_db.{}( BucketOwner STRING, Bucket STRING, RequestDateTime STRING, RemoteIP STRING, Requester STRING, RequestID STRING, Operation STRING, Key STRING, RequestURI_operation STRING, RequestURI_key STRING, RequestURI_httpProtoversion STRING, HTTPstatus STRING, ErrorCode STRING, BytesSent BIGINT,
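
A hedged sketch of the usual cause of this symptom: boto3's create_named_query only saves a query in Athena's saved-queries list and never runs it, so no table appears; each DDL statement must be executed with start_query_execution. The table-name normalization, database, and S3 output location below are assumptions, not from the truncated question:

```python
# Minimal sketch, assuming the DDL template from the question (abbreviated here).
QUERY_TEMPLATE = """CREATE EXTERNAL TABLE IF NOT EXISTS some_db.{table} (
  BucketOwner STRING,
  Bucket STRING,
  RequestDateTime STRING
  -- ... remaining columns from the original script ...
)"""

def build_ddl(bucket: str) -> str:
    # Athena table names cannot contain '.' or '-', so normalize the bucket name.
    table = bucket.replace(".", "_").replace("-", "_")
    return QUERY_TEMPLATE.format(table=table)

def run_ddl(bucket: str, output_s3: str = "s3://my-athena-results/") -> str:
    """Actually execute the DDL (requires AWS credentials); returns the execution id."""
    import boto3  # imported here so the pure helper above is testable offline
    client = boto3.client("athena")
    resp = client.start_query_execution(   # runs the query, unlike create_named_query
        QueryString=build_ddl(bucket),
        QueryExecutionContext={"Database": "some_db"},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```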

Query by “$path” field

Submitted by 我是研究僧i on 2021-01-29 05:52:55

Question: I want to query by a file or group of files under a partition inside a table. I found that when I use the "$path" field, Athena scans the entire partition rather than only the files I want. Is there a way to make this kind of query more efficient and scan only the given files? Something like partition pruning, but for files... Here is a sample query: SELECT * FROM my_table WHERE day = '2019-01-01' AND "$path" = 's3://my-bucket/my-table/day=2019-01-01/my_file' Answer 1: No. It's not possible to get Athena
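
For context on why: "$path" is a hidden metadata column materialized while each object is being read, so a predicate on it can only filter rows after the scan; only real partition columns prune at planning time. A hedged sketch of the behavior (table and paths taken from the question above):

```sql
-- The day = ... predicate prunes: only day=2019-01-01 objects are read.
-- The "$path" predicate is applied AFTER the partition is scanned, so it
-- narrows the result set but not the bytes scanned/billed. If per-file
-- queries are frequent, the file grouping should become a real partition key.
SELECT *, "$path" AS source_file
FROM my_table
WHERE day = '2019-01-01'
  AND "$path" = 's3://my-bucket/my-table/day=2019-01-01/my_file';
```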

Rename Column in Athena

Submitted by 时光总嘲笑我的痴心妄想 on 2021-01-28 14:27:05

Question: The Athena table "organization" reads data from Parquet files in S3. I need to change a column name from "cost" to "fee". The data files go back to Jan 2018. If I just rename the column in Athena, the table won't be able to find data for the new column in the Parquet files. Please let me know if there are ways to resolve this. Answer 1: You have to change the schema and point to the new column "fee", but it depends on your situation. If you have two data sets, and in one dataset it is called "cost" and in another dataset it is
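
The symptom arises because Athena resolves Parquet columns by name by default, so a renamed column no longer matches the old files. One documented approach (from Athena's schema-updates guidance) is to switch the table to index-based (positional) column access; this works only if the renamed column keeps the same ordinal position in every file. The table name and location below are placeholders, and the leading column is illustrative:

```sql
-- Recreate the table with the new column name in the same ordinal slot,
-- and tell Athena to resolve Parquet columns by position instead of name:
CREATE EXTERNAL TABLE organization (
  org_id STRING,   -- illustrative leading column
  fee DOUBLE       -- stored as "cost" in the existing files
)
STORED AS PARQUET
LOCATION 's3://my-bucket/organization/'
TBLPROPERTIES ('parquet.column.index.access' = 'true');
```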

Spark Small ORC Stripes

Submitted by 跟風遠走 on 2021-01-28 11:58:32

Question: We use Spark to flatten clickstream data and then write it to S3 in ORC+zlib format. I have tried changing many settings in Spark, but the resultant stripe sizes of the ORC files being created are still very small (<2MB). Things I have tried so far to increase the stripe size: earlier each file was 20MB in size; using coalesce I am now creating files of 250-300MB, but there are still 200 stripes per file, i.e. each stripe is <2MB. I tried using hivecontext instead of
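
A hedged sketch of one knob to try, assuming a PySpark job: the ORC writer's target stripe size ("orc.stripe.size", in bytes) can be passed as a write option, which Spark forwards to the ORC library. Note that tiny stripes are often a symptom of the writer flushing early under memory pressure, so raising the target alone may not be enough; reducing the number of concurrent writers (partitions) helps for the same reason. `df` and the output path are placeholders:

```python
# Target stripe size for the ORC writer, in bytes (64 MiB).
STRIPE_SIZE = 64 * 1024 * 1024

def write_orc(df, path, files=8):
    """Write `df` as ORC+zlib with fewer, larger files and a larger stripe target.
    Sketch only: assumes `df` is an existing pyspark.sql.DataFrame."""
    (df.coalesce(files)                        # fewer concurrent writers, larger files
       .write
       .option("compression", "zlib")
       .option("orc.stripe.size", str(STRIPE_SIZE))
       .orc(path, mode="overwrite"))
```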