amazon-athena

Locate value in column

谁说胖子不能爱 Submitted on 2020-01-25 09:25:05
Question: I am passing Elastic Load Balancing access logs to Athena and I want to get a value that is located in the URL column. The below works for MySQL, but Athena uses a different SQL dialect. Is there a way I can grab the value of Country from the below? create table elb_logs(row varchar(100), url varchar(100)); insert into elb_logs values("Row1", "Lauguage=English&Country=USA&Gender=Male"); insert into elb_logs values("Row2", "Gender=Female&Language=French&Country="); insert into elb_logs values("Row3", "Country=Canada
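A minimal Athena (Presto SQL) sketch of one way to pull the value out, assuming the elb_logs table above already exists; regexp_extract is a built-in Presto/Athena function and the pattern simply captures whatever follows Country= up to the next &:

-- Returns NULL (or an empty string) for rows where Country is missing or empty.
SELECT "row",
       regexp_extract(url, 'Country=([^&]*)', 1) AS country
FROM elb_logs;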

String/INT96 to Datetime - Amazon Athena/SQL - DDL/DML

喜欢而已 Submitted on 2020-01-25 09:06:08
Question: I have hosted my data in an S3 bucket in Parquet format and I am trying to access it using Athena. I can see that I can successfully access the hosted table. I noticed something fishy when I tried to access the column "createdon". createdon is a timestamp column and it is reflected as such in the Athena table, but when I query it with the SQL below SELECT createdon FROM "uat-raw"."opportunity" limit 10; I get unexpected output: +51140-02-21 19:00:00.000 +51140-02-21 21:46:40.000 +51140-02-22 00
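Dates around the year +51140 usually mean an epoch value in milliseconds is being interpreted as seconds. A hedged, query-side sketch of a workaround under that assumption (table and column names taken from the question; the proper fix may instead lie in how the Parquet file was written):

SELECT from_unixtime(to_unixtime(createdon) / 1000) AS createdon_fixed  -- undo the 1000x inflation
FROM "uat-raw"."opportunity"
LIMIT 10;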

Athena - DATE column correct values from JSON

旧巷老猫 Submitted on 2020-01-16 09:08:52
Question: I have an S3 bucket with many JSON files. Example JSON file: {"id":"x109pri", "import_date":"2017-11-06"} The "import_date" field is a DATE in the standard YYYY-MM-DD format. I am creating a database connection in Athena to link all these JSON files. However, when I create a new table in Athena and specify this field's type as DATE I get "Internal error" with no other explanation provided. To clarify, the table gets created just fine, but if I try to preview or query it, I get this error.
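A common workaround, offered here only as a hedged sketch rather than the accepted answer, is to declare the field as string in the DDL and cast it at query time; the table name and S3 location below are hypothetical:

CREATE EXTERNAL TABLE json_imports (
  id string,
  import_date string                     -- declared as string to sidestep the SerDe/DATE issue
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/json-files/';

SELECT id, CAST(import_date AS DATE) AS import_date   -- YYYY-MM-DD strings cast cleanly
FROM json_imports;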

AWS Glue+Athena skip header row

怎甘沉沦 Submitted on 2020-01-15 10:32:36
Question: As of the January 19, 2018 updates, Athena can skip the header row of files (Support for ignoring headers). You can use the skip.header.line.count property when defining tables to have Athena ignore headers. I use AWS Glue in CloudFormation to manage my Athena tables. Using the Glue TableInput, how can I tell Athena to skip the header row? Answer 1: Basing off the full template for AWS::Glue::Table here, making the change from, Resources: ... MyGlueTable: ... Properties: ... TableInput: ...
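For comparison, this is where the same property lives in plain Athena DDL; a minimal sketch with hypothetical table and bucket names (in a Glue-managed table the same key/value pair is supplied in the TableInput Parameters map rather than in DDL):

CREATE EXTERNAL TABLE csv_with_header (
  col1 string,
  col2 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/csv/'
TBLPROPERTIES ('skip.header.line.count' = '1');   -- first line of every file is ignored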

Multi-line JSON file querying in Hive

余生颓废 Submitted on 2020-01-15 04:14:48
Question: I understand that the majority of JSON SerDe formats expect .json files to be stored with one record per line. I have an S3 bucket with multi-line, indented .json files (I don't control the source) that I'd like to query using Amazon Athena (though I suppose this applies just as well to Hive generally). Is there a SerDe format out there that is able to parse multi-line, indented .json files? If there isn't a SerDe format to do this: is there a best practice for dealing with files like this?

AWS Athena Returning Zero Records from Tables Created from GLUE Crawler input csv from S3

人盡茶涼 Submitted on 2020-01-14 09:51:50
Question: Part One: I ran a Glue crawler on a dummy CSV loaded in S3 and it created a table, but when I try to view the table in Athena and query it, it shows "Zero records returned". The ELB demo data in Athena works fine. Part Two (scenario): Suppose I have an Excel file and a data dictionary describing how and in what format data is stored in that file, and I want that data to be dumped into AWS Redshift. What would be the best way to achieve this? Answer 1: I experienced the same issue. You need to give the folder path instead
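A hedged illustration of the folder-vs-file distinction the answer is pointing at, with hypothetical names; the table (or crawler) location should be the prefix that contains the files, not an individual object:

CREATE EXTERNAL TABLE dummy_csv (
  col1 string,
  col2 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/dummy-data/';    -- folder prefix, not .../dummy-data/file.csv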

How to solve this HIVE_PARTITION_SCHEMA_MISMATCH?

不羁的心 Submitted on 2020-01-06 07:01:45
Question: I have partitioned data in CSV files on S3: s3://bucket/dataset/p=1/*.csv (partition #1) ... s3://bucket/dataset/p=100/*.csv (partition #100) I run a classifier over s3://bucket/dataset/ and the result looks very promising, as it detects 150 columns (c1,...,c150) and assigns various data types. Loading the resulting table in Athena and querying it (select * from dataset limit 10), though, yields the error message: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table
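One frequently used repair, shown here only as a hedged sketch under the assumption that the table-level schema is the correct one, is to drop the offending partition's metadata and re-add it so it no longer carries its own conflicting column definitions (table and bucket names are taken from the question; the specific partition is illustrative):

ALTER TABLE dataset DROP PARTITION (p='1');
ALTER TABLE dataset ADD PARTITION (p='1')
  LOCATION 's3://bucket/dataset/p=1/';   -- re-registered partition metadata, data files untouched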

How to skip documents that do not match schema in Athena?

家住魔仙堡 Submitted on 2020-01-06 02:36:05
Question: Suppose I have an external table like this: CREATE EXTERNAL TABLE my.data ( `id` string, `timestamp` string, `profile` struct< `name`: string, `score`: int> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'ignore.malformed.json' = 'true' ) LOCATION 's3://my-bucket-of-data' TBLPROPERTIES ('has_encrypted_data'='false'); A few of my documents have an invalid profile.score (a string rather than an integer). This causes queries in Athena
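A hedged workaround sketch (an assumption about the intent, not necessarily the accepted answer): loosen the struct field to string in the DDL so the SerDe never has to coerce it, then coerce per row at query time with try_cast, which yields NULL for malformed values instead of failing the whole query:

CREATE EXTERNAL TABLE my.data (
  `id` string,
  `timestamp` string,
  `profile` struct<`name`: string, `score`: string>   -- score read as string, coerced later
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1', 'ignore.malformed.json' = 'true')
LOCATION 's3://my-bucket-of-data'
TBLPROPERTIES ('has_encrypted_data' = 'false');

SELECT id,
       try_cast(profile.score AS integer) AS score    -- NULL where score is not numeric
FROM my.data;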