amazon-athena

Locate value in column

谁说胖子不能爱 Submitted on 2020-01-25 09:25:05
Question: I am passing Elastic Load Balancing access logs to Athena and I want to get a value that is located in the URL column. The below works for MySQL, but Athena uses a different SQL dialect. Is there a way I can grab the value of Country from the below? create table elb_logs(row varchar(100), url varchar(100)); insert into elb_logs values("Row1", "Lauguage=English&Country=USA&Gender=Male"); insert into elb_logs values("Row2", "Gender=Female&Language=French&Country="); insert into elb_logs values("Row3", "Country=Canada
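A minimal Athena (Presto SQL) sketch of one way to pull the value out, assuming the elb_logs table above already exists; regexp_extract is a built-in Presto/Athena function and the pattern simply captures whatever follows Country= up to the next &:

-- Returns NULL (or an empty string) for rows where Country is missing or empty.
SELECT "row",
       regexp_extract(url, 'Country=([^&]*)', 1) AS country
FROM elb_logs;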

String/INT96 to Datetime - Amazon Athena/SQL - DDL/DML

喜欢而已 Submitted on 2020-01-25 09:06:08
Question: I have hosted my data in an S3 bucket in Parquet format and I am trying to access it using Athena. I can see that I can successfully access the hosted table. I noticed something fishy when I tried to access the column "createdon". createdon is a timestamp column and it is reflected as such in the Athena table, but when I query it with the SQL below SELECT createdon FROM "uat-raw"."opportunity" limit 10; I get unexpected output: +51140-02-21 19:00:00.000 +51140-02-21 21:46:40.000 +51140-02-22 00
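Dates around the year +51140 usually mean an epoch value in milliseconds is being interpreted as seconds. A hedged, query-side sketch of a workaround under that assumption (table and column names taken from the question; the proper fix may instead lie in how the Parquet file was written):

SELECT from_unixtime(to_unixtime(createdon) / 1000) AS createdon_fixed  -- undo the 1000x inflation
FROM "uat-raw"."opportunity"
LIMIT 10;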

Athena - DATE column correct values from JSON

旧巷老猫 Submitted on 2020-01-16 09:08:52
Question: I have an S3 bucket with many JSON files. Example JSON file: {"id":"x109pri", "import_date":"2017-11-06"} The "import_date" field is a DATE in the standard YYYY-MM-DD format. I am creating a database connection in Athena to link all these JSON files. However, when I create a new table in Athena and specify this field's type as DATE I get "Internal error" with no other explanation provided. To clarify, the table gets created just fine, but if I try to preview or query it, I get this error.
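A common workaround, offered here only as a hedged sketch rather than the accepted answer, is to declare the field as string in the DDL and cast it at query time; the table name and S3 location below are hypothetical:

CREATE EXTERNAL TABLE json_imports (
  id string,
  import_date string                     -- declared as string to sidestep the SerDe/DATE issue
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/json-files/';

SELECT id, CAST(import_date AS DATE) AS import_date   -- YYYY-MM-DD strings cast cleanly
FROM json_imports;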

AWS Glue+Athena skip header row

怎甘沉沦 Submitted on 2020-01-15 10:32:36
Question: As of the January 19, 2018 updates, Athena can skip the header row of files (Support for ignoring headers). You can use the skip.header.line.count property when defining tables to have Athena ignore headers. I use AWS Glue in CloudFormation to manage my Athena tables. Using the Glue TableInput, how can I tell Athena to skip the header row? Answer 1: Basing off the full template for AWS::Glue::Table here, making the change from, Resources: ... MyGlueTable: ... Properties: ... TableInput: ...
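For comparison, this is where the same property lives in plain Athena DDL; a minimal sketch with hypothetical table and bucket names (in a Glue-managed table the same key/value pair is supplied in the TableInput Parameters map rather than in DDL):

CREATE EXTERNAL TABLE csv_with_header (
  col1 string,
  col2 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/csv/'
TBLPROPERTIES ('skip.header.line.count' = '1');   -- first line of every file is ignored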

Multi-line JSON file querying in Hive

余生颓废 Submitted on 2020-01-15 04:14:48
Question: I understand that the majority of JSON SerDe formats expect .json files to be stored with one record per line. I have an S3 bucket with multi-line, indented .json files (I don't control the source) that I'd like to query using Amazon Athena (though I suppose this applies just as well to Hive generally). Is there a SerDe format out there that is able to parse multi-line, indented .json files? If there isn't a SerDe format to do this: is there a best practice for dealing with files like this?

AWS Athena Returning Zero Records from Tables Created from GLUE Crawler input csv from S3

人盡茶涼 Submitted on 2020-01-14 09:51:50
Question: Part One: I ran a Glue crawler on a dummy CSV loaded in S3 and it created a table, but when I try to view the table in Athena and query it, it shows "Zero records returned". The ELB demo data in Athena works fine. Part Two (scenario): Suppose I have an Excel file and a data dictionary describing how and in what format data is stored in that file, and I want that data to be dumped into AWS Redshift. What would be the best way to achieve this? Answer 1: I experienced the same issue. You need to give the folder path instead
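A hedged illustration of the folder-vs-file distinction the answer is pointing at, with hypothetical names; the table (or crawler) location should be the prefix that contains the files, not an individual object:

CREATE EXTERNAL TABLE dummy_csv (
  col1 string,
  col2 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/dummy-data/';    -- folder prefix, not .../dummy-data/file.csv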

How to solve this HIVE_PARTITION_SCHEMA_MISMATCH?

不羁的心 Submitted on 2020-01-06 07:01:45
Question: I have partitioned data in CSV files on S3: s3://bucket/dataset/p=1/*.csv (partition #1) ... s3://bucket/dataset/p=100/*.csv (partition #100) I run a classifier over s3://bucket/dataset/ and the result looks very promising, as it detects 150 columns (c1,...,c150) and assigns various data types. Loading the resulting table in Athena and querying it (select * from dataset limit 10), though, yields the error message: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table
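One frequently used repair, shown here only as a hedged sketch under the assumption that the table-level schema is the correct one, is to drop the offending partition's metadata and re-add it so it no longer carries its own conflicting column definitions (table and bucket names are taken from the question; the specific partition is illustrative):

ALTER TABLE dataset DROP PARTITION (p='1');
ALTER TABLE dataset ADD PARTITION (p='1')
  LOCATION 's3://bucket/dataset/p=1/';   -- re-registered partition metadata, data files untouched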

How to skip documents that do not match schema in Athena?

家住魔仙堡 Submitted on 2020-01-06 02:36:05
Question: Suppose I have an external table like this: CREATE EXTERNAL TABLE my.data ( `id` string, `timestamp` string, `profile` struct< `name`: string, `score`: int> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'ignore.malformed.json' = 'true' ) LOCATION 's3://my-bucket-of-data' TBLPROPERTIES ('has_encrypted_data'='false'); A few of my documents have an invalid profile.score (a string rather than an integer). This causes queries in Athena
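A hedged workaround sketch (an assumption about the intent, not necessarily the accepted answer): loosen the struct field to string in the DDL so the SerDe never has to coerce it, then coerce per row at query time with try_cast, which yields NULL for malformed values instead of failing the whole query:

CREATE EXTERNAL TABLE my.data (
  `id` string,
  `timestamp` string,
  `profile` struct<`name`: string, `score`: string>   -- score read as string, coerced later
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1', 'ignore.malformed.json' = 'true')
LOCATION 's3://my-bucket-of-data'
TBLPROPERTIES ('has_encrypted_data' = 'false');

SELECT id,
       try_cast(profile.score AS integer) AS score    -- NULL where score is not numeric
FROM my.data;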