amazon-athena

AWS Glue issue with double quotes and commas

Submitted by 蹲街弑〆低调 on 2019-12-10 17:37:57
Question: I have this CSV file:

    reference,address
    V7T452F4H9,"12410 W 62TH ST, AA D"

The following options are being used in the table definition:

    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES ('quoteChar'='\"', 'separatorChar'=',')

but it still won't recognize the double quotes in the data, and the comma inside the quoted field is splitting the row. When I run the Athena query, the result looks like this:

    reference    address
    V7T452F4H9   "12410 W 62TH ST

How do I
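A minimal sketch of a complete table definition that parses this file, assuming a hypothetical s3://my-bucket/addresses/ prefix for the data (note that quoteChar is a plain double quote, and skip.header.line.count drops the header row):

    -- OpenCSVSerde honors quoted fields, so embedded commas stay in one column
    CREATE EXTERNAL TABLE addresses (
      reference string,
      address string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'separatorChar' = ',',
      'quoteChar' = '"'                    -- a literal double quote
    )
    LOCATION 's3://my-bucket/addresses/'   -- hypothetical data prefix
    TBLPROPERTIES ('skip.header.line.count' = '1');

With the SerDe handling the quoting, the embedded comma in "12410 W 62TH ST, AA D" stays inside the address column.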

Add a partition on a Glue table via the API on AWS?

Submitted by 我是研究僧i on 2019-12-10 15:55:29
Question: I have an S3 bucket that is constantly being filled with new data, and I use Athena and Glue to query it. The problem is that if Glue doesn't know a new partition has been created, it doesn't know it needs to search there. Making an API call to run the Glue crawler every time I need a new partition is too expensive, so the better solution is to tell Glue directly that a new partition has been added, i.e. to register the partition in its catalog. I looked through AWS documentation
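One way to register a partition without a crawler run is a single Athena DDL statement; a minimal sketch, assuming a hypothetical table partitioned by a dt key with a matching S3 layout:

    -- registers the partition in the Glue catalog; no data is scanned
    ALTER TABLE my_table ADD IF NOT EXISTS
      PARTITION (dt = '2019-12-10')
      LOCATION 's3://my-bucket/data/dt=2019-12-10/';

The same registration can be done programmatically through the Glue CreatePartition API, which is far cheaper than re-running a crawler for every new prefix.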

Amazon Athena - Column cannot be resolved on basic SQL WHERE query

Submitted by 蹲街弑〆低调 on 2019-12-10 15:37:39
Question: I am currently evaluating Amazon Athena and Amazon S3. I have created a database (testdb) with one table (awsevaluationtable). The table has two columns, x (bigint) and y (bigint). When I run:

    SELECT * FROM testdb."awsevaluationtable"

I get all of the test data. However, when I try a basic WHERE query:

    SELECT * FROM testdb."awsevaluationtable" WHERE x > 5

I get:

    SYNTAX_ERROR: line 3:7: Column 'x' cannot be resolved

I have tried all sorts of variations: SELECT * FROM testdb.awsevaluationtable
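Two checks worth running first, sketched below: quoting the identifiers explicitly, and dumping the schema Athena actually registered. A frequent culprit for this error is an invisible character, such as a UTF-8 BOM, baked into the column name when the table was created from a file header:

    -- quoted identifiers must match the stored column name exactly
    SELECT * FROM "testdb"."awsevaluationtable" WHERE "x" > 5;

    -- shows the column names exactly as the catalog stores them
    SHOW CREATE TABLE testdb.awsevaluationtable;

If the column turns out to carry hidden characters, recreating the table with clean names resolves the error.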

Can I delete data (rows in tables) from Athena?

Submitted by 旧巷老猫 on 2019-12-10 14:54:00
Question: Is it possible to delete data stored in S3 through an Athena query? I have some rows I have to delete from a couple of tables (they point to separate buckets in S3). I couldn't find a way to do it in the Athena User Guide: https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf and DELETE FROM isn't supported, but I'm wondering if there is an easier way than trying to find the files in S3 and deleting them. Answer 1: You can leverage Athena to find out all the files that you want to delete and
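The piece that makes this workable is Athena's "$path" pseudo-column, which reports the S3 object each row was read from; a minimal sketch, assuming a hypothetical my_table and predicate:

    -- list the S3 objects that contain the rows to be deleted
    SELECT DISTINCT "$path"
    FROM my_table
    WHERE some_column = 'value-to-delete';

The returned keys can then be removed outside Athena, for example with aws s3 rm, after checking that no rows you want to keep live in the same files.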

How do I configure the file format of AWS Athena results?

Submitted by 时光怂恿深爱的人放手 on 2019-12-10 13:46:06
Question: Currently, Athena query results are written to S3 in TSV format. Is there any way to configure Athena queries to return results in Parquet format? Answer 1: At the moment it isn't possible to do this directly with Athena. When it comes to configuring the result of an Athena query, you can only set up the query result location and the encryption configuration. Workaround: since October, Athena supports CTAS queries, which you can use for this. https://docs.aws.amazon.com/athena/latest/ug/ctas.html https:/
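A minimal sketch of the CTAS workaround, assuming a hypothetical source table and output prefix; the query result is written to S3 as Parquet rather than the default delimited text:

    CREATE TABLE results_parquet
    WITH (
      format = 'PARQUET',                                    -- output file format
      external_location = 's3://my-bucket/results-parquet/'  -- hypothetical prefix
    )
    AS
    SELECT * FROM source_table;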

Converting to timestamp with time zone failed on Athena

Submitted by ♀尐吖头ヾ on 2019-12-10 13:13:26
Question: I'm trying to create the following view:

    CREATE OR REPLACE VIEW view_events AS (
      SELECT
        "rank"() OVER (PARTITION BY "tb1"."innerid" ORDER BY "tb1"."date" ASC) "r"
        , "tb2"."opcode"
        , "tb1"."innerid"
        , "tb1"."date"
        , from_iso8601_timestamp(tb1.date) AS "real_date"
        , "tb2"."eventtype"
        , "tb1"."fuelused"
        , "tb1"."mileage"
        , "tb1"."latitude"
        , "tb1"."longitude"
      FROM rt_message_header tb1, rt_messages tb2
      WHERE ((("tb1"."uuid" = "tb2"."header_uuid") AND ("tb2"."opcode" = '39')) AND ("tb2"."type" =
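The usual cause of this failure is that from_iso8601_timestamp returns a value of type timestamp with time zone, which Athena views cannot store; a commonly suggested workaround, sketched here, is casting the result down to a plain timestamp:

    -- CAST drops the time zone, producing a type that views accept
    SELECT CAST(from_iso8601_timestamp(tb1.date) AS timestamp) AS "real_date"
    FROM rt_message_header tb1;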

Query CSV tables stored in S3 through Athena

Submitted by 試著忘記壹切 on 2019-12-10 11:58:49
Question: Recently we started to store our backups in AWS S3. They are all CSV files that we need to query through AWS Athena. We tried to insert the tables one by one, but it's taking too long; it is a fair amount of data. Is there any API we can use, or something that is already set up? We were about to do something with Spark, but maybe there is a simpler way, or something that has already been done. Thanks. Answer 1: You can simply create an external table on top of CSV files with the required properties.
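A minimal sketch of such an external table, with hypothetical column names standing in for the real backup schema; nothing is inserted or copied, Athena reads the CSV files in place:

    CREATE EXTERNAL TABLE backups (
      id string,
      created_at string,
      payload string
    )
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','              -- plain CSV, no quoted fields
    LOCATION 's3://my-backup-bucket/csv/';  -- hypothetical backup prefix

One table per distinct CSV layout is enough, however many files sit under the prefix, so there is no per-file insert step at all.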

How to query Parquet data from Amazon Athena?

Submitted by ↘锁芯ラ on 2019-12-10 09:15:57
Question: Athena creates a temporary table using fields of a table in S3. I have done this with JSON data. Could you help me with how to create a table using Parquet data? I have tried the following: converted sample JSON data to Parquet, uploaded the Parquet data to S3, and created a temporary table using the columns of the JSON data. By doing this I am able to execute a query, but the result is empty. Is this approach right, or is there another approach to follow for Parquet data? Sample JSON data: {"_id":
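Empty results usually mean the table was declared with a JSON SerDe while the files are Parquet; a minimal sketch of a Parquet table definition, with hypothetical column names matching the converted data:

    CREATE EXTERNAL TABLE my_parquet_table (
      `_id` string,                            -- backticks needed for a leading underscore
      payload string
    )
    STORED AS PARQUET                          -- Parquet SerDe and input format
    LOCATION 's3://my-bucket/parquet-data/';   -- hypothetical prefix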

AWS Athena export array of structs to JSON

Submitted by 走远了吗. on 2019-12-10 01:08:35
Question: I've got an Athena table where some fields have a fairly complex nested format. The backing records in S3 are JSON. Along these lines (but we have several more levels of nesting):

    CREATE EXTERNAL TABLE IF NOT EXISTS test (
      timestamp double,
      stats array<struct<time:double, mean:double, var:double>>,
      dets array<struct<coords: array<double>, header:struct<frame:int, seq:int, name:string>>>,
      pos struct<x:double, y:double, theta:double>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH
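For getting nested columns back out as JSON text, one commonly used approach is Presto's JSON cast; a minimal sketch against the table above:

    -- serializes the nested arrays/structs into JSON-formatted values
    SELECT CAST(stats AS JSON) AS stats_json,
           CAST(dets AS JSON) AS dets_json
    FROM test;

One known caveat: struct fields come back as JSON arrays of values rather than objects with field names, so key names may need to be re-attached downstream.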

Query exhausted resources at this scale factor

Submitted by て烟熏妆下的殇ゞ on 2019-12-08 16:35:44
Question: I was running a SQL query on Amazon Athena and got the following error a couple of times:

    Query exhausted resources at this scale factor
    This query ran against the "test1" database, unless qualified by the query.
    Please post the error message on our forum or contact customer support with Query Id: *************

Answer 1: We also encountered this. We noticed that this might be an internal Amazon issue. Whenever we encounter this error, especially for very fast queries, we just try to delete the table