amazon-athena

AWS Athena MSCK REPAIR TABLE tablename command

那年仲夏 submitted on 2019-12-13 04:41:47
Question: Is there a number of partitions at which we would expect the command MSCK REPAIR TABLE tablename; to fail? I have a system that currently has over 27k partitions. When the schema changes for the Athena table, we drop the table, recreate it with the new column(s) appended at the end, and then run MSCK REPAIR TABLE tablename;. The command did no work whatsoever, even after we let it run for 5 hours: not a single partition was added. Wondering if anyone has information
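When MSCK REPAIR TABLE makes no progress on tens of thousands of partitions, a commonly used alternative is to register partitions explicitly with ALTER TABLE ... ADD IF NOT EXISTS PARTITION, putting several PARTITION clauses in one statement. The sketch below only generates the DDL strings. The single partition key `dt`, the table name, the S3 layout and the default batch size of 100 are assumptions for illustration; check the current Athena documentation for any per-statement limits before picking a batch size.

```python
from typing import Iterable, List, Tuple


def add_partition_statements(
    table: str,
    partitions: Iterable[Tuple[str, str]],
    batch_size: int = 100,
) -> List[str]:
    """Build ALTER TABLE ... ADD IF NOT EXISTS PARTITION statements in batches.

    `partitions` yields (value, s3_location) pairs for a single, hypothetical
    partition key named `dt`. `batch_size` is an assumed tuning knob, not a
    documented Athena limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    def quote(text: str) -> str:
        # Double any single quotes so the value stays a valid SQL string literal.
        return "'" + text.replace("'", "''") + "'"

    clauses = [
        f"PARTITION (dt = {quote(value)}) LOCATION {quote(location)}"
        for value, location in partitions
    ]
    statements = []
    for start in range(0, len(clauses), batch_size):
        batch = clauses[start:start + batch_size]
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(batch)
        )
    return statements
```

Each generated statement can be submitted as its own Athena query. Because of IF NOT EXISTS, re-running a batch after a failure should not error on partitions that were already added.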

How do I identify problematic documents in S3 when querying data in Athena?

我怕爱的太早我们不能终老 submitted on 2019-12-13 01:43:04
Question: I have a basic Athena query like this:
SELECT * FROM my.dataset LIMIT 10
When I try to run it, I get an error message like this:
Your query has the following error(s): HIVE_BAD_DATA: Error parsing field value for field 2: For input string: "32700.000000000004"
How do I identify the S3 document that has the invalid field? My documents are JSON. My table looks like this:
CREATE EXTERNAL TABLE my.data ( `id` string, `timestamp` string, `profile` struct< `name`: string, `score`: int> ) ROW FORMAT
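If the JSON objects can be downloaded (or streamed) together with their S3 keys, one way to find the offending document is to scan it locally for a `profile.score` value that an `int` column cannot hold, such as 32700.000000000004. This is a minimal sketch: it assumes one JSON object per line, and the key passed with each line is just a label for the report.

```python
import json
from typing import Iterable, List, Tuple


def find_non_int_scores(
    lines: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int, object]]:
    """Return (source_key, line_number, raw_value) for every record whose
    profile.score is present but is not a JSON integer.

    `lines` yields (source_key, json_line) pairs; each non-blank line is
    assumed to hold one JSON object (newline-delimited JSON).
    """
    bad = []
    line_numbers = {}
    for key, raw_line in lines:
        line_numbers[key] = line_numbers.get(key, 0) + 1
        if not raw_line.strip():
            continue  # skip blank lines, but still count them
        record = json.loads(raw_line)
        score = (record.get("profile") or {}).get("score")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if score is not None and (isinstance(score, bool) or not isinstance(score, int)):
            bad.append((key, line_numbers[key], score))
    return bad
```

A report entry such as ("part-0001.json", 2, 32700.000000000004) points at the object and line to inspect; the object name here is hypothetical.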

Recursively tracking state of customers (Presto SQL)

有些话、适合烂在心里 submitted on 2019-12-12 19:05:13
Question: I have one table with the current state_id of my customers and another table holding all states and their state_ids, but without the corresponding customer_id. However, the historical state table records which state_id each state replaced, so it should be possible to recursively track the states (the journey) of each customer. Consider the following example:
"Customer" table:
customer_id | state_created | current_state_id
1 | 2017-11-09 | 33
2 | 2018-04-01 | 243
3 | 2018-07-10 | 254
"Historical_state"
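If the query engine in use cannot express the recursion in SQL, the chain can be followed outside the database once the historical table is loaded. The sketch below walks the "replaced state" links backwards from a customer's current state. Because the "Historical_state" table is cut off above, its shape is assumed: a mapping from each state_id to the state_id it replaced. The earlier state ids used in the example (12, 5, 100) are invented.

```python
from typing import Dict, List, Optional


def state_journey(current_state_id: int,
                  previous_state: Dict[int, Optional[int]]) -> List[int]:
    """Walk the 'replaced state' links backwards from a customer's current state.

    `previous_state[s]` is the state_id that state `s` replaced, or None (or a
    missing key) for the first state. Returns the journey oldest-first.
    """
    journey: List[int] = []
    seen = set()
    state: Optional[int] = current_state_id
    while state is not None:
        if state in seen:
            # Guard against bad data that would otherwise loop forever.
            raise ValueError(f"cycle in state history at state_id {state}")
        seen.add(state)
        journey.append(state)
        state = previous_state.get(state)
    journey.reverse()
    return journey
```

With a hypothetical history {33: 12, 12: 5, 5: None}, customer 1 (current_state_id 33) gets the journey [5, 12, 33].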

Can you create views in Amazon Athena?

夙愿已清 submitted on 2019-12-12 18:32:43
Question: Is it possible to create views in Amazon Athena? Since an external table is essentially metadata for data stored in files on S3, no transformation is involved, so you can't handle data inconsistencies. Quite often, this results in tables being defined with lots of string fields. Can you create a view on top of the external table that contains the transformation logic, allowing users to query a "cleansed" view of the data?
Answer 1: While that is a nice feature that you are
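Athena supports CREATE [OR REPLACE] VIEW, so the "cleansed view" idea can be expressed as a view that casts the string columns of the external table. The sketch below only builds that DDL string. It assumes TRY_CAST (a Presto function that returns NULL instead of failing on unparseable values) is the cleansing rule you want; the view, table and column names are hypothetical.

```python
from typing import Dict


def _quote_identifier(name: str) -> str:
    # Double-quoted SQL identifier; embedded double quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def cleansing_view_ddl(view: str, source_table: str,
                       column_types: Dict[str, str]) -> str:
    """Build a CREATE OR REPLACE VIEW statement that exposes each listed
    string column of `source_table` converted with TRY_CAST, so rows with
    unparseable values yield NULL instead of failing the query."""
    select_list = ",\n  ".join(
        f"TRY_CAST({_quote_identifier(col)} AS {sql_type}) AS {_quote_identifier(col)}"
        for col, sql_type in column_types.items()
    )
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {select_list}\nFROM {source_table}"
```

Keeping the casting rules in one generated view means users query typed columns, while the external table can stay defined with string fields.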

Athena date format: unable to convert string to date format

♀尐吖头ヾ submitted on 2019-12-12 10:08:39
Question: I tried the queries below; none of them converted a string-type column to a date:
select INVC_,APIDT,APDDT from APAPP100 limit 10
select current_date, APIDT,APDDT from APAPP100 limit 10
select date_format( b.APIDT, '%Y-%m-%d') from APAPP100 b
select CAST( b.APIDT AS date) from APAPP100 b
select date(b.APIDT) from APAPP100 b
select convert(datetime, b.APIDT) from APAPP100 b
select date_parse(b.APIDT, '%Y-%m-%d') from APAPP100 b
select str_to_date(b.APIDT) from APAPP100 b
Answer 1: The correct
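date_parse only succeeds when its pattern matches the actual text in the column, and the question does not show a sample APIDT value. One quick way to find the pattern is to test a handful of sample values locally. The candidate list below is hypothetical. For %Y, %m and %d, Python's strptime codes mean the same thing as the MySQL-style codes used by Presto's date_parse, so a detected pattern built from those codes can be reused there.

```python
from datetime import datetime
from typing import Iterable, Optional, Sequence

# Hypothetical candidates; extend with whatever layouts your source system uses.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y", "%d/%m/%Y")


def detect_date_format(samples: Iterable[str],
                       candidates: Sequence[str] = CANDIDATE_FORMATS) -> Optional[str]:
    """Return the first candidate format that parses every non-blank sample,
    or None if no candidate fits all of them."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return None
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this candidate rejected at least one sample
        return fmt
    return None
```

If, for example, the samples look like "20190131", the detected pattern is "%Y%m%d", which would be used as date_parse(b.APIDT, '%Y%m%d'); date_parse returns a timestamp, which can then be cast to date.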

AWS Athena JDBC PreparedStatement

不想你离开。 submitted on 2019-12-12 09:44:10
Question: I can't get the AWS Athena JDBC driver to work with PreparedStatement and bound variables. If I put the desired column value directly in the SQL string, it works. But if I use '?' placeholders and bind variables with the PreparedStatement setters, it does not work. Of course, we know we should use the second approach (for caching, avoiding SQL injection, and so on). I use the JDBC driver AthenaJDBC42_2.0.2.jar. I get the following error when trying to use placeholders '?' in the
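If the driver version in use rejects '?' placeholders, one fallback is to render the values as escaped SQL literals on the client before submitting the query. The sketch below is a hedged illustration of that idea, not a replacement for real prepared statements: it gives no caching benefit, it supports only a few value types, and it does not recognize '?' inside double-quoted identifiers or comments. Newer Athena releases added server-side prepared statements, so check whether your engine and driver version support them before relying on something like this.

```python
from typing import Sequence


def _literal(value) -> str:
    """Render one Python value as a SQL literal (a deliberately small subset)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # standard quote doubling
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def bind_parameters(sql: str, params: Sequence) -> str:
    """Replace each '?' outside single-quoted literals with the next parameter."""
    out = []
    in_string = False
    index = 0
    for ch in sql:
        if ch == "'":
            # An escaped '' toggles twice, so the state ends up unchanged.
            in_string = not in_string
            out.append(ch)
        elif ch == "?" and not in_string:
            if index >= len(params):
                raise ValueError("not enough parameters for placeholders")
            out.append(_literal(params[index]))
            index += 1
        else:
            out.append(ch)
    if index != len(params):
        raise ValueError("more parameters than placeholders")
    return "".join(out)
```

For example, binding [42, "O'Brien"] into "... WHERE id = ? AND name = ?" produces "... WHERE id = 42 AND name = 'O''Brien'".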

How to ensure that Athena result objects in S3 get bucket-owner-full-control

大城市里の小女人 submitted on 2019-12-11 21:13:18
Question: We (account A) would like to programmatically trigger an Athena query (startQueryExecution) in a different AWS account (account B); we use an assumed role to achieve this. After the Athena query finishes, we expect the result to be written to an S3 bucket in our own AWS account (account A). We managed to do so by setting IAM policies on both sides to allow B to write to A's S3 bucket. However, it seems the S3 objects in account A are still owned by account B, and users/roles in account A have no access to those
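More recent versions of the Athena API let the caller request an ACL for result objects: ResultConfiguration accepts an AclConfiguration whose S3AclOption is BUCKET_OWNER_FULL_CONTROL (verify this against the current StartQueryExecution reference; it may not have existed when the question was asked). The sketch below builds the keyword arguments for boto3's start_query_execution and keeps the AWS call in a separate function so the request shape can be checked offline. The bucket, database and session are assumptions; the session is expected to hold account B's assumed-role credentials.

```python
from typing import Any, Dict, Optional


def start_query_kwargs(query: str, output_location: str,
                       database: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Athena start_query_execution,
    asking Athena to write result objects with bucket-owner-full-control."""
    kwargs: Dict[str, Any] = {
        "QueryString": query,
        "ResultConfiguration": {
            "OutputLocation": output_location,  # account A's bucket
            "AclConfiguration": {"S3AclOption": "BUCKET_OWNER_FULL_CONTROL"},
        },
    }
    if database is not None:
        kwargs["QueryExecutionContext"] = {"Database": database}
    return kwargs


def start_query(session, query: str, output_location: str,
                database: Optional[str] = None) -> str:
    """Start the query using a boto3.Session built from account B's
    assumed-role credentials; returns the QueryExecutionId."""
    athena = session.client("athena")
    response = athena.start_query_execution(
        **start_query_kwargs(query, output_location, database))
    return response["QueryExecutionId"]
```

If the workgroup enforces its own result configuration, client-side settings are overridden, so the ACL option would need to be set on the workgroup instead. Setting S3 Object Ownership to "bucket owner enforced" on account A's bucket is another way to avoid cross-account object ownership altogether.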

Athena Presto list empty tables

大城市里の小女人 submitted on 2019-12-11 16:52:32
Question: I would like to list all empty tables in my Athena database. I tried:
select table_schema, table_name from information_schema.tables where table_schema = 'database'
But this only lists the table names together with the database name. Thanks for your help.
Answer 1: I do not think it is possible within a single query. Your query gives you a list of tables; with that list, you could iterate over the tables from an external tool.
Source: https://stackoverflow.com/questions/49410867/athena-presto-list-empty
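The answer's suggestion (list the tables, then check each one from an external tool) can be sketched as below. `run_query` is a hypothetical callable that submits SQL to Athena and returns the result rows as tuples; the table_type filter assumes views should be skipped. Each probe is still a separate Athena query, so on many tables it adds up in time and cost, even with LIMIT 1.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical: run_query executes one SQL statement and returns its rows.
RunQuery = Callable[[str], Sequence[Tuple]]


def find_empty_tables(run_query: RunQuery, schema: str) -> List[str]:
    """Return the names of base tables in `schema` that contain no rows."""
    schema_literal = schema.replace("'", "''")
    tables = [row[0] for row in run_query(
        "SELECT table_name FROM information_schema.tables "
        f"WHERE table_schema = '{schema_literal}' AND table_type = 'BASE TABLE'"
    )]

    def ident(name: str) -> str:
        # Double-quoted identifier with embedded quotes doubled.
        return '"' + name.replace('"', '""') + '"'

    empty = []
    for table in tables:
        # LIMIT 1: one row is enough to prove the table is not empty.
        rows = run_query(f"SELECT 1 FROM {ident(schema)}.{ident(table)} LIMIT 1")
        if not rows:
            empty.append(table)
    return empty
```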

Why do I need to hardcode credentials to connect to AWS using the javascript SDK?

女生的网名这么多〃 submitted on 2019-12-11 15:54:07
Question: I've asked this other question here that leads me to believe that, by default, the JavaScript AWS SDK looks for credentials in a number of places in your environment without you having to do anything. The order in which it checks these places is listed here: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html I've got some working code that connects to AWS Athena. I can only get it to work if I hardcode the credentials manually, which seems to contradict the
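The behavior described in the question is a credential provider chain: sources are tried in order and the first one that yields credentials wins. The sketch below is a deliberately simplified Python model of that idea, not the JavaScript SDK's implementation; the real chain has more sources and its own precedence (see the linked page). It checks only the standard environment variables and the shared credentials file with the AWS_PROFILE profile. The practical point it illustrates: credentials are found without hardcoding only if they sit in a location the chain checks, under the exact variable names or profile it expects.

```python
import configparser
import os
from typing import Mapping, Optional, Tuple

# (access key id, secret access key, optional session token)
Credentials = Tuple[str, str, Optional[str]]


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret, env.get("AWS_SESSION_TOKEN")
    return None


def from_shared_file(env: Mapping[str, str], path: str) -> Optional[Credentials]:
    parser = configparser.ConfigParser()
    if not parser.read(path):
        return None  # file missing or unreadable
    profile = env.get("AWS_PROFILE", "default")
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    key = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if key and secret:
        return key, secret, section.get("aws_session_token")
    return None


def resolve_credentials(env: Optional[Mapping[str, str]] = None,
                        shared_file: Optional[str] = None) -> Optional[Credentials]:
    """Simplified chain: environment variables first, then the shared file."""
    env = os.environ if env is None else env
    shared_file = shared_file or os.path.expanduser("~/.aws/credentials")
    return from_environment(env) or from_shared_file(env, shared_file)
```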

How to speed up Amazon Athena query executions?

喜夏-厌秋 submitted on 2019-12-11 15:51:50
Question: I'm using Athena query execution to retrieve data from a Glue table. A crawler updates this table every hour from an S3 bucket that is continuously updated by Kinesis Firehose. My Node.js server executes basic queries using Athena, but I realized that some requests take so long that my server throws a Server Request Timeout. I checked the query history in Athena and saw that some of the latest requests are in the Queued state, which means they are waiting to be executed. They all have a small Run
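Time spent in the Queued state depends on Athena's capacity and concurrency limits rather than on the server, so the server usually cannot shorten it; what it can control is not holding an HTTP request open until the query finishes. A common pattern is to start the query, poll its state with backoff up to a deadline, and otherwise hand the query id back to the client to check later. In the sketch below, `get_state` is a hypothetical callable; with boto3 it could wrap `athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]`, whose states include QUEUED, RUNNING, SUCCEEDED, FAILED and CANCELLED. The sleep and clock functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(get_state: Callable[[], str],
                   timeout_s: float = 60.0,
                   initial_delay_s: float = 0.5,
                   max_delay_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the query reaches a terminal state or the deadline passes.

    Delays double after each poll (capped at max_delay_s). Raises
    TimeoutError if the query is still QUEUED/RUNNING at the deadline.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"query still {state} after {timeout_s}s")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)
```

On TimeoutError, the server can respond with the query id instead of an error, letting the client poll a separate endpoint.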