Can you use Athena ODBC/JDBC to return the S3 location of results?

▼魔方 西西 提交于 2021-02-11 14:06:01

问题


I've been using the metis package to run Athena queries via R. While this is great for small queries, there still does not seem to be a viable solution for queries with very large return datasets (10's of thousands of rows, for example). However, when running these same queries in the AWS console, it is fast/straightforward to use the download link to obtain the CSV file of the query result.

This got me thinking: is there a mechanism for sending the query via R but returning/obtaining the S3:// bucket location where the query results live instead of the normal results object?


回答1:


As mentioned in my comment above you could investigate the RAthena and noctua packages.

These packages connect to AWS Athena using AWS SDK's as their drivers. What this means is that they will also download the data from S3 in as similar method that is mentioned by @Zerodf. They both use data.table to load the data into R so they are pretty quick. Also you can get to Query Execution ID if required for some reason.

Here is an example of how to use the packages:

RAthena

Create a connection to AWS Athena, for more information around how to connect please look at: dbConnect

library(DBI)
con <- dbConnect(RAthena::athena())

Example in how to query Athena:

dbGetQuery(con, "select * from sampledb.elb_logs")

How to access the Query ID:

res <- dbSendQuery(con,  "select * from sampledb.elb_logs")

sprintf("%s%s.csv",res@connection@info$s3_staging, res@info$QueryExecutionId)

noctua

Create a connection to AWS Athena, for more information around how to connect please look at: dbConnect

library(DBI)
con <- dbConnect(noctua::athena())

Example in how to query Athena:

dbGetQuery(con, "select * from sampledb.elb_logs")

How to access the Query ID:

res <- dbSendQuery(con,  "select * from sampledb.elb_logs")

sprintf("%s%s.csv",res@connection@info$s3_staging, res@info$QueryExecutionId)

Sum up

These packages should do what you are looking for however as they download the data from the query output in s3 I don't believe you will need to go to the query execution ID to do the same process.




回答2:


You could look at the Cloudyr Project. They have a package that handles creating the signature requests for the AWS API. Then you can fire off a query, poll AWS until the query finishes (using the QueryExecutionID), and use aws.s3 to download the result set.

You can also use system() to use AWS CLI commands to execute a query, wait for the results, and download the results.

For example: You could run the following commands on the command line to get the results of a query.

$ aws athena start-query-execution --query-string "select count(*) from test_null_unquoted" --execution-context Database=stackoverflow --result-configuration OutputLocation=s3://SOMEBUCKET/ --output text XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Once you get the query-execution-id, then you can check on the results.

$ aws athena get-query-execution --query-execution-id=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX --output text QUERYEXECUTION select count(*) from test_null_unquoted XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX QUERYEXECUTIONCONTEXT stackoverflow RESULTCONFIGURATION s3://SOMEBUCKET/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.csv STATISTICS 104 1403 STATUS 1528809056.658 SUCCEEDED 1528809054.945

Once the query succeeds, you can download the data.

$ aws s3 cp s3://stack-exchange/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.csv

Edit: You can even turn those commands into a one liner (Bash example here), but I'm sure you could do the same thing in powershell.

$ eid=`aws athena start-query-execution --query-string "select count(*) from test_null_unquoted" --query-execution-context Database=SOMEDATABASE--result-configuration OutputLocation=s3://SOMEBUCKET/ --output text --output text` && until aws athena get-query-execution --query-execution-id=$eid --output text | grep "SUCCEEDE D"; do sleep 10 | echo "waiting..."; done && aws s3 cp s3://SOMEBUCKET/$eid.csv . && unset eid



来源:https://stackoverflow.com/questions/50806014/can-you-use-athena-odbc-jdbc-to-return-the-s3-location-of-results

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!