amazon-athena

How to use AWS Athena from Node.js?

偶尔善良 submitted on 2019-12-21 05:29:12
Question: Athena is an analytics service for retrieving data from S3 using SQL queries. I have queried data in S3 using the AWS console; now I need to access AWS Athena from Node.js code.

Answer 1: I use Athena in my Node.js project in the following way: download the JDBC driver from AWS, create a connector.js file, install the jdbc package from npm (npm install jdbc), and paste the following:

    var JDBC = require('jdbc');
    var jinst = require('jdbc/lib/jinst');

    // Start the JVM once and put the Athena JDBC driver jar on the classpath
    if (!jinst.isJvmCreated()) {
      jinst.addOption("-Xrs");
      jinst.setupClasspath(['./AthenaJDBC41-*.jar']);
    }

How to change the name of the Athena results stored in S3?

此生再无相见时 submitted on 2019-12-21 05:10:26
Question: The results of an Athena query are saved in S3 under the query ID (a long string). Is there a way to save the results of a query under a pre-specified name, so that they can easily be looked up later?

Answer 1: Unfortunately, no (at least not yet)! The best way to do this as of now is to write a script that goes through the results of each run and renames (moves and deletes) all the files in that S3 bucket.

Answer 2: For named queries your results location will be structured as follows: s3://athena

Amazon AWS Athena S3 and Glacier Mixed Bucket

百般思念 submitted on 2019-12-20 09:43:22
Question: Amazon Athena log analysis with S3 Glacier. We have petabytes of data in S3. We are https://www.pubnub.com/ and we store usage data for our network in S3 for billing purposes. We have tab-delimited log files stored in an S3 bucket, and Athena is giving us a HIVE_CURSOR_ERROR failure. Our S3 bucket is set up to automatically push objects to AWS Glacier after 6 months, so the bucket holds hot S3 files ready to read in addition to the Glacier backup files. We are getting access errors from Athena

AWS Athena concurrency limits: Number of submitted queries VS number of running queries

丶灬走出姿态 submitted on 2019-12-19 03:56:27
Question: According to the AWS Athena limitations, you can submit up to 20 queries of the same type at a time; this is a soft limit that can be increased on request. I use boto3 to interact with Athena, and my script submits 16 CTAS queries, each of which takes about 2 minutes to finish. In the AWS account, I am the only one using the Athena service. However, when I look at the state of the queries in the console, I see that only a few of them (5 on average) are actually being executed, despite all of them being

Amazon Athena: no viable alternative at input

萝らか妹 submitted on 2019-12-18 12:06:01
Question: While creating a table in Athena, it gives me the following exception: no viable alternative at input

Answer 1: Hyphens are not allowed in table names (even though the wizard allows them). Just remove the hyphen and it works like a charm; see the sketch below.

Answer 2: Unfortunately, at the moment the syntax validation error messages in Athena are not very descriptive; this error can mean almost any possible syntax error in the CREATE TABLE statement. Although this is annoying, for the moment you will need to check whether the syntax follows
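As a minimal sketch of Answer 1 (the table, column, and bucket names here are hypothetical), the first statement fails with "no viable alternative at input" while the underscore variant succeeds:

    -- Fails: the hyphen in the table name is rejected by the parser
    CREATE EXTERNAL TABLE my-table (id string)
    LOCATION 's3://my-bucket/my-prefix/';

    -- Works: replace the hyphen with an underscore
    CREATE EXTERNAL TABLE my_table (id string)
    LOCATION 's3://my-bucket/my-prefix/';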

How to make MSCK REPAIR TABLE execute automatically in AWS Athena

拈花ヽ惹草 submitted on 2019-12-18 02:16:10
Question: I have a Spark batch job that is executed hourly. Each run generates and stores new data in S3 with the directory naming pattern DATA/YEAR=?/MONTH=?/DATE=?/datafile. After uploading the data to S3, I want to investigate it using Athena; furthermore, I would like to visualize it in QuickSight by connecting to Athena as a data source. The problem is that after each run of my Spark batch, the newly generated data stored in S3 is not discovered by Athena unless I manually run the query
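For context on the statements involved, a hedged sketch follows; the table name, bucket, and partition values are hypothetical, matching the YEAR/MONTH/DATE keys from the question. MSCK REPAIR TABLE rescans the whole table location, while ALTER TABLE ADD PARTITION registers a single new partition and is typically cheaper to run after each batch:

    -- Rescan the table location and register any missing partitions
    MSCK REPAIR TABLE mytable;

    -- Or register just the newly written partition explicitly
    -- (`date` is backquoted because it is a reserved word)
    ALTER TABLE mytable ADD IF NOT EXISTS
      PARTITION (year='2019', month='12', `date`='18')
      LOCATION 's3://mybucket/DATA/YEAR=2019/MONTH=12/DATE=18/';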

AWS Athena - Create external table skipping first row

天涯浪子 submitted on 2019-12-17 20:34:49
Question: I'm trying to create an external table over CSV files with AWS Athena using the code below, but the line TBLPROPERTIES ("skip.header.line.count"="1") doesn't work: it doesn't skip the first line (the header) of the CSV file.

    CREATE EXTERNAL TABLE mytable (
      colA string,
      colB int
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'separatorChar' = ',',
      'quoteChar' = '\"',
      'escapeChar' = '\\'
    )
    STORED AS TEXTFILE
    LOCATION 's3://mybucket/mylocation/'
    TBLPROPERTIES ("skip.header.line.count"="1")

Hive: select a particular string from a JSON row

元气小坏坏 submitted on 2019-12-14 02:29:23
Question: I'm trying to analyze AWS CloudTrail logs in Athena. When I select a "security group add inbound rules" event, it returns the string below in the elements column.

    {"groupId":"sg-XXXX","ipPermissions":{"items":[{"ipProtocol":"tcp","fromPort":22,"toPort":22,"groups":{},"ipRanges":{"items":[{"cidrIp":"0.0.0.0/0"}]},"prefixListIds":{}}]}}

But I need the groupId alone from those JSON results. How can I get that? Note: the table is an external table.

Answer 1: select json_extract_scalar('{"groupId":"sg
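The answer above is cut off. As a hedged reconstruction of the general technique: Presto's json_extract_scalar takes a JSON string and a JSONPath and returns a varchar, so applied to the elements column from the question (the table name below is hypothetical):

    -- Pull groupId out of the JSON string stored in the elements column
    SELECT json_extract_scalar(elements, '$.groupId') AS group_id
    FROM my_cloudtrail_table;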

Create AWS Athena view programmatically

强颜欢笑 submitted on 2019-12-13 11:46:37
Question: "Can you create views in Amazon Athena?" outlines how to create a view using the user interface. I'd like to create an AWS Athena view programmatically, ideally using Terraform (which calls CloudFormation). I followed the steps outlined here: https://ujjwalbhardwaj.me/post/create-virtual-views-with-aws-glue-and-query-them-using-athena; however, I run into an issue with this approach in that the view goes stale quickly: ...._view' is stale; it must be re-created. The Terraform code looks like this:
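The Terraform snippet is cut off above. As a hedged note on what usually avoids the staleness error: views created only as Glue table entries can lack the view metadata Athena records itself, so a common workaround is to issue the DDL through Athena (for example via the console or a StartQueryExecution API call). The view and table names below are hypothetical:

    -- Issued through Athena rather than created as a Glue table,
    -- so Athena stores consistent view metadata
    CREATE OR REPLACE VIEW my_view AS
    SELECT colA, colB
    FROM my_table;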

Issue with REGEXP_EXTRACT function

▼魔方 西西 submitted on 2019-12-13 08:25:55
Question: I have a field in a table stored as follows: id (array<struct<peopleidl:string,householdidl:string>>). A sample result looks like this:

    [{peopleidl=Xi3020rmDOhU2iWYUu3AXytMOggv6jdRK8_xyzy_COup1vd3uU0-OcWz4C3vW-ew9IeZEN, householdidl=null}]

I'm attempting to use the REGEXP_EXTRACT function to capture everything between the = sign and the , so the new field would simply be Xi3020rmDOhU2iWYUu3AXytMOggv6jdRK8_xyzy_COup1vd3uU0-OcWz4C3vW-ew9IeZEN. But when I try: REGEXP_EXTRACT(idls, '=(.+),')
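The question is cut off above, but given the column type it shows, a hedged observation: regexp_extract expects a varchar, whereas id is an array of structs, so the value can be read directly instead of parsing the rendered string (the table name below is hypothetical):

    -- Arrays are 1-indexed in Athena/Presto; read the struct field directly
    SELECT id[1].peopleidl
    FROM my_table;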