amazon-athena

How to ignore Amazon Athena struct order

 ̄綄美尐妖づ submitted on 2019-12-04 06:29:00
Question: I'm getting an HIVE_PARTITION_SCHEMA_MISMATCH error that I'm not quite sure what to do about. When I look at the two different schemas, the only difference is the order of the keys in one of my structs (created by a Glue crawler). I don't care about the order of the data, and I receive the data as a JSON blob, so I cannot guarantee the order of the keys. struct<device_id:string,user_id:string,payload:array<struct<channel:string,sensor_id:string,type:string,unit:string

Amazon Athena Convert String to Date

烂漫一生 submitted on 2019-12-04 04:42:15
Question: I am looking to convert a string in the format mmm-dd-yyyy to a date in the format yyyy-mm-dd, e.g. Nov-06-2015 to 2015-11-06, within Amazon Athena.
Answer 1: I would use date_parse, adjusting the format string to match your input:
select date_parse('Nov-06-2015','%b-%d-%Y')
returns 2015-11-06 00:00:00.000. Note that date_parse returns a timestamp; wrap it in CAST(... AS date) if you need a date value.
Reference: https://prestodb.io/docs/current/functions/datetime.html
Source: https://stackoverflow.com/questions/48152596/amazon-athena-covert-string-to-date
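To sanity-check the format string outside Athena, the same three specifiers can be tried in Python. %b, %d and %Y mean the same thing in Python's strptime and in the MySQL-style specifiers that Presto's date_parse uses. The two sets are not identical in general: minutes, for example, are %i in date_parse but %M in Python. This is a minimal local check, not how Athena evaluates the query. The function name is hypothetical.

```python
from datetime import datetime


def mmm_dd_yyyy_to_iso(value: str) -> str:
    """Convert a string such as 'Nov-06-2015' to '2015-11-06'."""
    # %b = abbreviated month name, %d = zero-padded day, %Y = four-digit year.
    # %b is locale-dependent; Python's default C locale uses English month names.
    parsed = datetime.strptime(value, "%b-%d-%Y")
    return parsed.date().isoformat()


print(mmm_dd_yyyy_to_iso("Nov-06-2015"))  # 2015-11-06
```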

AWS Glue Custom Classifiers JSON Path

北城以北 submitted on 2019-12-04 03:33:15
Question: I have a set of JSON data files that look like this:
[ {"client":"toys", "filename":"toy1.csv", "file_row_number":1, "secondary_db_index":"4050", "processed_timestamp":1535004075, "processed_datetime":"2018-08-23T06:01:15+0000", "entity_id":"4050", "entity_name":"4050", "is_emailable":false, "is_txtable":false, "is_loadable":false} ]
I have created a Glue crawler with a custom classifier whose JSON path is $[*]. Glue returns the correct schema with the columns correctly identified. However,

How to use AWS Athena from Node.js?

百般思念 submitted on 2019-12-03 17:01:46
Athena is an analytics service for querying data in S3 with SQL. I have queried data in S3 using the AWS console, but I need to access Athena from Node.js code.
I am using Athena in my Node.js project in the following way:
1. Download the JDBC driver from AWS.
2. Create a connector.js file.
3. Install the jdbc package from npm: npm install jdbc
4. Paste the following:
var JDBC = require('jdbc');
var jinst = require('jdbc/lib/jinst');

if (!jinst.isJvmCreated()) {
  jinst.addOption("-Xrs");
  jinst.setupClasspath(['./AthenaJDBC41-*.jar']);
}

var config = {
  // Required
  url: 'jdbc:awsathena://athena.*.amazonaws.com:443',
  // Optional
  drivername:
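Besides JDBC, Athena can be called through its API. The AWS SDKs, including the AWS SDK for JavaScript, expose it as the StartQueryExecution, GetQueryExecution and GetQueryResults operations. The sketch below shows that call sequence in Python so the flow can be read without the JDBC setup.

It makes a few assumptions:
- The client is a boto3 Athena client created with boto3.client("athena").
- The database and S3 output location are placeholders supplied by the caller.
- The function name is hypothetical.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a query through the Athena API and return all result rows.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    Each row is a list of strings, with None for cells that carry no value.
    """
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Queries run asynchronously, so poll until a terminal state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"query {execution_id} ended in state {state}: {reason}")
        time.sleep(poll_seconds)

    # Results are paginated; NextToken is present while more pages remain.
    rows = []
    request = {"QueryExecutionId": execution_id}
    while True:
        page = athena.get_query_results(**request)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        next_token = page.get("NextToken")
        if not next_token:
            return rows
        request["NextToken"] = next_token
```

For a SELECT, the first row returned by GetQueryResults holds the column names. The output location must be an S3 path that Athena can write query results to.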

AWS Glue: crawler misinterprets timestamps as strings. Glue ETL meant to convert strings to timestamps makes them NULL

一曲冷凌霜 submitted on 2019-12-03 16:55:18
I have been playing around with AWS Glue for some quick analytics by following the tutorial here. While I have been able to successfully create crawlers and discover data in Athena, I've had issues with the data types created by the crawler: the date and timestamp columns are read as strings. I followed this up by creating an ETL job in Glue that uses the data source created by the crawler as its input and a target table in Amazon S3. As part of the mapping transformation, I converted the date and timestamp columns from string to timestamp, but unfortunately the ETL converted
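The question is truncated before the cause is stated, so the following is an assumption. String-to-timestamp conversions commonly yield NULL when the strings are not in the layout the target expects. For TIMESTAMP values read from text, Athena documents a java.sql.Timestamp-compatible form: yyyy-MM-dd HH:mm:ss with optional fractional seconds.

The minimal Python sketch below re-renders strings from an explicitly stated source format into that layout. It also records which inputs failed to parse, so a NULL can be traced back to its input. The source format in the example and the function names are hypothetical.

```python
from datetime import datetime

# Layout Athena/Hive read for TIMESTAMP values stored as text (without fractional seconds).
HIVE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_hive_timestamp(value: str, source_format: str) -> str:
    """Re-render a timestamp string in Hive's layout; raises ValueError if it does not parse."""
    return datetime.strptime(value, source_format).strftime(HIVE_TIMESTAMP_FORMAT)


def convert_column(values, source_format):
    """Convert every value; unparseable ones become None and are also listed in `failures`."""
    converted, failures = [], []
    for value in values:
        try:
            converted.append(to_hive_timestamp(value, source_format))
        except ValueError:
            converted.append(None)
            failures.append(value)
    return converted, failures


# Hypothetical source format: day/month/year with a 24-hour clock.
print(convert_column(["23/08/2018 06:01:15", "not a date"], "%d/%m/%Y %H:%M:%S"))
```

Inside a Spark-based Glue job, the analogous step is pyspark.sql.functions.to_timestamp(column, format). It takes a Java-style pattern such as "dd/MM/yyyy HH:mm:ss" rather than strptime codes.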

Amazon AWS Athena S3 and Glacier Mixed Bucket

淺唱寂寞╮ submitted on 2019-12-02 19:41:06
Amazon Athena Log Analysis Services with S3 Glacier
We have petabytes of data in S3. We are PubNub (https://www.pubnub.com/), and we store our network's usage data in S3 for billing purposes. We have tab-delimited log files stored in an S3 bucket, and Athena is giving us a HIVE_CURSOR_ERROR failure. Our S3 bucket is set up to automatically push files to AWS Glacier after 6 months, so the bucket holds hot S3 files that are ready to read alongside the Glacier backup files. We are getting access errors from Athena because of this: the file referenced in the error is a Glacier backup. My guess is the answer will be: don't
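A lifecycle transition keeps an object under the same key but changes its storage class. As a result, one table location can hold both readable objects and objects in the GLACIER or DEEP_ARCHIVE storage classes, which must be restored before their contents can be read.

To see which files under a table's location are archived, the Python sketch below splits an S3 listing by the StorageClass field that ListObjectsV2 returns for each object. The listing helper assumes boto3 and AWS credentials are available. The function names are hypothetical, as is the bucket/prefix in the usage note.

```python
# Storage classes whose objects must be restored before they can be read.
ARCHIVED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def split_by_storage_class(objects):
    """Split S3 listing entries into (readable_keys, archived_keys)."""
    readable, archived = [], []
    for obj in objects:
        # Entries follow the shape of ListObjectsV2 "Contents" items.
        storage_class = obj.get("StorageClass", "STANDARD")
        if storage_class in ARCHIVED_STORAGE_CLASSES:
            archived.append(obj["Key"])
        else:
            readable.append(obj["Key"])
    return readable, archived


def list_objects(bucket, prefix):
    """Yield every listing entry under a prefix (requires boto3 and AWS credentials)."""
    import boto3  # imported here so split_by_storage_class works without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])
```

For example, split_by_storage_class(list_objects("my-log-bucket", "logs/")) returns the keys Athena can read and the archived keys that would trigger errors like the one described.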

How to ignore Amazon Athena struct order

荒凉一梦 submitted on 2019-12-02 10:14:18
I'm getting an HIVE_PARTITION_SCHEMA_MISMATCH error that I'm not quite sure what to do about. When I look at the two different schemas, the only difference is the order of the keys in one of my structs (created by a Glue crawler). I don't care about the order of the data, and I receive the data as a JSON blob, so I cannot guarantee the order of the keys. struct<device_id:string,user_id:string,payload:array<struct<channel:string,sensor_id:string,type:string,unit:string,value:double,name:string>>,topic:string,channel:string,client_id:string,hardware_id:string,timestamp
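Before changing any configuration, it can help to confirm that the table and partition schemas really differ only in field order. The following is a minimal Python sketch, assuming the schemas are Hive type strings like the struct<...> above. It parses struct, array and map types, plus parenthesised types such as decimal(10,2), and compares struct fields by name rather than by position. The function names are hypothetical and not part of any AWS tool.

```python
def split_top_level(s: str) -> list:
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return [p.strip() for p in parts if p.strip()]


def normalize(type_str: str):
    """Turn a Hive type string into a comparable value where struct field order is ignored."""
    t = type_str.strip()
    if t.startswith("struct<") and t.endswith(">"):
        fields = {}
        for field in split_top_level(t[len("struct<"):-1]):
            name, _, field_type = field.partition(":")
            # Names compared case-insensitively (assumption mirroring Hive column names).
            fields[name.strip().lower()] = normalize(field_type)
        return ("struct", frozenset(fields.items()))
    if t.startswith("array<") and t.endswith(">"):
        return ("array", normalize(t[len("array<"):-1]))
    if t.startswith("map<") and t.endswith(">"):
        key_type, value_type = split_top_level(t[len("map<"):-1])
        return ("map", normalize(key_type), normalize(value_type))
    return t.lower()  # primitive type such as string, double or decimal(10,2)


def same_ignoring_field_order(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)
```

If the schemas turn out to match apart from order, a commonly suggested fix is to have new partitions inherit the table's schema. In the crawler settings this is the option "Update all new and existing partitions with metadata from the table". As crawler configuration JSON it is {"Version":1.0,"CrawlerOutput":{"Partitions":{"AddOrUpdateBehavior":"InheritFromTable"}}}. Check this against the Glue documentation for your setup.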

Connecting Athena and S3 in the same CloudFormation Stack

て烟熏妆下的殇ゞ submitted on 2019-12-02 04:44:32
From the documentation for AWS::Athena::NamedQuery, it is unclear how to attach Athena to an S3 bucket specified in the same stack. If I had to guess from the example, I would imagine that you can write a template like:
Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
    ... other params ...
  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
        CREATE EXTERNAL TABLE db_name.test_table (...)
        LOCATION 's3://.../path/to/folder/'
Would a template like the above work? Upon stack creation, will the table db_name.test_table be available to
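AWS::Athena::NamedQuery creates a saved query: CloudFormation stores the query text but does not execute it. On its own, therefore, the CREATE EXTERNAL TABLE statement would not create db_name.test_table at stack creation. Declaring the table with AWS::Glue::Table is the usual CloudFormation-native alternative.

If the named query is kept, it can be run afterwards. The Python sketch below fetches the saved query by ID and starts it. It assumes a boto3 Athena client, and the S3 output location is a hypothetical value supplied by the caller.

```python
def run_named_query(athena, named_query_id: str, output_location: str) -> str:
    """Start executing a saved Athena query and return its QueryExecutionId.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    # GetNamedQuery returns the stored SQL text and the database it targets.
    saved = athena.get_named_query(NamedQueryId=named_query_id)["NamedQuery"]
    # StartQueryExecution only queues the query; it does not wait for completion.
    response = athena.start_query_execution(
        QueryString=saved["QueryString"],
        QueryExecutionContext={"Database": saved["Database"]},
        ResultConfiguration={"OutputLocation": output_location},  # hypothetical S3 path
    )
    return response["QueryExecutionId"]
```

Because execution is asynchronous, the table exists only once GetQueryExecution reports the SUCCEEDED state for the returned ID. Calling this from a Lambda-backed custom resource would tie it to stack creation; that wiring is not shown here.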

AWS Glue Custom Classifiers JSON Path

时光怂恿深爱的人放手 submitted on 2019-12-01 19:28:09
I have a set of JSON data files that look like this:
[ {"client":"toys", "filename":"toy1.csv", "file_row_number":1, "secondary_db_index":"4050", "processed_timestamp":1535004075, "processed_datetime":"2018-08-23T06:01:15+0000", "entity_id":"4050", "entity_name":"4050", "is_emailable":false, "is_txtable":false, "is_loadable":false} ]
I have created a Glue crawler with a custom classifier whose JSON path is $[*]. Glue returns the correct schema with the columns correctly identified. However, when I query the data in Athena, all the data lands in the first column and the rest of the
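A common explanation for all the data landing in the first column is a mismatch in file layout. Athena's JSON SerDes expect each record on its own line (newline-delimited JSON), while these files contain a single top-level array. The crawler's custom classifier shapes the schema Glue infers, not how Athena parses the files at query time.

Assuming that is the cause here, one workaround is to rewrite the files as one object per line before querying. A minimal Python sketch follows; the function names are hypothetical.

```python
import json


def json_array_to_json_lines(text: str) -> str:
    """Rewrite a JSON document whose top level is an array as newline-delimited JSON."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact object per line, so each line can be read as one row.
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert one file on disk; paths are supplied by the caller."""
    with open(src_path, encoding="utf-8") as src:
        converted = json_array_to_json_lines(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

Writing the converted files under a separate S3 prefix keeps the table's location free of the original array-formatted files.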

Presto SQL: Changing time zones using a time zone string returned by a query is not working

霸气de小男生 submitted on 2019-12-01 09:55:52
Question: I am connecting to AWS Athena through the Mode Analytics platform and querying a table using Athena's query engine (which is based on Presto 0.172). The table public.zones stores time zone information for some regions I am interested in, in a varchar column called time_zone. For example, if I run:
SELECT time_zone FROM public.zones LIMIT 4;
I get (as expected):
time_zone
----------
US/Pacific
US/Eastern
US/Eastern
US/Eastern
I can run this test query: SELECT timestamp '2017-06-01 12