amazon-redshift

Talend job running with slow transfer rate

Posted by 若如初见 on 2019-12-12 05:17:20
Question: I am new to Talend and have very limited experience with it. My task requires performing a daily incremental update from a SQL RDS instance to Redshift, but my job runs with a very slow transfer rate. Details are listed below. My SQL RDS query is SELECT * FROM test.ankit2 WHERE id > (SELECT COALESCE(max(id), 0) AS id FROM test.stagetable). ankit2 is the table in my RDS and stagetable is the table in Redshift, and I used a tMap component to link the RDS input component to the Redshift output component.
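Since ankit2 lives in RDS and stagetable lives in Redshift, the MAX(id) lookup and the source extract normally have to run as two separate steps, with the value passed between them. A minimal sketch of that split, assuming the result of the first query is stored in a Talend context variable (the variable name here is purely illustrative):

-- Step 1: run against Redshift and store the result, e.g. in context.max_id (illustrative name)
SELECT COALESCE(MAX(id), 0) AS max_id FROM test.stagetable;

-- Step 2: run against the MySQL RDS source, substituting the stored value
SELECT * FROM test.ankit2 WHERE id > /* value of context.max_id */ 0;

With that split in place, the remaining transfer-rate question often comes down to whether the Redshift output component inserts rows one by one over JDBC or bulk-loads through S3 and COPY.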

AWS Lambda and Multipart Upload to/from S3

Posted by 人盡茶涼 on 2019-12-12 04:33:56
Question: We are using Lambda to move files from S3 to our Redshift. The data is placed in S3 using an UNLOAD command run directly from the data provider's Redshift. It comes in 10 different parts that, because they are written in parallel, sometimes complete at different times. I want the Lambda trigger to wait until all the data has been completely uploaded before firing the trigger that imports the data into my Redshift. There is an event option in Lambda called "Complete Multipart Upload." Does the UNLOAD function count as…
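One pattern often suggested for this situation is to have the producing side add MANIFEST to its UNLOAD and to key the S3 event on the manifest object, since the manifest is only written once the data files are in place. A rough sketch of such an UNLOAD, with the bucket, prefix, and role ARN as placeholders:

UNLOAD ('SELECT * FROM source_table')
TO 's3://example-bucket/export/part_'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-unload-role'
GZIP
MANIFEST;
-- The manifest lands at s3://example-bucket/export/part_manifest after the data files,
-- so an s3:ObjectCreated:* notification filtered on that key fires only once the export is complete.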

How to create table structure in RDS similar to source table in Redshift and vice-versa

Posted by 可紊 on 2019-12-12 04:27:32
Question: I am trying to automate table creation in a database (RDS/Redshift) using the structure of a source table that is also present in either Redshift or RDS. The databases I am working with are RDS (MySQL) and Redshift (PostgreSQL). Approaches: write all the mappings between the data types of RDS and Redshift and handle all the edge cases, or alternatively use some API/library. Questions: I found "CREATE TABLE AS" in Redshift, but didn't find anything for RDS (MySQL). Is there anything similar in RDS? Source: https…
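For the MySQL half of that question specifically, MySQL does have statements that copy an existing table's structure; a quick sketch with illustrative table names:

-- Copies the column definitions and indexes, but no rows
CREATE TABLE new_table LIKE source_table;

-- Copies the data as well, deriving the structure from the SELECT (indexes are not carried over)
CREATE TABLE new_table_copy AS SELECT * FROM source_table;

Neither statement crosses engines, though, so for Redshift-to-MySQL (or the reverse) some data-type mapping layer is still needed, as the question anticipates.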

Is there a correct way to parse output in Bash from a command?

Posted by 妖精的绣舞 on 2019-12-12 04:25:34
Question: I'm using psql to run a few commands, where the last one is something similar to select max(id) from tablename. I'm trying to get the id of the last row inserted, and it returns the correct value (7) in this scenario:
Begin Informatica Script
SET
INSERT 0 1
 max
-----
   7
(1 row)
Redshift Delta copy completed at: 04/10/17 17:45:21
END
However, I'm trying to parse it and have no idea what to do. Is there a way to limit the output to just the value 7? If not, how do I grab just the number?

Amazon CloudWatch is not returning Redshift metrics

Posted by 痴心易碎 on 2019-12-12 03:31:59
Question: Below is part of my Python script to retrieve Redshift's PercentageDiskSpaceUsed metric. I have changed my code from the previous post. When I write the script using boto3 it does not work, but it does work when written using boto2. I am pasting both scripts; please check and correct.
Script using boto2:
from boto.ec2 import cloudwatch
from datetime import datetime, timedelta
import boto
REDSHIFT_REGION = 'ap-south-1'
connection = boto.ec2.cloudwatch.connect_to_region(REDSHIFT_REGION)
def set_time…

Copying data from S3 to Redshift - Access denied

Posted by 蹲街弑〆低调 on 2019-12-12 03:27:58
Question: We are having trouble copying files from S3 to Redshift. The S3 bucket in question allows access only from a VPC in which we have a Redshift cluster. We have no problems copying from public S3 buckets. We tried both the key-based and the IAM-role-based approach, but the result is the same: we keep getting 403 Access Denied from S3. Any idea what we are missing? Thanks.
EDIT: Queries we use:
1. (using IAM role):
copy redshift_table from 's3://bucket/file.csv.gz' credentials 'aws_iam_role=arn:aws:iam:…
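For reference, a typical IAM-role COPY of that shape looks roughly like the following (bucket, ARN, and region are placeholders):

copy redshift_table
from 's3://example-bucket/file.csv.gz'
iam_role 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
gzip
region 'ap-south-1';  -- only needed when the bucket is in a different region from the cluster

When a bucket policy only allows requests coming from a specific VPC, a common missing piece is that the cluster's COPY traffic does not actually traverse that VPC unless Enhanced VPC Routing is enabled (together with an S3 VPC endpoint), so S3 sees the request as coming from outside the VPC and returns 403.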

How do I do a SQL range for date and time dimensions?

Posted by 淺唱寂寞╮ on 2019-12-12 03:21:48
Question: I have a reporting table that has saledateid (days) and saletimeid (minutes) dimensions. I need to select a range that may span less or more than one day. If the range is more than one day (e.g. (1705, 901) -> (1708, 1140) to represent 2015-09-01 15:00:00 -> 2015-09-04 18:59:00) I can use:
WHERE (saledateid = 1705 AND saletimeid >= 901)
   OR (saledateid BETWEEN 1705 + 1 AND 1708 - 1)
   OR (saledateid = 1708 AND saletimeid <= 1140)
However, this doesn't work when the saledateid values are the same.
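One way to sidestep the same-day/multi-day branching entirely, assuming saletimeid never exceeds four digits (minutes within a day go up to at most 1440), is to fold the two dimensions into a single sortable number; a sketch:

WHERE saledateid BETWEEN 1705 AND 1708                      -- keeps sort-key / zone-map pruning on saledateid
  AND saledateid * 10000 + saletimeid
      BETWEEN 1705 * 10000 + 901 AND 1708 * 10000 + 1140

The same expression works when both endpoints fall on the same saledateid, since the combined value is still strictly increasing within a day.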

Extract data in parentheses with Amazon Redshift

Posted by 99封情书 on 2019-12-11 18:49:24
Question: I have these strings in a table:
LEASE THIRD-CCP-MANAGER (AAAA)
THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)
When I extract the information I should get:
AAAA
BBBB/AAAA
That is, I have to look for the pattern and extract what is inside the parentheses. I'm trying to use the REGEXP_SUBSTR function. In Amazon Redshift, how do I extract the characters in parentheses? Thanks.
Answer 1: Use position to find the index of the opening parenthesis ( and then substring: select substring(position('(' in 'LEASE THIRD-CCP…
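An alternative sketch that stays with REGEXP_SUBSTR, as the question intended: match the whole parenthesised group with character classes (which sidesteps backslash-escaping concerns) and trim the parentheses off afterwards. The column and table names here are illustrative:

SELECT btrim(regexp_substr(description, '[(][^)]*[)]'), '()') AS extracted
FROM my_table;
-- '[(][^)]*[)]' matches the first '(...)' group in the string;
-- btrim then strips the parentheses from both ends, leaving e.g. 'AAAA' or 'BBBB / AAAA'.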

Create a lapsed concept based on logic across every row per ID

Posted by 落花浮王杯 on 2019-12-11 17:44:40
Question: I am trying to get to a lapsed_date, which is when there are >12 weeks (i.e. 84 days) for a given ID between:
1) onboarded_at and current_date (if no applied_at exists) - this means lapsed_now if >84 days
2) onboarded_at and min(applied_at) (if one exists)
3) each consecutive applied_at
4) max(applied_at) and current_date - this means lapsed_now if >84 days
If there are multiple instances where the ID lapsed, then we only show the latest lapsed date. The attempt I have works for most cases but not…
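Without the asker's actual tables it is hard to be exact, but the general shape of this logic in Redshift is usually: build one row per event per ID (onboarding, each application, and today), compute the gap to the previous event with a window function, and keep the latest gap that exceeds 84 days. A rough sketch under those assumptions; the table and column names are guesses based on the question:

WITH events AS (
    SELECT id, onboarded_at AS event_date FROM users          -- hypothetical table holding onboarded_at
    UNION ALL
    SELECT id, applied_at FROM applications                   -- hypothetical table with one row per applied_at
    UNION ALL
    SELECT id, current_date FROM users                        -- "today" closes the final interval
),
gaps AS (
    SELECT id,
           event_date,
           LAG(event_date) OVER (PARTITION BY id ORDER BY event_date) AS prev_event_date
    FROM events
)
SELECT id,
       -- 84 days after the start of the latest over-long gap; adjust if "lapsed date" is defined differently
       MAX(DATEADD(day, 84, prev_event_date)) AS lapsed_date
FROM gaps
WHERE prev_event_date IS NOT NULL
  AND DATEDIFF(day, prev_event_date, event_date) > 84
GROUP BY id;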

Netezza built-in AGE function as a UDF in Redshift

Posted by 时间秒杀一切 on 2019-12-11 17:16:25
Question: I'm trying to implement the Netezza AGE function in Redshift as a UDF. I am able to get the correct answer in Python (Spyder IDE, Python 3.6), but when I execute it in Redshift as a UDF it gives me incorrect output. I've tried executing it as select AGE_UDF('1994-04-04 20:10:52','2018-09-24 11:31:05'); in Redshift. Here is the code used in the Redshift UDF:
CREATE OR REPLACE FUNCTION AGE_UDF (START_DATE TIMESTAMP, END_DATE TIMESTAMP)
RETURNS varchar(100) stable
AS $$
from datetime import datetime
from dateutil…