amazon-redshift

lag function to get the last different value (Redshift)

Submitted by 拟墨画扇 on 2019-12-24 10:17:11
Question: I have sample data as below and want to get the desired output; please help me with some ideas. I want the output of prev_diff_value for the 3rd and 4th rows to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.

with dat as (
    select 1 as id, '20150101 02:02:50'::timestamp as dt union all
    select 1, '20150101 03:02:50'::timestamp union all
    select 1, '20150101 04:02:50'::timestamp union all
    select 1, '20150102 02:02:50'::timestamp union all
    select 1, '20150102 02:02:50'::timestamp union all
    select 1,
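A sketch of one way to get the previous different day, assuming the table dat (id, dt) continues in the pattern of the truncated sample above and that prev_diff_value means the most recent earlier day that differs from the current row's day. The idea is to expose the previous day only on rows where the day changes, then carry it forward with LAG ... IGNORE NULLS, which Redshift's LAG supports:

with dat_days as (
    select id,
           dt,
           date_trunc('day', dt) as day,
           lag(date_trunc('day', dt)) over (partition by id order by dt) as prev_day
    from dat
),
changes as (
    -- keep the previous day only on rows where the day actually changes
    select id,
           dt,
           case when day <> prev_day then prev_day end as prev_day_at_change
    from dat_days
)
select id,
       dt,
       -- carry the last change point forward to the rows that follow it
       coalesce(prev_day_at_change,
                lag(prev_day_at_change) ignore nulls
                    over (partition by id order by dt)) as prev_diff_value
from changes
order by id, dt;

With the sample rows, prev_diff_value is NULL for the 2015-01-01 rows and 2015-01-01 00:00:00 for the 2015-01-02 rows, which matches the desired output.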

handle locks in redshift

Submitted by 删除回忆录丶 on 2019-12-24 08:46:40
Question: I have a Python script that executes multiple SQL scripts (one after another) in Redshift. Some of the tables in these SQL scripts can be queried multiple times. For example, table t1 can be SELECTed in one script and dropped/recreated in another script. This whole process runs in one transaction. Now, sometimes, I get a "deadlock detected" error and the whole transaction is rolled back. If there is a deadlock on a table, I would like to wait for the table to be released and then
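The deadlock typically comes from two transactions taking locks on the same tables in different orders. A sketch of two mitigations, using only documented Redshift objects (t1 is the table from the example above, t2 is a placeholder): inspect current locks via STV_LOCKS, and acquire explicit locks in a fixed order at the start of the transaction so that conflicting scripts queue on the lock instead of deadlocking mid-way:

-- see which transactions currently hold or wait for table locks
-- (superusers see all rows; regular users see only their own)
select table_id, last_update, lock_owner, lock_owner_pid, lock_status
from stv_locks
order by last_update;

-- lock every table the transaction will touch, always in the same order,
-- before running the SELECT / DROP / CREATE statements
begin;
lock table t1, t2;   -- t2 is a placeholder for the other tables involved
-- ... the SQL scripts for this transaction run here ...
commit;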

Change order of sortkey to descending

Submitted by 北城余情 on 2019-12-24 08:31:04
Question: We have a 2-node Redshift cluster with a table of around 100M records. We marked a timestamp column as the sortkey, because the queries are always time-restricted. However, our use case requires the results to be sorted in descending order (on the sortkey). After some benchmarking, we noticed that the average query time was around 10s. However, when the reverse ordering was removed, the average time came down to under 1s. Is it possible to reverse the order of the sortkey to be of
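Redshift sort keys have no ASC/DESC attribute; blocks are always stored in ascending sort-key order. One workaround (a sketch with hypothetical names, since the question does not show the schema) is a deep copy into a table sorted on the negated epoch of the timestamp, so that ascending sort-key order corresponds to descending time:

-- hypothetical: events(event_ts, ...) is the 100M-row table
create table events_desc
  sortkey (ts_neg)
as
select e.*,
       -extract(epoch from e.event_ts)::bigint as ts_neg
from events e;

Queries then filter and ORDER BY ts_neg ascending, which returns rows in descending event_ts order without a separate sort step.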

Move Redshift from Subnet 1 to Subnet 2 within the same VPC

Submitted by 有些话、适合烂在心里 on 2019-12-24 07:19:09
Question: I have a VPC which has 2 private subnets, i.e. subnet 1 and subnet 2. My Redshift cluster sits in subnet 2 and has data. I want to move the Redshift cluster from subnet 2 to subnet 1 within the same VPC (which can be done easily). But I have a few doubts related to data migration: does the data migration happen automatically without any data loss, or do I need to take a backup, create the cluster in subnet 1, and then push the backed-up data to the new cluster? Any leads would be appreciated. Answer 1: From

S3 Query Exception (Fetch)

Submitted by 本小妞迷上赌 on 2019-12-24 07:18:16
Question: I have uploaded data from Redshift to S3 in Parquet format and created the data catalog in Glue. I have been able to query the table from Athena, but when I create the external schema on Redshift and try to query the table, I get the error below:

ERROR: S3 Query Exception (Fetch)
DETAIL:
-----------------------------------------------
error: S3 Query Exception (Fetch)
code: 15001
context: Task failed due to an internal error. File 'https://s3-eu-west-1.amazonaws.com/bucket/folder
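For reference, a minimal sketch of the external schema setup the question describes, with placeholder Glue database and IAM role names; the role attached to the cluster needs both Glue and S3 read access, and the REGION should match the bucket shown in the error (eu-west-1 here):

create external schema spectrum_schema
from data catalog
database 'my_glue_database'                                 -- placeholder
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'    -- placeholder
region 'eu-west-1';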

SQL: Gaps and Islands Problem - Dates not consecutive, causing inaccurate rank

Submitted by 霸气de小男生 on 2019-12-24 06:22:58
Question: This is a follow-up on the other question I asked.

Quarter    Segment    Customer    Counter
Q1 2018    A          A1          1
Q2 2018    A          A1          2
Q3 2018    A          A1          3
Q4 2018    B          A1          1
Q1 2019    B          A1          2
Q2 2019    A          A1          1
Q1 2020    A          A1          *1*   I want 1, not 2, here because it's not consecutive (we don't have Q3 & Q4 2019)
Q2 2020    A          A1          *2*   I want 2, not 3, here because it reset in Q1 2020

The query below works if the dates are consecutive. How would I adjust the query to get what I'm looking for? I tried adding a new column that is a 1-row lag, and
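A sketch of one way to make the counter reset on gaps, assuming a table history(customer, segment, quarter_start) where quarter_start is the first day of the quarter (names reconstructed from the sample, not from the original question). A new island starts whenever the segment changes or the previous row is not exactly one quarter earlier; the counter is then a row number within the island:

with flagged as (
    select customer,
           segment,
           quarter_start,
           case
               when lag(segment)       over (partition by customer order by quarter_start) = segment
                and lag(quarter_start) over (partition by customer order by quarter_start)
                      = dateadd(quarter, -1, quarter_start)
               then 0
               else 1
           end as new_island
    from history
),
islands as (
    -- running sum of the flags assigns an island id to every row
    select *,
           sum(new_island) over (partition by customer order by quarter_start
                                 rows unbounded preceding) as island_id
    from flagged
)
select customer,
       segment,
       quarter_start,
       row_number() over (partition by customer, island_id order by quarter_start) as counter
from islands
order by customer, quarter_start;

With the sample data this yields 1 for Q1 2020 (the Q3/Q4 2019 gap starts a new island) and 2 for Q2 2020.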

Redshift Querying: error xx000 disk full redshift

Submitted by 拟墨画扇 on 2019-12-24 03:20:35
Question: I executed the query below:

select employee_name, max(employee_dept) as dept
from employeeDB
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000

and received the error ERROR: XX000: Disk Full. Upon investigation, by executing the query below, I found that I have 941 GB of free space and 5000 GB of used space:

select sum(capacity)/1024 as capacity_gbytes,
       sum(used)/1024 as used_gbytes,
       (sum(capacity) - sum(used))/1024 as free_gbytes
from
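One common reason for Disk Full despite apparent free space is that intermediate results spill to disk per node and per slice, so a very large or heavily skewed table can fill one node while the cluster total still shows room. A sketch of a follow-up check using the SVV_TABLE_INFO system view:

-- largest and most skewed tables; a high skew_rows means one node holds far
-- more rows than another and can hit Disk Full before the others
select "table",
       size      as size_1mb_blocks,
       tbl_rows,
       skew_rows,
       pct_used
from svv_table_info
order by size desc
limit 20;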

Redshift COPY using JSONPath for missing array/fields

Submitted by て烟熏妆下的殇ゞ on 2019-12-24 01:23:54
Question: I am using the COPY command to load a JSON dataset from S3 into a Redshift table. The data loads partially, but records with missing data (a key/value or array) are ignored; i.e., from the example below only the first record gets loaded. Query:

COPY address
from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
maxerror as 250
json 's3:/mybucket/address_jsonpath.json';

My question is: how can I load all the records from address.json
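One commonly suggested alternative is to drop the JSONPaths file and use json 'auto', which maps JSON object keys to column names and loads columns with no matching key as NULL rather than rejecting the record; a sketch of that variant of the question's COPY (it assumes the column names of address line up with the JSON field names):

COPY address
from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
json 'auto'
maxerror as 250;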

Python: write DataFrame to AWS Redshift using psycopg2

Submitted by 吃可爱长大的小学妹 on 2019-12-24 01:16:47
Question: I want to update a table in AWS Redshift on a daily basis. What I plan to do is to delete the data/rows in a public table using Python psycopg2 first, then insert the data from a pandas DataFrame into that table.

import psycopg2
import pandas as pd

con = psycopg2.connect(dbname= My_Credential.....)
cur = con.cursor()

sql = """
DELETE FROM tableA
"""
cur.execute(sql)
con.commit()

The code above does the delete, but I don't know how to write the Python code to insert My_Dataframe into tableA. TableA size is
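On the SQL side, what the script needs to issue after the DELETE is an ordinary multi-row INSERT inside the same transaction; a sketch, assuming hypothetical columns col1 and col2 in tableA that match My_Dataframe (on the Python side, psycopg2's cursor.executemany can supply one parameterized VALUES tuple per DataFrame row, or a bulk path such as COPY from S3 can be used for large loads):

begin;

delete from tableA;

-- one VALUES tuple per DataFrame row; column names are placeholders
insert into tableA (col1, col2)
values
    ('row1_value1', 'row1_value2'),
    ('row2_value1', 'row2_value2');

commit;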