amazon-redshift

lag function to get the last different value (Redshift)

Submitted by 拟墨画扇 on 2019-12-24 10:17:11
Question: I have sample data as below and want to get the desired output; please help me with some ideas. I want the output of prev_diff_value for the 3rd and 4th rows to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.

with dat as (
    select 1 as id, '20150101 02:02:50'::timestamp as dt union all
    select 1, '20150101 03:02:50'::timestamp union all
    select 1, '20150101 04:02:50'::timestamp union all
    select 1, '20150102 02:02:50'::timestamp union all
    select 1, '20150102 02:02:50'::timestamp union all
    select 1,
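A sketch of one way to get the previous different day, assuming the table dat (id, dt) continues in the pattern of the truncated sample above and that prev_diff_value means the most recent earlier day that differs from the current row's day. The idea is to expose the previous day only on rows where the day changes, then carry it forward with LAG ... IGNORE NULLS, which Redshift's LAG supports:

with dat_days as (
    select id,
           dt,
           date_trunc('day', dt) as day,
           lag(date_trunc('day', dt)) over (partition by id order by dt) as prev_day
    from dat
),
changes as (
    -- keep the previous day only on rows where the day actually changes
    select id,
           dt,
           case when day <> prev_day then prev_day end as prev_day_at_change
    from dat_days
)
select id,
       dt,
       -- carry the last change point forward to the rows that follow it
       coalesce(prev_day_at_change,
                lag(prev_day_at_change) ignore nulls
                    over (partition by id order by dt)) as prev_diff_value
from changes
order by id, dt;

With the sample rows, prev_diff_value is NULL for the 2015-01-01 rows and 2015-01-01 00:00:00 for the 2015-01-02 rows, which matches the desired output.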

handle locks in redshift

Submitted by 删除回忆录丶 on 2019-12-24 08:46:40
Question: I have a Python script that executes multiple SQL scripts (one after another) in Redshift. Some of the tables in these SQL scripts can be queried multiple times. For example, table t1 can be SELECTed in one script and dropped/recreated in another script. This whole process runs in one transaction. Now, sometimes, I get a "deadlock detected" error and the whole transaction is rolled back. If there is a deadlock on a table, I would like to wait for the table to be released and then
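The deadlock typically comes from two transactions taking locks on the same tables in different orders. A sketch of two mitigations, using only documented Redshift objects (t1 is the table from the example above, t2 is a placeholder): inspect current locks via STV_LOCKS, and acquire explicit locks in a fixed order at the start of the transaction so that conflicting scripts queue on the lock instead of deadlocking mid-way:

-- see which transactions currently hold or wait for table locks
-- (superusers see all rows; regular users see only their own)
select table_id, last_update, lock_owner, lock_owner_pid, lock_status
from stv_locks
order by last_update;

-- lock every table the transaction will touch, always in the same order,
-- before running the SELECT / DROP / CREATE statements
begin;
lock table t1, t2;   -- t2 is a placeholder for the other tables involved
-- ... the SQL scripts for this transaction run here ...
commit;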

Change order of sortkey to descending

Submitted by 北城余情 on 2019-12-24 08:31:04
Question: We have a 2-node Redshift cluster with a table of around 100M records. We marked a timestamp column as the sortkey, because the queries are always time-restricted. However, our use case requires the results to be sorted in descending order (on the sortkey). After some benchmarking, we noticed that the average query time was around 10s. However, when the reverse ordering was removed, the average time came down to under 1s. Is it possible to reverse the order of the sortkey to be of
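Redshift sort keys have no ASC/DESC attribute; blocks are always stored in ascending sort-key order. One workaround (a sketch with hypothetical names, since the question does not show the schema) is a deep copy into a table sorted on the negated epoch of the timestamp, so that ascending sort-key order corresponds to descending time:

-- hypothetical: events(event_ts, ...) is the 100M-row table
create table events_desc
  sortkey (ts_neg)
as
select e.*,
       -extract(epoch from e.event_ts)::bigint as ts_neg
from events e;

Queries then filter and ORDER BY ts_neg ascending, which returns rows in descending event_ts order without a separate sort step.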

Move Redshift from Subnet 1 to Subnet 2 within the same VPC

Submitted by 有些话、适合烂在心里 on 2019-12-24 07:19:09
Question: I have a VPC which has 2 private subnets, i.e. subnet 1 and subnet 2. My Redshift cluster sits in subnet 2 and has data. I want to move the Redshift cluster from subnet 2 to subnet 1 within the same VPC (which can be done easily). But I have a few doubts related to data migration: does the data migration happen automatically without any data loss, or do I need to take a backup, create the cluster in subnet 1, and then push the backed-up data to the new cluster? Any leads would be appreciated. Answer 1: From

S3 Query Exception (Fetch)

Submitted by 本小妞迷上赌 on 2019-12-24 07:18:16
Question: I have uploaded data from Redshift to S3 in Parquet format and created the data catalog in Glue. I have been able to query the table from Athena, but when I create the external schema on Redshift and try to query the table, I get the error below:

ERROR: S3 Query Exception (Fetch)
DETAIL:
-----------------------------------------------
error: S3 Query Exception (Fetch)
code: 15001
context: Task failed due to an internal error. File 'https://s3-eu-west-1.amazonaws.com/bucket/folder
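For reference, a minimal sketch of the external schema setup the question describes, with placeholder Glue database and IAM role names; the role attached to the cluster needs both Glue and S3 read access, and the REGION should match the bucket shown in the error (eu-west-1 here):

create external schema spectrum_schema
from data catalog
database 'my_glue_database'                                 -- placeholder
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'    -- placeholder
region 'eu-west-1';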

SQL: Gaps and Islands Problem - Dates not consecutive, causing inaccurate rank

Submitted by 霸气de小男生 on 2019-12-24 06:22:58
Question: This is a follow-up on the other question I asked.

Quarter    Segment    Customer    Counter
Q1 2018    A          A1          1
Q2 2018    A          A1          2
Q3 2018    A          A1          3
Q4 2018    B          A1          1
Q1 2019    B          A1          2
Q2 2019    A          A1          1
Q1 2020    A          A1          *1*   I want 1, not 2, here because it's not consecutive (we don't have Q3 & Q4 2019)
Q2 2020    A          A1          *2*   I want 2, not 3, here because it reset in Q1 2020

The query below works if the dates are consecutive. How would I adjust the query to get what I'm looking for? I tried adding a new column that is a 1-row lag, and
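A sketch of one way to make the counter reset on gaps, assuming a table history(customer, segment, quarter_start) where quarter_start is the first day of the quarter (names reconstructed from the sample, not from the original question). A new island starts whenever the segment changes or the previous row is not exactly one quarter earlier; the counter is then a row number within the island:

with flagged as (
    select customer,
           segment,
           quarter_start,
           case
               when lag(segment)       over (partition by customer order by quarter_start) = segment
                and lag(quarter_start) over (partition by customer order by quarter_start)
                      = dateadd(quarter, -1, quarter_start)
               then 0
               else 1
           end as new_island
    from history
),
islands as (
    -- running sum of the flags assigns an island id to every row
    select *,
           sum(new_island) over (partition by customer order by quarter_start
                                 rows unbounded preceding) as island_id
    from flagged
)
select customer,
       segment,
       quarter_start,
       row_number() over (partition by customer, island_id order by quarter_start) as counter
from islands
order by customer, quarter_start;

With the sample data this yields 1 for Q1 2020 (the Q3/Q4 2019 gap starts a new island) and 2 for Q2 2020.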

Redshift Querying: error xx000 disk full redshift

Submitted by 拟墨画扇 on 2019-12-24 03:20:35
Question: I executed the query below:

select employee_name, max(employee_dept) as dept
from employeeDB
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000

and received the error ERROR: XX000: Disk Full. Upon investigation, by executing the query below, I found that I have 941 GB of free space and 5000 GB of used space:

select sum(capacity)/1024 as capacity_gbytes,
       sum(used)/1024 as used_gbytes,
       (sum(capacity) - sum(used))/1024 as free_gbytes
from
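One common reason for Disk Full despite apparent free space is that intermediate results spill to disk per node and per slice, so a very large or heavily skewed table can fill one node while the cluster total still shows room. A sketch of a follow-up check using the SVV_TABLE_INFO system view:

-- largest and most skewed tables; a high skew_rows means one node holds far
-- more rows than another and can hit Disk Full before the others
select "table",
       size      as size_1mb_blocks,
       tbl_rows,
       skew_rows,
       pct_used
from svv_table_info
order by size desc
limit 20;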

Redshift COPY using JSONPath for missing array/fields

Submitted by て烟熏妆下的殇ゞ on 2019-12-24 01:23:54
Question: I am using the COPY command to load a JSON dataset from S3 into a Redshift table. The data loads partially, but records with missing data (a key/value or array) are ignored; i.e., from the example below only the first record gets loaded. Query:

COPY address
from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
maxerror as 250
json 's3:/mybucket/address_jsonpath.json';

My question is: how can I load all the records from address.json
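One commonly suggested alternative is to drop the JSONPaths file and use json 'auto', which maps JSON object keys to column names and loads columns with no matching key as NULL rather than rejecting the record; a sketch of that variant of the question's COPY (it assumes the column names of address line up with the JSON field names):

COPY address
from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
json 'auto'
maxerror as 250;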

Python: write DataFrame to AWS Redshift using psycopg2

Submitted by 吃可爱长大的小学妹 on 2019-12-24 01:16:47
Question: I want to update a table in AWS Redshift on a daily basis. What I plan to do is to delete the data/rows in a public table using Python psycopg2 first, then insert the data from a pandas DataFrame into that table.

import psycopg2
import pandas as pd

con = psycopg2.connect(dbname= My_Credential.....)
cur = con.cursor()

sql = """
DELETE FROM tableA
"""
cur.execute(sql)
con.commit()

The code above does the delete, but I don't know how to write the Python code to insert My_Dataframe into tableA. TableA size is
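On the SQL side, what the script needs to issue after the DELETE is an ordinary multi-row INSERT inside the same transaction; a sketch, assuming hypothetical columns col1 and col2 in tableA that match My_Dataframe (on the Python side, psycopg2's cursor.executemany can supply one parameterized VALUES tuple per DataFrame row, or a bulk path such as COPY from S3 can be used for large loads):

begin;

delete from tableA;

-- one VALUES tuple per DataFrame row; column names are placeholders
insert into tableA (col1, col2)
values
    ('row1_value1', 'row1_value2'),
    ('row2_value1', 'row2_value2');

commit;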