amazon-redshift

Redshift UPDATE prohibitively slow

会有一股神秘感。 submitted on 2019-12-09 12:28:23
Question: I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example: SELECT col1, col2, col3 FROM SOMETABLE WHERE col1 = 'a value of col1' AND col2 = 12; The above query returns in less than a second, because I have sortkeys on col1 and col2. There is only one row that meets these criteria, so the result set is just one row. However, if I run: UPDATE
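The question is cut off at the UPDATE statement. As a rough illustration only, an update over the same filter (the target column and new value are hypothetical) would look something like:

UPDATE sometable
SET col3 = 'some new value'
WHERE col1 = 'a value of col1'
  AND col2 = 12;

Redshift applies an UPDATE as a delete plus insert against its columnar storage and rewrites the affected blocks, which is why even a single-row update can be far more expensive than the equivalent sorted SELECT.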

Amazon Redshift at 100% disk usage due to VACUUM query

狂风中的少年 submitted on 2019-12-09 09:42:26
Question: Reading the Amazon Redshift documentation, I ran a VACUUM on a certain 400GB table which had never been vacuumed before, in an attempt to improve query performance. Unfortunately, the VACUUM has caused the table to grow to 1.7TB (!!) and has brought the cluster's disk usage to 100%. I then tried to stop the VACUUM by running a CANCEL query in the superuser queue (you enter it by running "set query_group='superuser';"), but although the query didn't raise an error, this had no effect on the
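A minimal sketch of the cancel procedure described above (the process id 12345 is hypothetical and would be taken from the lookup query):

set query_group to 'superuser';
select pid, status, trim(query) as query_text from stv_recents where status = 'Running';  -- locate the running VACUUM
cancel 12345;  -- hypothetical pid taken from the query above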

Transfer data from Vertica to Redshift using Apache NiFi

不问归期 submitted on 2019-12-08 19:30:34
I want to transfer data from Vertica to Redshift using Apache NiFi. Which processors and configuration do I need to set? If Vertica and Redshift have "well-behaved" JDBC drivers, you can set up a DBCPConnectionPool for each, then a SQL processor such as ExecuteSQL, QueryDatabaseTable, or GenerateTableFetch (the latter of which generates SQL for use in ExecuteSQL). These will get your records into Avro format; then (prior to NiFi 1.2.0) you can use ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL to get your records inserted into Redshift. In NiFi 1.2.0, you can set up an AvroReader

How to deal with Redshift's lack of support for Arrays

瘦欲@ submitted on 2019-12-08 18:13:43
Question: Redshift does not support Arrays, however my source database has several Array columns that I need in Redshift. How should this field type be handled when trying to migrate it into Redshift? Answer 1: While Redshift does not support arrays in the PostgreSQL sense, it provides some JSON functions you might want to have a look at: http://docs.aws.amazon.com/redshift/latest/dg/json-functions.html You can insert arrays into varchar columns: create temporary table _test (col1 varchar(20)); insert into
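The answer is truncated here; a small sketch of the approach it describes, storing the array as a JSON string in a varchar column and reading elements back with Redshift's JSON functions (the sample values are made up), might look like:

create temporary table _test (col1 varchar(64));
insert into _test values ('[1, 2, 3]');
select json_array_length(col1) as num_elements,
       json_extract_array_element_text(col1, 1) as second_element
from _test;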

How can I ensure synchronous DDL operations on a table that is being replaced?

好久不见. submitted on 2019-12-08 18:03:08
Question: I have multiple processes which are continually refreshing data in Redshift. They start a transaction, create a new table, COPY all the data from S3 into the new table, then drop the old table and rename the new table to the old table. Pseudocode: start transaction; create table foo_temp; copy into foo_temp from S3; drop table foo; rename table foo_temp to foo; commit; I have several dozen tables that I update in this way. This works well, but I would like to have multiple processes performing
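The pseudocode above, written out as actual Redshift statements (the S3 path and IAM role are hypothetical placeholders), would be roughly:

begin;
create table foo_temp (like foo);
copy foo_temp
from 's3://my-bucket/foo/'                                   -- hypothetical path
iam_role 'arn:aws:iam::123456789012:role/my-copy-role';      -- hypothetical role
drop table foo;
alter table foo_temp rename to foo;
commit;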

Can you store JSON fields on Redshift?

我只是一个虾纸丫 submitted on 2019-12-08 16:49:04
Question: Does Redshift support JSON fields, like PostgreSQL's json data type? If so, what do I do to use it? Answer 1: You can store JSON in Amazon Redshift, within a normal text field. There are functions available to extract data from JSON fields, but it is not an effective way to store data since it doesn't leverage the full capabilities of Redshift's column-based architecture. See: Amazon Redshift documentation - JSON Functions Source: https://stackoverflow.com/questions/32722687/can-you-store-json-fields
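As a small sketch of what the answer describes, JSON can be stored in a plain varchar column and values pulled out with the built-in JSON functions (the table and key names here are made up):

create table events (id int, payload varchar(max));
insert into events values (1, '{"user": "alice", "clicks": 42}');
select json_extract_path_text(payload, 'user')   as user_name,
       json_extract_path_text(payload, 'clicks') as clicks
from events;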

AWS Glue ETL job from AWS Redshift to S3 fails

拈花ヽ惹草 submitted on 2019-12-08 16:00:46
Question: I am trying out the AWS Glue service to ETL some data from Redshift to S3. The crawler runs successfully and creates the meta table in the data catalog, however when I run the ETL job (generated by AWS) it fails after around 20 minutes saying "Resource unavailable". I cannot see AWS Glue logs or error logs created in CloudWatch. When I try to view them it says "Log stream not found. The log stream jr_xxxxxxxxxx could not be found. Check if it was correctly created and retry." I would appreciate it if

Aggregating in SQL - Multiple Criteria

末鹿安然 submitted on 2019-12-08 13:26:47
Question: This is a follow-up question to Aggregating in SQL. I have the table below.

Conversion_Date  User_Name  Last_Date_Touch  Touch_Count
7/15/2017        A          6/17/2017        1
7/16/2017        B          6/24/2017        2
7/19/2017        A          6/20/2017        1
7/19/2017        C          6/29/2017        1

Grouped by Conversion_Date and User_Name, what is the sum of Touch_Count looking back 30 days from the Conversion_Date, while keeping Last_Date_Touch within that 30-day window of the Conversion_Date? For example, if I look back 30 days from the
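The question is cut off; one possible shape of the query being asked about, a self-join that sums Touch_Count over the 30 days leading up to each Conversion_Date (the table name touches is hypothetical), is:

select t1.conversion_date,
       t1.user_name,
       sum(t2.touch_count) as touch_count_30d
from touches t1
join touches t2
  on  t2.user_name = t1.user_name
  and t2.last_date_touch between dateadd(day, -30, t1.conversion_date) and t1.conversion_date
group by t1.conversion_date, t1.user_name;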

type “e” does not exist, Redshift through PostgreSQL connector in PHP CodeIgniter

坚强是说给别人听的谎言 submitted on 2019-12-08 09:07:31
Question: I'm using Redshift through the PostgreSQL connector, and I got the following error while querying in PHP CodeIgniter 3.x (PHP version 7.0). The model is as follows: $subQuery = "select max(button_history_id) as button_history_id from button_history where site_id =".$site_id." group by button_history_id "; $this->another->select('cms_group_id'); $this->another->from('button_history'); $this->another->where('button_history_id IN ('.$subQuery.")"); $this->another->where('cms_group_id !=', ''); $queryGrp =
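For reference, the SQL that this query-builder code assembles is roughly the following (123 stands in for the value of $site_id):

select cms_group_id
from button_history
where button_history_id in (
    select max(button_history_id) as button_history_id
    from button_history
    where site_id = 123
    group by button_history_id
)
and cms_group_id != '';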

Copying JSON objects with multiple layouts from S3 into Redshift

前提是你 submitted on 2019-12-08 08:42:53
Question: I have an S3 bucket with many files containing "\n"-delimited JSON objects. These JSON objects can have a few different layouts. There is a standard set of keys that is common across all the layouts. Most layouts just have a few extra keys, but some have nested JSON objects. One file can have any/all of these layouts. I have managed to define a single, basic table in Redshift and copy the data into that table, but any keys not in my table are lost. I would like to create 1 table for each
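The question is truncated. For context, a basic COPY of newline-delimited JSON into a single Redshift table, as the question describes doing, looks roughly like this (the bucket path and IAM role are hypothetical; JSON 'auto' matches keys to same-named columns and simply ignores keys that have no column, which is the data loss the question mentions):

copy my_base_table
from 's3://my-bucket/events/'                                  -- hypothetical path
iam_role 'arn:aws:iam::123456789012:role/my-redshift-copy'     -- hypothetical role
format as json 'auto';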