amazon-redshift

Aggregate case when inside non aggregate query

末鹿安然 submitted on 2019-12-11 03:55:42

Question: I have a pretty massive query that in its simplest form looks like this:

select r.rep_id, u.user_id, u.signup_date, pi.application_date, pi.management_date, aum
from table1 r
left join table2 u on r.user_id = u.user_id
left join table3 pi on u.user_id = pi.user_id

I need to add one more condition that gives me the count of users with a non-null application date per rep (like: rep 1 has 3 users with filled-in application dates), and assigns each rep to a category based on that count (since 3 users, the rep is a certain status category …
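One way to get a per-rep count of non-null application dates plus a category in a single aggregate query is to `GROUP BY` the rep and wrap the count in a `CASE` expression. A minimal sketch, using SQLite in place of Redshift; the schema, sample data, and category thresholds are all illustrative, not from the original question:

```python
import sqlite3

# Hypothetical miniature versions of table1 (rep-to-user) and table3 (applications).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (rep_id INTEGER, user_id INTEGER);
    CREATE TABLE table3 (user_id INTEGER, application_date TEXT);
    INSERT INTO table1 VALUES (1, 10), (1, 11), (1, 12), (2, 20);
    INSERT INTO table3 VALUES (10, '2019-01-01'), (11, '2019-02-01'),
                              (12, NULL), (20, NULL);
""")

# COUNT(column) skips NULLs, so it counts only filled-in application dates;
# the CASE expression then buckets each rep by that count.
rows = conn.execute("""
    SELECT r.rep_id,
           COUNT(pi.application_date) AS n_applied,
           CASE WHEN COUNT(pi.application_date) >= 3 THEN 'gold'
                WHEN COUNT(pi.application_date) >= 1 THEN 'silver'
                ELSE 'bronze' END AS status_category
    FROM table1 r
    LEFT JOIN table3 pi ON r.user_id = pi.user_id
    GROUP BY r.rep_id
    ORDER BY r.rep_id
""").fetchall()
print(rows)
```

If the other per-user columns from the full query must survive, the same `COUNT(...) OVER (PARTITION BY r.rep_id)` as a window function avoids collapsing rows.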

AWS Glue Truncate Redshift Table

≯℡__Kan透↙ submitted on 2019-12-11 03:38:34

Question: I have created a Glue job that copies data from S3 (a CSV file) to Redshift. It works and populates the desired table. However, I need to purge the table during this process, as I am left with duplicate records after the process completes. I'm looking for a way to add this purge to the Glue process. Any advice would be appreciated. Thanks.

Answer 1: Have you had a look at Job Bookmarks in Glue? It's a feature for keeping track of the high-water mark, and it works with S3 only. I am not 100% sure, but it may …
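Besides bookmarks, Glue's Redshift JDBC writes accept a `preactions` entry in `connection_options`: SQL that runs against Redshift before the load, which is a common place to put a `TRUNCATE`. A sketch of building those options (the database/table names are placeholders; the dict would be passed to `glueContext.write_dynamic_frame.from_jdbc_conf` inside the actual job):

```python
# Sketch only: "preactions" SQL runs on Redshift before Glue writes the frame,
# so truncating there purges the table and avoids duplicate records.
def redshift_write_options(database, table):
    """Build connection_options for a Glue Redshift write with a pre-load TRUNCATE."""
    return {
        "database": database,
        "dbtable": table,
        "preactions": f"TRUNCATE TABLE {table};",
    }

opts = redshift_write_options("analytics", "public.target_table")
print(opts["preactions"])
```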

Pandas dataframe to Redshift psql table

人盡茶涼 submitted on 2019-12-11 03:28:27

Question: I'm trying to keep inserting data into a Redshift table. This method has often worked for me:

test_df.to_sql('MY_table', engine, index=False, if_exists='append')

As you can see, I have already used 'append' for the new insertion. But today Redshift kept giving me this error:

ProgrammingError: (psycopg2.ProgrammingError) Relation "MY_table" already exists

Do you know why?

Source: https://stackoverflow.com/questions/51312684/pandas-dataframe-to-redshift-psql-table
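A likely culprit is the mixed-case table name: Redshift folds unquoted identifiers to lower case, so `to_sql`'s existence check for `MY_table` can miss the already-created lower-cased relation and attempt a fresh `CREATE TABLE`. Using an all-lowercase name sidesteps this. A minimal sketch of the append pattern against SQLite (the Redshift case-folding behaviour itself is the assumption to verify):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"x": [1, 2]})

# With a lowercase name, repeated appends just add rows instead of
# tripping over an existence/creation mismatch.
df.to_sql("my_table", conn, index=False, if_exists="append")
df.to_sql("my_table", conn, index=False, if_exists="append")

n = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(n)
```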

Using Node 'pg' library to connect to Amazon Redshift

孤街醉人 submitted on 2019-12-11 02:54:20

Question: I am trying to connect my API to a Redshift instance using the 'pg' library, but I get the following error:

Possibly unhandled error: SET TIME ZONE is not supported
    at Connection.parseE (/Users/henrylee/WebstormProjects/project-api/node_modules/pg/lib/connection.js:534:11)

I know Redshift doesn't support setting the time zone, but I don't really care about that. I can't seem to find any help on how to get past this issue, so any input would be greatly appreciated. Thanks!

Source: https:/

How can I get the Redshift/Postgresql LAG window function to selectively exclude records?

為{幸葍}努か submitted on 2019-12-11 02:32:49

Question: I have this table in Redshift, and I'm trying to write a query for the following dataset. For items such as row #3, which are 'renewal successes' preceded by a 'sub success', I want to flag them as is_first_renewal = true. BUT they might have been preceded by any number of 'RENEWAL failures' before they succeeded, so I can't use the window function LAG for this scenario. I also cannot filter out the FAILURES, as my query needs those. The columns are:

id | phone | op | ts | pr | status | result | is_first_renewal
1 …
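One pattern that fits this constraint: compute `LAG` over a subquery that excludes failures, then join the flag back to the full row set, so the failures still appear in the output but never pollute the lag. A sketch with simplified columns and made-up sample data, using SQLite (window functions require SQLite >= 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subs (id INTEGER, phone TEXT, op TEXT, ts INTEGER, result TEXT);
    INSERT INTO subs VALUES
        (1, '555', 'SUB',     1, 'success'),
        (2, '555', 'RENEWAL', 2, 'failure'),
        (3, '555', 'RENEWAL', 3, 'failure'),
        (4, '555', 'RENEWAL', 4, 'success');
""")

# LAG runs only over non-failure rows, so any run of renewal failures is
# invisible to it; the LEFT JOIN keeps the failure rows in the result.
rows = conn.execute("""
    WITH non_failures AS (
        SELECT id,
               LAG(op) OVER (PARTITION BY phone ORDER BY ts) AS prev_op
        FROM subs
        WHERE result <> 'failure'
    )
    SELECT s.id,
           COALESCE(nf.prev_op = 'SUB' AND s.op = 'RENEWAL', 0) AS is_first_renewal
    FROM subs s
    LEFT JOIN non_failures nf ON s.id = nf.id
    ORDER BY s.ts
""").fetchall()
print(rows)
```

Row 4 (the renewal success) gets flagged even though two failures sit between it and the sub success.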

Best way to join on a range?

孤街醉人 submitted on 2019-12-11 02:26:45

Question: I think this may be a common problem that may not have an answer in every tool. Right now we are trying to use Amazon's Redshift. The only problem we have now is that we are trying to look up the zip code for an IP address. The table we have that maps IPs to cities is keyed by an IP range, with each IP converted to an integer. Example:

Start IP | End IP | City
123123   | 123129 | Rancho Cucamonga

I have tried the obvious inner join on intip >= startip and intip < endip. Does anyone know a good way to do this?
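The reason the obvious join is slow is that a range predicate gives the planner nothing to hash on, so it degenerates toward a nested loop. Outside the database, the standard trick is: because the ranges don't overlap, sort by start IP and binary-search it. A sketch (the first range is from the question; the second is made up):

```python
import bisect

ranges = sorted([(123123, 123129, "Rancho Cucamonga"),
                 (200000, 200100, "Somewhere Else")])
starts = [r[0] for r in ranges]

def city_for_ip(int_ip):
    """Find the range whose start is the greatest value <= int_ip,
    then confirm int_ip actually falls inside it (start <= ip < end,
    matching the question's join predicate)."""
    i = bisect.bisect_right(starts, int_ip) - 1
    if i >= 0 and ranges[i][0] <= int_ip < ranges[i][1]:
        return ranges[i][2]
    return None

print(city_for_ip(123125))
print(city_for_ip(150000))
```

The SQL equivalent of the same idea is to join on a truncated prefix of the IP (an equi-join the engine can hash) and apply the range check as a residual filter.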

How do I grant access to an Amazon Redshift user to read the system tables, views, logs, etc?

夙愿已清 submitted on 2019-12-11 02:12:30

Question: I have a user in Amazon Redshift. I want that user to be able to run read-only queries against the system tables: http://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html But I don't know how to grant a non-superuser access to these tables, as it does not appear to be documented anywhere by Amazon.

Answer 1: Obviously you don't want to grant superuser to another user just so they can see the system logs. Being able to monitor is a very common use case that shouldn't require …
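One commonly cited approach (worth verifying against the current AWS documentation) is the SYSLOG ACCESS property on the user, which widens how many rows of the user-visible system tables and views a non-superuser can see; the user name below is a placeholder:

```sql
-- Lets this user see all rows in user-visible system tables and views,
-- not only the rows generated by their own queries.
ALTER USER reporting_user SYSLOG ACCESS UNRESTRICTED;
```

Note the hedge: this governs visibility of rows in user-visible tables; tables that are restricted to superusers outright stay out of reach.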

importing data with commas in numeric fields into redshift

杀马特。学长 韩版系。学妹 submitted on 2019-12-11 01:27:53

Question: I am importing data into Redshift using the SQL COPY statement. The data has comma thousands separators in the numeric fields, which the COPY statement rejects. The COPY statement has a number of options to specify field separators, date and time formats, and NULL values, but I do not see anything for specifying number formatting. Do I need to preprocess the data before loading, or is there a way to get Redshift to parse the numbers correctly?

Answer 1: Import the columns as the TEXT data type in a …
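The answer's staging-table idea can be completed as: COPY the offending field into a TEXT column, then strip the separators and cast while moving it into the final table. A sketch of that second step, with illustrative table names, using SQLite to stand in for Redshift:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (amount_raw TEXT);   -- populated by COPY on Redshift
    CREATE TABLE final   (amount INTEGER);
    INSERT INTO staging VALUES ('1,234'), ('56'), ('7,890,123');
""")

# Strip thousands separators, then cast the cleaned string to a number.
conn.execute("""
    INSERT INTO final
    SELECT CAST(REPLACE(amount_raw, ',', '') AS INTEGER) FROM staging
""")
total = conn.execute("SELECT SUM(amount) FROM final").fetchone()[0]
print(total)
```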

remove duplicates from comma separated string (Amazon Redshift)

南楼画角 submitted on 2019-12-11 00:44:18

Question: I am using Amazon Redshift. I have a column in which a string is stored as comma-separated values, like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it using a query, so the result should be Private, United Healthcare. I found some solutions on Stack Overflow and learned it should be possible using regular expressions. Hence, I have tried:

SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United …
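A caveat with the regex route: a backreference pattern only collapses *adjacent* repeats, so `A, B, A` survives intact. A more robust option on Redshift is a scalar Python UDF; the function body below is plain Python and shows the order-preserving dedup logic, while wrapping it in `CREATE FUNCTION ... LANGUAGE plpythonu` on the cluster is left as a step to verify:

```python
def dedupe_csv(s):
    """Split on commas, keep the first occurrence of each item, rejoin."""
    seen = []
    for part in (p.strip() for p in s.split(",")):
        if part not in seen:
            seen.append(part)
    return ", ".join(seen)

result = dedupe_csv(
    "Private, Private, Private, Private, Private, Private, United Healthcare"
)
print(result)
```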

Why does Redshift need to do a full table scan to find the max value of the DIST/SORT key?

人走茶凉 submitted on 2019-12-11 00:11:29

Question: I'm running simple tests on Redshift to try to speed up the insertion of data into a Redshift table. One thing I noticed today is that doing something like this:

CREATE TABLE a (x int) DISTSTYLE key DISTKEY (x) SORTKEY (x);
INSERT INTO a (x) VALUES (1), (2), (3), (4);
VACUUM a; ANALYZE a;
EXPLAIN SELECT MAX(x) FROM a;

yields this query plan:

XN Aggregate  (cost=0.05..0.05 rows=1 width=4)
  ->  XN Seq Scan on a  (cost=0.00..0.04 rows=4 width=4)

I know this is only 4 rows, but it still shouldn't be doing a …
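The plan itself is Redshift-specific, but a frequently suggested rewrite (whether Redshift's planner actually exploits the sort key for it is the thing to verify with EXPLAIN on your cluster) is to replace the aggregate with `ORDER BY ... DESC LIMIT 1`, which an engine with a usable sort order can answer without aggregating every row. The two forms return the same value, sketched here with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
""")

# Aggregate form vs. top-1 form: identical result, different plan shape.
max_agg = conn.execute("SELECT MAX(x) FROM a").fetchone()[0]
max_lim = conn.execute("SELECT x FROM a ORDER BY x DESC LIMIT 1").fetchone()[0]
print(max_agg, max_lim)
```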