amazon-redshift

Convert MM/DD/YYYY to YYYYMMDD in Redshift

Asked by 梦想与她 on 2020-01-03 02:19:06
Question: I have a requirement to convert MM/DD/YYYY to YYYYMMDD in an Amazon Redshift database. This query gives me a strange result. Can someone please help me?

    select to_date('07/17/2017', 'YYYYMMDD');
    -- returns 0007-07-20

Answer 1: The format string passed to to_date must describe the input string, not the desired output. If you just wish to convert the hard-coded string into a DATE:

    select to_date('07/17/2017', 'MM/DD/YYYY');

If you have a column already stored as DATE, then use:

    to_char(fieldname, 'YYYYMMDD')

Combining the two concepts:

    select to_char(to_date('07/17/2017', 'MM/DD/YYYY'), 'YYYYMMDD');
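The two-step parse-then-format logic in the answer can be sanity-checked outside Redshift; Python's strptime/strftime follow the same principle that the parse format must match the input. This is an illustrative sketch, not Redshift code:

```python
from datetime import datetime

# Parse using a format that matches the INPUT (MM/DD/YYYY), then format
# as the desired OUTPUT (YYYYMMDD) -- the same two-step logic as
# to_char(to_date('07/17/2017', 'MM/DD/YYYY'), 'YYYYMMDD').
parsed = datetime.strptime("07/17/2017", "%m/%d/%Y")
print(parsed.strftime("%Y%m%d"))  # -> 20170717
```

Passing a mismatched format is exactly the original bug: the parser consumes the digits in the wrong order, producing a nonsense date.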

Redshift - Calculate monthly active users

Asked by 筅森魡賤 on 2020-01-03 01:42:09
Question: I have a table which looks like this:

    Date      | User_ID
    2017-1-1  | 1
    2017-1-1  | 2
    2017-1-1  | 4
    2017-1-2  | 3
    2017-1-2  | 2
    ...       | ..
    2017-2-1  | 1
    2017-2-2  | 2
    ...       | ..

I'd like to calculate the monthly active users over a rolling 30-day period. I know Redshift does not support COUNT(DISTINCT) as a window function. What can I do to get the following output?

    Date      | MAU
    2017-1-1  | 3
    2017-1-2  | 4    <- We don't want to count user_id 2 twice.
    ...       | ..
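The desired output can be stated precisely: for each date, count the distinct user IDs seen in the trailing 30-day window. A minimal pure-Python sketch of that definition, using the sample rows from the question (this illustrates the logic only; it is not a Redshift query):

```python
from datetime import date, timedelta

# (date, user_id) event rows, mirroring the table in the question.
events = [
    (date(2017, 1, 1), 1), (date(2017, 1, 1), 2), (date(2017, 1, 1), 4),
    (date(2017, 1, 2), 3), (date(2017, 1, 2), 2),
]

def mau(events, as_of, window_days=30):
    """Distinct users active in the window_days-day window ending at as_of."""
    start = as_of - timedelta(days=window_days - 1)
    return len({uid for d, uid in events if start <= d <= as_of})

print(mau(events, date(2017, 1, 1)))  # -> 3
print(mau(events, date(2017, 1, 2)))  # -> 4 (user 2 counted only once)
```

The set comprehension is what COUNT(DISTINCT) over a window would compute; a common SQL workaround is a self-join of each date against the preceding 30 days followed by a plain COUNT(DISTINCT) in the GROUP BY.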

Does case matter when 'auto' loading data from S3 into a Redshift table? [duplicate]

Asked by 孤街浪徒 on 2020-01-02 11:04:49
Question (this question already has answers here: "Loading JSON data to AWS Redshift results in NULL values" (3 answers); closed 2 years ago): I am loading data from S3 into Redshift using the COPY command, the gzip flag and the 'auto' format, as per this documentation on loading from S3, this documentation for using the 'auto' format in AWS, and this documentation for addressing compressed files. My data is in a highly nested JSON format, and I have created the Redshift table such that the column names…
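The NULL-value symptom in the linked duplicate is consistent with case-sensitive key matching: JSON object keys are case-sensitive, so a key whose case does not match the (lowercase) Redshift column name is simply not found, and the column is left NULL. A toy illustration of the underlying lookup behavior (the field names here are hypothetical):

```python
import json

row = json.loads('{"UserName": "alice", "userid": 7}')

# JSON keys are case-sensitive: a lookup with the wrong case finds
# nothing, analogous to COPY ... format json 'auto' leaving a column
# NULL when no key exactly matches the column name.
print(row.get("username"))  # -> None (case mismatch with "UserName")
print(row.get("userid"))    # -> 7
```

If the source data's casing cannot be changed, a jsonpaths file mapping each JSON path to a column explicitly is the usual way to sidestep the name matching entirely.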

What does the column skew_sortkey1 in Amazon Redshift's svv_table_info imply?

Asked by 断了今生、忘了曾经 on 2020-01-02 07:31:31
Question: Redshift's documentation (http://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) states that the definition of the column skew_sortkey1 is: "Ratio of the size of the largest non-sort key column to the size of the first column of the sort key, if a sort key is defined. Use this value to evaluate the effectiveness of the sort key." What does this imply? What does it mean if this value is large, or alternatively small? Thanks!

Answer 1: A large skew_sortkey1 value means that the ratio of…
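Taking the documented definition literally, the metric is just a size ratio between two columns of the same table. A sketch with hypothetical numbers (illustrative only, not real svv_table_info output):

```python
# Hypothetical per-column on-disk sizes, e.g. in 1 MB blocks.
largest_non_sortkey_col_blocks = 1200
first_sortkey_col_blocks = 100

# skew_sortkey1 per the documented definition: size of the largest
# non-sort-key column divided by the size of the first sort key column.
skew_sortkey1 = largest_non_sortkey_col_blocks / first_sortkey_col_blocks
print(skew_sortkey1)  # -> 12.0
```

Intuitively, the larger this ratio, the more data the sort key's ordering has to "carry": range-restricted scans skip blocks of the sort key column cheaply, but the payoff is measured against the much larger non-sort-key columns that must still be read for matching rows.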

Difference between RDS and Redshift

Asked by 断了今生、忘了曾经 on 2020-01-02 06:25:28
Question: Can anyone list the main differences between Amazon Redshift and RDS? I know both are relational databases, but why choose one over the other?

Answer 1: RDS is a managed service for Online Transaction Processing (OLTP) databases, i.e. a managed service for the usual MySQL, PostgreSQL, Oracle, MariaDB, Microsoft SQL Server or Aurora (Amazon's own relational database). Redshift is a managed service for data warehousing, i.e. column-oriented storage, typical for business-analytics workloads.

Copying only new records from AWS DynamoDB to AWS Redshift

Asked by 家住魔仙堡 on 2020-01-02 02:37:10
Question: I see there are tons of examples and documentation for copying data from DynamoDB to Redshift, but we are looking for an incremental copy process where only the new rows are copied from DynamoDB to Redshift. We will run this copy process every day, so there is no need to reload the entire Redshift table each day. Does anybody have any experience or thoughts on this topic?

Answer 1: DynamoDB has a feature (currently in preview) called Streams: "Amazon DynamoDB Streams maintains a time ordered sequence of…"
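Besides Streams, a common pattern for this kind of daily job is watermark-based incremental extraction: remember the last-synced modification timestamp and copy only rows newer than it. A schematic Python sketch of just the bookkeeping (the item shape, `updated_at` attribute, and watermark variable are hypothetical; a real pipeline would scan DynamoDB and COPY the batch into Redshift):

```python
# Each item carries an updated_at epoch timestamp; we keep a watermark
# and select only items modified after it -- the incremental-copy idea.
items = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 310},
]

def new_items(items, last_synced):
    """Items modified strictly after the previous sync watermark."""
    return [it for it in items if it["updated_at"] > last_synced]

batch = new_items(items, last_synced=200)
print([it["id"] for it in batch])  # -> [2, 3]

# After loading `batch` into Redshift, advance the watermark so the
# next daily run starts where this one left off.
last_synced = max(it["updated_at"] for it in batch)  # 310
```

The caveat with timestamp watermarks is that they only capture inserts and updates, not deletes; Streams (once generally available) also surfaces deletions.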

Efficient GROUP BY a CASE expression in Amazon Redshift/PostgreSQL

Asked by 一笑奈何 on 2020-01-01 13:27:55
Question: In analytics processing there is often a need to collapse "unimportant" groups of data into a single row in the resulting table. One way to do this is to GROUP BY a CASE expression, where unimportant groups are coalesced into a single row by having the CASE expression return a single value for them, e.g. NULL. This question is about efficient ways to perform this grouping in Amazon Redshift, which is based on ParAccel: close to PostgreSQL 8.0 in terms of functionality. As an example, …
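The GROUP BY CASE pattern itself works on any SQL engine; here is a small sqlite3 demonstration with tiny hypothetical data (Redshift-specific efficiency concerns obviously do not carry over to SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("US", 10), ("EU", 20), ("TinyA", 1), ("TinyB", 2)])

# Collapse "unimportant" regions into a single NULL group: the CASE
# expression returns the region for important groups and NULL for the
# rest, so GROUP BY merges all minor regions into one row.
rows = conn.execute("""
    SELECT CASE WHEN region IN ('US', 'EU') THEN region END AS grp,
           SUM(amount)
    FROM sales
    GROUP BY CASE WHEN region IN ('US', 'EU') THEN region END
    ORDER BY grp
""").fetchall()
print(rows)  # -> [(None, 3), ('EU', 20), ('US', 10)]
```

The two minor regions land in one NULL-keyed row with their amounts summed, which is exactly the collapse the question describes; the efficiency question is whether the engine evaluates the CASE expression once or twice per row and how the grouping distributes across nodes.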