amazon-redshift

Fill the table with data for missing dates (postgresql, redshift)

我只是一个虾纸丫 submitted on 2019-12-10 22:46:10
Question: I'm trying to fill daily data for missing dates and cannot find an answer; please help. My daily_table example:

url              | timestamp_gmt | visitors | hits  | other..
-----------------+---------------+----------+-------+-------
www.domain.com/1 | 2016-04-12    | 1231     | 23423 |
www.domain.com/1 | 2016-04-13    | 1374     | 26482 |
www.domain.com/1 | 2016-04-17    | 1262     | 21493 |
www.domain.com/2 | 2016-05-09    | 2345     | 35471 |

Expected result: I want to fill this table with data for every domain and every day
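A common pattern for this is to cross join the distinct URLs against a date spine, then left join the real data back in. A minimal sketch using the table and column names from the question (the date range is assumed):

```sql
-- In PostgreSQL, generate_series() builds the date spine directly.
-- In Redshift, generate_series() runs only on the leader node, so the
-- usual substitute is a pre-built calendar table in place of the
-- generate_series() call below.
SELECT u.url,
       d.day::date            AS timestamp_gmt,
       COALESCE(t.visitors, 0) AS visitors,
       COALESCE(t.hits, 0)     AS hits
FROM (SELECT DISTINCT url FROM daily_table) u
CROSS JOIN generate_series('2016-04-12'::date,
                           '2016-05-09'::date,
                           '1 day'::interval) AS d(day)
LEFT JOIN daily_table t
       ON t.url = u.url
      AND t.timestamp_gmt = d.day::date;
```

COALESCE fills the gap rows with zeros; drop it if NULLs are preferred for the missing days.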

Redshift - Delimited value missing end quote

≯℡__Kan透↙ submitted on 2019-12-10 21:13:07
Question: I'm trying to load a CSV file into Redshift. The delimiter is '|'. The first line of the CSV:

1 |Bhuvi|"This is ok"|xyz@domain.com

I used this command to load:

copy tbl from 's3://datawarehouse/source.csv'
iam_role 'arn:aws:iam:::role/xxx'cas-pulse-redshift'
delimiter '|' removequotes ACCEPTINVCHARS;

ERROR:
raw_field_value | This is ok" |xyz@domain.com
err_code        | 1214
err_reason      | Delimited value missing end quote

Then I tried this too:

copy tbl from 's3://datawarehouse/source.csv'
iam_role 'arn:aws:iam::
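REMOVEQUOTES expects the quote to open at the very start of the field, so the leading space before the quoted value trips it up. A sketch of the usual fix, which is to use COPY's CSV mode with an explicit quote character instead of REMOVEQUOTES (the role ARN is a placeholder, not the one from the question):

```sql
-- CSV mode and REMOVEQUOTES are mutually exclusive; CSV handles quoted
-- fields that contain the delimiter or stray quotes more robustly.
copy tbl
from 's3://datawarehouse/source.csv'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'  -- placeholder
delimiter '|'
csv quote as '"'
ACCEPTINVCHARS;
```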

Redshift. How can we transpose (dynamically) a table from columns to rows?

╄→гoц情女王★ submitted on 2019-12-10 18:25:32
Question: How can we transpose a Redshift table from columns to rows? For example, if we have a generic (not already known) table like the following:

source table:

date       id alfa beta gamma ... omega
2018-08-03 1  1    2    3         4
2018-08-03 2  4    3    2         1
...
2018-09-04 1  3    1    2         4
...

How can we achieve the following result?

transposed table:

date       id column_name column_value
2018-08-03 1  alfa        1
2018-08-03 1  beta        2
...
2018-08-03 2  omega       1
...
2018-09-04 1  gamma       2
...

Where the target table, the number of columns (alfa,
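For a fixed set of columns, the standard Redshift approach is a UNION ALL per column. A sketch using the example's names (for a truly dynamic table, the statement itself has to be generated first, e.g. from the system catalog, since classic Redshift SQL cannot unpivot unknown columns):

```sql
-- One SELECT per source column; each emits (date, id, name, value) rows.
SELECT date, id, 'alfa'  AS column_name, alfa  AS column_value FROM source_table
UNION ALL
SELECT date, id, 'beta',  beta  FROM source_table
UNION ALL
SELECT date, id, 'gamma', gamma FROM source_table
UNION ALL
SELECT date, id, 'omega', omega FROM source_table;
```

Newer Redshift releases also ship a PIVOT/UNPIVOT clause, but the column list there is still static in the query text.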

How can an AWS Glue job upload several tables in Redshift?

家住魔仙堡 submitted on 2019-12-10 18:17:06
Question: Is it possible to load multiple tables into Redshift using an AWS Glue job? These are the steps I followed. I crawled JSON from S3, and the data has been translated into a data catalog table. I created a job that will upload the data catalog table into Redshift, but it only lets me upload one table per job. In the job properties (when adding a job), the "This job runs" option I chose is: "A proposed script generated by AWS Glue". I am not familiar with Python and I am new to AWS Glue, but I have several

Is it possible to run a join between two different AWS Redshift Databases in the same cluster?

我们两清 submitted on 2019-12-10 14:42:22
Question: Just wondering if this is possible? I see some older links from 2015 or so stating this is not possible, but I'm wondering if it is now, and if there is any available documentation that states yay/nay. Thanks!

Answer 1: It is not possible to run queries across or JOIN logical databases created via the CREATE DATABASE command. This is the same for PostgreSQL, on which Amazon Redshift was based. See: Joining Results from Two Separate Databases. While PostgreSQL has the dblink module that can join separate

How to parse host out of a string in Redshift?

不打扰是莪最后的温柔 submitted on 2019-12-10 11:13:25
Question: I'm looking for a Postgres (actually Redshift) equivalent to Hive's parse_url(..., 'HOST'). The Postgres docs say it has a URL parser as part of its full-text search. This blog post has a regex which may or may not be bulletproof. What is best?

Answer 1: If you weren't using Redshift, I'd say "use PL/Perlu, PL/Python, or one of the other procedural languages to get a regular URL parser". Since you're on a proprietary fork of Pg 8.1, you're going to have to settle for a hacky regexp, I suspect. There is
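One regex-free option that does exist in Redshift is SPLIT_PART. A sketch, assuming a hypothetical page_visits(url) table and URLs that include a scheme (e.g. `http://`); it does not handle ports, userinfo, or scheme-less URLs:

```sql
-- Take everything after '//', then everything before the first '/'.
SELECT url,
       SPLIT_PART(SPLIT_PART(url, '//', 2), '/', 1) AS host
FROM page_visits;
```

Redshift's REGEXP_SUBSTR is an alternative when the URL shapes are messier, at the usual cost of maintaining a "hacky regexp" as the answer puts it.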

AWS Redshift : DISTKEY / SORTKEY columns should be compressed?

試著忘記壹切 submitted on 2019-12-10 10:56:18
Question: Let me ask something about column compression on AWS Redshift. We are currently verifying what can improve performance using an appropriate diststyle, sortkeys, and column compression. If my understanding is correct, column compression can help reduce IO cost. I tried "analyze compression table_name;", and Redshift mostly suggests 'zstd' or 'lzo' as the compression method for our columns. Generally speaking, should the columns set as DISTKEY/SORTKEY also be compressed like
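The common guidance is to leave the leading sort key column uncompressed (ENCODE raw) so zone maps stay selective, while other columns, including the DISTKEY, can take the suggested encodings. A sketch with assumed table and column names:

```sql
CREATE TABLE events (
    event_id   bigint        ENCODE zstd,
    user_id    bigint        ENCODE zstd,  -- DISTKEY column may be compressed
    event_time timestamp     ENCODE raw,   -- leading sort key: keep raw
    payload    varchar(1000) ENCODE zstd
)
DISTKEY (user_id)
SORTKEY (event_time);
```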

How to calculate median in AWS Redshift?

别等时光非礼了梦想. submitted on 2019-12-10 02:25:02
Question: Most databases have a built-in function for calculating the median, but I don't see anything for median in Amazon Redshift. You could calculate the median using a combination of the nth_value() and count() analytic functions, but that seems janky. I would be very surprised if an analytics DB didn't have a built-in method for computing median, so I'm assuming I'm missing something.

http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_NTH_WF.html
http://docs.aws.amazon.com/redshift/latest
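Redshift does in fact ship MEDIAN() and PERCENTILE_CONT() as window functions. A sketch with placeholder table/column names:

```sql
-- DISTINCT collapses the per-row window output to a single value.
SELECT DISTINCT
       MEDIAN(price) OVER () AS median_price
FROM sales;

-- Equivalent form via the general percentile function:
SELECT DISTINCT
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) OVER () AS median_price
FROM sales;
```

Add PARTITION BY inside OVER () to get per-group medians instead of one global value.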

Deleting duplicate rows from Redshift

99封情书 submitted on 2019-12-09 14:49:56
Question: I am trying to delete some duplicate data in my Redshift table. Below is my query:

With duplicates As (
    Select *, ROW_NUMBER() Over (PARTITION by record_indicator
                                 Order by record_indicator) as Duplicate
    From table_name
)
delete from duplicates
Where Duplicate > 1;

This query is giving me an error:

Amazon Invalid operation: syntax error at or near "delete";

Not sure what the issue is, as the syntax for the WITH clause seems to be correct. Has anybody faced this situation before?

Answer 1: Redshift being
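The error arises because DELETE cannot target a CTE. A sketch of the usual workaround, using the question's names: materialize one row per record_indicator into a temp table, then swap the rows back in.

```sql
-- Keep the first row of each duplicate group.
CREATE TEMP TABLE keep AS
SELECT *
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY record_indicator
                                ORDER BY record_indicator) AS rn
      FROM table_name) t
WHERE rn = 1;

-- Drop the helper column so the shapes match again.
ALTER TABLE keep DROP COLUMN rn;

DELETE FROM table_name;
INSERT INTO table_name SELECT * FROM keep;
```

Wrapping the DELETE/INSERT pair in a transaction avoids exposing an empty table to concurrent readers.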

postgresql (redshift) maximum value for a specific column

自作多情 submitted on 2019-12-09 13:48:24
Question: I'm working on Redshift. I have a table like:

userid oid version number_of_objects
1      ab  1       10
1      ab  2       20
1      ab  3       17
1      ab  4       16
1      ab  5       14
1      cd  1       5
1      cd  2       6
1      cd  3       9
1      cd  4       12
2      ef  1       4
2      ef  2       3
2      gh  1       16
2      gh  2       12
2      gh  3       21

I would like to select from this table the maximum version number for every oid, and get the userid and the number_of_objects of that row. When I tried this, unfortunately I got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid,
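Grouping by every selected column returns every row, which is why the whole table came back. A sketch of the usual fix: rank the rows per (userid, oid) and keep only the top-version one (the table name here is a placeholder, since "table" is a reserved word):

```sql
SELECT userid, oid, version, number_of_objects
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY userid, oid
                                ORDER BY version DESC) AS rn
      FROM my_table) t
WHERE rn = 1;
```

This avoids the self-join alternative (join the table back to a MAX(version) subquery) and picks exactly one row per group even if versions tie.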