amazon-redshift

Using Redshift as Spring Batch Job Repository and alternatives to SEQUENCE in Redshift

白昼怎懂夜的黑 posted on 2019-12-20 03:17:32
Question: One of the requirements in my project is to place the Spring Batch schema on an Amazon Redshift database. I am planning to start from schema-postgresql.sql as the baseline, since Redshift is based on Postgres. Looking at the Spring Batch source code, it looks like you need to do a few things to make this work: extend JobRepositoryFactoryBean and DefaultDataFieldMaxValueIncrementerFactory, and add my own RedshfitMaxValueIncrementer that extends AbstractSequenceMaxValueIncrementer. Looking at the redshift

Lag to get the first non-null value since the previous null value

你。 posted on 2019-12-20 02:17:11
Question: Below is an example of what I'm trying to achieve in a Redshift database. I have a variable current_value and I want to create a new column value_desired that is: the same as current_value if the previous row is null; equal to the last preceding non-null value if the previous row is non-null. It sounds like an easy task, but I haven't found a way to do it yet.

row_numb  current_value  value_desired
1
2
3         47             47
4
5         45             45
6
7
8         42             42
9         41             42
10        40             42
11        39             42
12        38             42
13
14        36             36
15
16
17        33             33
18
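One possible way to express the rule in the question, offered as a sketch rather than a confirmed answer: keep a running count of null rows so that every run of consecutive non-null rows shares one group number, then take the first value in each run. The query is run here through Python's built-in sqlite3 module as a stand-in for Redshift, which is an assumption made so the example can be executed; it has not been run against Redshift. The table name t, the column grp, and the helper fill_runs are hypothetical names.

```python
import sqlite3

# Hypothetical names: table `t`, helper column `grp`.
QUERY = """
WITH marked AS (
    SELECT row_numb,
           current_value,
           -- running count of nulls: each non-null run shares one number
           SUM(CASE WHEN current_value IS NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY row_numb
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM t
),
filled AS (
    -- null rows are filtered out, so the first row of each group
    -- is the first non-null value since the previous null
    SELECT row_numb,
           FIRST_VALUE(current_value)
               OVER (PARTITION BY grp ORDER BY row_numb
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS value_desired
    FROM marked
    WHERE current_value IS NOT NULL
)
SELECT m.row_numb, m.current_value, f.value_desired
FROM marked AS m
LEFT JOIN filled AS f ON f.row_numb = m.row_numb
ORDER BY m.row_numb
"""


def fill_runs(rows):
    """Load (row_numb, current_value) pairs into SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (row_numb INTEGER, current_value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# First 14 rows of the example table from the question.
sample = [(1, None), (2, None), (3, 47), (4, None), (5, 45), (6, None),
          (7, None), (8, 42), (9, 41), (10, 40), (11, 39), (12, 38),
          (13, None), (14, 36)]

for row in fill_runs(sample):
    print(row)
```

Rows whose current_value is null come back with a null value_desired, matching the blanks in the example table.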

Redshift unload's file name

时光总嘲笑我的痴心妄想 posted on 2019-12-19 06:46:53
Question: I'm running a Redshift unload command, but am not getting the file name I want. The command is: UNLOAD ('select * from foo') TO 's3://mybucket/foo' CREDENTIALS 'xxxxxx' GZIP NULL AS 'NULL' DELIMITER as '\t' allowoverwrite parallel off The result is mybucket/foo-000.gz. I don't want the slice number at the end of the file name (it would be great if it could be eliminated completely); I want to add a file extension at the end of the file name. I'd like to see either of the following: mybucket/foo-000.txt

Pivot for Redshift database

可紊 posted on 2019-12-18 19:38:13
Question: I know this question has been asked before, but none of the answers helped me meet my requirements, so I am asking it in a new thread. In Redshift, how can I pivot the data into a form of one row per unique dimension set, e.g.:

id    Name            Category    count
8660  Iced Chocolate  Coffees     105
8660  Iced Chocolate  Milkshakes  10
8662  Old Monk        Beer        29
8663  Burger          Snacks      18

to

id    Name            Coffees  Milkshakes  Beer  Snacks
8660  Iced Chocolate  105      10          0     0
8662  Old Monk        0        0           29    0
8663
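A common way to get this shape is conditional aggregation: one SUM(CASE ...) column per category, grouped by the dimension columns. The sketch below assumes the category list is known in advance and hard-codes it. It runs the query through Python's built-in sqlite3 module as a stand-in for Redshift (an assumption for the sake of a runnable example; not verified on Redshift). The table name sales, the column cnt (used in place of count), and the helper pivot are hypothetical names.

```python
import sqlite3

# Hypothetical names: table `sales`, column `cnt` (the question calls it count).
PIVOT_QUERY = """
SELECT id,
       name,
       SUM(CASE WHEN category = 'Coffees'    THEN cnt ELSE 0 END) AS coffees,
       SUM(CASE WHEN category = 'Milkshakes' THEN cnt ELSE 0 END) AS milkshakes,
       SUM(CASE WHEN category = 'Beer'       THEN cnt ELSE 0 END) AS beer,
       SUM(CASE WHEN category = 'Snacks'     THEN cnt ELSE 0 END) AS snacks
FROM sales
GROUP BY id, name
ORDER BY id
"""


def pivot(rows):
    """Load (id, name, category, cnt) rows into SQLite and pivot them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER, name TEXT, category TEXT, cnt INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(PIVOT_QUERY).fetchall()
    conn.close()
    return result


# Input rows taken from the question.
data = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]

for row in pivot(data):
    print(row)
```

A category that is not listed in the query is silently dropped, so this pattern only fits when the set of categories is small and stable.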

Sequence number generation function in AWS Redshift

青春壹個敷衍的年華 posted on 2019-12-18 17:31:07
Question: Is there a sequence number generation function in Redshift? Or a function that takes a combination of values and gives out a numerical hash key? Answer 1: There is no concept of sequences (as seen in Oracle) at the moment. You have a few options: number tables; RANK() or ROW_NUMBER() window functions over the whole set (note that this can have some negative performance implications if you have a multi-node cluster); columns defined as IDENTITY(seed, step). Note that IDENTITY sequence may be 'sparse'
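As a small illustration of the ROW_NUMBER() option mentioned in the answer, the sketch below numbers rows by a chosen ordering. It uses Python's built-in sqlite3 module as a stand-in for Redshift (an assumption so the example is runnable); the table name items and the helper number_rows are hypothetical. The IDENTITY(seed, step) option is Redshift table DDL and is not shown, since SQLite cannot execute it.

```python
import sqlite3


def number_rows(names):
    """Assign 1-based sequence numbers to names in alphabetical order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO items VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT ROW_NUMBER() OVER (ORDER BY name) AS seq, name "
        "FROM items ORDER BY seq"
    ).fetchall()
    conn.close()
    return rows


print(number_rows(["pear", "apple", "fig"]))
```

The numbers are recomputed on every query, so they are dense but only stable as long as the ordering column and the data do not change.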

Redshift Performance of Flat Tables Vs Dimension and Facts

天大地大妈咪最大 posted on 2019-12-18 12:40:09
Question: I am trying to create a dimensional model on flat OLTP tables (not in 3NF). Some people think a dimensional model is not required because most of the data for the report is present in a single table. But that table contains more than we need, around 300 columns. Should I still separate the flat table into dimensions and facts, or just use the flat tables directly in the reports? Answer 1: When creating tables purely for reporting purposes (as is typical in a Data Warehouse), it is

What does it mean to have multiple sortkey columns?

穿精又带淫゛_ posted on 2019-12-18 10:33:34
Question: Redshift allows designating multiple columns as SORTKEY columns, but most of the best-practices documentation is written as if there were only a single SORTKEY. If I create a table with SORTKEY (COL1, COL2), does that mean that all columns are stored sorted by COL1, then COL2? Or maybe, since it is a columnar store, each column gets stored in a different order? I.e. COL1 in COL1 order, COL2 in COL2 order, and the other columns unordered? My situation is that I have a table with (among others

Connect Lambda to Redshift in Different Availability Zones

北慕城南 posted on 2019-12-18 09:37:12
Question: Our Redshift cluster resides in Zone A. When our Lambda function uses a Zone A subnet, it can connect to Redshift. When our Lambda function uses a subnet other than Zone A, it times out. The workaround, where we ALLOW connections for Redshift on port 5439 from 0.0.0.0/0, is not desired. We have our Lambda functions and Redshift cluster in the same VPC. Lambda functions have 4 dedicated subnets (one per zone). Redshift has 4 dedicated subnets per zone as well. Lambda functions have their own

How to get the last day of month in postgres?

≡放荡痞女 posted on 2019-12-18 07:42:12
Question: How do I find the last day of the month in Postgres? I have a date column stored as numeric(18) in the format YYYYMMDD. I am trying to convert it to a date using to_date("act_dt",'YYYYMMDD') AS "act date" and then find the last day of the month for this date, like this: (select (date_trunc('MONTH',to_date("act_dt",'YYYYMMDD')) + INTERVAL '1 MONTH - 1 day')::date) but it gives me this error: ERROR: Interval values with month or year parts are not supported Detail: ----------------------------------------------- error
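The arithmetic the query is after can be shown outside SQL: split the numeric YYYYMMDD value into year and month, then look up how many days that month has. The sketch below does this with Python's standard library only; the helper name last_day_of_month is hypothetical, and it is an illustration of the calculation, not the fix for the SQL error. Redshift also documents a LAST_DAY function, which may be the more direct route inside a query; its documentation should be checked before relying on it.

```python
import calendar
from datetime import date


def last_day_of_month(yyyymmdd: int) -> date:
    """Return the last calendar day of the month of a numeric YYYYMMDD value."""
    year, rest = divmod(yyyymmdd, 10000)   # 20191218 -> (2019, 1218)
    month = rest // 100                    # 1218 -> 12
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


print(last_day_of_month(20191218))
print(last_day_of_month(20200210))
```

Using calendar.monthrange keeps leap years correct without any interval arithmetic.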

Connecting AWS Lambda to Redshift - Times out after 60 seconds

痞子三分冷 posted on 2019-12-18 05:56:12
Question: I created an AWS Lambda function that logs onto Redshift via a JDBC URL and runs a query. Locally, using Node, I can successfully connect to the Redshift instance via JDBC and execute a query.

var conString = "postgresql://USER_NAME:PASSWORD@JDBC_URL";
var client = new pg.Client(conString);
client.connect(function(err) {
  if (err) {
    console.log('could not connect to redshift', err);
  }
  // omitted due to above error

However, when I execute the function on AWS Lambda (where it's wrapped in a async