amazon-redshift

Amazon Redshift to MySQL using Pentaho Data Integration

旧巷老猫 posted on 2020-01-30 11:28:48

Question: We are using Amazon Redshift, whose database engine is PostgreSQL-based, and the data sits in the Amazon cloud. We need to load data from Amazon Redshift into MySQL using Pentaho Data Integration. Could you please tell us how to connect to Redshift via Pentaho?

Answer 1: I'll try to help you. The Redshift connection needs the PostgreSQL JDBC driver in the lib folder of your Pentaho data-integration installation. But the one that comes with Pentaho has some issues with Redshift; this may be solved by removing the existing …
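A rough sketch of the JDBC settings such a connection typically uses once a working PostgreSQL (or Redshift) JDBC driver jar sits in Pentaho's lib folder (the host, database name, and credentials below are placeholders, not values from the question):

    Connection type: PostgreSQL (or Generic database with a manual JDBC URL)
    Driver class:    org.postgresql.Driver
    JDBC URL:        jdbc:postgresql://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev
    Port:            5439 (the Redshift default)
    User / password: the Redshift database credentials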

Convert Varchar to Float/double

送分小仙女 posted on 2020-01-25 07:13:08

Question: I have a column 'Age' (varchar). How can I write a select query that converts it to a float?

    Pet_Name  Age (varchar)
    John      2 years 6 months.
    Anne      3 years and 6 months.

Desired output:

    Pet_Name  Age (float/double)
    John      2.5
    Anne      3.5

My problem is that the input does not follow a specific date/age format and was entered as a string.

Answer 1: Assuming the age column always has two numbers, I have used the REGEXP_SUBSTR function in Redshift to write the answer below:

    create temp table pets (petname varchar(10), age varchar(20));
    insert …
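Since the answer is cut off above, here is a minimal sketch along the same lines, assuming (as the answer does) that the age column always contains exactly two numbers, years first and months second:

    CREATE TEMP TABLE pets (petname VARCHAR(10), age VARCHAR(30));
    INSERT INTO pets VALUES ('John', '2 years 6 months.'), ('Anne', '3 years and 6 months.');

    SELECT
        petname,
        REGEXP_SUBSTR(age, '[0-9]+', 1, 1)::FLOAT             -- first number: years
          + REGEXP_SUBSTR(age, '[0-9]+', 1, 2)::FLOAT / 12.0  -- second number: months
          AS age_years
    FROM pets;
    -- John -> 2.5, Anne -> 3.5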

Ranking values to determine highest value

旧时模样 posted on 2020-01-25 06:53:29

Question: I have data that looks like this:

    +-----+--------+--------+--------+
    | ID  | score1 | score2 | score3 |
    +-----+--------+--------+--------+
    | 123 |     14 |    561 |    580 |
    | 123 |    626 |    771 |    843 |
    | 123 |    844 |    147 |    904 |
    | 456 |    922 |    677 |    301 |
    | 456 |    665 |    578 |    678 |
    | 456 |    416 |    631 |    320 |
    +-----+--------+--------+--------+

What I'm trying to do is create another column that reports which of the three scores is the highest. Remember, I'm not looking for what the value is; I'm looking for …
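The question is cut off above, but since it asks for the name of the winning column rather than its value, one hedged sketch is a CASE over GREATEST (the table name scores is an assumption; the question does not give one):

    SELECT
        id, score1, score2, score3,
        CASE GREATEST(score1, score2, score3)   -- largest of the three values
            WHEN score1 THEN 'score1'
            WHEN score2 THEN 'score2'
            ELSE 'score3'
        END AS highest_score
    FROM scores;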

Redshift SQL: add and reset a counter with date and group considered

空扰寡人 posted on 2020-01-24 20:50:46

Question: Suppose I have the table below. I'd like a counter that counts the number of times a Customer (there are many) is in Segment A. If the Customer jumps to a different Segment between two quarters, the counter resets when the Customer jumps back to Segment A. I am sure there are many ways to do it, but I just can't figure this out. Please help. Thank you!

    Quarter   Segment   Customer   *Counter*
    Q1 2018   A         A1         1
    Q2 2018   A         A1         2
    Q3 2018   A         A1         3
    Q4 2018   B         A1         1
    Q1 2019   B         A1         2
    Q2 2019   A         A1         1
    Q1 2020   A         A1         …
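A sketch of one common "gaps and islands" approach (the table name customer_segments and the sortable quarter_start column are assumptions; the question only shows quarter labels):

    WITH runs AS (
        SELECT
            quarter_start, customer, segment,
            ROW_NUMBER() OVER (PARTITION BY customer ORDER BY quarter_start)
              - ROW_NUMBER() OVER (PARTITION BY customer, segment ORDER BY quarter_start)
              AS run_id   -- constant within each unbroken run of the same segment
        FROM customer_segments
    )
    SELECT
        quarter_start, customer, segment,
        ROW_NUMBER() OVER (PARTITION BY customer, segment, run_id
                           ORDER BY quarter_start) AS counter
    FROM runs
    ORDER BY customer, quarter_start;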

Write data to Redshift using Spark 2.0.1

冷暖自知 posted on 2020-01-24 15:45:09

Question: I am doing a POC where I want to write a simple data set to Redshift. I have the following sbt file:

    name := "Spark_POC"
    version := "1.0"
    scalaVersion := "2.10.6"
    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1"
    libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1"
    resolvers += "jitpack" at "https://jitpack.io"
    libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"

and the following code:

    object Main extends App {
      val conf = new …

How to change decimal separator from comma to fullstop in redshift copy command

若如初见. posted on 2020-01-17 04:27:04

Question: I'm working with raw data that uses a comma as the decimal separator rather than a full stop (3,99 instead of 3.99). Is there a way to convert this directly in the Redshift COPY command rather than having to upload the data and change it afterwards?

Answer 1: There are two issues to consider: field delimiters, and replacing characters. The default delimiter in the Amazon Redshift COPY command is a pipe character ( | ), unless the CSV option is used, in which case the default delimiter is a comma ( , ). Thus, if your file …
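The answer is truncated above; a hedged sketch of the usual "load as text, then replace" workaround (the bucket, IAM role, and table names are placeholders):

    -- Stage the raw value as text so it can be cleaned after loading.
    CREATE TABLE staging_prices (price_raw VARCHAR(20));

    COPY staging_prices
    FROM 's3://example-bucket/prices.csv'                        -- placeholder S3 path
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift'   -- placeholder role
    DELIMITER ';';                                               -- whatever the file really uses

    -- Swap the decimal comma for a dot and cast to a numeric column.
    INSERT INTO prices (price)
    SELECT REPLACE(price_raw, ',', '.')::DECIMAL(10,2)
    FROM staging_prices;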

Implementing NULLS FIRST in Amazon Redshift

空扰寡人 posted on 2020-01-17 03:13:05

Question: I am using the SUM window function as a row number, similar to this query:

    SELECT field_a,
           SUM(1) OVER (PARTITION BY field_b ORDER BY field_c ASC ROWS UNBOUNDED PRECEDING) AS row_number
    FROM test_table
    ORDER BY row_number;

The problem is that if field_c is a null value, it appears at the end. I want it at the beginning, so that a null value is treated as smaller than all other values. In Oracle, this could be done by providing the NULLS FIRST argument, but it's not supported in Redshift. So how do I …
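The question is cut off above; one common workaround (a sketch, not necessarily the accepted answer) is to sort on an explicit null flag before field_c, so NULL rows sort first:

    SELECT
        field_a,
        SUM(1) OVER (
            PARTITION BY field_b
            -- the CASE sends NULL values of field_c to the front, mimicking NULLS FIRST
            ORDER BY CASE WHEN field_c IS NULL THEN 0 ELSE 1 END, field_c ASC
            ROWS UNBOUNDED PRECEDING
        ) AS row_number
    FROM test_table
    ORDER BY row_number;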

Amazon Redshift Foreign Keys - Sort or Interleaved Keys

喜欢而已 posted on 2020-01-16 10:32:02

Question: We plan to import OLTP relational tables into AWS Redshift. The CustomerTransaction table joins to multiple lookup tables; I only included 3, but we have more. What should the sort key be on the CustomerTransaction table? In regular SQL Server we have nonclustered indexes on the foreign keys in the CustomerTransaction table. For AWS Redshift, should I use compound sort keys or an interleaved sort key on the foreign key columns in CustomerTransaction? What is the best indexing strategy for this table design?
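Since the question includes no DDL, here is a purely hypothetical sketch of one common pattern for a fact-style table such as CustomerTransaction: distribute on the most frequently joined key and lead a compound sort key with the column most often filtered on (all column names are invented for illustration):

    CREATE TABLE customer_transaction (
        transaction_id  BIGINT,
        customer_id     INT,        -- joins to a customer lookup table
        product_id      INT,        -- joins to a product lookup table
        store_id        INT,        -- joins to a store lookup table
        transaction_dt  TIMESTAMP
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)
    COMPOUND SORTKEY (transaction_dt, customer_id);

Compound sort keys are generally the default choice; interleaved sort keys tend to pay off only when queries filter on several different columns with no single dominant one.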
