amazon-redshift

Amazon Redshift to MySQL using Pentaho Data Integration

旧巷老猫 posted on 2020-01-30 11:28:48

Question: We are using Amazon Redshift, whose database engine is PostgreSQL-based, and the data sits in the Amazon cloud. We need to load data from Amazon Redshift into MySQL using Pentaho Data Integration. Could you please tell us how to connect to Redshift via Pentaho?

Answer 1: I'll try to help you. The Redshift connection needs the PostgreSQL JDBC driver in the lib folder of your Pentaho data-integration installation. But the one that comes with Pentaho has some issues with Redshift; this may be solved by removing the existing …
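A rough sketch of the JDBC settings such a connection typically uses once a working PostgreSQL (or Redshift) JDBC driver jar sits in Pentaho's lib folder (the host, database name, and credentials below are placeholders, not values from the question):

    Connection type: PostgreSQL (or Generic database with a manual JDBC URL)
    Driver class:    org.postgresql.Driver
    JDBC URL:        jdbc:postgresql://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev
    Port:            5439 (the Redshift default)
    User / password: the Redshift database credentials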

Convert Varchar to Float/double

送分小仙女 posted on 2020-01-25 07:13:08

Question: I have a column 'Age' (varchar). How can I write a select query that converts it to a float?

    Pet_Name  Age (varchar)
    John      2 years 6 months.
    Anne      3 years and 6 months.

Desired output:

    Pet_Name  Age (float/double)
    John      2.5
    Anne      3.5

My problem is that the input does not follow a specific date/age format and was entered as a string.

Answer 1: Assuming the age column always has two numbers, I have used the REGEXP_SUBSTR function in Redshift to write the answer below:

    create temp table pets (petname varchar(10), age varchar(20));
    insert …
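Since the answer is cut off above, here is a minimal sketch along the same lines, assuming (as the answer does) that the age column always contains exactly two numbers, years first and months second:

    CREATE TEMP TABLE pets (petname VARCHAR(10), age VARCHAR(30));
    INSERT INTO pets VALUES ('John', '2 years 6 months.'), ('Anne', '3 years and 6 months.');

    SELECT
        petname,
        REGEXP_SUBSTR(age, '[0-9]+', 1, 1)::FLOAT             -- first number: years
          + REGEXP_SUBSTR(age, '[0-9]+', 1, 2)::FLOAT / 12.0  -- second number: months
          AS age_years
    FROM pets;
    -- John -> 2.5, Anne -> 3.5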

Ranking values to determine highest value

旧时模样 posted on 2020-01-25 06:53:29

Question: I have data that looks like this:

    +-----+--------+--------+--------+
    | ID  | score1 | score2 | score3 |
    +-----+--------+--------+--------+
    | 123 |     14 |    561 |    580 |
    | 123 |    626 |    771 |    843 |
    | 123 |    844 |    147 |    904 |
    | 456 |    922 |    677 |    301 |
    | 456 |    665 |    578 |    678 |
    | 456 |    416 |    631 |    320 |
    +-----+--------+--------+--------+

What I'm trying to do is create another column that reports which of the three scores is the highest. Remember, I'm not looking for what the value is; I'm looking for …
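The question is cut off above, but since it asks for the name of the winning column rather than its value, one hedged sketch is a CASE over GREATEST (the table name scores is an assumption; the question does not give one):

    SELECT
        id, score1, score2, score3,
        CASE GREATEST(score1, score2, score3)   -- largest of the three values
            WHEN score1 THEN 'score1'
            WHEN score2 THEN 'score2'
            ELSE 'score3'
        END AS highest_score
    FROM scores;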

Redshift SQL: add and reset a counter with date and group considered

空扰寡人 posted on 2020-01-24 20:50:46

Question: Suppose I have the table below. I'd like a counter that counts the number of times a Customer (there are many) is in Segment A. If the Customer jumps to a different Segment between two quarters, the counter resets when the Customer jumps back to Segment A. I am sure there are many ways to do it, but I just can't figure this out. Please help. Thank you!

    Quarter   Segment   Customer   *Counter*
    Q1 2018   A         A1         1
    Q2 2018   A         A1         2
    Q3 2018   A         A1         3
    Q4 2018   B         A1         1
    Q1 2019   B         A1         2
    Q2 2019   A         A1         1
    Q1 2020   A         A1         …
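A sketch of one common "gaps and islands" approach (the table name customer_segments and the sortable quarter_start column are assumptions; the question only shows quarter labels):

    WITH runs AS (
        SELECT
            quarter_start, customer, segment,
            ROW_NUMBER() OVER (PARTITION BY customer ORDER BY quarter_start)
              - ROW_NUMBER() OVER (PARTITION BY customer, segment ORDER BY quarter_start)
              AS run_id   -- constant within each unbroken run of the same segment
        FROM customer_segments
    )
    SELECT
        quarter_start, customer, segment,
        ROW_NUMBER() OVER (PARTITION BY customer, segment, run_id
                           ORDER BY quarter_start) AS counter
    FROM runs
    ORDER BY customer, quarter_start;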

Write data to Redshift using Spark 2.0.1

冷暖自知 posted on 2020-01-24 15:45:09

Question: I am doing a POC where I want to write a simple data set to Redshift. I have the following sbt file:

    name := "Spark_POC"
    version := "1.0"
    scalaVersion := "2.10.6"
    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1"
    libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1"
    resolvers += "jitpack" at "https://jitpack.io"
    libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"

and the following code:

    object Main extends App {
      val conf = new …

How to change decimal separator from comma to fullstop in redshift copy command

若如初见. posted on 2020-01-17 04:27:04

Question: I'm working with raw data that uses a comma as the decimal separator rather than a full stop (3,99 instead of 3.99). Is there a way to convert this directly in the Redshift COPY command rather than having to upload the data and change it afterwards?

Answer 1: There are two issues to consider: field delimiters, and replacing characters. The default delimiter in the Amazon Redshift COPY command is a pipe character ( | ), unless the CSV option is used, in which case the default delimiter is a comma ( , ). Thus, if your file …
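The answer is truncated above; a hedged sketch of the usual "load as text, then replace" workaround (the bucket, IAM role, and table names are placeholders):

    -- Stage the raw value as text so it can be cleaned after loading.
    CREATE TABLE staging_prices (price_raw VARCHAR(20));

    COPY staging_prices
    FROM 's3://example-bucket/prices.csv'                        -- placeholder S3 path
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift'   -- placeholder role
    DELIMITER ';';                                               -- whatever the file really uses

    -- Swap the decimal comma for a dot and cast to a numeric column.
    INSERT INTO prices (price)
    SELECT REPLACE(price_raw, ',', '.')::DECIMAL(10,2)
    FROM staging_prices;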

Implementing NULLS FIRST in Amazon Redshift

空扰寡人 posted on 2020-01-17 03:13:05

Question: I am using the SUM window function as a row number, similar to this query:

    SELECT field_a,
           SUM(1) OVER (PARTITION BY field_b ORDER BY field_c ASC ROWS UNBOUNDED PRECEDING) AS row_number
    FROM test_table
    ORDER BY row_number;

The problem is that if field_c is a null value, it appears at the end. I want it at the beginning, so that a null value is treated as smaller than all other values. In Oracle, this could be done by providing the NULLS FIRST argument, but it's not supported in Redshift. So how do I …
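The question is cut off above; one common workaround (a sketch, not necessarily the accepted answer) is to sort on an explicit null flag before field_c, so NULL rows sort first:

    SELECT
        field_a,
        SUM(1) OVER (
            PARTITION BY field_b
            -- the CASE sends NULL values of field_c to the front, mimicking NULLS FIRST
            ORDER BY CASE WHEN field_c IS NULL THEN 0 ELSE 1 END, field_c ASC
            ROWS UNBOUNDED PRECEDING
        ) AS row_number
    FROM test_table
    ORDER BY row_number;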

Amazon Redshift Foreign Keys - Sort or Interleaved Keys

喜欢而已 posted on 2020-01-16 10:32:02

Question: We plan to import OLTP relational tables into AWS Redshift. The CustomerTransaction table joins to multiple lookup tables; I only included 3, but we have more. What should the sort key be on the CustomerTransaction table? In regular SQL Server we have nonclustered indexes on the foreign keys in the CustomerTransaction table. For AWS Redshift, should I use compound sort keys or an interleaved sort key on the foreign key columns in CustomerTransaction? What is the best indexing strategy for this table design?
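Since the question includes no DDL, here is a purely hypothetical sketch of one common pattern for a fact-style table such as CustomerTransaction: distribute on the most frequently joined key and lead a compound sort key with the column most often filtered on (all column names are invented for illustration):

    CREATE TABLE customer_transaction (
        transaction_id  BIGINT,
        customer_id     INT,        -- joins to a customer lookup table
        product_id      INT,        -- joins to a product lookup table
        store_id        INT,        -- joins to a store lookup table
        transaction_dt  TIMESTAMP
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)
    COMPOUND SORTKEY (transaction_dt, customer_id);

Compound sort keys are generally the default choice; interleaved sort keys tend to pay off only when queries filter on several different columns with no single dominant one.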
