amazon-redshift

Redshift: defining composite primary key

Submitted by 孤街醉人 on 2019-12-23 08:05:10

Question: I have a table for which I want to define a composite primary key with two columns in Redshift, and I am having some trouble with the CREATE TABLE syntax. Here is what I am trying:

    Create table metrics (
        id varchar(30),
        runtime timestamp,
        category varchar(30),
        location varchar(30))
    primary key(id, runtime),
    sortkey(runtime);

It fails with the message: ERROR: syntax error at or near "PRIMARY". Can anyone please help me figure out how to fix it? Thanks in advance.

Answer 1: The primary key
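
The truncated answer presumably points at the placement of the constraint. A minimal sketch of the standard fix (assuming the same table definition as above): in Redshift DDL, a table-level PRIMARY KEY constraint belongs inside the column-list parentheses, while SORTKEY comes after the closing parenthesis:

    -- Sketch: the composite PRIMARY KEY moves inside the parentheses;
    -- SORTKEY follows the column list.
    CREATE TABLE metrics (
        id       varchar(30),
        runtime  timestamp,
        category varchar(30),
        location varchar(30),
        PRIMARY KEY (id, runtime)
    )
    SORTKEY (runtime);

Note that Redshift accepts primary key constraints but does not enforce them; the planner only uses them as hints.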

Amazon Redshift - lateral column alias reference

Submitted by 拟墨画扇 on 2019-12-23 04:26:16

Question: Based on "Amazon Redshift announces support for lateral column alias reference": "The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:"

    select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;

Which is basically the same as:

    select 1 AS col, col + 1 AS col2;

db<>fiddle demo
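
For contrast, a sketch of the pre-announcement rewrite (assuming the raw_data table from the quoted example): without lateral alias references, the aliased expression has to be repeated in the SELECT list:

    -- Sketch: clicks / impressions is repeated because 'probability'
    -- cannot be referenced laterally within the same SELECT list.
    SELECT clicks / impressions AS probability,
           ROUND(100 * (clicks / impressions), 1) AS percentage
    FROM raw_data;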

Pick a random attribute from group in Redshift

Submitted by 元气小坏坏 on 2019-12-23 04:07:09

Question: I have a data set in the form:

    id | attribute
    ---+----------
    1  | a
    2  | b
    2  | a
    2  | a
    3  | c

Desired output:

    attribute | num
    ----------+----
    a         | 1
    b,a       | 1
    c         | 1

In MySQL, I would use:

    select attribute, count(*) num
    from (select id, group_concat(distinct attribute) attribute
          from dataset group by id) as subquery
    group by attribute;

I am not sure this can be done in Redshift because it does not support group_concat or any psql group aggregate functions like array_agg() or string_agg(). See
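
One possible workaround, offered as my own sketch rather than the thread's answer: current Redshift does support LISTAGG, which can stand in for group_concat. Note that the WITHIN GROUP clause sorts each group's attributes, so the 'b,a' group above would come out as 'a,b':

    -- Sketch: LISTAGG plays the role of MySQL's group_concat;
    -- WITHIN GROUP (ORDER BY ...) makes the concatenation deterministic.
    SELECT attribute, COUNT(*) AS num
    FROM (
        SELECT id,
               LISTAGG(DISTINCT attribute, ',') WITHIN GROUP (ORDER BY attribute) AS attribute
        FROM dataset
        GROUP BY id
    ) AS subquery
    GROUP BY attribute;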

Suggestion for scheduling tool(s) for building Hadoop-based data pipelines

Submitted by 青春壹個敷衍的年華 on 2019-12-22 17:54:56

Question: Between Apache Oozie, Spotify/Luigi, and airbnb/airflow, what are the pros and cons of each? I have used Oozie and Airflow in the past for building data ingestion pipelines using Pig and Hive. Currently, I am building a pipeline that looks at logs, extracts useful events, and puts them on Redshift. I found that Airflow was much easier to use, test, and set up. It has a much cooler UI and lets users perform actions from the UI itself, which is not the case with Oozie.

Escaping delimiter in Amazon Redshift COPY command

Submitted by ぐ巨炮叔叔 on 2019-12-22 09:56:27

Question: I'm pulling data from Amazon S3 into a table in Amazon Redshift. The table contains various columns, and some column data might contain special characters. The COPY command has an option called DELIMITER with which we can specify the delimiter while pulling the data into the table. The issue is two-fold: when I export (UNLOAD command) to S3 using a delimiter, say ',', it works fine, but when I try to import into Redshift from S3, the issue creeps in because certain columns contain the ','
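
A common remedy, sketched here as an assumption rather than the thread's accepted answer (bucket path and IAM role are placeholders): add ESCAPE to both the UNLOAD and the COPY, so embedded delimiters are backslash-escaped on the way out and honored on the way back in:

    -- Sketch: ESCAPE on UNLOAD writes a backslash before embedded delimiters;
    -- ESCAPE on COPY tells the loader to honor those escapes.
    UNLOAD ('select * from my_table')
    TO 's3://my-bucket/prefix_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ',' ESCAPE;

    COPY my_table
    FROM 's3://my-bucket/prefix_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ',' ESCAPE;

An alternative with the same shape is ADDQUOTES on UNLOAD paired with REMOVEQUOTES on COPY.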

Trying to Connect to Redshift Over AWS Lambda

Submitted by 自闭症网瘾萝莉.ら on 2019-12-22 08:57:14

Question: I'm using the node-postgres client with my AWS Redshift database. Locally, I'm able to run the following code in Node, getting print statements for ">> connected" and ">>> successful query. jsonResult: ". However, when I run this code in AWS Lambda, I don't see any log statements besides "trying to connect...".

    console.log("trying to connect...");
    var r = pg.connect(conString, function(err, client) {
        if (err) {
            return console.log('>> could not connect to redshift', err);
        }
        console.log(">> connected");

Hide databases in Amazon Redshift cluster from certain users

Submitted by ≡放荡痞女 on 2019-12-22 08:08:12

Question: Is it possible to hide the existence of, and access to, databases (including their schemas, tables, etc.) from certain users within an Amazon Redshift cluster? By default, it seems like every user is able to see other databases even though he doesn't have permission to select data or any other (non-default) privileges. I tried REVOKE ALL PRIVILEGES ON DATABASE testdb FROM testdbuser; and similar, but testdbuser can still connect to the testdb database and even see all other objects in his object browser in a SQL tool (here
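
One detail worth checking, as an assumption on my part rather than a quote from the answers: revoking from a single user does not touch the CONNECT privilege that the PUBLIC group holds by default, so it usually has to be revoked from PUBLIC as well:

    -- Sketch: PUBLIC retains CONNECT by default, so revoke it there too.
    REVOKE ALL ON DATABASE testdb FROM testdbuser;
    REVOKE CONNECT ON DATABASE testdb FROM PUBLIC;

Even then, as far as I know, the names of other databases remain visible through the shared catalog tables; Redshift does not hide their existence.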

Split values over multiple rows in RedShift

Submitted by 情到浓时终转凉″ on 2019-12-22 07:25:09

Question: The question of how to split a field (e.g. a CSV string) into multiple rows has already been answered: Split values over multiple rows. However, that question refers to MSSQL, and the answers use various features for which there are no Redshift equivalents. For the sake of completeness, here's an example of what I'd like to do:

Current data:

    | Key | Data     |
    +-----+----------+
    | 1   | 18,20,22 |
    | 2   | 17,19    |

Required data:

    | Key | Data |
    +-----+------+
    | 1   | 18   |
    | 1   | 20   |
    | 1   | 22   |
    | 2   |
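
A sketch of the usual Redshift workaround (the source_table and numbers helper are hypothetical names, not from the thread): cross join against a table of integers and pick out elements with SPLIT_PART:

    -- Sketch: numbers(n) is a hypothetical helper table holding integers 1..N,
    -- where N is at least the maximum number of CSV elements per row.
    SELECT t.key,
           SPLIT_PART(t.data, ',', n.n) AS data
    FROM source_table t
    JOIN numbers n
      ON n.n <= REGEXP_COUNT(t.data, ',') + 1
    ORDER BY t.key, n.n;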

Epoch to timeformat 'YYYY-MM-DD HH:MI:SS' during a Redshift COPY

Submitted by 风格不统一 on 2019-12-22 07:02:53

Question: Is there any way to format an epoch to the timeformat 'YYYY-MM-DD HH:MI:SS' while copying from S3 to Redshift using the COPY command?

Answer 1: You can use the Redshift COPY command with the parameter TIMEFORMAT 'epochsecs' or TIMEFORMAT 'epochmillisecs'. Check the Redshift documentation for more details.

Answer 2: Sample COPY query using JavaScript milliseconds (13 digits); possible options are in the documentation:

    COPY "hits"
    FROM 's3://your-bucket/your_folder/'
    CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY_ID>;aws
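
For completeness, a full statement of this shape might look like the following; this is a sketch, since the snippet above is cut off, and the secret-key continuation follows the standard CREDENTIALS syntax rather than anything quoted in the answer:

    -- Sketch: TIMEFORMAT 'epochmillisecs' parses 13-digit epoch values;
    -- both credential placeholders are assumptions.
    COPY "hits"
    FROM 's3://your-bucket/your_folder/'
    CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY_ID>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
    TIMEFORMAT 'epochmillisecs';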

REST API for Redshift

Submitted by 懵懂的女人 on 2019-12-22 06:46:22

Question: I'm currently brainstorming an idea and trying to figure out whether it's feasible, or whether there's a better way to handle this approach. Assume I have a Redshift table and I want to expose this table through a REST API. For example, several customers need some kind of metadata from this table. They would call a REST service, which would execute a query on Redshift and respond to the client in JSON format. I'm fairly new to the Redshift/AWS area, so I'm not sure whether AWS already