amazon-redshift

Redshift: defining composite primary key

Submitted by 孤街醉人 on 2019-12-23 08:05:10

Question: I have a table for which I want to define a composite primary key with two columns in Redshift, and I am having some trouble with the CREATE TABLE syntax. Here is what I am trying:

    Create table metrics (
        id varchar(30),
        runtime timestamp,
        category varchar(30),
        location varchar(30))
    primary key(id, runtime),
    sortkey(runtime);

It fails with the message: ERROR: syntax error at or near "PRIMARY". Can anyone please help me figure out how to fix it? Thanks in advance.

Answer 1: The primary key
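
The truncated answer presumably points at the placement of the constraint. A minimal sketch of the standard fix (assuming the same table definition as above): in Redshift DDL, a table-level PRIMARY KEY constraint belongs inside the column-list parentheses, while SORTKEY comes after the closing parenthesis:

    -- Sketch: the composite PRIMARY KEY moves inside the parentheses;
    -- SORTKEY follows the column list.
    CREATE TABLE metrics (
        id       varchar(30),
        runtime  timestamp,
        category varchar(30),
        location varchar(30),
        PRIMARY KEY (id, runtime)
    )
    SORTKEY (runtime);

Note that Redshift accepts primary key constraints but does not enforce them; the planner only uses them as hints.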

Amazon Redshift - lateral column alias reference

Submitted by 拟墨画扇 on 2019-12-23 04:26:16

Question: Based on "Amazon Redshift announces support for lateral column alias reference": "The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:"

    select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;

Which is basically the same as:

    select 1 AS col, col + 1 AS col2;

db<>fiddle demo
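
For contrast, a sketch of the pre-announcement rewrite (assuming the raw_data table from the quoted example): without lateral alias references, the aliased expression has to be repeated in the SELECT list:

    -- Sketch: clicks / impressions is repeated because 'probability'
    -- cannot be referenced laterally within the same SELECT list.
    SELECT clicks / impressions AS probability,
           ROUND(100 * (clicks / impressions), 1) AS percentage
    FROM raw_data;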

Pick a random attribute from group in Redshift

Submitted by 元气小坏坏 on 2019-12-23 04:07:09

Question: I have a data set in the form:

    id | attribute
    ---+----------
    1  | a
    2  | b
    2  | a
    2  | a
    3  | c

Desired output:

    attribute | num
    ----------+----
    a         | 1
    b,a       | 1
    c         | 1

In MySQL, I would use:

    select attribute, count(*) num
    from (select id, group_concat(distinct attribute) attribute
          from dataset group by id) as subquery
    group by attribute;

I am not sure this can be done in Redshift because it does not support group_concat or any psql group aggregate functions like array_agg() or string_agg(). See
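
One possible workaround, offered as my own sketch rather than the thread's answer: current Redshift does support LISTAGG, which can stand in for group_concat. Note that the WITHIN GROUP clause sorts each group's attributes, so the 'b,a' group above would come out as 'a,b':

    -- Sketch: LISTAGG plays the role of MySQL's group_concat;
    -- WITHIN GROUP (ORDER BY ...) makes the concatenation deterministic.
    SELECT attribute, COUNT(*) AS num
    FROM (
        SELECT id,
               LISTAGG(DISTINCT attribute, ',') WITHIN GROUP (ORDER BY attribute) AS attribute
        FROM dataset
        GROUP BY id
    ) AS subquery
    GROUP BY attribute;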

Suggestion for scheduling tool(s) for building Hadoop-based data pipelines

Submitted by 青春壹個敷衍的年華 on 2019-12-22 17:54:56

Question: Between Apache Oozie, Spotify/Luigi, and airbnb/airflow, what are the pros and cons of each? I have used Oozie and Airflow in the past for building data ingestion pipelines using Pig and Hive. Currently, I am building a pipeline that looks at logs, extracts useful events, and puts them on Redshift. I found that Airflow was much easier to use, test, and set up. It has a much cooler UI and lets users perform actions from the UI itself, which is not the case with Oozie.

Escaping delimiter in Amazon Redshift COPY command

Submitted by ぐ巨炮叔叔 on 2019-12-22 09:56:27

Question: I'm pulling data from Amazon S3 into a table in Amazon Redshift. The table contains various columns, and some column data might contain special characters. The COPY command has an option called DELIMITER with which we can specify the delimiter while pulling the data into the table. The issue is two-fold: when I export (UNLOAD command) to S3 using a delimiter, say ',', it works fine, but when I try to import into Redshift from S3, the issue creeps in because certain columns contain the ','
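
A common remedy, sketched here as an assumption rather than the thread's accepted answer (bucket path and IAM role are placeholders): add ESCAPE to both the UNLOAD and the COPY, so embedded delimiters are backslash-escaped on the way out and honored on the way back in:

    -- Sketch: ESCAPE on UNLOAD writes a backslash before embedded delimiters;
    -- ESCAPE on COPY tells the loader to honor those escapes.
    UNLOAD ('select * from my_table')
    TO 's3://my-bucket/prefix_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ',' ESCAPE;

    COPY my_table
    FROM 's3://my-bucket/prefix_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER ',' ESCAPE;

An alternative with the same shape is ADDQUOTES on UNLOAD paired with REMOVEQUOTES on COPY.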

Trying to Connect to Redshift Over AWS Lambda

Submitted by 自闭症网瘾萝莉.ら on 2019-12-22 08:57:14

Question: I'm using the node-postgres client with my AWS Redshift database. Locally, I'm able to run the following code in Node, getting print statements for ">> connected" and ">>> successful query. jsonResult: ". However, when I run this code in AWS Lambda, I don't see any log statements besides "trying to connect...".

    console.log("trying to connect...");
    var r = pg.connect(conString, function(err, client) {
        if (err) {
            return console.log('>> could not connect to redshift', err);
        }
        console.log(">> connected");

Hide databases in Amazon Redshift cluster from certain users

Submitted by ≡放荡痞女 on 2019-12-22 08:08:12

Question: Is it possible to hide the existence of, and access to, databases (including their schemas, tables, etc.) from certain users within an Amazon Redshift cluster? By default, it seems like every user is able to see other databases even though he doesn't have permission to select data or any other (non-default) privileges. I tried REVOKE ALL PRIVILEGES ON DATABASE testdb FROM testdbuser; and similar, but testdbuser can still connect to the testdb database and even see all other objects in his object browser in a SQL tool (here
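
One detail worth checking, as an assumption on my part rather than a quote from the answers: revoking from a single user does not touch the CONNECT privilege that the PUBLIC group holds by default, so it usually has to be revoked from PUBLIC as well:

    -- Sketch: PUBLIC retains CONNECT by default, so revoke it there too.
    REVOKE ALL ON DATABASE testdb FROM testdbuser;
    REVOKE CONNECT ON DATABASE testdb FROM PUBLIC;

Even then, as far as I know, the names of other databases remain visible through the shared catalog tables; Redshift does not hide their existence.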

Split values over multiple rows in RedShift

Submitted by 情到浓时终转凉″ on 2019-12-22 07:25:09

Question: The question of how to split a field (e.g. a CSV string) into multiple rows has already been answered: Split values over multiple rows. However, that question refers to MSSQL, and the answers use various features for which there are no Redshift equivalents. For the sake of completeness, here's an example of what I'd like to do:

Current data:

    | Key | Data     |
    +-----+----------+
    | 1   | 18,20,22 |
    | 2   | 17,19    |

Required data:

    | Key | Data |
    +-----+------+
    | 1   | 18   |
    | 1   | 20   |
    | 1   | 22   |
    | 2   |
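
A sketch of the usual Redshift workaround (the source_table and numbers helper are hypothetical names, not from the thread): cross join against a table of integers and pick out elements with SPLIT_PART:

    -- Sketch: numbers(n) is a hypothetical helper table holding integers 1..N,
    -- where N is at least the maximum number of CSV elements per row.
    SELECT t.key,
           SPLIT_PART(t.data, ',', n.n) AS data
    FROM source_table t
    JOIN numbers n
      ON n.n <= REGEXP_COUNT(t.data, ',') + 1
    ORDER BY t.key, n.n;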

Epoch to timeformat 'YYYY-MM-DD HH:MI:SS' during a Redshift COPY

Submitted by 风格不统一 on 2019-12-22 07:02:53

Question: Is there any way to format an epoch to the timeformat 'YYYY-MM-DD HH:MI:SS' while copying from S3 to Redshift using the COPY command?

Answer 1: You can use the Redshift COPY command with the parameter TIMEFORMAT 'epochsecs' or TIMEFORMAT 'epochmillisecs'. Check the Redshift documentation for more details.

Answer 2: Sample COPY query using JavaScript milliseconds (13 digits); possible options are in the documentation:

    COPY "hits"
    FROM 's3://your-bucket/your_folder/'
    CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY_ID>;aws
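
For completeness, a full statement of this shape might look like the following; this is a sketch, since the snippet above is cut off, and the secret-key continuation follows the standard CREDENTIALS syntax rather than anything quoted in the answer:

    -- Sketch: TIMEFORMAT 'epochmillisecs' parses 13-digit epoch values;
    -- both credential placeholders are assumptions.
    COPY "hits"
    FROM 's3://your-bucket/your_folder/'
    CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY_ID>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
    TIMEFORMAT 'epochmillisecs';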

REST API for Redshift

Submitted by 懵懂的女人 on 2019-12-22 06:46:22

Question: I'm currently brainstorming an idea and trying to figure out whether it's feasible, or whether there's a better way to handle this approach. Assume I have a Redshift table and I want to expose this table through a REST API. For example, several customers need some kind of metadata from this table. They would call a REST service, which would execute a query on Redshift and respond to the client in JSON format. I'm fairly new to the Redshift/AWS area, so I'm not sure whether AWS already