Redshift

Redshift COPY command delimiter not found

Anonymous (unverified), submitted 2019-12-03 02:50:02
Question: I'm trying to load some text files into Redshift. They are tab delimited, except after the final row value, which is causing a "delimiter not found" error. I only see a way to set the field delimiter in the COPY statement, not a way to set a row delimiter. Any ideas that don't involve processing all my files to add a tab to the end of each row? Thanks.

Answer 1: I don't think the problem is a missing <tab> at the end of lines. Are you sure that ALL lines have the correct number of fields? Run the query: select le.starttime, d.query, d.line_number, d
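The truncated query above appears to read Redshift's load-error system tables. A minimal sketch of that kind of diagnostic query, assuming the standard stl_load_errors and stl_loaderror_detail system tables (the exact column list and LIMIT are illustrative):

-- Show the most recent load errors with per-column detail,
-- joining the two system tables on the query id.
SELECT le.starttime,
       d.query,
       d.line_number,
       d.colname,
       d.value,
       le.raw_line,
       le.err_reason
FROM stl_loaderror_detail d
JOIN stl_load_errors le ON d.query = le.query
ORDER BY le.starttime DESC
LIMIT 100;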

Amazon redshift: bulk insert vs COPYing from s3

Anonymous (unverified), submitted 2019-12-03 02:44:02
Question: I have a Redshift cluster that I use for an analytics application. I have incoming data that I would like to add to a clicks table. Let's say I have ~10 new 'clicks' that I want to store each second. If possible, I would like my data to be available in Redshift as soon as possible. From what I understand, insert performance is bad because of the columnar storage, so you have to insert in batches. My workflow is to store the clicks in Redis, and every minute I insert the ~600 clicks from Redis into Redshift as a batch. I have two ways of
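For context, the two batching approaches usually compared here are a single multi-row INSERT and a COPY of a staged file from S3. A hedged sketch, with the clicks table, its columns, and the S3/IAM names invented for illustration:

-- Option A: one multi-row INSERT per minute (columns are illustrative).
INSERT INTO clicks (user_id, clicked_at, url)
VALUES
  (101, '2019-12-03 02:44:01', '/home'),
  (102, '2019-12-03 02:44:01', '/pricing'),
  (103, '2019-12-03 02:44:02', '/signup');

-- Option B: stage the minute's batch as a file in S3, then COPY it in.
-- Bucket, key, and role ARN are placeholders.
COPY clicks
FROM 's3://my-bucket/clicks/2019-12-03-0244.csv.gz'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
CSV GZIP;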

Pivot a table with Amazon RedShift / PostgreSQL

Anonymous (unverified), submitted 2019-12-03 02:38:01
Question: I have several tables in Amazon Redshift that follow the pattern of several dimension columns and a pair of metric name/value columns:

DimensionA  DimensionB  MetricName  MetricValue
----------  ----------  ----------  -----------
dimA1       dimB1       m1          v11
dimA1       dimB2       m1          v12
dimA1       dimB2       m2          v21
dimA2       dimB2       m1          v13
dimA3       dimB1       m2          v22

I am looking for a good way to unwind/pivot the data into a form with one row per unique dimension set, e.g.:

DimensionA  DimensionB  m1   m2
----------  ----------  ---  ---
dimA1       dimB1       v11
dimA1       dimB2       v12  v21
dimA2
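One common way to produce this pivoted shape in Redshift is conditional aggregation. A hedged sketch, assuming the source table is called metrics and the set of metric names is known in advance:

-- Pivot metric name/value pairs into columns (table name is illustrative).
SELECT DimensionA,
       DimensionB,
       MAX(CASE WHEN MetricName = 'm1' THEN MetricValue END) AS m1,
       MAX(CASE WHEN MetricName = 'm2' THEN MetricValue END) AS m2
FROM metrics
GROUP BY DimensionA, DimensionB;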

S3 to Redshift : Copy with Access Denied

Anonymous (unverified), submitted 2019-12-03 02:34:02
Question: We previously copied files from S3 to Redshift using the COPY command every day, from a bucket with no specific policy:

COPY schema.table_staging
FROM 's3://our-bucket/X/YYYY/MM/DD/'
CREDENTIALS 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxx'
CSV GZIP DELIMITER AS '|' TIMEFORMAT 'YYYY-MM-DD HH24:MI:SS';

As we needed to improve the security of our S3 bucket, we added a policy to authorize connections either from our VPC (the one we use for our Redshift cluster) or from specific IP addresses. { "Version": "2012-10-17", "Id":
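As an aside, when S3 access is being locked down, the COPY credentials are often moved from embedded access keys to an IAM role attached to the cluster. A hedged sketch of the same COPY in that form (the role ARN is a placeholder, and this alone does not address the bucket-policy restriction described above):

-- Same COPY, authenticating via a cluster-attached IAM role.
-- The role ARN is a placeholder.
COPY schema.table_staging
FROM 's3://our-bucket/X/YYYY/MM/DD/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
CSV GZIP DELIMITER AS '|' TIMEFORMAT 'YYYY-MM-DD HH24:MI:SS';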

Redshift COPY operation doesn't work in SQLAlchemy

Anonymous (unverified), submitted 2019-12-03 02:30:02
Question: I'm trying to do a Redshift COPY in SQLAlchemy. The following SQL correctly copies objects from my S3 bucket into my Redshift table when I execute it in psql:

COPY posts FROM 's3://mybucket/the/key/prefix'
WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey'
JSON AS 'auto';

I have several files named s3://mybucket/the/key/prefix.001.json, s3://mybucket/the/key/prefix.002.json, etc. I can verify that the new rows were added to the table with select count(*) from posts. However, when I execute the exact same
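The excerpt cuts off before the failure is described, but one detail that matters when COPY is issued through a client library rather than psql is transaction handling: if the statement is never committed, the load can succeed yet the rows stay invisible. A hedged SQL-level sketch of making the commit explicit (credentials elided):

-- Run COPY in an explicit transaction and commit it, so the loaded rows
-- are visible even when the client library does not auto-commit.
BEGIN;
COPY posts FROM 's3://mybucket/the/key/prefix'
WITH CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
JSON AS 'auto';
COMMIT;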

Does Google BigQuery/ Amazon Redshift use column-based relational database or NoSQL database?

Anonymous (unverified), submitted 2019-12-03 02:26:02
Question: I'm still not very clear about the difference between a column-based relational database and a column-based NoSQL database. Google BigQuery enables SQL-like queries, so how can it be NoSQL? Column-based relational databases I know of are InfoBright, Vertica, and Sybase IQ. Column-based NoSQL databases I know of are Cassandra and HBase. The following article about Redshift starts by saying "NoSQL" but ends with PostgreSQL (which is relational) being used: http://nosqlguide.com/column-store/intro-to-amazon-redshift-a-columnar-nosql-database/

Answer 1: A

Convert text to timestamp in redshift

Anonymous (unverified), submitted 2019-12-03 02:25:01
Question: I have a text field "presence_changed_at" with text values such as '2014/12/17 08:05:28 +0000'. I need to convert this into a timestamp. In PostgreSQL there is the function TO_TIMESTAMP(), but in Redshift this does not seem to be supported. I can get the date without the time with TO_DATE("presence_changed_at", 'YYYY/MM/DD HH24:MI:SS'), which produces 2014-12-12, but I can't find any way to get a TIMESTAMP. Thanks in advance.

Answer 1: try the convert function: convert(timestamp, presence_changed_at)

Answer 2: It's surprisingly horrible to get a
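One other pattern worth noting: if the text always has the fixed 'YYYY/MM/DD HH24:MI:SS +0000' shape shown above, the trailing offset can be trimmed and the remainder cast. A hedged sketch (the table name presence is invented for illustration):

-- Trim the trailing ' +0000' offset and cast the first 19 characters.
SELECT CAST(LEFT(presence_changed_at, 19) AS timestamp) AS presence_changed_ts
FROM presence;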

Copying data from S3 to Redshift - Access denied

Anonymous (unverified), submitted 2019-12-03 01:45:01
Question: We are having trouble copying files from S3 to Redshift. The S3 bucket in question allows access only from a VPC in which we have a Redshift cluster. We have no problems copying from public S3 buckets. We tried both the key-based and the IAM-role-based approach, but the result is the same: we keep getting 403 Access Denied from S3. Any idea what we are missing? Thanks.

EDIT: Queries we use:

1. (using IAM role):
copy redshift_table from 's3://bucket/file.csv.gz'
credentials 'aws_iam_role=arn:aws:iam::123456789:role/redshift-copyunload'
delimiter '|'

Max stage using timestamp SQL Redshift

Anonymous (unverified), submitted 2019-12-03 01:42:02
Question: I'd like to find the record associated with the max exited_on per application_id (sample table below). I started out with the following SQL, but I get an error message telling me that my subquery has too many columns.

SELECT *
FROM application_stages
where application_stages.application_id = '91649746'
and (application_stages.application_id, max(exited_on) in
  (select application_stages.application_id, max(exited_on)
   from application_stages
   group by application_stages.application_id))

Table 1
+----------------+-------+--------------------+------
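For context, a common way to express "the row with the latest exited_on per application_id" in Redshift is a window function rather than a multi-column IN subquery. A hedged sketch against the table named in the question:

-- Rank rows per application_id by exited_on and keep the latest one.
SELECT *
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY application_id
                              ORDER BY exited_on DESC) AS rn
    FROM application_stages s
) ranked
WHERE rn = 1;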

Amazon Redshift Foreign Keys - Sort or Interleaved Keys

Anonymous (unverified), submitted 2019-12-03 01:40:02
Question: We plan to import OLTP relational tables into AWS Redshift. The CustomerTransaction table joins to multiple lookup tables; I only included 3, but we have more. What should the sort key be on the CustomerTransaction table? In regular SQL Server we have nonclustered indexes on the foreign keys in CustomerTransaction. For AWS Redshift, should I use compound sort keys or an interleaved sort key on the foreign key columns in CustomerTransaction? What is the best indexing strategy for this table design? Thanks.

create table.dbo CustomerTransaction {
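For illustration, Redshift expresses physical design through distribution and sort keys rather than secondary indexes. A hedged sketch with invented column names, since the original CREATE TABLE is cut off above:

-- Fact table with a distribution key and a compound sort key.
-- Column names are invented; the usual guidance is to pick the DISTKEY
-- for the most common join and lead the SORTKEY with the most common filter.
CREATE TABLE CustomerTransaction (
    transaction_id   BIGINT,
    customer_id      BIGINT,
    store_id         INT,
    product_id       BIGINT,
    transaction_date DATE
)
DISTKEY (customer_id)
COMPOUND SORTKEY (transaction_date, customer_id);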