amazon-redshift

Matching consecutive digits with REGEXP_REPLACE in Redshift

自作多情 submitted on 2019-12-13 15:58:30
Question: I'm trying to remove consecutive duplicate numbers from a string in Redshift. From '16,16,16,3,3,4,16,16,' I want to get '16,3,4,16,'. The following construction doesn't work for me:

SELECT regexp_replace('16,16,16,3,3,4,16,16,', '(.+)\1{1,}', '\1');

It returns exactly the same string. :( Thanks!

Answer 1: Here is the answer using a Redshift Python UDF (the pattern relies on the backreference \1, which Redshift's POSIX-style regex support likely doesn't handle, hence the UDF workaround). The posted snippet was cut off; a plausible completion of the loop:

create or replace function dedupstring(InputStr varchar) returns varchar stable as $$
    OutputStr = ''
    PrevStr = ''
    first = True
    for part in InputStr.split(','):
        if part == '':
            continue  # skip the empty token produced by the trailing comma
        if first or part != PrevStr:
            OutputStr += part + ','
            first = False
        PrevStr = part
    return OutputStr
$$ language plpythonu;
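A quick check of the UDF, assuming the completion above:

select dedupstring('16,16,16,3,3,4,16,16,');
-- expected: '16,3,4,16,'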

RPostgreSQL - R Connection to Amazon Redshift - How to WRITE/Post Bigger Data Sets

你离开我真会死。 submitted on 2019-12-13 13:20:45
Question: I'm experimenting with how to connect R to Amazon's Redshift, and publishing a short blog post for other newbies. Some good progress: I'm able to do most things (create tables, select data, and even sqlSave or dbSendQuery 'line by line'). HOWEVER, I have not found a way to do a BULK UPLOAD of a table in one shot (e.g. copy the whole 5X150 IRIS table/data frame to Redshift) that doesn't take more than a minute. Question: any advice for someone new to RPostgreSQL on how to write/upload a whole data frame in one shot?
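For context, the fast bulk path is the one Redshift itself provides: stage the data frame as a CSV in S3, then issue a COPY, which loads in parallel instead of row by row. A sketch of the COPY side, with placeholder bucket, file, and IAM role:

copy iris (sepal_length, sepal_width, petal_length, petal_width, species)
from 's3://my-bucket/iris.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
format as csv;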

Get the auto id for an inserted row into a Redshift table using psycopg2 in Python

旧街凉风 submitted on 2019-12-13 13:11:38
Question: I am inserting a record into an Amazon Redshift table from Python 2.7 using the psycopg2 library, and I would like to get back the auto-generated primary id for the inserted row. I have tried the usual ways I could find here and on other websites via Google search, e.g.:

conn = psycopg2.connect(conn_str)
conn.autocommit = True
sql = "INSERT INTO schema.table (col1, col2) VALUES (%s, %s) RETURNING id;"
cur = conn.cursor()
cur.execute(sql, (val1, val2))
id = cur.fetchone()[0]

I receive an error on cur…
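For context, Amazon Redshift does not support PostgreSQL's RETURNING clause, which is why this pattern fails. The usual workaround is a follow-up SELECT keyed on the values just inserted; a sketch with illustrative table and column names:

INSERT INTO schema.table (col1, col2) VALUES ('a', 'b');
SELECT id FROM schema.table WHERE col1 = 'a' AND col2 = 'b';
-- only reliable if (col1, col2) identifies the row uniquely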

Spark 2.0.0: truncate a Redshift table using JDBC

断了今生、忘了曾经 submitted on 2019-12-13 07:22:56
Question: Hello, I am using Spark SQL (2.0.0) with Redshift, where I want to truncate my tables. I am using this spark-redshift package, and I want to know how I can truncate my table. Can anyone please share an example of this?

Answer 1: I was unable to accomplish this using Spark and the code in the spark-redshift repo that you have listed above. I was, however, able to use AWS Lambda with psycopg2 to truncate a Redshift table. Then I use boto3 to kick off my Spark job via AWS Glue. The important code below is…
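Worth noting: the spark-redshift data source also documents preactions and postactions options that accept a semicolon-separated list of SQL statements to run before or after the write, so a truncate can ride along with the save itself. The statement passed in would be something like (table name is a placeholder):

truncate table public.my_table;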

SQLFeatureNotSupportedException on Amazon Redshift

落花浮王杯 submitted on 2019-12-13 06:54:28
Question: I am trying to run an ETL process on Amazon Redshift. It's written in Apache Spark. The same code works fine on Postgres, but with Redshift it throws the error SQLFeatureNotSupportedException: [Amazon][JDBC](10220) Driver not capable. I am trying to read data from flat files and write it to the tables. The Spark code looks like this:

spark.read.schema(getFileNameAndSchema(table)._2).csv(getFileNameAndSchema(table)._1)
  .write
  .mode(SaveMode.Overwrite)
  .jdbc("jdbc:redshift://url:5429", table, …

[Amazon](500150) Error setting/closing connection: Connection timed out

谁说胖子不能爱 submitted on 2019-12-13 03:56:45
Question: I am having a connectivity issue from the Glue console while trying to connect to a Redshift cluster. I am able to connect to the Redshift cluster with the exact same credentials from my desktop. I have followed the AWS documentation and have "ALL TCP" connections open in the security group the Redshift cluster resides in. Both Glue and Redshift are in the same region, and Glue has been given AWSRedshiftFullAccess. I am hitting a wall and would appreciate guidance to resolve this issue. I followed the…

CASE returns more than one value with a join

元气小坏坏 submitted on 2019-12-13 03:40:48
Question: I have a problem when I'm using a CASE statement with a join. I have two tables, tbl_a and tbl_b (their contents were shown as screenshots in the original post). I'm running the following query:

SELECT tbl_a.id,
       (CASE
            WHEN tbl_b.param_type = 'Ignition' THEN param_value
            WHEN tbl_b.param_type = 'Turn' THEN param_value
            WHEN tbl_b.param_type = 'Speed' THEN param_value
            WHEN tbl_b.param_type = 'Break' THEN param_value
        END) AS value
FROM public.tbl_a
JOIN public.tbl_b ON tbl_b.id = tbl_a.id

I want to get, for each id in tbl_a, the first match from tbl_b. If there…
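One common way to get a single "first" match per id is ROW_NUMBER with an explicit ordering. A sketch, where the ORDER BY is an assumption standing in for whatever "first" should mean here:

SELECT id, param_value AS value
FROM (
    SELECT tbl_a.id,
           tbl_b.param_value,
           ROW_NUMBER() OVER (PARTITION BY tbl_a.id ORDER BY tbl_b.param_type) AS rn
    FROM public.tbl_a
    JOIN public.tbl_b ON tbl_b.id = tbl_a.id
    WHERE tbl_b.param_type IN ('Ignition', 'Turn', 'Speed', 'Break')
) ranked
WHERE rn = 1;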

AWS Glue to Redshift: duplicate data?

倾然丶 夕夏残阳落幕 submitted on 2019-12-13 02:42:36
Question: Here are some bullet points in terms of how I have things set up:

- I have CSV files uploaded to S3 and a Glue crawler set up to create the table and schema.
- I have a Glue job set up that writes the data from the Glue table to our Amazon Redshift database using a JDBC connection. The job is also in charge of mapping the columns and creating the Redshift table.

By re-running the job, I am getting duplicate rows in Redshift (as expected). However, is there a way to replace or delete rows before…
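The pattern AWS documents for this is to load into a staging table and merge, deleting matching rows from the target before inserting; in Glue this SQL can be supplied as preactions on the Redshift sink. A sketch with placeholder table and key names:

begin;
delete from target_table
using staging_table
where target_table.id = staging_table.id;
insert into target_table
select * from staging_table;
drop table staging_table;
end;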

Loading the contents of a JSON array in Redshift

徘徊边缘 submitted on 2019-12-13 02:26:40
Question: I'm setting up Redshift and importing data from Mongo. I have succeeded in using a JSON path file for a simple document, but am now needing to import from a document containing an array:

{
  "id": 123,
  "things": [
    { "foo": 321, "bar": 654 },
    { "foo": 987, "bar": 567 }
  ]
}

How do I load the above into a table like so:

select * from things;
 id  | foo | bar
-----+-----+-----
 123 | 321 | 654
 123 | 987 | 567

Or is there some other way? I can't just store the JSON array in a varchar(max) column as the…
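For context, COPY with a jsonpaths file can address fixed array positions (e.g. $.things[0].foo) but cannot fan a single document out into several rows. A common workaround is to pre-flatten the export into newline-delimited JSON, one object per row:

{"id": 123, "foo": 321, "bar": 654}
{"id": 123, "foo": 987, "bar": 567}

and then COPY it directly (bucket and IAM role are placeholders):

copy things
from 's3://my-bucket/things_flat.json'
iam_role 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
format as json 'auto';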

Redshift: update or insert each row in a column with random data from another table

故事扮演 submitted on 2019-12-12 18:08:47
Question:

update testdata.dataset1
set abcd = (select abc from dataset2 order by random() limit 1);

Doing this, only one random entry from table dataset2 gets populated into all the rows of the dataset1 table. What I need is for each row of dataset1 to get its own random entry from dataset2. Note: dataset1 can be larger than dataset2.

Answer 1: Query 1. You should pass abcd into your subquery to prevent "optimizing":

UPDATE dataset1
SET abcd = (SELECT abc FROM dataset2
            WHERE abcd = abcd  -- correlates the subquery so it is re-evaluated per row
            ORDER BY random() LIMIT 1);  -- tail truncated in the source; completion assumed
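An alternative sketch, assuming dataset1 has a unique id column (the question doesn't say it does) and that the tables are small enough to tolerate a cross join:

update testdata.dataset1
set abcd = picks.abc
from (
    select d1.id,
           d2.abc,
           row_number() over (partition by d1.id order by random()) as rn
    from testdata.dataset1 d1
    cross join dataset2 d2
) picks
where dataset1.id = picks.id
  and picks.rn = 1;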