amazon-redshift

Matching consecutive digits with REGEXP_REPLACE in Redshift

自作多情 submitted on 2019-12-13 15:58:30
Question: I'm trying to remove consecutive duplicate numbers from a string in Redshift. From '16,16,16,3,3,4,16,16,' I want to get '16,3,4,16,'. The following construction doesn't work for me:

SELECT regexp_replace('16,16,16,3,3,4,16,16,', '(.+)\1{1,}', '\1');

It returns exactly the same string. :( Thanks!

Answer 1: Here is the answer using a Redshift Python UDF (the pattern relies on the backreference \1, which Redshift's POSIX-style regex support likely doesn't handle, hence the UDF workaround). The posted snippet was cut off; a plausible completion of the loop:

create or replace function dedupstring(InputStr varchar) returns varchar stable as $$
    OutputStr = ''
    PrevStr = ''
    first = True
    for part in InputStr.split(','):
        if part == '':
            continue  # skip the empty token produced by the trailing comma
        if first or part != PrevStr:
            OutputStr += part + ','
            first = False
        PrevStr = part
    return OutputStr
$$ language plpythonu;
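A quick check of the UDF, assuming the completion above:

select dedupstring('16,16,16,3,3,4,16,16,');
-- expected: '16,3,4,16,'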

RPostgreSQL - R Connection to Amazon Redshift - How to WRITE/Post Bigger Data Sets

你离开我真会死。 submitted on 2019-12-13 13:20:45
Question: I'm experimenting with how to connect R to Amazon's Redshift, and publishing a short blog post for other newbies. Some good progress: I'm able to do most things (create tables, select data, and even sqlSave or dbSendQuery 'line by line'). HOWEVER, I have not found a way to do a BULK UPLOAD of a table in one shot (e.g. copy the whole 5X150 IRIS table/data frame to Redshift) that doesn't take more than a minute. Question: any advice for someone new to RPostgreSQL on how to write/upload a whole data frame in one shot?
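For context, the fast bulk path is the one Redshift itself provides: stage the data frame as a CSV in S3, then issue a COPY, which loads in parallel instead of row by row. A sketch of the COPY side, with placeholder bucket, file, and IAM role:

copy iris (sepal_length, sepal_width, petal_length, petal_width, species)
from 's3://my-bucket/iris.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
format as csv;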

Get the auto id for an inserted row into a Redshift table using psycopg2 in Python

旧街凉风 submitted on 2019-12-13 13:11:38
Question: I am inserting a record into an Amazon Redshift table from Python 2.7 using the psycopg2 library, and I would like to get back the auto-generated primary id for the inserted row. I have tried the usual ways I could find here and on other websites via Google search, e.g.:

conn = psycopg2.connect(conn_str)
conn.autocommit = True
sql = "INSERT INTO schema.table (col1, col2) VALUES (%s, %s) RETURNING id;"
cur = conn.cursor()
cur.execute(sql, (val1, val2))
id = cur.fetchone()[0]

I receive an error on cur…
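For context, Amazon Redshift does not support PostgreSQL's RETURNING clause, which is why this pattern fails. The usual workaround is a follow-up SELECT keyed on the values just inserted; a sketch with illustrative table and column names:

INSERT INTO schema.table (col1, col2) VALUES ('a', 'b');
SELECT id FROM schema.table WHERE col1 = 'a' AND col2 = 'b';
-- only reliable if (col1, col2) identifies the row uniquely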

Spark 2.0.0: truncate a Redshift table using JDBC

断了今生、忘了曾经 submitted on 2019-12-13 07:22:56
Question: Hello, I am using Spark SQL (2.0.0) with Redshift, where I want to truncate my tables. I am using this spark-redshift package, and I want to know how I can truncate my table. Can anyone please share an example of this?

Answer 1: I was unable to accomplish this using Spark and the code in the spark-redshift repo that you have listed above. I was, however, able to use AWS Lambda with psycopg2 to truncate a Redshift table. Then I use boto3 to kick off my Spark job via AWS Glue. The important code below is…
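Worth noting: the spark-redshift data source also documents preactions and postactions options that accept a semicolon-separated list of SQL statements to run before or after the write, so a truncate can ride along with the save itself. The statement passed in would be something like (table name is a placeholder):

truncate table public.my_table;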

SQLFeatureNotSupportedException on Amazon Redshift

落花浮王杯 submitted on 2019-12-13 06:54:28
Question: I am trying to run an ETL process on Amazon Redshift. It's written in Apache Spark. The same code works fine on Postgres, but with Redshift it throws the error SQLFeatureNotSupportedException: [Amazon][JDBC](10220) Driver not capable. I am trying to read data from flat files and write it to the tables. The Spark code looks like this:

spark.read.schema(getFileNameAndSchema(table)._2).csv(getFileNameAndSchema(table)._1)
  .write
  .mode(SaveMode.Overwrite)
  .jdbc("jdbc:redshift://url:5429", table, …

[Amazon](500150) Error setting/closing connection: Connection timed out

谁说胖子不能爱 submitted on 2019-12-13 03:56:45
Question: I am having a connectivity issue from the Glue console while trying to connect to a Redshift cluster. I am able to connect to the Redshift cluster with the exact same credentials from my desktop. I have followed the AWS documentation and have "ALL TCP" connections open in the security group the Redshift cluster resides in. Both Glue and Redshift are in the same region, and Glue has been given AWSRedshiftFullAccess. I am hitting a wall and would appreciate guidance to resolve this issue. I followed the…

CASE returns more than one value with a join

元气小坏坏 submitted on 2019-12-13 03:40:48
Question: I have a problem when I'm using a CASE statement with a join. I have two tables, tbl_a and tbl_b (their contents were shown as screenshots in the original post). I'm running the following query:

SELECT tbl_a.id,
       (CASE
            WHEN tbl_b.param_type = 'Ignition' THEN param_value
            WHEN tbl_b.param_type = 'Turn' THEN param_value
            WHEN tbl_b.param_type = 'Speed' THEN param_value
            WHEN tbl_b.param_type = 'Break' THEN param_value
        END) AS value
FROM public.tbl_a
JOIN public.tbl_b ON tbl_b.id = tbl_a.id

I want to get, for each id in tbl_a, the first match from tbl_b. If there…
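One common way to get a single "first" match per id is ROW_NUMBER with an explicit ordering. A sketch, where the ORDER BY is an assumption standing in for whatever "first" should mean here:

SELECT id, param_value AS value
FROM (
    SELECT tbl_a.id,
           tbl_b.param_value,
           ROW_NUMBER() OVER (PARTITION BY tbl_a.id ORDER BY tbl_b.param_type) AS rn
    FROM public.tbl_a
    JOIN public.tbl_b ON tbl_b.id = tbl_a.id
    WHERE tbl_b.param_type IN ('Ignition', 'Turn', 'Speed', 'Break')
) ranked
WHERE rn = 1;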

AWS Glue to Redshift: duplicate data?

倾然丶 夕夏残阳落幕 submitted on 2019-12-13 02:42:36
Question: Here are some bullet points in terms of how I have things set up:

- I have CSV files uploaded to S3 and a Glue crawler set up to create the table and schema.
- I have a Glue job set up that writes the data from the Glue table to our Amazon Redshift database using a JDBC connection. The job is also in charge of mapping the columns and creating the Redshift table.

By re-running the job, I am getting duplicate rows in Redshift (as expected). However, is there a way to replace or delete rows before…
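The pattern AWS documents for this is to load into a staging table and merge, deleting matching rows from the target before inserting; in Glue this SQL can be supplied as preactions on the Redshift sink. A sketch with placeholder table and key names:

begin;
delete from target_table
using staging_table
where target_table.id = staging_table.id;
insert into target_table
select * from staging_table;
drop table staging_table;
end;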

Loading the contents of a JSON array in Redshift

徘徊边缘 submitted on 2019-12-13 02:26:40
Question: I'm setting up Redshift and importing data from Mongo. I have succeeded in using a JSON path file for a simple document, but am now needing to import from a document containing an array:

{
  "id": 123,
  "things": [
    { "foo": 321, "bar": 654 },
    { "foo": 987, "bar": 567 }
  ]
}

How do I load the above into a table like so:

select * from things;
 id  | foo | bar
-----+-----+-----
 123 | 321 | 654
 123 | 987 | 567

Or is there some other way? I can't just store the JSON array in a varchar(max) column as the…
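For context, COPY with a jsonpaths file can address fixed array positions (e.g. $.things[0].foo) but cannot fan a single document out into several rows. A common workaround is to pre-flatten the export into newline-delimited JSON, one object per row:

{"id": 123, "foo": 321, "bar": 654}
{"id": 123, "foo": 987, "bar": 567}

and then COPY it directly (bucket and IAM role are placeholders):

copy things
from 's3://my-bucket/things_flat.json'
iam_role 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
format as json 'auto';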

Redshift: update or insert each row in a column with random data from another table

故事扮演 submitted on 2019-12-12 18:08:47
Question:

update testdata.dataset1
set abcd = (select abc from dataset2 order by random() limit 1);

Doing this, only one random entry from table dataset2 gets populated into all the rows of the dataset1 table. What I need is for each row of dataset1 to get its own random entry from dataset2. Note: dataset1 can be larger than dataset2.

Answer 1: Query 1. You should pass abcd into your subquery to prevent "optimizing":

UPDATE dataset1
SET abcd = (SELECT abc FROM dataset2
            WHERE abcd = abcd  -- correlates the subquery so it is re-evaluated per row
            ORDER BY random() LIMIT 1);  -- tail truncated in the source; completion assumed
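An alternative sketch, assuming dataset1 has a unique id column (the question doesn't say it does) and that the tables are small enough to tolerate a cross join:

update testdata.dataset1
set abcd = picks.abc
from (
    select d1.id,
           d2.abc,
           row_number() over (partition by d1.id order by random()) as rn
    from testdata.dataset1 d1
    cross join dataset2 d2
) picks
where dataset1.id = picks.id
  and picks.rn = 1;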