Return data from subselect used in INSERT in a Common Table Expression

Question

I am trying to move bytea data from one table to another, updating references in one query.

To do that, I would like RETURNING to hand back data from the SELECT that feeds the INSERT, including a column that is not itself inserted.

INSERT INTO file_data (data)
  select image from task_log where image is not null
RETURNING id as file_data_id, task_log.id as task_log_id

But I get an error for that query:

[42P01] ERROR: missing FROM-clause entry for table "task_log"

I want to do something like:

WITH inserted AS (
  INSERT INTO file_data (data)
    SELECT image FROM task_log WHERE image IS NOT NULL
  RETURNING id AS file_data_id, task_log.id AS task_log_id
)
UPDATE task_log
SET    task_log.attachment_id = inserted.file_data_id,
       task_log.attachment_type = 'INLINE_IMAGE'
FROM   inserted
WHERE  inserted.task_log_id = task_log.id;

But I cannot get at all the data used for the insert: I can't return the id from the subselect.

I was inspired by this answer on how to do that with Common Table Expressions but I can't find a way to make it work.

Answer 1:

You need to get your table names and aliases right. Plus, the connection between the two tables is the column image (data in the new table file_data):

WITH inserted AS (
  INSERT INTO file_data (data)
  SELECT image
  FROM   task_log
  WHERE  image IS NOT NULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type = 'INLINE_IMAGE'
FROM   inserted i
WHERE  t.image = i.data;

As explained in my old answer that you referenced, image must be unique in task_log for this to work:

Insert data and set foreign keys with Postgres

I added a technique to disambiguate non-unique values to the referenced answer. Not sure if you'd want duplicate images in file_data, though.
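Whether image is actually unique can be checked before running the migration. The query below is only a sketch against the task_log table from the question; the output column names image_md5 and copies are made up for illustration. Any row it returns is an image value that occurs more than once, which would make the join on t.image = i.data ambiguous.

```sql
-- List image values that appear in more than one task_log row.
-- md5() is used only to keep the output short; grouping is on the full bytea value.
SELECT md5(image) AS image_md5
     , count(*)   AS copies
FROM   task_log
WHERE  image IS NOT NULL
GROUP  BY image
HAVING count(*) > 1;
```

An empty result means the plain join on image is safe for the current data.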

In the RETURNING clause of an INSERT you can only reference columns from the inserted row. The manual:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (...) However, any expression using the table's columns is allowed.

Bold emphasis mine.
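To illustrate the rule, here is a small sketch on a throwaway table. The table returning_demo and the aliases size_bytes and checksum are hypothetical and not part of the original schema. RETURNING may compute arbitrary expressions, as long as they are built from columns of the row that was just inserted:

```sql
-- Hypothetical demo table, dropped automatically at the end of the session.
CREATE TEMP TABLE returning_demo (
  id   bigserial PRIMARY KEY,
  data bytea
);

-- Allowed: expressions over columns of the inserted row.
INSERT INTO returning_demo (data)
VALUES ('\x010203'), ('\x0405')
RETURNING id
        , octet_length(data) AS size_bytes
        , md5(data)          AS checksum;
```

A reference to a column of the source table, such as task_log.id in the question, is rejected with the "missing FROM-clause entry" error shown above, because the source query is not in scope in RETURNING.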

Fold duplicate source values

If you want distinct entries in the target table of the INSERT (file_data), all you need in this case is DISTINCT in the initial SELECT:

WITH inserted AS (
  INSERT INTO file_data (data)
  SELECT DISTINCT image  -- fold duplicates
  FROM   task_log
  WHERE  image IS NOT NULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type = 'INLINE_IMAGE'
FROM   inserted i
WHERE  t.image = i.data;

Each resulting file_data.id may now be used multiple times in task_log, so several rows in task_log can point to the same image in file_data. Be careful with updates and deletes.

Answer 2:

I needed to keep duplicates, so I ended up adding a temporary column to file_data that holds the id of the task_log row each data row came from.

alter table file_data add column task_log_id bigint;
-- insert & update data
alter table file_data drop column task_log_id;

The full move script was:

-- A new table for any file data
CREATE TABLE file_data (
  id   BIGSERIAL PRIMARY KEY,
  data bytea
);

-- Move data from task_log to file_data

-- Create new columns to reference file_data
alter table task_log add column attachment_type VARCHAR(50);
alter table task_log add column attachment_id bigint REFERENCES file_data;

-- add a temp column for the task_id used for the insert
alter table file_data add column task_log_id bigint;

-- insert data into file_data and set references
WITH inserted AS (
  INSERT INTO file_data (data, task_log_id)
  SELECT image, id
  FROM   task_log
  WHERE  image IS NOT NULL
  RETURNING id, task_log_id
)
UPDATE task_log
SET    attachment_id   = inserted.id,
       attachment_type = 'INLINE_IMAGE'
FROM   inserted
WHERE  inserted.task_log_id = task_log.id;

-- delete the temp column
alter table file_data drop column task_log_id;

-- delete task_log images
alter table task_log drop column image;

As this leaves some dead data behind, I ran VACUUM FULL afterwards to clean up.

But please let me repeat the warning from @ErwinBrandstetter:

Performance is much worse than for the method using a serial number I proposed in the linked answer. Adding & removing a column requires owner's privileges, a full table rewrite and exclusive locks on the table, which is poison for concurrent access.
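For comparison, here is one way to keep duplicates without a temporary column. It is only a sketch and not necessarily the method from the linked answer: it assumes file_data.id is backed by a sequence (true for the BIGSERIAL definition above), and the CTE names sel and ins are chosen for illustration. The new ids are drawn from the sequence up front, so each task_log row already knows which file_data row it will point to:

```sql
WITH sel AS (
  -- Pair every source row with a freshly drawn file_data id.
  SELECT id AS task_log_id
       , image
       , nextval(pg_get_serial_sequence('file_data', 'id')) AS file_data_id
  FROM   task_log
  WHERE  image IS NOT NULL
)
, ins AS (
  -- Insert with the pre-assigned ids instead of the column default.
  INSERT INTO file_data (id, data)
  SELECT file_data_id, image
  FROM   sel
)
UPDATE task_log t
SET    attachment_id   = s.file_data_id
     , attachment_type = 'INLINE_IMAGE'
FROM   sel s
WHERE  t.id = s.task_log_id;
```

The match is on task_log.id rather than on the image value, so duplicate images each get their own row in file_data and no extra column has to be added or dropped.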

Source: https://stackoverflow.com/questions/47202078/return-data-from-subselect-used-in-insert-in-a-common-table-expression

Tags

sql

postgresql

common-table-expression

data-migration

postgresql-10