Error while loading parquet format file into Amazon Redshift using copy command and manifest file

…衆ロ難τιáo~ 提交于 2021-01-28 19:57:01

问题


I'm trying to load parquet file using manifest file and getting below error.

query: 124138ailed due to an internal error. File 'https://s3.amazonaws.com/sbredshift-east/data/000002_0 has an invalid version number: )

Here is my copy command

copy testtable from 's3://sbredshift-east/manifest/supplier.manifest' IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123' FORMAT AS PARQUET manifest;

here is my manifest file

**{  
   "entries":[  
      {  
         "url":"s3://sbredshift-east/data/000002_0",
         "mandatory":true,
         "meta":{  
            "content_length":1000
         }
      }
   ]
}**

I'm able to load the same file using copy command by specifying the file name.

copy testtable from 's3://sbredshift-east/data/000002_0' IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123' FORMAT AS PARQUET;

INFO: Load into table 'supplier' completed, 800000 record(s) loaded successfully. COPY

What could be wrong in my copy statement?


回答1:


This error happens when the content_length value is wrong. You have to specify the correct content_length. You could check it executing an s3 ls command.

aws s3 ls s3://sbredshift-east/data/
2019-12-27 11:15:19        539 sbredshift-east/data/000002_0

The 539 (file size) should be the same than the content_lenght value in your manifest file.

I don't know why they are using this meta value when you don't need it in the direct copy command.

¯\_(ツ)_/¯




回答2:


The only way I've gotten parquet copy to work with manifest file is to add the meta key with the content_length.

From what I can gather in my error logs, the COPY command for parquet (w/ manifest) might first be reading the files using Redshift Spectrum as an external table. If that's the case, this hidden step does require the content_step which contradicts their initial statement about COPY commands.

https://docs.amazonaws.cn/en_us/redshift/latest/dg/loading-data-files-using-manifest.html



来源:https://stackoverflow.com/questions/56944036/error-while-loading-parquet-format-file-into-amazon-redshift-using-copy-command

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!