AWS Lambda function - convert PDF to Image

Submitted by 喜欢而已 on 2020-01-01 06:31:27

Question


I am developing an application where users can upload drawings in PDF format. Uploaded files are stored on S3. After uploading, the files have to be converted to images. For this purpose I have created a Lambda function which downloads the file from S3 to the /tmp folder in the Lambda execution environment and then calls the ‘convert’ command from ImageMagick.

convert sourceFile.pdf targetFile.png

The Lambda runtime environment is Node.js 4.3. Memory is set to 128 MB and the timeout to 30 seconds.

Now the problem is that some files are converted successfully while others are failing with the following error:

{ [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png
convert: `%s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo" "-f/tmp/magick-tIe1MjeR" @ error/utility.c/SystemCommand/1890.
convert: Postscript delegate failed `/tmp/sourceFile.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/678.
convert: no images defined `/tmp/targetFile.png' @ error/convert.c/ConvertImageCommand/3046.
] killed: false, code: 1, signal: null, cmd: '/bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png' }

At first I did not understand why this happens, so I tried to convert the problematic files on my local Ubuntu machine with the same command. This is the output from the terminal:

   **** Warning: considering '0000000000 XXXXX n' as a free entry.
   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

So the message was very clear, but the file gets converted to PNG anyway. If I run convert source.pdf target.pdf and then convert target.pdf image.png, the file is repaired and converted without any errors. This doesn’t work on Lambda.

Since the same thing works in one environment but not in the other, my best guess is that the version of Ghostscript is the problem. The version installed on the AMI is 8.70. On my local machine the Ghostscript version is 9.18.
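
One way to confirm this guess is to log the Ghostscript version from inside the Lambda function and compare it with the local one. The following is a minimal sketch rather than tested Lambda code: the helper names compareVersions and getGhostscriptVersion are made up for illustration, and it assumes that gs is on the PATH of the execution environment.

```javascript
var execFile = require('child_process').execFile;

// Compare two dotted version strings numerically, e.g. '8.70' and '9.18'.
// Returns a negative number if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
    var pa = a.trim().split('.').map(Number);
    var pb = b.trim().split('.').map(Number);
    var len = Math.max(pa.length, pb.length);
    for (var i = 0; i < len; i++) {
        var diff = (pa[i] || 0) - (pb[i] || 0);
        if (diff !== 0) {
            return diff;
        }
    }
    return 0;
}

// Ask the installed Ghostscript for its version string.
// ASSUMPTION: 'gs' is resolvable on the PATH of the current environment.
function getGhostscriptVersion(callback) {
    execFile('gs', ['--version'], function (err, stdout) {
        if (err) {
            return callback(err);
        }
        callback(null, stdout.trim());
    });
}
```

Calling getGhostscriptVersion at the start of the handler and logging the result with console.log would show in CloudWatch which version the function actually runs; compareVersions can then be used to check it against a known-good version such as '9.18'.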

My questions are:

  • Is the version of Ghostscript the problem? Is this a bug in the older version of Ghostscript? If not, how can I tell Ghostscript (with or without ImageMagick) to repair or ignore errors like it does in my local environment?
  • If the old version is the problem, is it possible to build Ghostscript from source, create a Node.js module, and then use that version of Ghostscript instead of the one that is installed?
  • Is there an easier way to convert a PDF to an image without using ImageMagick and Ghostscript?

UPDATE: Relevant part of the Lambda code:

var exec = require('child_process').exec;
var AWS = require('aws-sdk');
var fs = require('fs');
...

var localSourceFile = '/tmp/sourceFile.pdf';
var localTargetFile = '/tmp/targetFile.png';

var writeStream = fs.createWriteStream(localSourceFile);
writeStream.write(body);
writeStream.end();

writeStream.on('error', function (err) {
    console.log("Error writing data from s3 to tmp folder.");
    context.fail(err);
});

writeStream.on('finish', function () {
    var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;

    exec(cmd, function (err, stdout, stderr) {

        if (err) {
            console.log("Error executing convert command.");
            // Return here so the stderr check below cannot report a second result.
            return context.fail(err);
        }

        if (stderr) {
            console.log("Command executed successfully but returned error.");
            context.fail(stderr);
        } else {
            //file converted successfully - do something...
        }
    });
});

Answer 1:


You can find a compiled version of Ghostscript for Lambda in the following repository. Add its files to the zip file that you upload to AWS Lambda as the source code.

https://github.com/sina-masnadi/lambda-ghostscript

This is an npm package to call Ghostscript functions:

https://github.com/sina-masnadi/node-gs

After copying the compiled Ghostscript files into your project and adding the npm package, use the executablePath('path to ghostscript') function to point the package at the compiled Ghostscript files that you added earlier.
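
The bundled binary can also be located and called without the wrapper package. This is a rough sketch under stated assumptions, not the repository's documented usage: it assumes the files were copied into a lambda-ghostscript folder at the root of the deployment zip with the executable at bin/gs inside it (check the repository for the actual layout), and bundledGsPath and runBundledGs are hypothetical helper names. LAMBDA_TASK_ROOT is the environment variable that Lambda sets to the directory containing the function code.

```javascript
var path = require('path');
var execFile = require('child_process').execFile;

// Build the path to the Ghostscript binary shipped inside the deployment zip.
// ASSUMPTION: the folder name 'lambda-ghostscript' and the 'bin/gs' location
// are illustrative; adjust them to match the files you actually copied.
function bundledGsPath(taskRoot) {
    var root = taskRoot || process.env.LAMBDA_TASK_ROOT || process.cwd();
    return path.join(root, 'lambda-ghostscript', 'bin', 'gs');
}

// Run the bundled binary with the given argument array.
// The callback receives (err, stdout, stderr) from execFile.
function runBundledGs(args, callback) {
    execFile(bundledGsPath(), args, callback);
}
```

For example, runBundledGs(['--version'], callback) should report the version of the bundled build rather than the one installed on the AMI, which is a quick way to check that the right binary is being picked up.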




Answer 2:


It's almost certainly a bug, or perhaps a limitation, in the older version of Ghostscript.

Many PDF producers create PDF files which do not conform to the specification, and yet will open without complaint in Adobe Acrobat. Ghostscript endeavours to do the same, but obviously we can't know what Acrobat is going to allow, so we are continually chasing this nebulous target. (FWIW, that warning indicates a genuinely out-of-spec PDF file.)

There's nothing you can do with the old version other than replace it.

Yes, you can build Ghostscript from source. I have no idea about a Node.js module, and I'm not sure why that's relevant.

There are numerous other applications which will render a PDF file; MuPDF is another one I know of. And, of course, you can use Ghostscript directly without using ImageMagick. Of course, if you can load another application, then you should simply be able to replace your Ghostscript installation too.
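
As an illustration of calling Ghostscript directly, the sketch below renders a PDF to PNG with a subset of the switches that ImageMagick passed to its gs delegate in the error output in the question. It is an untested outline, not a drop-in fix: buildGsArgs and pdfToPng are hypothetical names, the gsPath parameter is an assumption that lets you point at a newer Ghostscript build, and the resolution default of 72 simply mirrors the -r72x72 seen in the error.

```javascript
var execFile = require('child_process').execFile;

// Build the Ghostscript argument list for rendering a PDF to PNG.
// outputPattern may contain %d, which Ghostscript replaces with the page number.
function buildGsArgs(inputPdf, outputPattern, dpi) {
    return [
        '-q', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dNOPROMPT',
        '-sDEVICE=pngalpha',
        '-dTextAlphaBits=4', '-dGraphicsAlphaBits=4',
        '-r' + (dpi || 72),
        '-sOutputFile=' + outputPattern,
        inputPdf
    ];
}

// Render inputPdf using the Ghostscript binary at gsPath (defaults to 'gs').
// execFile is used instead of exec so no shell is involved and paths need no quoting.
function pdfToPng(gsPath, inputPdf, outputPattern, callback) {
    var args = buildGsArgs(inputPdf, outputPattern);
    execFile(gsPath || 'gs', args, function (err, stdout, stderr) {
        if (err) {
            return callback(err);
        }
        callback(null, { stdout: stdout, stderr: stderr });
    });
}
```

A call such as pdfToPng(null, '/tmp/sourceFile.pdf', '/tmp/page-%d.png', callback) would write one PNG per page into /tmp. Whether a damaged file is repaired still depends on the Ghostscript version behind gsPath.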



Source: https://stackoverflow.com/questions/39205035/aws-lambda-function-convert-pdf-to-image
