Stream ffmpeg transcoding result to S3

Backend · Unresolved · 3 replies · 1004 views
粉色の甜心 2020-12-28 10:17

I want to transcode a large file using FFmpeg and store the result directly on AWS S3. This will be done inside an AWS Lambda function that has limited /tmp space.

3 Answers
  •  长发绾君心
    2020-12-28 10:36

    Since the goal is to take a stream of bytes from S3 and output it to S3 again, there is no need to use the HTTP capabilities of ffmpeg. Because ffmpeg is built as a command-line tool that can take its input from stdin and write its output to stdout/stderr, it is simpler to use these capabilities than to try to have ffmpeg handle the HTTP reading/writing itself. You just have to connect an HTTP stream (that reads from S3) to ffmpeg's stdin and connect its stdout to another stream (that writes to S3). See here for more information on ffmpeg piping.

    The simplest implementation would look like this:

    using System.Diagnostics;
    using Amazon;
    using Amazon.S3;
    using Amazon.S3.Model;
    
    var s3Client = new AmazonS3Client(RegionEndpoint.USEast1);
    
    var startInfo = new ProcessStartInfo
    {
        FileName = "ffmpeg",
        Arguments = "-i pipe:0 -y -vn -ar 44100 -ab 192k -f mp3 pipe:1",
        CreateNoWindow = true,
        UseShellExecute = false,
        // Redirect stdin/stdout so we can pipe data through ffmpeg.
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
    };
    
    using (var process = new Process { StartInfo = startInfo })
    {
        // Get a stream to an object stored on S3.
        var s3InputObject = await s3Client.GetObjectAsync(new GetObjectRequest
        {
            BucketName = "my-bucket",
            Key = "input.wav",
        });
    
        process.Start();
    
        // Store the output of ffmpeg directly on S3. Not awaited yet, so the
        // upload runs in the background while we feed ffmpeg's stdin below.
        var uploadTask = s3Client.PutObjectAsync(new PutObjectRequest
        {
            BucketName = "my-bucket",
            Key = "output.mp3",
            InputStream = process.StandardOutput.BaseStream,
        });
    
        // Feed the S3 input stream into ffmpeg.
        await s3InputObject.ResponseStream.CopyToAsync(process.StandardInput.BaseStream);
        process.StandardInput.Close();
    
        // Wait for the upload of ffmpeg's output to finish.
        await uploadTask;
    
        process.WaitForExit();
    }
    

    This snippet gives an idea of how to pipe the input/output of ffmpeg.

    Unfortunately, this code does not work. The call to PutObjectAsync throws an exception that says "Could not determine content length". That is expected: a simple S3 PUT requires the content length to be known up front, and since we cannot know in advance how big the output of ffmpeg will be, we can't use PutObjectAsync here.

    The workaround is to use an S3 multipart upload. Instead of feeding the output of ffmpeg directly to S3, you write it into a memory buffer of moderate size (say 25 MB, so that it won't consume all the memory of the AWS Lambda that runs this code). Whenever the buffer is full, you upload it to S3 as one part of a multipart upload. Then, once ffmpeg is done transcoding the input file, you upload whatever is left in the buffer as the last part and call CompleteMultipartUpload. This takes all the 25 MB parts and merges them into a single file. A sketch of this strategy follows below.
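    The answer's own implementation is not shown, so here is a minimal sketch of that buffering strategy using the low-level multipart API of the AWS SDK for .NET. The S3StreamUploader class, the UploadStreamAsync helper, and the 25 MB part size are my assumptions for illustration, not the original code:
    
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;
    
    public static class S3StreamUploader
    {
        // 25 MB per part; S3 requires at least 5 MB for all parts but the last.
        private const int PartSize = 25 * 1024 * 1024;
    
        public static async Task UploadStreamAsync(
            IAmazonS3 s3, string bucket, string key, Stream source)
        {
            var init = await s3.InitiateMultipartUploadAsync(
                new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });
    
            var partETags = new List<PartETag>();
            var buffer = new byte[PartSize];
            int partNumber = 1;
            int filled = 0;
    
            try
            {
                while (true)
                {
                    // Fill the buffer from ffmpeg's stdout (reads may be short).
                    int read = await source.ReadAsync(buffer, filled, PartSize - filled);
                    filled += read;
    
                    bool endOfStream = read == 0;
                    if (filled == PartSize || (endOfStream && filled > 0))
                    {
                        // Upload the full (or final partial) buffer as one part.
                        var response = await s3.UploadPartAsync(new UploadPartRequest
                        {
                            BucketName = bucket,
                            Key = key,
                            UploadId = init.UploadId,
                            PartNumber = partNumber,
                            PartSize = filled,
                            InputStream = new MemoryStream(buffer, 0, filled),
                        });
                        partETags.Add(new PartETag(partNumber, response.ETag));
                        partNumber++;
                        filled = 0;
                    }
    
                    if (endOfStream) break;
                }
    
                // Merge all uploaded parts into a single S3 object.
                await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
                {
                    BucketName = bucket,
                    Key = key,
                    UploadId = init.UploadId,
                    PartETags = partETags,
                });
            }
            catch
            {
                // Abort on failure so S3 doesn't keep (and bill for) orphaned parts.
                await s3.AbortMultipartUploadAsync(new AbortMultipartUploadRequest
                {
                    BucketName = bucket,
                    Key = key,
                    UploadId = init.UploadId,
                });
                throw;
            }
        }
    }
    
    With a helper like this, the PutObjectAsync call in the first snippet would be replaced by something like await S3StreamUploader.UploadStreamAsync(s3Client, "my-bucket", "output.mp3", process.StandardOutput.BaseStream);.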

    That's it. With this strategy it is possible to read a file from S3, transcode it, and store the result on the fly in S3 without writing anything locally. It is therefore possible to transcode large files in an AWS Lambda that uses a very small amount of memory and virtually no disk space.

    This was implemented successfully. I will try to see if this code can be shared.

    Warning: as mentioned in a comment, the result is not 100% identical when we stream the output of ffmpeg compared to letting ffmpeg write to a local file itself. When writing to a local file, ffmpeg can seek back to the beginning of the file once transcoding is done and update the file's metadata with some results of the transcoding; it cannot do that on a non-seekable pipe. I don't know what the impact of this missing updated metadata is.
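    For MP3 output specifically, the header ffmpeg seeks back to update is the Xing/Info tag (frame count and duration, which some players use for seeking). If a stale tag is a concern, one option worth testing (an assumption about ffmpeg's mp3 muxer, not something stated in this answer) is to skip writing it altogether so the piped output contains no header that could not be updated:
    
    // Hypothetical variant of the Arguments above: "-write_xing 0" asks
    // ffmpeg's mp3 muxer not to write the Xing/Info header at all.
    Arguments = "-i pipe:0 -y -vn -ar 44100 -ab 192k -write_xing 0 -f mp3 pipe:1",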
