Large File download from SQL via WebApi after custom MultipartFormDataStreamProvider upload


I finally figured it out. I had over-complicated the write process, which brought about most of the struggle. Here is my solution to the initial issue:

To keep .NET from buffering the file in memory (so that you can handle large file uploads), you first need to override WebHostBufferPolicySelector so that it doesn't buffer the input stream for your controller, and then replace the default buffer policy selector.

public class NoBufferPolicySelector : WebHostBufferPolicySelector
{
    public override bool UseBufferedInputStream(object hostContext)
    {
        var context = hostContext as HttpContextBase;

        //disable input buffering only for the upload controller
        if (context != null)
        {
            var controller = context.Request.RequestContext.RouteData.Values["controller"];

            if (controller != null &&
                string.Equals(controller.ToString(), "upload", StringComparison.InvariantCultureIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }

    public override bool UseBufferedOutputStream(HttpResponseMessage response)
    {
        return base.UseBufferedOutputStream(response);
    }
}

Then, to replace the default buffer policy selector:

GlobalConfiguration.Configuration.Services.Replace(typeof(IHostBufferPolicySelector), new NoBufferPolicySelector());
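
The replacement has to run before any requests are handled; in a classic ASP.NET Web API project that usually means Application_Start in Global.asax. A minimal sketch of that placement (the exact startup hook and WebApiConfig registration are assumptions based on the standard project template):

//Global.asax.cs - placement is an assumption; any startup hook that runs
//before the first request will do
protected void Application_Start()
{
    GlobalConfiguration.Configuration.Services.Replace(
        typeof(IHostBufferPolicySelector),
        new NoBufferPolicySelector());

    GlobalConfiguration.Configure(WebApiConfig.Register);
}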

Then, to avoid the default behavior of having the file stream written to disk, you need to provide a stream provider that writes to the database instead. To do this, inherit from MultipartStreamProvider and override the GetStream method to return the stream that will write to your database.

public class CustomMultipartLargeFileStreamProvider : MultipartStreamProvider
{
    //tracks which parts are plain form fields (true) versus file parts (false)
    private readonly List<bool> _isFormData = new List<bool>();

    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        // For form data, Content-Disposition header is a requirement
        ContentDispositionHeaderValue contentDisposition = headers.ContentDisposition;
        if (contentDisposition != null && !String.IsNullOrEmpty(contentDisposition.FileName))
        {
            // We won't post process files as form data
            _isFormData.Add(false);

            //create unique identifier for this file upload
            var identifier = Guid.NewGuid();
            var fileName = contentDisposition.FileName;

            var boundaryObj = parent.Headers.ContentType.Parameters.SingleOrDefault(a => a.Name == "boundary");

            var boundary = (boundaryObj != null) ? boundaryObj.Value : "";

            //strip any client-side path information and surrounding quotes from the file name
            if (fileName.Contains("\\"))
            {
                fileName = fileName.Substring(fileName.LastIndexOf("\\") + 1).Replace("\"", "");
            }

            //write parent container for the file chunks that are being stored
            WriteLargeFileContainer(fileName, identifier, headers.ContentType.MediaType, boundary);

            //create an instance of the custom stream that will write the chunks to the database
            var stream = new CustomSqlStream();
            stream.Filename = fileName;
            stream.FullFilename = contentDisposition.FileName.Replace("\"", "");
            stream.Identifier = identifier.ToString();
            stream.ContentType = headers.ContentType.MediaType;
            stream.Boundary = (!string.IsNullOrEmpty(boundary)) ? boundary : "";

            return stream;
        }
        else
        {
            // We will post process this as form data
            _isFormData.Add(true);

            // If no filename parameter was found in the Content-Disposition header then return a memory stream.
            return new MemoryStream();
        }
    }
}
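
The WriteLargeFileContainer helper called above isn't shown in the original; it just records the parent row that the chunks will reference. A minimal sketch of it as a member of the provider, assuming a stored procedure of the same name and these parameter names:

//a sketch only - the stored procedure name and its parameter names are assumptions
private void WriteLargeFileContainer(string fileName, Guid identifier, string contentType, string boundary)
{
    using (var conn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["PortalBetaConnectionString"].ConnectionString))
    using (var cmd = new SqlCommand("WriteLargeFileContainer", conn))
    {
        cmd.CommandType = System.Data.CommandType.StoredProcedure;
        cmd.Parameters.Add(new SqlParameter("@FileName", fileName));
        cmd.Parameters.Add(new SqlParameter("@Identifier", identifier));
        cmd.Parameters.Add(new SqlParameter("@ContentType", contentType));
        cmd.Parameters.Add(new SqlParameter("@Boundary", boundary));

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}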

The custom stream you create needs to inherit Stream and override the Write method. This is where I overthought the problem: I assumed I needed to parse the multipart boundary markers out of the buffer parameter, but the framework has already done that for you - the offset and count parameters delimit the raw file bytes.

public override void Write(byte[] buffer, int offset, int count)
{
    //no boundary is included in buffer - copy just the file bytes and persist them
    byte[] fileData = new byte[count];
    Buffer.BlockCopy(buffer, offset, fileData, 0, count);
    WriteData(fileData);
}
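
The rest of CustomSqlStream is plumbing: the metadata properties the provider sets, write-only stream stubs, and a WriteData method that persists each chunk. A minimal sketch of those remaining members; the WriteAttachmentChunk stored procedure and its parameters are assumptions (the download side below reads chunks back via ReadAttachmentChunks, so the write side presumably mirrors it):

public class CustomSqlStream : Stream
{
    //...plus the Write override shown above

    //metadata populated by the stream provider before writing begins
    public string Filename { get; set; }
    public string FullFilename { get; set; }
    public string Identifier { get; set; }
    public string ContentType { get; set; }
    public string Boundary { get; set; }

    //write-only stream: reading and seeking are not supported
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    //persist one chunk - "WriteAttachmentChunk" and its parameters are assumptions
    private void WriteData(byte[] chunk)
    {
        using (var conn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["PortalBetaConnectionString"].ConnectionString))
        using (var cmd = new SqlCommand("WriteAttachmentChunk", conn))
        {
            cmd.CommandType = System.Data.CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@Identifier", Identifier));
            cmd.Parameters.Add(new SqlParameter("@Data", chunk));

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}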

From there, it's just a matter of plugging in the API methods for upload and download. For upload:

public Task<HttpResponseMessage> PostFormData()
{
    //reject non-multipart requests up front
    if (!Request.Content.IsMimeMultipartContent())
    {
        throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
    }

    var provider = new CustomMultipartLargeFileStreamProvider();

    // Read the form data and return an async task.
    return Request.Content.ReadAsMultipartAsync(provider).ContinueWith<HttpResponseMessage>(t =>
    {
        if (t.IsFaulted || t.IsCanceled)
        {
            return Request.CreateErrorResponse(HttpStatusCode.InternalServerError, t.Exception);
        }

        return Request.CreateResponse(HttpStatusCode.OK);
    });
}
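
To exercise the upload end to end, something like the following works from any async client code. The endpoint URL and file path are assumptions since the controller's routing isn't shown above; note that the controller must be named "upload" for NoBufferPolicySelector to disable input buffering:

//a quick client-side test - the URL and file path are assumptions
public static async Task UploadAsync()
{
    using (var client = new HttpClient())
    using (var content = new MultipartFormDataContent())
    using (var fileStream = File.OpenRead(@"C:\temp\largefile.zip"))
    {
        content.Add(new StreamContent(fileStream), "file", "largefile.zip");

        var response = await client.PostAsync("http://localhost/api/upload", content);
        response.EnsureSuccessStatusCode();
    }
}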

For download, in order to keep the memory footprint low, I leveraged PushStreamContent to push the chunks back to the user:

[HttpGet]
[Route("file/{id}")]
public async Task<HttpResponseMessage> GetFile(string id)
{
    string mimeType = string.Empty;
    string filename = string.Empty;
    if (!string.IsNullOrEmpty(id))
    {
        //get the headers for the file being sent back to the user
        using (var myConn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["PortalBetaConnectionString"].ConnectionString))
        {
            using (var myCmd = new SqlCommand("ReadLargeFileInfo", myConn))
            {
                myCmd.CommandType = System.Data.CommandType.StoredProcedure;

                var pIdentifier = new SqlParameter("@Identifier", id);

                myCmd.Parameters.Add(pIdentifier);

                await myConn.OpenAsync();

                //dispose the reader when done so the connection is released promptly
                using (var dataReader = await myCmd.ExecuteReaderAsync())
                {
                    while (await dataReader.ReadAsync())
                    {
                        mimeType = dataReader.GetString(0);
                        filename = dataReader.GetString(1);
                    }
                }
            }
        }

        var result = new HttpResponseMessage()
        {
            Content = new PushStreamContent(async (outputStream, httpContent, transportContext) =>
            {
                //pull the data back from the db and stream the data back to the user
                await WriteDataChunksFromDBToStream(outputStream, httpContent, transportContext, id);
            }),
            StatusCode = HttpStatusCode.OK
        };

        result.Content.Headers.ContentType = new MediaTypeHeaderValue(mimeType);
        result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment") { FileName = filename };

        return result;
    }

    return new HttpResponseMessage(HttpStatusCode.BadRequest);
}

    private async Task WriteDataChunksFromDBToStream(Stream responseStream, HttpContent httpContent, TransportContext transportContext, string fileIdentifier)
    {
        // PushStreamContent requires the responseStream to be closed
        // for signaling it that you have finished writing the response.
        using (responseStream)
        {
            using (var myConn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["PortalBetaConnectionString"].ConnectionString))
            {
                await myConn.OpenAsync();

                //stored proc to pull the data back from the db
                using (var myCmd = new SqlCommand("ReadAttachmentChunks", myConn))
                {
                    myCmd.CommandType = System.Data.CommandType.StoredProcedure;

                    var fileName = new SqlParameter("@Identifier", fileIdentifier);

                    myCmd.Parameters.Add(fileName);

                    // The reader needs to be executed with the SequentialAccess behavior to enable network streaming
                    // Otherwise ReadAsync will buffer the entire BLOB into memory which can cause scalability issues or even OutOfMemoryExceptions
                    using (var reader = await myCmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
                    {
                        while (await reader.ReadAsync())
                        {
                            //confirm the column that has the binary data of the file returned is not null
                            if (!(await reader.IsDBNullAsync(0)))
                            {
                                //read the binary data of the file into a stream
                                using (var data = reader.GetStream(0))
                                {
                                    // Asynchronously copy the stream from the server to the response stream
                                    await data.CopyToAsync(responseStream);
                                    //flush the copied bytes to the client (flushing the source stream would be a no-op)
                                    await responseStream.FlushAsync();
                                }
                            }
                        }
                    }
                }
            }
        }// close response stream
    }

Ugh. This is nasty. With the upload, you have to make sure to:

  1. Separate the headers from the content portion - you must follow the HTTP RFCs' requirements for multipart messages.
  2. Allow for chunked transfer encoding.
  3. Of course, the content portion (unless you are transmitting text) will be binary data rather than readable strings.
  4. Allow for compressed transfers, i.e. GZIP or DEFLATE.
  5. Maybe - just maybe - take the text encoding into account (ASCII, Unicode, UTF-8, etc.).

You can't really ensure that you're persisting the right information to the database without accounting for all of these. For the latter items, the metadata telling you what to do is somewhere in the headers, so the headers aren't just throwaways.
