How do you unzip a gz file in memory using GZipStream?

Submitted by 萝らか妹 on 2020-01-30 12:05:53

Question


I'm probably doing something obviously stupid here. Please point it out!

I have some C# code that pulls down a bunch of .gz files from SFTP (using the SSH.NET NuGet package - works great!). Each .gz file contains a single CSV file. I want to keep these files in memory without hitting disk (yes, I know, server memory management concerns exist - that's fine as these files are fairly small), decompress them in memory to extract the CSV file inside, and then return a collection of CSV files in a custom DTO (FtpFile).

My problem is that while my MemoryStream from the SFTP connection has data in it, either my GZipStream never seems to be populated or the copy from the GZipStream to my output MemoryStream is failing. I have also tried the more traditional approach of looping over Read with my own buffer, but it gave the same result as this code.

Aside from connection details (it connects successfully, so no worries there), here's all of my code:

Logic:

    public static List<FtpFile> Foo()
    {
        var connectionInfo = new ConnectionInfo("example.com",
            "username",
            new PasswordAuthenticationMethod("username", "password"));
        using (var client = new SftpClient(connectionInfo))
        {
            client.Connect();

            var searchResults = client.ListDirectory("/testdir")
                .Where(obj => obj.IsRegularFile
                              && obj.Name.ToLowerInvariant().StartsWith("test_")
                              && obj.Name.ToLowerInvariant().EndsWith(".gz"))
                .Take(2)
                .ToList();

            var fileResults = new List<FtpFile>();

            foreach (var file in searchResults)
            {
                var ftpFile = new FtpFile { FileName = file.Name, FileSize = file.Length };

                using (var fileStream = new MemoryStream())
                {
                    client.DownloadFile(file.FullName, fileStream); // Success! All is good here, so far. :)

                    using (var gzStream = new GZipStream(fileStream, CompressionMode.Decompress))
                    {
                        using (var outputStream = new MemoryStream())
                        {
                            gzStream.CopyTo(outputStream);
                            byte[] outputBytes = outputStream.ToArray(); // No data. Sad panda. :'(
                            ftpFile.FileContents = Encoding.ASCII.GetString(outputBytes);
                            fileResults.Add(ftpFile);
                        }
                    }
                }
            }

            return fileResults;
        }
    }

FtpFile (just a simple DTO I'm populating):

    public class FtpFile
    {
        public string FileName { get; set; }
        public long FileSize { get; set; }
        public string FileContents { get; set; }
    }

PSA: If anybody copies this code, be aware that it is NOT good code: it could cause serious memory management problems! Best practice is to stream the files to disk instead, which this code does not do. My needs are very specific in that I have to hold these files in memory simultaneously for what I'm building with them.


Answer 1:


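If you write data into a stream, make sure to seek back to its beginning before un-gzipping it. DownloadFile leaves the MemoryStream positioned at the end of the data it just wrote, and GZipStream starts reading from the current position, so without the rewind there is nothing left for it to read.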

The following should fix your troubles:

            using (var fileStream = new MemoryStream())
            {
                client.DownloadFile(file.FullName, fileStream); // Success! All is good here, so far. :)
                // Rewind: the download left the position at the end of the stream.
                fileStream.Seek(0, SeekOrigin.Begin);

                using (var gzStream = new GZipStream(fileStream, CompressionMode.Decompress))
                {
                    using (var outputStream = new MemoryStream())
                    {
                        gzStream.CopyTo(outputStream);
                        byte[] outputBytes = outputStream.ToArray(); // Now contains the decompressed CSV bytes.
                        ftpFile.FileContents = Encoding.ASCII.GetString(outputBytes);
                        fileResults.Add(ftpFile);
                    }
                }
            }
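
To see the stream-position effect without an SFTP server, here is a minimal self-contained sketch. It assumes a gzip payload built in memory stands in for the downloaded file; the class GzipRewindDemo and the helpers CompressToStream and Decompress are hypothetical names used only for illustration and are not part of SSH.NET or the original code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipRewindDemo
{
    // Hypothetical helper: gzips the text into a MemoryStream and returns it
    // with Position left at the end, as a download writing into it would.
    public static MemoryStream CompressToStream(string text)
    {
        var compressed = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed.
        using (var gz = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.ASCII.GetBytes(text);
            gz.Write(bytes, 0, bytes.Length);
        }
        return compressed;
    }

    // Hypothetical helper: decompresses starting at the stream's CURRENT position.
    public static string Decompress(Stream source)
    {
        using (var gz = new GZipStream(source, CompressionMode.Decompress, leaveOpen: true))
        using (var output = new MemoryStream())
        {
            gz.CopyTo(output);
            return Encoding.ASCII.GetString(output.ToArray());
        }
    }

    public static void Main()
    {
        using (var stream = CompressToStream("id,name\n1,test\n"))
        {
            // Position is at the end here, so the decompressor has nothing to read.
            string withoutRewind = Decompress(stream);

            // Rewind first, then decompress again.
            stream.Position = 0;
            string withRewind = Decompress(stream);

            Console.WriteLine("Without rewind: " + withoutRewind.Length + " chars");
            Console.WriteLine("With rewind: " + withRewind.Length + " chars");
        }
    }
}
```

Setting stream.Position = 0 is equivalent to the Seek(0, SeekOrigin.Begin) call above; either form works on a MemoryStream.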


Source: https://stackoverflow.com/questions/42817059/how-do-you-unzip-a-gz-file-in-memory-using-gzipstream
