C# - How to save byte values to file with smallest size possible?

Posted by 寵の児 on 2019-12-13 18:17:55

Question


I need to serialize the following data in the smallest file size possible.

I have a collection of patterns, each pattern is a byte array (byte[]) of a set length.

In this example let's use a pattern length of 5, so byte array will be:

var pattern = new byte[] {1, 2, 3, 4, 5};

Let's say we have 3 of the same pattern in a collection:

var collection = new byte[][] { pattern, pattern, pattern };

Currently I am saving the collection in an ASCII encoded file. Using the collection above, the saved file would look like this:

010203040501020304050102030405

Each byte in the array is represented by 2 digits (00) so that I can cater for byte values from 0 to 25. It can be visualized like this:

[01|02|03|04|05] [01|02|03|04|05] [01|02|03|04|05]

When I deserialize the file, I parse each block of 2 characters as a byte and put every 5 bytes into a byte array.

As I understand it, each character in the ASCII-encoded file is one byte - offering a possible 256 different values, but all I need is for each block of 2 characters to be a possible decimal value from 0 to 25.
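To put numbers on this: a value with 26 possible states carries log2(26) ≈ 4.7 bits of information if all states are equally likely, whereas two ASCII digits occupy 16 bits and one raw byte occupies 8. A minimal sketch of that arithmetic follows; the class and method names are illustrative and not part of the question.

```csharp
using System;

static class EncodingCost
{
    // Average number of bits needed to store one value that can take
    // `distinctValues` equally likely states.
    public static double BitsPerValue(int distinctValues)
    {
        return Math.Log(distinctValues, 2);
    }

    // Smallest whole number of bytes that can hold `count` such values
    // when they are packed together with no padding between them.
    public static long MinimumBytes(long count, int distinctValues)
    {
        return (long)Math.Ceiling(count * BitsPerValue(distinctValues) / 8.0);
    }
}
```

For example, `EncodingCost.MinimumBytes(50000L * 12, 26)` gives the packing lower bound for 50,000 patterns of length 12, which comes out at roughly 350 KB. Data with repeated patterns may compress below that figure.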

When I save a file with 50,000 patterns, each with a length of 12, I end up with a 1.7MB file, which is way too big.

What encoding can I use in C# to make my file size much smaller?

Please provide example code of how to write and read this data to/from a file.


Answer 1:


I did something similar when encoding binary data into barcodes (see Efficient compression and representation of key value pairs to be read from 1D barcodes). Consider the following code which will serialize samples into a file and deserialize them immediately:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Numerics;

static void Main(string[] args)
{
    var data = new List<byte[]>() {
        new byte[] { 01, 05, 15, 04, 11, 00, 01, 01, 05, 15, 04, 11, 00, 01 },
        new byte[] { 09, 04, 02, 00, 08, 12, 01, 07, 04, 02, 00, 08, 12, 01 },
        new byte[] { 01, 05, 06, 04, 02, 00, 01, 01, 05, 06, 04, 02, 00, 01 }
    };

    // has to be known when loading the file
    var reasonableBase = data.SelectMany(i => i).Max() + 1;

    // File.Create truncates an existing file; File.OpenWrite would leave stale bytes at the end
    using (var target = File.Create("data.bin"))
    {
        using (var writer = new BinaryWriter(target))
        {
            // write the number of lines (16 bit, so at most 65535 lines)
            writer.Write((ushort)data.Count);

            // write the base (8 bit, base limited to 255)
            writer.Write((byte)reasonableBase);

            foreach (var sample in data)
            {
                // converts the byte array into a large number of the known base (bypasses all the bit-mess)
                var serializedData = ByteArrayToNumberBased(sample, reasonableBase).ToByteArray();

                // write the byte length of the serialized number (8 bit, limited to 255 bytes)
                writer.Write((byte)serializedData.Length);
                writer.Write(serializedData);
            }
        }
    }

    var deserializedData = new List<byte[]>();

    using (var source = File.OpenRead("data.bin"))
    {
        using (var reader = new BinaryReader(source))
        {
            var lines = reader.ReadUInt16();
            var sourceBase = reader.ReadByte();

            for (int i = 0; i < lines; i++)
            {
                var length = reader.ReadByte();
                var value = new BigInteger(reader.ReadBytes(length));

                // split the big number we loaded back into its digits;
                // works because we know the base (note: zeros at the end of a sample are not restored)
                deserializedData.Add(NumberToByteArrayBased(value, sourceBase));
            }
        }
    }
}

private static BigInteger ByteArrayToNumberBased(byte[] data, int numBase)
{
    var result = BigInteger.Zero;

    for (int i = 0; i < data.Length; i++)
    {
        result += data[i] * BigInteger.Pow(numBase, i);
    }

    return result;
}

private static byte[] NumberToByteArrayBased(BigInteger data, int numBase)
{
    var list = new List<Byte>();

    do
    {
        list.Add((byte)(data % numBase));
    }
    while ((data = (data / numBase)) > 0);

    return list.ToArray();
}

Compared to your format, the sample data will serialize to 27 bytes instead of 90. Using @xanatos's figure of 4.7 bits per symbol, the ideal result would be 14 * 3 * 4.7 / 8 = 24.675 bytes, so that's not bad (to be fair: the example serializes to 30 bytes with the base set to 26).
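One caveat about NumberToByteArrayBased as written: it stops as soon as the remaining number reaches zero, so a pattern whose last elements are 0 comes back shorter than it went in (for example, {1, 2, 0, 0} decodes to {1, 2}). Since the question states that all patterns share a known length, one possible fix is to pass that length in and always emit that many digits. The sketch below assumes the reader knows the pattern length (for instance from an extra header field); the class, the method names and the length parameter are illustrative additions, not part of the code above.

```csharp
using System.Numerics;

static class FixedLengthCodec
{
    // Same encoding as ByteArrayToNumberBased: element i is the digit
    // with weight numBase^i. Horner's scheme avoids calling BigInteger.Pow.
    public static BigInteger Encode(byte[] data, int numBase)
    {
        var result = BigInteger.Zero;
        for (int i = data.Length - 1; i >= 0; i--)
        {
            result = result * numBase + data[i];
        }
        return result;
    }

    // Always produces exactly `length` elements, so zeros at the end
    // of the original pattern survive the round trip.
    public static byte[] Decode(BigInteger value, int numBase, int length)
    {
        var result = new byte[length];
        for (int i = 0; i < length; i++)
        {
            result[i] = (byte)(value % numBase);
            value /= numBase;
        }
        return result;
    }
}
```

With this variant, `FixedLengthCodec.Decode(FixedLengthCodec.Encode(pattern, 26), 26, pattern.Length)` returns an array of the original length for any pattern whose values are below 26.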




Answer 2:


Here's an example of how you can use GZipStream and BinaryFormatter to read and write the data from and to a compressed file.

It is not very efficient for small arrays, but becomes more efficient for large arrays. However, note that this relies on the data being compressible. If it is not, this approach will not be of any use!

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

namespace Demo
{
    static class Program
    {
        static void Main()
        {
            var pattern    = new byte[] { 1, 2, 3, 4, 5 };
            var collection = new [] { pattern, pattern, pattern };

            string filename = @"e:\tmp\test.bin";
            zipToFile(filename, collection);

            var deserialised = unzipFromFile(filename);

            Console.WriteLine(string.Join("\n", deserialised.Select(row => string.Join(", ", row))));
        }

        static void zipToFile(string file, byte[][] data)
        {
            using (var output = new FileStream(file, FileMode.Create))
            using (var gzip   = new GZipStream(output, CompressionLevel.Optimal))
            {
                new BinaryFormatter().Serialize(gzip, data);
            }
        }

        static byte[][] unzipFromFile(string file)
        {
            using (var input = new FileStream(file, FileMode.Open))
            using (var gzip  = new GZipStream(input, CompressionMode.Decompress))
            {
                return (byte[][]) new BinaryFormatter().Deserialize(gzip);
            }
        }
    }
}
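Note that BinaryFormatter has been marked obsolete since .NET 5 and is unsafe to use on untrusted input; it also writes its own type metadata into the stream. As a possible alternative, here is a sketch that writes a small header (pattern count and pattern length) followed by the raw bytes through the same GZipStream. The class name, method names and header layout are assumptions made for this example, and all patterns are assumed to have the same length.

```csharp
using System.IO;
using System.IO.Compression;

static class GZipPatternFile
{
    public static void Write(string file, byte[][] data)
    {
        using (var output = new FileStream(file, FileMode.Create))
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        using (var writer = new BinaryWriter(gzip))
        {
            // Header: number of patterns, then the shared pattern length
            writer.Write(data.Length);
            writer.Write(data.Length == 0 ? 0 : data[0].Length);

            foreach (var pattern in data)
            {
                writer.Write(pattern);
            }
        }
    }

    public static byte[][] Read(string file)
    {
        using (var input = new FileStream(file, FileMode.Open))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new BinaryReader(gzip))
        {
            int count = reader.ReadInt32();
            int length = reader.ReadInt32();

            var result = new byte[count][];
            for (int i = 0; i < count; i++)
            {
                // ReadBytes keeps reading until it has `length` bytes
                // or the stream ends
                result[i] = reader.ReadBytes(length);
            }
            return result;
        }
    }
}
```

Usage mirrors the methods above: `GZipPatternFile.Write(filename, collection)` followed by `GZipPatternFile.Read(filename)`.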



Answer 3:


Sometimes simplicity is the best compromise.

A rectangular array can be considered a sequence of linear arrays.

A file of bytes is a linear array of bytes.

Here is some very simple code that flattens a rectangular array of bytes and writes the bytes to a file:

// All patterns must be the same length so they can be split when reading
File.WriteAllBytes(Path.GetTempFileName(), collection.SelectMany(p => p).ToArray()); 

System.Linq.Enumerable.SelectMany(pattern => pattern) takes a sequence of sequences and flattens it into a single sequence. (Together with ToArray() it is not the most efficient approach, but for 50,000 * 12 elements it could be fine.)
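Reading the file back is the reverse: load all bytes and cut them into chunks of the known pattern length. A minimal sketch, assuming the caller knows the pattern length; the class and method names are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Linq;

static class FlatPatternFile
{
    // Writes all patterns back to back, one byte per value.
    public static void Write(string file, byte[][] collection)
    {
        File.WriteAllBytes(file, collection.SelectMany(p => p).ToArray());
    }

    // Splits the file content into arrays of `patternLength` bytes each.
    public static byte[][] Read(string file, int patternLength)
    {
        byte[] flat = File.ReadAllBytes(file);
        if (flat.Length % patternLength != 0)
        {
            throw new InvalidDataException("File size is not a multiple of the pattern length.");
        }

        var result = new byte[flat.Length / patternLength][];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = new byte[patternLength];
            Buffer.BlockCopy(flat, i * patternLength, result[i], 0, patternLength);
        }
        return result;
    }
}
```

At one byte per value, 50,000 patterns of length 12 produce a 600,000-byte file with this layout.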

Given that as a starting point, if compression is needed, GZip would be a way to go, as shown by Matthew Watson in the GZipStream answer above.



Source: https://stackoverflow.com/questions/50343973/c-sharp-how-to-save-byte-values-to-file-with-smallest-size-possible
