Fastest way to delete the first few bytes of a file

左心房为你撑大大i 提交于 2019-12-11 01:02:05

问题


I am using a windows mobile compact edition 6.5 phone and am writing out binary data to a file from bluetooth. These files get quite large, 16M+ and what I need to do is to once the file is written then I need to search the file for a start character and then delete everything before, thus eliminating garbage. I cannot do this inline when the data comes in due to graphing issues and speed as I get alot of data coming in and there is already too many if conditions on the incoming data. I figured it was best to post process. Anyway here is my dilemma, speed of search for the start bytes and the rewrite of the file takes sometimes 5mins or more...I basically move the file over to a temp file parse through it and rewrite a whole new file. I have to do this byte by byte.

private void closeFiles() {
    try {

    // Close file stream for raw data.
    if (this.fsRaw != null) {
        this.fsRaw.Flush();
        this.fsRaw.Close();

        // Move file, seek the first sync bytes, 
        // write to fsRaw stream with sync byte and rest of data after it
        File.Move(this.s_fileNameRaw, this.s_fileNameRaw + ".old");
        FileStream fsRaw_Copy = File.Open(this.s_fileNameRaw + ".old", FileMode.Open);
        this.fsRaw = File.Create(this.s_fileNameRaw);

        int x = 0;
        bool syncFound = false;

        // search for sync byte algorithm
        while (x != -1) {
            ... logic to search for sync byte
            if (x != -1 && syncFound) {
                this.fsPatientRaw.WriteByte((byte)x);
            }
        }

        this.fsRaw.Close();

        fsRaw_Copy.Close();
        File.Delete(this.s_fileNameRaw + ".old");
    }


    } catch(IOException e) {
        CLogger.WriteLog(ELogLevel.ERROR,"Exception in writing: " + e.Message);
    }
}

There has got to be a faster way than this!

------------Testing times using answer -------------

Initial Test my way with one byte read and and one byte write:

27 Kb/sec

using a answer below and a 32768 byte buffer:

321 Kb/sec

using a answer below and a 65536 byte buffer:

501 Kb/sec

回答1:


You're doing a byte-wise copy of the entire file. That can't be efficient for a load of reasons. Search for the start offset (and end offset if you need both), then copy from one stream to another the entire contents between the two offsets (or the start offset and end of file).

EDIT

You don't have to read the entire contents to make the copy. Something like this (untested, but you get the idea) would work.

private void CopyPartial(string sourceName, byte syncByte, string destName)
{
    using (var input = File.OpenRead(sourceName))
    using (var reader = new BinaryReader(input))
    using (var output = File.Create(destName))
    {
        var start = 0;
        // seek to sync byte
        while (reader.ReadByte() != syncByte)
        {
            start++;
        }

        var buffer = new byte[4096]; // 4k page - adjust as you see fit

        do
        {
            var actual = reader.Read(buffer, 0, buffer.Length);
            output.Write(buffer, 0, actual);
        } while (reader.PeekChar() >= 0);
    }

}

EDIT 2

I actually needed something similar to this today, so I decided to write it without the PeekChar() call. Here's the kernel of what I did - feel free to integrate it with the second do...while loop above.

            var buffer = new byte[1024];
            var total = 0;

            do
            {
                var actual = reader.Read(buffer, 0, buffer.Length);
                writer.Write(buffer, 0, actual);
                total += actual;
            } while (total < reader.BaseStream.Length);



回答2:


Don't discount an approach because you're afraid it will be too slow. Try it! It'll only take 5-10 minutes to give it a try and may result in a much better solution.

If the detection process for the start of the data is not too complex/slow, then avoiding writing data until you hit the start may actually make the program skip past the junk data more efficiently.

How to do this:

  • Use a simple bool to know whether or not you have detected the start of the data. If you are reading junk, then don't waste time writing it to the output, just scan it to detect the start of the data. Once you find the start, then stop scanning for the start and just copy the data to the output. Just copying the good data will incur no more than an if (found) check, which really won't make any noticeable difference to your performance.

You may find that in itself solves the problem. But you can optimise it if you need more performance:

  • What can you do to minimise the work you do to detect the start of the data? Perhaps if you are looking for a complex sequence you only need to check for one particular byte value that starts the sequence, and it's only if you find that start byte that you need to do any more complex checking. There are some very simple but efficient string searching algorithms that may help in this sort of case too. Or perhaps you can allocate a buffer (e.g. 4kB) and gradually fill it with bytes from your incoming stream. When the buffer is filled, then and only then search for the end of the "junk" in your buffer. By batching the work you can make use of memory/cache coherence to make the processing considerably more efficient than it would be if you did the same work byte by byte.

  • Do all the other "conditions on the incoming data" need to be continually checked? How can you minimise the amount of work you need to do but still achieve the required results? Perhaps some of the ideas above might help here too?

  • Do you actually need to do any processing on the data while you are skipping junk? If not, then you can break the whole thing into two phases (skip junk, copy data), and skipping the junk won't cost you anything when it actually matters.



来源:https://stackoverflow.com/questions/7367153/fastest-way-to-delete-the-first-few-bytes-of-a-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!