What is the most suitable type of vector to keep the bytes of a file?

Question

I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted as 0!

The goal is to save this data (bytes) to a file and retrieve it from that file later.

NOTE: The files contain null bytes ("00000000" in bits)!

I'm a bit lost here. Help me! =D Thanks!

UPDATE I:

To read the file I'm using this function:

char* readFileBytes(const char *name){
    std::ifstream fl(name);
    fl.seekg( 0, std::ios::end );
    size_t len = fl.tellg();
    char *ret = new char[len];
    fl.seekg(0, std::ios::beg);
    fl.read(ret, len);
    fl.close();
    return ret;
}

NOTE I: I need to find a way to ensure that bits "00000000" can be recovered from the file!

NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?

NOTE III: When using char array I had problems converting bits "00000000" for that type.

Code Snippet:

int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7]     ) | 
                (bit8Array[6] << 1) | 
                (bit8Array[5] << 2) | 
                (bit8Array[4] << 3) | 
                (bit8Array[3] << 4) | 
                (bit8Array[2] << 5) | 
                (bit8Array[1] << 6) | 
                (bit8Array[0] << 7);
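
For comparison, here is a minimal sketch of the same packing done into an unsigned char. The helper name packBits is hypothetical (it is not part of the original code); it assumes the array holds only 0/1 values with the most significant bit first, as in the snippet above. An all-zero array simply yields the byte value 0, which is a perfectly valid byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: packs eight 0/1 values into one byte,
// most significant bit first (bits[0] ends up as bit 7).
unsigned char packBits(const int (&bits)[8])
{
    unsigned char byte = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Shift what we have so far and append the next bit.
        byte = static_cast<unsigned char>((byte << 1) | (bits[i] & 1));
    }
    return byte;
}

// Small self-check, callable from main().
void demoPackBits()
{
    const int zeros[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    const int letterX[8] = {0, 1, 1, 1, 1, 0, 0, 0};  // 0x78, 'x' in ASCII
    assert(packBits(zeros) == 0x00);
    assert(packBits(letterX) == 0x78);
}
```

Calling demoPackBits() from main() exercises both cases. The zero byte only becomes a problem later if the buffer is treated as a C string.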

UPDATE II:

Following @chqrlie's recommendations:

#include <cstdio>   // printf
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstring>
#include <iterator>

std::vector<unsigned char> readFileBytes(const char* filename)
{
    // Open the file.
    std::ifstream file(filename, std::ios::binary);

    // Do not skip whitespace (spaces, newlines) when extracting bytes!
    file.unsetf(std::ios::skipws);

    // Get its size
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // Reserve capacity.
    std::vector<unsigned char> unsignedCharVec;
    unsignedCharVec.reserve(fileSize);

    // Read the data.
    unsignedCharVec.insert(unsignedCharVec.begin(),
               std::istream_iterator<unsigned char>(file),
               std::istream_iterator<unsigned char>());

    return unsignedCharVec;
}

int main(){

    std::vector<unsigned char> unsignedCharVec;

    // txt file contents "xz"
    unsignedCharVec=readFileBytes("xz.txt");

    // Letters -> UTF8/HEX -> bits!
    // x -> 78 -> 0111 1000
    // z -> 7a -> 0111 1010

    for(unsigned char c : unsignedCharVec){
        printf("%c\n", c);
        for(int o=7; o >= 0; o--){
            printf("%i", ((c >> o) & 1));
        }
        printf("%s", "\n");
    }

    // Prints...
    // x
    // 01111000
    // z
    // 01111010

    return 0;
}

UPDATE III:

This is the code I am using to write to a binary file:

void writeFileBytes(const char* filename, const std::vector<unsigned char>& fileBytes){
    // Binary mode: no byte (including 0x00) is translated on output.
    std::ofstream file(filename, std::ios::out|std::ios::binary);
    // write() expects a char pointer; with a count of 0 nothing is written.
    file.write(reinterpret_cast<const char*>(fileBytes.data()),
               std::streamsize(fileBytes.size()));
}

writeFileBytes("xz.bin", fileBytesOutput);

UPDATE IV:

Further reading on UPDATE III:

c++ - Save the contents of a "std::vector<unsigned char>" to a file

CONCLUSION:

The solution to the problem of the "00000000" bits (1 byte) was definitely to change the type that stores the bytes of the file to std::vector<unsigned char>, as friends here advised. std::vector<unsigned char> is a universal type (it exists in all environments) and will accept any octet (unlike char* in "UPDATE I")!

In addition, changing from an array (char) to a vector (unsigned char) was crucial for success! With a vector I can manipulate my data more safely and completely independently of its content (with a char array I had problems with this).
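
A minimal sketch of why the vector behaves better here. The likely cause of the trouble is that anything treating a char buffer as a C string stops at the first zero byte, whereas the vector carries its own length. The function names cStringLength and makeBytes are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Length as C-string functions see it: counting stops at the first zero byte.
std::size_t cStringLength(const char* data)
{
    return std::strlen(data);
}

// Builds a byte vector from a raw buffer plus an explicit length,
// so zero bytes inside the data are kept.
std::vector<unsigned char> makeBytes(const char* data, std::size_t length)
{
    return std::vector<unsigned char>(data, data + length);
}

// Small self-check, callable from main().
void demoNullByteLength()
{
    // Four payload bytes (the second one is 0x00) plus a final terminator
    // so that strlen is well defined.
    const char raw[] = {'x', '\0', 'z', '!', '\0'};
    const std::vector<unsigned char> bytes = makeBytes(raw, 4);

    assert(cStringLength(raw) == 1);  // "sees" only the 'x'
    assert(bytes.size() == 4);        // all four bytes are still there
    assert(bytes[1] == 0);            // including the zero byte
}
```

So the element type matters less than the fact that size() travels with the data.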

Thanks a lot!

Answer 1:

There are 3 problems in your code:

You use the char type and return a char *. Yet the return value is not a proper C string as you do not allocate an extra byte for the '\0' terminator nor null terminate it.
If the file may contain null bytes, you should probably use type unsigned char or uint8_t to make it explicit that the array does not contain text.
You do not return the array size to the caller. The caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.
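
The three points above can be sketched as follows. This is only one possible way to do it: the helper names storeBytes and loadBytes and the file name roundtrip_demo.bin are assumptions for the example, and errors are reported with exceptions purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes every byte of the vector, zero bytes included.
void storeBytes(const std::string& path, const std::vector<unsigned char>& bytes)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Hypothetical helper: reads the whole file. The caller gets the length
// for free through size(), so no terminator is needed.
std::vector<unsigned char> loadBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at the end
    if (!in) throw std::runtime_error("cannot open for reading: " + path);
    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size: " + path);
    in.seekg(0, std::ios::beg);

    std::vector<unsigned char> bytes(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(reinterpret_cast<char*>(bytes.data()), size))
        throw std::runtime_error("read failed: " + path);
    return bytes;
}

// Round trip with embedded zero bytes, callable from main().
void demoRoundTrip()
{
    const std::vector<unsigned char> original = {0x78, 0x00, 0x7a, 0x00};
    storeBytes("roundtrip_demo.bin", original);
    const std::vector<unsigned char> restored = loadBytes("roundtrip_demo.bin");
    assert(restored.size() == original.size());
    assert(restored == original);
}
```

Both streams are opened with std::ios::binary; without it, some platforms translate line endings and the byte count no longer matches the file.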

Answer 2:

Use std::vector<unsigned char>. Don't use std::uint8_t: it won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it will usually be the smallest addressable type that the hardware supports, and it's required to be at least 8 bits wide, so if you're trafficking in 8-bit bytes, it will handle the bits that you need.

If you really, really, really like the fixed-width types, you might consider std::uint_least8_t, which will always exist, and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and its variants with vaguely specified "least" and "fast" types may well get confusing.
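
If the code does rely on bytes being exactly 8 bits wide, that assumption can be stated explicitly while still storing unsigned char. A minimal sketch; the alias ByteBuffer and the function bitsPerByte are names invented for this example.

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <vector>

// Compilation stops here on a platform whose bytes are wider than 8 bits.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical alias so the element type is spelled in one place only.
using ByteBuffer = std::vector<unsigned char>;

// Number of value bits in unsigned char; the standard guarantees at least 8.
constexpr int bitsPerByte()
{
    return std::numeric_limits<unsigned char>::digits;
}

// Small self-check, callable from main().
void demoByteWidth()
{
    assert(bitsPerByte() == CHAR_BIT);
    const ByteBuffer buffer = {0x00, 0xFF};
    assert(buffer[0] == 0 && buffer[1] == 255);
}
```

On an exotic platform this fails at compile time with a readable message instead of misbehaving at run time.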

Answer 3:

uint8_t is the winner in my eyes:

it's exactly 8 bits, or 1 byte, long;
it's unsigned without requiring you to type unsigned every time;
it's exactly the same on all platforms;
it's a generic type that does not imply any specific use, unlike char / unsigned char, which is associated with characters of text even if it can technically be used for any purpose just the same as uint8_t.

Bottom line: uint8_t is functionally equivalent to unsigned char, but does a better job of saying this is some data of unspecified nature in the source code.

So use std::vector<uint8_t>.
#include <stdint.h> to make the uint8_t definition available.

P. S. As pointed out in the comments, the C++ standard defines char as 1 byte, and byte is not, strictly speaking, required to be the same as octet (8 bits). On such a hypothetical system, char will still exist and will be 1 byte long, but uint8_t is defined as 8 bits (octet) and thus may not exist (due to implementation difficulties / overhead). So char is more portable, theoretically speaking, but uint8_t is more strict and has wider guarantees of expected behavior.
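
A small sketch to check how the two spellings relate on a given compiler. On common desktop toolchains std::uint8_t is usually an alias of unsigned char, but that is an observation and not something the standard promises. The names uint8IsUnsignedChar and toUint8 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// True when std::uint8_t is an alias of unsigned char (usual, not guaranteed).
constexpr bool uint8IsUnsignedChar()
{
    return std::is_same<std::uint8_t, unsigned char>::value;
}

// If std::uint8_t exists at all, it occupies exactly one byte.
static_assert(sizeof(std::uint8_t) == 1, "std::uint8_t must be one byte");

// Hypothetical helper: copies a byte buffer into the fixed-width spelling.
std::vector<std::uint8_t> toUint8(const std::vector<unsigned char>& bytes)
{
    return std::vector<std::uint8_t>(bytes.begin(), bytes.end());
}

// Small self-check, callable from main().
void demoUint8()
{
    const std::vector<std::uint8_t> copy = toUint8({0x00, 0x7a, 0xFF});
    assert(copy.size() == 3);
    assert(copy[0] == 0 && copy[2] == 255);
}
```

Where uint8IsUnsignedChar() is true, the choice between the two is purely about how the intent reads in the source.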

Source: https://stackoverflow.com/questions/40050243/what-is-the-most-suitable-type-of-vector-to-keep-the-bytes-of-a-file

Tags

c++

visual-c++

byte

bit