What is the most suitable type of vector to keep the bytes of a file?

旧巷老猫 提交于 2019-12-08 17:38:11

问题


What is the most suitable type of vector to keep the bytes of a file?

I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted to 0!

The goal is to save this data (bytes) to a file and retrieve from this file later.

NOTE: The files contain null bytes ("00000000" in bits)!

I'm a bit lost here. Help me! =D Thanks!


UPDATE I:

To read the file I'm using this function:

char* readFileBytes(const char *name){
    std::ifstream fl(name);
    fl.seekg( 0, std::ios::end );
    size_t len = fl.tellg();
    char *ret = new char[len];
    fl.seekg(0, std::ios::beg);
    fl.read(ret, len);
    fl.close();
    return ret;
}

NOTE I: I need to find a way to ensure that bits "00000000" can be recovered from the file!

NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?

NOTE III: When using char array I had problems converting bits "00000000" for that type.

Code Snippet:

int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7]     ) | 
                (bit8Array[6] << 1) | 
                (bit8Array[5] << 2) | 
                (bit8Array[4] << 3) | 
                (bit8Array[3] << 4) | 
                (bit8Array[2] << 5) | 
                (bit8Array[1] << 6) | 
                (bit8Array[0] << 7);

UPDATE II:

Following the @chqrlie recommendations.

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstring>
#include <iterator>

std::vector<unsigned char> readFileBytes(const char* filename)
{
    // Open the file.
    std::ifstream file(filename, std::ios::binary);

    // Stop eating new lines in binary mode!
    file.unsetf(std::ios::skipws);

    // Get its size
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // Reserve capacity.
    std::vector<unsigned char> unsignedCharVec;
    unsignedCharVec.reserve(fileSize);

    // Read the data.
    unsignedCharVec.insert(unsignedCharVec.begin(),
               std::istream_iterator<unsigned char>(file),
               std::istream_iterator<unsigned char>());

    return unsignedCharVec;
}

int main(){

    std::vector<unsigned char> unsignedCharVec;

    // txt file contents "xz"
    unsignedCharVec=readFileBytes("xz.txt");

    // Letters -> UTF8/HEX -> bits!
    // x -> 78 -> 0111 1000
    // z -> 7a -> 0111 1010

    for(unsigned char c : unsignedCharVec){
        printf("%c\n", c);
        for(int o=7; o >= 0; o--){
            printf("%i", ((c >> o) & 1));
        }
        printf("%s", "\n");
    }

    // Prints...
    // x
    // 01111000
    // z
    // 01111010

    return 0;
}

UPDATE III:

This is the code I am using using to write to a binary file:

void writeFileBytes(const char* filename, std::vector<unsigned char>& fileBytes){
    std::ofstream file(filename, std::ios::out|std::ios::binary);
    file.write(fileBytes.size() ? (char*)&fileBytes[0] : 0, 
               std::streamsize(fileBytes.size()));
}

writeFileBytes("xz.bin", fileBytesOutput);

UPDATE IV:

Futher read about UPDATE III:

c++ - Save the contents of a "std::vector<unsigned char>" to a file


CONCLUSION:

Definitely the solution to the problem of the "00000000" bits (1 byte) was change the type that stores the bytes of the file to std::vector<unsigned char> as the guidance of friends. std::vector<unsigned char> is a universal type (exists in all environments) and will accept any octal (unlike char* in "UPDATE I")!

In addition, changing from array (char) to vector (unsigned char) was crucial for success! With vector I manipulate my data more securely and completely independent of its content (in char array I have problems with this).

Thanks a lot!


回答1:


There are 3 problems in your code:

  • You use the char type and return a char *. Yet the return value is not a proper C string as you do not allocate an extra byte for the '\0' terminator nor null terminate it.

  • If the file may contain null bytes, you should probably use type unsigned char or uint8_t to make it explicit that the array does not contain text.

  • You do not return the array size to the caller. The caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.




回答2:


Use std::vector<unsigned char>. Don't use std::uint8_t: it's won't exist on systems that don't have a native hardware type of exactly 8 bits. unsigned char will always exist; it will usually be the smallest addressable type that the hardware supports, and it's required to be at least 8 bits wide, so if you're trafficking in 8-bit bytes, it will handle the bits that you need.

If you really, really, really like the fixed-width types, you might consider std::uint_least8_t, which will always exist, and has at least eight bits, or std::uint_fast8_t, which also has at least eight bits. But file I/O traffics in char types, and mixing char and it's variants with vaguely specified "least" and "fast" types may well get confusing.




回答3:


uint8_t is the winner in my eyes:

  • it's exactly 8 bits, or 1 byte, long;
  • it's unsigned without requiring you to type unsigned every time;
  • it's exactly the same on all platforms;
  • it's a generic type that does not imply any specific use, unlike char / unsigned char, which is associated with characters of text even if it can technically be used for any purpose just the same as uint8_t.

Bottom line: uint8_t is functionally equivalent to unsigned char, but does a better job of saying this is some data of unspecified nature in the source code.

So use std::vector<uint8_t>.
#include <stdint.h> to make the uint8_t definition available.

P. S. As pointed out in the comments, the C++ standard defines char as 1 byte, and byte is not, strictly speaking, required to be the same as octet (8 bits). On such a hypothetical system, char will still exist and will be 1 byte long, but uint8_t is defined as 8 bits (octet) and thus may not exist (due to implementation difficulties / overhead). So char is more portable, theoretically speaking, but uint8_t is more strict and has wider guarantees of expected behavior.



来源:https://stackoverflow.com/questions/40050243/what-is-the-most-suitable-type-of-vector-to-keep-the-bytes-of-a-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!