C++ reading a file in binary mode. Problems with END OF FILE

Submitted by 雨燕双飞 on 2020-12-30 03:15:18

Question


I am learning C++ and I have to read a file in binary mode. Here's how I do it (following the C++ reference):

unsigned values[255];
unsigned total;
ifstream in ("test.txt", ifstream::binary);

while(in.good()){
    unsigned val = in.get();
    if(in.good()){
        values[val]++;
        total++;
        cout << val <<endl;
    }
}

in.close();

So, I am reading the file byte by byte for as long as in.good() is true. I put a cout at the end of the while loop in order to understand what's happening, and here is the output:

marco@iceland:~/workspace/huffman$ ./main 
97
97
97
97
10
98
98
10
99
99
99
99
10
100
100
10
101
101
10
221497852
marco@iceland:~/workspace/huffman$

Now, the input file "test.txt" is just:

aaaa
bb
cccc
dd
ee

So everything works perfectly till the end, where there's that 221497852. I guess it's something about the end of file, but I can't figure the problem out.

I am using gedit and g++ on a Debian machine (64-bit). Any help will be appreciated.

Many thanks,

Marco


Answer 1:


ifstream::get() returns an int, not an unsigned value. This is one of the problems.

Secondly, you are reading in binary mode, so you should avoid formatted input (operator>>). To read the whole file as a block, use ifstream::read:

// read a file into memory
#include <iostream>     // std::cout
#include <fstream>      // std::ifstream

int main () {

  std::ifstream is ("test.txt", std::ifstream::binary);
  if (is) {
    // get length of file:
    is.seekg (0, is.end);
    int length = is.tellg();
    is.seekg (0, is.beg);

    char * buffer = new char [length];

    std::cout << "Reading " << length << " characters... ";
    // read data as a block:
    is.read (buffer,length);

    if (is)
      std::cout << "all characters read successfully.";
    else
      std::cout << "error: only " << is.gcount() << " could be read";
    is.close();

    // ...buffer contains the entire file...

    delete[] buffer;
  }
  return 0;
}



Answer 2:


This isn't the way istream::get() was designed to be used. The classical idiom for using this function would be:

for ( int val = in.get(); val != EOF; val = in.get() ) {
    //  ...
}

or even more idiomatic:

char ch;
while ( in.get( ch ) ) {
    //  ...
}

The first loop is really inherited from C, where in.get() is the equivalent of fgetc().
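As a minimal sketch, the first idiom can be applied to the byte-counting task from the question as follows. The helper name count_bytes is hypothetical and not part of the original post. The table has 256 zero-initialized entries so that every possible byte value has a slot and no counter starts from an indeterminate value. The assert reflects what mainstream implementations do (a byte is returned as a value in 0..255); it is an assumption about the platform, not a guarantee stated in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: count how often each byte value occurs in a stream.
std::array<std::size_t, 256> count_bytes(std::istream& in) {
    std::array<std::size_t, 256> counts{};  // value-initialized: all zeros
    using traits = std::istream::traits_type;

    // Classic idiom: keep the result of get() in an int and compare it
    // against the end-of-file marker before using it as a byte value.
    for (int c = in.get(); c != traits::eof(); c = in.get()) {
        assert(c >= 0 && c <= 255);  // assumption: bytes map to 0..255
        ++counts[static_cast<std::size_t>(c)];
    }
    return counts;
}
```

To use it on a file, open the file with std::ifstream in("test.txt", std::ios_base::binary) and pass in to count_bytes; any std::istream works, including a std::istringstream for testing.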

Still, as far as I can tell, the code you give should work, even though it's not idiomatic.

The C++ standard is unclear about what in.get() should return if the character value read is negative. fgetc() requires a value in the range [0...UCHAR_MAX], and I think it is safe to assume that this is the intent here. It is, at least, what every implementation I've used does. But this doesn't affect your input. Depending on how the implementation interprets the standard, the return value of in.get() must be in the range [0...UCHAR_MAX] or [CHAR_MIN...CHAR_MAX], or it must be EOF (typically -1). (The reason I'm fairly sure that the intent is to require [0...UCHAR_MAX] is that otherwise, you may not be able to distinguish end of file from a valid character.)

And if the return value is EOF (almost always -1), failbit should be set, so in.good() would return false. There is no case where in.get() would be allowed to return 221497852. The only explanation I can think of for your results is that your file has some character with bit 7 set at the end of the file, that the implementation is returning a negative number for this (but not end of file, because it is a character), which results in an out-of-bounds index in values[val], and that this out-of-bounds index somehow ends up modifying val. Or that your implementation is broken, and is not setting failbit when it returns end of file.
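The behaviour described above can be checked with a small probe. This is only a sketch: EofProbe, probe_after_last_byte and first_byte are hypothetical names introduced for illustration, and a std::istringstream stands in for the file. On the implementations I am aware of, the extra get() past the last byte returns the end-of-file marker and sets both eofbit and failbit, while a byte with bit 7 set comes back as a non-negative value.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record of the stream state after reading past the end.
struct EofProbe {
    int value;  // what get() returned
    bool eof;   // in.eof()
    bool fail;  // in.fail()
    bool good;  // in.good()
};

// Consume every byte of data, then call get() once more and report the result.
EofProbe probe_after_last_byte(const std::string& data) {
    std::istringstream in(data);
    for (std::size_t i = 0; i < data.size(); ++i) {
        in.get();  // these reads succeed and leave the stream good
    }
    int v = in.get();  // nothing left: expected to return the EOF marker
    return {v, in.eof(), in.fail(), in.good()};
}

// Return the first byte of data exactly as get() reports it.
int first_byte(const std::string& data) {
    std::istringstream in(data);
    return in.get();
}
```

Comparing probe_after_last_byte("ab").value with std::istream::traits_type::eof(), and looking at first_byte("\xE5"), shows whether a given implementation behaves as the answer assumes.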

To be certain, I'd be interested in knowing what you get from the following:

std::ifstream in( "test.txt", std::ios_base::binary );
int ch = in.get();
while ( ch != std::istream::traits_type::eof() ) {
    std::cout << ch << std::endl;
    ch = in.get();
}

This avoids any issues of a possibly invalid index, and any type conversions (although the conversion int to unsigned is well defined). Also, out of curiosity (since I can only access VC++ here), you might try replacing in as follows:

std::istringstream in( "\n\xE5" );

I would expect to get:

10
229

(Assuming 8-bit bytes and an ASCII-based code set, both of which are almost, but not quite, universal today.)




Answer 3:


I've eventually figured this out. It seems the problem wasn't due to any code. The problem was gedit: it always appends a newline character at the end of the file. This also happens with other editors, such as vim. Some editors can be configured not to append anything, but in gedit this is apparently not possible. https://askubuntu.com/questions/13317/how-to-stop-gedit-gvim-vim-nano-from-adding-end-of-file-newline-char

Cheers to everyone who answered me,
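One way to confirm that an editor has appended a newline is to look at the last byte of the data. The following is a minimal sketch under that assumption; ends_with_newline is a hypothetical helper name, and it works on any seekable std::istream, such as a std::ifstream opened in binary mode.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical helper: report whether the last byte of a seekable stream
// is '\n'. Returns false for an empty or non-seekable stream.
bool ends_with_newline(std::istream& in) {
    in.clear();                        // drop any earlier eof/fail state
    in.seekg(-1, std::ios_base::end);  // position on the last byte
    if (!in) {                         // seek failed, e.g. empty stream
        in.clear();
        return false;
    }
    char last = '\0';
    if (!in.get(last)) {               // could not read the byte
        in.clear();
        return false;
    }
    return last == '\n';
}
```

For the file from the question, this could be called on std::ifstream in("test.txt", std::ios_base::binary) before the counting loop; note that it moves the read position, so seek back to the beginning afterwards.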

Marco



Source: https://stackoverflow.com/questions/16435180/c-reading-a-file-in-binary-mode-problems-with-end-of-file
