Machine dependent _write failures with EINVAL error code


Question


This has some lengthy background before the actual question; however, it bears some explaining to hopefully weed out some red herrings.

Our application, developed in Microsoft Visual C++ (2005), uses a 3rd party library (whose source code we luckily happen to have) to export a compressed file used in another 3rd party application. The library is responsible for creating the exported file, managing the data and compression, and generally handling all errors. Recently, we began getting feedback that on certain machines, our application would crash during writes to the file. Based on some initial exploration, we were able to determine the following:

  • The crashes happened on a variety of hardware setups and operating systems (although our customers are restricted to XP / 2000)
  • The crashes would always happen on the same sets of data; however, they would not occur on all sets of data
  • For a set of data that caused a crash, the crash was not reproducible on all machines, even machines with similar characteristics, i.e., operating system, amount of RAM, etc.
  • The bug would only manifest itself when the application was run from the installation directory - not when it was built from Visual Studio, run in debug mode, or even run from other directories the user had access to
  • The issue occurs whether the file is being constructed on a local or a mapped drive

Upon investigating the problem, we found the issue to be in the following block of code (slightly modified to remove some macros):

 while (size>0) {
    do {
        nbytes = _write(file->fd, buf, size);
    } while (-1==nbytes && EINTR==errno);
    if (-1==nbytes) /* error */
        throw("file write failed");
    assert(nbytes>0);
    assert((size_t)nbytes<=size);
    size -= (size_t)nbytes;
    addr += (haddr_t)nbytes;
    buf = (const char*)buf + nbytes;
}

Specifically, the _write is returning error code 22, or EINVAL. According to MSDN, _write returning EINVAL implies that the buffer (buf in this case) is a null pointer. Some simple checks around this function, however, verified that this was not the case in any of the calls made to it.

We do, however, call this method with some very large sets of data - upwards of 250 MB in a single call, depending on the input data. When we imposed an artificial limit on the amount of data that went to this method, we appear to have resolved the issue (a sketch of that chunk-capped approach follows the questions below). This, however, smacks of a code fix for a problem that is machine dependent / permissions dependent / dependent on the phase of the moon. So now the questions:

  1. Is anyone aware of a limit on the amount of data _write can handle in a single call? Or - barring _write - any file I/O command supported by Visual C++?
  2. Since this does not occur on all machines - or even on every call that is a sufficient size (one call with 250 MB will work, another call will not) - is anyone aware of user, machine, group policy settings, or folder permissions that would affect this?
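
For reference, here is a minimal sketch of the chunk-capped workaround described above. The helper name write_chunked and the MAX_WRITE_CHUNK constant are our own inventions, not part of the library; the 32 MB value reflects what we eventually settled on (see the last update below):

#include <io.h>
#include <stddef.h>

/* Hypothetical cap; 32 MB was the largest size that never failed for us. */
#define MAX_WRITE_CHUNK (32u * 1024u * 1024u)

/* Write `size` bytes from `buf` to `fd`, never passing more than
 * MAX_WRITE_CHUNK to any single _write() call. Returns 0 on success,
 * -1 on error (errno holds the CRT error code). */
static int write_chunked(int fd, const void *buf, size_t size)
{
    const char *p = (const char *)buf;
    while (size > 0) {
        unsigned int chunk = size > MAX_WRITE_CHUNK
                           ? MAX_WRITE_CHUNK
                           : (unsigned int)size;
        int nbytes = _write(fd, p, chunk);
        if (nbytes <= 0)
            return -1;
        size -= (size_t)nbytes;
        p += nbytes;
    }
    return 0;
}

This trades one large transfer for several smaller ones, which costs some throughput but sidesteps whatever size threshold is triggering the EINVAL.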

UPDATE: A few other points, from the posts so far:

  • We do handle the cases where the large buffer allocation fails. For performance reasons in the 3rd party application that reads the file we're creating, we want to write all the data out in one big block (although given this error, it may not be possible)
  • We have checked the initial value of size in the routine above, and it is the same as the size of the buffer that was allocated. Also, when the EINVAL error code is raised, size is not equal to 0 and buf is not a null pointer - which makes me think that this isn't the cause of the problem.

Another Update:

An example of a failure is below, produced with some handy printfs added to the code sample above.

while (size>0) {
    if (NULL == buf)
    {
        printf("Buffer is null\n");
    }
    do {
        nbytes = _write(file->fd, buf, size);
    } while (-1==nbytes && EINTR==errno);
    if (-1==nbytes) /* error */
    {
        if (NULL == buf)
        {
            printf("Buffer is null post write\n");
        }
        printf("Error number: %d\n", errno);
        printf("Buffer address: %d\n", &buf);
        printf("Size: %d\n", size);
        throw("file write failed");
    }
    assert(nbytes>0);
    assert((size_t)nbytes<=size);
    size -= (size_t)nbytes;
    addr += (haddr_t)nbytes;
    buf = (const char*)buf + nbytes;
}

On a failure, this will print out:

Error number: 22
Buffer address: 1194824
Size: 89702400

Note that no bytes were successfully written and that the buffer has a valid address (and no NULL pointer checks were triggered, pre or post _write).
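
For anyone who can reproduce this, one additional diagnostic worth trying (a sketch on our part, not something from the original debugging session) is to dump the OS-level error alongside errno; the MSVC CRT records it in the _doserrno global when it maps a Win32 failure to a C error code:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>   /* _doserrno */
#include <windows.h>  /* GetLastError */

/* Hypothetical helper: call immediately after a failed _write(),
 * before any other CRT or Win32 call can overwrite the error state. */
static void dump_write_error(void)
{
    printf("errno:        %d\n", errno);
    printf("_doserrno:    %lu\n", _doserrno);
    printf("GetLastError: %lu\n", GetLastError());
}

If the CRT's own parameter validation is setting EINVAL (rather than mapping it from a failed Win32 call), _doserrno will typically still be 0, which would itself be a useful data point.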

LAST UPDATE

Unfortunately, we were overcome by events and were not able to conclusively solve this. We were able to find some interesting (and maybe even disturbing) facts:

  1. The errors only occurred on machines with slower write times on their hard disks. Two PCs with the exact same hardware specs but different RAID configurations (RAID 0 versus RAID 1) would have different results: the RAID 0 machine would process the data correctly; the RAID 1 machine would fail. Similarly, older PCs with slower hard drives would also fail; newer PCs with faster hard drives - but similar processors / memory - would work.
  2. The write size mattered. When we limited the amount of data passed to _write to 64 MB, all but one file succeeded. When we limited it to 32 MB, all the files succeeded. We took a performance hit in the library we were using - which was a limitation of that library and independent of _write or the problem we were seeing - but it was our only "software" fix.

Unfortunately, I never got a good answer as to why the EINVAL was being returned in the first place (we were about to call Microsoft on this, but we had to get business to sign off on the expense of a tech support call). It isn't - from what we were able to find - documented anywhere in the C library API.

If anyone does find a good answer for this, please post it on here and I'll mark it as the answer. I'd love to get a conclusion for this saga, even if it no longer directly applies to me.


Answer 1:


We had a very similar problem which we managed to reproduce quite easily. We first compiled the following program:

#include <stdlib.h>
#include <stdio.h>
#include <io.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    int len = 70000000;
    int handle = creat(argv[1], S_IWRITE | S_IREAD);
    void *buf;
    int byteswritten;

    setmode(handle, _O_BINARY);
    buf = malloc(len);
    if (buf == NULL) {   /* a NULL buffer is the documented EINVAL cause, so rule it out */
        printf("malloc failed.\n");
        return 1;
    }
    byteswritten = write(handle, buf, len);
    if (byteswritten == len)
        printf("Write successful.\n");
    else
        printf("Write failed.\n");
    close(handle);
    return 0;
}
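
For anyone reproducing this: assuming the source above is saved as a.c (which matches the a.exe used below), it builds with the Visual C++ command-line compiler:

cl a.c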

Now, let's say you are working on the computer mycomputer and that C:\inbox maps to the shared folder \\mycomputer\inbox. Then observe the following effect:

C:\>a.exe C:\inbox\x
Write successful.

C:\>a.exe \\mycomputer\inbox\x
Write failed.

Note that if len is changed to 60000000, there is no problem...

Based on this web page, support.microsoft.com/kb/899149, we think it is a "limitation of the operating system" (the same effect has been observed with fwrite). Our workaround is to cut the write into 63 MB pieces when it fails. This problem has apparently been corrected on Windows Vista.
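
For illustration, a minimal sketch of that fallback strategy; the helper name write_with_fallback and the exact piece size are our own choices, and the code assumes the total length fits in an unsigned int, as it does in the 32-bit scenario here:

#include <io.h>
#include <stddef.h>

#define PIECE_SIZE (63u * 1024u * 1024u)  /* 63 MB pieces, per the workaround above */

/* Try the full write first; only fall back to smaller pieces on failure. */
static int write_with_fallback(int fd, const char *buf, size_t len)
{
    int n = _write(fd, buf, (unsigned int)len);
    if (n > 0) {            /* handle a partial success, just in case */
        buf += n;
        len -= (size_t)n;
    }
    while (len > 0) {
        unsigned int piece = len > PIECE_SIZE ? PIECE_SIZE : (unsigned int)len;
        n = _write(fd, buf, piece);
        if (n <= 0)
            return -1;      /* still failing even with 63 MB pieces */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}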

I hope this helps! Simon




Answer 2:


Did you look at the implementation of _write() in the CRT (C runtime) source that was installed with Visual Studio (C:\Program Files\Microsoft Visual Studio 8\VC\crt\src\write.c)?

There are at least two conditions that cause _write() to set errno to EINVAL, plus a third possibility worth checking:

  1. The buffer is NULL, as you mentioned.
  2. The count parameter is odd while the file is opened in text mode with UTF-16 encoding (or UTF-8? the comments don't match the code). Is this a text or binary file? If it's text, does it have a byte order mark?
  3. Perhaps another function that _write() calls also sets errno to EINVAL?

If you can reliably reproduce this problem, you should be able to narrow down the source of the error by putting breakpoints in the parts of the CRT source that set the error code. It also appears that the debug version of the CRT is capable of asserting when the error occurs, but it might require tweaking some options (I haven't tried it).




Answer 3:


According to http://msdn.microsoft.com/en-us/library/1570wh78(v=VS.90).aspx, errno can take the values:

- EBADF
- ENOSPC
- EINVAL.

There is no EINTR on Windows. Random system interrupts can cause this error, and they are not caught by the test while (-1==nbytes && EINTR==errno);




Answer 4:


You could be trashing your own stack by accidentally misusing a pointer somewhere else. If you can find a repro machine, try running your app under Application Verifier with all the memory checks turned on.




Answer 5:


Two thoughts come to mind: either you are walking past the end of the buffer and trying to write that data out, or the allocation of the buffer failed. Both are problems that, in debug mode, will not be as visible as they are in release mode.

It's probably a bad idea to allocate 250 MB of memory in one go anyway. You'd do better to allocate a fixed-size buffer and do your writing in chunks.

Have you looked for things like virus scanners that might have a hold on the file in between your write operations, thus making the write fail?

I know of no limit to the amount of data you can pass to write in a single call, unless (like I said) you are writing data (as part of the buffer) that does not belong to you...

Since most of these functions wrap the kernel call WriteFile() (or NtWriteFile()), there could be a condition where there isn't enough kernel memory to handle the buffer being written. But I'm not certain of that, since I don't know when exactly the code makes the jump from user mode to kernel mode.
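
One way to narrow that down (our suggestion, not something tried in the original post) is to bypass the CRT and issue the same write through WriteFile on the OS handle behind the CRT file descriptor; if the raw call fails at a similar size threshold, the limit lives below the CRT:

#include <io.h>       /* _get_osfhandle */
#include <stdio.h>
#include <windows.h>

/* Hypothetical diagnostic: repeat a failed _write() directly via Win32. */
static void try_raw_write(int fd, const void *buf, DWORD len)
{
    HANDLE h = (HANDLE)_get_osfhandle(fd);
    DWORD written = 0;

    if (WriteFile(h, buf, len, &written, NULL))
        printf("WriteFile OK: %lu bytes written\n", written);
    else
        printf("WriteFile failed: Win32 error %lu\n", GetLastError());
}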

Don't know if any of this will help, but hope it does...

If you can provide any more details, please do. Sometimes just telling someone about the problem will trigger your brain to go "Wait a minute!", and you'll figure it out. heh..



Source: https://stackoverflow.com/questions/584347/machine-dependent-write-failures-with-einval-error-code
