Faster I/O in C

Submitted by 自闭症网瘾萝莉.ら on 2020-01-12 05:30:14

Question


I have a program which will take 1000000 lines of input like the following from the console.

0 1 23 4 5
1 3 5 2 56
12 2 3 33 5
...
...

I have used scanf, but it is very, very slow. Is there any way to get the input from the console faster? I could use read(), but I am not sure about the number of bytes in each line, so I cannot ask read() to read n bytes. Thanks, very obliged.


Answer 1:


Use fgets(...) to pull in a line at a time. Note that you should check for the '\n' at the end of the line, and if there is not one, you are either at EOF, or you need to read another buffer's worth, and concatenate the two together. Lather, rinse, repeat. Don't get caught with a buffer overflow.

THEN, you can parse each logical line in memory yourself. I like to use strspn(...) and strcspn(...) for this sort of thing, but your mileage may vary.

Parsing: Define a delimiters string. Use strspn() to count "non data" chars that match the delimiters, and skip over them. Use strcspn() to count the "data" chars that DO NOT match the delimiters. If this count is 0, you are done (no more data in the line). Otherwise, copy out those N chars to hand to a parsing function such as atoi(...) or sscanf(...). Then, reset your pointer base to the end of this chunk and repeat the skip-delims, copy-data, convert-to-numeric process.
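The skip-delims, copy-data, convert-to-numeric loop described above might look like the following sketch (the function name and buffer sizes are illustrative, not from the answer):

```c
#include <stdlib.h>
#include <string.h>

/* Parse up to max integers from one line using the strspn()/strcspn()
   technique described above; returns how many integers were found. */
static int parse_line(const char *line, int *out, int max)
{
    const char *delims = " \t\n";    /* the "non data" characters */
    const char *p = line;
    int count = 0;

    while (count < max) {
        p += strspn(p, delims);           /* skip over delimiters */
        size_t len = strcspn(p, delims);  /* length of the next data chunk */
        if (len == 0)
            break;                        /* no more data on this line */

        char tmp[32];                     /* copy out the chunk... */
        if (len >= sizeof tmp)
            len = sizeof tmp - 1;
        memcpy(tmp, p, len);
        tmp[len] = '\0';
        out[count++] = atoi(tmp);         /* ...and hand it to atoi() */
        p += len;                         /* reset base past this chunk */
    }
    return count;
}
```

The same loop works with sscanf() or strtol() in place of atoi() if you need error detection on malformed tokens.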




Answer 2:


Use multiple reads with a fixed-size buffer until you hit end of file.
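A minimal sketch of that fixed-size-buffer loop, written here as a function so it can be run against any stream (the function name and buffer size are illustrative):

```c
#include <stdio.h>

/* Read with a fixed-size buffer until EOF, as described above,
   counting total bytes; real code would process buf[0..n-1] instead. */
static size_t count_bytes(FILE *in)
{
    char buf[65536];
    size_t n, total = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        total += n;   /* process the n bytes in buf here */
    return total;
}
```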




Answer 3:


If your example is representative, i.e. you really do have a fixed format of five decimal numbers per line, I'd probably use a combination of fgets() to read the lines, then a loop calling strtol() to convert from string to integer.

That should be faster than scanf(), while still clearer and more high-level than doing the string to integer conversion on your own.

Something like this:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
  int number[5];
} LineOfNumbers;

/* Returns 1 on success, 0 on EOF or a malformed line. */
int getNumbers(FILE *in, LineOfNumbers *line)
{
  char buf[128];  /* Should be large enough. */
  if(fgets(buf, sizeof buf, in) != NULL)
  {
    size_t i;
    char *ptr, *eptr;

    ptr = buf;
    for(i = 0; i < sizeof line->number / sizeof *line->number; i++)
    {
      line->number[i] = (int) strtol(ptr, &eptr, 10);
      if(eptr == ptr)   /* No digits were consumed: parse failure. */
        return 0;
      ptr = eptr;       /* Continue after the number just parsed. */
    }
    return 1;
  }
  return 0;
}

Note: this is untested (even uncompiled!) browser-written code. But perhaps useful as a concrete example.
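As a quick sanity check, the chained-strtol() core of that sketch can be exercised on one of the question's sample lines directly (parse_five() is an illustrative stand-alone version of the same loop):

```c
#include <stdlib.h>

/* Parse five integers from one line with chained strtol() calls;
   returns 1 on success, 0 on a malformed line. */
static int parse_five(const char *buf, int number[5])
{
    const char *ptr = buf;
    char *eptr;
    int i;

    for (i = 0; i < 5; i++) {
        number[i] = (int) strtol(ptr, &eptr, 10);
        if (eptr == ptr)      /* no digits consumed: malformed line */
            return 0;
        ptr = eptr;           /* continue after the parsed number */
    }
    return 1;
}
```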




Answer 4:


Use binary I/O if you can. Text conversion can slow reading down severalfold. If you're using text I/O because it's easy to debug, reconsider binary format, and use the od program (assuming you're on unix) to make it human-readable when needed.

Oh, another thing: there's AT&T's SFIO library, which stands for safer/faster file IO. You might also have some luck with that, but I doubt that you'll get the same kind of speedup as you will with binary format.




Answer 5:


Read a line at a time (if the buffer is not big enough for a line, expand it and continue with the larger buffer).

Then use dedicated conversion functions (e.g. atoi()) rather than general-purpose ones like scanf().

But, most of all, set up a repeatable test harness with profiling to ensure changes really do speed things up.
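A repeatable harness for that can be as simple as timing the routine under test with clock() over identical input; the atoi() call below is a stand-in for whichever conversion you are comparing:

```c
#include <stdlib.h>
#include <time.h>

/* Time `iterations` runs of a conversion routine over fixed input.
   Returns the accumulated result (so the compiler cannot drop the work)
   and stores elapsed CPU seconds in *elapsed. */
static long time_conversions(int iterations, double *elapsed)
{
    clock_t start = clock();
    long sum = 0;
    int i;
    for (i = 0; i < iterations; i++)
        sum += atoi("12345");   /* routine under test */
    *elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Run it with each candidate (atoi, strtol, sscanf, hand-rolled) on the same input and compare the elapsed times.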




Answer 6:


Out of curiosity, what generates that many lines that fast in a console?




Answer 7:


fread will still return if you try to read more bytes than there are.

I have found one of the fastest ways to read a file is like this:

fseek(file, 0, SEEK_END);   /* seek to end of file */
size = ftell(file);         /* get size of file */
fseek(file, 0, SEEK_SET);   /* seek back to start of file */
buffer = malloc(1048576);   /* make a buffer for the file */
/* fread() in 1 MB at a time until you reach size bytes, etc. */

On modern computers, put your RAM to use: load the whole thing into RAM, then you can easily work your way through the memory.

At the very least you should be using fread() with block sizes as big as you can, and at least as big as the cache blocks or HDD sector size (4096 bytes minimum; I would use 1048576 as a minimum personally). You will find that with much bigger read requests, fread() is able to sequentially get a big stream in one operation. The suggestion some people make here to use 128 bytes is ridiculous: you will end up with the drive having to seek all the time, as the tiny delay between calls will cause the head to already be past the next sector, which almost certainly has sequential data that you want.
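Put together, the whole-file-into-RAM approach looks like this (error handling kept minimal; in real use you would read in 1 MB chunks as suggested above, while this sketch requests the remaining bytes in one call):

```c
#include <stdio.h>
#include <stdlib.h>

/* Find the file size with fseek()/ftell(), then fread() until all
   `size` bytes are in memory. Returns the malloc'd buffer, or NULL. */
static char *slurp(FILE *file, size_t *out_size)
{
    fseek(file, 0, SEEK_END);            /* seek to end of file */
    long size = ftell(file);             /* get size of file    */
    fseek(file, 0, SEEK_SET);            /* seek back to start  */

    char *buffer = malloc((size_t)size + 1);
    if (buffer == NULL)
        return NULL;

    size_t total = 0;
    while (total < (size_t)size) {       /* fread until size bytes read */
        size_t got = fread(buffer + total, 1, (size_t)size - total, file);
        if (got == 0)
            break;
        total += got;
    }
    buffer[total] = '\0';                /* convenient for text parsing */
    *out_size = total;
    return buffer;
}
```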




Answer 8:


You can greatly reduce the time of execution by taking input using fread() or fread_unlocked() (if your program is single-threaded). Locking/Unlocking the input stream just once takes negligible time, so ignore that.

Here is the code:

#include <cstdio>
#include <cctype>

const int maxio = 1000000;
char buf[maxio], *s = buf + maxio;

inline char getc1(void)
{
   /* Refill the buffer once it has been consumed. */
   if(s >= buf + maxio) { fread_unlocked(buf, sizeof(char), maxio, stdin); s = buf; }
   return *(s++);
}

inline int input()
{
   char t = getc1();
   int n = 1, res = 0;
   while(t != '-' && !isdigit(t)) t = getc1();   /* skip leading non-digits */
   if(t == '-')
   {
      n = -1; t = getc1();
   }
   while(isdigit(t))
   {
      res = 10*res + (t & 15);   /* '0'..'9' & 15 yields the digit value */
      t = getc1();
   }
   return res*n;
}

This is implemented in C++. In C, include <stdio.h> and <ctype.h> instead of the C++ headers. Note that fread_unlocked() is a GNU extension; where it is unavailable, fall back to plain fread() or getchar_unlocked().

You can take input as a stream of chars by calling getc1() and take integer input by calling input().

The whole idea behind using fread() is to take all the input at once. Calling scanf()/printf() repeatedly takes up valuable time locking and unlocking streams, which is completely redundant in a single-threaded program.

Also make sure that the value of maxio is such that all input can be taken in a few "roundtrips" only (ideally one, in this case). Tweak it as necessary.

Hope this helps!



Source: https://stackoverflow.com/questions/705303/faster-i-o-in-c
