Question
I'm writing a cache simulator in C that's based on trace files, which I want to pipe into the program via stdin. These trace files can be up to 15 billion lines long, so I don't want to store them anywhere in active memory. I want to run the simulation multiple times for different memory configurations from one call using a configuration file which is specified in the input to the program. The program call should look like this:
cat (trace file) | ./MemorySimulator -f (config file)
Right now, the way the program runs is that it uses the config file to set the parameters of a simulation then reads the piped in formatted data from stdin using scanf() until it reaches the end of the trace file. It then proceeds to the next configuration setting from the config file and tries to read data from the trace file over again. This process continues until the various configuration options have been exhausted.
The problem I'm running into is that once I run through the trace file once, I'm unable to capture the data again for the following memory configuration from the config file.
Is there a way to recycle the pipe data within my C program so that I can run the simulation multiple times from a single program execution? So far, I haven't been able to find a way to accomplish this.
Answer 1:
No, that doesn't work. That's the very nature of a pipe.
You cannot demand both that the data is not stored anywhere and that it can be requested again.
In a pipe, once the data has been read, it is gone, so you have to store it somewhere if you need it a second time.
The only way you can accomplish this is to "imitate" the behaviour of the other program, which should be trivial in the cat case: open the file yourself.
To be exact, your command line is a very good example of the famous UUOC (Useless Use of cat).
If you are required to read from stdin, that does not have to be a pipe. Instead of
cat file | program
you can do
program < file
and this doesn't give you a pipe, but direct access to the file, including the ability to seek.
You could use this if possible, and if not, either cache the data yourself or refuse to run.
Relying on seekable input, however, doesn't work if you are required to accept all kinds of standard input.
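One way to tell the two cases apart at run time is to ask the operating system whether the descriptor behind stdin can seek. The following is a minimal sketch assuming a POSIX system. The names stream_is_seekable and count_lines are hypothetical helpers introduced for illustration, and count_lines only stands in for a real simulation pass.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes fileno() and fdopen() in strict modes */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the stream can seek (e.g. a regular file redirected with `<`),
 * 0 if it cannot (e.g. a pipe, where lseek fails with ESPIPE). */
int stream_is_seekable(FILE *fp)
{
    return lseek(fileno(fp), 0, SEEK_CUR) != (off_t)-1;
}

/* Hypothetical stand-in for one pass over the trace:
 * counts newline characters from the current position to EOF. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}
```

A program could call stream_is_seekable(stdin) once at startup, then call rewind(stdin) between passes when it returns 1, and otherwise fall back to caching the data or refuse to run.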
Answer 2:
You asked:
Is there a way to recycle the pipe data within my C program so that I can run the simulation multiple times from a single program execution?
If you are open to using the trace file as an input argument to the program, you can accomplish what you want.
Instead of
cat <tracefile> | ./MemorySimulator -f (config file)
you can use:
./MemorySimulator <tracefile> -f (config file)
In main, use fopen to open the trace file. Once you are done using it for one configuration, reset it to the beginning with rewind and reuse the FILE* for the next configuration.
You can also use fopen/fclose on the trace file for each configuration.
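The fopen-and-rewind approach might look roughly like the sketch below. The function names run_all_configs and simulate_one_config are hypothetical, and the trace record format (an operation character followed by a hexadecimal address) is an assumption made only so the example has something to parse. The real simulator would replace the counting with its own per-record logic.

```c
#include <stdio.h>

/* Hypothetical stand-in for one simulation pass. It only counts the records
 * it can parse; config_id is unused here but would select the cache setup. */
static long simulate_one_config(FILE *trace, int config_id)
{
    char op;
    unsigned long long addr;
    long records = 0;

    (void)config_id;
    /* Assumed record format: "<op> <hex address>", one record per line. */
    while (fscanf(trace, " %c %llx", &op, &addr) == 2)
        records++;
    return records;
}

/* Opens the trace once and rewinds it before each configuration.
 * Returns 0 on success, -1 if the trace file cannot be opened. */
int run_all_configs(const char *trace_path, int num_configs, long *records_out)
{
    FILE *trace = fopen(trace_path, "r");

    if (trace == NULL)
        return -1;
    for (int i = 0; i < num_configs; i++) {
        rewind(trace);  /* also clears the EOF flag left by the previous pass */
        records_out[i] = simulate_one_config(trace, i);
    }
    fclose(trace);
    return 0;
}
```

Nothing is held in memory beyond the stdio buffer, so the size of the trace file does not matter here.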
Answer 3:
Given your comments that you are required to read your data from stdin (and, I presume, cannot require stdin to be directly redirected from a file), you have little choice but to cache the data yourself. Since that data is more than 40GB, the cache had better be a disk file.
What I'd do is, on the first pass, open a temporary file for read/write and, as you read from a FILE* variable set equal to stdin, also write the data to your temporary file. At the end of the first pass, assign the temporary file's FILE* to your input FILE*.
Now for the remaining passes, you can start by rewinding your input (temporary) file and read it for input.
You can use your loop counter to determine what you need to do each pass.
Here's an overview of this code:
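FILE *infp = stdin;
FILE *tmpfp = NULL;
char buf[65536];
size_t num_read, num_written;

for (int loop = 0; loop < NUM_LOOPS; loop++) {
    if (loop == 0) {
        // "w+" so the file can be read back after it has been written
        tmpfp = fopen("tmpfile.tmp", "w+");
        // check for errors here
    }
    for (;;) {
        // fread/fwrite operate on FILE*; read/write take file descriptors
        num_read = fread(buf, 1, sizeof(buf), infp);
        if (num_read == 0)
            break;  // EOF or read error; ferror(infp) tells them apart
        if (loop == 0) {
            num_written = fwrite(buf, 1, num_read, tmpfp);
            // check for write errors here (num_written != num_read)
        }
        // Main input processing code
    }
    if (loop == 0) {
        infp = tmpfp;  // later passes read the cached copy
    }
    rewind(infp);
}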
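A variation on the same idea, offered only as a sketch: copy stdin to disk up front and then run every pass against the copy. It uses the standard tmpfile function, which creates a file that is removed automatically when it is closed. The name spool_to_tmpfile is a hypothetical helper, and the 64 KiB buffer size is an arbitrary choice.

```c
#include <stdio.h>

/* Copies everything remaining on `in` into an anonymous temporary file and
 * returns that file positioned at its start, or NULL on any failure. */
FILE *spool_to_tmpfile(FILE *in)
{
    char buf[65536];
    size_t n;
    FILE *tmp = tmpfile();  /* opened for update, deleted when closed */

    if (tmp == NULL)
        return NULL;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, tmp) != n) {  /* short write, e.g. disk full */
            fclose(tmp);
            return NULL;
        }
    }
    if (ferror(in) || fflush(tmp) != 0) {
        fclose(tmp);
        return NULL;
    }
    rewind(tmp);
    return tmp;
}
```

Compared with writing the copy during the first simulation pass, this costs one extra read of the data, but it keeps the simulation loop identical for every configuration: each pass just calls rewind on the returned FILE* and reads from it.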
Source: https://stackoverflow.com/questions/29763277/reading-the-same-data-from-stdin-multiple-times-in-c