Question
I am working on a program that is to use read(), write(), open(), and close() to deal with files. We are given a binary file of records to sort.
My confusion starts at the read step. From my understanding, read puts the file contents into a character array. So if I'm not totally off, it means each index contains a single byte of information. The records are each separated by a space. I am to sort them by the first four bytes each contains.
I know the format of the records, but the data has a variable length. Luckily there are only spaces between records, none within a single one. The structure is one integer as a file header that says how many records there are. Each record starts with a 4-byte key, followed by 4 bytes saying how much data there is, followed by the data, all without spaces. The size of the data does not include the space.
Will a sort routine from the C library work with these being handled as characters and not integers? Also, I am not sure where to start with separating and rearranging the records. Would I have to extract each into an array of record structs and sort from there?
I am new to C and can't find much online using these specific functions. It was from a homework assignment but the due date has passed; I am just trying to get my understanding up to speed.
Answer 1:
If the file is binary, as you write, then the records are not separated by anything; you just need to know the size of each record (all records probably have the same size).
For sorting, you can use the standard library functions such as qsort. This function uses a callback that you provide, so it can work with any kind of data. After qsort returns, you will have the data rearranged.
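As a minimal sketch of that callback mechanism, assuming a hypothetical fixed-size struct with an integer key (the names `rec`, `cmp_rec_key`, and `sort_recs` are made up for illustration and are not from the assignment):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size record; the real assignment's layout may differ. */
struct rec
{
    int key;
    int value;
};

/* qsort() hands the callback pointers to two array elements. */
static int cmp_rec_key(const void *v1, const void *v2)
{
    const struct rec *r1 = v1;
    const struct rec *r2 = v2;
    /* Compare explicitly; subtracting the keys could overflow. */
    return (r1->key > r2->key) - (r1->key < r2->key);
}

/* Sort an array of n records into ascending key order. */
static void sort_recs(struct rec *recs, size_t n)
{
    assert(recs != NULL);
    qsort(recs, n, sizeof(recs[0]), cmp_rec_key);
}
```

Because the comparator is the only part that knows about the element type, the same qsort call works whether the elements are structs, integers, or pointers.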
"Would I have to extract each into an array of record structs and sort from there?"
Yes, for a small number of records (as in a student assignment) this is a good option.
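One way that extraction might look is sketched below, assuming the whole file has already been read into a char buffer and follows the layout described in the question (count header, then key, length, data, and one space per record). The names `rec_ref` and `index_records` are hypothetical, and the sketch assumes a 4-byte int stored in the host's byte order. It also shows that the bytes in a char array can be turned back into integers with memcpy, so handling the file as characters is not an obstacle to sorting by integer key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical index entry: the data stays in the original buffer. */
struct rec_ref
{
    int key;
    int data_len;
    const char *data;   /* points into buf; not null-terminated */
};

/*
 * Walk the buffer and fill in one entry per record, up to max entries.
 * Returns the number of records indexed, or -1 if the layout is malformed.
 */
static int index_records(const char *buf, size_t buflen,
                         struct rec_ref *out, int max)
{
    assert(buf != NULL && out != NULL);
    size_t pos = 0;
    int count;
    if (buflen < sizeof(count))
        return -1;
    memcpy(&count, buf, sizeof(count));   /* memcpy avoids misaligned access */
    pos += sizeof(count);
    if (count < 0 || count > max)
        return -1;
    for (int i = 0; i < count; i++)
    {
        int key, len;
        if (buflen - pos < 2 * sizeof(int))
            return -1;
        memcpy(&key, buf + pos, sizeof(key));
        memcpy(&len, buf + pos + sizeof(key), sizeof(len));
        pos += 2 * sizeof(int);
        /* The stored length excludes the trailing space. */
        if (len < 0 || buflen - pos < (size_t)len + 1 || buf[pos + len] != ' ')
            return -1;
        out[i].key = key;
        out[i].data_len = len;
        out[i].data = buf + pos;
        pos += (size_t)len + 1;
    }
    return count;
}
```

The resulting array of fixed-size `rec_ref` entries can be passed straight to qsort with a comparator on `key`, and the sorted file can then be written out by following each `data` pointer.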
Answer 2:
Test data generation
We don't have any sample data, so we have to create some. Let's use plain text lines as the 'data' section, and we can generate random keys in the range 0..999, and report the length of the lines as the length of the data, and include the blank pad at the end of each record.
For example, this code does the generation job, reading from standard input and writing to standard output:
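#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    srand(time(0));
    int fd = STDOUT_FILENO;
    char *buffer = 0;
    size_t buflen = 0;
    ssize_t nread;      /* getline() returns ssize_t, not int */

    while ((nread = getline(&buffer, &buflen, stdin)) != -1)
    {
        int key = rand() % 1000;
        int len = (int)nread;   /* data length, including the newline */

        /* Each record: key, length, data, then a single blank */
        if (write(fd, &key, sizeof(key)) != (ssize_t)sizeof(key) ||
            write(fd, &len, sizeof(len)) != (ssize_t)sizeof(len) ||
            write(fd, buffer, len) != len ||
            write(fd, " ", 1) != 1)
        {
            perror("write");
            free(buffer);
            return 1;
        }
    }
    free(buffer);
    return 0;
}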
Given the input file (search on 'great panjandrum' to find out where the text comes from — it is not meant to be very sensible):
So she went into the garden
to cut a cabbage-leaf
to make an apple-pie
and at the same time
a great she-bear coming down the street
pops its head into the shop
What no soap
So he died
and she very imprudently married the Barber
and there were present
the Picninnies
and the Joblillies
and the Garyulies
and the great Panjandrum himself
with the little round button at top
and they all fell to playing the game of catch-as-catch-can
till the gunpowder ran out at the heels of their boots
the output could be:
0x0000: C3 03 00 00 1C 00 00 00 53 6F 20 73 68 65 20 77 ........So she w
0x0010: 65 6E 74 20 69 6E 74 6F 20 74 68 65 20 67 61 72 ent into the gar
0x0020: 64 65 6E 0A 20 C7 01 00 00 16 00 00 00 74 6F 20 den. ........to
0x0030: 63 75 74 20 61 20 63 61 62 62 61 67 65 2D 6C 65 cut a cabbage-le
0x0040: 61 66 0A 20 6C 03 00 00 15 00 00 00 74 6F 20 6D af. l.......to m
0x0050: 61 6B 65 20 61 6E 20 61 70 70 6C 65 2D 70 69 65 ake an apple-pie
0x0060: 0A 20 6F 02 00 00 15 00 00 00 61 6E 64 20 61 74 . o.......and at
0x0070: 20 74 68 65 20 73 61 6D 65 20 74 69 6D 65 0A 20 the same time.
0x0080: 80 02 00 00 28 00 00 00 61 20 67 72 65 61 74 20 ....(...a great
0x0090: 73 68 65 2D 62 65 61 72 20 63 6F 6D 69 6E 67 20 she-bear coming
0x00A0: 64 6F 77 6E 20 74 68 65 20 73 74 72 65 65 74 0A down the street.
0x00B0: 20 F5 02 00 00 1C 00 00 00 70 6F 70 73 20 69 74 ........pops it
0x00C0: 73 20 68 65 61 64 20 69 6E 74 6F 20 74 68 65 20 s head into the
0x00D0: 73 68 6F 70 0A 20 10 01 00 00 0D 00 00 00 57 68 shop. ........Wh
0x00E0: 61 74 20 6E 6F 20 73 6F 61 70 0A 20 4F 02 00 00 at no soap. O...
0x00F0: 0B 00 00 00 53 6F 20 68 65 20 64 69 65 64 0A 20 ....So he died.
0x0100: 73 01 00 00 2C 00 00 00 61 6E 64 20 73 68 65 20 s...,...and she
0x0110: 76 65 72 79 20 69 6D 70 72 75 64 65 6E 74 6C 79 very imprudently
0x0120: 20 6D 61 72 72 69 65 64 20 74 68 65 20 42 61 72 married the Bar
0x0130: 62 65 72 0A 20 60 01 00 00 17 00 00 00 61 6E 64 ber. `.......and
0x0140: 20 74 68 65 72 65 20 77 65 72 65 20 70 72 65 73 there were pres
0x0150: 65 6E 74 0A 20 0D 00 00 00 0F 00 00 00 74 68 65 ent. ........the
0x0160: 20 50 69 63 6E 69 6E 6E 69 65 73 0A 20 46 02 00 Picninnies. F..
0x0170: 00 13 00 00 00 61 6E 64 20 74 68 65 20 4A 6F 62 .....and the Job
0x0180: 6C 69 6C 6C 69 65 73 0A 20 88 02 00 00 12 00 00 lillies. .......
0x0190: 00 61 6E 64 20 74 68 65 20 47 61 72 79 75 6C 69 .and the Garyuli
0x01A0: 65 73 0A 20 92 00 00 00 21 00 00 00 61 6E 64 20 es. ....!...and
0x01B0: 74 68 65 20 67 72 65 61 74 20 50 61 6E 6A 61 6E the great Panjan
0x01C0: 64 72 75 6D 20 68 69 6D 73 65 6C 66 0A 20 A8 01 drum himself. ..
0x01D0: 00 00 24 00 00 00 77 69 74 68 20 74 68 65 20 6C ..$...with the l
0x01E0: 69 74 74 6C 65 20 72 6F 75 6E 64 20 62 75 74 74 ittle round butt
0x01F0: 6F 6E 20 61 74 20 74 6F 70 0A 20 15 01 00 00 3C on at top. ....<
0x0200: 00 00 00 61 6E 64 20 74 68 65 79 20 61 6C 6C 20 ...and they all
0x0210: 66 65 6C 6C 20 74 6F 20 70 6C 61 79 69 6E 67 20 fell to playing
0x0220: 74 68 65 20 67 61 6D 65 20 6F 66 20 63 61 74 63 the game of catc
0x0230: 68 2D 61 73 2D 63 61 74 63 68 2D 63 61 6E 0A 20 h-as-catch-can.
0x0240: B1 03 00 00 37 00 00 00 74 69 6C 6C 20 74 68 65 ....7...till the
0x0250: 20 67 75 6E 70 6F 77 64 65 72 20 72 61 6E 20 6F gunpowder ran o
0x0260: 75 74 20 61 74 20 74 68 65 20 68 65 65 6C 73 20 ut at the heels
0x0270: 6F 66 20 74 68 65 69 72 20 62 6F 6F 74 73 0A 20 of their boots.
0x0280:
Sorting the generated data
Now we have data that can be processed by a program that reads the records, prints them, sorts them by key, and prints them again.
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct record
{
    int key;
    int data_len;
    char data[];        /* flexible array member */
};

static int comparator(const void *v1, const void *v2);
static void print_records(const char *tag, int num_recs, struct record **recs);
static void err_syserr(const char *msg, ...);
static void err_setarg0(const char *argv0);

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }
    err_setarg0(argv[0]);

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        err_syserr("Failed to open file '%s' for reading\n", argv[1]);

    struct record **records = 0;
    int num_recs = 0;
    int max_recs = 0;
    int key;
    int len;

    /* Each record starts with a key and a data length */
    while (read(fd, &key, sizeof(key)) == sizeof(key) &&
           read(fd, &len, sizeof(len)) == sizeof(len))
    {
        //printf("rec num %d (key %d, len %d)\n", num_recs, key, len);
        assert(len > 0);
        assert(num_recs <= max_recs);
        /* Grow the array of record pointers when it is full */
        if (num_recs == max_recs)
        {
            size_t new_max = 2 * max_recs + 2;
            void *new_recs = realloc(records, new_max * sizeof(*records));
            if (new_recs == 0)
                err_syserr("Failed to realloc() %zu bytes of memory\n", new_max * sizeof(*records));
            records = new_recs;
            max_recs = new_max;
        }
        int rec_size = sizeof(struct record) + len;
        records[num_recs] = malloc(rec_size);
        if (records[num_recs] == 0)
            err_syserr("Failed to malloc() %d bytes of memory\n", rec_size);
        records[num_recs]->key = key;
        records[num_recs]->data_len = len;
        if (read(fd, records[num_recs]->data, len) != len)
            err_syserr("Short read for record number %d (key %d)\n", num_recs, key);
        /* Overwrite the trailing newline with a null byte */
        records[num_recs]->data[len-1] = '\0';
        //printf("Data: [%s]\n", records[num_recs]->data);
        /* Every record must be followed by a single blank */
        char blank = 0;
        if (read(fd, &blank, sizeof(blank)) != sizeof(blank))
            err_syserr("Missing record terminator after record number %d (key %d)\n", num_recs, key);
        if (blank != ' ')
            err_syserr("Unexpected EOR code %d for record number %d (key %d)\n", blank, num_recs, key);
        num_recs++;
    }
    close(fd);

    print_records("Before", num_recs, records);
    qsort(records, num_recs, sizeof(struct record *), comparator);
    print_records("After", num_recs, records);

    for (int i = 0; i < num_recs; i++)
        free(records[i]);
    free(records);
    return 0;
}

/* The array elements are pointers to records, so dereference twice */
static int comparator(const void *v1, const void *v2)
{
    int key_1 = (*(struct record **)v1)->key;
    int key_2 = (*(struct record **)v2)->key;
    if (key_1 < key_2)
        return -1;
    else if (key_1 > key_2)
        return +1;
    else
        return 0;
}

static void print_records(const char *tag, int num_recs, struct record **recs)
{
    printf("%s (%d records):\n", tag, num_recs);
    for (int i = 0; i < num_recs; i++)
    {
        struct record *rec = recs[i];
        printf("%6d: %4d: %s\n", rec->key, rec->data_len, rec->data);
    }
}

/* My standard error handling - stderr.h and stderr.c */
static const char *arg0 = "unknown";

static void err_setarg0(const char *argv0)
{
    arg0 = argv0;
}

static void err_syserr(const char *fmt, ...)
{
    va_list args;
    int errnum = errno;
    fprintf(stderr, "%s: ", arg0);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
    exit(EXIT_FAILURE);
}
The code exploits the knowledge that the data for each record ends with a newline, and it overwrites that newline with a null byte. That makes the presentation better, too. Also, note that you cannot create an array of structures with flexible array members (because the elements of an array are all the same size, and structures with flexible array members are not all the same size). Hence the code uses an array of pointers to structures with flexible array members. That affects the comparator function, amongst others.
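The flexible array member point can be illustrated in isolation with a small sketch. The names `frec`, `make_frec`, and `cmp_frec_ptr` are hypothetical and only mirror the shape of the record structure described here; each record is allocated separately with room for its own data, and the sort operates on an array of pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct frec
{
    int key;
    int data_len;
    char data[];    /* flexible array member: sizeof(struct frec) excludes the data bytes */
};

/* Allocate one record with room for the text plus a null terminator. */
static struct frec *make_frec(int key, const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    struct frec *r = malloc(sizeof(*r) + len + 1);
    if (r == NULL)
        return NULL;
    r->key = key;
    r->data_len = (int)len;
    memcpy(r->data, text, len + 1);
    return r;
}

/* The array holds pointers, so qsort() passes pointers to pointers. */
static int cmp_frec_ptr(const void *v1, const void *v2)
{
    const struct frec *r1 = *(const struct frec *const *)v1;
    const struct frec *r2 = *(const struct frec *const *)v2;
    return (r1->key > r2->key) - (r1->key < r2->key);
}
```

A call such as `qsort(recs, n, sizeof(recs[0]), cmp_frec_ptr)` on a `struct frec *recs[]` array then moves only the pointers; the variable-size records themselves stay where malloc put them.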
In part because the data format is moderately complex, the code is careful to identify erroneous (malformatted) data.
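One further check that may be worth adding: read() is allowed to return fewer bytes than requested even when more data will follow, which matters mostly when the input is a pipe or terminal rather than a regular file. A hedged sketch of a helper that keeps reading until the request is satisfied is shown below; `read_full` is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling read() until nbytes have arrived, end of file is reached,
 * or an error occurs. Returns the number of bytes actually read (less
 * than nbytes only at end of file), or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    assert(buf != NULL);
    char *p = buf;
    size_t total = 0;
    while (total < nbytes)
    {
        ssize_t n = read(fd, p + total, nbytes - total);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With a helper like this, a return value smaller than the requested size reliably means the file ended early, so it can be reported as malformed data rather than as a transient short read.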
Sample run
Note that the first column in the output is the record key — the goal is to sort the data into ascending order of the key numbers. The second column is the data length.
Before (17 records):
   963:   28: So she went into the garden
   455:   22: to cut a cabbage-leaf
   876:   21: to make an apple-pie
   623:   21: and at the same time
   640:   40: a great she-bear coming down the street
   757:   28: pops its head into the shop
   272:   13: What no soap
   591:   11: So he died
   371:   44: and she very imprudently married the Barber
   352:   23: and there were present
    13:   15: the Picninnies
   582:   19: and the Joblillies
   648:   18: and the Garyulies
   146:   33: and the great Panjandrum himself
   424:   36: with the little round button at top
   277:   60: and they all fell to playing the game of catch-as-catch-can
   945:   55: till the gunpowder ran out at the heels of their boots
After (17 records):
    13:   15: the Picninnies
   146:   33: and the great Panjandrum himself
   272:   13: What no soap
   277:   60: and they all fell to playing the game of catch-as-catch-can
   352:   23: and there were present
   371:   44: and she very imprudently married the Barber
   424:   36: with the little round button at top
   455:   22: to cut a cabbage-leaf
   582:   19: and the Joblillies
   591:   11: So he died
   623:   21: and at the same time
   640:   40: a great she-bear coming down the street
   648:   18: and the Garyulies
   757:   28: pops its head into the shop
   876:   21: to make an apple-pie
   945:   55: till the gunpowder ran out at the heels of their boots
   963:   28: So she went into the garden
Source: https://stackoverflow.com/questions/39682323/sorting-records-from-a-binary-file-in-c