Sorting Records from a binary file in C

Submitted by 二次信任 on 2019-12-25 08:14:14

Question


I am working on a program that is to use read(), write(), open(), and close() to deal with files. We are given a binary file of records to sort.

My confusion starts at the read step. From my understanding, read() puts the file contents into a character array. So if I'm not totally off, each index contains a single byte of information. The records are separated from each other by a space. I am to sort them by the first four bytes of each record.

I know the format of the records, but the data part varies in size. Luckily there are only spaces between records, none within a single one. The file starts with one integer as a header that says how many records there are. Each record begins with a 4-byte key, followed by 4 bytes saying how much data there is, followed by the data, all without spaces. The size of the data does not include the separating space.

Will a sort routine from the C library work with these being handled as characters and not integers? Also, I am not sure where to start with separating and rearranging the records. Would I have to extract each into an array of record structs and sort from there?

I am new to C and can't find much online using these specific functions. It was from a homework assignment but the due date has passed; I am just trying to get my understanding up to speed.


Answer 1:


If the file is binary, as you write, then the records are not separated by anything; you just need to know the size of each record (all records probably have the same size).

For sorting, you can use the standard library functions such as qsort. This function uses a callback that you provide, so it can work with any kind of data. After qsort returns, you will have the data rearranged.
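As a minimal sketch of the callback mechanism, the following sorts a plain array of integer keys. The names `compare_ints` and `sort_keys` are illustrative only and do not come from the assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort(): each argument points at one array element. */
int compare_ints(const void *v1, const void *v2)
{
    int a = *(const int *)v1;
    int b = *(const int *)v2;
    /* Written this way instead of a - b, which could overflow. */
    return (a > b) - (a < b);
}

/* Sort an array of n integer keys into ascending order, in place. */
void sort_keys(int *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    qsort(keys, n, sizeof(keys[0]), compare_ints);
}
```

qsort() itself never inspects the elements; it only moves blocks of `sizeof(keys[0])` bytes around according to what the callback returns, which is why the same routine works for integers, structs, or pointers.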

Would I have to extract each into an array of record structs and sort from there?

Yes, for a small number of records (as in a student assignment) this is a good option.
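For the layout described in the question (key, length, data, then one space), the extraction step might look like the sketch below. It assumes that `int` is 4 bytes, that the file uses the machine's own byte order, and that the buffer starts just after the record-count header. `struct record_view` and `split_records` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one record inside a buffer filled by read(). */
struct record_view
{
    int         key;       /* first 4 bytes of the record */
    int         data_len;  /* next 4 bytes: number of data bytes */
    const char *data;      /* points into the buffer; not null-terminated */
};

/*
 * Split buf[0..buf_len) into records laid out as key, length, data, ' '.
 * Stores at most max_recs views in out. Returns the number of records found,
 * or -1 if the buffer is malformed or holds more than max_recs records.
 */
int split_records(const char *buf, size_t buf_len,
                  struct record_view *out, int max_recs)
{
    assert(buf != NULL && out != NULL);
    size_t pos = 0;
    int count = 0;

    while (pos < buf_len)
    {
        int key, len;
        if (count == max_recs || buf_len - pos < 2 * sizeof(int))
            return -1;
        /* memcpy() turns 4 chars into an int without alignment problems. */
        memcpy(&key, buf + pos, sizeof(key));
        memcpy(&len, buf + pos + sizeof(key), sizeof(len));
        pos += 2 * sizeof(int);

        /* Need len data bytes plus the separating space. */
        if (len < 0 || buf_len - pos < (size_t)len + 1)
            return -1;
        if (buf[pos + len] != ' ')
            return -1;

        out[count].key = key;
        out[count].data_len = len;
        out[count].data = buf + pos;
        count++;
        pos += (size_t)len + 1;
    }
    return count;
}
```

This also answers the character-versus-integer worry: the bytes are read as characters, but the key is copied into a real `int` before any comparison, so the sort works on integers.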




Answer 2:


Test data generation

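We don't have any sample data, so we have to create some. Let's use plain text lines as the 'data' section, generate random keys in the range 0..999, report the length of each line (including its newline) as the length of the data, and append the blank pad at the end of each record. Note that the generated file omits the record-count header described in the question; the sorting program simply reads records until end of file.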

For example, this code does the generation job, reading from standard input and writing to standard output:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    srand(time(0));

    int fd = STDOUT_FILENO;
    char *buffer = 0;
    size_t buflen = 0;
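    ssize_t nread;                  /* getline() returns ssize_t, not int */
    while ((nread = getline(&buffer, &buflen, stdin)) != -1)
    {
        int key = rand() % 1000;    /* random key in the range 0..999 */
        int len = (int)nread;       /* data length, including the newline */
        write(fd, &key, sizeof(key));
        write(fd, &len, sizeof(len));
        write(fd, buffer, len);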
        write(fd, " ", 1);
    }
    free(buffer);
    return 0;
}

Given the input file (search on 'great panjandrum' to find out where the text comes from — it is not meant to be very sensible):

So she went into the garden
to cut a cabbage-leaf
to make an apple-pie
and at the same time
a great she-bear coming down the street
pops its head into the shop
What no soap
So he died
and she very imprudently married the Barber
and there were present
the Picninnies
and the Joblillies
and the Garyulies
and the great Panjandrum himself
with the little round button at top
and they all fell to playing the game of catch-as-catch-can
till the gunpowder ran out at the heels of their boots

the output could be:

0x0000: C3 03 00 00 1C 00 00 00 53 6F 20 73 68 65 20 77   ........So she w
0x0010: 65 6E 74 20 69 6E 74 6F 20 74 68 65 20 67 61 72   ent into the gar
0x0020: 64 65 6E 0A 20 C7 01 00 00 16 00 00 00 74 6F 20   den. ........to 
0x0030: 63 75 74 20 61 20 63 61 62 62 61 67 65 2D 6C 65   cut a cabbage-le
0x0040: 61 66 0A 20 6C 03 00 00 15 00 00 00 74 6F 20 6D   af. l.......to m
0x0050: 61 6B 65 20 61 6E 20 61 70 70 6C 65 2D 70 69 65   ake an apple-pie
0x0060: 0A 20 6F 02 00 00 15 00 00 00 61 6E 64 20 61 74   . o.......and at
0x0070: 20 74 68 65 20 73 61 6D 65 20 74 69 6D 65 0A 20    the same time. 
0x0080: 80 02 00 00 28 00 00 00 61 20 67 72 65 61 74 20   ....(...a great 
0x0090: 73 68 65 2D 62 65 61 72 20 63 6F 6D 69 6E 67 20   she-bear coming 
0x00A0: 64 6F 77 6E 20 74 68 65 20 73 74 72 65 65 74 0A   down the street.
0x00B0: 20 F5 02 00 00 1C 00 00 00 70 6F 70 73 20 69 74    ........pops it
0x00C0: 73 20 68 65 61 64 20 69 6E 74 6F 20 74 68 65 20   s head into the 
0x00D0: 73 68 6F 70 0A 20 10 01 00 00 0D 00 00 00 57 68   shop. ........Wh
0x00E0: 61 74 20 6E 6F 20 73 6F 61 70 0A 20 4F 02 00 00   at no soap. O...
0x00F0: 0B 00 00 00 53 6F 20 68 65 20 64 69 65 64 0A 20   ....So he died. 
0x0100: 73 01 00 00 2C 00 00 00 61 6E 64 20 73 68 65 20   s...,...and she 
0x0110: 76 65 72 79 20 69 6D 70 72 75 64 65 6E 74 6C 79   very imprudently
0x0120: 20 6D 61 72 72 69 65 64 20 74 68 65 20 42 61 72    married the Bar
0x0130: 62 65 72 0A 20 60 01 00 00 17 00 00 00 61 6E 64   ber. `.......and
0x0140: 20 74 68 65 72 65 20 77 65 72 65 20 70 72 65 73    there were pres
0x0150: 65 6E 74 0A 20 0D 00 00 00 0F 00 00 00 74 68 65   ent. ........the
0x0160: 20 50 69 63 6E 69 6E 6E 69 65 73 0A 20 46 02 00    Picninnies. F..
0x0170: 00 13 00 00 00 61 6E 64 20 74 68 65 20 4A 6F 62   .....and the Job
0x0180: 6C 69 6C 6C 69 65 73 0A 20 88 02 00 00 12 00 00   lillies. .......
0x0190: 00 61 6E 64 20 74 68 65 20 47 61 72 79 75 6C 69   .and the Garyuli
0x01A0: 65 73 0A 20 92 00 00 00 21 00 00 00 61 6E 64 20   es. ....!...and 
0x01B0: 74 68 65 20 67 72 65 61 74 20 50 61 6E 6A 61 6E   the great Panjan
0x01C0: 64 72 75 6D 20 68 69 6D 73 65 6C 66 0A 20 A8 01   drum himself. ..
0x01D0: 00 00 24 00 00 00 77 69 74 68 20 74 68 65 20 6C   ..$...with the l
0x01E0: 69 74 74 6C 65 20 72 6F 75 6E 64 20 62 75 74 74   ittle round butt
0x01F0: 6F 6E 20 61 74 20 74 6F 70 0A 20 15 01 00 00 3C   on at top. ....<
0x0200: 00 00 00 61 6E 64 20 74 68 65 79 20 61 6C 6C 20   ...and they all 
0x0210: 66 65 6C 6C 20 74 6F 20 70 6C 61 79 69 6E 67 20   fell to playing 
0x0220: 74 68 65 20 67 61 6D 65 20 6F 66 20 63 61 74 63   the game of catc
0x0230: 68 2D 61 73 2D 63 61 74 63 68 2D 63 61 6E 0A 20   h-as-catch-can. 
0x0240: B1 03 00 00 37 00 00 00 74 69 6C 6C 20 74 68 65   ....7...till the
0x0250: 20 67 75 6E 70 6F 77 64 65 72 20 72 61 6E 20 6F    gunpowder ran o
0x0260: 75 74 20 61 74 20 74 68 65 20 68 65 65 6C 73 20   ut at the heels 
0x0270: 6F 66 20 74 68 65 69 72 20 62 6F 6F 74 73 0A 20   of their boots. 
0x0280:

Sorting the generated data

Now we have data that a program can read, print, sort, and print again.

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct record
{
    int   key;
    int   data_len;
    char  data[];
};

static int comparator(const void *v1, const void *v2);
static void print_records(const char *tag, int num_recs, struct record **recs);
static void err_syserr(const char *msg, ...);
static void err_setarg0(const char *argv0);

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }

    err_setarg0(argv[0]);

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        err_syserr("Failed to open file '%s' for reading\n", argv[1]);

    struct record **records = 0;
    int num_recs = 0;
    int max_recs = 0;
    int key;
    int len;
    while (read(fd, &key, sizeof(key)) == sizeof(key) &&
           read(fd, &len, sizeof(len)) == sizeof(len))
    {
        //printf("rec num %d (key %d, len %d)\n", num_recs, key, len);
        assert(len > 0);
        assert(num_recs <= max_recs);
        if (num_recs == max_recs)
        {
            size_t new_max = 2 * max_recs + 2;
            void *new_recs = realloc(records, new_max * sizeof(*records));
            if (new_recs == 0)
                err_syserr("Failed to realloc() %zu bytes of memory\n", new_max * sizeof(*records));
            records = new_recs;
            max_recs = new_max;
        }
        size_t rec_size = sizeof(struct record) + len;
        records[num_recs] = malloc(rec_size);
        if (records[num_recs] == 0)
            err_syserr("Failed to malloc() %zu bytes of memory\n", rec_size);
        records[num_recs]->key = key;
        records[num_recs]->data_len = len;
        if (read(fd, records[num_recs]->data, len) != len)
            err_syserr("Short read for record number %d (key %d)\n", num_recs, key);
        records[num_recs]->data[len-1] = '\0';
        //printf("Data: [%s]\n", records[num_recs]->data);
        char blank = 0;
        if (read(fd, &blank, sizeof(blank)) != sizeof(blank))
            err_syserr("Missing record terminator after record number %d (key %d)\n", num_recs, key);
        if (blank != ' ')
            err_syserr("Unexpected EOR code %d for record number %d (key %d)\n", blank, num_recs, key);
        num_recs++;
    }
    close(fd);

    print_records("Before", num_recs, records);
    qsort(records, num_recs, sizeof(struct record *), comparator);
    print_records("After", num_recs, records);

    for (int i = 0; i < num_recs; i++)
        free(records[i]);
    free(records);

    return 0;
}

static int comparator(const void *v1, const void *v2)
{
    int key_1 = (*(struct record **)v1)->key;
    int key_2 = (*(struct record **)v2)->key;
    if (key_1 < key_2)
        return -1;
    else if (key_1 > key_2)
        return +1;
    else
        return 0;
}

static void print_records(const char *tag, int num_recs, struct record **recs)
{
    printf("%s (%d records):\n", tag, num_recs);
    for (int i = 0; i < num_recs; i++)
    {
        struct record *rec = recs[i];
        printf("%6d: %4d: %s\n", rec->key, rec->data_len, rec->data);
    }
}

/* My standard error handling - stderr.h and stderr.c */
static const char *arg0 = "unknown";

static void err_setarg0(const char *argv0)
{
    arg0 = argv0;
}

static void err_syserr(const char *fmt, ...)
{
    va_list args;
    int errnum = errno;
    fprintf(stderr, "%s: ", arg0);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, "(%d: %s)\n", errnum, strerror(errnum));
    exit(EXIT_FAILURE);
}

The code exploits the knowledge that the data for each record ends with a newline, and it overwrites that newline with a null byte. That makes the presentation better, too. Also, note that you cannot create an array of structures with flexible array members (because the elements of an array are all the same size, and structures with flexible array members are not all the same size). Hence the code uses an array of pointers to structures with flexible array members. That affects the comparator function, amongst others.
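To isolate the flexible array member point, here is a small sketch of allocating one such record and comparing records through an array of pointers. The struct matches the one in the program; `make_record` and `compare_record_ptrs` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct record
{
    int   key;
    int   data_len;
    char  data[];    /* flexible array member: must be the last member */
};

/*
 * Allocate one record holding a copy of len bytes of data.
 * sizeof(struct record) does not count the flexible array member,
 * so the data bytes are added explicitly. Returns NULL on failure.
 */
struct record *make_record(int key, const char *data, int len)
{
    assert(data != NULL && len >= 0);
    struct record *rec = malloc(sizeof(*rec) + (size_t)len);
    if (rec == NULL)
        return NULL;
    rec->key = key;
    rec->data_len = len;
    memcpy(rec->data, data, (size_t)len);
    return rec;
}

/*
 * qsort() passes pointers to array elements, and each element is itself
 * a pointer to a record, hence the double indirection.
 */
int compare_record_ptrs(const void *v1, const void *v2)
{
    const struct record *r1 = *(const struct record * const *)v1;
    const struct record *r2 = *(const struct record * const *)v2;
    return (r1->key > r2->key) - (r1->key < r2->key);
}
```

Because only the pointers are swapped during the sort, records of different sizes never need to be moved in memory.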

In part because the data format is moderately complex, the code is careful to identify erroneous (malformatted) data.
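One related caveat: read() is allowed to return fewer bytes than requested, for example when the input is a pipe, so a short read does not always mean bad data. A minimal sketch of a helper that keeps reading until the request is satisfied follows; `read_full` is a hypothetical name and is not part of the program above.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd into buf unless EOF or an error intervenes.
 * Returns the number of bytes read (less than len only at end of file),
 * or -1 on error with errno set by read().
 */
ssize_t read_full(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    char *p = buf;
    size_t total = 0;

    while (total < len)
    {
        ssize_t n = read(fd, p + total, len - total);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0)
        {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With such a helper, a return value smaller than the requested length means the file really ended early, which is the malformed-data case the program wants to report.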

Sample run

Note that the first column in the output is the record key — the goal is to sort the data into ascending order of the key numbers. The second column is the data length.

Before (17 records):
   963:   28: So she went into the garden
   455:   22: to cut a cabbage-leaf
   876:   21: to make an apple-pie
   623:   21: and at the same time
   640:   40: a great she-bear coming down the street
   757:   28: pops its head into the shop
   272:   13: What no soap
   591:   11: So he died
   371:   44: and she very imprudently married the Barber
   352:   23: and there were present
    13:   15: the Picninnies
   582:   19: and the Joblillies
   648:   18: and the Garyulies
   146:   33: and the great Panjandrum himself
   424:   36: with the little round button at top
   277:   60: and they all fell to playing the game of catch-as-catch-can
   945:   55: till the gunpowder ran out at the heels of their boots
After (17 records):
    13:   15: the Picninnies
   146:   33: and the great Panjandrum himself
   272:   13: What no soap
   277:   60: and they all fell to playing the game of catch-as-catch-can
   352:   23: and there were present
   371:   44: and she very imprudently married the Barber
   424:   36: with the little round button at top
   455:   22: to cut a cabbage-leaf
   582:   19: and the Joblillies
   591:   11: So he died
   623:   21: and at the same time
   640:   40: a great she-bear coming down the street
   648:   18: and the Garyulies
   757:   28: pops its head into the shop
   876:   21: to make an apple-pie
   945:   55: till the gunpowder ran out at the heels of their boots
   963:   28: So she went into the garden


Source: https://stackoverflow.com/questions/39682323/sorting-records-from-a-binary-file-in-c
