What is a good algorithm for compacting records in a blocked file?

长情又很酷 2021-01-20 05:31

Suppose you have a large file made up of a bunch of fixed size blocks. Each of these blocks contains some number of variable sized records. Each record must fit completely w

4 answers
  •  情书的邮戳
    2021-01-20 05:50

    If there is no ordering to these records, I'd simply fill the blocks from the front with records extracted from the last block(s). This will minimize movement of data, is fairly simple, and should do a decent job of packing data tightly.

    E.g.:

    // records should be sorted by size in memory (probably in a balanced BST)
    records = read last N blocks on disk;
    
    foreach (block in blocks) // read from disk into memory
    {
        if (block.hasBeenReadFrom())
        {
            // we read from this into records already
            // all remaining records are already in memory
    
            writeAllToNewBlocks(records);
    
            // this will leave some empty blocks on the disk that can either
            // be eliminated programmatically or left alone and filled during
            // normal operation
    
            foreach (record in records)
            {
                record.eraseFromOriginalLocation();
            }
    
            break;
        }
    
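        // list of records we've moved into this block; declared before the
        // loop so it accumulates across iterations and is visible afterwards
        moveRecords = new Array;

        while (!block.full())
        {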
            size = block.availableSpace();
            record = records.extractBestFit(size);
            if (record == null)
            {
                break;
            }
    
            moveRecords.add(record);
            block.add(record);
    
            if (records.gettingLow())
            {
                records.readMoreFromDisk();
            }
        }
    
        if(moveRecords.size() > 0)
        {
            block.writeBackToDisk();
            foreach (record in moveRecords)
            {
                record.eraseFromOriginalLocation();
            }
        }
    }
    

    Update: I neglected to maintain the no-blocks-only-in-memory rule. I've updated the pseudocode to fix this. Also fixed a glitch in my loop condition.
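
For readers who want to experiment with the idea, here is a minimal in-memory sketch in Python of the same front-filling strategy. It is an illustration under stated assumptions, not the pseudocode above made literal: blocks are modeled as lists of integer record sizes, there is no disk I/O, and the names `compact` and `block_size` are made up for this example. The final step, which places leftover records first-fit (largest first) before opening new blocks, is my own simplification of `writeAllToNewBlocks`.

```python
from bisect import bisect_right


def compact(blocks, block_size):
    """Return a compacted copy of `blocks` (lists of positive record sizes).

    Front blocks are topped up with records lifted out of the tail blocks,
    always choosing the largest record that still fits (best fit).
    """
    blocks = [list(b) for b in blocks]   # work on a copy, leave input alone
    pool = []                            # sorted sizes lifted from tail blocks
    front, tail = 0, len(blocks)         # blocks[tail:] are already drained

    while front < tail:
        free = block_size - sum(blocks[front])
        while free > 0:
            if not pool:
                if tail - 1 <= front:
                    break                # no tail block left to drain
                tail -= 1
                pool = sorted(blocks[tail])
                continue
            i = bisect_right(pool, free)  # pool[:i] are the records that fit
            if i == 0:
                break                     # nothing in the pool fits here
            rec = pool.pop(i - 1)         # best fit: largest record that fits
            blocks[front].append(rec)
            free -= rec
        front += 1

    # Records still in the pool lost their home block. Place them first-fit,
    # largest first, opening a new block only when nothing has room.
    packed = blocks[:tail]
    for rec in reversed(pool):
        for b in packed:
            if sum(b) + rec <= block_size:
                b.append(rec)
                break
        else:
            packed.append([rec])
    return packed


if __name__ == "__main__":
    print(compact([[6], [3, 3], [4], [2, 2], [5]], 10))
    # prints [[6, 2, 2], [3, 3], [4, 5]]
```

In this sketch the pool is refilled only when it is empty, which keeps the number of drained tail blocks small at the cost of occasionally leaving a gap that a later record could have filled. A real implementation would also need to write each modified block back before erasing the moved records from their original location, as the pseudocode does.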
