Random numbers external sort

眉间皱痕 提交于 2021-01-01 06:47:45

问题


I need to write a program that generates N random numbers and write them to binary file in descending order. It should be done without using any of sorting algorithms that use main memory. This is what I've done so far:

#include <iostream>
#include <fstream> 
#include <ctime>
#include <cstdlib>

using namespace std;
int main () {
  srand(time(0));
  rand();
  int N;
  do{
    cout << "Unesite N: ";
    cin >> N;
    } while(N<=0);

  ofstream br("broj.dat", ios::binary | ios::trunc);

  for(int i = 0; i<N; i++){
    int a = rand();
    br.write((char *)&a, sizeof(a));
  }
  br.close();

  return 0;
}

So, I've generated random numbers and wrote them to binary file but I don't know how to sort it.


回答1:


You can generate your numbers in sorted order in linear time. The paper describing how to do this is: Generating Sorted Lists of Random Numbers by Bentley & Saxe

https://pdfs.semanticscholar.org/2dbc/4e3f10b88832fcd5fb88d34b8fb0b0102000.pdf

/**
 * Generate an sorted list of random numbers sorted from 1 to 0, given the size
 * of the list being requested.
 * 
 * This is an implementation of an algorithm developed by Bentley and Sax, and
 * published in in ACM Transactions on Mathematical Software (v6, iss3, 1980) on
 * 'Generating Sorted Lists of Random Numbers'.
 */
public class SortedRandomDoubleGenerator {
    private long       valsFound;
    private double     curMax;
    private final long numVals;

    /**
     * Instantiate a generator of sorted random doubles.
     * 
     * @param numVals the size of the list of sorted random doubles to be
     *        generated
     */
    public SortedRandomDoubleGenerator(long numVals) {
        curMax = 1.0;
        valsFound = 0;
        this.numVals = numVals;
    }

    /**
     * @return the next random number, in descending order.
     */
    public double getNext() {
        curMax = curMax
                * Math.pow(Math.E, Math.log(RandomNumbers.nextDouble())
                        / (numVals - valsFound));
        valsFound++;
        return curMax;
    }
}



回答2:


Here is pseudo-code for how I'd do it.

for i in 1..N:
    write rand() to new file
    push onto file stack (new file, size=1)
    while 2 < len(file stack) and size of top two files the same:
        pop top two and merge them
        push onto file stack (merged file, size=new size)

while 2 < len(file stack):
    pop top two and merge them
    push onto file stack (merged file, size=new size)

The top of the file stack is your new sorted file.



回答3:


The standard library has a merge sort, but you need to use random access iterators. If you can use mmap (or its equivalent), you have random access iterators (yes, I know that you need to take COUNT from the command line):

#include <algorithm>
#include <cstdio>
#include <random>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

const size_t COUNT = 4096 * 4096;

int main()
{
    // create file (using mmap for simplicity)
    int fd = open("out.dat", O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        std::perror("open failed");
        return 1;
    }
    if (ftruncate(fd, COUNT * sizeof(unsigned)) != 0) {
        std::perror("ftruncate failed");
        close(fd);
        return 1;
    }
    void* mm = mmap(nullptr, COUNT * sizeof(unsigned), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mm == MAP_FAILED) {
        std::perror("mmap failed");
        close(fd);
        return 1;
    }
    close(fd);

    // populate file
    unsigned* begin = static_cast<unsigned*>(mm);
    std::default_random_engine rng((std::random_device())());
    std::generate_n(begin, COUNT, rng);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file written");

    // sort file
    std::stable_sort(begin, begin + COUNT);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file sorted");

    if (std::is_sorted(begin, begin + COUNT)) {
        std::puts("it's properly sorted");
    }

    // close file
    munmap(mm, COUNT * sizeof(unsigned));
    return 0;
}

The msync calls aren't actually needed. I'm honestly surprised that this has decent performance.



来源:https://stackoverflow.com/questions/46058304/random-numbers-external-sort

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!