Question
I have n strings of different lengths s1, s2, …, sn that I want to display on a terminal in c columns. The terminal is m characters wide. Each column i has a width wi equal to the width of the longest entry in that column, and between each pair of adjacent columns there are s characters of space. The total width of all columns, including the space between them, must not exceed the terminal width (w1 + w2 + … + wc + (c - 1) · s ≤ m). Each column shall contain ⌈n / c⌉ strings, except when n is not evenly divisible by c: then, depending on whether the strings are arranged across or down, either the last few columns or only the last column shall be one entry shorter.
Is there an efficient (e.g. O(n·w), where w = max(w1, w2, …, wn) is the largest string width) algorithm to figure out the maximal number of columns c, if...

the strings are arranged across

    string1  string2  string3  string4
    string5  string6  string7  string8
    string9  string10

or the strings are arranged down

    string1  string4  string7  string10
    string2  string5  string8
    string3  string6  string9

?
Later findings
I found out that s doesn't matter. Each instance of the problem where s > 0 can be translated into an instance where s = 0 by expanding each string by s characters and also expanding the width of the terminal by s characters to compensate for the extra s characters at the end of the screen.
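This reduction can be sketched in a few lines of C++ (a minimal sketch; the function name and the plain vector of widths are my own, not from the question):

```cpp
#include <cstdint>
#include <vector>

// Turn an instance with gap s > 0 into an equivalent one with gap 0:
// widen every string by s, and widen the terminal by s to compensate
// for the extra s characters after the last column.
std::vector<uint32_t> removeGaps(std::vector<uint32_t> widths,
                                 uint32_t s, uint32_t& m)
{
    for (auto& w : widths)
        w += s;
    m += s;
    return widths;
}
```

With widths {3, 5}, s = 2 and m = 10, two columns fit originally (3 + 5 + 2 = 10) and still fit after the reduction (5 + 7 = 12 = m).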
Answer 1:
Unfortunately I think the fastest algorithm you can have is O(n²). This is because you can determine whether a configuration is possible for a given c in a single pass over the list, but you cannot tell by how much to change c, so you basically have to try the next value. At most the algorithm does this n times.
This is pseudo-code for how I would do it
for c = n.size; c > 0; c--
    remainingDist = m - (c - 1) * s   // (c - 1) gaps between c columns
    for i = 1; i <= c; i++
        columnWidth = 0
        for j = i; j <= n.size; j += c
            columnWidth = max(n[j], columnWidth)
        end
        remainingDist -= columnWidth
    end
    if remainingDist >= 0
        success
        columns = c
        break
    end
end
You could jump to midway through the loop by first computing the average item width and deriving an "ideal" number of columns from it.
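Translated into real code, the algorithm above might look like this (a sketch; the function name is mine, and string widths are taken as a plain vector):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest c such that the strings, arranged across in c columns, fit a
// terminal of width m with gap s between columns. Worst case O(n^2):
// one O(n) feasibility pass per candidate c.
uint32_t maxColumnsAcross(const std::vector<uint32_t>& widths,
                          uint32_t m, uint32_t s)
{
    const std::size_t n = widths.size();
    for (std::size_t c = n; c > 0; --c)
    {
        int64_t remainingDist = int64_t(m) - int64_t(c - 1) * s;
        for (std::size_t i = 0; i < c && remainingDist >= 0; ++i)
        {
            uint32_t columnWidth = 0;
            for (std::size_t j = i; j < n; j += c)   // entries of column i
                columnWidth = std::max(columnWidth, widths[j]);
            remainingDist -= columnWidth;
        }
        if (remainingDist >= 0)
            return uint32_t(c);       // success
    }
    return 0;
}
```

For four strings of width 7, m = 15 and s = 1, the first feasible candidate is c = 2 (two columns of width 7 plus one gap).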
Answer 2:
I'm not sure exactly how to formulate this in code, but with a histogram of the strings sorted by size we may be able to set a theoretical upper bound on the number of columns, to be further refined by an exact method like wckd's. Since a column's width is dictated by its longest element, and we are obliged to divide the columns as evenly as possible, we can keep splitting off columns of shorter strings as long as the largest strings so far make up a small enough portion of the total. For example,
size  frequency
  10  5
   6  10
   3  11
m = 30, s = 1
start: 30 - (10 + 1) = 19
implies 13 + 13:

    10 (x5)   6 (x2)
     6 (x8)   3 (x11)

but the portion of 6 and 3 is large enough to split the second column:

19 - (6 + 1) = 12
implies 9 + 9 + 8:

    10 (x5)   6 (x6)   3 (x8)
     6 (x4)   3 (x3)

but splitting again will still fit:

12 - (6 + 1) = 5
implies 7 + 7 + 6 + 6:

    10 (x5)   6 (x7)   6 (x1)   3 (x6)
     6 (x2)             3 (x5)
We end up with a theoretical upper bound of 4 columns (clearly, sorting the strings is not allowed in the actual result), which may then be reduced by an exact method such as wckd's.
Depending on the data, I wonder if such an optimization could sometimes be useful. Constructing the histogram ought to take O(n + k log k) time and O(k) space, where k is the number of distinct string sizes (which you have already limited with w < 1000, m < 10000). The operation I am suggesting is actually independent of n; it depends only on m, s and the distribution of the k sizes, and since the sizes are sorted, we only need one pass to split and size the columns.
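One possible way to code this bound (my own interpretation, not the author's: pretend the strings may be reordered by descending width, so that each column holds a contiguous block of the sorted sizes; any i columns must hold the i·rows largest strings, so no real arrangement can have smaller column widths, and the resulting c is an upper bound):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Upper bound on the column count under the "sorted strings" relaxation.
// Column i of the relaxed arrangement starts at sorted index i * rows,
// so its widest entry is simply widths[i * rows].
uint32_t columnUpperBound(std::vector<uint32_t> widths, uint32_t m, uint32_t s)
{
    std::sort(widths.begin(), widths.end(), std::greater<uint32_t>());
    const std::size_t n = widths.size();
    for (std::size_t c = n; c > 0; --c)
    {
        const std::size_t rows = (n + c - 1) / c;          // ceil(n / c)
        int64_t total = int64_t(c - 1) * s;                // gaps
        for (std::size_t i = 0; i < c; ++i)
            total += widths[std::min(i * rows, n - 1)];    // widest entry of column i
        if (total <= int64_t(m))
            return uint32_t(c);
    }
    return 0;
}
```

On the example above (five 10s, ten 6s, eleven 3s, m = 30, s = 1) this yields 4 columns, matching the hand calculation.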
Answer 3:
I looked at the first problem (horizontal fill), and assumed a gap-size (s) of zero, like you suggested in your edit.
First: I know the bounty is over, and I have no proof of an algorithm that does better than O(n²).
However, I do have some ideas to present that might still be of interest.
My proposed algorithm goes as follows:
Get some upper bound of c in O(n) time (I will get to that later)
If c is 0 or 1, or all strings fit on one row then that c is the answer. Stop.
Create an index ss[] on s[] in descending width order, using pigeonhole sort, in O(w+n) (with w = max(s[]), w ≤ m). An element of ss[] has two attributes: width and seqNo (the original sequence number, as it occurs in s[]).
Then, loop through the widths in decreasing order until we have a width for each column in a c-column configuration.
If the sum of these widths is still not greater than m then c is a solution. More formally:
knownColumnWidths = new Set() of column numbers
sumWidth = 0
for i from 0 to n-1:
    colNo = ss[i].seqNo modulo c
    if not colNo in knownColumnWidths:
        sumWidth += ss[i].width
        if sumWidth > m:
            // c is not a solution, exit for-loop
            break
        add colNo to knownColumnWidths
        if knownColumnWidths has c values:
            // solution found
            return c as solution. Stop
If c was rejected as solution, repeat previous code with c = c - 1.
This last part of the algorithm seems O(n²). But, if the for-loop has a worst case performance (i.e. n - c + 1 iterations), then the next few times (c/2) it runs, it will have close to best performance (i.e. close to c iterations). But in the end it still looks like O(n²).
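Filled in as real code, one feasibility test of this scheme could look like this (a sketch; the Entry struct, the function name and the bool-vector bookkeeping are mine, and ss is assumed to be already sorted by descending width):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the ss[] index: a string width plus the position the
// string has in the original list.
struct Entry { uint32_t width; std::size_t seqNo; };

// Tests whether c columns are feasible for the across arrangement.
// Since ss is sorted by descending width, the first time a column
// number is seen, the current width is that column's width. Succeeds
// as soon as all c widths are known and their sum is still <= m.
bool fitsAcross(const std::vector<Entry>& ss, std::size_t c, uint32_t m)
{
    std::vector<bool> known(c, false);
    std::size_t knownCount = 0;
    uint32_t sumWidth = 0;
    for (const auto& e : ss)
    {
        const std::size_t colNo = e.seqNo % c;
        if (!known[colNo])
        {
            sumWidth += e.width;
            if (sumWidth > m)
                return false;          // c is not a solution
            known[colNo] = true;
            if (++knownCount == c)
                return true;           // solution found
        }
    }
    return false;
}
```

For widths {3, 1, 4, 1, 5} the descending index is {5, 4}, {4, 2}, {3, 0}, {1, 1}, {1, 3}; with c = 2 the columns are max(3, 4, 5) = 5 and max(1, 1) = 1, so the test succeeds for m = 6 and fails for m = 5.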
For getting a good upper bound of c (cf. first step above), I propose this:
First fill as many strings on the first row of the terminal without exceeding the limit m, and take that as initial upper limit for c. More formally put:
sumWidth = 0
c = 0
while c < n and sumWidth + s[c] <= m:
    sumWidth += s[c]
    c++
This is clearly O(n).
This can be further improved as follows:
Take the sum of c widths, but starting one string further, and check if this still is not greater than m. Keep doing this shifting. When m is surpassed, decrease c until the sum of c widths is OK again, and continue the shift with the sum of c consecutive widths.
More formally put, with c starting off with the above found upper limit:
for i from c to n - 1:
    if s[i] > m:
        c = 0. Stop // should not happen: all widths should be <= m
    sumWidth += s[i] - s[i - c]
    while sumWidth > m:
        // c is not a solution. Improve upper limit:
        c--
        sumWidth -= s[i - c]
This means that in one sweep you might have several improvements for c. Of course, in the worst case, it leads to no improvement at all.
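Both upper-bound steps combined might look like this in real code (a sketch; the function name is mine). The bound is valid because any window of c consecutive strings hits each of the c columns exactly once, so its total width cannot exceed the sum of the column widths; a window wider than m therefore rules out c:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// O(n) upper bound for c: take the longest prefix that fits on one row,
// then slide a window of c consecutive widths over the array, shrinking
// the window whenever its sum exceeds m.
std::size_t upperBoundColumns(const std::vector<uint32_t>& s, uint32_t m)
{
    std::size_t c = 0;
    uint64_t sumWidth = 0;
    while (c < s.size() && sumWidth + s[c] <= m)   // initial prefix
        sumWidth += s[c++];
    for (std::size_t i = c; i < s.size(); ++i)     // slide the window
    {
        if (s[i] > m)
            return 0;   // should not happen: all widths should be <= m
        sumWidth += s[i];
        sumWidth -= s[i - c];                      // window is now s[i-c+1 .. i]
        while (sumWidth > m)
        {
            --c;                   // c is not a solution: shrink the window
            sumWidth -= s[i - c];
        }
    }
    return c;
}
```

Note that this is only an upper bound: for widths {5, 1, 1, 5} and m = 6 it returns 2, while the exact answer is 1, because the two columns max(5, 1) and max(1, 5) together need width 10.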
This concludes my algorithm. I estimate it will perform well on random input, but still looks like O(n²).
But I have a few observations, which I have not used in the above algorithm:
- When you have found the column widths for a certain c, but the total width is greater than m, then this result can still be put to good use for the case c' = c/2. It is then not necessary to go through all string widths: it suffices to take the sum of max(s[i], s[i + c']) for i in 0..c'-1. The same principle holds for other divisors of c.
I did not use this, because if you have to go down from c all the way to c/2 without finding a solution, you already have spent O(n²) on the algorithm. But maybe it can serve a purpose in another algorithm...
- When zero-length strings are not allowed, then an algorithm can be made O(m·n), as the number of possible solutions (values for c) is limited to m, and determining whether one of these is a solution requires only one sweep through all the widths.
I have allowed for zero-length strings.
- Initially I was looking into a binary search for c, dividing it by 2 and going for one of the remaining halves. But this method cannot be used, because even when a certain c is found not to be a solution, this does not exclude there being a c' > c that is a solution.
Hopefully there is something in this answer you find useful.
Answer 4:
An obvious way to solve this problem is to iterate through all string lengths in some predetermined order, update width of the column where each string belongs, and stop when sum of column widths exceeds terminal width. Then repeat this process for decremented number of columns (or for incremented number of rows for "arrange down" case) until success. Three possible choices for this predetermined order:
- by row (consider all strings in the first row, then in row 2, 3, ...)
- by column (consider all strings in the first column, then in column 2, 3, ...)
- by value (sort strings by length, then iterate them in sorted order, starting from the longest one).
These three approaches are OK for the "arrange across" sub-problem. But for the "arrange down" sub-problem they all have worst-case time complexity O(n²), and the first two methods show quadratic complexity even for random data. The "by value" approach is pretty good for random data, but it is easy to construct a worst case: just assign short strings to the first half of the list and long strings to the second half.
Here I describe an algorithm for "arrange down" case that does not have these disadvantages.
Instead of inspecting each string length separately, it determines width of each column in O(1) time with the help of range maximum query (RMQ). Width of the first column is just the maximum value in range (0 .. num_of_rows), width of next one is the maximum value in range (num_of_rows .. 2*num_of_rows), etc.
And to answer each query in O(1) time, we need to prepare an array of maximum values in ranges (0 .. 2^k), (1 .. 1 + 2^k), ..., where k is the largest integer such that 2^k is not greater than the current number of rows. Each range maximum query is computed as the maximum of two entries from this array. When the number of rows starts to be too large, we should update this query array from k to k+1 (each such update needs O(n) range queries).
Here is a C++14 implementation:

template<class PP>
uint32_t arrangeDownRMQ(Vec& input)
{
    auto rows = getMinRows(input);
    auto ranges = PP{}(input, rows);

    while (rows <= input.size())
    {
        if (rows >= ranges * 2)
        { // lazily update RMQ data
            transform(begin(input) + ranges, end(input), begin(input),
                      begin(input), maximum);
            ranges *= 2;
        }
        else
        { // use RMQ to get widths of columns and check if all columns fit
            uint32_t sum = 0;

            for (Index i = 0; sum <= kPageW && i < input.size(); i += rows)
            {
                ++g_complexity;
                auto j = i + rows - ranges;

                if (j < input.size())
                    sum += max(input[i], input[j]);
                else
                    sum += input[i];
            }

            if (sum <= kPageW)
            {
                return uint32_t(ceilDiv(input.size(), rows));
            }

            ++rows;
        }
    }

    return 0;
}
Here PP is optional; for the simple case this function object does nothing and returns 1.
To determine the worst-case time complexity of this algorithm, note that the outer loop starts with rows = n * v / m (where v is the average string length and m is the page width) and stops with at most rows = n * w / m (where w is the largest string length). The number of iterations in the "query" loop is not greater than the number of columns, i.e. n / rows. Summing these iterations over all values of rows gives O(n * (ln(n*w/m) - ln(n*v/m))) = O(n * log(w/v)), which means linear time with a small constant factor. Adding the time to perform all update operations, which is O(n log n), gives the complexity of the whole algorithm: O(n * log n).
If we perform no update operations before the first query operations are done, the time needed for update operations, and with it the complexity of the whole algorithm, decreases to O(n * log(w/v)). To make this possible we need some algorithm that fills the RMQ array with maximums of sub-arrays of a given length. I tried two possible approaches, and it seems the algorithm with a pair of stacks is slightly faster. Here is a C++14 implementation (pieces of the input array are used to implement both stacks, to lower memory requirements and to simplify the code):
template<typename I, typename Op>
auto transform_partial_sum(I lbegin, I lend, I rbegin, I out, Op op)
{   // maximum of the element in first interval and prefix of second interval
    auto sum = typename I::value_type{};

    for (; lbegin != lend; ++lbegin, ++rbegin, ++out)
    {
        sum = op(sum, *rbegin);
        *lbegin = op(*lbegin, sum);
    }

    return sum;
}

template<typename I>
void reverse_max(I b, I e)
{   // for each element: maximum of the suffix starting from this element
    partial_sum(make_reverse_iterator(e),
                make_reverse_iterator(b),
                make_reverse_iterator(e),
                maximum);
}

struct PreprocRMQ
{
    Index operator () (Vec& input, Index rows)
    {
        if (rows < 4)
        {   // no preprocessing needed
            return 1;
        }

        Index ranges = 1;
        auto b = begin(input);

        while (rows >>= 1)
        {
            ranges <<= 1;
        }

        for (; b + 2 * ranges <= end(input); b += ranges)
        {
            reverse_max(b, b + ranges);
            transform_partial_sum(b, b + ranges, b + ranges, b, maximum);
        }

        // preprocess inconvenient tail part of the array
        reverse_max(b, b + ranges);
        const auto d = end(input) - b - ranges;
        const auto z = transform_partial_sum(b, b + d, b + ranges, b, maximum);
        transform(b + d, b + ranges, b + d, [&](Data x){ return max(x, z); });
        reverse_max(b + ranges, end(input));
        return ranges;
    }
};
In practice there is a much higher chance to see a short word than a long word: shorter words outnumber longer ones in English texts, and shorter text representations of natural numbers prevail as well. So I chose a (slightly modified) geometric distribution of string lengths to evaluate the various algorithms. The whole benchmarking program is in C++14, for gcc; an older version of the same program contains some obsolete tests and different implementations of some algorithms. And here are the results:
For "arrange across" case:
- "by column" approach is the slowest
- "by value" approach is good for n = 1000..500'000'000
- "by row" approach is better for small and very large sizes
For "arrange down" case:
- "by value" approach is OK, but slower than other alternatives (one possibility to make it faster is to implement divisions via multiplications), also it uses more memory and has quadratic worst-case time
- simple RMQ approach is better
- RMQ with preprocessing is the best: it has linear time complexity and its memory bandwidth requirements are pretty low.
It may also be interesting to see the number of iterations needed by each algorithm, excluding all predictable parts (sorting, summing, preprocessing, and RMQ updates).
Source: https://stackoverflow.com/questions/24411759/how-to-efficiently-find-the-ideal-column-count-for-strings-of-a-certain-width