Justify string algorithm [closed]

跟風遠走 提交于 2021-02-05 13:42:02

问题


Just tanked a job interview where I was asked to implement a function with this signature:

function justify($str_in, $desired_length)

It needs to mimic what HTML's text-align: justify would do, here's some examples (desired_length = 48)

    hello world there ok then                              = hello......world......there.......ok.......then
    hello                                                  = .....................hello.....................
    ok then                                                = ok.........................................then
    this string is almost certainly longer than 48 I think = this.string.is.almost.certainly.longer.than.48.
    two words                                              = two.......................................words
    three ok words                                         = three.................ok..................words
    1 2 3 4 5 6 7 8 9                                      = 1....2....3.....4.....5.....6.....7.....8.....9

(I replaced the spaces with periods to illustrate)

The length of spaces between words may never differ by more than one.

I have written a PHP solution, but I am more interested in what algorithms people can come up with to solve the problem. It was my first whiteboard question at a job interview ever, and I'm afraid a combination of factors made me take way longer than I should have.


回答1:


Here's what I came up with. I added the optional $char parameter so you can see what it's outputting - Of course you can pull it inside the function so the prototype matches the requirement.

function justify($str_in, $desired_length, $char = '_') {

    // Some common vars and simple error checking / sanitation
    $return = '';
    $str_in = trim( $str_in);
    $desired_length = intval( $desired_length);

    // If we've got invalid input, we're done
    if( $desired_length <= 0)
        return $str_in;

    // If the input string is greater than the length, we need to truncate it WITHOUT splitting words
    if( strlen( $str_in) > $desired_length) {
        $str = wordwrap($str_in, $desired_length);
        $str = explode("\n", $str);
        $str_in = $str[0];
    }

    $words = explode( ' ', $str_in);
    $num_words = count( $words);

    // If there's only one word, it's a simple edge case
    if( $num_words == 1) {
        $length = ($desired_length - strlen( $words[0])) / 2;
        $return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
    } else {
        $word_length = strlen( implode( '', $words));

        // Calculate the number of spaces to distribute over the words
        $num_words--; // We're going to eliminate the last word
        $spaces = floor( ($desired_length - $word_length) / $num_words);
        $remainder = $desired_length - $word_length - ($num_words * $spaces);

        $last = array_pop( $words);
        foreach( $words as $word) {
            // If we didn't get an even number of spaces to distribute, just tack it on to the front
            $spaces_to_add = $spaces;
            if( $remainder > 0) {
                $spaces_to_add++;
                $remainder--;
            }

            $return .= $word . str_repeat( $char, $spaces_to_add);
        }
        $return .= $last;
    }
    return $return;
}

And the test cases:

$inputs = array( 
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
);

foreach( $inputs as $x) {
    $ret = justify( $x, 48);
    echo 'Inp: ' . $x . " - strlen(" . strlen( $x) .  ")\n";
    echo 'Out: ' . $ret . " - strlen(" . strlen( $ret) .  ")\n\n";
}

And the output:

Inp: hello world there ok then - strlen(25)
Out: hello_______world_______there_______ok______then - strlen(48)

Inp: hello - strlen(5)
Out: _____________________hello______________________ - strlen(48)

Inp: ok then - strlen(7)
Out: ok__________________________________________then - strlen(48)

Inp: this string is almost certainly longer than 48 I think - strlen(54)
Out: this_string_is_almost_certainly_longer_than_48_I - strlen(48)

Inp: two words - strlen(9)
Out: two________________________________________words - strlen(48)

Inp: three ok words - strlen(14)
Out: three__________________ok__________________words - strlen(48)

Inp: 1 2 3 4 5 6 7 8 9 - strlen(17)
Out: 1_____2_____3_____4_____5_____6_____7_____8____9 - strlen(48)

And a demo!

Edit: Cleaned up the code, and it still works :).




回答2:


Made it a personal challenge to not use any loops/recursion or regex with callbacks. I used a single explode() and a single implode() to achieve this. Great success!

The Code

function justify($str, $maxlen) {
    $str = trim($str);

    $strlen = strlen($str);
    if ($strlen >= $maxlen) {
        $str = wordwrap($str, $maxlen);
        $str = explode("\n", $str);
        $str = $str[0];
        $strlen = strlen($str);
    }

    $space_count = substr_count($str, ' ');
    if ($space_count === 0) {
        return str_pad($str, $maxlen, ' ', STR_PAD_BOTH);
    }

    $extra_spaces_needed = $maxlen - $strlen;
    $total_spaces = $extra_spaces_needed + $space_count;

    $space_string_avg_length = $total_spaces / $space_count;
    $short_string_multiplier = floor($space_string_avg_length);
    $long_string_multiplier = ceil($space_string_avg_length);

    $short_fill_string = str_repeat(' ', $short_string_multiplier);
    $long_fill_string = str_repeat(' ', $long_string_multiplier);

    $limit = ($space_string_avg_length - $short_string_multiplier) * $space_count;

    $words_split_by_long = explode(' ', $str, $limit+1);
    $words_split_by_short = $words_split_by_long[$limit];
    $words_split_by_short = str_replace(' ', $short_fill_string, $words_split_by_short);
    $words_split_by_long[$limit] = $words_split_by_short;

    $result = implode($long_fill_string, $words_split_by_long);

    return $result;
}

Short (348 chars)

function j($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}

Algorithm / Code explanation

  1. Handle the two exceptions (string longer than max length or only one word).
  2. Find the average space needed between each word ($space_string_avg_length).
  3. Create a long and short fill string for use between the words, based on ceil() and floor() of the $space_string_avg_length, respectively.
  4. Find out how many long fill strings we need. ($limit+1).
  5. Split the text based on how many long fill strings we need.
  6. Replace spaces in the last part of the array, made by the split, with the short fill strings.
  7. Join the split text back together with the long fill strings.

Testing

$tests = array(
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
);

foreach ($tests as $test) {
    $len_before = strlen($test);
    $processed = str_replace(' ', '_', justify($test, 48));
    $len_after = strlen($processed);
    echo "IN($len_before): $test\n";
    echo "OUT($len_after): $processed\n";
}

Results

IN(25): hello world there ok then
OUT(48): hello_______world_______there_______ok______then
IN(5): hello
OUT(48): _____________________hello______________________
IN(7): ok then
OUT(48): ok__________________________________________then
IN(54): this string is almost certainly longer than 48 I think
OUT(48): this_string_is_almost_certainly_longer_than_48_I
IN(9): two words
OUT(48): two________________________________________words
IN(14): three ok words
OUT(48): three__________________ok__________________words
IN(17): 1 2 3 4 5 6 7 8 9
OUT(48): 1_____2_____3_____4_____5_____6_____7_____8____9

See it run!




回答3:


Here's my solution with no pesky loops

function justify( $str_in, $desired_length=48 ) {
    if ( strlen( $str_in ) > $desired_length ) {
        $str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
    }
    $string_length = strlen( $str_in );
    $spaces_count = substr_count( $str_in, ' ' );
    $needed_spaces_count = $desired_length - $string_length + $spaces_count;
    if ( $spaces_count === 0 ) {
        return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
    }
    $spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
    $spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
    return preg_replace_callback(
        sprintf( '~\s{%s}~', $spaces_per_space ),
        function ( $m ) use( $spaces_per_space ) {
            return str_repeat( ' ', $spaces_per_space-1 );
        },
        $spaced_string,
        strlen( $spaced_string ) - $desired_length
    );
}

Comments and output...

https://gist.github.com/2939068

  1. Find out how many spaces there are
  2. Find out how many spaces are needed
  3. Replace existing spaces with the amount of spaces (evenly distributed) needed to meet or just exceed desired line length
  4. Use preg_replace_callback to replace the amount of \s{spaces_inserted} with \s{spaces_inserted-1} necessary to meet the desired line length



回答4:


I wanted to see which algorithm was the most efficient, so I ran some benchmarks. I did 100k iterations of all 7 test cases. (Ran it in a single core Ubuntu VM)

The results of @ppsreejith and @Kristian Antonsen's code are omitted, because their code crashed when I tried to run it. @PhpMyCoder's code ran as long as I didn't do the formatting to 48 length after object construction. Therefore the test result is incomplete. (Fixed)

Benchmark results

$ php justify.bench.php
Galen(justify1): 5.1464750766754
nickb(justify2): 3.8629620075226
Paolo Bergantino(justify3): 4.3705048561096
user381521(justify5): 8.5988481044769
vlzvl(justify7): 6.6795041561127
Alexander(justify8): 6.7060301303864
ohaal(justify9): 2.9896130561829

PhpMyCoder: 6.1514630317688 (Fixed!)

justify.bench.php

<?php
$tests = array(
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
);
$testers = array(
    'Galen' => 'justify1',
    'nickb' => 'justify2',
    'Paolo Bergantino' => 'justify3',
//    'Kristian Antonsen' => 'justify4',
    'user381521' => 'justify5',
//    'ppsreejith' => 'justify6',
    'vlzvl' => 'justify7',
    'Alexander' => 'justify8',
    'ohaal' => 'justify9'
);
// ppsreejith and Kristian Antonsen's code crashed and burned when I tried to run it
// PhpMyCoder is a special case, but his code also crashed when doing $jus->format(48);

foreach ($testers as $tester => $func) {
    $b=microtime(true);
    for($i=0;$i<100000;$i++)
        foreach ($tests as $test)
            $func($test,48);
    $a=microtime(true);
    echo $tester.'('.$func.'): '.($a-$b)."\n";
}

echo "\n";

// Fixed!
$jus = new Justifier($tests);
$b=microtime(true);

for($i=0;$i<100000;$i++) {
    $jus->format(54);
}

$a=microtime(true);
echo 'PhpMyCoder: '.($a-$b)." (Fixed!)\n";

// ALGORITHMS BELOW

// Galen
function justify1( $str_in, $desired_length=48 ) {
    if ( strlen( $str_in ) > $desired_length ) {
        $str_in = current( explode( "\n", wordwrap( $str_in, $desired_length ) ) );
    }
    $string_length = strlen( $str_in );
    $spaces_count = substr_count( $str_in, ' ' );
    $needed_spaces_count = $desired_length - $string_length + $spaces_count;
    if ( $spaces_count === 0 ) {
        return str_pad( $str_in, $desired_length, ' ', STR_PAD_BOTH );
    }
    $spaces_per_space = ceil( $needed_spaces_count / $spaces_count );
    $spaced_string = preg_replace( '~\s+~', str_repeat( ' ', $spaces_per_space ), $str_in );
    return preg_replace_callback(
        sprintf( '~\s{%s}~', $spaces_per_space ),
        function ( $m ) use( $spaces_per_space ) {
            return str_repeat( ' ', $spaces_per_space-1 );
        },
        $spaced_string,
        strlen( $spaced_string ) - $desired_length
    );
}
// nickb
function justify2($str_in, $desired_length, $char = '_') {

    // Some common vars and simple error checking / sanitation
    $return = '';
    $str_in = trim( $str_in);
    $desired_length = intval( $desired_length);

    // If we've got invalid input, we're done
    if( $desired_length <= 0)
        return $str_in;

    // If the input string is greater than the length, we need to truncate it WITHOUT splitting words
    if( strlen( $str_in) > $desired_length) {
        $str = wordwrap($str_in, $desired_length);
        $str = explode("\n", $str);
        $str_in = $str[0];
    }

    $words = explode( ' ', $str_in);
    $num_words = count( $words);

    // If there's only one word, it's a simple edge case
    if( $num_words == 1) {
        $length = ($desired_length - strlen( $words[0])) / 2;
        $return .= str_repeat( $char, floor( $length)) . $words[0] . str_repeat( $char, ceil( $length));
    } else {
        $word_length = strlen( implode( '', $words));

        // Calculate the number of spaces to distribute over the words
        $num_words--; // We're going to eliminate the last word
        $spaces = floor( ($desired_length - $word_length) / $num_words);
        $remainder = $desired_length - $word_length - ($num_words * $spaces);

        $last = array_pop( $words);
        foreach( $words as $word) {
            // If we didn't get an even number of spaces to distribute, just tack it on to the front
            $spaces_to_add = $spaces;
            if( $remainder > 0) {
                $spaces_to_add++;
                $remainder--;
            }

            $return .= $word . str_repeat( $char, $spaces_to_add);
        }
        $return .= $last;
    }
    return $return;
}
// Paolo Bergantino
function justify3($str, $to_len) {
    $str = trim($str);
    $strlen = strlen($str);

    if($str == '') return '';

    if($strlen >= $to_len) {
        return substr($str, 0, $to_len);   
    }

    $words = explode(' ', $str);
    $word_count = count($words);
    $space_count = $word_count - 1;

    if($word_count == 1) {
        return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
    }

    $space = $to_len - $strlen + $space_count;
    $per_space = $space/$space_count;

    if(is_int($per_space)) {
        return implode($words, str_pad('', $per_space, ' '));    
    }

    $new_str = '';
    $spacing = floor($per_space);
    $new_str .= $words[0] . str_pad('', $spacing);
    foreach($words as $x => $word) {
        if($x == $word_count - 1 || $x == 0) continue;
        if($x < $word_count - 1) {
            $diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
            $new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
        }
    }
    $new_str .= $words[$x];

    return $new_str;   
}
// Kristian Antonsen
function justify4($str_in, $desired_length)
{
    foreach ($str_in as &$line) {
        $words = explode(' ', $line);
        $word_count = count($words) - 1;
        $spaces_to_fill = $desired_length - strlen($line) + $word_count;
        if (count($words) == 1) {
            $line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
                  . str_repeat('_', floor($spaces_to_fill/2));
            continue;
        }
        $next_space = floor($spaces_to_fill/$word_count);
        $leftover_space = $spaces_to_fill % $word_count;
        $line = array_shift($words);
        foreach($words as $word) {
            $extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
            $leftover_space -= $extra_space;
            $line .= str_repeat('_', $next_space + $extra_space) . $word;
        }
    }
    return $str_in;
}
// user381521
function justify5 ($str, $len)
{
    // split by whitespace, remove empty strings
    $words = array_diff (preg_split ('/\s+/', $str), array (""));

    // just space if no words
    if (count ($words) == 0)
        return str_repeat (" ", $len);

    // add empty strings if only one element
    if (count ($words) == 1)
        $words = array ("", $words[0], "");

    // get number of words and spaces
    $wordcount = count ($words);
    $numspaces = $wordcount - 1;

    // get number of non-space characters
    $numchars = array_sum (array_map ("strlen", $words));

    // get number of characters remaining for space
    $remaining = $len - $numchars;

    // return if too little spaces remaining
    if ($remaining <= $numspaces)
        return substr (implode (" ", $words), 0, $len);

    // get number of spaces per space
    $spaces_per_space = $remaining / $numspaces;
    $spaces_leftover = $remaining % $numspaces;

    // make array for spaces, spread out leftover spaces
    $spaces = array_fill (0, $numspaces, $spaces_per_space);
    while ($spaces_leftover--)
        $spaces[$numspaces - $spaces_leftover - 1]++;
    $spaces[] = 0; // make count ($words) == count ($spaces)

    // join it all together
    $result = array ();
    foreach ($words as $k => $v)
        array_push ($result, $v, str_repeat (" ", $spaces[$k]));
    return implode ($result);
}
// ppsreejith
function justify6($str, $to_len) {
    $str = trim($str);
    $strlen = strlen($str);

    if($str == '') return '';

    if($strlen >= $to_len) {
        return substr($str, 0, $to_len);   
    }

    $words = explode(' ', $str);
    $word_count = count($words);
    $space_count = $word_count - 1;

    if($word_count == 1) {
        return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
    }

    $space = $to_len - $strlen + $space_count;
    $per_space = floor($space/$space_count);
    $spaces = str_pad('', $per_space, ' ');
    $curr_word = implode($words, $spaces);
    while(strlen($curr_word) < $to_len){
    $curr_word = substr($curr_word,0,preg_match("[! ][".$spaces."][! ]",$curr_word)." ".preg_match("[! ][".$spaces."][! ]",$curr_word));
    }
    return $curr_word;
}
// vlzvl
function justify7($str_in, $desired_length)
{
   $str_in = preg_replace("!\s+!"," ",$str_in);   // get rid of multiple spaces
   $words = explode(" ",$str_in);   // break words
   $num_words = sizeof($words);     // num words
   if ($num_words==1) {
      return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);
   }
   else {
      $num_chars = 0; $lenwords = array();
      for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
      $each_div = round(($desired_length - $num_chars) / ($num_words-1));
      for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
      $space_to_addcut = ($desired_length - $sum);
      for($x=0;$x<$num_words-1;$x++) {
         $words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
         if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); }
      }
      return substr(implode($words),0,$desired_length);
   }
}
// Alexander
function justify8($str, $length) {
  $words   = explode(' ', $str);
  if(count($words)==1) $words = array("", $str, "");
  $spaces  = $length - array_sum(array_map("strlen", $words));
  $add     = (int)($spaces / (count($words) - 1));
  $left    = $spaces % (count($words) - 1);
  $spaced  = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
  $spaced .= str_repeat("_", max(1, $add));
  $spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
  return substr($spaced, 0, $length);
}
// ohaal
function justify9($s,$m){$s=trim($s);$l=strlen($s);if($l>=$m){$s=explode("\n",wordwrap($s,$m));$s=$s[0];$l=strlen($s);}$c=substr_count($s,' ');if($c===0)return str_pad($s,$m,' ',STR_PAD_BOTH);$a=($m-$l+$c)/$c;$h=floor($a);$i=($a-$h)*$c;$w=explode(' ',$s,$i+1);$w[$i]=str_replace(' ',str_repeat(' ',$h),$w[$i]);return implode(str_repeat(' ',ceil($a)),$w);}


// PhpMyCoder
class Justifier {
    private $text;

    public function __construct($text) {
        if(!is_string($text) && !is_array($text)) {
            throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
        }

        if(is_array($text)) {
            // String arrays must be converted to JustifierLine arrays
            $this->text = array_map(function($line) {
                return JustifierLine::fromText($line);
            }, $text);
        } else {
            // Single line of text input
            $this->text = $text;
        }
    }

    public function format($width = NULL) {
        // Strings have to be broken into an array and then jusitifed
        if(is_string($this->text)) {
            if($width == null) {
                throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
            }

            if($width <= 0) {
                throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
            }

            // Break up a JustifierLine of all text until each piece is smaller or equal to $width
            $lines = array(JustifierLine::fromText($this->text));
            $count = 0;
            $newLine = $lines[0]->breakAtColumn($width);

            while($newLine !== null) {
                $lines[] = $newLine;
                $newLine = $lines[++$count]->breakAtColumn($width);
            }
        } else {
            $lines = $this->text;

            // Allow for fluid width (uses longest line with single space)
            if($width == NULL) {
                $width = -1;

                foreach($lines as $line) {
                    // Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
                    $newWidth = $line->calculateWordsLength() + $line->countWords() - 1;

                    if($newWidth > $width) { // Looking for the longest line
                        $width = $newWidth;
                    }
                }
            }
        }

        // Justify each element of array
        //$output = array_map(function($line) use ($width) {
        //    return $this->justify($line, $width);
        //}, $lines);

        $output = array();
        foreach($lines as $line) {
            $output[] = $this->justify($line, $width);
        }            

        // If a single-line is passed in, a single line is returned
        if(count($output)) {
            return $output[0];
        }

        return $output;
    }

    private function justify(JustifierLine $line, $width) {
        // Retrieve already calculated line information
        $words     = $line->extractWords();
        $spaces    = $line->countWords() - 1;
        $wordLens  = $line->findWordLengths();
        $wordsLen  = $line->calculateWordsLength();
        $minWidth  = $wordsLen + $spaces;
        $output    = '';

        if($minWidth > $width) {
            throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
        }

        // No spaces means only one word (center align)
        if($spaces == 0) {
            return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
        }

        for(;$spaces > 0; $spaces--) {
            // Add next word to output and subtract its length from counters
            $output   .= array_shift($words);
            $length    = array_shift($wordLens);
            $wordsLen -= $length;
            $width    -= $length;

            if($spaces == 1) { // Last Iteration
                return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
            }

            // Magic padding is really just simple math
            $padding  = floor(($width - $wordsLen) / $spaces);
            $output  .= str_repeat(' ', $padding);
            $width   -= $padding;
        }
    }
}

class JustifierLine {
    private $words;
    private $numWords;
    private $wordLengths;
    private $wordsLength;

    public static function fromText($text) {
        // Split words into an array
        preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
        $words       = $matches[0];

        // Count words
        $numWords    = count($words);

        // Find the length of each word
        $wordLengths = array_map('strlen', $words);

        //And Finally, calculate the total length of all words
        $wordsLength = array_reduce($wordLengths, function($result, $length) {
            return $result + $length;
        }, 0);

        return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
    }

    private function __construct($words, $numWords, $wordLengths, $wordsLength) {
        $this->words       = $words;
        $this->numWords    = $numWords;
        $this->wordLengths = $wordLengths;
        $this->wordsLength = $wordsLength;
    }

    public function extractWords() { return $this->words; }
    public function countWords() { return $this->numWords; }
    public function findWordLengths() { return $this->wordLengths; }
    public function calculateWordsLength() { return $this->wordsLength; }

    public function breakAtColumn($column) {
        // Avoid extraneous processing if we can determine no breaking can be done
        if($column >= ($this->wordsLength + $this->numWords - 1)) {
            return null;
        }

        $width       = 0;
        $wordsLength = 0;

        for($i = 0; $i < $this->numWords; $i++) {
            // Add width of next word
            $width += $this->wordLengths[$i];

            // If the line is overflowing past required $width
            if($width > $column) {
                // Remove overflow at end & create a new object with the overflow
                $words             = array_splice($this->words, $i);
                $numWords          = $this->numWords - $i;
                $this->numWords    = $i;
                $wordLengths       = array_splice($this->wordLengths, $i);
                $tempWordsLength   = $wordsLength;
                $wordsLength       = $this->wordsLength - $wordsLength;
                $this->wordsLength = $tempWordsLength;

                return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
            }

            $width++; // Assuming smallest spacing to fit

            // We also have to keep track of the total $wordsLength
            $wordsLength += $this->wordLengths[$i];
        }

        return null;
    }
}



回答5:


This is my solution. No pesky regular expressions :)

function justify($str, $length) {
  $words   = explode(' ', $str);
  if(count($words)==1) $words = array("", $str, "");
  $spaces  = $length - array_sum(array_map("strlen", $words));
  $add     = (int)($spaces / (count($words) - 1));
  $left    = $spaces % (count($words) - 1);
  $spaced  = implode(str_repeat("_", $add + 1), array_slice($words, 0, $left + 1));
  $spaced .= str_repeat("_", max(1, $add));
  $spaced .= implode(str_repeat("_", max(1, $add)), array_slice($words, $left + 1));
  return substr($spaced, 0, $length);
}

This is powered by PHP array functions.

Here is the working example.




回答6:


Just so no one thinks I'm trying to have them do my homework for me, this is my (working, I think) solution.

I'm not sure I could have possibly been expected to write this much code on a whiteboard on demand, however, so I'm mostly curious to see how others would tackle it without looking at my code. (I made it to around the foreach in the interview before they called 'time' on me, so to speak)

function justify($str, $to_len) {
    $str = trim($str);
    $strlen = strlen($str);

    if($str == '') return '';

    if($strlen >= $to_len) {
        return substr($str, 0, $to_len);   
    }

    $words = explode(' ', $str);
    $word_count = count($words);
    $space_count = $word_count - 1;

    if($word_count == 1) {
        return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
    }

    $space = $to_len - $strlen + $space_count;
    $per_space = $space/$space_count;

    if(is_int($per_space)) {
        return implode($words, str_pad('', $per_space, ' '));    
    }

    $new_str = '';
    $spacing = floor($per_space);
    $new_str .= $words[0] . str_pad('', $spacing);
    foreach($words as $x => $word) {
        if($x == $word_count - 1 || $x == 0) continue;
        if($x < $word_count - 1) {
            $diff = $to_len - strlen($new_str) - (strlen(implode('', array_slice($words, $x))));
            $new_str .= $word . str_pad('', floor($diff/($space_count - $x)), ' ');
        }
    }
    $new_str .= $words[$x];

    return $new_str;   
}

$tests = array(' hello world there ok then ', 'hello', 'ok then', 'this string is almost certainly longer than 48 I think', 'two words', 'three ok words', '1 2 3 4 5 6 7 8 9');

foreach($tests as $word) {
    print $word . ' = ' . str_replace(' ', '_', justify($word, 48)) . '<br>';
}



回答7:


I miss my list comprehensions in Python ...

<?php
function justify ($str, $len)
{
    // split by whitespace, remove empty strings
    $words = array_diff (preg_split ('/\s+/', $str), array (""));

    // just space if no words
    if (count ($words) == 0)
        return str_repeat (" ", $len);

    // add empty strings if only one element
    if (count ($words) == 1)
        $words = array ("", $words[0], "");

    // get number of words and spaces
    $wordcount = count ($words);
    $numspaces = $wordcount - 1;

    // get number of non-space characters
    $numchars = array_sum (array_map ("strlen", $words));

    // get number of characters remaining for space
    $remaining = $len - $numchars;

    // return if too little spaces remaining
    if ($remaining <= $numspaces)
        return substr (implode (" ", $words), 0, $len);

    // get number of spaces per space
    $spaces_per_space = $remaining / $numspaces;
    $spaces_leftover = $remaining % $numspaces;

    // make array for spaces, spread out leftover spaces
    $spaces = array_fill (0, $numspaces, $spaces_per_space);
    while ($spaces_leftover--)
        $spaces[$numspaces - $spaces_leftover - 1]++;
    $spaces[] = 0; // make count ($words) == count ($spaces)

    // join it all together
    $result = array ();
    foreach ($words as $k => $v)
        array_push ($result, $v, str_repeat (" ", $spaces[$k]));
    return implode ($result);
}
?>



回答8:


Here's my attempt.

function justify($str_in, $desired_length)
{
    foreach ($str_in as &$line) {
        $words = explode(' ', $line);
        $word_count = count($words) - 1;
        $spaces_to_fill = $desired_length - strlen($line) + $word_count;
        if (count($words) == 1) {
            $line = str_repeat('_', ceil($spaces_to_fill/2)) . $line
                  . str_repeat('_', floor($spaces_to_fill/2));
            continue;
        }
        $next_space = floor($spaces_to_fill/$word_count);
        $leftover_space = $spaces_to_fill % $word_count;
        $line = array_shift($words);
        foreach($words as $word) {
            $extra_space = ($leftover_space) ? ceil($leftover_space / $word_count) : 0;
            $leftover_space -= $extra_space;
            $line .= str_repeat('_', $next_space + $extra_space) . $word;
        }
    }
    return $str_in;
}

I've tried to keep it relatively concise, which has impacted the readability. But here's how it works:

For each entry, we split up the words into an array $words. Because we might want spaces before and after the word, we also add an empty string to the beginning and ending of the array.

We calculate the left-over amount of spaces $leftover_space (that is, the spaces we need to insert somewhere), and divide it by number of words $word_count, so we know the average of how many spaces to put in between each word.

Whenever we add a word, we also add a few spaces $extra_space, depending on how many are left. After that, we remove the amount added from the $leftover_space.

Sample output

$data = justify($data, 48);
print_r($data);

Array
(
    [0] => 123456789012345678901234567890123456789012345678
    [1] => hello_______world_______there_______ok______then
    [2] => ______________________hello_____________________
    [3] => ok__________________________________________then
    [4] => this__string__is_almost_certainly_longer_than_48
    [5] => two________________________________________words
    [6] => three__________________ok__________________words
    [7] => 1_____2_____3_____4_____5_____6_____7_____8____9
)



回答9:


I think this is fully working: (the "_" is just keeping the space visible)

function justify($str_in, $desired_length)
{
   $str_in = preg_replace("!\s+!"," ",$str_in);   // get rid of multiple spaces
   $words = explode(" ",$str_in);   // break words
   $num_words = sizeof($words);     // num words   
   if ($num_words==1) {   
      return str_pad($str_in,$desired_length,"_",STR_PAD_BOTH);   
   }
   else {
      $num_chars = 0; $lenwords = array();
      for($x=0;$x<$num_words;$x++) { $num_chars += $lenwords[$x] = strlen($words[$x]); }
      $each_div = round(($desired_length - $num_chars) / ($num_words-1));
      for($x=0,$sum=0;$x<$num_words;$x++) { $sum += ($lenwords[$x] + ($x<$num_words-1 ? $each_div : 0)); }
      $space_to_addcut = ($desired_length - $sum);
      for($x=0;$x<$num_words-1;$x++) {
         $words[$x] .= str_repeat("_",$each_div+($each_div>1? ($space_to_addcut<0?-1:($space_to_addcut>0?1:0)) :0));
         if ($each_div>1) { $space_to_addcut += ($space_to_addcut<0 ? 1 : ($space_to_addcut>0?-1:0) ); } 
      }
      return substr(implode($words),0,$desired_length);
   }
}

EDITED:

Function now get rid of multiple spaces between words as well. How it works (in short):

  • removes continuous spaces between words
  • count words so if one (the 'hello' example) just padding both and echo it.
  • ..otherwise count the characters of the used words
  • calculate the global and partial space to add (the '_' in example).
  • calculate the extra space to add (string len < desired) OR remove (string len > desired) and apply it to padding.
  • final, reduce the final string to desired length.

TESTING:

$tests = array(
   'hello world there ok then',
   'hello',
   'ok then',
   'this string is almost certainly longer than 48 I think',
   'three ok words',
   '1 2 3 4 5 6 7 8 9',
   'Lorem Ipsum is simply dummy text'
);

$arr = array();
foreach($tests as $key=>$val) {
   $arr[$key] = justify($val,50);
   $arr[$key] .= " - (chars: ".strlen($arr[$key]).")";
}
echo "<pre>".print_r($arr,TRUE)."</pre>";

AND THE RESULT:

Array
(
    [0] => hello________world_______there_______ok_______then - (chars: 50)
    [1] => ______________________hello_______________________ - (chars: 50)
    [2] => ok____________________________________________then - (chars: 50)
    [3] => this_string_is_almost_certainly_longer_than_48_I_t - (chars: 50)
    [4] => three___________________ok___________________words - (chars: 50)
    [5] => 1______2_____3_____4_____5_____6_____7_____8_____9 - (chars: 50)
    [6] => Lorem____Ipsum____is_____simply_____dummy_____text - (chars: 50)
)

THAT WAS TOUGH :)

EDITED 2:

Function is now about 20% faster, because that benchmark touched me :)




回答10:


The (Semi-Long) Solution

It's taken me a while to perfect (probably much, much longer than an interviewer would have allowed for), but I've come up with an elegant, 162 line OOP solution to this problem. I included functionality to allow for the justifying of a single string, array of strings (already separated into lines) or a long string that needs to be broken up into lines of a maximum width first. Demos follow the code block.

Important Note: This class will only work in PHP 5.4. I realized this when running a version on my own server PHP (5.3.6) to get profiling stats with XDebug. PHP 5.3 complains about my use of $this in the anonymous function. A quick check of the docs on anonymous functions reveals that $this could not be used in the context of an anonymous function until 5.4. If anyone can find a clean workaround to this, please drop it in the comments. Added support for PHP 5.3!

<?php
class Justifier {
    private $text;

    public function __construct($text) {
        if(!is_string($text) && !is_array($text)) {
            throw new InvalidArgumentException('Expected a string or an array of strings, instead received type: ' . gettype($text));
        }

        if(is_array($text)) {
            // String arrays must be converted to JustifierLine arrays
            $this->text = array_map(function($line) {
                return JustifierLine::fromText($line);
            }, $text);
        } else {
            // Single line of text input
            $this->text = $text;
        }
    }

    public function format($width = null) {
        // Strings have to be broken into an array and then jusitifed
        if(is_string($this->text)) {
            if($width == null) {
                throw new InvalidArgumentException('A width must be provided for separation when an un-split string is provided');
            }

            if($width <= 0) {
                throw new InvalidArgumentException('Expected a positive, non-zero width, instead received width of ' . $width);
            }

            // Break up a JustifierLine of all text until each piece is smaller or equal to $width
            $lines = array(JustifierLine::fromText($this->text));
            $count = 0;
            $newLine = $lines[0]->breakAtColumn($width);

            while($newLine !== null) {
                $lines[] = $newLine;
                $newLine = $lines[++$count]->breakAtColumn($width);
            }
        } else {
            $lines = $this->text;

            // Allow for fluid width (uses longest line with single space)
            if($width == NULL) {
                $width = -1;

                foreach($lines as $line) {
                    // Width of line = Sum of the lengths of the words and the spaces (number of words - 1)
                    $newWidth = $line->calculateWordsLength() + $line->countWords() - 1;

                    if($newWidth > $width) { // Looking for the longest line
                        $width = $newWidth;
                    }
                }
            }
        }

        // Justify each element of array (PHP 5.4 ONLY)
        //$output = array_map(function($line) use ($width) {
        //  return $this->justify($line, $width);
        //}, $lines);

                    // Support for PHP 5.3
                    $output = array();
                    foreach($lines as $line) {
                        $output = $this->justify($line, $width);
                    }

        // If a single-line is passed in, a single line is returned
        if(count($output)) {
            return $output[0];
        }

        return $output;
    }

    private function justify(JustifierLine $line, $width) {
        // Retrieve already calculated line information
        $words     = $line->extractWords();
        $spaces    = $line->countWords() - 1;
        $wordLens  = $line->findWordLengths();
        $wordsLen  = $line->calculateWordsLength();
        $minWidth  = $wordsLen + $spaces;
        $output    = '';

        if($minWidth > $width) {
            throw new LengthException('A minimum width of ' . $minWidth . ' was required, but a width of ' . $width . ' was given instead');
        }

        // No spaces means only one word (center align)
        if($spaces == 0) {
            return str_pad($words[0], $width, ' ', STR_PAD_BOTH);
        }

        for(;$spaces > 0; $spaces--) {
            // Add next word to output and subtract its length from counters
            $output   .= array_shift($words);
            $length    = array_shift($wordLens);
            $wordsLen -= $length;
            $width    -= $length;

            if($spaces == 1) { // Last Iteration
                return $output . str_repeat(' ', $width - $wordsLen) . $words[0];
            }

            // Magic padding is really just simple math
            $padding  = floor(($width - $wordsLen) / $spaces);
            $output  .= str_repeat(' ', $padding);
            $width   -= $padding;
        }
    }
}

class JustifierLine {
    private $words;
    private $numWords;
    private $wordLengths;
    private $wordsLength;

    public static function fromText($text) {
        // Split words into an array
        preg_match_all('/[^ ]+/', $text, $matches, PREG_PATTERN_ORDER);
        $words       = $matches[0];

        // Count words
        $numWords    = count($words);

        // Find the length of each word
        $wordLengths = array_map('strlen', $words);

        //And Finally, calculate the total length of all words
        $wordsLength = array_reduce($wordLengths, function($result, $length) {
            return $result + $length;
        }, 0);

        return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
    }

    private function __construct($words, $numWords, $wordLengths, $wordsLength) {
        $this->words       = $words;
        $this->numWords    = $numWords;
        $this->wordLengths = $wordLengths;
        $this->wordsLength = $wordsLength;
    }

    public function extractWords() { return $this->words; }
    public function countWords() { return $this->numWords; }
    public function findWordLengths() { return $this->wordLengths; }
    public function calculateWordsLength() { return $this->wordsLength; }

    public function breakAtColumn($column) {
        // Avoid extraneous processing if we can determine no breaking can be done
        if($column >= ($this->wordsLength + $this->numWords - 1)) {
            return null;
        }

        $width       = 0;
        $wordsLength = 0;

        for($i = 0; $i < $this->numWords; $i++) {
            // Add width of next word
            $width += $this->wordLengths[$i];

            // If the line is overflowing past required $width
            if($width > $column) {
                // Remove overflow at end & create a new object with the overflow
                $words             = array_splice($this->words, $i);
                $numWords          = $this->numWords - $i;
                $this->numWords    = $i;
                $wordLengths       = array_splice($this->wordLengths, $i);
                $tempWordsLength   = $wordsLength;
                $wordsLength       = $this->wordsLength - $wordsLength;
                $this->wordsLength = $tempWordsLength;

                return new JustifierLine($words, $numWords, $wordLengths, $wordsLength);
            }

            $width++; // Assuming smallest spacing to fit

            // We also have to keep track of the total $wordsLength
            $wordsLength += $this->wordLengths[$i];
        }

        return null;
    }
}

Demos

Original Question (Justifying Lines of Text to width = 48)

You can pass in an array of many strings or just one string to Justifier. Calling Justifier::format($desired_length) will always return an array of justified lines *if an array of strings or string that required segmentation was passed to the constructor. Otherwise, a string will be returned. (Codepad Demo)

$jus = new Justifier(array(
    'hello world there ok then',
    'hello',
    'ok then',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
));

print_r( $jus->format(48) );

Output

Array
(
    [0] => hello      world       there       ok       then
    [1] =>                      hello                      
    [2] => ok                                          then
    [3] => two                                        words
    [4] => three                  ok                  words
    [5] => 1    2     3     4     5     6     7     8     9
)

You may notice I omitted one of the OP's test lines. This is because it was 54 characters and would exceed the $desired_length passed to Justifier::format(). The function will throw an IllegalArgumentException for widths that aren't positive, non-zero numbers that exceed or equal to the minimum width. The minimum width is calculated by finding the longest line (of all the lines passed to the constructor) with single spacing.

Fluid Width Justifying With An Array of Strings

If you omit the width, Justifier will use the width of the longest line (of those passed to the constructor) when single spaced. This is the same calculation as finding the minimum width in the previous demo. (Codepad Demo)

$jus = new Justifier(array(
    'hello world there ok then',
    'hello',
    'ok then',
    'this string is almost certainly longer than 48 I think',
    'two words',
    'three ok words',
    '1 2 3 4 5 6 7 8 9'
));

print_r( $jus->format() );

Output

Array
(
    [0] => hello        world        there        ok         then
    [1] =>                         hello                         
    [2] => ok                                                then
    [3] => this string is almost certainly longer than 48 I think
    [4] => two                                              words
    [5] => three                     ok                     words
    [6] => 1     2     3     4      5      6      7      8      9
)

Justifying a Single String of Text (width = 48)

I've also included a feature in the class which allows you to pass a single, non-broken string to the constructor. This string can be of any length. When you call Justifier::format($desired_length) the string is broken into lines such that each line is filled with as much text as possible and justified before starting a new line. The class will complain with an InvalidArgumentException because you must provide a width into which it can break the string. If anyone can think of a sensible default or way to programmatically determine a default for a string, I'm completely open to suggestions. (Codepad Demo)

$jus = new Justifier(
    'hello world there ok then hello ok then this string is almost certainly longer than 48 I think two words three ok words 1 2 3 4 5 6 7 8 9'
);

print_r( $jus->format(48) );

Output

Array
(
    [0] => hello world there ok then  hello  ok  then  this
    [1] => string is almost  certainly  longer  than  48  I
    [2] => think two words three ok words 1 2 3 4 5 6 7 8 9
)



回答11:


Here's my solution. For what it's worth, it took me about 20 minutes to make both the justify function and acceptance tests for it; 5 of those minutes debugging the justify function. Also, I used notpad++ instead of a more robust IDE to try to simulate to some extent the interview environment.

I think this might be too large of a problem for a whiteboard interview question, unless the interviewer lets you write in pseudocode and is more interested in your thought process that what you are putting on the board.

<?php


function justify($str_in, $desired_length) {
    $words = preg_split("/ +/",$str_in);
    // handle special cases
    if(count($words)==0) { return str_repeat(" ",$desired_length); }

    // turn single word case into a normal case
    if(count($words)==1) { $words = array("",$words[0],""); }

    $numwords = count($words);
    $wordlength = strlen(join("",$words));
    // handles cases where words are longer than the desired_length
    if($wordlength>($desired_length-$numwords)) { 
        return substr(join(" ",$words),0,$desired_length);
    }

    $minspace = floor(($desired_length-$wordlength)/($numwords-1));
    $extraspace = $desired_length - $wordlength - ($minspace * ($numwords-1));
    $result = $words[0];
    for($i=1;$i<$numwords;$i++) {
        if($extraspace>0) {
            $result.=" ";
            $extraspace--;
        }
        $result.=str_repeat(" ",$minspace);
        $result.=$words[$i];
    }
    return $result;
}

function acceptance_justify($orig_str, $just_str, $expected_length) {
    // should be the correct length
    if(strlen($just_str)!=$expected_length) { return false; }

    // should contain most of the words in the original string, in the right order
    if(preg_replace("/ +/","",substr($orig_str,0,$expected_length)) != preg_replace("/ +/","",substr($just_str,0,$expected_length))) { return false; }

    //spacing should be uniform (+/- 1 space)
    if(!preg_match("/( +)/",$just_str,$spaces)) { return false; }

    $space_length=strlen($spaces[0]);
    $smin=$space_length;
    $smax=$space_length;
    for($i=1;$i<count(@spaces);$i++) {
        $smin=min($smin,strlen($spaces));
        $smax=max($smax,strlen($spaces));
    }
    if(($smax-$smin)>1) { return false; }
    return true;
}

function run_test($str,$len) {
    print "<pre>";
    print "$str  ==> \n";
    $result = justify($str,$len);
    print preg_replace("/ /",".",$result) . "\n";
    print acceptance_justify($str,$result,$len)?"passed":"FAILED";
    print "\n\n</pre>";
}


run_test("hello world there ok then",48);
run_test("hello",48);
run_test("this string is almost certainly longer than 48 I think",48);
run_test("two words",48);
run_test("three ok words",48);
run_test("1 2 3 4 5 6 7 8 9",48);



回答12:


Here's a little bit different implementation just towards the end.

<?php
function justify($str, $to_len) {
    $str = trim($str);
    $strlen = strlen($str);

    if($str == '') return '';

    if($strlen >= $to_len) {
        return substr($str, 0, $to_len);   
    }

    $words = explode(' ', $str);
    $word_count = count($words);
    $space_count = $word_count - 1;

    if($word_count == 1) {
        return str_pad($str, $to_len, ' ', STR_PAD_BOTH);
    }

    $space = $to_len - $strlen + $space_count;
    $per_space = floor($space/$space_count);
    $spaces = str_pad('', $per_space, ' ');
    $curr_word = implode($words, $spaces);
    while(strlen($curr_word) < $to_len){
    $curr_word = substr($curr_word,0,preg_match("[! ][".$spaces."][! ]",$curr_word))." ".preg_match("[! ][".$spaces."][! ]",$curr_word));
    }
    return $curr_word;

?>

I'm not sure about the regexp, I just meant $spaces and not next space.



来源:https://stackoverflow.com/questions/11058483/justify-string-algorithm

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!