Multi-byte safe wordwrap() function for UTF-8

太阳男子 2020-12-01 13:17

PHP's wordwrap() function doesn't work correctly for strings in multi-byte encodings such as UTF-8.
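A small sketch of the problem, assuming UTF-8 input. The sample strings are made up for illustration: both are 7 characters long, but wordwrap() measures width in bytes, so only the accented one gets wrapped.

```php
<?php
// "\u{e9}" is "é", which takes two bytes in UTF-8.
$accented = "\u{e9}\u{e9}\u{e9} \u{e9}\u{e9}\u{e9}";
$plain    = 'eee eee';

// Both strings are 7 characters long, but the accented one is 13 bytes.
$byteLength = strlen($accented);
$charLength = mb_strlen($accented, 'UTF-8');

// With a width of 7 neither string should need wrapping, yet wordwrap()
// counts bytes and therefore breaks the accented string at the space.
$wrappedAccented = wordwrap($accented, 7, "\n");
$wrappedPlain    = wordwrap($plain, 7, "\n");

var_dump($wrappedAccented === $accented); // bool(false)
var_dump($wrappedPlain === $plain);       // bool(true)
```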

There are a few examples of mb safe functions in the comments, but with some di

9 answers
  •  庸人自扰
    2020-12-01 13:29

    Because no answer was handling every use case, here is something that does. The code is based on Drupal’s AbstractStringWrapper::wordWrap.

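    <?php

    /**
     * Multi-byte safe variant of PHP's wordwrap().
     *
     * @param string $string
     *   The input string.
     * @param integer $width [optional]
     *   The number of characters at which the $string will be
     *   wrapped. Defaults to 75.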
     * @param string $break [optional]
     *   The line is broken using the optional break parameter. Defaults
     *   to "\n".
     * @param boolean $cut [optional]
     *   If the $cut is set to TRUE, the string is
     *   always wrapped at or before the specified $width. So if
     *   you have a word that is larger than the given $width, it
     *   is broken apart. Defaults to FALSE.
     * @return string
     *   Returns the given $string wrapped at the specified
     *   $width.
     */
    function mb_wordwrap($string, $width = 75, $break = "\n", $cut = false) {
      $string = (string) $string;
      if ($string === '') {
        return '';
      }
    
      $break = (string) $break;
      if ($break === '') {
        trigger_error('Break string cannot be empty', E_USER_ERROR);
      }
    
      $width = (int) $width;
      if ($width === 0 && $cut) {
        trigger_error('Cannot force cut when width is zero', E_USER_ERROR);
      }
    
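      // Fast path: equal byte and character counts mean the string has no
      // multi-byte characters, so the native wordwrap() is safe to use.
      if (strlen($string) === mb_strlen($string)) {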
        return wordwrap($string, $width, $break, $cut);
      }
    
      $stringWidth = mb_strlen($string);
      $breakWidth = mb_strlen($break);
    
      $result = '';
      $lastStart = $lastSpace = 0;
    
      for ($current = 0; $current < $stringWidth; $current++) {
        $char = mb_substr($string, $current, 1);
    
        $possibleBreak = $char;
        if ($breakWidth !== 1) {
          $possibleBreak = mb_substr($string, $current, $breakWidth);
        }
    
        // A break already present in the input ends the current line.
        if ($possibleBreak === $break) {
          $result .= mb_substr($string, $lastStart, $current - $lastStart + $breakWidth);
          $current += $breakWidth - 1;
          $lastStart = $lastSpace = $current + 1;
          continue;
        }
    
        // At a space: if the current line is already full, wrap right here.
        if ($char === ' ') {
          if ($current - $lastStart >= $width) {
            $result .= mb_substr($string, $lastStart, $current - $lastStart) . $break;
            $lastStart = $current + 1;
          }
    
          $lastSpace = $current;
          continue;
        }
    
        // Line is full and has no space to fall back on: hard-cut the word
        // (only when $cut is set).
        if ($current - $lastStart >= $width && $cut && $lastStart >= $lastSpace) {
          $result .= mb_substr($string, $lastStart, $current - $lastStart) . $break;
          $lastStart = $lastSpace = $current;
          continue;
        }
    
        // Line is full and a space was seen on it: wrap at that last space.
        if ($current - $lastStart >= $width && $lastStart < $lastSpace) {
          $result .= mb_substr($string, $lastStart, $lastSpace - $lastStart) . $break;
          $lastStart = $lastSpace = $lastSpace + 1;
          continue;
        }
      }
    
      if ($lastStart !== $current) {
        $result .= mb_substr($string, $lastStart, $current - $lastStart);
      }
    
      return $result;
    }
    
    ?>
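The function above hands the string to the native wordwrap() whenever strlen() and mb_strlen() agree. A minimal sketch of what that comparison detects, assuming UTF-8 input; the helper name `is_single_byte_only` is hypothetical and not part of the answer. The encoding is passed explicitly here, whereas the original code relies on the configured internal encoding.

```php
<?php
/**
 * Hypothetical helper: TRUE when every character in $text occupies
 * exactly one byte, i.e. byte length equals character length.
 */
function is_single_byte_only(string $text): bool
{
    return strlen($text) === mb_strlen($text, 'UTF-8');
}

var_dump(is_single_byte_only('plain ascii')); // bool(true)
var_dump(is_single_byte_only("na\u{ef}ve"));  // bool(false), "ï" is two bytes
```

Only strings for which this returns FALSE need the slower character-by-character loop.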
    
