How to use Imagick annotateImage for Chinese text?

Submitted by 大憨熊 on 2019-11-28 11:20:08
Walter Tross

The problem is that you are feeding ImageMagick the output of a "line splitter" (wordWrapAnnotation), and the text you pass to that splitter has first been run through utf8_decode. This is certainly wrong for Chinese text: utf8_decode can only handle UTF-8 text that can be converted to ISO-8859-1 (the most common 8-bit extension of ASCII).
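
To see why this fails, here is a minimal sketch that checks whether a UTF-8 string survives a round trip through ISO-8859-1. It uses mb_convert_encoding instead of utf8_decode itself (my choice, since utf8_decode is deprecated as of PHP 8.2), and the helper name survivesLatin1 is hypothetical:

```php
<?php
// Hypothetical helper: true if $text (assumed to be UTF-8) can be converted
// to ISO-8859-1 and back without loss. Requires the mbstring extension.
function survivesLatin1($text)
{
    $latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    $back = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $back === $text;
}

var_dump(survivesLatin1('café')); // every character exists in ISO-8859-1
var_dump(survivesLatin1('中文')); // Chinese characters do not
```

Any text for which this check fails would be damaged by a conversion to ISO-8859-1 before it ever reaches the line splitter.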

Now, I hope that your text is UTF-8 encoded. If it is not, you might be able to convert it like this:

$text = mb_convert_encoding($text, 'UTF-8', 'BIG-5');

or like this:

$text = mb_convert_encoding($text, 'UTF-8', 'GB18030'); // only PHP >= 5.4.0

(In your code, $text stands for $text1 and $text2.)
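
If you are unsure whether the input is already UTF-8, a guarded conversion might look like the sketch below. The helper name ensureUtf8 and its default source encoding are assumptions for illustration. The validity check is only a heuristic, because the real source encoding cannot be detected reliably and has to be known:

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from $assumedSource
// only when the input is not already valid UTF-8. Requires mbstring.
function ensureUtf8($text, $assumedSource = 'GB18030')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Simulate legacy input by encoding a UTF-8 literal as BIG-5 first.
$legacy = mb_convert_encoding('中文', 'BIG-5', 'UTF-8');
echo ensureUtf8($legacy, 'BIG-5'), "\n";
echo ensureUtf8('中文'), "\n";
```

Skipping the conversion for valid UTF-8 avoids converting the same string twice, which would garble it.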

Then there are (at least) two things to fix in your code:

  1. pass the text "as is" (without utf8_decode) to wordWrapAnnotation,
  2. change the argument of setTextEncoding from "utf-8" to "UTF-8" as per specs

I hope that all variables in your code are initialized in some part of it that you did not post. With the two changes above (the second one might not be necessary, but you never know...), and with the missing parts in place, I see no reason why your code should not work, unless your TTF file or the Imagick library is broken (ImageMagick, on which Imagick is based, is a great library, so I consider this last possibility rather unlikely).

EDIT:

Following your request, I am updating my answer with:

a) the fact that setting mb_internal_encoding('utf-8') is very important for the solution, as you say in your answer, and

b) my proposal for a better line splitter, that works acceptably for western languages and for Chinese, and that is probably a good starting point for other languages using Han logograms (Japanese kanji and Korean hanja):

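function wordWrapAnnotation(&$image, &$draw, $text, $maxWidth)
{
   // Split at spaces, before Han characters not preceded by an opening
   // quote or bracket, and before opening quotes or brackets.
   $regex = '/( |(?=\p{Han})(?<!\p{Pi})(?<!\p{Ps})|(?=\p{Pi})|(?=\p{Ps}))/u';
   // The /u modifier keeps \v from matching the byte 0x85 inside UTF-8 characters.
   $cleanText = trim(preg_replace('/[\s\v]+/u', ' ', $text));
   $strArr = preg_split($regex, $cleanText, -1, PREG_SPLIT_DELIM_CAPTURE |
                                                PREG_SPLIT_NO_EMPTY);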
   $linesArr = array();
   $lineHeight = 0;
   $goodLine = '';
   $spacePending = false;
   foreach ($strArr as $str) {
      if ($str == ' ') {
         $spacePending = true;
      } else {
         if ($spacePending) {
            $spacePending = false;
            $line = $goodLine.' '.$str;
         } else {
            $line = $goodLine.$str;
         }
         $metrics = $image->queryFontMetrics($draw, $line);
         if ($metrics['textWidth'] > $maxWidth) {
            if ($goodLine != '') {
               $linesArr[] = $goodLine;
            }
            $goodLine = $str;
         } else {
            $goodLine = $line;
         }
         if ($metrics['textHeight'] > $lineHeight) {
            $lineHeight = $metrics['textHeight'];
         }
      }
   }
   if ($goodLine != '') {
      $linesArr[] = $goodLine;
   }
   return array($linesArr, $lineHeight);
}
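
To inspect what the split pattern in the function above produces on its own, without any Imagick objects, a small sketch like the following can be run. The helper name splitForWrapping and the sample strings are mine, for illustration only:

```php
<?php
// Same split pattern as in wordWrapAnnotation above.
$regex = '/( |(?=\p{Han})(?<!\p{Pi})(?<!\p{Ps})|(?=\p{Pi})|(?=\p{Ps}))/u';

// Hypothetical helper, for illustration only.
function splitForWrapping($regex, $text)
{
    return preg_split($regex, $text, -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Mixed western and Chinese text: the space is kept as its own piece.
print_r(splitForWrapping($regex, 'Hello 世界'));

// An opening quote should stay attached to the character that follows it.
print_r(splitForWrapping($regex, '他说“你好”'));
```

For 'Hello 世界' the pieces should be Hello, a single space, 世 and 界. Joining the pieces always gives back the input, since the pattern only consumes spaces and those are captured.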

In words: the input is first cleaned up by collapsing every run of whitespace, including newlines, into a single space, and by removing leading and trailing whitespace. It is then split at spaces, right before Han characters that are not preceded by "leading" characters (such as opening parentheses or opening quotes), and right before "leading" characters. Lines are assembled so that each one renders within $maxWidth pixels horizontally, unless the splitting rules make this impossible (in which case the final rendering will probably overflow). Modifying the function to force a split in overflow cases is not difficult. Note that Chinese punctuation, for example, is not classified as Han in Unicode, so, except for "leading" punctuation, the algorithm cannot insert a line break before it.
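
As one possible sketch of the forced splitting mentioned above (not part of the original function), an overlong piece could be cut character by character until each chunk fits. The name forceSplit and the $measure callback are hypothetical. The callback is assumed to return the rendered width of a string in pixels:

```php
<?php
// Hypothetical helper: cut $str into chunks that each measure at most
// $maxWidth according to $measure. A single character wider than $maxWidth
// still becomes its own chunk, because it cannot be split further.
function forceSplit($str, $maxWidth, callable $measure)
{
    // Split into UTF-8 characters, not bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = array();
    $current = '';
    foreach ($chars as $char) {
        if ($current !== '' && $measure($current . $char) > $maxWidth) {
            $chunks[] = $current;
            $current = $char;
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

// Stand-in measure for demonstration: pretend every character is 10 pixels wide.
$fakeMeasure = function ($s) {
    return mb_strlen($s, 'UTF-8') * 10;
};
print_r(forceSplit('中文字体', 20, $fakeMeasure));
```

With Imagick, $measure could wrap queryFontMetrics, for example function ($s) use ($image, $draw) { $m = $image->queryFontMetrics($draw, $s); return $m['textWidth']; }.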

I'm afraid you will have to choose a TTF that supports Chinese code points. There are many sources for such fonts; here are two:

http://www.wazu.jp/gallery/Fonts_ChineseTraditional.html

http://wildboar.net/multilingual/asian/chinese/language/fonts/unicode/non-microsoft/non-microsoft.html

Kim Stacks

Full solution here:

https://gist.github.com/2971092/232adc3ebfc4b45f0e6e8bb5934308d9051450a4

Key ideas:

You must set the HTML charset and the internal encoding on both the form page and the processing page:

header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');

These lines must be at the top of the PHP files.
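
A small sketch of what the internal encoding affects: mb_* functions called without an explicit encoding argument fall back to it, while the plain string functions always work on bytes. On recent PHP versions the default is already UTF-8, I believe, but setting it explicitly does no harm. The sample string is mine:

```php
<?php
mb_internal_encoding('UTF-8');

$text = '中文字';
echo strlen($text), "\n";                // counts bytes
echo mb_strlen($text), "\n";             // counts characters
echo mb_substr($text, 0, 2), "\n";       // cuts on character boundaries
echo bin2hex(substr($text, 0, 2)), "\n"; // a byte-based cut lands mid-character
```

A line splitter that cuts on bytes instead of characters would hand broken sequences to the renderer.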

Use this function to determine whether the text is Chinese, and pick the right font file accordingly:

function isThisChineseText($text) {
    return preg_match("/\p{Han}+/u", $text);
}
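
Building on that check, font selection could be sketched as follows. The helper name pickFontFor and both font paths are placeholders, not files from the original solution. Note that \p{Han} only covers Han characters, so Japanese kana or Korean hangul on their own would not trigger the CJK font:

```php
<?php
// Hypothetical helper: choose a font file depending on whether the text
// contains at least one Han character. Both paths are placeholders.
function pickFontFor($text, $cjkFont, $latinFont)
{
    return preg_match('/\p{Han}/u', $text) === 1 ? $cjkFont : $latinFont;
}

echo pickFontFor('你好 world', 'fonts/cjk.ttf', 'fonts/latin.ttf'), "\n";
echo pickFontFor('hello world', 'fonts/cjk.ttf', 'fonts/latin.ttf'), "\n";
```

The result can then be passed to setFont on the ImagickDraw object.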

For more details check out https://stackoverflow.com/a/11219301/80353

Set the text encoding properly on the ImagickDraw object

$draw = new ImagickDraw();

// set the text encoding to UTF-8
$draw->setTextEncoding('UTF-8');

Note the capitalized UTF. This was helpfully pointed out to me by Walter Tross in his answer here: https://stackoverflow.com/a/11207521/80353

Use preg_match_all to split the text into English words, Chinese words and spaces

// separate the text into Chinese characters, words, or spaces
preg_match_all('/([\w]+)|(.)/u', $text, $matches);
$words = $matches[0];
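
How \w behaves under the /u modifier may depend on the PHP and PCRE versions: where Unicode properties are applied to \w, I believe a run of Chinese characters would be grouped into one token. A variant that names the character classes explicitly avoids that uncertainty. The pattern and the helper name tokenizeMixedText below are my own sketch, not taken from the gist:

```php
<?php
// Hypothetical variant: each Han character is its own token, other
// non-space runs stay together, and whitespace runs are kept as tokens.
function tokenizeMixedText($text)
{
    preg_match_all('/\p{Han}|[^\p{Han}\s]+|\s+/u', $text, $matches);
    return $matches[0];
}

print_r(tokenizeMixedText('Hello 世界 ok'));
```

Every character of the input ends up in exactly one token, so joining the tokens gives back the original text.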

Inspired by this answer https://stackoverflow.com/a/4113903/80353

This works just as well for English text.
