How can I make DOT correctly process UTF-8 to PostScript and have multiple graphs/pages?

Submitted by 馋奶兔 on 2019-12-24 12:51:41

Question


This DOT source

graph A
{
    a;
}
graph B
{
    "Enûma Eliš";
}

when compiled with dot -Tps, generates this warning:

Warning: UTF-8 input uses non-Latin1 characters which cannot be handled by this PostScript driver

I can fix the UTF-8 problem by passing -Tps:cairo, but then only graph A is in the output: it is truncated to a single page. The same happens with -Tpdf. There is no other PostScript driver available on my installation.

I could split the graphs into separate files and concatenate them afterwards, but I'd rather not. Is there a way to have correct UTF-8 handling and multiple page output?
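For reference, the split-and-concatenate fallback mentioned above could be sketched as follows. This is a minimal sketch, not a robust DOT parser: split_graphs and render_multipage_ps are hypothetical names, the brace counting assumes there is no { or } inside quoted labels or comments and that each top-level graph starts on its own line, and the merge step assumes Ghostscript (gs) with its ps2write device is installed.

```shell
# Hypothetical helper: split a DOT file with several top-level graphs into
# PREFIX01.dot, PREFIX02.dot, ... by tracking brace depth.
# Assumes no "{" or "}" inside quoted labels or comments, and that each
# top-level graph starts on its own line.
split_graphs() {   # usage: split_graphs INPUT.dot PREFIX
  awk -v prefix="$2" '
    # Skip blank lines between graphs.
    depth == 0 && out == "" && /^[ \t]*$/ { next }
    {
      # A non-blank line at depth 0 with no open file starts a new graph.
      if (depth == 0 && out == "") out = sprintf("%s%02d.dot", prefix, ++n)
      print > out
      line = $0
      opens = gsub(/[{]/, "", line)
      closes = gsub(/[}]/, "", line)
      depth += opens - closes
      # Back at depth 0 after a closing brace: this graph is complete.
      if (depth == 0 && closes > 0) { close(out); out = "" }
    }' "$1"
}

# Hypothetical helper: render each graph with the cairo PostScript renderer
# (which handles UTF-8) and merge the pages with Ghostscript's ps2write device.
render_multipage_ps() {   # usage: render_multipage_ps INPUT.dot PREFIX OUTPUT.ps
  split_graphs "$1" "$2" || return
  for part in "$2"[0-9][0-9].dot; do
    dot -Tps:cairo "$part" > "${part%.dot}.ps" || return
  done
  gs -q -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile="$3" "$2"[0-9][0-9].ps
}
```

Usage would be, for example:

render_multipage_ps graph.dot part output.ps

The two-digit numbering keeps the shell glob in page order for up to 99 graphs.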


Answer 1:


Apparently dot's PS driver can't handle any encoding other than the old ISO-8859-1 (Latin-1). I don't think it can change fonts either.

One thing you can do is run a filter over dot's PostScript output. The following Perl program does that; it's an adaptation of some code I had. It converts the text from UTF-8 to a modified ISO-8859-1 encoding: Latin-1 characters keep their own codes, and characters outside Latin-1 are assigned the codes from 128 to 255 that no Latin-1 character in the document already uses.

Of course, the output still depends on the font containing those characters. Since dot (I think) only uses the default PostScript fonts, anything beyond standard Latin characters is out of the question.

It works with Ghostscript or with any interpreter which defines AdobeGlyphList.

The filter should be used this way:

dot -Tps graph.dot | perl reenc.pl > output.ps

Here it is:

#!/usr/bin/perl

use strict;
use warnings;
use open qw(:std :utf8);

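# Slurp the whole PostScript file (decoded as UTF-8 by "use open").
my $ps = do { local $/; <STDIN> };

# Collect every non-ASCII character and record which Latin-1 codes
# (128-255) are already taken by characters that keep their own code.
my %high;
my %in_use;
foreach my $char (split //, $ps) {
    my $code = ord($char);    # Unicode code point of the character
    if ($code > 127) {
        $high{$char} = $code;
        if ($code < 256) {
            $in_use{$code} = 1;
        }
    }
}

# Latin-1 characters become octal escapes of their own code; every other
# character gets the next free code >= 128, recorded in %repl
# (new code => Unicode code point) for the PostScript side.
my %repl;
my $i = 128;
foreach my $char (keys %high) {
    if ($in_use{$high{$char}}) {
        $ps =~ s/$char/sprintf("\\%03o", $high{$char})/ge;
        next;
    }
    while ($in_use{$i}) { $i++; }
    die "Too many distinct non-ASCII characters for one 8-bit encoding\n"
        if $i > 255;
    $repl{$i} = $high{$char};
    $ps =~ s/$char/sprintf("\\%03o", $i)/ge;
    $i++;
}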
my $psprocs = <<"EOPS";
/EncReplacements <<
  @{[ join(" ", %repl) ]}
>> def
/RevList AdobeGlyphList length dict dup begin
  AdobeGlyphList { exch def } forall
end def
% code -- (uniXXXX)
/uniX { 16 6 string cvrs dup length 7 exch sub exch
  (uni0000) 7 string copy dup  4 2 roll putinterval } def
% font code -- glyphname
/unitoname { dup RevList exch known
  { RevList exch get }
  { uniX cvn } ifelse
  exch /CharStrings get 1 index known not
  { pop /.notdef } if
} def
/chg-enc { dup length array copy EncReplacements
  { currentdict exch unitoname 2 index 3 1 roll put } forall
} def
EOPS

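# Make dot's font setup use the patched encoding vector, and insert the
# helper procedures at the start of the prolog so they are defined first.
$ps =~ s{/Encoding EncodingVector def}{/Encoding EncodingVector chg-enc def};
$ps =~ s/(%%BeginProlog)/$1\n$psprocs/;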

print $ps;
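Since the filter rewrites every non-ASCII character as an octal escape such as \373, its output should be plain ASCII. A minimal sanity check, assuming a POSIX grep; is_pure_ascii is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if FILE contains only printable ASCII and
# whitespace, i.e. no raw UTF-8 bytes survived the reencoding filter.
is_pure_ascii() {   # usage: is_pure_ascii FILE
  # In the C locale every byte >= 0x80 falls outside [:print:] and [:space:].
  ! LC_ALL=C grep -q '[^[:print:][:space:]]' "$1"
}
```

Since the rewritten encoding relies on AdobeGlyphList, Ghostscript is a natural interpreter for the result; its ps2pdf script can also convert it to PDF, for example:

dot -Tps graph.dot | perl reenc.pl > output.ps && is_pure_ascii output.ps && ps2pdf output.ps output.pdf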



Answer 2:


Generating PDF or SVG output also sidesteps the encoding problem (note, though, that the question reports -Tpdf keeping only the first graph):

dot -Tpdf chs.dot > chs.pdf

or

dot -Tsvg chs.dot > chs.svg


Source: https://stackoverflow.com/questions/27732134/how-can-i-make-dot-correctly-process-utf-8-to-postscript-and-have-multiple-graph
