Question
This dot source
graph A
{
    a;
}
graph B
{
    "Enûma Eliš";
}
when compiled with dot -Tps, generates this warning:
Warning: UTF-8 input uses non-Latin1 characters which cannot be handled by this PostScript driver
I can fix the UTF-8 problem by passing -Tps:cairo, but then only graph A appears in the output: it is truncated to a single page. The same happens with -Tpdf. There are no other PostScript drivers available on my installation.
I could split the graphs into separate files and concatenate them afterwards, but I'd rather not. Is there a way to have correct UTF-8 handling and multiple page output?
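For reference, the splitting step of that workaround could look roughly like the Python sketch below. split_graphs is a hypothetical helper name; the sketch assumes the file contains no comments or HTML-like labels with unbalanced braces, and it leaves rendering and concatenation to other tools.

```python
def split_graphs(source):
    """Return the top-level graph blocks of a DOT file as separate strings."""
    graphs = []
    start = 0          # index where the current top-level block begins
    depth = 0          # current brace nesting level
    in_quote = False   # True while inside a double-quoted ID
    i = 0
    while i < len(source):
        ch = source[i]
        if in_quote:
            if ch == "\\":
                i += 1             # skip the escaped character
            elif ch == '"':
                in_quote = False
        elif ch == '"':
            in_quote = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:         # a top-level graph just closed
                graphs.append(source[start:i + 1].strip())
                start = i + 1
        i += 1
    return graphs
```

Each returned string could then be written to its own file and rendered separately, for example with dot -Tps:cairo.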
Answer 1:
Apparently the dot PS driver can't handle encodings other than the old ISO 8859-1. I don't think it can change fonts either.
One thing you can do is run a filter over dot's PostScript output. The following Perl program does that; it is an adaptation of some code I had. It changes the encoding from UTF-8 to a modified ISO encoding in which the extra characters replace unused ones.
Of course, the output still depends on the font having the characters. Since dot (I think) only uses the default PostScript fonts, anything beyond "standard Latin" is out of the question...
It works with Ghostscript or with any interpreter that defines AdobeGlyphList.
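To make the remapping idea concrete, here is a minimal Python sketch of just the slot-assignment step, separate from the Perl filter. The names assign_slots and escape are made up for this illustration, and the sketch assumes the text has already been decoded from UTF-8.

```python
def assign_slots(text):
    """Map each non-ASCII character in text to a single-byte code (128-255)."""
    high = {ch for ch in text if ord(ch) > 127}
    # Latin-1 characters keep their own code, so those slots are taken.
    in_use = {ord(ch) for ch in high if ord(ch) < 256}
    slots = {}
    next_free = 128
    for ch in sorted(high):
        if ord(ch) < 256:
            slots[ch] = ord(ch)
            continue
        # Any other character is moved to the next unused slot.
        while next_free in in_use:
            next_free += 1
        if next_free > 255:
            raise ValueError("more distinct characters than free slots")
        slots[ch] = next_free
        next_free += 1
    return slots


def escape(text, slots):
    """Replace every mapped character with a PostScript octal escape."""
    return "".join(f"\\{slots[ch]:03o}" if ch in slots else ch for ch in text)
```

For the label in the question, û (U+00FB) already fits in one byte and keeps code 251, while š (U+0161) is moved to the first free slot, 128.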
The filter should be used this way:
dot -Tps graph.dot | perl reenc.pl > output.ps
Here it is:
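#!/usr/bin/perl
use strict;
use warnings;
use open qw(:std :utf8);
# Slurp the whole PostScript document from standard input.
my $ps = do { local $/; <STDIN> };
my %high;    # non-ASCII character => code point
my %in_use;  # Latin-1 codes (128-255) that occur in the input
foreach my $char (split //, $ps) {
    # ord() returns the full code point; unpack("C") would wrap values above 255.
    my $code = ord($char);
    if ($code > 127) {
        $high{$char} = $code;
        if ($code < 256) {
            $in_use{$code} = 1;
        }
    }
}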
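my %repl;    # free slot => code point of the character moved there
my $i = 128;
foreach my $char (keys %high) {
    # Latin-1 characters keep their own code.
    if ($in_use{$high{$char}}) {
        $ps =~ s/$char/sprintf("\\%03o", $high{$char})/ge;
        next;
    }
    # Anything else takes the next unused code.
    while ($in_use{$i}) { $i++; }
    $repl{$i} = $high{$char};
    $ps =~ s/$char/sprintf("\\%03o", $i)/ge;
    $i++;
}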
my $psprocs = <<"EOPS";
/EncReplacements <<
@{[ join(" ", %repl) ]}
>> def
/RevList AdobeGlyphList length dict dup begin
AdobeGlyphList { exch def } forall
end def
% code -- (uniXXXX)
/uniX { 16 6 string cvrs dup length 7 exch sub exch
(uni0000) 7 string copy dup 4 2 roll putinterval } def
% font code -- glyphname
/unitoname { dup RevList exch known
{ RevList exch get }
{ uniX cvn } ifelse
exch /CharStrings get 1 index known not
{ pop /.notdef } if
} def
/chg-enc { dup length array copy EncReplacements
{ currentdict exch unitoname 2 index 3 1 roll put } forall
} def
EOPS
$ps =~ s{/Encoding EncodingVector def}{/Encoding EncodingVector chg-enc def};
$ps =~ s/(%%BeginProlog)/$1\n$psprocs/;
print $ps;
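The glyph-name fallback performed by the unitoname procedure can be sketched in Python as follows. This is only an illustration: GLYPH_NAMES is a two-entry excerpt standing in for the interpreter's full AdobeGlyphList, font_glyphs stands in for the font's CharStrings dictionary, and glyph_name is a hypothetical helper name. It covers code points up to U+FFFF.

```python
# Tiny illustrative excerpt; a real interpreter supplies the full Adobe Glyph List.
GLYPH_NAMES = {0x00FB: "ucircumflex", 0x0161: "scaron"}


def glyph_name(code_point, font_glyphs):
    """Pick the glyph name to place in the encoding vector for code_point."""
    # Prefer the standard name, otherwise fall back to the uniXXXX form.
    name = GLYPH_NAMES.get(code_point, f"uni{code_point:04X}")
    # A glyph the font does not contain is replaced by .notdef.
    return name if name in font_glyphs else ".notdef"
```

The last step is why the result still depends on the font: a character whose glyph is missing ends up as .notdef no matter how it is encoded.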
Answer 2:
Generating PDF or SVG could bypass the encoding problem too.
dot -Tpdf chs.dot > chs.pdf
# or
dot -Tsvg chs.dot > chs.svg
Source: https://stackoverflow.com/questions/27732134/how-can-i-make-dot-correctly-process-utf-8-to-postscript-and-have-multiple-graph