gnuplot: Heatmap using character combinations

Submitted by 馋奶兔 on 2020-01-14 22:54:51

Question


I am currently analysing two-character combinations in texts and I want to visualize their frequencies in a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of that combination):

a a COUNT
a b COUNT
...
z y COUNT
z z COUNT

Now I'd like to create a heatmap (like the first one that is shown on this site). On the x axis as well as on the y axis I'd like to display the characters from a to z, i.e.

a
b
...
z
     a b ... z

I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which results in the error message "Can't plot with an empty x range". My naive attempt to fix this by running set xrange['a':'z'] did not help much.

There are a bunch of related questions on SO, but they either deal with numeric x-values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names)

So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?


Answer 1:


You need to convert the alphabet characters to integers, because gnuplot expects numeric x and y coordinates. It might be possible to do this somehow in gnuplot, but it would probably be messy.

My solution would be to use a quick Python script to convert the data file (let's say it is called data.dat):

#!/usr/bin/env python3

# Convert "char char count" lines into "x y count" with a=0, b=1, ..., z=25.
with open('data.dat', 'r') as infile, open('data2.dat', 'w') as outfile:
    for line in infile:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        x = ord(fields[0].lower()) - ord('a')
        y = ord(fields[1].lower()) - ord('a')
        outfile.write("%d %d %s\n" % (x, y, fields[2]))

This takes a file like this:

a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9

and converts it to:

0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9
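One caveat, stated as an assumption beyond what the question shows: if some combinations never occur and are therefore missing from the input file, the converted data is no longer a complete grid, which the image plot style generally expects. A minimal sketch of a variant that fills missing pairs with 0 could look like this; the function name to_grid_rows and the alphabet parameter are made up for illustration.

```python
import string

def to_grid_rows(lines, alphabet=string.ascii_lowercase):
    """Return 'x y count' rows for every pair in alphabet, using 0 for missing pairs.

    Hypothetical helper: characters outside alphabet raise a KeyError.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        first, second, count = fields
        counts[(index[first.lower()], index[second.lower()])] = count
    n = len(alphabet)
    # Emit the full n x n grid, first character varying slowest.
    return ["%d %d %s" % (x, y, counts.get((x, y), "0"))
            for x in range(n) for y in range(n)]

# Small example with a three-letter alphabet and several pairs missing.
rows = to_grid_rows(["a a 1", "a c 3", "b b 5"], alphabet="abc")
print("\n".join(rows))
```

With the default alphabet this always yields 26 x 26 = 676 rows, regardless of how many pairs appear in the input.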

Then you can plot it in gnuplot:

#!/usr/bin/env gnuplot

set terminal pngcairo
set output 'test.png'

set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)

set xlabel 'First Character'
set ylabel 'Second Character'

set title 'Character Combination Counts'

plot 'data2.dat' with image

It's a little clunky to set the tics manually that way, but it works fine.
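For the full alphabet, typing 26 tic labels by hand gets tedious. As a rough sketch, the two set xtics / set ytics lines could be generated with a few lines of Python and pasted into the gnuplot script; the helper name tics_command is an assumption for illustration, not part of gnuplot.

```python
import string

def tics_command(axis, alphabet=string.ascii_lowercase):
    """Build a gnuplot 'set xtics (...)' or 'set ytics (...)' line.

    axis is "x" or "y"; each character is labelled at its 0-based index,
    matching the a=0, b=1, ... mapping used by the conversion script.
    """
    pairs = ", ".join('"%s" %d' % (ch, i) for i, ch in enumerate(alphabet))
    return "set %stics (%s)" % (axis, pairs)

# Three-letter example, same form as the hand-written lines above.
print(tics_command("x", "abc"))
print(tics_command("y", "abc"))

# Full alphabet, ready to paste into the gnuplot script.
print(tics_command("x"))
```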




Answer 2:


This might still be of interest to someone who wants to avoid an external script.

Basically, gnuplot is missing an ord() function. You could work around that in a messy way, but I would say the solution below is "relatively clean", and at least it needs no external script.

### gnuplot implementation of ord() function

ASCII =  ' !"' . "#$%&'()*+,-./0123456789:;<=>?@".\
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ[" . "\\" . "]^_" . "\`".\
         "abcdefghijklmnopqrstuvwxyz{|}~"

ord(c) = (tmp = strstrt(ASCII,c) ) > 0 ? tmp+31 : 0

# check all ord values
w = "            "
do for [i=1:strlen(ASCII)] {
    w = (i+1)%10==0 ? w."\n" : w
    w = w.substr(ASCII,i,i).'='.sprintf("%03d",ord(substr(ASCII,i,i)))." "
}
print w
### end of code

Result:

             =032 !=033 "=034 #=035 $=036 %=037 &=038 '=039 
(=040 )=041 *=042 +=043 ,=044 -=045 .=046 /=047 0=048 1=049 
2=050 3=051 4=052 5=053 6=054 7=055 8=056 9=057 :=058 ;=059 
<=060 ==061 >=062 ?=063 @=064 A=065 B=066 C=067 D=068 E=069 
F=070 G=071 H=072 I=073 J=074 K=075 L=076 M=077 N=078 O=079 
P=080 Q=081 R=082 S=083 T=084 U=085 V=086 W=087 X=088 Y=089 
Z=090 [=091 \=092 ]=093 ^=094 _=095 `=096 a=097 b=098 c=099 
d=100 e=101 f=102 g=103 h=104 i=105 j=106 k=107 l=108 m=109 
n=110 o=111 p=112 q=113 r=114 s=115 t=116 u=117 v=118 w=119 
x=120 y=121 z=122 {=123 |=124 }=125 ~=126 
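The lookup-table trick (position in the string of printable ASCII characters, plus 31) can be cross-checked outside gnuplot. The following is only a sketch that mimics the gnuplot definition in Python and compares it with Python's built-in ord(); the name table_ord is made up for this check.

```python
# Printable ASCII characters, code points 32 (space) through 126 (~),
# in the same order as the gnuplot ASCII string above.
ASCII = "".join(chr(code) for code in range(32, 127))

def table_ord(c):
    """Mimic the gnuplot ord(): 1-based position in ASCII plus 31, or 0 if absent."""
    pos = ASCII.find(c) + 1  # str.find is 0-based and returns -1 when missing
    return pos + 31 if pos > 0 else 0

# Every printable ASCII character should agree with the built-in ord().
mismatches = [c for c in ASCII if table_ord(c) != ord(c)]
print(mismatches)

# Offsets relative to 'a' give the 0..25 indices needed for the heatmap axes.
print(table_ord("a") - table_ord("a"), table_ord("z") - table_ord("a"))
```

In gnuplot itself, the ord() function defined above could presumably be applied directly in the using clause, along the lines of using (ord(strcol(1))-ord("a")):(ord(strcol(2))-ord("a")):3, though that has not been tested here.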


Source: https://stackoverflow.com/questions/20428010/gnuplot-heatmap-using-character-combinations
