How do I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?
Is there a tool or a library I can use to process a .tt
Here is a POSIX[1] shell script that prints each code point next to its character, using fc-match as mentioned in Neil Mayhew's answer. It handles code points of up to 8 hex digits:
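    #!/bin/sh
    count=0
    # fc-match prints the charset as hex ranges; split each into start and end
    for range in $(fc-match --format='%{charset}\n' "$1"); do
        for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
            n_hex=$(printf "%04x" "$n")
            # use \U (not \u) so code points with 5 or more hex digits work
            printf "%-5s\U$n_hex\t" "$n_hex"
            count=$((count + 1))
            # start a new line after every 10 characters
            if [ $((count % 10)) = 0 ]; then
                printf "\n"
            fi
        done
    done
    printf "\n"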
Assuming the script is saved as ls-chars, you can pass a font name or anything else that fc-match accepts:

    $ ls-chars "DejaVu Sans"
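The range splitting in the script relies on two parameter expansions, which can be tried in isolation without any font installed. The sketch below is only an illustration: `expand_range` is a hypothetical helper, not part of the script above, and it assumes charset tokens look like `41-44` (a range) or a lone `20` (a single code point). It uses shell arithmetic instead of seq, so it does not depend on seq accepting hex input.

```shell
#!/bin/sh
# expand_range (hypothetical helper): print every code point covered by
# one charset token, one zero-padded lowercase hex value per line.
expand_range() {
    start=$((0x${1%-*}))   # text before the dash, or the whole token
    end=$((0x${1#*-}))     # text after the dash, or the whole token
    n=$start
    while [ "$n" -le "$end" ]; do
        printf '%04x\n' "$n"
        n=$((n + 1))
    done
}

expand_range 41-44   # a range of four code points
expand_range 20      # a single code point
```

When a token has no dash, both expansions leave it unchanged, so the start and end are the same value and the loop runs exactly once.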
Updated content:
I learned that spawning a subshell is expensive, and the printf command substitution in my script runs once per character. So I wrote an improved version that is 5-10 times faster:
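    #!/bin/sh
    count=0
    # first loop: emit every code point as hex, one per line, with no
    # command substitution inside the loop
    for range in $(fc-match --format='%{charset}\n' "$1"); do
        for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
            printf "%04x\n" "$n"
        done
    done | while read -r n_hex; do
        # second loop: print the code point and its character, 10 per line
        count=$((count + 1))
        printf "%-5s\U$n_hex\t" "$n_hex"
        [ $((count % 10)) = 0 ] && printf "\n"
    done
    printf "\n"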
Old version:
    $ time ls-chars "DejaVu Sans" | wc
    592 11269 52740
    real 0m2.876s
    user 0m2.203s
    sys 0m0.888s
New version (the line count of 592, at 10 characters per line, indicates 5910+ characters, printed in 0.4 seconds!):

    $ time ls-chars "DejaVu Sans" | wc
    592 11269 52740
    real 0m0.399s
    user 0m0.446s
    sys 0m0.120s
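The difference between the two versions comes down to where printf runs. Here is a minimal sketch of the two patterns side by side. The function names `per_item_subshell` and `single_pipeline` are made up for this illustration, and the actual speedup will depend on the shell and the machine.

```shell
#!/bin/sh
# Pattern from the old version: a command substitution for each item,
# which in most shells means a new subshell per item.
per_item_subshell() {
    for n in "$@"; do
        h=$(printf '%04x' "$n")
        printf '[%s]\n' "$h"
    done
}

# Pattern from the new version: the first loop writes all values to a
# pipe, and a single while loop reads them back.
single_pipeline() {
    for n in "$@"; do
        printf '%04x\n' "$n"
    done | while read -r h; do
        printf '[%s]\n' "$h"
    done
}

per_item_subshell 65 66
single_pipeline 65 66
```

Both functions print the same lines. The second one avoids the per-item command substitution, which is the change that made the updated script faster.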
End of update
Sample output (it aligns better in my st terminal