What are the ways of checking if a piece of text in a PDF document is bold using iTextSharp

Submitted by ↘锁芯ラ on 2019-12-17 16:52:48

Question


I have an application that extracts headings from PDF files. The documents it is supposed to work with all have a more or less consistent structure and formatting, so being able to tell whether a text chunk is bold is very important. Recently I came across a bunch of files in which some chunks visually appear bold, but the string representation of the font does not contain "bold". The SO thread "how can i get text formatting with iTextSharp" helped me understand that there is one more way of making text appear bold. However, in my case calling GetTextRenderMode() does not help either: it returns 0, as if it were normal text. So are there any other ways of making text appear bold, and is it possible to detect them using iTextSharp?


Answer 1:


You are making the assumption that the font inside your PDF file knows if it's bold or not. Let's take a look inside and check if your assumption is correct.

This is what the subset JOJJAH of the font TT116t00 looks like when you look at the internals of the PDF file you have shared:

We see that the font is of subtype /TrueType, that the /ItalicAngle is 0, and that the 3rd bit of the /Flags is set. Let's check the PDF reference to find out what this tells us:

I quote:

The font contains glyphs outside the Adobe standard Latin character set.
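To make the flag check concrete, here is a minimal sketch in plain Python (not iTextSharp) that decodes a /Flags integer into the flag names listed in the PDF reference. The function name `decode_font_flags` and the sample value 4 are assumptions for illustration; the answer only states that the 3rd bit is set, and the actual /Flags value of the shared file is not given here.

```python
# Bit positions (1-based) of the /Flags entry in a PDF font descriptor,
# as listed in the font descriptor flags table of the PDF reference.
FONT_FLAGS = {
    1: "FixedPitch",
    2: "Serif",
    3: "Symbolic",      # glyphs outside the Adobe standard Latin character set
    4: "Script",
    6: "Nonsymbolic",
    7: "Italic",
    17: "AllCap",
    18: "SmallCap",
    19: "ForceBold",
}


def decode_font_flags(flags):
    """Return the names of the flag bits that are set in a /Flags integer."""
    return [name for bit, name in FONT_FLAGS.items() if flags & (1 << (bit - 1))]


# Hypothetical value: only the 3rd bit set (binary 100 = 4).
print(decode_font_flags(4))  # prints ['Symbolic']
```

Note that no flag in this table means "this font is bold". The closest one, ForceBold, is a rendering hint about painting bold glyphs at very small sizes, so even a complete decode of /Flags does not answer the question reliably.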

The glyphs look bold, because the glyphs are drawn in a way that they appear bold. You see the font as bold because you are human. However, when a machine looks at the font, it doesn't have a clue that the font is bold. A machine just follows the instructions stored in the /FontFile2 stream.

In short: iTextSharp has no indication that the font is bold.



Source: https://stackoverflow.com/questions/28065269/what-are-the-ways-of-checking-if-piece-of-text-in-pdf-documernt-is-bold-using-it
