问题
I have a binary which has some text on different parts of the image like at the bottom, top, center, right middle center, etc.
Original Image
The areas I would like to focus on are the manually drawn regions shown in red.
I calculated the horizontal and vertical summations of the image and plotted them:
plot(sum(edgedImage1,1))
plot(sum(edgedImage1,2))
Can somebody give me explanation of what these plots are telling me about the original image with regards to the structure of which I explained above? Moreover, how could these plots help me extracting those regions I just manually drew in red?
回答1:
There's nothing sophisticated about the sum operation. Simply put, sum(edgedImage1,1) computes the sum of all rows for each column in the image and that is what you are plotting. Effectively, you are computing the sum of all non-zero values (i.e. white pixels) over all rows for each column. The horizontal axis in the plot denotes what row's sum you are observing. Similarly, sum(edgedImage,2) computes the sum of all columns for each row of the image and that is what you are plotting.
Because your text is displayed in a horizontal fashion, sum(edgeImage,1) won't be particularly useful. What is very useful is the sum(edgedImage,2) operation. For lines in your image that are blank, the horizontal sum of columns for each row of your image should be a very small value whereas for lines in your image that contain text or strokes, the sum should be quite large. A good example of what I'm talking about is seen towards the bottom of your image. If you consult between rows 600 and 700, you see a huge spike in your plot as there is a lot of text that is surrounded between those rows.
Using this result, a crude way to determine what areas in your image that contain text or strokes in your case would be to find all rows that surpass some threshold. Combined with finding modes or peaks from the sum operation that was just performed, you can very easily localize and separate out each region of text.
You may want to smooth the curve provided by sum(edgedImage,2) if you decide to determine how many blobs of text there are. Once you smooth out this signal, you will clearly see that there are 5 modes corresponding to 5 lines of text.
回答2:
The second plot that shows the sum of each row. This can tell you in which rows you have a lot of information and in which you have none.
You can use this plot to find the rectangles by looking for a sharp incline in the value for a start of a rectangle and sharp decline in the value for the end of the rectangle. Before you do it i would low pass filter the data and then look at the derivative of this and look for a big derivative.
You can do the same the first plot but it is more sensitive.
回答3:
The minimums in your last plot are the gaps between lines of text ...
You just take the graph and align its y axis to y axis of image and then Threshold areas with too small amount of pixels per column. These areas (Red) are the gaps between lines of Text or whatever you got on the image:
Now you should de-skew the image if needed. If the skew is too big you need to apply the de-skew even before the y axis summation.
- De-skew operation characters in binary image (matlab)
After this you make the x axis summation plot for each non red region separately and in the same manner detect gaps between characters/words to get the area of each character for OCR. This time you align to x axis
These plots can be also used to OCR if taken on character region instead see
- OCR and character similarity
If you do a statistical analysis of the gap/non gap sizes then the most recurrent one is usually the Font spacing/size for regular text.
来源:https://stackoverflow.com/questions/39153333/interpretation-of-horizontal-and-vertical-summations-of-an-image