Unable to improve the Mask R-CNN model for document images?

Question

I am training a model to extract all the necessary fields from a resume, using Mask R-CNN to detect the fields in the image. I have trained my Mask R-CNN model on 1000 training samples with 49 fields to extract, but I am unable to improve the accuracy. How can I improve the model? Are there any pretrained weights that may help?

Answer 1:

It looks like you want to do text classification/processing: you need to extract details from the text, but you are applying an object detection algorithm. I believe you need to use OCR to extract the text (if you have the CV as an image) and then use a text classification model. Check out the links below for more information about text classification:

https://medium.com/@armandj.olivares/a-basic-nlp-tutorial-for-news-multiclass-categorization-82afa6d46aa5

https://www.tensorflow.org/tutorials/tensorflow_text/intro
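A minimal sketch of this OCR-then-classify idea, assuming the OCR step (for example Tesseract) has already turned the CV into short text snippets, one per candidate field. To stay dependency-free it uses a hand-written multinomial Naive Bayes classifier with Laplace smoothing rather than the models from the tutorials linked above; the class name, the three labels and the training snippets are hypothetical, and a real system would need labeled snippets for all 49 fields.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFieldClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words token counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # label -> number of snippets
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {lbl: sum(c.values()) for lbl, c in self.token_counts.items()}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        n_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods of each token
            score = math.log(doc_count / n_docs)
            denom = self.total_tokens[label] + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data standing in for OCRed, hand-labeled resume snippets.
train_texts = [
    "Bachelor of Science in Computer Science",
    "Master of Business Administration",
    "Python Java SQL machine learning",
    "Keras TensorFlow deep learning",
    "Software Engineer at Acme Corp 2015 2019",
    "Data Analyst at Example Inc 2019 present",
]
train_labels = ["education", "education", "skills", "skills", "experience", "experience"]

clf = NaiveBayesFieldClassifier().fit(train_texts, train_labels)
print(clf.predict("Master of Science in Statistics"))  # → education
```

Naive Bayes is only a baseline: it ignores word order and where the text sits on the page, so it is best treated as a first model to beat.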

Answer 2:

You can break up the problem in two different ways.

Step 1: OCR seems to be the most direct way to get to your data, but increase the image size (and thus the resolution) first; otherwise, you may lose data.
Step 2: Store the coordinates of each OCRed word. This is valuable information in this context, because how words line up carries meaning.
Step 3: At this point you can try basic positional clustering to group words. However, this can easily fail when related text is laid out in columns rather than in rows.
Step 4: See if you can identify which of the 49 tags each cluster belongs to. For the classification, look at Hidden Markov models and the Baum-Welch algorithm; i.e., go for basic models first.
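Steps 2 and 3 can be sketched as follows, assuming each OCRed word is already available with its bounding box (pytesseract's `image_to_data`, for instance, reports `left`, `top`, `width` and `height` per word). The `Word` class, the function names and the `gap_factor` threshold are hypothetical. Words are merged into lines by vertical overlap, then consecutive lines are merged into blocks until a vertical gap larger than a typical word height appears. Because lines are formed purely by vertical overlap, words from side-by-side columns end up on the same line, which is exactly the failure mode mentioned in step 3.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    """Hypothetical container for one OCRed word and its pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def bottom(self):
        return self.top + self.height

    @property
    def center_y(self):
        return self.top + self.height / 2


def group_into_lines(words):
    """Attach each word to the first line whose vertical span contains the word's centre."""
    lines = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        for line in lines:
            top = min(w.top for w in line)
            bottom = max(w.bottom for w in line)
            if top <= word.center_y <= bottom:
                line.append(word)
                break
        else:
            lines.append([word])
    # Order the words left to right within each line.
    return [sorted(line, key=lambda w: w.left) for line in lines]


def group_into_blocks(lines, gap_factor=1.0):
    """Start a new block when the gap to the previous line exceeds gap_factor * median word height."""
    if not lines:
        return []
    typical_height = median(w.height for line in lines for w in line)
    blocks = [[lines[0]]]
    for prev, line in zip(lines, lines[1:]):
        gap = min(w.top for w in line) - max(w.bottom for w in prev)
        if gap > gap_factor * typical_height:
            blocks.append([line])
        else:
            blocks[-1].append(line)
    return blocks


# Toy word boxes standing in for OCR output.
words = [
    Word("John", 10, 10, 40, 12), Word("Smith", 55, 11, 50, 12),
    Word("john@example.com", 10, 26, 150, 12),
    Word("Experience", 10, 70, 90, 14),
    Word("Engineer,", 10, 90, 80, 12), Word("Acme", 95, 91, 45, 12),
]
for block in group_into_blocks(group_into_lines(words)):
    print(" | ".join(" ".join(w.text for w in line) for line in block))
# prints:
# John Smith | john@example.com
# Experience | Engineer, Acme
```

Each resulting block can then be passed to a classifier for step 4.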

Or: the approach above ignores the inherent classification opportunity offered by the image itself, which is, well, a properly formatted CV.

Step 1: Train your model to partition the image into sections without OCR. A good model should not break up sentences, tables, etc. This approach may leverage separator lines and similar layout cues. There is also an opportunity to decrease the size of your image, since you are not OCRing yet.
Step 2: OCR the image sections and try to classify them as above.
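The answer suggests training a model for the partitioning step. As an untrained baseline (a swapped-in classical technique, not the answerer's method), the sketch below splits an already binarized page into horizontal sections with a row projection profile, treating runs of blank rows and full-width separator lines as boundaries, after optionally shrinking the page. The function names, the `min_gap` and `sep_ratio` thresholds and the toy page are hypothetical; a real page would first need to be binarized with an image library.

```python
def downsample(binary, factor):
    """Shrink a 0/1 image by taking the max over factor x factor tiles, so thin strokes survive."""
    h, w = len(binary), len(binary[0])
    return [
        [
            max(binary[y][x]
                for y in range(r, min(r + factor, h))
                for x in range(c, min(c + factor, w)))
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]


def split_sections(binary, min_gap=2, sep_ratio=0.9):
    """Return (start, end) row ranges of ink bands.

    binary: list of rows with 1 = ink and 0 = background.
    A band ends after min_gap blank rows or at a row that is at least
    sep_ratio ink (treated as a separator line, never part of a section).
    """
    width = len(binary[0])
    sections, start, end, blank_run = [], None, None, 0
    for y, row in enumerate(binary):
        ink = sum(row)
        if ink >= sep_ratio * width:
            # Full-width rule: always a boundary.
            if start is not None:
                sections.append((start, end))
            start, blank_run = None, 0
        elif ink > 0:
            if start is None:
                start = y
            end, blank_run = y + 1, 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                sections.append((start, end))
                start, blank_run = None, 0
    if start is not None:
        sections.append((start, end))
    return sections


# Toy 12 x 10 page: a heading block, a separator rule, then a body block.
page = [[0] * 10 for _ in range(12)]
for y in (0, 1, 3):       # heading lines with a single blank row between them
    page[y][1:6] = [1] * 5
page[5] = [1] * 10        # full-width separator line
for y in (7, 8):          # body text
    page[y][2:8] = [1] * 6

print(split_sections(page))                 # → [(0, 4), (7, 9)]
print(split_sections(downsample(page, 2)))  # → [(0, 2), (3, 5)]
```

Each returned row range can be cropped from the full-resolution image and sent to OCR, as in step 2.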

Answer 3:

Another option is to use a neural network such as PixelLink, described in "PixelLink: Detecting Scene Text via Instance Segmentation":

https://arxiv.org/pdf/1801.01315.pdf

Source: https://stackoverflow.com/questions/58679475/unable-to-improve-the-mask-rcnn-model-for-document-images

Tags

python

keras

deep-learning

object-detection