Unable to improve the mask RCNN model for document images?

Submitted by 吃可爱长大的小学妹 on 2019-12-25 00:26:50

Question


I am training a model to extract all the necessary fields from a resume, and I am using Mask R-CNN to detect the fields in the image. I trained the Mask R-CNN model on 1000 training samples with 49 fields to extract, but I am unable to improve the accuracy. How can I improve the model? Are there any pretrained weights that may help?


Answer 1:


It looks like what you want is text classification/processing: you need to extract details from the text, but you are applying an object detection algorithm. I believe you need to use OCR to extract the text (if you have the CV as an image) and then apply a text classification model. The links below have more information about text classification:

https://medium.com/@armandj.olivares/a-basic-nlp-tutorial-for-news-multiclass-categorization-82afa6d46aa5

https://www.tensorflow.org/tutorials/tensorflow_text/intro
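As a rough illustration of the classification half of this suggestion, here is a minimal sketch of a multinomial naive Bayes classifier in plain Python. It is a stand-in for the models in the linked tutorials, not what they implement. The labels (`name`, `skills`, `education`) and the training snippets are made up for the example, and the input is assumed to be text that OCR has already produced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; real OCR output would need more cleanup.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.label_counts = Counter()            # label -> number of samples
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (OCRed text snippet, field label).
samples = [
    ("john doe", "name"),
    ("jane smith", "name"),
    ("python sql docker", "skills"),
    ("java python git", "skills"),
    ("bsc computer science", "education"),
    ("msc data science", "education"),
]

clf = NaiveBayes()
clf.fit(samples)
print(clf.predict("python git"))
print(clf.predict("msc computer science"))
```

With 49 fields and real resumes, a model like this would need far more labeled text per field, but it is a cheap baseline to try before anything heavier.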




Answer 2:


You can break up the problem in two different ways.

Step 1 - OCR seems to be the most direct way to get to your data. Increase the image size, and thus the resolution, first; otherwise you may lose data.

Step 2 - Store the coordinates of each OCRed word. This is valuable information in this context: how words line up has significance.

Step 3 - At this point you can try basic positional clustering to group words. However, this can easily fail on a columnar vs. row-based distribution of related text.

Step 4 - See if you can identify which of the 49 tags these clusters belong to. Look at text classification with hidden Markov models and the Baum-Welch algorithm, i.e. go for basic models first.
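Steps 2 and 3 above can be sketched roughly as follows. This is only one simple form of positional clustering: it groups words into text lines by the vertical centre of their bounding boxes. The `Word` record and the sample coordinates are hypothetical; in practice they would come from whatever word-level boxes your OCR engine reports.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    w: float  # box width
    h: float  # box height


def centre_y(word):
    return word.y + word.h / 2


def group_into_lines(words, tolerance=0.5):
    """Group words whose vertical centre lies within tolerance * height
    of the running mean centre of the current line."""
    lines = []
    for word in sorted(words, key=centre_y):
        if lines:
            current = lines[-1]
            mean_centre = sum(centre_y(wd) for wd in current) / len(current)
            if abs(centre_y(word) - mean_centre) <= tolerance * word.h:
                current.append(word)
                continue
        lines.append([word])
    # Restore reading order inside each line.
    return [sorted(line, key=lambda wd: wd.x) for line in lines]


def lines_to_text(lines):
    return [" ".join(wd.text for wd in line) for line in lines]


# Hypothetical OCR output, deliberately out of reading order.
words = [
    Word("Doe", 60, 11, 30, 10),
    Word("John", 10, 10, 40, 10),
    Word("Python,", 70, 41, 50, 10),
    Word("Skills:", 10, 40, 50, 10),
    Word("SQL", 130, 39, 30, 10),
]

print(lines_to_text(group_into_lines(words)))
# prints ['John Doe', 'Skills: Python, SQL']
```

As noted in Step 3, a purely row-based grouping like this merges text across columns, so a two-column resume would need a horizontal split first.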

Or: the approach above ignores the inherent classification opportunity offered by the image of a, well, properly formatted CV.

Step 1 - Train your model to partition the image into sections without OCR. A good model should not break up sentences, tables, etc. This approach may leverage separator lines and similar cues. There is also an opportunity to decrease the size of your image, since you are not OCRing yet.

Step 2 - OCR the image sections and try to classify them as described above.
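To make "partition first, OCR second" concrete, here is a deliberately simple, non-learned baseline: a horizontal projection profile that splits a binarized page wherever enough consecutive rows contain no ink. This is not the trained model suggested above, only a sketch of what a section boundary looks like in pixel terms. The tiny image and the `min_gap` value are assumptions made for illustration.

```python
def split_into_row_bands(image, min_gap=2):
    """Return (start, end) row ranges, end exclusive, that contain ink.

    `image` is a list of rows, each a list of 0/1 pixels (1 = ink).
    A new band starts after at least `min_gap` consecutive blank rows.
    """
    bands = []
    start = None     # first inked row of the current band
    last_ink = None  # most recent inked row seen
    for r, row in enumerate(image):
        if any(row):
            if start is None:
                start = r
            elif r - last_ink - 1 >= min_gap:
                bands.append((start, last_ink + 1))
                start = r
            last_ink = r
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


# Hypothetical 9-row binary page: two blocks separated by two blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],  # single blank row: too small to split on
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(split_into_row_bands(page))
# prints [(1, 5), (7, 8)]
```

Each returned band could then be cropped and passed to OCR on its own, which is the second step of this approach.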




Answer 3:


Another option is to use a neural network such as PixelLink: Detecting Scene Text via Instance Segmentation

https://arxiv.org/pdf/1801.01315.pdf



Source: https://stackoverflow.com/questions/58679475/unable-to-improve-the-mask-rcnn-model-for-document-images
