Extracting text from a table in R

Posted by 纵饮孤独 on 2021-01-25 03:56:23

Question


I am having significant trouble using the tabulizer package in R to extract text from tables. The issue is that the tables have a very odd structure (merged cells)...

I am trying to extract the section of the table that is highlighted in red. The numbers at the top of the highlighted portion are the days of the month. For each day, I need to record the values for "Row1" to "Row5". However, when I use the extract_tables function, I get the following table (only a small portion)...

For some reason, days 1, 2 and 3 are being squished into a single cell. Has anyone else run into this issue with tabulizer? I would specify the coordinates of the table I am trying to extract, but the position of the table changes with each PDF document. I also cannot select the region manually, because I am trying to automate the process. I can't upload the PDF to Dropbox and post the link here because I am on my work computer, but I can post it tonight if anyone wants to try this particular example. Any help or resources are very much appreciated!
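
To make the symptom concrete, here is a minimal base R sketch of one possible post-processing workaround: splitting a squished cell back into one value per column. The sample matrix is hypothetical (it is not the real extract_tables output, which is not shown here), and it assumes the merged values are separated by whitespace and that no cell is empty. The helper name split_row is made up for illustration.

```r
# Hypothetical stand-in for the extracted table: the first cell of each row
# holds the values for days 1-3 separated by spaces (an assumption).
squished <- matrix(
  c("1 2 3",    "4",  "5",
    "10 20 30", "40", "50"),
  nrow = 2, byrow = TRUE
)

# Split every cell of a row on runs of whitespace and flatten the pieces
# into a single character vector with one value per element.
split_row <- function(row) {
  unlist(strsplit(trimws(row), "[[:space:]]+"))
}

rows <- lapply(seq_len(nrow(squished)), function(i) split_row(squished[i, ]))

# Only bind the rows if they all produced the same number of fields;
# otherwise the whitespace assumption does not hold for this table.
stopifnot(length(unique(lengths(rows))) == 1)
fixed <- do.call(rbind, rows)

print(fixed)
```

This only repairs the result after extraction; it does not address why the cells are merged in the first place, and it would need a different separator or extra checks if the real cells contain spaces inside a single value.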

Source: https://stackoverflow.com/questions/60571187/extracting-text-from-a-table-in-r
