Running a JavaScript command from MATLAB to fetch a PDF file

予麋鹿 2020-12-19 13:40

I'm currently writing some MATLAB code to interact with my company's internal reports database. So far I can access the HTML abstract page using code which looks like this

3 answers
  •  别那么骄傲
    2020-12-19 14:11

    Once you have the correct URL (as described in the answer from pjp), your next problem is to "get the contents of the PDF file into a MATLAB variable". Whether this is possible depends on what you mean by "contents".


    If you want the raw data in the PDF file, I don't think there is currently a way to do this in MATLAB. The URLREAD function was the first thing I thought of for reading content from a URL into a string, but its documentation carries this note:

    s = urlread('url') reads the content at a URL into the string s. If the server returns binary data, s will be unreadable.

    Indeed, if you try to read a PDF as in the following example, s contains some text intermingled with mostly garbage:

    s = urlread('http://samplepdf.com/sample.pdf');
    

    If you want to get the text from the PDF file, you have some options. First, you can use URLWRITE to save the contents of the URL to a file:

    urlwrite('http://samplepdf.com/sample.pdf','temp.pdf');
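
    Since the download only helps if the URL really points at the PDF, it can be worth checking the saved file before handing it to a text extractor. The following is a minimal sketch, not part of the original answer: looksLikePdf is a hypothetical helper name, and it assumes the file was already saved locally (for example as temp.pdf by the URLWRITE call above) and that a valid PDF starts with the signature %PDF-.

```matlab
function tf = looksLikePdf(filename)
% looksLikePdf  Return true if FILENAME starts with the PDF signature '%PDF-'.
%   Hypothetical helper for illustration; save it as looksLikePdf.m.
%   Example: tf = looksLikePdf('temp.pdf');

    fid = fopen(filename, 'r');
    if fid == -1
        error('looksLikePdf:cannotOpen', 'Could not open %s', filename);
    end
    % Close the file when the function exits, even if an error occurs.
    cleanup = onCleanup(@() fclose(fid));

    % Read the first five bytes and convert them to a row vector of chars.
    header = fread(fid, 5, 'uint8=>char')';

    % strcmp returns false for files shorter than five bytes.
    tf = strcmp(header, '%PDF-');
end
```

    A wrong URL may yield an HTML login or error page rather than a PDF; in that case looksLikePdf('temp.pdf') returns false. URLWRITE can also return a success flag as its second output, [f, status] = urlwrite(...), which covers the case where the download itself fails.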
    

    Then you should be able to use one of two submissions on The MathWorks File Exchange to extract the text from the PDF:

    • Extract text from a PDF document by Dimitri Shvorob
    • PDF Reader by Tom Gaudette

    If you simply want to view the PDF, you can just open it in Adobe Acrobat with the OPEN function:

    open('temp.pdf');
    
