How does the Google Docs PDF viewer work?

Submitted by 痞子三分冷 on 2019-12-02 14:57:32
Ben Everard

Google is simply serving up an image (right-click -> Save As), with an overlay to highlight text.

You should check out this SO question where others go into more detail.

You should also look through the source of your PDF link; it appears that Google passes the PDF URL through to be converted into an image.

Example:

<script type="text/javascript">
  var gviewElement = document.getElementById('gview');
  var config = {
    'api': false,
    'chrome': true,
    'csi': true,
    'ddUrl': "http://www.idfcmf.com/downloads/monthly_fund/2009/IDFC-Premier-Equityfund-jan10.pdf",
    'element': gviewElement,
    'embedded': false,
    'initialQuery': "",
    'oivUrl': "http://docs.google.com/viewer?url\x3dhttp%3A%2F%2Fwww.idfcmf.com%2Fdownloads%2Fmonthly_fund%2F2009%2FIDFC-Premier-Equityfund-jan10.pdf",
    'sdm': 200,
    'userAuthenticated': true
  };

  var gviewApp = _createGView(config);
  gviewApp.setProgress(50);

  window.jstiming.load.name = 'view';
  window.jstiming.load.tick('_dt');
</script>
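
To make the relationship between 'ddUrl' and 'oivUrl' concrete, here is a minimal sketch of how a viewer URL of that shape can be rebuilt from a plain PDF link. The endpoint and the 'url' parameter come from the config above; the function name buildViewerUrl is made up for illustration and is not part of Google's code.

```javascript
// Hypothetical helper: rebuilds a URL of the same shape as 'oivUrl'.
// Only the endpoint and the 'url' query parameter are taken from the page source.
function buildViewerUrl(pdfUrl) {
  // '\x3d' in the page source is simply an escaped '=' inside a JS string literal.
  return "http://docs.google.com/viewer?url=" + encodeURIComponent(pdfUrl);
}

const pdfUrl =
  "http://www.idfcmf.com/downloads/monthly_fund/2009/IDFC-Premier-Equityfund-jan10.pdf";

console.log(buildViewerUrl(pdfUrl));
```

The percent-encoded part of the result matches the encoded link in 'oivUrl', which suggests the viewer just receives the original PDF address as a query parameter.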

Edit

Also, if you view the PDF viewer in Firefox with Firebug, you will notice that 'highlighting' text really only enables a set of divs. I'm guessing Google scans the document using OCR, detects where the text is, and provides a matrix of coordinates on which to base the div placement. When you click and drag, it interrogates the mouse pointer location to determine which divs to display.
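
The guess above can be sketched roughly as follows. This is not Google's code: the wordBoxes data shape, the function names, and the plain rectangle-intersection rule are all assumptions made for illustration (a real viewer would more likely select along the reading order of the text).

```javascript
// Hypothetical data: one box per detected word, in page-image pixel coordinates.
const wordBoxes = [
  { text: "Monthly", left: 10, top: 10, width: 60, height: 12 },
  { text: "Fund",    left: 75, top: 10, width: 35, height: 12 },
  { text: "Report",  left: 10, top: 30, width: 50, height: 12 },
];

// Normalise a drag (mouse-down point and current pointer) into a rectangle,
// so dragging up or to the left works the same as dragging down or right.
function dragToRect(start, current) {
  return {
    left: Math.min(start.x, current.x),
    top: Math.min(start.y, current.y),
    right: Math.max(start.x, current.x),
    bottom: Math.max(start.y, current.y),
  };
}

// Return the boxes whose area overlaps the drag rectangle.
// These are the overlay divs that would be switched on.
function boxesInDrag(boxes, start, current) {
  const r = dragToRect(start, current);
  return boxes.filter(
    (b) =>
      b.left < r.right &&
      b.left + b.width > r.left &&
      b.top < r.bottom &&
      b.top + b.height > r.top
  );
}

const hit = boxesInDrag(wordBoxes, { x: 0, y: 0 }, { x: 80, y: 20 });
console.log(hit.map((b) => b.text).join(" ")); // prints "Monthly Fund"
```

In a browser, the start and current points would come from mousedown and mousemove events, and each returned box would map to one absolutely positioned div.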

The whole thing is an image. The text highlight overlay is easy to figure out. But when you press Ctrl+C and it copies to the clipboard, that part has me totally stumped, because it's not possible to write to the clipboard using JavaScript in Firefox, yet Ctrl+C on the image works fine in Firefox. http://www.google.com/support/forum/p/Google+Docs/thread?tid=67dcf21ef8579b4c&hl=en&fid=67dcf21ef8579b4c00047e4a2a9fcb12

I agree with some of the other answers: the PDF is rendered as a PNG, and very likely the text areas are layered on top, probably using absolute/relative positioning. You can extract the text information from the PDF itself. The PDF format is open, so anyone could do it (granted, it might not be easy). However, there are some open source tools out there (xPDF...) that enable export of PDF contents, for example to XML. It's possible that the exports include information such as the coordinates at which text and images should display on the page.
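
If an export does include text coordinates, turning them into overlay positions is mostly a unit conversion. The sketch below assumes the default PDF user space (origin at the bottom-left of the page, one unit equal to 1/72 inch, no page rotation) and a page image rendered at a known DPI; the box shape with xMin/yMin/xMax/yMax and the function name are hypothetical, not the output format of any particular tool.

```javascript
// Convert a text box from PDF user space (origin bottom-left, 72 units per inch)
// to CSS pixels on a page image rendered at `dpi` (origin top-left).
// Assumes an unrotated page whose lower-left corner is at (0, 0).
function pdfBoxToCss(box, pageHeightPt, dpi) {
  const scale = dpi / 72; // pixels per PDF unit
  return {
    left: box.xMin * scale,
    // Flip the y axis: PDF measures up from the bottom, CSS down from the top.
    top: (pageHeightPt - box.yMax) * scale,
    width: (box.xMax - box.xMin) * scale,
    height: (box.yMax - box.yMin) * scale,
  };
}

// A word one inch from the left edge on a US Letter page (792 units tall),
// with the page rendered at 144 DPI.
const css = pdfBoxToCss({ xMin: 72, yMin: 700, xMax: 144, yMax: 712 }, 792, 144);
console.log(css); // { left: 144, top: 160, width: 144, height: 24 }
```

The resulting values could be applied directly as the left, top, width and height of an absolutely positioned div placed over the PNG.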
