How do major websites capture thumbnails from a link?


Question


When you share a link on major websites like Digg and Facebook, they create a thumbnail by capturing the main images of the page. How do they grab images from a web page? Does it involve loading the whole page (e.g. with cURL) and parsing it (e.g. with preg_match)? To me, that method seems slow and unreliable. Do they have a more practical approach?

P.S. I think there should be a practical method for quickly crawling the page that skips some parts (e.g. CSS and JS) to get to the src attributes. Any ideas?
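For reference, here is a rough sketch of the fetch-and-parse approach I mean, written in Python with requests and BeautifulSoup standing in for cURL and preg_match (the URL is just a placeholder):

    # Rough sketch of the "fetch and parse" approach described above.
    # Only <img> tags are inspected; CSS and JS are never executed.
    import requests
    from bs4 import BeautifulSoup

    def candidate_images(url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        return [img["src"] for img in soup.find_all("img", src=True)]

    print(candidate_images("http://example.com/"))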


Answer 1:


They typically look for an image on the page and scale it down on their servers. Reddit's scraper code shows a good deal of what they do; the Scraper class should give you some good ideas on how to tackle this.
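As a rough illustration of that idea (not Reddit's actual code), one might download the candidate image URLs, keep the largest, and scale it down; requests and Pillow are assumed to be installed:

    # Minimal sketch of "pick an image, scale it down on the server".
    import io
    import requests
    from PIL import Image

    def make_thumbnail(image_urls, size=(70, 70)):
        """Return a thumbnail of the largest downloadable candidate image."""
        best, best_area = None, 0
        for src in image_urls:
            try:
                img = Image.open(io.BytesIO(requests.get(src, timeout=10).content))
            except Exception:
                continue  # skip images that fail to download or decode
            area = img.width * img.height
            if area > best_area:
                best, best_area = img, area
        if best is not None:
            best.thumbnail(size)  # scales in place, preserving aspect ratio
        return best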




Answer 2:


JohnD's answer shows that Reddit uses embed.ly as part of its Python solution. embed.ly really does the hard part of finding the image, and it's free for under 10,000 requests/month.
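A minimal sketch of handing the work to embed.ly, assuming its oEmbed endpoint and a valid API key ("YOUR_KEY" is a placeholder):

    # Ask embed.ly for the thumbnail instead of scraping the page yourself.
    import requests

    def embedly_thumbnail(url, key="YOUR_KEY"):
        resp = requests.get(
            "https://api.embed.ly/1/oembed",
            params={"key": key, "url": url},
            timeout=10,
        )
        resp.raise_for_status()
        # The oEmbed response usually carries thumbnail_url/width/height fields.
        return resp.json().get("thumbnail_url")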




Answer 3:


They generally use a tool like webkit2png.
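For example, a scraper might simply shell out to it. This is a minimal sketch assuming webkit2png is installed and on the PATH (exact flags and output names vary between versions):

    # Render the page with webkit2png and let it write the PNG files.
    import subprocess

    def screenshot(url):
        # Minimal invocation; by default webkit2png writes full-size and
        # thumbnail PNGs into the current directory.
        subprocess.run(["webkit2png", url], check=True)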




Answer 4:


Some use

 <link rel="image_src" href="yourimage.jpg" /> 

included in the head of the page. See http://www.labnol.org/internet/design/set-thumbnail-images-for-web-pages/6482/

Facebook uses

<meta property="og:image" content="thumbnail_image" />

see: http://developers.facebook.com/docs/share/#basic-tags
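A small sketch of checking these explicit hints before falling back to scanning <img> tags, assuming BeautifulSoup for parsing (the property/rel names come from the tags shown above):

    # Look for a declared thumbnail before scraping images from the body.
    from bs4 import BeautifulSoup

    def declared_thumbnail(html):
        soup = BeautifulSoup(html, "html.parser")
        og = soup.find("meta", property="og:image")
        if og and og.get("content"):
            return og["content"]          # Facebook-style Open Graph hint
        link = soup.find("link", rel="image_src")
        if link and link.get("href"):
            return link["href"]           # older rel="image_src" hint
        return None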



Source: https://stackoverflow.com/questions/7462044/how-major-websites-capture-thumbnails-from-a-link
