How to get the Image from first page when search in Google?

若如初见. 提交于 2019-12-05 02:36:51

问题


Usually after using Google to search for a city, there is a part of Wikipedia page on the right with an image and a map. Can anyone tell me how I could access this image? I should know how to download it.


回答1:


Actually the main image (that goes with the map image on the right) is very rarely from Wikipedia, so you can't use Wikipedia API to get it. If you want to access the actual main image you can use this:

private static void GetGoogleImage(string word)
{
    // make an HTTP Get request
    var request = (HttpWebRequest)WebRequest.Create("https://www.google.com.pg/search?q=" + word);
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36";
    using (var webResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var reader = new StreamReader(webResponse.GetResponseStream()))
        {
            // get all images with base64 string
            var matches = Regex.Matches(reader.ReadToEnd(), @"'data:image/jpeg;base64,([^,']*)'");
            if (matches.Count > 0)
            {
                // get the image with the max height
                var bytes = matches.Cast<Match>()
                    .Select(x => Convert.FromBase64String(x.Groups[1].Value.Replace("\\75", "=").Replace("\\075", "=")))
                    .OrderBy(x => Image.FromStream(new MemoryStream(x, false)).Height).Last();

                // save the image as 'image.jpg'
                using (var imageFile = new FileStream("image.jpg", FileMode.Create))
                {
                    imageFile.Write(bytes, 0, bytes.Length);
                    imageFile.Flush();
                }
            }
        }
    }
}

This work for me, and always returns the actual main image (if such exists). For example, GetGoogleImage("New York") give me data:image/jpeg;base64,/9j/4AAQSkZJRg....

I use the fact that from the all base64 string images in response the main has the max height, so its need only to order them by height and to select the last one. If it's required, you can check here also for minimum image height. The replacing \075 to = is needed base64's padding.




回答2:


If you want Wikipedia article main image you have to use Wikipedia API.

Update:

  • You can use jsoup: Java HTML Parser org.jsoup:jsoup:1.8.3 which return list of image inside page.

        String stringResponse = getHtmlContent(url);
        Document doc = Jsoup.parse(stringResponse);
        Element content = doc.getElementById("content");
        //Get all elements with img tag ,
        Elements img = content.getElementsByTag("img");
        for (Element el : img) {
            //for each element get the src image url
            String src = el.attr("src");
            Log.d(TAG, "src attribute is : " + src);
            String alt = el.attr("alt");
            //do some stuff
        }
    

Update: Wikipida provide API for to return HTML Content



来源:https://stackoverflow.com/questions/34501905/how-to-get-the-image-from-first-page-when-search-in-google

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!