Google News XML API: use country/language parameters

爱⌒轻易说出口 submitted on 2021-01-03 04:16:30

Question


I would like to subscribe to an RSS/XML feed from Google News that captures the following query:

Articles mentioning "studie" (German for "study"), written in German, emanating from any country.

I'm using https://news.google.com/rss/search, but for this example, it's easier to see the UI output at https://news.google.com/search, so I'll use the latter URL base in this example.

Now, in the XML API reference, Google mentions four different parameters that influence either language or country:

  • hl (host language): the language that the end user is assumed to be typing in. For example, an English-language speaker types "study", and Google assumes that term is in English and then machine-translates the results back to English. For me, navigating to Google News redirects to a URL with hl=en-US (the full URL is https://news.google.com/?hl=en-US&gl=US&ceid=US:en).

  • gl: boosts search results whose country of origin matches the parameter value. The default in my web browser is gl=US.

  • lr (language restrict): restricts search results to documents written in a particular language

  • cr (country restrict): restricts search results to documents originating in a particular country

Based on all of the above, that would imply a URL of*:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

That attempt, however, fails miserably; it shows English-language results from the U.S., and it 302 redirects to:

https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en
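To see exactly what the redirect changes, the two query strings can be compared. This is only a minimal sketch using the standard library; the helper name `query_params` is made up for illustration, and the URLs are the two quoted above.

```python
from urllib.parse import urlsplit, parse_qs

# The URL that was requested and the URL Google redirected to (both from above).
requested = "https://news.google.com/search?q=study&hl=en-US&lr=lang_de"
redirected = "https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en"


def query_params(url):
    """Hypothetical helper: return the query string of `url` as a dict of lists."""
    return parse_qs(urlsplit(url).query)


before = query_params(requested)
after = query_params(redirected)

# Parameters that appear only after the redirect.
added = {key: value for key, value in after.items() if key not in before}
print(added)  # {'gl': ['US'], 'ceid': ['US:en']}
```

The redirect keeps q, hl and lr as sent and only appends gl and ceid.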

So, to that end:

  • How can I properly structure URL parameters to capture 'Articles mentioning "studie" (German for "study"), written in German, from any country.'?
  • What the heck is ceid and why is it documented absolutely nowhere by Google?

* I.e.:

>>> import urllib.parse
>>> urllib.parse.parse_qs('q=study&hl=en-US&lr=lang_de')
{'q': ['study'], 'hl': ['en-US'], 'lr': ['lang_de']}

Related but not resolving any of this:

  • Limit Google News RSS to specific country
  • RSS Google news language
  • How do you specify retrieving local news when using a Google News RSS URL?

Answer 1:


I'm using the following URL, and it works for me:

https://news.google.com/rss?q=studie&hl=de-DE&gl=DE&ceid=DE:de

You can also search by topic; please refer to this answer: URL format for Google News RSS feed
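A rough sketch of how a feed URL like the one in this answer could be assembled in Python. The function name `google_news_rss_url` and its arguments are hypothetical, and the `ceid=COUNTRY:language` pattern is only inferred from the two URLs seen in this thread (US:en and DE:de), not from any Google documentation.

```python
from urllib.parse import urlencode


def google_news_rss_url(query, language="de", country="DE",
                        base="https://news.google.com/rss"):
    """Hypothetical helper: build a Google News RSS URL for one edition.

    Assumes hl is "<language>-<country>" and ceid is "<country>:<language>",
    which matches the URLs observed in this thread but is not documented.
    """
    params = {
        "q": query,
        "hl": f"{language}-{country}",
        "gl": country,
        "ceid": f"{country}:{language}",
    }
    # safe=":" keeps the colon in ceid unescaped, as in the URLs above.
    return f"{base}?{urlencode(params, safe=':')}"


print(google_news_rss_url("studie"))
# https://news.google.com/rss?q=studie&hl=de-DE&gl=DE&ceid=DE:de
```

Passing `base="https://news.google.com/rss/search"` would target the search endpoint mentioned in the question instead; whether both behave the same is not something this sketch verifies.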




Answer 2:


The URL for Google News RSS has changed. You can use the following format for fetching; examples can also be seen here.

usage: gnrss2opml.py [-h] [-o OUTPUT] [-c COUNTRY] [-l LANGUAGE] [-s]
                     [-t [TOPIC [TOPIC ...]]] [-g [LOCATION [LOCATION ...]]]
                     [-q [QUERY [QUERY ...]]]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file name (default: print to stdout)
  -c COUNTRY, --country COUNTRY
                        country / Google News edition (default: us)
  -l LANGUAGE, --language LANGUAGE
                        language (default: en)
  -s, --stories         include Top Stories
  -t [TOPIC [TOPIC ...]], --topics [TOPIC [TOPIC ...]]
                        list of topics, will be converted to uppercase
                        (default: WORLD NATION BUSINESS TECHNOLOGY
                        ENTERTAINMENT SPORTS SCIENCE HEALTH)
  -g [LOCATION [LOCATION ...]], --locations [LOCATION [LOCATION ...]]
                        list of geographic locations (default: None)
  -q [QUERY [QUERY ...]], --queries [QUERY [QUERY ...]]
                        list of search queries (default: None)

EDIT1:

The two-letter language and country codes can be passed with the -l and -c arguments.

Get the codes from here.



Source: https://stackoverflow.com/questions/57792590/google-news-xml-api-use-country-language-parameters
