Not iterating the list in web scraping

断了今生、忘了曾经 submitted on 2019-12-24 00:58:43

Question


From a web page (the URL in the code below), I am trying to build two lists: one of countries and the other of currencies. However, I'm stuck: my code only gives me the first country name and doesn't iterate over the list of all countries. Any help on how to fix this would be appreciated. Thanks in advance.

Here is my attempt:

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/47.0.2526.80 Safari/537.36'}

req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req)
html = resp.read()

soup = BeautifulSoup(html, "html.parser")
attr = {"class" : "miscTxt"}

# every <div class="miscTxt"> element on the page
countries = soup.find_all("div", attrs=attr)
# tr.td is shorthand for tr.find("td"): the first <td> inside each div
countries_list = [tr.td.string for tr in countries]

for country in countries_list:
    print(country)
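Why only one name comes back: `find_all("div", ...)` returns one element per matching div, and `div.td` is shorthand for `div.find("td")`, which returns only the first matching descendant. A minimal offline sketch, assuming (as the second answer's use of `countries[0]` suggests) that the whole table sits inside a single `miscTxt` div; `SAMPLE_HTML` is a hypothetical stand-in for the real page, and `get_text(strip=True)` is used instead of `.string` so the results are plain strings.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one div wrapping the whole table
SAMPLE_HTML = """
<div class="miscTxt">
  <table>
    <tr><td>Afghanistan</td><td>afghani</td></tr>
    <tr><td>Algeria</td><td>dinar</td></tr>
    <tr><td>Andorra</td><td>euro</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", attrs={"class": "miscTxt"})

# Same shape as the question's comprehension: one item per div,
# and .td only reaches the first cell of the first row
first_only = [div.td.get_text(strip=True) for div in divs]

# Iterating over the rows instead: first cell of every <tr>
all_countries = [tr.td.get_text(strip=True) for tr in divs[0].find_all("tr")]

print(first_only)     # ['Afghanistan']
print(all_countries)  # ['Afghanistan', 'Algeria', 'Andorra']
```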

Answer 1:


Try this script. It should give you the country names along with their corresponding currencies. You don't need to send custom headers for this site.

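from bs4 import BeautifulSoup
import urllib.request

url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
resp = urllib.request.urlopen(urllib.request.Request(url)).read()
# needs the lxml package installed; the built-in "html.parser" is an alternative
soup = BeautifulSoup(resp, "lxml")

# Visit every table row instead of only the first cell on the page
for item in soup.select("table tr"):
    try:
        # the first cell of the row holds the country name
        country = item.select("td")[0].text.strip()
    except IndexError:
        # rows without <td> cells, e.g. header rows
        country = ""
    try:
        # the cell right after it holds the currency
        currency = item.select("td")[0].find_next_sibling().text.strip()
    except (IndexError, AttributeError):
        # no <td> at all, or no sibling cell (find_next_sibling() returned None)
        currency = ""
    print(country, currency)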

Partial Output:

Afghanistan afghani
Algeria dinar
Andorra euro
Argentina peso
Australia dollar
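The row-by-row approach above can be wrapped in a function that takes an HTML string, which makes it testable without hitting the network. A sketch under stated assumptions: `parse_currency_rows` and the sample markup are hypothetical, the country is assumed to be the first cell and the currency the second, and `cells[1]` replaces the answer's `find_next_sibling()` call (it picks the same cell when the two `<td>`s are adjacent). It uses the built-in `html.parser` so `lxml` is not required, and it skips rows with fewer than two `<td>` cells instead of printing blanks.

```python
from bs4 import BeautifulSoup


def parse_currency_rows(html):
    """Return (countries, currencies) from every <tr> with at least two <td> cells.

    Hypothetical helper: assumes country is the first cell, currency the second.
    """
    soup = BeautifulSoup(html, "html.parser")
    countries, currencies = [], []
    for row in soup.select("table tr"):
        cells = row.select("td")
        if len(cells) < 2:
            # header rows (<th> only) or incomplete rows: skip them
            continue
        countries.append(cells[0].get_text(strip=True))
        currencies.append(cells[1].get_text(strip=True))
    return countries, currencies


sample = """
<table>
  <tr><th>Country</th><th>Currency</th></tr>
  <tr><td>Argentina</td><td>peso</td></tr>
  <tr><td>Australia</td><td>dollar</td></tr>
</table>
"""
countries, currencies = parse_currency_rows(sample)
print(countries)   # ['Argentina', 'Australia']
print(currencies)  # ['peso', 'dollar']
```

Returning two lists directly matches what the question asked for, rather than printing each pair.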



Answer 2:


You can also use a single list comprehension to build a list of (country, currency) tuples and then split the tuples into two lists with map and zip:

temp_list = [
    (cells[0].text.strip(), cells[1].text.strip())
    for cells in (tr.find_all('td') for tr in countries[0].find_all('tr'))
    if len(cells) >= 2  # skip header rows and rows missing a currency cell
]

countries_list, currency_list = map(list, zip(*temp_list))
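The last line transposes the list of pairs: `zip(*temp_list)` passes each tuple as a separate argument, so `zip` groups all first elements together and all second elements together, and `map(list, ...)` turns the two resulting tuples into lists. A standalone sketch with made-up pairs; `split_pairs` is a hypothetical helper. If the comprehension produced no tuples, `zip()` yields nothing and the two-name unpacking raises `ValueError`, so the empty case is guarded here.

```python
def split_pairs(pairs):
    """Transpose [(country, currency), ...] into two lists.

    Hypothetical helper: returns two empty lists when there are no pairs,
    because unpacking zip() of nothing into two names raises ValueError.
    """
    if not pairs:
        return [], []
    countries_list, currency_list = map(list, zip(*pairs))
    return countries_list, currency_list


temp_list = [("Afghanistan", "afghani"), ("Algeria", "dinar"), ("Andorra", "euro")]
countries_list, currency_list = split_pairs(temp_list)
print(countries_list)  # ['Afghanistan', 'Algeria', 'Andorra']
print(currency_list)   # ['afghani', 'dinar', 'euro']
```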

The full code:

from bs4 import BeautifulSoup
import urllib.request

req = urllib.request.Request("http://www.worldatlas.com/aatlas/infopage/currency.htm")

soup = BeautifulSoup(urllib.request.urlopen(req).read(), "html.parser")

countries = soup.find_all("div", attrs={"class": "miscTxt"})

# countries[0] assumes the table sits inside the first matching div;
# build one (country, currency) tuple per row with at least two <td> cells
temp_list = [
    (cells[0].text.strip(), cells[1].text.strip())
    for cells in (tr.find_all('td') for tr in countries[0].find_all('tr'))
    if len(cells) >= 2
]

# Transpose the list of pairs into two separate lists
countries_list, currency_list = map(list, zip(*temp_list))

print(countries_list)
print(currency_list)


Source: https://stackoverflow.com/questions/47599875/not-iterating-the-list-in-web-scraping
