Why does the loop repeat without the variable changing? [closed]

Submitted by 半城伤御伤魂 on 2019-12-20 07:51:40

Question


# import libraries
import requests
from bs4 import BeautifulSoup

links = set()
# "skeleton" of the URL
base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
# the site has 1300 pages, and I want to parse all of them
count = 1301
for i in range(count):
    url = base_url.format(i)
    # send a GET request to the URL
    request = requests.get(url)
    # print i
    print(f"Extracting Page#: {i}")
    # process the response using bs4
    soup = BeautifulSoup(request.content, 'html5lib')
    urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
    for parseuk in urlparse:
        # print the hrefs that I need
        hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
        for a in hrefUK:
            z = a["href"]
            print("var z = ", z)

When I run it, it only shows links from the first page. The value of i changes on every iteration, but the code only ever parses the first page, so the same results are repeated 1300 times. Why?
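A likely cause: everything after # in base_url is a URL fragment, and HTTP clients (requests included) do not send the fragment to the server, so every iteration requests the same resource. The page number in the fragment is presumably read by the page's own JavaScript in a browser, which a plain HTTP request never runs. The sketch below uses only urllib.parse from the standard library to show that the URLs built by the loop differ only in their fragment; request_target is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit

# Same template as in the question; the page number ends up after '#'.
base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'

def request_target(url):
    """Hypothetical helper: the part of a URL that goes into the HTTP request.

    Only the path and the query string are sent; the fragment (after '#')
    stays on the client side.
    """
    parts = urlsplit(url)
    return parts.path + ('?' + parts.query if parts.query else '')

urls = [base_url.format(i) for i in range(1, 4)]
for url in urls:
    parts = urlsplit(url)
    print(f"fragment={parts.fragment!r}, request target={request_target(url)}")

# Every URL maps to the same request target, so the server sees identical requests.
targets = {request_target(u) for u in urls}
print(len(targets))  # 1
```

Because the server never sees p1, p2, p3, and so on, changing i cannot change what comes back.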


Answer 1:


I am not exactly sure what you are asking, and I don't think the code below is necessarily an answer, but it might clean things up a bit. There is no need for your first for loop in s_p.

import requests
from bs4 import BeautifulSoup

# one Session reuses the underlying connection across all requests
session = requests.Session()

def s_p():
    base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
    count = 1301
    for i in range(count):
        url = base_url.format(i)
        request = session.get(url)
        soup = BeautifulSoup(request.content, 'html5lib')
        urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
        for parseuk in urlparse:
            hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
            for a in hrefUK:
                z = a["href"]
                print("var z = ", z)



Answer 2:


I'm not sure exactly what you are trying to do, and your code contains a lot of mistakes. As far as I understand from your code, you want to iterate over the pages and collect the href links.

Loop over the pages using q=0#p{i}_popular_desc:

import requests
from bs4 import BeautifulSoup

links = set()
for i in range(1, 10):
    print(f"Extracting Page#: {i}")
    r = requests.get(
        f"https://steamcommunity.com/market/search?appid=730&q=0#p{i}_popular_desc")
    soup = BeautifulSoup(r.text, 'html.parser')
    for item in soup.find_all('a', attrs={'class': 'market_listing_row_link'}):
        links.add(item.get('href'))

for item in links:
    print(item)

Or use the API directly:

https://steamcommunity.com/market/search/render/?query=&start=0&count=10&search_descriptions=0&sort_column=popular&sort_dir=desc&appid=730

That way you will not get blocked, and you will not need to use Tor or keep changing the User-Agent.
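Unlike the #p{i} fragment, start and count in the API URL are query parameters, so they do reach the server. A minimal sketch of paging through that endpoint follows, assuming start is a zero-based item offset and count is the page size, as the example URL suggests; page_urls and RENDER_URL are hypothetical names, and the parameter set is copied from the URL above.

```python
from urllib.parse import urlencode

# Endpoint from the answer above (assumed to page via start/count).
RENDER_URL = 'https://steamcommunity.com/market/search/render/'

def page_urls(num_pages, per_page=10, appid=730):
    """Hypothetical helper: build one API URL per page by advancing `start`."""
    urls = []
    for page in range(num_pages):
        params = {
            'query': '',
            'start': page * per_page,   # offset of the first item on this page
            'count': per_page,          # number of items per page
            'search_descriptions': 0,
            'sort_column': 'popular',
            'sort_dir': 'desc',
            'appid': appid,
        }
        # dicts keep insertion order, so the query string matches the order above
        urls.append(RENDER_URL + '?' + urlencode(params))
    return urls

for url in page_urls(3):
    print(url)
```

Each URL could then be fetched with requests.get(url), or the same dict could be passed as requests.get(RENDER_URL, params=params). The endpoint is expected to return JSON rather than a full HTML page, so inspect the response before deciding how to pull out the links.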



Source: https://stackoverflow.com/questions/59163644/why-cycle-repeats-and-doesnt-change-variable
