Why does the loop repeat and the variable not change? [closed]

半城伤御伤魂 submitted on 2019-12-20 07:51:40


#import libraries
import requests
from bs4 import BeautifulSoup

links = set()
#"skeleton" of url
base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
#site has 1300 pages, and i want to parse all of them
count = 1301
for i in range(count):
    url = base_url.format(i)
    #send get request to url
    request = requests.get(url)
    #print i
    print(f"Extracting Page#: {i}")
    #process the request using bs4
    soup = BeautifulSoup(request.content, 'html5lib')
    urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
    for parseuk in urlparse:
        #print hrefs, that i need
        hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
        for a in hrefUK:
            z = a["href"]
            print("var z = ", z)

When I run this, it only shows the links from the first page. The value of "i" changes, but the code parses the first page every time, and this repeats 1300 times. Why?
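
One thing that can be checked without touching the site, as a minimal sketch: everything after `#` in a URL is a fragment, and an HTTP client keeps the fragment to itself instead of sending it to the server. The snippet below splits the first few generated URLs and compares what would actually be requested. The helper name `request_target` is made up for illustration.

```python
from urllib.parse import urlsplit

base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'


def request_target(url):
    """Return the path and query of a URL, i.e. the part sent to the server."""
    parts = urlsplit(url)
    # parts.fragment is left out on purpose: it stays on the client side.
    return f"{parts.path}?{parts.query}" if parts.query else parts.path


# Build the first five URLs the loop would produce.
urls = [base_url.format(i) for i in range(5)]

# Distinct request targets across those URLs.
targets = {request_target(u) for u in urls}

# The only part that differs between them.
fragments = [urlsplit(u).fragment for u in urls]

print(targets)
print(fragments)
```

If this holds for the real requests too, every iteration asks the server for the same resource, which would explain why page one comes back each time. The page number in `#p{}` is then presumably applied by JavaScript in the browser, which `requests` does not run.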


I am not exactly sure what you are asking, and I do not think the code below is necessarily an answer, but it might clean things up a bit. There is no need for your first for loop in s_p.

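import requests
from bs4 import BeautifulSoup

# Assumption: "session" was not defined in the post; a plain requests.Session() is used
session = requests.Session()

def s_p():
    base_url = 'https://steamcommunity.com/market/search?appid=730&q=#p{}'
    count = 1301
    for i in range(count):
        url = base_url.format(i)
        # Fetch the page through the shared session
        request = session.get(url)
        soup = BeautifulSoup(request.content, 'html5lib')
        urlparse = soup.find_all('div', attrs={'id': 'searchResultsRows'})
        for parseuk in urlparse:
            # Collect the listing links inside the results container
            hrefUK = parseuk.find_all('a', attrs={'class': 'market_listing_row_link'})
            for a in hrefUK:
                z = a["href"]
                print("var z = ", z)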


I don't know exactly what you are trying to do, and your code contains many mistakes. As far as I understand it, you want to iterate over the pages and collect the href links.

Loop over the pages using q=0#p{i}_popular_desc:

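import requests
from bs4 import BeautifulSoup

# Assumption: the URL argument was cut off in the original post. It is rebuilt
# here from the question's search endpoint and the pattern named above.
search_url = 'https://steamcommunity.com/market/search?q=0#p{}_popular_desc'

links = set()
for i in range(1, 10):
    print(f"Extracting Page#: {i}")
    r = requests.get(search_url.format(i))
    soup = BeautifulSoup(r.text, 'html.parser')
    for item in soup.findAll('a', attrs={'class': 'market_listing_row_link'}):
        # Assumed loop body (missing in the post): store each listing link
        links.add(item.get('href'))

for item in links:
    # Assumed loop body (missing in the post): print the collected links
    print(item)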

Or use the API directly from here:


That way you will not get blocked, and you will not need to use Tor or keep changing the user agent.

