Python web scraping - Loop through all categories and subcategories

Submitted by 丶灬走出姿态 on 2019-12-22 12:08:14

Question


I am trying to retrieve all categories and subcategories within a retail website. I am able to use BeautifulSoup to pull every single product in a category once I am in it. However, I am struggling with the loop over categories. I'm using this as a test website: https://www.uniqlo.com/us/en/women

How do I loop through each category as well as the subcategories on the left side of the website? The problem is that you have to click on a category before the website displays all of its subcategories. I would like to extract all products within each category/subcategory into a CSV file. This is what I have so far:

import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myurl = 'https://www.uniqlo.com/us/en/women/'
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

product_list = []

# Find every li whose class starts with "grid-tile"
containers = page_soup.findAll("li", {"class": lambda L: L and L.startswith('grid-tile')})

for container in containers:
    product_container = container.findAll("div", {"class": "product-swatches"})
    product_names = product_container[0].findAll("li")

    for item in product_names:
        try:
            # The product name is stored in the alt attribute of the swatch image
            product_name = item.a.img.get("alt")
            product_mod_name = product_name.split(',')[0].lstrip()
            print(product_mod_name)
            product_list.append([product_mod_name])
        except AttributeError:
            continue

with open('products.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    for row in product_list:
        writer.writerow(row)

Answer 1:


You can try the script below. It goes through the different categories and subcategories of products and parses the title and price of each item. There are several products with the same name whose only difference is the color, so don't count those as duplicates. I've written the script in a very compact manner, so feel free to expand it as you see fit:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.uniqlo.com/us/en/women')
soup = BeautifulSoup(res.text, "lxml")

# Top-level categories in the left-hand navigation
for items in soup.select("#category-level-1 .refinement-link"):
    page = requests.get(items['href'])
    broth = BeautifulSoup(page.text, "lxml")

    # Subcategory links only appear once you are on the category page
    for links in broth.select("#category-level-2 .refinement-link"):
        req = requests.get(links['href'])
        sauce = BeautifulSoup(req.text, "lxml")

        # Each product tile carries the name and the price
        for data in sauce.select(".product-tile-info"):
            title = data.select(".name-link")[0].text
            price = ' '.join([item.text for item in data.select(".product-pricing span")])
            print(title.strip(), price.strip())

The results look like this:

WOMEN CASHMERE CREW NECK SWEATER $79.90
Women Extra Fine Merino Crew Neck Sweater $29.90 $19.90
WOMEN KAWS X PEANUTS LONG-SLEEVE HOODED SWEATSHIRT $19.90
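
Since your original script also writes the output to a CSV file, here is a minimal variation of the same script that does that as well. It reuses the selectors above unchanged; the products.csv filename and the title/price column names are just my own choices for illustration, and it assumes the site still serves the same markup:

import csv

import requests
from bs4 import BeautifulSoup

rows = []
res = requests.get('https://www.uniqlo.com/us/en/women')
soup = BeautifulSoup(res.text, "lxml")

for items in soup.select("#category-level-1 .refinement-link"):
    page = requests.get(items['href'])
    broth = BeautifulSoup(page.text, "lxml")

    for links in broth.select("#category-level-2 .refinement-link"):
        req = requests.get(links['href'])
        sauce = BeautifulSoup(req.text, "lxml")

        for data in sauce.select(".product-tile-info"):
            title = data.select(".name-link")[0].text.strip()
            price = ' '.join(item.text for item in data.select(".product-pricing span")).strip()
            rows.append([title, price])   # collect instead of print

# Write everything collected above into one CSV file
with open('products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'price'])   # header row (my own naming)
    writer.writerows(rows)

Writing once at the end keeps all category results in a single file, instead of appending a partial file on every run as the original script does.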


Source: https://stackoverflow.com/questions/47567368/python-web-scraping-loop-through-all-categories-and-subcategories
