Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?

Submitted by 怎甘沉沦 on 2021-01-28 09:07:52

Question


I'm very much a beginner here. I'm trying to scrape a website for company names and their corresponding phone numbers. The end goal is to write these to a CSV file that can be opened in Excel.

Currently I'm able to retrieve the company names and the phone numbers separately. I'm thinking I could merge the two lists somehow, but I'm concerned that a single outlier (say, a company with no phone number) would offset the whole merge and mismatch the numbers and names.
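
To make that concern concrete, here is a minimal sketch with made-up data (the company names and numbers are placeholders, not taken from the site). It assumes the second company has no phone number on the page, so the phone list comes back one item shorter:

```python
# Hypothetical scraped lists: the second company has no phone number on the
# page, so the phone list is one item shorter than the name list.
names = ["Acme Bolts", "No Phone Co.", "Zeta Fasteners"]
phones = ["800-555-0100", "800-555-0199"]

# zip() pairs items purely by position and stops at the shorter list.
pairs = list(zip(names, phones))
print(pairs)

# "No Phone Co." ends up paired with the number that belongs to
# "Zeta Fasteners", and "Zeta Fasteners" is dropped entirely.
```

Every pair after the gap is shifted by one, and nothing raises an error to flag it.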

What is the best way to accomplish this?

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

data1 = soup.findAll('span', {'itemprop':'name'})
data2 = soup.findAll('a', {'itemprop':'telephone'})

datalist1 = []
datalist2 = []

for i in data1:
    datalist1.append(i.string)

for i in data2:
    datalist2.append(i.string)

x = zip(datalist1, datalist2)

print(list(x))

Is it possible to pull name and phone in the same soup function in order to preserve their connection?

Any help would be appreciated!


Answer 1:


import requests
from bs4 import BeautifulSoup
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    target = soup.select("h3.cname")
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for tar in target:
            name = tar.find("span", itemprop="name").text
            phone = tar.find("a", itemprop="telephone").text
            writer.writerow([name, phone])


main("https://www.iqsdirectory.com/bolts/bolts-2/")





Answer 2:


Here is a solution that fits your needs. If a name or number does not exist, it is simply left out of that company's list. When `find` has no match it returns `None`, so reading `.text` from the result raises `AttributeError`; that is the exception to catch.

The idea is as I explained in my comment. I get a list of the headers. For each header, I try to find the name and the number. If either one is missing, I catch the exception; otherwise I append it to `company`. Each `company` is then appended to `companies`. The result is a list of companies, where each company is a list containing a name and a number.

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

headers = soup.findAll('h3', {"class": 'cname'})
companies = []
for header in headers:
    company = []
    # find() returns None when nothing matches, so .text raises AttributeError
    try:
        company.append(header.find('span', {'itemprop':'name'}).text)
    except AttributeError as e:
        print(e)
    try:
        company.append(header.find('a', {'itemprop':'telephone'}).text)
    except AttributeError as e:
        print(e)
    companies.append(company)
print(companies)

Your result is:

[['A & J Fastener Corp.', '877-563-2658'], ['AA Anchor Bolt, Inc.', '800-929-3845'], ['Abbott-Interfast Corporation', '800-877-0789'], ['Accurate Manufactured Products Group, Inc.', '317-472-9000'], ['ACF Components & Fasteners, Inc.', '800-824-5449'], ['Aerospace Manufacturing Corporation', '973-472-2300'], ['Aetna Screw Products Co.', '847-647-9555'], ['AFT Fasteners', '877-844-8595'], ['AJ Fasteners Inc.', '714-630-1556'], ['All-Ways Fasteners, Inc.', '800-870-0372'], ['Amco Enterprises', '866-651-2626'], ['American Bolt Corp.', '262-786-6530'], ['Anchor Bolt & Screw Company', '847-841-7000'], ['Anchor Bolt Source', '888-812-6587'], ['Ancrabec', '888-649-7203'], ['Armour Screw Company', '800-726-4563'], ['Aspen Fasteners', '800-479-0056'], ['Assembly Products, Inc.', '608-296-1666'], ['Associated Fastening Products, Inc.', '888-696-0709'], ['Atwood Industries', '800-362-2059'], ['B&G Manufacturing', '800-366-3067'], ['Baco Enterprises, Inc.', '800-622-2226'], ['Barnhill Bolt Co., Inc.', '800-472-3900'], ['Birmingham Fastener Manufacturing', '800-695-3511'], ['Blue Ribbon Fastener Co.', '847-673-1248'], ['BMB Fasteners, Inc.', '973-256-4010'], ['Bolt Products, Inc.', '800-423-6503'], ['Bossard North America, Inc.', '800-772-2738'], ['Bowie Bolt & Supply, Inc.', '800-337-9650'], ['British Metrics', '800-762-5134'], ['Brunner Manufacturing Co., Inc.', '608-847-6667'], ['Buckeye Fasteners, Inc.', '800-437-1689'], ['C&L Rivet Company, Inc.', '215-672-1113'], ['Cal-Fasteners, Inc.', '714-854-1715'], ['California Bolt Co.', '714-957-6000'], ['Champion Bolt & Supply', '425-339-2632'], ['Chicago Hardware & Fixture Company', '847-455-6609'], ['Chicago Nut & Bolt', '888-529-8600'], ['Circle Bolt & Nut Co., Inc.', '800-548-2658'], ['Coburn-Myers Fastening Systems Incorporated', '800-662-7459'], ['Connor Fastener', '478-742-7261'], ['Cordova Bolt, Inc.', '800-421-3435'], ['DAN-LOC Bolt & Gasket', '800-231-6355'], ['Dayton Nut & Bolt Co., Inc.', '888-711-2658'],
['Deco Manufacturing Company', '800-637-5861'], ['Delta Fastener Corp.', '800-670-5938'], ['Diamond Fasteners', '877-729-6283'], ['Dyson Corporation', '800-680-3600'], ['E & T Fasteners', '800-650-4707'], ['East Coast Metals, Inc.', '800-355-2060'], ['Eastwood Manufacturing', '281-447-0081'], ['EBC Industries', '814-456-4287'], ['Elgin Equipment Group', '630-434-7200'], ['Elgin Fastener Group', '812-689-8990'], ['Engineered Components Company', '847-841-7000'], ['EPS Engineered Parts Sourcing Inc.', '877-889-1017'], ['Falcon Fastening Solutions', '502-266-6292'], ['FASCO, Inc.', '708-371-0747'], ['Fast-Rite International, Inc.', '888-327-8077'], ['Fastenal Company', '507-454-5374'], ['Fastener Dimensions, Inc.', '800-969-2188'], ['Fastener Solutions, Inc.', '866-463-2910'], ['Fastener SuperStore, Inc.', '866-688-2500'], ['Fastener Tool & Supply, Inc.', '800-662-9232'], ['Fasteners Plus International', '708-479-5558'], ['Fasteners Unlimited, Inc.', '724-776-7273'], ['Fastening Products of Lancaster, Inc.', '717-299-5771'], ['FM Stainless Fasteners', '800-749-1115'], ['Genesis Bolt & Supply', '866-276-1399'], ['Global Certified Fastener', '708-450-9301'], ['Global Fastener & Supply, Inc.', '800-785-2664'], ['Guidon Corporation', '856-866-8808'], ['Haydon Bolts, Inc.', '215-537-8700'], ['Hayes Bolt & Supply', '619-231-5966'], ['HC Pacific', '909-598-0509'], ['Hercules Fasteners', '800-332-7320'], ['Hudson Fasteners, Inc.', '877-427-2739'], ['Hydra-Dynamics, Inc.', '936-273-2882'], ['Infinity Fasteners', '913-438-2252'], ['IntegraTECH Distribution', '603-880-3760'],
['J.P. Ruklic Screw Company', '708-339-3600'], ['K-T Bolt Manufacturing, Inc.', '800-553-4521'], ['KelKo Products Company', '800-346-7883'], ['Kinter', '800-323-2389'], ['Lamons Fastener Division', '713-673-5376'], ['Lamons Gasket Company', '800-231-6906'], ['Larson Hardware Manufacturing Company', '815-625-0503'], ['Lincoln Structural Solutions', '402-952-4400'], ['Master Bolt Manufacturing, Inc.', '888-905-2658'], ['Melfast, Inc', '973-227-0045'], ['Micro Plastics, Inc', '(870)453-2261'], ['Mid-States Bolt & Screw Co.', '800-482-0867'], ['Mutual Screw & Supply', '800-222-0324'], ['National Bolt & Nut Corporation', '630-307-8800'], ['Nickel Systems, Inc.', '215-855-5633'], ['Nord-Lock / Superbolt®, Inc.', '412-279-1149'], ['Norwood Screw Machine Parts', '800-437-6644'], ['Nova Fasteners Co. Inc.', '877-541-7222'], ['O.E.M. Fastening Systems', '800-928-7439'], ['O.E.M. Hardware', '800-663-6554'], ['Ocean State Stainless, Inc.', '800-394-6396'], ['Palmer Bolt & Supply Co.', '(937)778-9606'], ['Parker Fasteners', '623-925-5998'], ['PennEngineering®', '800-342-5736'], ['Pohl Spring Works, Inc.', '800-777-1284'], ['Product Components Corporation', '800-336-0406'], ['Production Materials Inc.', '224-434-2290'], ['R&R Engineering Company Inc.', '800-979-1921'], ['Reco Industries', '636-639-6010'], ['Remco Bolt', '800-460-3327'], ['ROBNET', '410-247-7273'], ['SASCO Fasteners', '800-779-2024'], ['SC Fastening Systems, LLC.', '330-468-3300'], ['Screw Products International', '800-876-5153'], ['Secure Fastener & Tool Company', '201-939-4422'], ['Specialty Bolt & Screw, Inc.', '413-789-6700'], ['Specialty Screw Corporation', '815-969-4100'], ['St. Louis Screw & Bolt', '800-237-7059'], ['Stalcop', '765-436-7926'], ['Stanley Industries Inc.', '800-253-2658'], ['Stelfast® Inc.', '800-729-9779'], ['Suncor Stainless, Inc.', '800-394-2222'],
['Sunny Screw Industry Co. Ltd.', '770-351-2858'], ['Tanner Bolt & Nut Corp.', '800-456-2658'], ['Tengco', '714-676-8200'], ['The Federal Group', '800-759-2658'], ['Tripac', '951-280-4488'], ['TSA Manufacturing', '800-228-2948'], ['United Titanium, Inc.', '844-321-4684'], ['USP Aerospace Solutions, Inc.', '631-287-6321'], ['Valtra, Inc.', '800-989-5244'], ['Wayne Bolt & Nut Company', '800-521-2207'], ['WINK Fasteners, Inc.', '804-966-8111'], ['Wodin, Inc.', '440-439-4222'], ['Wurth Industry', '800-428-4686'], ['Yangtze Railroad Materials', '855-889-2648']]
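
One caveat with building each company as a variable-length list: when a field is missing, the row just gets shorter, so a one-item row does not say whether the name or the phone was absent. A possible variation, sketched below, checks for `None` explicitly and writes an empty string as a placeholder so every row keeps two columns, then writes the rows as CSV. The HTML snippet, the company names, and the helper name `extract_rows` are made up for illustration; only the `h3.cname` / `itemprop` structure is taken from the code above.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up HTML that imitates the page structure; the second entry has no phone.
SAMPLE_HTML = """
<h3 class="cname"><span itemprop="name">Acme Bolts</span>
  <a itemprop="telephone">800-555-0100</a></h3>
<h3 class="cname"><span itemprop="name">No Phone Co.</span></h3>
<h3 class="cname"><span itemprop="name">Zeta Fasteners</span>
  <a itemprop="telephone">800-555-0199</a></h3>
"""


def extract_rows(html):
    """Return one (name, phone) tuple per h3.cname block, '' where missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for header in soup.select("h3.cname"):
        name_tag = header.find("span", itemprop="name")
        phone_tag = header.find("a", itemprop="telephone")
        # find() returns None when nothing matches, so test before reading text.
        name = name_tag.get_text(strip=True) if name_tag is not None else ""
        phone = phone_tag.get_text(strip=True) if phone_tag is not None else ""
        rows.append((name, phone))
    return rows


rows = extract_rows(SAMPLE_HTML)

# Written to an in-memory buffer here; swapping in
# open("data.csv", "w", newline="") would produce a file Excel can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Phone"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because both lookups happen inside the same `h3` block, a missing phone only blanks that one cell and cannot shift the rows that follow.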


Source: https://stackoverflow.com/questions/61258326/python-beautifulsoup-scraping-how-to-combine-two-different-fields-or-pair-them
