Beautiful Soup Scraping table

Posted by 二次信任 on 2021-02-05 07:56:40

Question


I have this small piece of code that scrapes table data from a web site and displays it in CSV format. The issue is that the for loop prints the records multiple times. I am not sure if it is due to the tag. By the way, I am new to Python. Thanks for your help!

#import needed libraries
import urllib
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv
import sys
import re


# read the data from a URL
url = requests.get("https://www.top500.org/list/2018/06/")

# parse the response content using Beautiful Soup
soup = BeautifulSoup(url.content, 'html.parser')

newtxt= ""
for record in soup.find_all('tr'):
    tbltxt = ""
    for data in record.find_all('td'):
        tbltxt = tbltxt + "," + data.text
        newtxt= newtxt+ "\n" + tbltxt[1:]
        print(newtxt)

Answer 1:


from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.top500.org/list/2018/06/")
soup = BeautifulSoup(url.content, 'html.parser')
table = soup.find_all('table', attrs={'class':'table table-condensed table-striped'})
for i in table:
    tr = i.find_all('tr')
    for x in tr:
        print(x.text)

Alternatively, pandas can parse the table directly, which is often the simplest approach:

import pandas as pd
# read_html returns a list of DataFrames, one per matching table
tables = pd.read_html('https://www.top500.org/list/2018/06/', attrs={
    'class': 'table table-condensed table-striped'}, header=1)
print(tables[0])



Answer 2:


It prints much of the data multiple times because the newtxt variable, which you print after getting the text of each <td></td>, keeps accumulating all the values seen so far. The easiest fix is probably to move the line print(newtxt) outside both for loops, that is, leave it completely unindented. With the indentation shown in the question, the line that appends to newtxt also sits inside the inner loop, so it needs to move out one level as well; otherwise every partial row is still appended. You should then see all the text, with each row on its own line and the cells within a row separated by commas.
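
As a rough sketch of that fix, the version below collects the cells of one row in the inner loop, writes the row once after that loop finishes, and prints a single time at the end. To stay runnable offline it parses a small inline table instead of the real page; SAMPLE_HTML and table_to_csv are hypothetical names made up for this illustration, and the sample values are invented. It also swaps the manual comma joining for csv.writer, which is not what the question does, so that cells containing commas are quoted rather than split.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>System</th><th>Cores</th></tr>
  <tr><td>1</td><td>Alpha</td><td>2,000</td></tr>
  <tr><td>2</td><td>Beta</td><td>1,500</td></tr>
</table>
"""


def table_to_csv(html):
    """Return the <td> text of every table row as CSV, one row per line."""
    soup = BeautifulSoup(html, "html.parser")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in soup.find_all("tr"):
        # Inner loop: only gather the cell text of this one row.
        cells = [data.get_text(strip=True) for data in record.find_all("td")]
        if cells:  # rows with only <th> cells yield no <td>, so skip them
            writer.writerow(cells)  # written once per row, after the inner loop
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text, end="")  # printed once, outside both loops
```

For the real page, passing url.content from the question's requests.get call to table_to_csv should work the same way, assuming the rows of interest use <td> cells.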



Source: https://stackoverflow.com/questions/52703694/beautiful-soup-scraping-table
