Beautiful Soup, Python: Trying to display scraped contents of a for loop on an html page in the correct manner

Submitted by 混江龙づ霸主 on 2020-07-23 08:21:21

Question


Using Beautiful Soup and Python, I scraped the website in the code below to isolate the rank, company name and revenue.

I would like to show the top ten companies in an HTML table that I render with Flask and Jinja2. However, the code I have written just displays the first record five times.

Code in file: webscraper.py

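import requests
from bs4 import BeautifulSoup

url = 'https://en.m.wikipedia.org/wiki/List_of_largest_Internet_companies'
req = requests.get(url)
bsObj = BeautifulSoup(req.text, 'html.parser')
# the ranking table (note: the row selection below searches the whole page, not this table)
data = bsObj.find('table', {'class': 'wikitable sortable mw-collapsible'})

table_data = []
trs = bsObj.select('table tr')
for tr in trs[1:6]:  # first element is empty
    row = []
    for t in tr.select('td')[:3]:  # td cells are the columns: rank, name, revenue
        row.append(t.text.strip())
    table_data.append(row)
data = table_data  # a list of [rank, name, revenue] lists

# each of these reads a single cell of the first row, data[0]
rank = data[0][0]
name = data[0][1]
revenue = data[0][2]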

Relevant code in home.html:

<p>{{data}}</p>
<table class="table">
  <thead>
    <tr>
      <th scope="col">#</th>
      <th scope="col">Rank</th>
      <th scope="col">Name</th>
      <th scope="col">Revenue</th>
    </tr>
  </thead>
  <tbody>
    {% for element in data %}
    <tr>
      <th scope="row"></th>
      <td>{{rank}}</td>
      <td>{{name}}</td>
      <td>{{revenue}}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

The HTML output is below. Note: the variable {{data}} shows all five records correctly, but I am not isolating the individual values correctly.

[['1', 'Amazon', '$280.5'], ['2', 'Google', '$161.8'], ['3', 'JD.com', '$82.8'], ['4', 'Facebook', '$70.69'], ['5', 'Alibaba', '$56.152']]

    Rank  Name    Revenue
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5

As mentioned, I want all the companies ranked 1 to 10, not just Amazon.
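Separately from the repeated row, the number of rows comes from the slice in webscraper.py: trs[1:6] stops before index 6, so it yields indices 1 to 5, which is five rows. Index 0 is skipped because, per the original comment, the first element is empty (presumably the header row, which has no <td> cells). A minimal sketch, with a hypothetical list of strings standing in for the <tr> elements, shows that trs[1:11] would give the ten rows the question asks for:

```python
# Hypothetical stand-in for bsObj.select('table tr'):
# index 0 plays the empty header row, indices 1..20 play company rows.
trs = ["header"] + [f"company row {i}" for i in range(1, 21)]

five = trs[1:6]   # a slice stops *before* its end index -> indices 1..5
ten = trs[1:11]   # indices 1..10 -> the top ten

print(len(five), five[0], five[-1])
print(len(ten), ten[0], ten[-1])
```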

Any suggestions as to what I've done wrong? I'd like the most elegant fix to my own code, not a completely different approach.

I'd also appreciate an explanation of the for loop and the theory behind it.

I know this is wrong:

    rank=data[0][0]
    name=data[0][1]
    revenue=data[0][2]

but I don't understand why, or how to restructure it elegantly so that the variables rank, name and revenue contain the respective data elements.


Answer 1:


rank=data[0][0]
name=data[0][1]
revenue=data[0][2]

You're setting rank, name and revenue from a single element (the first element of data), so the loop outputs the same values on every iteration.

I suggest changing rank, name and revenue in your HTML to {{element[0]}}, {{element[1]}} and {{element[2]}} respectively, so that each pass of the loop reads the values from the element it is currently on.
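To illustrate the loop behaviour behind this answer, here is a plain-Python sketch that mimics what the Jinja2 loop does, using the five records from the question's output. In the question's version, rank, name and revenue are computed once, before the loop, from data[0]. The loop body never looks at element, so every iteration emits the same row. Indexing element inside the loop instead reads the row the loop is currently on.

```python
data = [['1', 'Amazon', '$280.5'], ['2', 'Google', '$161.8'],
        ['3', 'JD.com', '$82.8'], ['4', 'Facebook', '$70.69'],
        ['5', 'Alibaba', '$56.152']]

# Question's version: the three names are bound once, to the first row only.
rank = data[0][0]
name = data[0][1]
revenue = data[0][2]
repeated = []
for element in data:
    # 'element' changes on each pass, but it is never used here
    repeated.append((rank, name, revenue))

# Answer's version: read the values from the current element on every pass.
per_row = []
for element in data:
    per_row.append((element[0], element[1], element[2]))

print(repeated)
print(per_row)
```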




Answer 2:


Thanks to @mmfallacy above, who suggested this answer; I am just fleshing it out.

It works, but I will accept the answer above since he suggested it. Here it is for reference:

{% for element in data %}
  <tr>
    <th scope="row"></th>
    <td>{{element[0]}}</td>
    <td>{{element[1]}}</td>
    <td>{{element[2]}}</td>
  </tr>
{% endfor %}

I simply deleted the lines in the .py file that tried to create the rank, name and revenue variables.
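If you would rather keep the readable names rank, name and revenue, tuple unpacking gives each field a name on every iteration. Jinja2's for tag supports the same unpacking ({% for rank, name, revenue in data %}), so inside that loop the template could use {{rank}}, {{name}} and {{revenue}} directly. The Python sketch below shows the unpacking, plus a hypothetical helper, to_records, that turns each row into a dict. Jinja2 resolves {{ row.rank }} on a dict by falling back to item lookup, so the template would not need numeric indexes. This assumes every row has exactly three cells; unpacking raises ValueError otherwise.

```python
data = [['1', 'Amazon', '$280.5'], ['2', 'Google', '$161.8'],
        ['3', 'JD.com', '$82.8']]

# Tuple unpacking: each three-item row is split into three names per iteration.
for rank, name, revenue in data:
    print(rank, name, revenue)


def to_records(rows):
    """Hypothetical helper: turn [rank, name, revenue] lists into dicts."""
    records = []
    for rank, name, revenue in rows:
        records.append({"rank": rank, "name": name, "revenue": revenue})
    return records


records = to_records(data)
print(records[0])
```

With Flask, this would mean passing something like render_template('home.html', data=to_records(data)) and writing {{ row.rank }}, {{ row.name }} and {{ row.revenue }} inside {% for row in data %}. The render_template call is an assumption, since the question does not show how data reaches the template.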



Source: https://stackoverflow.com/questions/62719063/beautiful-soup-python-trying-to-display-scraped-contents-of-a-for-loop-on-an-h
