Fetch complete List of Items using BeautifulSoup, Python 3.6

Posted by 放荡痞女 on 2019-12-23 01:51:24

Question


I am learning BeautifulSoup and have chosen the page https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS to scrape the list of items under the topic "Banks and other financial corporations".

I need the items below, together with their child items, in a hierarchical format, as shown in the attached image:

  • Banks
  • Investment companies
  • Insurance corporations and pension funds up to Q2 2016
  • Insurance corporations as of Q3 2016
  • Pension funds as of Q3 2016
  • Payments statistics

This is the code I have tried so far; I am stuck after this point:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
# "class" is a reserved word in Python, so BeautifulSoup takes the keyword class_
s = soup.find("div", class_="statisticTree")

I also want to export the results to a CSV file.

Is it possible to export the parent-child structure as shown in the image?


Answer 1:


You can do it recursively with the help of a function that returns a node's link text and a list of its children:

from pprint import pprint

import requests
from bs4 import BeautifulSoup


url = 'https://www.bundesbank.de/en/statistics/time-series-databases/time-series-databases/743796/openAll?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')


def get_child_nodes(parent_node):
    node_name = parent_node.a.get_text(strip=True)

    result = {"name": node_name, "children": []}

    children_list = parent_node.find('ul', recursive=False)
    if not children_list:
        return result

    for child_node in children_list('li', recursive=False):
        result["children"].append(get_child_nodes(child_node))

    return result


pprint(get_child_nodes(soup.find("div", class_="statisticTree")))

Note that it is important to search for the list and its items non-recursively (recursive=False), so that each call only sees its direct children and does not also grab grandchildren from further down the tree.
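To make the effect of recursive=False concrete, here is a minimal sketch on a hand-written HTML fragment. The markup and the id "root" are assumptions made for illustration and are not the real Bundesbank page structure; only find_all and its recursive argument come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature tree, NOT the real Bundesbank markup.
html = """
<ul id="root">
  <li><a>Banks</a>
    <ul>
      <li><a>Minimum reserves</a></li>
      <li><a>Interest rates</a></li>
    </ul>
  </li>
  <li><a>Investment companies</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
root = soup.find("ul", id="root")

# Default search (recursive=True): every <li> descendant, at any depth.
all_items = [li.a.get_text(strip=True) for li in root.find_all("li")]

# recursive=False: only the <li> tags that are direct children of root.
direct_items = [
    li.a.get_text(strip=True) for li in root.find_all("li", recursive=False)
]

print(all_items)
print(direct_items)
```

With the default search, the nested items would appear both in the top-level loop and again inside their parent's recursion, which is why the function above restricts each lookup to direct children.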

Prints:

{'children': [{'children': [{'children': [{'children': [{'children': [],
                                                         'name': 'Reserve '
                                                                 'maintenance '
                                                                 'in the euro '
                                                                 'area'},
                                                        {'children': [],
                                                         'name': 'Reserve '
                                                                 'maintenance '
                                                                 'in Germany'}],
                                           'name': 'Minimum reserves'},
...

              {'children': [{'children': [], 'name': 'Bank accounts'},
                            {'children': [], 'name': 'Payment card functions'},
                            {'children': [], 'name': 'Accepting devices'},
                            {'children': [],
                             'name': 'Number of payment transactions'},
                            {'children': [],
                             'name': 'Value of payment transactions'},
                            {'children': [],
                             'name': 'Number of transactions per type of '
                                     'terminal'},
                            {'children': [],
                             'name': 'Value of transactions per type of '
                                     'terminal'},
                            {'children': [],
                             'name': 'Number of OTC transactions'},
                            {'children': [],
                             'name': 'Value of OTC transactions'},
                            {'children': [], 'name': 'Issuance of banknotes'}],
               'name': 'Payments statistics'}],
 'name': 'Banks'}
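The question also asks about CSV export, which the answer does not cover. One possible approach, sketched here as an assumption rather than as part of the original answer, is to flatten the nested dictionary into (level, parent, name) rows. The sample tree below is hand-written in the same {"name", "children"} shape; its nesting is illustrative and does not reproduce the site's real hierarchy. The helper names iter_rows and tree_to_csv and the column names are hypothetical.

```python
import csv
import io

# Hand-written sample in the same shape that get_child_nodes returns.
# The nesting here is illustrative only.
tree = {
    "name": "Banks",
    "children": [
        {"name": "Minimum reserves", "children": [
            {"name": "Reserve maintenance in the euro area", "children": []},
            {"name": "Reserve maintenance in Germany", "children": []},
        ]},
        {"name": "Payments statistics", "children": [
            {"name": "Bank accounts", "children": []},
        ]},
    ],
}


def iter_rows(node, parent="", level=0):
    """Yield (level, parent, name) for node and its descendants, depth-first."""
    yield level, parent, node["name"]
    for child in node["children"]:
        yield from iter_rows(child, node["name"], level + 1)


def tree_to_csv(node, out):
    """Write a header line plus one CSV row per tree node to the file-like out."""
    writer = csv.writer(out)
    writer.writerow(["level", "parent", "name"])
    writer.writerows(iter_rows(node))


# Write to an in-memory buffer so the sketch has no side effects.
buffer = io.StringIO()
tree_to_csv(tree, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a handle such as open("items.csv", "w", newline="", encoding="utf-8") as out (the file name is a placeholder). Keeping the parent and level columns lets a spreadsheet or pandas rebuild the hierarchy later.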


Source: https://stackoverflow.com/questions/59266718/fetch-complete-list-of-items-using-beautifulsoup-python-3-6
