Question
I am learning BeautifulSoup and have chosen the page https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS to scrape the list of items under the topic "Banks and other financial corporations".
I need the items below, together with their child items, in a hierarchical format as shown in the attached image:
- Banks
- Investment companies
- Insurance corporations and pension funds up to Q2 2016
- Insurance corporations as of Q3 2016
- Pension funds as of Q3 2016
- Payments statistics
This is the code I have tried so far; I am stuck after this point:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import csv
url = 'https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
s = soup.find("div", class_="statisticTree")
I also want to export the results to a CSV file.
Is it possible to export the parent-child structure as shown in the image?
Answer 1:
You can do it recursively with the help of a function that returns a node's link text and a list of its children:
from pprint import pprint
import requests
from bs4 import BeautifulSoup
url = 'https://www.bundesbank.de/en/statistics/time-series-databases/time-series-databases/743796/openAll?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
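def get_child_nodes(parent_node):
    # The first link inside the node holds its display name.
    node_name = parent_node.a.get_text(strip=True)
    result = {"name": node_name, "children": []}
    # Look only at the node's own <ul>, not at lists nested deeper.
    children_list = parent_node.find('ul', recursive=False)
    if not children_list:
        return result
    # Recurse into each direct <li> child of that list.
    for child_node in children_list('li', recursive=False):
        result["children"].append(get_child_nodes(child_node))
    return result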
pprint(get_child_nodes(soup.find("div", class_="statisticTree")))
Note that it's important to make the list item searches in a non-recursive fashion (recursive=False is set) in order to prevent it from grabbing grand-children and going down the tree.
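To see what recursive=False changes, here is a minimal sketch on a toy tree. The markup below is made up for illustration and is not the real Bundesbank page; only the find_all behaviour is the point.

```python
from bs4 import BeautifulSoup

# Toy markup (an assumption for illustration, not the real page structure).
html = """
<ul id="root">
  <li><a>Banks</a>
    <ul>
      <li><a>Minimum reserves</a></li>
    </ul>
  </li>
  <li><a>Payments statistics</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
root = soup.find("ul", id="root")

# Only the <li> elements that are direct children of the root list.
direct = [li.a.get_text(strip=True) for li in root.find_all("li", recursive=False)]

# Default behaviour: every <li> descendant, at any depth.
everything = [li.a.get_text(strip=True) for li in root.find_all("li")]

print(direct)
print(everything)
```

With the default search, the nested item shows up in the same flat list as the top-level ones, which is what would blur the hierarchy in the recursive function.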
Calling get_child_nodes on the statisticTree div prints (output abridged):
{'children': [{'children': [{'children': [{'children': [{'children': [],
'name': 'Reserve '
'maintenance '
'in the euro '
'area'},
{'children': [],
'name': 'Reserve '
'maintenance '
'in Germany'}],
'name': 'Minimum reserves'},
...
{'children': [{'children': [], 'name': 'Bank accounts'},
{'children': [], 'name': 'Payment card functions'},
{'children': [], 'name': 'Accepting devices'},
{'children': [],
'name': 'Number of payment transactions'},
{'children': [],
'name': 'Value of payment transactions'},
{'children': [],
'name': 'Number of transactions per type of '
'terminal'},
{'children': [],
'name': 'Value of transactions per type of '
'terminal'},
{'children': [],
'name': 'Number of OTC transactions'},
{'children': [],
'name': 'Value of OTC transactions'},
{'children': [], 'name': 'Issuance of banknotes'}],
'name': 'Payments statistics'}],
'name': 'Banks'}
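For the CSV part of the question, one possible approach is to flatten the nested dictionary into (parent, child) rows and write them with the standard csv module. This is a sketch, not a tested run against the live page: iter_parent_child and write_csv are hypothetical helper names, and the sample tree is a small hand-written stand-in in the same shape that get_child_nodes returns.

```python
import csv
import io


def iter_parent_child(node, parent=""):
    """Yield (parent_name, child_name) pairs depth first."""
    yield parent, node["name"]
    for child in node["children"]:
        yield from iter_parent_child(child, node["name"])


def write_csv(rows, fileobj):
    """Write a header line followed by one line per (parent, child) pair."""
    writer = csv.writer(fileobj)
    writer.writerow(["parent", "child"])
    writer.writerows(rows)


# Simplified sample in the shape returned by get_child_nodes (assumption).
tree = {"name": "Banks", "children": [
    {"name": "Minimum reserves", "children": [
        {"name": "Reserve maintenance in the euro area", "children": []},
        {"name": "Reserve maintenance in Germany", "children": []},
    ]},
]}

rows = list(iter_parent_child(tree))

# Written to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write an actual file, pass a file object instead of the buffer, for example open("tree.csv", "w", newline="", encoding="utf-8"), where the file name is arbitrary. If the same name can appear under different parents, storing the full path from the root instead of just the parent name would keep the rows unambiguous.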
Source: https://stackoverflow.com/questions/59266718/fetch-complete-list-of-items-using-beautifulsoup-python-3-6