Question
I want to iterate over and extract the table from the link here, then save it as an Excel file.
How can I do that? Thank you.
My code so far:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
url = 'http://zjj.sz.gov.cn/ztfw/gcjs/xmxx/jgysba/'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
print(soup)
New update:
import requests
import json
import pandas as pd
import numpy as np
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()
for page in range(0, 10):
    # Request 100 rows per page; offset and pageNumber advance together
    data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
    json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers = headers, data = data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']) , orient='list')
    dfs.append(df)
print(dfs)
dfs = pd.concat(dfs)
#https://stackoverflow.com/questions/57842073/pandas-how-to-drop-rows-when-all-float-columns-are-nan
# Drop columns whose values are all 0 or NaN
dfs = dfs.loc[:, ~dfs.replace(0, np.nan).isna().all()]
dfs.to_excel('test.xlsx', index = False)
It fetches 10 pages and 1000 rows, but some column values are misplaced. Does anyone know what I did wrong? Thank you.
Answer 1:
Using the JSON API that the page calls via XHR, you can make a simple POST request with requests and get the data directly.
The request has two parameters you can change to control how much data you get: limit is the number of objects returned per request, and pageNumber is the page counter for pagination.
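from requests import post
import json
url = 'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json'
# limit: objects per request; pageNumber: which page to fetch
data = {'limit': '100', 'pageNumber': '1'}
response = post(url, data=data)
# Parse the JSON text of the response into Python objects
result = json.loads(response.text)
print(result)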
From there you can use pandas to build a DataFrame or write an Excel file, as needed.
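As a rough sketch of that last step, the snippet below collects the rows from several pages into one list and builds a single DataFrame from it, so columns are matched by field name rather than by position. The endpoint, the Referer header, the form fields (limit, offset, pageNumber) and the 'rows' key are taken from the question and are not verified here; the function names are hypothetical. It drops only columns that are entirely empty, which is a simpler filter than the one in the question, and writing .xlsx assumes an Excel writer such as openpyxl is installed.

```python
import pandas as pd
import requests

# Endpoint and Referer header as given in the question (assumed still valid).
URL = "http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json"
HEADERS = {"Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}


def page_payload(page, page_size=100):
    """Form fields for one page (page is 0-based); field names come from the question."""
    return {"limit": page_size, "offset": page * page_size, "pageNumber": page + 1}


def fetch_rows(page, page_size=100):
    """POST one page request and return its list of row dicts."""
    response = requests.post(
        URL, headers=HEADERS, data=page_payload(page, page_size), timeout=30
    )
    response.raise_for_status()
    # The question reads the records from the 'rows' key of the JSON body.
    return response.json()["rows"]


def rows_to_frame(rows):
    """Build a DataFrame from row dicts and drop columns that are entirely empty."""
    frame = pd.DataFrame(rows)
    return frame.dropna(axis="columns", how="all")


def scrape_to_excel(n_pages, path):
    """Fetch n_pages pages, combine all rows, and write them to an Excel file."""
    rows = []
    for page in range(n_pages):
        rows.extend(fetch_rows(page))
    rows_to_frame(rows).to_excel(path, index=False)
```

Calling scrape_to_excel(10, 'test.xlsx') would request ten pages of 100 rows each and write them to one sheet. Because every row is a dict, a field that is missing from some rows shows up as an empty cell in its own column instead of shifting the other values.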
Source: https://stackoverflow.com/questions/59835094/iterate-and-extract-tables-from-web-saving-as-excel-file-in-python