Iterate and extract tables from web saving as excel file in Python

╄→尐↘猪︶ㄣ 提交于 2021-02-05 11:30:12

问题


I want to iterate and extract table from the link here, then save as excel file.

How can I do that? Thank you.

My code so far:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

url = 'http://zjj.sz.gov.cn/ztfw/gcjs/xmxx/jgysba/'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
print(soup)

New update:

from requests import post
import json
import pandas as pd
import numpy as np

headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
        "Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()

for page in range(0, 10):
    data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
    json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers = headers, data = data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']) , orient='list')
    dfs.append(df)
    print(dfs)

dfs = pd.concat(dfs)
#https://stackoverflow.com/questions/57842073/pandas-how-to-drop-rows-when-all-float-columns-are-nan
dfs = dfs.loc[:, ~dfs.replace(0, np.nan).isna().all()]
dfs.to_excel('test.xlsx', index = False)

It generates 10 pages and 1000 rows, but some columns values are misplaced, someone knows where did I do wrong? Thank you.


回答1:


So, using the JSON API from XHR you make a simple python post request via requests and you have your data.

In the params you have two of them which you can change to get different volumes of data, limit is the nos of objects you get in a request. pageNumber is the paginated page counter.

from requests import post
import json

url = 'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json'
data = { 'limit' : '100', 'pageNumber' : '1'}
response = post(url, data=d)
response.text

Further you can use pandas to create a data frame or create a excel as you want.



来源:https://stackoverflow.com/questions/59835094/iterate-and-extract-tables-from-web-saving-as-excel-file-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!