How do you extract data from JSON using BeautifulSoup in Django?

Submitted by 一笑奈何 on 2019-12-24 16:28:44

Question


Good day. I'm facing an issue while trying to extract values from JSON. First, my BeautifulSoup code works fine in the shell but not in Django. What I'm trying to achieve is to extract data from the received JSON, but without success. Here's the class in my view that does it:

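import json
import re

import requests
from bs4 import BeautifulSoup
from django.views import generic


class FetchWeather(generic.TemplateView):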
    template_name = 'forecastApp/pages/weather.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        url = 'http://weather.news24.com/sa/cape-town'
        city = 'cape town'
        url_request = requests.get(url)
        soup = BeautifulSoup(url_request.content, 'html.parser')
        city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")
        print(soup.head)
        city_as_on_website = city_list.find(text=re.compile(city, re.I)).parent
        cityId = city_as_on_website['value']
        json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

        headers = {
            'Content-Type': 'text/plain; charset=UTF-8',
            'Host': 'weather.news24.com',
            'Origin': 'http://weather.news24.com',
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
            'X-AjaxPro-Method': 'GetCurrentOne'}

        payload = {
            "cityId": cityId
        }
        request_post = requests.post(json_url, headers=headers, data=json.dumps(payload))
        print(request_post.content)
        context['Observations'] = request_post.content
        return context

In the JSON there's an array, "Observations", from which I'm trying to get the city name and the high and low temperatures.
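Once the response body parses as JSON, picking those values out of "Observations" is plain dictionary work. A minimal sketch, assuming "Observations" is a top-level list of objects; the helper name extract_fields and the key names CityName, HighTemp and LowTemp are hypothetical placeholders, so check the real response for the actual field names (and nesting).

```python
from typing import Any, Dict, Iterable, List


def extract_fields(data: Dict[str, Any], keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Return one dict per entry in data["Observations"], keeping only `keys`.

    Assumes "Observations" is a top-level list of dicts; a missing key maps to None.
    """
    wanted = list(keys)
    return [
        {key: obs.get(key) for key in wanted}
        for obs in data.get("Observations", [])
    ]


# Hypothetical field names and values, for illustration only.
sample = {
    "Observations": [
        {"CityName": "Cape Town", "HighTemp": 28, "LowTemp": 17, "Wind": 20},
    ]
}
print(extract_fields(sample, ["CityName", "HighTemp", "LowTemp"]))
# [{'CityName': 'Cape Town', 'HighTemp': 28, 'LowTemp': 17}]
```

Using .get() keeps the view from raising KeyError if one observation lacks a field; swap it for direct indexing if a missing field should be treated as an error.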

But when I try to do this:

cityDict = json.loads(str(html))

I get an error. Here's the traceback:

 Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 4067 (char 4066)
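The offset in the last line of the traceback (char 4066) marks exactly where the decoder gave up, so looking at the text around it usually reveals the offending token. A minimal sketch using only the standard json module's JSONDecodeError attributes (msg, doc, pos); the helper name show_decode_error is hypothetical.

```python
import json


def show_decode_error(text: str, radius: int = 30) -> str:
    """Try to parse `text`; on failure, describe the error and show nearby text."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where decoding stopped,
        # err.doc is the full string that was being decoded.
        start = max(err.pos - radius, 0)
        snippet = err.doc[start:err.pos + radius]
        return f"{err.msg} at char {err.pos}: ...{snippet}..."
    return "valid JSON"


# A JavaScript Date constructor is not a JSON value, so decoding stops at "new".
print(show_decode_error('{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))}'))
```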

Any help would be appreciated.


Answer 1:


There are two problems with your JSON data inside request_post.content:

  • it contains JavaScript Date objects, which are not valid JSON values, for instance:

    "Date":new Date(Date.UTC(2016,1,26,22,0,0,0))
    
  • there are unwanted characters at the end: ;/*

Let's clean the data so that it can be loaded with the json module:

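import json
import re
from datetime import datetime

data = request_post.text

def convert_date(match):
    # Date.UTC() months are 0-based and its last argument is milliseconds,
    # so shift the month and convert ms to microseconds for datetime()
    year, month, day, hour, minute, second, ms = map(int, match.groups())
    dt = datetime(year, month + 1, day, hour, minute, second, ms * 1000)
    return '"' + dt.strftime("%Y-%m-%dT%H:%M:%S") + '"'

# replace every JS Date constructor with a quoted ISO 8601 string
data = re.sub(r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)",
              convert_date,
              data)

# drop the trailing ";/*" characters, then parse
data = data.strip(";/*")
data = json.loads(data)

context['Observations'] = data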
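The same cleanup can be packaged as a reusable function that the view calls on the response text. A minimal sketch, assuming the body deviates from JSON only in the two ways listed above; the names parse_ajaxpro_response, JS_DATE_RE and _js_date_to_iso are hypothetical. JavaScript's Date.UTC() takes a 0-based month and a millisecond argument, so the month is shifted by one and milliseconds are converted to microseconds before calling datetime().

```python
import json
import re
from datetime import datetime

# Matches JavaScript: new Date(Date.UTC(year,month,day,hour,min,sec,ms)) without spaces
JS_DATE_RE = re.compile(
    r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)"
)


def _js_date_to_iso(match):
    """Replace one Date.UTC(...) call with a quoted ISO 8601 string."""
    year, month, day, hour, minute, second, ms = map(int, match.groups())
    # Date.UTC() months are 0-based (0 = January); the last value is milliseconds.
    dt = datetime(year, month + 1, day, hour, minute, second, ms * 1000)
    return '"' + dt.strftime("%Y-%m-%dT%H:%M:%S") + '"'


def parse_ajaxpro_response(text):
    """Convert JS dates to strings, strip trailing ';/*' characters, parse JSON."""
    cleaned = JS_DATE_RE.sub(_js_date_to_iso, text)
    return json.loads(cleaned.strip(";/*"))


body = '{"Observations":[{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))}]};/*'
print(parse_ajaxpro_response(body))
# {'Observations': [{'Date': '2016-02-26T22:00:00'}]}
```

Inside the view this would be used as context['Observations'] = parse_ajaxpro_response(request_post.text). Passing .text (a str) rather than .content (bytes) keeps the regex and json.loads working on decoded text.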


Source: https://stackoverflow.com/questions/35648169/how-do-you-extract-data-from-json-using-beautifulsoup-in-django
