Parsing HTTP Response in Python

夕颜 2020-12-13 09:08

I want to manipulate the information at THIS URL. I can open it and read its contents successfully, but what I really want to do is throw out all the stuff I don't want…

5 Answers
  • 2020-12-13 09:42

    TL;DR: Data you receive from a server arrives as bytes. The rationale is that the recipient, who knows what the data means and how it will be used, is responsible for decoding it. Decode the bytes as soon as they arrive so that you work with a str rather than a bytes object (the kind printed with a b'' prefix).

    Use case:

    import requests

    def get_data_from_url(url):
        response = requests.get(url)
        # Decode the raw bytes as UTF-8 text, then split it into a list of lines
        response_data_split_by_line = response.content.decode('utf-8').splitlines()
        return response_data_split_by_line
    

    In this example, I decode the content I received as UTF-8. For my purposes, I then split it by line so I can loop through each line with a for loop.
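
    A minimal offline sketch of the same decode-then-split step, assuming a hypothetical byte string in place of response.content (the helper name split_body_into_lines is illustrative, not part of any library). Note that str.splitlines() also strips \r\n endings, which servers often send:

```python
from typing import List


def split_body_into_lines(body: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode a raw HTTP body (bytes) and return its lines without line endings."""
    text = body.decode(encoding)  # bytes -> str
    return text.splitlines()      # handles \n, \r\n and \r


# Hypothetical body standing in for response.content from requests.
raw = b"first line\r\nsecond line\nthird line"
print(type(raw).__name__)       # bytes
lines = split_body_into_lines(raw)
print(type(lines[0]).__name__)  # str
for line in lines:
    print(line)
```

    With requests specifically, response.text does this decoding for you, using the encoding requests derives from the response headers (or guesses when none is declared), while response.content stays the raw bytes.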

  • 2020-12-13 09:47

    The json module works with Unicode text in Python 3 (the JSON format itself is defined in terms of Unicode text), so the bytes received in an HTTP response should be decoded with the charset the server declares. r.headers.get_content_charset('utf-8') returns the charset from the Content-Type header, or 'utf-8' if the header doesn't declare one:

    #!/usr/bin/env python3
    import io
    import json
    from urllib.request import urlopen
    
    with urlopen('https://httpbin.org/get') as r, \
         io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
        result = json.load(file)
    print(result['headers']['User-Agent'])
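
    To see what io.TextIOWrapper contributes without touching the network, here is a sketch that substitutes io.BytesIO for the binary response object urlopen returns; the JSON payload is made up, not real httpbin output. The wrapper decodes bytes to str as it is read, so json.load receives text:

```python
import io
import json

# io.BytesIO stands in for the binary HTTPResponse returned by urlopen;
# the payload is a hypothetical example.
payload = '{"headers": {"User-Agent": "example-agent/1.0"}}'.encode("utf-8")
binary_stream = io.BytesIO(payload)

# TextIOWrapper decodes the underlying bytes as json.load reads from it.
with io.TextIOWrapper(binary_stream, encoding="utf-8") as text_stream:
    result = json.load(text_stream)

print(result["headers"]["User-Agent"])  # example-agent/1.0
```

    Closing the wrapper also closes the stream it wraps.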
    

    You don't need io.TextIOWrapper here; reading the whole body and decoding it in one step works too:

    #!/usr/bin/env python3
    import json
    from urllib.request import urlopen
    
    with urlopen('https://httpbin.org/get') as r:
        result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
    print(result['headers']['User-Agent'])
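
    The charset lookup can be tried offline, because r.headers is an http.client.HTTPMessage, which subclasses email.message.Message. The sketch below builds a Message by hand with a hypothetical Latin-1 Content-Type to show why honoring the declared charset matters: the same bytes are not valid UTF-8. The helper decode_body is illustrative only.

```python
import json
from email.message import Message


def decode_body(body: bytes, headers: Message) -> str:
    """Decode body with the charset from Content-Type, defaulting to UTF-8."""
    charset = headers.get_content_charset("utf-8")
    return body.decode(charset)


# Hypothetical headers and body, built by hand instead of fetched.
latin1_headers = Message()
latin1_headers["Content-Type"] = "application/json; charset=ISO-8859-1"
body = '{"city": "Z\u00fcrich"}'.encode("iso-8859-1")

print(latin1_headers.get_content_charset("utf-8"))  # iso-8859-1 (lowercased)
data = json.loads(decode_body(body, latin1_headers))
print(data["city"] == "Z\u00fcrich")                 # True

try:
    body.decode("utf-8")  # ignoring the declared charset fails for these bytes
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)

# Without a charset parameter, the fallback value is returned.
plain_headers = Message()
plain_headers["Content-Type"] = "application/json"
print(plain_headers.get_content_charset("utf-8"))   # utf-8
```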
    
  • 2020-12-13 09:52

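    I guess things have changed since Python 3.4. This worked for me, where resp is a Response object from the requests library (.json() is a requests method, not part of the standard library):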

    import json  # needed for json.dumps; resp is assumed to be a requests Response
    print("resp:" + json.dumps(resp.json()))
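
    resp.json() returns ordinary Python objects (dicts, lists, strings, numbers), and json.dumps() turns them back into a str, which is why the string concatenation works. A sketch with a hypothetical payload in place of a live response; the indent= argument is handy when you only want to inspect the data:

```python
import json

# Hypothetical value standing in for resp.json().
payload = {"name": "example", "values": [1, 2, 3]}

text = json.dumps(payload)  # Python objects -> JSON str
print("resp:" + text)       # resp:{"name": "example", "values": [1, 2, 3]}

# Pretty-printed form, easier to read when exploring a response.
print(json.dumps(payload, indent=2))

# Parsing the str again gives back an equal object.
assert json.loads(text) == payload
```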
    
  • 2020-12-13 09:53

    You can also use the third-party requests library instead.

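    import requests

    url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
    response = requests.get(url)
    response.raise_for_status()  # raise requests.HTTPError for 4xx/5xx status codes
    data = response.json()  # parse the JSON body into Python objects


    Now you can manipulate data like any Python dictionary (calling the variable dict would shadow the built-in type).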
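
    Once the body is parsed, ordinary dict and list operations apply. The sketch below uses a made-up nested payload, since the actual structure of the Quandl response isn't shown in this thread; .get() with a default avoids a KeyError when a field is missing:

```python
# Hypothetical parsed JSON, standing in for the result of response.json().
parsed = {
    "source_name": "example source",
    "rows": [["2020-01-01", 1.5], ["2020-04-01", 2.5]],
}

print(parsed["source_name"])             # direct lookup (KeyError if missing)
print(parsed.get("description", "n/a"))  # lookup with a default

# Iterate over a nested list and build a new dict from it.
by_date = {date: value for date, value in parsed["rows"]}
print(by_date)  # {'2020-01-01': 1.5, '2020-04-01': 2.5}

# Dicts are mutable, so you can add or drop keys as needed.
parsed["row_count"] = len(parsed["rows"])
del parsed["rows"]
print(sorted(parsed))  # ['row_count', 'source_name']
```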

  • When I printed response.read(), I noticed that a b was prepended to the string (e.g. b'{"a":1,..). The b stands for bytes and marks the type of the object you're handling. Since I knew that a string could be converted to a dict with json.loads('string'), I just had to convert the bytes to a string. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string, my problem was solved and I could easily iterate over the dict.

    I don't know if this is the fastest or most Pythonic way to write this, but it works, and there's always time later for optimization and improvement! Full code for my solution:

    from urllib.request import urlopen
    import json

    # Get the dataset
    url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
    with urlopen(url) as response:  # the with block closes the connection
        # Convert bytes to a str, then the JSON str to a dict
        string = response.read().decode('utf-8')
    json_obj = json.loads(string)

    print(json_obj['source_name'])  # prints the value stored under the 'source_name' key
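
    The b prefix is just how Python displays a bytes object. A short offline sketch with a made-up body shows the decode step from this answer, and also that json.loads() accepts bytes directly in Python 3.6 and later (it detects UTF-8, UTF-16, or UTF-32 itself), so the explicit decode is optional when the body uses one of those encodings:

```python
import json

# Made-up body standing in for what response.read() returns.
raw = b'{"a": 1, "b": [2, 3]}'
print(repr(raw))   # b'{"a": 1, "b": [2, 3]}'  <- note the b prefix

text = raw.decode("utf-8")  # bytes -> str
print(repr(text))  # '{"a": 1, "b": [2, 3]}'

from_str = json.loads(text)
from_bytes = json.loads(raw)  # Python 3.6+: bytes are accepted directly
assert from_str == from_bytes

for key, value in from_str.items():
    print(key, value)
```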
    