Error with parse function in lxml

[愿得一人] 2020-12-15 09:54

I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried this simple command:

from lxml.html import parse
p = parse('http://ww


        
3 Answers
  • 2020-12-15 10:33

    According to the API docs it should work: http://lxml.de/api/lxml.html-module.html#parse

    This seems to be a bug in lxml 2.2.2. I just tested on Windows with Python 2.6 and 2.7, and it does work with lxml 2.3.0.

    So: upgrade your lxml and you'll be fine.

    I don't know exactly which versions of lxml are affected, but I believe the problem was not so much in lxml itself as in the version of libxml2 used to build the Windows binary (certain versions of libxml2 had a problem with HTTP on Windows).
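
Since the answer above turns on which lxml and libxml2 builds are installed, here is a minimal sketch for checking them. The helper names `format_version` and `lxml_versions` are made up for illustration; the only lxml attributes used are `etree.LXML_VERSION`, `etree.LIBXML_VERSION`, and `etree.LIBXML_COMPILED_VERSION`. It assumes nothing about which versions are actually affected.

```python
def format_version(version_tuple):
    """Join a version tuple such as (2, 3, 0, 0) into the string '2.3.0.0'."""
    return ".".join(str(part) for part in version_tuple)


def lxml_versions():
    """Return the lxml and libxml2 versions, or None if lxml is not installed."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        # Version of the lxml package itself.
        "lxml": format_version(etree.LXML_VERSION),
        # libxml2 library loaded at runtime.
        "libxml2 (running)": format_version(etree.LIBXML_VERSION),
        # libxml2 headers the lxml binary was built against.
        "libxml2 (compiled against)": format_version(etree.LIBXML_COMPILED_VERSION),
    }


print(lxml_versions())
```

If the two libxml2 entries differ, the binary was built against one libxml2 and is running with another, which may be worth mentioning when reporting a problem like this one.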

  • 2020-12-15 10:36

    lxml.html.parse does not fetch URLs.

    Here's how to do it with urllib2:

    >>> from urllib2 import urlopen
    >>> from lxml.html import parse
    >>> page = urlopen('http://www.google.com')
    >>> p = parse(page)
    >>> p.getroot()
    <Element html at 1304050>
    

    Update
    Steven is right. lxml.etree.parse should accept and load URLs. I missed that. I've tried deleting this answer, but I'm not allowed.

    I retract my statement about it not fetching URLs.

  • 2020-12-15 10:36

    Since line breaks are not allowed in comments, here's my implementation of MattH's answer:

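    from urllib2 import urlopen
    from lxml.html import parse

    site_url = 'http://www.google.com'

    try:
        # Let lxml (via libxml2) fetch the URL itself.
        page = parse(site_url).getroot()
    except IOError:
        # Fall back to urllib2 if libxml2 cannot load the HTTP resource.
        page = parse(urlopen(site_url)).getroot()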
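
The same try-then-fall-back idea can be wrapped in a small reusable helper. This is only a sketch: `parse_with_fallback` and its `parse` and `fetch` parameters are hypothetical names, passed in so the helper is not tied to one Python version (`urllib2.urlopen` on Python 2, `urllib.request.urlopen` on Python 3).

```python
def parse_with_fallback(url, parse, fetch):
    """Parse `url` directly; on IOError, fetch it first and parse the response.

    `parse` is a parser callable such as lxml.html.parse, and `fetch` is a
    URL opener such as urlopen. Both are supplied by the caller.
    """
    try:
        # First attempt: let the parser load the URL on its own.
        return parse(url)
    except IOError:
        # Second attempt: download the page ourselves, then parse the stream.
        response = fetch(url)
        try:
            return parse(response)
        finally:
            # Close the response if it supports closing.
            close = getattr(response, "close", None)
            if callable(close):
                close()
```

With the imports from the snippet above, a call would look like `page = parse_with_fallback(site_url, parse, urlopen).getroot()`.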
    