Bs4 select_one vs find

Submitted by 偶尔善良 on 2019-12-23 10:15:33

Question


I was wondering what the difference is between bs.find('div') and bs.select_one('div'). The same goes for find_all and select.

Is there any difference performance-wise, or is one better to use than the other in specific cases?


Answer 1:


select() and select_one() give you a different way of navigating an HTML tree: CSS selectors, which have a rich and convenient syntax. The CSS selector support in BeautifulSoup is limited, though it covers the most common cases.
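
To make the two styles concrete, here is a minimal sketch that runs the same lookups both ways. The HTML snippet and the variable names are made up for illustration; only find(), find_all(), select() and select_one() come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for this comparison.
html = """
<div id="main" class="box">
  <p class="intro">Hello</p>
  <div class="box inner"><p>World</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# First <div>: both calls return the same Tag object from the tree.
first_div_find = soup.find("div")
first_div_select = soup.select_one("div")
same_first = first_div_find is first_div_select

# All <div> elements: find_all() and select() yield the same tags in document order.
all_divs_find = soup.find_all("div")
all_divs_select = soup.select("div")
same_all = list(all_divs_find) == list(all_divs_select)

# A nested lookup: one CSS selector versus chained find() calls.
nested_select = soup.select_one("div.inner > p")
nested_find = soup.find("div", class_="inner").find("p", recursive=False)

print(same_first, same_all, nested_select.get_text(), nested_find.get_text())
```

For a plain tag name the two APIs are interchangeable; the CSS form mostly pays off once the lookup involves nesting or combined conditions, as in the last pair.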

Performance-wise, it really depends on the HTML tree being parsed, on which element you are looking for, how deep it is, and which selector is used to locate it. It also matters which find() + find_all() alternative you compare select() against. In a simple case like bs.find('div') vs bs.select_one('div'), I'd say that find() should generally be faster, simply because there is a lot going on under the hood to support CSS selector syntax.
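
Since the answer depends on the document and the selector, it is easiest to measure on your own data. The following is a rough sketch using the standard timeit module; the synthetic HTML, the time_call helper and the repeat counts are assumptions chosen for illustration, and the numbers will vary with the bs4 version and the parser.

```python
import timeit

from bs4 import BeautifulSoup

# Synthetic document (an assumption): 200 filler paragraphs, then the target.
html = (
    "<html><body>"
    + "<p>filler</p>" * 200
    + '<div><h2 id="DESCRIPTION">DESCRIPTION</h2></div>'
    + "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")


def time_call(fn, number=200):
    """Best-of-3 average time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(fn, number=number, repeat=3)) / number


timings = {
    'find("div")': time_call(lambda: soup.find("div")),
    'select_one("div")': time_call(lambda: soup.select_one("div")),
    'find("h2", id="DESCRIPTION")': time_call(
        lambda: soup.find("h2", id="DESCRIPTION")
    ),
    'select_one("#DESCRIPTION")': time_call(
        lambda: soup.select_one("#DESCRIPTION")
    ),
}

for label, seconds in timings.items():
    print(f"{label:32s} {seconds * 1e6:10.1f} microseconds per call")
```

Treat the output as a local measurement only; which call wins can flip between library versions and document shapes.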




Answer 2:


In the timings below, select_one is much faster than find:

In [13]: req = requests.get("https://httpbin.org/")

In [14]: soup = BeautifulSoup(req.content, "html.parser")

In [15]:  soup.select_one("#DESCRIPTION")
Out[15]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [16]:  soup.find("h2", id="DESCRIPTION")
Out[16]: <h2 id="DESCRIPTION">DESCRIPTION</h2>

In [17]: timeit  soup.find("h2", id="DESCRIPTION")
100 loops, best of 3: 5.27 ms per loop

In [18]: timeit  soup.select_one("#DESCRIPTION")
1000 loops, best of 3: 649 µs per loop

In [19]: timeit  soup.select_one("div")
10000 loops, best of 3: 61 µs per loop

In [20]: timeit  soup.find("div")
1000 loops, best of 3: 446 µs per loop

find is essentially find_all with limit set to 1: it checks whether the returned list is empty, returns the first item if it is not, and returns None if it is.

def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r

select_one does something similar using select:

def select_one(self, selector):
    """Perform a CSS selection operation on the current element."""
    value = self.select(selector, limit=1)
    if value:
        return value[0]
    return None

The cost is much lower with select because it does not have all the keyword arguments to process.

The question "Beautifulsoup : Is there a difference between .find() and .select() - python 3.xx" covers a bit more on the differences.
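
The wrapper behaviour quoted above can be checked from the outside. This is a small sketch, assuming a toy document; first_or_none is a hypothetical helper that mirrors the "first item or None" logic, not part of bs4.

```python
from bs4 import BeautifulSoup

# Toy document, invented for this check.
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")


def first_or_none(results):
    """Return the first item of a result list, or None if it is empty."""
    return results[0] if results else None


# find() behaves like find_all(..., limit=1) followed by first_or_none().
find_equiv = first_or_none(soup.find_all("p", limit=1))
# select_one() behaves like select(..., limit=1) followed by first_or_none().
select_equiv = first_or_none(soup.select("p", limit=1))

# With no match, both paths end in None.
find_missing = first_or_none(soup.find_all("span", limit=1))
select_missing = first_or_none(soup.select("span", limit=1))

print(find_equiv is soup.find("p"))
print(select_equiv is soup.select_one("p"))
print(find_missing, select_missing)
```

So any speed difference between find and select_one comes from find_all versus select themselves, not from these thin wrappers.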



Source: https://stackoverflow.com/questions/39033612/bs4-select-one-vs-find
