Are spaces around CSS combinators really optional?

 ̄綄美尐妖づ submitted on 2019-12-24 22:51:39

Question


I'm a bit confused about using CSS selectors with combinators in BeautifulSoup. The simple code below illustrates what I mean:

from bs4 import BeautifulSoup as bs
import requests

response = requests.get('https://stackoverflow.com/questions/tagged/python')
soup = bs(response.text, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

print(len(soup.select('#mainbar > div'))) 

returns 6 children... but

print(len(soup.select('#mainbar>div')))

returns 0 children...

The same happens with '#mainbar ~ div' (finds 1 sibling) and '#mainbar~div' (finds nothing).

According to the documentation those spaces are optional, but BeautifulSoup gives me different output for what I thought were the same selectors.

So is this a bs4 bug, or does the behavior depend on the CSS version or something else?


Answer 1:


This is confirmed as a bug here: https://bugs.launchpad.net/beautifulsoup/+bug/1717851

From a CSS perspective, the selector is valid with or without the spaces.

I will see if I can find further evidence.

The individual reporting the bug states:

The issue, as far as I see, is that since the code is only doing a shlex.split, it doesn't treat div, >, and span as separate entities if a space is left out on either side of >.
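To make the quoted explanation concrete, here is a minimal sketch that runs a plain shlex.split on both spellings of the selector from the question. It only illustrates the tokenization step described in the bug report; it is not the actual bs4 code, and the variable names are made up for this example.

```python
import shlex

# Tokenize both spellings with a plain shlex.split,
# which splits on whitespace only.
spaced = shlex.split('#mainbar > div')
compact = shlex.split('#mainbar>div')

print(spaced)   # ['#mainbar', '>', 'div']
print(compact)  # ['#mainbar>div']
```

Without the spaces, the whole selector arrives as a single token, so the combinator is never seen as a separate step.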




Answer 2:


In case you want to patch it, see bs4/element.py, line 1440, and replace

tokens = shlex.split(selector)

with

selector = re.sub(r'\s*([+>~])\s*', r' \1 ', selector)
tokens = shlex.split(selector)

Demo:

import re, shlex

def testSelect(selector):
    selector = re.sub(r'\s*([+>~])\s*', r' \1 ', selector)
    tokens = shlex.split(selector)
    print(tokens)

testSelect('#mainbar > div ~ p') # default
testSelect('#mainbar>div~p')
testSelect('#mainbar    >div+     p')
testSelect('#mainbar.classA')
testSelect('#mainbar p')
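One caveat with the regex above: the characters ~ and + also occur inside attribute selectors such as [title~=foo] and inside arguments such as (2n+1), and the substitution would pad those too. Below is a minimal sketch of a more careful variant, assuming it is enough to skip anything inside [...] or (...). The helper normalize_combinators is hypothetical and not part of bs4, and it does not handle brackets nested inside quoted values.

```python
import re
import shlex

# Hypothetical helper (not part of bs4): matches a bracketed or
# parenthesized piece so it can be left untouched.
_PROTECTED = re.compile(r'(\[[^\]]*\]|\([^)]*\))')

def normalize_combinators(selector):
    # re.split with one capturing group keeps the protected pieces
    # at the odd indexes of the result.
    parts = _PROTECTED.split(selector)
    for i in range(0, len(parts), 2):
        # Pad combinators only in the unprotected pieces.
        parts[i] = re.sub(r'\s*([+>~])\s*', r' \1 ', parts[i])
    return ''.join(parts)

print(shlex.split(normalize_combinators('#mainbar>div~p')))
print(shlex.split(normalize_combinators('a[title~=foo]>span')))
print(shlex.split(normalize_combinators('li:nth-child(2n+1)>a')))
```

The trade-off is a slightly longer helper in exchange for leaving attribute selectors intact; for selectors without brackets or parentheses it behaves like the one-line substitution.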


Source: https://stackoverflow.com/questions/53401724/are-spaces-around-css-combinators-are-really-optional
