How to combine this command with the existing one?

Submitted by 谁说我不能喝 on 2020-08-10 19:45:09

Question


From this answer, I know how to combine the many similar for match in soup.find_all('div', {'class' : ...}) loops in the following code

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimanter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')

entry_name = soup.h2.text

for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()

for match in soup.find_all('div', {'class' : 'copyright'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'example-info'}):  
    match.extract()

for match in soup.find_all('div', {'class' : 'share-overlay'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'popup-overlay'}):  
    match.extract()    
    

content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
content2 = ''.join(map(str, soup.select_one('.cB.cB-e.dcCorpEx').contents))

# Use a context manager so the file is closed automatically, and avoid shadowing the built-in format()
with open('aimer.html', 'w+', encoding = 'utf8') as f:
    f.write(entry_name + '\n' + str(content1) + str(content2) + '\n</>\n' )

into a single soup.select() loop:

for tag in soup.select('''
        script,
        .hcdcrt,
        #ad_contentslot_1,
        #ad_contentslot_2,
        div.copyright,
        div.example-info,
        div.share-overlay,
        div.popup-overlay'''):
    tag.extract()

Now I have one more command, which is

for match in soup.find_all('div', {'id' : 'videos'}):  
    match.extract()

I naively tried to combine it with the others by adding div.videos to the selector list, i.e.

for tag in soup.select('''
        script,
        .hcdcrt,
        #ad_contentslot_1,
        #ad_contentslot_2,
        div.copyright,
        div.example-info,
        div.share-overlay,
        div.popup-overlay,
        div.videos'''):
    tag.extract()

but it did not work. I think the reason is that this command matches on id rather than class.

Could you please elaborate on how to combine this command?


Answer 1:


In a CSS selector, . matches a class and # matches an id, so use # here (div#videos will select every <div> with id="videos"):

for tag in soup.select('''
        script,
        .hcdcrt,
        #ad_contentslot_1,
        #ad_contentslot_2,
        div.copyright,
        div.example-info,
        div.share-overlay,
        div.popup-overlay,
        div#videos'''):
    tag.extract()

More on CSS selectors here.
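
To see the difference between the two selectors in isolation, here is a minimal sketch. The HTML fragment is made up for illustration and is not the real Collins page; only the selector syntax carries over.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for this example.
html = """
<div id="videos">video player</div>
<div class="copyright">copyright notice</div>
<div class="videos">a div whose class (not id) is videos</div>
<p id="keep">definition text</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# '.' matches on the class attribute, '#' matches on the id attribute.
by_class = soup.select('div.videos')
by_id = soup.select('div#videos')

print([tag.get_text() for tag in by_class])
print([tag.get_text() for tag in by_id])

# Class and id selectors can be mixed in one comma-separated list.
for tag in soup.select('div.copyright, div#videos'):
    tag.extract()

# Names of the tags still in the tree after the removal.
remaining = [tag.name for tag in soup.find_all(True)]
print(remaining)
```

In this fragment div.videos only finds the <div> with class="videos", which is why it matched nothing on a page where videos is an id.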



Source: https://stackoverflow.com/questions/63128493/how-to-combine-this-command-with-the-existing-one
