How to exclude all titles with find?

Submitted by 穿精又带淫゛_ on 2019-12-25 19:02:34

Question


I have a function that gets all the titles from my website. I don't want to get the titles of some products. Is this the right way to do it? I don't want titles from products containing the words "OLP NL", "Arcserve", "LicSAPk" or "symantec".

def get_title ( u ):
    html = requests.get ( u )
    bsObj = BeautifulSoup ( html.content, 'xml' )
    title = str ( bsObj.title ).replace ( '<title>', '' ).replace ( '</title>', '' )
    if (title.find ( 'Arcserve' ) or title.find ( 'OLP NL' ) or
            title.find ( 'LicSAPk' ) or title.find ( 'Symantec' ) is not -1):
        return 'null'
    else:
        return title

        if (title != 'null'):
            ws1 [ 'B1' ] = title
            meta_desc = get_metaDesc ( u )
            ws1 [ 'C1' ] = meta_desc
            meta_keyWrds = get_metaKeyWrds ( u )
            ws1 [ 'D1' ] = meta_keyWrds
            print ( "writing product no." + str ( i ) )
        else:
            print ( "skipped product no. " + str ( i ) )
            continue

The problem is that the program excludes all my products, and all I see is "skipped product no.". Why? Not all of them have these words.


Answer 1:


You can change the if statement to (title.find ( 'Arcserve' ) != -1 or title.find ( 'OLP NL' ) != -1 or title.find ( 'LicSAPk' ) != -1 or title.find ( 'Symantec' ) != -1), or you can create a function that checks for the terms you want to find:

def TermFind(Title):
    terms=['Arcserve','OLP NL','LicSAPk','Symantec']
    disc=False
    for val in terms:
        if Title.find(val)!=-1:
            disc=True
            break
    return disc

When I tested the original if statement, it always evaluated to True regardless of the title value. I couldn't find an explanation for this behavior, but you can try checking 'Python != operation vs "is not"' and 'nested "and/or" if statements'. Hope it helps.
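One likely explanation for the always-True result, sketched here with a made-up title: str.find returns the index of the match, or -1 when there is no match. In a boolean context -1 is truthy and only 0 is falsy, so each bare title.find(...) in the or chain counts as True whenever the word is absent (or present anywhere except the very start of the string). The comparison with -1 applies only to the last call. The title below is an assumption for illustration, not a real product from the site.

```python
# Hypothetical title that contains none of the excluded words.
title = "Microsoft Office 2019 Home"

# str.find returns -1 when the substring is absent, and -1 is truthy.
first = title.find("Arcserve")
print(first, bool(first))

# Same shape as the original condition: only the last find() is compared
# with -1. (Written with != here; "is not" tests identity, not value.)
buggy = bool(
    title.find("Arcserve")
    or title.find("OLP NL")
    or title.find("LicSAPk")
    or title.find("Symantec") != -1
)

# Comparing every result explicitly gives the intended answer.
fixed = (
    title.find("Arcserve") != -1
    or title.find("OLP NL") != -1
    or title.find("LicSAPk") != -1
    or title.find("Symantec") != -1
)

print(buggy, fixed)
```

The or chain stops at the first truthy operand, which here is the -1 returned by the very first find, so the remaining calls are never evaluated.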




Answer 2:


A similar idea, using any():

import requests 
from bs4 import BeautifulSoup

url = 'https://www.cdsoft.co.il/index.php?id_product=300610&controller=product'
html = requests.get(url)
bsObj = BeautifulSoup(html.content, 'lxml')
title = str ( bsObj.title ).replace ( '<title>', '' ).replace ( '</title>', '' )
items = ['Arcserve','OLP NL','LicSAPk','Symantec']

if not any(item in title for item in items):
    print(title)
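The same membership check could be wrapped in a small helper so it can be tried without a network request. This is only a sketch: the name is_excluded, the EXCLUDED_TERMS constant and the sample titles are hypothetical, and the case-insensitive comparison is an assumption, added because the question spells one term as "symantec" while the code uses "Symantec".

```python
# Terms taken from the question; the constant name is hypothetical.
EXCLUDED_TERMS = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']


def is_excluded(title, terms=EXCLUDED_TERMS):
    """Return True if any term occurs in title, ignoring case."""
    lowered = title.casefold()
    return any(term.casefold() in lowered for term in terms)


# Made-up titles for illustration only.
print(is_excluded("symantec Endpoint Protection"))
print(is_excluded("Microsoft Office 2019"))
```

If exact-case matching is what you want, dropping the two casefold() calls gives the same behavior as the any() expression above.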


Source: https://stackoverflow.com/questions/55067248/how-to-exclude-all-title-with-find
