How to use Beautiful Soup to find a tag with changing id?

Submitted by 别来无恙 on 2021-02-07 18:34:38

Question


I am using Beautiful Soup in Python.

Here is an example URL:

http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp

In the HTML, there are a bunch of tags and the only way I can specify which ones to find is with their id. The only thing I want to find is the telephone number. The tag looks like this:

<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td> 

I have gone to other URLs on the same website and found almost the same id for the telephone number tag every time. The part that always stays the same is:

'value_xxx_c_1_f_8_a_'

However, the numbers that come after that always change. Is there a way to tell Beautiful Soup to match on that fixed part of the id and allow the rest to be any digits, the way a regular expression would?

Also, once I have the tag, how can I extract the phone number without using regular expressions? I don't know whether Beautiful Soup can do that, but it would probably be simpler than a regex.


Answer 1:


You can use a regular expression. Passed as the first argument it matches on tag names, so pass it as the id keyword argument to match on an element's id instead:

import re
# Match every tag whose id starts with the fixed prefix.
for tag in soup.find_all(id=re.compile("^value_xxx_c_1_f_8_a_")):
    print(tag.name, tag["id"])
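
The id-based match can be sketched end to end as follows. This is only an illustration: it assumes Beautiful Soup 4 (`bs4`) is installed, the HTML fragment is made up around the tag from the question, and the `find_phone_tags` helper name is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the tag shown in the question.
SAMPLE_HTML = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242497">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

# Fixed prefix followed by one or more digits, anchored at both ends.
PHONE_ID = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")


def find_phone_tags(html):
    """Return every tag whose id matches the phone-number pattern."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(id=PHONE_ID)


for tag in find_phone_tags(SAMPLE_HTML):
    print(tag.name, tag["id"], tag.get_text(strip=True))
```

Anchoring the pattern with `^` and `$` keeps it from matching ids that merely contain the prefix somewhere in the middle.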



Answer 2:


Know your documentation

http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html

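import re
# Beautiful Soup 3 spelling; in Beautiful Soup 4 the method is find_all (findAll remains as an alias).
soup.findAll(id=re.compile("para$"))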
# [<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
#  <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]



Answer 3:


You can use CSS Selectors here, to match on an attribute value prefix:

soup.select('td[id^="value_xxx_c_1_f_8_a_"]')

This will only match <td> tags (the tag type shown in the question) with an id attribute that starts with the string value_xxx_c_1_f_8_a_.
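
A short, self-contained sketch of the prefix selector, assuming Beautiful Soup 4 is installed. The tag in the question is a `<td>`, so the selector here targets `td`; the HTML fragment is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the tag shown in the question.
SAMPLE_HTML = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242497">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [id^="..."] selects elements whose id attribute starts with the given string.
matches = soup.select('td[id^="value_xxx_c_1_f_8_a_"]')
phone_numbers = [td.get_text(strip=True) for td in matches]
print(phone_numbers)
```

If only the first match is needed, `soup.select_one(...)` takes the same selector and returns a single tag or `None`.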

If you are willing to switch to lxml instead, you can use an XPath 1.0 expression to find these:

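from lxml import etree
# openfile: an open file object (or a path) for the HTML document.
# Use the HTML parser, since real-world HTML is rarely well-formed XML.
doc = etree.parse(openfile, etree.HTMLParser())
for elem in doc.xpath('//td[starts-with(@id, "value_xxx_c_1_f_8_a_")]'):
    print(elem.text)

An lxml XPath expression will typically be much faster than a BeautifulSoup regular-expression match, often by an order of magnitude.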
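
As a rough, self-contained sketch of the XPath approach, assuming lxml is installed: the version below parses an invented HTML string with `lxml.html` rather than a file, and targets `td` because that is the tag shown in the question.

```python
from lxml import html

# Hypothetical fragment modelled on the tag shown in the question.
SAMPLE_HTML = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242497">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

doc = html.fromstring(SAMPLE_HTML)

# starts-with() is an XPath 1.0 function, so it works with lxml's XPath engine.
cells = doc.xpath('//td[starts-with(@id, "value_xxx_c_1_f_8_a_")]')
phone_numbers = [cell.text_content().strip() for cell in cells]
print(phone_numbers)
```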




Answer 4:


To get the phone number, use the tag's .text attribute:

tag = soup.find("foo")  # "foo" is a placeholder; use whichever lookup finds your tag
phone_number = tag.text
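
Putting the lookup and the text extraction together without any regular expression might look like the sketch below. It assumes Beautiful Soup 4 is installed and relies on the fact that `find` accepts a function as an attribute filter; the names `has_phone_id` and `extract_phone` are hypothetical, and the HTML is invented for illustration.

```python
from bs4 import BeautifulSoup

PREFIX = "value_xxx_c_1_f_8_a_"

# Hypothetical fragment modelled on the tag shown in the question.
SAMPLE_HTML = '<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498"> 5134231582 </td>'


def has_phone_id(value):
    """Attribute filter: true when the id exists and starts with the prefix."""
    return value is not None and value.startswith(PREFIX)


def extract_phone(html):
    """Return the text of the first matching tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=has_phone_id)
    if tag is None:
        return None
    # get_text(strip=True) returns the tag's text without surrounding whitespace.
    return tag.get_text(strip=True)


phone_number = extract_phone(SAMPLE_HTML)
missing = extract_phone("<td id='something_else'>n/a</td>")
print(phone_number, missing)
```

Checking for `None` before reading the text avoids an `AttributeError` on pages where the phone-number tag is absent.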


Source: https://stackoverflow.com/questions/11924135/how-to-use-beautiful-soup-to-find-a-tag-with-changing-id
