How to use Beautiful Soup to find a tag with changing id?

Question

I am using Beautiful Soup in Python.

Here is an example URL:

http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp

In the HTML, there are a bunch of tags and the only way I can specify which ones to find is with their id. The only thing I want to find is the telephone number. The tag looks like this:

<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>

I have gone to other URLs on the same website and found almost the same id for the telephone number tag every time. The part that always stays the same is:

'value_xxx_c_1_f_8_a_'

However, the numbers that come after that always change. Is there a way to tell Beautiful Soup to match the fixed part of the id and accept any digits for the rest, the way a regular expression could?

Also, once I get the tag, I was wondering...how can I extract the phone number without using regular expressions? I don't know if Beautiful Soup can do that but it would probably be simpler than regex.

Answer 1:

You can use a regular expression. Pass it as the id keyword argument so that it is matched against each element's id attribute; passed as the first positional argument, it would be matched against tag names instead:

import re
for tag in soup.find_all(id=re.compile("^value_xxx_c_1_f_8_a_")):
    print(tag.name)
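
As a self-contained illustration of matching on the id attribute with a regular expression, here is a minimal sketch. The sample markup is modelled on the tag from the question; the first table row and the variable names are invented for the example, and the pattern assumes the changing part of the id is always digits.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the question; the first row is a made-up decoy.
html = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242497">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Fixed prefix, then one or more digits up to the end of the id (assumption).
phone_id = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")

# Restrict the search to <td> tags whose id matches the pattern.
matches = soup.find_all("td", id=phone_id)
phone_numbers = [td.get_text(strip=True) for td in matches]
print(phone_numbers)
```

Anchoring the pattern with ^ matters because Beautiful Soup searches for the pattern anywhere in the attribute value rather than requiring a match from the start.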

Answer 2:

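Know your documentation. It shows that an attribute filter such as id accepts a compiled regular expression:

http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html

# findAll is the Beautiful Soup 3 name; Beautiful Soup 4 spells it find_all.
soup.findAll(id=re.compile("para$"))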
# [<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
#  <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]
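
Attribute filters in Beautiful Soup 4 also accept a function, which avoids regular expressions altogether when only a fixed prefix matters. The sketch below assumes Beautiful Soup 4; the helper name has_phone_id and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

PREFIX = "value_xxx_c_1_f_8_a_"  # the stable part of the id from the question


def has_phone_id(value):
    # Called with each tag's id value; value is None for tags without an id.
    return value is not None and value.startswith(PREFIX)


# Made-up sample page: one tag without an id, one with the id of interest.
html = """
<table><tr>
  <td class="dispTxt">no id here</td>
  <td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find(id=has_phone_id)
print(tag["id"], tag.get_text(strip=True))
```

The None check is needed because the function may be called for tags that have no id attribute at all.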

Answer 3:

You can use CSS Selectors here, to match on an attribute value prefix:

soup.select('td[id^="value_xxx_c_1_f_8_a_"]')

This will only match <td> tags (the element used in the question) with an id attribute that starts with the string value_xxx_c_1_f_8_a_.
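
A runnable sketch of the prefix selector follows. It targets the <td> element shown in the question, uses select_one to fetch the first match, and assumes Beautiful Soup 4; the second cell and the variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up sample: the first cell mirrors the question, the second is a decoy.
html = """
<table><tr>
  <td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>
  <td class="dispTxt" id="other_1">n/a</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# [id^="..."] matches elements whose id starts with the given string.
cell = soup.select_one('td.dispTxt[id^="value_xxx_c_1_f_8_a_"]')

# select_one returns None when nothing matches, so guard before reading text.
phone = cell.get_text(strip=True) if cell is not None else None
print(phone)
```

Unlike the regular-expression approach, the selector cannot require that the remainder of the id consists only of digits; it checks the prefix alone.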

If you are willing to switch to lxml instead, you can use an XPath 1.0 expression to find these:

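from lxml import etree
# openfile is an open file object (or a path) for the saved page.
# HTMLParser copes with real-world HTML that is not well-formed XML.
doc = etree.parse(openfile, etree.HTMLParser())
for elem in doc.xpath('//td[starts-with(@id, "value_xxx_c_1_f_8_a_")]'):
    print(elem.text)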

Using an lxml XPath expression can be an order of magnitude faster than using a BeautifulSoup regular-expression match.

Answer 4:

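To get the phone number, read the tag's .text attribute, which returns the text inside the tag without any regular expression:

tag = soup.find("foo")  # "foo" is a placeholder for however you locate the tag
phone_number = tag.text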
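
To show the extraction step end to end, here is a small sketch that reads the text and tidies it with plain string methods. The surrounding whitespace in the sample cell, the 10-digit check, and the display format are assumptions made for the example, not something the site is known to guarantee.

```python
from bs4 import BeautifulSoup

# Made-up sample with stray whitespace around the number.
html = (
    '<table><tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">'
    " 5134231582\n</td></tr></table>"
)

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td", class_="dispTxt")

raw = tag.text                           # text exactly as it appears in the markup
phone_number = tag.get_text(strip=True)  # same text with surrounding whitespace removed

# Plain string methods are enough to validate and format; no regex required.
if phone_number.isdigit() and len(phone_number) == 10:
    formatted = f"({phone_number[:3]}) {phone_number[3:6]}-{phone_number[6:]}"
else:
    formatted = phone_number

print(repr(raw), phone_number, formatted)
```

get_text(strip=True) is usually the safer choice when the page pads cell contents with spaces or newlines.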

Source: https://stackoverflow.com/questions/11924135/how-to-use-beautiful-soup-to-find-a-tag-with-changing-id

Tags

python

regex

beautifulsoup