beautifulsoup | 易学教程

Why is BeautifulSoup's findAll returning an empty list when I search by class?


Question: I am trying to scrape an h2 tag, but BeautifulSoup returns an empty list. The tag is <h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">, and this is my code:

    html=urlopen("https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job")
    bs0bj=BeautifulSoup(html,"lxml")
    nameList=bs0bj.findAll("h2",{"class":"iCIMS_InfoMsg iCIMS_InfoField_Job"})
    print(nameList)

Answer 1: The content is inside an iframe and is updated via JavaScript, so it is not present in the response to the initial request. You can use the same link the page is using
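
The diagnosis in the answer can be checked with a small sketch like the one below. It assumes BeautifulSoup 4 is installed and uses the built-in "html.parser" instead of lxml. Both HTML samples and the example.com iframe URL are hypothetical stand-ins, not the real page: the first represents the outer document that urlopen receives, the second the document the iframe would load.

```python
from bs4 import BeautifulSoup

# Hypothetical outer page: the job details live in an iframe,
# so the h2 is absent from this document.
OUTER_HTML = """
<html><body>
  <iframe src="https://example.com/jobs/2034/job?in_iframe=1"></iframe>
</body></html>
"""

# Hypothetical document served at the iframe URL.
INNER_HTML = """
<html><body>
  <h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">Overview</h2>
</body></html>
"""


def find_job_headings(html):
    """Return every h2 that carries the iCIMS_InfoField_Job class."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("h2", class_="iCIMS_InfoField_Job")


def iframe_sources(html):
    """Return the src of every iframe; these must be fetched separately."""
    soup = BeautifulSoup(html, "html.parser")
    return [frame["src"] for frame in soup.find_all("iframe", src=True)]


outer_hits = find_job_headings(OUTER_HTML)   # empty: h2 is not in the outer page
frame_urls = iframe_sources(OUTER_HTML)      # candidates for a second request
inner_hits = find_job_headings(INNER_HTML)   # the h2 is found here

print(outer_hits, frame_urls, [h.get_text() for h in inner_hits])
```

If the outer page yields an empty list but lists an iframe src, requesting that src and parsing its response is the next thing to try.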

How do I get all text from within this tag?


Question: I'm trying to get all text from within this HTML tag, which I store in the variable tag: <td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> & His Orchestra</td>. The result should be "Glenn Miller & His Orchestra", but printing tag.find(text=True) returns "Glenn Miller". How do I get the rest of the text within the td element?

Answer 1: tag.find(text=True) returns only the first matching text node. Use .get_text() instead: >>> from
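
The difference between the two calls can be seen in a short sketch, assuming BeautifulSoup 4 with the built-in "html.parser". The markup is the td from the question, with the ampersand written as &amp; so the sample is valid HTML. The string= argument is the newer spelling of text= in recent bs4 releases.

```python
from bs4 import BeautifulSoup

markup = (
    '<td rowspan="2" style="text-align: center;">'
    '<a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a>'
    ' &amp; His Orchestra</td>'
)
tag = BeautifulSoup(markup, "html.parser").td

# find(string=True) stops at the first text node, which sits inside the <a>.
first_text = tag.find(string=True)

# get_text() concatenates every text node below the tag.
all_text = tag.get_text()

print(repr(first_text))
print(repr(all_text))
```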

Split an element with BeautifulSoup


Question: I have some HTML that I'm parsing with BeautifulSoup. One of the requirements is that tags are not nested in paragraphs or other text tags. For example, if I have markup like this: first text <a href="..."> <img .../> </a> second text, I need to transform it into something like this: first text <img .../> second text. I have written something to extract the images and add them after the paragraph, like this: for match in soup.body.find_all(True, recursive=False): try:
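
Reading the example in the question literally, the transformation removes the <a> wrapper and keeps the image in place. One possible way to do that is Tag.unwrap(), sketched below. This is an assumption about what the asker wants, not the approach from the truncated source, and the sample markup (including the file name pic.png) is made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<p>first text <a href="#"><img src="pic.png"/></a> second text</p>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so the tree can be modified while looping over it.
for link in soup.find_all("a"):
    if link.find("img") is not None:
        # unwrap() removes the <a> tag itself but leaves its children in place.
        link.unwrap()

image = soup.p.img
print(soup)
```

If the image has to leave the paragraph entirely, unwrap() alone is not enough and the paragraph would need to be split around it.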

How to use Beautiful Soup to find a tag with changing id?


Question: I am using Beautiful Soup in Python. Here is an example URL: http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp In the HTML there are a lot of tags, and the only way I can specify which ones to find is by their id. The only thing I want to find is the telephone number. The tag looks like this: <td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td> I have gone to other URLs on the same website and found almost the same id for the telephone
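
When only part of an id changes between pages, a compiled regular expression can be passed as the id filter. The sketch below assumes, hypothetically, that the "_c_1_f_8_a_" segment stays fixed and the parts around it vary. The truncated question does not say which parts change, so the pattern would need adjusting to the real pages. The first table row is invented to show that other cells are skipped.

```python
import re

from bs4 import BeautifulSoup

html = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_111">Some Street 1</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumed pattern: fixed middle segment, variable prefix part and numeric suffix.
PHONE_ID = re.compile(r"^value_.+_c_1_f_8_a_\d+$")

cell = soup.find("td", id=PHONE_ID)
phone = cell.get_text(strip=True) if cell is not None else None
print(phone)
```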

How to set a value with Beautiful Soup in an HTML element if I know the id or class of that element?


Question: How do I set a value with Beautiful Soup in an element if I know the id or class of that HTML element? For example, I have <td id="test"></td> and I want to set the text RESTORE... so that it becomes <td id="test">RESTORE...</td>.

Answer 1: Find the tag you want to modify using a find() search for id=test. Then, from the BeautifulSoup documentation, "Modifying the tree", section "Modifying .string": If you set a tag’s .string attribute, the tag’s contents are replaced with the string you give: markup = '<a href="http://example.com/">I
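
Applied to the td from the question, the approach in the answer can be sketched as follows, assuming BeautifulSoup 4 with the built-in "html.parser". The surrounding table markup is added only to make the sample complete.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<table><tr><td id="test"></td></tr></table>', "html.parser")

# Locate the element by its id, then replace its contents via .string.
cell = soup.find(id="test")
cell.string = "RESTORE..."

updated = str(cell)
print(updated)
```

To look the element up by class instead, soup.find("td", class_="some-class") works the same way, where "some-class" is a placeholder name.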

Scraping images using beautiful soup


Question: I'm trying to scrape the image from an article using Beautiful Soup. It seems to work, but I can't open the image: I get a file format error every time I try to open it from my desktop. Any insights?

    timestamp = time.asctime()
    # Parse HTML of article, aka making soup
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    # Create a new file to write content to
    txt = open('%s.jpg' % timestamp, "wb")
    # Scrape article main img
    links = soup.find('figure').find_all('img', src=True)
    for link in
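
The question is cut off before the part that writes the file, so the actual bug cannot be confirmed. A common cause of this kind of format error is writing the tag or its URL to the file instead of the downloaded image bytes. The sketch below shows one way to separate the two steps, using Python 3's urllib.request in place of urllib2. The function names, the sample markup, and the example.com URLs are all hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def article_image_urls(html, page_url):
    """Return absolute URLs of the images inside the first <figure>."""
    soup = BeautifulSoup(html, "html.parser")
    figure = soup.find("figure")
    if figure is None:
        return []
    # img["src"] is only a URL string; relative URLs are resolved against the page.
    return [urljoin(page_url, img["src"]) for img in figure.find_all("img", src=True)]


def save_image(image_url, path):
    """Fetch the image itself and write its raw bytes in binary mode."""
    with urlopen(image_url) as response, open(path, "wb") as out:
        out.write(response.read())


sample = '<figure><img src="/media/lead.jpg" alt="lead"></figure>'
urls = article_image_urls(sample, "https://example.com/news/story.html")
print(urls)
```

Calling save_image(urls[0], "lead.jpg") would then perform the second request. It is left out here because it needs network access.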

Web-scraping a table to a list


Question: I'm trying to extract a table from a webpage. I have managed to get all the data in the table into a list, but all of the table data ends up in a single list element. I need help getting the 'clean' data (i.e. the strings, without all the HTML packaging) from the rows of the table into their own list elements. So instead of... list = [<tr> <th><a href="/7.62x25mm_TT_AKBS" title="7.62x25mm TT AKBS"><img alt="TTAKBS.png" decoding="async" height="64" src="https://static.wikia.nocookie
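
One way to get one list per row, with only the cell text, is to loop over the tr tags and call get_text() on each cell. This is a minimal sketch assuming BeautifulSoup 4 with "html.parser". The table below is a made-up sample, not the wiki table from the question.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>Name</th><th>Damage</th></tr>
  <tr><td><a href="/a">Round A</a></td><td>58</td></tr>
  <tr><td><a href="/b">Round B</a></td><td>36</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# One inner list per <tr>; each entry is the stripped text of a <th> or <td>.
rows = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in soup.find_all("tr")
]

for row in rows:
    print(row)
```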

How should I show results of BeautifulSoup parsing in Django?


Question: I'm trying to scrape a web page using BeautifulSoup and Django. Here's my views.py, which does this task:

    def detail(request, article_id):
        article = get_object_or_404(Article, pk=article_id)
        html = urllib2.urlopen("...url...")
        soup = BeautifulSoup(html)
        title = soup.title
        return render(request, 'detail.html', {'article': article, 'title':title})

But when I use {{ title }} in Django template files, it doesn't show anything. I've tested it and it works in the shell. I've added a line to this function:
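
The rest of the question and its answer are cut off, so the following is only a plausible fix, not the one from the source. In the view above, soup.title is a bs4 Tag object rather than a string. Tag objects are callable, and Django templates call callable context values, which may explain the empty output. Passing a plain string to the template avoids the issue. The helper name page_title and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Return the <title> text as a plain str, or "" when there is no title."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return ""
    return soup.title.get_text(strip=True)


title = page_title("<html><head><title>Example article</title></head></html>")

# This dict is what a view would hand to render(); it now holds a str, not a Tag.
context = {"title": title}
print(context)
```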