Using BeautifulSoup to grab all the HTML between two tags

情深已故 2020-12-25 12:57

I have some HTML that looks like this:

<h1>Title</h1>

//a random amount of p/uls or tagless text

<h1> Next Title</h1>

4 Answers

没有蜡笔的小新 (original poster)

2020-12-25 13:39

Here is a complete, up-to-date solution:

Contents of temp.html:

<h1>Title</h1>
<p>hi</p>
//a random amount of p/uls or tagless text
<h1> Next Title</h1>

Code:

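import copy

from bs4 import BeautifulSoup

with open("resources/temp.html") as file_in:
    soup = BeautifulSoup(file_in, "lxml")

print(f"Before:\n{soup.prettify()}")

# Start from the first <h1> in the body.
first_header = soup.find("body").find("h1")

# Copies of every node (tags and bare text) found before the next <h1>.
siblings_to_add = []

for curr_sibling in first_header.next_siblings:
    if curr_sibling.name == "h1":
        # Reached the next header: insert the copies right after it.
        # Each copy goes directly after the header, so they end up in reverse order.
        for curr_sibling_to_add in siblings_to_add:
            curr_sibling.insert_after(curr_sibling_to_add)
        break
    else:
        # copy.copy leaves the original node in place in the tree.
        siblings_to_add.append(copy.copy(curr_sibling))

print(f"\nAfter:\n{soup.prettify()}")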

Output:

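Before:
<html>
 <body>
  <h1>
   Title
  </h1>
  <p>
   hi
  </p>
  //a random amount of p/uls or tagless text
  <h1>
   Next Title
  </h1>
 </body>
</html>

After:
<html>
 <body>
  <h1>
   Title
  </h1>
  <p>
   hi
  </p>
  //a random amount of p/uls or tagless text
  <h1>
   Next Title
  </h1>
  //a random amount of p/uls or tagless text
  <p>
   hi
  </p>
 </body>
</html>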
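
If the goal is only to read the markup between the two headers, rather than duplicate it inside the tree, a minimal sketch along these lines may be enough. The helper name html_between and the sample HTML are made up for illustration, and "html.parser" is used instead of "lxml" only to avoid the extra dependency; the same loop should work with either parser.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def html_between(start_tag, stop_name):
    """Return the markup after start_tag up to its next sibling tag named stop_name.

    Hypothetical helper: it only reads the tree and does not modify it.
    """
    parts = []
    for sibling in start_tag.next_siblings:
        # Only real tags can end the run; bare text nodes are always collected.
        if isinstance(sibling, Tag) and sibling.name == stop_name:
            break
        parts.append(str(sibling))
    return "".join(parts)


# Illustrative input, not the temp.html used earlier in this answer.
html = (
    "<h1>Title</h1>"
    "<p>hi</p>"
    "tagless text"
    "<ul><li>item</li></ul>"
    "<h1>Next Title</h1>"
    "<p>after</p>"
)
soup = BeautifulSoup(html, "html.parser")
between = html_between(soup.find("h1"), "h1")
print(between)
```

Because the pieces are joined in document order, the result keeps the original ordering. If no later sibling with the stop name exists, the helper returns everything that follows the start tag.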
