Using BeautifulSoup to grab all the HTML between two tags

情深已故 2020-12-25 12:57

I have some HTML that looks like this:

<h1>Title</h1>

//a random amount of p/uls or tagless text

<h1>Next Title</h1>

4 Answers
  •  没有蜡笔的小新
    2020-12-25 13:39

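    Here is a complete, up-to-date solution. It copies every node between the first h1 and the next h1, then inserts the copies after that second h1: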

    Contents of temp.html:

    <h1>Title</h1>

    <p>hi</p>

    //a random amount of p/uls or tagless text

    <h1>Next Title</h1>

    Code:

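    import copy

    from bs4 import BeautifulSoup

    # Parse the file; lxml adds <html> and <body> if the file lacks them.
    with open("resources/temp.html") as file_in:
        soup = BeautifulSoup(file_in, "lxml")

    print(f"Before:\n{soup.prettify()}")

    first_header = soup.find("body").find("h1")

    siblings_to_add = []

    # Walk the nodes after the first <h1>. Bare text is a NavigableString
    # whose .name is None, so it is collected like any other sibling.
    for curr_sibling in first_header.next_siblings:
        if curr_sibling.name == "h1":
            # Reached the next <h1>: insert the copies right after it.
            # Each insert lands directly after the same <h1>, so the
            # copies end up in reverse order.
            for curr_sibling_to_add in siblings_to_add:
                curr_sibling.insert_after(curr_sibling_to_add)
            break
        else:
            # Copy so the original node stays where it is.
            siblings_to_add.append(copy.copy(curr_sibling))

    print(f"\nAfter:\n{soup.prettify()}")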
    

    Output:

    Before:

    Title

    hi

    //a random amount of p/uls or tagless text

    Next Title

    After:

    Title

    hi

    //a random amount of p/uls or tagless text

    Next Title

    //a random amount of p/uls or tagless text

    hi
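If the goal is to grab the HTML between the two headers as a string, or to duplicate it without reversing the order, the same sibling walk can be adapted. The following is only a minimal sketch under stated assumptions: `html_between` and `copy_between_after` are hypothetical helper names, the sample markup is invented for illustration, and `"html.parser"` is used instead of `"lxml"` purely so the sketch needs no extra dependency.

```python
import copy

from bs4 import BeautifulSoup


def html_between(start_tag, stop_name):
    """Concatenate str() of each sibling after start_tag, stopping at the
    first sibling whose tag name equals stop_name (which is excluded)."""
    parts = []
    for sibling in start_tag.next_siblings:
        # Bare text has .name == None, so it never matches stop_name.
        if sibling.name == stop_name:
            break
        parts.append(str(sibling))
    return "".join(parts)


def copy_between_after(start_tag, stop_name):
    """Copy the nodes between start_tag and the next stop_name sibling and
    insert the copies after that sibling, keeping their original order."""
    collected = []
    stop = None
    for sibling in start_tag.next_siblings:
        if sibling.name == stop_name:
            stop = sibling
            break
        collected.append(copy.copy(sibling))
    if stop is None:
        return  # no closing tag found, so leave the tree untouched
    anchor = stop
    for node in collected:
        # Advance the anchor after every insert so the order is preserved.
        anchor.insert_after(node)
        anchor = node


# Hypothetical sample markup modelled on the question.
sample = (
    "<h1>Title</h1><p>hi</p>tagless text<ul><li>item</li></ul>"
    "<h1>Next Title</h1><p>later</p>"
)
soup = BeautifulSoup(sample, "html.parser")
first_header = soup.find("h1")

between = html_between(first_header, "h1")
print(between)

copy_between_after(first_header, "h1")
second_header = soup.find_all("h1")[1]
after_second = [str(node) for node in second_header.next_siblings]
print(after_second)
```

Moving the anchor forward after each insert is what keeps the copies in document order; inserting every copy after the same tag, as in the answer above, reverses them.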
