How to remove content in nested tags with BeautifulSoup?

青春惊慌失措 2021-01-20 15:34

How to remove content in nested tags with BeautifulSoup? These posts show the reverse, how to retrieve the content of nested tags: How to get contents of nested tags

3 Answers
  •  萌比男神i
    2021-01-20 15:58

    Here is my simple method: soup.body.clear() or soup.tag.clear().
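    For example, here is a minimal sketch (with made-up tag names, not the asker's actual HTML) of the difference between clear(), which empties a tag in place, and decompose(), which removes the whole tag:

        import bs4

        html = "<div><p>keep me</p><span><b>remove this</b></span></div>"
        soup = bs4.BeautifulSoup(html, "html.parser")

        soup.span.clear()      # empties the nested <span> but keeps the tag itself
        print(soup)            # <div><p>keep me</p><span></span></div>

        soup.span.decompose()  # removes the <span> tag and everything inside it
        print(soup)            # <div><p>keep me</p></div>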

    Let's say you want to clear the content in a <table> and add a new pandas DataFrame; later you can use this clear() method to easily update your tables in an HTML file for your webpage instead of using Flask/Django:

        import pandas as pd
        import bs4
    

    I want to convert a 1.2 million row .csv into a DataFrame, then into an HTML table, and then add it to my webpage's HTML. Later I want to easily update the data whenever the CSV gets updated, simply by switching a variable.

        bizcsv = pd.read_csv("business.csv")
        dframe = pd.DataFrame(bizcsv)
        dfhtml = dframe.to_html()  # convert DataFrame to table, HTML format
        """use dfhtml_update later to update your table without the <table>
        tags; the <table> is easy for BS to select & clear!"""
        dfhtml_update = dfhtml.strip('<table>, </table>')

        # A small function to unescape (&lt; to <) the tags back into HTML format
        def unescape(s):
            s = s.replace("&lt;", "<")
            s = s.replace("&gt;", ">")
            # this has to be last:
            s = s.replace("&amp;", "&")
            return s

        with open("page.html") as page:  # return to here when updating
            txt = page.read()
            soup = bs4.BeautifulSoup(txt, features="lxml")

        soup.body.append(dfhtml)  # adds the table to <body>

        with open("page.html", "w") as outf:
            outf.write(unescape(str(soup)))  # writes to page.html

        """let's say you want to make seamless table updates to your webpage
        instead of using flask or django; return to the with open function above"""
        soup.table.clear()  # clears everything in <table>
        soup.table.append(dfhtml_update)

        with open("page.html", "w") as outf:
            outf.write(unescape(str(soup)))
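
    A possible variation on the same idea, sketched with the same variable names (dfhtml, page.html): if you parse the table string into its own soup first, you can swap the existing <table> with replace_with() and skip the unescape step, because you are appending a real tag rather than escaped text:

        new_table = bs4.BeautifulSoup(dfhtml, "lxml").table  # parse the string into a real <table> tag
        soup.table.replace_with(new_table)  # swap the old table for the new one
        with open("page.html", "w") as outf:
            outf.write(str(soup))  # no unescape needed, nothing was escaped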

    I'm a newbie, but after tons of searching I just combined a bunch of fundamental teachings from the documentation. It's kind of bloated, but so is working with literally billions of cells of data. This works for me.
