Is there a way to remove or escape HTML tags using lxml.html instead of BeautifulSoup, which has some XSS issues? I tried using Cleaner, but I want to remove all HTML.
I believe this code can help you:
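from lxml.html.clean import Cleaner

html_text = "HelloText"
# An allow-list holding only the empty string matches no real tag, so tags are dropped.
# remove_unknown_tags must be False when allow_tags is given, or Cleaner raises ValueError.
cleaner = Cleaner(allow_tags=[''], remove_unknown_tags=False)
cleaned_text = cleaner.clean_html(html_text)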
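If the goal is only the text with every tag gone, or the markup escaped so it displays literally, another option is to skip Cleaner altogether. The sketch below is a minimal illustration, not the only way to do it: `strip_tags` and `escape_tags` are hypothetical helper names, the first uses `lxml.html.fromstring` with `text_content()`, and the second uses the standard library's `html.escape`.

```python
import html

import lxml.html


def strip_tags(markup: str) -> str:
    """Return only the text of an HTML fragment, dropping every tag."""
    # fromstring() raises a parser error on empty input, so handle that case first.
    if not markup.strip():
        return ""
    # text_content() concatenates the text of the element and all its descendants.
    return str(lxml.html.fromstring(markup).text_content())


def escape_tags(markup: str) -> str:
    """Keep the markup visible as text by escaping <, >, & and quotes."""
    return html.escape(markup)


print(strip_tags("<p>Hello <b>World</b></p>"))
print(escape_tags("<b>Hi</b>"))
```

As far as I know, `text_content()` also returns the text inside `<script>` and `<style>` elements, so if those should disappear too, remove them before extracting the text. Escaping is usually the safer choice when the result is going back into a page.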