PostgreSQL - Replace HTML Entities

闹比i 2020-12-11 20:36

I have just set about the task of stripping out HTML entities from our database, as we do a lot of crawling and some of the crawlers didn't do this at input time :(

3 Answers
  •  Happy的楠姐
    2020-12-11 21:10

    This is what it took for me to get it working on Ubuntu 18.04 with PG10. Perl didn't decode some entities, like , for some reason, so I used Python 3.

    From the command line:

    sudo apt install postgresql-plpython3-10
    

    From your SQL interface:

    CREATE EXTENSION IF NOT EXISTS plpython3u;
    
    CREATE OR REPLACE FUNCTION htmlchars(str TEXT) RETURNS TEXT AS $$
        # html.unescape replaces HTMLParser().unescape(), which was
        # deprecated in Python 3.4 and removed in Python 3.9.
        import html
        if str is None:
            return None
        return html.unescape(str)
    $$ LANGUAGE plpython3u;
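The body of `htmlchars` is plain Python, so its decoding behaviour can be checked outside PostgreSQL before running it over a whole table. This is a minimal sketch assuming Python 3.4 or later, where the standard-library `html.unescape` exists (`HTMLParser().unescape()` was deprecated in 3.4 and removed in 3.9, so it fails on newer interpreters). The name `decode_entities` and the sample strings are hypothetical, chosen only for illustration.

```python
import html
from typing import Optional


def decode_entities(text: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for the PL/Python body: NULL (None) passes through."""
    if text is None:
        return None
    # html.unescape decodes named (&amp;), decimal (&#8217;) and hex (&#x2019;)
    # character references, following the HTML5 rules for entity names.
    return html.unescape(text)


if __name__ == "__main__":
    samples = [
        "Fish &amp; Chips",
        "It&#8217;s here",
        "caf&eacute;",
        "&lt;b&gt;bold&lt;/b&gt;",
        "non&nbsp;breaking",
        None,
    ]
    for s in samples:
        print(repr(s), "->", repr(decode_entities(s)))
```

Two things worth knowing before a bulk cleanup: `&nbsp;` decodes to U+00A0 (a non-breaking space), not an ASCII space, and `html.unescape` makes a single pass, so double-encoded text such as `&amp;amp;` comes back as `&amp;` rather than `&`.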
    
