DOMDocument appendXML with special characters

庸人自扰 2020-12-07 05:25

I am retrieving some HTML strings from my database and I would like to parse these strings into my DOMDocument. The problem is that the DOMDocument gives warnings at specia

5 answers
  •  孤街浪徒
    2020-12-07 06:03

    While Smarty might be a good bet (why reinvent the wheel for the 14th time?), etranger might have a point. There are situations in which you don't want to pull in something as heavy as a complete new (and unstudied) package; you just want to output some data from a database that happens to contain HTML an XML parser has issues with.

    Warning: the following is a simple solution, but don't do it unless you're SURE you can get away with it! (I did this when I had about 2 hours before a deadline and didn't have time to study, let alone implement, something like Smarty...)

    Before passing the string to appendXML, run it through a preg_replace. For instance, replace every &nbsp; entity with [some_prefix]_nbsp. Then, on the page where you show the HTML, do the replacement the other way around.

    And Presto! =)

    Example code: Code that puts text into a document fragment:

    // add text tag to p tag.
    // print("CCMSSelTextBody::getDOMObject: strText: ".$this->m_strText."\n");
    $this->m_strText = preg_replace("/&nbsp;/", "__nbsp__", $this->m_strText);
    $domTextFragment = $domDoc->createDocumentFragment();
    $domTextFragment->appendXML(utf8_encode($this->m_strText));
    $p->appendChild($domTextFragment);
    // $p->appendChild(new DOMText(utf8_encode($this->m_strText)));

    Code that parses the string and writes the HTML:

    // Instantiate template.
    $pTemplate = new CTemplate($env, $pageID, $pUser, $strState);
    
    // Parse tag-sets.
    $pTemplate->parseTXTTags();
    $pTemplate->parseCMSTags();
    
    // present the html code.
    $html = $pTemplate->getPageHTML();
    $html = preg_replace("/__nbsp__/", "&nbsp;", $html);
    print($html);
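
    For anyone without the CTemplate class at hand, the two snippets above can be condensed into one self-contained round trip. This is only a sketch under assumptions: the sample string and the NBSP_TOKEN constant are made up for illustration, and str_replace stands in for preg_replace because the pattern is a literal string.

```php
<?php
// Hypothetical placeholder; pick any string that cannot occur in the real content.
const NBSP_TOKEN = '__nbsp__';

// Made-up sample of the kind of HTML that might come out of the database.
$text = 'Price:&nbsp;<b>10&nbsp;EUR</b>';

$domDoc = new DOMDocument('1.0', 'UTF-8');
$p = $domDoc->createElement('p');
$domDoc->appendChild($p);

// Swap the entity for the placeholder so the XML parser never sees &nbsp;.
$safe = str_replace('&nbsp;', NBSP_TOKEN, $text);

$domTextFragment = $domDoc->createDocumentFragment();
$ok = $domTextFragment->appendXML($safe); // returns false if the XML is not well-formed
if ($ok) {
    $p->appendChild($domTextFragment);
}

// Serialize the paragraph, then put the entity back just before output.
$html = str_replace(NBSP_TOKEN, '&nbsp;', $domDoc->saveXML($p));
print($html . "\n");
```

    The same caveat applies as everywhere else here: if the placeholder can ever appear in genuine content, the restore step will corrupt it.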
    

    It's probably a good idea to think up a stronger replacement token. If you insist on being thorough, take the md5 of a time() value and hardcode the result as a prefix. So, in the first snippet:

    $this->m_strText = preg_replace("/&nbsp;/", "4597ee308cd90d78aa4655e76bf46ee0_nbsp", $this->m_strText);
    

    And in the second:

    $html = preg_replace("/4597ee308cd90d78aa4655e76bf46ee0_nbsp/", "&nbsp;", $html);
    

    Do the same for any other tags and stuff you need to circumvent.
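
    If there are many different entities to deal with, one possible way to avoid writing a preg_replace pair for each is to capture the entity name in the pattern. The sketch below is an assumption-laden illustration, not part of the original hack: protectEntities, restoreEntities and ENTITY_PREFIX are hypothetical names, and the trailing underscore in the placeholder is an added convention so the restore step knows where the entity name ends.

```php
<?php
// Hypothetical prefix, reusing the hard-coded md5 string from the snippets above.
const ENTITY_PREFIX = '4597ee308cd90d78aa4655e76bf46ee0_';

// Replace named entities such as &nbsp; or &copy; with PREFIX + name + "_".
// The five entities that XML predefines (amp, lt, gt, quot, apos) are left
// alone, since the parser accepts them as they are.
function protectEntities(string $text): string
{
    $pattern = '/&(?!(?:amp|lt|gt|quot|apos);)([A-Za-z][A-Za-z0-9]*);/';
    return preg_replace($pattern, ENTITY_PREFIX . '${1}_', $text) ?? $text;
}

// Reverse step: turn PREFIX + name + "_" back into &name;.
function restoreEntities(string $html): string
{
    $pattern = '/' . preg_quote(ENTITY_PREFIX, '/') . '([A-Za-z][A-Za-z0-9]*)_/';
    return preg_replace($pattern, '&${1};', $html) ?? $html;
}

// Usage with a made-up sample string.
$sample = 'a&nbsp;b &copy; 2020 &amp; co';
$safe = protectEntities($sample);

$domDoc = new DOMDocument('1.0', 'UTF-8');
$fragment = $domDoc->createDocumentFragment();
$parsed = $fragment->appendXML($safe); // true when the protected string is well-formed

$restored = restoreEntities($safe);
```

    Numeric references such as &#160; are not touched, because an XML parser handles those without a DTD.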

    This is a hack, and not good code by any stretch of the imagination. But it saved my life, and I wanted to share it with other people who run into this particular problem with minutes to spare.

    Use the above at your own risk.
