Can DOM parsers read inside HTML comments, or is regex the only way?

僤鯓⒐⒋嵵緔 submitted on 2020-01-02 18:04:15

Question


I am creating a very simple CMS for my organisation.

My strategy is to embed editable content between tags called <editable>. However, to hide these from the browser I am commenting them out, so an example of an editable region looks like this:

<!-- <editable name="news_item"> Today's news is ... </editable> -->

The content "Today's news is ... " is picked up by the CMS and made editable in the online HTML editor.

I would like to be able to "grab" the name attribute's value as well as the content contained within the tags.

Is there a simple way to do this with XPath or XQuery type things, or is regex the best way to go (especially given that the regex will not need much fault tolerance, since I know exactly what the XML will be, because I will be writing the code that generates it)?


Answer 1:


By DOM parser, do you mean JavaScript? If so, this blog post suggests that you can indeed slice and dice HTML comments. And, because mentioning JavaScript without mentioning jQuery is a sin, here's a jQuery plugin that will find all the HTML comments for you.




Answer 2:


Most parsers are able to get comments without a problem. They probably will not parse them into a DOM structure, but you can do that manually once you have the actual comments.

This is an example using BeautifulSoup with Python (written here for Python 3 and the current bs4 package):

>>> from bs4 import BeautifulSoup, Comment
>>> html_document = """
... <html>
... <head>
... </head>
... <body>
... <h1>My Html Document</h1>
... <!-- This is a normal comment. -->
... <p>This is some more text.</p>
... <!-- <editable name="news_item">Today's news is Paolo Rocks!</editable> -->
... <p>Yet More Content</p>
... </body>
... </html>
... """
>>> soup = BeautifulSoup(html_document, "html.parser")
>>> # Comment nodes are strings, so filter the text nodes by type
>>> comments = soup.find_all(string=lambda text: isinstance(text, Comment))
>>> comments
[' This is a normal comment. ', ' <editable name="news_item">Today\'s news is Paolo Rocks!</editable> ']
>>> for comment in comments:
...     # Parse each comment's text as markup in its own right
...     editable = BeautifulSoup(comment, "html.parser").find('editable')
...     if editable is not None:
...         print(editable['name'], editable.contents)
...
news_item ["Today's news is Paolo Rocks!"]



Answer 3:


The whole point of a comment is that the DOM will not parse the content. So the whole comment is just text.

I'd be inclined to use regex in this case.

However, if you are certain the content is HTML, you could create a DOM element (say a DIV) and assign the comment text to its innerHTML. Then you could examine the DOM created inside that element. Once you have acquired what you need, you can drop the DIV element, which you never added to the current document.
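The same two-step idea (treat the comment as plain text first, then parse that text as markup in a throwaway parser) can be sketched outside the browser as well. The following is only an illustration in Python using the standard library's html.parser; the class and function names (CommentCollector, EditableExtractor, find_editables) are made up for this sketch, and it assumes each editable region carries a name attribute and is not nested.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """First pass: gather the raw text of every comment in the page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class EditableExtractor(HTMLParser):
    """Second pass: read <editable> regions out of one comment's text."""

    def __init__(self):
        super().__init__()
        self.regions = {}     # name attribute -> text content
        self._name = ""       # name of the region currently being read
        self._inside = False  # True while between <editable> and </editable>

    def handle_starttag(self, tag, attrs):
        if tag == "editable":
            self._name = dict(attrs).get("name", "")
            self.regions[self._name] = ""
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "editable":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.regions[self._name] += data


def find_editables(html):
    """Return {name: content} for every commented-out <editable> region."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    regions = {}
    for comment in collector.comments:
        # A fresh, disposable parser per comment, like the detached DIV.
        extractor = EditableExtractor()
        extractor.feed(comment)
        extractor.close()
        regions.update(extractor.regions)
    return regions


page = """<body>
<!-- This is a normal comment. -->
<!-- <editable name="news_item">Today's news is Paolo Rocks!</editable> -->
</body>"""
print(find_editables(page))
```

Ordinary comments simply yield no regions, so they need no special handling.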




Answer 4:


I'm pretty sure that you'd need to manually parse it via regex or another method. Comments aren't seen as DOM elements as far as I'm aware.
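For the regex route, a minimal sketch in Python might look like the following. It leans on the asker's premise that the CMS generates the markup itself, so it assumes one <editable> element per comment, a double-quoted name attribute, and no nesting; the names EDITABLE_RE and extract_editables are invented for the illustration.

```python
import re

# Assumes the exact shape the CMS itself generates: one <editable> element
# per comment, a double-quoted name attribute, and no nested <editable> tags.
EDITABLE_RE = re.compile(
    r'<!--\s*<editable\s+name="([^"]*)"\s*>(.*?)</editable>\s*-->',
    re.DOTALL,  # let the content span several lines
)


def extract_editables(html):
    """Return a list of (name, content) pairs found in commented-out regions."""
    return [(name, content.strip()) for name, content in EDITABLE_RE.findall(html)]


page = '<p>Intro</p>\n<!-- <editable name="news_item"> Today\'s news is ... </editable> -->'
print(extract_editables(page))
```

A pattern like this breaks as soon as the generated markup drifts (single quotes, extra attributes, nested regions), which is the usual argument for a parser once fault tolerance matters.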




Answer 5:


You can use a DIV with a custom attribute, as Dojo often does:

<div ParseByCMS="true">foobar foo bar foobaz</div>

After that you just use JavaScript or XSLT to parse it and remove it.
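As a rough server-side illustration of the marker-attribute approach, here is a Python sketch using xml.etree.ElementTree rather than JavaScript or XSLT. It assumes the fragment is well-formed XML, and it reads "remove it" as stripping the marker attribute before the page is served, which is only one possible interpretation.

```python
import xml.etree.ElementTree as ET

# Assumes the fragment is well-formed XML (XHTML-style); real-world HTML
# would need a tolerant HTML parser instead.
fragment = """<body>
  <div ParseByCMS="true">foobar foo bar foobaz</div>
  <div>ordinary content</div>
</body>"""

root = ET.fromstring(fragment)

# Select only the <div> elements carrying the marker attribute.
editable_divs = root.findall(".//div[@ParseByCMS='true']")
editable_texts = [div.text for div in editable_divs]
print(editable_texts)

# Strip the marker so it does not reach the browser.
for div in editable_divs:
    del div.attrib["ParseByCMS"]
served = ET.tostring(root, encoding="unicode")
print(served)
```

Unlike the comment-based scheme, the editable content here is ordinary markup, so any DOM or XPath tool can reach it directly.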




Answer 6:


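If you're using PHP, load the markup into a DOMDocument and query it with XPath:

    // $html is assumed to hold the page markup as a string
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Search for comments
    $comments = $xpath->query('//comment()');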


Source: https://stackoverflow.com/questions/994773/can-dom-parsers-read-inside-html-comments-or-is-regex-the-only-way
