Is there a JS diff library against htmlstring just like google-diff-match-patch on plain text?

孤街醉人 提交于 2019-12-03 02:27:41

问题


Currently I am using google-diff-match-patch to implement a real-time editing tool, which can synchronize texts between multiple users. Everything works great when operations are only plain texts, each user's operation(add/delete texts) could be diff-ed out by comparing to old text snapshot with the helper of google-diff. But when rich format texts(like bold/italic) are involved, google-diff not working well when comparing the htmlstring. The occurrence of character of < and > messed up the diff results, especially when bold/italic format are embedded within each other.

Could anyone suggest a similar library like google-diff to diff htmlstrings? Or any suggestions can get my problem fixed with google-diff? I understood google-diff is designed for plain text, but really didn't find a better library than it so far, so it also works if a doable enhancement to google-diff can help.


回答1:


The wiki at the google-diff-match-patch project shares some ideas. From http://code.google.com/p/google-diff-match-patch/wiki/Plaintext :

One method is to strip the tags from the HTML using a simple regex or node-walker. Then diff the HTML content against the text content. Don't perform any diff cleanups. This diff enables one to map character positions from one version to the other (see the diff_xIndex function). After this, one can apply all the patches one wants against the plain text, then safely map the changes back to the HTML. The catch with this technique is that although text may be freely edited, HTML tags are immutable.

Another method is to walk the HTML and replace every opening and closing tag with a Unicode character. Check the Unicode spec for a range that is not in use. During the process, create a hash table of Unicode characters to the original tags. The result is a block of text which can be patched without fear of inserting text inside a tag or breaking the syntax of a tag. One just has to be careful when reconverting the content back to HTML that no closing tags are lost.

I have a hunch that the 2nd idea, map-HTML-tags-to-Unicode-placeholders, might work better than one would otherwise guess... especially if your HTML tags are from some reduced set, and if you can perform a little open/close touchup when displaying interleaved (strikethrough/underlined) diff markup.

Another method that might work with simple styling would be remove the HTML tags, but remember the character-indexes affected. For example, "positions 8-15 are bolded". Then, perform a plaintext diff. Finally, using the diff_xIndex position-mapping idea from the wiki's first method, intelligently re-insert HTML tags to reapply stylings to the ranges surviving/added. (That is, if old positions 8-13 survived, but moved to 20-25, insert the B tags around there.)




回答2:


jsdifflib - A Javascript Visual Diff Tool & Library https://github.com/cemerick/jsdifflib

There's a demo here: http://cemerick.github.io/jsdifflib/demo.html




回答3:


Pretty Diff does everything you need, except you will need to update the DOM response so that the diff fires against the "onkeyup" event instead on button click.

http://prettydiff.com/




回答4:


Take a look at SynchroEdit, might be useful.



来源:https://stackoverflow.com/questions/2132914/is-there-a-js-diff-library-against-htmlstring-just-like-google-diff-match-patch

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!