I have a challenging problem to solve. I\'m working on a script which takes a regex as an input. This script then finds all matches for this regex in a document and wraps ea
I would use "flat DOM" representation for such task.
In flat DOM this paragraph
abc def. ghij.
will be represented by two vectors:
chars: "abc def. ghij.",
props: ....aaaaaaaaaa,
You will use normal regexp on chars to mark span areas on props vector:
chars: "abc def. ghij."
props: ssssaaaaaaaaaa
ssss sssss
I am using schematic representation here, it's real structure is an array of arrays:
props: [
[s],
[s],
[s],
[s],
[a,s],
[a,s],
...
]
conversion tree-DOM <-> flat-DOM can use simple state automata.
At the end you will convert flat DOM to tree DOM that will look like:
abc def. ghij.
Just in case: I am using this approach in my HTML WYSIWYG editors.