Question
I am using a regular expression to match a narrower subset of the HTML that TinyMCE produces in a textarea. The table widths are too large and cause run-off, so I am writing test code in JavaScript.
My question: why does replacing with $3 give me not only "1000px" but also the rest of the document after the table tag?
<script language="javascript">
// change table width
function adjustTable(elem0,elem1) {
// debugging, place results in div
elem1.innerHTML = elem0.innerHTML.replace(/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img,"$3");
}
</script>
<button type="button" onclick="adjustTable(document.getElementById('myTable'),document.getElementById('myResult'))">RegEx</button>
<div id="myTable">
<table width="1000px">
<thead>
<tr><th colspan="3">Table Header</th></tr>
</thead>
<tbody>
<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>
</tbody>
</table>
</div>
<textarea id="myResult">
</textarea>
Yes, I do understand RegEx and HTML are streams that should not be crossed, because HTML is complex, etc. I am attempting to make the subset of HTML printable.
I do not see how it matches in multiple ways.
Below is the result for $3.
1000px
<thead>
<tr><th colspan="3">Table Header</th></tr>
</thead>
<tbody>
<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>
</tbody>
</table>
It matches the 1000px, but then there's the extraneous stuff after the table tag, which is odd, because I thought I was forcing a match in the table tag. Thoughts?
Answer 1:
Let's debug this by logging the entire result of the regex:
function adjustTable(elem0,elem1) {
// debugging, place results in div
console.log ( (/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img).exec(elem0.innerHTML) );
}
The output is:
[
0: " <table width="1000px">"
1: " "
2: "<table width=""
3: "1000px"
4: "">"
5: ""
index: 1
input: "↵ <table width="1000px">↵ <thead>↵ <tr><th colspan="3">Table Header</th></tr>↵ </thead>↵ <tbody>↵ <tr><td>alpha</td><td>beta</td><td>gamma</td></tr>↵ </tbody>↵ </table>↵"
]
So if you want to get the result "1000px", then use this code:
(/^(.*)(\u003Ctable.*?\s*?\w*?width\u003D[\u0022\u0027])(\d+px)([\u0022\u0027].*?\u003E)(.*)$/img).exec(elem0.innerHTML)[3]
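Note that exec() returns null when nothing matches, so indexing [3] directly throws on input that has no table width. A minimal sketch of a guarded version follows; the helper name extractTableWidth and the shortened pattern are assumptions made for illustration, not part of the original code:

```javascript
// Hypothetical helper: returns the first table width (e.g. "1000px")
// found in an HTML string, or null when there is no match.
function extractTableWidth(html) {
  // Shortened pattern (an assumption): only the opening <table ...> tag
  // and its width attribute matter, so the surrounding groups are dropped.
  const match = /<table\b[^>]*?\bwidth=["'](\d+px)["']/i.exec(html);
  // exec() returns null on failure, so guard before reading the group.
  return match ? match[1] : null;
}

console.log(extractTableWidth('\n<table width="1000px">\n<thead>\n'));
console.log(extractTableWidth('<div>no table here</div>'));
```

Because only one capture group remains, the width is in match[1] rather than match[3].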
Answer 2:
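The dot doesn't match line-break characters in JavaScript. And since you set the /m modifier, ^ and $ also match at the start and end of each line, not just at the start and end of the whole string. Therefore, the leading (.*) cannot reach back past the start of the line containing the table tag, and the final (.*) matches only the empty string before that line's end. The match covers just that one line, so replacing it with $3 (which contains 1000px) leaves the rest of the string intact.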
See it on regex101.com.
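The difference can be reproduced outside the browser. The sketch below uses a shortened sample string and writes the \u003C-style escapes as literal characters; the second pattern is only one possible fix, assuming the intent was to replace the whole document with the width:

```javascript
// Shortened stand-in for elem0.innerHTML (an assumption for illustration).
const html = '\n<table width="1000px">\n<thead>\n</thead>\n</table>\n';

// Original pattern: with /m, ^ and $ anchor to a single line,
// and . cannot cross a line break.
const perLine = /^(.*)(<table.*?\s*?\w*?width=["'])(\d+px)(["'].*?>)(.*)$/img;
const replacedLine = html.replace(perLine, "$3");
// Only the <table ...> line is swapped for "1000px"; other lines remain.
console.log(JSON.stringify(replacedLine));

// Variant: [\s\S] matches any character including line breaks, and
// dropping /m makes ^ and $ anchor to the whole string.
const wholeDoc = /^([\s\S]*)(<table[\s\S]*?width=["'])(\d+px)(["'][\s\S]*?>)([\s\S]*)$/i;
const replacedAll = html.replace(wholeDoc, "$3");
// The match now spans the entire input, so the result is just "1000px".
console.log(JSON.stringify(replacedAll));
```

The variant also drops the g flag, since a pattern anchored to the whole string can match at most once.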
Source: https://stackoverflow.com/questions/23812258/javascript-regex-replace-width-attribute-matching