How do I remove specific whitespace inside an XML tag in Sublime Text?

Submitted by 谁说我不能喝 on 2019-12-24 16:10:57

Question


I have a file with some XML tags that follow a specific pattern (Name and the Props are placeholders):

<Name id="mod:Name"/>
<Prop1 Name id="mod:object.Prop1 Name"/>
<Prop1 Prop2 Name id="mod:object.Prop1 Prop2 Name"/>
<Prop1 Prop2 Prop3 Name id="mod:object.Prop1 Prop2 Prop3 Name"/>

I am looking for a regex that removes the whitespace from the part of each tag before id="...", while keeping the single space in front of id=.

This is how it should look:

<Name id="mod:Name"/>
<Prop1Name id="mod:object.Prop1 Name"/>
<Prop1Prop2Name id="mod:object.Prop1 Prop2 Name"/>
<Prop1Prop2Prop3Name id="mod:object.Prop1 Prop2 Prop3 Name"/>

I have seen the example (\S+)\s(?=\S+\s+) with just \1 as the substitution, but that removes all the spaces except the last one and doesn't leave a space before id=:

<Name id="mod:Name"/>
<Prop1Name id="mod:object.Prop1 Name"/>
<Prop1Prop2Name id="mod:object.Prop1Prop2 Name"/>
<Prop1Prop2Prop3Name id="mod:object.Prop1Prop2Prop3 Name"/>

I tried something like:

^((\S+)*)\s((\S+)*)\s((\S+)*)\s((\S+)*)\s(?=id)

But that gave me catastrophic backtracking.

Not sure if it helps, but Sublime uses Boost regex.

This is my first question on Stack Overflow, so any suggestions for improving it are welcome.

Thank you.

This seems to work:

^(?|((\S+))\s|((\S+)\s(\S+))\s|((\S+)\s(\S+)\s(\S+)\s))(id=.*)

with $2$3$4 $5 as the replacement.

Thanks for the advice.


Answer 1:


A correct regex for removing all the whitespace before the id attribute is:

(?:<\w+|(?!^)\G)\K\s+(\w+)(?=[^<>]*\bid=")

Replace with $1.

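The regex uses the \G operator, which matches at the position where the last successful match ended (the (?!^) lookahead stops it from also matching at the start of the text, where \G succeeds before any match has been found), and the \K operator, which discards the text matched so far from the match result.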

Breakdown:

  • (?:<\w+|(?!^)\G)\K - match either < followed by one or more word characters (letters, digits or underscore), or the end of the last successful match, then discard the text matched so far
  • \s+ - match one or more whitespace characters
  • (\w+) - match one or more word characters and capture them into Group 1 (the $1 backreference in the replacement restores this consumed text)
  • (?=[^<>]*\bid=") - only continue while id=" follows as a whole word (\b is a word boundary) inside the same tag ([^<>]* matches zero or more characters other than < and >)
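Python's standard re module supports neither \G nor \K, so the regex above cannot be run there as-is. Below is a minimal sketch of one way to get the same result with a different technique: match the whole run of words in front of id=" and collapse its whitespace in a replacement callback. TAG_HEAD and join_tag_words are hypothetical names, and the sketch assumes the words before id= contain no quotes or angle brackets.

```python
import re

# Hypothetical stand-in for the \G/\K regex: "<", then the words before
# id=" (no <, > or quotes), then the whitespace right before id=".
# The lazy *? stops at the first whitespace run directly followed by id=".
TAG_HEAD = re.compile(r'<([^<>"]*?)\s+(?=id=")')


def join_tag_words(text: str) -> str:
    """Remove the whitespace between the words that precede id="..." in each tag."""

    def repl(m: re.Match) -> str:
        # Drop every whitespace run inside the captured words, then put
        # back the single space that separates them from id=.
        return "<" + re.sub(r"\s+", "", m.group(1)) + " "

    return TAG_HEAD.sub(repl, text)


sample = """<Name id="mod:Name"/>
<Prop1 Name id="mod:object.Prop1 Name"/>
<Prop1 Prop2 Name id="mod:object.Prop1 Prop2 Name"/>
<Prop1 Prop2 Prop3 Name id="mod:object.Prop1 Prop2 Prop3 Name"/>"""

print(join_tag_words(sample))
```

The id values are untouched because each match ends at the whitespace before id=", and re.sub resumes scanning after that point.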

A faster alternative (replace with an empty string):

(?:<|(?!^)\G)\w+\K\s+(?!id=)

This regex matches either < or the end of the last successful match, followed by one or more word characters. \K then drops all of that from the match, so only the one or more whitespace characters that come next are matched, and only if they are not followed by id= (the negative lookahead (?!id=) ensures this). Those whitespace characters are what gets removed.
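The chaining that \G provides can be made explicit in Python with Pattern.match(text, pos), which only succeeds at exactly pos. The sketch below is a hypothetical emulation of the faster alternative (OPEN, STEP and strip_name_spaces are illustrative names, not part of any library). It is meant to show how each step picks up exactly where the previous one stopped, not to serve as a drop-in replacement for the Sublime regex.

```python
import re

# Hypothetical emulation of (?:<|(?!^)\G)\w+\K\s+(?!id=) replaced with "".
# Python's re has neither \G nor \K, but Pattern.match(text, pos) only
# succeeds at exactly pos, which is the guarantee \G provides.
OPEN = re.compile(r"<")
STEP = re.compile(r"(\w+)\s+(?!id=)")


def strip_name_spaces(text: str) -> str:
    out = []
    pos = 0
    while True:
        start = OPEN.search(text, pos)
        if start is None:
            out.append(text[pos:])  # no more tags: copy the tail unchanged
            return "".join(out)
        out.append(text[pos:start.end()])  # copy everything up to and including "<"
        pos = start.end()
        # Chain steps like \G: each one must begin where the last one ended.
        while True:
            step = STEP.match(text, pos)
            if step is None:
                break  # next whitespace is followed by id=, or no word follows
            out.append(step.group(1))  # keep the word (\K keeps it out of the replaced text)
            pos = step.end()  # skip past the whitespace, which deletes it


sample = """<Name id="mod:Name"/>
<Prop1 Name id="mod:object.Prop1 Name"/>
<Prop1 Prop2 Name id="mod:object.Prop1 Prop2 Name"/>
<Prop1 Prop2 Prop3 Name id="mod:object.Prop1 Prop2 Prop3 Name"/>"""

print(strip_name_spaces(sample))
```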



Source: https://stackoverflow.com/questions/35613715/how-do-i-remove-specific-whitespace-inside-xml-tag-in-sublime-text
