Remove all content between two strings using regular expressions

Submitted by 拥有回忆 on 2019-12-13 07:54:03

Question


I am trying to use regular expressions in Sublime Text 3 to remove all the content between two strings in an XML file.

Suppose this is my content:

        <Body name="ground">
            <mass>0</mass>
            <mass_center> 0 0 0</mass_center>
            <inertia_xx>0</inertia_xx>
            <inertia_yy>0</inertia_yy>
            <inertia_zz>0</inertia_zz>
            <inertia_xy>0</inertia_xy>
            <inertia_xz>0</inertia_xz>
            <inertia_yz>0</inertia_yz>
            <!--Joint that connects this body with the parent body.-->
            <Joint />
            <VisibleObject>
                <!--Set of geometry files and associated attributes, allow .vtp, .stl, .obj-->
                <GeometrySet>
                    <objects />
                    <groups />
                </GeometrySet>
                <!--Three scale factors for display purposes: scaleX scaleY scaleZ-->
                <scale_factors> 1 1 1</scale_factors>
                <!--transform relative to owner specified as 3 rotations (rad) followed by 3 translations rX rY rZ tx ty tz-->
                <transform> -0 0 -0 0 0 0</transform>
                <!--Whether to show a coordinate frame-->
                <show_axes>false</show_axes>
                <!--Display Pref. 0:Hide 1:Wire 3:Flat 4:Shaded Can be overriden for individual geometries-->
                <display_preference>4</display_preference>
            </VisibleObject>
            <WrapObjectSet>
                <objects />
                <groups />
            </WrapObjectSet>
        </Body>

Now suppose I want to remove all the content between <VisibleObject> and </VisibleObject> to leave only:

        <Body name="ground">
            <mass>0</mass>
            <mass_center> 0 0 0</mass_center>
            <inertia_xx>0</inertia_xx>
            <inertia_yy>0</inertia_yy>
            <inertia_zz>0</inertia_zz>
            <inertia_xy>0</inertia_xy>
            <inertia_xz>0</inertia_xz>
            <inertia_yz>0</inertia_yz>
            <!--Joint that connects this body with the parent body.-->
            <Joint />
            <VisibleObject>
            </VisibleObject>
            <WrapObjectSet>
                <objects />
                <groups />
            </WrapObjectSet>
        </Body>

There are a few threads about similar problems, but none of the solutions seem to work particularly well (or at all) for this one.

Any help would be most appreciated.


Answer 1:


[Screenshot of the Sublime Text window not preserved.]

Open the panel via Find > Replace, and make sure the leftmost option (the regular-expression toggle, .*) is enabled.




Answer 2:


Sublime appears to use PCRE, according to this page.

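That means you should be able to use PCRE features such as negative look-ahead. Combined with [^<]*, which consumes a whole run of non-< characters in one step, the look-ahead is checked mainly where a < appears rather than at every character, which can speed up matching considerably.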

The regex I recommend is:

    <VisibleObject>(?:[^<]*(?!</VisibleObject).)+</VisibleObject>

Essentially, the negative look-ahead ensures that whenever a < is present (namely at the start of a tag), it's not the closing </VisibleObject>.
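Python's re module also supports negative look-ahead, so it can serve as a rough stand-in for Sublime's engine to illustrate this (an assumption: the two engines are not identical). The sketch below runs the look-ahead on its own against a hypothetical, trimmed-down fragment: <(?!/VisibleObject) matches a < only when it does not begin </VisibleObject>.

```python
import re

# Hypothetical fragment: one ordinary element inside <VisibleObject>.
snippet = (
    "<VisibleObject>\n"
    "    <show_axes>false</show_axes>\n"
    "</VisibleObject>"
)

# Positions of every '<' in the fragment.
all_lt = [m.start() for m in re.finditer(r"<", snippet)]

# Positions of the '<' characters that do NOT start '</VisibleObject'.
# The negative look-ahead (?!...) inspects the text ahead without consuming it.
allowed_lt = [m.start() for m in re.finditer(r"<(?!/VisibleObject)", snippet)]

# The only rejected '<' is the one that opens the closing tag.
rejected = sorted(set(all_lt) - set(allowed_lt))
print(rejected == [snippet.index("</VisibleObject>")])  # True
```

Note that </show_axes> is still allowed: the look-ahead rejects only the specific closing tag, not closing tags in general.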

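The . then consumes that < (which [^<]* cannot match), so the repetition can move past every tag that is not the closing one. When the look-ahead does see </VisibleObject>, the engine backtracks [^<]* by one character, lets the . take that character instead, and ends the repetition, leaving the closing tag for the final literal </VisibleObject>.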
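To see why the . matters, this Python sketch (again using re only as an approximation of Sublime's engine) runs the recommended pattern and a hypothetical variant with the . removed against a small made-up fragment. Without the ., nothing in the group can ever consume a <, so the repetition stalls at the first inner tag and the variant finds no match.

```python
import re

# Hypothetical fragment with one inner tag.
fragment = "<VisibleObject>\n    <a/>\n    </VisibleObject>"

# The recommended pattern.
with_dot = r"<VisibleObject>(?:[^<]*(?!</VisibleObject).)+</VisibleObject>"
# Hypothetical variant with the '.' removed, for comparison only.
without_dot = r"<VisibleObject>(?:[^<]*(?!</VisibleObject))+</VisibleObject>"

# re.DOTALL lets '.' match newlines too, so it can consume any character.
m1 = re.search(with_dot, fragment, re.DOTALL)
m2 = re.search(without_dot, fragment, re.DOTALL)

print(m1 is not None and m1.group(0) == fragment)  # True: the whole element matches
print(m2 is None)  # True: [^<]* can never step over the '<' of <a/>
```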

Because the match includes both tags, use <VisibleObject></VisibleObject> as the replacement.
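For doing the same replacement outside the editor, here is a minimal Python sketch that applies the pattern and replacement to a trimmed copy of the question's XML (some elements are omitted for brevity). The re.DOTALL flag is an addition not in the answer: in Python, . does not match a newline by default, and then the pattern fails whenever </VisibleObject> immediately follows a newline (no indentation). Whether Sublime's . matches newlines by default is not assumed here.

```python
import re

# Trimmed copy of the question's XML (some elements omitted for brevity).
xml = """\
        <Body name="ground">
            <mass>0</mass>
            <Joint />
            <VisibleObject>
                <GeometrySet>
                    <objects />
                    <groups />
                </GeometrySet>
                <show_axes>false</show_axes>
            </VisibleObject>
            <WrapObjectSet>
                <objects />
                <groups />
            </WrapObjectSet>
        </Body>
"""

pattern = r"<VisibleObject>(?:[^<]*(?!</VisibleObject).)+</VisibleObject>"

# The match covers both tags, so the replacement has to put them back.
# re.DOTALL (an addition) lets '.' also consume a newline that sits
# directly before </VisibleObject>.
result = re.sub(pattern, "<VisibleObject></VisibleObject>", xml, flags=re.DOTALL)
print(result)
```

The emptied element ends up on a single line; everything outside <VisibleObject>, including <WrapObjectSet>, is left unchanged.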



Source: https://stackoverflow.com/questions/38960282/remove-all-content-between-two-strings-using-regular-expressions
