Batch Script to extract lines Between two given words

风流意气都作罢 submitted on 2020-03-06 04:24:12

Question


I need to extract the text between two given words from a file.

The file format is as follows:

some lines
<name>text1</name>
some lines
some lines 
<name>text2</name>
some lines
<name>text3</name>
some more lines

  • I need to extract every piece of text that occurs between a pair of name tags:

    <name> extract this text here </name>
    

Expected output for the file above:

  • text1
  • text2
  • text3

Thank you.


Answer 1:


This should work for the sample data provided:

for /f "tokens=2 delims=<>" %A in ('type test.txt ^| findstr "<name>"') do @echo %A

Inside a batch script, change %A to %%A. The command loops over the lines containing <name> and splits each one at the < and > characters (delims=<>), which yields the tokens name, the text in between, and /name. tokens=2 assigns only the second token to %A.

Keep in mind that this won't work if anything appears on the line before <name>, because that text becomes the first token and pushes the wanted text out of the second position. Handling that case would complicate things considerably in batch, and I would then suggest using a parsing library in another language.

This also won't work if the text you want to extract contains < or >.




Answer 2:


Suppose the input file is input.txt.

This should work:

grep '<name>.*</name>' input.txt | sed -r 's/<name>(.*)<\/name>/\1/'

grep finds the lines that contain the name tags, and sed deletes the tags, leaving the text between them.
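
The pipeline above leaves any text before or after the tags in the output, since sed replaces only the matched part of the line. The following is a minimal sketch of a variant that also strips the surrounding text, assuming there is at most one <name>...</name> pair per line. The function name extract_names and the sample data are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# extract_names is a hypothetical helper name used only for this sketch.
# Assumption: at most one <name>...</name> pair per line.
extract_names() {
    # -n suppresses the default output; the trailing p prints only the
    # lines where the substitution succeeded, so no separate grep is needed.
    sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p' "$1"
}

# Illustrative sample data (not taken from the question).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
some lines
<name>text1</name>
prefix <name>text2</name> suffix
some more lines
EOF

result=$(extract_names "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

Because the pattern is anchored with .* on both sides, the whole line is replaced by the captured group. With several pairs on one line, the greedy match would not split them cleanly, which is why the one-pair assumption matters.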




Answer 3:


The following script extracts the text in between the desired tags of the file(s) provided as command line argument(s):

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Resolve command line arguments:
for %%F in (%*) do (
    rem // Read a single line of text following certain criteria:
    for /F "delims=" %%L in ('
        findstr /R "^[^<>]*<name>[^<>][^<>]*</name>[^<>]*$" "%%~F"
    ') do (
        set "LINE=%%L"
        rem /* Extract the desired string portion;
        rem    the preceding `_` is inserted for the first token
        rem    never to appear empty to the `for /F` loop: */
        setlocal EnableDelayedExpansion
        for /F "tokens=3 delims=<>" %%K in ("_!LINE!") do (
            endlocal
            rem // Return found string portion:
            echo(%%K
        )
    )
)

endlocal
exit /B

This works only if the line contains exactly one <name> tag, followed by some text that contains neither < nor >, followed by exactly one </name> tag. The whole string must be on a single line; it may be preceded or followed by text that contains neither < nor >.
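
To make these constraints concrete, here is a rough shell analogue of the same filter, written with the grep tool that answer 2 already uses. It is only a sketch: the function name extract_strict and the sample lines are hypothetical, and the pattern is a hand translation of the findstr expression, not something taken from the original answer.

```shell
#!/bin/sh
# extract_strict is a hypothetical name; the regex mirrors the findstr
# pattern above, with [^<>][^<>]* written as [^<>]+ in extended syntax.
extract_strict() {
    # awk keeps the empty field in front of a leading "<", so the wanted
    # text is always field 3 and no "_" prefix is needed here.
    grep -E '^[^<>]*<name>[^<>]+</name>[^<>]*$' "$1" | awk -F'[<>]' '{ print $3 }'
}

# Illustrative sample data: only the first two lines satisfy the constraints.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
before <name>text1</name> after
<name>text2</name>
<name></name>
<name>a<b</name>
<name>x</name><name>y</name>
EOF

result=$(extract_strict "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

The last three sample lines are rejected for an empty element, a stray < inside the text, and two elements on one line, respectively.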



Source: https://stackoverflow.com/questions/40371999/batch-script-to-extract-lines-between-two-given-words
