Question
I need to extract the text between two given words from a file.
The file format is as follows:
some lines
<name>text1</name>
some lines
some lines
<name>text2</name>
some lines
<name>text3</name>
some more lines
I need to extract every piece of text that occurs between a pair of name tags:
<name> extract this text here </name>
Expected output for the above file:
- text1
- text2
- text3
Thank you.
Answer 1:
This should work for the sample data provided:
for /f "tokens=2 delims=<>" %A in ('type test.txt ^| findstr "<name>"') do @echo %A
If you use this inside a batch script, be sure to change %A to %%A. The command runs through the lines containing <name> and splits each one at the < and > characters (delims=<>), which yields the tokens name, the text in between, and /name. tokens=2 then sets %A to only the second token.
Keep in mind that this won't work if there is anything on the line before <name>, including leading whitespace, because that text becomes the first token and shifts the others. Handling that case would complicate things a lot more in batch, and I would then suggest using a parsing library in another language instead.
This also won't work if the text you want to extract contains < or >.
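To make the tokenizing and its limitation easier to see, here is a minimal sketch that mimics only the splitting step of for /f "tokens=2 delims=<>" outside of cmd. It is not part of the original answer: the function name second_token is hypothetical, and awk is assumed to be available.

```shell
# Mimic the tokenizing of `for /f "tokens=2 delims=<>"`:
# split each input line at < or >, skip empty tokens (for /f treats
# consecutive delimiters as one), and print the second token if there is one.
second_token() {
  awk -F'[<>]' '{
    n = 0
    for (i = 1; i <= NF; i++) {
      if ($i != "") {
        n++
        if (n == 2) { print $i; break }
      }
    }
  }'
}

# A line that starts with the tag yields the wanted text.
printf '%s\n' '<name>text1</name>' | second_token          # prints: text1

# Text before the tag becomes token 1, so token 2 is now the tag name.
printf '%s\n' 'prefix <name>text1</name>' | second_token   # prints: name
```

The second call shows why anything in front of <name> breaks the one-liner: every token moves one position to the right.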
Answer 2:
Suppose the input file is input.txt. This should work:
grep '<name>.*</name>' input.txt | sed -r 's/<name>(.*)<\/name>/\1/'
grep finds the lines that contain a name element; sed then strips the name tags from them.
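The two stages can also be folded into a single sed call. The following is a minimal sketch rather than part of the original answer; the function name extract_names is hypothetical. It uses only basic regular expressions, so it does not rely on the -r flag (which not every sed implementation accepts), and it also drops any text that surrounds the tags on the same line.

```shell
# Print only the text between <name> and </name> for each matching line.
# -n suppresses the default output; the trailing p prints a line only
# when the substitution succeeded, which replaces the separate grep step.
extract_names() {
  sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p' "$@"
}

# Usage: read from a file (extract_names input.txt) or from standard input.
printf '%s\n' 'some lines' '<name>text1</name>' 'x <name>text2</name> y' | extract_names
```

Because .* is greedy, a line that holds several name elements still produces only one line of output, so this sketch assumes at most one element per line, as in the sample file.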
Answer 3:
The following script extracts the text in between the desired tags of the file(s) provided as command line argument(s):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Resolve command line arguments:
for %%F in (%*) do (
    rem // Read a single line of text following certain criteria:
    for /F "delims=" %%L in ('
        findstr /R "^[^<>]*<name>[^<>][^<>]*</name>[^<>]*$" "%%~F"
    ') do (
        set "LINE=%%L"
        rem /* Extract the desired string portion;
        rem    the preceding `_` is inserted for the first token
        rem    never to appear empty to the `for /F` loop: */
        setlocal EnableDelayedExpansion
        for /F "tokens=3 delims=<>" %%K in ("_!LINE!") do (
            endlocal
            rem // Return found string portion:
            echo(%%K
        )
    )
)
endlocal
exit /B
This works only if the line contains exactly one <name> tag, followed by non-empty text that contains neither < nor >, followed by exactly one </name> tag. The whole string must be on a single line, and it may be preceded or followed by other text that likewise contains neither < nor >.
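If a line may hold more than one name element, the one-match-per-line restriction can be lifted by matching repeatedly within each line. The sketch below is an illustration under that assumption and not the author's script: it swaps the batch approach for awk, and the function name extract_all_names is hypothetical. Like the findstr pattern above, it requires the enclosed text to be non-empty and free of < and >.

```shell
# Print every non-empty text enclosed in <name>...</name>, including
# several occurrences on one line.
extract_all_names() {
  awk '{
    line = $0
    # match() sets RSTART and RLENGTH to the position and length of the
    # leftmost "<name>...</name>" occurrence in the remaining text.
    while (match(line, /<name>[^<>]+<\/name>/)) {
      # 6 is the length of the opening tag, 13 the length of both tags.
      print substr(line, RSTART + 6, RLENGTH - 13)
      # Continue searching after the element just reported.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

# Usage: extract_all_names input.txt, or feed lines on standard input.
printf '%s\n' 'a <name>x</name> b <name>y</name>' | extract_all_names
```

The usage line prints x and y on separate lines, whereas the per-line approaches would report at most one value for that input.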
Source: https://stackoverflow.com/questions/40371999/batch-script-to-extract-lines-between-two-given-words