win batch regexp search and replace

Submitted by 爷,独闯天下 on 2020-01-05 06:39:31

Question


I have a set of data like this:

7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)

32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)

They are read from a file and passed line by line to a method inside my batch script. What I want to do in that method is to extract only

7859

22174479

from these lines. Basically, after a "\d+:\d\d\s+" match come the numbers that I need, followed by another "\d\d.*".

Is this possible using only batch script regular expressions and search and replace? I tried and read a bunch of articles but could not find a solution. In the end I want to add the numbers.

Thank you

EDIT
Based on Andrei's comment to David Ruhmann's answer, Andrei wants the token that is 2 positions before (xfer#, not the 3rd token from the beginning.


Answer 1:


  :: Does %variable% =~ s/old/new/
  :: Requires perl.exe on the PATH; Perl is not part of Windows.
  setlocal ENABLEDELAYEDEXPANSION
  for /f "delims=" %%a in ('echo !variable! ^|perl -pe "s/regexp/replace/" ') do set variable=%%a



Answer 2:


Do note that batch is not the best language to use for regex! Cmd processes the input one line at a time, whereas regex allows for multi-line processing.

It sounds like you just need to perform a token grab from the lines. Assume the more complete regex for a line looks like this: (\d+\s+\d+:\d\d\s+)+\(xfer#\d+, to-check=\d+/\d+\).

This tells us that the line contains constant delimiters: colons (:) and whitespace (\s+). From there it is just a matter of using those anchors to determine the token position.
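One way to sanity-check that assumed line shape is to test it against the sample lines from the question. The sketch below is not part of the original answer: it writes the repeated part as a group, anchors the pattern to the whole line, and uses hypothetical names (LINE_SHAPE, matchesLineShape, show). It is written in plain ES3-style JavaScript so that it should also run under cscript //E:JScript as well as Node.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

// Assumed shape: one or more "<number> <number>:<two digits>" pairs,
// followed by the "(xfer#N, to-check=A/B)" trailer.
var LINE_SHAPE = /^(\d+\s+\d+:\d\d\s+)+\(xfer#\d+, to-check=\d+\/\d+\)$/;

function matchesLineShape(line) {
  return LINE_SHAPE.test(line);
}

var samples = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)",
  "this line does not follow the template"
];

for (var i = 0; i < samples.length; i++) {
  show(matchesLineShape(samples[i]) + "  " + samples[i]);
}
```

If a real log line fails this check, the fixed token positions used in the batch commands cannot be relied on for that line.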


Extract the third whitespace-delimited token from the line.

for /f "tokens=3" %%A in ("line") do echo %%A

Extract the second whitespace-delimited token from the second colon-delimited token of the line.

for /f "tokens=2 delims=:" %%A in ("line") do (
    for /f "tokens=2" %%B in ("%%A") do echo %%B
)
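To make the two token positions concrete, here is a rough JavaScript model of what those two FOR /F commands pick out of the sample lines. It is only an approximation of FOR /F tokenizing (split on the delimiters, ignore empty pieces), and the helper names tokens, thirdToken, secondAfterFirstColon and show are made up for this sketch.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

// Split on a delimiter pattern and drop empty pieces, roughly like FOR /F.
function tokens(text, delimPattern) {
  var parts = text.split(delimPattern);
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] !== "") { out.push(parts[i]); }
  }
  return out;
}

// Models: for /f "tokens=3" %%A in ("line")
function thirdToken(line) {
  return tokens(line, /[ \t]+/)[2];
}

// Models: for /f "tokens=2 delims=:" ... then for /f "tokens=2" on the result
function secondAfterFirstColon(line) {
  var segment = tokens(line, /:+/)[1];
  if (segment === undefined) { return undefined; }  // no colon in the line
  return tokens(segment, /[ \t]+/)[1];
}

var line1 = "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)";
var line2 = "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)";

show(thirdToken(line1));             // 7859
show(thirdToken(line2));             // 22174479
show(secondAfterFirstColon(line1));  // 7859
show(secondAfterFirstColon(line2));  // 22174479
```

Both approaches count from the start of the line, so they only agree with the wanted value while every line has exactly two number pairs before (xfer#.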

Update

Extract the second token before the last colon.

@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "Line=32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)"

set "Last="
set "This="
for /f "delims=" %%A in ('echo("%Line::="^&echo("%"') do (
    for /f "tokens=2" %%B in ("%%A") do (
        if defined This set "Last=!This!"
        set "This=%%B"
    )
)
echo %Last%

endlocal
pause >nul
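The logic of the update script can be hard to see through the batch quoting, so here is the same idea as a small JavaScript sketch: break the line at every colon, take the second whitespace-delimited token of each piece, and keep the one found just before the last piece. The function and variable names are invented for this illustration; only the algorithm comes from the script above.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

// Second whitespace-delimited token of a piece of text, or null if there is none.
function secondWord(text) {
  var parts = text.split(/[ \t]+/);
  var words = [];
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] !== "") { words.push(parts[i]); }
  }
  return words.length >= 2 ? words[1] : null;
}

function tokenBeforeLastColonSegment(line) {
  var segments = line.split(":");
  var last = "";      // plays the role of %Last%
  var current = "";   // plays the role of %This%
  for (var i = 0; i < segments.length; i++) {
    var word = secondWord(segments[i]);
    if (word === null) { continue; }  // FOR /F skips pieces without a second token
    last = current;
    current = word;
  }
  return last;
}

show(tokenBeforeLastColonSegment(
  "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)"
));  // 11707819
```

Because the count runs backwards from the final colon, the result does not depend on how many number pairs precede (xfer#.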

Limitations

  1. Lines containing an odd number of double quotation marks " will cause the script to crash. One method to prevent this is to strip out the quotations before the for loop with set Line=%Line:"=%.



Answer 3:


Based on your comment to David Ruhmann's answer, you want the token that is 2 positions before the (xfer# string. I suppose it can be done using native batch commands, but that is a nasty problem.

I am assuming you are restricted to commands that are native to Windows - no downloaded executables.

I'm hoping that you may use JScript, since it is native to Windows.

I have written a hybrid JScript/Batch utility script named "REPL.BAT" that performs regex search and replace. It is an amazingly useful utility, despite not requiring much code. The utility makes the solution very simple.

I use FINDSTR to filter out the lines that don't meet the template of at least 2 space-delimited tokens prior to (xfer#. I pipe those results to my REPL utility and preserve only the desired token. The result is sent to stdout.

findstr /r /c:" [^ ][^ ]* [^ ][^ ]* (xfer#" test.txt | repl ".* ([^ ]+) ([^ ]+) \(xfer#.*" "$1"
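For readers who want to see what that search and replace does to a single line, the following sketch applies the same regular expression in plain JavaScript and then adds up the extracted values, since the question mentions wanting to add the numbers. The names extractSize, sumSizes and show are hypothetical, and the pattern test stands in for the FINDSTR filter; this is an illustration, not part of REPL.BAT.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

// Same pattern as in the repl call: keep the token two positions before "(xfer#".
var PATTERN = /.* ([^ ]+) ([^ ]+) \(xfer#.*/;

function extractSize(line) {
  if (!PATTERN.test(line)) { return null; }  // stands in for the FINDSTR filter
  return line.replace(PATTERN, "$1");
}

function sumSizes(lines) {
  var total = 0;
  for (var i = 0; i < lines.length; i++) {
    var token = extractSize(lines[i]);
    if (token !== null && /^\d+$/.test(token)) {
      total += parseInt(token, 10);
    }
  }
  return total;
}

var lines = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)"
];

show(extractSize(lines[0]));  // 7859
show(extractSize(lines[1]));  // 22174479
show(sumSizes(lines));        // 22182338
```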

Here is the code for the REPL.BAT utility script. Full documentation is embedded within the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
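The L option is worth a closer look, because it relies on two separate escaping steps: regex metacharacters in Search are backslash-escaped, and every $ in Replace is doubled so that the replace method treats it literally. The sketch below isolates those two steps using the same expressions as the script; the wrapper functions (escapeSearchLiteral, escapeReplaceLiteral, literalReplace, show) are hypothetical names added for illustration.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

// Backslash-escape regex metacharacters so the search text matches literally.
function escapeSearchLiteral(text) {
  return text.replace(/([.^$*+?()[{\\|])/g, "\\$1");
}

// Double every $ so sequences such as $1 or $& are not treated as substitutions.
function escapeReplaceLiteral(text) {
  return text.replace(/\$/g, "$$$$");
}

function literalReplace(source, search, replacement) {
  var re = new RegExp(escapeSearchLiteral(search), "g");
  return source.replace(re, escapeReplaceLiteral(replacement));
}

var source = "cost: $5 (approx)";

// Without escaping, "(approx)" is a capture group and "$&" inserts the match.
show(source.replace(new RegExp("(approx)", "g"), "$&!"));  // cost: $5 (approx!)

// With both escaping steps, search and replacement are taken literally.
show(literalReplace(source, "(approx)", "$&!"));           // cost: $5 $&!
```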



Answer 4:


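The easiest and most flexible way to accomplish what you want would be to use awk (regexp examples) or sed from GnuWin32 (for example: sed -i -r -e "s/([0-9]+:[0-9][0-9][[:space:]]+)[0-9]+/\1replacementstring/g" filename). Note that neither tool uses Perl regexp syntax: with -r, sed takes POSIX extended regular expressions, so Perl shorthand such as \d has to be written as a bracket expression like [0-9]. I think what you're involved in is exactly what awk was designed for.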

If you're restricted to what ships with Windows, with no third-party tools, you can perform regexp matches using VBScript. You can call VBScript by echoing the script to a .vbs file, calling cscript vbsfile, and capturing its output. Here's a proof of concept.

@echo off & setlocal enabledelayedexpansion

:: rxp.bat
:: rxp /? for usage instructions

if #%4==# goto usage
set global=false
set replace=false
for %%I in (%*) do (
    if not #!next!==# (
        if !next!==string set string=%%I
        if !next!==pattern set pattern=%%I
        if !next!==replace set replace=%%I
        set next=
    )
    if #%%I==#/s set next=string
    if #%%I==#/p set next=pattern
    if #%%I==#/r set next=replace
    if #%%I==#/g set global=true
)
if not defined string goto usage
if not defined pattern goto usage

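:: Double every quote for VBScript, then turn an escaped \" (now \"") into a
:: plain "" so that backslashes in the regexp itself (\d, \s) are left alone.
set string=!string:"=""!
set string=!string:\""=""!
set pattern=!pattern:"=""!
set pattern=!pattern:\""=""!
if #!replace!==#false (
    call :rxp !string:~1,-1! !pattern:~1,-1! !global!
) else (
    set replace=!replace:"=""!
    set replace=!replace:\""=""!
    call :rxp !string:~1,-1! !pattern:~1,-1! !global! !replace:~1,-1!
)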
goto :EOF

:rxp string pattern global replacement
echo Set rxp = New RegExp>regexp.vbs
echo rxp.Pattern = %2>>regexp.vbs
echo rxp.Global = %3>>regexp.vbs
if #%4==# (
    echo Set res = rxp.Execute^(%1^)>>regexp.vbs
    echo For Each match in res>>regexp.vbs
    echo Wscript.Echo match.value>>regexp.vbs
    echo Next>>regexp.vbs
) else (
    echo Wscript.echo rxp.Replace^(%1, %4^)>>regexp.vbs
)
cscript /nologo regexp.vbs
del /q regexp.vbs
goto :EOF

:usage
echo Usage: %~nx0 /s "string" /p "regexp" [/g] [/r "replacement text"]
echo;
echo    /s -- search string
echo;
echo    /p -- regular expression pattern
echo          Example: /p "<[^>]+>" to search for markup tags
echo          matches ^<span class='a'^> or similar
echo;
echo    /r -- replacement text (optional)
echo          If specified, replace the matched text
echo          Example: /p "(<div class=')blue('>)" /r "$1red$2"
echo          matches ^<div class='blue'^>
echo          replaces match with ^<div class='red'^>
echo;
echo    /g -- global match (optional)
echo          match every occurrence (matches only the first by default)
echo;
echo notes: If the regexp pattern includes capturing parentheses, use ^$1-^$9 as
echo backreferences in your replacement text.  If any of your strings include
echo quotation marks, they can be escaped with a backslash (\).
echo;
echo Example:
echo %~nx0 /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
echo /r "$1 class=\"bar\"$2"
echo;
echo matches ^<div id="foo"^>, replaces match with ^<div class="bar"^>
echo output: text begin ^<div class="bar"^> text end

Example output:

C:\Users\me\Desktop>rxp /s "7859 10000:00 7849 10000:00 (xfer#1, to-check=1033/1035)" /p "(\d+:\d\d\s+)\d+" /r "$1foo"
7859 10000:00 foo 10000:00 (xfer#1, to-check=1033/1035)

C:\Users\me\Desktop>rxp
Usage: rxp.bat /s "string" /p "regexp" [/g] [/r "replacement text"]

   /s -- search string

   /p -- regular expression pattern
         Example: /p "<[^>]+>" to search for markup tags
         matches <span class='a'> or similar

   /r -- replacement text (optional)
         If specified, replace the matched text

   /g -- global match (optional)
         match every occurrence (matches only the first by default)

notes: If the regexp pattern includes capturing parentheses, use $1-$9 as
backreferences in your replacement text.  If any of your strings include
quotation marks, they can be escaped with a backslash (\).

Example:
rxp.bat /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
/r "$1 class=\"bar\"$2"

matches <div id="foo">, replaces match with <div class="bar">
output: text begin <div class="bar"> text end
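The effect of the /g switch is easiest to see on a line with more than two number pairs. The sketch below runs the same pattern and replacement as the first rxp example through a JavaScript regular expression, once for the first match only and once globally. It is an approximation for illustration: VBScript's RegExp object and JavaScript agree on this pattern, but the function names replaceFirst, replaceAll and show are invented here.

```javascript
// Output shim (hypothetical helper): WScript.Echo under cscript, console.log under Node.
function show(text) {
  if (typeof WScript !== "undefined") { WScript.Echo(text); } else { console.log(text); }
}

var PATTERN_SOURCE = "(\\d+:\\d\\d\\s+)\\d+";

// Like rxp without /g: only the first match is replaced.
function replaceFirst(line, replacement) {
  return line.replace(new RegExp(PATTERN_SOURCE), replacement);
}

// Like rxp with /g: every match is replaced.
function replaceAll(line, replacement) {
  return line.replace(new RegExp(PATTERN_SOURCE, "g"), replacement);
}

var shortLine = "7859 10000:00 7849 10000:00 (xfer#1, to-check=1033/1035)";
var longLine = "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)";

show(replaceFirst(shortLine, "$1foo"));
// 7859 10000:00 foo 10000:00 (xfer#1, to-check=1033/1035)

show(replaceFirst(longLine, "$1foo"));
// 32768 004:47 foo 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)

show(replaceAll(longLine, "$1foo"));
// 32768 004:47 foo 2200:03 foo 10000:01 (xfer#5264, to-check=1020/6975)
```

The last number pair is never touched, because the pattern needs digits after the whitespace and there the next character is an opening parenthesis.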


Source: https://stackoverflow.com/questions/14856009/win-batch-regexp-search-and-replace
