Removing non-alphanumeric characters in a batch variable

Posted by 旧城冷巷雨未停 on 2019-11-30 16:06:21

Question


In a batch file, how would I remove all non-alphanumeric characters (anything other than a-z, A-Z, 0-9, _) from a variable?

I'm pretty sure I need to use findstr and a regex.


Answer 1:


MC ND's solution works, but it's really slow (it needs ~1 second for the small test sample).

This is caused by the echo "!_buf!"|findstr ... construct, as for each character the pipe creates two instances of cmd.exe and starts findstr.

But this can also be solved in pure batch.
Each character is tested for membership in the map variable:

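@echo off
rem The !var! syntax used below requires delayed expansion.
setlocal EnableDelayedExpansion

:test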

    set "_input=Th""i\s&& is not good _maybe_???"
    set "_output="
    set "map=abcdefghijklmnopqrstuvwxyz 1234567890"

:loop
if not defined _input goto endLoop    
for /F "delims=*~ eol=*" %%C in ("!_input:~0,1!") do (
    if "!map:%%C=!" NEQ "!map!" set "_output=!_output!%%C"
)
set "_input=!_input:~1!"
    goto loop

:endLoop
    echo(!_output!
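
The membership test in the loop above relies on string substitution: removing the character from map changes map only if the character occurs in it. The sketch below isolates that test. The sample characters and the map (which here also contains the underscore the question asks for) are illustrative choices, not taken from the answer.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Illustrative whitelist: letters, digits and underscore.
set "map=abcdefghijklmnopqrstuvwxyz0123456789_"

rem Test a few sample characters against the whitelist.
for %%C in (a Q _ 7 #) do (
    rem Deleting %%C from map changes the string only if %%C is in it.
    if "!map:%%C=!" NEQ "!map!" (
        echo %%C is kept
    ) else (
        echo %%C is removed
    )
)
```

The search part of a substitution is case-insensitive, so Q matches the lowercase q in the map; this is why a lowercase-only map also accepts uppercase letters. Substitution cannot search for a leading * or for ~, which is why the loop above filters those two characters out through the for /F delims before the test runs.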

It can be sped up further by removing the goto loop.
To do that, you first need to calculate the string length and then iterate over each character with a FOR /L loop.
This solution is ~6 times faster than the method above and ~40 times faster than MC ND's solution.

set "_input=Th""i\s&& is not good _maybe_!~*???"
set "_output="
set "map=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890"
%$strLen% len _input

for /L %%n in (0 1 %len%) DO (
    for /F "delims=*~ eol=*" %%C in ("!_input:~%%n,1!") do (
        if "!map:%%C=!" NEQ "!map!" set "_output=!_output!%%C"
    )
)
echo(!_output!
exit /b

The $strLen macro can be defined with:

set LF=^


::Above 2 blank lines are required - do not remove
@set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"
:::: StrLen pResult pString
set $strLen=for /L %%n in (1 1 2) do if %%n==2 (%\n%
        for /F "tokens=1,2 delims=, " %%1 in ("!argv!") do (%\n%
            set "str=A!%%~2!"%\n%
              set "len=0"%\n%
              for /l %%A in (12,-1,0) do (%\n%
                set /a "len|=1<<%%A"%\n%
                for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"%\n%
              )%\n%
              for %%v in (!len!) do endlocal^&if "%%~b" neq "" (set "%%~1=%%v") else echo %%v%\n%
        ) %\n%
) ELSE setlocal enableDelayedExpansion ^& set argv=,
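
The macro finds the length with a bitwise binary search: it sets one bit of len at a time, from bit 12 down to bit 0, and clears the bit again if no character exists at that offset. As a rough sketch, the same search can be written as an ordinary subroutine. The name :strLen and the sample string are assumptions for illustration, and a call-based version can be expected to run slower than the macro.

```batch
@echo off
setlocal EnableDelayedExpansion

set "_input=Hello, world"
call :strLen len _input
echo Length: !len!
exit /b

:strLen resultVar stringVar
setlocal EnableDelayedExpansion
rem Prefix one character so the highest valid offset equals the length.
set "str=A!%~2!"
set "len=0"
for /L %%A in (12,-1,0) do (
    rem Tentatively set bit %%A, then undo it if that offset is past the end.
    set /a "len|=1<<%%A"
    for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
)
rem %len% is expanded before endlocal runs, so the value reaches the caller.
endlocal & set "%~1=%len%"
exit /b
```

With bits 12 down to 0, the search covers strings of up to 8191 characters, which matches the maximum length of a cmd.exe variable.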



Answer 2:


EDITED - @jeb is right. This works but is really, really slow.

@echo off
    setlocal enableextensions enabledelayedexpansion
    set "_input=Th""i\s&& is not good _maybe_???"
    set "_output="
:loop
    if not defined _input goto endLoop
    set "_buf=!_input:~0,1!"
    set "_input=!_input:~1!"
    echo "!_buf!"|findstr /i /r /c:"[a-z 0-9_]" > nul && set "_output=!_output!!_buf!"
    goto loop
:endLoop
    echo !_output!
    endlocal

So, back to the drawing board. How can we make it faster? Let's do as few operations as possible and work on substrings that are as long as possible. So, do it in two steps:

1.- Remove all bad characters that can cause problems. To do this, we use the ability of the for command to treat these characters as delimiters, and then join the remaining sections of good characters of the string.

2.- Remove the rest of the bad characters. They are located by using the valid characters as delimiters, which yields substrings of bad characters; each such substring is then replaced (removed) in the string.
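
As a minimal illustration of the two steps, assuming a much smaller set of critical delimiters (only ; and ,) and a short hypothetical set of valid characters:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Step 1: treat ; and , as delimiters and join the tokens between them.
set "input=ab;cd,ef#gh"
set "output="
for /f "tokens=1-3,* delims=;," %%a in ("!input!") do (
    set "output=%%a%%b%%c"
    rem Anything beyond the third token would be processed in a further pass.
    set "input=%%d"
)
rem output is now abcdef#gh
echo After step 1: !output!

rem Step 2: with the valid characters as delimiters, the first token is the
rem first run of bad characters; remove every occurrence of that run.
set "valid=abcdefgh"
for /f "tokens=1 delims=%valid%" %%a in ("!output!") do set "output=!output:%%a=!"
rem output is now abcdefgh
echo After step 2: !output!
```

A single pass is enough for this short sample. Longer strings need both steps repeated until nothing is left to process, which is what the loops in the full script do.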

So, we end up with (syntax adapted to what has been answered here):

@echo off

    setlocal enableextensions enabledelayedexpansion

    rem Test empty string
    call :doClean "" output
    echo "%output%"

    rem Test mixed strings
    call :doClean "~~asd123#()%%%^"^!^"~~~^"""":^!!!!=asd^>^<bm_1" output
    echo %output%
    call :doClean "Thi\s&& is ;;;;not ^^good _maybe_!~*???" output
    echo %output%

    rem Test clean string
    call :doClean "This is already clean" output
    echo %output%

    rem Test all bad string
    call :doClean "*******//////\\\\\\\()()()()" output
    echo "%output%"

    rem Test long string
    set "zz=Thi\s&& is not ^^good _maybe_!~*??? "
    set "zz=TEST: %zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%"
    call :doClean "%zz% TEST" output
    echo %output%

    rem Time long string
    echo %time%
    for /l %%# in (1 1 100) do call :doClean "%zz%" output
    echo %time%

    exit /b

rem ---------------------------------------------------------------------------
:doClean input output
    setlocal enableextensions enabledelayedexpansion
    set "map=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "
    set "input=%~1"
    set "output="

rem Step 1 - Remove critical delimiters
(
:purgeCritical
    for /L %%z in (1 1 10) do (
        for /f tokens^=1^-9^,^*^ delims^=^=^"^"^~^;^,^&^*^%%^:^!^(^)^<^>^^ %%a in ("!input!") do ( 
            set "output=!output!%%a%%b%%c%%d%%e%%f%%g%%h%%i"
            set "input=%%j" 
        )
        if not defined input goto outPurgeCritical
    )
    goto purgeCritical
)
:outPurgeCritical

rem Step 2 - remove any remaining special character
(
:purgeNormal
    for /L %%z in (1 1 10) do (
        set "pending="
        for /f "tokens=1,* delims=%map%" %%a in ("!output!") do (
            set "output=!output:%%a=!"
            set "pending=%%b"
        )
        if not defined pending goto outPurgeNormal
    )
    goto purgeNormal
)
:outPurgeNormal

    endlocal & set "%~2=%output%"
    goto :EOF

Maybe not the fastest, but at least a "decent" solution.




Answer 3:


@echo off

call :purge "~~asd123#()%%%^"^!^"~~~^:^=asd^>^<bm_1" var
echo (%var%)
goto :eof


:purge StrVar  [RtnVar]
setlocal disableDelayedExpansion
set "str1=%~1"
setlocal enableDelayedExpansion

for %%a in ( -  ! @ # $ % ^^ ^&  + \ / ^< ^>  . '  [ ] { }  ` ^| ^"  ) do (
   set "str1=!str1:%%a=!"
 )

 rem dealing with some delimiters


 set "str1=!str1:(=!"
 set "str1=!str1:)=!"
 set "str1=!str1:;=!"
 set "str1=!str1:,=!"
 set "str1=!str1:^^=!"
 set "str1=!str1:^~=!"

 set "temp_str=" 
 for %%e in (%str1%) do (
  set "temp_str=!temp_str!%%e"
 )

endlocal & set "str1=%temp_str%"



setlocal disableDelayedExpansion
set "str1=%str1:!=%"
set "str1=%str1::=%"
set "str1=%str1:^^~=%"

for /f "tokens=* delims=~" %%w in ("%str1%") do set "str1=%%w"

endlocal & set "str1=%str1%"



endlocal &  if "%~2" neq "" (set %~2=%str1%) else echo %str1%

goto :eof

Still cannot deal with ~ and =, but working on it.

EDIT: = will now be cleared.
EDIT: ~ will now be cleared.



Source: https://stackoverflow.com/questions/19855925/removing-non-alphanumeric-characters-in-a-batch-variable
