Is there something akin to regex in AppleScript, and if not, what's the alternative?

感动是毒 2020-12-04 15:54

I need to parse the first 10 chars of a file name to see if they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/ but I'm not seeing anything regExy i

7 answers
  • 2020-12-04 16:14

    I was able to call JavaScript directly from AppleScript (on High Sierra) with the following.

    # Returns a list of strings from _subject that match _regex
    # _regex in the format of /<value>/<flags>
    on match(_subject, _regex)
        set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
        set _result to run script _js in "JavaScript"
        if _result is null or _result is missing value then
            return {}
        end if
        return _result
    end match
    
    match("file-name.applescript", "/^\\d+/g") #=> {}
    match("1234_file.js", "/^\\d+/g") #=> {"1234"}
    match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}
    

    Most of the JavaScript String methods seem to work as expected. I have not found a reference stating which version of ECMAScript JavaScript for Automation on macOS is compatible with, so test before use.

  • 2020-12-04 16:16

    I recently had need of regular expressions in a script, and wanted to find a scripting addition to handle it, so it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like below:

    find text "n(.*)" in "to be or not to be" with regexp
    

    The only downside is that (as of 11/08/2010) it's a 32-bit addition, so it throws errors when it's called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, as I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations - it's really great, and lets you pick whatever regex syntax you want, and use back-references.

    Update 5/28/2011

    Thanks to Mitchell Model's comment below for pointing out they have updated it to be 64-bit, so no more reservations - it does everything I need.

  • 2020-12-04 16:17

    I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for the simple stuff. I use this style of pattern all the time...

    set filename to "1234567890abcdefghijkl"
    
    return isPrefixGood(filename)
    
    on isPrefixGood(filename) --returns "good prefix" or "bad prefix"
        set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
    
        -- "text 1 thru 10" yields a string directly, independent of text item delimiters
        set thePrefix to text 1 thru 10 of filename
    
        set badPrefix to false
    
        repeat with thisChr from 1 to (get count of characters in thePrefix)
            set theChr to character thisChr of thePrefix
            if theChr is not in legalCharacters then
                set badPrefix to true
            end if
        end repeat
    
        if badPrefix is true then
            return "bad prefix"
        end if
    
        return "good prefix"
    end isPrefixGood
    
  • 2020-12-04 16:19

    I have made the bare bones of the Thompson NFA algorithm work in AppleScript, but it does not implement character classes yet, so until then I have an alternative. If anyone is interested in parsing very basic regexes with AppleScript, the code is posted in CodeExchange at MacScripters; please have a look!

    Here is a solution for checking whether the first ten characters of a text/string are all digits:

    set mstr to "1234567889Abcdefg"
    set isnum to prefixIsOnlyDigits for mstr
    to prefixIsOnlyDigits for aText
        set aProbe to text 1 thru 10 of aText
        set isnum to false
        if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
            try
                set aNumber to aProbe as number
                set isnum to true
            end try
        end if
        return isnum
    end prefixIsOnlyDigits
    
  • 2020-12-04 16:21

    Don't despair: since OS X you can also access sed and grep through "do shell script". So:

    set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
    set sedResult to do shell script thecommandstring
    set isgood to sedResult starts with "*good*"
    

    My sed skills aren't too crash hot, so there might be a more elegant way than tagging any name that matches [0-9]{10} with *good* and then looking for *good* at the start of the result. But basically, if filename is "1234567890foo.mov" this will run the command:

    echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"
    

    Note the escaped quotes \" and escaped backslash \\ in the AppleScript. If you're escaping things in the shell you have to escape the escapes. So to run a shell script that has a backslash in it you have to escape it for the shell like \\ and then escape each backslash in AppleScript like \\\\. This can get pretty hard to read.

    So anything you can do on the command line you can do by calling it from AppleScript (woohoo!). Anything written to stdout is returned to the script as the result.
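
    To see what the substitution produces, it can help to run it directly in a terminal first. The sketch below wraps the same sed expression in a helper called mark_name (a hypothetical name, not part of the answer) and feeds it three sample names. Because the pattern has no ^ anchor, a run of ten digits later in the name is tagged too, which is why the AppleScript side checks that the result starts with *good* rather than merely contains it.

```shell
# mark_name: hypothetical helper that runs the answer's sed substitution on one name.
mark_name() {
    printf '%s\n' "$1" | sed 's/[0-9]\{10\}/*good*(&)/'
}

mark_name "1234567890foo.mov"    # prints *good*(1234567890)foo.mov
mark_name "a1234567890foo.mov"   # prints a*good*(1234567890)foo.mov
mark_name "12345foo.mov"         # prints 12345foo.mov (no substitution)
```

    Only the first result begins with *good*, so only the first name would pass the check.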

  • 2020-12-04 16:26

    There is an easier way to make use of the shell (works on bash 3.2+) for regex matching:

    set isMatch to "0" = (do shell script ¬
      "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")
    

    Note:

    • Makes use of a modern bash test expression [[ ... ]] with the regex-matching operator, =~; not quoting the right operand (or at least the special regex chars.) is a must on bash 3.2+, unless you prepend shopt -s compat31;
    • The do shell script statement executes the test and returns its exit code via an additional command (thanks, @LauriRanta); "0" indicates success.
    • Note that the =~ operator does not support shortcut character classes such as \d and assertions such as \b (true as of OS X 10.9.4 - this is unlikely to change anytime soon).
    • For case-INsensitive matching, prepend the command string with shopt -s nocasematch;
    • For locale-awareness, prepend the command string with export LANG='" & user locale of (system info) & ".UTF-8';.
    • If the regex contains capture groups, you can access the captured strings via the built-in ${BASH_REMATCH[@]} array variable.
    • As in the accepted answer, you'll have to \-escape double quotes and backslashes.
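
    The capture-group note can be tried on the shell side alone. The following is a minimal sketch, assuming bash is on the PATH; match_groups is a hypothetical helper name, and bash is invoked explicitly so that [[ ... ]] is available whatever shell calls it. The regex is passed as a separate, unquoted operand, in line with the quoting note above.

```shell
# match_groups: hypothetical helper; prints the overall match, then each
# capture group, one per line. Returns non-zero when there is no match.
match_groups() {
    bash -c '[[ $1 =~ $2 ]] && printf "%s\n" "${BASH_REMATCH[@]}"' _ "$1" "$2"
}

match_groups "1234567890foo.mov" '^([[:digit:]]{10})(.*)$'
# prints:
#   1234567890foo.mov
#   1234567890
#   foo.mov
```

    Element 0 of BASH_REMATCH is the whole match; elements 1 and up are the capture groups in order.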

    Here's an alternative using egrep:

    set isMatch to "0" = (do shell script ¬
      "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")
    

    Though this presumably performs worse, it has two advantages:

    • You can use shortcut character classes such as \d and assertions such as \b
    • You can more easily make matching case-INsensitive by calling egrep with -i.
    • You canNOT, however, gain access to sub-matches via capture-groups; use the [[ ... =~ ... ]] approach if that is needed.
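
    For use outside macOS, here is a hedged variant of the same idea. It assumes grep -E in place of egrep and the POSIX class [[:digit:]] in place of \d, because support for \d varies between grep implementations; has_digit_prefix is a hypothetical helper name. As above, the exit status carries the answer.

```shell
# has_digit_prefix: hypothetical helper; succeeds (exit status 0) when the
# argument starts with ten digits.
has_digit_prefix() {
    printf '%s\n' "$1" | grep -Eq '^[[:digit:]]{10}'
}

for name in "1234567890foo.mov" "12345foo.mov"; do
    if has_digit_prefix "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```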

    Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):

    # SYNOPSIS
    #   doesMatch(text, regexString) -> Boolean
    # DESCRIPTION
    #   Matches string s against regular expression (string) regex using egrep's extended regular expression language *including* 
    #   support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
    #   there is a match or not.
    #    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
    #      a 'considering case' block.
    #    - The current user's locale is respected.
    # EXAMPLE
    #    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
    on doesMatch(s, regex)
        local ignoreCase, extraGrepOption
        set ignoreCase to "a" is "A"
        if ignoreCase then
            set extraGrepOption to "i"
        else
            set extraGrepOption to ""
        end if
        # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
        #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
        tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
    end doesMatch
    
    # SYNOPSIS
    #   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
    # DESCRIPTION
    #   Matches string s against regular expression (string) regex using bash's extended regular expression language and
    #   *returns the matching string and substrings matching capture groups, if any.*
    #   
    #   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
    #     a 'considering case' block.
    #   - The current user's locale is respected.
    #   
    #   IMPORTANT: 
    #   
    #   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
    #   Instead, use one of the following POSIX classes (see `man re_format`):
    #       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
    #       [[:alnum:]] [[:digit:]] [[:xdigit:]]
    #       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
    #       [[:graph:]]  [[:print:]] 
    #   
    #   Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
    #   
    #   Always returns a *list*:
    #    - an empty list, if no match is found
    #    - otherwise, the first list element contains the matching string
    #       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
    #  EXAMPLE
    #       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
    on getMatch(s, regex)
        local ignoreCase, extraCommand
        set ignoreCase to "a" is "A"
        if ignoreCase then
            set extraCommand to "shopt -s nocasematch; "
        else
            set extraCommand to ""
        end if
        # Note: 
        #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
        #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
        #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
        tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
        return paragraphs of result
    end getMatch
    