Interpret escape characters in single quoted string

Posted by 佐手、 on 2019-12-18 02:47:12

Question


Having a single-quoted string:

$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';

I would like to interpret/process the string as if it were double-quoted. In other words, I would like to replace all possible escape sequences (not just tab and linefeed as in this example) with their real values, taking into account that the backslash might be escaped as well, so '\\n' needs to be replaced by '\n'. eval() would easily do what I need, but I cannot use it.

Is there some simple solution?

(A similar thread that I found deals with expansion of variables in the single-quoted string while I'm after replacing escape characters.)


Answer 1:


There is a very simple way to do this, based on preg_replace and stripcslashes, both built in:

// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
preg_replace(
    '/\\\\([nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/e',
    'stripcslashes("$0")', $content
);

This works as long as "\\n" should become "\n" and the like.
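The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same idea has to go through preg_replace_callback. A minimal sketch, assuming PHP 7 or later; the function name unescape_sequences is made up for this illustration and the x alternative is written without a leading backslash so that PCRE reads it as a literal x:

```php
<?php
// Hypothetical helper: applies stripcslashes() only to sequences that a
// PHP double-quoted string would also treat as escapes.
function unescape_sequences(string $content): string
{
    return preg_replace_callback(
        // A backslash followed by a known escape, 1-3 octal digits, or x + hex digits.
        '/\\\\(?:[nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
        function (array $m): string {
            return stripcslashes($m[0]);
        },
        $content
    );
}

$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';
echo unescape_sequences($content);
```

Sequences that the pattern does not match, such as \d, never reach stripcslashes and are left as they are.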

If you're looking for processing these strings literally, see my previous answer.

Edit: You asked in a comment:

I'm just a bit puzzled what's the difference between the output of this and stripcslashes() directly [?]

The difference is not always visible, but there is one: stripcslashes removes the \ character even when no valid escape sequence follows. In PHP double-quoted strings, the backslash is not dropped in that case. For example, in "\d" the d is not a special character, so PHP preserves the backslash:

$content = '\d';
$content; # \d
stripcslashes($content); # d
preg_replace(..., $content); # \d

That's why preg_replace is useful here: it applies the function only to those substrings where stripcslashes works as intended, namely the valid escape sequences.
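The difference can be checked directly, without any regular expression. The sample sequences below are chosen for this illustration only; \a is included because stripcslashes treats it as a C-style escape while a PHP double-quoted string does not:

```php
<?php
// Sequences that are not escapes in a PHP double-quoted string.
$samples = array('\d', '\q', '\a');

foreach ($samples as $s) {
    // stripcslashes() consumes the backslash in every one of these.
    var_dump(stripcslashes($s));
}

// The PHP parser keeps the backslash when it does not know the sequence.
var_dump("\d" === '\d'); // bool(true)
```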




Answer 2:


If you need to process escape sequences exactly as PHP does, you need the long version, which is the DoubleQuoted class. I extended the input string a bit to cover more escape sequences than in your question, to make this more generic:

$content = '\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n';

$dq = new DoubleQuoted($content);

echo $dq;

Output:

\\t This variable\string is not set by me.
Cannot \do anything about it.

However, if a close approximation is good enough, there is a PHP function called stripcslashes. For comparison, here is its result next to the PHP double-quoted string:

echo stripcslashes($content), "\n";

$compare = "\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n";

echo $compare, "\n";

Output:

\t  This variablestring is not set by me.
Cannot do anything about it.

\\t This variable\string is not set by me.
Cannot \do anything about it.

As you can see, stripcslashes drops some characters here compared to PHP's native output.

(Edit: See my other answer as well, which offers something simple and sweet with stripcslashes and preg_replace.)

If stripcslashes is not suitable, there is DoubleQuoted. Its constructor takes a string that is treated like a double-quoted string (minus variable substitution; only the character escape sequences are processed).

As the manual outlines, there are multiple escape sequences. They look like regular expressions, and all start with \, so it seems natural to use regular expressions to replace them.

However, there is one exception: \\ will skip the escape sequence. The regular expression would need backtracking and/or atomic groups to deal with that, and I'm not fluent with those, so I used a simple trick: I apply the regular expressions only to those parts of the string that do not contain \\, by exploding the string first and imploding it again afterwards.

The two regular-expression-based replace functions, preg_replace and preg_replace_callback, can operate on arrays as well, so this is quite easy to do.
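The split, replace and rejoin idea can be tried in isolation. The following is only a sketch: unescape_split is a hypothetical name, and it handles just \n, \r and \t to stay short. DoubleQuoted itself rejoins the pieces with the same two-character separator it split on; this sketch rejoins them with a single backslash, on the assumption that '\\n' should end up as '\n' as the question asks.

```php
<?php
// Hypothetical helper illustrating the explode / replace / implode trick.
function unescape_split(string $s): string
{
    // Split on escaped backslashes; no piece contains "\\" afterwards.
    $parts = explode('\\\\', $s);

    // preg_replace_callback() accepts an array subject and returns an array.
    $parts = preg_replace_callback(
        '/\\\\([nrt])/',
        function (array $m): string {
            $map = array('n' => "\n", 'r' => "\r", 't' => "\t");
            return $map[$m[1]];
        },
        $parts
    );

    // Assumption of this sketch: each escaped backslash becomes ONE backslash.
    return implode('\\', $parts);
}

var_dump(unescape_split('one\ttwo\\\\nthree'));
```

Because explode consumes backslashes in pairs from left to right, an odd backslash that starts a real escape sequence always stays attached to the piece that follows it.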

In the class, this is done in the __toString() function:

class DoubleQuoted
{
    ...
    private $string;
    public function __construct($string)
    {
        $this->string = $string;
    }
    ...
    public function __toString()
    {
        $this->exception = NULL;
        $patterns = $this->getPatterns();
        $callback = $this->getCallback();
        $parts = explode('\\\\', $this->string);
        try
        {
            $parts = preg_replace_callback($patterns, $callback, $parts);
        }
        catch(Exception $e)
        {
            $this->exception = $e;
            return FALSE; # provoke exception
        }
        return implode('\\\\', $parts);
    }
    ...

See the explode and implode calls. They ensure that preg_replace_callback never operates on a string that contains \\, so the replace operation is freed from the burden of dealing with that special case. This is the callback function that preg_replace_callback invokes for each pattern match. I wrapped it in a closure so it is not publicly accessible:

private function getCallback()
{
    $map = $this->map;
    return function($matches) use ($map)
    {
        // Normalize to three entries: full match, type, optional number.
        list($full, $type, $number) = $matches += array('', NULL, NULL);

        if (NULL === $type) {
            throw new UnexpectedValueException(sprintf('Match was %s', $full));
        }

        // Single-character escape: look it up, or keep the sequence unchanged.
        if (NULL === $number) {
            return isset($map[$type]) ? $map[$type] : '\\'.$type;
        }

        switch($type)
        {
            case 'x': return chr(hexdec($number)); // hexadecimal, e.g. \x20
            case '': return chr(octdec($number)); // octal, e.g. \40
            default:
                throw new UnexpectedValueException(sprintf('Match was %s', $full));
        }
    };
}

You need some additional information to understand it, as this is not the complete class yet. I'll go through the missing points and add the missing code as well:

Every pattern the class looks for contains at least one subgroup. The first one goes into $type and is either the single character to be translated, an empty string for octal numbers, or an x for hexadecimal numbers.

The optional second group, $number, is either not set (NULL) or contains the octal/hexadecimal number. The $matches input is normalized to the variables just named in this line:

list($full, $type, $number) = $matches += array('', NULL, NULL);
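The += here is PHP's array union: keys already present in $matches are kept, and only the missing ones are filled in from the right-hand array. A small sketch of that normalization in isolation; the wrapper function normalize_match is a hypothetical name used only for this illustration:

```php
<?php
// Hypothetical wrapper around the normalization line from the callback.
function normalize_match(array $matches): array
{
    // Array union keeps existing keys and adds only the missing ones.
    list($full, $type, $number) = $matches += array('', NULL, NULL);
    return array($full, $type, $number);
}

// A single-character match has two entries, so $number becomes NULL.
var_dump(normalize_match(array('\n', 'n')));

// Hex and octal matches already have three entries and pass through as is.
var_dump(normalize_match(array('\x20', 'x', '20')));
var_dump(normalize_match(array('\40', '', '40')));
```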

Patterns are defined upfront as sequences in a private member variable:

private $sequences = array(
    '(n|r|t|v|f|\\$|")', # single escape characters
    '()([0-7]{1,3})', # octal
    '(x)([0-9A-Fa-f]{1,2})', # hex
);

The getPatterns() function just wraps those definitions into valid PCRE regular expressions like:

/\\(n|r|t|v|f|\$|")/ # single escape characters
/\\()([0-7]{1,3})/ # octal
/\\(x)([0-9A-Fa-f]{1,2})/ # hex

It is pretty simple:

private function getPatterns()
{
    $patterns = array();
    foreach($this->sequences as $sequence) {
        // Prefix each sequence with an escaped backslash and add delimiters.
        $patterns[] = sprintf('/\\\\%s/', $sequence);
    }

    return $patterns;
}

Now that the patterns are outlined, it is clear what $matches contains when the callback function is invoked.

The other thing you need to know to understand how the callback works is $map. That's just an array containing the single replacement characters:

private $map = array(
    'n' => "\n",
    'r' => "\r",
    't' => "\t",
    'v' => "\v",
    'f' => "\f",
    '$' => '$',
    '"' => '"',
);

And that's pretty much it for the class. There is one more private variable, $this->exception, which stores any exception thrown during replacement. __toString() cannot throw exceptions (before PHP 7.4), and one escaping from the callback function would lead to a fatal error. So it is caught and stored in a private class variable. Here is that part of the code again:

    ...
    public function __toString()
    {
        $this->exception = NULL;
        ...
        try
        {
            $parts = preg_replace_callback($patterns, $callback, $parts);
        }
        catch(Exception $e)
        {
            $this->exception = $e;
            return FALSE; # provoke exception
        }
        ...

In case of an exception while replacing, the function exits with FALSE, which leads to a catchable error. A getter function then makes the internal exception available:

private $exception;
...
public function getException()
{
    return $this->exception;
}

As it's nice to access the original string as well, you can add another getter to obtain that:

public function getString()
{
    return $this->string;
}

And that's the whole class. Hope this is helpful.




Answer 3:


A regex-based solution would probably be most maintainable here (the definitions of valid escape sequences in strings are even provided as regexes in the documentation):

$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';

$replaced = preg_replace_callback(
                '/\\\\(\\\\|n|r|t|v|f|\\$|"|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
                'replacer',
                $content);

var_dump($replaced);

function replacer($match) {
    $map = array(
        '\\\\' => "\\",
        '\\n' => "\n",
        '\\r' => "\r",
        '\\t' => "\t",
        '\\v' => "\v",
        '\\f' => "\f",
        '\\$' => '$',
        '\\"' => '"',
    );

    $match = $match[0]; // So that $match is a scalar, the full matched pattern

    if (isset($map[$match])) {
        return $map[$match];
    }

    // Otherwise it's octal or hex notation
    if ($match[1] == 'x') {
        return chr(hexdec(substr($match, 2)));
    }
    else {
        return chr(octdec(substr($match, 1)));
    }
}

The above can also (and really should) be improved:

  • Package the replacer function as an anonymous function instead
  • Possibly replace $map with a switch for a free performance increase
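Both suggestions could look roughly like the sketch below. It assumes PHP 7 or later; the variable name $unescape is made up for this illustration, and whether the switch is actually faster than the array lookup is not measured here.

```php
<?php
// Sketch: anonymous callback plus a switch instead of the $map array.
$unescape = function (string $content): string {
    return preg_replace_callback(
        '/\\\\(\\\\|n|r|t|v|f|\\$|"|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
        function (array $m): string {
            $seq = $m[1]; // the part after the leading backslash
            switch ($seq) {
                case '\\': return "\\";
                case 'n':  return "\n";
                case 'r':  return "\r";
                case 't':  return "\t";
                case 'v':  return "\v";
                case 'f':  return "\f";
                case '$':  return '$';
                case '"':  return '"';
            }
            // Anything else the pattern matched is hex (x..) or octal digits.
            if ($seq[0] === 'x') {
                return chr(hexdec(substr($seq, 1)));
            }
            return chr(octdec($seq));
        },
        $content
    );
};

$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';
var_dump($unescape($content));
```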


Source: https://stackoverflow.com/questions/8309731/interpret-escape-characters-in-single-quoted-string
