Question
Having a single-quoted string:
$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';
I would like to interpret/process the string as if it were double-quoted. In other words, I would like to replace all the possible escape sequences (not just tab and linefeed as in this example) with the real values, taking into account that the backslash might be escaped as well, thus '\\n' needs to be replaced by '\n'. eval() would easily do what I need, but I cannot use it.
Is there some simple solution?
(A similar thread that I found deals with expansion of variables in the single-quoted string while I'm after replacing escape characters.)
Answer 1:
There is a very simple way to do this, based on preg_replace and stripcslashes, both built in:
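preg_replace(
    # x must be a literal "x" here; an unescaped \x would be read by PCRE as a hex escape
    # note: the /e modifier only exists before PHP 7.0
    '/\\\\([nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/e',
    'stripcslashes("$0")', $content
);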
This works as long as "\\n" should become "\n" and the like.
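On PHP 7.0 and later the /e modifier no longer exists, so the same idea has to go through preg_replace_callback instead. A minimal sketch of that variant follows; the helper name interpret_escapes is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (name is an assumption): applies stripcslashes()
// only to substrings that are valid double-quote escape sequences.
function interpret_escapes(string $content): string
{
    return preg_replace_callback(
        // backslash followed by: one of n r t v f \ $ ", 1-3 octal digits, or x + 1-2 hex digits
        '/\\\\(?:[nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
        function (array $m): string {
            return stripcslashes($m[0]);
        },
        $content
    );
}

// Unknown sequences such as \d keep their backslash:
var_dump(interpret_escapes('\d') === '\d'); // bool(true)
```

The callback receives the whole match in $m[0], which plays the role of "$0" in the /e version.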
If you need to process these strings exactly the way PHP does, see my other answer (the DoubleQuoted class).
Edit: You asked in a comment:
I'm just a bit puzzled what's the difference between the output of this and stripcslashes() directly [?]
The difference is not always visible, but there is one: stripcslashes will remove the \ character if no escape sequence follows. In PHP strings, the backslash is not dropped in that case. For example, in "\d", d is not a special character, so PHP preserves the backslash:
$content = '\d';
$content; # \d
stripcslashes($content); # d
preg_replace(..., $content); # \d
That's why preg_replace is useful here: it only applies the function to those substrings where stripcslashes works as intended, that is, to all valid escape sequences.
Answer 2:
If you need to handle the escape sequences exactly like PHP does, you need the long version, which is the DoubleQuoted class. I extended the input string a bit to cover more escape sequences than in your question to make this more generic:
$content = '\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n';
$dq = new DoubleQuoted($content);
echo $dq;
Output:
\\t This variable\string is not set by me.
Cannot \do anything about it.
However, if a close approximation is good enough, there is a PHP function called stripcslashes. For comparison, I've added its result and the PHP double-quoted string:
echo stripcslashes($content), "\n";
$compare = "\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n";
echo $compare, "\n";
Output:
\t This variablestring is not set by me.
Cannot do anything about it.
\\t This variable\string is not set by me.
Cannot \do anything about it.
As you can see, stripcslashes drops some characters here compared to PHP's native output.
(Edit: See my other answer as well, which offers something simple and sweet with stripcslashes and preg_replace.)
If stripcslashes is not suitable, there is DoubleQuoted. Its constructor takes a string that is treated like a double-quoted string (minus variable substitution; only the character escape sequences are handled).
As the manual outlines, there are multiple escape sequences. They look like regular expressions, and all start with \, so it seems natural to use regular expressions to replace them.
However, there is one exception: \\ will skip the escape sequence. The regular expression would need backtracking and/or atomic groups to deal with that, and I'm not fluent with those, so I used a simple trick: I only applied the regular expressions to those parts of the string which do not contain \\, by exploding the string first and then imploding it again.
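To illustrate why the split on \\ matters, here is a small sketch that handles only the \n escape. The function name replace_newline_escapes is hypothetical and used purely for illustration; it is not part of the class.

```php
<?php
// Hypothetical helper: replace only the \n escape while leaving
// escaped backslashes (\\) untouched, using explode/implode.
function replace_newline_escapes(string $input): string
{
    $parts = explode('\\\\', $input);               // split on a literal \\
    $parts = preg_replace('/\\\\n/', "\n", $parts); // works on every array element
    return implode('\\\\', $parts);                 // put the \\ pairs back
}

$input = 'a\\\\nb\nc'; // content: a\\nb\nc

// Naive approach: the second backslash of \\ is wrongly paired with the n.
$naive = preg_replace('/\\\\n/', "\n", $input);

// Split approach: the \\ pair never reaches the regular expression.
$safe = replace_newline_escapes($input);

var_dump($naive === $safe); // bool(false)
```

In the naive result the \\n at the start loses its n to a line break, whereas the split version keeps \\n intact and only converts the later \n.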
The two regular-expression-based replace functions, preg_replace and preg_replace_callback, can operate on arrays as well, so this is quite easy to do.
It's done in the __toString() function:
class DoubleQuoted
{
    ...
    private $string;
    public function __construct($string)
    {
        $this->string = $string;
    }
    ...
    public function __toString()
    {
        $this->exception = NULL;
        $patterns = $this->getPatterns();
        $callback = $this->getCallback();
        $parts = explode('\\\\', $this->string);
        try
        {
            $parts = preg_replace_callback($patterns, $callback, $parts);
        }
        catch(Exception $e)
        {
            $this->exception = $e;
            return FALSE; # provoke exception
        }
        return implode('\\\\', $parts);
    }
    ...
See the explode and implode calls. They make sure that preg_replace_callback does not operate on any string that contains \\. So the replace operation is freed from the burden of dealing with these special cases. This is the callback function which is invoked by preg_replace_callback for each pattern match. I wrapped it into a closure so it is not publicly accessible:
private function getCallback()
{
    $map = $this->map;
    return function($matches) use ($map)
    {
        list($full, $type, $number) = $matches += array('', NULL, NULL);
        if (NULL === $type)
            throw new UnexpectedValueException(sprintf('Match was %s', $full));
        if (NULL === $number)
            return isset($map[$type]) ? $map[$type] : '\\'.$type;
        switch($type)
        {
            case 'x': return chr(hexdec($number));
            case '': return chr(octdec($number));
            default:
                throw new UnexpectedValueException(sprintf('Match was %s', $full));
        }
    };
}
You need some additional information to understand it, as this is not the complete class yet. I'll go through the missing points and add the missing code as well:
All patterns the class "looks for" contain at least one subgroup. That one goes into $type and is either the single character to be translated, an empty string for octals, or an x for hexadecimal numbers.
The optional second group, $number, is either not set (NULL) or contains the octal/hexadecimal number. The $matches input is normalized to the variables just named in this line:
list($full, $type, $number) = $matches += array('', NULL, NULL);
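The normalization relies on PHP's array union operator: keys already present in $matches win, and missing keys are filled in from the right-hand array. A small sketch, using a hypothetical helper normalize_match that is not part of the class:

```php
<?php
// Hypothetical helper: pad a preg match array to exactly the keys 0, 1 and 2.
function normalize_match(array $matches): array
{
    // Array union keeps existing keys and adds only the missing ones.
    return $matches + array('', null, null);
}

// Single-character escape: only the full match and the type are present.
list($full, $type, $number) = normalize_match(array('\n', 'n'));
var_dump($number); // NULL

// Hex escape: all three values come from the match itself.
list($full, $type, $number) = normalize_match(array('\x20', 'x', '20'));
var_dump($number); // string(2) "20"
```

This is why the callback can test for NULL === $number to tell single-character escapes apart from numeric ones.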
Patterns are defined upfront as sequences in a private member variable:
private $sequences = array(
'(n|r|t|v|f|\\$|")', # single escape characters
'()([0-7]{1,3})', # octal
'(x)([0-9A-Fa-f]{1,2})', # hex
);
The getPatterns() function just wraps those definitions into valid PCRE regular expressions like:
/\\(n|r|t|v|f|\$|")/ # single escape characters
/\\()([0-7]{1,3})/ # octal
/\\(x)([0-9A-Fa-f]{1,2})/ # hex
It is pretty simple:
private function getPatterns()
{
    $patterns = array(); # initialize so the return value is always an array
    foreach($this->sequences as $sequence)
        $patterns[] = sprintf('/\\\\%s/', $sequence);
    return $patterns;
}
Now that the patterns are outlined, this explains what $matches contains when the callback function is invoked.
The other thing you need to know to understand how the callback works is $map. That's just an array containing the single replacement characters:
private $map = array(
'n' => "\n",
'r' => "\r",
't' => "\t",
'v' => "\v",
'f' => "\f",
'$' => '$',
'"' => '"',
);
And that's pretty much it for the class. There is another private variable, $this->exception, that stores any exception that has been thrown, because __toString() cannot throw exceptions (before PHP 7.4) and an exception in the callback function would lead to a fatal error. So it's caught and stored in a private class variable. Here again is that part of the code:
    ...
    public function __toString()
    {
        $this->exception = NULL;
        ...
        try
        {
            $parts = preg_replace_callback($patterns, $callback, $parts);
        }
        catch(Exception $e)
        {
            $this->exception = $e;
            return FALSE; # provoke exception
        }
        ...
In case of an exception while replacing, the function exits with FALSE, which will lead to a catchable fatal error. A getter function then makes the internal exception available:
private $exception;
...
public function getException()
{
return $this->exception;
}
As it's nice to access the original string as well, you can add another getter to obtain that:
public function getString()
{
return $this->string;
}
And that's the whole class. Hope this is helpful.
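For convenience, the fragments above can be put together into one file. This is only a sketch assembled from the snippets in this answer, assuming nothing else was elided; the one deliberate deviation is that __toString() returns an empty string on failure instead of FALSE, an assumption made so the sketch behaves the same on newer PHP versions.

```php
<?php
class DoubleQuoted
{
    private $string;
    private $exception;
    private $sequences = array(
        '(n|r|t|v|f|\\$|")',     // single escape characters
        '()([0-7]{1,3})',        // octal
        '(x)([0-9A-Fa-f]{1,2})', // hex
    );
    private $map = array(
        'n' => "\n", 'r' => "\r", 't' => "\t", 'v' => "\v",
        'f' => "\f", '$' => '$', '"' => '"',
    );

    public function __construct($string)
    {
        $this->string = $string;
    }

    public function getString()
    {
        return $this->string;
    }

    public function getException()
    {
        return $this->exception;
    }

    private function getPatterns()
    {
        $patterns = array();
        foreach ($this->sequences as $sequence) {
            $patterns[] = sprintf('/\\\\%s/', $sequence);
        }
        return $patterns;
    }

    private function getCallback()
    {
        $map = $this->map;
        return function ($matches) use ($map) {
            list($full, $type, $number) = $matches + array('', null, null);
            if (null === $type) {
                throw new UnexpectedValueException(sprintf('Match was %s', $full));
            }
            if (null === $number) {
                return isset($map[$type]) ? $map[$type] : '\\' . $type;
            }
            switch ($type) {
                case 'x': return chr(hexdec($number));
                case '':  return chr(octdec($number));
                default:
                    throw new UnexpectedValueException(sprintf('Match was %s', $full));
            }
        };
    }

    public function __toString()
    {
        $this->exception = null;
        // Split on \\ so the patterns never see an escaped backslash.
        $parts = explode('\\\\', $this->string);
        try {
            $parts = preg_replace_callback($this->getPatterns(), $this->getCallback(), $parts);
        } catch (Exception $e) {
            $this->exception = $e;
            return ''; // assumption: the original returns FALSE here
        }
        return implode('\\\\', $parts);
    }
}

$dq = new DoubleQuoted('\\\\t\tThis variable\\string is\x20not\40set by me.\n');
echo $dq;
```

After the echo, getException() can be checked to see whether the conversion failed.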
Answer 3:
A regex-based solution would probably be most maintainable here (the definitions of valid escape sequences in strings are even provided as regexes in the documentation):
$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';
$replaced = preg_replace_callback(
    // x is a literal "x" here; \$ was added so the dollar escape is matched too
    '/\\\\(\\\\|n|r|t|v|f|\$|"|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
    'replacer',
    $content);
var_dump($replaced);
function replacer($match) {
    $map = array(
        '\\\\' => "\\",
        '\\n' => "\n",
        '\\r' => "\r",
        '\\t' => "\t",
        '\\v' => "\v",
        '\\f' => "\f",
        '\\$' => '$',
        '\\"' => '"',
    );
    $match = $match[0]; // So that $match is a scalar, the full matched pattern
    if (isset($map[$match])) {
        return $map[$match];
    }
    // Otherwise it's octal or hex notation
    if ($match[1] == 'x') {
        return chr(hexdec(substr($match, 2)));
    }
    else {
        return chr(octdec(substr($match, 1)));
    }
}
The above can also (and really should) be improved:
- Package the replacer function as an anonymous function instead
- Possibly replace $map with a switch for a free performance increase
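The two improvements listed above could look roughly like the following sketch. The wrapper name unescape is hypothetical, and masking the octal value to one byte is an assumption made to mirror how PHP wraps octal values above 255.

```php
<?php
// Hypothetical wrapper: anonymous callback plus a switch instead of $map.
function unescape(string $content): string
{
    return preg_replace_callback(
        '/\\\\(\\\\|[nrtvf$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
        function (array $m): string {
            $seq = $m[1]; // everything after the leading backslash
            switch ($seq[0]) {
                case '\\': return '\\';
                case 'n':  return "\n";
                case 'r':  return "\r";
                case 't':  return "\t";
                case 'v':  return "\v";
                case 'f':  return "\f";
                case '$':  return '$';
                case '"':  return '"';
                case 'x':  return chr(hexdec(substr($seq, 1)));
                default:   return chr(octdec($seq) & 0xFF); // octal digits
            }
        },
        $content
    );
}

var_dump(unescape('\x41\101') === 'AA'); // bool(true)
```

Because \\ is part of the alternation, an escaped backslash is consumed as a unit and cannot be mistaken for the start of another escape sequence.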
Source: https://stackoverflow.com/questions/8309731/interpret-escape-characters-in-single-quoted-string