MySQL GROUP_CONCAT escaping

回眸只為那壹抹淺笑 submitted on 2019-11-28 06:59:16

If there's some other character that's illegal in usernames, you can specify a different separator character using a little-known syntax:

...GROUP_CONCAT(name SEPARATOR '|')...

... You want to allow pipes? Or any character?

Escape the separator character, perhaps with a backslash, but before doing that, escape the backslashes themselves:

group_concat(replace(replace(name, '\\', '\\\\'), '|', '\\|') SEPARATOR '|')

This will:

  1. escape any backslashes with another backslash
  2. escape the separator character with a backslash
  3. concatenate the results with the separator character
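If you want to see exactly what the query above hands back to your application (for instance, to build test data), here is a minimal PHP sketch that applies the same three steps to an array of names. encode_names is a hypothetical helper, not part of any library, and the sketch assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: reproduces in PHP what
//   group_concat(replace(replace(name, '\\', '\\\\'), '|', '\\|') SEPARATOR '|')
// returns for a list of names.
function encode_names(array $names, string $sep = '|'): string
{
    $escaped = array_map(
        function (string $name) use ($sep): string {
            // 1. Double every backslash first...
            $name = str_replace('\\', '\\\\', $name);
            // 2. ...then put a backslash in front of each separator.
            return str_replace($sep, '\\' . $sep, $name);
        },
        $names
    );
    // 3. Join with the bare separator, as GROUP_CONCAT does.
    return implode($sep, $escaped);
}
```

The order matters: if the separators were escaped first, step 1 would double the backslashes that step 2 had just added.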

To get the unescaped results, do the same thing in the reverse order:

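  1. Split the results on the separator character where it is not preceded by a backslash. Actually, it's a little trickier than that: you want to split where the separator isn't preceded by an odd number of backslashes. This regex will match that (note that the match also includes the even run of backslashes in front of the separator, so if you split on it, those backslashes have to be added back to the preceding piece):
    (?<!\\)(?:\\\\)*\|
  2. Replace all escaped separator characters with literals, i.e. replace \| with |.
  3. Replace all double backslashes with single backslashes, i.e. replace \\ with \.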
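Instead of splitting with the regex and then running the two replacements, the same decoding can also be done in one left-to-right pass. A backslash means "take the next byte literally", and an unescaped separator ends the current name. This is a swapped-in technique (a hand-written scanner, not the regex from step 1). decode_names is a hypothetical helper, and the sketch assumes PHP 7 or later and a single-byte separator:

```php
<?php
// Hypothetical helper: decodes a string produced by
//   group_concat(replace(replace(name, '\\', '\\\\'), '|', '\\|') SEPARATOR '|')
// in a single pass instead of split-then-unescape.
function decode_names(string $str, string $sep = '|'): array
{
    $names = [];
    $current = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // A backslash escapes the next byte (either "\" or the separator).
            $current .= $str[++$i];
        } elseif ($ch === $sep) {
            // An unescaped separator ends the current name.
            $names[] = $current;
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $names[] = $current;
    return $names;
}
```

Because escaped backslashes are consumed in pairs, a name that ends in a backslash comes back intact. A plain regex split tends to get exactly that case wrong.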
Lemon Juice

Actually, there are ASCII control characters specifically designed for separating database fields and records:

0x1F (31): unit (fields) separator

0x1E (30): record separator

0x1D (29): group separator


You will never have them in usernames, and most probably never in any other non-binary data in your database, so they can be used safely:

GROUP_CONCAT(foo SEPARATOR 0x1D)

Then split the result on the 0x1D character (CHAR(0x1D) in MySQL terms, "\x1D" in PHP) in whatever client language you use.
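In PHP, for example, the split is a single explode() call. Here is a minimal sketch. split_group_concat is a hypothetical name, and the sketch assumes PHP 7.1 or later for the nullable parameter type:

```php
<?php
// Splits a GROUP_CONCAT(foo SEPARATOR 0x1D) result on the ASCII group separator.
// No unescaping is done, on the assumption that 0x1D never occurs in the data.
function split_group_concat(?string $str): array
{
    // GROUP_CONCAT returns NULL when there are no non-NULL values to concatenate.
    if ($str === null) {
        return [];
    }
    return explode("\x1D", $str);
}
```

Pipes, commas and backslashes in the data pass through untouched, because nothing but 0x1D is treated specially.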

I'd suggest GROUP_CONCAT(name SEPARATOR '\n'), since \n usually does not occur in names. This might be a little simpler, since you don't need to escape anything, but it breaks as soon as a name does contain a newline. The encoding/regexp decoding approach proposed by nick is of course nice too.

REPLACE()

Example:

... GROUP_CONCAT(REPLACE(name, ',', '\\,')) 

Note that you have to use a double backslash (if you escape the comma with a backslash) because the backslash is itself the escape character in MySQL string literals: '\,' becomes simply ','.
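Decoding this on the client means splitting on commas that are not preceded by a backslash and then turning \, back into a comma. This variant does not escape backslashes first, so a name that itself ends in a backslash would be indistinguishable from an escaped comma. The sketch below therefore assumes names never contain backslashes. decode_comma_escaped is a hypothetical PHP helper:

```php
<?php
// Hypothetical decoder for GROUP_CONCAT(REPLACE(name, ',', '\\,')).
// Assumes names contain no backslashes of their own, because this variant
// does not escape them and "\," would otherwise be ambiguous.
function decode_comma_escaped(string $str): array
{
    // Split on commas that are not preceded by a backslash.
    $parts = preg_split('/(?<!\\\\),/', $str);
    // Turn each escaped "\," back into a plain comma.
    return str_replace('\\,', ',', $parts);
}
```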

If you're going to be doing the decoding in your application, maybe just use hex:

SELECT GROUP_CONCAT(HEX(foo)) ...

or you could also put the length in them:

SELECT GROUP_CONCAT(CONCAT(LENGTH(foo), ':', foo)) ...

Not that I tested either :-D
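For the HEX() variant, decoding in PHP could be as simple as the minimal sketch below (decode_hex_list is a hypothetical name). HEX() output only contains the digits 0-9 and A-F, so the default comma separator can never occur inside a value and nothing needs escaping:

```php
<?php
// Hypothetical decoder for SELECT GROUP_CONCAT(HEX(foo)) ...
// Each comma-separated item is the hex form of one value's bytes.
function decode_hex_list(string $str): array
{
    return array_map('hex2bin', explode(',', $str));
}
```

Hex encoding doubles the length of every value. The result therefore reaches MySQL's group_concat_max_len limit (1024 bytes by default, beyond which GROUP_CONCAT output is truncated) twice as quickly.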

What nick said, really, with one enhancement: the separator can be more than one character too.

I've often used

GROUP_CONCAT(name SEPARATOR '"|"')

The chances of a username containing "|" (quote, pipe, quote) are fairly low, I'd say.

You're getting into that gray area where it might be better to postprocess this outside the world of SQL.

At least that's what I'd do: I'd just ORDER BY instead of GROUP BY, and loop through the results to handle the grouping as a filter done in the client language:

  1. Start by initializing last_id to NULL.
  2. Fetch the next row of the result set (if there are no more rows, go to step 6).
  3. If the id of the row is different from last_id, start a new output row:

    a. if last_id isn't NULL, output the current grouped row;

    b. set the new grouped row to the input row, but store the name as a single-element array;

    c. set last_id to the id of the current row.

  4. Otherwise (the id is the same as last_id), append the row's name to the existing grouped row's name array.

  5. Go back to step 2.
  6. You have finished: if last_id isn't NULL, output the final grouped row.

Your output then contains the names organized as an array, and at that point you can decide how you want to handle, escape, or format them.
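Here is a minimal PHP sketch of the steps above. It assumes each row arrives as an associative array with id and name keys (hypothetical column names) and that the query is already ordered by id. group_rows is a made-up name. It compares ids with strict !==, so they must come back from the driver with a consistent type:

```php
<?php
// Sketch of the ORDER BY + client-side grouping steps.
// Rows must already be sorted by id (e.g. ... ORDER BY id).
function group_rows(iterable $rows): array
{
    $grouped = [];
    $current = null;          // the grouped row being built
    $lastId = null;           // step 1

    foreach ($rows as $row) { // steps 2 and 5
        if ($current === null || $row['id'] !== $lastId) {
            // step 3a: output the previous group, if any
            if ($current !== null) {
                $grouped[] = $current;
            }
            // step 3b: start a new group with the name as a one-element array
            $current = $row;
            $current['name'] = [$row['name']];
            // step 3c
            $lastId = $row['id'];
        } else {
            // step 4: same id, so append this name to the current group
            $current['name'][] = $row['name'];
        }
    }

    // step 6: output the final group
    if ($current !== null) {
        $grouped[] = $current;
    }
    return $grouped;
}
```

Because the names never pass through a separator, no escaping is needed at all.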

What language/system are you using? PHP? Perl? Java?

Jason S: This is exactly the issue I'm dealing with. I'm using a PHP MVC framework and was processing the results as you describe (multiple rows per result, plus code to group them together). However, I've been working on two functions for my models to implement. One returns a list of all the fields needed to recreate the object, and the other, given a row containing those fields, instantiates a new object. This lets me request a row from the database and easily turn it back into an object without knowing the internals of the data the model needs. That doesn't work as well when multiple rows represent one object, so I was trying to use GROUP_CONCAT to get around the problem.

Right now I'm allowing any character. I realize a pipe would be unlikely to show up, but I'd like to allow it.

How about a control character, which you should be stripping out of application input anyway? I doubt you need, e.g., a tab or a newline in a name field.
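If you do strip control characters from input, a minimal PHP sketch could be a single preg_replace() over the ASCII control range. strip_control_chars is a hypothetical name. Note that it deletes tabs and newlines outright rather than replacing them with spaces:

```php
<?php
// Hypothetical input filter: removes ASCII control characters (0x00-0x1F and
// 0x7F) from a name before it is stored, so a control character such as 0x1D
// can later serve as a GROUP_CONCAT separator without escaping.
function strip_control_chars(string $name): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '', $name);
}
```

The pattern has no /u modifier, so it works byte by byte. That is safe for UTF-8 input: every byte of a multibyte UTF-8 character is 0x80 or above, so none of them falls in the stripped range.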

bonger

Just to expand on some of the answers: I implemented @derobert's second suggestion (the length prefix) in PHP, and it works well. Given MySQL such as:

GROUP_CONCAT(CONCAT(LENGTH(field), ':', field) SEPARATOR '') AS fields

I used the following function to split it:

function concat_split( $str ) {
    // Need to guard against PHP's stupid multibyte string function overloading.
    static $mb_overload_string = null;
    if ( null === $mb_overload_string ) {
        $mb_overload_string = defined( 'MB_OVERLOAD_STRING' )
                && ( ini_get( 'mbstring.func_overload' ) & MB_OVERLOAD_STRING );
    }
    if ( $mb_overload_string ) {
        $mb_internal_encoding = mb_internal_encoding();
        mb_internal_encoding( '8bit' );
    }

    $ret = array();
    for ( $offset = 0; false !== ( $colon = strpos( $str, ':', $offset ) ); $offset = $colon + 1 + $len ) {
        // The length prefix runs from $offset up to (not including) the colon.
        $len = intval( substr( $str, $offset, $colon - $offset ) );
        $ret[] = substr( $str, $colon + 1, $len );
    }

    if ( $mb_overload_string ) {
        mb_internal_encoding( $mb_internal_encoding );
    }

    return $ret;
}

I also initially implemented @ʞɔıu's suggestion, using one of @Lemon Juice's separators. It worked fine, but apart from being more complicated it was slower. The main problem is that PCRE only allows fixed-length lookbehind, so the even run of backslashes in front of a separator has to be matched outside the lookbehind. Using the suggested regex to split therefore requires capturing the delimiters; otherwise doubled backslashes at the end of strings will be lost. So, given MySQL such as (note: 4 PHP backslashes => 2 MySQL backslashes => 1 real backslash):

GROUP_CONCAT(REPLACE(REPLACE(field, '\\\\', '\\\\\\\\'),
    CHAR(31), CONCAT('\\\\', CHAR(31))) SEPARATOR 0x1f) AS fields

the split function was:

function concat_split( $str ) {
    $ret = array();
    // 4 PHP backslashes => 2 PCRE backslashes => 1 real backslash.
    $strs = preg_split( '/(?<!\\\\)((?:\\\\\\\\)*+\x1f)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE );
    // Need to add back any captured double backslashes.
    for ( $i = 0, $cnt = count( $strs ); $i < $cnt; $i += 2 ) {
        $ret[] = isset( $strs[ $i + 1 ] ) ? ( $strs[ $i ] . substr( $strs[ $i + 1 ], 0, -1 ) ) : $strs[ $i ];
    }
    return str_replace( array( "\\\x1f", "\\\\" ), array( "\x1f", "\\" ), $ret );
}