Regexp in Ruby 1.8.7 that will detect a 4-byte Unicode character

Question

Can anyone tell me how to write a regexp in Ruby 1.8.7 that detects the presence of a 4-byte Unicode character (specifically emoji)? I am trying to handle the fact that MySQL does not, by default, allow you to store the 4-byte emoji Unicode characters now in use by iOS 5.

Thanks!

Answer 1:

This appears to match the first two bytes (0xF0 0x9F, written in octal as \360\237) of the four-byte UTF-8 sequences that represent emoji. It was run in Ruby 1.8.7.

str.match(/\360\237/)
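
If every 4-byte UTF-8 character should be caught, not only those starting with \360\237, a broader byte-level pattern may help. The following is a minimal sketch and not part of the original answer: the names FOUR_BYTE_UTF8, as_bytes and contains_four_byte_char? are hypothetical, and it assumes the input is valid UTF-8, where a lead byte in 0xF0-0xF4 followed by three continuation bytes (0x80-0xBF) marks a 4-byte character. The force_encoding guard is only an assumption to let the same code run on Ruby 1.9 and later; on 1.8.7 strings are already plain byte sequences.

```ruby
# Hypothetical helper names; assumes the input string is valid UTF-8.
# The /n flag makes the pattern match raw bytes instead of characters.
FOUR_BYTE_UTF8 = /[\xF0-\xF4][\x80-\xBF]{3}/n

# Return a byte-oriented view of the string.
# Ruby 1.8.7 strings have no encoding, so they are returned unchanged;
# on 1.9+ a copy is re-tagged as binary so the byte regexp can be applied.
def as_bytes(str)
  str.respond_to?(:force_encoding) ? str.dup.force_encoding("BINARY") : str
end

# True if the string contains at least one 4-byte UTF-8 sequence.
def contains_four_byte_char?(str)
  !!(as_bytes(str) =~ FOUR_BYTE_UTF8)
end
```

One possible use is to call contains_four_byte_char?(params_value) before saving to a column that only accepts 3-byte UTF-8, and to reject or clean the input when it returns true (params_value is a placeholder name).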

Answer 2:

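Altering the table so that it can store 4-byte characters (for example, converting it to utf8mb4, available since MySQL 5.5.3) might be feasible using a non-blocking online approach, e.g. pt-online-schema-change from Percona Toolkit (the successor to Maatkit): http://www.percona.com/doc/percona-toolkit/pt-online-schema-change.html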

From the docs:

In brief, this tool works by creating a temporary table which is a copy of the original table (the one being altered). (The temporary table is not created like CREATE TEMPORARY TABLE; we call it temporary because it ultimately replaces the original table.) The temporary table is altered, then triggers are defined on the original table to capture changes made on it and apply them to the temporary table. This keeps the two tables in sync. Then all rows are copied from the original table to the temporary table; this part can take awhile. When done copying rows, the two tables are swapped by using RENAME TABLE. At this point there are two copies of the table: the old table which used to be the original table, and the new table which used to be the temporary table but now has the same name as the original table. If --drop-old-table is specified, then the old table is dropped.

Source: https://stackoverflow.com/questions/7774853/regexp-in-ruby-1-8-7-that-will-detect-a-4-byte-unicode-character

Tags

ruby

regex

astral-plane