I decided to post a question after spending quite some time and still not figuring out the problem. I also read a bunch of seemingly related posts; none really fit my simple
Perl has pretty good support for dealing with Unicode. That might be a better bet for your task than sed. This one-liner works like your first sed example:
perl -CIOED -p -e 's/\p{Script_Extensions=Han}/$& /g' filename
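The -CIOED switch tells perl to do its I/O in UTF-8: I, O and E cover standard input, output and error, and D makes UTF-8 the default layer for other files perl opens, such as the one named on the command line. -p runs the given code once for each line of the input file, then prints the result. -e specifies a line of Perl code to run. See the documentation on command-line arguments (perlrun) for more.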
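The regular expression uses a Unicode property, \p{Script_Extensions=Han}, to match any single character whose script extensions include Han (Chinese characters). In the replacement, $& is the text that was just matched, so each such character is put back followed by a space.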
You might also want to read the Perl Unicode documentation.