Question
I have a text file that contains emoji Unicode characters, for example 😤, ☹️, 😔, 😅, 😃, 😉, 😜, 😍.
For example, the code \N{1F60D} corresponds to 😍. I followed the recommendation in https://perldoc.perl.org/perluniintro.html, section Creating Unicode. My program must detect these characters and process them, but the following does not work:
open(FIC1, ">$fic");
while (<FIC>) {
my $ligne=$_;
if( $ligne=~/\N{1F60D}/ )
{print "heart ";
}
}
When I do this instead, it works:
open(FIC1, ">$fic");
while (<FIC>) {
my $ligne=$_;
if( $ligne=~/😍/ )
{print "Heart ";
}
}
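What is the problem with the first code? Regards
Answer 1:
If you look at perldoc perlre for \N, you see that it means "named Unicode character or character sequence". Written as \N{1F60D}, the text in braces is looked up as a character name, not as a code point.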
You can use this instead:
if ($ligne =~ m/\N{U+1F60D}/)
# or
if ($ligne =~ m/\x{1F60D}/)
Edit: It's also described in the link you posted, https://perldoc.perl.org/perluniintro.html
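As a minimal sketch of the point above (the sample string is made up for illustration), the code point form \N{U+...}, the hex escape \x{...}, and the full character name all refer to the same character, so all three patterns match a string that contains it:

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{NAME} on older perls; harmless on newer ones

# Hypothetical sample line: a character string containing U+1F60D.
my $ligne = "I \x{1F60D} Perl";

# Code point notation, hex escape, and the official Unicode name.
my $match_n    = $ligne =~ /\N{U+1F60D}/ ? 1 : 0;
my $match_x    = $ligne =~ /\x{1F60D}/   ? 1 : 0;
my $match_name = $ligne =~ /\N{SMILING FACE WITH HEART-SHAPED EYES}/ ? 1 : 0;

print "U+ form: $match_n, hex escape: $match_x, name: $match_name\n";
```

Note that the string here is built from the code point directly, so it is already a character string; data read from a file still has to be decoded first.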
Edit: The content you read is probably not decoded. You want:
use Encode;
...
my $ligne = decode_utf8 $_;
or simply open the file directly in utf8 mode:
open my $fh, "<:encoding(UTF-8)", $filename or die "Could not open $filename: $!";
while (my $ligne = <$fh>) {
if ($ligne =~ m/\N{U+1F60D}/) { ... }
}
You never showed how you open the filehandle called FIC, so I assumed it was utf8 decoded.
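To illustrate why decoding matters, here is a small sketch that assumes the line arrives as raw UTF-8 bytes (the $raw value stands in for what an undecoded filehandle would return). Before decoding, the emoji is four separate byte-sized characters and the pattern cannot match; after decode_utf8 it is a single character:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for undecoded input: the four UTF-8 bytes of U+1F60D.
my $raw = "\xF0\x9F\x98\x8D";

# The undecoded string holds four characters (one per byte), none of them U+1F60D.
my $raw_matches = $raw =~ /\N{U+1F60D}/ ? 1 : 0;

# After decoding, the string holds the single character U+1F60D.
my $decoded         = decode_utf8($raw);
my $decoded_matches = $decoded =~ /\N{U+1F60D}/ ? 1 : 0;

print "raw:     match=$raw_matches length=", length($raw), "\n";
print "decoded: match=$decoded_matches length=", length($decoded), "\n";
```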
Here is another good tutorial about Unicode in Perl: https://perlgeek.de/en/article/encodings-and-unicode
Answer 2:
For detecting emoji, I would use Unicode properties in regexes, e.g.:
\p{Emoticons}
or
\p{Block: Emoticons}
For example, to print only the emoji:
perl -CSDA -nlE 'say for( /(\p{Emoticons})/g )' <<< 'abc😦😧😮αβγ'
This prints:
😦
😧
😮
For more info see perluniprops.
Answer 3:
The perl -C switch can be used to enable Unicode features:
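The same idea as a script, as a rough sketch: the sample text below is invented for illustration and written with \x{...} escapes so the source file encoding does not matter. One thing worth knowing is that \p{Emoticons} is a Unicode block (U+1F600 to U+1F64F), so a character such as ☹ (U+2639, in the Miscellaneous Symbols block) is not matched by it:

```perl
use strict;
use warnings;

# Hypothetical sample: "abc", three emoticons, three Greek letters, then U+2639.
my $text = "abc\x{1F626}\x{1F627}\x{1F62E}\x{3B1}\x{3B2}\x{3B3}\x{2639}";

# Collect every character that belongs to the Emoticons block.
my @found = $text =~ /(\p{Emoticons})/g;

# Show the matches as code points to keep the output ASCII-only.
my @codes = map { sprintf 'U+%04X', ord($_) } @found;
print "@codes\n";   # U+1F626 U+1F627 U+1F62E
```

If characters outside that block also need to be detected, the pattern has to be extended accordingly, for example with a character class listing additional code points.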
perl -C -E 'say "\N{U+263a}"'|perl -C -ne 'print if /\N{U+263a}/'
From perlrun:
-C [number/list]
The -C flag controls some of the Perl Unicode features. ...
The second version works because Perl matches the raw UTF-8 byte sequence, as in perl -ne 'print if /\xf0\x9f\x98\x8d/'.
The following should work:
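As a quick check of that byte sequence, this sketch encodes U+1F60D to UTF-8 with the Encode module and prints the resulting bytes in hex; the variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the single character U+1F60D into its UTF-8 byte string.
my $bytes = encode('UTF-8', "\x{1F60D}");

# Format each byte as two hex digits.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
print "$hex\n";   # f0 9f 98 8d

# A byte-level pattern matches the undecoded data, which is what happens
# when the literal emoji is written in the regex and the input is not decoded.
my $byte_match = $bytes =~ /\xf0\x9f\x98\x8d/ ? 1 : 0;
print "byte pattern matches: $byte_match\n";
```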
#!/usr/bin/perl -C
open(FIC1, ">$fic");
while (<FIC>) {
my $ligne=$_;
if( $ligne=~/\N{U+1F60D}/ ) {
print "heart ";
}
}
Source: https://stackoverflow.com/questions/47924985/how-to-detect-emoji-as-unicode-in-perl