Handling grapheme clusters in Dart

℡╲_俬逩灬. Submitted on 2021-02-07 07:20:32

Question


From what I can tell, Dart does not support grapheme clusters, though there is talk of supporting them:

  • Dart Strings should support Unicode grapheme cluster operations #34
  • Minimal Unicode grapheme cluster support #49

Until it is implemented, what are my options for iterating through grapheme clusters? For example, if I have a string like this:

String family = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}'; // 👨‍👩‍👧
String myString = 'Let me introduce my $family to you.';

and there is a cursor right after the five-code-point family emoji, how would I move the cursor one user-perceived character to the left?

(In this particular case I know the size of the grapheme cluster so I could do it, but what I am really asking about is finding the length of an arbitrarily long grapheme cluster.)

Update

I see from this article that Swift uses the system's ICU library. Something similar may be possible in Flutter.

Supplemental code

For those who want to play around with my example above, here is a demo project. The buttons move the cursor to the right or left. It currently takes 8 button presses to move the cursor past the family emoji.
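The count of 8 follows from how Dart indexes strings: String.length and TextSelection offsets count UTF-16 code units, not code points or grapheme clusters. A small sketch using only dart:core (the helper names codeUnitCount and codePointCount are made up for illustration):

```dart
/// Number of UTF-16 code units, which is what String.length reports.
int codeUnitCount(String s) => s.length;

/// Number of Unicode code points (runes) in the string.
int codePointCount(String s) => s.runes.length;

void main() {
  const family = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}'; // 👨‍👩‍👧

  // Each of the three emoji is outside the BMP and takes a surrogate pair
  // (2 code units); each ZWJ (U+200D) takes 1. So 2 + 1 + 2 + 1 + 2 = 8.
  print(codeUnitCount(family)); // 8

  // Three emoji plus two joiners.
  print(codePointCount(family)); // 5
}
```

Moving the cursor by 1 therefore steps through the emoji one code unit at a time, and stopping between the two halves of a surrogate pair leaves the cursor inside a single code point.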

main.dart

import 'package:flutter/material.dart';

void main() => runApp(MyApp());

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(title: Text('Grapheme cluster testing')),
        body: BodyWidget(),
      ),
    );
  }
}

class BodyWidget extends StatefulWidget {
  @override
  _BodyWidgetState createState() => _BodyWidgetState();
}

class _BodyWidgetState extends State<BodyWidget> {

  TextEditingController controller = TextEditingController(
      text: 'Let me introduce my \u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467} to you.'
  );

  @override
  Widget build(BuildContext context) {
    return Column(
      children: <Widget>[
        TextField(
          controller: controller,
        ),
        Row(
          children: <Widget>[
            Padding(
              padding: const EdgeInsets.all(8.0),
              child: RaisedButton(
                child: Text('<<'),
                onPressed: () {
                  _moveCursorLeft();
                },
              ),
            ),
            Padding(
              padding: const EdgeInsets.all(8.0),
              child: RaisedButton(
                child: Text('>>'),
                onPressed: () {
                  _moveCursorRight();
                },
              ),
            ),
          ],
        )
      ],
    );
  }

  void _moveCursorLeft() {
    int currentCursorPosition = controller.selection.start;
    if (currentCursorPosition == 0)
      return;
    // Steps back one UTF-16 code unit, not one grapheme cluster.
    int newPosition = currentCursorPosition - 1;
    controller.selection = TextSelection(baseOffset: newPosition, extentOffset: newPosition);
  }

  void _moveCursorRight() {
    int currentCursorPosition = controller.selection.end;
    if (currentCursorPosition == controller.text.length)
      return;
    // Steps forward one UTF-16 code unit, not one grapheme cluster.
    int newPosition = currentCursorPosition + 1;
    controller.selection = TextSelection(baseOffset: newPosition, extentOffset: newPosition);
  }
}

Answer 1:


Update: use https://pub.dartlang.org/packages/icu

Sample code:

import 'package:flutter/material.dart';


import 'dart:async';
import 'package:icu/icu.dart';

void main() => runApp(MyApp());

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(title: Text('Grapheme cluster testing')),
        body: BodyWidget(),
      ),
    );
  }
}

class BodyWidget extends StatefulWidget {
  @override
  _BodyWidgetState createState() => _BodyWidgetState();
}

class _BodyWidgetState extends State<BodyWidget> {
  final ICUString icuText = ICUString('Let me introduce my \u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467} to you.\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}');
  TextEditingController controller;
  _BodyWidgetState() {
    controller = TextEditingController(text: icuText.toString());
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: <Widget>[
        TextField(
          controller: controller,
        ),
        Row(
          children: <Widget>[
            Padding(
              padding: const EdgeInsets.all(8.0),
              child: RaisedButton(
                child: Text('<<'),
                onPressed: () async {
                  await _moveCursorLeft();
                },
              ),
            ),
            Padding(
              padding: const EdgeInsets.all(8.0),
              child: RaisedButton(
                child: Text('>>'),
                onPressed: () async {
                  await _moveCursorRight();
                },
              ),
            ),
          ],
        )
      ],
    );
  }

  Future<void> _moveCursorLeft() async {
    int currentCursorPosition = controller.selection.start;
    if (currentCursorPosition == 0)
      return;
    int newPosition = await icuText.previousGraphemePosition(currentCursorPosition);
    controller.selection = TextSelection(baseOffset: newPosition, extentOffset: newPosition);
  }

  Future<void> _moveCursorRight() async {
    int currentCursorPosition = controller.selection.end;
    if (currentCursorPosition == controller.text.length)
      return;
    int newPosition = await icuText.nextGraphemePosition(currentCursorPosition);
    controller.selection = TextSelection(baseOffset: newPosition, extentOffset: newPosition);
  }
}


Original answer:

Until Dart/Flutter fully implements ICU, I think your best bet is to use a platform channel to pass the Unicode string to native code (Swift 4+ on iOS, or Java/Kotlin on Android), iterate over or manipulate it there, and send back the result.

  • For Swift 4+, this works out of the box, as the article you mention describes (not in Swift 3 or earlier, and not in Objective-C).
  • For Java/Kotlin, replace Oracle's BreakIterator with the ICU library's, which works much better. No changes are needed aside from the import statements.

The reason I suggest doing the manipulation natively (instead of in Dart) is that Unicode has too many things to handle, such as normalization, canonical equivalence, ZWNJ, ZWJ, ZWSP, etc.

Leave a comment if you need some sample code.




Answer 2:


2020 update

Use the characters package by the Dart team. It's now the official way to handle grapheme clusters.

Use text.characters to get the grapheme clusters. Use text.characters.iterator to move over them. I'm still working out how to convert CharacterRange to TextSelection. I'll update this answer later when I have more details.

Note: This is a complete rewrite of my old answer. See the edit history for details.
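As a rough sketch of how cursor movement could look with the characters package: the code below sidesteps CharacterRange and works on UTF-16 offsets directly, since those are what TextSelection uses. It assumes package:characters is listed in pubspec.yaml and that the incoming offset lies between 0 and text.length on a grapheme boundary. The helper names previousGraphemeOffset and nextGraphemeOffset are made up for illustration and are not part of the package.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: the UTF-16 offset one grapheme cluster to the left
/// of [offset]. Assumes 0 <= offset <= text.length.
int previousGraphemeOffset(String text, int offset) {
  if (offset <= 0) return 0;
  // Characters is an Iterable<String>; each element is one grapheme cluster,
  // and its String.length is the cluster's size in UTF-16 code units.
  final before = text.substring(0, offset).characters;
  return offset - before.last.length;
}

/// Hypothetical helper: the UTF-16 offset one grapheme cluster to the right
/// of [offset]. Assumes 0 <= offset <= text.length.
int nextGraphemeOffset(String text, int offset) {
  if (offset >= text.length) return text.length;
  final after = text.substring(offset).characters;
  return offset + after.first.length;
}

void main() {
  final text =
      'Let me introduce my \u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467} to you.';

  // 'Let me introduce my ' is 20 code units and the family emoji is 8,
  // so offset 28 is the position right after the emoji.
  print(previousGraphemeOffset(text, 28)); // 20
  print(nextGraphemeOffset(text, 20)); // 28
}
```

In the demo project from the question, the result could then be applied with controller.selection = TextSelection.collapsed(offset: newPosition). Each call takes a substring and walks it, so this is fine for short text fields but not tuned for long documents.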



Source: https://stackoverflow.com/questions/54483177/handling-grapheme-clusters-in-dart
