Is there a way to determine whether a string is English or Arabic?
English characters tend to be in these 4 Unicode blocks:
BASIC_LATIN
LATIN_1_SUPPLEMENT
LATIN_EXTENDED_A
GENERAL_PUNCTUATION
public static boolean isEnglish(String text) {
    // An empty string has nothing to classify; report false, as the original loop did.
    if (text.isEmpty()) {
        return false;
    }
    for (char character : text.toCharArray()) {
        // Look the block up once per character instead of four times.
        Character.UnicodeBlock block = Character.UnicodeBlock.of(character);
        boolean english = block == Character.UnicodeBlock.BASIC_LATIN
                || block == Character.UnicodeBlock.LATIN_1_SUPPLEMENT
                || block == Character.UnicodeBlock.LATIN_EXTENDED_A
                || block == Character.UnicodeBlock.GENERAL_PUNCTUATION;
        if (!english) {
            // Stop at the first character outside the four blocks. Tracking a flag
            // across the loop would let the last character overwrite earlier results.
            return false;
        }
    }
    return true;
}
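
The question also asks about Arabic, which the block check does not cover. One possible approach, offered as a minimal sketch rather than a definitive answer, is to classify code points with the JDK's Character.UnicodeScript (available since Java 7). The class name ScriptCheck and the method isArabic are hypothetical names chosen for illustration. The sketch assumes that spaces, digits, punctuation, and combining marks (scripts COMMON and INHERITED) should be ignored, and that a string needs at least one Arabic-script character to count as Arabic.

```java
public class ScriptCheck {

    // Hypothetical helper: true when the text contains at least one
    // Arabic-script code point and no code point from any other script.
    public static boolean isArabic(String text) {
        boolean sawArabic = false;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);

            Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
            if (script == Character.UnicodeScript.ARABIC) {
                sawArabic = true;
            } else if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                // A code point from another script (Latin, Cyrillic, ...) rejects the string.
                return false;
            }
            // COMMON and INHERITED code points (spaces, punctuation, marks) are skipped.
        }
        return sawArabic;
    }

    public static void main(String[] args) {
        // "\u0645\u0631\u062D\u0628\u0627" is the Arabic word for "hello".
        System.out.println(isArabic("\u0645\u0631\u062D\u0628\u0627"));
        System.out.println(isArabic("Hello"));
    }
}
```

Note that this identifies the script, not the language: Persian and Urdu text is also written in the Arabic script and would pass this check.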