I have a text file that contains fallback representations of Unicode characters, with the Unicode code points written in angle brackets. So it contains e.g. foo
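The previous answer works only when the code point is written with exactly four hex digits, because a \u escape takes exactly four digits and a \U escape takes exactly eight. Here is a modified version that zero-pads each code point to the right width and should work for any number of digits between 1 and 8.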
library(stringi)
library(magrittr)
# Each step rewrites <U+...> markers of one exact length as a \u (4-digit)
# or \U (8-digit) escape, zero-padded where needed.
# [0-9A-Fa-f] restricts matches to hex digits; [[:alnum:]] would also accept
# letters such as Z, which cannot be unescaped.
"foo<U+0161>bar and cra<U+017E>y, Phoenician letter alf <U+10900>" %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{4})>", "\\\\u$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{5})>", "\\\\U000$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{6})>", "\\\\U00$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{7})>", "\\\\U0$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{8})>", "\\\\U$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{1})>", "\\\\u000$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{2})>", "\\\\u00$1") %>%
  stri_replace_all_regex("<U\\+([0-9A-Fa-f]{3})>", "\\\\u0$1") %>%
  # Turn the \u and \U escapes into the actual characters.
  stri_unescape_unicode() %>%
  stri_enc_toutf8()
## [1] "foošbar and cražy, Phoenician letter alf 𐤀"
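If chaining eight replacements feels heavy, the same idea can be sketched in a single pass with base R. This is only a rough alternative, not what either answer here uses: decode_angle_escapes is a made-up helper name, and it assumes every marker holds a valid code point (surrogates or out-of-range values are not handled).

```r
# Hypothetical helper (not from the answers here): decode <U+XXXX> markers
# of 1 to 8 hex digits in one pass, using base R only.
decode_angle_escapes <- function(x) {
  # Locate every <U+...> marker in each element of x.
  m <- gregexpr("<U\\+[0-9A-Fa-f]{1,8}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Drop the leading "<U+" and the trailing ">" to keep only the hex digits.
    hex <- substr(tokens, 4, nchar(tokens) - 1)
    # Convert each hex string to an integer code point, then to a character.
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)), character(1),
           USE.NAMES = FALSE)
  })
  x
}

decode_angle_escapes("foo<U+0161>bar and cra<U+017E>y, alf <U+10900>")
## [1] "foošbar and cražy, alf 𐤀"
```

Because the hex digits are converted to an integer directly, no zero-padding is needed, so the number of digits in the marker does not matter.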
Perhaps:
library(stringi)
library(magrittr)
"foo<U+0161>bar and cra<U+017E>y" %>%
  stri_replace_all_regex("<U\\+([[:alnum:]]+)>", "\\\\u$1") %>%
  stri_unescape_unicode() %>%
  stri_enc_toutf8()
## [1] "foošbar and cražy"
may work (I don't need the final stri_enc_toutf8() call on macOS, but you may on Windows).