non-ascii-characters

Replace accented characters in R with non-accented counterpart (UTF-8 encoding) [duplicate]

心已入冬 submitted on 2019-11-28 18:47:11
This question already has an answer here: Replace multiple letters with accents with gsub (11 answers). I have some strings in R, in UTF-8 encoding, that contain accents, e.g. string="Hølmer" or string="Elizalde-González". Is there a nice function in R to replace the accented characters in these strings with their unaccented counterparts? I saw some solutions in PHP here, but how do I do this in R? E.g. the PHP code $unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E',
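In R the usual answers are iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT") or stringi::stri_trans_general(x, "Latin-ASCII"). The underlying decompose-and-strip idea can be sketched in Python (used here purely for illustration):

```python
import unicodedata

def strip_accents(s):
    # Decompose each character (NFKD), then drop the combining marks,
    # leaving the base letters behind.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("Elizalde-González"))  # -> Elizalde-Gonzalez
# Caveat: characters that are NOT base letter + combining mark, such as
# the ø in "Hølmer" or Æ, do not decompose and pass through unchanged --
# they need an explicit mapping table, which is what the PHP array does.
print(strip_accents("Hølmer"))             # -> Hølmer
```

This is why the PHP $unwanted_array lists characters like 'Æ' explicitly: no normalization form turns them into plain ASCII.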

Remove non-ascii character in string

元气小坏坏 submitted on 2019-11-28 16:28:38
var str="INFO] :谷歌新道, ひばりヶ丘2丁目, ひばりヶ丘, 東久留米市 (Higashikurume)"; and I need to remove all non-ASCII characters from the string, meaning str should only contain "INFO] (Higashikurume)"; ASCII is in the range 0 to 127, so: str.replace(/[^\x00-\x7F]/g, ""); It can also be done with a positive assertion of removal, like this: textContent = textContent.replace(/[\u{0080}-\u{FFFF}]/gu,""); This uses Unicode. In JavaScript, when expressing Unicode in a regular expression, the characters are specified with the escape sequence \u{xxxx}, but the flag 'u' must also be present; note the regex has flags 'gu'. I
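The same negated-range idea works in most regex flavours; a Python sketch of the first (negated \x00-\x7F) variant, purely for illustration. Note, as a side observation, that the JS range \u{0080}-\u{FFFF} leaves astral-plane characters (above U+FFFF) untouched, while the negated form removes them too:

```python
import re

def remove_non_ascii(s):
    # Keep only code points 0-127, mirroring the JS /[^\x00-\x7F]/g replace.
    return re.sub(r"[^\x00-\x7F]", "", s)

s = "INFO] :東久留米市 (Higashikurume)"
print(remove_non_ascii(s))  # -> INFO] : (Higashikurume)
```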

Unicode support in Web standard fonts

╄→尐↘猪︶ㄣ submitted on 2019-11-28 11:19:58
I need to decide whether to render geometric symbols in a web GUI (e.g. arrows and triangles for buttons, menus, etc.) as Unicode symbols (MUCH easier and color-independent) or as GIF/PNG files (lots of hassle I would like to avoid). However, I have seen clients that have trouble displaying even advanced punctuation symbols declared as Unicode characters (example). Does anybody know from which version on OSs / Service Packs / Applications ship with Unicode versions of the standard fonts? There is, for example, Microsoft's Arial Unicode MS that has shipped with Office since 1999, however I do not have

Regex accent insensitive?

為{幸葍}努か submitted on 2019-11-28 10:06:49
I need a regex in a C# program to capture the name of a file with a specific structure. I used the \w character class, but the problem is that this class doesn't seem to match accented characters. How can I do this? I don't want to list the most common accented letters in my pattern, because in theory any accent can go on any letter. So I thought there might be a syntax to say we want accent insensitivity (or a class which takes accents into account), or a "Regex" option which allows me to be accent-insensitive. Do you know something like this? Thank you very much. Case-insensitive works for me in
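For what it is worth, .NET's \w is Unicode-aware by default (unless RegexOptions.ECMAScript is set), and \p{L} is the explicit "any letter" class. Python 3 behaves the same way for str patterns, which can be illustrated as:

```python
import re

# In Python 3 (as in .NET by default), \w on a Unicode string matches
# accented letters without listing them explicitly.
print(bool(re.fullmatch(r"\w+", "Gonzalez_École")))  # -> True
print(bool(re.fullmatch(r"\w+", "file name")))       # -> False (space)
```

So if \w appears not to match accented characters, the input encoding or an ASCII-only regex mode is the more likely culprit than the class itself.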

Encode extended ASCII characters in a Code 128 barcode

人走茶凉 submitted on 2019-11-28 09:08:08
Question: I want to encode the string "QuiÑones" in a Code 128 bar code. Is it possible to include extended ASCII characters in the Code 128 encoding? I did some research on Google which suggested that it is possible by using FNC4, but I didn't find exactly how to do it. It would be of great help if someone could assist me with a solution in the C language. Answer 1: "Extended ASCII" characters with byte values from 128 to 255 can indeed be represented in Code 128 encodation by using the special FNC4
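A rough sketch of the FNC4 idea (not a full Code 128 encoder): used as a shift, FNC4 adds 128 to the value of the character that follows, so a byte ≥ 128 is emitted as FNC4 plus the byte minus 128. The "FNC4" token below is a placeholder for the real shift symbol; checksums, start/stop codes, and the latch-vs-shift optimisation are deliberately omitted:

```python
def fnc4_escape(data: bytes):
    # Produce the sequence of code-set-B characters, using the string
    # "FNC4" as a placeholder token for the shift symbol (illustrative
    # only -- a real encoder also emits start/stop codes and a checksum).
    out = []
    for b in data:
        if b >= 128:
            out.append("FNC4")
            out.append(chr(b - 128))  # same symbol pattern, value shifted down
        else:
            out.append(chr(b))
    return out

# Ñ is 0xD1 in Latin-1; 0xD1 - 0x80 = 0x51, the symbol for 'Q'.
print(fnc4_escape("QuiÑones".encode("latin-1")))
# -> ['Q', 'u', 'i', 'FNC4', 'Q', 'o', 'n', 'e', 's']
```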

Ignoring accents while searching the database using Entity Framework

≡放荡痞女 submitted on 2019-11-28 07:08:44
Question: I have a database table that contains names with accented characters, like ä and so on. I need to get all records from a table using EF4 whose name contains some substring, regardless of accents. So the following code: myEntities.Items.Where(i => i.Name.Contains("a")); should return all items with a name containing a, but also all items containing ä, â and so on. Is this possible? Answer 1: If you set an accent-insensitive collation order on the Name column then the queries should work as required.
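With an accent-insensitive collation (in SQL Server, collations ending in _AI, e.g. SQL_Latin1_General_CP1_CI_AI) the folding happens inside the database, so the EF query above works unchanged. For illustration only, the matching behaviour being asked for looks like this when emulated client-side (assuming plain base-letter-plus-combining-mark accents):

```python
import unicodedata

def fold(s):
    # Case-fold, then strip combining marks: a rough stand-in for what a
    # CI/AI collation does during comparison.
    d = unicodedata.normalize("NFKD", s.casefold())
    return "".join(c for c in d if not unicodedata.combining(c))

names = ["Bär", "Bar", "Bâton", "Zoo"]
print([n for n in names if "ba" in fold(n)])  # -> ['Bär', 'Bar', 'Bâton']
```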

How to make MySQL work "case insensitive" and "accent insensitive" in UTF-8

我的梦境 submitted on 2019-11-28 06:52:56
I have a schema with "utf8 -- UTF-8 Unicode" as the charset and a collation of "utf8_spanish_ci". All the tables inside are InnoDB with the same charset and collation. Here comes the problem: with a query like SELECT * FROM people p WHERE p.NAME LIKE '%jose%'; I get 83 result rows. I should have 84 results, because I know the data. Changing the WHERE to: WHERE p.NAME LIKE '%JOSE%'; I get exactly the same 83 rows. With combinations like JoSe, Jose, JOSe, etc., the same 83 rows are returned. The problem comes when accents come into play. If I do: WHERE p.NAME LIKE '%josé%'; I get no results. 0 rows. But

Find non-ASCII characters in varchar columns using SQL Server

我的梦境 submitted on 2019-11-28 04:38:15
How can rows with non-ASCII characters be returned using SQL Server? If you can show how to do it for one column, that would be great. I am doing something like this now, but it is not working: select * from Staging.APARMRE1 as ar where ar.Line like '%[^!-~ ]%' For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns: the identity field for that record (this will allow the whole record to be reviewed with another query), the column name, and the text with the invalid character. Id | FieldName | InvalidText | ----
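The LIKE pattern in the question is trying to say "any character outside the printable ASCII range" (space plus '!' through '~'). The predicate itself, and the three-column result shape being asked for, can be sketched in Python; scan_rows and its tuple layout are hypothetical names for illustration:

```python
def has_non_ascii(text):
    # Printable ASCII is 0x20-0x7E; the question's LIKE '%[^!-~ ]%'
    # pattern targets the same range.
    return any(not (0x20 <= ord(c) <= 0x7E) for c in text)

def scan_rows(rows):
    # rows: (id, field_name, value) tuples pulled from any varchar column;
    # keep only those containing at least one out-of-range character.
    return [(rid, col, val) for rid, col, val in rows if has_non_ascii(val)]

rows = [(1, "Line", "plain text"), (2, "Line", "café"), (3, "Name", "Ω")]
print(scan_rows(rows))  # -> [(2, 'Line', 'café'), (3, 'Name', 'Ω')]
```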

How to handle Asian characters in file names in Git on OS X

我的未来我决定 submitted on 2019-11-28 04:29:46
I'm on US-English OS X 10.6.4 and am trying to store files with Asian characters in their names in a Git repository. OK, let's create such a file in a Git working tree: $ touch どうもありがとうミスターロボット.txt Git shows it in octal-escaped UTF-8 form: $ git version git version 1.7.3.1 $ git status # On branch master # # Initial commit # # Untracked files: # (use "git add <file>..." to include in what will be committed) # # "\343\201\250\343\202\231\343\201\206\343\202\202\343\201\202\343\202\212\343\201\213\343\202\231\343\201\250\343\201\206\343\203\237\343\202\271\343\202\277\343\203\274\343\203\255\343\203

Convert Hi-Ansi chars to Ascii equivalent (é -> e)

自古美人都是妖i submitted on 2019-11-27 18:34:09
Is there a routine available in Delphi 2007 to convert characters in the high range of the ANSI table (>127) to their equivalents in pure ASCII (<=127) according to a locale (codepage)? I know some characters cannot translate well, but most can, especially in the 192-255 range: À → A, à → a, Ë → E, ë → e, Ç → C, ç → c, – (en dash) → - (hyphen, which can be trickier), — (em dash) → - (hyphen). Zoë Peterson: WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and
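The dash cases in the question are exactly the ones that need an explicit mapping table, since they have no combining-mark decomposition; accented letters can then be handled by decomposition. A Python sketch (illustration only, not the Delphi/WideCharToMultiByte route) combining the two:

```python
import unicodedata

# Explicit table for characters with no combining-mark decomposition;
# extend as needed for a given codepage.
TABLE = str.maketrans({"–": "-", "—": "-", "ø": "o", "Ø": "O", "Æ": "AE", "æ": "ae"})

def to_ascii(s):
    s = s.translate(TABLE)                # hand-picked best-fit mappings
    s = unicodedata.normalize("NFKD", s)  # decompose, e.g. ë -> e + diaeresis
    return "".join(c for c in s if not unicodedata.combining(c))

print(to_ascii("Zoë – À ç"))  # -> Zoe - A c
```

Windows' best-fit mapping in WideCharToMultiByte does this kind of substitution per codepage; the table above is the portable, do-it-yourself version of the same idea.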