How to set run-time character set in C?

ⅰ亾dé卋堺 submitted on 2019-12-23 03:17:57

Question


How do I set the run-time character set in the C programming language, in a Linux environment?

For example, I want to set it to ISO 8859-1, UTF-8, or ASCII.


Answer 1:


What printf( "%c", '\xa3') does is always the same -- it outputs one byte with the value 0xA3 (= 163) to the standard output stream.

What is then shown on your screen depends on how your terminal (e.g., xterm or the Linux console) reacts to seeing a byte with value 163 written to it. That is a question of the character setting of your terminal, and there is no direct way your C program can influence it. What you need to do is allow the C program to find out what charset the terminal expects, and then generate output that matches that.
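One common way to find out the expected charset is sketched below. It assumes a POSIX system (for nl_langinfo) and assumes that the user's locale environment variables match the terminal's actual encoding, which is a convention and not something the program can verify. The function name locale_codeset is invented for this example.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: switches LC_CTYPE to the named locale and returns
   the name of its character encoding (for example "UTF-8" or
   "ISO-8859-1"), or NULL if the locale cannot be selected.
   Pass "" to use the locale named by LC_ALL, LC_CTYPE, or LANG.
   The returned pointer refers to storage owned by the C library. */
const char *locale_codeset(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;   /* the requested locale is not available */

    return nl_langinfo(CODESET);
}
```

A program would typically call locale_codeset("") once at startup and then produce its output in whatever encoding that call reports.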

Simple programs can often get away with assuming that the charset of their input is also the charset expected for their output. They then ignore charset issues and simply reproduce high-bit bytes in their output exactly as they appeared in the input. (The UTF-8 encoding of Unicode is deliberately designed to make this strategy work in many cases.)

However, when that is not the case -- such as if your program contains hardcoded strings with non-English letters -- you need to use the locale functions to figure out which character encoding your program is supposed to produce, and then make sure to do that. Libraries such as libiconv can often help with this relatively painlessly.




Answer 2:


You need to be a little more specific about what you mean. For the most part, C doesn't really have a character set; its strings are simply null-terminated sequences of bytes, and the language doesn't do anything to encode or decode them.

There are a few functions in the C standard library and in POSIX that depend on the current locale. You can use setlocale to set the current locale; it defaults to the C locale, in which strings are treated as ASCII and compared according to byte values.

If you want to convert character sets, use iconv; this will allow you to convert buffers from one encoding to another. For instance, if you represent your text internally in UTF-8, but want to print it out in ISO-8859-1, this is what you would use.
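A minimal sketch of such a conversion follows. The helper convert_string and its parameters are invented for this example; only iconv_open, iconv, and iconv_close come from the library. On glibc-based Linux systems iconv is part of the C library, while other platforms may need to link against libiconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from the
   encoding `from` to the encoding `to`, writing a NUL-terminated result
   into `out`. Returns the number of bytes written (not counting the
   terminator), or (size_t)-1 on any error. */
size_t convert_string(const char *from, const char *to,
                      const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(to, from);   /* note: target encoding first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;               /* encoding pair not supported */

    char *in_ptr = (char *)in;           /* iconv wants char **, but only reads */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = out_size - 1;      /* keep room for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;               /* invalid input or buffer too small */

    *out_ptr = '\0';
    return (size_t)(out_ptr - out);
}
```

For instance, convert_string("UTF-8", "ISO-8859-1", "\xC2\xA3", buf, sizeof buf) turns the two-byte UTF-8 encoding of the pound sign into the single byte 0xA3.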

edit to add: From a comment on another answer, you ask:

I set my terminal's expected character set to be "ISO 8859-1", but why when I call the function setlocale( LC_CTYPE, NULL );, it still returns C? I think it should return ISO 8859-1 as this is the terminal's expected charset.

When the program starts up, its locale is always "C". If you want to set the locale based on the environment variables, you need to call setlocale( LC_ALL, "") or setlocale( LC_CTYPE, ""); that is, you need to pass in an empty string, and then the locale will be set based on your environment variables.
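The difference between querying and setting can be seen in a short sketch. The function names ctype_locale and adopt_environment_locale are made up for this example; only setlocale is a real library call.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: passing NULL only queries the current setting. */
const char *ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: prints the LC_CTYPE locale before and after
   adopting the environment's settings. Returns 0 on success, or -1 if
   the environment names a locale that is not available (in which case
   the current locale is left unchanged). */
int adopt_environment_locale(void)
{
    printf("before: %s\n", ctype_locale());

    /* Passing "" selects the locale named by LC_ALL, LC_CTYPE, or LANG. */
    if (setlocale(LC_CTYPE, "") == NULL) {
        fprintf(stderr, "locale named in the environment is not available\n");
        return -1;
    }

    printf("after:  %s\n", ctype_locale());
    return 0;
}
```

Called first thing in main, the "before" line reports C regardless of the terminal settings, and the "after" line reports whatever the environment variables select.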




Answer 3:


See setlocale(3), which sets the program's current locale.




Answer 4:


Standard C provides the setlocale() function to set a locale. The LC_CTYPE category controls character classification and conversion, including which character encoding the multibyte functions assume. For some finer details, also see what POSIX has to say. To find out the locales supported on your system, run

locale -a


Source: https://stackoverflow.com/questions/13280620/how-to-set-run-time-character-set-in-c
